How to Integrate OpenAI for healthcare with AWS Lambda for RAG
If you’re building a healthcare RAG agent, the hard part is not generating text. It’s orchestrating secure retrieval, enforcing guardrails, and keeping the whole flow serverless so it can scale without babysitting infrastructure.
OpenAI for healthcare handles the reasoning and response generation. AWS Lambda gives you an event-driven execution layer for retrieval, preprocessing, and policy checks before anything reaches the model.
Prerequisites
- An AWS account with permission to create:
  - Lambda functions
  - IAM roles
  - CloudWatch logs
- The AWS CLI configured locally (`aws configure`)
- Python 3.11 or later
- `boto3` installed for AWS SDK access
- The `openai` Python SDK installed
- An OpenAI API key stored as an environment variable: `OPENAI_API_KEY`
- A vector store or document source for RAG: Amazon OpenSearch Serverless, Pinecone, pgvector, or S3-backed retrieval
- Basic knowledge of:
  - AWS Lambda handler patterns
  - JSON event payloads
  - Python HTTP requests
Integration Steps
1. Set up your Lambda function as the RAG entrypoint
Your Lambda should accept a query, fetch relevant clinical context, and return a normalized payload to your model layer.
```python
import json
import os
import boto3

lambda_client = boto3.client("lambda")

def handler(event, context):
    query = event.get("query", "")
    patient_id = event.get("patient_id", "")

    payload = {
        "query": query,
        "patient_id": patient_id,
        "top_k": 5
    }

    # Example: invoke a downstream retrieval Lambda or local retriever
    response = lambda_client.invoke(
        FunctionName=os.environ["RETRIEVAL_FUNCTION_NAME"],
        InvocationType="RequestResponse",
        Payload=json.dumps(payload).encode("utf-8")
    )

    result = json.loads(response["Payload"].read())
    return {
        "statusCode": 200,
        "body": json.dumps(result)
    }
```
This pattern keeps your API Lambda thin. Retrieval logic can evolve independently without touching the orchestration layer.
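The normalization step can be isolated as a small helper for testing; `build_retrieval_payload` is a name I am introducing here, not part of the handler above:

```python
import json

def build_retrieval_payload(event: dict, top_k: int = 5) -> bytes:
    # Normalize the inbound event into the JSON payload the retrieval
    # Lambda expects; missing fields default to empty strings.
    payload = {
        "query": event.get("query", ""),
        "patient_id": event.get("patient_id", ""),
        "top_k": top_k,
    }
    return json.dumps(payload).encode("utf-8")
```

Keeping the payload shape in one function means the entrypoint and any tests agree on exactly what crosses the Lambda boundary.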
2. Build the retrieval Lambda that returns grounded context
This Lambda queries your document index and returns snippets that will be passed into OpenAI for healthcare.
```python
import json

def handler(event, context):
    query = event["query"]
    patient_id = event.get("patient_id")
    top_k = event.get("top_k", 5)

    # Replace this with OpenSearch / pgvector / Pinecone lookup.
    retrieved_chunks = [
        {
            "source": "discharge_summary_2024_09_12",
            "text": "Patient discharged with hypertension plan. Continue lisinopril 10mg daily."
        },
        {
            "source": "lab_report_2024_09_10",
            "text": "Creatinine within normal range. No acute kidney injury."
        }
    ]

    return {
        "query": query,
        "patient_id": patient_id,
        "contexts": retrieved_chunks[:top_k]
    }
```
In production, this is where you enforce document-level access control and PHI scoping before any generation happens.
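The access-control step can be sketched as a filter over retrieved chunks. The allow-list below is hypothetical; in a real deployment this lookup would hit your authorization service or the index's row-level security, not an in-memory dict:

```python
# Hypothetical per-patient allow-list; stands in for a real authorization check.
ALLOWED_SOURCES = {
    "12345": {"discharge_summary_2024_09_12", "lab_report_2024_09_10"},
}

def filter_authorized(patient_id: str, chunks: list[dict]) -> list[dict]:
    # Drop any retrieved chunk the caller is not permitted to see
    # for this patient before it can reach the model.
    allowed = ALLOWED_SOURCES.get(patient_id, set())
    return [c for c in chunks if c["source"] in allowed]
```

The important property is that an unknown patient ID yields an empty context set, so the pipeline fails closed rather than leaking documents.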
3. Call OpenAI for healthcare with the retrieved context
Use the OpenAI SDK to send the user query plus retrieved evidence. If you’re using a healthcare-specific deployment or endpoint in your environment, keep the client code isolated so you can swap base URLs or models without changing business logic.
```python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def generate_answer(query: str, contexts: list[dict]) -> str:
    evidence_text = "\n".join(
        f"- Source: {c['source']}\n  Text: {c['text']}"
        for c in contexts
    )

    prompt = f"""
You are a healthcare assistant.
Use only the provided context to answer.
If the answer is not supported by the context, say you don't have enough information.

Question: {query}

Context:
{evidence_text}
"""

    response = client.responses.create(
        model="gpt-4.1",
        input=prompt,
    )
    return response.output_text
```
For healthcare workflows, keep prompts explicit about grounding. Do not let the model infer beyond retrieved evidence unless your policy allows it.
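One way to keep the client code swappable, as suggested above, is to resolve the endpoint and model from the environment. `OPENAI_BASE_URL` and `OPENAI_MODEL` are variable names I am assuming here; adapt them to your deployment:

```python
import os

def model_config() -> dict:
    # Resolve model settings from the environment so a healthcare-specific
    # endpoint or model can be swapped in without touching business logic.
    cfg = {"model": os.environ.get("OPENAI_MODEL", "gpt-4.1")}
    base_url = os.environ.get("OPENAI_BASE_URL")
    if base_url:
        # Passed through to OpenAI(base_url=...) when constructing the client.
        cfg["base_url"] = base_url
    return cfg
```

The returned dict can feed both client construction and the `model=` argument, so switching deployments is a configuration change, not a code change.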
4. Wire Lambda and OpenAI together in one orchestration function
This is the production shape most teams end up with: one Lambda receives the request, another retrieves context, then OpenAI generates the final answer.
```python
import json
import os
import boto3
from openai import OpenAI

lambda_client = boto3.client("lambda")
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def handler(event, context):
    query = event["query"]
    patient_id = event.get("patient_id")

    retrieval_resp = lambda_client.invoke(
        FunctionName=os.environ["RETRIEVAL_FUNCTION_NAME"],
        InvocationType="RequestResponse",
        Payload=json.dumps({
            "query": query,
            "patient_id": patient_id,
            "top_k": 5
        }).encode("utf-8")
    )
    retrieval_data = json.loads(retrieval_resp["Payload"].read())
    contexts = retrieval_data["contexts"]

    evidence_text = "\n".join(
        f"[{i+1}] {c['source']}: {c['text']}"
        for i, c in enumerate(contexts)
    )

    response = client.responses.create(
        model="gpt-4.1",
        input=f"""Answer using only this evidence:

Question: {query}

Evidence:
{evidence_text}
"""
    )

    return {
        "statusCode": 200,
        "body": json.dumps({
            "answer": response.output_text,
            "sources": [c["source"] for c in contexts]
        })
    }
```
This gives you a clean RAG pipeline:

- request comes into Lambda
- Lambda fetches grounded context
- OpenAI produces a constrained answer
- sources are returned for auditability
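One hardening step worth adding: the orchestrator above raises a raw `KeyError` if the retrieval payload is malformed. A defensive parse, sketched here as a helper of my own naming, lets it fail closed instead:

```python
import json

def parse_contexts(raw: bytes) -> list[dict]:
    # Return only well-formed context chunks; on any malformed payload,
    # return an empty list so generation proceeds with no evidence
    # (and the prompt's grounding rule forces an "I don't know" answer).
    try:
        contexts = json.loads(raw).get("contexts", [])
        return [
            c for c in contexts
            if isinstance(c, dict) and "source" in c and "text" in c
        ]
    except (ValueError, AttributeError, TypeError):
        return []
```

In the handler, `contexts = parse_contexts(retrieval_resp["Payload"].read())` would replace the bare dictionary lookup.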
5. Add basic guardrails before generation
For healthcare workloads, do not skip validation. Filter unsafe inputs and strip obvious PHI if your workflow does not require it.
```python
import re

PHI_PATTERNS = [
    r"\b\d{3}-\d{2}-\d{4}\b",  # SSN-like pattern
    r"\b\d{10}\b",             # phone-like pattern
]

def redact_phi(text: str) -> str:
    redacted = text
    for pattern in PHI_PATTERNS:
        redacted = re.sub(pattern, "[REDACTED]", redacted)
    return redacted

def safe_query(query: str) -> str:
    return redact_phi(query.strip())
```
Put this in front of both retrieval and generation if user input may contain sensitive identifiers.
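A quick check of the redaction behavior (patterns repeated here so the snippet runs standalone):

```python
import re

PHI_PATTERNS = [
    r"\b\d{3}-\d{2}-\d{4}\b",  # SSN-like pattern
    r"\b\d{10}\b",             # phone-like pattern
]

def redact_phi(text: str) -> str:
    # Mask each matching identifier before the text leaves the handler.
    for pattern in PHI_PATTERNS:
        text = re.sub(pattern, "[REDACTED]", text)
    return text

print(redact_phi("SSN 123-45-6789, callback 5551234567"))
# SSN [REDACTED], callback [REDACTED]
```

Note these regexes are deliberately narrow examples; a production PHI filter would use a vetted de-identification service, not two patterns.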
Testing the Integration
Run a local test by invoking your orchestration Lambda with a sample clinical question.
```python
import json
import boto3

lambda_client = boto3.client("lambda")

test_event = {
    "query": "What medications should be continued after discharge?",
    "patient_id": "12345"
}

response = lambda_client.invoke(
    FunctionName="healthcare-rag-orchestrator",
    InvocationType="RequestResponse",
    Payload=json.dumps(test_event).encode("utf-8")
)

payload = json.loads(response["Payload"].read())
print(payload["body"])
```
Expected output:
```json
{
  "answer": "Continue lisinopril 10mg daily based on the discharge summary.",
  "sources": [
    "discharge_summary_2024_09_12",
    "lab_report_2024_09_10"
  ]
}
```
If you get an empty answer or unsupported claims, check these first:
- retrieval is returning relevant chunks
- the prompt says to use only provided evidence
- your model call is using the correct SDK method and key
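For catching unsupported claims in automated tests, a lexical-overlap heuristic is a cheap first pass. This is a rough sketch of my own, not a substitute for a real grounding eval:

```python
def looks_grounded(answer: str, contexts: list[dict]) -> bool:
    # Crude check: do at least half of the substantive answer tokens
    # (longer than 4 characters) appear somewhere in the evidence text?
    evidence = " ".join(c["text"].lower() for c in contexts)
    tokens = [t.strip(".,") for t in answer.lower().split() if len(t) > 4]
    if not tokens:
        return False
    hits = sum(1 for t in tokens if t in evidence)
    return hits / len(tokens) >= 0.5
```

Wiring this into your test harness flags answers that drift away from the retrieved chunks before a human reviewer ever sees them.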
Real-World Use Cases
- Clinical documentation assistant: retrieve chart notes from secure storage and generate discharge summaries or follow-up instructions.
- Prior authorization support: pull policy documents and patient records into Lambda, then have OpenAI draft payer-ready justification text.
- Care navigation agent: answer patient questions from approved clinical content while returning citations back to care coordinators for review.
Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit (PDF checklist + starter code)
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit: architecture templates, compliance checklists, and a 7-email deep-dive course.