How to Integrate OpenAI for healthcare with AWS Lambda for RAG

By Cyprian Aarons · Updated 2026-04-21
Tags: openai-for-healthcare, aws-lambda, rag

If you’re building a healthcare RAG agent, the hard part is not generating text. It’s orchestrating secure retrieval, enforcing guardrails, and keeping the whole flow serverless so it can scale without babysitting infrastructure.

OpenAI for healthcare handles the reasoning and response generation. AWS Lambda gives you an event-driven execution layer for retrieval, preprocessing, and policy checks before anything reaches the model.

Prerequisites

  • An AWS account with permission to create:
    • Lambda functions
    • IAM roles
    • CloudWatch logs
  • AWS CLI configured locally:
    • aws configure
  • Python 3.11 or later
  • boto3 installed for AWS SDK access
  • openai Python SDK installed
  • An OpenAI API key stored as an environment variable:
    • OPENAI_API_KEY
  • A vector store or document source for RAG:
    • Amazon OpenSearch Serverless, Pinecone, pgvector, or S3-backed retrieval
  • Basic knowledge of:
    • AWS Lambda handler patterns
    • JSON event payloads
    • Python HTTP requests
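
The local setup above boils down to a few commands (the API key value is a placeholder):

```shell
# Install the AWS SDK and the OpenAI SDK
pip install boto3 openai

# Configure AWS credentials (prompts for access key, secret, and region)
aws configure

# Export your OpenAI API key for local testing (placeholder value)
export OPENAI_API_KEY="sk-..."
```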

Integration Steps

1. Set up your Lambda function as the RAG entrypoint

Your Lambda should accept a query, fetch relevant clinical context, and return a normalized payload to your model layer.

import json
import os
import boto3

lambda_client = boto3.client("lambda")

def handler(event, context):
    query = event.get("query", "")
    patient_id = event.get("patient_id", "")

    payload = {
        "query": query,
        "patient_id": patient_id,
        "top_k": 5
    }

    # Example: invoke a downstream retrieval Lambda or local retriever
    response = lambda_client.invoke(
        FunctionName=os.environ["RETRIEVAL_FUNCTION_NAME"],
        InvocationType="RequestResponse",
        Payload=json.dumps(payload).encode("utf-8")
    )

    result = json.loads(response["Payload"].read())
    return {
        "statusCode": 200,
        "body": json.dumps(result)
    }

This pattern keeps your API Lambda thin. Retrieval logic can evolve independently without touching the orchestration layer.
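As a quick sanity check, the entrypoint expects an event shaped like the dict below (field values are illustrative), and normalizes it into the retrieval payload before invoking downstream:

```python
# Example event payload for the entrypoint Lambda.
# "query" and "patient_id" are the only fields the handler reads.
sample_event = {
    "query": "What medications should be continued after discharge?",
    "patient_id": "12345",
}

# The handler builds this retrieval payload from the event,
# defaulting top_k to 5.
retrieval_payload = {
    "query": sample_event.get("query", ""),
    "patient_id": sample_event.get("patient_id", ""),
    "top_k": 5,
}
```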

2. Build the retrieval Lambda that returns grounded context

This Lambda queries your document index and returns snippets that will be passed into OpenAI for healthcare.

import json

def handler(event, context):
    query = event["query"]
    patient_id = event.get("patient_id")
    top_k = event.get("top_k", 5)

    # Replace this with OpenSearch / pgvector / Pinecone lookup.
    retrieved_chunks = [
        {
            "source": "discharge_summary_2024_09_12",
            "text": "Patient discharged with hypertension plan. Continue lisinopril 10mg daily."
        },
        {
            "source": "lab_report_2024_09_10",
            "text": "Creatinine within normal range. No acute kidney injury."
        }
    ]

    return {
        "query": query,
        "patient_id": patient_id,
        "contexts": retrieved_chunks[:top_k]
    }

In production, this is where you enforce document-level access control and PHI scoping before any generation happens.
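A minimal sketch of that document-level check, assuming each chunk carries a hypothetical `allowed_patient_ids` field (your index schema will differ, and a real system would resolve permissions from an authorization service rather than the chunk itself):

```python
def filter_authorized(chunks: list[dict], patient_id: str) -> list[dict]:
    """Drop any retrieved chunk the caller is not scoped to see.

    Assumes each chunk carries an "allowed_patient_ids" list; chunks
    without the field are denied by default (fail closed).
    """
    return [
        c for c in chunks
        if patient_id in c.get("allowed_patient_ids", [])
    ]

chunks = [
    {"source": "discharge_summary", "text": "...", "allowed_patient_ids": ["12345"]},
    {"source": "other_patient_note", "text": "...", "allowed_patient_ids": ["99999"]},
]

# Only the chunk scoped to patient 12345 survives.
authorized = filter_authorized(chunks, "12345")
```

Failing closed matters here: a chunk with missing or malformed access metadata should never reach the prompt.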

3. Call OpenAI for healthcare with the retrieved context

Use the OpenAI SDK to send the user query plus retrieved evidence. If you’re using a healthcare-specific deployment or endpoint in your environment, keep the client code isolated so you can swap base URLs or models without changing business logic.

import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def generate_answer(query: str, contexts: list[dict]) -> str:
    evidence_text = "\n".join(
        f"- Source: {c['source']}\n  Text: {c['text']}"
        for c in contexts
    )

    prompt = f"""
You are a healthcare assistant.
Use only the provided context to answer.
If the answer is not supported by the context, say you don't have enough information.

Question: {query}

Context:
{evidence_text}
"""

    response = client.responses.create(
        model="gpt-4.1",
        input=prompt,
    )

    return response.output_text

For healthcare workflows, keep prompts explicit about grounding. Do not let the model infer beyond retrieved evidence unless your policy allows it.

4. Wire Lambda and OpenAI together in one orchestration function

This is the production shape most teams end up with: one Lambda receives the request, another retrieves context, then OpenAI generates the final answer.

import json
import os
import boto3
from openai import OpenAI

lambda_client = boto3.client("lambda")
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def handler(event, context):
    query = event["query"]
    patient_id = event.get("patient_id")

    retrieval_resp = lambda_client.invoke(
        FunctionName=os.environ["RETRIEVAL_FUNCTION_NAME"],
        InvocationType="RequestResponse",
        Payload=json.dumps({
            "query": query,
            "patient_id": patient_id,
            "top_k": 5
        }).encode("utf-8")
    )

    retrieval_data = json.loads(retrieval_resp["Payload"].read())
    contexts = retrieval_data["contexts"]

    evidence_text = "\n".join(
        f"[{i+1}] {c['source']}: {c['text']}"
        for i, c in enumerate(contexts)
    )

    response = client.responses.create(
        model="gpt-4.1",
        input=f"""Answer using only this evidence:

Question: {query}

Evidence:
{evidence_text}
"""
    )

    return {
        "statusCode": 200,
        "body": json.dumps({
            "answer": response.output_text,
            "sources": [c["source"] for c in contexts]
        })
    }

This gives you a clean RAG pipeline:

  • request comes into Lambda
  • Lambda fetches grounded context
  • OpenAI produces a constrained answer
  • sources are returned for auditability

5. Add basic guardrails before generation

For healthcare workloads, do not skip validation. Filter unsafe inputs and strip obvious PHI if your workflow does not require it.

import re

PHI_PATTERNS = [
    r"\b\d{3}-\d{2}-\d{4}\b",   # SSN-like pattern
    r"\b\d{10}\b",              # phone-like pattern
]

def redact_phi(text: str) -> str:
    redacted = text
    for pattern in PHI_PATTERNS:
        redacted = re.sub(pattern, "[REDACTED]", redacted)
    return redacted

def safe_query(query: str) -> str:
    return redact_phi(query.strip())

Put this in front of both retrieval and generation if user input may contain sensitive identifiers.
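A quick usage check of the redaction helpers, repeated here self-contained so it runs on its own (patterns copied from the snippet above):

```python
import re

PHI_PATTERNS = [
    r"\b\d{3}-\d{2}-\d{4}\b",   # SSN-like pattern
    r"\b\d{10}\b",              # phone-like pattern
]

def redact_phi(text: str) -> str:
    redacted = text
    for pattern in PHI_PATTERNS:
        redacted = re.sub(pattern, "[REDACTED]", redacted)
    return redacted

def safe_query(query: str) -> str:
    return redact_phi(query.strip())

cleaned = safe_query("  Meds for patient SSN 123-45-6789, call 5551234567  ")
# -> "Meds for patient SSN [REDACTED], call [REDACTED]"
```

Regex redaction only catches well-formed identifiers; treat it as a backstop, not a substitute for proper PHI handling upstream.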

Testing the Integration

Run a local test by invoking your orchestration Lambda with a sample clinical question.

import json
import boto3

lambda_client = boto3.client("lambda")

test_event = {
    "query": "What medications should be continued after discharge?",
    "patient_id": "12345"
}

response = lambda_client.invoke(
    FunctionName="healthcare-rag-orchestrator",
    InvocationType="RequestResponse",
    Payload=json.dumps(test_event).encode("utf-8")
)

payload = json.loads(response["Payload"].read())
print(payload["body"])

Expected output:

{
  "answer": "Continue lisinopril 10mg daily based on the discharge summary.",
  "sources": [
    "discharge_summary_2024_09_12",
    "lab_report_2024_09_10"
  ]
}

If you get an empty answer or unsupported claims, check these first:

  • retrieval is returning relevant chunks
  • prompt says to use only provided evidence
  • your model call is using the correct SDK method and key
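When debugging the first bullet, you can exercise the context-shaping logic locally without AWS at all by standing in a stub for the retrieval Lambda (the names here are illustrative; the citation format matches the orchestrator above):

```python
def stub_retriever(payload: dict) -> dict:
    """Stand-in for the retrieval Lambda: returns canned chunks."""
    return {
        "query": payload["query"],
        "patient_id": payload.get("patient_id"),
        "contexts": [
            {"source": "discharge_summary_2024_09_12",
             "text": "Continue lisinopril 10mg daily."},
        ][: payload.get("top_k", 5)],
    }

def build_evidence(contexts: list[dict]) -> str:
    # Same numbered-citation format used by the orchestrator.
    return "\n".join(
        f"[{i+1}] {c['source']}: {c['text']}"
        for i, c in enumerate(contexts)
    )

data = stub_retriever({"query": "post-discharge meds?", "top_k": 5})
evidence = build_evidence(data["contexts"])
print(evidence)
# [1] discharge_summary_2024_09_12: Continue lisinopril 10mg daily.
```

If the evidence string looks right here but answers are still ungrounded, the problem is in the prompt or model call, not retrieval.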

Real-World Use Cases

  • Clinical documentation assistant
    • Retrieve chart notes from secure storage and generate discharge summaries or follow-up instructions.
  • Prior authorization support
    • Pull policy documents and patient records into Lambda, then have OpenAI draft payer-ready justification text.
  • Care navigation agent
    • Answer patient questions from approved clinical content while returning citations back to care coordinators for review.

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.
