How to Integrate Azure OpenAI for banking with CosmosDB for RAG

By Cyprian Aarons · Updated 2026-04-21
Tags: azure-openai-for-banking, cosmosdb, rag

Azure OpenAI plus Cosmos DB is a practical stack for bank-grade RAG systems. You use Azure OpenAI to generate grounded answers, and Cosmos DB to store and retrieve policy docs, product terms, KYC rules, and support knowledge with low-latency vector search.

For banking, this matters because the assistant needs controlled retrieval, auditability, and data residency options. You are not just “chatting with documents”; you are building an agent that can answer customer and ops questions from approved internal sources.

Prerequisites

  • An Azure subscription with:
    • Azure OpenAI resource
    • Azure Cosmos DB for NoSQL account
  • Deployed Azure OpenAI model:
    • Chat model like gpt-4o or gpt-4.1
    • Embedding model like text-embedding-3-large or text-embedding-3-small
  • A Cosmos DB database and container with vector search enabled
  • Python 3.10+
  • Installed packages:
    • openai
    • azure-cosmos
    • python-dotenv
  • Environment variables set:
    • AZURE_OPENAI_ENDPOINT
    • AZURE_OPENAI_API_KEY
    • AZURE_OPENAI_CHAT_DEPLOYMENT
    • AZURE_OPENAI_EMBEDDING_DEPLOYMENT
    • COSMOS_ENDPOINT
    • COSMOS_KEY
    • COSMOS_DB_NAME
    • COSMOS_CONTAINER_NAME
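Before wiring up any clients, it helps to fail fast when one of these variables is missing. A minimal sketch (the variable names match the list above; `missing_env_vars` is a hypothetical helper, not part of any SDK):

```python
import os

# The environment variables this guide relies on.
REQUIRED_VARS = [
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_CHAT_DEPLOYMENT",
    "AZURE_OPENAI_EMBEDDING_DEPLOYMENT",
    "COSMOS_ENDPOINT",
    "COSMOS_KEY",
    "COSMOS_DB_NAME",
    "COSMOS_CONTAINER_NAME",
]

def missing_env_vars(required: list[str] = REQUIRED_VARS) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not os.environ.get(name)]

missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Run this once at startup; an empty result means every variable in the list is set to a non-empty value.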

Integration Steps

  1. Set up your clients and configuration.

    Keep your connection details out of code. For banking systems, use Key Vault in production; .env is fine for local development.

    import os
    from dotenv import load_dotenv
    from openai import AzureOpenAI
    from azure.cosmos import CosmosClient
    
    load_dotenv()
    
    azure_openai_client = AzureOpenAI(
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_version="2024-06-01"
    )
    
    cosmos_client = CosmosClient(
        url=os.environ["COSMOS_ENDPOINT"],
        credential=os.environ["COSMOS_KEY"]
    )
    
    db = cosmos_client.get_database_client(os.environ["COSMOS_DB_NAME"])
    container = db.get_container_client(os.environ["COSMOS_CONTAINER_NAME"])
    
  2. Create embeddings for your banking documents.

    Use the embedding deployment to convert policy text into vectors. In RAG, this is the indexable representation you store in Cosmos DB.

    def embed_text(text: str) -> list[float]:
        response = azure_openai_client.embeddings.create(
            model=os.environ["AZURE_OPENAI_EMBEDDING_DEPLOYMENT"],
            input=text
        )
        return response.data[0].embedding
    
    sample_doc = {
        "id": "bank-policy-001",
        "doc_type": "credit_card_policy",
        "title": "Credit Card Fee Policy",
        "content": "Annual fees are waived for premium customers with monthly deposits above threshold.",
    }
    
    sample_doc["embedding"] = embed_text(sample_doc["content"])
    print(len(sample_doc["embedding"]))
    
  3. Store documents in Cosmos DB with their vectors.

    Your container should be configured with a vector policy and indexing policy that matches your embedding dimensions. The exact setup depends on your Cosmos DB API version, but the application-side write is straightforward.
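As a reference point, a vector policy and indexing policy might look like the following sketch. The property names (`vectorEmbeddings`, `vectorIndexes`, `quantizedFlat`) follow the Cosmos DB for NoSQL vector search feature; the dimension count and partition key path are assumptions you should match to your own deployment:

```python
EMBEDDING_DIMENSIONS = 3072  # text-embedding-3-large default; use 1536 for text-embedding-3-small

# Tells Cosmos DB which property holds the vector, its element type,
# the distance function, and the expected dimension count.
vector_embedding_policy = {
    "vectorEmbeddings": [
        {
            "path": "/embedding",
            "dataType": "float32",
            "distanceFunction": "cosine",
            "dimensions": EMBEDDING_DIMENSIONS,
        }
    ]
}

# Add a vector index on the embedding path and exclude it from the
# default range index so writes stay cheap.
indexing_policy = {
    "indexingMode": "consistent",
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [{"path": "/embedding/*"}],
    "vectorIndexes": [{"path": "/embedding", "type": "quantizedFlat"}],
}
```

In recent versions of the azure-cosmos Python SDK these can be passed as the `indexing_policy` and `vector_embedding_policy` arguments when creating the container; check your SDK version's documentation for the exact parameter names. The dimension count must match the embedding model you deployed, or inserts and queries will silently disagree.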

    container.upsert_item({
        "id": sample_doc["id"],
        "doc_type": sample_doc["doc_type"],
        "title": sample_doc["title"],
        "content": sample_doc["content"],
        "embedding": sample_doc["embedding"]
    })
    
  4. Retrieve top matches using vector search.

    Query Cosmos DB with the user question embedding. For RAG, you only send the top retrieved chunks to Azure OpenAI, not the whole corpus.

    def retrieve_context(query: str, top_k: int = 3) -> list[dict]:
        # Embed the question with the same model used for the documents.
        query_embedding = embed_text(query)

        # Rank by VectorDistance against the container's vector index,
        # so the best matches come back first.
        query_text = """
            SELECT TOP @top_k c.id, c.title, c.content
            FROM c
            ORDER BY VectorDistance(c.embedding, @query_embedding)
        """

        return list(container.query_items(
            query=query_text,
            parameters=[
                {"name": "@top_k", "value": top_k},
                {"name": "@query_embedding", "value": query_embedding},
            ],
            enable_cross_partition_query=True
        ))
    
    hits = retrieve_context("What are the annual fee waiver rules?")
    for hit in hits:
        print(hit["title"], hit["content"])
    
  5. Generate a grounded answer with Azure OpenAI.

    Pass retrieved context into the chat completion call. In banking, keep the prompt strict: answer only from retrieved content and say when data is insufficient.

    def answer_question(question: str) -> str:
        docs = retrieve_context(question)
    
        context_block = "\n\n".join(
            f"[{i+1}] {doc['title']}: {doc['content']}"
            for i, doc in enumerate(docs)
        )
    
        messages = [
            {
                "role": "system",
                "content": (
                    "You are a banking assistant. Answer only using the provided context. "
                    "If the answer is not in the context, say you do not have enough information."
                )
            },
            {
                "role": "user",
                "content": f"Context:\n{context_block}\n\nQuestion: {question}"
            }
        ]
    
        response = azure_openai_client.chat.completions.create(
            model=os.environ["AZURE_OPENAI_CHAT_DEPLOYMENT"],
            messages=messages,
            temperature=0.2,
            max_tokens=300
        )
    
        return response.choices[0].message.content
    
    print(answer_question("When is the annual fee waived?"))
    

Testing the Integration

Run a simple end-to-end test with one known document and one question that should match it.

def test_rag_flow():
    question = "Who gets an annual fee waiver?"
    answer = answer_question(question)
    print("QUESTION:", question)
    print("ANSWER:", answer)

test_rag_flow()

Expected output:

QUESTION: Who gets an annual fee waiver?
ANSWER: Annual fees are waived for premium customers with monthly deposits above threshold.

If the answer comes back wrong, empty, or ungrounded, check:

  • The embedding deployment name matches what you deployed in Azure OpenAI
  • The Cosmos DB container has vector indexing enabled
  • The prompt includes only retrieved context
  • The chat deployment name is correct
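A frequent silent failure is a dimension mismatch between the embedding model and the container's vector policy. A quick local check helps; the dimension table below covers the two models named in the prerequisites (their default output sizes), and `check_embedding_dimensions` is a hypothetical helper:

```python
# Default output dimensions for the embedding models mentioned above.
EXPECTED_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}

def check_embedding_dimensions(model_name: str, embedding: list[float],
                               container_dims: int) -> list[str]:
    """Return human-readable problems; an empty list means the setup is consistent."""
    problems = []
    expected = EXPECTED_DIMENSIONS.get(model_name)
    if expected is not None and len(embedding) != expected:
        problems.append(
            f"embedding has {len(embedding)} dims, {model_name} defaults to {expected}"
        )
    if len(embedding) != container_dims:
        problems.append(
            f"embedding has {len(embedding)} dims, container policy expects {container_dims}"
        )
    return problems
```

Call it once with a real embedding, e.g. `check_embedding_dimensions("text-embedding-3-large", embed_text("test"), 3072)`; any non-empty result points at a configuration to fix before debugging prompts.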

Real-World Use Cases

  • Customer service assistant for card fees, transfer limits, dispute timelines, and loan eligibility.
  • Internal ops copilot for compliance teams querying policy manuals, AML procedures, and escalation rules.
  • Branch staff assistant that answers product questions from approved knowledge bases without exposing raw backend systems.

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

