How to Integrate Azure OpenAI for fintech with CosmosDB for RAG

By Cyprian Aarons · Updated 2026-04-21

Combining Azure OpenAI with Cosmos DB gives you a practical RAG stack for fintech agents: keep regulated documents, policies, transaction notes, and product knowledge in a queryable store, then ground model responses on that data instead of free-form generation. That matters when your agent needs to answer questions about lending policy, AML procedures, claims rules, or account servicing with traceable context.

The pattern is simple: Cosmos DB stores the source-of-truth chunks and metadata, Azure OpenAI generates embeddings and answers, and your agent retrieves the best matches before it responds.

Prerequisites

  • An Azure subscription
  • An Azure OpenAI resource with:
    • a chat model deployment
    • an embeddings deployment
  • An Azure Cosmos DB account
    • API for NoSQL enabled
    • database and container created
  • Python 3.10+
  • Installed packages:
    • openai Python SDK (version 1.x, which provides the AzureOpenAI client)
    • azure-cosmos
    • python-dotenv
  • Environment variables set:
    • AZURE_OPENAI_ENDPOINT
    • AZURE_OPENAI_API_KEY
    • AZURE_OPENAI_CHAT_DEPLOYMENT
    • AZURE_OPENAI_EMBEDDING_DEPLOYMENT
    • COSMOS_ENDPOINT
    • COSMOS_KEY
    • COSMOS_DATABASE
    • COSMOS_CONTAINER

Integration Steps

1) Install dependencies and initialize clients

Use the current OpenAI Python SDK for Azure OpenAI and the Cosmos DB SDK for storage access.

pip install openai azure-cosmos python-dotenv

import os
from dotenv import load_dotenv
from openai import AzureOpenAI
from azure.cosmos import CosmosClient

load_dotenv()

aoai_client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",  # use a current GA API version
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
)

cosmos_client = CosmosClient(
    url=os.environ["COSMOS_ENDPOINT"],
    credential=os.environ["COSMOS_KEY"],
)

db = cosmos_client.get_database_client(os.environ["COSMOS_DATABASE"])
container = db.get_container_client(os.environ["COSMOS_CONTAINER"])

2) Create a small document ingestion pipeline

For RAG, store chunks with metadata and embeddings. In fintech, keep fields like source, document type, revision date, and compliance tags.

def chunk_text(text: str, chunk_size: int = 800) -> list[str]:
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def embed_text(text: str) -> list[float]:
    response = aoai_client.embeddings.create(
        model=os.environ["AZURE_OPENAI_EMBEDDING_DEPLOYMENT"],
        input=text,
    )
    return response.data[0].embedding

def ingest_document(doc_id: str, title: str, text: str, source: str):
    chunks = chunk_text(text)
    for idx, chunk in enumerate(chunks):
        item = {
            "id": f"{doc_id}-{idx}",
            "docId": doc_id,
            "title": title,
            "chunkIndex": idx,
            "content": chunk,
            "source": source,
            "embedding": embed_text(chunk),
        }
        container.upsert_item(item)

A few production notes:

  • Keep each chunk under your embedding token budget.
  • Store the embedding in Cosmos DB so retrieval stays local to your data plane.
  • Add metadata filters later if you need tenant isolation or document-level ACLs.
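The fixed-slice chunk_text above can cut a sentence in half exactly at a chunk boundary, which hurts retrieval quality for policy text. A common refinement (optional, not required by the pattern) is a sliding window with overlap, so each chunk repeats the tail of the previous one. A minimal sketch, with chunk_text_overlap as a hypothetical drop-in replacement:

```python
def chunk_text_overlap(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Each chunk starts `overlap` characters before the previous chunk
    ended, so context that straddles a boundary appears in both chunks.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

Overlap costs extra storage and embedding calls, so keep it small relative to chunk_size.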

3) Retrieve relevant chunks from Cosmos DB

Cosmos DB NoSQL supports vector search when configured for vector indexing. Query by similarity first, then pass only the top matches to the chat model.

def retrieve_context(query: str, top_k: int = 3) -> list[dict]:
    query_embedding = embed_text(query)

    sql = """
    SELECT TOP @top_k c.id, c.title, c.content, c.source
    FROM c
    ORDER BY VectorDistance(c.embedding, @query_embedding)
    """

    params = [
        {"name": "@top_k", "value": top_k},
        {"name": "@query_embedding", "value": query_embedding},
    ]

    items = list(container.query_items(
        query=sql,
        parameters=params,
        enable_cross_partition_query=True,
    ))
    return items

If your Cosmos account uses a different vector indexing setup, keep the retrieval contract the same. The key point is that your agent asks Cosmos for semantically relevant chunks before calling Azure OpenAI.
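If you create the container from code rather than the portal, recent azure-cosmos SDK versions (4.7+) accept a vector embedding policy and a vector index at container creation; the account must also have the vector search feature enabled. The sketch below assumes 1536-dimensional embeddings (e.g. a text-embedding-3-small deployment) and a `/docId` partition key; adjust both for your setup:

```python
# Vector policy for the "/embedding" path. The dimensions value must
# match your embedding model's output size (1536 is assumed here).
vector_embedding_policy = {
    "vectorEmbeddings": [
        {
            "path": "/embedding",
            "dataType": "float32",
            "distanceFunction": "cosine",
            "dimensions": 1536,
        }
    ]
}

# Vector index over the same path; quantizedFlat trades a little
# accuracy for lower RU cost versus a flat index.
indexing_policy = {
    "vectorIndexes": [
        {"path": "/embedding", "type": "quantizedFlat"}
    ]
}

def create_rag_container(db, name: str):
    # Requires azure-cosmos >= 4.7 and the vector search feature
    # enabled on the Cosmos DB account.
    from azure.cosmos import PartitionKey
    return db.create_container_if_not_exists(
        id=name,
        partition_key=PartitionKey(path="/docId"),
        indexing_policy=indexing_policy,
        vector_embedding_policy=vector_embedding_policy,
    )
```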

4) Generate a grounded answer with Azure OpenAI

Now build the final prompt from retrieved chunks. In fintech systems, make the assistant cite its sources and avoid answering outside retrieved context.

def answer_question(question: str) -> str:
    contexts = retrieve_context(question)

    context_block = "\n\n".join(
        f"[Source: {c['source']}] {c['content']}"
        for c in contexts
    )

    messages = [
        {
            "role": "system",
            "content": (
                "You are a fintech assistant. Answer only using the provided context. "
                "If the context is insufficient, say you do not have enough information."
            ),
        },
        {
            "role": "user",
            "content": f"Context:\n{context_block}\n\nQuestion: {question}",
        },
    ]

    response = aoai_client.chat.completions.create(
        model=os.environ["AZURE_OPENAI_CHAT_DEPLOYMENT"],
        messages=messages,
        temperature=0.2,
    )

    return response.choices[0].message.content

This is the core RAG loop:

  • embed query
  • fetch nearest chunks from Cosmos DB
  • send grounded context to Azure OpenAI
  • return an answer constrained by source material
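One practical guard on that loop: retrieved chunks can overflow the chat model's context window if top_k or chunk_size grows. A simple character-budget heuristic (not a tokenizer-accurate count, and trim_contexts is a hypothetical helper) drops trailing chunks once a budget is exceeded, keeping the best-ranked matches:

```python
def trim_contexts(contexts: list[dict], max_chars: int = 6000) -> list[dict]:
    """Keep retrieved chunks in ranked order until the running total of
    content length exceeds max_chars; always keep at least one chunk."""
    kept, total = [], 0
    for c in contexts:
        total += len(c["content"])
        if total > max_chars and kept:
            break
        kept.append(c)
    return kept
```

You would call this on the result of retrieve_context before building context_block.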

5) Wire ingestion and querying together

Use one script to seed documents and test retrieval end-to-end.

if __name__ == "__main__":
    sample_policy = """
    Loan applications above $250,000 require manual underwriting review.
    KYC verification must be completed before disbursement.
    Suspicious activity must be escalated to compliance within one business day.
    """

    ingest_document(
        doc_id="policy-loan-001",
        title="Loan Underwriting Policy",
        text=sample_policy,
        source="internal-policy-v3",
    )

    print(answer_question("What happens when a loan application exceeds $250,000?"))

Testing the Integration

Run a direct verification against both services:

test_question = "When is manual underwriting required?"
result = answer_question(test_question)
print(result)

Expected output:

Manual underwriting is required for loan applications above $250,000.

If retrieval is working correctly, you should see an answer grounded in your stored policy text rather than a generic model response. If you get vague output or hallucinated details, inspect these first:

  • vector indexing configuration in Cosmos DB
  • whether embeddings were stored correctly as arrays of floats
  • whether your chat deployment name matches the Azure OpenAI deployment exactly

Real-World Use Cases

  • Policy Q&A assistant
    • Answer underwriting, claims handling, AML/KYC, or fraud investigation questions from internal policy documents.
  • Customer support copilot
    • Ground responses on product terms, fee schedules, dispute workflows, and account servicing rules.
  • Compliance analyst agent
    • Search regulatory summaries and internal controls to generate draft responses with source references.

This integration works because each system does one job well. Cosmos DB stores durable retrieval state close to your data; Azure OpenAI turns that retrieved context into usable answers for agents that need to behave like they understand your fintech domain.


Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
