How to Integrate LlamaIndex with Supabase for Investment Banking Startups
Why this integration matters
If you're building an AI agent for a startup that touches investment banking workflows, you need two things: structured retrieval over financial documents and durable state for users, sessions, and outputs. LlamaIndex handles the retrieval layer; Supabase gives you Postgres, auth, and storage without standing up extra infrastructure.
The useful pattern is simple: ingest deal docs, filings, or research into LlamaIndex, then persist agent memory, metadata, and workflow state in Supabase. That gives you a system that can answer banker-style questions, keep context across sessions, and survive real production traffic.
Prerequisites
- Python 3.10+
- A Supabase project with:
  - `SUPABASE_URL`
  - `SUPABASE_ANON_KEY`, or a service role key for server-side jobs
- Access to an LLM provider used by LlamaIndex, such as OpenAI
- Installed packages:
  - `llama-index`
  - `supabase`
  - `python-dotenv`
- A folder of investment banking documents:
  - pitch decks
  - CIMs
  - earnings transcripts
  - SEC filings
- Basic familiarity with Postgres tables and Python async/sync code
Install dependencies:
pip install llama-index supabase python-dotenv
Integration Steps
1) Set up environment variables
Keep credentials out of code. For startup systems, this matters because you will eventually separate local dev, staging, and production keys.
import os
from dotenv import load_dotenv
load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
SUPABASE_URL = os.getenv("SUPABASE_URL")
SUPABASE_SERVICE_ROLE_KEY = os.getenv("SUPABASE_SERVICE_ROLE_KEY")
assert OPENAI_API_KEY and SUPABASE_URL and SUPABASE_SERVICE_ROLE_KEY
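If you prefer a clearer failure mode than a bare `assert`, a small helper can report exactly which variables are missing. This is my own sketch, not part of either library:

```python
import os

def require_env(*names: str) -> dict:
    """Return the named environment variables, raising one clear error
    that lists every variable which is missing or empty."""
    values = {name: os.getenv(name) for name in names}
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return values

# Usage in the setup above:
# env = require_env("OPENAI_API_KEY", "SUPABASE_URL", "SUPABASE_SERVICE_ROLE_KEY")
```

A named error saves a round of debugging when a deploy is missing one key out of three.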
2) Create the Supabase client
Use the Python client to store agent traces, document metadata, or retrieval results. For backend services, use the service role key.
from supabase import create_client, Client
supabase: Client = create_client(SUPABASE_URL, SUPABASE_SERVICE_ROLE_KEY)
# Example table for agent runs:
# id (uuid), user_id (text), query (text), answer (text), created_at (timestamp)
A practical pattern is to keep three tables:
- `documents` for source metadata
- `agent_runs` for user queries and responses
- `citations` for chunk-level provenance
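If you want a starting point for those tables, here is a schema sketch you could run in the Supabase SQL editor. The column names are illustrative, chosen to match the fields used in this article's snippets; adjust types and constraints to your needs:

```sql
create table documents (
  id uuid primary key default gen_random_uuid(),
  title text,
  source_path text,
  created_at timestamptz default now()
);

create table agent_runs (
  id uuid primary key default gen_random_uuid(),
  user_id text,
  query text,
  answer text,
  created_at timestamptz default now()
);

create table citations (
  id uuid primary key default gen_random_uuid(),
  user_id text,
  source_text text,
  score float8,
  created_at timestamptz default now()
);
```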
3) Load financial documents into LlamaIndex
For investment banking use cases, document ingestion is the core. Start with local files and build your index from parsed text.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
# Configure models used by LlamaIndex
llm = OpenAI(model="gpt-4o-mini", api_key=OPENAI_API_KEY)
embed_model = OpenAIEmbedding(api_key=OPENAI_API_KEY)
docs = SimpleDirectoryReader("./banking_docs").load_data()
index = VectorStoreIndex.from_documents(
docs,
embed_model=embed_model,
)
query_engine = index.as_query_engine(llm=llm)
This works well for startup teams that need fast retrieval over a small-to-medium corpus before moving to a dedicated vector database.
4) Write retrieval results back to Supabase
Now connect the two layers. Query with LlamaIndex, then persist the response so your agent can resume later or support audit trails.
user_query = "What were the main revenue drivers in the latest earnings transcript?"
response = query_engine.query(user_query)
record = {
"user_id": "startup-user-123",
"query": user_query,
"answer": str(response),
}
supabase.table("agent_runs").insert(record).execute()
print(response)
If you want stronger traceability in banking workflows, store citations too:
citations = []
for node in getattr(response, "source_nodes", []):
citations.append({
"user_id": "startup-user-123",
"source_text": node.node.text[:500],
"score": float(node.score or 0),
})
if citations:
supabase.table("citations").insert(citations).execute()
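The citation loop above can be factored into a standalone helper so the truncation and score-defaulting logic is testable without a live index. `build_citations` is my own sketch, not a library function; it assumes each item exposes `.node.text` and `.score`, as LlamaIndex `NodeWithScore` objects do:

```python
def build_citations(user_id: str, source_nodes, max_chars: int = 500) -> list[dict]:
    """Turn retrieval source nodes into rows for the citations table.

    Truncates source text to max_chars and defaults a missing score to 0.
    """
    rows = []
    for node in source_nodes:
        rows.append({
            "user_id": user_id,
            "source_text": node.node.text[:max_chars],
            "score": float(node.score or 0),
        })
    return rows
```

Keeping this pure (no Supabase call inside) makes it easy to unit-test before wiring it into the insert.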
5) Add a startup-friendly chat loop with persisted state
For an AI agent system, you usually need session continuity. Store conversation turns in Supabase and rehydrate them when a user returns.
def save_message(session_id: str, role: str, content: str):
supabase.table("chat_messages").insert({
"session_id": session_id,
"role": role,
"content": content,
}).execute()
def get_messages(session_id: str):
result = (
supabase.table("chat_messages")
.select("*")
.eq("session_id", session_id)
.order("created_at")
.execute()
)
return result.data
session_id = "deal-room-001"
save_message(session_id, "user", "Summarize EBITDA trends from the last quarter.")
answer = query_engine.query("Summarize EBITDA trends from the last quarter.")
save_message(session_id, "assistant", str(answer))
That pattern gives you a lightweight memory layer without forcing all state into LlamaIndex objects.
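To actually use the stored history, fold prior turns into the next query. One minimal approach (the formatting function here is my own sketch, not a LlamaIndex API) is to prepend recent messages as plain-text context:

```python
def format_history(messages: list[dict], max_turns: int = 10) -> str:
    """Render stored chat rows as a plain-text transcript.

    Each row is a dict with "role" and "content" keys, as returned by
    get_messages() above. Only the most recent max_turns rows are kept.
    """
    recent = messages[-max_turns:]
    return "\n".join(f"{m['role']}: {m['content']}" for m in recent)

# Usage: prepend history to the next question before querying.
# history = format_history(get_messages(session_id))
# answer = query_engine.query(f"{history}\nuser: {new_question}")
```

Capping the window with `max_turns` keeps the prompt from growing without bound as a deal-room session accumulates messages.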
Testing the Integration
Run one end-to-end check: query documents through LlamaIndex and confirm the result lands in Supabase.
test_query = "List key risks mentioned in the latest annual report."
result = query_engine.query(test_query)
supabase.table("agent_runs").insert({
"user_id": "test-user",
"query": test_query,
"answer": str(result),
}).execute()
lookup = (
supabase.table("agent_runs")
.select("*")
.eq("user_id", "test-user")
.eq("query", test_query)
.limit(1)
.execute()
)
print("Stored rows:", len(lookup.data))
print("Answer preview:", lookup.data[0]["answer"][:200])
Expected output:
Stored rows: 1
Answer preview: The report highlights...
If that passes, your retrieval layer is working and your persistence layer is recording outputs correctly.
Real-World Use Cases
- Deal room assistant
  - Let bankers ask questions over CIMs, teasers, diligence notes, and earnings transcripts.
  - Persist every question-answer pair in Supabase for review and compliance.
- Startup finance copilot
  - Build an internal agent that answers “What changed since the last board deck?” using indexed board materials.
  - Store board pack versions and session history in Supabase.
- Research workflow automation
  - Pull public filings into LlamaIndex for semantic search.
  - Save extracted insights and analyst notes into Supabase tables for downstream dashboards or alerting.
Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit, a PDF checklist plus starter code
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.