LlamaIndex Tutorial (TypeScript): adding cost tracking for intermediate developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to add per-request cost tracking to a LlamaIndex TypeScript app, including token usage, model pricing, and a simple way to log spend for each query. You need this when your agent starts handling real traffic and you want to know what each retrieval or generation call is actually costing.

What You'll Need

  • Node.js 18+
  • A TypeScript project with ts-node or a build step
  • llamaindex installed
  • An OpenAI API key
  • Basic familiarity with VectorStoreIndex, QueryEngine, and async/await
  • A place to store logs or metrics, even if it starts as stdout

Install the packages if you haven’t already:

npm install llamaindex dotenv
npm install -D typescript ts-node @types/node
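
The examples load your key with dotenv, so put it in a .env file at the project root (OPENAI_API_KEY is the variable name the OpenAI client reads by default):

# .env
OPENAI_API_KEY=sk-...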

Step-by-Step

  1. Start with a standard LlamaIndex TypeScript setup and load your API key from the environment. The important part here is that every request will later flow through one wrapper where we can measure tokens and compute cost.
import "dotenv/config";
import { Document, VectorStoreIndex } from "llamaindex";

async function main() {
  const docs = [
    new Document({ text: "LlamaIndex helps connect data sources to LLMs." }),
    new Document({ text: "Cost tracking matters when queries hit production traffic." }),
  ];

  const index = await VectorStoreIndex.fromDocuments(docs);
  const queryEngine = index.asQueryEngine();

  const response = await queryEngine.query({
    query: "Why track LLM costs?",
  });

  console.log(response.toString());
}

main().catch(console.error);
  2. Add a tiny pricing helper. This keeps model prices out of your business logic and makes it easy to update rates later without touching your query code.
type Usage = {
  promptTokens?: number;
  completionTokens?: number;
  totalTokens?: number;
};

// Illustrative rates only; set these to your model's current published pricing.
const PRICING_USD_PER_1M_TOKENS = {
  prompt: 0.15,
  completion: 0.6,
};

export function estimateCost(usage: Usage) {
  const promptTokens = usage.promptTokens ?? 0;
  const completionTokens = usage.completionTokens ?? 0;

  const promptCost = (promptTokens / 1_000_000) * PRICING_USD_PER_1M_TOKENS.prompt;
  const completionCost =
    (completionTokens / 1_000_000) * PRICING_USD_PER_1M_TOKENS.completion;

  return {
    promptTokens,
    completionTokens,
    totalTokens: usage.totalTokens ?? promptTokens + completionTokens,
    costUsd: promptCost + completionCost,
  };
}
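
A quick sanity check helps confirm the math before wiring the helper into anything. The token counts here are illustrative:

import { estimateCost } from "./cost";

// 1,200 prompt tokens and 300 completion tokens at the example rates above:
// (1200 / 1e6) * 0.15 + (300 / 1e6) * 0.6 ≈ $0.00036
console.log(estimateCost({ promptTokens: 1200, completionTokens: 300 }));
// -> { promptTokens: 1200, completionTokens: 300, totalTokens: 1500, costUsd: ~0.00036 }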
  3. Wrap your query execution so you can capture usage after each call. In LlamaIndex TypeScript, the cleanest production pattern is to keep the cost logic outside the index and attach it around the request boundary.
import { Document, VectorStoreIndex } from "llamaindex";
import { estimateCost } from "./cost";

async function trackedQuery(queryText: string) {
  const docs = [
    new Document({ text: "Claims teams use AI to summarize case notes." }),
    new Document({ text: "Finance teams need cost visibility per request." }),
  ];

  const index = await VectorStoreIndex.fromDocuments(docs);
  const queryEngine = index.asQueryEngine();

  const response = await queryEngine.query({ query: queryText });
  const rawUsage = response.raw?.usage as
    | { prompt_tokens?: number; completion_tokens?: number; total_tokens?: number }
    | undefined;

  const usage = estimateCost({
    promptTokens: rawUsage?.prompt_tokens,
    completionTokens: rawUsage?.completion_tokens,
    totalTokens: rawUsage?.total_tokens,
  });

  console.log({
    query: queryText,
    answer: response.toString(),
    usage,
  });
}

trackedQuery("What teams care about AI spend?").catch(console.error);
  4. If you want this to work across multiple requests, move the logging into a reusable service function. That gives you one place to send metrics to Datadog, CloudWatch, Postgres, or whatever your platform uses.
import { Document, VectorStoreIndex } from "llamaindex";
import { estimateCost } from "./cost";

export async function answerWithCost(queryText: string) {
  const docs = [
    new Document({ text: "Underwriters review policy risk using structured data." }),
    new Document({ text: "Every model call should be measured in production." }),
  ];

  const index = await VectorStoreIndex.fromDocuments(docs);
  const engine = index.asQueryEngine();

  const response = await engine.query({ query: queryText });
  const rawUsage = response.raw?.usage as
    | { prompt_tokens?: number; completion_tokens?: number; total_tokens?: number }
    | undefined;

  return {
    answer: response.toString(),
    cost: estimateCost({
      promptTokens: rawUsage?.prompt_tokens,
      completionTokens: rawUsage?.completion_tokens,
      totalTokens: rawUsage?.total_tokens,
    }),
  };
}
  5. Log the result in a format your observability stack can ingest. For most teams, JSON logs are enough to start with because they make aggregation by request ID, tenant ID, or route straightforward.
import { answerWithCost } from "./answerWithCost";

async function main() {
  const result = await answerWithCost("How do I track model spend?");
  
  console.log(
    JSON.stringify(
      {
        event: "llm_request",
        route: "/ask",
        modelProvider: "openai",
        ...result.cost,
      },
      null,
      2
    )
  );
}

main().catch(console.error);
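
The emitted log should look roughly like this. Token counts, and therefore cost, depend on your documents and model; the values below are illustrative:

{
  "event": "llm_request",
  "route": "/ask",
  "modelProvider": "openai",
  "promptTokens": 412,
  "completionTokens": 58,
  "totalTokens": 470,
  "costUsd": 0.0000966
}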

Testing It

Run the script once with a simple question and confirm you get both an answer and a structured cost object in stdout. If response.raw?.usage is undefined, check that your provider returns token usage metadata for the model you’re using; the shape of the raw response can also vary between LlamaIndex versions, so inspect it directly if the field is missing.

Next, compare the computed cost against the provider dashboard for a few requests. The numbers won’t always match exactly because of rounding and provider-side accounting differences, but they should be close enough for internal chargeback and monitoring.

If you’re wiring this into an API route, test concurrent requests and make sure each log line includes a request identifier; a minimal concurrency check is sketched below. That’s what lets you break down spend by tenant, endpoint, or user later.
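
Here is one way to run that check, assuming the answerWithCost module from step 4. The route value and the questions are placeholders:

import { randomUUID } from "node:crypto";
import { answerWithCost } from "./answerWithCost";

async function loadTest() {
  const questions = [
    "How do I track model spend?",
    "What teams care about AI spend?",
    "Why track LLM costs?",
  ];

  // Fire the requests in parallel and tag each log line with its own
  // requestId so spend can be attributed per request later.
  await Promise.all(
    questions.map(async (query) => {
      const requestId = randomUUID();
      const result = await answerWithCost(query);
      console.log(
        JSON.stringify({ event: "llm_request", requestId, route: "/ask", ...result.cost })
      );
    })
  );
}

loadTest().catch(console.error);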

Next Steps

  • Add middleware that injects requestId, tenantId, and featureFlag into every cost log.
  • Persist usage records in Postgres so finance can slice spend by customer or workflow (see the sketch after this list).
  • Extend this pattern to embeddings and reranking calls so you track full pipeline cost, not just generation.
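
For the Postgres bullet, a minimal sketch using the pg client; the llm_usage table and its columns are assumptions, not a fixed schema:

import { Pool } from "pg";
import { answerWithCost } from "./answerWithCost";

// Assumes a table like:
//   CREATE TABLE llm_usage (
//     request_id uuid PRIMARY KEY,
//     tenant_id text,
//     prompt_tokens int,
//     completion_tokens int,
//     cost_usd numeric,
//     created_at timestamptz DEFAULT now()
//   );
const pool = new Pool(); // connection details come from PG* environment variables

export async function recordUsage(requestId: string, tenantId: string, query: string) {
  const result = await answerWithCost(query);

  await pool.query(
    `INSERT INTO llm_usage (request_id, tenant_id, prompt_tokens, completion_tokens, cost_usd)
     VALUES ($1, $2, $3, $4, $5)`,
    [
      requestId,
      tenantId,
      result.cost.promptTokens,
      result.cost.completionTokens,
      result.cost.costUsd,
    ]
  );

  return result;
}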


By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

