LangGraph Tutorial (TypeScript): optimizing token usage for intermediate developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to build a LangGraph workflow in TypeScript that actively reduces token usage by controlling state size, trimming conversation history, and avoiding unnecessary model calls. You need this when your agent is correct but expensive: long-running workflows, repeated tool loops, and bloated chat state will quietly burn tokens and make latency worse.

What You'll Need

  • Node.js 18+
  • A TypeScript project with "type": "module" or compatible ESM setup
  • @langchain/langgraph package
  • @langchain/core package (provides trimMessages)
  • @langchain/openai package
  • zod package
  • OpenAI API key in OPENAI_API_KEY
  • Basic familiarity with LangGraph nodes, edges, and state

Install the dependencies:

npm install @langchain/langgraph @langchain/core @langchain/openai zod
npm install -D typescript tsx @types/node

Step-by-Step

  1. Start by defining a small state shape. Token waste usually comes from carrying too much data between nodes, so keep only the fields you actually need for routing and the final answer.
import { Annotation, START, END, StateGraph } from "@langchain/langgraph";
import { ChatOpenAI } from "@langchain/openai";
import { trimMessages } from "@langchain/core/messages";

const GraphState = Annotation.Root({
  messages: Annotation<any[]>({
    reducer: (_, update) => update,
    default: () => [],
  }),
  summary: Annotation<string>({
    reducer: (_, update) => update,
    default: () => "",
  }),
  needsSearch: Annotation<boolean>({
    reducer: (_, update) => update,
    default: () => false,
  }),
});
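The reducers above replace state wholesale rather than appending, which is itself a token-control choice: each node must return the full `messages` array it wants to keep. A minimal plain-TypeScript sketch of the difference (no LangGraph imports; `Msg` is a stand-in type for illustration):

```typescript
// "Replace" keeps state exactly as large as each node decides;
// "append" is the usual chat pattern but grows without bound.
type Msg = { role: string; content: string };

const replaceReducer = (_current: Msg[], update: Msg[]): Msg[] => update;
const appendReducer = (current: Msg[], update: Msg[]): Msg[] => [
  ...current,
  ...update,
];

const history: Msg[] = [{ role: "user", content: "hi" }];
const next: Msg[] = [{ role: "assistant", content: "hello" }];

console.log(replaceReducer(history, next).length); // 1: old turns dropped
console.log(appendReducer(history, next).length); // 2: old turns kept
```

With the replace style, a node that forgets to spread `state.messages` silently drops history, so keep the `[...state.messages, response]` pattern consistent across nodes.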
  2. Add a cheap classification node before the expensive generation path. This avoids calling a larger prompt chain when the user request is simple or already answered by existing context.
const classifier = new ChatOpenAI({
  model: "gpt-4o-mini",
  temperature: 0,
});

async function routeNode(state: typeof GraphState.State) {
  const lastMessage = state.messages[state.messages.length - 1]?.content ?? "";
  const result = await classifier.invoke([
    {
      role: "system",
      content:
        "Decide if this request needs external search. Reply only with yes or no.",
    },
    { role: "user", content: String(lastMessage) },
  ]);

  return {
    needsSearch: String(result.content).toLowerCase().includes("yes"),
  };
}
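You can go one step cheaper and gate trivial inputs with a pure heuristic before even the classifier model runs. A hypothetical pre-filter sketch — the `TRIVIAL` list, the three-word cutoff, and the function name are illustrative choices, not part of the LangGraph or LangChain API:

```typescript
// Skip the classifier LLM call entirely for inputs that can never
// need search: greetings, acknowledgements, very short messages.
const TRIVIAL = new Set(["hi", "hello", "thanks", "thank you", "ok"]);

function needsClassifier(userText: string): boolean {
  const normalized = userText.trim().toLowerCase();
  if (TRIVIAL.has(normalized)) return false; // canned reply is enough
  if (normalized.split(/\s+/).length < 3) return false; // too short to need search
  return true;
}

console.log(needsClassifier("thanks")); // false: no model call needed
console.log(needsClassifier("Compare our Q3 churn to Q2")); // true
```

Even a mini-model call costs tokens and a network round trip, so a zero-cost gate in front of it compounds nicely in tool loops.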
  3. Trim message history before every expensive call. This is the main token saver in conversational graphs because the model only needs the recent turns plus a compact summary.
const llm = new ChatOpenAI({
  model: "gpt-4o",
  temperature: 0.2,
});

async function answerNode(state: typeof GraphState.State) {
  const trimmed = await trimMessages(state.messages, {
    maxTokens: 800,
    strategy: "last",
    tokenCounter: (msgs) =>
      msgs.reduce((sum, msg) => sum + String(msg.content).split(/\s+/).length, 0),
  });

  const messages = [
    { role: "system", content: "Answer concisely using only the provided context." },
    ...(state.summary ? [{ role: "system", content: `Summary:\n${state.summary}` }] : []),
    ...trimmed,
  ];

  const response = await llm.invoke(messages);
  return { messages: [...state.messages, response] };
}
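The tokenCounter above approximates tokens with a whitespace word count. Pulled out as a standalone helper so you can sanity-check it; real tokenizers typically report noticeably more tokens than English words, so treat maxTokens as a loose cap rather than an exact budget:

```typescript
// Word-count approximation of token usage across a message array.
// Undercounts relative to real tokenizers (punctuation, subwords),
// which is fine for a conservative trimming threshold.
type CountableMsg = { content: unknown };

function approxTokens(msgs: CountableMsg[]): number {
  return msgs.reduce(
    (sum, msg) => sum + String(msg.content).trim().split(/\s+/).length,
    0
  );
}

console.log(
  approxTokens([{ content: "hello world" }, { content: "three more words" }])
); // 5
```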
  4. Use a summary node for older context instead of passing full history forever. Summaries are much cheaper than replaying dozens of prior turns into every prompt.
async function summarizeNode(state: typeof GraphState.State) {
  const summarizer = new ChatOpenAI({
    model: "gpt-4o-mini",
    temperature: 0,
  });

  const response = await summarizer.invoke([
    {
      role: "system",
      content:
        "Summarize the conversation in under 120 words. Keep facts, decisions, and open questions.",
    },
    ...state.messages,
  ]);

  return {
    summary:
      state.summary.length > 0
        ? `${state.summary}\n${String(response.content)}`
        : String(response.content),
    messages: state.messages.slice(-6),
  };
}
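One thing to watch: summarizeNode appends each new summary onto the old one, so the summary itself can grow across a long session. A sketch of a word-capped merge you could swap in — the function name and the 200-word default are arbitrary choices for illustration:

```typescript
// Merge an old and new summary, keeping only the most recent maxWords
// words so the summary never becomes its own token sink.
function mergeSummaries(
  oldSummary: string,
  newSummary: string,
  maxWords = 200
): string {
  const merged =
    oldSummary.length > 0 ? `${oldSummary}\n${newSummary}` : newSummary;
  const words = merged.split(/\s+/);
  return words.length <= maxWords ? merged : words.slice(-maxWords).join(" ");
}

console.log(mergeSummaries("", "Customer reported a billing error."));
```

A smarter variant would re-summarize the merged text with the mini-model once it crosses the cap, trading one cheap call for a cleaner summary.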
  5. Wire the graph so it only expands when needed. The pattern here is simple: classify first, summarize when history grows, then answer with a trimmed context window.
const graph = new StateGraph(GraphState)
  .addNode("route", routeNode)
  .addNode("summarize", summarizeNode)
  .addNode("answer", answerNode)
  .addEdge(START, "route")
  // Summarize only once history has grown; needsSearch would gate a
  // separate search branch in a fuller graph.
  .addConditionalEdges("route", (state) =>
    state.messages.length > 6 ? "summarize" : "answer"
  )
  .addEdge("summarize", "answer")
  .addEdge("answer", END)
  .compile();
  6. Run it with a short initial state and keep your inputs clean. If you pass raw transcripts, tool dumps, or logs into messages, you defeat the whole optimization strategy.
async function main() {
  const result = await graph.invoke({
    messages: [
      { role: "user", content: "Explain our billing error in one paragraph." },
    ],
    summary: "",
    needsSearch: false,
  });

  console.log(result.messages[result.messages.length - 1]?.content);
}

main().catch(console.error);

Testing It

Run the script with tsx and compare token usage across short and long conversations. The easiest check is to feed in a long message history once with trimming disabled and once with this graph; you should see lower prompt size and faster responses on the optimized version.

Also test a case where the user asks something that does not need extra context. The classifier should route directly to answering without paying for unnecessary summarization work.

If you want hard numbers, log prompt lengths before each invoke() call and track them over time. In production, pair that with provider usage metrics so you can spot regressions when someone changes state shape or adds verbose tool output.
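A minimal logging helper along those lines, using word count as the proxy; the name `logPromptSize` and the output format are assumptions, not part of any library:

```typescript
// Call this right before each invoke() to track prompt growth over time.
// Swap the word count for a real tokenizer when you need exact numbers.
type LoggableMsg = { role?: string; content: unknown };

function logPromptSize(label: string, msgs: LoggableMsg[]): number {
  const words = msgs.reduce(
    (sum, m) => sum + String(m.content).split(/\s+/).filter(Boolean).length,
    0
  );
  console.log(
    `[prompt-size] ${label}: ${msgs.length} messages, ~${words} words`
  );
  return words;
}

logPromptSize("answerNode", [
  { role: "user", content: "Explain our billing error" },
]); // returns 4
```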

Next Steps

  • Add persistent checkpointing so summaries survive process restarts without replaying old turns.
  • Replace word-count trimming with true token counting using your provider’s tokenizer.
  • Split tool outputs into separate ephemeral state so they do not get re-sent to the model on every turn.
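The last bullet can be sketched as a plain state type where raw tool output lives in a scratch field that is stripped before anything reaches the model. Field and function names here are illustrative, not a LangGraph convention:

```typescript
// Raw tool dumps stay in toolScratch; modelView() strips them so
// only the compact messages array is ever sent to the model.
type AgentState = {
  messages: { role: string; content: string }[];
  toolScratch: string[]; // ephemeral, never forwarded to the model
};

function modelView(state: AgentState) {
  const { toolScratch, ...rest } = state; // drop scratch before invoke()
  return rest;
}

const state: AgentState = {
  messages: [{ role: "user", content: "hi" }],
  toolScratch: ["<raw JSON dump from a search tool>"],
};

console.log("toolScratch" in modelView(state)); // false
```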

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
