LangGraph Tutorial (TypeScript): optimizing token usage for advanced developers
This tutorial shows you how to build a LangGraph workflow in TypeScript that keeps token usage under control by trimming state, routing only necessary context, and summarizing conversation history before it gets expensive. You need this when your agent starts doing real work: multi-turn support, tool calls, and long-running sessions can burn tokens fast if you keep passing the full transcript through every node.
What You'll Need
- Node.js 18+
- A TypeScript project with `ts-node` or a build step
- Packages:
  - `@langchain/langgraph`
  - `@langchain/openai`
  - `@langchain/core`
  - `zod`
- An OpenAI API key in `OPENAI_API_KEY`
- A basic understanding of LangGraph state, nodes, and edges
- Optional but useful:
  - A token counter or logging middleware
  - A Redis or database store if you want persistence later
Step-by-Step
- Start with a state shape that stores only what each node actually needs. The main trick is to avoid copying the entire message history into every branch when only the latest user message and a short summary are needed.
```ts
import { Annotation, START, END, StateGraph } from "@langchain/langgraph";
import { AIMessage, BaseMessage, HumanMessage } from "@langchain/core/messages";

const GraphState = Annotation.Root({
  // Full history accumulates here via concat; downstream nodes decide
  // how much of it actually reaches the model.
  messages: Annotation<BaseMessage[]>({
    reducer: (left, right) => left.concat(right),
    default: () => [],
  }),
  // Running summary; each update replaces the previous value.
  summary: Annotation<string>({
    reducer: (_left, right) => right,
    default: () => "",
  }),
  // Routing decision written by the cheap gate node.
  route: Annotation<"answer" | "summarize">({
    reducer: (_left, right) => right,
    default: () => "answer",
  }),
});
```
- Use a cheap routing node before the expensive model call. This node decides whether to answer directly or summarize first based on message count, which is a simple proxy for token growth.
```ts
// Cheap gate: no model call, just a heuristic on message count.
const routeNode = async (state: typeof GraphState.State) => {
  const messageCount = state.messages.length;
  return {
    route: messageCount > 6 ? "summarize" : "answer",
  };
};

// Fold the most recent turns into the running summary. Using a model call
// keeps the summary compact; concatenating raw message text would make the
// summary grow as fast as the history it is meant to replace. `model` is
// defined in the next step; referencing it here is safe because nodes only
// run after the whole graph is set up.
const summarizeNode = async (state: typeof GraphState.State) => {
  const recent = state.messages.slice(-6);
  const transcript = recent
    .map((m) => `${m._getType()}: ${m.content}`)
    .join("\n");
  const response = await model.invoke(
    `Update the running conversation summary in under 100 words.\n\n` +
      `Current summary:\n${state.summary || "(none)"}\n\n` +
      `Recent turns:\n${transcript}`
  );
  return { summary: response.content as string };
};
```
- Build the answer node so it only sends compact context to the model. Instead of passing the full transcript, pass the running summary plus the latest user message; this keeps prompt size stable as sessions grow.
```ts
import { ChatOpenAI } from "@langchain/openai";

const model = new ChatOpenAI({
  model: "gpt-4o-mini",
  temperature: 0,
});

const answerNode = async (state: typeof GraphState.State) => {
  // Walk backwards to find the most recent human message.
  const latestUser = [...state.messages]
    .reverse()
    .find((m) => m._getType() === "human");

  // Compact prompt: running summary + latest request, never the full transcript.
  const prompt = [
    state.summary ? `Conversation summary:\n${state.summary}` : "",
    latestUser ? `Latest request:\n${latestUser.content}` : "",
  ]
    .filter(Boolean)
    .join("\n\n");

  const response = await model.invoke(prompt);
  return {
    messages: [new AIMessage(response.content as string)],
  };
};
```
- Wire the graph so long histories get summarized before the answer path runs. The conditional edge is where token control becomes policy instead of an ad hoc habit.
```ts
const graph = new StateGraph(GraphState)
  .addNode("route", routeNode)
  .addNode("summarize", summarizeNode)
  .addNode("answer", answerNode)
  .addEdge(START, "route")
  // Branch on the value the gate node wrote into state.
  .addConditionalEdges("route", (state) => state.route, {
    summarize: "summarize",
    answer: "answer",
  })
  .addEdge("summarize", "answer")
  .addEdge("answer", END);

const app = graph.compile();
```
- Run it with a small conversation and inspect how the state evolves. In production you would also log prompt size per node, but even this local run shows that the answer node's prompt stays compact (summary plus latest request) once the conversation is long enough to trigger summarization.
```ts
async function main() {
  // Short conversation: routes straight to "answer".
  const result1 = await app.invoke({
    messages: [new HumanMessage("Help me explain our fraud policy to customers.")],
  });

  // Longer conversation (7 messages > 6): summarizes before answering.
  const result2 = await app.invoke({
    messages: [
      new HumanMessage("Help me explain our fraud policy to customers."),
      new AIMessage("Sure. What audience are you targeting?"),
      new HumanMessage("Retail banking customers."),
      new AIMessage("Got it."),
      new HumanMessage("Now make it shorter and more formal."),
      new AIMessage("Understood."),
      new HumanMessage("Add one sentence about escalation."),
    ],
    summary:
      "Initial request was to explain fraud policy to customers for retail banking.",
  });

  console.log(result1.messages.at(-1));
  console.log(result2.messages.at(-1));
}

main().catch(console.error);
```
Testing It
Run the script once with a short conversation and once with a longer one. The first path should go straight to answer, while the second should trigger summarize before calling the model.
Check your logs or add simple instrumentation around model.invoke() to compare prompt length between runs. You should see that the long conversation no longer sends every prior message into the model.
If you want a stronger check, print state.summary.length and log the prompt assembled inside answerNode. That prompt should stay bounded even as total conversation length grows, because it only ever contains the summary and the latest request.
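A quick way to do that instrumentation without adding a tokenizer dependency is a character-based estimate. The `estimateTokens` and `logPromptSize` helpers below are illustrative assumptions (roughly 4 characters per token for English text), not exact model token counts:

```ts
// Rough proxy: OpenAI-style tokenizers average about 4 characters per
// token for English prose (an approximation, good enough for trend lines).
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Hypothetical pass-through wrapper to drop around model.invoke() calls.
const logPromptSize = (node: string, prompt: string): string => {
  console.log(
    `[${node}] prompt: ${prompt.length} chars (~${estimateTokens(prompt)} tokens)`
  );
  return prompt;
};
```

Wrapping the call site (`model.invoke(logPromptSize("answer", prompt))`) makes the short-vs-long comparison visible in your logs without touching node logic.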
Next Steps
- Add a token budget gate using model-specific token counting before each LLM call
- Move summaries into persistent storage so they survive process restarts
- Add tool nodes with strict context filtering so tools never receive unnecessary conversation history
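The first of those next steps can be sketched with a simple estimate-based gate. `TOKEN_BUDGET` and the 4-characters-per-token ratio are assumptions for illustration; a real implementation would swap in a model-specific tokenizer such as `js-tiktoken`:

```ts
const TOKEN_BUDGET = 3000; // assumed per-call budget, tune per model

// Rough estimate; replace with a model-specific tokenizer for real budgets.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Gate: true when the assembled prompt fits the budget, so a routing node
// can send oversized prompts to summarization instead of the LLM.
const withinBudget = (prompt: string, budget: number = TOKEN_BUDGET): boolean =>
  estimateTokens(prompt) <= budget;
```

In the graph above, a check like this could replace the message-count heuristic in routeNode, gating on an estimate of the prompt you are actually about to send rather than on how many turns have accumulated.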
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.