How to Fix 'memory not persisting' in LangChain (Python)
If your LangChain app says memory is “not persisting,” it usually means the chain runs, but the conversation state gets reset between calls. In practice, this shows up when you expect chat history to survive across turns, but you’re recreating the chain, using the wrong memory object, or never wiring memory into the execution path.
The Most Common Cause
The #1 cause is simple: you create memory, but you don’t reuse the same chain instance across requests.
A lot of developers do this in a web handler or notebook cell:
| Broken pattern | Fixed pattern |
|---|---|
| Rebuilds chain every request | Keeps chain + memory alive across turns |
```python
# BROKEN
from langchain_openai import ChatOpenAI
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain

def ask(question: str):
    llm = ChatOpenAI(model="gpt-4o-mini")
    memory = ConversationBufferMemory()
    chain = ConversationChain(llm=llm, memory=memory)
    return chain.invoke({"input": question})

print(ask("My name is Sam"))
print(ask("What's my name?"))
```
This looks fine, but every call creates a fresh `ConversationBufferMemory()`. So when the second request comes in, the model has no prior messages.
Here’s the fixed version:
```python
# FIXED
from langchain_openai import ChatOpenAI
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain

llm = ChatOpenAI(model="gpt-4o-mini")
memory = ConversationBufferMemory()
chain = ConversationChain(llm=llm, memory=memory)

print(chain.invoke({"input": "My name is Sam"}))
print(chain.invoke({"input": "What's my name?"}))
```
If you’re using LangChain in a FastAPI app, the real fix is usually to store memory per session/user, not per function call. Otherwise every HTTP request gets a blank slate.
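The per-session pattern boils down to caching one chain (and its memory) per session key. Here is a minimal sketch of that mechanic; `FakeChain` is a hypothetical stand-in for `ConversationChain` so the example runs without an API key:

```python
# Sketch: cache one chain per session instead of rebuilding per request.
# FakeChain is a stand-in for ConversationChain (no API key needed).
class FakeChain:
    def __init__(self):
        self.history = []  # stands in for ConversationBufferMemory

    def invoke(self, inputs):
        self.history.append(inputs["input"])
        return {"response": f"seen {len(self.history)} messages"}

_chains = {}  # session_id -> chain; lives across requests

def get_chain(session_id: str) -> FakeChain:
    # Reuse the cached chain; only build a new one for unseen sessions.
    if session_id not in _chains:
        _chains[session_id] = FakeChain()
    return _chains[session_id]

def ask(session_id: str, question: str):
    return get_chain(session_id).invoke({"input": question})

print(ask("sam", "My name is Sam"))   # first turn for this session
print(ask("sam", "What's my name?"))  # same chain, history intact
```

In a real app you would call `get_chain` inside the route handler, keyed by a cookie or the authenticated user, and replace the bare dict with a cache that has eviction.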
Other Possible Causes
1) You’re using the wrong input key
Some chains expect `input`, others expect `question`, and some LCEL setups require explicit message wiring. If your memory hooks are correct but the inputs are mismatched, LangChain may raise errors like:

- `ValueError: Missing some input keys: {'history'}`
- `KeyError: 'input'`
Example:
```python
# BROKEN
chain.invoke({"question": "Hello"})  # chain expects {"input": ...}
```

Fix:

```python
# FIXED
chain.invoke({"input": "Hello"})
```
If you’re using `ConversationBufferMemory`, check `memory.memory_key` and make sure the prompt includes a variable with that name.
2) Memory is not attached to the runnable/chain
With newer LangChain patterns, people often build an LLM runnable and assume memory will “just work.” It won’t unless you explicitly wire it in.
Broken:
```python
# BROKEN
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")
response = llm.invoke("Remember my name is Sam")
```
Fixed:
```python
# FIXED
from langchain_openai import ChatOpenAI
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain

llm = ChatOpenAI(model="gpt-4o-mini")
memory = ConversationBufferMemory()
chain = ConversationChain(llm=llm, memory=memory)
```
If you’re on LCEL (`RunnableSequence` / `RunnableWithMessageHistory`), use the history wrapper instead of expecting a plain `.invoke()` to persist state.
3) You’re creating a new session ID every time
This happens with `RunnableWithMessageHistory` and chat stores. If each request gets a different `session_id`, LangChain correctly treats it as a different conversation.
Broken config:
```python
config = {"configurable": {"session_id": str(uuid.uuid4())}}
```
That guarantees a new history on every call.
Fixed:
```python
config = {"configurable": {"session_id": user_id}}
```
Use a stable key:
- an authenticated user ID
- a browser session ID
- a tenant + user composite key
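If you need a composite key, derive it deterministically rather than minting a random UUID per request. One way, sketched with the standard library's `uuid.uuid5` (which always returns the same ID for the same input):

```python
import uuid

# Sketch: a stable session_id derived from identity you already have,
# instead of a fresh uuid4 on every request.
def session_key(tenant: str, user_id: str) -> str:
    # uuid5 is deterministic: same tenant+user always yields the same id.
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, f"{tenant}:{user_id}"))

print(session_key("acme", "user_123") == session_key("acme", "user_123"))  # True
print(session_key("acme", "user_123") == session_key("acme", "user_456"))  # False
```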
4) Your store is in-memory only and your app restarts
If you use `InMemoryChatMessageHistory`, it persists only for the life of the Python process. Restart Gunicorn, reload Uvicorn, redeploy Docker, or scale to another pod, and the history disappears.
Example:
```python
from langchain_core.chat_history import InMemoryChatMessageHistory

store = {}
store["user_123"] = InMemoryChatMessageHistory()
```
That works locally, but not across restarts or multiple workers.
Fix: use Redis, Postgres, DynamoDB, or another persistent backend for chat history.
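To see what a persistent backend buys you, here is a toy file-backed history in plain Python. It is a sketch of the idea only, not a substitute for a Redis- or database-backed store: messages survive because they live on disk, so a brand-new object (think: restarted process or a different worker) still sees them.

```python
import json
import tempfile
from pathlib import Path

# Sketch: a message store whose state outlives the object holding it.
class FileBackedHistory:
    def __init__(self, path: Path):
        self.path = path

    @property
    def messages(self):
        # Read from disk every time, so any process sees the latest state.
        if not self.path.exists():
            return []
        return json.loads(self.path.read_text())

    def add_message(self, role: str, text: str):
        msgs = self.messages
        msgs.append({"role": role, "text": text})
        self.path.write_text(json.dumps(msgs))

path = Path(tempfile.gettempdir()) / "chat_user_123.json"
path.unlink(missing_ok=True)
FileBackedHistory(path).add_message("human", "My name is Sam")

# A brand-new object (new worker, restarted process) still sees the turn.
print(FileBackedHistory(path).messages)
```

A real deployment would swap the JSON file for Redis or Postgres, but the contract is the same: the store, not the Python object, owns the conversation.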
How to Debug It
- Print the memory contents after each turn. If messages are empty after turn one, your memory isn’t being reused. Check `memory.load_memory_variables({})` or inspect your message store directly.
- Confirm the same session/user key is used. Log `session_id`, `user_id`, or whatever key backs your chat history. If it changes between requests, persistence will fail by design.
- Verify prompt variables match memory keys. If your prompt expects `{history}` but memory uses `"chat_history"`, LangChain won’t inject it correctly. A common error is `ValueError: Prompt missing required variables: {'history'}`.
- Check whether you are rebuilding objects per request. In Flask/FastAPI/Streamlit/Jupyter notebooks, object scope matters. If you see `memory = ConversationBufferMemory()` inside a route handler, that’s usually the bug.
Prevention
- Use a persistent chat store for anything beyond local prototyping. Redis-backed message history is a solid default for web apps.
- Keep session identity stable: one user/session should map to one conversation thread.
- Test multi-turn behavior explicitly. Write an integration test that sends two prompts and asserts the second turn sees prior context.
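That two-prompt integration test can be sketched in a few lines. `EchoMemoryChain` is a stand-in so the test runs offline; against a real chain you would assert on the model's actual answer instead:

```python
# Sketch of a multi-turn integration test. EchoMemoryChain stands in for
# a real chain with memory (no API key needed); it echoes all turns seen.
class EchoMemoryChain:
    def __init__(self):
        self.turns = []

    def invoke(self, inputs):
        self.turns.append(inputs["input"])
        return {"response": " | ".join(self.turns)}

def test_second_turn_sees_prior_context():
    chain = EchoMemoryChain()
    chain.invoke({"input": "My name is Sam"})
    second = chain.invoke({"input": "What's my name?"})
    # The second turn should still contain evidence of the first.
    assert "Sam" in second["response"]

test_second_turn_sees_prior_context()
print("multi-turn test passed")
```

The point of the test is the shape, not the stand-in: two invocations, one assertion that turn two reflects turn one.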
If you want one rule to remember: LangChain memory does not persist because you “enabled” it; it persists because you keep reusing the same conversation state with a stable storage layer.
Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit (PDF checklist + starter code)
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.