How to Fix 'memory not persisting' in LangChain (Python)

By Cyprian Aarons · Updated 2026-04-21
Tags: memory-not-persisting, langchain, python

If your LangChain app says memory is “not persisting,” it usually means the chain runs, but the conversation state gets reset between calls. In practice, this shows up when you expect chat history to survive across turns, but you’re recreating the chain, using the wrong memory object, or never wiring memory into the execution path.

The Most Common Cause

The #1 cause is simple: you create memory, but you don’t reuse the same chain instance across requests.

A lot of developers do this in a web handler or notebook cell:

  • Broken pattern: rebuilds the chain (and its memory) on every request
  • Fixed pattern: keeps the chain and memory alive across turns
# BROKEN
from langchain_openai import ChatOpenAI
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain

def ask(question: str):
    llm = ChatOpenAI(model="gpt-4o-mini")
    memory = ConversationBufferMemory()
    chain = ConversationChain(llm=llm, memory=memory)
    return chain.invoke({"input": question})

print(ask("My name is Sam"))
print(ask("What's my name?"))

This looks fine, but every call creates a fresh ConversationBufferMemory(), so when the second request comes in, the model has no prior messages.

Here’s the fixed version:

# FIXED
from langchain_openai import ChatOpenAI
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain

llm = ChatOpenAI(model="gpt-4o-mini")
memory = ConversationBufferMemory()
chain = ConversationChain(llm=llm, memory=memory)

print(chain.invoke({"input": "My name is Sam"}))
print(chain.invoke({"input": "What's my name?"}))

If you’re using LangChain in a FastAPI app, the real fix is usually to store memory per session/user, not per function call. Otherwise every HTTP request gets a blank slate.
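The per-session fix boils down to one pattern: look the chain up by session key, create it once, and always hand back the same object. Here is a minimal framework-free sketch of that pattern — build_chain() is a hypothetical stand-in for your real constructor (e.g. ConversationChain with ConversationBufferMemory), modeled here as a dict so the sketch runs anywhere:

```python
# Sketch: cache one chain per session so its memory survives across requests.
# build_chain() stands in for your real constructor, e.g.
# ConversationChain(llm=ChatOpenAI(...), memory=ConversationBufferMemory()).
_chains: dict = {}

def build_chain():
    # Placeholder "chain" whose memory is just a list of past inputs.
    return {"memory": []}

def get_chain(session_id: str):
    # Create once per session, then always return the same object.
    if session_id not in _chains:
        _chains[session_id] = build_chain()
    return _chains[session_id]

first = get_chain("user_123")
first["memory"].append("My name is Sam")
second = get_chain("user_123")  # same object; history is still there
```

In a FastAPI app, `get_chain` would be called inside the route handler with the authenticated session key, while `_chains` lives at module scope (or in Redis, as discussed below).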

Other Possible Causes

1) You’re using the wrong input key

Some chains expect input, others expect question, and some LCEL setups require explicit message wiring. If your memory hooks are correct but inputs are mismatched, LangChain may raise errors like:

  • ValueError: Missing some input keys: {'history'}
  • KeyError: 'input'

Example:

# BROKEN
chain.invoke({"question": "Hello"})  # chain expects {"input": ...}

Fix:

# FIXED
chain.invoke({"input": "Hello"})

If you’re using ConversationBufferMemory, check memory.memory_key and make sure the prompt includes that variable.
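You can catch this mismatch before runtime by comparing the placeholders your prompt template declares against the memory key. A small stdlib-only sketch (the template string below is illustrative; in real code compare against memory.memory_key):

```python
import string

# Illustrative prompt template; yours may differ.
prompt_template = "Current conversation:\n{history}\nHuman: {input}\nAI:"
memory_key = "history"  # compare against memory.memory_key in real code

# Extract the {placeholders} the template actually declares.
fields = {name for _, name, _, _ in string.Formatter().parse(prompt_template) if name}

assert memory_key in fields, f"prompt is missing {{{memory_key}}}"
assert "input" in fields, "prompt is missing {input}"
```

If the assertion fails, you have found your bug without ever calling the model.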


2) Memory is not attached to the runnable/chain

With newer LangChain patterns, people often build an LLM runnable and assume memory will “just work.” It won’t unless you explicitly wire it in.

Broken:

# BROKEN
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")
response = llm.invoke("Remember my name is Sam")

Fixed:

# FIXED
from langchain_openai import ChatOpenAI
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain

llm = ChatOpenAI(model="gpt-4o-mini")
memory = ConversationBufferMemory()
chain = ConversationChain(llm=llm, memory=memory)

If you’re on LCEL (RunnableSequence / RunnableWithMessageHistory), use the history wrapper instead of expecting plain .invoke() to persist state.


3) You’re creating a new session ID every time

This happens with RunnableWithMessageHistory and chat stores. If each request gets a different session_id, LangChain correctly treats it as a different conversation.

Broken config:

config = {"configurable": {"session_id": str(uuid.uuid4())}}

That guarantees a new history on every call.

Fixed:

config = {"configurable": {"session_id": user_id}}

Use a stable key:

  • authenticated user ID
  • browser session ID
  • tenant + user composite key
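For the composite case, a deterministic key builder is enough (the names here are illustrative):

```python
def conversation_key(tenant_id: str, user_id: str) -> str:
    # Deterministic: the same tenant + user always maps to the same thread.
    return f"{tenant_id}:{user_id}"

# Same inputs, same session_id, same conversation — on every request.
config = {"configurable": {"session_id": conversation_key("acme", "user_123")}}
```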

4) Your store is in-memory only and your app restarts

If you use InMemoryChatMessageHistory, it persists only for the life of the Python process. Restart Gunicorn, reload Uvicorn, redeploy Docker, or scale to another pod, and history disappears.

Example:

from langchain_core.chat_history import InMemoryChatMessageHistory

store = {}
store["user_123"] = InMemoryChatMessageHistory()

That works locally, but not across restarts or multiple workers.

Fix: use Redis, Postgres, DynamoDB, or another persistent backend for chat history.
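To see why process-lifetime storage is the problem, here is a deliberately tiny file-backed history. It is a teaching stand-in, not something to ship — in production you would reach for a Redis- or Postgres-backed store instead — but it shows the property that matters: state lives outside the process, so a "restart" (a new object) still sees old messages:

```python
import json
from pathlib import Path

class FileChatHistory:
    """Toy persistent history: survives restarts because state lives on disk."""

    def __init__(self, path: str):
        self.path = Path(path)

    def add_message(self, role: str, content: str) -> None:
        msgs = self.messages()
        msgs.append({"role": role, "content": content})
        self.path.write_text(json.dumps(msgs))

    def messages(self) -> list:
        if self.path.exists():
            return json.loads(self.path.read_text())
        return []

Path("/tmp/chat_user_123.json").unlink(missing_ok=True)  # start clean for the demo

h = FileChatHistory("/tmp/chat_user_123.json")
h.add_message("human", "My name is Sam")

# A "restarted process" constructs a new object but reads the same file.
h2 = FileChatHistory("/tmp/chat_user_123.json")
```

An InMemoryChatMessageHistory fails exactly where h2 succeeds: the new process has a new empty dict.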

How to Debug It

  1. Print the memory contents after each turn

    • If messages are empty after turn one, your memory isn’t being reused.
    • Check memory.load_memory_variables({}) or inspect your message store directly.
  2. Confirm the same session/user key is used

    • Log session_id, user_id, or whatever key backs your chat history.
    • If it changes between requests, persistence will fail by design.
  3. Verify prompt variables match memory keys

    • If your prompt expects {history} but memory uses "chat_history", LangChain won’t inject it correctly.
    • Common error:
      • ValueError: Prompt missing required variables: {'history'}
  4. Check whether you are rebuilding objects per request

    • In Flask/FastAPI/Streamlit/Jupyter notebooks, object scope matters.
    • If you see code like memory = ConversationBufferMemory() inside a route handler, that’s usually the bug.

Prevention

  • Use a persistent chat store for anything beyond local prototyping.
    • Redis-backed message history is a solid default for web apps.
  • Keep session identity stable.
    • One user/session should map to one conversation thread.
  • Test multi-turn behavior explicitly.
    • Write an integration test that sends two prompts and asserts the second turn sees prior context.
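That two-turn test can be sketched with a stub; FakeMemoryChain here is hypothetical — in the real test you would construct your actual chain and assert on the model's answer:

```python
class FakeMemoryChain:
    """Stub standing in for a real chain; keeps history the way memory should."""

    def __init__(self):
        self.history = []

    def invoke(self, inputs: dict) -> dict:
        self.history.append(inputs["input"])
        # Echo everything seen so far, standing in for a model response.
        return {"response": " | ".join(self.history)}

def test_second_turn_sees_prior_context():
    chain = FakeMemoryChain()
    chain.invoke({"input": "My name is Sam"})
    out = chain.invoke({"input": "What's my name?"})
    assert "Sam" in out["response"]

test_second_turn_sees_prior_context()
```

If this test passes against a stub but fails against your real app, you have reproduced the bug in isolation — usually one of the four causes above.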

If you want one rule to remember: LangChain memory does not persist because you “enabled” it; it persists because you keep reusing the same conversation state with a stable storage layer.



By Cyprian Aarons, AI Consultant at Topiax.
