Memory in AI Agents

Every conversation with an AI is a first date. The model doesn't remember that yesterday you spent two hours debugging together. Doesn't remember your dog's name. Doesn't remember you hate semicolons.

Each time, you start from scratch. Fresh slate. Tabula rasa. And this bothers me more than it probably should.

The Goldfish Problem

LLMs are stateless. They learn during training, bake that knowledge into their weights, and then... that's it. Nothing you tell them at inference time becomes part of who they are. It's like talking to someone with permanent anterograde amnesia.

So developers started building memory systems. Short-term memory. Long-term memory. Episodic, semantic, procedural. Terminology borrowed from cognitive psychology.

But I wonder: are we actually giving AI memory, or are we just building increasingly elaborate note-taking systems that an LLM can query?

Two Ways to Remember

From what I understand, there are essentially two places where an agent can "remember" things:

Context window - everything the model can see right now. The current conversation. Instructions. Recent messages. It's like RAM in your computer. Fast but volatile.

External storage - databases, vector stores, graph databases. The hard drive. Persistent but requires retrieval.

The real challenge is moving information between the two. When do you save something? When do you retrieve it? How do you know what's relevant?
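To make the two tiers concrete, here's a toy sketch in Python. Every name in it (MemoryStore, Agent, build_prompt) is invented for illustration, and the word-overlap "retrieval" is a stand-in for the embeddings a real system would use:

```python
import re
from collections import deque


def words(text: str) -> set[str]:
    # Toy tokenizer; real systems rank by embedding similarity, not word overlap.
    return set(re.findall(r"\w+", text.lower()))


class MemoryStore:
    """External storage: the 'hard drive'. Persistent, but needs retrieval."""

    def __init__(self):
        self.facts: list[str] = []

    def save(self, fact: str) -> None:
        self.facts.append(fact)

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        ranked = sorted(self.facts, key=lambda f: len(words(f) & words(query)), reverse=True)
        return ranked[:k]


class Agent:
    """The context window is the 'RAM': bounded, fast, volatile."""

    def __init__(self, store: MemoryStore, window_size: int = 10):
        self.store = store
        self.context = deque(maxlen=window_size)  # only the recent turns survive

    def build_prompt(self, user_message: str) -> str:
        self.context.append(user_message)
        recalled = self.store.retrieve(user_message)  # the move: hard drive -> RAM
        return "\n".join(["Relevant memories:", *recalled, "Conversation:", *self.context])


store = MemoryStore()
store.save("User hates semicolons")
store.save("User's dog is named Biscuit")
agent = Agent(store)
print(agent.build_prompt("Set up my linter, no semicolons please"))
```

All the hard questions hide inside that one retrieve() call.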

What Actually Gets Remembered

There's this taxonomy that keeps showing up everywhere. Working memory, semantic memory, episodic memory, procedural memory. Sounds very human.
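For what it's worth, when the taxonomy shows up in code it tends to look a lot less mystical than it sounds: the same record with a label on it. A hypothetical sketch:

```python
from dataclasses import dataclass
from enum import Enum, auto


class MemoryType(Enum):
    WORKING = auto()     # what's in the context window right now
    SEMANTIC = auto()    # facts: "the user prefers tabs"
    EPISODIC = auto()    # events: "last Tuesday we debugged the auth bug"
    PROCEDURAL = auto()  # how-tos: "deploy with `make release`"


@dataclass
class MemoryRecord:
    type: MemoryType
    content: str
```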

But Sarah Wooders from Letta makes an interesting point: an LLM is a tokens-in-tokens-out function, not a brain. The anthropomorphized analogies might be misleading us.

I don't know which framing is more useful. Maybe both. Maybe neither. The field is moving so fast that what I write today might be obsolete tomorrow.

The Forgetting Problem

Here's what keeps me up at night: teaching AI to forget.

Humans forget naturally. Our brains prune irrelevant information constantly. We don't remember what we had for breakfast three Tuesdays ago. And that's good - it keeps us sane.

But for AI agents, forgetting has to be engineered. Explicitly programmed. Someone has to write logic that says "this information is now obsolete, delete it."

How do you automate judgment about what to forget? Update your address and forget the old one - sure, that's easy. But what about nuanced information? Preferences that shift? Context that becomes irrelevant?

Memory bloat degrades quality. Too much information and the signal gets lost in the noise. But delete the wrong thing and you've created a different kind of problem.
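For a sense of how blunt that engineering usually is, here's a hedged sketch: score each memory by how recently and how often it's been used, then drop whatever falls below a threshold. The decay curve, the half-life, the threshold - all invented here, and choosing them is precisely the judgment call that's hard to automate. Decay handles staleness; it says nothing about a preference that quietly changed.

```python
import math
import time
from dataclasses import dataclass, field


@dataclass
class StoredMemory:
    content: str
    last_used_at: float = field(default_factory=time.time)
    use_count: int = 0


def retention_score(m: StoredMemory, now: float, half_life_days: float = 30.0) -> float:
    # Exponential decay since last use, softened by how often the memory was useful.
    days_idle = (now - m.last_used_at) / 86_400
    decay = math.exp(-math.log(2) * days_idle / half_life_days)
    return decay * (1 + math.log1p(m.use_count))


def prune(memories: list[StoredMemory], threshold: float = 0.1) -> list[StoredMemory]:
    # "Forgetting" is just a filter - which is exactly why it feels so blunt.
    now = time.time()
    return [m for m in memories if retention_score(m, now) >= threshold]
```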

Hot Path vs Background

There's an interesting design question: when should the agent decide to remember something?

Hot path - the agent autonomously decides "this is important" and stores it in real-time. Like how humans consciously note important information.

Background - a separate system processes conversations after they happen, extracting what seems relevant. Like your brain consolidating memories while you sleep.

The hot path is elegant but error-prone, and every write adds latency right in the middle of a conversation. How does an agent know, in the moment, what's important? The background approach avoids that, but it adds moving parts, and nothing gets remembered until the job has run.
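Side by side, the two write paths look something like this. Both helpers below are placeholders; in a real system they'd be LLM calls, which is where both the cost and the mistakes come from:

```python
def looks_important(message: str) -> bool:
    # Placeholder judgment call; in practice this is usually another LLM call.
    return any(p in message.lower() for p in ("i prefer", "my name is", "remember that"))


def extract_facts(transcript: list[str]) -> list[str]:
    # Placeholder extractor; in practice an LLM pass over the whole session.
    return [m for m in transcript if looks_important(m)]


def hot_path_turn(message: str, store: list[str]) -> None:
    """Hot path: write mid-conversation, paying latency and risking wrong guesses."""
    if looks_important(message):
        store.append(message)


def background_consolidation(transcript: list[str], store: list[str]) -> None:
    """Background: sweep the transcript afterwards, like consolidation during sleep."""
    for fact in extract_facts(transcript):
        if fact not in store:
            store.append(fact)
```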

Neither solution feels quite right yet.

So What?

I've been using AI assistants daily for over a year now. The lack of persistent memory is probably the biggest friction point.

Every time I start a new session, I spend the first few minutes re-establishing context. Explaining my preferences. Reminding it about the project we're working on. It's like having a brilliant colleague with amnesia.

The frameworks emerging now - mem0, Letta, Cognee, Zep - are all trying to solve this. LangChain, LlamaIndex, and others have their own memory implementations.

But we're still in the "figure out the right abstractions" phase. No one has cracked it yet.

Current State

Memory in AI agents is one of those problems that sounds simple ("just save the important stuff") until you actually try to implement it ("what's important? for how long? how do you retrieve it efficiently?").

I think the teams that get memory right will have a significant advantage. An agent that actually knows you, that remembers your context, that learns from interactions - that's a different product entirely.

But we're not there yet. The goldfish still rules.


*Following Leonie Monigatti's excellent study notes on this topic. Still processing. Still learning.*
