Token Recall - Why AI Keeps Forgetting Our Chats
Have you ever had a conversation with somebody in customer support, been transferred to someone else, and ended up frustrated that the new person you're talking to has completely lost the context of the previous conversation?
I mean, how hard can it be for them to keep a record of the important points?
And have you ever had a conversation with an AI assistant, then moved to a new conversation and ended up frustrated that it has completely lost the context of the previous conversation?
Sound familiar? That's because it is.
If you've spent any time conversing with an AI assistant, by now you know it doesn't remember anything between conversations.
Let’s look at why that is...
Each time you're transferred between customer service agents, you have to explain your situation again because the new agent wasn't present for your previous conversation and has no knowledge of what was discussed.
When you start a new conversation with an AI assistant, it's the same thing. It's like speaking with a fresh customer service agent who has no knowledge of your previous interactions. Each conversation is separate and independent: the AI starts from scratch, and just as a new customer service agent has to build an understanding of your situation from the beginning, so does the AI assistant.
Now, this is by design, for a number of reasons. Starting afresh with every conversation protects privacy and maintains consistency: it keeps user data private and prevents one person's information from accidentally bleeding into conversations with others. While this design choice has clear drawbacks for us, the users, it's currently the safest and most reliable way to provide AI assistance.
On top of that, there’s this thing called a token.
Tokens
Tokens are the fundamental units of text processing for AI models - they're how language gets broken down into pieces the AI can understand.
A token can be a whole word, part of a word, a punctuation mark, or even a space. Common English words like "the" and "and" might be a single token, while longer or more complex words are often split into multiple tokens.
As an example, let's look at "Luke Skywalker bullseyed womp rats in his T-16." There are 8 words in that sentence, but there are 14 tokens.
“Luke”, “Skywalker”, “rats”, “in”, “his” and “.” are all single tokens, but “bullseyed” is three (“bull”, “sey”, “ed”), as is “T-16” (“T”, “-”, “16”), and “womp” is split into “wom” and “p”.
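If you're curious how your own sentences get carved up, here's a minimal sketch using OpenAI's open-source tiktoken tokenizer. It's just one tokenizer among many - different models split text differently, so the exact pieces and counts you see may not match the example above.

```python
# pip install tiktoken
import tiktoken

# cl100k_base is one common tokenizer; other models use different ones,
# so the splits and counts below are illustrative, not universal.
enc = tiktoken.get_encoding("cl100k_base")

sentence = "Luke Skywalker bullseyed womp rats in his T-16."
token_ids = enc.encode(sentence)

print(f"{len(sentence.split())} words -> {len(token_ids)} tokens")
for token_id in token_ids:
    # Decode each token on its own to see how the sentence was carved up.
    print(token_id, repr(enc.decode([token_id])))
```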
Each word, part of a word, or even punctuation mark takes up token space. The key limitation is that AI models can only process a finite number of tokens at once. When you're having a conversation, both your messages and the AI's responses consume these tokens. If conversations carried over from one to the next, the token count would eventually hit that limit, leaving no room for new information or responses. Starting fresh each time isn't just a design choice - it's a necessary consequence of how AI models process language, and it ensures there's always sufficient token space available for meaningful interaction.
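To make that concrete, here's a rough sketch of the bookkeeping. The 8,000-token limit is made up purely for illustration - real context windows vary widely from model to model - but the arithmetic is the same: every message spends tokens from a fixed budget.

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
MAX_CONTEXT_TOKENS = 8_000  # illustrative limit; real models differ

def count_tokens(messages: list[str]) -> int:
    """Total tokens consumed by every message in the conversation so far."""
    return sum(len(enc.encode(message)) for message in messages)

conversation = [
    "I hate the cold. Plan me a holiday somewhere warm.",
    "How about Thailand in February? Here's a rough itinerary...",
]

used = count_tokens(conversation)
print(f"{used} of {MAX_CONTEXT_TOKENS} tokens used, {MAX_CONTEXT_TOKENS - used} left")
```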
This is why you get a warning when your conversation is getting too long. It’s also why, when you stay in a long conversation, you start to get frustrated because the assistant is beginning to forget things you previously said. Like that time you tried using AI to plan your holiday in a warm climate because it was freezing outside, and after ten minutes of planning it suddenly started suggesting Iceland.
AI language models process text using what's called a context window. This is the total amount of text the AI can "see" and reference at any given time. Think of it like a container that holds both your messages and the AI's responses. Every model has a fixed maximum size for this container - when it's full, no more text can be processed. That's why, when it started suggesting Iceland instead of Thailand, it wasn't being difficult - those early messages about hating the cold had been pushed out of the context window, like old texts disappearing off the top of your screen in a really long group chat.
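A simple way to picture what's happening under the hood: when the conversation no longer fits, the oldest messages are the first to go. This little sketch assumes a naive drop-the-oldest strategy - real assistants may trim or summarise differently - but the effect is the same one you felt when Thailand turned into Iceland.

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def fit_to_window(messages: list[str], max_tokens: int) -> list[str]:
    """Keep only the most recent messages that fit in the token budget."""
    kept: list[str] = []
    total = 0
    # Walk backwards from the newest message, keeping whatever still fits.
    for message in reversed(messages):
        cost = len(enc.encode(message))
        if total + cost > max_tokens:
            break  # everything older than this falls out of the window
        kept.insert(0, message)
        total += cost
    return kept

history = [
    "It's freezing here and I hate the cold.",      # oldest - first to go
    "Plan me a two-week holiday somewhere hot.",
    "Budget is about 2,000, flying from London.",
    "Actually make it ten days, not two weeks.",
]

# With a deliberately tiny budget, the earliest messages drop out first.
print(fit_to_window(history, max_tokens=25))
```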
You are basically having a 2 am drunken conversation together. Recall is hazy in the moment and completely gone the next day.
The good news is understanding these limitations helps you work better with AI. Keep conversations focused, break complex tasks into smaller chats, and don't be surprised when you need to remind it about important details - just like you might need to recap things for a new customer service agent. It's not perfect, but hey, neither are you.