What is a Context Window?
The AI's short-term memory — and why it has a limit. What tokens are, why context windows matter, and how to work with them effectively.
Every time you send a message to an AI, it doesn't just read that one message. It reads everything — the entire conversation history, the system instructions, any documents you've attached.
All of that gets packed into what's called the context window.
Think of it as short-term memory
The context window is the AI's working memory. Everything it can "see" right now — your messages, its own responses, any files or instructions — lives inside this window.
When the window is full, something gets dropped. And what gets dropped is usually the oldest stuff — the beginning of the conversation.
You've probably experienced this. You're deep in a conversation with an AI, and suddenly it seems to forget something you discussed 30 messages ago. That's not a bug. That's the context window at its limit.
What are tokens?
You'll see "token" used everywhere when people talk about AI limits. It's a slightly odd unit of measurement.
Roughly: 1 token ≈ ¾ of a word. Or about 4 characters.
So "What is a context window?" is about 6 tokens. A full page of text is around 500–700 tokens. A 50-page PDF might be 25,000 tokens.
Why tokens instead of words? It's a technical thing related to how LLMs process text — they break everything into chunks called tokens, which are sub-word pieces. "unhelpful" might be 3 tokens: "un", "help", "ful". You don't need to fully understand it — just know that tokens ≈ words, roughly.
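The rule-of-thumb conversion above (about 4 characters per token) is easy to sketch in code. This is only a heuristic, not a real tokenizer — actual models use sub-word tokenizers (like BPE) that split text differently — but it's good enough for back-of-envelope budgeting:

```python
# Rough token estimator using the ~4-characters-per-token rule of thumb.
# Real tokenizers split on sub-word pieces, so real counts will differ.

def estimate_tokens(text: str) -> int:
    """Estimate token count as characters / 4, rounded to the nearest whole token."""
    return max(1, round(len(text) / 4))

print(estimate_tokens("What is a context window?"))   # ~6 tokens
print(estimate_tokens("x" * 3000))                    # ~750 tokens, about one page
```

For precise counts, use the tokenizer that matches your model — but for "will this roughly fit?" questions, characters divided by four is usually close enough.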
Why does the context window have a limit?
The short answer: it's expensive.
Processing text with an LLM requires a lot of computation. The more text in the context, the more computation needed. And the type of computation that makes LLMs work (called the "attention mechanism" — it lets the model connect any word to any other word) scales up quadratically with context length.
That means doubling your context roughly quadruples the attention cost — it doesn't just double it. The limit is partly technical, partly economic.
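You can see the quadratic scaling with a little arithmetic. The numbers below are a rough proxy — attention compares every token with every other token, so the cost grows with the number of token pairs:

```python
# Attention relates every token to every other token: n * n pairs.
# Token-pair count is a rough proxy for attention compute, not an exact cost model.

for n in [4_000, 8_000, 16_000]:
    pairs = n * n
    print(f"{n:>6} tokens -> {pairs:,} token pairs")

# Doubling the context from 4k to 8k quadruples the pair count:
print((8_000 ** 2) / (4_000 ** 2))  # 4.0
```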
The good news: context windows have been growing fast. Early GPT models had context windows of around 4,000 tokens. Modern models like Claude support over 200,000 tokens — roughly 150,000 words, or a full novel.
What happens when you hit the limit?
The AI starts forgetting.
More precisely: the oldest content in the context gets pushed out to make room for the new. So if you've been having a long conversation, the early part of it disappears from the AI's working memory.
This has real consequences:
- The AI might contradict something it said earlier
- It might forget instructions you gave at the start
- It might ask you to repeat information you already provided
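The drop-the-oldest behavior can be sketched as a simple trimming loop. The message format and token counts here are illustrative assumptions, not any particular vendor's API — a real system would count tokens with the model's own tokenizer:

```python
# Sketch: trim the oldest messages until the conversation fits the window.
# Token counts are hardcoded for illustration only.

def trim_to_budget(messages: list[dict], budget: int) -> list[dict]:
    """Drop messages from the front (oldest first) until total tokens fit the budget."""
    trimmed = list(messages)
    while trimmed and sum(m["tokens"] for m in trimmed) > budget:
        trimmed.pop(0)  # the beginning of the conversation disappears first
    return trimmed

history = [
    {"role": "user", "text": "original question", "tokens": 50},
    {"role": "assistant", "text": "long answer", "tokens": 400},
    {"role": "user", "text": "follow-up", "tokens": 30},
]
print(trim_to_budget(history, budget=450))  # the oldest message is gone
```

This is exactly why the AI "forgets" the start of a long conversation: the trimming is silent, and nothing warns you which messages fell out.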
A real example
Our AI operations manager Ozer runs on OpenClaw and has a context window just like any other LLM.
When a support conversation gets very long — dozens of messages, attached logs, document context — he starts losing track of what was said at the beginning. Just like you would if someone read you a 300-page book and then asked you about something on page 1.
The fix isn't to give the AI a better memory (though bigger context windows help). The fix is to be smart about what you put in the context — and to architect your AI systems so that long-running tasks store important information externally and retrieve it when needed.
Why this matters for business
If you're building AI systems or just using AI tools heavily, context window management is something you'll run into fast.
Long documents: Feeding a 200-page contract to an AI? Make sure the model's context window can fit it. Not all of them can.
Long conversations: Customer support bots, AI assistants — anything that runs long conversations needs a strategy for handling what happens when context fills up.
Complex tasks: Multi-step tasks where the AI needs to remember early decisions can run into trouble in very long sessions.
Practical tips
Put the most important stuff first. Instructions, key constraints, critical context — lead with them. Most tools keep the system prompt pinned and trim the oldest conversation messages first, so instructions placed up front in the system prompt are the ones most likely to survive, and models tend to attend most reliably to content near the start of the context.
Keep system prompts lean. Every token your instructions take up is one fewer token available for actual content.
Summarize instead of appending. For very long tasks, periodically summarize what's happened so far rather than carrying the full raw history.
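The summarize-instead-of-append pattern looks roughly like this. Note that `summarize` below is a hypothetical stand-in — in a real system it would be an LLM call that condenses the old messages:

```python
# Sketch: compact old history into one running summary instead of carrying it all.
# `summarize` is a hypothetical placeholder for an LLM summarization call.

def summarize(messages: list[str]) -> str:
    # A real implementation would ask the model to condense these messages.
    return f"[summary of {len(messages)} earlier messages]"

def compact_history(messages: list[str], keep_recent: int = 3) -> list[str]:
    """Replace everything but the most recent messages with a single summary."""
    if len(messages) <= keep_recent:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    return [summarize(old)] + recent

msgs = [f"message {i}" for i in range(10)]
print(compact_history(msgs))
# ['[summary of 7 earlier messages]', 'message 7', 'message 8', 'message 9']
```

The trade-off: a summary loses detail, but it keeps the gist of a long session in a few hundred tokens instead of tens of thousands.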
Know your tool's limits. Check the context window size for whatever AI tool you're using. It matters more than most people realize.
💡 Key takeaway: The context window is the AI's working memory — finite, precious, and easy to fill. Knowing this helps you use AI tools more effectively and design AI systems that don't fall apart on long tasks.
🔗 Next up: So the AI can process text — but what if you want it to actually do things? What is an AI Agent? →
Want AI agents working in your business?
We build and deploy AI systems that connect to your real infrastructure. Not demos — production systems.