Why does ChatGPT forget what I told it five minutes ago?

Because the model has no memory between turns by default — only the rolling context window the client decided to send it. ChatGPT, Claude, and Gemini all maintain that window for you within a single conversation, but each chat starts fresh unless an explicit 'memory' feature is on. Even with memory on, only a small structured subset of facts (usually 1,200–2,000 tokens in ChatGPT's case) survives across chats — the conversation itself doesn't.

What's the difference between context window, memory, and RAG?

Three different things. Context window: the literal text the model sees on this turn, capped by the model (200K tokens on Claude, 1M on Gemini 2.5 Pro, 128K–400K on GPT-5 family). Memory: a feature that re-injects a small set of pinned facts into future chats. RAG (retrieval-augmented generation): a system that searches an external index and injects relevant chunks at query time. The model itself doesn't 'remember' anything in any of these — it just sees whatever was placed in front of it on this turn.

Doesn't fine-tuning give the AI long-term memory?

No. Fine-tuning updates the model's weights so it produces text in a particular style or format. It does not store discrete facts you can read, edit, or rely on. Asking a fine-tuned model 'what did I tell you last Tuesday?' is almost guaranteed to get a hallucination — the model has a vague tendency to sound like your old chats, not a record of them.

How many messages can ChatGPT remember in one conversation?

On free-tier GPT-5.3 Instant, roughly 10 messages per 5 hours before quota cuts off; on Plus, around 160 messages per 3 hours on the same model (AI-Toolbox, 2026 figures). 'Remember' inside one chat means 'fits in the context window' — when the window fills, older turns get truncated or summarized by the client, and the model behaves as if they never happened.

Does deleting a chat delete what the model knows about me?

Partially. Deleting a chat removes that conversation from the visible history. But anything the model pinned to its 'memory' feature persists, and any inferences made during training pulls happen on the provider's schedule — not yours. The honest model: assume deletion is a UI action, not a data action, unless the provider documents otherwise.

Will bigger context windows fix this?

No. A 1M-token context window helps within one session, but the model still doesn't carry anything between sessions, doesn't actively retrieve from your past, and forgets entire conversations the moment the window scrolls. Bigger windows make the present larger; they don't make the past accessible. The fix has to be external — a memory layer that re-injects what matters at the right moment.

What does 'persistent memory' actually require?

Four things, in order: (1) a durable store that lives outside the model, (2) a way to capture new facts from conversations, (3) a way to retrieve and re-inject the right facts at the right time, (4) a way for you to see, edit, and delete what's stored. Most provider 'memory' features only solve #3 partially and give you a thin slice of #4. Dedicated memory tools (Mem, Rewind, Konshus, your own RAG setup) try to solve all four.

How does Konshus handle this differently?

Konshus is a memory layer that lives outside any single provider. You import your past conversations and documents; the system distills them into atoms (durable facts with source + confidence + timestamps). When you talk to an AI, you can hand it a tight context block (Whisper) or a full mirror — your choice. Critically, the vault doesn't get wiped when ChatGPT ships a new model or Claude retires the old one. See /complete-ai-backup-guide for the full architecture, or /pricing.

Explainer · ~10 min read

Why Every AI Forgets You: The Architectural Truth About AI Memory

The complaint is universal: "I told it last week, why doesn't it remember?" The honest answer isn't a bug or a quota — it's that the thing people call AI memory is actually five different things in a trench coat, and none of them work the way human memory does. Once you can see the five layers, the forgetting stops being mysterious.

A lone silhouetted figure walking through a long corridor of doorways that dissolve into teal mist behind them, each doorway representing a past conversation fading.

The base case: a model has no memory at all

One-sentence answer: Out of the box, a language model is a stateless function — text in, text out, nothing carried over between calls.

When you send a prompt to GPT-5 or Claude or Gemini at the API level, the model sees exactly what you put in this request and nothing else. There's no hidden "session." If you want it to know what was said three turns ago, you have to include those three turns in this turn. The conversation feel of ChatGPT is the client stitching turns together and resending them — not the model remembering.

Once you accept that, every "memory" you see in a product is a layer added on top.

The five layers people call "AI memory"

Read these in order. They're additive, not interchangeable.

Painterly cutaway diagram showing four conceptual translucent layers floating in dark space — a small clear box (context window), a glowing pattern (fine-tuning), a sticky note (system prompt), and connecting arrows suggesting how each layer feeds the model. — Five conceptual layers. Only one of them — an external retrieval system you control — actually behaves like memory.

1. Context window

The literal text the model sees on this turn. Hard-capped by the model: GPT-5 family runs 128K–400K tokens depending on tier, Claude Sonnet/Opus is 200K, Gemini 2.5 Pro is 1M. Once a conversation exceeds the cap, the client either truncates or summarizes older turns. The model has no idea it lost anything.

2. "Memory" features (ChatGPT Memory, Claude Projects, Gemini Apps Activity)

A small structured store the client re-injects into the system prompt on future chats. ChatGPT's memory is typically 1,200–2,000 tokens of pinned facts. Claude's "memory" is really Project context — you set it, it persists per Project. Gemini's is closer to a cross-Google-account context than discrete pinned facts. None of these store conversations; they store distillations.

3. RAG (retrieval-augmented generation)

An external index — usually a vector store — that the client searches at query time and injects relevant chunks into the context window. This is what "AI that knows your documents" actually means under the hood. It works well when retrieval is tuned; it fails silently when the right chunk isn't in the top-K.

4. Fine-tuning

Modifies the model's weights so it produces text in a particular style. It does not store facts you can read or edit. A fine-tuned model can sound like your past chats without containing them. Asking it "what did I tell you last week?" returns a confident hallucination, every time.

5. System prompt / custom instructions

A static block of text the client prepends to every chat in this account or Project. Counts against the context window. Effective but tiny — usually 1,500 characters of soft cap, and identical for every conversation.

Why none of these are memory in the human sense

One-sentence answer: Human memory is retrieval over a lifetime of episodic detail; every layer above is either too small, too lossy, too static, or too inaccessible.

A human who has known you for a year doesn't recall every conversation, but they can usually answer "what was that thing you told me about your sister last fall?" That requires (a) a durable store, (b) cued retrieval, (c) reasonable confidence calibration, and (d) the ability to forget gracefully. The provider memory features do (a) partially, (b) almost not at all, (c) badly, and (d) by quietly evicting things they decided weren't important.

The 1,200–2,000 token ChatGPT memory cap, for context, is roughly 150 short facts. A year of meaningful conversation with another person produces tens of thousands of those.

Bigger context windows are not the answer

When Gemini 2.5 Pro shipped at 1M tokens and Claude added 200K, a common reaction was: problem solved. It isn't. Three reasons:

Recall over long context degrades. Public benchmarks (NIAH, RULER) consistently show that retrieval accuracy at the middle of a long context is much lower than at the start or end. The model can "see" 1M tokens; it cannot reliably use them.
Latency and cost scale linearly. A 1M-token prompt is a 1M-token bill on every turn. For continuous personal assistance, that's not a viable shape.
It still doesn't survive sessions. The window resets when you close the chat. No matter how big it gets, the past is gone unless something external puts it back.

What real persistent memory requires

Four properties — the missing layer the providers don't ship:

Durable store outside the model. Survives model updates, account changes, provider sunsets.
Capture pipeline. New facts from conversations and documents land in the store automatically — not when you remember to type them in.
Retrieval that actually fires. The right facts get re-injected at the right moment, ideally cued by the model itself rather than a static system prompt.
Member-in-the-loop control. You can see, edit, correct, and delete every fact. If you can't audit it, it isn't yours.

Provider memory features ship #3 partially and a thin slice of #4. Dedicated memory layers (Mem, Rewind in its old form, Konshus, hand-rolled RAG setups) try to do all four.

Where Konshus fits

Konshus is the missing layer. Import your existing conversations from ChatGPT and Claude, drop in journals, docs, and meeting transcripts; the vault distills them into atoms — small facts with source, confidence, and a timestamp. You see every atom. You can edit any of them. When you talk to an AI, the vault hands it a tight context block tuned to the current question. Model updates and provider sunsets don't touch the vault. See the full backup guide for how to seed it from what you already have.

Frequently Asked Questions

A memory layer that doesn't get wiped

Konshus holds your context outside any single provider. Model updates, account changes, and quiet sunsets don't touch the vault. Encrypted, exportable, never used for training.

Meet Konshus