You've been working in a Claude Code session for two hours on a refactor. The agent has dug through your codebase, read hundreds of files, talked architecture decisions through with you, explored three abandoned dead ends. Then: macOS wants to restart for a security update. Or your SSH connection drops. Or your editor freezes.
What happens to all that context?
The honest answer across all four major vendors is: by default, nothing good. But each has built different mechanisms to limit the damage — and the differences are larger than they look at first glance. This article works through the four vendors' official documentation (Anthropic, OpenAI, xAI, Google) and supplements it where necessary with documented bug reports and engineering blog posts.
The underlying problem: LLMs are stateless
Before we dig into the solutions, one point matters: every API call to a large language model is fundamentally stateless. The model itself has no memory between two requests. If your agent is going to "keep working", someone has to feed the context back in — either you, an SDK, or a vendor-managed data structure. So the question isn't whether state gets persisted somewhere — it's where: client-side (you manage files, logs, databases) or server-side (the vendor stores the conversation behind an ID). 1
Anthropic: CLAUDE.md, sessions, and the Memory Tool
Anthropic deliberately follows a file-based philosophy. There's no "session state on our servers" you slap an ID on — the entire ecosystem builds on files you control yourself.
For Claude Code, the primary persistence mechanism is CLAUDE.md, a file in your project root that's automatically loaded into context at the start of every session. 2 It works like a briefing for a new contractor that gets re-read every morning. Beyond that, the Claude Agent SDK has real session persistence: the SDK writes every session — prompts, tool calls, tool results, responses — to disk automatically, so you can return to it later. 3 Three modes matter:
# Resume the most recent session in this directory
claude --continue
# Pick from a list
claude --resume
In the SDK itself it looks like this:
import { query } from "@anthropic-ai/claude-agent-sdk";
// First call: creates a new session
for await (const msg of query({
prompt: "Analyze the auth module",
options: { allowedTools: ["Read", "Glob", "Grep"] }
})) {
if (msg.type === "result") console.log(msg.result);
}
// Second call: continue: true picks up the most recent session
for await (const msg of query({
prompt: "Now refactor it to use JWT",
options: { continue: true, allowedTools: ["Read", "Edit", "Write"] }
})) {
if (msg.type === "result") console.log(msg.result);
}
Continue finds the most recent session in the current directory. Resume takes a specific session ID — needed once you're juggling multiple parallel threads. Fork creates a branch without losing the original thread. 3
Where --resume finds sessions — and when they disappear
--resume isn't magic, and that's important to understand before you rely on it. The mechanism is fully local and tied to your filesystem.
Storage location. Anthropic documents: "Claude Code stores session transcripts locally in plaintext under ~/.claude/projects/ for 30 days by default." 4 On Windows correspondingly under %USERPROFILE%\.claude\projects\. Inside projects/, each project gets its own subdirectory — derived from the absolute path of the working directory, with / replaced by -:
~/.claude/
├── projects/
│ ├── -Users-dev-code-api-service/
│ │ ├── 9d6e8a4a-cee8-41ae-be27-1a848fe3c74b.jsonl
│ │ ├── b3f1d9e2-...-.jsonl
│ │ └── memory/ ← Auto-memory (cross-session)
│ └── -Users-dev-code-frontend-app/
│ └── 7c2a1b88-...-.jsonl
├── todos/
├── shell-snapshots/
├── backups/
└── history.jsonl
Each .jsonl is one complete session — one line per event (user message, assistant response, tool call, tool result). The filename is the session UUID you'd type in claude --resume <id>. Useful side effect: you can grep the files with standard bash tools.
# Find every session that mentioned "auth refactor"
grep -l "auth refactor" ~/.claude/projects/*/*.jsonl
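The path encoding is simple enough to reproduce yourself. A hypothetical helper that mirrors only the documented slash-to-dash substitution (real Claude Code may encode other characters too, so treat this as a sketch, not the canonical algorithm):

```python
from pathlib import Path

def transcript_dir(project_path: str, claude_home: str = "~/.claude") -> Path:
    """Map an absolute project path to its ~/.claude/projects/ bucket.

    Mirrors only the documented substitution: '/' becomes '-'.
    Other characters may be handled differently in practice.
    """
    encoded = project_path.replace("/", "-")
    return Path(claude_home).expanduser() / "projects" / encoded

# The example tree above: /Users/dev/code/api-service
print(transcript_dir("/Users/dev/code/api-service").name)
# -Users-dev-code-api-service
```

Handy when you want to point backup scripts or grep jobs at exactly one project's transcripts instead of globbing everything.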
Anthropic also documents a second mechanism inside the same projects/ path: auto-memory under projects/<project>/memory/ — notes Claude carries across sessions, separate from the session transcripts. 5
Project- vs. user-scoped. The crucial distinction: sessions are project-directory-scoped, not user-scoped. Anthropic documents: "Each Claude Code conversation is a session tied to your current directory. The /resume picker shows sessions from the current worktree by default, with keyboard shortcuts to widen the list to other worktrees or projects." 6 Concretely:
- claude --continue in /Users/dev/code/api-service resumes the most recent session of that directory, not the most recent session overall.
- Move api-service to /Users/dev/work/api-service and the path is different — and so is the ~/.claude/projects/ subdirectory. --resume no longer finds the old sessions because storage is indexed by absolute path. 7
- Same effect when the project is checked out at a different path on a second machine. Even after rsync or cloud sync, --resume won't find the sessions, because two different absolute paths map to two different storage buckets. 8
Anthropic documents the project-vs-user split explicitly: "Commit project files to git to share them with your team; files in ~/.claude are personal configuration that applies across all your projects." 5 User-scoped (i.e. the same across all projects) are three layers that are easy to confuse with sessions:
- ~/.claude/CLAUDE.md — global memory, loaded in every project.
- ~/.claude/settings.json — user settings.
- ~/.claude/history.jsonl — a flat list of every slash command you've ever issued, across all projects, separate from the session transcripts.
The split is deliberate: project state shouldn't contaminate, user preferences should follow you everywhere.
Retention and automatic cleanup. Here's the bit that can bite you. At startup, Claude Code checks file age and deletes anything older than cleanupPeriodDays. Default is 30 days, minimum 1. 9 Since v2.1.117 this covers not just projects/ transcripts but also tasks/, shell-snapshots/, and backups/. 10
You can configure this in ~/.claude/settings.json:
{
"cleanupPeriodDays": 365
}
Two very nasty pitfalls from the GitHub issue tracker:
- cleanupPeriodDays: 0 does NOT mean "keep forever". The same field accidentally also gates the write path — at 0, no transcripts get written at all. If you want unlimited retention, set a very high number (e.g. 36500 for ~100 years), not 0. 11
- claude -p --setting-sources local ignores your user settings. In non-interactive runs (CI, scripts) ~/.claude/settings.json is skipped, cleanup falls back to the 30-day default, and sessions older than that are deleted globally, across all projects. If you run batch jobs and need long-lived sessions, set cleanupPeriodDays additionally in .claude/settings.json in the repo or via a --settings flag. 12
Practical consequences. Three habits I picked up after this research:
- Sessions you actually want to keep get a name (/rename oauth-migration). Auto-generated summaries are useless after three weeks; named sessions you can still find in the picker six months later.
- Important findings move out of the session. Before a session contains anything I'll need long-term, I have Claude write the essence into CLAUDE.md or a docs/decisions/ file. The .jsonl is replay material, not source of truth.
- For cross-machine continuity, forget --resume. The path coupling is a hard constraint. There's an open feature request with Anthropic for server-side CLI sessions, but as of today it's not implemented. 13
For the API directly, since the Sonnet 4.5 release there's the Memory Tool (memory_20250818). Unlike OpenAI, Anthropic doesn't host a central memory database — you implement the storage handler yourself (filesystem, S3, Postgres, whatever fits) and Claude calls your tool through a standardised interface to read and write files under /memories. 14
Anthropic added two mechanisms in 2025 that matter a lot in practice: context editing and compaction. The engineering blog describes: "Compaction is the practice of taking a conversation nearing the context window limit, summarizing its contents, and reinitiating a new context window with the summary. […] In Claude Code, for example, we implement this by passing the message history to the model to summarize and compress the most critical details. The model preserves architectural decisions, unresolved bugs, and implementation details while discarding redundant tool outputs or messages." 15
Context editing is the finer-grained version: you can tell the model to automatically clear old tool outputs above a token threshold while important state notes in the Memory Tool stay intact:
CONTEXT_MANAGEMENT = {
"edits": [
{
"type": "clear_tool_uses_20250919",
"trigger": {"type": "input_tokens", "value": 30000},
"keep": {"type": "tool_uses", "value": 2},
"clear_at_least": {"type": "input_tokens", "value": 5000}
}
]
}
The most important lesson out of the Anthropic camp: memory is engineered, not the result of growing chat history. The engineering team puts it this way: "Treating context as a precious, finite resource will remain central to building reliable, effective agents." 15
OpenAI: from Threads chaos to the Responses API
OpenAI fundamentally restructured its state management approach in 2025. The old Assistants API with threads, runs and polling will be sunset in the first half of 2026, replaced by the Responses API. 16 That matters because plenty of tutorials still show the old patterns.
The Responses API is built explicitly for agentic workloads. The core: you set store: true, and the server remembers the conversation. On the next turn you don't pass the full history — you pass a previous_response_id:
from openai import OpenAI
client = OpenAI()
resp1 = client.responses.create(
model="gpt-5",
input="Plan the refactor for the auth module.",
store=True
)
# Later, possibly after a crash and restart:
resp2 = client.responses.create(
model="gpt-5",
input="Now implement step 2 of the plan.",
previous_response_id=resp1.id
)
The subtle part: this isn't just convenience. Reasoning tokens (the internal chain-of-thought of models like o3 or GPT-5) are discarded after each turn in a stateless setup. With the Responses API they persist across turns. OpenAI documents in its cookbook: "If you use previous_response_id for multi-turn conversations, the model will automatically have access to all previously produced reasoning items." 17 For reasoning models like o3 that translates into a measurable quality difference per OpenAI's own evals.
For longer cross-cutting threads there's the Conversations API: a Conversation is a long-lived object with its own ID that isn't subject to the 30-day TTL on individual responses. OpenAI documents: "Response objects are saved for 30 days by default. […] Conversation objects and items in them are not subject to the 30 day TTL. Any response attached to a conversation will have its items persisted with no 30 day TTL." 18
conv = client.conversations.create()
resp = client.responses.create(
model="gpt-5",
input=[{"role": "user", "content": "What are the 5 Ds of dodgeball?"}],
conversation=conv.id
)
If you operate in a Zero-Data-Retention environment, there's a middle path: set store: false and ask for the reasoning tokens back as encrypted_content you persist yourself. OpenAI documents: "When a request includes encrypted_content, it is decrypted in-memory (never written to disk), used for generating the next response, and then securely discarded." 19 Server-side state behaviour without server-side storage.
Particularly relevant for crashes: the response ID is short and stable. Write it to a local file or database after each turn and a restart becomes a single lookup instead of a replay battle.
xAI: Stateful Responses with a 30-day expiry
xAI has its own Responses endpoint (https://api.x.ai/v1/responses), conceptually close to OpenAI's. The xAI docs state: "The Responses API is the preferred way of interacting with our models via API. It allows optional stateful interactions with our models, where previous input prompts, reasoning content and model responses are saved and stored on xAI's servers. […] This behavior is on by default." 20 Retention: "New responses will be stored for 30 days and then permanently deleted." 21
from openai import OpenAI
client = OpenAI(
api_key="<XAI_API_KEY>",
base_url="https://api.x.ai/v1",
)
# First call - stored on xAI's servers (default)
response = client.responses.create(
model="grok-4",
input=[{"role": "user", "content": "What is 2+2?"}],
)
# Continue the conversation - no history needed
second_response = client.responses.create(
model="grok-4",
previous_response_id=response.id,
input=[{"role": "user", "content": "Now multiply that by 10"}],
)
A billing point that catches teams off guard — even though you don't send the history yourself, you're still billed for it: "Although you don't need to enter the conversation history in the request body, you will still be billed for the entire conversation history when using Responses API." 20 A 50-turn thread costs accordingly more on turn 50 than a single short prompt.
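The arithmetic is worth seeing once. A toy model with flat, purely illustrative message sizes, in which every turn re-bills the entire history accumulated so far:

```python
def cumulative_input_tokens(turns: int, tokens_per_message: int = 200) -> int:
    """Total input tokens billed across a thread where every turn
    re-bills the full history (one user and one assistant message per turn)."""
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_message  # the new user message arrives
        total += history               # billed: the entire history so far
        history += tokens_per_message  # the assistant reply joins the history
    return total

print(cumulative_input_tokens(1))   # 200    — one short prompt
print(cumulative_input_tokens(50))  # 500000 — same prompt size, 50-turn thread
```

Quadratic growth in thread length is exactly why the export-and-trim pattern below pays for itself.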
For Zero-Data-Retention customers xAI is explicit about the trade-off: "No server-side conversation history: Because requests are not stored, features that rely on server-side state — such as the Responses API's automatic conversation threading via previous_response_id — are unavailable. You must manage conversation context client-side, e.g., by using use_encrypted_content for agentic tool-calling state." 22
The big caveat remains: the 30-day TTL makes xAI's native mechanism unsuitable for anything that needs to live longer. The pragmatic consequence is a hybrid setup: xAI for the live session, your own storage for persistence. After each significant turn, export the messages array and persist it in a database of your choice.
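A sketch of that export job, assuming you keep the messages array client-side as plain dicts; the SQLite schema is my own choice, not an xAI format:

```python
import json
import sqlite3

def export_thread(db_path: str, thread_id: str, messages: list[dict]) -> None:
    """Append a thread's messages to a local SQLite archive, idempotently.

    The (thread_id, seq) primary key makes repeated exports of the same
    thread safe: re-running the job overwrites instead of duplicating.
    """
    con = sqlite3.connect(db_path)
    con.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               thread_id TEXT, seq INTEGER, role TEXT, content TEXT,
               PRIMARY KEY (thread_id, seq))"""
    )
    con.executemany(
        "INSERT OR REPLACE INTO messages VALUES (?, ?, ?, ?)",
        [
            (thread_id, i, m["role"], json.dumps(m["content"]))
            for i, m in enumerate(messages)
        ],
    )
    con.commit()
    con.close()
```

Run it after every significant turn (or on a timer) and the 30-day server-side TTL stops being a data-loss risk; it's just a cache eviction.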
Note also: the memory feature in the Grok web app (the one that kicks in when you ask "do you remember my project?") is a different layer — it only works inside Grok apps, and at the April 2025 launch it was explicitly not available in the EU and UK. 23 For agent development what matters is the API layer, not the app memory.
Google: Context caching, sessions, and Memory Bank
Google splits the problem cleanly into three layers — and it's actually useful once you understand the split.
Layer 1: Context caching. On the raw Gemini API there's explicit caching: you put a static block (system instruction, large document, long few-shot examples) into a cache with a TTL — default one hour, you can extend it. 24 The discount on cache hits per Google's docs is 90 percent on Gemini 2.5+ and 75 percent on Gemini 2.0. 25 This is cost optimisation, not state management — the cache stores input, not conversation history.
from google import genai
client = genai.Client()
cache = client.caches.create(
model="gemini-2.5-flash",
config={
"system_instruction": "You are an expert analyzing transcripts.",
"ttl": "3600s",
}
)
resp = client.models.generate_content(
model="gemini-2.5-flash",
contents="Summarize the key risks.",
config={"cached_content": cache.name}
)
Layer 2: Sessions in the Agent Development Kit (ADK). This is where it gets interesting for agents. A session contains a session ID, user ID, event history and state (a key-value scratchpad). The InMemorySessionService is for local development only — everything is gone on restart. Persistent options are the DatabaseSessionService or managed Vertex AI Sessions. 26
Layer 3: Vertex AI Memory Bank. This is Google's answer to "what should the agent remember across users?" The official docs: "Memory Bank scopes memories to a specific identity, allowing an agent to remember a user's preferences, history, and key details across multiple sessions." 27 The mechanism is smarter than raw history storage — Memory Bank uses Gemini itself to extract key information from session data and stores only the "memories", not the raw events.
In ADK you can hang memory generation as a callback after each session:
from google.adk.agents import Agent
from google.adk.tools.preload_memory_tool import PreloadMemoryTool
agent = Agent(
model="gemini-2.5-flash",
name="stateful_agent",
instruction="You are a helpful assistant. Use past memories when relevant.",
tools=[PreloadMemoryTool()]
)
PreloadMemoryTool injects memories into the system prompt at the start of each turn. Memory Bank handles preferences, the Sessions service holds the running conversation, the cache saves cost — three layers for three problems.
Google explicitly flags a security aspect that's often overlooked: "Memory poisoning occurs when false information is stored in Memory Bank. The agent may then operate on this false or malicious information in future sessions." 27 That's a general problem of all persistent memory systems, not just Google's — anyone writing user input directly into memories is building a persistent prompt-injection vector.
When the native solution isn't enough: the hybrid pattern
Here's the honest part none of the vendors put in their marketing: for long, complex sessions, none of the native solutions is enough on its own. Server-side state expires (xAI 30 days 21, OpenAI 30 days by default 18), context windows fill up, reasoning tokens get compressed or discarded. And as we just saw with Claude Code: even local storage has a default cleanup that quietly evicts important sessions. The most robust architecture is a hybrid.
The pattern works similarly across every ecosystem:
1. Periodic checkpointing. Have your agent write a tight checkpoint regularly (every 10–15 turns, or at major decisions) — not full transcripts, but decisions made, open todos, architecture constraints. That goes into CLAUDE.md (Anthropic), into a Conversation (OpenAI), into a memory in Memory Bank (Google), or simply into a STATE.md next to your code.
2. Handoff files. When a session ends or gets interrupted, the agent writes a handoff note: "Where am I? What's next? Which files are affected?" In Claude Code you can prompt this directly: "Before we stop: write HANDOFF.md with the current state." On restart, you read HANDOFF.md first.
3. Long-term memory for stable facts. Project conventions, coding style, recurring tools — that belongs in a persistent memory layer (Memory Bank, Memory Tool, or a self-hosted SQLite/Postgres solution). This layer survives any crash because it doesn't belong to the session.
4. A simple resume protocol. On every new session, the first step is the same: read the handoff file → load long-term memory → ask the user whether anything new came up. Three steps, five seconds, and your context is back.
In practice, step 2 is the most underestimated. A good handoff isn't "everything we discussed" — it's the answer to one question: what information would I need tomorrow morning, walking in fresh, to be productive again in two minutes?
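The resume protocol is mechanical enough to script. A sketch; the file names HANDOFF.md and memory.json are this article's conventions, not a vendor API:

```python
import json
from pathlib import Path

def build_resume_prompt(workdir: str = ".") -> str:
    """Assemble the opening prompt for a fresh session:
    handoff note first, long-term memory second, then the open question."""
    root = Path(workdir)
    parts = []

    handoff = root / "HANDOFF.md"  # step 1: where were we?
    if handoff.exists():
        parts.append("## Handoff from last session\n" + handoff.read_text())

    memory = root / "memory.json"  # step 2: stable long-term facts
    if memory.exists():
        facts = json.loads(memory.read_text())
        parts.append(
            "## Long-term memory\n"
            + "\n".join(f"- {k}: {v}" for k, v in facts.items())
        )

    # step 3: let the human add anything the files can't know
    parts.append("Anything changed since the last session? If not, continue from the handoff.")
    return "\n\n".join(parts)
```

The resulting string is vendor-neutral: paste it into Claude Code, send it as the first Responses API input, or feed it to an ADK agent, and the recovery path is identical everywhere.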
Concrete recommendations
If you had to make a choice today without a long architecture debate:
- You work interactively with Claude Code: rely on --continue and --resume, combine with a lean CLAUDE.md and an explicit HANDOFF.md you have the agent write at the end of every major work block. Set cleanupPeriodDays to a sensibly high value (not 0!), name important sessions, and don't expect --resume to work across paths or machines.
- You're building an API-based agent on Anthropic: Memory Tool plus context editing. Implement the storage handler once cleanly (with path validation against memory poisoning) and you have something that survives crashes.
- You're building on OpenAI: Responses API with store: true, a Conversation object for longer threads, persist the response ID locally after each turn. Don't reach for Threads/Assistants — they're being shut down in 2026.
- You're building on xAI/Grok: Responses API with default store_messages for convenience inside the 30-day window, but plan an export job into your own storage from day one. Take the 30-day cutoff seriously.
- You're building on Google: ADK with DatabaseSessionService or Vertex AI Sessions for the running conversation, Memory Bank for user preferences, context cache only for static large inputs.
And for everyone: write a handoff file. It's the cheapest, most portable, most reliable recovery strategy that exists. It works when your vendor goes down, when you switch models, when your SDK version becomes incompatible, when your laptop dies. A simple markdown file beats any proprietary state management you don't control yourself.
The one sentence I'd take away from the research and practice: state management is an architecture decision, not a feature you turn on. The native mechanisms are useful building blocks — but the responsibility for your agent being able to keep working tomorrow morning is yours, not your vendor's.
Sources
All URLs last verified on 4 May 2026.
Footnotes
1. Anthropic, Effective context engineering for AI agents, Engineering Blog, 2025. https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents ↩
2. Anthropic, Best practices for Claude Code, Claude Code Docs. https://code.claude.com/docs/en/best-practices ↩
3. Anthropic, Work with sessions, Claude Agent SDK Docs. https://platform.claude.com/docs/en/agent-sdk/sessions ↩ ↩2
4. Anthropic, Data usage, Claude Code Docs (section "Data retention", bullet "Local caching"). https://code.claude.com/docs/en/data-usage ↩
5. Anthropic, Explore the .claude directory, Claude Code Docs (file reference table, entry projects/<project>/memory/). https://code.claude.com/docs/en/claude-directory ↩ ↩2
6. Anthropic, How Claude Code works, Claude Code Docs (section "Sessions"). https://code.claude.com/docs/en/how-claude-code-works ↩
7. FlorianBruniaux, Claude Code Ultimate Guide — Session Management and Resume, DeepWiki. Documents the path encoding and the consequences for cross-folder resume. https://deepwiki.com/FlorianBruniaux/claude-code-ultimate-guide/13.1-session-management-commands ↩
8. Tawan Wongsri, Claude Sync: Sync Your Claude Code Sessions Across All Your Devices Simplified, dev.to, February 2026. Documents that ~/.claude/projects/ is indexed by absolute path and cloud sync therefore fails. https://dev.to/tawanorg/claude-sync-sync-your-claude-code-sessions-across-all-your-devices-simplified-49bl ↩
9. anthropics/claude-code GitHub Issues, Documentation: cleanupPeriodDays Default and Behavior, Issue #51779, April 2026. Documents default 30 days, minimum 1. https://github.com/anthropics/claude-code/issues/51779 ↩
10. Anthropic, Settings reference, Claude Code Docs (entry cleanupPeriodDays). https://code.claude.com/docs/en/settings ↩
11. anthropics/claude-code GitHub Issues, cleanupPeriodDays: 0 silently disables all transcript persistence, Issue #23710, February 2026. Reproducible bug with patch proposal. https://github.com/anthropics/claude-code/issues/23710 ↩
12. anthropics/claude-code GitHub Issues, Bug: --setting-sources local causes cleanup to ignore cleanupPeriodDays, silently deleting conversations globally, Issue #45903, April 2026. Documented data loss via setting-source bug. https://github.com/anthropics/claude-code/issues/45903 ↩
13. anthropics/claude-code GitHub Issues, [FEATURE] Resume Any Claude Session in the CLI, Issue #44063, April 2026. Open feature request for server-side CLI sessions. https://github.com/anthropics/claude-code/issues/44063 ↩
14. Anthropic, Memory & context management with Claude Sonnet 4.6, Claude Cookbook. https://platform.claude.com/cookbook/tool-use-memory-cookbook ↩
15. Anthropic, Effective context engineering for AI agents, Engineering Blog (sections "Compaction" and "Long-horizon tasks"). https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents ↩ ↩2
16. OpenAI, Assistants API (v2) FAQ, OpenAI Help Center (sunset announcement). https://help.openai.com/en/articles/8550641-assistants-api-v2-faq ↩
17. OpenAI, Better performance from reasoning models using the Responses API, OpenAI Cookbook. https://developers.openai.com/cookbook/examples/responses_api/reasoning_items ↩
18. OpenAI, Conversation state, OpenAI API Docs (section "Data retention for model responses"). https://developers.openai.com/api/docs/guides/conversation-state ↩ ↩2
19. OpenAI, Migrate to the Responses API, OpenAI API Docs (section "Encrypted reasoning"). https://developers.openai.com/api/docs/guides/migrate-to-responses ↩
20. xAI, Generate Text — Responses API, xAI Docs (section "Stateful Responses"). https://docs.x.ai/docs/guides/chat ↩ ↩2
21. xAI, REST API Reference — Chat / Responses, xAI Docs (endpoint description with 30-day retention). https://docs.x.ai/developers/rest-api-reference/inference/chat ↩ ↩2
22. xAI, FAQ — xAI API Security (ZDR), xAI Docs. https://docs.x.ai/developers/faq/security ↩
23. TechCrunch, xAI adds a 'memory' feature to Grok, April 2025. Reports the EU/UK launch restriction. https://techcrunch.com/2025/04/16/xai-adds-a-memory-feature-to-grok/ ↩
24. Google, Context caching, Gemini API Docs. https://ai.google.dev/gemini-api/docs/caching ↩
25. Google, Context caching overview, Gemini Enterprise Agent Platform Docs (discount percentages). https://docs.cloud.google.com/gemini-enterprise-agent-platform/models/context-cache/context-cache-overview ↩
26. Google, Memory — Agent Development Kit (ADK), ADK Docs. https://google.github.io/adk-docs/sessions/memory/ ↩
27. Google, Agent Platform Memory Bank, Gemini Enterprise Agent Platform Docs. https://docs.cloud.google.com/gemini-enterprise-agent-platform/scale/memory-bank ↩ ↩2