AI Agent Memory

How agents remember — from stateless RAG to graph memory, context rot, and the engineering discipline required to keep a long-running agent grounded in reality.

#agent-engineering #memory #context-engineering #technology

Last updated: May 4, 2026

Our take

Vector RAG retrieves facts. Graph memory navigates relationships. In the 2026 LOCOMO benchmark, Mem0g reaches up to 89.9% accuracy at under 2.6s p95 latency, versus 10–17s for full-context injection. Mirror adds what neither has: cross-session identity resolution via the Amrita Score.

The earliest production AI agents were stateless. Each conversation began from zero. The model had access to whatever fit in the context window — recent messages, a system prompt, maybe a retrieved document or two — and nothing else. For demos, this was fine. For agents running continuously over weeks on mission-critical workflows, it was a fundamental architectural constraint dressed up as a feature.

The field moved to vector RAG. Agents could now retrieve relevant facts from a database using semantic similarity. This was a genuine improvement. It also had a structural ceiling that took production deployments a year to fully expose.

The Limits of Vector Memory

Vector retrieval finds semantically similar text. It does not navigate relationships between entities. The canonical failure mode: a user tells an agent “I switched our infrastructure from AWS to Azure because of compute costs.” Weeks later, the user asks: “Why did our operational expenditure decrease this quarter?” Similarity search is unlikely to surface the earlier statement, because the question and the statement share little at the text level. The connection between “infrastructure switch” and “financial outcome” is relational, not textual.

Vector databases also typically lack temporal metadata. When a fact was recorded, whether it has been superseded, what sequence of decisions led to the current state — these require graph structure, not embedding proximity.
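The temporal bookkeeping described above can be sketched as a small record type. This is a minimal illustration, not any particular product's schema: the `Fact` class, its field names, and the dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Fact:
    subject: str
    predicate: str
    value: str
    recorded_at: date
    superseded_at: Optional[date] = None  # None means the fact is still current


def record(history, new):
    """Append a fact and close out any current fact it replaces."""
    for old in history:
        same_slot = (old.subject, old.predicate) == (new.subject, new.predicate)
        if same_slot and old.superseded_at is None:
            old.superseded_at = new.recorded_at  # keep the old fact, mark it superseded
    history.append(new)


def current(history, subject, predicate):
    """Return the fact that is still in force for this subject and predicate."""
    for fact in history:
        if (fact.subject, fact.predicate) == (subject, predicate) and fact.superseded_at is None:
            return fact
    return None


# Hypothetical dates, for illustration only.
history = []
record(history, Fact("infrastructure", "hosted_on", "AWS", date(2025, 1, 10)))
record(history, Fact("infrastructure", "hosted_on", "Azure", date(2025, 6, 2)))
```

After both calls the history still holds the AWS fact, now marked as superseded, so the sequence of decisions stays queryable alongside the current state.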

The 2026 benchmark on the Long-Term Conversational Memory (LOCOMO) dataset quantifies this gap:

Approach	LLM Accuracy	Latency (p95)	Token Cost
Full-context injection	72.9%	10–17 seconds	Prohibitive
Mem0 (vector)	~65%	~1.5s	Moderate
Mem0g (graph-enhanced)	Up to 89.9%	Under 2.6s	~90% reduction vs full-context

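Graph-enhanced memory leads on accuracy and beats full-context injection on both latency and token cost. Vector-only Mem0 is still faster at p95 (~1.5s vs under 2.6s), but it gives up roughly 25 points of accuracy to get there.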

How Graph Memory Works

When a graph-enabled agent ingests conversational information, three processes run in sequence. An entity extractor identifies the core nodes — “AWS,” “Azure,” “compute costs,” “Q3 operating expenditure.” A relations generator infers the labeled edges — “replaced_by,” “caused_by,” “resulted_in.” A conflict detector checks new information against existing graph state and flags contradictions before any write occurs.

The resulting structure is a directed knowledge graph where relationships are first-class data. Queries navigate the graph rather than measuring embedding distance. The infrastructure switch connects to the financial outcome because there is an explicit labeled edge between them.
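The three ingestion steps and the graph traversal can be sketched in a few lines of Python. This is a toy under stated assumptions: the entity extractor and relations generator are replaced by hand-written triples (in a real system they would likely be model calls), the `MemoryGraph` class is hypothetical, and every edge label is treated as single-valued so the conflict check stays simple.

```python
from collections import defaultdict, deque


class MemoryGraph:
    def __init__(self):
        self.edges = defaultdict(list)  # source node -> list of (label, target)

    def conflicts(self, source, label, target):
        """Existing targets that disagree with the proposed edge."""
        return [t for (l, t) in self.edges[source] if l == label and t != target]

    def add_edge(self, source, label, target):
        """Run the conflict check first, and write only if it passes."""
        clashes = self.conflicts(source, label, target)
        if clashes:
            raise ValueError(f"{source} -{label}-> {target} contradicts {clashes}")
        if (label, target) not in self.edges[source]:
            self.edges[source].append((label, target))

    def path(self, start, goal):
        """Breadth-first search; returns the labeled hops from start to goal, or None."""
        queue = deque([(start, [])])
        seen = {start}
        while queue:
            node, hops = queue.popleft()
            if node == goal:
                return hops
            for label, target in self.edges[node]:
                if target not in seen:
                    seen.add(target)
                    queue.append((target, hops + [(node, label, target)]))
        return None


# Stand-in for extractor + relations generator output (illustrative triples).
triples = [
    ("AWS", "replaced_by", "Azure"),
    ("Azure", "resulted_in", "lower compute costs"),
    ("lower compute costs", "resulted_in", "lower Q3 operating expenditure"),
]
graph = MemoryGraph()
for source, label, target in triples:
    graph.add_edge(source, label, target)

hops = graph.path("AWS", "lower Q3 operating expenditure")
```

Here `hops` is the chain of labeled edges linking the infrastructure switch to the financial outcome, which is the connection embedding distance alone does not provide. A later attempt to write `("AWS", "replaced_by", "GCP")` raises before anything is stored.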

The Identity Fragmentation Problem

Graph memory solves relational reasoning. It does not solve cross-session identity resolution. Production memory systems enforce strict scope segmentation, for example Mem0’s four-scope model (user_id, agent_id, run_id, org_id), to prevent data leakage between users and sessions. This segmentation is correct for privacy. It creates a different problem: the same person interacting across a mobile app, an anonymous web session, and a customer support voice channel is treated as three separate entities.

Resolving these fragments into a single continuous entity profile — without violating privacy constraints or hallucinating false connections — is the open problem that standard graph databases do not natively solve.
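A toy scoped store makes the fragmentation concrete. This sketch uses a plain dictionary rather than any real memory library, and the identifiers and memory strings are invented for illustration.

```python
from collections import defaultdict

store = defaultdict(list)  # scope key (here just a user_id) -> memories


def remember(user_id, text):
    store[user_id].append(text)


def recall(user_id):
    """Return only what was stored under this exact scope key."""
    return list(store.get(user_id, []))


# One person, three channels, three unrelated identifiers (all hypothetical).
remember("mobile:u-1842", "prefers annual billing")
remember("web:anon-7f3c", "asked about Azure migration pricing")
remember("voice:+1-555-0100", "reported an invoice error")

web_view = recall("web:anon-7f3c")
```

The isolation is exactly what the privacy model asks for: `web_view` contains one memory and knows nothing about the billing preference or the invoice error, even though all three belong to the same person.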

Context Rot

Even well-structured graph memory degrades over time. As an agent operates continuously over weeks or months, its memory accumulates contradictory facts: superseded policies, changed vendor pricing, updated regulatory requirements. A context window containing two contradictory policies on a financial workflow produces stochastic output — the model cannot resolve the contradiction deterministically.

Academic research in 2026 formalized this as the “context rot survival equation”: reasoning accuracy decays exponentially with the volume of accumulated contradictions. The paper’s framing — “cognitive sleep” as the architectural fix — is precise. The contradiction metabolism must happen asynchronously, outside the user-facing latency path, so that by the time the agent is invoked for a live task, the contradictions have already been resolved.
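The paper's exact equation is not reproduced here. As a rough illustration of what exponential decay implies, the sketch below assumes the simplest possible form, accuracy = base_accuracy * exp(-decay_rate * contradictions). The function names and both parameter values are hypothetical, not taken from the paper.

```python
import math


def expected_accuracy(contradictions, base_accuracy=0.9, decay_rate=0.05):
    """Assumed form: accuracy falls by a constant factor per added contradiction."""
    return base_accuracy * math.exp(-decay_rate * contradictions)


def contradictions_to_halve(decay_rate=0.05):
    """Number of accumulated contradictions at which accuracy drops to half."""
    return math.log(2) / decay_rate


curve = [expected_accuracy(n) for n in (0, 10, 20, 40)]
```

Under these assumed values accuracy halves after about 14 unresolved contradictions. The practical point is the shape of the curve: each contradiction removed by background resolution buys back a constant fraction of accuracy, which is why clearing them before invocation matters.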

Where Mumega Sits

Mirror is Mumega’s multi-tier context graph. It integrates short-term session state (working memory), long-term entity relationships (semantic memory), and decision traces (episodic memory) into a unified graph structure. Agents navigate the causal reasoning paths of past interactions, not just recall isolated facts.

The Amrita Score handles the identity fragmentation problem. Where standard memory systems fail to unify cross-channel interactions, the Amrita Score evaluates interaction metadata, behavioral patterns, authentication handoffs, and contextual semantic signals across the enterprise stack to calculate a confidence score for entity unification. When the score exceeds the configured threshold, Mirror dynamically links the previously fragmented session nodes. When a high-relevance memory becomes stale (a user changes employers, a policy updates), the Amrita scoring engine detects the confidence decay and adjusts graph weights accordingly.
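The general pattern of threshold-gated entity unification can be sketched as follows. This is not the Amrita Score itself, whose features and weighting are not specified here: the signal names mirror the four categories listed above, but the weights, the threshold, and the `SessionLinker` class are all invented for illustration.

```python
# Hypothetical weights over the four signal families; each signal is in [0, 1].
WEIGHTS = {"auth_handoff": 0.5, "behavior": 0.2, "metadata": 0.2, "semantic": 0.1}
THRESHOLD = 0.8  # hypothetical certainty limit


def unification_score(signals):
    """Weighted sum of the available signals; missing signals count as 0."""
    return sum(WEIGHTS[name] * signals.get(name, 0.0) for name in WEIGHTS)


class SessionLinker:
    """Union-find over session nodes: linked sessions share one root."""

    def __init__(self):
        self.parent = {}

    def find(self, session):
        self.parent.setdefault(session, session)
        while self.parent[session] != session:
            # Path halving keeps later lookups short.
            self.parent[session] = self.parent[self.parent[session]]
            session = self.parent[session]
        return session

    def maybe_link(self, a, b, signals):
        """Link two sessions only when the score exceeds the threshold."""
        root_a, root_b = self.find(a), self.find(b)
        if unification_score(signals) > THRESHOLD:
            self.parent[root_a] = root_b
            return True
        return False


linker = SessionLinker()
strong = {"auth_handoff": 1.0, "behavior": 0.9, "metadata": 0.8, "semantic": 0.5}
weak = {"behavior": 0.6, "semantic": 0.7}
linked_voice = linker.maybe_link("mobile:u-1842", "voice:+1-555-0100", strong)
linked_web = linker.maybe_link("web:anon-7f3c", "mobile:u-1842", weak)
```

With these made-up numbers the authenticated handoff clears the threshold and the mobile and voice sessions merge, while the anonymous web session stays separate because behavioral and semantic similarity alone score too low. Weighting authentication heavily is one way to bias such a scheme against false merges.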

The Metabolism Layer handles context rot. It runs asynchronously during idle periods, scanning the Mirror graph, running LLM-based pairwise comparison on conflicting data points, resolving contradictions, and applying decay to stale facts — without permanently overwriting historical audit trails. By the time a Mumega agent is invoked for a live task, the Metabolism Layer has already resolved contradictions and injected a compressed, pre-validated context payload. The agent boots with current situational state rather than a raw accumulation of potentially conflicting memory.
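A background pass of this kind can be sketched in miniature. The code below is a simplified stand-in, not the Metabolism Layer's implementation: `contradicts` replaces the LLM-based pairwise comparison with a plain key and value check, the newer fact always wins, and the half-life and sample facts are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class MemoryFact:
    key: str            # what the fact is about, e.g. "refund_policy"
    value: str
    recorded_day: int   # day number, to keep the example free of date handling
    weight: float = 1.0
    superseded: bool = False


def contradicts(a, b):
    """Stand-in for an LLM-based pairwise comparison."""
    return a.key == b.key and a.value != b.value


def metabolize(facts, today, half_life_days=90):
    """Resolve contradictions, decay weights, return a compact current-state payload."""
    live = [f for f in facts if not f.superseded]
    for a, b in combinations(live, 2):
        if a.superseded or b.superseded:
            continue  # already resolved earlier in this pass
        if contradicts(a, b):
            older = a if a.recorded_day < b.recorded_day else b
            older.superseded = True  # flagged, never deleted: the audit trail survives
    for f in facts:
        if not f.superseded:
            f.weight = 0.5 ** ((today - f.recorded_day) / half_life_days)
    return {f.key: f.value for f in facts if not f.superseded}


facts = [
    MemoryFact("refund_policy", "30 days", recorded_day=0),
    MemoryFact("refund_policy", "14 days", recorded_day=60),
    MemoryFact("vendor_price", "$40/seat", recorded_day=30),
]
payload = metabolize(facts, today=90)
```

The returned `payload` holds one value per key, which is what the agent would load at invocation, while `facts` still contains the superseded 30-day policy for audit purposes. Running this on a schedule or during idle time keeps the comparison cost out of the live request path.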

What We’re Watching

LOCOMO follow-on research: The 89.9% accuracy figure is the current ceiling. Research groups are working on multi-hop relational queries — where the answer requires traversing multiple graph edges — which remain harder than single-hop retrieval even for graph systems.
Contradiction detection at scale: Pairwise comparison for contradiction detection is accurate but expensive at large memory scale. Efficient contradiction indexing is an open engineering problem.
Privacy-preserving identity resolution: Cross-channel entity unification without violating privacy constraints remains partly heuristic. Formal privacy-preserving approaches — differential privacy, federated graph updates — are active research areas.
Memory as infrastructure vs memory as feature: Mem0, Zep, and Cognee are all positioning memory as infrastructure — a layer below the agent, not inside it. The architectural question is whether memory should be external and standardized or embedded and agent-specific.
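The cost concern raised under contradiction detection at scale is easy to quantify. The sketch below counts the comparisons an all-pairs scan needs and contrasts it with one possible mitigation, comparing only facts that share a key. That bucketing strategy is an assumption made for illustration, not a solution described above, and it would miss contradictions between facts filed under different keys.

```python
from collections import Counter
from math import comb


def naive_pairs(n):
    """Comparisons needed when every fact is checked against every other fact."""
    return comb(n, 2)


def bucketed_pairs(keys):
    """Comparisons needed when only facts sharing a key are compared."""
    return sum(comb(count, 2) for count in Counter(keys).values())


# Hypothetical memory: 10,000 facts spread evenly over 1,000 keys.
keys = [f"key-{i % 1000}" for i in range(10_000)]
all_pairs = naive_pairs(len(keys))
same_key_pairs = bucketed_pairs(keys)
```

For this made-up distribution the all-pairs scan needs 49,995,000 comparisons and the bucketed scan needs 45,000. When each comparison is a model call, that difference decides whether a background pass is affordable at all.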

Agentic Governance & Security — How audit records and receipt chains interact with the memory layer — why historical memory cannot be silently overwritten
Autonomous Agent Identity — Identity fragmentation is a memory problem too: cross-session resolution requires both the Amrita Score and a well-scoped QNFT
Multi-Agent Orchestration — How shared Mirror memory enables fleet state alignment without constant inter-agent conversational updates

News & changes

Apr 15, 2026

Identity fragmentation confirmed as the primary unsolved problem in graph memory. Mem0's four-scope model (user_id, agent_id, run_id, org_id) prevents data leakage but fragments cross-channel user state.

Apr 1, 2026

Mem0 published 2026 State of AI Agent Memory. LOCOMO benchmark: Mem0g up to 89.9% accuracy, under 2.6s p95 latency, roughly 90% token cost reduction vs full-context injection. Graph memory is now the production standard.

Mar 15, 2026

'Cognitive Sleep for LLMs' paper published on OSF. Formalizes the survival equation for context rot — reasoning accuracy decays exponentially with contradiction accumulation. Background metabolism as the architectural fix.