Mumega

AI Agent Memory

How agents remember — from stateless RAG to graph memory, context rot, and the engineering discipline required to keep a long-running agent grounded in reality.

Last updated: May 4, 2026

Our take

Vector RAG retrieves facts. Graph memory navigates relationships. The 2026 benchmark: Mem0g reaches up to 89.9% accuracy on LOCOMO at under 2.6s p95 latency, versus up to 17s for full-context injection. Mirror adds what neither has: cross-session identity resolution via the Amrita Score.

The earliest production AI agents were stateless. Each conversation began from zero. The model had access to whatever fit in the context window — recent messages, a system prompt, maybe a retrieved document or two — and nothing else. For demos, this was fine. For agents running continuously over weeks on mission-critical workflows, it was a fundamental architectural constraint dressed up as a feature.

The field moved to vector RAG. Agents could now retrieve relevant facts from a database using semantic similarity. This was a genuine improvement. It also had a structural ceiling that took production deployments a year to fully expose.

The Limits of Vector Memory

Vector retrieval finds semantically similar text. It does not navigate relationships between entities. The canonical failure mode: a user tells an agent “I switched our infrastructure from AWS to Azure because of compute costs.” Weeks later: “Why did our operational expenditure decrease this quarter?” The vector database cannot answer. There is no semantic overlap between “infrastructure switch” and “financial outcome” — the connection is relational, not textual.

Vector databases also typically lack temporal metadata. When a fact was recorded, whether it has been superseded, what sequence of decisions led to the current state — these require graph structure, not embedding proximity.
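As a rough illustration of the temporal metadata described above, the sketch below attaches a recording date and a supersession date to each fact. The `Fact` class, its field names, and the dates are hypothetical and chosen only for this example; they are not taken from any particular memory system.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Fact:
    """One remembered statement plus the temporal metadata a bare embedding lacks."""
    subject: str
    predicate: str
    value: str
    recorded_on: date
    superseded_on: Optional[date] = None  # None means the fact is still current


def current_value(facts: list[Fact], subject: str, predicate: str) -> Optional[str]:
    """Return the most recently recorded value that has not been superseded."""
    live = [
        f for f in facts
        if f.subject == subject and f.predicate == predicate and f.superseded_on is None
    ]
    if not live:
        return None
    return max(live, key=lambda f: f.recorded_on).value


# Illustrative data only: the dates are invented for the example.
facts = [
    Fact("infrastructure", "hosted_on", "AWS", date(2025, 1, 10), superseded_on=date(2025, 9, 1)),
    Fact("infrastructure", "hosted_on", "Azure", date(2025, 9, 1)),
]
```

With this shape, the superseded AWS record stays available for history while `current_value(facts, "infrastructure", "hosted_on")` resolves to the Azure record.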

The 2026 benchmark on the Long-Term Conversational Memory (LOCOMO) dataset quantifies this gap precisely:

| Approach | LLM Accuracy | Latency (p95) | Token Cost |
|---|---|---|---|
| Full-context injection | 72.9% | 10–17 seconds | Prohibitive |
| Mem0 (vector) | ~65% | ~1.5s | Moderate |
| Mem0g (graph-enhanced) | Up to 89.9% | Under 2.6s | ~90% reduction vs full-context |

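Graph-enhanced memory leads on accuracy by a wide margin and cuts latency and token cost sharply relative to full-context injection. Against vector-only Mem0 it gives up roughly a second of p95 latency in exchange for about 25 points of accuracy.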

How Graph Memory Works

When a graph-enabled agent ingests conversational information, three processes run in sequence. An entity extractor identifies the core nodes — “AWS,” “Azure,” “compute costs,” “Q3 operating expenditure.” A relations generator infers the labeled edges — “replaced_by,” “caused_by,” “resulted_in.” A conflict detector checks new information against existing graph state and flags contradictions before any write occurs.

The resulting structure is a directed knowledge graph where relationships are first-class data. Queries navigate the graph rather than measuring embedding distance. The infrastructure switch connects to the financial outcome because there is an explicit labeled edge between them.
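The write-time conflict check and the path-based query can be sketched in a few lines. This is a minimal toy, assuming entities and relations have already been extracted upstream; the `MemoryGraph` class, the `single_valued` idea (labels that may point to only one target), and the "GCP" edge are hypothetical and exist only to show the mechanics.

```python
from collections import defaultdict, deque


class MemoryGraph:
    """Tiny directed graph with labeled edges and a conflict check on write."""

    def __init__(self, single_valued=()):
        self.edges = defaultdict(list)            # source -> [(label, target), ...]
        self.single_valued = set(single_valued)   # labels allowed only one target

    def add_edge(self, source, label, target):
        """Write the edge, or return the conflicting target and write nothing."""
        if label in self.single_valued:
            for existing_label, existing_target in self.edges[source]:
                if existing_label == label and existing_target != target:
                    return existing_target
        self.edges[source].append((label, target))
        return None

    def find_path(self, start, goal):
        """Breadth-first search; returns the list of (source, label, target) hops."""
        queue = deque([(start, [])])
        seen = {start}
        while queue:
            node, path = queue.popleft()
            if node == goal:
                return path
            for label, target in self.edges.get(node, ()):
                if target not in seen:
                    seen.add(target)
                    queue.append((target, path + [(node, label, target)]))
        return None


graph = MemoryGraph(single_valued={"replaced_by"})
graph.add_edge("AWS", "replaced_by", "Azure")
graph.add_edge("Azure", "caused_by", "compute costs")
graph.add_edge("Azure", "resulted_in", "Q3 operating expenditure")

path = graph.find_path("AWS", "Q3 operating expenditure")
conflict = graph.add_edge("AWS", "replaced_by", "GCP")  # contradicts the Azure edge
```

Here `path` holds the two labeled hops that connect the infrastructure switch to the financial outcome, and `conflict` reports the existing "Azure" target instead of silently writing a contradictory edge.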

The Identity Fragmentation Problem

Graph memory solves relational reasoning. It does not solve cross-session identity resolution. Production memory systems enforce strict session segmentation — Mem0’s four-scope model (user_id, agent_id, run_id, org_id) — to prevent data leakage between users and sessions. This segmentation is correct for privacy. It creates a different problem: the same person interacting across a mobile app, an anonymous web session, and a customer support voice channel is treated as three separate entities.

Resolving these fragments into a single continuous entity profile — without violating privacy constraints or hallucinating false connections — is the open problem that standard graph databases do not natively solve.
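A small sketch makes the fragmentation concrete. It borrows only the four scope names mentioned above; the `Scope` and `ScopedStore` classes, the identifiers, and the stored notes are hypothetical and are not the Mem0 API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Scope:
    """The four scope fields named above; unused fields stay None."""
    user_id: Optional[str] = None
    agent_id: Optional[str] = None
    run_id: Optional[str] = None
    org_id: Optional[str] = None


class ScopedStore:
    """Memories are readable only under the exact scope they were written with."""

    def __init__(self) -> None:
        self._items: dict[Scope, list[str]] = {}

    def add(self, scope: Scope, text: str) -> None:
        self._items.setdefault(scope, []).append(text)

    def search(self, scope: Scope) -> list[str]:
        return list(self._items.get(scope, []))


store = ScopedStore()
# One person, three channels, three unrelated identifiers (all invented).
mobile = Scope(user_id="u-481", org_id="acme")
web = Scope(user_id="anon-7f3c", org_id="acme")
voice = Scope(user_id="caller-0192", org_id="acme")

store.add(mobile, "prefers email follow-ups")
store.add(web, "asked about enterprise pricing")
store.add(voice, "reported a billing error")
```

Each `search` call sees exactly one note, which is the correct privacy behavior and also the reason the three sessions never add up to one profile.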

Context Rot

Even well-structured graph memory degrades over time. As an agent operates continuously over weeks or months, its memory accumulates contradictory facts: superseded policies, changed vendor pricing, updated regulatory requirements. A context window containing two contradictory policies on a financial workflow produces stochastic output — the model cannot resolve the contradiction deterministically.

Academic research in 2026 formalized this as the “context rot survival equation”: reasoning accuracy decays exponentially with the volume of accumulated contradictions. The paper’s framing — “cognitive sleep” as the architectural fix — is precise. The contradiction metabolism must happen asynchronously, outside the user-facing latency path, so that by the time the agent is invoked for a live task, the contradictions have already been resolved.
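The paper's exact equation is not reproduced here. As a hedged illustration of what exponential decay with contradiction volume implies, the sketch below assumes the simplest form, accuracy = base × exp(−rate × contradictions); the function name, the parameters, and the example values are all assumptions made for this example.

```python
import math


def expected_accuracy(base_accuracy: float, decay_rate: float, contradictions: int) -> float:
    """Assumed form: accuracy shrinks by a constant factor per unresolved contradiction."""
    return base_accuracy * math.exp(-decay_rate * contradictions)


# Illustrative parameters only: 90% clean-context accuracy, decay rate 0.1.
clean = expected_accuracy(0.9, 0.1, 0)
after_ten = expected_accuracy(0.9, 0.1, 10)
```

Under this assumed form every additional unresolved contradiction multiplies accuracy by the same factor, which is why resolving contradictions before invocation matters more than trimming context after the fact.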

Where Mumega Sits

Mirror is Mumega’s multi-tier context graph. It integrates short-term session state (working memory), long-term entity relationships (semantic memory), and decision traces (episodic memory) into a unified graph structure. Agents navigate the causal reasoning paths of past interactions, not just recall isolated facts.

The Amrita Score handles the identity fragmentation problem. Where standard memory systems fail to unify cross-channel interactions, the Amrita Score evaluates interaction metadata, behavioral patterns, authentication handoffs, and contextual semantic signals across the enterprise stack to compute a confidence score for entity unification. When the score exceeds the certainty threshold, Mirror dynamically links previously fragmented session nodes. When a high-relevance memory becomes stale (a user changes employers, a policy updates), the Amrita scoring engine detects the confidence decay and adjusts graph weights accordingly.
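The actual Amrita Score computation is not specified here. Purely to show the general shape of a score-versus-threshold linking decision, the sketch below assumes a weighted sum over the four signal families named above; the weights, the threshold value, and the function names are hypothetical.

```python
from __future__ import annotations

# Hypothetical weights over the four signal families; they sum to 1.0.
WEIGHTS = {
    "interaction_metadata": 0.2,
    "behavioral_patterns": 0.2,
    "authentication_handoff": 0.4,
    "semantic_signals": 0.2,
}
LINK_THRESHOLD = 0.75  # assumed certainty threshold, not a published value


def unification_score(signals: dict[str, float]) -> float:
    """Weighted sum of per-signal match strengths, each clamped to [0, 1].

    Signals that were not observed count as 0.
    """
    return sum(
        weight * min(max(signals.get(name, 0.0), 0.0), 1.0)
        for name, weight in WEIGHTS.items()
    )


def should_link(signals: dict[str, float]) -> bool:
    """Link two session nodes only when the score exceeds the threshold."""
    return unification_score(signals) > LINK_THRESHOLD


# Two candidate session pairs with invented signal strengths.
strong_match = {
    "interaction_metadata": 0.9,
    "behavioral_patterns": 0.8,
    "authentication_handoff": 1.0,
    "semantic_signals": 0.7,
}
weak_match = {"behavioral_patterns": 1.0}
```

In this toy, a verified authentication handoff carries the most weight, so behavioral similarity alone (`weak_match`) never crosses the threshold; that bias toward hard evidence is one way to avoid hallucinating false connections.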

The Metabolism Layer handles context rot. It runs asynchronously during idle periods, scanning the Mirror graph, running LLM-based pairwise comparison on conflicting data points, resolving contradictions, and applying decay to stale facts — without permanently overwriting historical audit trails. By the time a Mumega agent is invoked for a live task, the Metabolism Layer has already resolved contradictions and injected a compressed, pre-validated context payload. The agent boots with current situational state rather than a raw accumulation of potentially conflicting memory.
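One idle-time pass of this kind can be sketched as follows. This is a simplified stand-in, not Mumega's implementation: `contradicts` replaces the LLM-based pairwise comparison with a plain key/value check, the half-life decay rule is an assumption, and the facts are invented.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import combinations


@dataclass
class MemoryFact:
    key: str            # what the fact is about, e.g. "expense_policy.limit"
    value: str
    recorded_day: int   # day index on which the fact was recorded
    weight: float = 1.0
    superseded: bool = False  # flagged rather than deleted, so history survives


def contradicts(a: MemoryFact, b: MemoryFact) -> bool:
    """Stand-in for the LLM comparison: same key, different value."""
    return a.key == b.key and a.value != b.value


def metabolize(facts: list[MemoryFact], today: int, half_life_days: float = 30.0) -> list[MemoryFact]:
    """Resolve contradictions in favor of the newer fact, decay weights, return live facts."""
    for a, b in combinations(facts, 2):
        if a.superseded or b.superseded:
            continue
        if contradicts(a, b):
            older = a if a.recorded_day < b.recorded_day else b
            older.superseded = True
    for fact in facts:
        age = today - fact.recorded_day
        fact.weight = 0.5 ** (age / half_life_days)  # assumed half-life decay
    return [fact for fact in facts if not fact.superseded]


# Illustrative facts only.
facts = [
    MemoryFact("expense_policy.limit", "500", recorded_day=0),
    MemoryFact("expense_policy.limit", "750", recorded_day=60),
    MemoryFact("vendor.primary", "Acme", recorded_day=60),
]
payload = metabolize(facts, today=60)
```

The returned `payload` contains only the current policy and vendor, while the superseded limit stays in `facts` with a reduced weight, so the audit trail is preserved. Comparing every pair is quadratic in the number of facts, which is the cost concern behind efficient contradiction indexing.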

What We’re Watching

  • LOCOMO follow-on research: The 89.9% accuracy figure is the current ceiling. Research groups are working on multi-hop relational queries — where the answer requires traversing multiple graph edges — which remain harder than single-hop retrieval even for graph systems.
  • Contradiction detection at scale: Pairwise comparison for contradiction detection is accurate but expensive at large memory scale. Efficient contradiction indexing is an open engineering problem.
  • Privacy-preserving identity resolution: Cross-channel entity unification without violating privacy constraints remains partly heuristic. Formal privacy-preserving approaches — differential privacy, federated graph updates — are active research areas.
  • Memory as infrastructure vs memory as feature: Mem0, Zep, and Cognee are all positioning memory as infrastructure — a layer below the agent, not inside it. The architectural question is whether memory should be external and standardized or embedded and agent-specific.

Related topics

  • Agentic Governance & Security: how audit records and receipt chains interact with the memory layer, and why historical memory cannot be silently overwritten.
  • Autonomous Agent Identity: identity fragmentation is a memory problem too; cross-session resolution requires both the Amrita Score and a well-scoped QNFT.
  • Multi-Agent Orchestration: how shared Mirror memory enables fleet state alignment without constant inter-agent conversational updates.

News & changes

Apr 15, 2026

Identity fragmentation confirmed as the primary unsolved problem in graph memory. Mem0's four-scope model (user_id, agent_id, run_id, org_id) prevents data leakage but fragments cross-channel user state.

Apr 1, 2026

Mem0 published 2026 State of AI Agent Memory. LOCOMO benchmark: Mem0g 89.9% accuracy, under 2.6s p95 latency, 90%+ token cost reduction vs full-context injection. Graph memory is now the production standard.

Mar 15, 2026

'Cognitive Sleep for LLMs' paper published on OSF. Formalizes the survival equation for context rot — reasoning accuracy decays exponentially with contradiction accumulation. Background metabolism as the architectural fix.

Key Voices

Mem0 Research: Agent memory infrastructure (article)
Zep: Agent memory and context (article)
Neo4j: Graph database infrastructure (article)
Kay Hermes: Mumega principal engineer (X)

Sources

[ART] State of AI Agent Memory 2026. Mem0. Comprehensive benchmark across memory architectures. Full-context injection achieves 72.9% LLM accuracy but 10-17s median latency. Mem0g achieves up to 89.9% at under 2.6s p95, with 90%+ token cost reduction.
[ART] Graph-Based Memory Solutions for AI Context: Top 5 Compared. Mem0. January 2026 comparison of Mem0, Zep, Cognee, and competitors on graph-enhanced memory. Entity extraction, relation generation, conflict detection pipeline described.
[ART] The Architecture of Remembrance: Vector Stores and GraphRAG. Mem0. Why vector databases fail relational queries. The 'AWS to Azure' example: semantic similarity cannot connect an infrastructure decision to a financial outcome without explicit graph edges.
[PDF] Cognitive Sleep for LLMs: How Contradiction Metabolism Prevents Context Rot. OSF Preprints. Formalizes context rot via the survival equation: reasoning accuracy decays exponentially with accumulated contradictions. Asynchronous contradiction metabolism ('cognitive sleep') as the structural fix.
[ART] AI Agent Memory Systems in 2026: Mem0, Zep, Hindsight, Memvid Compared. Dev Genius / Yogesh Yadav. Production comparison of all major memory systems. Graph-enhanced architectures dominate the 2026 landscape. Strict session segmentation creates identity fragmentation across channels.
[ART] Agent Memory Architectures: Vector vs Graph vs Episodic. Digital Applied. Three-tier architecture breakdown. Vector for semantic retrieval, graph for relational navigation, episodic for decision trace replay. Production systems need all three layers.
[ART] Building Context Graphs for AI Agents. Neo4j. Graph database implementation of agent memory. Entity nodes, labeled relationship edges, conflict detection before write. The structural alternative to chronological text chunk storage.

From Our Experience