Mumega

What 1.191B Tokens Taught Us About Agent Supervision

TL;DR

We let a good rule turn into a bad loop. The good rule was simple: when a worker agent reports progress, acknowledge it and push the next bounded step immediately. On May 9, 2026, we followed that rule without pairing it with a stop condition. The result was a six-hour loop that consumed roughly 1.191 billion tokens.

What Exists Now

We have a concrete postmortem window.

The operator-action chain starts at S085 and the artifact trail reaches S172. The first S085 artifact appears at 2026-05-09 05:32:36 UTC. The S172 brief appears at 2026-05-09 11:38:15 UTC.

That puts the visible loop at 6 hours, 5 minutes, 39 seconds.

The figures below use the token counters Hadi saw in the tools:

| Metric | Value |
| --- | --- |
| Artifact window | 2026-05-09 05:32:36 to 11:38:15 UTC |
| Duration | 6.094 hours |
| Numbered action lanes | 88 (S085 through S172) |
| Codex token use | ~1,000,000,000 |
| Loom token use | ~191,000,000 |
| Combined token use | ~1,191,000,000 |
| Codex burn rate | ~164.1M tokens/hour |
| Loom burn rate | ~31.3M tokens/hour |
| Combined burn rate | ~195.4M tokens/hour |

This is not an audited billing export. It is a postmortem based on the counters we observed during the run.
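
The duration, lane count, and burn rates follow directly from the two artifact timestamps and the observed counters. A minimal sketch of that arithmetic, assuming the counters are the approximate values quoted above (the variable and function names are ours, for illustration only):

```python
from datetime import datetime, timezone

# Timestamps and counters come from the postmortem. The counters are
# observed approximations, not an audited billing export.
start = datetime(2026, 5, 9, 5, 32, 36, tzinfo=timezone.utc)
end = datetime(2026, 5, 9, 11, 38, 15, tzinfo=timezone.utc)
codex_tokens = 1_000_000_000
loom_tokens = 191_000_000

elapsed = end - start
hours = elapsed.total_seconds() / 3600
lanes = 172 - 85 + 1  # S085 through S172, inclusive


def rate_millions_per_hour(tokens: int) -> float:
    """Average burn rate over the artifact window, in millions of tokens per hour."""
    return round(tokens / hours / 1e6, 1)


print(elapsed)                                             # 6:05:39
print(round(hours, 3))                                     # 6.094
print(lanes)                                               # 88
print(rate_millions_per_hour(codex_tokens))                # 164.1
print(rate_millions_per_hour(loom_tokens))                 # 31.3
print(rate_millions_per_hour(codex_tokens + loom_tokens))  # 195.4
```

These are averages over the whole window. The counters do not tell us how the burn was distributed across individual lanes.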

Why It Matters

A loop like this is easy to misread.

From the outside it looks productive:

  • the agent keeps shipping
  • the foreman keeps acknowledging
  • the sprint numbers keep moving

But throughput is not the same thing as progress.

What actually happened is that we kept proving variations of the same authenticated operator-action envelope:

  • same route/session guardrails
  • same tenant-scope checks
  • same operator-visible result pattern
  • slightly different adjacent action names

That produced real confidence in the envelope. It was not pure waste.

But the marginal value dropped hard while the token burn stayed high.

Continuing once or twice was not the mistake. The mistake was failing to ask the strategic question soon enough:

Are we still discovering new product truth, or are we just exercising the same shape again?

How The Loop Formed

The rule that caused the loop was valid:

  1. a worker reports progress
  2. the foreman acknowledges it
  3. the foreman immediately assigns the next bounded slice

That rule is good when:

  • the frontier is unstable
  • the next step is unclear
  • each proof meaningfully changes the system

It becomes dangerous when:

  • the action family is already proven
  • the next step is obvious
  • each new slice is mostly the same move with a different label

This was the actual failure mode:

::chart[bar]{title="Where the loop spent its energy"}

| Phase | What happened |
| --- | --- |
| Early (S085-S090) | Real convergence and first repeated proof |
| Middle (S091-S120) | Pattern repetition with shrinking novelty |
| Late (S121-S172) | Confirmation churn around the same envelope |
::

The system did exactly what we told it to do. That is why this is a supervision problem, not a model problem.

What We Are Changing

We are keeping the good rule:

  • acknowledge real progress
  • push the next bounded step

But we are pairing it with explicit termination logic.

New stop conditions

If a lane shows all three of these, we stop and consolidate:

  • the action envelope is already proven
  • the route/session/tenant guardrails are unchanged
  • the next action differs mostly by label, not by product risk
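
As a rough illustration, the three stop conditions can be expressed as a single conjunction. This is a sketch under assumptions, not our actual foreman code: the `LaneAssessment` type and its field names are hypothetical, and deciding whether each flag is true still takes human or reviewer judgment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LaneAssessment:
    """Hypothetical record; each field mirrors one stop condition."""
    envelope_already_proven: bool
    guardrails_unchanged: bool      # route/session/tenant guardrails
    differs_mostly_by_label: bool   # new label, same product risk


def should_stop_and_consolidate(lane: LaneAssessment) -> bool:
    """Stop only when all three conditions hold at once."""
    return (
        lane.envelope_already_proven
        and lane.guardrails_unchanged
        and lane.differs_mostly_by_label
    )


late_lane = LaneAssessment(True, True, True)
early_lane = LaneAssessment(False, True, False)
print(should_stop_and_consolidate(late_lane))   # True
print(should_stop_and_consolidate(early_lane))  # False
```

Requiring all three keeps the rule from firing while the envelope is still being established, which is when immediate continuation is most useful.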

New escalation thresholds

Any foreman loop must trigger review if:

  • token burn passes a fixed hourly budget
  • more than N adjacent actions share the same proof shape
  • the next sprint name can be generated by string substitution rather than a new product question
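
One way these thresholds might be checked mechanically is sketched below. Everything concrete here is an assumption: the budget value, the value chosen for N, the proof-shape labels, and the example action names are all hypothetical. The string-substitution test is a crude token-diff heuristic standing in for the real judgment call.

```python
import re
from typing import Sequence

HOURLY_TOKEN_BUDGET = 50_000_000  # hypothetical budget, not a value from the run
MAX_SAME_SHAPE_RUN = 3            # hypothetical value for N


def trailing_run_length(shapes: Sequence[str]) -> int:
    """Length of the run of identical proof shapes at the end of the sequence."""
    if not shapes:
        return 0
    run = 1
    for shape in reversed(shapes[:-1]):
        if shape != shapes[-1]:
            break
        run += 1
    return run


def is_label_swap(previous_name: str, next_name: str) -> bool:
    """Crude proxy for string substitution: the names differ in exactly one token."""
    a = re.split(r"[-_\s]+", previous_name.lower())
    b = re.split(r"[-_\s]+", next_name.lower())
    if len(a) != len(b):
        return False
    return sum(x != y for x, y in zip(a, b)) == 1


def review_reasons(
    tokens_last_hour: int,
    shapes: Sequence[str],
    previous_name: str,
    next_name: str,
) -> list[str]:
    """Return every threshold that was tripped; an empty list means no review."""
    reasons = []
    if tokens_last_hour > HOURLY_TOKEN_BUDGET:
        reasons.append("hourly token budget exceeded")
    if trailing_run_length(shapes) > MAX_SAME_SHAPE_RUN:
        reasons.append("too many adjacent actions share one proof shape")
    if is_label_swap(previous_name, next_name):
        reasons.append("next lane name is a label swap of the previous one")
    return reasons


# Hypothetical late-loop state: high burn, repeated shape, renamed action.
print(review_reasons(
    195_400_000,
    ["envelope"] * 5,
    "operator action suspend tenant",
    "operator action resume tenant",
))
```

Returning the list of tripped thresholds, rather than a single boolean, gives the reviewer the reason for the escalation along with the escalation itself.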

New required outputs

After a repeated proof family, the next mandatory output is not another action lane. It is one of:

  • a coverage matrix
  • a closeout
  • a missing-gap list
  • a redirect to a new frontier

New framing question

Before opening the next lane, we ask:

What new truth does this next action buy that the last three did not?

If the answer is weak, we do not open the lane.
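
If each lane were summarized as the set of claims it proves, the framing question would reduce to a set difference against the last three lanes. The sketch below assumes that representation. The claim labels are hypothetical, and in practice the answer may need judgment that a set comparison cannot capture.

```python
def new_truths(candidate: set[str], recent: list[set[str]], lookback: int = 3) -> set[str]:
    """Claims the candidate lane would prove that the last `lookback` lanes did not."""
    already_proven = set().union(*recent[-lookback:])
    return candidate - already_proven


def should_open_lane(candidate: set[str], recent: list[set[str]]) -> bool:
    """Open the lane only if it buys at least one new claim."""
    return bool(new_truths(candidate, recent))


# Hypothetical claim labels for the last three lanes.
recent_lanes = [
    {"route guard", "session guard"},
    {"route guard", "tenant-scope check"},
    {"session guard", "tenant-scope check"},
]

print(should_open_lane({"route guard", "tenant-scope check"}, recent_lanes))  # False
print(new_truths({"route guard", "rollback path"}, recent_lanes))             # {'rollback path'}
```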

What We Keep

We are not throwing away the whole run.

The loop did give us:

  • a hardened authenticated operator-action envelope
  • stronger route/session/tenant discipline
  • a clearer idea of where our supervision policies were too loose

It also gave us a clearer retention rule for the documentation itself:

| Artifact class | Keep? | Why |
| --- | --- | --- |
| Core proofs and hardening artifacts | Yes | They hold the durable product truth |
| Early action lanes where the pattern was still emerging | Mostly | They show how the envelope was established |
| Late adjacent action briefs with renamed-but-similar actions | Compress | They are mostly repetition, not new frontier |
| ACK/next-step chatter | Archive | Useful for audit, weak for primary documentation |

So the documents are not garbage. They are uneven.

Some of them are now canonical proof surfaces. Some of them should be collapsed into a smaller record.

So the right lesson is not:

  • never let agents continue

The right lesson is:

  • never let continuation substitute for judgment

What’s Next

The next improvement is not another long authenticated-action chain.

The next improvement is a tighter operator policy:

  • when to continue
  • when to close out
  • when to redirect
  • when to switch from action expansion to consolidation

And the next documentation improvement is similar:

  • keep the hardening proofs
  • keep the closeouts
  • keep the postmortem
  • compress the repetitive middle and late lane wrappers into one authoritative inventory

That is a better use of tokens, and a better use of the system.
