What 1.191B Tokens Taught Us About Agent Supervision
We let a good rule turn into a bad loop. The good rule was simple: when a worker agent reports progress, acknowledge it and push the next bounded step immediately. On May 9, 2026, we followed that rule without pairing it with a stop condition. The result was a six-hour loop that consumed roughly 1.191 billion tokens.
What Exists Now
We have a concrete postmortem window.
The operator-action chain starts at S085 and the artifact trail reaches S172. The first S085 artifact appears at 2026-05-09 05:32:36 UTC. The S172 brief appears at 2026-05-09 11:38:15 UTC.
That puts the visible loop at 6 hours, 5 minutes, 39 seconds.
Using the token counters Hadi saw in the tools:
| Metric | Value |
|---|---|
| Artifact window | 2026-05-09 05:32:36 to 11:38:15 UTC |
| Duration | 6.094 hours |
| Numbered action lanes | 88 (S085 through S172) |
| Codex token use | ~1,000,000,000 |
| Loom token use | ~191,000,000 |
| Combined token use | ~1,191,000,000 |
| Codex burn rate | ~164.1M tokens/hour |
| Loom burn rate | ~31.3M tokens/hour |
| Combined burn rate | ~195.4M tokens/hour |
This is not an audited billing export. It is a postmortem based on the counters we observed during the run.
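The derived rows in the table can be rechecked from the two timestamps and the two observed counters. The sketch below is only that arithmetic; the variable and function names are illustrative, and the token counts are the approximate values quoted above, not billing data.

```python
from datetime import datetime, timezone

# Timestamps of the first S085 artifact and the S172 brief (UTC).
start = datetime(2026, 5, 9, 5, 32, 36, tzinfo=timezone.utc)
end = datetime(2026, 5, 9, 11, 38, 15, tzinfo=timezone.utc)

duration_hours = (end - start).total_seconds() / 3600

# Approximate counters as observed in the tools during the run.
codex_tokens = 1_000_000_000
loom_tokens = 191_000_000

# S085 through S172, counting both endpoints.
lane_count = 172 - 85 + 1


def burn_rate_millions(tokens: int, hours: float) -> float:
    """Return tokens per hour, expressed in millions."""
    return tokens / hours / 1_000_000


print(f"duration: {duration_hours:.3f} hours")
print(f"lanes:    {lane_count}")
print(f"codex:    {burn_rate_millions(codex_tokens, duration_hours):.1f}M tokens/hour")
print(f"loom:     {burn_rate_millions(loom_tokens, duration_hours):.1f}M tokens/hour")
print(f"combined: {burn_rate_millions(codex_tokens + loom_tokens, duration_hours):.1f}M tokens/hour")
```

Because the inputs are rounded counters, the rates are only as precise as those counters.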
Why It Matters
A loop like this is easy to misread.
From the outside it looks productive:
- the agent keeps shipping
- the foreman keeps acknowledging
- the sprint numbers keep moving
But throughput is not the same thing as progress.
What actually happened is that we kept proving variations of the same authenticated operator-action envelope:
- same route/session guardrails
- same tenant-scope checks
- same operator-visible result pattern
- slightly different adjacent action names
That produced real confidence in the envelope. It was not pure waste.
But the marginal value dropped hard while the token burn stayed high.
The mistake was not continuing once or twice. The mistake was failing to ask the strategic question soon enough:
Are we still discovering new product truth, or are we just exercising the same shape again?
How The Loop Formed
The rule that caused the loop was valid:
- a worker reports progress
- the foreman acknowledges it
- the foreman immediately assigns the next bounded slice
That rule is good when:
- the frontier is unstable
- the next step is unclear
- each proof meaningfully changes the system
It becomes dangerous when:
- the action family is already proven
- the next step is obvious
- each new slice is mostly the same move with a different label
This was the actual failure mode:
::chart[bar]{title="Where the loop spent its energy"}
| Phase | What happened |
|---|---|
| Early S085-S090 | Real convergence and first repeated proof |
| Middle S091-S120 | Pattern repetition with shrinking novelty |
| Late S121-S172 | Confirmation churn around the same envelope |
::
The system did exactly what we told it to do. That is why this is a supervision problem, not a model problem.
What We Are Changing
We are keeping the good rule:
- acknowledge real progress
- push the next bounded step
But we are pairing it with explicit termination logic.
New stop conditions
If a lane shows all three of these, we stop and consolidate:
- the action envelope is already proven
- the route/session/tenant guardrails are unchanged
- the next action differs mostly by label, not by product risk
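One way to make the three conditions above mechanical is a small predicate the foreman evaluates before assigning the next slice. This is a minimal sketch, not our actual tooling: `LaneState`, its field names, and `should_stop_and_consolidate` are hypothetical, and deciding each flag still takes human or model judgment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LaneState:
    """Hypothetical summary of a lane, filled in by the foreman."""
    envelope_proven: bool          # the action envelope is already proven
    guardrails_unchanged: bool     # route/session/tenant guardrails are unchanged
    differs_mostly_by_label: bool  # next action changes the label, not the product risk


def should_stop_and_consolidate(lane: LaneState) -> bool:
    """Stop only when all three conditions hold at once."""
    return (
        lane.envelope_proven
        and lane.guardrails_unchanged
        and lane.differs_mostly_by_label
    )


# A late-loop lane: everything proven, nothing new at risk.
late_lane = LaneState(True, True, True)
# An early lane: the envelope is still being established.
early_lane = LaneState(False, True, False)

print(should_stop_and_consolidate(late_lane))
print(should_stop_and_consolidate(early_lane))
```

Requiring all three flags keeps the original rule intact for lanes where the frontier is still moving.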
New escalation thresholds
Any foreman loop must trigger review if:
- token burn passes a fixed hourly budget
- more than N adjacent actions share the same proof shape
- the next sprint name can be generated by string substitution rather than a new product question
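The thresholds above could be checked with something like the following sketch. Everything in it is an assumption for illustration: the `Lane` type, the `proof_shape` tag, the lane names, the budget, and the value of N are all hypothetical, and the substitution test is a crude heuristic (same word count, exactly one word swapped), not a real measure of product novelty.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lane:
    name: str         # hypothetical lane label, without the sprint number
    proof_shape: str  # hypothetical tag describing what the lane proved


def trailing_same_shape(lanes: list[Lane]) -> int:
    """Count how many of the most recent lanes share the latest proof shape."""
    if not lanes:
        return 0
    latest = lanes[-1].proof_shape
    count = 0
    for lane in reversed(lanes):
        if lane.proof_shape != latest:
            break
        count += 1
    return count


def is_label_substitution(previous: str, proposed: str) -> bool:
    """Crude heuristic: same number of words, exactly one word differs."""
    a, b = previous.split("-"), proposed.split("-")
    if len(a) != len(b):
        return False
    return sum(x != y for x, y in zip(a, b)) == 1


def review_reasons(lanes, proposed_name, tokens_last_hour, hourly_budget, max_same_shape):
    """Return every threshold that fired; any non-empty result means review."""
    reasons = []
    if tokens_last_hour > hourly_budget:
        reasons.append("hourly token budget exceeded")
    if trailing_same_shape(lanes) > max_same_shape:
        reasons.append("too many adjacent lanes share one proof shape")
    if lanes and is_label_substitution(lanes[-1].name, proposed_name):
        reasons.append("next lane name is a string substitution")
    return reasons


# Invented history: four lanes that all prove the same envelope.
history = [
    Lane("operator-pause-action", "auth-envelope"),
    Lane("operator-resume-action", "auth-envelope"),
    Lane("operator-retry-action", "auth-envelope"),
    Lane("operator-cancel-action", "auth-envelope"),
]
reasons = review_reasons(
    history,
    "operator-archive-action",
    tokens_last_hour=195_000_000,  # roughly the combined rate observed in the run
    hourly_budget=50_000_000,      # placeholder budget, not a real policy value
    max_same_shape=3,              # placeholder for N
)
print(reasons)
```

Returning a list of reasons rather than a single boolean lets the reviewer see which threshold fired.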
New required outputs
After a repeated proof family, the next mandatory output is not another action lane. It is one of:
- a coverage matrix
- a closeout
- a missing-gap list
- a redirect to a new frontier
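The required-outputs rule can be expressed as a simple gate on what the foreman is allowed to request next. The sketch below assumes a hypothetical `Output` enum and `is_allowed` helper; only the four consolidation outputs themselves come from the list above.

```python
from enum import Enum


class Output(Enum):
    ACTION_LANE = "action lane"
    COVERAGE_MATRIX = "coverage matrix"
    CLOSEOUT = "closeout"
    MISSING_GAP_LIST = "missing-gap list"
    REDIRECT = "redirect to a new frontier"


# Every output except another action lane counts as consolidation.
CONSOLIDATION_OUTPUTS = frozenset(Output) - {Output.ACTION_LANE}


def is_allowed(next_output: Output, after_repeated_family: bool) -> bool:
    """After a repeated proof family, only consolidation outputs are allowed."""
    if after_repeated_family:
        return next_output in CONSOLIDATION_OUTPUTS
    return True


print(is_allowed(Output.ACTION_LANE, after_repeated_family=True))
print(is_allowed(Output.CLOSEOUT, after_repeated_family=True))
```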
New framing question
Before opening the next lane, we ask:
What new truth does this next action buy that the last three did not?
If the answer is weak, we do not open the lane.
What We Keep
We are not throwing away the whole run.
The loop did give us:
- a hardened authenticated operator-action envelope
- stronger route/session/tenant discipline
- a clearer idea of where our supervision policies were too loose
It also gave us a clearer retention rule for the documentation itself:
| Artifact class | Keep? | Why |
|---|---|---|
| Core proofs and hardening artifacts | Yes | They hold the durable product truth |
| Early action lanes where the pattern was still emerging | Mostly | They show how the envelope was established |
| Late adjacent action briefs with renamed-but-similar actions | Compress | They are mostly repetition, not new frontier |
| ACK/next-step chatter | Archive | Useful for audit, weak for primary documentation |
So the documents are not garbage. They are uneven.
Some of them are now canonical proof surfaces. Some of them should be collapsed into a smaller record.
So the right lesson is not:
- never let agents continue
The right lesson is:
- never let continuation substitute for judgment
What’s Next
The next improvement is not another long authenticated-action chain.
The next improvement is a tighter operator policy:
- when to continue
- when to close out
- when to redirect
- when to switch from action expansion to consolidation
And the next documentation improvement is similar:
- keep the hardening proofs
- keep the closeouts
- keep the postmortem
- compress the repetitive middle and late lane wrappers into one authoritative inventory
That is a better use of tokens, and a better use of the system.