GitHub Execution Ledger: Public Proof for Agent Work
Mumega is preparing the GitHub Execution Ledger: a public proof surface for agent work. The point is not to show that agents can make changes; it is to show the chain from directive to implementation, tests, receipts, and operating memory.
Most AI product demos show the output.
The harder question is what happened before and after the output:
- Who asked for the work?
- Which agent accepted it?
- What files changed?
- Which tests ran?
- What failed?
- What got remembered?
- Which action is still waiting for approval?
That chain is the difference between agent theater and a system that can operate inside a real company.
What Exists Now
On May 16, 2026, S063 shifted from outreach-first work to infrastructure for repeatable execution. The latest slice created the pieces needed for a public ledger:
| Proof Surface | Current State |
|---|---|
| Universal Onboarding Engine | sprout_tenant MCP tool live |
| Generated tenant files | 4 files: AGENTS.md, .agent.md, Inkwell canvas, machine config |
| Boundary tests | 11 focused tests passing |
| Runtime smoke | Live MCP scratch project created all 4 onboarding files |
| Core services | SOS MCP, SOS Squad, and Mirror active |
| Social Autopilot | social-autopilot-v1 manifest prepared |
| Public milestone | This Markdown document, prepared for Inkwell |
This is not the full ledger launch yet. It is the first public milestone artifact: the shape of the ledger and the first proof batch that will feed it.
Why It Matters
Agents fail in companies when their work cannot be audited.
The problem is not just hallucination. It is missing causality. A model can produce a correct patch and still leave the company unable to answer basic operational questions:
- Was this work requested by a human, a coordinator, or another agent?
- Was the task inside the correct tenant boundary?
- Did the agent use broad secrets or scoped tools?
- Did the result pass tests?
- Did any external action happen?
- Can another agent continue without asking Hadi to re-explain everything?
The GitHub Execution Ledger is the public-facing version of the internal answer we are building into SOS.
The Ledger Shape
    flowchart LR
        A[Loom Directive] --> B[SOS Task or MCP Tool]
        B --> C[Agent Implementation]
        C --> D[Git Diff]
        D --> E[Test Evidence]
        E --> F[Runtime Smoke]
        F --> G[Receipt or Memory]
        G --> H[Public Milestone]
        H --> I[Next Sprint Context]
Each step should be inspectable.
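One way to make "inspectable" concrete is to check that every stage in the chain has some evidence attached. The sketch below is illustrative only: the stage keys are shortened from the flowchart labels, and the `evidence` dictionary shape and the `first_missing_stage` helper are assumptions, not part of SOS.

```python
# Stage keys are shortened from the flowchart labels; the evidence dict
# shape is an assumption made for this sketch.
STAGES = [
    "directive",
    "task",
    "implementation",
    "git_diff",
    "test_evidence",
    "runtime_smoke",
    "receipt",
    "public_milestone",
    "next_sprint_context",
]


def first_missing_stage(evidence):
    """Return the first stage with no evidence, or None if the chain is complete."""
    for stage in STAGES:
        if not evidence.get(stage):
            return stage
    return None


evidence = {
    "directive": "Loom directive for S063",
    "task": "sprout_tenant MCP tool",
    "implementation": "Codex",
    "git_diff": "",  # no diff reference recorded yet
}
print(first_missing_stage(evidence))
```

A check like this reports where the chain breaks, which is more useful to a reader than a single pass or fail flag.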
The ledger does not need to expose private tokens, customer data, or raw internal context. It does need to expose enough structure that a reader can see how work moved:
| Ledger Field | Purpose |
|---|---|
| Directive | Why the work started |
| Agent | Who executed the slice |
| Files | What changed |
| Tests | What was verified |
| Runtime proof | Whether the surface actually worked |
| Receipts | What got remembered or reconciled |
| Caveats | What is still blocked |
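As a rough illustration, the fields above could map onto a small record type. This is a minimal sketch, not the SOS schema: the `LedgerEntry` class, its field types, and the JSON layout are all assumptions chosen for readability.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class LedgerEntry:
    """One execution unit. Field names mirror the table; types are assumed."""

    directive: str  # why the work started
    agent: str  # who executed the slice
    files: list[str] = field(default_factory=list)  # what changed
    tests: dict[str, str] = field(default_factory=dict)  # check name -> result
    runtime_proof: str = ""  # whether the surface actually worked
    receipts: list[str] = field(default_factory=list)  # what got remembered
    caveats: list[str] = field(default_factory=list)  # what is still blocked

    def to_json(self) -> str:
        # asdict() keeps the declared field order, so output is stable.
        return json.dumps(asdict(self), indent=2)


entry = LedgerEntry(
    directive="Onboard a tenant project with one call",
    agent="Codex",
    files=["AGENTS.md", ".agent.md"],
    tests={"Service tests": "Passed"},
    runtime_proof="Live MCP smoke passed",
    caveats=["Gemini credentials invalid or permission-denied"],
)
print(entry.to_json())
```

Keeping caveats as a first-class field, rather than a free-text note, makes it harder for a record to omit what is still blocked.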
The First Entry
The first candidate ledger entry is S063 Universal Onboarding Engine.
Loom asked for a tool that could onboard TROP, Amrita, or a new customer project with one call. Codex implemented sprout_tenant as a system-only SOS MCP tool.
The tool takes an absolute project path and generates:
- AGENTS.md
- .agent.md
- .mumega/inkwell-canvas.md
- .mumega/living-enterprise.json
It is safe by default. Existing files are skipped unless overwrite_existing: true is set. That matters because onboarding should prevent chaos, not overwrite a tenant’s local meaning.
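The skip-by-default behavior can be sketched in a few lines. This is not the sprout_tenant implementation: the function name `write_onboarding_files`, its return shape, and the placeholder file contents are hypothetical. Only the absolute-path requirement and the `overwrite_existing` flag come from the description above.

```python
from pathlib import Path
import tempfile


def write_onboarding_files(project_path, files, overwrite_existing=False):
    """Write each relative path -> content under project_path.

    Existing files are skipped unless overwrite_existing is True.
    Returns (written, skipped) as lists of relative paths.
    """
    root = Path(project_path)
    if not root.is_absolute():
        raise ValueError("project_path must be absolute")
    written, skipped = [], []
    for rel_path, content in files.items():
        target = root / rel_path
        if target.exists() and not overwrite_existing:
            skipped.append(rel_path)  # leave the tenant's own file untouched
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content, encoding="utf-8")
        written.append(rel_path)
    return written, skipped


with tempfile.TemporaryDirectory() as tmp:
    # Simulate a tenant that already maintains its own AGENTS.md.
    Path(tmp, "AGENTS.md").write_text("tenant-edited notes", encoding="utf-8")
    files = {"AGENTS.md": "generated", ".agent.md": "generated"}
    written, skipped = write_onboarding_files(tmp, files)
    print("written:", written, "skipped:", skipped)
```

Returning the skipped paths, instead of silently ignoring them, lets the caller report exactly which files were left alone.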
The focused verification set passed:
| Check | Result |
|---|---|
| Service tests | Passed |
| MCP tool tests | Passed |
| MCP boundary contract | Passed |
| Flow health regression | Passed |
| Compile check | Passed |
| Live MCP smoke | Passed |
What The Ledger Will Not Be
It will not be a vanity changelog.
We do not need a public list of every tiny file edit. We need a ledger of meaningful execution units: the directive, the proof, the verification, the receipt, and the caveat.
The caveat matters. The S063 onboarding slice shipped with one unresolved dependency: the configured Gemini credentials are currently rejected as invalid or permission-denied. The tool handles that by falling back to deterministic local inspection and returning warnings.
That is exactly the kind of thing a ledger should show. A serious execution record does not hide the imperfect parts. It shows what works, what degraded safely, and what needs repair.
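The degrade-safely pattern described here can be sketched as a try-then-fall-back wrapper that records a warning. Everything named below is hypothetical: `CredentialError`, `inspect_with_model`, and `inspect_locally` are stand-ins, and the model call is stubbed to fail so the fallback path is visible.

```python
from pathlib import Path
import tempfile


class CredentialError(Exception):
    """Hypothetical error for invalid or permission-denied model credentials."""


def inspect_with_model(project_path):
    # Stand-in for a model-backed inspection call. It always fails here to
    # simulate the credential caveat.
    raise CredentialError("credentials invalid or permission-denied")


def inspect_locally(project_path):
    # Deterministic fallback: a sorted listing of top-level entries.
    return sorted(p.name for p in Path(project_path).iterdir())


def inspect_project(project_path):
    """Return (summary, warnings), degrading to local inspection on failure."""
    warnings = []
    try:
        summary = inspect_with_model(project_path)
    except CredentialError as exc:
        warnings.append(f"model inspection unavailable: {exc}")
        summary = inspect_locally(project_path)
    return summary, warnings


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "README.md").write_text("demo", encoding="utf-8")
    summary, warnings = inspect_project(tmp)
    print(summary, warnings)
```

The warning travels with the result, so a ledger entry built from it can list the degraded dependency as a caveat.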
What Comes Next
The next step is to make the ledger mechanical:
- Every completed slice emits a compact execution record.
- GitHub receives the record as a visible artifact.
- Inkwell can turn selected records into public milestones.
- SOS and Mirror keep the private receipts that should not be public.
- Loom uses ledger entries to steer the next sprint without asking agents to reread the world.
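A minimal sketch of the emit step, assuming each receipt carries a boolean "public" flag: the public copy goes to the visible artifact and the remaining receipts stay private. The record layout, the flag, and both helper names are assumptions made for illustration, not the SOS or Mirror format.

```python
import json


def split_record(record):
    """Split one execution record into a public copy and private receipts.

    Assumes each receipt is a dict with a boolean "public" flag (hypothetical).
    The input record is not modified.
    """
    receipts = record.get("receipts", [])
    public = dict(record, receipts=[r for r in receipts if r.get("public")])
    private = [r for r in receipts if not r.get("public")]
    return public, private


def to_compact_json(record):
    # Sorted keys and tight separators keep the artifact small and diff-friendly.
    return json.dumps(record, sort_keys=True, separators=(",", ":"))


record = {
    "directive": "S063 Universal Onboarding Engine",
    "agent": "Codex",
    "receipts": [
        {"note": "11 focused tests passing", "public": True},
        {"note": "internal memory reference", "public": False},
    ],
}
public_record, private_receipts = split_record(record)
print(to_compact_json(public_record))
```

Treating a receipt as private unless it is explicitly flagged public means a missing flag errs toward not publishing.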
If this works, a customer should be able to inspect not just what Mumega claims, but how Mumega works.
That is the standard for a Living Enterprise: not autonomous output, but accountable execution.