Mumega

What We Learned Studying the Agent Ecosystem

TL;DR

We spent a week doing a structured study of the agent platform ecosystem — every funded platform, every major open-source harness, and two entrants we hadn’t studied in depth before: Trinity (Ability.ai) and Nous Hermes. Here’s what we actually learned, what surprised us, what validated what we’d already built, and where it left us. The close: mupot is open, and you can fork it.

Every few months we do a sweep of the agent platform space — not to validate our roadmap (we do that separately), but to find where smart people independently converged on the same architecture, because that convergence is the strongest signal available. This post is the notes from that sweep.

Trinity — what they built is genuinely good

Trinity is a production runtime and fleet management platform built by Ability.ai. The framing is precise: “Claude Code writes the agent. Trinity runs it in production.” If you’ve used Claude Code, you know the gap they’re closing — you can write an agent locally in an afternoon and have no clean answer for how it runs on a schedule, with credentials, alongside other agents, with someone watching it.

Their install story is worth reading even if you never deploy it: a quickstart.sh, a CLI on PyPI and Homebrew, a Docker Compose topology with one container per agent for isolation, and a trinity-cli that covers deploy, chat, logs, and health in a consistent interface. The onboarding path has the feel of a team that found the sharp edges themselves and sanded them off before publishing: there’s a verify.sh step that proves the deployment is healthy before anyone is told to log in. The gap between “I ran the command” and “I know the system is working” is where most infrastructure breaks down. Trinity closes it.

The operator console is the other piece that stands out. Trinity’s stated goal is “async by default, only escalations land in your queue” — and they mean it. The agent runs, handles decisions inside its scope, and surfaces only the things a human genuinely needs to decide. The approval/escalation queue is not a firehose of activity logs. For anyone who has operated a multi-agent setup, this is the distinction that separates a tool from a product.
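The routing rule described above can be sketched in a few lines. This is an illustration of the idea only, not Trinity’s implementation: the Decision and Agent classes, the allowed_actions set, and the spend_limit field are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    action: str
    cost: float


@dataclass
class Agent:
    # All fields are hypothetical; they stand in for whatever "scope" means
    # in a real deployment.
    name: str
    allowed_actions: set[str]
    spend_limit: float
    escalations: list[Decision] = field(default_factory=list)
    activity_log: list[str] = field(default_factory=list)

    def decide(self, decision: Decision) -> str:
        """Handle in-scope decisions silently; queue everything else for a human."""
        in_scope = (
            decision.action in self.allowed_actions
            and decision.cost <= self.spend_limit
        )
        if in_scope:
            # Routine work goes to the log, not to the operator's queue.
            self.activity_log.append(f"{self.name} did {decision.action}")
            return "handled"
        self.escalations.append(decision)
        return "escalated"


agent = Agent("ops", allowed_actions={"restart", "refund"}, spend_limit=50.0)
agent.decide(Decision("restart", 0.0))         # in scope, handled
agent.decide(Decision("refund", 500.0))        # over the limit, escalated
agent.decide(Decision("delete_tenant", 0.0))   # unknown action, escalated
```

The point of the split is that the operator reads agent.escalations, which stays short, while agent.activity_log remains available for audit without demanding attention.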

What Trinity validates for us: the sovereign / no-per-seat / MCP-native thesis they ship is word-for-word what we built mupot around. That convergence is not coincidence. It’s two teams working from the same constraints (regulated buyers, data-residency pressure, operator trust) and arriving at the same shape. When only one team holds a thesis, it might just be a contrarian position. When two teams independently converge on it, you have evidence.

Where Trinity goes further than we have right now: their observability is production-grade (timeline replay of every agent action, fleet graph with live cost and success rates, OpenTelemetry export). Their per-agent scheduling is first-class — APScheduler with cron per agent, coordinated multi-agent timing, execution logs. They have a Grade A pentest from UnderDefense on record (April 2026, all criticals and highs remediated). These are real advantages we don’t have today, and it would be wrong to minimize them.

Where our designs differ: Trinity is server-fleet oriented (Docker containers, VPS required, one runtime per server). mupot is edge-oriented (Cloudflare Workers, scale-to-zero, near-zero cost). Trinity has no tenant model; it deploys one organization’s fleet. mupot’s first-class primitive is multi-tenancy with capability RBAC and channel-native squads. Trinity has no Discord or Telegram integration, whereas in our model the channel IS the squad. These are genuine architectural differences, not marketing ones.
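To make “multi-tenancy with capability RBAC” concrete, here is a minimal sketch of a tenant-scoped capability check. It is not mupot’s code: the Registry and Tenant classes and the capability strings such as "squad:post" are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Tenant:
    tenant_id: str
    # principal -> set of capabilities granted inside this tenant only
    grants: dict[str, set[str]] = field(default_factory=dict)


class Registry:
    """Hypothetical in-memory store; a real one would be persistent."""

    def __init__(self) -> None:
        self._tenants: dict[str, Tenant] = {}

    def grant(self, tenant_id: str, principal: str, capability: str) -> None:
        tenant = self._tenants.setdefault(tenant_id, Tenant(tenant_id))
        tenant.grants.setdefault(principal, set()).add(capability)

    def allowed(self, tenant_id: str, principal: str, capability: str) -> bool:
        # Deny by default: unknown tenant, unknown principal, or missing capability.
        tenant = self._tenants.get(tenant_id)
        if tenant is None:
            return False
        return capability in tenant.grants.get(principal, set())


registry = Registry()
registry.grant("acme", "alice", "squad:post")
registry.allowed("acme", "alice", "squad:post")    # granted in this tenant
registry.allowed("globex", "alice", "squad:post")  # same principal, other tenant: denied
```

Every check names the tenant explicitly, so a grant in one tenant can never leak into another. That is the property a single-organization fleet does not need and a multi-tenant substrate cannot do without.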

Trinity is excellent if you already have a server and want Docker-level fleet management for Claude Code agents. It is worth your attention if you’re in that position.

Nous Hermes — what velocity looks like

Nous Research shipped three major Hermes releases in roughly three weeks in May 2026. v0.15 refactored a 16,083-line main loop into 14 modules and merged 747 PRs in a single release cycle. v0.16 added a native desktop app. Hermes is now at approximately 181,000 GitHub stars and has become the dominant personal agent harness in the open-source space.

The thing that the star count obscures: the architectural decisions Hermes is making are right. Provider-agnostic core (Anthropic, OpenAI, Bedrock, local models all first-class). A two-layer hook system with a trusted/untrusted split — MCP tools from the user get more trust than tools from the model’s output. A self-improving Skills loop that follows the agentskills.io standard. Cron jobs that are full agent tasks, not shell scripts.

What Hermes and the cluster of orchestration tools around it (OpenCode at 165K stars, claude-flow at 57K, Gastown at 15K) tell us: the personal agent node won. Every developer who wants an agent running on their machine has tools for it. The framework for deploying one Claude Code agent for one person is a solved problem at the OSS layer. The gap that none of these tools addresses is multi-user, multi-tenant, channel-native organizational coordination, and that’s the gap mupot is built for. We are not trying to be Hermes for one person; we are the governed layer above it.

What Hermes velocity implies for network-layer builders: the node is becoming more capable every week. The agents that will run inside an org substrate are getting better at a pace we don’t control. The right response to that is not to build a better node; it’s to build the coordination and governance layer that makes a colony of nodes work together safely, with receipts, with scoped permissions, with a tenant that knows what they’re paying for.

The four principles we held after the study

The study confirmed four things we already believed, and it confirmed them with independent evidence from teams who arrived there without reading our notes.

Identity must be server-derived. Trinity and the sovereign agent substrate pattern agree: agent identity cannot be caller-supplied or configuration-injected. It must be derived at runtime from a verified token. Trinity hashes credentials at rest, injects them per-container, and uses an MCP server the agent cannot bypass as the control plane. We reach the same guarantee by a different route: SOS bus tokens are scoped at mint time, not at runtime. The principle is identical: the agent cannot claim to be something it isn’t, because it has no mechanism to forge its own identity.
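A minimal sketch of the principle, assuming an HMAC-signed token. This is neither the SOS bus token format nor Trinity’s credential handling; SERVER_KEY, mint_token, identity_from_token, and handle_request are hypothetical names for the example.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical signing key; a real deployment would load it from a secret store.
SERVER_KEY = b"demo-signing-key"


def mint_token(agent_id: str, scopes: list[str], key: bytes = SERVER_KEY) -> str:
    """Issue a token whose identity and scopes are fixed at mint time."""
    payload = json.dumps({"agent": agent_id, "scopes": sorted(scopes)}, sort_keys=True)
    body = base64.urlsafe_b64encode(payload.encode()).decode()
    sig = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{sig}"


def identity_from_token(token: str, key: bytes = SERVER_KEY) -> dict:
    """Derive identity from the verified token only; reject bad signatures."""
    body, _, sig = token.rpartition(".")
    expected = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        raise PermissionError("invalid token signature")
    return json.loads(base64.urlsafe_b64decode(body.encode()))


def handle_request(token: str, claimed_agent: str) -> str:
    """Return the acting agent. `claimed_agent` is caller-supplied and ignored."""
    return identity_from_token(token)["agent"]


token = mint_token("scout", ["read"])
handle_request(token, claimed_agent="admin")  # resolves to "scout", not "admin"
```

The agent never holds the signing key, so it can present a token but cannot edit one: changing the payload invalidates the signature, and the claimed name is never consulted.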

Verify effect, not status. Trinity’s verify.sh post-deploy health gate runs after the containers are up — it probes the motor endpoint, not the process list. “The container is running” is not the same as “the system is working.” We wrote this principle into our own design after a specific incident (brain MIND live, motor BODY stalled, both unreported). Trinity independently ships the same gate. When two teams who haven’t talked arrive at the same protocol for the same failure mode, the protocol is probably right.
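The difference between the two checks can be sketched as follows. This is not the content of Trinity’s verify.sh, which we have not reproduced here; process_is_running, verify_effect, and the echo-style probe contract are assumptions made for the example.

```python
from typing import Callable


def process_is_running(process_table: set[str], name: str) -> bool:
    """Status check: only says that something with this name exists."""
    return name in process_table


def verify_effect(probe: Callable[[str], str], attempts: int = 3) -> bool:
    """Effect check: send a known input and require the expected output back."""
    for i in range(attempts):
        challenge = f"ping-{i}"
        try:
            if probe(challenge) == f"echo:{challenge}":
                return True
        except Exception:
            # A raised error counts as a failed probe, not a crash of the gate.
            pass
    return False


def healthy_motor(challenge: str) -> str:
    return f"echo:{challenge}"


def stalled_motor(challenge: str) -> str:
    raise TimeoutError("no response")


process_table = {"brain", "motor"}
process_is_running(process_table, "motor")  # True even when the motor is stalled
verify_effect(healthy_motor)                # passes
verify_effect(stalled_motor)                # fails, which is the signal we want
```

In a real gate the probe would be an HTTP call or a test task sent through the system. The structure stays the same: the gate passes only when the system produces the expected effect.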

Only charge for completed work. Trinity’s budget ledger and our own can_spend design converge on the same invariant: a failed task should not debit the budget. The implementation differs (their reserve/commit/refund model vs. our verify-before-debit) but the principle is identical. This matters for any platform that charges tenants by consumption — “you paid for the work the agent did, not for the work it attempted” is the defensible billing model when you’re asking someone to trust their operations to a piece of software.
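Both approaches can be sketched against one small ledger. The class and function names below are hypothetical; can_spend borrows the name from our design, but its signature here is an assumption, and neither function is the actual Trinity or mupot implementation.

```python
from typing import Callable


class Ledger:
    def __init__(self, balance: int) -> None:
        self.balance = balance  # units the tenant still owns
        self.reserved = 0       # units held for in-flight tasks

    def can_spend(self, cost: int) -> bool:
        return cost <= self.balance - self.reserved

    def reserve(self, cost: int) -> None:
        if not self.can_spend(cost):
            raise ValueError("insufficient budget")
        self.reserved += cost

    def commit(self, cost: int) -> None:
        self.reserved -= cost
        self.balance -= cost

    def refund(self, cost: int) -> None:
        self.reserved -= cost


def run_reserve_commit(ledger: Ledger, cost: int, task: Callable[[], bool]) -> bool:
    """Hold the budget up front, then commit on success or refund on failure."""
    ledger.reserve(cost)
    try:
        ok = task()
    except Exception:
        ok = False
    if ok:
        ledger.commit(cost)
    else:
        ledger.refund(cost)
    return ok


def run_verify_before_debit(
    ledger: Ledger,
    cost: int,
    task: Callable[[], str],
    verify: Callable[[str], bool],
) -> bool:
    """Debit only after the task's result has been verified."""
    if not ledger.can_spend(cost):
        return False
    try:
        result = task()
    except Exception:
        return False
    if not verify(result):
        return False
    ledger.balance -= cost
    return True
```

In both functions a task that fails, raises, or produces an unverifiable result leaves the balance unchanged. The reserve step additionally stops concurrent tasks from overspending while work is in flight, which the simpler verify-before-debit sketch does not attempt.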

The install experience IS the product. The gap between “I ran the command” and “I have a working system” is where most agent platforms fail their operators. Trinity sands this edge off with quickstart.sh, a verified health gate, and a CLI that gives you one command per operation. We don’t have that yet. It is the most concrete thing this study put on our immediate backlog.

What this means for mupot

mupot is the Cloudflare-native sovereign org substrate — channels as squads, capability RBAC, per-tenant isolation, brain daemon optional. It’s open: github.com/Mumega-com/mupot.

If you want to run an agent-native organization on your own Cloudflare account — with no per-seat billing, with channel-native squad coordination, with multi-tenant isolation from day one — fork it and deploy. The architecture is the output of the same study described above: multi-agent orchestration as first-class organizational structure, sovereign agent substrate as the deployment model, and the principles above as the design constraints.

The gaps are real. We don’t have Trinity’s observability surface yet. We don’t have a production-grade install CLI. We don’t have a template library. These are the next things. What we have is the combination no existing platform ships: zero-ops edge deployment + no per-seat + channel-native squads + first-class multi-tenancy. The study confirmed that combination is not addressed by any funded platform in the space. It also confirmed that two teams — Trinity and us — independently validated the sovereign / no-per-seat / MCP-native thesis that links all four design choices together.

We are not ahead of Trinity in every dimension. We are building something orthogonal that addresses a different primary constraint (org substrate on the edge vs. agent fleet on servers). Both are real products. If you are choosing between them, the question is: do you have a server you want to run Docker containers on, or do you want your organization’s agent infrastructure to live on Cloudflare and cost near nothing to operate? The answer to that question decides the platform.

Fork mupot. Tell us what broke.

