Mumega

Which Agent Harness Should SOS Adapt?

The Short Answer

Yes, there are real harnesses we can adapt.

No, we should not replace SOS with one of them.

The right move is to steal the strongest ideas from the frameworks that already solved parts of the problem: multi-agent routing, durable execution, handoffs, sessions, repo-local context, and worker isolation.

What We Actually Need

SOS is not trying to become a chatbot wrapper. It is trying to become a control plane for a live team of agents.

That means the harness we adapt has to help with:

  • hub-and-spoke routing
  • project-local worker sessions
  • resurrection after crash or compaction
  • stateful handoffs
  • task isolation
  • memory that survives a session dying
  • local tool execution without turning the whole machine into chaos

Most frameworks cover one or two of those well. None of the ones I looked at cover the whole shape of SOS out of the box.

Best Fit: AutoGen Core

If I had to pick the closest architectural reference for SOS, it would be AutoGen.

The current AutoGen docs split the stack into Core and AgentChat. Core is the lower-level event-driven layer for multi-agent systems. AgentChat sits above it with higher-level defaults. That is useful because SOS already has its own strong opinions about routing, state, wake behavior, and task ownership. We do not need a framework that hides those choices from us.

What makes AutoGen relevant is that the official docs already point at the things SOS cares about:

  • distributed agent runtimes
  • event-driven messaging
  • code execution components
  • MCP integration
  • multi-agent team patterns

That maps well to SOS as it exists today: bus, worker runtime, tool access, and role-based coordination.

If we adapt one framework as the structural reference, this is the one.

Best Fit for Durable Flow: LangGraph

LangGraph looks strongest where SOS still needs more explicit lifecycle discipline.

The value is not the “agent” branding. The value is durable execution. LangGraph is built around long-running, stateful workflows with checkpoints, resumability, and explicit graph transitions. That matters because SOS is already dealing with dead sessions, stuck sessions, task restarts, and the difference between idle and actually gone.

Where AutoGen feels like a good reference for the runtime shape, LangGraph feels like a good reference for:

  • state transitions
  • resumable runs
  • checkpointing
  • human-in-the-loop intervention
  • explicit orchestration instead of implicit chat drift

If we want the lifecycle manager to get less magical and more reliable, LangGraph is worth studying closely.

Best Fit for Handoffs and Sessions: OpenAI Agents SDK

The OpenAI Agents SDK is not the full harness SOS needs, but it has a clean mental model for two important pieces: handoffs and sessions.

Its official docs are especially useful for:

  • structured agent-to-agent handoffs
  • persistent session history
  • client-side memory abstractions like SQLite-backed sessions
  • clean separation between continuation and session-managed context

That is exactly the kind of semantic clarity SOS needs when deciding how a hub hands a task to a worker and what history survives the transfer.

I would not use it as the main orchestration layer for SOS. I would absolutely borrow its handoff and session model.

Useful Pattern, Wrong Core: OpenHands

OpenHands has valuable ideas, especially around repo-local behavior and lightweight context packs.

The part I would borrow is the microagent concept: small, local, directory-scoped instructions that travel with the project and shape how work gets done there. That fits the way we already think about project-local Codex and Claude workers.

But OpenHands is not the right backbone for SOS. Its own FAQ says it is intended for a single user on a local workstation and is not designed as a multi-tenant platform. That matters. SOS is already doing named agents, bus traffic, recovery logic, and shared system behavior. OpenHands is more useful as a source of worker-runtime patterns than as the organism itself.

So: copy the repo-local context ideas, not the whole runtime.

Useful, But Higher-Level: CrewAI

CrewAI is solid if the goal is business workflow orchestration with predefined agents, tasks, and flows. It is much less convincing as the core for a system like SOS that already owns its own bus, lifecycle rules, and runtime assumptions.

Its Flows and Crews are still worth reading because they give good patterns for:

  • typed state
  • guarded task progression
  • resumable process structure
  • hierarchical workflow definitions

But for SOS, CrewAI feels like a source of workflow ideas, not a runtime substrate.

The Recommendation

If we want to move fast without rebuilding ideas that already exist, the stack to adapt is:

  1. AutoGen Core for the runtime reference
  2. LangGraph for durable lifecycle and checkpoint semantics
  3. OpenAI Agents SDK for handoffs and session semantics
  4. OpenHands microagents for per-project context packs

That is the combination that best matches the shape of the system we are actually building.

What This Means for SOS

Practically, I would translate that into four concrete design moves.

1. Keep SOS as the control plane

Do not replace the bus, the registry, the wake path, or the squad model.

Those are already the differentiated parts of the system.

2. Make worker execution more formal

Workers should be project-local, disposable, and reproducible.

This is where OpenHands-style local context and AutoGen-style runtime structure help. A worker should come up with:

  • the right repo
  • the right startup files
  • the right model
  • the right role
  • a clean teardown path

3. Make lifecycle state explicit

We should stop treating agent lifecycle as a loose collection of heuristics. The system needs explicit states such as:

  • warm
  • idle
  • busy
  • parked
  • compacted
  • dead

This is where LangGraph-style checkpoint and state-machine thinking helps.

4. Make handoffs first-class

A hub handing work to a worker should not feel like “send a message and hope.” It should feel like a formal transfer:

  • task assigned
  • context attached
  • worker accepted
  • execution started
  • result returned
  • history persisted

This is where the OpenAI Agents SDK handoff/session model is the cleanest reference.

Final Take

There is no single harness I would install and call the problem solved.

But there is absolutely a harness strategy worth adopting.

The strongest move is to let SOS stay sovereign while borrowing:

  • AutoGen’s runtime shape
  • LangGraph’s state discipline
  • OpenAI’s handoff semantics
  • OpenHands’ project-local context pattern

That is not compromise. That is good engineering.

Sources

Share