Understanding Omnigent through Cynthia

In the Bosun post I wrote about the problem that got me building around agents in the first place: once you have more than one coding agent running, the model is no longer the bottleneck. You are. You become the person holding the plan, the state, the approvals, the handoffs, and the memory of which terminal is doing what.

Bosun is my answer to that at the outer loop. It is the chief of staff I talk to from my phone. It reads the boards, dispatches work, watches for blocked sessions, and tells me when I need to make a decision.

This post is about the layer underneath that: Omnigent, and the agent pack I have been building on top of it, Cynthia.

Cynthia is easy to describe badly. You can say “Claude plans, Codex writes, Gemini researches, Kiro reviews”, and that is directionally true, but it misses the part that matters. Cynthia is not just a prompt that calls a few models. It is an Omnigent-native workbench: YAML, skills, policies, subagents, artifacts, approvals, and native coding harnesses all arranged into one governed engineering workflow.

That distinction matters because the frontier labs are not only shipping models now. They are shipping coding agents.

Claude Code is not just Claude. Codex CLI is not just an OpenAI model. Antigravity is not just Gemini. Kiro is not just a model endpoint. Each one carries a harness around the model: prompts, tools, context handling, permission flows, session state, repo behavior, terminal UX, and opinions about how coding work should happen.

The useful thing Omnigent does is let me use those native harnesses as first-class sessions, while putting one control plane around them.

Cynthia as a crewmate

In my own setup, Cynthia can be used two ways. I can open it directly as a workbench, or Bosun can deploy it as one of the crewmates.

Figure: Cynthia is both a direct workbench and a crewmate Bosun can deploy.

That diagram is the mental model I wish I had started with.

At the bottom is Omnigent: sessions, subagents, terminals, worktrees, policies, approvals, files, diffs, and collaboration. Cynthia sits inside that as an agent pack. It is not trying to recreate the platform. It uses the platform.

The pack itself is mostly ordinary Omnigent material:

YAML config for the root agent and subagents.
Skills for commands like /council, /plan, /delivery, /review, and /status.
Python tools for deterministic artifact work.
Policies for things like allowed write paths, tool limits, and nested-agent restrictions.
Subagent definitions for Claude, Codex, Antigravity, Kiro, and deterministic fake candidates.

That is why Cynthia looks superficially similar to Polly and Debby, the Omnigent examples. They are all YAML-driven agent packs. The difference is the contract.

Polly is a coding orchestrator: break work up, delegate to coding agents, cross-review, and hand the human PRs. Debby is closer to debate. Cynthia is meant to become a governed engineering workbench. It has stages, artifacts, approval records, isolated research, single-writer delivery, review loops, and eventually Temporal-backed mission durability.

So yes, Cynthia is “just an agent pack” in the same sense that a Kubernetes operator is “just YAML and controllers”. The power is in what the pack asks Omnigent to coordinate.

Council is not one model thinking harder

The first Cynthia mode is Council. The mistake I nearly made when explaining it was to flatten the whole thing into “use a few models, then Claude summarizes.” That undersells the architecture.

Council is a fan-out pattern.

Figure: Many independent research windows, one synthesis step after the reveal.

A Council request starts as one question to Cynthia. Cynthia then opens separate research windows. In my intended main roster those are:

Claude for strong reasoning and architecture judgement.
Codex for repo-aware implementation perspective and codebase archaeology.
Antigravity/Gemini for an independent Google-family view.
Kiro when the question is product, specification, requirements, or acceptance-criteria heavy.

Those research windows are isolated. They should not see each other’s answers while forming their own view. Each one returns a structured report: recommendation, evidence, assumptions, risks, open questions, and confidence.

Only after the reports are collected does Cynthia reveal them to the synthesizer. Claude is the default synthesis layer because it is good at reconciling disagreement, spotting weak evidence, and turning competing reports into a coherent decision.

The important part is that the final answer is not “Claude thought about it once”. It is Claude reconciling a set of independent reports, with the disagreements preserved rather than averaged away.
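A minimal sketch of that fan-out and reveal, assuming each research window can be modelled as a function that receives only the question. The names `CandidateReport`, `run_council`, `fake_candidate`, and `list_disagreements` are hypothetical and are not Omnigent or Cynthia APIs. The report fields follow the list above, and the placeholder synthesizer stands in for Claude.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class CandidateReport:
    """Structured report returned by one isolated research window."""
    candidate: str
    recommendation: str
    evidence: list[str] = field(default_factory=list)
    assumptions: list[str] = field(default_factory=list)
    risks: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)
    confidence: float = 0.5


# A researcher receives only the question, so it cannot see other reports.
Researcher = Callable[[str], CandidateReport]


def run_council(question: str, researchers: dict[str, Researcher], synthesize):
    """Fan out to every researcher, then reveal all reports to the synthesizer."""
    with ThreadPoolExecutor(max_workers=len(researchers)) as pool:
        futures = {name: pool.submit(fn, question) for name, fn in researchers.items()}
        # Collect in roster order; nothing is revealed until every report is in.
        reports = [futures[name].result() for name in researchers]
    return synthesize(question, reports)


def fake_candidate(name: str, recommendation: str, confidence: float) -> Researcher:
    """Deterministic stand-in for a real research window."""
    def research(question: str) -> CandidateReport:
        return CandidateReport(
            candidate=name,
            recommendation=recommendation,
            evidence=[f"{name} considered: {question}"],
            confidence=confidence,
        )
    return research


def list_disagreements(question: str, reports: list[CandidateReport]) -> dict[str, list[str]]:
    """Placeholder synthesizer: group candidates by position instead of averaging."""
    positions: dict[str, list[str]] = {}
    for report in reports:
        positions.setdefault(report.recommendation, []).append(report.candidate)
    return positions
```

Calling `run_council` with three fake candidates, two recommending "adopt" and one recommending "defer", returns both positions with the candidates behind each one, so the disagreement survives into the synthesis step.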

That is the same reason I like maker-checker patterns in finance. The point is not that any one person is magically objective. The point is that independence changes the failure mode.

The native harness idea

This is where Omnigent gets interesting.

If all I wanted was a panel of model calls, I could use an API gateway and be done. That is useful for some research tasks, but it throws away a lot of what makes coding agents effective.

The frontier labs have spent real engineering effort wrapping their models in coding harnesses. They decide how the model sees a repo, how shell commands are proposed, how edits are applied, when permission prompts appear, how context is compacted, how sessions resume, and what the terminal experience feels like. Those choices are part of the product.

Omnigent’s native harnesses try to preserve that.

Figure: Native harnesses preserve the vendor agent runtime and put Omnigent around it.

The common shape is:

Omnigent creates or binds a session.
A native wrapper launches the vendor CLI or TUI in a runner-owned terminal.
A bridge directory records the state both sides need: session id, socket paths, native conversation ids, config paths, and other runtime facts.
Omnigent injects web, mobile, or CLI turns into the native harness.
A forwarder, reader, hook, recorder, or RPC client mirrors native output back into Omnigent.
Policies, approvals, MCP tools, attachments, model settings, interrupt, and resume are wired in where the vendor exposes enough surface.

That last clause is doing a lot of work. Native harnesses share a pattern, but they do not share identical capabilities. Each vendor gives Omnigent a different control surface.
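To make the bridge directory in that list concrete, here is a rough sketch of a per-session record written to disk. Everything in it is an assumption for illustration: the `BridgeRecord` field names, the `bridge.json` filename, and the JSON format are not taken from Omnigent. The post only says which kinds of facts the bridge holds.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Optional


@dataclass
class BridgeRecord:
    # Hypothetical field names covering the kinds of runtime facts listed above.
    session_id: str
    harness: str
    socket_path: str
    native_conversation_id: Optional[str]
    config_path: str


def write_bridge(bridge_dir: Path, record: BridgeRecord) -> Path:
    """Write the record via a temp file and rename, so a reader never sees a partial file."""
    bridge_dir.mkdir(parents=True, exist_ok=True)
    target = bridge_dir / "bridge.json"
    tmp = bridge_dir / "bridge.json.tmp"
    tmp.write_text(json.dumps(asdict(record), indent=2))
    tmp.replace(target)
    return target


def read_bridge(bridge_dir: Path) -> BridgeRecord:
    """Load the record the other side of the bridge wrote."""
    return BridgeRecord(**json.loads((bridge_dir / "bridge.json").read_text()))
```

The write-then-rename step matters because two processes share the directory: the wrapper that launches the vendor CLI and whatever mirrors its output back.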

Claude native

Claude Code is the cleanest mental model.

Omnigent launches the real claude terminal experience, not a fake chat completion wrapper. The native executor does not try to synthesize Claude’s answer itself. It injects the latest user message into the Claude Code terminal, then the transcript forwarder mirrors what happened back into Omnigent.

Claude also exposes useful hook points. That means Omnigent can translate native events like tool use into its policy system. In practical terms, a dangerous command can be evaluated before it runs, the user can be asked for approval, and the result can still appear in the normal Omnigent session.

That is the gold standard: native UX, Omnigent session state, and real policy enforcement meeting in the middle.
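As a toy illustration of what "evaluated before it runs" can mean, the sketch below maps a proposed shell command to allow, ask, or deny. The rule sets and the function name are hypothetical, and the check only looks at the first token, so it is far weaker than a real policy engine. It is not how Omnigent or Claude Code hooks are implemented.

```python
import shlex
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    ASK = "ask"    # surface an approval request to the human
    DENY = "deny"


# Hypothetical rule sets, chosen only for the example.
DENY_PROGRAMS = {"mkfs", "shutdown"}
ASK_PROGRAMS = {"rm", "git", "curl"}


def evaluate_shell_command(command: str) -> Decision:
    """Decide on a command from its program name alone (first token)."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes and similar: let a human look at it.
        return Decision.ASK
    if not tokens:
        return Decision.ALLOW
    program = tokens[0].rsplit("/", 1)[-1]  # strip any leading directory
    if program in DENY_PROGRAMS:
        return Decision.DENY
    if program in ASK_PROGRAMS:
        return Decision.ASK
    return Decision.ALLOW
```

The point of the shape is the three-way result: a pre-tool hook gives the control plane a chance to return "ask" and wait for a human, which an after-the-fact log cannot do.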

Codex native

Codex has a different shape.

The native path still gives you the real Codex experience, but the bridge is more app-server oriented. Omnigent talks to a Codex remote/app-server transport, starts or steers turns, updates model and reasoning settings, and can interrupt active work. A forwarder mirrors the native output back to the Omnigent chat.

This matters for Cynthia because Codex is my default writer. If delivery becomes a real repository-writing stage, Codex needs a working shell, a real git environment, a workspace it can write to, and external gates that prove the work rather than trusting its self-report.
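One way to read "external gates" is: run a command outside the agent and trust only its exit code. The sketch below does that with the standard library. `GateResult`, `run_gate`, and `delivery_complete` are hypothetical names, and which commands count as gates (tests, linters, builds) is left to the caller.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class GateResult:
    name: str
    passed: bool
    exit_code: int


def run_gate(name: str, argv: Sequence[str], cwd: Optional[str] = None,
             timeout: float = 600) -> GateResult:
    """Run one gate command; it passes only if the process exits with code 0."""
    try:
        proc = subprocess.run(list(argv), cwd=cwd, capture_output=True,
                              text=True, timeout=timeout)
    except (subprocess.TimeoutExpired, OSError):
        # A gate that hangs or cannot start counts as failed, not as skipped.
        return GateResult(name, False, -1)
    return GateResult(name, proc.returncode == 0, proc.returncode)


def delivery_complete(results: Sequence[GateResult]) -> bool:
    """Completion requires at least one gate and no failures."""
    return bool(results) and all(r.passed for r in results)
```

For example, `run_gate("tests", [sys.executable, "-m", "pytest", "-q"], cwd=worktree)` would record the test suite's verdict regardless of what the writer claims in its final message. An empty gate list deliberately does not count as done.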

Codex is also where I learned a useful lesson about native integration: problems are often environmental, not prompt-level. On macOS, for example, the wrong git path can route through an Xcode shim and wedge under a sandbox. The correct fix is not “tell the agent not to use git”. The correct fix is to make the native execution environment resolve a real git binary.
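A rough sketch of that kind of environmental fix: walk PATH and return the first git that is not a known shim. It assumes the shim lives at `/usr/bin/git`, which is where the Xcode developer-tools stub sits on macOS, and it uses POSIX path conventions. The function name and the injectable `is_executable` check are hypothetical, not part of Omnigent.

```python
import os
from typing import Callable, Optional

# Assumption: this is the shim location on the machine in question.
SHIM_PATHS = frozenset({"/usr/bin/git"})


def _default_is_executable(path: str) -> bool:
    return os.path.isfile(path) and os.access(path, os.X_OK)


def resolve_git(path_env: str,
                is_executable: Callable[[str], bool] = _default_is_executable
                ) -> Optional[str]:
    """Return the first non-shim git found on a ':'-separated PATH, or None."""
    for directory in path_env.split(":"):
        if not directory:
            continue
        candidate = directory.rstrip("/") + "/git"
        if candidate in SHIM_PATHS:
            continue  # skip the shim even if it comes first on PATH
        if is_executable(candidate):
            return candidate
    return None
```

The wrapper would call something like `resolve_git(os.environ.get("PATH", ""))` once at launch and put the result's directory at the front of the PATH it hands to the native session, so the agent never has to be told anything about git.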

Antigravity native

Antigravity has been the newest and messiest one in my exploration, which makes it a good example of why native harness work is real engineering.

The aim is parity with Claude and Codex: run the real agy CLI/TUI, keep the native terminal usable, mirror activity into Omnigent, deliver web turns into the same conversation, and avoid polluting the user’s global config.

The details are subtle. Antigravity uses Gemini-related local state. On macOS, auth can depend on the real user home and keychain behavior, so relocating HOME breaks login. But writing Omnigent MCP config straight into the user’s real global Gemini config is also wrong. The better shape is to preserve real HOME for auth while using an isolated per-session Gemini directory for generated config.
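The shape described above can be sketched as an environment builder that copies the parent environment, leaves HOME alone, and points generated config at a per-session directory. The variable name `EXAMPLE_SESSION_GEMINI_DIR` is made up for this example. The post does not say how agy is actually told where to find its config, so treat that mechanism as an open assumption.

```python
from pathlib import Path
from typing import Mapping

# Hypothetical variable name, used only for illustration.
SESSION_CONFIG_VAR = "EXAMPLE_SESSION_GEMINI_DIR"


def build_native_env(base_env: Mapping[str, str], bridge_dir: str,
                     session_id: str) -> dict:
    """Return a child environment with real HOME and an isolated config dir."""
    env = dict(base_env)  # copy, so the caller's environment is never mutated
    # HOME is intentionally left untouched so keychain-backed auth keeps working.
    session_dir = Path(bridge_dir) / session_id / "gemini"
    session_dir.mkdir(parents=True, exist_ok=True)
    env[SESSION_CONFIG_VAR] = str(session_dir)
    return env
```

Generated MCP config would then be written under that session directory, and deleting the directory removes every trace of the session without touching the user's global files.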

Antigravity also differs on policy. Where Claude and Codex expose stronger pre-tool hooks, Antigravity’s currently usable surface is more limited. Some interactions can be surfaced and bridged, but tool-call policy may be audit-only depending on what agy exposes. That is not a reason to pretend parity exists. It is exactly the kind of limitation Cynthia should record honestly.

Kiro native

Kiro is the one I expect to use when the problem is less “write this code” and more “does this match the spec?”

Omnigent launches the real kiro-cli TUI and injects messages into it. Permission integration works differently again. Instead of a Claude-style hook or Codex-style app-server path, Kiro exposes useful approval traffic through its ACP recorder. Omnigent can watch that recorder and mirror one-time approval prompts into the chat as approval cards.

The terminal remains authoritative. The web card is additive. That is a subtle but important design choice: Omnigent is not pretending it owns a control surface Kiro did not give it. It observes the native signal, mirrors the useful part, and falls back to the terminal when that is the safer source of truth.

For Cynthia, Kiro is not the default writer. I think its first-class role is review: requirements, acceptance criteria, spec compliance, and product judgement.
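Taken together, the four harness sections describe different policy surfaces, and one honest way to record that is a small capability table that Cynthia consults before granting autonomy. The sketch below encodes only what this post says, with hypothetical names throughout. The Antigravity entry in particular reflects "may be audit-only" and would change with what agy exposes.

```python
from enum import Enum


class PolicyMode(Enum):
    ENFORCE = "enforce"        # tool calls can be evaluated before they run
    MIRROR = "mirror"          # native approval prompts are mirrored; terminal stays authoritative
    AUDIT_ONLY = "audit_only"  # activity is recorded, not blocked


# Assumption: a summary of this post's descriptions, not of vendor documentation.
HARNESS_POLICY = {
    "claude": PolicyMode.ENFORCE,
    "codex": PolicyMode.ENFORCE,
    "antigravity": PolicyMode.AUDIT_ONLY,
    "kiro": PolicyMode.MIRROR,
}


def can_run_unattended(harness: str, requires_pre_tool_block: bool) -> bool:
    """Allow unattended work only if the harness can give the guarantee the task needs."""
    # Unknown harnesses get the weakest mode rather than the benefit of the doubt.
    mode = HARNESS_POLICY.get(harness, PolicyMode.AUDIT_ONLY)
    return mode is PolicyMode.ENFORCE or not requires_pre_tool_block
```

With this table, a read-only research task can run on any harness, while a task that needs dangerous commands blocked before execution is only handed to a harness recorded as `ENFORCE`.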

The staged workbench

The broader Cynthia system is not just Council. Council is the decision engine, but the end state is a staged engineering loop.

Figure: Cynthia turns research into governed delivery through explicit stage artifacts.

The shape I want is:

Council: fan out research across Claude, Codex, Antigravity, and Kiro, then synthesize.

Plan: Claude turns the accepted decision into an implementation plan with files, tests, risks, and rollback.

Design: for frontend-heavy work, Claude becomes the design lead. It writes the UX brief, interaction model, visual acceptance criteria, and screenshot review checklist.

Delivery: Codex is the default sole writer. It implements against the approved plan and writes only inside the allowed worktree.

Review: Claude reviews the diff as the primary reviewer. Kiro can be added for spec compliance. Antigravity can be used for an independent black-box pass.

Repair loop: blocking findings go back to the writer for a bounded number of repair attempts. The loop needs explicit exit conditions: review passed, no progress, max iterations, failed gate, or human stop.

Status: Cynthia reports from artifacts, not vibes. The status stage should be able to tell me what happened, what passed, what is blocked, what decisions are pending, and which evidence supports the claim.

Later, Mission adds Temporal. Not to orchestrate every model turn. That stays in Omnigent. Temporal belongs at the coarse stage boundary: long waits, retries, schedules, durable approvals, external events, and mission checkpoints.
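The repair loop stage above names five exit conditions, and it is worth seeing that they fit in one small bounded loop. This is a sketch under stated assumptions: `review` and `repair` are caller-supplied callables, "no progress" is taken to mean the set of blocking findings did not change between two reviews, and all names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, FrozenSet


class Exit(Enum):
    REVIEW_PASSED = "review passed"
    NO_PROGRESS = "no progress"
    MAX_ITERATIONS = "max iterations"
    FAILED_GATE = "failed gate"
    HUMAN_STOP = "human stop"


@dataclass(frozen=True)
class Review:
    blocking_findings: FrozenSet[str]  # identifiers of blocking findings
    gates_passed: bool                 # verdict of the external gates


def repair_loop(review: Callable[[], Review],
                repair: Callable[[FrozenSet[str]], None],
                max_iterations: int = 3,
                human_stop: Callable[[], bool] = lambda: False) -> Exit:
    """Review, then repair, until one of the five exit conditions is hit."""
    previous = None
    repairs_done = 0
    while True:
        if human_stop():
            return Exit.HUMAN_STOP
        result = review()
        if not result.gates_passed:
            return Exit.FAILED_GATE
        if not result.blocking_findings:
            return Exit.REVIEW_PASSED
        if result.blocking_findings == previous:
            return Exit.NO_PROGRESS  # the last repair changed nothing
        if repairs_done >= max_iterations:
            return Exit.MAX_ITERATIONS
        previous = result.blocking_findings
        repair(result.blocking_findings)
        repairs_done += 1
```

Every path out of the loop returns a named reason, which is what lets the Status stage report why delivery stopped instead of just that it stopped.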

Where Claude should write

I said above that Codex is my default writer. I still think that is the right default because Cynthia benefits from a single-writer policy. If everyone edits the repo, review and accountability get messy fast.

But frontend work creates real pressure for an exception.

Claude often has better product taste. It is stronger at naming the interaction, noticing visual imbalance, and explaining why a UI feels wrong. For that reason, I want Cynthia to treat Claude as the design lead for frontend tasks by default.

That can mean two levels:

Default path: Claude writes the design brief and reviews screenshots, while Codex implements.
Opt-in path: Claude becomes the sole writer for a frontend delivery stage when design taste matters more than keeping Codex as the implementer.

The second path should be explicit. The single-writer rule still holds. It is not “Claude and Codex both edit until it looks good”. It is “choose the writer for this stage, then hold that writer accountable to the gates.”
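A small sketch of that rule as code, assuming roles are assigned once per delivery stage. The names `StageRoles`, `assign_roles`, and `may_write` are hypothetical, and rejecting the opt-in for non-frontend stages is this sketch's reading of the two paths above rather than a stated Cynthia policy.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_WRITER = "codex"
DESIGN_LEAD = "claude"


@dataclass(frozen=True)
class StageRoles:
    writer: str                 # exactly one writer per delivery stage
    design_lead: Optional[str]  # set only for frontend work


def assign_roles(frontend: bool, claude_writes: bool = False) -> StageRoles:
    """Pick the single writer for a stage; the Claude-writer path is explicit opt-in."""
    if claude_writes and not frontend:
        raise ValueError("the Claude-writer opt-in applies only to frontend delivery stages")
    if not frontend:
        return StageRoles(writer=DEFAULT_WRITER, design_lead=None)
    writer = DESIGN_LEAD if claude_writes else DEFAULT_WRITER
    return StageRoles(writer=writer, design_lead=DESIGN_LEAD)


def may_write(roles: StageRoles, agent: str) -> bool:
    """Only the stage's chosen writer may edit the worktree."""
    return agent == roles.writer
```

Because `StageRoles` holds one writer field rather than a list, "both edit until it looks good" is not representable. Switching writers means assigning roles for a new stage.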

What Omnigent gives me

The reason I am excited about this direction is that it lets each layer do the job it is actually good at.

Bosun is the chief of staff. It watches my boards, dispatches crewmates, and keeps me in the loop.

Cynthia is one of those crewmates, but a specialized one: a governed engineering workbench for research, planning, delivery, review, and status.

Omnigent is the substrate. It gives Cynthia sessions, native harnesses, subagents, terminals, policies, approvals, files, diffs, collaboration, and a UI I do not have to build.

The native harnesses are the bridge to the frontier labs’ actual coding agents. That is the part I do not want to lose by reducing everything to model APIs. The model is only one component. The harness around it is where a lot of the coding-agent behavior lives.

So the design principle for Cynthia is simple: do not rebuild the platform, and do not flatten native agents into plain completions. Use Omnigent as the control plane, use the labs’ native coding harnesses where they are strongest, and add the governance layer that turns a pile of clever sessions into a workflow I can trust.

What still needs proving

This is still an exploration, not a finished product.

The hard parts are not the diagrams. They are the boring edges:

Can each native harness expose enough policy surface for the level of autonomy I want?
Can Council candidates stay isolated while still getting enough repo context?
Can delivery prove completion through external gates instead of model self-report?
Can frontend review use screenshots and visual acceptance criteria without becoming subjective theatre?
Can Antigravity and Kiro reach the same practical reliability as Claude and Codex on my machine?
Can Temporal add durability later without becoming a second orchestration platform?

Those are the questions Cynthia is meant to answer incrementally.

The direction feels right because it matches how I actually work. I do not want one giant agent pretending to be everything. I want a governed crew: different native tools, each used for the job it is good at, with Omnigent keeping the sessions legible and Cynthia turning the work into evidence.

References and notes

Omnigent: the open-source control plane for coding agents and native harness sessions.
Polly and Debby: Omnigent example agent packs that helped make the YAML-first model click for me.
Cynthia: my Omnigent-native agent pack for Council, planning, delivery, review, status, and eventually mission durability.
Bosun: my chief-of-staff layer that can deploy Cynthia as a crewmate.
Claude Code, Codex CLI, Antigravity, Kiro: native coding harnesses, not just model endpoints.