Show Your Work: A Public-Facing Case Study of a Multi-Agent Production Cluster

2026-03-08

This page is a proof artifact, not a marketing summary. It uses current runtime records from our own system and keeps the source paths visible.

Evidence Sources

/Users/peter/agent-ops/coordination/auto-events.ndjson
/Users/peter/agent-ops/runs/request-events.ndjson
/Users/peter/agent-ops/session-state/workers.json
/Users/peter/agent-ops/session-state/routing-table.json
/Users/peter/agent-ops/session-state/active-sessions.json

Snapshot We Published

Latest observed autopilot tick in this case-study cycle:

timestamp: 2026-03-08T09:02:47Z
pending: 19
blocked: 7
done: 24
cancelled: 8
selected runs: 8
dispatch exit: 1

Parsed auto-events corpus for this run window:

rows processed: 552
dispatch exit histogram: 0:303, 1:75, 143:5, 241:1, 2:1
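A tally like the one above can be reproduced over auto-events.ndjson in a few lines. This is a minimal sketch, assuming each row is one JSON object and that the exit code sits in a field named dispatch_exit (a hypothetical name; the real schema may differ). Rows without that field count as processed but stay out of the histogram. That could be one reason the published histogram sums to 385 rather than 552, but this is an assumption, not something the logs above confirm.

```python
import json
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical field name; the real auto-events schema may differ.
EXIT_FIELD = "dispatch_exit"


def dispatch_exit_histogram(lines: Iterable[str]) -> Tuple[int, Counter]:
    """Count processed rows and tally dispatch exit codes from NDJSON lines."""
    rows = 0
    hist: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        rows += 1
        code = record.get(EXIT_FIELD)
        if code is not None:
            hist[int(code)] += 1  # rows without an exit code are not tallied
    return rows, hist


if __name__ == "__main__":
    sample = [
        '{"event": "tick", "dispatch_exit": 0}',
        '{"event": "tick", "dispatch_exit": 1}',
        '{"event": "note"}',
        '{"event": "tick", "dispatch_exit": 0}',
    ]
    rows, hist = dispatch_exit_histogram(sample)
    # Prints "4 0:2, 1:1", most frequent code first, like the published line.
    print(rows, ", ".join(f"{c}:{n}" for c, n in hist.most_common()))
```

Against the real file, pass an open handle (with open(path) as f) instead of the sample list. When reading the published numbers, note that exit 143 conventionally means a process was terminated by SIGTERM (128 + 15).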

Outcomes of run_finished events by executor, from request-events.ndjson:

codex: 434 done / 146 blocked
claude-code: 69 done / 85 blocked
gemini: 53 done / 7 blocked
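The per-executor split can be computed the same way, and turning counts into a done rate makes the executors comparable despite very different volumes. This sketch assumes a hypothetical event shape, {"type": "run_finished", "executor": ..., "status": "done" | "blocked"}; the actual field names in request-events.ndjson may differ.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable

# Hypothetical schema: {"type": "run_finished", "executor": "...", "status": "done" | "blocked"}


def outcomes_by_executor(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Tally done/blocked run_finished events per executor."""
    tally: Dict[str, Dict[str, int]] = defaultdict(lambda: {"done": 0, "blocked": 0})
    for line in lines:
        if not line.strip():
            continue
        event = json.loads(line)
        if event.get("type") != "run_finished":
            continue  # ignore run starts, heartbeats, and other event types
        status = event.get("status")
        if status in ("done", "blocked"):
            tally[event.get("executor", "unknown")][status] += 1
    return dict(tally)


def done_rate(counts: Dict[str, int]) -> float:
    """Share of finished runs closed as done (0.0 when nothing finished)."""
    total = counts["done"] + counts["blocked"]
    return counts["done"] / total if total else 0.0


if __name__ == "__main__":
    # Feed the published totals back in to show the resulting rates.
    published = {
        "codex": {"done": 434, "blocked": 146},
        "claude-code": {"done": 69, "blocked": 85},
        "gemini": {"done": 53, "blocked": 7},
    }
    for name, counts in published.items():
        print(f"{name}: {done_rate(counts):.1%} done")
```

With the published totals this prints 74.8% for codex, 44.8% for claude-code, and 88.3% for gemini.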

What This Shows

The loop is real and active: work is continuously selected, executed, and closed.
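Volume and completion rate are both uneven by executor: codex finished 580 runs with about 75% done, claude-code 154 with about 45% done, and gemini 60 with about 88% done. That spread is useful operationally because it drives routing changes.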
The primary reliability risks are not hidden; they appear explicitly as block categories in event logs.

Most frequent blockers in this dataset slice:

executor_exit
executor_timeout
guardrail_done_contract
non_retryable_pattern
executor_stream_disconnect
executor_sdk_unavailable
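A ranked blocker list like this one falls out of a frequency count over blocked events. This is a sketch only, assuming a hypothetical block_reason field on each blocked record; the real logs may store the category under another name.

```python
import json
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical field holding the block category on blocked run events.
REASON_FIELD = "block_reason"


def top_blockers(lines: Iterable[str], n: int = 6) -> List[Tuple[str, int]]:
    """Return the n most frequent block categories, most common first."""
    reasons: Counter = Counter()
    for line in lines:
        if not line.strip():
            continue
        event = json.loads(line)
        reason = event.get(REASON_FIELD)
        if reason:  # skip events that carry no block category
            reasons[reason] += 1
    return reasons.most_common(n)
```

Keeping the counts next to the names, rather than only the ranked names, would also let readers see how far the top blocker leads the rest.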

Adjacent Improvement From This Publishing Pass

This case study is now linked from the WuKong offer page and from the operator CTA section of the homepage. We also added a regression assertion so that future edits to the WuKong page cannot silently drop this proof link.
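A link-presence assertion of that kind can be written against the rendered page HTML with only the standard library. This is a minimal sketch, not our actual test: the case-study path /case-studies/show-your-work and the test name are hypothetical, and a real suite would load the built WuKong page instead of an inline string.

```python
from html.parser import HTMLParser
from typing import List

# Hypothetical slug for this case study; the real route may differ.
CASE_STUDY_PATH = "/case-studies/show-your-work"


class LinkCollector(HTMLParser):
    """Collect every href attribute found on <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def page_links_to(html: str, path: str = CASE_STUDY_PATH) -> bool:
    """True if any <a href> on the page ends with the given path."""
    collector = LinkCollector()
    collector.feed(html)
    # Ignore a trailing slash so /x and /x/ both count as the same link.
    return any(href.rstrip("/").endswith(path) for href in collector.hrefs)


def test_wukong_page_keeps_proof_link():
    # Stand-in HTML; a real test would read the built WuKong offer page.
    html = '<main><a href="https://example.com/case-studies/show-your-work/">Proof</a></main>'
    assert page_links_to(html)
```

Checking parsed href attributes rather than searching the raw text means a path that only appears in body copy does not count as a link.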

Why Publish It This Way

A cluster system is only credible if outsiders can inspect concrete receipts. This page is designed so a reader can move from claim to source file quickly.