Team Pilot Playbook: How We Run a 14-Day Autonomous Repo Engagement

This is the operator-facing playbook we use when onboarding a new repository into the WuKong multi-agent system. It covers discovery, setup, execution, and handoff — with the exact steps, deliverables, and decision points we follow internally.

Who This Is For

Engineering leads or founders who want to see what autonomous agents actually do to a production codebase before committing to a longer engagement. You get bounded scope, real commits, and a clear "was this worth it?" answer in two weeks.

The 14-Day Structure

Days 1–2: Discovery & Connect

What happens: We connect to your repository, run the agent system's initial audit, and produce a prioritized findings report.

- Repository connected to the operator loop (read/write access via deploy key)
- Automated codebase scan: dependency health, test coverage gaps, type errors, dead code detection
- Initial findings report with severity-ranked items

Deliverable: Discovery report with 10–30 prioritized items, each tagged with estimated complexity and risk level.

Decision point: You review the findings and select which items to tackle. We recommend starting with 3–5 high-confidence, low-risk items to establish the feedback loop.

Days 3–10: Bounded Execution

What happens: Autonomous agents work through the selected items. Each item produces a verified PR.

Agents operate in isolated worktrees — your main branch stays clean until you merge.
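The worktree isolation pattern can be sketched with plain git commands. This is a minimal illustrative example (the repo and branch names are hypothetical, not the actual WuKong setup): each agent task gets its own working directory on a dedicated branch, so nothing touches the checked-out main branch.

```shell
# Minimal sketch of per-task worktree isolation (names are illustrative).
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git -c user.email=agent@example.com -c user.name=agent \
    commit -q --allow-empty -m "init"

# Each agent task gets its own worktree on a dedicated branch:
git worktree add -q ../agent-task-123 -b agent/task-123

# All edits happen inside ../agent-task-123; main stays untouched
# until the resulting PR is reviewed and merged.
git worktree list
```

Because worktrees share one object store, branches stay cheap while each task keeps a fully independent checkout.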

Cadence: Async summaries whenever something ships, gets blocked, or needs human input. We don't send scheduled progress updates, only results and decision requests.

Typical output: 10–20 merged PRs across the engagement, depending on item complexity. Each PR includes verification evidence (test output, type check, lint results).
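The verification-evidence step can be sketched as a simple gate script: every check must pass, and its output is captured into an evidence file attached to the PR. The placeholder `echo` commands below are illustrative stand-ins; a real TypeScript repo would run something like `npm test`, `npx tsc --noEmit`, and `npx eslint .` instead.

```shell
# Sketch of a verification gate: run each check, capture evidence,
# and fail the whole gate if any check fails (via set -e).
set -e
evidence=$(mktemp)

run_check () {
  # Record the check's name and full output as PR evidence.
  name=$1; shift
  { echo "== $name =="; "$@"; } >> "$evidence" 2>&1
}

# Placeholder commands; substitute your repo's real scripts, e.g.:
#   run_check tests     npm test
#   run_check typecheck npx tsc --noEmit
#   run_check lint      npx eslint .
run_check tests     echo "all tests passed"
run_check typecheck echo "0 type errors"
run_check lint      echo "0 lint warnings"

cat "$evidence"
```

The design point is that evidence is a byproduct of the gate itself, so a PR cannot be opened without the corresponding check output.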

Days 11–13: Proof Assembly

What happens: We compile the engagement results into a single proof artifact.

Deliverable: Public-facing proof artifact you can share with your team or use in vendor evaluations.

Day 14: Handoff

What happens: Final handoff package:

- Intake notes and prioritized backlog for continued work
- Agent configuration tuned to your repo's patterns
- Buyer-ready recap: one page summarizing outcomes for non-technical stakeholders

What We Don't Do

- We don't touch authentication, payments, or security-critical code without explicit human review on each PR.
- We don't refactor for the sake of refactoring. Every change ties to a concrete finding from the discovery phase.
- We don't send daily standup emails. You get results or silence. If we need something from you, we ask once with full context.

Pricing

Outcome-based: you pay per merged PR that passes your review. No merged PR, no charge. Typical range: $5–25 per PR depending on complexity tier.

No setup fees. No monthly commitments. The 14-day pilot is the evaluation period — if the output isn't worth it, you stop.

The 30-Day Team Pilot (Extended)

For teams that need deeper integration:

- Two repository lanes with clear routing and run-health checks
- Shared proof bundle: feed, case study, and outcome summary
- Operator handoff with intake notes, next-priority backlog, and buyer-ready recap

Same outcome-based pricing. The extended timeline lets agents accumulate context across related repositories and tackle cross-cutting concerns.

How to Start

Send a pilot intake email to peter@wukongai.io with:

  1. Repository URL (or description if private)
  2. One paragraph on what's frustrating about the codebase right now
  3. Any areas that are off-limits

We'll respond within 24 hours with a discovery timeline and connection instructions.

Why This Works

The same multi-agent system described in our coding audit and case study runs these pilots. The difference: instead of maintaining our own repositories, it maintains yours.

The agents already know how to:

- Find and fix TypeScript/JavaScript issues autonomously
- Run tests, respect type gates, and produce clean PRs
- Write state files so the next session picks up where the last left off
- Escalate genuinely ambiguous decisions instead of guessing
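The state-file idea can be illustrated with a small sketch. The schema below is purely hypothetical (not the actual WuKong format): the point is that a session writes down where it stopped, so the next session can resume deterministically instead of re-discovering context.

```shell
# Hypothetical session state file; the fields shown are illustrative,
# not the actual WuKong schema.
state=$(mktemp)
cat > "$state" <<'EOF'
{
  "task": "fix-type-errors",
  "last_commit": "abc1234",
  "remaining_files": ["src/api.ts"],
  "notes": "tsc clean except src/api.ts; resume there"
}
EOF

# The next session starts by reading the state to pick up where
# the previous one left off:
cat "$state"
```

A few structured fields (what was done, where to resume, why) are enough; the file travels with the branch, so any session can reconstruct the working context from it.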

We built this system by running it on ourselves for three months. The pilot is how you find out if it works on your codebase too.