Team Pilot Playbook: How We Run a 14-Day Autonomous Repo Engagement
This is the operator-facing playbook we use when onboarding a new repository into the WuKong multi-agent system. It covers discovery, setup, execution, and handoff — with the exact steps, deliverables, and decision points we follow internally.
Who This Is For
Engineering leads or founders who want to see what autonomous agents actually do to a production codebase before committing to a longer engagement. You get bounded scope, real commits, and a clear "was this worth it?" answer in two weeks.
The 14-Day Structure
Days 1–2: Discovery & Connect
What happens: We connect to your repository, run the agent system's initial audit, and produce a prioritized findings report.
- Repository connected to the operator loop (read/write access via deploy key)
- Automated codebase scan: dependency health, test coverage gaps, type errors, dead code detection
- Initial findings report with severity-ranked items
Deliverable: Discovery report with 10–30 prioritized items, each tagged with estimated complexity and risk level.
Decision point: You review the findings and select which items to tackle. We recommend starting with 3–5 high-confidence, low-risk items to establish the feedback loop.
Days 3–10: Bounded Execution
What happens: Autonomous agents work through the selected items. Each completed item produces a verified PR with:
- Narrow test coverage for touched paths
- Type-check passing
- Commit messages referencing the discovery item
Agents operate in isolated worktrees — your main branch stays clean until you merge.
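This isolation relies on git's worktree mechanism: each item gets its own checkout on its own branch, sharing one object store with the main clone. A minimal, self-contained sketch (the item and branch names are hypothetical, and the throwaway repo stands in for the connected repository):

```shell
set -e
# Setup: a throwaway repo standing in for the connected repository.
repo=$(mktemp -d)/demo
git init -q "$repo" && cd "$repo"
git -c user.name=op -c user.email=op@example.com commit -q --allow-empty -m "init"

# Each agent item gets its own worktree on its own branch, so the
# default branch never sees work in progress until the PR merges.
git worktree add ../wt-item-042 -b agent/item-042
( cd ../wt-item-042
  echo "fix" > notes.txt
  git add notes.txt
  git -c user.name=agent -c user.email=agent@example.com \
      commit -q -m "item-042: apply discovery fix" )

# The change lives on agent/item-042; the main checkout is untouched.
git worktree remove ../wt-item-042
git rev-parse --verify agent/item-042
```

Because worktrees share one object database, spinning one up per item is cheap, and removing the worktree after merge leaves the branch history intact.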
Cadence: Async summaries whenever there's something to report: what shipped, what's blocked, and what needs human input. We don't send empty progress updates, only results and decision requests.
Typical output: 10–20 merged PRs across the engagement, depending on item complexity. Each PR includes verification evidence (test output, type check, lint results).
Days 11–13: Proof Assembly
What happens: We compile the engagement results into a proof artifact:
- Every PR with before/after metrics
- Aggregate impact: lines changed, tests added, issues closed
- Blocker log: what required human intervention and why
- Honest assessment: what worked, what didn't, what we'd do differently
Deliverable: Public-facing proof artifact you can share with your team or use in vendor evaluations.
Day 14: Handoff
What happens: Final handoff package:
- Intake notes and prioritized backlog for continued work
- Agent configuration tuned to your repo's patterns
- Buyer-ready recap: one page summarizing outcomes for non-technical stakeholders
What We Don't Do
- We don't touch authentication, payments, or security-critical code without explicit human review on each PR.
- We don't refactor for the sake of refactoring. Every change ties to a concrete finding from the discovery phase.
- We don't send daily standup emails. You get results or silence. If we need something from you, we ask once with full context.
Pricing
Outcome-based: you pay per merged PR that passes your review. No merged PR, no charge. Typical range: $5–25 per PR depending on complexity tier.
No setup fees. No monthly commitments. The 14-day pilot is the evaluation period — if the output isn't worth it, you stop.
The 30-Day Team Pilot (Extended)
For teams that need deeper integration:
- Two repository lanes with clear routing and run-health checks
- Shared proof bundle: feed, case study, and outcome summary
- Operator handoff with intake notes, next-priority backlog, and buyer-ready recap
Same outcome-based pricing. The extended timeline lets agents accumulate context across related repositories and tackle cross-cutting concerns.
How to Start
Send a pilot intake email to peter@wukongai.io with:
- Repository URL (or description if private)
- One paragraph on what's frustrating about the codebase right now
- Any areas that are off-limits
We'll respond with a discovery timeline and connection steps within 24 hours.
Why This Works
The same multi-agent system described in our coding audit and case study runs these pilots. The difference: instead of maintaining our own repositories, it maintains yours.
The agents already know how to:
- Find and fix TypeScript/JavaScript issues autonomously
- Run tests, respect type gates, and produce clean PRs
- Write state files so the next session picks up where the last left off
- Escalate genuinely ambiguous decisions instead of guessing
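The state-file handoff can be sketched as below. The file location and every field name here are illustrative assumptions, not the system's actual format: the point is only that session N writes a small machine-readable record and session N+1 reads it before doing anything else.

```shell
set -e
# Hypothetical session-state file written at the end of an agent session.
state=$(mktemp -d)/agent-state.json
cat > "$state" <<'EOF'
{
  "last_item": "item-042",
  "status": "pr-open",
  "next": ["item-043", "item-047"],
  "blocked_on": null
}
EOF

# Next session: read the state to resume where the last one left off
# (python3 used here as a portable JSON reader).
python3 -c 'import json, sys
s = json.load(open(sys.argv[1]))
print(s["last_item"], s["status"])' "$state"
```

Keeping the record this small is the design choice: it carries decisions and pointers, not transcripts, so any session can rebuild context from the repo itself plus a few lines of state.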
We built this system by running it on ourselves for three months. The pilot is how you find out if it works on your codebase too.