From 95aa5ef15c3bb942c8557753d2a9d3254d6caeee Mon Sep 17 00:00:00 2001 From: Yeachan-Heo Date: Fri, 3 Apr 2026 14:46:06 +0000 Subject: [PATCH] docs: add clawable harness roadmap --- ROADMAP.md | 331 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 331 insertions(+) create mode 100644 ROADMAP.md diff --git a/ROADMAP.md b/ROADMAP.md new file mode 100644 index 0000000..fec520c --- /dev/null +++ b/ROADMAP.md @@ -0,0 +1,331 @@ +# ROADMAP.md + +# Clawable Coding Harness Roadmap + +## Goal + +Turn claw-code into the most **clawable** coding harness: +- no human-first terminal assumptions +- no fragile prompt injection timing +- no opaque session state +- no hidden plugin or MCP failures +- no manual babysitting for routine recovery + +This roadmap assumes the primary users are **claws wired through hooks, plugins, sessions, and channel events**. + +## Definition of "clawable" + +A clawable harness is: +- deterministic to start +- machine-readable in state and failure modes +- recoverable without a human watching the terminal +- branch/test/worktree aware +- plugin/MCP lifecycle aware +- event-first, not log-first +- capable of autonomous next-step execution + +## Current Pain Points + +### 1. Session boot is fragile +- trust prompts can block TUI startup +- prompts can land in the shell instead of the coding agent +- "session exists" does not mean "session is ready" + +### 2. Truth is split across layers +- tmux state +- clawhip event stream +- git/worktree state +- test state +- gateway/plugin/MCP runtime state + +### 3. Events are too log-shaped +- claws currently infer too much from noisy text +- important states are not normalized into machine-readable events + +### 4. Recovery loops are too manual +- restart worker +- accept trust prompt +- re-inject prompt +- detect stale branch +- retry failed startup +- classify infra vs code failures manually + +### 5. Branch freshness is not enforced enough +- side branches can miss already-landed main fixes +- broad test failures can be stale-branch noise instead of real regressions + +### 6. Plugin/MCP failures are under-classified +- startup failures, handshake failures, config errors, partial startup, and degraded mode are not exposed cleanly enough + +### 7. Human UX still leaks into claw workflows +- too much depends on terminal/TUI behavior instead of explicit agent state transitions and control APIs + +## Product Principles + +1. **State machine first** — every worker has explicit lifecycle states. +2. **Events over scraped prose** — channel output should be derived from typed events. +3. **Recovery before escalation** — known failure modes should auto-heal once before asking for help. +4. **Branch freshness before blame** — detect stale branches before treating red tests as new regressions. +5. **Partial success is first-class** — e.g. MCP startup can succeed for some servers and fail for others, with structured degraded-mode reporting. +6. **Terminal is transport, not truth** — tmux/TUI may remain implementation details, but orchestration state must live above them. +7. **Policy is executable** — merge, retry, rebase, stale cleanup, and escalation rules should be machine-enforced. + +## Roadmap + +## Phase 1 — Reliable Worker Boot + +### 1. Ready-handshake lifecycle for coding workers +Add explicit states: +- `spawning` +- `trust_required` +- `ready_for_prompt` +- `prompt_accepted` +- `running` +- `blocked` +- `finished` +- `failed` + +Acceptance: +- prompts are never sent before `ready_for_prompt` +- trust prompt state is detectable and emitted +- shell misdelivery becomes detectable as a first-class failure state + +### 2. Trust prompt resolver +Add allowlisted auto-trust behavior for known repos/worktrees. + +Acceptance: +- trusted repos auto-clear trust prompts +- events emitted for `trust_required` and `trust_resolved` +- non-allowlisted repos remain gated + +### 3. Structured session control API +Provide machine control above tmux: +- create worker +- await ready +- send task +- fetch state +- fetch last error +- restart worker +- terminate worker + +Acceptance: +- a claw can operate a coding worker without raw send-keys as the primary control plane + +## Phase 2 — Event-Native Clawhip Integration + +### 4. Canonical lane event schema +Define typed events such as: +- `lane.started` +- `lane.ready` +- `lane.prompt_misdelivery` +- `lane.blocked` +- `lane.red` +- `lane.green` +- `lane.commit.created` +- `lane.pr.opened` +- `lane.merge.ready` +- `lane.finished` +- `lane.failed` +- `branch.stale_against_main` + +Acceptance: +- clawhip consumes typed lane events +- Discord summaries are rendered from structured events instead of pane scraping alone + +### 5. Failure taxonomy +Normalize failure classes: +- `prompt_delivery` +- `trust_gate` +- `branch_divergence` +- `compile` +- `test` +- `plugin_startup` +- `mcp_startup` +- `mcp_handshake` +- `gateway_routing` +- `tool_runtime` +- `infra` + +Acceptance: +- blockers are machine-classified +- dashboards and retry policies can branch on failure type + +### 6. Actionable summary compression +Collapse noisy event streams into: +- current phase +- last successful checkpoint +- current blocker +- recommended next recovery action + +Acceptance: +- channel status updates stay short and machine-grounded +- claws stop inferring state from raw build spam + +## Phase 3 — Branch/Test Awareness and Auto-Recovery + +### 7. Stale-branch detection before broad verification +Before broad test runs, compare current branch to `main` and detect if known fixes are missing. + +Acceptance: +- emit `branch.stale_against_main` +- suggest or auto-run rebase/merge-forward according to policy +- avoid misclassifying stale-branch failures as new regressions + +### 8. Recovery recipes for common failures +Encode known automatic recoveries for: +- trust prompt unresolved +- prompt delivered to shell +- stale branch +- compile red after cross-crate refactor +- MCP startup handshake failure +- partial plugin startup + +Acceptance: +- one automatic recovery attempt occurs before escalation +- the attempted recovery is itself emitted as structured event data + +### 9. Green-ness contract +Workers should distinguish: +- targeted tests green +- package green +- workspace green +- merge-ready green + +Acceptance: +- no more ambiguous "tests passed" messaging +- merge policy can require the correct green level for the lane type + +## Phase 4 — Claws-First Task Execution + +### 10. Typed task packet format +Define a structured task packet with fields like: +- objective +- scope +- repo/worktree +- branch policy +- acceptance tests +- commit policy +- reporting contract +- escalation policy + +Acceptance: +- claws can dispatch work without relying on long natural-language prompt blobs alone +- task packets can be logged, retried, and transformed safely + +### 11. Policy engine for autonomous coding +Encode automation rules such as: +- if green + scoped diff + review passed -> merge to dev +- if stale branch -> merge-forward before broad tests +- if startup blocked -> recover once, then escalate +- if lane completed -> emit closeout and cleanup session + +Acceptance: +- doctrine moves from chat instructions into executable rules + +### 12. Claw-native dashboards / lane board +Expose a machine-readable board of: +- repos +- active claws +- worktrees +- branch freshness +- red/green state +- current blocker +- merge readiness +- last meaningful event + +Acceptance: +- claws can query status directly +- human-facing views become a rendering layer, not the source of truth + +## Phase 5 — Plugin and MCP Lifecycle Maturity + +### 13. First-class plugin/MCP lifecycle contract +Each plugin/MCP integration should expose: +- config validation contract +- startup healthcheck +- discovery result +- degraded-mode behavior +- shutdown/cleanup contract + +Acceptance: +- partial-startup and per-server failures are reported structurally +- successful servers remain usable even when one server fails + +### 14. MCP end-to-end lifecycle parity +Close gaps from: +- config load +- server registration +- spawn/connect +- initialize handshake +- tool/resource discovery +- invocation path +- error surfacing +- shutdown/cleanup + +Acceptance: +- parity harness and runtime tests cover healthy and degraded startup cases +- broken servers are surfaced as structured failures, not opaque warnings + +## Immediate Backlog (from current real pain) + +1. Worker readiness handshake + trust resolution +2. Prompt misdelivery detection and recovery +3. Canonical lane event schema in clawhip +4. Failure taxonomy + blocker normalization +5. Stale-branch detection before workspace tests +6. MCP structured degraded-startup reporting +7. Structured task packet format +8. Lane board / machine-readable status API + +## Suggested Session Split + +### Session A — worker boot protocol +Focus: +- trust prompt detection +- ready-for-prompt handshake +- prompt misdelivery detection + +### Session B — clawhip lane events +Focus: +- canonical lane event schema +- failure taxonomy +- summary compression + +### Session C — branch/test intelligence +Focus: +- stale-branch detection +- green-level contract +- recovery recipes + +### Session D — MCP lifecycle hardening +Focus: +- startup/handshake reliability +- structured failed server reporting +- degraded-mode runtime behavior +- lifecycle tests/harness coverage + +### Session E — typed task packets + policy engine +Focus: +- structured task format +- retry/merge/escalation rules +- autonomous lane closure behavior + +## MVP Success Criteria + +We should consider claw-code materially more clawable when: +- a claw can start a worker and know with certainty when it is ready +- claws no longer accidentally type tasks into the shell +- stale-branch failures are identified before they waste debugging time +- clawhip reports machine states, not just tmux prose +- MCP/plugin startup failures are classified and surfaced cleanly +- a coding lane can self-recover from common startup and branch issues without human babysitting + +## Short Version + +claw-code should evolve from: +- a CLI a human can also drive + +to: +- a **claw-native execution runtime** +- an **event-native orchestration substrate** +- a **plugin/hook-first autonomous coding harness**