Give the proven agentic board-drain an operator front door — a first-class `dossier-runtime drain` subcommand, offline-by-default with explicit `--agentic` opt-in
0063-agentic-drain-operator-cli-front-door
- Reversibility
- two-way door
DEC-0063 — The agentic drain gets an operator front door (dossier-runtime drain)
Reversibility: two-way door. The change is purely additive — one new CLI subcommand and one new transport-resolver function. The default behavior of every existing command is unchanged, the default drain worker is the offline deterministic one (no spend, no network), and the whole thing can be removed without touching the loop core it composes. Nothing is locked in.
Context — the proven loop had no operator-facing door
The act-and-learn loop is real and field-proven: the board governance core (DEC-0024) plus the live persona-grounded multi-turn session (DEC-0053) ran end-to-end on both transports. But the only way to actually run the agentic drain was a hand-run harness script: `scripts/agency-phase0-live.mjs` / `agency-phase0-dogfood.mjs`. The shipped runner `scripts/board-drain.mjs` deliberately runs the offline default worker, not the agentic one, and isn't even a pnpm script. The control-plane CLI (`dossier-runtime`) wired `provision` / `run` / `site` / `review-queue` / `approve` / `reject`, but not the drain. So the platform's most impressive proven capability had no front door: an operator could provision a tenant and dispose a review task, but could not run the loop that produces one without authoring a script.
This is the "built to the seam, then stops at a harness" gap the FDE global read flagged as the single highest-leverage next move (verified against cli.ts, board-drain.mjs, and the harness scripts).
The one open architecture call — default offline or default agentic?
The FDE surfaced exactly one decision worth recording: when an operator types dossier-runtime drain, does the default run the deterministic offline worker, or the live spending agent?
Decision: default OFFLINE; the live agent is explicit opt-in (--agentic). This is the conservative, two-way-door choice and it is forced by the standing non-negotiables:
- Offline by construction (DEC-0008 §6). The default must never reach the network or spend a credit. A bare `drain` is a safe, free, deterministic claim→transition pass, the same posture `board-drain.mjs` already takes.
- No silent network (the auth-seam posture, DEC-0019). `--agentic` resolves its transport through the same `resolveAuthFromEnv` gate `run` uses: it refuses with a precise message unless `--subscription` (Claude sub, no key) or `ANTHROPIC_API_KEY` (the SDK transport) is present. Spend is opt-in and gated, never a default surprise.
- Two-way door first. Defaulting to the cheap, reversible path and making the spending path explicit is the right asymmetry: a wrong free drain costs nothing; a surprise paid drain costs trust and credits.
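The gating described in the list above can be sketched as a pure function. This is a minimal illustration, not the shipped code: `resolveDrainMode`, the `DrainFlags` and `DrainMode` shapes, the refusal wording, and the precedence of `--subscription` over `ANTHROPIC_API_KEY` are all assumptions. The record only states that `--agentic` goes through `resolveAuthFromEnv` and refuses when neither transport is present.

```typescript
// Hypothetical shapes for illustration; the real resolveAuthFromEnv signature
// is not shown in this record.
type Auth =
  | { kind: "subscription" }
  | { kind: "api-key"; apiKey: string };

type DrainMode =
  | { mode: "offline" }
  | { mode: "agentic"; auth: Auth }
  | { mode: "refused"; message: string };

interface DrainFlags {
  agentic: boolean;      // --agentic
  subscription: boolean; // --subscription
}

function resolveDrainMode(
  flags: DrainFlags,
  env: Record<string, string | undefined>,
): DrainMode {
  // Bare `drain`: credentials are never inspected, so an API key that happens
  // to be in the environment cannot turn a free drain into a paid one.
  if (!flags.agentic) {
    return { mode: "offline" };
  }

  // Assumption: an explicit --subscription flag wins over an API key.
  if (flags.subscription) {
    return { mode: "agentic", auth: { kind: "subscription" } };
  }

  const apiKey = env["ANTHROPIC_API_KEY"];
  if (apiKey !== undefined && apiKey !== "") {
    return { mode: "agentic", auth: { kind: "api-key", apiKey } };
  }

  // No transport: refuse instead of silently falling back or reaching out.
  return {
    mode: "refused",
    message:
      "drain --agentic needs a transport: pass --subscription or set ANTHROPIC_API_KEY",
  };
}
```

Keeping the offline branch first means the spending path is reachable only through an explicit flag, which is the asymmetry the decision argues for.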
Decision
Add `dossier-runtime drain --root <dir> --client <id>` as a first-class subcommand (registered in the single `SUBCOMMANDS` source so the `Sub` type, the `main()` switch, and `isSub` cannot drift; this is the DEC-0039 follow-up contract):
- Default → `DefaultBoardWorker` via `drainBoardSerialized`: offline, deterministic, no LLM, no network, no spend.
- `--agentic` → the live `AgentSdkBoardWorker` over a `createLiveSession`, transport resolved through the auth seam; bounded by `--max-turns` / `--max-budget` (Inv 5 tier-one per-session cap), model via `--model` (default `haiku`).
- Both paths run through `drainBoardSerialized`, so Invariant 4 (per-tenant drain lock: one drain at a time) and Invariant 5 (the per-tenant budget envelope + board-pause kill switch) hold unchanged. A drain that is serialized-out or over-budget is a clean no-op (exit 0), mirroring `board-drain.mjs`'s paused-exits-0 posture.
- The agent transitions work only to `review`, never `done`. The operator closes the loop with the existing `review-queue` → `approve` / `reject` verbs (Invariant 3, the human gate). The drain command emits the exact `review-queue` follow-up command when any task reaches `review`.
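The "single `SUBCOMMANDS` source" contract named in this decision is commonly expressed in TypeScript as one `as const` tuple from which the union type, the type guard, and an exhaustive switch are all derived. The sketch below assumes that shape. The subcommand names come from this record, but the `dispatch` function and its return values are placeholders, not the real `main()`.

```typescript
// One tuple is the only place a subcommand name is written down.
const SUBCOMMANDS = [
  "provision",
  "run",
  "site",
  "review-queue",
  "approve",
  "reject",
  "drain",
] as const;

// The union type is derived from the tuple, so it cannot drift from it.
type Sub = (typeof SUBCOMMANDS)[number];

// The guard checks membership against the same tuple.
function isSub(value: string): value is Sub {
  return SUBCOMMANDS.some((s) => s === value);
}

// Placeholder dispatcher: the `never` assignment in the default branch makes
// the compiler reject the build if a name is added to SUBCOMMANDS without a case.
function dispatch(sub: Sub): string {
  switch (sub) {
    case "provision":
    case "run":
    case "site":
    case "review-queue":
    case "approve":
    case "reject":
      return "existing:" + sub;
    case "drain":
      return "drain";
    default: {
      const unreachable: never = sub;
      return unreachable;
    }
  }
}
```

With this arrangement, adding `drain` to the tuple without a matching case fails at typecheck, not at runtime.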
Single-source-of-truth: resolveLiveRunner(auth)
The transport→runner mapping is centralized in `auth.ts` as `resolveLiveRunner(auth)`, the live-session analogue of `resolveClaudeClient`: subscription → `cliTurnRunner`, api-key → `agentSdkTurnRunner({apiKey})`, mock → throws (a live drain needs a real transport; the offline path uses the default worker, not a runner). This is the same insulation DEC-0019 established for one-shot extraction, now extended to the multi-turn drain, so the CLI, the future plugin, and any runtime can never drift on how a transport becomes a runner. Resolving a runner is offline-safe: both factories touch the network/optional-SDK only when a turn actually runs.
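The mapping can be sketched as a switch over a discriminated union. This is an illustration under assumptions: the `Auth` and `TurnRunner` shapes are hypothetical, and `cliTurnRunner` / `agentSdkTurnRunner` here are local stand-ins that only echo their input. The real factories are presumably asynchronous, and they are not reproduced in this record.

```typescript
// Hypothetical shapes; only the three transport kinds come from the record.
type Auth =
  | { kind: "subscription" }
  | { kind: "api-key"; apiKey: string }
  | { kind: "mock" };

interface TurnRunner {
  transport: "cli" | "agent-sdk";
  // Simplified to a synchronous call for the sketch.
  runTurn(prompt: string): string;
}

// Stand-in factory: constructing it does no I/O; work happens only in runTurn.
function cliTurnRunner(): TurnRunner {
  return { transport: "cli", runTurn: (prompt) => "cli:" + prompt };
}

// Stand-in factory: the key is captured now and used only when a turn runs.
function agentSdkTurnRunner(opts: { apiKey: string }): TurnRunner {
  return {
    transport: "agent-sdk",
    runTurn: (prompt) => "sdk(" + opts.apiKey.length + "-char key):" + prompt,
  };
}

function resolveLiveRunner(auth: Auth): TurnRunner {
  switch (auth.kind) {
    case "subscription":
      return cliTurnRunner();
    case "api-key":
      return agentSdkTurnRunner({ apiKey: auth.apiKey });
    case "mock":
      // The offline path uses the default worker, so a mock transport
      // reaching this function is a caller error.
      throw new Error("a live drain needs a real transport (subscription or api-key)");
    default: {
      const unreachable: never = auth;
      throw new Error("unknown auth: " + String(unreachable));
    }
  }
}
```

Because the factories do nothing until `runTurn` is called, resolving a runner stays offline-safe in this sketch, matching the property the record claims for the real seam.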
Why this was assembly, not a build
Every dependency was already green and verified: drainBoardSerialized, AgentSdkBoardWorker, createLiveSession, the auth seam, the disposition verbs. No new runtime mechanism was introduced — the work was wiring proven pieces into the operator surface plus one small resolver. That is precisely why it was the highest-leverage move: it converts a capability that was verified-in-a-script into one that is verified-by-an-operator, at additive, reversible cost.
Proven this session (no fabricated status)
- `pnpm --filter @dossier/runtime typecheck`: clean.
- `pnpm --filter @dossier/runtime test`: 130 passed, 1 skipped (the opt-in live test); includes the new `resolveLiveRunner` cases and the updated `SUBCOMMANDS` contract.
- End-to-end through the built CLI: provision a real tenant → write a `backlog` task → `dossier-runtime drain` (offline) transitions it `backlog → in_progress` on disk (exit 0) → `dossier-runtime drain --agentic` without a transport refuses with the no-silent-network message (exit 2).
- `pnpm kb:check`: clean.
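The exit codes exercised in the end-to-end check follow a simple rule: a completed drain or a clean no-op exits 0, and a refused `--agentic` run exits 2. The sketch below is only an illustration: the `DrainOutcome` variants, `reportDrain`, and the message wording are hypothetical. It also assumes that `review-queue` accepts the same `--root` and `--client` flags as `drain`, which the record does not confirm.

```typescript
// Hypothetical outcome model; the real return type of drainBoardSerialized
// is not shown in this record.
type DrainOutcome =
  | { kind: "drained"; reachedReview: number }
  | { kind: "serialized-out" }
  | { kind: "over-budget" }
  | { kind: "refused"; message: string };

interface DrainReport {
  exitCode: 0 | 2;
  lines: string[];
}

function reportDrain(outcome: DrainOutcome, root: string, client: string): DrainReport {
  switch (outcome.kind) {
    case "serialized-out":
      // Another drain holds the tenant lock: clean no-op.
      return { exitCode: 0, lines: ["drain: another drain is running; nothing to do"] };
    case "over-budget":
      // Budget envelope exhausted or board paused: clean no-op.
      return { exitCode: 0, lines: ["drain: budget exhausted or board paused; nothing to do"] };
    case "refused":
      // No transport for --agentic: the only non-zero exit in this sketch.
      return { exitCode: 2, lines: [outcome.message] };
    case "drained": {
      const lines = ["drain: " + outcome.reachedReview + " task(s) reached review"];
      if (outcome.reachedReview > 0) {
        // Assumed flags: same --root/--client pair that `drain` takes.
        lines.push("next: dossier-runtime review-queue --root " + root + " --client " + client);
      }
      return { exitCode: 0, lines };
    }
    default: {
      const unreachable: never = outcome;
      throw new Error("unhandled outcome: " + String(unreachable));
    }
  }
}
```

Treating serialized-out and over-budget as exit 0 lets a scheduler invoke the command repeatedly without registering a guarded no-op as a failure.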
Scope — the /board review-surface is the other half, and a topology question
The FDE's highest-leverage move was "CLI drain + a /board approve/ship view." This decision delivers and proves the CLI front door. The app-side review surface is the SvelteKit layer's work ("Spec the v0 agency dashboard surface (Phase 0 dogfood: Dossier's own .claude/agents team on Dossier's own OKF; daily-standup / approve-ship loop)") and carries a genuine topology question that an FDE should not settle implicitly inside a view. A read-only review-queue surface (tasks awaiting disposition plus the proposed diff) is straightforwardly shippable. Performing governed, git-mutating approve/reject from a public Vercel deployment is not: it needs auth, server-side git access, and a decision on where the tenant repo lives, which is a Principal Platform Architect call. The split is recorded here so it is explicit: the loop is now operator-runnable from the CLI; the browser surface is sequenced behind that question.
Consequences
- The proven agentic loop is reachable by an operator without authoring a script; the harness scripts (`agency-phase0-*.mjs`) remain as live proofs, not the only entry point.
- The drain's default is offline-by-construction; spend is explicit + auth-gated; Inv 3/4/5 are preserved by composition (the command owns no new invariant logic).
- `resolveLiveRunner` makes the agentic transport choice a single config-driven seam, ready for the plugin packaging surface to reuse verbatim.