Give the proven agentic board-drain an operator front door — a first-class `dossier-runtime drain` subcommand, offline-by-default with explicit `--agentic` opt-in

0063-agentic-drain-operator-cli-front-door

decision · read as: Explain · confidence: asserted · status: active · 2026-06-20 · owner: forward-deployed-engineer

Reversibility: two-way door

DEC-0063 — The agentic drain gets an operator front door (`dossier-runtime drain`)

Reversibility: two-way door. The change is purely additive — one new CLI subcommand and one new transport-resolver function. The default behavior of every existing command is unchanged, the default drain worker is the offline deterministic one (no spend, no network), and the whole thing can be removed without touching the loop core it composes. Nothing is locked in.

Context — the proven loop had no operator-facing door

The act-and-learn loop is real and field-proven: the board governance core (DEC-0024) plus the live persona-grounded multi-turn session (DEC-0053) ran end-to-end on both transports. But the only way to actually run the agentic drain was a hand-run harness script: scripts/agency-phase0-live.mjs / agency-phase0-dogfood.mjs. The shipped runner scripts/board-drain.mjs deliberately runs the offline default worker, not the agentic one, and is not even exposed as a pnpm script. The control-plane CLI (dossier-runtime) wired provision / run / site / review-queue / approve / reject, but not the drain. So the platform's most impressive proven capability had no front door: an operator could provision a tenant and dispose of a review task, but could not run the loop that produces one without authoring a script.

This is the "built to the seam, then stops at a harness" gap the FDE global read flagged as the single highest-leverage next move (verified against cli.ts, board-drain.mjs, and the harness scripts).

The one open architecture call — default offline or default agentic?

The FDE surfaced exactly one decision worth recording: when an operator types dossier-runtime drain, does the default run the deterministic offline worker, or the live spending agent?

Decision: default OFFLINE; the live agent is explicit opt-in (--agentic). This is the conservative, two-way-door choice and it is forced by the standing non-negotiables:

- Offline by construction (DEC-0008 §6). The default must never reach the network or spend a credit. A bare drain is a safe, free, deterministic claim→transition pass, the same posture board-drain.mjs already takes.
- No silent network (the auth-seam posture, DEC-0019). --agentic resolves its transport through the same resolveAuthFromEnv gate that run uses: it refuses with a precise message unless --subscription (Claude subscription, no key) or ANTHROPIC_API_KEY (the SDK transport) is present. Spend is opt-in and gated, never a default surprise.
- Two-way door first. Defaulting to the cheap, reversible path and making the spending path explicit is the right asymmetry: a wrong free drain costs nothing; a surprise paid drain costs trust and credits.
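
The three rules above reduce to a small decision function. The following is a minimal sketch under stated assumptions, not the actual cli.ts code: `DrainFlags`, `DrainEnv`, `DrainPlan`, and `planDrain` are hypothetical names, the refusal message is a placeholder, and the precedence of --subscription over ANTHROPIC_API_KEY is assumed rather than taken from the record. Only the flag names, the environment variable, and the refusal exit code 2 come from this decision.

```typescript
// Sketch only: these types and names are illustrative, not the real cli.ts API.
type DrainFlags = { agentic: boolean; subscription: boolean };
type DrainEnv = { ANTHROPIC_API_KEY?: string };

type DrainPlan =
  | { kind: "offline" }
  | { kind: "agentic"; transport: "subscription" | "api-key" }
  | { kind: "refuse"; exitCode: 2; message: string };

function planDrain(flags: DrainFlags, env: DrainEnv): DrainPlan {
  // Default path: chosen without consulting the environment at all,
  // so a bare drain cannot be switched to spending by a stray API key.
  if (!flags.agentic) return { kind: "offline" };

  // Assumed precedence: an explicit --subscription flag wins over an API key.
  if (flags.subscription) return { kind: "agentic", transport: "subscription" };
  if (env.ANTHROPIC_API_KEY) return { kind: "agentic", transport: "api-key" };

  // No transport available: refuse rather than fall back silently.
  return {
    kind: "refuse",
    exitCode: 2,
    message: "placeholder: --agentic needs --subscription or ANTHROPIC_API_KEY",
  };
}
```

Modeling the refusal as a value rather than an early exit keeps the decision testable without a network, a key, or a process boundary.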

Decision

Add dossier-runtime drain --root <dir> --client <id> as a first-class subcommand (registered in the single SUBCOMMANDS source so the Sub type, the main() switch, and isSub cannot drift — the DEC-0039 follow-up contract):

- Default → DefaultBoardWorker via drainBoardSerialized: offline, deterministic, no LLM, no network, no spend.
- --agentic → the live AgentSdkBoardWorker over a createLiveSession, with the transport resolved through the auth seam. The run is bounded by --max-turns / --max-budget (the Invariant 5 tier-one per-session cap), and the model is set via --model (default haiku).
- Both paths run through drainBoardSerialized, so Invariant 4 (the per-tenant drain lock: one drain at a time) and Invariant 5 (the per-tenant budget envelope plus the board-pause kill switch) hold unchanged. A drain that is serialized-out or over-budget is a clean no-op (exit 0), mirroring board-drain.mjs's paused-exits-0 posture.
- The agent transitions work only to review, never to done. The operator closes the loop with the existing review-queue → approve/reject verbs (Invariant 3, the human gate). The drain command emits the exact review-queue follow-up command when any task reaches review.
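
The "single SUBCOMMANDS source" contract mentioned above can be illustrated with a common TypeScript pattern: derive the union type and the type guard from one `as const` array, and let an exhaustive switch fail to compile when a verb is added but not handled. This is a sketch of that pattern, not the contents of cli.ts. The verb list is taken from the commands this record names, and `describeSub` with its labels is a hypothetical stand-in for the real main() switch.

```typescript
// Assumed verb list, drawn from the commands named in this record.
const SUBCOMMANDS = [
  "provision", "run", "site", "review-queue", "approve", "reject", "drain",
] as const;

// The Sub union is derived from the array, so it cannot drift from it.
type Sub = (typeof SUBCOMMANDS)[number];

// The guard reads the same array; adding a verb updates it automatically.
function isSub(value: string): value is Sub {
  return SUBCOMMANDS.some((s) => s === value);
}

// Hypothetical stand-in for the main() switch. The `never` assignment in the
// default branch is a compile error if a member of Sub has no case.
function describeSub(sub: Sub): string {
  switch (sub) {
    case "provision":
    case "run":
    case "site":
      return "tenant work";
    case "review-queue":
    case "approve":
    case "reject":
      return "human gate";
    case "drain":
      return "board drain";
    default: {
      const unreachable: never = sub;
      throw new Error(`unhandled subcommand: ${String(unreachable)}`);
    }
  }
}
```

With this shape, registering a new verb is a one-line change to the array, and the compiler points at every place that still needs a case.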

Single-source-of-truth: `resolveLiveRunner(auth)`

The transport→runner mapping is centralized in auth.ts as resolveLiveRunner(auth), the live-session analogue of resolveClaudeClient:

- subscription → cliTurnRunner
- api-key → agentSdkTurnRunner({apiKey})
- mock → throws (a live drain needs a real transport; the offline path uses the default worker, not a runner)

This is the same insulation DEC-0019 established for one-shot extraction, now extended to the multi-turn drain, so the CLI, the future plugin, and any runtime can never drift on how a transport becomes a runner. Resolving a runner is offline-safe: both factories touch the network or the optional SDK only when a turn actually runs.
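
A minimal sketch of that mapping, assuming the auth value is a discriminated union keyed on a `transport` field. The `Auth` and `TurnRunner` shapes, the `label` field, and the stub bodies of `cliTurnRunner` and `agentSdkTurnRunner` are assumptions made for illustration; the real signatures in auth.ts may differ. What the sketch preserves from the decision is the three-way mapping, the throw on mock, and the fact that resolving a runner does no work until a turn runs.

```typescript
// Assumed shapes; the real Auth and runner types are not given in this record.
type Auth =
  | { transport: "subscription" }
  | { transport: "api-key"; apiKey: string }
  | { transport: "mock" };

type TurnRunner = { label: string; runTurn: (prompt: string) => Promise<string> };

// Stub factories: constructing a runner is free; work happens only in runTurn.
function cliTurnRunner(): TurnRunner {
  return { label: "cli", runTurn: async (prompt) => `cli turn: ${prompt}` };
}

function agentSdkTurnRunner(opts: { apiKey: string }): TurnRunner {
  return {
    label: "agent-sdk",
    // A real runner would load the optional SDK here, at turn time.
    runTurn: async (prompt) => `sdk turn (${opts.apiKey.length}-char key): ${prompt}`,
  };
}

function resolveLiveRunner(auth: Auth): TurnRunner {
  switch (auth.transport) {
    case "subscription":
      return cliTurnRunner();
    case "api-key":
      return agentSdkTurnRunner({ apiKey: auth.apiKey });
    case "mock":
      // A live drain needs a real transport; offline runs never reach this.
      throw new Error("placeholder: a live drain cannot run on the mock transport");
    default: {
      const unreachable: never = auth;
      throw new Error(`unknown transport: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

Because every caller goes through the one function, adding a transport means adding one union member and one case, and the compiler flags the resolver until both exist.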

Why this was assembly, not a build

Every dependency was already green and verified: drainBoardSerialized, AgentSdkBoardWorker, createLiveSession, the auth seam, the disposition verbs. No new runtime mechanism was introduced — the work was wiring proven pieces into the operator surface plus one small resolver. That is precisely why it was the highest-leverage move: it converts a capability that was verified-in-a-script into one that is verified-by-an-operator, at additive, reversible cost.

Proven this session (no fabricated status)

- pnpm --filter @dossier/runtime typecheck: clean.
- pnpm --filter @dossier/runtime test: 130 passed, 1 skipped (the opt-in live test); includes the new resolveLiveRunner cases and the updated SUBCOMMANDS contract.
- End-to-end through the built CLI: provision a real tenant → write a backlog task → dossier-runtime drain (offline) transitions it backlog → in_progress on disk (exit 0) → dossier-runtime drain --agentic without a transport refuses with the no-silent-network message (exit 2).
- pnpm kb:check: clean.

Scope — the `/board` review-surface is the other half, and a topology question

The FDE's highest-leverage move was "CLI drain + a /board approve/ship view." This decision delivers and proves the CLI front door. The app-side review surface is the SvelteKit layer's work (see "Spec the v0 agency dashboard surface (Phase 0 dogfood: Dossier's own .claude/agents team on Dossier's own OKF; daily-standup / approve-ship loop)"). It carries a genuine topology question that an FDE should not settle implicitly inside a view. A read-only review-queue surface (show tasks awaiting disposition plus the proposed diff) is straightforwardly shippable, but performing governed, git-mutating approve/reject from a public Vercel deployment needs auth, server-side git access, and a decision on where the tenant repo lives. That is a Principal Platform Architect call. It is recorded here so the split is explicit: the loop is now operator-runnable from the CLI; the browser surface is sequenced behind that question.

Consequences

- The proven agentic loop is reachable by an operator without authoring a script. The harness scripts (agency-phase0-*.mjs) remain as live proofs, not the only entry point.
- The drain's default is offline-by-construction; spend is explicit and auth-gated; Invariants 3, 4, and 5 are preserved by composition (the command owns no new invariant logic).
- resolveLiveRunner makes the agentic transport choice a single config-driven seam, ready for the plugin packaging surface to reuse verbatim.
