Subscription-backed extraction is a first-class transport — ClaudeCodeClient (no API keys)

0019-subscription-extraction-client

Type: decision · Confidence: asserted · Status: active · Date: 2026-06-15 · Owner: extraction-engineer

Reversibility: two-way door

DEC-0019 — Subscription-backed extraction (ClaudeCodeClient, no API keys)

Reversibility: two-way door — the transport is one of two siblings behind the ClaudeClient seam (swappable/removable without touching the pipeline); the seam itself and faithfulness-over-coverage are the durable parts.

Context

The "Extraction runtime architecture — the moat" decision built the moat (the staged extraction pipeline) on the ClaudeClient seam, with AnthropicClaudeClient as the live transport. AnthropicClaudeClient is the only consumer of @anthropic-ai/sdk and uses forced tool use for the typed OKF transform. That transport requires an ANTHROPIC_API_KEY. At the user's direction ("keep building and testing with my subscription, no API keys"), extraction needed a second, equally first-class transport that runs on a Claude subscription without a key. This also matters directly for GTM: per the user role (agencies are the go-to-market; clients are served through the agency), an agency running a client's learning loop on its own Claude subscription avoids per-client API-key management entirely.

This decision builds that transport: ClaudeCodeClient (packages/extraction/src/llm/claude-code.ts) and the dossier-runtime run --subscription wiring (packages/runtime/src/cli.ts). Verified this session: 8 new offline unit tests for the client (prompt build; envelope parse, including fence/prose tolerance; model-tier→CLI-alias mapping; failure degradation), plus one capstone end-to-end run through the package client on the live subscription (1 page, 3 calls → 19 atoms, 0 rejected, committed into the tenant's own isolated repo). The runtime test suite stays offline by construction: the live claude -p spawn is exercised only by real test-runs, never in CI.
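The envelope-parse tolerance mentioned above (JSON wrapped in a code fence or surrounded by prose) can be sketched as follows. This is a minimal illustration, not the code in claude-code.ts. It assumes that the `--output-format json` envelope carries the model's text in a `result` string and flags failures with `is_error`. The helper names `parseEnvelope` and `extractJsonObject` are hypothetical.

```typescript
// Hypothetical sketch: tolerant parsing of `claude -p --output-format json` stdout.
// Assumption: the model's text is in the envelope's `result` field and
// `is_error` marks a failed run; check this against your CLI version.

interface CliEnvelope {
  result?: unknown;
  is_error?: boolean;
}

/** Return the first balanced JSON object in `text`, which may sit inside a
 *  json code fence or between lines of prose. Returns null if none parses. */
function extractJsonObject(text: string): unknown {
  // Prefer the body of a fenced block when one is present.
  const fence = /`{3}(?:json)?\s*([\s\S]*?)`{3}/i.exec(text);
  const candidate = fence ? fence[1] : text;
  const start = candidate.indexOf("{");
  if (start < 0) return null;
  // Walk forward to the matching close brace, ignoring braces inside strings.
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < candidate.length; i++) {
    const ch = candidate[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}") {
      depth--;
      if (depth === 0) {
        try {
          return JSON.parse(candidate.slice(start, i + 1));
        } catch {
          return null;
        }
      }
    }
  }
  return null;
}

/** Parse raw CLI stdout into the model's JSON payload, or null on any failure. */
function parseEnvelope(stdout: string): unknown {
  let env: CliEnvelope;
  try {
    env = JSON.parse(stdout) as CliEnvelope;
  } catch {
    return null;
  }
  if (env.is_error || typeof env.result !== "string") return null;
  return extractJsonObject(env.result);
}
```

Returning null instead of throwing lets the caller treat "no parseable object" the same way as a CLI failure. The payload is still only a candidate until OKF validation accepts it.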

Options considered

Anthropic API key only — keep AnthropicClaudeClient as the sole transport. Rejected: it forces every operator (and every agency running a client loop) to provision and manage API keys, directly contradicting the user's stated direction and adding per-client key management to the GTM path.
A bespoke subscription-auth client — re-implement the Anthropic message/tool protocol against subscription auth. Rejected: bespoke infra where a Claude primitive already exists (Claude-primitives-first build strategy); the Claude Code CLI already does headless, subscription-authed inference.
ClaudeCodeClient — a second transport behind the same ClaudeClient seam, driving the claude CLI headless (chosen). Same seam as AnthropicClaudeClient, so the pipeline is unchanged; only the transport differs. The live claude -p --output-format json spawn sits behind an injectable CliRunner seam so the prompt-building and response-parsing are unit-tested with no subprocess — CI stays offline by construction, the same discipline that keeps AnthropicClaudeClient the only @anthropic-ai/sdk consumer.
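Here is a minimal sketch of the chosen option's seam, assuming a single `extract` method. The names `ExtractRequest`, `ExtractResult`, `extract`, and `fakeRunner` are hypothetical. Only the `CliRunner` shape `(args, stdin) => Promise<string>` comes from this decision. The client never touches a subprocess directly, so a test can inject a fake runner and inspect exactly what would have been sent to `claude`.

```typescript
// Hypothetical sketch: the transport depends on an injectable CliRunner.
// ExtractRequest/ExtractResult/extract are illustrative, not the real ClaudeClient API.

type CliRunner = (args: string[], stdin: string) => Promise<string>;

interface ExtractRequest {
  prompt: string;
  inputSchema: object; // the OKF tool input_schema
}

interface ExtractResult {
  raw: string; // CLI stdout; parsed and validated downstream
}

interface ClaudeClient {
  extract(req: ExtractRequest): Promise<ExtractResult>;
}

class ClaudeCodeClient implements ClaudeClient {
  private readonly run: CliRunner;
  private readonly model: string;

  constructor(run: CliRunner, model = "sonnet") {
    this.run = run;
    this.model = model;
  }

  async extract(req: ExtractRequest): Promise<ExtractResult> {
    // Fixed flags only; the prompt (schema inlined) travels via stdin.
    const args = ["-p", "--output-format", "json", "--model", this.model];
    const stdin = `${req.prompt}\n\nSchema:\n${JSON.stringify(req.inputSchema)}`;
    return { raw: await this.run(args, stdin) };
  }
}

// Test double: records every call and returns a canned envelope.
function fakeRunner(reply: string) {
  const calls: { args: string[]; stdin: string }[] = [];
  const runner: CliRunner = async (args, stdin) => {
    calls.push({ args, stdin });
    return reply;
  };
  return { runner, calls };
}
```

The fake records its calls, so a unit test can assert on the argument list and stdin without a `claude` binary on PATH.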

Decision

Add ClaudeCodeClient as a first-class extraction transport — the subscription sibling of AnthropicClaudeClient behind the same ClaudeClient seam.

CLI headless transport, no key. It runs schema-constrained structured extraction through the Claude Code CLI in headless mode (claude -p --output-format json --model <alias>) on the user's subscription: no ANTHROPIC_API_KEY, no @anthropic-ai/sdk. It is the only place in @dossier/extraction that shells out to the claude binary.
CliRunner seam keeps CI offline. The live subprocess spawn is isolated behind an injectable CliRunner, typed (args, stdin) => Promise<string>. Tests inject a fake runner, so prompt-building and envelope-parsing are unit-tested with no subprocess. This follows the same seam-with-mock discipline as the live ClaudeClient, the Embedder (MCP agentic foundation — tenant-scoped GraphRAG over the OKF KB), and the AgentSdkOrchestrator (Runtime orchestration & per-tenant control plane — the learning loop becomes a runnable system).
No tool_choice, so strict JSON against the schema instead. The CLI cannot force a tool, so the prompt frames the model as a non-interactive extraction function. It instructs the model to return a single strict-JSON object matching the OKF tool input_schema (req.inputSchema, the same schema the Anthropic client forces). The output is then validated by @dossier/okf validate() exactly as before: parsed-and-validated, never parsed-and-hoped. Model tiers map to CLI aliases (opus/haiku/sonnet), with a forceModel override.
Faithfulness over coverage. A per-segment CLI or parse failure degrades to zero atoms for that segment (surfaced via failures counters + the onCall hook), never aborting a long run. Coverage loss is recoverable; a crash mid-corpus is not.
Runtime wiring. dossier-runtime run --subscription injects ClaudeCodeClient (no key required); the default path keeps the ANTHROPIC_API_KEY AnthropicClaudeClient and still refuses to run without a key (no silent network).
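The tier mapping, `forceModel` override, and strict-JSON prompt framing described above might look like the following sketch. The tier names (`deep`, `balanced`, `fast`), the tier-to-alias pairing, and the prompt wording are assumptions for illustration. Only the aliases and the `claude -p --output-format json --model <alias>` invocation come from this decision. Whatever the model returns would still go through @dossier/okf validate(), which is not shown.

```typescript
// Hypothetical sketch: tier names, tier-to-alias pairing, and prompt wording are assumptions.

type ModelTier = "deep" | "balanced" | "fast";
type CliAlias = "opus" | "sonnet" | "haiku";

const TIER_TO_ALIAS: Record<ModelTier, CliAlias> = {
  deep: "opus",
  balanced: "sonnet",
  fast: "haiku",
};

function resolveAlias(tier: ModelTier, forceModel?: CliAlias): CliAlias {
  // A forced model wins over the tier mapping.
  return forceModel ?? TIER_TO_ALIAS[tier];
}

function buildArgs(tier: ModelTier, forceModel?: CliAlias): string[] {
  // Every element is a fixed flag or an allowlisted alias; nothing user-supplied.
  return ["-p", "--output-format", "json", "--model", resolveAlias(tier, forceModel)];
}

function buildPrompt(instructions: string, inputSchema: object, segment: string): string {
  // Frame the model as a non-interactive function whose only output is one JSON object.
  return [
    "You are a non-interactive extraction function. Do not ask questions or add commentary.",
    "Return exactly one JSON object that validates against this JSON Schema:",
    JSON.stringify(inputSchema),
    "",
    instructions,
    "",
    "Input:",
    segment,
  ].join("\n");
}
```

Keeping the argument list built only from constants and allowlisted aliases means the untrusted page text can reach the CLI only through the prompt on stdin.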

Rationale

It honors the user's direction and the GTM. "Keep building and testing with my subscription, no API keys" is now a supported, first-class path — and an agency can run a client's loop on its own Claude subscription with no per-client key management (Dossier — Mission & North Star's "delivered through the agencies that already serve them").
Use the Claude primitive before bespoke infra. Driving the Claude Code CLI headless is exactly what the Claude-primitives-first build strategy calls for: the subscription-authed inference primitive already exists, so we wire to it rather than re-implement an auth/transport layer.
The moat is untouched; only the transport changed. Both clients satisfy the same ClaudeClient seam, so the staged pipeline, prompt spec, and OKF validation are unchanged. Output is still validated by @dossier/okf — the absence of tool_choice is compensated by schema-instructed JSON + the same downstream validate(), so the typed-transform guarantee holds.
CI stays offline by construction. The CliRunner seam means no test ever spawns the binary; the live spawn is exercised only by real test-runs. This keeps the offline-first invariant the whole monorepo holds to.
Faithfulness over coverage guards the IP metric. Degrading a failed segment to zero atoms (rather than crashing, or emitting unvalidated guesses) is consistent with the faithfulness floor set in "Live extraction eval harness — what we measure is what extraction optimizes for": we never fabricate atoms to pad coverage.
Asserted, not verified. The client is built and green offline (8 new unit tests), with one real capstone end-to-end run on the subscription (1 page / 3 calls → 19 atoms, 0 rejected, committed). That is design-level conviction backed by a single real run. It is not yet validated across many corpora or cost/latency profiles, or against client/market use.
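The per-segment degradation can be sketched as a loop that catches each segment's failure, counts it, reports it through an `onCall` hook, and moves on. `Atom`, `CallEvent`, `RunDeps`, `callModel`, and `extractCorpus` are hypothetical names. `validate` stands in for @dossier/okf validate(), whose real signature is not given here.

```typescript
// Hypothetical sketch of faithfulness over coverage. All names are illustrative;
// `validate` is a stand-in for @dossier/okf validate().

interface Atom {
  id: string;
  text: string;
}

interface CallEvent {
  segment: number;
  ok: boolean;
  atoms: number;
  error?: string;
}

interface RunDeps {
  callModel: (segment: string) => Promise<unknown>; // rejects on CLI or parse failure
  validate: (payload: unknown) => Atom[]; // throws on schema violation
  onCall?: (e: CallEvent) => void;
}

async function extractCorpus(segments: string[], deps: RunDeps) {
  const atoms: Atom[] = [];
  let failures = 0;
  for (let i = 0; i < segments.length; i++) {
    try {
      const valid = deps.validate(await deps.callModel(segments[i]));
      atoms.push(...valid);
      deps.onCall?.({ segment: i, ok: true, atoms: valid.length });
    } catch (err) {
      // Degrade this segment to zero atoms: never abort the run, never keep unvalidated output.
      failures++;
      deps.onCall?.({ segment: i, ok: false, atoms: 0, error: String(err) });
    }
  }
  return { atoms, failures };
}
```

A validation error is handled exactly like a CLI failure, so a long run loses one segment's coverage instead of crashing or admitting unvalidated atoms.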

Consequences

Extraction now has two first-class transports behind one seam: AnthropicClaudeClient (API key, forced tool use) and ClaudeCodeClient (subscription, schema-instructed JSON). The pipeline and OKF validation are identical across both.
CLI dependency at runtime, not at build/CI. The subscription path requires the claude CLI on PATH; CI never needs it (the CliRunner seam). The default runner uses shell mode on win32 only (where claude is a .cmd shim); args are a fixed flag/alias allowlist and the prompt travels via stdin, so there is no shell-injection surface.
The live claude -p spawn is not covered by CI — it is offline-by-construction and exercised only by real test-runs (this is intentional, but it means the live transport's behavior is verified by reproduced runs, not the suite).
Two-way vs. durable. The transport itself is one of two swappable siblings (two-way door — add, swap, or remove without touching the pipeline). The durable commitments are the ClaudeClient seam, OKF-validated output, and faithfulness-over-coverage.
Provenance note (scope): run --subscription --source-dir stamps file-path provenance, because it ingests local files via LocalFilesConnector (see "Ingestion connector seam — assemble, don't build": ingestion owns the input contract). URL provenance belongs to the web-ingest path (the reserved FirecrawlConnector), not the generic file CLI. See the related milestone for how URL provenance was achieved by re-stamping a staged crawl.
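A default runner along the lines described in these consequences could be built on Node's `child_process.spawn`. The sketch below is an illustration under assumptions, not the shipped runner: `makeSpawnRunner` is a hypothetical name, and treating any non-zero exit as a failure is a choice made here. Shell mode is enabled only on win32, and the prompt is written to stdin rather than passed in argv.

```typescript
import { spawn } from "node:child_process";

type CliRunner = (args: string[], stdin: string) => Promise<string>;

// Hypothetical default runner; the command defaults to the `claude` binary on PATH.
function makeSpawnRunner(command = "claude"): CliRunner {
  return (args, stdin) =>
    new Promise<string>((resolve, reject) => {
      const child = spawn(command, args, {
        // On Windows `claude` is a .cmd shim, which needs a shell to launch.
        // Args are a fixed allowlist, so enabling the shell there adds no injection surface.
        shell: process.platform === "win32",
      });
      let out = "";
      let err = "";
      child.stdout.setEncoding("utf8");
      child.stderr.setEncoding("utf8");
      child.stdout.on("data", (chunk: string) => {
        out += chunk;
      });
      child.stderr.on("data", (chunk: string) => {
        err += chunk;
      });
      child.on("error", reject); // e.g. binary not found on PATH
      child.stdin.on("error", reject); // e.g. EPIPE if the child exits before reading
      child.on("close", (code) => {
        if (code === 0) resolve(out);
        else reject(new Error(`${command} exited with code ${code}: ${err.trim()}`));
      });
      // The prompt goes over stdin, never into argv.
      child.stdin.end(stdin);
    });
}
```

Because the command is a parameter, the spawn plumbing itself can be checked against another executable (for example `process.execPath` running a small `-e` echo script). CI still never needs the real CLI.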

Review

Promote to verified once the subscription transport has run across multiple real corpora. Confirm that cost/latency are acceptable, and that schema-instructed JSON holds up against forced tool use on extraction quality (run the "Live extraction eval harness — what we measure is what extraction optimizes for" across both transports). Also confirm that the faithfulness-over-coverage degradation behaves correctly under real failure rates. Revisit if the Claude Code CLI gains a tool_choice equivalent: the strict-JSON prompt could then be replaced by a true forced tool, narrowing the gap to AnthropicClaudeClient.