Subscription-backed extraction is a first-class transport — ClaudeCodeClient (no API keys)
0019-subscription-extraction-client
DEC-0019 — Subscription-backed extraction (ClaudeCodeClient, no API keys)
Reversibility: two-way door — the transport is one of two siblings behind the ClaudeClient seam (swappable/removable without touching the pipeline); the seam itself and faithfulness-over-coverage are the durable parts.
Context
"Extraction runtime architecture: the moat" built the moat (the staged extraction pipeline) on the ClaudeClient seam, with AnthropicClaudeClient as the live transport. AnthropicClaudeClient is the only consumer of @anthropic-ai/sdk and uses forced tool use for the typed OKF transform. That transport requires an ANTHROPIC_API_KEY. At the user's direction ("keep building and testing with my subscription, no API keys"), extraction needed a second, equally first-class transport that runs on a Claude subscription with no key. This also has direct GTM relevance: per the user role (agencies are the go-to-market; clients are served through the agency), an agency running a client's learning loop on its own Claude subscription avoids per-client API-key management entirely.
This decision builds that transport: ClaudeCodeClient (packages/extraction/src/llm/claude-code.ts) and the dossier-runtime run --subscription wiring (packages/runtime/src/cli.ts). It was verified this session in two ways. Offline: 8 new unit tests for the client (prompt build; envelope parse, including fence/prose tolerance; model-tier→CLI-alias mapping; failure degradation). Live: one capstone end-to-end run through the package client on the subscription (1 page, 3 calls → 19 atoms, 0 rejected, committed into the tenant's own isolated repo). The runtime test suite stays offline by construction: the live claude -p spawn is exercised only by real test-runs, never in CI.
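To make "envelope parse, including fence/prose tolerance" concrete, here is a minimal sketch of one way such a parser could work. It is not the code in claude-code.ts. It assumes the CLI's JSON envelope carries the model's text in a string field named `result` and signals errors with `is_error`; the names `CliEnvelope`, `extractJsonObject`, and `parseEnvelope` are hypothetical.

```typescript
// Assumed envelope shape (hypothetical): model text in `result`, error flag in `is_error`.
interface CliEnvelope {
  result?: unknown;
  is_error?: unknown;
}

/** Return the first balanced {...} span in text that may also contain fences or prose. */
function extractJsonObject(text: string): string | null {
  const start = text.indexOf("{");
  if (start < 0) return null;
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      // Braces inside JSON strings must not affect the depth count.
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}" && --depth === 0) return text.slice(start, i + 1);
  }
  return null; // unbalanced braces: treat as a parse failure
}

/** Return the parsed candidate object, or null on any envelope or parse failure. */
function parseEnvelope(stdout: string): unknown {
  try {
    const envelope = JSON.parse(stdout) as CliEnvelope;
    if (envelope.is_error === true || typeof envelope.result !== "string") return null;
    const json = extractJsonObject(envelope.result);
    return json === null ? null : JSON.parse(json);
  } catch {
    return null;
  }
}
```

The returned value is only a candidate: in the design described here it would still have to pass OKF validation before any atom is accepted. One known limitation of this sketch is that prose containing a stray `{` before the real object would defeat the scan.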
Options considered
- Anthropic API key only: keep AnthropicClaudeClient as the sole transport. Rejected: it forces every operator (and every agency running a client loop) to provision and manage API keys, directly contradicting the user's stated direction and adding per-client key management to the GTM path.
- A bespoke subscription-auth client: re-implement the Anthropic message/tool protocol against subscription auth. Rejected: bespoke infra where a Claude primitive already exists (Claude-primitives-first build strategy); the Claude Code CLI already does headless, subscription-authed inference.
- ClaudeCodeClient: a second transport behind the same ClaudeClient seam, driving the claude CLI headless (chosen). Same seam as AnthropicClaudeClient, so the pipeline is unchanged; only the transport differs. The live claude -p --output-format json spawn sits behind an injectable CliRunner seam, so the prompt-building and response-parsing are unit-tested with no subprocess. CI stays offline by construction, the same discipline that keeps AnthropicClaudeClient the only @anthropic-ai/sdk consumer.
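The chosen option can be sketched as follows: a transport that does all subprocess work through an injected runner, plus a fake runner for tests. Only the `CliRunner` signature, `(args, stdin) => Promise<string>`, comes from this decision record. `ExtractRequest`, the `extract` method, `SketchClaudeCodeClient`, `fakeRunner`, and the `result` envelope field are assumptions made for illustration, and the real ClaudeClient seam may look different.

```typescript
// The seam quoted in this record: (args, stdin) => Promise<string>.
type CliRunner = (args: string[], stdin: string) => Promise<string>;

// Hypothetical request and client shapes; the real ClaudeClient seam may differ.
interface ExtractRequest {
  prompt: string;
  inputSchema: object;
}
interface ClaudeClient {
  extract(req: ExtractRequest): Promise<unknown>;
}

// Sketch of a CLI-backed transport: every subprocess interaction goes through `run`.
class SketchClaudeCodeClient implements ClaudeClient {
  private readonly run: CliRunner;

  constructor(run: CliRunner) {
    this.run = run;
  }

  async extract(req: ExtractRequest): Promise<unknown> {
    const args = ["-p", "--output-format", "json"];
    const stdin = `${req.prompt}\n\nSchema:\n${JSON.stringify(req.inputSchema)}`;
    try {
      const stdout = await this.run(args, stdin);
      // Assumed envelope: the model's text sits in a string field `result`.
      const envelope = JSON.parse(stdout) as { result?: unknown };
      return typeof envelope.result === "string" ? JSON.parse(envelope.result) : null;
    } catch {
      return null; // a CLI or parse failure degrades to "no result"
    }
  }
}

// A fake runner for tests: records its inputs and returns canned stdout.
function fakeRunner(cannedStdout: string) {
  const calls: { args: string[]; stdin: string }[] = [];
  const run: CliRunner = async (args, stdin) => {
    calls.push({ args, stdin });
    return cannedStdout;
  };
  return { run, calls };
}
```

Because the client only ever sees a `CliRunner`, a test can assert on the exact args and stdin it would have sent without the claude binary being installed.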
Decision
Add ClaudeCodeClient as a first-class extraction transport — the subscription sibling of AnthropicClaudeClient behind the same ClaudeClient seam.
- CLI headless transport, no key. It runs forced structured extraction via the Claude Code CLI headless (claude -p --output-format json --model <alias>) on the user's subscription: no ANTHROPIC_API_KEY, no @anthropic-ai/sdk. It is the only place in @dossier/extraction that shells out to the claude binary.
- CliRunner seam keeps CI offline. The live subprocess spawn is isolated behind an injectable CliRunner ((args, stdin) => Promise<string>); tests inject a fake runner, so prompt-building and envelope-parsing are unit-tested with no subprocess. Same seam-with-mock discipline as the live ClaudeClient / Embedder (MCP agentic foundation: tenant-scoped GraphRAG over the OKF KB) / AgentSdkOrchestrator (Runtime orchestration & per-tenant control plane: the learning loop becomes a runnable system).
- No tool_choice, so strict-JSON-against-the-schema instead. The CLI cannot force a tool, so the prompt frames the model as a non-interactive extraction function and instructs a single strict-JSON object matching the OKF tool input_schema (req.inputSchema, the same schema the Anthropic client forces). The output is then validated by @dossier/okf validate() exactly as before: parsed-and-validated, never parsed-and-hoped. Model tiers map to CLI aliases (opus/haiku/sonnet), with a forceModel override.
- Faithfulness over coverage. A per-segment CLI or parse failure degrades to zero atoms for that segment (surfaced via failures counters + the onCall hook), never aborting a long run. Coverage loss is recoverable; a crash mid-corpus is not.
- Runtime wiring. dossier-runtime run --subscription injects ClaudeCodeClient (no key required); the default path keeps the ANTHROPIC_API_KEY-backed AnthropicClaudeClient and still refuses to run without a key (no silent network).
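The tier mapping and the strict-JSON framing described above might look roughly like this. The flags (`-p`, `--output-format json`, `--model <alias>`) and the three aliases come from this record. The tier names (`high`, `mid`, `low`), which alias each tier maps to, the prompt wording, and the helper names `buildArgs` and `buildPrompt` are all assumptions for illustration.

```typescript
type CliAlias = "opus" | "sonnet" | "haiku";

// Hypothetical tier names; the real pipeline's tiers and their mapping may differ.
type ModelTier = "high" | "mid" | "low";

const TIER_TO_ALIAS: Record<ModelTier, CliAlias> = {
  high: "opus",
  mid: "sonnet",
  low: "haiku",
};

/** Build the fixed CLI argument list; forceModel wins over the tier mapping. */
function buildArgs(tier: ModelTier, forceModel?: CliAlias): string[] {
  const alias = forceModel ?? TIER_TO_ALIAS[tier];
  return ["-p", "--output-format", "json", "--model", alias];
}

/** Frame the model as a non-interactive function that must answer in schema-shaped JSON. */
function buildPrompt(instructions: string, inputSchema: object, segment: string): string {
  return [
    "You are a non-interactive extraction function.",
    "Reply with exactly one JSON object that matches the schema below.",
    "Do not add prose or code fences.",
    "",
    "Schema:",
    JSON.stringify(inputSchema, null, 2),
    "",
    instructions,
    "",
    "Segment:",
    segment,
  ].join("\n");
}
```

The prompt can only request schema-shaped JSON; it cannot enforce it the way a forced tool call does. That is why the downstream validate() step carries the guarantee.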
Rationale
- It honors the user's direction and the GTM. "Keep building and testing with my subscription, no API keys" is now a supported, first-class path — and an agency can run a client's loop on its own Claude subscription with no per-client key management (Dossier — Mission & North Star's "delivered through the agencies that already serve them").
- Use the Claude primitive before bespoke infra. Driving the Claude Code CLI headless is exactly Claude-primitives-first build strategy — the subscription-authed inference primitive already exists; we wire to it rather than re-implement an auth/transport layer.
- The moat is untouched; only the transport changed. Both clients satisfy the same ClaudeClient seam, so the staged pipeline, prompt spec, and OKF validation are unchanged. Output is still validated by @dossier/okf: the absence of tool_choice is compensated by schema-instructed JSON + the same downstream validate(), so the typed-transform guarantee holds.
- CI stays offline by construction. The CliRunner seam means no test ever spawns the binary; the live spawn is exercised only by real test-runs. This keeps the offline-first invariant the whole monorepo holds to.
- Faithfulness>coverage guards the IP metric. Degrading a failed segment to zero atoms (rather than crashing, or emitting unvalidated guesses) is consistent with the faithfulness floor in Live extraction eval harness: what we measure is what extraction optimizes for. We never fabricate atoms to pad coverage.
- asserted, not verified. The client is built and verified green offline (8 new unit tests) with one real capstone end-to-end run on the subscription (1 page / 3 calls → 19 atoms, 0 rejected, committed). That is design-level conviction backed by a single real run, not yet validated across many corpora, cost/latency profiles, or against client/market use.
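The faithfulness-over-coverage rule can be sketched as a loop in which a failed segment contributes zero atoms and increments a counter instead of aborting the run. This is an illustrative sketch, not the pipeline's code: `Atom`, `SegmentExtractor`, `CallEvent`, `RunResult`, and `extractAll` are hypothetical names, and classifying a `SyntaxError` as a parse failure (everything else as a CLI failure) is an assumption.

```typescript
interface Atom {
  text: string;
}
type SegmentExtractor = (segment: string) => Promise<Atom[]>;

interface CallEvent {
  index: number;
  ok: boolean;
  atoms: number;
}
interface RunResult {
  atoms: Atom[];
  failures: { cli: number; parse: number };
}

/** Extract every segment; a failing segment yields zero atoms and never stops the run. */
async function extractAll(
  segments: string[],
  extractOne: SegmentExtractor,
  onCall?: (event: CallEvent) => void,
): Promise<RunResult> {
  const result: RunResult = { atoms: [], failures: { cli: 0, parse: 0 } };
  let index = 0;
  for (const segment of segments) {
    let atoms: Atom[] = [];
    let ok = true;
    try {
      atoms = await extractOne(segment);
    } catch (err) {
      ok = false;
      // Assumption: JSON.parse failures surface as SyntaxError; anything else is a CLI failure.
      if (err instanceof SyntaxError) result.failures.parse++;
      else result.failures.cli++;
    }
    result.atoms.push(...atoms);
    onCall?.({ index, ok, atoms: atoms.length });
    index++;
  }
  return result;
}
```

The failure counters make the coverage loss visible after the run, so a degraded segment can be re-run later rather than silently disappearing.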
Consequences
- Extraction now has two first-class transports behind one seam: AnthropicClaudeClient (API key, forced tool use) and ClaudeCodeClient (subscription, schema-instructed JSON). The pipeline and OKF validation are identical across both.
- CLI dependency at runtime, not at build/CI. The subscription path requires the claude CLI on PATH; CI never needs it (the CliRunner seam). The default runner uses shell mode on win32 only (where claude is a .cmd shim); args are a fixed flag/alias allowlist and the prompt travels via stdin, so there is no shell-injection surface.
- The live claude -p spawn is not covered by CI. CI is offline by construction, and the spawn is exercised only by real test-runs (this is intentional, but it means the live transport's behavior is verified by reproduced runs, not the suite).
- Two-way vs. durable. The transport itself is one of two swappable siblings (a two-way door: add, swap, or remove without touching the pipeline). The durable commitments are the ClaudeClient seam, OKF-validated output, and faithfulness-over-coverage.
- Provenance note (scope): run --subscription --source-dir stamps file-path provenance (it ingests local files via LocalFilesConnector; see Ingestion connector seam: assemble, don't build, and ingestion owns the input contract). URL provenance belongs to the web-ingest path (the reserved FirecrawlConnector), not the generic file CLI; see the related milestone for how URL provenance was achieved by re-stamping a staged crawl.
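The shell-injection argument above rests on two properties: every argument comes from a fixed allowlist, and shell mode is enabled only where it is unavoidable. A minimal sketch of both checks follows. `assertAllowedArgs` and `spawnOptions` are hypothetical helpers, not the default runner's code. The platform is passed in as a parameter to keep the sketch free of Node globals, and the returned object mirrors the `shell` option accepted by Node's `child_process.spawn`.

```typescript
// Fixed vocabulary taken from this record: flags plus the three model aliases.
const ALLOWED_FLAGS = new Set(["-p", "--output-format", "json", "--model"]);
const ALLOWED_ALIASES = new Set(["opus", "sonnet", "haiku"]);

/** Throw if any argument falls outside the fixed flag/alias allowlist. */
function assertAllowedArgs(args: readonly string[]): void {
  for (const arg of args) {
    if (!ALLOWED_FLAGS.has(arg) && !ALLOWED_ALIASES.has(arg)) {
      throw new Error(`argument not in allowlist: ${JSON.stringify(arg)}`);
    }
  }
}

/** Shell mode only on win32, where the claude entry point is a .cmd shim. */
function spawnOptions(platform: string): { shell: boolean } {
  return { shell: platform === "win32" };
}
```

Since untrusted text (the prompt and the segment) travels only via stdin, nothing user-controlled is interpolated into a command line even when shell mode is on.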
Review
Promote to verified once the subscription transport has run across multiple real corpora — confirm cost/latency are acceptable, that schema-instructed JSON holds up vs. forced tool use on extraction quality (run Live extraction eval harness — what we measure is what extraction optimizes for across both transports), and that the faithfulness-over-coverage degradation behaves correctly under real failure rates. Revisit if the Claude Code CLI gains a tool_choice-equivalent (the strict-JSON prompt could then be replaced by a true forced tool, narrowing the gap to AnthropicClaudeClient).