Do not scan/mirror Anthropic's docs into the KB — distill curated references instead

0050-do-not-mirror-anthropic-docs-distill-references

decision read as Explain confidence asserted status active 2026-06-18 owner principal-architect

Reversibility: two-way door

DEC-0050 — Do not mirror Anthropic's docs into the KB; distill curated references instead

Reversibility: two-way door — a curation/sourcing policy, not built code; nothing is mirrored, so reversing means adding a scan if a real need ever appears (e.g. an offline/air-gapped build with no live doc access). The durable commitment is the principle — reference, don't absorb, for external docs we do not own and that churn fast — which generalizes the OKF upstream relationship — complement at the format layer, competitor at the serving layer stance to the Anthropic-docs case.

Context

On 2026-06-18 the Principal Forward Deployed Engineer verified that 0% of the Claude/Anthropic documentation is scanned or cached anywhere in this repo — what exists are ~8 hand-cited doc-page URLs inside ADRs and research atoms, never a captured corpus. The question this forces: should we ingest Anthropic's docs into the KB so agents stop re-fetching the same stable facts? This is the same shape as the OKF-upstream call (OKF upstream relationship — complement at the format layer, competitor at the serving layer): an external body of knowledge Dossier builds on but does not own.

The trigger was real recurring friction — agents kept re-reading the docs in-session to recall the same stable facts (which file a subagent lives in, the hook event names, the current Opus model ID). We needed a durable answer that lowers that recall cost without absorbing someone else's fast-moving corpus.

Options considered

Scan/mirror the docs into the KB (crawl code.claude.com/docs + platform.claude.com/docs into OKF atoms or a cached corpus). Rejected: (a) provenance/sovereignty — the docs are Anthropic's, not the org's own institutional memory; the KB is for the org's compounding knowledge, not a mirror of a vendor's manual (the Dossier — The Knowledge Model (v0) "capture judgment, not just facts" + Dossier — Mission & North Star sovereignty stance). (b) Staleness — the Claude model/primitive surface moves fast (Fable 5, Opus 4.8 are recent); a frozen mirror rots and quietly misleads. (c) Redundant — we already read docs transiently in-session via live primitives, so a mirror is bloat + rot risk with no owned-knowledge gain.
Capture nothing — keep re-fetching in-session every time. Rejected: leaves the real recurring friction unaddressed; the stable core (the six primitives + their config locations, the current model family) genuinely is re-derived over and over and is worth capturing once.
Distill one curated reference atom of the stable core + adopt "reference, don't absorb" as the standing rule (chosen). Capture only the high-churn-resistant facts a build-with-primitives shop keeps re-deriving — grounded in primary docs at capture time, dated, provenance-bearing, recency-caveated — and route everything live (a new model, a changed param, an edge case) to the live primitives. Lowers recall cost without absorbing the corpus, and matches the OKF upstream relationship — complement at the format layer, competitor at the serving layer evidence→reference, judgment→decision precedent.

Decision

Do NOT scan, crawl, or mirror Anthropic's Claude / Claude Code docs into the KB. Adopt the standing rule — reference, don't absorb — for high-churn external documentation Dossier builds on but does not own:

Read the docs transiently in-session via the live primitives — the claude-api skill, the claude-code-guide agent, WebFetch against the docs. These are always-current; the docs are their source of truth, not the KB.
Synthesize the judgment into an ADR when a doc fact forces a real choice (the existing pattern — DEC-0004 et al.).
Capture only the stable, re-derived core as a dated, provenance-bearing reference atom, recency-caveated, explicitly framed as "implication, NOT a decision." The first such atom is Claude Primitives & Model Family — June 2026 Snapshot (the six primitives + config locations + the current Claude model family/pricing).

Rationale

Sovereignty/provenance is the line. The KB is the org's owned, compounding memory. Anthropic's docs are Anthropic's — mirroring them in confuses whose knowledge this is and what the org can claim to own (exactly the distinction OKF upstream relationship — complement at the format layer, competitor at the serving layer draws: complement at the format/source layer, don't absorb).
A frozen snapshot of a fast-moving surface rots. The model/primitive surface churns; a mirror would silently age. A small, dated, caveated distillation of only the stable core is honest about its own shelf life and points to the live source for anything that moves.
The right pattern already exists. Transient read → synthesize ADR → distill the stable core is what we already do; this decision names it as policy rather than inventing machinery.
asserted, not verified. This is a curation/sourcing judgment (which knowledge the KB should and should not hold), not a mechanism-tested claim. The underlying facts that 0% is cached today and that the live primitives exist are verified; the policy on top is asserted.

Consequences

No doc-scan/mirror tooling is built or wired; the KB stays free of a vendor doc corpus.
Agents have a known baseline (Claude Primitives & Model Family — June 2026 Snapshot) for the stable core and a clear instruction to use the live primitives for anything current — lowering recall cost without a rotting mirror.
The reference library gains a fourth domain (Claude primitives/models) under the same evidence→reference, judgment→decision discipline.
Generalizable: the same "reference, don't absorb" rule applies to any future external doc set Dossier builds on but does not own.

Review

Promote asserted → verified only if the policy is stress-tested by a real counter-need and holds — e.g. an offline/air-gapped build with no live doc access where re-deriving from the live primitives is impossible, forcing a deliberate, scoped cache. If that case ever lands, capture it as its own decision (it would amend, not silently override, this stance). Absent such a case, this stays an asserted curation policy.