Do not scan/mirror Anthropic's docs into the KB — distill curated references instead

0050-do-not-mirror-anthropic-docs-distill-references

decision read as Explain confidence asserted status active 2026-06-18 owner principal-architect
Reversibility
two-way door

DEC-0050 — Do not mirror Anthropic's docs into the KB; distill curated references instead

Reversibility: two-way door — a curation/sourcing policy, not built code; nothing is mirrored, so reversing means adding a scan if a real need ever appears (e.g. an offline/air-gapped build with no live doc access). The durable commitment is the principle — reference, don't absorb, for external docs we do not own and that churn fast — which generalizes the OKF upstream relationship — complement at the format layer, competitor at the serving layer stance to the Anthropic-docs case.

Context

On 2026-06-18 the Principal Forward Deployed Engineer verified that 0% of the Claude/Anthropic documentation is scanned or cached anywhere in this repo — what exists are ~8 hand-cited doc-page URLs inside ADRs and research atoms, never a captured corpus. The question this forces: should we ingest Anthropic's docs into the KB so agents stop re-fetching the same stable facts? This is the same shape as the OKF-upstream call (OKF upstream relationship — complement at the format layer, competitor at the serving layer): an external body of knowledge Dossier builds on but does not own.

The trigger was real recurring friction — agents kept re-reading the docs in-session to recall the same stable facts (which file a subagent lives in, the hook event names, the current Opus model ID). We needed a durable answer that lowers that recall cost without absorbing someone else's fast-moving corpus.

Options considered

  1. Scan/mirror the docs into the KB (crawl code.claude.com/docs + platform.claude.com/docs into OKF atoms or a cached corpus). Rejected: (a) provenance/sovereignty — the docs are Anthropic's, not the org's own institutional memory; the KB is for the org's compounding knowledge, not a mirror of a vendor's manual (the Dossier — The Knowledge Model (v0) "capture judgment, not just facts" + Dossier — Mission & North Star sovereignty stance). (b) Staleness — the Claude model/primitive surface moves fast (Fable 5, Opus 4.8 are recent); a frozen mirror rots and quietly misleads. (c) Redundant — we already read docs transiently in-session via live primitives, so a mirror is bloat + rot risk with no owned-knowledge gain.
  2. Capture nothing — keep re-fetching in-session every time. Rejected: leaves the real recurring friction unaddressed; the stable core (the six primitives + their config locations, the current model family) genuinely is re-derived over and over and is worth capturing once.
  3. Distill one curated reference atom of the stable core + adopt "reference, don't absorb" as the standing rule (chosen). Capture only the high-churn-resistant facts a build-with-primitives shop keeps re-deriving — grounded in primary docs at capture time, dated, provenance-bearing, recency-caveated — and route everything live (a new model, a changed param, an edge case) to the live primitives. Lowers recall cost without absorbing the corpus, and matches the OKF upstream relationship — complement at the format layer, competitor at the serving layer evidence→reference, judgment→decision precedent.

Decision

Do NOT scan, crawl, or mirror Anthropic's Claude / Claude Code docs into the KB. Adopt the standing rule — reference, don't absorb — for high-churn external documentation Dossier builds on but does not own:

  1. Read the docs transiently in-session via the live primitives — the claude-api skill, the claude-code-guide agent, WebFetch against the docs. These are always-current; the docs are their source of truth, not the KB.
  2. Synthesize the judgment into an ADR when a doc fact forces a real choice (the existing pattern — DEC-0004 et al.).
  3. Capture only the stable, re-derived core as a dated, provenance-bearing reference atom, recency-caveated, explicitly framed as "implication, NOT a decision." The first such atom is Claude Primitives & Model Family — June 2026 Snapshot (the six primitives + config locations + the current Claude model family/pricing).

Rationale

  • Sovereignty/provenance is the line. The KB is the org's owned, compounding memory. Anthropic's docs are Anthropic's — mirroring them in confuses whose knowledge this is and what the org can claim to own (exactly the distinction OKF upstream relationship — complement at the format layer, competitor at the serving layer draws: complement at the format/source layer, don't absorb).
  • A frozen snapshot of a fast-moving surface rots. The model/primitive surface churns; a mirror would silently age. A small, dated, caveated distillation of only the stable core is honest about its own shelf life and points to the live source for anything that moves.
  • The right pattern already exists. Transient read → synthesize ADR → distill the stable core is what we already do; this decision names it as policy rather than inventing machinery.
  • asserted, not verified. This is a curation/sourcing judgment (which knowledge the KB should and should not hold), not a mechanism-tested claim. The underlying facts that 0% is cached today and that the live primitives exist are verified; the policy on top is asserted.

Consequences

  • No doc-scan/mirror tooling is built or wired; the KB stays free of a vendor doc corpus.
  • Agents have a known baseline (Claude Primitives & Model Family — June 2026 Snapshot) for the stable core and a clear instruction to use the live primitives for anything current — lowering recall cost without a rotting mirror.
  • The reference library gains a fourth domain (Claude primitives/models) under the same evidence→reference, judgment→decision discipline.
  • Generalizable: the same "reference, don't absorb" rule applies to any future external doc set Dossier builds on but does not own.

Review

Promote asserted → verified only if the policy is stress-tested by a real counter-need and holds — e.g. an offline/air-gapped build with no live doc access where re-deriving from the live primitives is impossible, forcing a deliberate, scoped cache. If that case ever lands, capture it as its own decision (it would amend, not silently override, this stance). Absent such a case, this stays an asserted curation policy.