Deterministic edge-invariant repair stage (extraction Stage 5.5)

0042-deterministic-edge-repair-stage

decision · read as: Explain · confidence: asserted · status: active · 2026-06-16 · owner: extraction-engineer · reversibility: two-way door

DEC-0042 — Deterministic edge-invariant repair stage (extraction Stage 5.5)

Reversibility: two-way door — the stage is a pure function inserted between two existing stages and toggleable per-repair (mintStubs: false leaves dangling edges for the validator to flag instead; removing the stage reverts to the prior reject-on-graph-error behavior). The durable commitment is the principle — make the DEC-0007 invariant true by construction rather than trusting the prompt — not this specific repair set, which extends as DEC-0007 pins more process-only edges.

Surfaced retroactively by the log-auditor via trace distillation (the _pending digest flagged a non-trivial pipeline change with no DEC). The why is fully recoverable from packages/extraction/src/pipeline/repair.ts and src/index.ts — recorded, not invented.

Context

"Extraction runtime architecture — the moat" §3 defines the staged pipeline load → segment → extract → validate → resolve → link → serialize, and "The produces edge is canonical on the producing process only" pins a hard graph invariant: produces is canonical on a process only; any other type that "results in" a thing references it associatively via relates_to. The extract stage asks a probabilistic model (forced tool use) to honor that invariant in the prompt. A prompt rule reduces the rate of a slip; it cannot make the invariant true. Three failure shapes recur in real extraction:

  1. A non-process atom (system/role/capability/…) carries a produces edge — a DEC-0007 type error.
  2. A process states its producer relationship in the descriptive outputs array (free text) instead of the canonical produces edge, so the artifact reads as an orphan-artifact (DEC-0007: every artifact needs exactly one producing process).
  3. The model references a tool/entity by id (bigquery, an ADK) in an edge without minting the atom — a dangling-edge graph error.

All three were observed live: shapes 1 and 3 on the original OKF-README run that motivated this work; shape 2 surfaced when re-running the SAME doc post-fix (the model emitted outputs: [okf-concept-document] on a process but no produces edge). Before this decision, any of these would trip validateGraph and, under failOnGraphErrors, fail the whole run — discarding good atoms for a defect that is mechanically correctable and whose remedy DEC-0007 already prescribes.
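
To make the three shapes concrete, the sketch below encodes one instance of each and detects them with a small checker. It is an illustration only: the Atom fields, the uses edge, the findShapes helper, and the okf-tool and author-concept ids are assumptions made for the example, not the real schema or validator. Only okf-concept-document and bigquery come from the observed runs.

```typescript
// Hypothetical, simplified atom shape; the real schema is not shown in this record.
type AtomType = "process" | "system" | "artifact" | "capability" | "role";

interface Atom {
  id: string;
  type: AtomType;
  produces?: string[];   // canonical edge, process-only per DEC-0007
  relates_to?: string[]; // associative edge, allowed on any type
  uses?: string[];       // assumed edge name, for illustration only
  outputs?: string[];    // descriptive array on a process, not an edge
}

type Shape = "process-only-edge" | "orphan-artifact" | "dangling-edge";

interface Finding {
  shape: Shape;
  atomId: string;
  targetId: string;
}

function findShapes(atoms: readonly Atom[]): Finding[] {
  const ids = new Set(atoms.map((a) => a.id));
  const produced = new Set<string>();
  const findings: Finding[] = [];

  for (const a of atoms) {
    for (const target of a.produces ?? []) {
      if (a.type === "process") {
        produced.add(target); // a legitimate producer
      } else {
        // Shape 1: produces on a non-process atom.
        findings.push({ shape: "process-only-edge", atomId: a.id, targetId: target });
      }
    }
  }
  for (const a of atoms) {
    // Shape 2: an artifact that no process produces (outputs does not count as an edge).
    if (a.type === "artifact" && !produced.has(a.id)) {
      findings.push({ shape: "orphan-artifact", atomId: a.id, targetId: a.id });
    }
    // Shape 3: an edge whose target was never minted as an atom.
    const edges = [...(a.produces ?? []), ...(a.relates_to ?? []), ...(a.uses ?? [])];
    for (const target of edges) {
      if (!ids.has(target)) {
        findings.push({ shape: "dangling-edge", atomId: a.id, targetId: target });
      }
    }
  }
  return findings;
}

const sample: Atom[] = [
  { id: "okf-tool", type: "system", produces: ["okf-concept-document"] },
  { id: "okf-concept-document", type: "artifact" },
  { id: "author-concept", type: "process", outputs: ["okf-concept-document"], uses: ["bigquery"] },
];
const findings = findShapes(sample);
```

On this sample the checker reports one finding per shape. Each of them would have been a graph error under the prior reject-on-error behavior.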

Options considered

  1. Prompt-only (status quo). Keep asking the model to honor DEC-0007 and reject the run when it slips. Rejected: a probabilistic guarantee for an invariant that must be deterministic; throws away a whole run for a one-edge defect; the residual slip-rate is unmeasured.
  2. Drop the offending edge / dangling reference silently. Rejected twice over: a downgrade-by-deletion loses a real relationship (the model was right about the connection, wrong about the label), and silently dropping a dangling reference erases provenance. Silent correction also makes the model's true error-rate invisible, the opposite of the intent of "Live extraction eval harness — what we measure is what extraction optimizes for".
  3. Insert a deterministic, offline repair stage between resolve and link that corrects and records (chosen). Make the invariant true by construction, preserve the relationship, ground the reference, and surface every repair so it stays auditable and measurable.

Decision

Add Stage 5.5, repair(ResolvedAtom[], opts) → { atoms, repairs }, to @dossier/extraction, composed in run() between Stage 5 (resolve/dedup) and Stage 6 (link), so the graph validator only ever sees a set in which the DEC-0007 invariants already hold. The stage is pure (returns new atoms, never mutates inputs), deterministic, and offline (no LLM). Three repairs, all grounded in "The produces edge is canonical on the producing process only":

  1. Process-only edge on a non-process atom → downgrade to relates_to. The targets are moved (not dropped) onto relates_to — the exact remedy DEC-0007 prescribes for a capability. A PROCESS_ONLY_EDGE_FIELDS set names the rule by intent (['produces'] today) and extends if DEC-0007 pins more.
  2. Orphan artifact + a process naming it in outputs → promote to produces. When an otherwise-orphaned artifact id appears in some process's descriptive outputs array, promote it to the canonical produces edge on that process. This is normalization, not invention: the model already declared the output; we only move it to DEC-0007's canonical label. Promotion fires ONLY when a process already names the artifact (never a fabricated producer), and the first process by id order wins so the artifact keeps exactly one declared producer (single source of truth).
  3. Dangling edge to an undefined, non-known-external id → mint a thin system stub. Rather than emit an edge to nothing, mint a minimal stub (type: system, confidence: inferred, the referrer's source/timestamp as provenance, a body that flags it for curation). system is chosen because the common dangling target is a tool an atom uses and a system carries no producer obligation (an artifact stub would itself become an orphan needing a process). mintStubs: false opts out (leave dangling for the validator). knownExternalIds mirrors ValidateGraphOptions so a cross-silo reference is dangling to neither the guard nor the validator.
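
The three repairs above can be sketched as one pure function. This is a minimal illustration under stated assumptions, not the code in repair.ts: the Atom and Repair shapes, the repair kind labels, the stub body text, and the sample ids and provenance values are hypothetical. The sketch also leaves the outputs array untouched after promotion, which the record does not specify either way.

```typescript
// Hypothetical, simplified shapes; the real ResolvedAtom and Repair types are not shown here.
type AtomType = "process" | "system" | "artifact" | "capability" | "role";
interface Atom {
  id: string;
  type: AtomType;
  confidence: "inferred" | "asserted";
  source: string;
  timestamp: string;
  body: string;
  produces?: string[];
  relates_to?: string[];
  outputs?: string[];
}
type Working = Atom & { produces: string[]; relates_to: string[] };
interface Repair {
  kind: "downgrade-edge" | "promote-output" | "mint-stub"; // assumed labels
  atomId: string;
  targetId: string;
}
interface RepairOptions {
  mintStubs?: boolean;
  knownExternalIds?: readonly string[];
}

// Names the rule by intent; extend if DEC-0007 pins more process-only edges.
const PROCESS_ONLY_EDGE_FIELDS = ["produces"] as const;

function repair(input: readonly Atom[], opts: RepairOptions = {}): { atoms: Atom[]; repairs: Repair[] } {
  // Work on copies so the inputs are never mutated.
  const atoms: Working[] = input.map((a) => ({
    ...a,
    produces: [...(a.produces ?? [])],
    relates_to: [...(a.relates_to ?? [])],
  }));
  const repairs: Repair[] = [];

  // Repair 1: move (not drop) process-only edges on non-process atoms onto relates_to.
  for (const a of atoms) {
    if (a.type === "process") continue;
    for (const field of PROCESS_ONLY_EDGE_FIELDS) {
      for (const target of a[field]) {
        if (a.relates_to.indexOf(target) === -1) a.relates_to.push(target);
        repairs.push({ kind: "downgrade-edge", atomId: a.id, targetId: target });
      }
      a[field] = [];
    }
  }

  // Repair 2: promote an orphan artifact named in a process's outputs; first process by id wins.
  const processes = atoms
    .filter((a) => a.type === "process")
    .sort((x, y) => (x.id < y.id ? -1 : x.id > y.id ? 1 : 0));
  for (const artifact of atoms) {
    if (artifact.type !== "artifact") continue;
    if (processes.some((p) => p.produces.indexOf(artifact.id) !== -1)) continue;
    const declarer = processes.find((p) => (p.outputs ?? []).indexOf(artifact.id) !== -1);
    if (!declarer) continue; // no process names it: never fabricate a producer
    declarer.produces.push(artifact.id);
    repairs.push({ kind: "promote-output", atomId: declarer.id, targetId: artifact.id });
  }

  // Repair 3: mint a thin system stub for each dangling, non-known-external target.
  if (opts.mintStubs !== false) {
    const known = new Set<string>([...atoms.map((a) => a.id), ...(opts.knownExternalIds ?? [])]);
    const stubs: Working[] = [];
    for (const a of atoms) {
      for (const target of [...a.produces, ...a.relates_to]) {
        if (known.has(target)) continue;
        known.add(target);
        stubs.push({
          id: target, type: "system", confidence: "inferred",
          source: a.source, timestamp: a.timestamp, // the referrer's provenance
          body: "Stub minted by repair: enrich or delete.",
          produces: [], relates_to: [],
        });
        repairs.push({ kind: "mint-stub", atomId: a.id, targetId: target });
      }
    }
    atoms.push(...stubs);
  }
  return { atoms, repairs };
}

// Sample values below are illustrative only.
const base = { confidence: "inferred" as const, source: "README.md", timestamp: "2026-06-16", body: "" };
const input: Atom[] = [
  { ...base, id: "okf-tool", type: "system", produces: ["okf-concept-document"] },
  { ...base, id: "okf-concept-document", type: "artifact" },
  { ...base, id: "author-concept", type: "process", outputs: ["okf-concept-document"], relates_to: ["bigquery"] },
];
const { atoms: repaired, repairs } = repair(input);
```

On this sample the sketch records one repair of each kind and returns four atoms, the fourth being the bigquery stub. Passing { mintStubs: false } leaves the bigquery edge dangling for the validator, and passing knownExternalIds: ["bigquery"] treats it as a known cross-silo reference, so no stub is minted.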

Every change is recorded in a Repair[] and surfaced on the run result (PipelineResult.repairs), never applied silently. The stage changes only the derived candidate set; atoms here are still pre-emit and still inferred, so no source of truth is touched ("Adopt OKF as Dossier's canonical knowledge format").
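
The composition and the surfacing of repairs on the run result can be sketched as follows. Everything here is a stand-in under assumptions: the simplified Atom shape, the one-repair version of the stage, the toy validator, and the skipRepair switch (which only imitates removing the stage) are hypothetical. The ordering and the failOnGraphErrors behavior are what the record describes.

```typescript
// Hypothetical, simplified shapes and stage stand-ins; only the ordering is taken from the record.
interface Atom {
  id: string;
  type: string;
  produces: string[];
  relates_to: string[];
}
interface Repair {
  kind: string;
  atomId: string;
  targetId: string;
}
interface PipelineResult {
  atoms: Atom[];
  repairs: readonly Repair[]; // empty means the model emitted a clean graph
}
interface RunOptions {
  failOnGraphErrors: boolean;
  skipRepair?: boolean; // assumed switch, standing in for "remove the stage"
}

// Stand-in for Stage 5.5: only the downgrade repair, to keep the sketch short.
function repair(atoms: readonly Atom[]): { atoms: Atom[]; repairs: Repair[] } {
  const repairs: Repair[] = [];
  const out = atoms.map((a) => {
    if (a.type === "process" || a.produces.length === 0) return a;
    for (const target of a.produces) {
      repairs.push({ kind: "downgrade-edge", atomId: a.id, targetId: target });
    }
    // Return a new atom; the input atom is left untouched.
    return { ...a, produces: [], relates_to: [...a.relates_to, ...a.produces] };
  });
  return { atoms: out, repairs };
}

// Stand-in for the graph validator: flags produces on a non-process atom.
function validateGraph(atoms: readonly Atom[]): string[] {
  return atoms
    .filter((a) => a.type !== "process" && a.produces.length > 0)
    .map((a) => `${a.id}: produces on non-process atom`);
}

function run(resolved: readonly Atom[], opts: RunOptions): PipelineResult {
  // Stage 5 (resolve) is assumed to have produced `resolved`.
  const { atoms, repairs } = opts.skipRepair
    ? { atoms: [...resolved], repairs: [] as Repair[] }
    : repair(resolved);
  // Stage 6 (link) would run here; the validator then sees the repaired set.
  const errors = validateGraph(atoms);
  if (opts.failOnGraphErrors && errors.length > 0) {
    throw new Error(`graph errors: ${errors.join("; ")}`);
  }
  return { atoms, repairs };
}

const resolved: Atom[] = [
  { id: "okf-tool", type: "system", produces: ["okf-concept-document"], relates_to: [] },
  { id: "okf-concept-document", type: "artifact", produces: [], relates_to: [] },
];
const result = run(resolved, { failOnGraphErrors: true });
```

With the stage in place the run completes and result.repairs carries the single downgrade. Calling run with skipRepair: true on the same input throws, which mirrors the prior reject-on-graph-error behavior.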

Rationale

  • Make the invariant true by construction. Determinism belongs in code, not in a prompt. The prompt still reduces the slip rate; the guard makes the residual slip non-fatal and self-correcting — a slip auto-repairs instead of failing the run.
  • Preserve the relationship, ground the reference. The model is usually right that a connection exists and wrong only about its label or about minting the target. Moving (not dropping) edges and minting (not erasing) references keeps the graph both whole and faithful.
  • Auditable + measurable. Surfacing every repair keeps the loop honest and lets "Live extraction eval harness — what we measure is what extraction optimizes for" read the prompt's true residual error-rate from the repair count; a silent fix would hide exactly the number we want to drive down.
  • Stubs are never silently authoritative. A minted stub is inferred, thin, and body-flagged "enrich or delete" so a curator can never mistake it for verified knowledge — consistent with provenance-always (Dossier — The Knowledge Model (v0)).
  • Cheap and safe to reverse. A pure function between two stages; toggleable per-repair; removable to restore the prior reject behavior.
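
As an illustration of the measurability point, a consumer could fold the surfaced repairs into a per-run summary. This is a sketch under assumptions: the repair kind labels, the summarizeRepairs helper, and the choice of repairs per atom as the rate are hypothetical, and the sample numbers are invented for the example rather than measured.

```typescript
// Assumed repair labels; the real Repair type is not shown in this record.
type RepairKind = "downgrade-edge" | "promote-output" | "mint-stub";

interface Repair {
  kind: RepairKind;
  atomId: string;
  targetId: string;
}

interface RepairSummary {
  total: number;
  byKind: Record<RepairKind, number>;
  repairsPerAtom: number; // one possible proxy for the prompt's residual slip rate
}

function summarizeRepairs(repairs: readonly Repair[], atomCount: number): RepairSummary {
  const byKind: Record<RepairKind, number> = {
    "downgrade-edge": 0,
    "promote-output": 0,
    "mint-stub": 0,
  };
  for (const r of repairs) {
    byKind[r.kind] += 1;
  }
  return {
    total: repairs.length,
    byKind,
    // Guard against an empty run so the rate is always a finite number.
    repairsPerAtom: atomCount > 0 ? repairs.length / atomCount : 0,
  };
}

// Illustrative input only: three repairs on a run that emitted twelve atoms.
const summary = summarizeRepairs(
  [
    { kind: "downgrade-edge", atomId: "okf-tool", targetId: "okf-concept-document" },
    { kind: "mint-stub", atomId: "author-concept", targetId: "bigquery" },
    { kind: "mint-stub", atomId: "author-concept", targetId: "adk" },
  ],
  12,
);
```

Keeping the per-kind breakdown separate from the total matters for the review gate: stub volume and downgrade/promotion counts are watched for different reasons.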

Consequences

  • run() no longer fails on a mechanically-correctable DEC-0007 slip. Good atoms survive; the graph validator sees a clean set; failOnGraphErrors now fires only on defects the repair stage can't (or is told not to) fix.
  • RunResult gains a first-class repairs: readonly Repair[] (empty = the model emitted a clean graph). Downstream consumers (eval, the runtime loop, reconcile) can read what was auto-corrected.
  • A new class of atom exists in output: the minted stub. It is inferred and curation-flagged, but it is a new node in the tenant graph — curators and reconcile should expect and enrich/retire them. (Reconcile churn on stubs is worth watching — see Review.)
  • The repair set is a living list keyed to DEC-0007. If DEC-0007 ever pins another process-only edge, add it to PROCESS_ONLY_EDGE_FIELDS — the rule reads by intent so the extension is one line.

Prompt complement (Gap 1 + Gap 2 reduction)

This guard is the enforcement half; the "Extraction runtime architecture — the moat" extraction prompt (packages/extraction/src/pipeline/prompt.ts) carries the reduction half, tightened in the same change. Rule 4 now extends the never-fabricate discipline from rationale alone to every structured decision field (options_considered, reversibility, consequences, decided_by, review_date: omit, never invent; all are schema-optional). Rule 5 forbids a process-only edge on a non-process atom, forbids an edge to an unminted id, and tells a process to declare an artifact via the produces edge (not only outputs). Decision-field fabrication (Gap 1) has no cheap deterministic check, because a fabricated options_considered is schema-valid, so it is prompt-only, pinned by a prompt-contract test; the edge invariants (Gap 2) are belt-and-suspenders (prompt reduces, guard enforces).

Review

Recorded as asserted (a methodology choice whose correctness is unit-proven but whose effectiveness as a quality lever is not yet validated against a real client corpus):

  • Verified now (unit): packages/extraction/test/repair.test.ts passes green — all three repairs, the move-not-drop / promote-not-invent semantics, single-declared-producer ordering, the stub provenance, the mintStubs: false opt-out, and knownExternalIds are covered; run-repair.test.ts drives the full run() with the proven run's offending atoms and proves it completes clean (zero graph errors) with the repairs surfaced; pipeline.test.ts pins the prompt contract and the omitted-decision-field validity. Extraction suite 81 pass / 1 skip (was 62 / 1); okf unchanged at 143.
  • Verified manually, live (one-off — NOT in the offline suite, not reproduced by CI): re-running the SAME Google OKF README through run() on the subscription transport (claude -p, no API key), out to a scratch dir (never knowledge/), completed with failOnGraphErrors: true and zero graph errors across repeated runs; the emitted decision carries faithful context/decision/rationale (and source-grounded consequences) with options_considered/reversibility omitted (the proven run had fabricated both), and the system references its artifact via relates_to, never produces.
  • Promotion gate (asserted → verified): the live harness from "Live extraction eval harness — what we measure is what extraction optimizes for" shows (a) the repair stage lowers graph-integrity errors to zero on the gold corpus without depressing edge-faithfulness (downgrades/promotions land on the right targets) and (b) minted-stub volume stays low enough that stubs don't pollute the graph or churn reconcile. Revisit if stub volume is high (tighten minting / route gaps back to extraction per "First full-loop SERVE on a real external client — reconcile divergent extraction runs to one canonical KB on a quality rubric; lexical retrieval sufficient (VectorRetriever seam not yet needed)") or if a downgrade/promotion ever moves a target that was legitimately a different edge.