Deterministic edge-invariant repair stage (extraction Stage 5.5)
0042-deterministic-edge-repair-stage
- Reversibility: two-way door
DEC-0042 — Deterministic edge-invariant repair stage (extraction Stage 5.5)
Reversibility: two-way door — the stage is a pure function inserted between two existing stages and toggleable per-repair (mintStubs: false leaves dangling edges for the validator to flag instead; removing the stage reverts to the prior reject-on-graph-error behavior). The durable commitment is the principle — make the DEC-0007 invariant true by construction rather than trusting the prompt — not this specific repair set, which extends as DEC-0007 pins more process-only edges.
Surfaced retroactively by the log-auditor via trace distillation (the _pending digest flagged a non-trivial pipeline change with no DEC). The why is fully recoverable from packages/extraction/src/pipeline/repair.ts and src/index.ts: recorded, not invented.
Context
"Extraction runtime architecture — the moat" §3 defines the staged pipeline load → segment → extract → validate → resolve → link → serialize, and "The produces edge is canonical on the producing process only" pins a hard graph invariant: produces is canonical on a process only; any other type that "results in" a thing references it associatively via relates_to. The extract stage asks a probabilistic model (forced tool use) to honor that invariant in the prompt. A prompt rule reduces the rate of a slip; it cannot make the invariant true. Three failure shapes recur in real extraction:
- A non-process atom (system/role/capability/…) carries a produces edge, which is a DEC-0007 type error.
- A process states its producer relationship in the descriptive outputs array (free text) instead of the canonical produces edge, so the artifact reads as an orphan-artifact (DEC-0007: every artifact needs exactly one producing process).
- The model references a tool/entity by id (bigquery, an ADK) in an edge without minting the atom, which is a dangling-edge graph error.
All three were observed live: shapes 1 and 3 on the original OKF-README run that motivated this work; shape 2 surfaced when re-running the SAME doc post-fix (the model emitted outputs: [okf-concept-document] on a process but no produces edge). Before this decision, any of these would trip validateGraph and, under failOnGraphErrors, fail the whole run — discarding good atoms for a defect that is mechanically correctable and whose remedy DEC-0007 already prescribes.
Options considered
- Prompt-only (status quo). Keep asking the model to honor DEC-0007 and reject the run when it slips. Rejected: a probabilistic guarantee for an invariant that must be deterministic; throws away a whole run for a one-edge defect; the residual slip-rate is unmeasured.
- Drop the offending edge / dangling reference silently. Rejected twice over: a downgrade-by-deletion loses a real relationship (the model was right about the connection, wrong about the label), and silently dropping a dangling reference erases provenance. Silent correction also makes the model's true error-rate invisible, the opposite of the intent of "Live extraction eval harness — what we measure is what extraction optimizes for".
- Insert a deterministic, offline repair stage between resolve and link that corrects and records (chosen). Make the invariant true by construction, preserve the relationship, ground the reference, and surface every repair so it stays auditable and measurable.
Decision
Add Stage 5.5 — repair(ResolvedAtom[], opts) → { atoms, repairs } to @dossier/extraction, composed in run() between Stage 5 (resolve/dedup) and Stage 6 (link), so the graph validator only ever sees a set in which the DEC-0007 invariants already hold. The stage is pure (returns new atoms, never mutates inputs), deterministic, and offline (no LLM). Three repairs, all grounded in The produces edge is canonical on the producing process only:
- Process-only edge on a non-process atom → downgrade to relates_to. The targets are moved (not dropped) onto relates_to, the exact remedy DEC-0007 prescribes for a capability. A PROCESS_ONLY_EDGE_FIELDS set names the rule by intent (['produces'] today) and extends if DEC-0007 pins more.
- Orphan artifact + a process naming it in outputs → promote to produces. When an otherwise-orphaned artifact id appears in some process's descriptive outputs array, promote it to the canonical produces edge on that process. This is normalization, not invention: the model already declared the output; we only move it to DEC-0007's canonical label. Promotion fires ONLY when a process already names the artifact (never a fabricated producer), and the first process by id order wins so the artifact keeps exactly one declared producer (single source of truth).
- Dangling edge to an undefined, non-known-external id → mint a thin system stub. Rather than emit an edge to nothing, mint a minimal stub (type: system, confidence: inferred, the referrer's source/timestamp as provenance, a body that flags it for curation). system is chosen because the common dangling target is a tool an atom uses and a system carries no producer obligation (an artifact stub would itself become an orphan needing a process). mintStubs: false opts out (leave dangling for the validator). knownExternalIds mirrors ValidateGraphOptions so a cross-silo reference is dangling to neither the guard nor the validator.
Every change is recorded as a Repair[] and surfaced on the run result (PipelineResult.repairs) — never silent. The stage mutates only the derived candidate set; atoms here are still pre-emit and still inferred, so no source of truth is touched (Adopt OKF as Dossier's canonical knowledge format).
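The three repairs can be sketched as a single pure function. This is a minimal illustration, not the code in repair.ts: the Atom, Edges, Repair and RepairOptions shapes, the repair kind labels, the stub body text, and the order in which the repairs run are all assumptions made for the example. Only the names repair, PROCESS_ONLY_EDGE_FIELDS, mintStubs and knownExternalIds come from this record.

```typescript
// Simplified, hypothetical shapes. The real ResolvedAtom, Repair and option
// types in @dossier/extraction are not shown in this record and likely differ.
type Edges = { readonly [field: string]: readonly string[] | undefined };

interface Atom {
  readonly id: string;
  readonly type: string; // "process", "system", "artifact", ...
  readonly confidence: string;
  readonly source: string;
  readonly timestamp: string;
  readonly body: string;
  readonly outputs?: readonly string[]; // descriptive, not a canonical edge
  readonly edges: Edges; // e.g. { produces: [...], uses: [...], relates_to: [...] }
}

interface Repair {
  readonly kind: "downgrade" | "promote" | "mint-stub"; // assumed labels
  readonly atomId: string;
  readonly target: string;
}

interface RepairOptions {
  readonly mintStubs?: boolean; // default true; false leaves dangling edges alone
  readonly knownExternalIds?: readonly string[];
}

const PROCESS_ONLY_EDGE_FIELDS: readonly string[] = ["produces"];

const has = (xs: readonly string[] | undefined, x: string): boolean =>
  (xs ?? []).indexOf(x) >= 0;
const union = (a: readonly string[], b: readonly string[]): string[] =>
  a.concat(b.filter((x, i) => a.indexOf(x) < 0 && b.indexOf(x) === i));
const byId = (a: Atom, b: Atom): number => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0);

function repair(
  input: readonly Atom[],
  opts: RepairOptions = {},
): { atoms: Atom[]; repairs: Repair[] } {
  const repairs: Repair[] = [];

  // Repair 1: on a non-process atom, move process-only edge targets to relates_to.
  let atoms: Atom[] = input.map((atom) => {
    let edges: Edges = atom.edges;
    if (atom.type !== "process") {
      for (const field of PROCESS_ONLY_EDGE_FIELDS) {
        const targets = edges[field] ?? [];
        if (targets.length === 0) continue;
        for (const target of targets) repairs.push({ kind: "downgrade", atomId: atom.id, target });
        const moved = union(edges["relates_to"] ?? [], targets);
        edges = { ...edges, [field]: undefined, relates_to: moved };
      }
    }
    return { ...atom, edges }; // always a new object; inputs are never mutated
  });

  // Repair 2: an artifact with no producer is promoted onto the first process
  // (by id order) that already names it in outputs. No producer is fabricated.
  const artifactIds = atoms.filter((a) => a.type === "artifact").map((a) => a.id).sort();
  for (const artifactId of artifactIds) {
    const processes = atoms.filter((a) => a.type === "process").sort(byId);
    if (processes.some((p) => has(p.edges["produces"], artifactId))) continue;
    const declarer = processes.filter((p) => has(p.outputs, artifactId))[0];
    if (declarer === undefined) continue;
    repairs.push({ kind: "promote", atomId: declarer.id, target: artifactId });
    const produces = union(declarer.edges["produces"] ?? [], [artifactId]);
    const promoted: Atom = { ...declarer, edges: { ...declarer.edges, produces } };
    atoms = atoms.map((a) => (a === declarer ? promoted : a));
  }

  // Repair 3: mint a thin system stub for each edge target that is neither an
  // atom in the set nor a known external id.
  if (opts.mintStubs !== false) {
    const known = atoms.map((a) => a.id).concat(opts.knownExternalIds ?? []);
    const stubs: Atom[] = [];
    for (const atom of atoms) {
      for (const field of Object.keys(atom.edges).sort()) {
        for (const target of atom.edges[field] ?? []) {
          if (known.indexOf(target) >= 0) continue;
          known.push(target);
          repairs.push({ kind: "mint-stub", atomId: atom.id, target });
          stubs.push({
            id: target, type: "system", confidence: "inferred",
            source: atom.source, timestamp: atom.timestamp, // referrer's provenance
            body: "Stub minted by repair: enrich or delete.", edges: {},
          });
        }
      }
    }
    atoms = atoms.concat(stubs);
  }
  return { atoms, repairs };
}
```

Given a system atom carrying produces, a process that names the artifact only in outputs and uses an unminted id, and the artifact itself, this sketch returns one downgrade, one promotion, and one minted stub, and leaves the input array untouched. Passing mintStubs: false, or listing the id in knownExternalIds, suppresses the stub.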
Rationale
- Make the invariant true by construction. Determinism belongs in code, not in a prompt. The prompt still reduces the slip rate; the guard makes the residual slip non-fatal and self-correcting — a slip auto-repairs instead of failing the run.
- Preserve the relationship, ground the reference. The model is usually right that a connection exists and wrong only about its label or about minting the target. Moving (not dropping) edges and minting (not erasing) references keeps the graph both whole and faithful.
- Auditable + measurable. Surfacing every repair keeps the loop honest and lets the harness from "Live extraction eval harness — what we measure is what extraction optimizes for" read the prompt's true residual error-rate from the repair count; a silent fix would hide exactly the number we want to drive down.
- Stubs are never silently authoritative. A minted stub is inferred, thin, and body-flagged "enrich or delete" so a curator can never mistake it for verified knowledge, consistent with provenance-always (Dossier — The Knowledge Model (v0)).
- Cheap and safe to reverse. A pure function between two stages; toggleable per-repair; removable to restore the prior reject behavior.
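The "measurable" point can be made concrete with a small aggregation over run results. The sketch below is illustrative only: RunSummary, atomCount, SlipReport and summarizeSlips are hypothetical names, and the record does not say how the eval harness actually computes its rate. It assumes each run exposes its repairs array and a count of emitted atoms.

```typescript
// Hypothetical shapes: only the `repairs` array on the run result is named in
// this record. atomCount and the report fields are assumptions for the sketch.
interface Repair {
  readonly kind: string;
}

interface RunSummary {
  readonly atomCount: number;
  readonly repairs: readonly Repair[];
}

interface SlipReport {
  readonly repairsPerAtom: number; // residual slip rate the prompt left behind
  readonly byKind: { [kind: string]: number };
  readonly cleanRuns: number; // runs where the model emitted a clean graph
}

function summarizeSlips(runs: readonly RunSummary[]): SlipReport {
  const byKind: { [kind: string]: number } = {};
  let atoms = 0;
  let repairs = 0;
  let cleanRuns = 0;
  for (const run of runs) {
    atoms += run.atomCount;
    repairs += run.repairs.length;
    if (run.repairs.length === 0) cleanRuns += 1;
    for (const r of run.repairs) byKind[r.kind] = (byKind[r.kind] ?? 0) + 1;
  }
  // Guard against an empty corpus so the rate is always a finite number.
  return { repairsPerAtom: atoms === 0 ? 0 : repairs / atoms, byKind, cleanRuns };
}
```

Because every repair is recorded rather than applied silently, a falling repairsPerAtom across prompt revisions would indicate the prompt itself is improving, independently of the guard.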
Consequences
- run() no longer fails on a mechanically-correctable DEC-0007 slip. Good atoms survive; the graph validator sees a clean set; failOnGraphErrors now fires only on defects the repair stage can't (or is told not to) fix.
- RunResult gains a first-class repairs: readonly Repair[] (empty = the model emitted a clean graph). Downstream consumers (eval, the runtime loop, reconcile) can read what was auto-corrected.
- A new class of atom exists in output: the minted stub. It is inferred and curation-flagged, but it is a new node in the tenant graph, so curators and reconcile should expect and enrich/retire them. (Reconcile churn on stubs is worth watching; see Review.)
- The repair set is a living list keyed to DEC-0007. If DEC-0007 ever pins another process-only edge, add it to PROCESS_ONLY_EDGE_FIELDS; the rule reads by intent so the extension is one line.
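The first two consequences follow from where the stage sits relative to the validator. The sketch below shows that ordering with the stages injected as parameters, so it stays self-contained. Stages, TailResult, GraphError and repairThenValidate are hypothetical names, and the real run() composes more stages than this; the only things taken from this record are that repair precedes graph validation, that failOnGraphErrors governs the throw, and that repairs are returned on the result.

```typescript
// Hypothetical stage signatures, injected so the sketch stays self-contained.
// The real repair() and validateGraph() live in @dossier/extraction and may differ.
interface GraphError {
  readonly code: string; // e.g. "dangling-edge", "orphan-artifact"
  readonly atomId: string;
}

interface Stages<A, R> {
  repair(atoms: readonly A[]): { atoms: A[]; repairs: R[] };
  validateGraph(atoms: readonly A[]): GraphError[];
}

interface TailResult<A, R> {
  readonly atoms: readonly A[];
  readonly repairs: readonly R[]; // empty means the model emitted a clean graph
  readonly graphErrors: readonly GraphError[];
}

function repairThenValidate<A, R>(
  resolved: readonly A[],
  stages: Stages<A, R>,
  failOnGraphErrors: boolean,
): TailResult<A, R> {
  // Repair runs first, so the validator only sees the corrected candidate set.
  const { atoms, repairs } = stages.repair(resolved);
  const graphErrors = stages.validateGraph(atoms);
  // Only defects the repair stage could not (or was told not to) fix reach here.
  if (failOnGraphErrors && graphErrors.length > 0) {
    const codes = graphErrors.map((e) => e.code).join(", ");
    throw new Error("graph errors survived repair: " + codes);
  }
  return { atoms, repairs, graphErrors };
}
```

With a repair stage that fixes the defect, the call returns normally and reports what was corrected. With a pass-through repair (the pre-decision behavior), the same input throws under failOnGraphErrors.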
Prompt complement (Gap 1 + Gap 2 reduction)
This guard is the enforcement half; the extraction prompt from "Extraction runtime architecture — the moat" (packages/extraction/src/pipeline/prompt.ts) carries the reduction half, tightened in the same change. Rule 4 now extends the never-fabricate discipline from rationale alone to every structured decision field (options_considered, reversibility, consequences, decided_by, review_date: omit, never invent; all are schema-optional). Rule 5 forbids a process-only edge on a non-process atom, forbids an edge to an unminted id, and tells a process to declare an artifact via the produces edge (not only outputs). The decision-field fabrication (Gap 1) has no cheap deterministic check, since a fabricated options_considered is schema-valid, so it is prompt-only, pinned by a prompt-contract test; the edge invariants (Gap 2) are belt-and-suspenders (prompt reduces, guard enforces).
Review
Recorded asserted (a methodology choice whose correctness is unit-proven but whose effectiveness as a quality lever is not yet validated against a real client corpus):
- Verified now (unit): packages/extraction/test/repair.test.ts passes green: all three repairs, the move-not-drop / promote-not-invent semantics, single-declared-producer ordering, the stub provenance, the mintStubs: false opt-out, and knownExternalIds are covered; run-repair.test.ts drives the full run() with the proven run's offending atoms and proves it completes clean (zero graph errors) with the repairs surfaced; pipeline.test.ts pins the prompt contract and the omitted-decision-field validity. Extraction suite 81 pass / 1 skip (was 62 / 1); okf unchanged at 143.
- Verified manually, live (one-off, NOT in the offline suite, not reproduced by CI): re-running the SAME Google OKF README through run() on the subscription transport (claude -p, no API key), out to a scratch dir (never knowledge/), completed with failOnGraphErrors: true and zero graph errors across repeated runs; the emitted decision carries faithful context/decision/rationale (and source-grounded consequences) with options_considered/reversibility omitted (the proven run had fabricated both), and the system references its artifact via relates_to, never produces.
- Promotion gate (asserted → verified): the live harness from "Live extraction eval harness — what we measure is what extraction optimizes for" shows (a) the repair stage lowers graph-integrity errors to zero on the gold corpus without depressing edge-faithfulness (downgrades/promotions land on the right targets) and (b) minted-stub volume stays low enough that stubs don't pollute the graph or churn reconcile. Revisit if stub volume is high (tighten minting / route gaps back to extraction per "First full-loop SERVE on a real external client — reconcile divergent extraction runs to one canonical KB on a quality rubric; lexical retrieval sufficient (VectorRetriever seam not yet needed)") or if a downgrade/promotion ever moves a target that was legitimately a different edge.