The compounding merge — the per-tenant learning loop accumulates by id + confidence instead of overwriting (okf reconcile() + opt-in reconcile in extraction/runtime)

0028-compounding-reconcile-merge

decision | read as: Explain | confidence: verified | status: active | 2026-06-16 | owner: extraction-engineer | reversibility: two-way door

DEC-0028 — The compounding merge

Reversibility: two-way door — the merge internals, the orphan handling, and the CLI flag are swappable; the durable commitments are that the loop compounds by id + confidence, that human curation is never clobbered by a lower-confidence machine write, and that vanished atoms are flagged, not deleted.

Context

The mission (Dossier — Mission & North Star) names the product as a "compounding learning loop ... that humans curate and agents extend." That word was unbuilt. The per-tenant loop (Runtime orchestration & per-tenant control plane — the learning loop becomes a runnable system) overwrote on every extraction pass instead of accumulating — verified this session by tracing three places in @dossier/extraction:

  • emit.ts writeFile-overwrote atoms by path; its comment literally read "emit overwrites deterministically."
  • run() never read the existing KB back before emitting.
  • prompt.ts instructed the model (rule 7) to "reuse an existing id" while supplying no existing ids to reuse.

The consequence: a second extraction pass silently clobbered human curation (a human's edited body and promoted confidence were overwritten back to the machine's inferred output), and on any id drift it accreted orphan/duplicate atoms instead of updating in place. The loop "ran again" but did not compound. This decision makes the loop actually accumulate — the missing half of the mission. Built and verified green offline this session (2026-06-16).
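
The failure mode traced above can be reproduced in a few lines. This is a minimal sketch, not the actual emit.ts: the Atom shape, the in-memory Map standing in for the tenant repo, and the `${id}.md` path layout are all assumptions made for illustration.

```typescript
// Hypothetical atom shape; the real OKF atom carries more fields.
interface Atom {
  id: string;
  confidence: "verified" | "asserted" | "inferred";
  body: string;
}

// Stand-in for the tenant repo on disk: path -> atom.
type Disk = Map<string, Atom>;

// Assumed path layout, for illustration only.
const pathFor = (atom: Atom): string => `${atom.id}.md`;

// The old behavior: write every incoming atom by path, never reading back.
function emitOverwrite(disk: Disk, incoming: readonly Atom[]): void {
  for (const atom of incoming) {
    disk.set(pathFor(atom), atom);
  }
}

const disk: Disk = new Map();
const machinePass: Atom[] = [
  { id: "refund-policy", confidence: "inferred", body: "machine draft" },
];

emitOverwrite(disk, machinePass);

// A human curates the atom: rewrites the body and promotes confidence.
disk.set("refund-policy.md", {
  id: "refund-policy",
  confidence: "verified",
  body: "human-curated text",
});

// Second pass, same id: the curation is overwritten with the machine output.
emitOverwrite(disk, machinePass);
const afterSamePass = disk.get("refund-policy.md");

// Third pass with a drifted id: a duplicate accretes instead of an in-place update.
emitOverwrite(disk, [
  { id: "refunds-policy", confidence: "inferred", body: "machine draft" },
]);
const fileCount = disk.size;
```

In this sketch, afterSamePass shows the human's promotion and body rewrite gone, and fileCount shows two files for what is conceptually one atom.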

Options considered

1. Whether to make the loop compound at all — keep overwrite vs. merge.

  • (a) Keep wholesale overwrite (writeFile by path, no read-back). Rejected: it is the direct cause of the clobber. It makes "humans curate and agents extend" impossible (the next agent pass erases the human's curation), and it is unsafe under any repeated agent write, which is exactly what the agentic board requires (Agentic "sprint board" architecture — a git-resident OKF task board worked by bounded, hook-governed Agent SDK loops).
  • (b) Reconcile existing-on-disk against incoming by atom id (chosen). The loop iteration becomes a diff, not a rewrite — atoms accumulate, human edits survive, and vanished atoms are flagged rather than deleted.

2. Where the merge logic lives — a new schema field vs. the existing confidence ladder, and which package owns it.

  • (a) Add a new schema field (e.g. a curated/locked flag) to mark human-owned atoms. Rejected as unnecessary: the existing confidence ladder (verified > asserted > inferred, the order of confidenceValues) already encodes "humans curate, agents extend" — machine extraction stamps inferred, a human promotes to asserted/verified. The guard rides the field we already have; no new schema.
  • (b) Put the merge in @dossier/extraction or @dossier/runtime. Rejected: atom identity belongs to the keystone. @dossier/okf owns id, parse/serialize, and validation — so it must own atom merge.
  • (c) A PURE reconcile() in @dossier/okf, keyed on id, honoring the confidence ladder (chosen). No fs/network — a pure function the keystone owns, callable from anywhere, testable with no I/O.
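
To make the ladder concrete, here is a minimal sketch of how a guard could be derived from the order of confidenceValues alone. The array literal is a stand-in that follows the order stated above (verified > asserted > inferred); rank and mayOverwrite are hypothetical helper names, not the @dossier/okf API.

```typescript
// Stand-in for the ladder; the real confidenceValues in @dossier/okf may be
// declared differently. Order follows the text: highest confidence first.
const confidenceValues = ["verified", "asserted", "inferred"] as const;
type Confidence = (typeof confidenceValues)[number];

// Earlier in the array means higher on the ladder, so rank inverts the index.
function rank(c: Confidence): number {
  return confidenceValues.length - 1 - confidenceValues.indexOf(c);
}

// True when an incoming write may replace the existing atom
// (incoming confidence is greater than or equal to existing).
function mayOverwrite(incoming: Confidence, existing: Confidence): boolean {
  return rank(incoming) >= rank(existing);
}

// A machine pass stamps "inferred"; a human has promoted the atom to "verified".
const machineOverHuman = mayOverwrite("inferred", "verified");
// A machine pass revisiting its own earlier output.
const machineOverMachine = mayOverwrite("inferred", "inferred");
```

Because the guard needs nothing beyond the existing field, no curated or locked flag has to be added to the schema for this comparison to work.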

3. What happens to an atom that is in the KB but absent from this pass — delete vs. flag (orphaned).

  • (a) Delete it (treat the latest extraction as authoritative). Rejected hard: losing institutional memory must be a curation act, never a silent side effect of re-running extraction. A connector that simply didn't crawl a page this run would otherwise erase real knowledge.
  • (b) Leave it on disk and FLAG it orphaned (chosen). Surfaced in the summary; a human (or a later curation step) decides.
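
A minimal sketch of option (b), assuming atoms are identified by plain string ids. flagOrphans is a hypothetical name; the point is that the function only reports ids and has no delete path at all.

```typescript
// Returns the ids that exist in the KB but were not produced by this pass.
// It only reports; nothing here removes anything from disk.
function flagOrphans(
  existingIds: readonly string[],
  incomingIds: readonly string[],
): string[] {
  const seenThisPass = new Set<string>(incomingIds);
  return existingIds.filter((id) => !seenThisPass.has(id));
}

// A connector skipped one page this run, so its atom is absent from the pass.
const kbIds = ["pricing", "refund-policy", "onboarding"];
const thisPassIds = ["pricing", "onboarding"];
const orphaned = flagOrphans(kbIds, thisPassIds);
```

Here orphaned names the skipped atom so that a summary can surface it, while the atom itself stays in the KB until someone curates it away.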

Decision

Make the per-tenant loop compound: reconcile existing-on-disk vs. incoming by atom id, governed by the confidence ladder, opt-in, with deletion never an extraction side effect. Three coupled pieces:

  • @dossier/okf reconcile() — NEW pure function (packages/okf/src/reconcile.ts, exported from src/index.ts). No fs/network (the keystone owns atom identity, so it owns atom merge). Merges by id, honoring the confidence ladder. Actions:
    • added — new id.
    • updated — same id, changed content, incoming confidence ≥ existing.
    • unchanged — byte-identical after canonical serialize.
    • preserved (THE CURATION GUARD): a lower-confidence incoming atom NEVER clobbers a higher-confidence atom; the existing atom is kept.
    • orphaned — in the KB but absent from this pass → FLAGGED, never deleted.
    • Rides the existing confidence field — no new schema. Cold start (empty KB) → every atom added, byte-for-byte identical to the old overwrite path (so the feature is opt-in with zero behavior change on the first run).
  • @dossier/extraction run() (packages/extraction/src/index.ts) — opt-in reconcile?: boolean on RunOptions + a new reconcile?: ReconcileSummary on RunResult. When on: reads the existing tenant OKF repo once (new readOkfRepo helper — recursive, skips .git, missing dir = cold run, a hand-broken atom is counted unparseable and left on disk, never crashes the loop), feeds its ids into the prompt, reconciles, and emits only the add/update diff (preserved/unchanged/orphaned left untouched on disk). Closes the prompt.ts gap: buildMessages(segment, knownIds) injects the existing-id catalog as a cached-within-run turn after the few-shot, so rule 7 ("reuse an existing id") is actionable on LIVE runs. serialize.ts refactored to expose serializeAtomToFile(atom, body) — the SSOT path layout used by both the serialize stage and the reconcile path.
  • @dossier/runtime (packages/runtime/src/loop.ts + cli.ts) — LoopOptions.reconcile forwarded to run(); LoopResult.reconcile surfaces the per-iteration tally (the observable proof memory accumulated); stage reports show "reconciled — N added, N updated, N preserved, N orphaned"; new CLI flag dossier-runtime run --reconcile.
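
The five actions can be sketched as one pure function. This is an illustration under stated assumptions, not the contents of packages/okf/src/reconcile.ts: the Atom shape, the result shape, the name reconcileSketch, and the use of JSON.stringify as a stand-in for canonical serialization are all hypothetical.

```typescript
type Confidence = "verified" | "asserted" | "inferred";

// Hypothetical atom shape; real OKF atoms carry more fields.
interface Atom {
  id: string;
  confidence: Confidence;
  body: string;
}

interface ReconcileSketchResult {
  toWrite: Atom[]; // only the add/update diff
  orphanedIds: string[]; // flagged and left on disk
  summary: {
    added: number;
    updated: number;
    unchanged: number;
    preserved: number;
    orphaned: number;
  };
}

// Ladder from the text: verified > asserted > inferred.
const RANK: Record<Confidence, number> = { inferred: 0, asserted: 1, verified: 2 };

// Assumed stand-in for canonical serialization.
const canonical = (a: Atom): string =>
  JSON.stringify([a.id, a.confidence, a.body]);

// Pure: no fs, no network, and the inputs are never mutated.
function reconcileSketch(
  existing: readonly Atom[],
  incoming: readonly Atom[],
): ReconcileSketchResult {
  const existingById = new Map<string, Atom>();
  for (const atom of existing) existingById.set(atom.id, atom);

  const summary = { added: 0, updated: 0, unchanged: 0, preserved: 0, orphaned: 0 };
  const toWrite: Atom[] = [];
  const seen = new Set<string>();

  for (const next of incoming) {
    seen.add(next.id);
    const prev = existingById.get(next.id);
    if (prev === undefined) {
      summary.added += 1; // new id
      toWrite.push(next);
    } else if (canonical(prev) === canonical(next)) {
      summary.unchanged += 1; // identical after serialization
    } else if (RANK[next.confidence] >= RANK[prev.confidence]) {
      summary.updated += 1; // same id, changed content, confidence allows it
      toWrite.push(next);
    } else {
      summary.preserved += 1; // curation guard: keep the existing atom
    }
  }

  // In the KB but absent from this pass: flagged, never deleted.
  const orphanedIds = existing
    .filter((atom) => !seen.has(atom.id))
    .map((atom) => atom.id);
  summary.orphaned = orphanedIds.length;

  return { toWrite, orphanedIds, summary };
}
```

Called with an empty existing list, the sketch puts every incoming atom in toWrite, which matches the cold-start behavior described above. Called again after a human promotes one atom to verified, that atom lands in preserved and toWrite stays empty for it.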

Rationale

  • It is the mission's missing half — and the absence was verified, not assumed. "Compounding ... humans curate and agents extend" (Dossier — Mission & North Star) was contradicted by emit.ts/run()/prompt.ts; the clobber failure mode was traced in code and then reproduced (the contrast test below). This makes the word real.
  • The confidence ladder already encodes curation, so the guard needs no new field. Machine extraction stamps inferred; a human promotes to asserted/verified. preserved is simply "a lower-confidence machine write does not overwrite a higher-confidence atom" — semantics the existing field already carries (Dossier — The Knowledge Model (v0)).
  • It is the PREREQUISITE for the agentic board. "Agents extend" (Agentic "sprint board" architecture — a git-resident OKF task board worked by bounded, hook-governed Agent SDK loops) means agents write to the KB repeatedly — which is unsafe while a re-extraction overwrites wholesale. Reconcile is the foundation that board sits on.
  • It is the return-arc of the runtime loop, and it keeps sovereignty literal. A loop iteration (Runtime orchestration & per-tenant control plane — the learning loop becomes a runnable system) becomes a real diff in the client's own git, not a wholesale rewrite, and deletion is never an extraction side effect (Adopt OKF as Dossier's canonical knowledge format — the client owns the record of how their memory was learned; vanishing knowledge must be a curation act).
  • Opt-in with an identical cold start de-risks it. On an empty KB the reconcile path is byte-for-byte the old overwrite path, so adopting it changes nothing on a first run; the divergence only appears on re-passes, exactly where compounding is wanted.
  • verified, not asserted. We observed the clobber failure mode, confirmed its root cause in code, and a contrast test reproduces it; the fix is reproduced green end-to-end offline (below). This is mechanism-level verification, not a market claim.

Consequences

  • The loop now compounds. Re-running extraction over an existing tenant KB accumulates by id and confidence: new atoms add, changed machine atoms update, human-curated atoms are preserved, and atoms no longer seen are orphaned (flagged, on disk) — never silently overwritten or deleted.
  • Human curation survives the next machine pass — the core behavior the platform sells, now actually built.
  • Observability. The runtime surfaces a per-iteration tally ("reconciled — N added, N updated, N preserved, N orphaned") — the visible proof that memory accumulated. New CLI flag dossier-runtime run --reconcile.
  • Build-side only. pnpm plugin:check in sync — nothing added to the client-facing plugin subset; SSOT intact (serializeAtomToFile is the single path-layout source for both stages).
  • Verification (reproduced this session, no network). pnpm test 322 passed / 1 skipped (was 310/1 → +12): packages/okf/test/reconcile.test.ts (9: add/update/unchanged/preserved-guard/orphan/id-less/full-merge) and packages/runtime/test/reconcile-loop.test.ts (3): (i) first pass adds 3, then a human edit (body rewrite + confidence → verified) SURVIVES the second machine pass (added:0 updated:0 unchanged:2 preserved:1 orphaned:0; disk re-read shows confidence still verified and the human-curated body intact; the id-feed reaching the prompt is asserted via MockClaudeClient.requests); (ii) CONTRAST: without --reconcile the same edit is clobbered back to inferred (proves why the feature exists); (iii) an empty re-pass orphans all 3 and leaves them on disk. pnpm typecheck (tsc -b) exit 0; pnpm lint 0 errors; okf/extraction/runtime bundles clean.
  • Two-way vs. durable. The merge internals, the orphan handling, and the CLI flag are swappable (two-way door). The durable commitments are: the loop compounds by id + confidence; human curation is never clobbered by a lower-confidence machine write; and vanished atoms are flagged, not deleted.
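
The tally line quoted above could be produced by a formatter along these lines. formatReconcileTally and the ReconcileTally shape are hypothetical; only the output wording is taken from the text, which reports four of the five counters (unchanged is tracked but not shown).

```typescript
// Hypothetical shape for the per-iteration counters.
interface ReconcileTally {
  added: number;
  updated: number;
  unchanged: number;
  preserved: number;
  orphaned: number;
}

// Renders the stage-report line using the wording quoted in the text.
function formatReconcileTally(t: ReconcileTally): string {
  return `reconciled — ${t.added} added, ${t.updated} updated, ${t.preserved} preserved, ${t.orphaned} orphaned`;
}

// Counters from the second-pass scenario in the verification notes.
const tallyLine = formatReconcileTally({
  added: 0,
  updated: 0,
  unchanged: 2,
  preserved: 1,
  orphaned: 0,
});
```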

Review

Promote nothing further until the reconcile path runs on a live re-crawl (not just the fixture) and the open items below are addressed, then revisit whether orphaned should become a first-class lifecycle signal. Open follow-ups, routed (not resolved here):