Tighten reconcile diffs against timestamp churn on live re-crawls
task-reconcile-timestamp-churn
Tighten reconcile diffs against timestamp churn
The compounding merge (okf reconcile() + opt-in reconcile in extraction/runtime) made the per-tenant learning loop accumulate by id + confidence instead of overwriting. It also recorded an honest open follow-up, routed to the Knowledge-Extraction & GraphRAG Engineer: an atom's timestamp derives from provenance.retrievedAt (in validate.ts). The reconcile fixtures use a fixed retrievedAt, so re-runs read as unchanged. A connector that stamps a fresh fetch time on every crawl (such as the HttpConnector), however, would bump every atom's timestamp on re-crawl. The result is a noisy wall of "updated" diffs even when nothing actually changed.
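A minimal TypeScript sketch of the churn, under stated assumptions. The Atom and Provenance shapes, makeAtom, the confidence rule inside reconcile (incoming wins at equal or higher confidence) and naiveChangedIds are all hypothetical stand-ins, not the actual okf reconcile() or validate.ts code, and the atom contents are illustrative. Two crawls of identical content that differ only in fetch time are merged, and a whole-atom comparison then reports every atom as updated.

```typescript
// Hypothetical, simplified shapes; the real okf atom schema is not shown in this task.
interface Provenance {
  source: string;
  retrievedAt: string; // ISO-8601 fetch time stamped by the connector
}

interface Atom {
  id: string;
  confidence: number;
  content: string;
  provenance: Provenance;
  timestamp: string; // derived from provenance.retrievedAt
}

// Hypothetical helper: the timestamp simply mirrors the fetch time,
// so it moves on every crawl that stamps a fresh retrievedAt.
function makeAtom(id: string, content: string, confidence: number, retrievedAt: string): Atom {
  const provenance: Provenance = { source: `https://example.com/${id}`, retrievedAt };
  return { id, confidence, content, provenance, timestamp: provenance.retrievedAt };
}

// Compounding merge keyed by id: prior atoms are kept, and an incoming atom
// replaces a prior one only at equal or higher confidence (assumed policy).
function reconcile(prior: Atom[], incoming: Atom[]): Map<string, Atom> {
  const merged = new Map<string, Atom>(prior.map((a) => [a.id, a] as const));
  for (const atom of incoming) {
    const existing = merged.get(atom.id);
    if (existing === undefined || atom.confidence >= existing.confidence) {
      merged.set(atom.id, atom);
    }
  }
  return merged;
}

// Naive change check: any difference in the serialized atom counts as "updated".
function naiveChangedIds(prior: Atom[], merged: Map<string, Atom>): string[] {
  const before = new Map<string, string>(prior.map((a) => [a.id, JSON.stringify(a)] as const));
  return Array.from(merged.values())
    .filter((a) => before.get(a.id) !== JSON.stringify(a))
    .map((a) => a.id);
}

// Identical content crawled twice; only the connector's fetch time differs.
const firstCrawl = [
  makeAtom("a1", "Acme ships widgets", 0.9, "2024-01-01T00:00:00Z"),
  makeAtom("a2", "Widgets weigh 2 kg", 0.8, "2024-01-01T00:00:00Z"),
];
const reCrawl = [
  makeAtom("a1", "Acme ships widgets", 0.9, "2024-02-01T00:00:00Z"),
  makeAtom("a2", "Widgets weigh 2 kg", 0.8, "2024-02-01T00:00:00Z"),
];

const churn = naiveChangedIds(firstCrawl, reconcile(firstCrawl, reCrawl));
console.log(churn); // both a1 and a2 are reported as updated, though no content changed
```

Nothing in the content moved, yet the moving fetch time alone is enough to mark every atom as changed.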
This is not a curation gap (the curation guard from DEC-0028 holds); it is a tight-diff refinement. The compounding loop should produce a git diff that reflects what actually changed, not the clock.
In review (handoff gate)
This atom sits at review: the work is proposed and awaiting the approval gate. The merge-policy half (confidence precedence, and whether orphaned is a first-class lifecycle signal) is the Principal Knowledge-Format Architect's call and is tracked separately; this task is the mechanism half. Approve → done.
Options
- Compare candidates modulo volatile provenance (exclude the retrievedAt-derived timestamp from the change check), or
- Carry the prior timestamp forward when the content is byte-identical after canonical serialization.
Either keeps real changes detectable while killing the clock-only churn.
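Both options can be sketched in TypeScript, again on hypothetical, simplified shapes. canonicalize, stableView, isMaterialChange and carryForward are illustrative names, not existing okf functions. The sketch assumes "content" means every field except the clock-derived ones (timestamp and provenance.retrievedAt). For Option B it also assumes retrievedAt is serialized next to timestamp, so it carries both forward together.

```typescript
// Hypothetical, simplified shapes; the real okf atom schema is not shown in this task.
interface Provenance {
  source: string;
  retrievedAt: string; // volatile: a fresh fetch time on every crawl
}

interface Atom {
  id: string;
  confidence: number;
  content: string;
  provenance: Provenance;
  timestamp: string; // derived from provenance.retrievedAt
}

// Canonical serialization: recursively sort object keys so key order never
// affects the bytes. Assumes plain JSON values (no undefined, functions, cycles).
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((item) => canonicalize(item)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const fields = Object.keys(obj)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalize(obj[key])}`);
    return `{${fields.join(",")}}`;
  }
  return JSON.stringify(value);
}

// Drop the clock-derived fields: timestamp and provenance.retrievedAt.
function stableView(atom: Atom): unknown {
  const { timestamp: _timestamp, provenance, ...rest } = atom;
  const { retrievedAt: _retrievedAt, ...stableProvenance } = provenance;
  return { ...rest, provenance: stableProvenance };
}

// Option A: the change check compares candidates modulo volatile provenance.
function isMaterialChange(prior: Atom, next: Atom): boolean {
  return canonicalize(stableView(prior)) !== canonicalize(stableView(next));
}

// Option B: when the stable bytes match, carry the prior timestamp (and the
// retrievedAt it was derived from) forward so the written atom does not change.
function carryForward(prior: Atom, next: Atom): Atom {
  if (isMaterialChange(prior, next)) {
    return next; // real change: keep the fresh timestamp
  }
  return {
    ...next,
    timestamp: prior.timestamp,
    provenance: { ...next.provenance, retrievedAt: prior.provenance.retrievedAt },
  };
}

// Illustrative atoms: a clock-only re-crawl and a genuine content edit.
const base = { id: "a1", confidence: 0.9, content: "Acme ships widgets" };
const prior: Atom = {
  ...base,
  provenance: { source: "https://example.com/a1", retrievedAt: "2024-01-01T00:00:00Z" },
  timestamp: "2024-01-01T00:00:00Z",
};
const recrawl: Atom = {
  ...base,
  provenance: { source: "https://example.com/a1", retrievedAt: "2024-02-01T00:00:00Z" },
  timestamp: "2024-02-01T00:00:00Z",
};
const edited: Atom = { ...recrawl, content: "Acme ships gadgets" };

console.log(isMaterialChange(prior, recrawl)); // false: clock-only churn is ignored
console.log(isMaterialChange(prior, edited)); // true: a real edit is still detected
console.log(canonicalize(carryForward(prior, recrawl)) === canonicalize(prior)); // true: no diff
```

Option A only changes what the loop reports as updated. Option B also keeps the serialized atom byte-identical, so an unchanged re-crawl leaves no git diff at all. Both depend on a deterministic serialization, which is why the sketch sorts keys rather than trusting object insertion order.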