Improve extraction EMIT-time type discipline for DXA vertical types (so future runs don't need post-hoc curation)

task-extraction-emit-time-vertical-type-discipline

task | confidence: inferred | status: backlog | 2026-06-19 | owner: extraction-engineer
source: log-auditor. Surfaced while closing two sibling RBA type-discipline curation tasks (task-rba-engagement-retype-client-workflows and task-extraction-system-type-discipline-rba). The extraction-engineer flagged their shared root cause (weak emit-time vertical-type assignment), and both closed tasks' notes record the durable emit-time fix as a lesson that was never filed as forward work.

Improve extraction EMIT-time type discipline for DXA vertical types

Surfaced by the log-auditor while closing two sibling RBA type-discipline curation tasks. The Knowledge-Extraction & GraphRAG Engineer identified their shared root cause: the extraction EMIT path assigns vertical/concept types weakly, so the same class of mis-typing recurred twice on one tenant (RBA) and was fixed by hand after the fact both times.

The one root cause behind two closed tasks

| Closed task | Mis-emit | Fixed by (stopgap) |
|---|---|---|
| Re-type 6 client-specific RBA `workflow` atoms as DXA `engagement`s (single-client SOW instances, not standing orchestrations) | Single-client SOW instance emitted as a generic `workflow` instead of a DXA `engagement` | 6 atoms re-typed (2 new engagement nodes, 4 merged); tenant commit 975cc83 |
| Fix extraction type-discipline: `system` used as a catch-all + non-slug ids (RBA run) | Deliverables and process phases emitted as `system` (the catch-all) instead of `artifact` / `process` | 23 atoms re-typed and ids slugified; tenant commit 8229530 |
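The "non-slug ids" half of the second mis-emit is the easiest part to move to emit time. A minimal sketch, assuming a slug means lowercase ASCII alphanumerics joined by single hyphens (the tenant's actual id convention is not stated in this task); `is_slug`, `slugify`, and `emit_id` are hypothetical names, not existing extraction APIs.

```python
import re
import unicodedata

# Assumed slug convention: lowercase ASCII alphanumerics joined by single hyphens.
SLUG_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_slug(atom_id: str) -> bool:
    """True if the id already matches the assumed slug convention."""
    return SLUG_RE.fullmatch(atom_id) is not None


def slugify(raw: str) -> str:
    """Normalize a free-form id into the assumed slug form."""
    # Decompose accented characters (e.g. "é" -> "e" + accent), then drop non-ASCII.
    ascii_text = unicodedata.normalize("NFKD", raw).encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumerics into one hyphen; trim edge hyphens.
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def emit_id(raw: str) -> str:
    """Hypothetical emit-time guard: keep valid slugs, slugify everything else."""
    if is_slug(raw):
        return raw
    slug = slugify(raw)
    if not slug:
        raise ValueError(f"id {raw!r} has no slug-able characters")
    return slug
```

Rejecting ids that slugify to an empty string, rather than inventing a fallback id, keeps the guard from silently creating colliding atoms.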

Both were closed by deterministic data surgery on the RBA tenant: a one-time, post-hoc re-type. Both closing notes recorded the durable fix as a lesson rather than as filed work: the surgery is the stopgap, and the emit path is where the type call should be made. This task turns that lesson into work, so that future extractions are type-disciplined at emit time and need no curation pass.

What "emit-time type discipline" means

Emit-time type discipline is a heuristic on the extraction EMIT path (see "Extraction runtime architecture: the moat") that conservatively strengthens a concept's type using signals grounded in the source text, rather than leaving the type call to a later curation pass.
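A minimal sketch of what "conservative" could mean here. The cue patterns, the `Concept` shape, `strengthen_type`, and the single-client check are hypothetical illustrations drawn from the two mis-emits in the table above, not the extraction runtime's actual signals. The heuristic only considers emitted types that are known-weak choices (`workflow`, or the `system` catch-all), only acts when a cue appears in the grounded source span, and leaves the type unchanged whenever no rule or more than one rule fires.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Concept:
    concept_id: str
    emitted_type: str                 # type proposed by the extractor
    source_span: str                  # grounded source text the concept came from
    client_ids: set[str] = field(default_factory=set)  # clients named in that span


# Hypothetical rules: (weak types a rule may override, target type, cue in the span).
RULES = [
    ({"workflow"}, "engagement", re.compile(r"\b(?:SOW|statement of work)\b", re.IGNORECASE)),
    ({"system"}, "artifact", re.compile(r"\bdeliverables?\b", re.IGNORECASE)),
    ({"system"}, "process", re.compile(r"\bphases?\b", re.IGNORECASE)),
]


def strengthen_type(concept: Concept) -> str:
    """Return a stronger type only when exactly one grounded rule fires."""
    candidates = set()
    for weak_types, target, cue in RULES:
        if concept.emitted_type not in weak_types:
            continue  # only known-weak emitted types are eligible for override
        if not cue.search(concept.source_span):
            continue  # no grounded signal in the source span
        if target == "engagement" and len(concept.client_ids) != 1:
            continue  # an engagement is a single-client instance
        candidates.add(target)
    # Conservative: zero or conflicting candidates leave the emitted type alone.
    return candidates.pop() if len(candidates) == 1 else concept.emitted_type
```

Under these assumptions, a `workflow` grounded in a span that cites one client's SOW becomes an `engagement`, while a `system` whose span mentions both a deliverable and a phase stays `system` because two rules conflict. Declining to choose is the conservative trade: a missed upgrade costs a later curation pass, but a wrong upgrade mis-types silently.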

Why a task (forward-looking), not a fix-in-place or a new ADR

This is forward-looking loop improvement, not tenant-data cleanup. After the two surgery passes the RBA tenant is already type-disciplined, conformance-clean, and graph-clean (parse 100%, validateGraph 0/0).

It is a task rather than a fix-in-place because building a heuristic on the emit path and validating it against the judge in "Live extraction eval harness: what we measure is what extraction optimizes for" is an extraction-layer code change, owned by the Knowledge-Extraction & GraphRAG Engineer with the Principal Knowledge-Format Architect confirming the type calls, not a one-token hygiene correction. It needs no new ADR because it is not a new direction: it executes the lesson already recorded in "Task Board: Audit Log" and "Dossier: Decision & Audit Log" under the frames of DEC-0056 and DEC-0057.

The board was globbed before filing. The two sibling RBA tasks are the post-hoc curation (both done), and no open task covered the emit-time durable fix: a grep for "emit-time", "type-discipline", and "heuristic" returned only those closed RBA surgery tasks. confidence: inferred (agent-filed).
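One way to make the validation step concrete, sketched under assumptions: treat the atoms hand-corrected by the two surgery passes (tenant commits 975cc83 and 8229530) as a gold fixture, and report how many of them the emit path now types correctly without curation. The fixture format, `type_regression`, `RegressionReport`, and the example ids are hypothetical; the real eval harness and its judge are not described in this task.

```python
from dataclasses import dataclass


@dataclass
class RegressionReport:
    total: int
    correct: int
    mismatches: dict[str, tuple[str, str]]  # atom id -> (curated type, emitted type)

    @property
    def accuracy(self) -> float:
        return self.correct / self.total if self.total else 0.0


def type_regression(emitted: dict[str, str], curated: dict[str, str]) -> RegressionReport:
    """Compare emit-time types against hand-curated gold types.

    Hypothetical fixture: `curated` maps atom id -> type assigned by a post-hoc
    surgery pass; `emitted` maps atom id -> type the emit path now assigns.
    Atoms absent from `emitted` count as mismatches.
    """
    mismatches = {}
    for atom_id, gold in curated.items():
        got = emitted.get(atom_id, "<missing>")
        if got != gold:
            mismatches[atom_id] = (gold, got)
    return RegressionReport(
        total=len(curated),
        correct=len(curated) - len(mismatches),
        mismatches=mismatches,
    )
```

Counting missing atoms as mismatches keeps the check strict, so an emit path that silently drops an atom cannot score better than one that types it wrongly.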