Improve extraction EMIT-time type discipline for DXA vertical types (so future runs don't need post-hoc curation)

task-extraction-emit-time-vertical-type-discipline

task · confidence: inferred · status: backlog · 2026-06-19 · owner: extraction-engineer

source: log-auditor. Surfaced while closing two sibling RBA type-discipline curation tasks (task-rba-engagement-retype-client-workflows + task-extraction-system-type-discipline-rba). The extraction-engineer flagged their shared root cause (weak emit-time vertical-type assignment), and both closed tasks' notes record the durable emit-time fix as a lesson that was never filed as forward work.

Improve extraction EMIT-time type discipline for DXA vertical types

Surfaced by the log-auditor while closing two sibling RBA type-discipline curation tasks. The Knowledge-Extraction & GraphRAG Engineer flagged their shared root cause: the extraction EMIT path assigns vertical/concept types weakly, so the same class of mis-typing recurred twice on one tenant and was fixed by hand, after the fact, both times.

The one root cause behind two closed tasks

Closed task	Mis-emit	Fixed by (stopgap)
Re-type 6 client-specific RBA `workflow` atoms as DXA `engagement`s (single-client SOW instances, not standing orchestrations)	single-client SOW instance emitted as generic `workflow` instead of DXA `engagement`	6 atoms re-typed (2 new engagement nodes, 4 merged), tenant commit `975cc83`
Fix extraction type-discipline — `system` used as a catch-all + non-slug ids (RBA run)	deliverables / process phases emitted as `system` (the catch-all) instead of `artifact` / `process`	23 atoms re-typed, ids slugified, tenant commit `8229530`
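Both fixes in the table change concept identity, because the knowledge model defines identity as type + (canonical-title OR prefix-stripped id), and the second fix also slugified ids. The hypothetical TypeScript sketch below shows why that makes emit-time typing matter for dedup. It assumes simple slug rules (ASCII lowercase, hyphen-separated) and uses only the canonical-title branch. `slugify` and `conceptKey` are illustrative names, not the @dossier/okf API, and the client title is invented for the example.

```typescript
// Hypothetical sketch: slug ids plus a type-qualified identity key.
// The slug rules and key format are assumptions, not the @dossier/okf implementation.

function slugify(raw: string): string {
  return raw
    .normalize("NFKD")               // split accented letters into base + combining mark
    .replace(/[\u0300-\u036f]/g, "") // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")     // collapse each non-alphanumeric run to one hyphen
    .replace(/^-+|-+$/g, "");        // trim leading / trailing hyphens
}

// Identity = type + canonical title (the prefix-stripped-id branch is omitted here).
function conceptKey(conceptType: string, canonicalTitle: string): string {
  return `${conceptType}:${slugify(canonicalTitle)}`;
}

const title = "Acme Corp: Website Redesign SOW"; // illustrative client, not real data
console.log(conceptKey("workflow", title));   // workflow:acme-corp-website-redesign-sow
console.log(conceptKey("engagement", title)); // engagement:acme-corp-website-redesign-sow
```

Because the type is part of the key, a `workflow` emit and an `engagement` emit of the same source concept never match, so exact-match closure cannot fold a mis-typed emit into its correctly typed twin until someone re-types it.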

Both were closed by deterministic data surgery on the RBA tenant — a one-time, post-hoc re-type. Both closing notes recorded the durable fix as a lesson rather than filed work: the surgery is the stopgap; the emit path is where the type call should be made. This task makes that durable fix real, so future extractions are type-disciplined at emit and need no curation pass.

What "emit-time type discipline" means

A conservative heuristic on the extraction EMIT path (see Extraction runtime architecture — the moat) that strengthens a concept's type from grounded signals in the source:

client-name + single-client-SOW signal → DXA `engagement` (a named one-off delivery for a specific client) rather than `workflow` (the org's standing path-through-nodes). The Dossier — The Knowledge Model (v0) draws the workflow vs engagement line explicitly, with the Digital Experience Agency vertical as the first reference implementation.
deliverable / output signal → `artifact`; phase / activity signal → `process`. `system` stays reserved for a tool, software, or platform the org uses (the knowledge-model definition) and is never a catch-all.
No grounding → conservative fallback. Faithfulness over coverage (Concept identity = type + (canonical-title OR prefix-stripped id), exact-match closure — dedup owned by the @dossier/okf keystone, in-pass + opt-in reconcile + loop default, knowledge-model principle 8): a defensible generic type beats a fabricated strong one.
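The rules above can be sketched as a small decision function. This is a minimal, hypothetical TypeScript sketch, not the extraction runtime's code. The `Signals` flags, the `decideType` name, and the caller-supplied `fallback` type are all assumptions, since the task does not say how signals are detected or which generic type the fallback uses. The sketch encodes the conservative stance by strengthening only when exactly one grounded rule fires, and by demoting an emitted `system` unless a lone tool/platform signal backs it.

```typescript
// Hypothetical sketch of conservative emit-time type strengthening.
// Signals, decideType, and the fallback parameter are illustrative assumptions,
// not the extraction runtime's API.

type StrongType = "engagement" | "artifact" | "process" | "system";

interface Signals {
  clientName: boolean;      // source names a specific client
  singleClientSow: boolean; // source reads as a one-off SOW for that client
  deliverable: boolean;     // source describes a deliverable / output
  phaseOrActivity: boolean; // source describes a phase / activity
  toolOrPlatform: boolean;  // source describes a tool/software/platform the org uses
}

interface TypeDecision {
  type: string;
  strengthened: boolean; // true only when a single grounded rule changed the type
  reason: string;
}

function decideType(proposed: string, s: Signals, fallback: string): TypeDecision {
  // Collect every strong type the grounded signals support.
  const candidates: Array<[StrongType, string]> = [];
  if (s.clientName && s.singleClientSow) {
    candidates.push(["engagement", "client-name + single-client SOW"]);
  }
  if (s.deliverable) candidates.push(["artifact", "deliverable / output"]);
  if (s.phaseOrActivity) candidates.push(["process", "phase / activity"]);
  if (s.toolOrPlatform) candidates.push(["system", "tool / software / platform"]);

  // Exactly one grounded rule: take its type.
  if (candidates.length === 1) {
    const [chosen, reason] = candidates[0];
    return { type: chosen, strengthened: chosen !== proposed, reason };
  }

  // No signal or conflicting signals: do not strengthen. Keep the emitter's
  // proposal, except that an emitted `system` drops to the fallback.
  const kept = proposed === "system" ? fallback : proposed;
  const reason = candidates.length === 0 ? "no grounded signal" : "conflicting signals";
  return { type: kept, strengthened: false, reason };
}

const noSignals: Signals = {
  clientName: false, singleClientSow: false, deliverable: false,
  phaseOrActivity: false, toolOrPlatform: false,
};
// "GENERIC" stands in for whatever defensible generic type the model designates.
const decision = decideType("workflow", { ...noSignals, clientName: true, singleClientSow: true }, "GENERIC");
console.log(decision.type); // engagement
```

Treating conflicting signals the same as no signal is a design choice of this sketch, in line with faithfulness over coverage; a real implementation might rank signals instead, and would need the eval harness to justify that.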

Why a task (forward-looking), not a fix-in-place or a new ADR

This is forward-looking loop improvement, not tenant-data cleanup: the RBA tenant is already type-disciplined, conformance-clean, and graph-clean after the two surgery passes (parse 100%, validateGraph 0/0).

The work is an extraction-layer code change: build a heuristic on the emit path and validate it against the judge in Live extraction eval harness — what we measure is what extraction optimizes for. The Knowledge-Extraction & GraphRAG Engineer owns it, with the Principal Knowledge-Format Architect confirming the type calls. It is not a one-token hygiene correction, and it is not a new direction: it executes the lesson already recorded in Task Board — Audit Log / Dossier — Decision & Audit Log under the frames of DEC-0056 and DEC-0057, so no new ADR is needed.

The board was globbed before filing. The two sibling RBA tasks are the post-hoc curation (both done), and no open task covered the emit-time durable fix: a grep for "emit-time" / "type-discipline" / "heuristic" returned only those closed RBA surgery tasks. confidence: inferred (agent-filed).