Fix extraction type-discipline — `system` used as a catch-all + non-slug ids (RBA run)

task-extraction-system-type-discipline-rba

Type: task · Confidence: inferred · Status: done · Date: 2026-06-19 · Owner: extraction-engineer
Source: log-auditor. Surfaced from the FDE QA pass on the 33-page RBA Firecrawl tenant (DEC-0055) and independently corroborated by the docs-surface build (same non-slug URLs); closed from the reference-tenant QA pass (tenant commit `8229530`).


The RBA Firecrawl run (the first live FirecrawlConnector run against a real client source, and field evidence for the reserved web seam) surfaced two coupled extraction-quality gaps. The docs-surface build corroborated both independently: it rendered the same ugly URLs, so this is not a single-tool artifact.

The two gaps

  1. `system` is being used as a catch-all. Of 25 `system` atoms, 18 are mis-typed: UX deliverables such as "Clickable Prototype" and "Wireframes" (these are artifacts), process phases (these are processes), and a methodology. Dossier — The Knowledge Model (v0) reserves `system` for a tool, software product, or platform the org uses (Salesforce, Figma, SharePoint); none of these 18 qualify.
  2. Non-slug ids. The same atoms carry ids with spaces, parentheses, and uppercase letters, e.g. /systems/Organizational Change Management (OCM)/. The knowledge model requires `id` to be a stable, unique slug, because the id is the atom's permanent address. These ids violate that convention and produce the ugly URLs the docs build showed.
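The slug rule in item 2 can be sketched as follows. This is a minimal illustration, not the packages/extraction code: the helper names `slugify` and `is_slug` and the exact character rules (ASCII-fold, lowercase, collapse every non-alphanumeric run into one hyphen) are assumptions.

```python
import re
import unicodedata

# Hypothetical slug shape: lowercase ASCII words joined by single hyphens.
SLUG_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def slugify(label: str) -> str:
    """Turn a human label such as "Wireframes" into a URL-safe slug (illustrative rules)."""
    # Fold accented letters to their ASCII base ("é" -> "e"); drop anything else non-ASCII.
    folded = unicodedata.normalize("NFKD", label).encode("ascii", "ignore").decode("ascii")
    # Lowercase, then collapse each run of spaces, parens, slashes, etc. into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", folded.lower()).strip("-")
    if not slug:
        raise ValueError(f"label {label!r} has no characters usable in a slug")
    return slug


def is_slug(candidate: str) -> bool:
    """True only if `candidate` already has the stable-slug shape."""
    return SLUG_RE.fullmatch(candidate) is not None


print(slugify("Organizational Change Management (OCM)"))  # -> organizational-change-management-ocm
```

Running the check with `is_slug` on the ids actually emitted, rather than on the rendered URLs, is one way to catch item 2 before the docs build ever sees it.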

Why it matters beyond cosmetics

The type confusion is a root cause of several of the same-type duplicate clusters tracked in Make the learning loop dedup/reconcile at scale (collapse same-type duplicate clusters; default-on compounding): a concept extracted once as `system` and once with its correct type will not dedup across types. Non-slug ids, in turn, poison the route map and the GraphRAG addressability the whole model depends on (knowledge-model principle 6: stable slugs as permanent addresses).
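A toy illustration of why a mis-typed copy escapes dedup, assuming (hypothetically) that the dedup pass keys atoms on their type plus a normalised name. The real reconcile logic may use a different key; the `artifact` type name and the sample atoms are illustrative.

```python
from collections import defaultdict


def dedup_key(atom: dict) -> tuple[str, str]:
    # Hypothetical key: the atom's type plus its lowercased, whitespace-normalised name.
    return atom["type"], " ".join(atom["name"].lower().split())


def cluster(atoms: list[dict]) -> dict[tuple[str, str], list[dict]]:
    """Group atoms that a same-type dedup pass would treat as duplicates."""
    groups: dict[tuple[str, str], list[dict]] = defaultdict(list)
    for atom in atoms:
        groups[dedup_key(atom)].append(atom)
    return dict(groups)


atoms = [
    {"type": "system", "name": "Wireframes"},     # mis-typed copy
    {"type": "artifact", "name": "Wireframes"},   # correctly typed copy
    {"type": "artifact", "name": "wireframes "},  # same concept, same type
]
groups = cluster(atoms)
```

The two `artifact` copies collapse into one cluster, but the `system` copy lands in a cluster of its own, so the concept survives as two atoms however good the name matching is.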

Why a task, not a fix-in-place

Re-typing the 18 atoms correctly (deciding which are artifacts, which are processes, and how to treat the methodology) is a knowledge-model judgment for the Principal Knowledge-Format Architect. Enforcing type discipline and id-slugification in the extraction path is a code change owned by the Knowledge-Extraction & GraphRAG Engineer. Neither is a one-token hygiene correction. Scope: packages/extraction for the durable fix, plus the RBA tenant OKF (clients/rba/tenants-firecrawl/rba-consulting, a gitignored sandbox per Fix git-per-tenant isolation when a tenant root is nested inside another repo) for the re-type/re-slug. Filed by the log-auditor from the QA pass; confidence: inferred.
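The re-type/re-slug can be done as deterministic data surgery, with no re-extraction. A minimal sketch, assuming each atom is a dict with `id`, `type`, and a `links` list of referenced ids, and that the old-id to (new type, new id) table is the architect's judgment written down by hand. The function name, the atom shape, and the sample ids are all hypothetical.

```python
def apply_surgery(atoms: list[dict], remap: dict[str, tuple[str, str]]) -> list[dict]:
    """Re-type and re-slug atoms per `remap` (old_id -> (new_type, new_id)), fixing references."""
    id_map = {old_id: new_id for old_id, (_, new_id) in remap.items()}
    fixed_atoms = []
    for atom in atoms:
        fixed = dict(atom)  # shallow copy so the input list is left untouched
        if atom["id"] in remap:
            fixed["type"], fixed["id"] = remap[atom["id"]]
        # Rewrite references so nothing keeps pointing at a retired non-slug id.
        fixed["links"] = [id_map.get(ref, ref) for ref in atom.get("links", [])]
        fixed_atoms.append(fixed)
    # Two old ids mapped onto one new slug would silently merge atoms; refuse instead.
    new_ids = [a["id"] for a in fixed_atoms]
    if len(new_ids) != len(set(new_ids)):
        raise ValueError("remap produced duplicate ids")
    return fixed_atoms


atoms = [
    {"id": "Clickable Prototype", "type": "system", "links": []},
    {"id": "ux-engagement", "type": "process", "links": ["Clickable Prototype"]},
]
remap = {"Clickable Prototype": ("artifact", "clickable-prototype")}
fixed = apply_surgery(atoms, remap)
```

Because the output depends only on the atoms and the table, re-running the surgery gives the same result, which is what makes it reviewable in a single tenant commit.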

Resolution (2026-06-19, tenant commit 8229530)

DONE via deterministic data surgery (no LLM re-extraction). Closed backlog → done:

The durable extraction-time fix (type discipline and id-slugification at emit, so a future run cannot regress) remains the curation lesson recorded in Dossier — Decision & Audit Log under DEC-0056's frame; the post-hoc surgery here is only the stopgap. The root cause of the `system` catch-all is the auto-minted workflow-stages stub, which defaults to `type: system`.
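An emit-time guard along these lines would stop the workflow-stages stub pattern at the source. A minimal sketch under stated assumptions: `check_before_emit`, `EmitError`, and the three-entry `KNOWN_TYPES` set are hypothetical and illustrative, not the model's full type list. It only validates; in a pipeline that slugifies at emit, it would run after that step as a backstop.

```python
import re

# Hypothetical: a stable-slug shape and an illustrative (not exhaustive) set of types.
SLUG_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")
KNOWN_TYPES = frozenset({"system", "artifact", "process"})


class EmitError(ValueError):
    """Raised instead of writing an atom whose type or id breaks the model's rules."""


def check_before_emit(atom: dict) -> dict:
    """Return the atom unchanged if it is safe to write; raise EmitError otherwise."""
    atom_id = atom.get("id") or ""
    # No fallback type: a stub that never set `type` fails here instead of becoming a system.
    atom_type = atom.get("type")
    if atom_type is None:
        raise EmitError(f"atom {atom_id!r} has no explicit type")
    if atom_type not in KNOWN_TYPES:
        raise EmitError(f"atom {atom_id!r} has unknown type {atom_type!r}")
    # Refuse non-slug ids outright so they never become permanent addresses.
    if SLUG_RE.fullmatch(atom_id) is None:
        raise EmitError(f"atom id {atom_id!r} is not a stable slug")
    return atom
```

The key design choice is the missing default: an atom with no type is an error, not a `system`, so a mis-typed stub surfaces in the run log instead of in the knowledge graph.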