Fix extraction type-discipline — `system` used as a catch-all + non-slug ids (RBA run)
task-extraction-system-type-discipline-rba
The RBA Firecrawl run (First live FirecrawlConnector run against a real client source — field evidence for the reserved web seam) surfaced two coupled extraction-quality gaps, independently corroborated by the docs-surface build (which rendered the same ugly URLs — so this is not a single-tool artifact).
The two gaps
- `system` is being used as a catch-all. Of 25 `system` atoms, 18 are mis-typed: UX deliverables like "Clickable Prototype" / "Wireframes" (these are `artifact`s), process phases (these are `process`es), and a methodology. The Dossier — The Knowledge Model (v0) reserves `system` for a tool/software/platform the org uses (Salesforce, Figma, SharePoint); none of these qualify.
- Non-slug ids. The same atoms carry ids with spaces, parens, and uppercase, e.g. `/systems/Organizational Change Management (OCM)/`. The knowledge-model requires `id` to be a stable unique slug (the permanent address); these violate the stable-address convention and produce the ugly URLs the docs build showed.
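The slug requirement can be sketched as a small normaliser plus a validity check. This is a minimal Python illustration, not the code in `packages/extraction`; the names `slugify`, `is_slug`, and `SLUG_RE` are hypothetical, and the exact slug grammar (lowercase ASCII alphanumerics joined by single hyphens) is an assumption about what "kebab-case" means here.

```python
import re
import unicodedata

# Assumed slug grammar: lowercase alphanumeric runs joined by single hyphens.
SLUG_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def slugify(title: str) -> str:
    """Lowercase, drop accents, and collapse each non-alphanumeric run into one hyphen."""
    ascii_text = (
        unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    )
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")
    if not slug:
        raise ValueError(f"cannot derive a slug from {title!r}")
    return slug


def is_slug(candidate: str) -> bool:
    """True when the id already satisfies the assumed slug grammar."""
    return SLUG_RE.fullmatch(candidate) is not None


print(slugify("Organizational Change Management (OCM)"))
# organizational-change-management-ocm
print(is_slug("Organizational Change Management (OCM)"))
# False
```

A deterministic normaliser like this keeps the address stable across runs, provided the title it is derived from does not change.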
Why it matters beyond cosmetics
The type confusion is one root of several same-type duplicate clusters tracked in Make the learning loop dedup/reconcile at scale (collapse same-type duplicate clusters; default-on compounding): a concept extracted once as `system` and once with its correct type will not dedup across types. Non-slug ids also poison the route map / GraphRAG addressability the whole model depends on (knowledge-model principle 6: stable slugs as permanent addresses).
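To see why a mis-typed atom escapes dedup, consider a toy identity key. The sketch below assumes identity is keyed on the atom's type plus a normalised title with exact matching; the atom shape and the helpers `identity_key` and `duplicate_clusters` are hypothetical and only illustrate the mechanism.

```python
from collections import defaultdict


def identity_key(atom: dict) -> tuple[str, str]:
    # Assumption: identity = type + normalised title, compared exactly.
    return (atom["type"], atom["title"].strip().lower())


def duplicate_clusters(atoms: list[dict]) -> dict:
    """Group atom ids by identity key and keep only keys shared by 2+ atoms."""
    clusters = defaultdict(list)
    for atom in atoms:
        clusters[identity_key(atom)].append(atom["id"])
    return {key: ids for key, ids in clusters.items() if len(ids) > 1}


atoms = [
    {"id": "wireframes", "type": "artifact", "title": "Wireframes"},
    {"id": "Wireframes", "type": "system", "title": "Wireframes"},  # mis-typed stub
]

print(duplicate_clusters(atoms))
# {}  (different types, so the two atoms never collide)

retyped = [{**atom, "type": "artifact"} for atom in atoms]
print(duplicate_clusters(retyped))
# {('artifact', 'wireframes'): ['wireframes', 'Wireframes']}
```

Under this assumption the duplicate only becomes visible to dedup after the stub is re-typed.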
Why a task, not a fix-in-place
Re-typing the 18 atoms correctly (deciding which is `artifact`, which is `process`, and how to handle the methodology case) is a knowledge-model judgment for the Principal Knowledge-Format Architect. Enforcing type-discipline + id-slugification in the extraction path is a code change owned by the Knowledge-Extraction & GraphRAG Engineer, not a one-token hygiene correction. Scoped to `packages/extraction` (the durable fix) + the RBA tenant OKF (`clients/rba/tenants-firecrawl/rba-consulting`, a gitignored sandbox per Fix git-per-tenant isolation when a tenant root is nested inside another repo) for the re-type/re-slug. Filed by the log-auditor from the QA pass; confidence: inferred.
Resolution (2026-06-19, tenant commit 8229530)
DONE via deterministic data surgery (no LLM re-extraction). Closed backlog → done:
- 23 mis-typed `system` atoms re-typed: 7 UX deliverables → `artifact`, 16 activities/phases → `process`. Genuine systems kept `system` (Azure, Power Platform, Sitecore, Dynamics). `system` count 70 → 47 (verified: `find systems -name '*.md'` = 47 at `8229530`).
- All 18 non-slug ids slugified to kebab-case; every inbound edge + body `[[wikilink]]` remapped, with 0 stranded.
- Re-typing exposed 8 new same-concept merges (a stub typed `system` and the canonical atom typed correctly didn't dedup across types) → dedup re-run folded them; this is the Concept identity = type + (canonical-title OR prefix-stripped id), exact-match closure — dedup owned by the @dossier/okf keystone, in-pass + opt-in reconcile + loop default type-collision interaction, now captured in Dossier — Decision & Audit Log.
- okf tests 170/170 green.
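The remap-and-verify step for body links could look roughly like the following. This is a hedged sketch, not the surgery script that was actually run: it assumes wikilinks take the form `[[target]]` or `[[target|label]]` with the atom id as the target, and the helpers `remap_wikilinks` and `stranded_links` are hypothetical.

```python
import re

# Assumed wikilink syntax: [[target]] or [[target|label]].
WIKILINK_RE = re.compile(r"\[\[([^\]|]+)(\|[^\]]*)?\]\]")


def remap_wikilinks(body: str, id_map: dict[str, str]) -> str:
    """Rewrite link targets found in id_map; leave labels and other links untouched."""
    def swap(match: re.Match) -> str:
        target, label = match.group(1), match.group(2) or ""
        return f"[[{id_map.get(target, target)}{label}]]"

    return WIKILINK_RE.sub(swap, body)


def stranded_links(bodies: dict[str, str], known_ids: set[str]) -> list[tuple[str, str]]:
    """Return (source atom id, target) pairs whose target is not a known id."""
    return [
        (atom_id, match.group(1))
        for atom_id, body in bodies.items()
        for match in WIKILINK_RE.finditer(body)
        if match.group(1) not in known_ids
    ]


id_map = {"Organizational Change Management (OCM)": "organizational-change-management-ocm"}
body = "See [[Organizational Change Management (OCM)]] and [[wireframes|the wireframes]]."
fixed = remap_wikilinks(body, id_map)
known = {"organizational-change-management-ocm", "wireframes"}

print(fixed)
# See [[organizational-change-management-ocm]] and [[wireframes|the wireframes]].
print(stranded_links({"discovery": fixed}, known))
# []
```

Running the stranded check over every body after the remap is one way to confirm the "0 stranded" condition rather than assume it.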
The durable extraction-time fix (type discipline + id-slugification at emit, so a future run can't regress) remains the curation lesson recorded in Dossier — Decision & Audit Log under DEC-0056's frame; the post-hoc surgery here is the stopgap. The `system` catch-all root cause is the auto-minted `workflow-stages` stub defaulting to `type: system`.
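One possible shape for an emit-time guard is to reject an atom that has no explicit type instead of falling back to `system`. The sketch below is an assumption about how such a guard might look, not the planned implementation; `check_atom`, `EmitError`, and the three-entry `ALLOWED_TYPES` set (only the types named in this note) are hypothetical.

```python
import re

# Illustrative subset: only the types named in this note. The real model may define more.
ALLOWED_TYPES = {"system", "artifact", "process"}
SLUG_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


class EmitError(ValueError):
    """Raised when an atom fails type or id discipline at emit time."""


def check_atom(atom: dict) -> dict:
    """Return the atom unchanged if it passes; raise instead of silently defaulting."""
    atom_type = atom.get("type")
    if atom_type is None:
        raise EmitError("atom has no explicit type; refusing to default to 'system'")
    if atom_type not in ALLOWED_TYPES:
        raise EmitError(f"unknown type {atom_type!r}")
    if not SLUG_RE.fullmatch(atom.get("id", "")):
        raise EmitError(f"id {atom.get('id')!r} is not a kebab-case slug")
    return atom


check_atom({"id": "workflow-stages", "type": "process", "title": "Workflow Stages"})

try:
    check_atom({"id": "workflow-stages", "title": "Workflow Stages"})  # untyped stub
except EmitError as err:
    print(err)
# atom has no explicit type; refusing to default to 'system'
```

Failing loudly here trades convenience for discipline: an auto-minted stub would have to be typed by the caller before it can be written.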