Language-pack backlog — author per-language tree-sitter tag-query + schema packs beyond TS/Python (CONTINGENT on the v1 build proceeding)

task-codebase-language-pack-backlog

task · confidence: inferred · status: backlog · 2026-06-16 · owner: ingestion-engineer
source: log-auditor (filed from DEC-0040 as the contingent per-language coverage backlog the decision names; downstream of the go/no-go spike)


The decision "Codebase ingestion as the 4th connector: a three-layer deterministic code-graph substrate + git-mined 'why', gated on a de-risk spike and dogfooded on this repo first" names the language pack as layer 2 of the code-ingestion substrate, the unit of incremental language coverage. The v1 cut is TypeScript + Python. This task tracks authoring packs for the remaining languages (Go, Java/C#, etc.), but only if the v1 build proceeds and only after the de-risk spike returns a go.

What a language pack is (and is not)

A pack is per-language tree-sitter tag-query (.scm) + schema data that emits into the CLOSED structural edge kinds (contains / imports / calls / references). It is substrate-internal DATA, not a fork — "treats every codebase the same" holds at the taxonomy level precisely because a per-language query pack sits underneath the closed taxonomy. Adding a language is adding a pack; it never widens or redefines the substrate's closed node/edge set. (Framework/platform specificity is a different layer — registry-driven overlays, deferred to v2+.)
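The "a pack is data, not a fork" rule can be sketched as a data shape plus one check. This is a minimal sketch under stated assumptions: a pack is modeled as tag-query text plus a map from capture names to edge kinds, and `LanguagePack`, `validate_pack`, `GO_PACK`, and the `@ref.*` capture names are all hypothetical. The Go query borrows node names from the tree-sitter-go grammar purely as an illustration and has not been run against a parser. The point it shows is that adding a language adds a value, while the closed edge-kind set stays fixed.

```python
from dataclasses import dataclass, field

# The closed structural edge kinds this task names. A pack may map its
# captures onto these kinds but can never add a new one.
CLOSED_EDGE_KINDS = frozenset({"contains", "imports", "calls", "references"})


@dataclass
class LanguagePack:
    """Hypothetical pack shape: tag-query text plus capture-to-edge schema data."""

    language: str
    tags_query: str  # contents of the pack's tree-sitter tag-query (.scm) file
    # Hypothetical schema: capture name in tags_query -> closed edge kind.
    capture_to_edge: dict[str, str] = field(default_factory=dict)


def validate_pack(pack: LanguagePack) -> list[str]:
    """List the ways a pack would escape the closed taxonomy (empty means OK)."""
    problems = []
    for capture, kind in sorted(pack.capture_to_edge.items()):
        if kind not in CLOSED_EDGE_KINDS:
            problems.append(f"{pack.language}: @{capture} maps to unknown edge kind {kind!r}")
        # Naive substring check that the capture actually appears in the query.
        if f"@{capture}" not in pack.tags_query:
            problems.append(f"{pack.language}: @{capture} is not used in the query")
    return problems


# Illustrative Go pack. Node names follow the tree-sitter-go grammar as an
# example only; this query has not been run against a real parser here.
GO_PACK = LanguagePack(
    language="go",
    tags_query="""
(import_spec path: (interpreted_string_literal) @ref.import)
(call_expression function: (identifier) @ref.call)
(call_expression function: (selector_expression field: (field_identifier) @ref.call))
""",
    capture_to_edge={"ref.import": "imports", "ref.call": "calls"},
)
```

Under this sketch, a Java or C# pack would be another `LanguagePack` value passed through the same validator. A pack that tried to emit, say, an `inherits` edge would be rejected rather than widening `CLOSED_EDGE_KINDS`.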

Prioritization

Packs are prioritized by DXA-client stack clustering, meaning the languages that real client codebases actually present. The go-to-market lens is the decision "Digital Experience Agency vertical as the first reference implementation". Each pack is independently shippable.
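The ordering rule can be sketched as follows. This minimal sketch assumes each client codebase can be summarized as the set of languages it presents. `rank_pack_backlog` and the lowercase language labels are hypothetical, and a plain frequency count stands in for whatever stack clustering the owner actually applies. Languages already in the v1 cut are excluded.

```python
from collections import Counter

# Languages already covered by the v1 cut, per this task.
V1_LANGUAGES = frozenset({"typescript", "python"})


def rank_pack_backlog(client_stacks: list[set[str]]) -> list[tuple[str, int]]:
    """Rank uncovered languages by how many client codebases present them.

    A plain frequency count, used here as a stand-in for stack clustering.
    Ties break alphabetically so the order is deterministic.
    """
    counts = Counter(
        lang
        for stack in client_stacks
        for lang in stack  # a set, so each language counts once per codebase
        if lang not in V1_LANGUAGES
    )
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

Because each pack is independently shippable, a ranked list like this can be worked from the top, one pack at a time.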

Why this is contingent / blocked

This task is downstream of two things: the spike task "De-risk spike (GO/NO-GO): mine the git 'why' through the existing faithfulness judge, report two numbers (the go/no-go gate)", and substrate v1 existing. There is nothing to emit into until the substrate is built, and the substrate is not built until the spike returns a go. The dependencies edge to the spike is an ordering hint the owner checks, not an automatic hold (per "Dossier: The Knowledge Model (v0)" and "Dossier: Work Items (the agentic board)").

Owned by the Ingestion & Connectors Engineer. It owns the connector seam ("Ingestion connector seam: assemble, don't build"), ingestion owns the input contract, and language packs are the substrate's incremental connector-coverage unit.

Provenance: filed by the log-auditor from the ratified DEC-0040; confidence: inferred (agent-filed from the decision, not human-curated).

Kept lean: this is a single backlog tracker, not one task per language. Split it per pack only when the v1 build actually proceeds.