MCP agentic foundation — tenant-scoped GraphRAG over the OKF KB

0011-mcp-agentic-foundation-graphrag

decision | read as: Explain | confidence: asserted | status: active | 2026-06-14 | owner: mcp-engineer

Reversibility: two-way door

DEC-0011 — MCP agentic foundation (tenant-scoped GraphRAG over the OKF KB)

Reversibility: two-way door — on retrieval internals, the tool surface, and ranking; the embeddings/vector seam pattern and the one-server-one-tenant isolation invariant are the durable parts.

Context

"Extraction runtime architecture — the moat" reserved @dossier/mcp (name + dir + README stub) as the layer that would "serve GraphRAG off the derived graph"; "Claude-primitives-first build strategy" names the MCP server as the agentic foundation: "expose each client's OKF KB to downstream agents." With the extraction moat built and verified (DEC-0008), the missing piece is the read/serve side, the layer where the institutional memory actually reaches an agent. This decision builds that reserved package: the fourth architecture layer (okf → extraction → eval → agentic foundation) and the layer the product's differentiation lives in.

The build (verified this session, all green, no network): @dossier/mcp on the official @modelcontextprotocol/sdk, reusing @dossier/okf (parse / buildGraph / validateGraph) — the keystone dependency, never re-implemented. It loads the real DXA vertical (dxa-vertical, verticals/digital-experience-agency/, 53 atoms / 174 edges) as the test tenant KB. Verified: @dossier/mcp 37 tests green; repo-wide 225 passed / 1 gated-skip.

Options considered

1. Retrieval strategy — how an agent finds the right knowledge.

(a) Vector-only. Embed atoms, return nearest neighbors. Strong recall, but a black box: the answer is a similarity neighborhood with no account of why these atoms belong together. For a system whose whole thesis is captured judgment and relationships ("Dossier — The Knowledge Model (v0)", principle 5), this throws away the most valuable signal, the typed edges, at query time.
(b) Graph-only. Traverse typed edges from a known starting atom. Fully explainable, but needs a seed; with no lexical/semantic entry point an agent must already know the id to start from. Brittle as the front door.
(c) GraphRAG = search-seed + typed-graph-expand (chosen). A retriever seeds a small set of relevant atoms, then a bounded BFS over the typed OKF edges expands to the reached subgraph — returning the reached atoms plus the exact edges traversed. Every hop is a real, named relationship, so the result is a traceable subgraph, not a similarity blob.

2. Multi-tenancy posture — how client KBs are isolated.

(a) Pooled multi-tenant. One server, many client KBs, filtered by a tenant key on each query. Efficient, but a single filter bug or a crafted id is a cross-client leak: the exact MCP-isolation risk that "Claude-primitives-first build strategy" flagged for review, and a breach of the sovereignty promise of "Dossier — Mission & North Star".
(b) Siloed — one server = one tenant = one OKF repo (chosen). The isolation boundary is the process, not a query filter. Matches DEC-0008's "siloed OKF repo output / one client = one OKF git repo" and is the conservative default until multi-tenant scale forces otherwise.

3. Embeddings backend — deterministic vs. live vector now.

(a) Ship a live embeddings/vector index now (real semantic recall day one). But this makes CI depend on the network, couples the serve layer to a vendor embeddings SDK, and stands up a vector store as if it were the source of truth.
(b) Deterministic lexical default now; live vector behind an injectable seam, reserved (chosen). A vendor embeddings SDK would import in exactly one place; CI stays fully offline; the OKF repo stays the system of record and any index is a derived cache.

Decision

1. GraphRAG retrieval — search-seed + typed-graph-expand. get_related seeds from a concept (or a search result), then BFS-expands the typed edges from @dossier/okf's buildGraph to a bounded depth, returning the reached subgraph and the exact edges traversed. Explainability is a first-class output, not a debug afterthought — this is the differentiation vs. vector-only retrieval and why the stack chose GraphRAG.
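The seed-then-expand step described above can be sketched as a bounded breadth-first search. This is a minimal illustration, not the @dossier/mcp implementation: the TypedEdge and Subgraph shapes, the expandFromSeeds name, and the choice to follow edges only in their stated direction are all assumptions made for the example.

```typescript
// Hypothetical shapes for illustration; the real @dossier/okf graph types may differ.
interface TypedEdge {
  from: string;
  to: string;
  type: string; // a named relationship, e.g. "depends_on"
}

interface Subgraph {
  atoms: string[]; // ids reached, seeds first
  edges: TypedEdge[]; // exactly the edges traversed, in visit order
}

// Bounded BFS: expand from the seed ids along outgoing typed edges,
// stopping after maxDepth hops. Direction handling is an assumption.
function expandFromSeeds(
  edges: TypedEdge[],
  seeds: string[],
  maxDepth: number,
): Subgraph {
  // Index outgoing edges by source id so each hop is a map lookup.
  const outgoing = new Map<string, TypedEdge[]>();
  for (const edge of edges) {
    const list = outgoing.get(edge.from) ?? [];
    list.push(edge);
    outgoing.set(edge.from, list);
  }

  const visited = new Set<string>(seeds);
  const traversed: TypedEdge[] = [];
  let frontier = [...visited];

  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const id of frontier) {
      for (const edge of outgoing.get(id) ?? []) {
        // Record every hop taken, so the caller can explain the result.
        traversed.push(edge);
        if (!visited.has(edge.to)) {
          visited.add(edge.to);
          next.push(edge.to);
        }
      }
    }
    frontier = next;
  }

  return { atoms: [...visited], edges: traversed };
}
```

With a chain a → b → c → d, a single seed a, and maxDepth 2, the sketch reaches a, b, and c and reports the two edges it walked; d stays out of the result because it lies beyond the depth bound.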

2. Retrieval is deterministic lexical by default, with a reserved vector seam. The default retriever is deterministic lexical scoring (offline, repeatable). Semantic retrieval sits behind an injectable Embedder / VectorIndex seam, the only place a vendor embeddings SDK would import, mirroring the ClaudeClient seam pattern from "Extraction runtime architecture — the moat". The live embeddings backend is intentionally deferred. This keeps CI fully offline and honors "Adopt OKF as Dossier's canonical knowledge format": the OKF repo is the system of record; the KB index and any vector index are derived, replaceable caches.
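One way to picture the seam is a search function that takes an optional, injected vector backend and falls back to lexical scoring when none is supplied. The sketch below is an assumption throughout: only the Embedder and VectorIndex names come from this decision, while their method signatures, the VectorSeam wrapper, the Atom shape, and the title-weighted scoring are hypothetical.

```typescript
// Hypothetical atom and result shapes for illustration.
interface Atom {
  id: string;
  title: string;
  body: string;
}

interface Scored {
  id: string;
  score: number;
}

// The seam: names from the decision, method signatures assumed.
interface Embedder {
  embed(text: string): Promise<number[]>;
}
interface VectorIndex {
  nearest(vector: number[], limit: number): Promise<Scored[]>;
}
interface VectorSeam {
  embedder: Embedder;
  index: VectorIndex;
}

function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Deterministic lexical scoring: title hits count double, ties break on id,
// so the same query over the same atoms always yields the same order.
function lexicalSearch(atoms: Atom[], query: string, limit: number): Scored[] {
  const terms = new Set(tokenize(query));
  const scored: Scored[] = [];
  for (const atom of atoms) {
    let score = 0;
    for (const t of tokenize(atom.title)) if (terms.has(t)) score += 2;
    for (const t of tokenize(atom.body)) if (terms.has(t)) score += 1;
    if (score > 0) scored.push({ id: atom.id, score });
  }
  scored.sort((a, b) => {
    if (b.score !== a.score) return b.score - a.score;
    return a.id < b.id ? -1 : a.id > b.id ? 1 : 0;
  });
  return scored.slice(0, limit);
}

// Without an injected seam, search stays offline and lexical.
async function searchConcepts(
  atoms: Atom[],
  query: string,
  limit: number,
  seam?: VectorSeam,
): Promise<Scored[]> {
  if (!seam) return lexicalSearch(atoms, query, limit);
  const vector = await seam.embedder.embed(query);
  return seam.index.nearest(vector, limit);
}
```

Because the vendor-specific code would live only inside whatever object implements Embedder and VectorIndex, tests can run with no seam at all, or with an in-memory fake.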

3. Tenant isolation invariant — one server = one tenant = one OKF repo. A single explicit TenantConfig { clientId, okfRepoPath, knownExternalIds? } is threaded, never ambient. A two-layer file gate makes the silo real at the agent boundary:

confinePath rejects .. / absolute-escape / sibling-prefix tricks syntactically before any I/O.
lstat + realpath detects symlinks and re-confines their target, so a link can't point out of the repo.

knownExternalIds only suppresses dangling-edge flags for atoms known to live outside this repo. It is never a read backdoor; nothing outside okfRepoPath is ever served.
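The two layers of the file gate can be sketched with Node's standard fs and path modules. This is an illustrative reading of the description above, not the package's code: the field types on TenantConfig, the function signatures, the isInside helper, and the readConfined name are assumptions, and the sketch re-checks the resolved path on every read rather than only when the leaf is a symlink.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Field names come from the decision text; the types are assumed.
interface TenantConfig {
  clientId: string;
  okfRepoPath: string;
  knownExternalIds?: string[];
}

// Hypothetical helper: true when candidate is root itself or lies under it.
// Appending path.sep defeats sibling-prefix tricks such as /srv/kb-evil.
function isInside(root: string, candidate: string): boolean {
  return candidate === root || candidate.startsWith(root + path.sep);
}

// Layer 1: purely syntactic confinement, performed before any I/O.
function confinePath(tenant: TenantConfig, requested: string): string {
  if (path.isAbsolute(requested)) {
    throw new Error("absolute paths are rejected");
  }
  const root = path.resolve(tenant.okfRepoPath);
  const candidate = path.resolve(root, requested); // collapses any ".." segments
  if (!isInside(root, candidate)) {
    throw new Error("path escapes the tenant repo");
  }
  return candidate;
}

// Layer 2: resolve symlinks and re-confine the real target.
function readConfined(tenant: TenantConfig, requested: string): string {
  const candidate = confinePath(tenant, requested);
  const realRoot = fs.realpathSync(tenant.okfRepoPath);
  // lstat does not follow links, so it reports whether the leaf is a symlink.
  const isLink = fs.lstatSync(candidate).isSymbolicLink();
  const realTarget = fs.realpathSync(candidate);
  if (!isInside(realRoot, realTarget)) {
    throw new Error(
      isLink
        ? "symlink target escapes the tenant repo"
        : "resolved path escapes the tenant repo",
    );
  }
  return fs.readFileSync(realTarget, "utf8");
}
```

Passing the TenantConfig explicitly into both functions reflects the "threaded, never ambient" rule: no read can happen without naming the tenant whose repo bounds it.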

4. Five retrieval tools.

search_concepts — lexical (vector-ready) search → seed atoms.
get_concept — fetch one atom by id (confined read).
get_related — the GraphRAG differentiator: seed-expand over typed edges, returns subgraph + traversed edges.
list_concepts — enumerate/filter the tenant KB.
kb_health — surfaces @dossier/okf validateGraph integrity (DEC-0007 violations, dangling edges) to agents/operators.
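To make the kb_health entry and the knownExternalIds rule concrete, here is a small sketch of dangling-edge detection with external-id suppression. It is not the validateGraph from @dossier/okf, whose signature and checks are not shown in this document; the Edge shape and the findDanglingEdges name are hypothetical.

```typescript
// Hypothetical edge shape for illustration.
interface Edge {
  from: string;
  to: string;
  type: string;
}

// An edge dangles when its target is not an atom in this tenant's repo.
// knownExternalIds only silences the flag; it never makes the target readable.
function findDanglingEdges(
  atomIds: Set<string>,
  edges: Edge[],
  knownExternalIds: string[] = [],
): Edge[] {
  const external = new Set(knownExternalIds);
  return edges.filter(
    (edge) => !atomIds.has(edge.to) && !external.has(edge.to),
  );
}
```

In this sketch the allow-list touches only the health report. A lookup for an id listed in knownExternalIds would still find nothing to serve, since the atom is not in the repo.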

Rationale

Explainable retrieval is the product. GraphRAG returns why atoms belong together (the traversed typed edges), which is exactly the relationship-and-judgment IP that "Dossier — The Knowledge Model (v0)" and "Dossier — Mission & North Star" exist to preserve. Vector-only would discard at query time the very signal extraction worked to capture.
The seam keeps us sovereign and offline. Putting embeddings behind an injectable interface (like ClaudeClient) means CI never touches the network, the vector index is a swappable cache rather than a dependency, and the "OKF is the system of record; indexes are replaceable" principle of "Adopt OKF as Dossier's canonical knowledge format" holds at the serve layer, not just the author layer.
Isolation by process, not by filter, is the conservative read of the flagged risk. "Claude-primitives-first build strategy" explicitly reserved review for MCP isolation at multi-tenant scale. One-server-one-tenant makes the boundary structural; the two-layer file gate (syntactic confinement + symlink re-confinement) means "siloed per client" (DEC-0008) is enforced at the agent boundary, not merely intended.
The keystone is reused, not re-implemented. @dossier/mcp binds to @dossier/okf's parse / buildGraph / validateGraph — the same contract extraction targets — so serve and author can never disagree about what an atom or an edge is.
Confidence is asserted, not verified. The build is green against the real DXA KB (37 tests; repo-wide 225 passed / 1 gated-skip, no network), but the isolation invariant and retrieval quality are not yet battle-tested at real multi-tenant scale, and the live vector backend is unbuilt. This is design-level conviction backed by an offline single-tenant run, not field evidence.

Consequences

Institutional memory is now agent-queryable — with explainable, tenant-isolated retrieval. The DEC-0008 reserved-@dossier/mcp loop is closed; the fourth architecture layer (the agentic foundation) stands up.
The embeddings seam is a standing reservation. When corpus size demands semantic recall, the live Embedder / VectorIndex backend drops into the existing seam — no serve-layer rewrite, no change to the OKF-as-source-of-truth invariant.
knownExternalIds is a documented, narrow affordance (dangling-edge suppression only). It must never be widened into a cross-repo read path without revisiting this decision — that would breach the isolation invariant.
Isolation correctness is now a tested boundary (confinePath + symlink re-confinement) and must stay tested: new read paths in @dossier/mcp go through the gate or they regress the sovereignty guarantee.
Two-way vs. durable. Retrieval internals, ranking, and the exact tool surface are expected to evolve (two-way door). The seam pattern and the one-server-one-tenant isolation invariant are the durable, harder-to-reverse commitments.

Review

Revisit at real multi-tenant scale: does one-server-one-tenant hold operationally, or does provisioning pressure (the platform-engineer's @dossier/runtime domain) force a pooled model with a hardened filter? Separately, wire the live vector backend through the reserved seam when corpus size makes lexical recall the bottleneck; at that point, re-examine retrieval quality against a real client KB and consider promoting confidence to verified.