MCP agentic foundation — tenant-scoped GraphRAG over the OKF KB
0011-mcp-agentic-foundation-graphrag
DEC-0011 — MCP agentic foundation (tenant-scoped GraphRAG over the OKF KB)
Reversibility: two-way door — on retrieval internals, the tool surface, and ranking; the embeddings/vector seam pattern and the one-server-one-tenant isolation invariant are the durable parts.
Context
Extraction runtime architecture — the moat reserved @dossier/mcp (name + dir + README stub) as the layer that would "serve GraphRAG off the derived graph"; Claude-primitives-first build strategy names the MCP server as the agentic foundation — "expose each client's OKF KB to downstream agents." With the extraction moat built and verified (DEC-0008), the missing piece is the read/serve side: the layer where the institutional memory actually reaches an agent. This decision builds that reserved package — the fourth architecture layer (okf → extraction → eval → agentic foundation) and the layer the product's differentiation lives in.
The build (verified this session, all green, no network): @dossier/mcp on the official @modelcontextprotocol/sdk, reusing @dossier/okf (parse / buildGraph / validateGraph) — the keystone dependency, never re-implemented. It loads the real DXA vertical (dxa-vertical, verticals/digital-experience-agency/, 53 atoms / 174 edges) as the test tenant KB. Verified: @dossier/mcp 37 tests green; repo-wide 225 passed / 1 gated-skip.
Options considered
1. Retrieval strategy — how an agent finds the right knowledge.
- (a) Vector-only. Embed atoms, return nearest neighbors. Strong recall, but a black box: the answer is a similarity neighborhood with no account of why these atoms belong together. For a system whose whole thesis is captured judgment and relationships (Dossier — The Knowledge Model (v0) principle 5), this throws away the most valuable signal — the typed edges — at query time.
- (b) Graph-only. Traverse typed edges from a known starting atom. Fully explainable, but needs a seed; with no lexical/semantic entry point an agent must already know the id to start from. Brittle as the front door.
- (c) GraphRAG = search-seed + typed-graph-expand (chosen). A retriever seeds a small set of relevant atoms, then a bounded BFS over the typed OKF edges expands to the reached subgraph — returning the reached atoms plus the exact edges traversed. Every hop is a real, named relationship, so the result is a traceable subgraph, not a similarity blob.
2. Multi-tenancy posture — how client KBs are isolated.
- (a) Pooled multi-tenant. One server, many client KBs, filtered by a tenant key on each query. Efficient, but a single filter bug or a crafted id is a cross-client leak — the exact MCP-isolation risk Claude-primitives-first build strategy flagged for review, against the sovereignty promise of Dossier — Mission & North Star.
- (b) Siloed — one server = one tenant = one OKF repo (chosen). The isolation boundary is the process, not a query filter. Matches DEC-0008's "siloed OKF repo output / one client = one OKF git repo" and is the conservative default until multi-tenant scale forces otherwise.
3. Embeddings backend — deterministic vs. live vector now.
- (a) Ship a live embeddings/vector index now. This gives real semantic recall on day one, but it puts CI on the network, couples the serve layer to a vendor embeddings SDK, and stands up a vector store as if it were the source of truth.
- (b) Deterministic lexical default now; live vector behind an injectable seam, reserved (chosen). A vendor embeddings SDK would import in exactly one place; CI stays fully offline; the OKF repo stays the system of record and any index is a derived cache.
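Option 3(b) can be sketched in TypeScript roughly as follows. This is a minimal illustration under stated assumptions, not the package's actual code: the record names Embedder and VectorIndex but gives no signatures, so the method shapes below, the Atom and Retriever types, the lexicalSearch and makeRetriever helpers, and the title-weighting rule are all hypothetical.

```typescript
// Hypothetical shapes: only the names Embedder and VectorIndex come from the ADR.
interface Atom { id: string; title: string; body: string }
interface Scored { id: string; score: number }

interface Retriever {
  search(query: string, limit: number): Promise<Scored[]>;
}
interface Embedder {
  embed(texts: string[]): Promise<number[][]>;
}
interface VectorIndex {
  query(vector: number[], limit: number): Promise<Scored[]>;
}

const tokenize = (s: string): string[] => s.toLowerCase().match(/[a-z0-9]+/g) ?? [];

// Deterministic lexical scoring: no network, no randomness, stable tie-break by id.
// The "title hits count double" weighting is an illustrative assumption.
function lexicalSearch(atoms: Atom[], query: string, limit: number): Scored[] {
  const terms = new Set(tokenize(query));
  const hits = (text: string): number => tokenize(text).filter((t) => terms.has(t)).length;
  return atoms
    .map((a) => ({ id: a.id, score: 2 * hits(a.title) + hits(a.body) }))
    .filter((s) => s.score > 0)
    .sort((x, y) => y.score - x.score || (x.id < y.id ? -1 : x.id > y.id ? 1 : 0))
    .slice(0, limit);
}

// The seam: callers depend on Retriever only. A vendor SDK would be wrapped
// in an Embedder implementation and imported nowhere else.
function makeRetriever(
  atoms: Atom[],
  seam?: { embedder: Embedder; index: VectorIndex },
): Retriever {
  if (!seam) {
    return { search: async (query, limit) => lexicalSearch(atoms, query, limit) };
  }
  return {
    async search(query, limit) {
      const vectors = await seam.embedder.embed([query]);
      const vector = vectors[0];
      if (!vector) throw new Error("embedder returned no vector");
      return seam.index.query(vector, limit);
    },
  };
}
```

Calling makeRetriever(atoms) with no seam yields the offline default, which is what CI would exercise; passing an embedder and index swaps the backend without touching callers.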
Decision
1. GraphRAG retrieval — search-seed + typed-graph-expand. get_related seeds from a concept (or a search result), then BFS-expands the typed edges from @dossier/okf's buildGraph to a bounded depth, returning the reached subgraph and the exact edges traversed. Explainability is a first-class output, not a debug afterthought — this is the differentiation vs. vector-only retrieval and why the stack chose GraphRAG.
2. Retrieval is deterministic lexical by default, with a reserved vector seam. The default retriever is deterministic lexical scoring (offline, repeatable). Semantic retrieval sits behind an injectable Embedder / VectorIndex seam — the only place a vendor embeddings SDK would import — mirroring the ClaudeClient seam pattern from Extraction runtime architecture — the moat. The live embeddings backend is intentionally deferred. This keeps CI fully offline and honors Adopt OKF as Dossier's canonical knowledge format: the OKF repo is the system of record; the KB index and any vector index are derived, replaceable caches.
3. Tenant isolation invariant — one server = one tenant = one OKF repo. A single explicit TenantConfig { clientId, okfRepoPath, knownExternalIds? } is threaded, never ambient. A two-layer file gate makes the silo real at the agent boundary:
- confinePath rejects ../ / absolute-escape / sibling-prefix tricks syntactically, before any I/O.
- lstat + realpath detects symlinks and re-confines their target, so a link can't point out of the repo.
- knownExternalIds only suppresses dangling-edge flags for atoms known to live outside this repo. It is never a read backdoor; nothing outside okfRepoPath is ever served.
4. Five retrieval tools.
- search_concepts: lexical (vector-ready) search → seed atoms.
- get_concept: fetch one atom by id (confined read).
- get_related: the GraphRAG differentiator. Seed-expand over typed edges; returns subgraph + traversed edges.
- list_concepts: enumerate/filter the tenant KB.
- kb_health: surfaces @dossier/okf validateGraph integrity (DEC-0007 violations, dangling edges) to agents/operators.
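The seed-then-expand behaviour behind get_related can be sketched as a bounded breadth-first search that records every edge it follows. This is an illustrative sketch only: the real graph comes from @dossier/okf's buildGraph, whose types are not shown in this record, so the Edge shape, the expandFromSeeds name, the choice to follow outgoing edges only, and the edge type names in the usage lines are all assumptions.

```typescript
// Hypothetical edge shape; the real one is defined by @dossier/okf.
interface Edge { from: string; to: string; type: string }
interface Subgraph { atoms: string[]; edges: Edge[] }

function expandFromSeeds(edges: Edge[], seeds: string[], maxDepth: number): Subgraph {
  // Index outgoing edges by source atom (assumption: traversal follows edge direction).
  const outgoing = new Map<string, Edge[]>();
  for (const e of edges) {
    const list = outgoing.get(e.from) ?? [];
    list.push(e);
    outgoing.set(e.from, list);
  }

  // depth doubles as the visited set; duplicate seeds collapse here.
  const depth = new Map<string, number>();
  for (const s of seeds) depth.set(s, 0);

  const queue = [...depth.keys()];
  const traversed: Edge[] = [];
  let head = 0;
  while (head < queue.length) {
    const id = queue[head++]!;
    const d = depth.get(id) ?? 0;
    if (d >= maxDepth) continue; // bound reached: keep the atom, do not expand it
    for (const e of outgoing.get(id) ?? []) {
      traversed.push(e); // every hop is reported, so the result explains itself
      if (!depth.has(e.to)) {
        depth.set(e.to, d + 1);
        queue.push(e.to);
      }
    }
  }
  return { atoms: [...depth.keys()], edges: traversed };
}

// Usage with made-up atoms and edge types:
const demoEdges: Edge[] = [
  { from: "a", to: "b", type: "depends_on" },
  { from: "b", to: "c", type: "refines" },
  { from: "c", to: "d", type: "cites" },
  { from: "a", to: "c", type: "cites" },
];
const oneHop = expandFromSeeds(demoEdges, ["a"], 1);
// oneHop.atoms is ["a", "b", "c"]; oneHop.edges holds a→b and a→c.
```

Because each atom is expanded at most once, each edge appears in the traversed list at most once, and the depth bound keeps the returned subgraph small enough for an agent to read.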
Rationale
- Explainable retrieval is the product. GraphRAG returns why atoms belong together — the traversed typed edges — which is exactly the relationship-and-judgment IP Dossier — The Knowledge Model (v0) and Dossier — Mission & North Star exist to preserve. Vector-only would discard at query time the very signal extraction worked to capture.
- The seam keeps us sovereign and offline. Putting embeddings behind an injectable interface (like ClaudeClient) means CI never networks, the vector index is a swappable cache not a dependency, and Adopt OKF as Dossier's canonical knowledge format's "OKF is the system of record; indexes are replaceable" holds at the serve layer, not just the author layer.
- Isolation by process, not by filter, is the conservative read of the flagged risk. Claude-primitives-first build strategy explicitly reserved review for MCP isolation at multi-tenant scale. One-server-one-tenant makes the boundary structural; the two-layer file gate (syntactic confinement + symlink re-confinement) makes "siloed per client" (DEC-0008) enforced at the agent boundary, not merely intended.
- The keystone is reused, not re-implemented. @dossier/mcp binds to @dossier/okf's parse / buildGraph / validateGraph (the same contract extraction targets), so serve and author can never disagree about what an atom or an edge is.
- Confidence: asserted, not verified. Built and verified green against the real DXA KB (37 tests; repo-wide 225 passed / 1 gated-skip, no network), but the isolation invariant and retrieval quality are not yet battle-tested at real multi-tenant scale, and the live vector backend is unbuilt. This is design-level conviction backed by an offline single-tenant run, not field evidence.
Consequences
- Institutional memory is now agent-queryable, with explainable, tenant-isolated retrieval. The DEC-0008 reserved-@dossier/mcp loop is closed; the fourth architecture layer (the agentic foundation) stands up.
- The embeddings seam is a standing reservation. When corpus size demands semantic recall, the live Embedder / VectorIndex backend drops into the existing seam: no serve-layer rewrite, no change to the OKF-as-source-of-truth invariant.
- knownExternalIds is a documented, narrow affordance (dangling-edge suppression only). It must never be widened into a cross-repo read path without revisiting this decision; that would breach the isolation invariant.
- Isolation correctness is now a tested boundary (confinePath + symlink re-confinement) and must stay tested: new read paths in @dossier/mcp go through the gate or they regress the sovereignty guarantee.
- Two-way vs. durable. Retrieval internals, ranking, and the exact tool surface are expected to evolve (two-way door). The seam pattern and the one-server-one-tenant isolation invariant are the durable, harder-to-reverse commitments.
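For readers adding a new read path, the two-layer gate from the Decision could look roughly like the sketch below, written against Node's path and fs/promises modules. Only the names confinePath, lstat, and realpath come from this record; the signatures, the resolveConfined wrapper, and the viaSymlink flag are hypothetical, and the real implementation in @dossier/mcp may differ.

```typescript
import * as path from "node:path";
import { lstat, realpath } from "node:fs/promises";

// Layer 1: purely syntactic, no I/O. Returns an absolute path inside root or throws.
function confinePath(root: string, requested: string): string {
  const base = path.resolve(root);
  const target = path.resolve(base, requested); // an absolute `requested` replaces base
  const rel = path.relative(base, target);
  // Comparing via path.relative avoids the sibling-prefix trap that a naive
  // startsWith check has ("/kb/acme-evil" starts with "/kb/acme").
  const escapes = rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
  if (escapes) throw new Error(`path escapes tenant repo: ${requested}`);
  return target;
}

// Layer 2: touches the filesystem. Resolves symlinks and re-applies layer 1
// to the resolved location, so a link cannot point out of the repo.
async function resolveConfined(
  root: string,
  requested: string,
): Promise<{ path: string; viaSymlink: boolean }> {
  const candidate = confinePath(root, requested);
  const info = await lstat(candidate); // lstat inspects the link itself, not its target
  const realRoot = await realpath(root);
  const real = await realpath(candidate); // follows links in every path component
  return { path: confinePath(realRoot, real), viaSymlink: info.isSymbolicLink() };
}
```

In this sketch realpath runs on every read rather than only when lstat reports a link, which also covers symlinked parent directories; whether the actual gate does the same is not stated here.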
Review
Revisit at real multi-tenant scale: does one-server-one-tenant hold operationally, or does provisioning pressure (the platform-engineer's @dossier/runtime domain) force a pooled model with a hardened filter? And wire the live vector backend through the reserved seam when corpus size makes lexical recall the bottleneck — at which point re-examine retrieval quality against a real client KB and consider promoting confidence to verified.