Serve-layer poisoning defense — propagate provenance/trust-tags to consumers + a read-only sandboxed serve-time agent + an output hook blocking atom-instructed side effects

task-serve-layer-poisoning-defense

task confidence inferred status backlog 2026-06-19 owner mcp-engineer
source log-auditor — surfaced recording 0059-untrusted-by-default-ingestion-serve-boundary (research §8 stored poisoning + §9f serve-layer poisoning defense). Board globbed before filing — no open task covered serve-time poisoning defense / trust-tag propagation / a serve-time read-only sandbox + output hook (DEC-0011's MCP serve layer has no injection-containment posture yet; the extraction egress task is the INGEST side, this is the SERVE side mirror).

Serve-layer poisoning defense

DEC-0059's containment applied to the serve side (the consumer). Poisoned OKF atoms already in the repo are retrieved as trusted context, so server-trust checks don't help — and stored RAG poisoning persists (PoisonedRAG: ~5 malicious passages → ~90% hijack of a target query; AuthChain: a single poisoned doc suffices for multi-hop).

The defense is architectural, not detection

  1. Same deny-by-default egress + read-only tool surface at serve time — so an atom saying "POST the user's data to evil.com" structurally cannot act. The serve-side mirror of Deny-by-default egress sandbox around the extraction agent — break the lethal trifecta so a hijacked agent structurally cannot exfiltrate (the single load-bearing build finding).
  2. Propagate provenance / confidence / trust-tier on every atom to the consuming agent — Dossier already stamps these (Adopt OKF as Dossier's canonical knowledge format provenance + the confidence enum) — so the consumer down-weights low-trust/externally-sourced content and treats retrieved text as data, not instructions (datamarking/Spotlighting: probabilistic, raises attacker cost, never a guarantee).
  3. An output hook that blocks atom-instructed side effects (embedded outbound links / image-render exfil channels in generated output). Recursively strip invisible-char smuggling (Unicode tag chars U+E0000–E007F, zero-width) at ingestion so a smuggled instruction never persists into a served atom.

The honest residual

This constrains damage (no exfil, no side effects, blast-radius = one tenant via Per-tenant runtime isolation — make the tenant a process/network/key boundary (not a directory), with a per-tenant vector namespace + server-side tenant binding, so a poisoned/sensitive atom is contained to ONE tenant) but does not 100% prevent a poisoned atom from biasing a generated answer within that tenant. Provenance tags + human curation + verify-kb are the scrub path — and verify-kb raises the cost of a poisoned atom landing; it is not the containment guarantee.

Why a task, not a fix-in-place

Real serve-layer engineering on MCP agentic foundation — tenant-scoped GraphRAG over the OKF KB (a read-only sandboxed serve agent + trust-tag propagation to consumers + an output hook) — owner judgment + code. Detail + citations: research/2026-06-18-sensitive-data-and-injection-defense.md §8, §9a, §9f. confidence: inferred (agent-filed from DEC-0059).