Deny-by-default egress sandbox around the extraction agent — break the lethal trifecta so a hijacked agent structurally cannot exfiltrate (the single load-bearing build finding)
task-extraction-deny-by-default-egress-sandbox
Deny-by-default egress sandbox around the extraction agent
The single load-bearing build finding of the DEC-0059 synthesis. The ingest→extract→serve pipeline is a textbook lethal trifecta (private data + untrusted content + exfiltration ability). The durable defense is to remove the exfiltration leg architecturally, assuming injection succeeds — not to detect every payload.
The control
Run extraction (and, separately, serve) with deny-by-default egress:
- In the Claude Agent SDK: permissionMode: "dontAsk" + an explicit allowedTools allow-list + a PreToolUse deny-hook (inspects tool_name/tool_input; deny wins even over mode).
- Enforce the real guarantee at the OS level (--network none + a domain-allowlisting proxy), not at the model layer.
- NEVER bypassPermissions: it ignores allowedTools and is inherited by every subagent, and Dossier uses subagents (Extraction runtime architecture — the moat / Agentic-agency runtime topology — compile personas from the OKF graph and activate the reserved BoardWorker over the deterministic spine). This is the explicit build-team footgun.
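The model-layer posture in the list above can be sketched without the SDK. This is a minimal illustration, not the project's egressGuardHook: the types AgentContainmentConfig and ToolCall, the EGRESS_TOOLS set, and the function decideToolUse are hypothetical names introduced here, and the real Claude Agent SDK option and hook shapes should be checked against its documentation.

```typescript
// Hypothetical local types; they mirror the ideas in the text and are
// NOT the Claude Agent SDK's own type definitions.
type PermissionMode = "default" | "dontAsk" | "bypassPermissions";

interface AgentContainmentConfig {
  permissionMode: PermissionMode;
  allowedTools: readonly string[];
}

interface ToolCall {
  tool_name: string;
  tool_input: Record<string, unknown>;
}

type Decision = { allow: true } | { allow: false; reason: string };

// Read-only tools only: nothing on the allow-list can reach the network.
const config: AgentContainmentConfig = {
  permissionMode: "dontAsk",
  allowedTools: ["Read", "Grep", "Glob"],
};

// Assumed set of egress-capable tool names, for illustration only.
const EGRESS_TOOLS = new Set(["WebFetch", "WebSearch", "Bash"]);

// Deny-hook logic: a deny returned here is final, whatever the mode says.
function decideToolUse(cfg: AgentContainmentConfig, call: ToolCall): Decision {
  if (EGRESS_TOOLS.has(call.tool_name)) {
    return { allow: false, reason: `egress tool denied: ${call.tool_name}` };
  }
  if (!cfg.allowedTools.includes(call.tool_name)) {
    return { allow: false, reason: `tool not on allow-list: ${call.tool_name}` };
  }
  // Backstop: refuse any string input that starts with a URL scheme.
  const hasUrl = Object.values(call.tool_input).some(
    (v) => typeof v === "string" && /^[a-z][a-z0-9+.-]*:\/\//i.test(v),
  );
  if (hasUrl) {
    return { allow: false, reason: "URL found in tool input" };
  }
  return { allow: true };
}
```

The function is pure, so a simulated injection (for example a WebFetch call carrying a secret in its URL) can be replayed offline and checked for a deny. It is still only the model-layer half: the OS-level controls remain the real guarantee.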
Caveats to respect (from official docs)
The built-in proxy does not TLS-terminate (domain-fronting can bypass the allowlist → use a TLS-terminating proxy); sandboxes share the host kernel (use gVisor/Firecracker for kernel isolation); the default read policy still allows ~/.ssh and ~/.aws/credentials → add them to denyRead. Add deterministic impact-blocking on extraction output too (strip/deny outbound links + image-render exfil channels). Residual: exfil can still leak via rendered output — this is containment, not 100% prevention.
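The secret-path caveat can be illustrated with a small path check. This is a sketch under stated assumptions: DENIED_READ_PATHS, resolveUserPath, and isDeniedRead are hypothetical names, the deny list contains only the two paths named above, POSIX-style paths are assumed, and symlinks are not resolved (a real guard would also need to handle them).

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Assumed deny list: the two credential locations named in the caveat.
const DENIED_READ_PATHS = ["~/.ssh", "~/.aws/credentials"];

// Expand a leading "~" and normalize, so "../" segments cannot dodge the check.
function resolveUserPath(p: string, home: string = os.homedir()): string {
  const expanded =
    p === "~" || p.startsWith("~/") ? path.join(home, p.slice(1)) : p;
  return path.resolve(expanded);
}

// True when the candidate is a denied path or lives underneath one.
function isDeniedRead(candidate: string, home: string = os.homedir()): boolean {
  const target = resolveUserPath(candidate, home);
  return DENIED_READ_PATHS.some((denied) => {
    const root = resolveUserPath(denied, home);
    return target === root || target.startsWith(root + path.sep);
  });
}
```

Comparing against root + path.sep, rather than a bare prefix, keeps a sibling directory such as ~/.sshx from being treated as ~/.ssh.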
Why a task, not a fix-in-place
A real runtime/Agent-SDK hardening change (permission posture + OS sandbox + proxy + output hook) with a demonstrated containment test — owner judgment + code, the highest-leverage item in the DEC-0059 set (hence p0). Detail + citations: research/2026-06-18-sensitive-data-and-injection-defense.md §9d, §9f.
Build status (2026-06-20) — model-layer scope DONE + VERIFIED → review
Built by the forward-deployed-engineer. Atomic module packages/runtime/src/egress-sandbox.ts (single source of truth for the agent's containment config), wired into the agentic transport agentSdkTurnRunner (packages/runtime/src/live-session.ts), proven by a red→green offline containment test (packages/runtime/test/egress-sandbox.test.ts, 6 cases). Gates green: pnpm typecheck, pnpm test (513 passed / 2 skipped), pnpm build, pnpm kb:check, pnpm plugin:check.
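One way a containment test can pin the config is to assert its invariants directly. The sketch below is illustrative and is not the contents of egress-sandbox.test.ts: ContainmentConfig, EGRESS_CAPABLE, and containmentViolations are hypothetical names, and the list of egress-capable tools is an assumption.

```typescript
// Hypothetical shape of the containment config; field names follow the text.
interface ContainmentConfig {
  permissionMode: string;
  allowedTools: readonly string[];
}

// Assumed list of egress-capable tools, for this illustration only.
const EGRESS_CAPABLE: readonly string[] = ["WebFetch", "WebSearch", "Bash"];

// Collect every violation instead of stopping at the first one,
// so a failing test reports the whole picture.
function containmentViolations(cfg: ContainmentConfig): string[] {
  const violations: string[] = [];
  if (cfg.permissionMode !== "dontAsk") {
    violations.push(`permissionMode must be "dontAsk", got "${cfg.permissionMode}"`);
  }
  for (const tool of cfg.allowedTools) {
    if (EGRESS_CAPABLE.includes(tool)) {
      violations.push(`egress-capable tool on allow-list: ${tool}`);
    }
  }
  return violations;
}
```

A test can then require an empty result for the shipped config and a non-empty one for a deliberately weakened copy, which gives the red-to-green shape described above.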
Acceptance criteria — status:
- ✅ permissionMode: "dontAsk" + explicit allowedTools allow-list (Read/Grep/Glob, no egress tool) + PreToolUse deny-hook (egressGuardHook, inspects tool_name/tool_input, deny wins over mode): simulated injection exfiltrates ZERO.
- ✅ bypassPermissions NEVER used: grep-asserted absent in the agent config; the test enforces its absence.
- ⏳ OS-level guarantee DEFERRED (--network none + a TLS-terminating domain-allowlist proxy). The model-layer posture backs it and never weakens it (OS_EGRESS_GUARANTEE names the control set), but the deploy-time enforcement is reserved to the regulated-tenant build: it needs the per-tenant runtime substrate (Per-tenant runtime isolation — make the tenant a process/network/key boundary (not a directory), with a per-tenant vector namespace + server-side tenant binding, so a poisoned/sensitive atom is contained to ONE tenant). The OS-layer mechanism is now specified by DEC-0071 (@anthropic-ai/sandbox-runtime, proxy-first; --network none + allowlist proxy as the load-bearing control; gVisor/Firecracker as an escalation tier, not the default), behind the new ContainmentSubstrate seam. This is the honest gap: the model layer is verified, the OS layer is not yet enforced.
- ⏳ Kernel isolation considered, NOT enforced (gVisor/Firecracker named in OS_EGRESS_GUARANTEE; deploy-time). ~/.ssh + ~/.aws/credentials denied via scoped disallowedTools Read rules + the hook's secret-path backstop (the SDK has no denyRead field; the criterion's denyRead is honored via the documented mechanism).
- ✅ Deterministic impact-blocking on OUTPUT (stripOutputExfilChannels strips markdown/img/raw/data:/html-tag exfil channels from emitted atom bodies, applied in parseAtoms); residual stated honestly (rendered-output leakage constrained, not 100% eliminated). Fail-closed quarantine (quarantineUntrusted) on the untrusted-content path. Cross-links DEC-0059.
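The output-side blocking named in the last criterion can be sketched with a few regular expressions. This is an assumption-laden illustration, not the project's stripOutputExfilChannels: the function name stripExfilChannels, the patterns, and the "[link removed]" placeholder are all hypothetical, and regex stripping will not cover every encoding trick.

```typescript
// Remove common output-side exfiltration channels from an emitted body.
// Order matters: images go first so they are not mistaken for plain links.
function stripExfilChannels(body: string): string {
  return body
    // Markdown images ![alt](url): dropped entirely, since rendering fetches the URL.
    .replace(/!\[[^\]]*\]\([^)]*\)/g, "")
    // Markdown links [text](url): keep only the visible text.
    .replace(/\[([^\]]*)\]\([^)]*\)/g, "$1")
    // HTML tags such as <img src=...> or <a href=...>.
    .replace(/<\/?[a-zA-Z][^>]*>/g, "")
    // Raw http(s) URLs and data: URIs left in the text.
    .replace(/\b(?:https?:\/\/|data:)[^\s)>\]]+/gi, "[link removed]");
}
```

Applied to a body containing an image whose query string carries a secret, the image and its URL disappear while ordinary prose passes through unchanged. This narrows the rendered-output channel; it does not close it.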
Disposition: the model-layer scope is complete + verified; this sits in review for a human to approve (→ done) or re-scope. The OS-level + kernel-isolation enforcement is the residual, carried by the regulated-tenant build (DEC-0059 §Review gate (3) is now met at the model layer; the full gate also needs the OS-level enforcement). confidence: asserted (built + measured at the model layer; not verified because the OS-level guarantee is unbuilt).