Fail-closed quarantine wrapper for the Unstructured file path (zero-element / encrypted-by-header / unknown-MIME → quarantine-by-default) — because Unstructured fails EMPTY, not CLOSED

task-unstructured-fail-closed-quarantine

task · confidence: inferred · status: backlog · 2026-06-19 · owner: ingestion-engineer
source: log-auditor, which surfaced recording 0059-untrusted-by-default-ingestion-serve-boundary. That recording names the Unstructured fail-empty behavior (research §7b) as "the single most important file-path takeaway". The board was globbed before filing: no open task covered the Unstructured connector or a fail-closed/quarantine path (UnstructuredConnector is still a reserved stub per DEC-0013).

Fail-closed quarantine wrapper for the Unstructured file path

The single most important file-path takeaway of the DEC-0059 synthesis: Unstructured fails empty, not closed. Encrypted / copy-protected files return zero elements with only a logged warning — and a silent empty result reads downstream as "clean," so a regulated document Dossier could not parse would sail through as if it carried no sensitive content.
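Why an empty result reads as "clean" is vacuous truth: a gate that checks for the absence of detector hits passes trivially when there is nothing to check. A minimal Python illustration, assuming a hypothetical regex detector (`SSN` and `naive_is_clean` are illustrative names, not Dossier code):

```python
import re
from typing import Sequence

# Hypothetical stand-in for a real PII detector (US SSN shape only).
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def naive_is_clean(elements: Sequence[str]) -> bool:
    """'Clean' means no element tripped the detector.

    any() over an empty sequence is False, so zero elements -> True:
    a file that failed to parse is reported as clean.
    """
    return not any(SSN.search(text) for text in elements)


# Parsed document containing an SSN: correctly flagged.
print(naive_is_clean(["Patient SSN: 123-45-6789"]))  # False
# Encrypted file, parser returned zero elements: wrongly "clean".
print(naive_is_clean([]))  # True
```

The detector is never wrong here; it is simply never asked. That is why the fix has to live before the detectors, at the parse boundary.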

The fix

Dossier must add its own fail-closed quarantine wrapper around the Unstructured connector (per "Ingestion connector seam: assemble, don't build" and "ingestion owns the input contract", which reserve UnstructuredConnector as a stub): zero-element / encrypted-by-header / unknown-MIME → quarantine-by-default (deny + route to human review). The wrapper must distinguish "parsed and found nothing" from "could not parse," and treat the latter as quarantine. Because Unstructured returns the same empty result for both, zero elements alone cannot tell them apart, which is why a zero-element result defaults to quarantine.

Note the adjacent blind spot: Unstructured's default scan is text-only and misses PII in images and scanned PDFs (Presidio Image Redactor + OCR is beta), so an unreadable or zero-element file is precisely the case not to trust.
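A minimal Python sketch of the wrapper's decision order, assuming the Unstructured call is injected as a plain callable (`parse`) that returns a sequence of elements. `guarded_parse`, `Verdict`, `Outcome`, the magic-byte allowlist and the encryption heuristics are all hypothetical; none of this is the Unstructured API or Dossier's UnstructuredConnector. It probes type and encryption from the bytes before parsing, treats a parser exception as "could not parse" as well (an addition beyond the three named triggers), and quarantines an empty result instead of passing it on.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional, Sequence, Tuple


class Verdict(Enum):
    PASS = "pass"              # parsed to >= 1 element; continue to detectors
    QUARANTINE = "quarantine"  # deny + route to human review


@dataclass(frozen=True)
class Outcome:
    verdict: Verdict
    reason: str
    elements: Tuple[object, ...] = ()


# Hypothetical allowlist: magic-byte prefix -> MIME label. Anything else
# (other than NUL-free UTF-8 text) counts as unknown-MIME.
MAGIC: Tuple[Tuple[bytes, str], ...] = (
    (b"%PDF-", "application/pdf"),
    (b"PK\x03\x04", "application/zip"),  # OOXML containers (.docx/.xlsx/.pptx)
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "application/x-ole-storage"),
)


def sniff_mime(data: bytes) -> Optional[str]:
    """Classify by content, never by file extension or caller-supplied type."""
    for prefix, mime in MAGIC:
        if data.startswith(prefix):
            return mime
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return None
    return None if "\x00" in text else "text/plain"


def looks_encrypted(data: bytes, mime: str) -> bool:
    """Crude header heuristics, not a full format parse."""
    if mime == "application/pdf":
        # Encrypted PDFs reference an encryption dictionary via /Encrypt.
        return b"/Encrypt" in data
    if mime == "application/x-ole-storage":
        # Password-protected OOXML is wrapped in an OLE container holding an
        # "EncryptedPackage" stream (directory entry names are UTF-16LE).
        return "EncryptedPackage".encode("utf-16-le") in data
    return False


Parser = Callable[[bytes], Sequence[object]]


def guarded_parse(data: bytes, parse: Parser) -> Outcome:
    """Fail closed: only a known, unencrypted, non-empty parse gets through."""
    mime = sniff_mime(data)
    if mime is None:
        return Outcome(Verdict.QUARANTINE, "unknown-mime")
    if looks_encrypted(data, mime):
        return Outcome(Verdict.QUARANTINE, "encrypted-by-header")
    try:
        elements = tuple(parse(data))
    except Exception as exc:  # a crash is also "could not parse"
        return Outcome(Verdict.QUARANTINE, f"parse-error: {type(exc).__name__}")
    if not elements:
        # Fail-empty result: indistinguishable from "could not parse".
        return Outcome(Verdict.QUARANTINE, "zero-elements")
    return Outcome(Verdict.PASS, "parsed", elements)
```

Every branch but one ends in quarantine, so a new or unexpected failure mode lands in human review by default rather than in the "clean" bucket. The header checks are deliberately crude (they will miss, for example, legacy RC4-encrypted .doc files); the zero-element branch is the backstop for what they miss.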

Why a task, not a fix-in-place

This is real connector-layer engineering (a wrapper, a MIME/header probe, a quarantine route) plus a verification fixture, so it needs owner judgment and code. It is the file-path realization of DEC-0059's "every detector is probabilistic, so fail closed" leg. Detail and citations: research/2026-06-18-sensitive-data-and-injection-defense.md §7b. confidence: inferred (agent-filed from DEC-0059).