Stay serialize-only for intra-tenant drains — keep the per-task worktree mechanism built-but-dormant until a measured throughput trigger fires
0062-defer-intra-tenant-worktree-parallelism
DEC-0062 — Stay serialize-only for intra-tenant drains; keep worktree parallelism built-but-dormant
Reversibility: two-way door. The deferral is fully reversible — the day the trigger below fires, this decision is revisited and the pinned design (below) is built behind the already-proven withTaskWorktree seam, activation being a flag, not a rewrite. What this decision deliberately does not take is the one-way door (relaxing the Inv 4 per-tenant drain lock to a per-task lock); keeping that door shut is the conservative, reversible call. The pinned design exists so that when the door is opened, the topology is already decided rather than re-litigated under load.
This record disposes the topology call that DEC-0053 §4 (Invariant 4) explicitly designed toward and carried forward as reserved "scale" work, surfaced to the Principal Platform Architect as "Activate parallel intra-tenant drains via per-task git worktrees, or stay serialize-only? (one-way-door topology call)". The worktree isolation mechanism is built and offline-proven (packages/runtime/src/worktree.ts, packages/runtime/test/worktree.test.ts: a real git worktree add in a temp repo, two tasks edit and commit concurrently without corrupting each other, every path confineToTenant-gated). This decision resolves whether to activate it, and the answer is no.
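For orientation, the shape of a per-task worktree scope can be sketched as below. This is not the contents of worktree.ts: withWorktreeSketch, GitRunner, and the directory layout are hypothetical names for illustration, and only the dossier/task/<id> branch convention is taken from this record. The git runner is injectable so the flow can be exercised without a real repository.

```typescript
import { execFileSync } from "node:child_process";
import * as path from "node:path";

// Hypothetical runner type: executes `git <args>` in `cwd`.
type GitRunner = (args: string[], cwd: string) => void;

// Default runner shells out to the real git binary.
const realGit: GitRunner = (args, cwd) => {
  execFileSync("git", args, { cwd, stdio: "pipe" });
};

// Branch naming assumed from the record's dossier/task/<id> convention.
function taskBranch(taskId: string): string {
  return `dossier/task/${taskId}`;
}

// Sketch: create an isolated checkout on its own branch, run the task body
// inside it, and always remove the checkout afterwards.
async function withWorktreeSketch<T>(
  repoRoot: string,
  worktreesDir: string,
  taskId: string,
  body: (worktreePath: string) => Promise<T>,
  git: GitRunner = realGit,
): Promise<T> {
  const worktreePath = path.join(worktreesDir, taskId);
  git(["worktree", "add", "-b", taskBranch(taskId), worktreePath], repoRoot);
  try {
    return await body(worktreePath);
  } finally {
    // The branch and its commits survive; only the checkout is removed.
    git(["worktree", "remove", "--force", worktreePath], repoRoot);
  }
}
```

Two such scopes with distinct task ids get distinct checkouts and distinct branches while sharing one object store, which is the property the offline test exercises.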
Context — the pressure that would justify activation does not exist
The mechanism is de-risked; the need is the question, and the ground truth says there is none:
- The platform runs one drain, one task at a time. drainBoardSerialized serializes per tenant (Inv 4 holds); scripts/board-drain.mjs calls a single drainBoard with maxTasksPerRun: 1 (one-task-per-session; Agentic "sprint board" architecture — a git-resident OKF task board worked by bounded, hook-governed Agent SDK loops, §6). There is no scheduler, no Actions matrix, no caller anywhere that dispatches two concurrent drains on one tenant. Verified this session against the actual invocation sites: board.ts, board-drain.mjs, and agency-phase0-{dogfood,live}.mjs are the only drain entry points and none run concurrently.
- One early client tenant (RBA, under gitignored clients/), plus the dogfood repo. Cross-tenant parallelism is already free (one MCP server + confineToTenant per tenant, per DEC-0053 §4); the only thing worktrees buy is intra-tenant parallelism, i.e. running two tasks for the same client at once. No client is generating that load, and no latency SLO is being missed.
- There is no observability that even measures intra-tenant queue depth or drain latency yet. Activating a one-way-door topology change to relieve a pressure we are not measuring would be building for an imagined future: the textbook YAGNI failure, and exactly the kind of premature-complexity cost a future-self regrets (the bar this role is measured against).
DEC-0053 §4's "serialize NOW, design TOWARD worktrees" was the right call then and is the right call now. The "design toward" obligation has been fully discharged by building + proving the mechanism. "Toward" is not "now."
Options considered
- Activate now (relax Inv 4 to a per-task lock; wire N concurrent withTaskWorktree drains). Rejected. Takes a one-way door (drain-lock relaxation, new merge + budget-accrual topology, a real test surface to maintain) to buy throughput nobody is requesting. Cost: permanent complexity in the isolation contract plus an irreversible loosening of the safest invariant, against zero measured benefit. Negative expected value today.
- Delete the mechanism as premature; rebuild if ever needed. Rejected. Wasteful and anti-leverage: the mechanism is built and proven, costs ~0 to keep dormant, and its existence is precisely what makes future activation a flag, not a rewrite. Deleting it would re-incur the build and re-derisk cost later. Keep proven optionality; don't pay to destroy it.
- Stay serialize-only; keep the mechanism built-but-dormant; pin the activation design + a measurable trigger. Chosen. Inv 4 (serialize) is reaffirmed as the standing topology. The mechanism stays behind the proven withTaskWorktree seam, annotated as deliberately dormant. The one-way door stays shut. The three sub-questions are pre-resolved (below) so the day the trigger fires, an FDE builds a decided design under load instead of litigating topology under load.
Decision
Serialize-only is the standing intra-tenant drain topology. Do not activate per-task worktree parallelism. Keep worktree.ts built-but-dormant behind withTaskWorktree; annotate it as such (header amended to point here). Hold the Inv 4 per-tenant drain lock as-is.
The explicit revisit trigger (the one-way door reopens when ALL three hold)
Revisit this decision — and build the pinned design — when all of:
1. A real tenant has sustained intra-tenant queue depth. At least one tenant accumulates ≥ 3 simultaneously claimable, independent (dependency-free or dependency-satisfied) tasks that a single serialized drain cannot clear inside the tenant's expected wake cadence, i.e. work that is genuinely parallelizable is repeatedly waiting on the serial lock, not on dependencies or human review.
2. That backlog has a latency cost someone is paying. The serial drain's wall-clock-to-review for a tenant's board misses an explicit, written latency expectation (a client SLO, or a dogfood throughput target we set deliberately), i.e. serialization is a measured bottleneck, not a hypothetical one.
3. We can measure it. Per-tenant drain latency / intra-tenant queue depth is observable (a counter, a board metric) so activation's benefit is verifiable, not asserted.
The first sustained breach of (1)+(2) with (3) in place flips this. Until then, serialize-only stands. (A single bursty board, or a tenant whose tasks are mostly dependency-chained — which serialize anyway — does not trip it; the trigger is sustained, parallelizable, latency-costly load.)
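The trigger can be read as a predicate over per-tenant observations. The sketch below is illustrative only: DrainSample, TriggerInput, and shouldRevisitSerializeOnly are hypothetical names, no such metric exists on the platform yet, and treating "sustained" as three consecutive breaching samples is an assumption this record does not fix. Only the queue-depth threshold of 3 comes from the text above.

```typescript
// One observation of a tenant's board around a serial drain (hypothetical shape).
interface DrainSample {
  claimableIndependentTasks: number; // claimable tasks with no unmet dependencies
  clearedWithinWakeCadence: boolean; // did the serial drain clear them in time?
  wallClockToReviewMs: number;       // measured wall-clock-to-review
}

interface TriggerInput {
  samples: DrainSample[];    // oldest first
  latencyTargetMs?: number;  // the explicit, written expectation, if one exists
  metricsAvailable: boolean; // the "we can measure it" condition
  minQueueDepth?: number;    // defaults to the record's threshold of 3
  sustainedSamples?: number; // ASSUMPTION: consecutive breaches that count as sustained
}

function shouldRevisitSerializeOnly(input: TriggerInput): boolean {
  const minDepth = input.minQueueDepth ?? 3;
  const sustained = input.sustainedSamples ?? 3;

  // Without observability the benefit is asserted, not verified.
  if (!input.metricsAvailable) return false;

  // Without a written latency expectation there is nothing to miss.
  const target = input.latencyTargetMs;
  if (target === undefined) return false;

  // A single bursty board does not trip the trigger; every recent sample must breach.
  if (input.samples.length < sustained) return false;
  const recent = input.samples.slice(-sustained);
  return recent.every(
    (s) =>
      s.claimableIndependentTasks >= minDepth &&
      !s.clearedWithinWakeCadence &&
      s.wallClockToReviewMs > target,
  );
}
```

Dependency-chained tasks never count toward claimableIndependentTasks in this sketch, so a board that serializes anyway cannot trip it.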
The three open sub-questions — PRE-RESOLVED (the pinned design a future activation builds to)
So the door is decided, not deferred-and-vague. This is the design an FDE builds to when the trigger fires; it is recorded now while the context is fresh, but it does not activate anything.
1. Merge topology → PRs the human gate disposes, NOT fast-forward to main. Each per-task worktree commits to its own dossier/task/<id> branch; activation lands those as PRs (or their local-merge analogue) that a human disposes via the already-built Inv 3 gate: dispose.ts approveTask (review → done + the real merge commit) / rejectTask. A fast-forward-to-main path is explicitly rejected: it would let an agentic branch reach done/main without the human disposition, violating Inv 3 (only a human merge moves a task to done), the non-bypassability that dispose.ts made structural. So parallelism changes where work is staged (isolated branches), never who closes the loop (the human, through the existing single done-writing path). The worker still transitions only to review; the branch is what the human merges on approve.
2. Budget accrual → the per-team split (budget.ts) is the apportionment key; accrual SUMS across worktrees against the unchanged tenant ceiling; the kill switch stays tenant-level. decideTeamBudget (DEC-0053 §5, built) is exactly the apportionment key for dividing the tenant envelope across concurrent drains: each concurrent worktree-drain carries a teamId and accrues against both its team sub-envelope and the outer tenant ceiling. The concurrent drains' spentThisDrainUsd must be summed into a single tenant-level running total before the post-drain enforceBudget check (the scheduler owns this sum; a per-drain check alone could let N drains each pass individually while collectively breaching). The hard-stop / board-pause kill switch is unchanged and stays tenant-level: a team breach denies that team's drain without pausing the board (other teams continue); only the tenant-ceiling breach trips the .board-pause sentinel, exactly enforceBudget's current deniedBy !== 'team' rule. Concurrency does not weaken the kill switch; it just requires the scheduler to feed it the summed accrual, not a single drain's.
3. Drain-lock relaxation (THE one-way door) → a per-task sub-lock NESTED under a RETAINED per-tenant coordination lease; the isolation contract is: worktree path = isolation unit, shared object store, serialized ref update. Inv 4 does not fully relax to a free-for-all per-task lock. The contract that keeps tenant safety intact:
   - Retain a per-tenant coordination lease (the existing .drain-lock shape) as the outer bound that gates admission: it caps concurrency at a configured N and owns the summed budget accrual (sub-question 2). It is no longer "one drain," but it is still the single tenant-level coordinator.
   - Add a per-task sub-lock keyed on the worktree path / branch. The worktree is the isolation unit (proven: separate checkout, own branch, shared .git object store, no interleaved git add -A). Two tasks may run concurrently iff they hold distinct worktree sub-locks.
   - Serialize the one shared mutation: the ref update / merge to the tenant main line. Concurrent work is safe (isolated trees); concurrent integration is not (one HEAD). Branch creation and per-worktree commits parallelize; the merge-to-main hop (sub-question 1's PR disposition) is serialized through the human gate, which is already a serialization point. The object store is shared and append-only under concurrent commits to distinct refs, which is the proven-safe case.
   - Isolation contract, stated: a concurrent drain may read/write only inside its own confineToTenant-gated worktree path; it may commit only to its own dossier/task/<id> branch; it may not touch another worktree's path, another task's branch, the tenant main ref (only the human merge does), or another tenant's subtree. Inv 1/2/3/6/7 are untouched by construction: they live inside AgentSdkBoardWorker.execute(), which is per-task and worktree-agnostic.
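The budget-accrual rule in sub-question 2 can be sketched as a pure function the scheduler would call before the post-drain check. This is not budget.ts: checkSummedAccrual and its types are hypothetical, the strict "greater than" breach comparison is an assumption, and teams without a configured envelope are assumed to be bounded only by the tenant ceiling. Only the field names teamId and spentThisDrainUsd come from this record.

```typescript
interface DrainSpend {
  teamId: string;
  spentThisDrainUsd: number;
}

interface BudgetLimits {
  tenantCeilingUsd: number;
  teamEnvelopeUsd: Record<string, number>; // per-team sub-envelopes
}

interface SummedVerdict {
  tenantTotalUsd: number; // summed across all concurrent drains
  deniedTeams: string[];  // team breach: deny that team's drain, board keeps running
  pauseBoard: boolean;    // tenant-ceiling breach: the only thing that pauses the board
}

function checkSummedAccrual(spends: DrainSpend[], limits: BudgetLimits): SummedVerdict {
  const perTeam: Record<string, number> = {};
  let tenantTotalUsd = 0;

  // Sum first: N drains can each pass individually while collectively breaching.
  for (const s of spends) {
    perTeam[s.teamId] = (perTeam[s.teamId] ?? 0) + s.spentThisDrainUsd;
    tenantTotalUsd += s.spentThisDrainUsd;
  }

  const deniedTeams: string[] = [];
  for (const teamId of Object.keys(perTeam)) {
    const envelope = limits.teamEnvelopeUsd[teamId];
    const spent = perTeam[teamId] ?? 0;
    // ASSUMPTION: a team with no configured envelope is capped only by the tenant ceiling.
    if (envelope !== undefined && spent > envelope) deniedTeams.push(teamId);
  }

  return {
    tenantTotalUsd,
    deniedTeams,
    pauseBoard: tenantTotalUsd > limits.tenantCeilingUsd,
  };
}
```

With a ceiling of 10, three drains of 4 each pass a per-drain check but sum to 12, so only the summed form pauses the board.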
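The nested lock shape in sub-question 3 can be sketched as an in-process admission gate. Everything here is hypothetical: TenantLeaseSketch and isInside are illustrative names, the real .drain-lock is presumably file-based rather than an in-memory map, and isInside is a stand-in for the path check, not the confineToTenant implementation. The serialized merge hop is not modeled because the human gate already owns it.

```typescript
import * as path from "node:path";

type Admission = "admitted" | "outside-tenant" | "at-capacity" | "worktree-busy";

// True only if `target` resolves to a location strictly below `root`.
function isInside(root: string, target: string): boolean {
  const rel = path.relative(path.resolve(root), path.resolve(target));
  return (
    rel !== "" &&
    rel !== ".." &&
    !rel.startsWith(".." + path.sep) &&
    !path.isAbsolute(rel)
  );
}

// Outer per-tenant lease: caps concurrency at N and owns the per-task sub-locks.
class TenantLeaseSketch {
  private readonly tenantRoot: string;
  private readonly maxConcurrent: number;
  private readonly held = new Map<string, string>(); // worktree path -> task id

  constructor(tenantRoot: string, maxConcurrent: number) {
    this.tenantRoot = tenantRoot;
    this.maxConcurrent = maxConcurrent;
  }

  tryAdmit(taskId: string, worktreePath: string): Admission {
    // A drain may only work inside the tenant's own subtree.
    if (!isInside(this.tenantRoot, worktreePath)) return "outside-tenant";
    const key = path.resolve(worktreePath);
    // Sub-lock: two tasks run concurrently only with distinct worktrees.
    if (this.held.has(key)) return "worktree-busy";
    // Outer bound: the lease caps admitted drains at the configured N.
    if (this.held.size >= this.maxConcurrent) return "at-capacity";
    this.held.set(key, taskId);
    return "admitted";
  }

  release(worktreePath: string): void {
    this.held.delete(path.resolve(worktreePath));
  }
}
```

Setting maxConcurrent to 1 reproduces today's serialize-only behavior, which is why activation can stay a configuration change rather than a rewrite of the drain path.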
Consequences
- Inv 4 (serialize) is reaffirmed as the standing topology, not merely "not yet relaxed." DEC-0053's verified scope is unchanged: parallelism remains explicitly not covered (consistent with DEC-0053 §Promotion item 2, which carried scale forward as reserved).
- worktree.ts is annotated dormant-by-decision (header amended to cite this record) so a future reader knows the mechanism's non-activation is a decided topology stance with a trigger, not an oversight or unfinished work.
- The carried-forward "scale" reservation in DEC-0053 is now half-disposed: the per-team budget split (§5) was built; intra-tenant parallelism (§4) is hereby decided to defer with a trigger. DEC-0053's only remaining open scale item collapses to "activate worktrees when DEC-0062's trigger fires."
- No code changes to the drain path. Serialize-only is the existing behavior; this decision changes a comment and disposes a task. No test churn, no new surface to maintain — the correct cost for a deferral.
Relation to DEC-0053
This is the dated disposition of the §4 one-way door that DEC-0053 (Agentic-agency runtime topology — compile personas from the OKF graph and activate the reserved BoardWorker over the deterministic spine) designed toward and that its §Promotion item 2 carried forward as reserved scale work. It does not re-grade DEC-0053's frontmatter (the verified scope already excludes parallelism). It converts "reserved, undecided" into "deferred, decided, with a measurable trigger and a pinned activation design."