[DEFERRED] Multi-pod: N concurrent GPU pods


provenance · append-only

Trace

live

status change 1d ago
agent · conductor-claude
parent set 1d ago

Grouped under epic #201.
agent · conductor-claude
plan proposed 1d ago

# [DEFERRED] Multi-pod: N concurrent GPU pods

**STATUS: DEFERRED by operator (2026-07-04).** Single-pod optimization comes first (Track B). Un-shelve after #205/#206 land. Full design: conductor `docs/spec-multi-pod.md`.

## Why (later)

N concurrent pods draining the shared `embedding_tasks` consumer group give ~N× throughput. This is the only lever that makes 60M tractable: 1 pod ≈ 7-10 items/s, which works out to roughly 70-100 days, and the per-pod xadd floor can't be tuned away. But per-pod utilization (#202/#205/#206) comes first: multi-pod multiplies whatever a single pod does, so don't multiply an under-used pod.

## What blocks it (all assume max_instances=1)

- Instance store: `GpuRuntimeInstance` = one row per provider (`firstOrNew(['provider'])`), single `active_external_id` → needs a row per pod.
- Store keys by provider → key by `(provider, external_id)`.
- Adapter tracks a single pod_id → act on a specific pod_id passed in.
- Watcher: `runningInstanceCount`/spawn/decide/teardown assume ≤1 → per-pod counting, scale-out spawn, per-pod idle teardown.
- Config: `max_instances` (default 1) → raise + scale policy.
- Prereq (DONE): zombie-instance `b374af5` + resume `4efa653` fixes (multi-pod = much more spawn/teardown churn).

## Design sketch

- Per-pod rows (`external_id` = natural key).
- Scale-out: `desired = clamp(ceil(depth/TARGET_DEPTH_PER_POD), 0, MAX_INSTANCES)`, spawn one/tick (ramped), per-pod idle teardown.
- Adapter `stop`/`destroy`/`status` take an explicit `external_id`.
- Cost guardrails: MAX_INSTANCES cap, per-pod 24h alert, optional $/hr ceiling.

Phasing: model+store (behind max_instances=1) → adapter → watcher scale-out → validate at 2–3.

## Note

Different axis from **multi-APP** (vodmanager): that is per-app result routing (#203/#205), not more pods for one app.

## Status

DEFERRED.
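The scale-out rule in the design sketch can be illustrated with a minimal Python sketch. This is not the conductor implementation: the function names `desired_pods` and `pods_to_spawn` are hypothetical, and the numeric values for `TARGET_DEPTH_PER_POD` and `MAX_INSTANCES` are placeholders, since the spec names these knobs without fixing numbers.

```python
import math

# Placeholder values (assumptions): the spec names these knobs but sets no numbers.
TARGET_DEPTH_PER_POD = 5000
MAX_INSTANCES = 3


def desired_pods(depth: int,
                 target_depth_per_pod: int = TARGET_DEPTH_PER_POD,
                 max_instances: int = MAX_INSTANCES) -> int:
    """desired = clamp(ceil(depth / TARGET_DEPTH_PER_POD), 0, MAX_INSTANCES)."""
    if target_depth_per_pod <= 0:
        raise ValueError("target_depth_per_pod must be positive")
    wanted = math.ceil(max(depth, 0) / target_depth_per_pod)
    return max(0, min(wanted, max_instances))


def pods_to_spawn(depth: int, running: int,
                  target_depth_per_pod: int = TARGET_DEPTH_PER_POD,
                  max_instances: int = MAX_INSTANCES) -> int:
    """Ramped scale-out: spawn at most one pod per watcher tick."""
    target = desired_pods(depth, target_depth_per_pod, max_instances)
    return 1 if running < target else 0
```

With these placeholder values, a queue depth of 12000 asks for 3 pods, but a watcher that sees 1 running pod spawns only one more on that tick. Scale-in is deliberately left out: per the spec, teardown is driven by per-pod idleness, not by the desired count.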

agent · conductor-claude
note added 1d ago

DEFERRED by operator 2026-07-04 — single-pod optimization first. N concurrent pods = ~Nx throughput (the real 60M scaler) but multiplies whatever one pod does, so per-pod utilization (Track B) comes first. Full design in conductor docs/spec-multi-pod.md; condensed spec to follow.
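The throughput argument can be checked with back-of-the-envelope arithmetic. The sketch below uses the 60M backlog and the 7-10 items/s per-pod rate quoted in the spec, and assumes ideal linear scaling across pods (the "~N×" claim), which real pods sharing one consumer group may not fully reach. The helper name `days_to_drain` is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def days_to_drain(total_items: int, items_per_sec_per_pod: float, pods: int = 1) -> float:
    """Days to process total_items, assuming throughput scales linearly with pods."""
    if items_per_sec_per_pod <= 0 or pods <= 0:
        raise ValueError("rate and pod count must be positive")
    return total_items / (items_per_sec_per_pod * pods) / SECONDS_PER_DAY


if __name__ == "__main__":
    for pods in (1, 2, 3):
        slow = days_to_drain(60_000_000, 7, pods)
        fast = days_to_drain(60_000_000, 10, pods)
        print(f"{pods} pod(s): {fast:.1f} to {slow:.1f} days")
```

Under the linear-scaling assumption, three pods bring the same backlog down to roughly 23 to 33 days, which is why raising per-pod throughput first matters: every pod added later multiplies that per-pod rate.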
agent · conductor-claude
participant joined 1d ago
system · conductor-claude

epic · dependencies

Relationships

epic parent

Epic: Conductor multi-model + multi-app embedding (the vodmanager foundation)

depends on

No dependencies — dispatchable once planned.

agents · waves

Participants

conductor-claude participant · active

trace · graph

Links

No links yet — they accrue as agents work the brief.

scope

Projects

conductor · primary

dogfood · read-only
