Epic: Conductor multi-model + multi-app embedding (the vodmanager foundation)

provenance · append-only

Trace

live

plan proposed 1d ago

# Epic: Conductor multi-model + multi-app embedding (the vodmanager foundation)

**Objective:** scale conductor's GPU embedding pipeline for **throughput-per-dollar** on large deferred backlogs (lounge ~60M images now; vodmanager text/video the strategic goal). Every workload is queued/deferred (**latency-insensitive**), so we optimize utilization + cost, favor big batches + deep queues, and can schedule coarsely (pack once at pod startup; no reactive autoscaler needed).

## Strategy: two tracks + one immediate op, split on REPO boundaries (low merge conflict)

- **Immediate op, #202 (legit-embedding):** VRAM-auto-size the image batch on the *current* proven worker → ~2.5× throughput on the live lounge image backfill while Track B is built. Reusable core (nvidia-smi detect + batch memory model).
- **Track A, Laravel repos (conductor-client, conductor), ships to prod now, #203:** the per-app result-routing *contract*, config-gated OFF (zero behavior change). The reusable half of Phase-4 multi-app routing; pins the contract Track B builds against.
- **Track B, Python worker (legit-embedding), built in an ISOLATED worktree + Solo project, #204 → #205 → #206:** the architecture: one process, each model loaded once, async I/O overlap instead of duplicate model processes. Includes the worker-side `reply_to` routing (the expensive container half of Phase-4, built ONCE here, not twice).

## Key decision & rationale (single-process multi-model = "B")

The current worker gets throughput by running N duplicate processes, each holding a full model copy, to hide the slow ~1235ms Tailscale `xadd` write behind GPU compute (~530ms; GPU ~30% utilized, per HANDOFF §5). To *saturate* the GPU you'd need ~4 concurrent in-flight batches → 4 full model copies in VRAM.

A single async process gets the same overlap with **one** model copy + N activation buffers: far cheaper, the industry-standard multi-model serving pattern (Triton/Ray Serve/TorchServe), and exactly what vodmanager needs (a video analyzer co-resident with embedders). Operator confirmed "B is the no-brainer."

## Corrected regression diagnosis (grounding recon, 2026-07-04)

The prior **"CUDA-after-fork" story is WRONG.** The worker is NOT Celery-prefork; it's N `subprocess.Popen`'d `--pool=solo` processes (fresh execve, no inherited CUDA). The real `1766f48` bug: the CPU-only prep worker eagerly creates a phantom CUDA context by reading `image_batch_size` (which calls `torch.cuda.get_device_properties`) at `preparation.py:113`, and the formula sizes off `total_memory` not *free* VRAM → the fixed reserve gets eaten → OOM. Fix direction (nvidia-smi, resolve pre-Popen) unchanged; reason corrected. See #202.

## Children & shape

- #202 image batch auto-size (immediate): independent
- #203 result-routing contract, Laravel side (Track A): buildable now; ACTIVATION coupled to #205
- #204 substrate spike (Track B): independent
- #205 build single-process multi-model worker + reply_to (Track B): depends #204
- #206 worker-type cost table + greedy packer (Track B): depends #205
- #207 text-path first live validation: independent (cheap single-app test; also a #205 smoke)
- #208 multi-pod: DEFERRED
- #209 vodmanager onboarding: DEFERRED (depends #203 gate-on + #205)

## Sources

conductor `docs/HANDOFF.md`, `docs/spec-vram-aware-tuning.md`, `docs/spec-multi-pod.md`; umbrella `_conductor-vodmanager-integration-plan.md`. Master cross-repo plan + cadence: Solo scratchpad in the conductor project.

**Nothing here is `planned`**: promotion to `planned` (daemon dispatch) is the operator's per-brief call; #202 and #203 are the promote-ready near-term ones.
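
The utilization figures in the key-decision rationale can be checked with a few lines of arithmetic. The sketch below is only a back-of-envelope model, not code from the worker: it assumes GPU compute is serialized on one device while `xadd` writes overlap freely with compute and with each other. The function names are hypothetical; the two timings are the ones quoted from HANDOFF §5.

```python
import math

# Timings quoted in the brief (HANDOFF §5), in milliseconds.
GPU_COMPUTE_MS = 530    # GPU compute per batch
XADD_WRITE_MS = 1235    # Tailscale `xadd` write per batch


def gpu_utilization(in_flight: int,
                    compute_ms: float = GPU_COMPUTE_MS,
                    write_ms: float = XADD_WRITE_MS) -> float:
    """Fraction of wall-clock time the GPU is busy.

    Assumed model: each batch needs compute_ms on the GPU followed by
    write_ms of I/O; `in_flight` batches overlap, but only one batch
    can use the GPU at a time.
    """
    cycle_ms = compute_ms + write_ms
    return min(1.0, in_flight * compute_ms / cycle_ms)


def batches_to_saturate(compute_ms: float = GPU_COMPUTE_MS,
                        write_ms: float = XADD_WRITE_MS) -> int:
    """Smallest number of in-flight batches that keeps the GPU always busy."""
    return math.ceil((compute_ms + write_ms) / compute_ms)


if __name__ == "__main__":
    for n in range(1, 5):
        print(f"{n} in flight -> GPU busy {gpu_utilization(n):.0%}")
    print("batches needed to saturate:", batches_to_saturate())
```

Under this model one in-flight batch keeps the GPU busy 530 / (530 + 1235) ≈ 30% of the time, and four are needed to saturate it, which matches the "~30% utilized" and "~4 concurrent in-flight batches" figures. The cost difference between the two designs is what those four in-flight batches hold in VRAM: four full model copies with duplicate processes, versus one model copy plus activation buffers in a single async process.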
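
The fix direction in the regression diagnosis (query nvidia-smi, size off *free* VRAM, resolve before `Popen`) might look roughly like the sketch below. This is an illustration under stated assumptions, not the #202 implementation: the helper names, the memory-model parameters (`model_mib`, `per_image_mib`, `reserve_mib`) and the `max_batch` cap are all hypothetical. Only the `nvidia-smi --query-gpu=memory.free` query is a real CLI call.

```python
from __future__ import annotations

import subprocess


def parse_free_mib(smi_output: str) -> list[int]:
    """Parse one free-memory value (MiB) per GPU from nvidia-smi CSV output."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]


def query_free_mib() -> list[int]:
    """Ask nvidia-smi for free VRAM per GPU without creating a CUDA context."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.free", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_free_mib(result.stdout)


def image_batch_size(free_mib: int, model_mib: int, per_image_mib: float,
                     reserve_mib: int, max_batch: int = 512) -> int:
    """Hypothetical batch memory model.

    Budget = free VRAM minus model weights minus a fixed safety reserve;
    the batch is however many images fit in that budget, clamped to
    the range [1, max_batch].
    """
    budget_mib = free_mib - model_mib - reserve_mib
    return max(1, min(max_batch, int(budget_mib // per_image_mib)))


if __name__ == "__main__":
    # Resolve once in the parent, then pass the number to each child
    # (e.g. via argv or env) so CPU-only workers never touch torch.cuda.
    free = query_free_mib()[0]
    print(image_batch_size(free, model_mib=4000, per_image_mib=50, reserve_mib=2000))
```

Because the budget starts from free rather than total memory, VRAM already held by other processes comes out of the batch size instead of out of the reserve, which is the failure mode the diagnosis describes.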

agent · conductor-claude
note added 1d ago

Umbrella epic grouping the conductor GPU-embedding throughput + multi-app work: two tracks (Laravel-side result-routing contract now; isolated single-process multi-model worker rewrite) + an immediate image-batch win, with multi-pod and vodmanager onboarding deferred. Full spec to follow.
agent · conductor-claude
participant joined 1d ago
system · conductor-claude

epic · dependencies

Relationships

epic parent

children · 0/8 complete

depends on

No dependencies — dispatchable once planned.

agents · waves

Participants

conductor-claude participant · active

scope

Projects

conductor · primary
conductor-client · consumer
legit-embedding · consumer
lounge · consumer
vodmanager · consumer

dogfood · read-only

Agent’s-eye view

The literal recall_brief payload an agent gets — same service path as the MCP tool.