idea draft session_capture legit-embedding blocked
epic · Epic: Conductor multi-model + multi-app embedding (t...

Build single-process multi-model async worker + worker-side reply_to (Track B)

canonical · plan

Spec

markdown

hand-off · dispatch

Dispatch

Auto-dispatch

when it reaches planned

Design-loop

design pass before build

Blocked — dispatch is gated

Waiting on 1 unfinished dependency. Complete or cancel it to dispatch.

provenance · append-only

Trace

live
  1. dependency added 1d ago

    Now depends on #204 (Spike: choose the single-process multi-model serving substrate (Track B)).

    agent · conductor-claude
  2. parent set 1d ago

    Grouped under epic #201.

    agent · conductor-claude
  3. plan proposed 1d ago

    # Build single-process multi-model async worker + worker-side reply_to (Track B)

    Repo: **legit-embedding**, built in an **ISOLATED worktree** (worktree-manager) + its own **Solo project** so it never disturbs the shipping foundation. **Depends on #204** (substrate decision). The convergence point for BOTH multi-model serving AND the worker half of Phase-4 result-routing.

    ## What

    Replace the N-Popen / `--pool=solo` multi-process design with ONE process that loads each model once and hides the slow XADD behind async concurrency instead of duplicate model copies.

    ### Surface replaced (from recon)

    - **Model load:** `models.py:77` SingletonEmbeddingModel (EVA02) + `text_models.py:13` ModelManager: load once each; extensible to future audio/video analyzers. Lean on existing multi-model-in-one-process primitives: `_get_or_load_model` (`models.py:105`); text idle auto-unload (`text_models.py:116`).
    - **Concurrency:** `start_workers.py:54-83` Popen fan-out + `cli.py:73-85` `--pool=solo` → one process, async in-flight batches.
    - **Loop:** `embedding.py:100` / `text_embedding.py:59`, blocking pipelined XADD `embedding.py:181-186` / `text_embedding.py:229-245` → async overlap (write batch N while computing batch N+1).
    - **Handoff:** replace the `/dev/shm` prep→embed Celery hop (`preparation.py` send_task L429) with an in-process async pipeline.
    - **Reuse unchanged:** `encode_embedding` (`encoding.py:22`, base64_fp32), `StreamConfig` (`config.py`).

    ### INCLUDE worker-side reply_to routing (the worker half of Phase-4)

    The worker currently NEVER reads `reply_to`/`source` (0 grep hits); envelope fields at `preparation.py:173`; result always written to the global stream (`embedding.py:151`).

    Change: thread `reply_to`/`source` from the envelope through the pipeline into batch metadata, and write each result to `metadata.get('reply_to', <global default>)` in the XADD path (`embedding.py:151` / `text_embedding.py:227`), falling back to the global stream when absent.

    **Build + bake the container's routing ONCE here** (not on the old worker). Pairs with #203's consumer gate; flip both together.

    ### Telemetry identity consolidation

    A single process must consolidate the hostname+PID-keyed consumer/metric names (`cli.py:67`, `embedding.py:23`, `preparation.py:68`, `stats_publisher.py`) so conductor telemetry/dashboards stay coherent.

    ## Validate

    Image parity + throughput vs #202 (batch/util from telemetry); then bring text online (#207) as the 2nd-model exercise. Single-task smoke before bulk (same discipline as #202). Watch worker Sentry for OOM.

    ## Status

    Draft (idea). Depends #204. Isolated build. When it bakes: flip #203's gate + retire the old worker image.

    agent · conductor-claude
  4. note added 1d ago

    Rewrite the worker to ONE process that loads each model once and hides the slow XADD behind async concurrency instead of duplicate model processes. Includes the worker-side reply_to routing (the container half of Phase-4, built once here). Built in an isolated worktree + Solo project. Depends on the substrate spike. Full spec to follow.

    agent · conductor-claude
  5. participant joined 1d ago
    system · conductor-claude
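
The two mechanics the plan leans on (writing batch N while batch N+1 is being computed, and routing each result to `reply_to` with a fallback to the global stream) can be sketched roughly as below. This is a minimal illustration and not the worker's actual code: `FakeStreamWriter`, `compute_batch`, `run_pipeline`, the `GLOBAL_STREAM` name, and the item/field layout are all hypothetical stand-ins, and the real XADD path would go through whatever client and substrate #204 selects.

```python
from __future__ import annotations

import asyncio
from typing import Any

GLOBAL_STREAM = "embeddings:results"  # hypothetical name for the global default stream


def resolve_stream(metadata: dict[str, Any], default: str = GLOBAL_STREAM) -> str:
    """Choose the target stream for one result, falling back to the global stream."""
    return metadata.get("reply_to", default)


class FakeStreamWriter:
    """Stand-in for the XADD path (assumption): records writes instead of sending them."""

    def __init__(self, delay: float = 0.01) -> None:
        self.delay = delay
        self.written: list[tuple[str, dict[str, Any]]] = []

    async def xadd(self, stream: str, fields: dict[str, Any]) -> None:
        await asyncio.sleep(self.delay)  # simulates the slow stream write
        self.written.append((stream, fields))


def compute_batch(batch: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Placeholder for model inference; carries envelope metadata through unchanged."""
    return [
        {
            "id": item["id"],
            "embedding": [float(len(item["payload"]))],  # dummy vector
            "metadata": item["metadata"],
        }
        for item in batch
    ]


async def write_batch(writer: FakeStreamWriter, results: list[dict[str, Any]]) -> None:
    """Write each result to its own reply_to stream, or to the global default."""
    for result in results:
        stream = resolve_stream(result["metadata"])
        await writer.xadd(stream, {"id": result["id"], "embedding": result["embedding"]})


async def run_pipeline(batches: list[list[dict[str, Any]]], writer: FakeStreamWriter) -> None:
    """Overlap the write of batch N with the compute of batch N+1."""
    pending: asyncio.Task | None = None
    for batch in batches:
        # Compute runs in a worker thread, so the event loop stays free to
        # progress the previous batch's write while this one is computed.
        results = await asyncio.to_thread(compute_batch, batch)
        if pending is not None:
            await pending  # keep at most one write in flight; surfaces write errors
        pending = asyncio.create_task(write_batch(writer, results))
    if pending is not None:
        await pending  # drain the final write


if __name__ == "__main__":
    demo_batches = [
        [{"id": "a", "payload": "xy", "metadata": {"reply_to": "app1:results"}}],
        [{"id": "b", "payload": "z", "metadata": {}}],
    ]
    demo_writer = FakeStreamWriter()
    asyncio.run(run_pipeline(demo_batches, demo_writer))
    for stream, fields in demo_writer.written:
        print(stream, fields["id"])
```

Awaiting the previous write before scheduling the next one bounds in-flight writes to a single batch and keeps results in order; a real worker might allow a deeper queue if telemetry shows the write is still the bottleneck.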

epic · dependencies

Relationships

depends on

agents · waves

Participants

  • conductor-claude participant · active

trace · graph

Links

No links yet — they accrue as agents work the brief.

scope

Projects

  • conductor · consumer
  • conductor-client · consumer
  • legit-embedding · primary

dogfood · read-only

Agent’s-eye view

The literal recall_brief payload an agent gets — same service path as the MCP tool.