review · segments

Research vodmanager GPU/compute needs for conductor integration

pi 231 events 1 segments

segment 1 of 1

Investigate vodmanager's GPU/compute-heavy work to determine conductor additions

Abandoned

The assistant systematically searched vodmanager's app directory, reading the transcription backends (ReplicateBackend, WhisperXBackend, WhisperCppBackend, WhisperApiBackend), the transcription dispatcher, highlight scoring/detection/refinement, encoding (FfmpegBinary, StaticSegmentDetector, VodEncoder), the analytics builder, speaker cluster models, the audio/video processing jobs, and the existing cog/ directory, which contains a Replicate diarization model. It also read the conductor-client wire contract (Conductor.php, TaskEnvelope, result consumer/handler, Embedding model, config) and the legit-embedding worker tasks (preparation, embedding, text_embedding, config), then explored the conductor service (WatchCommand, GpuRuntimeProfileResolver, RunPodRuntimeAdapter) to understand how container images are specified.

Findings: vodmanager performs transcription, diarization, speaker embeddings, LLM-based highlight refinement, static segment detection, and encoding, many of which are GPU-bound. It has no existing conductor integration.

A draft research document was composed internally, but the Solo scratchpad write (to scratchpad 860) failed because the required 'name' parameter was missing. The session ends with the assistant recognizing the problem and intending to re-read the scratchpad.

outcome

A comprehensive mapping of vodmanager's compute-heavy work was produced but not recorded: the write to scratchpad 860 failed.

next steps

Read scratchpad 860 to get its name field
Re-invoke scratchpad write with correct name to save research output
Synthesize findings into scratchpad 860 following the user's requested structure (sections 1-6)
Classify each task as GPU-bound, CPU-bound, or mixed
Determine which tasks fit the existing embedding-shaped stream contract vs. require new stream types or object-storage references
Define payload size, input/output shapes, and result handling for each task type
Document prioritized recommendations and open questions
Reply 'RESEARCH COMPLETE — scratchpad 860' to the user
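
The classification step listed above could start from a small worksheet like the following. This is only a sketch: the task names are taken from the summary, while `Bound`, `TaskProfile`, and `unclassified` are hypothetical names introduced for illustration. Every task is deliberately left unclassified, because the session did not record a GPU/CPU verdict for any of them.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Bound(Enum):
    """Resource class a task would be assigned once it has been profiled."""
    GPU = "gpu"
    CPU = "cpu"
    MIXED = "mixed"


@dataclass
class TaskProfile:
    name: str
    # None means "not yet classified"; the session recorded no verdicts.
    bound: Optional[Bound] = None


# Task names as listed in the summary; the snake_case spelling is an assumption.
TASKS: List[TaskProfile] = [
    TaskProfile("transcription"),
    TaskProfile("diarization"),
    TaskProfile("speaker_embeddings"),
    TaskProfile("highlight_refinement"),
    TaskProfile("static_segment_detection"),
    TaskProfile("encoding"),
]


def unclassified(tasks: List[TaskProfile]) -> List[str]:
    """Return the names of tasks that still need a GPU/CPU/mixed label."""
    return [t.name for t in tasks if t.bound is None]
```

Calling `unclassified(TASKS)` returns all six names until labels are filled in, which makes the remaining work explicit.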

key decisions

No decisions recorded yet; this segment is purely investigative.

open questions

Exact payload sizes for audio/video files vs. small text/image embeddings
Whether inline Redis streams suffice or object-storage references (B2/S3/MinIO) are needed for audio/video
Result shapes for transcription (segments + speaker labels) and speaker embeddings (not plain vectors)
GPU vs. CPU resource requirements per task (e.g., whisper transcription vs. highlight scoring)
Whether conductor-client should gain new task-type helpers or remain focused on embedding-style tasks
How the existing envelope (source, reply_to, bytes inline) should generalize for non-embedding tasks
How will conductor support non-embedding GPU tasks like transcription (audio to text) and speaker diarization?
What changes to the task envelope/stream contract are needed for large binary payloads (MP3, audio segments) vs. current inline base64?
Would encoding (ffmpeg) and static segment detection fit conductor's task model, or remain on Forge?
Should highlight LLM refinement be handled by conductor (self-hosted model) or stay as cloud API calls?
What is the name of scratchpad 860?
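
The inline-versus-reference questions above can be made concrete with a short sketch. It assumes an envelope carrying the `source`, `reply_to`, and inline-bytes fields mentioned in the open questions, plus a hypothetical `object_uri` field for large payloads. `SketchEnvelope`, `build_envelope`, and the 1 MB `INLINE_LIMIT_BYTES` are illustrative names and values, not part of conductor-client, and the real limit remains an open question.

```python
import base64
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold; the actual inline limit has not been determined.
INLINE_LIMIT_BYTES = 1_000_000


@dataclass
class SketchEnvelope:
    source: str
    reply_to: str
    task_type: str
    bytes_b64: Optional[str] = None   # small payloads travel inline as base64
    object_uri: Optional[str] = None  # large payloads are passed by reference


def build_envelope(
    source: str,
    reply_to: str,
    task_type: str,
    payload: bytes,
    object_uri: Optional[str] = None,
    inline_limit: int = INLINE_LIMIT_BYTES,
) -> SketchEnvelope:
    """Inline the payload if it fits; otherwise require an object-storage URI."""
    if len(payload) <= inline_limit:
        encoded = base64.b64encode(payload).decode("ascii")
        return SketchEnvelope(source, reply_to, task_type, bytes_b64=encoded)
    if object_uri is None:
        raise ValueError("payload exceeds inline limit; object_uri is required")
    return SketchEnvelope(source, reply_to, task_type, object_uri=object_uri)
```

Under this design an embedding-sized input would keep using the inline path unchanged, while an audio or video file would have to be uploaded first and referenced by URI. Whether that split is right depends on the payload sizes still to be measured.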

2 weeks ago → 2 weeks ago