Research vodmanager GPU/compute needs for conductor integration
pi 231 events 1 segments
segment 1 of 1
Investigate vodmanager's GPU/compute-heavy work to determine conductor additions
The assistant systematically searched vodmanager's app directory. It read the transcription backends (ReplicateBackend, WhisperXBackend, WhisperCppBackend, WhisperApiBackend), the transcription dispatcher, highlight scoring, detection, and refinement, the encoding code (FfmpegBinary, StaticSegmentDetector, VodEncoder), the analytics builder, the speaker cluster models, the audio/video processing jobs, and the existing cog/ directory, which contains a Replicate diarization model. It also read the conductor-client wire contract (Conductor.php, TaskEnvelope, the result consumer/handler, the Embedding model, config) and the legit-embedding worker tasks (preparation, embedding, text_embedding, config), then explored the conductor service (WatchCommand, GpuRuntimeProfileResolver, RunPodRuntimeAdapter) to understand how container images are specified. Findings: vodmanager performs transcription, diarization, speaker embeddings, LLM-based highlight refinement, static segment detection, and encoding, many of which are GPU-bound, and it has no conductor integration today. The assistant composed a draft research document internally, but the Solo scratchpad write to scratchpad 860 failed because the 'name' parameter was missing. The session ends with the assistant recognizing the problem and intending to re-read the scratchpad.
outcome
A comprehensive mapping of vodmanager's compute-heavy work was produced but not saved: the write to scratchpad 860 failed, so nothing is recorded there yet.
next steps
- Read scratchpad 860 to get its name field
- Re-invoke scratchpad write with correct name to save research output
- Synthesize findings into scratchpad 860 per the user's requested structure (1-6 sections)
- Classify each task as GPU-bound, CPU-bound, or mixed
- Determine which tasks fit the existing embedding-shaped stream contract vs. require new stream types or object-storage references
- Define payload size, input/output shapes, and result handling for each task type
- Document prioritized recommendations and open questions
- Reply 'RESEARCH COMPLETE — scratchpad 860' to the user
key decisions
- No decisions recorded yet; this segment is purely investigative.
open questions
- Exact payload sizes for audio/video files vs. small text/image embeddings
- Whether inline Redis streams suffice or object-storage references (B2/S3/MinIO) are needed for audio/video
- Result shapes for transcription (segments + speaker labels) and speaker embeddings (not plain vectors)
- GPU vs. CPU resource requirements per task (e.g., whisper transcription vs. highlight scoring)
- Whether conductor-client should gain new task-type helpers or remain focused on embedding-style tasks
- How the existing envelope (source, reply_to, bytes inline) should generalize for non-embedding tasks
- How will conductor support non-embedding GPU tasks like transcription (audio to text) and speaker diarization?
- What changes to the task envelope/stream contract are needed for large binary payloads (MP3, audio segments) vs. current inline base64?
- Would encoding (ffmpeg) and static segment detection fit conductor's task model, or remain on Forge?
- Should highlight LLM refinement be handled by conductor (self-hosted model) or stay as cloud API calls?
- What is the name of scratchpad 860?
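
To make the inline-versus-reference question above concrete, here is a minimal sketch of one way an envelope could carry either kind of payload. It is an illustration only, not conductor-client's actual contract: EnvelopeSketch, Payload, build_payload, the task_type field, the 1 MiB cutoff, and the "s3://..." reference format are all assumptions introduced here. Only the source, reply_to, and inline-bytes fields are taken from the envelope described in this review.

```python
import base64
from dataclasses import dataclass
from typing import Optional

# Hypothetical cutoff. A real limit would depend on Redis memory and stream settings.
INLINE_LIMIT_BYTES = 1 * 1024 * 1024


@dataclass
class Payload:
    """Carries input either inline (base64) or as an object-storage reference."""
    inline_b64: Optional[str] = None  # small inputs, sent in the stream entry itself
    object_ref: Optional[str] = None  # large inputs, e.g. "s3://bucket/key" (assumed format)
    size_bytes: int = 0


@dataclass
class EnvelopeSketch:
    """Illustrative stand-in for a task envelope; not the real TaskEnvelope class."""
    task_type: str  # hypothetical field, e.g. "transcription" or "embedding"
    source: str
    reply_to: str
    payload: Payload


def build_payload(data: bytes, object_ref: Optional[str] = None) -> Payload:
    """Inline small inputs; require a storage reference for large ones."""
    if len(data) <= INLINE_LIMIT_BYTES:
        encoded = base64.b64encode(data).decode("ascii")
        return Payload(inline_b64=encoded, size_bytes=len(data))
    if object_ref is None:
        raise ValueError("payload exceeds inline limit; an object-storage reference is required")
    return Payload(object_ref=object_ref, size_bytes=len(data))
```

Under these assumptions, a short text input would travel inline as it does for embeddings today, while an MP3 would be uploaded first and only its reference placed on the stream. Whether that split is the right one is exactly what the questions above leave open.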