review · segments

Research vodmanager GPU/compute needs for conductor integration

pi 231 events 1 segment

segment 1 of 1

Investigate vodmanager's GPU/compute-heavy work to determine conductor additions

Abandoned

The AI systematically searched vodmanager's app directory, reading the transcription backends (ReplicateBackend, WhisperXBackend, WhisperCppBackend, WhisperApiBackend), the transcription dispatcher, highlight scoring, detection, and refinement, encoding (FfmpegBinary, StaticSegmentDetector, VodEncoder), the analytics builder, speaker cluster models, audio/video processing jobs, and the existing cog/ directory, which contains a Replicate diarization model. It also read the conductor-client wire contract (Conductor.php, TaskEnvelope, result consumer/handler, Embedding model, config) and the legit-embedding worker tasks (preparation, embedding, text_embedding, config). It then explored the conductor service (WatchCommand, GpuRuntimeProfileResolver, RunPodRuntimeAdapter) to understand how container images are specified. It found that vodmanager performs transcription, diarization, speaker embedding, LLM-based highlight refinement, static segment detection, and encoding, many of which are GPU-bound, and that no conductor integration exists yet. A draft research document was composed internally, but the Solo scratchpad write (to scratchpad 860) failed because the 'name' parameter was missing. The session ends with the AI recognizing the issue and intending to re-read the scratchpad.

outcome

A comprehensive mapping of vodmanager's GPU/compute-heavy work was produced but is not yet recorded in scratchpad 860, because the scratchpad write failed.

next steps

  • Read scratchpad 860 to get its name field
  • Re-invoke scratchpad write with correct name to save research output
  • Synthesize findings into scratchpad 860 following the user's requested structure (sections 1-6)
  • Classify each task as GPU-bound, CPU-bound, or mixed
  • Determine which tasks fit the existing embedding-shaped stream contract vs. require new stream types or object-storage references
  • Define payload size, input/output shapes, and result handling for each task type
  • Document prioritized recommendations and open questions
  • Reply 'RESEARCH COMPLETE — scratchpad 860' to the user
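
The classification and shape-definition steps above could be tracked with a small record per task. The following is only a sketch: the task names come from the investigation summary, while `ComputeClass`, `TaskProfile`, and their fields are hypothetical names introduced for illustration. Every field is left unset because this segment recorded no classification.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class ComputeClass(Enum):
    GPU_BOUND = "gpu-bound"
    CPU_BOUND = "cpu-bound"
    MIXED = "mixed"


@dataclass
class TaskProfile:
    """Hypothetical per-task record; None means 'still an open question'."""
    name: str
    compute: Optional[ComputeClass] = None
    payload: Optional[str] = None        # e.g. "inline" or "object-ref"
    input_shape: Optional[str] = None
    output_shape: Optional[str] = None


# Task names are taken from the summary; nothing else is filled in,
# since the session ended before any task was classified.
TASKS: List[TaskProfile] = [
    TaskProfile(name)
    for name in (
        "transcription",
        "diarization",
        "speaker embeddings",
        "highlight refinement (LLM)",
        "static segment detection",
        "encoding",
    )
]


def unclassified(tasks: List[TaskProfile]) -> List[str]:
    """Return the names of tasks that still lack a compute classification."""
    return [t.name for t in tasks if t.compute is None]
```

Calling `unclassified(TASKS)` lists the work still outstanding, which makes it easy to see when the research output is complete enough to write to the scratchpad.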

key decisions

  • No decisions recorded yet; this segment is purely investigative.

open questions

  • Exact payload sizes for audio/video files vs. small text/image embeddings
  • Whether inline Redis streams suffice or object-storage references (B2/S3/MinIO) are needed for audio/video
  • Result shapes for transcription (segments + speaker labels) and speaker embeddings (not plain vectors)
  • GPU vs. CPU resource requirements per task (e.g., whisper transcription vs. highlight scoring)
  • Whether conductor-client should gain new task-type helpers or remain focused on embedding-style tasks
  • How the existing envelope (source, reply_to, bytes inline) should generalize for non-embedding tasks
  • How will conductor support non-embedding GPU tasks like transcription (audio to text) and speaker diarization?
  • What changes to the task envelope/stream contract are needed for large binary payloads (MP3, audio segments) vs. current inline base64?
  • Would encoding (ffmpeg) and static segment detection fit conductor's task model, or remain on Forge?
  • Should highlight LLM refinement be handled by conductor (self-hosted model) or stay as cloud API calls?
  • What is the name of scratchpad 860?
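
Several of the questions above turn on whether a payload travels inline or by reference. One possible shape for a generalized envelope is sketched below. It is not the conductor-client contract: `source` and `reply_to` echo the fields named in the summary, while `task_type`, `bytes_b64`, `object_ref`, and the 256 KiB cutoff are assumptions made for illustration.

```python
import base64
import json
from dataclasses import asdict, dataclass
from typing import Optional

# Assumed cutoff for inline payloads; the real limit is an open question.
INLINE_LIMIT_BYTES = 256 * 1024


@dataclass
class Envelope:
    source: str
    reply_to: str
    task_type: str                     # hypothetical field
    bytes_b64: Optional[str] = None    # inline payload, base64-encoded
    object_ref: Optional[str] = None   # hypothetical object-storage key

    def to_json(self) -> str:
        return json.dumps(asdict(self))


def build_envelope(
    source: str,
    reply_to: str,
    task_type: str,
    data: bytes,
    object_key: str,
    limit: int = INLINE_LIMIT_BYTES,
) -> Envelope:
    """Inline small payloads; reference large ones by object key.

    The caller is assumed to have already uploaded `data` under
    `object_key` when the payload exceeds `limit`.
    """
    if len(data) <= limit:
        encoded = base64.b64encode(data).decode("ascii")
        return Envelope(source, reply_to, task_type, bytes_b64=encoded)
    return Envelope(source, reply_to, task_type, object_ref=object_key)
```

Under this sketch a short text input stays inline, while an MP3 or audio segment above the cutoff carries only a reference, so the stream entry stays small regardless of media size.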

2 weeks ago