review · segments

Your task spec is at /tmp/legit-embedding-clip-spec.md — read it in full and execute it. Goal: make the worker's IMAGE embedding model selectable via the task's model_name (mirror the TEXT path), and add clip-vit-b-32 (OpenAI CLIP ViT-B/32, 512-dim,

codex 294 events 1 segments runpod-container-runtime

segment 1 of 1

Add selectable CLIP image model alongside default EVA02

Done

Read the task spec and explored the repo structure, including models.py, preparation.py, embedding.py, text_models.py, config, tests, and dependencies.

Implemented changes:
Parameterized image model selection in models.py with a cache keyed by model_name
Added CLIP model loading via open_clip with 224px/CLIP-norm transforms
Grouped image preprocessing by model in preparation.py so EVA02 and CLIP batches use separate transforms and tensor shapes
Updated embedding.py to handle grouped payloads while maintaining backward compatibility with single-path payloads
Added the open-clip-torch dependency to requirements.txt and setup.py
Added a clip-vit-b-32 entry to models-config.json

Added unit tests for model selection, transform dispatch, batch grouping, and embedding dimensions. Installed the open-clip-torch dependency. Committed locally as 225d689 on the runpod-container-runtime branch.
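For reviewers who want a picture of the "cache keyed by model_name" idea without opening models.py, here is a minimal sketch. It is not the committed code: `register_loader`, `get_image_model`, `_LOADERS`, and `_MODEL_CACHE` are hypothetical names, and the loaders are stand-ins for whatever actually builds the EVA02 or CLIP model.

```python
from typing import Callable, Dict

# Hypothetical registry: model name -> zero-argument loader function.
# In the real worker the loaders would build EVA02 or CLIP; here they are stubs.
_LOADERS: Dict[str, Callable[[], object]] = {}

# Loaded models, keyed by model name, so each model is built at most once.
_MODEL_CACHE: Dict[str, object] = {}


def register_loader(model_name: str, loader: Callable[[], object]) -> None:
    """Associate a model name with the function that constructs it."""
    _LOADERS[model_name] = loader


def get_image_model(model_name: str) -> object:
    """Return the cached model for model_name, loading it on first use."""
    if model_name not in _MODEL_CACHE:
        if model_name not in _LOADERS:
            raise KeyError(f"unknown image model: {model_name!r}")
        _MODEL_CACHE[model_name] = _LOADERS[model_name]()
    return _MODEL_CACHE[model_name]
```

Keying the cache by name rather than holding a single global model is what lets two image models coexist in one worker process without reloading on every task.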

outcome

Local commit 225d689 on runpod-container-runtime branch with CLIP image embedding support implemented and tested.

next steps

Rebuild and push the Docker image to GHCR via the publish workflow

key decisions

Use open_clip.create_model_and_transforms for CLIP, extracting the visual tower and L2-normalizing output to 512-dim
Group image preprocessing by resolved model_name in preparation.py so EVA02 and CLIP batches never share a tensor
Keep backward compatibility in embedding.py: accept both legacy single-path payloads and new grouped payloads
EVA02 remains the default image model; CLIP is selected only when model_name contains 'clip' or 'ViT-B-32'
Use open-clip-torch>=2.24.0 for CLIP support
Resolve CLIP aliases (clip-vit-b-32, ViT-B-32, etc.) to canonical name
Cache image transforms per model name
Group image batches by resolved model name before stacking tensors
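The alias-resolution and grouping decisions above can be sketched as follows. This is an illustration under assumptions, not the code in commit 225d689: the function names, the `"eva02"` canonical string, the task dict shape, and the case-insensitive matching are all hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional

DEFAULT_IMAGE_MODEL = "eva02"        # assumed canonical name for the default model
CLIP_IMAGE_MODEL = "clip-vit-b-32"   # canonical name used in models-config.json


def resolve_image_model(model_name: Optional[str]) -> str:
    """Map a requested model_name (or alias) to a canonical image model name.

    Anything mentioning 'clip' or 'ViT-B-32' resolves to CLIP; everything
    else, including a missing name, falls back to the EVA02 default.
    Case-insensitive matching is an assumption of this sketch.
    """
    if model_name:
        lowered = model_name.lower()
        if "clip" in lowered or "vit-b-32" in lowered:
            return CLIP_IMAGE_MODEL
    return DEFAULT_IMAGE_MODEL


def group_by_model(tasks: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Bucket tasks by resolved model name, preserving input order per bucket.

    Each bucket can then be preprocessed with its own transform and stacked
    into its own tensor, so differently shaped inputs are never mixed.
    """
    groups: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for task in tasks:
        groups[resolve_image_model(task.get("model_name"))].append(task)
    return dict(groups)
```

Resolving aliases before grouping means 'ViT-B-32' and 'clip-vit-b-32' land in the same batch instead of triggering two separate model loads.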

open questions

None
