Vectors in MySQL (source of truth); Meili as a rebuildable index

hand-off · dispatch

Dispatch

This brief is complete — dispatch is closed.

#10 done fresh flower · flower/vectors-in-mysql

You are being dispatched from flower Brief #28: Vectors in MySQL (source of truth); Meili as a rebuildable index

Recall pointer:
- Use recall_brief with id 28 for the full folder if you need provenance.

Target:
- project: flower (/Users/mikeferrara/Documents/code/flower)
- branch: flower/vectors-in-mysql
- worktree: not specified
- kind: fresh

Current brief spec:
## Goal
Store embedding vectors in **MySQL as the source of truth**; use Meilisearch purely as a
**rebuildable index**. Today embeddings live ONLY in Meili (`chunks` has `meili_id`, no vector
column), so Meili is a data store we depend on: dropping or rebuilding a Meili index means losing
the vectors and re-embedding (expensive LLM calls). Target: dropping an index and rebuilding it
from the MySQL vectors is cheap and needs no re-embedding.

## Current state
- `chunks`: id, chunkable_type/id, project_id, worktree_id, harness, text, content_hash,
  token_count, file_paths, tags, ts, `meili_id`. **No vector column.**
- `EmbedChunks`: creates chunk rows → calls the embedding provider → upserts to Meili (hybrid).

## Change
1. **Persist the vector in MySQL** — add embedding storage (either a column on `chunks`, or a
   `chunk_embeddings` table keyed by chunk_id) holding the raw float vector + the embedding
   model + dims. Choose a compact storage format (packed float32 binary/blob preferred over JSON
   for size; total storage is roughly chunks × dims × 4 bytes, so it grows quickly).
2. **`EmbedChunks` writes MySQL first, THEN Meili** — vector persisted to MySQL is the source of
   truth; the Meili push is the index step.
3. **Rebuild command** — `flower:reindex-from-vectors` (or similar): reads stored MySQL vectors
   and (re)pushes them to Meili with NO LLM re-embedding. Dropping + rebuilding a Meili index
   becomes a cheap MySQL→Meili push.
4. **Backfill** — existing chunks have `meili_id` but no MySQL vector. If Meili can export stored
   vectors, backfill from Meili (no re-embed); else backfill = one-time re-embed. FLAG the cost
   + confirm approach before running a big backfill.
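
A minimal sketch of the packed-binary option from step 1, written in Python purely for illustration (the project's actual language and schema are not shown in this brief). It assumes float32 values in little-endian order; `StoredEmbedding`, `pack_vector`, `unpack_vector`, and `to_stored` are hypothetical names, not existing flower code.

```python
import struct
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class StoredEmbedding:
    """Hypothetical row shape: one embedding per chunk."""
    chunk_id: int
    model: str          # embedding model name, stored beside the vector
    dims: int           # number of floats in the vector
    vector_blob: bytes  # little-endian float32, dims * 4 bytes


def pack_vector(vector: Sequence[float]) -> bytes:
    """Pack floats as little-endian float32 (4 bytes each)."""
    return struct.pack(f"<{len(vector)}f", *vector)


def unpack_vector(blob: bytes, dims: int) -> List[float]:
    """Unpack a blob, rejecting any length that does not match dims."""
    if len(blob) != dims * 4:
        raise ValueError(f"expected {dims * 4} bytes, got {len(blob)}")
    return list(struct.unpack(f"<{dims}f", blob))


def to_stored(chunk_id: int, model: str, vector: Sequence[float]) -> StoredEmbedding:
    """Build the row that would be written to MySQL before the Meili push."""
    return StoredEmbedding(chunk_id, model, len(vector), pack_vector(vector))
```

Storing `model` and `dims` beside the blob lets a rebuild reject rows whose length does not match, instead of pushing a corrupt vector to Meili. Note that float32 packing rounds values that were produced at higher precision.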

## Acceptance
- Chunks store their embedding vector in MySQL; EmbedChunks persists to MySQL then indexes to Meili.
- `flower:reindex-from-vectors` (or the final command name) rebuilds Meili from MySQL without LLM calls (test with a dropped index).
- Migration + tests; keep suite green. Every commit carries a `Brief: #<id>` trailer.
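
The dropped-index acceptance check could be sketched as follows. This is an illustration in Python under stated assumptions, not the project's implementation: `VectorStore` stands in for the MySQL table, `FakeIndex` stands in for a Meili index, and `embed_chunks` / `reindex_from_vectors` are hypothetical functions mirroring the `EmbedChunks` write order and the rebuild command.

```python
from typing import Callable, Dict, List


class VectorStore:
    """In-memory stand-in for the MySQL embedding storage (assumption)."""

    def __init__(self) -> None:
        self.rows: Dict[int, List[float]] = {}

    def save(self, chunk_id: int, vector: List[float]) -> None:
        self.rows[chunk_id] = list(vector)


class FakeIndex:
    """In-memory stand-in for a Meili index (assumption)."""

    def __init__(self) -> None:
        self.docs: Dict[int, List[float]] = {}

    def upsert(self, chunk_id: int, vector: List[float]) -> None:
        self.docs[chunk_id] = list(vector)

    def drop(self) -> None:
        self.docs.clear()


def embed_chunks(
    chunks: Dict[int, str],
    embed: Callable[[str], List[float]],
    store: VectorStore,
    index: FakeIndex,
) -> None:
    """Embed each chunk, persist to the store first, then push to the index."""
    for chunk_id, text in chunks.items():
        vector = embed(text)
        store.save(chunk_id, vector)    # source of truth first
        index.upsert(chunk_id, vector)  # index step second


def reindex_from_vectors(store: VectorStore, index: FakeIndex) -> int:
    """Re-push stored vectors to the index; never calls the embedder."""
    pushed = 0
    for chunk_id, vector in store.rows.items():
        index.upsert(chunk_id, vector)
        pushed += 1
    return pushed
```

A test along these lines would count embedder calls, drop the index, run the rebuild, and assert that the call count did not change while the index contents came back.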

## Open questions (resolve in the plan)
- Storage format/size (dims ~1536–3072); do the current embed provider's dims favor a column on `chunks` or a separate table?
- Does Meili expose stored vectors for a no-re-embed backfill, or is a one-time re-embed required?
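
For the storage-size question, the chunks × dims × 4 bytes arithmetic can be worked through directly. The sketch below is in Python for illustration; the function names are hypothetical, the 100,000-chunk figure in the usage note is a made-up example rather than a measured count, and 65,535 bytes is MySQL's maximum length for the `BLOB` type.

```python
MYSQL_BLOB_MAX_BYTES = 65_535  # maximum length of a MySQL BLOB value


def vector_bytes(dims: int, bytes_per_float: int = 4) -> int:
    """Size of one packed float32 vector."""
    return dims * bytes_per_float


def total_bytes(chunk_count: int, dims: int) -> int:
    """Raw vector storage for a whole table, excluding row overhead."""
    return chunk_count * vector_bytes(dims)


def fits_in_blob(dims: int) -> bool:
    """True when one packed vector fits in a plain BLOB column."""
    return vector_bytes(dims) <= MYSQL_BLOB_MAX_BYTES
```

At 1536 dims a vector is 6,144 bytes and at 3072 dims it is 12,288 bytes, so both ends of the stated range fit in a `BLOB`. A hypothetical 100,000 chunks at 1536 dims would need 614,400,000 bytes of raw vector data.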

## Provenance
Operator 2026-07-01: "vectors should live in MySQL, push to Meili for indexing; don't DEPEND on
Meili as a data store — dropping/rebuilding indexes shouldn't be a problem."

Recent/key trace events:
[1] participant_joined flower-orchestrator: (no body)
[2] note_added flower-orchestrator: Stop depending on Meili as the embedding data store — persist vectors in MySQL, push to Meili for indexing, add a rebuild-from-MySQL command so dropping/rebuilding indexes needs no re-embedding.
[3] plan_proposed flower-orchestrator: (body identical to the current brief spec above)
[4] status_change flower-orchestrator: (no body)

Recommended linked context:
{
    "todos": [],
    "scratchpads": []
}

Execution notes:
- Treat the brief as the source of truth.
- Keep work scoped to this dispatch request.
- Use brief_append / brief_update_status when reporting material progress.
- Add a git commit trailer `Brief: #28` to every commit for this brief so flower can exact-link commits back to the brief.

provenance · append-only

Trace

live

link added 4d ago
agent · system:commit-trailer
link added 4d ago
agent · system:commit-trailer
participant joined 4d ago
system · system:commit-trailer
status change 4d ago
agent · flower-orchestrator
dispatched 4d ago

Dispatch request #10 marked done.
agent · flower-orchestrator
dispatched 4d ago

Dispatch request #10 queued for flower.
agent · flower-orchestrator
status change 4d ago
agent · flower-orchestrator
status change 4d ago
agent · flower-orchestrator
plan proposed 4d ago


agent · flower-orchestrator
note added 4d ago

Stop depending on Meili as the embedding data store — persist vectors in MySQL, push to Meili for indexing, add a rebuild-from-MySQL command so dropping/rebuilding indexes needs no re-embedding.
agent · flower-orchestrator
participant joined 4d ago
system · flower-orchestrator

epic · dependencies

Relationships

epic parent

depends on

No dependencies — dispatchable once planned.

agents · waves

Participants

flower-orchestrator participant · active
system:commit-trailer participant · active

trace · graph

Projects

flower · primary

dogfood · read-only

Agent’s-eye view

The literal recall_brief payload an agent gets — same service path as the MCP tool.