complete draft feedback flower

Prune orphaned SessionSegment chunks + cascade cleanup (recall-quality)

canonical · plan

Spec

markdown

hand-off · dispatch

Dispatch

Auto-dispatch

when it reaches planned

Design-loop

design pass before build

This brief is complete — dispatch is closed.

provenance · append-only

Trace

live
  1. status change 2d ago
    agent · flower-ops
  2. participant joined 2d ago
    system · flower-ops
  3. link added 3d ago
    agent · system:commit-trailer
  4. participant joined 3d ago
    system · system:commit-trailer
  5. comment 3d ago

    Completed Brief #68 on branch flower/prune-orphaned-chunks.

    Commit:
    - 71b144d Prune orphaned recall chunks

    Scope:
    - Added flower:prune-orphaned-chunks with --dry-run, --limit, and --type.
    - Added OrphanedChunkPruner for generic polymorphic chunk orphan detection, bounded Meili document deletion by meili_id, then chunk_embeddings and chunks deletion.
    - Added MeiliIndexManager::deleteDocumentsByIds for payload-bounded delete batches.
    - Added SessionSegmentObserver to delete segment chunks, embeddings, and Meili docs on segment deletion.
    - Routed SegmentSession stale-tail deletion through the segment observer instead of deleting chunks directly.
    - Registered bounded scheduled reconcile: flower:prune-orphaned-chunks --limit=<config> hourly.
    - Added sqlite tests with a fake Meili manager for command dry-run/real-run, limit behavior, observer cleanup, and scheduler registration.

    Verification:
    - ~/bin/php vendor/bin/pint app/Console/Commands/PruneOrphanedChunks.php app/Jobs/SegmentSession.php app/Observers/SessionSegmentObserver.php app/Providers/FlowerServiceProvider.php app/Services/Search/MeiliIndexManager.php app/Services/Search/OrphanedChunkPruner.php config/flower.php tests/Feature/PruneOrphanedChunksTest.php
    - ~/bin/php artisan test --filter='PruneOrphanedChunksTest|SegmentSessionTest'
    - ~/bin/php artisan test

    Safety: never ran flower:prune-orphaned-chunks against live data, not even dry-run; only exercised it inside sqlite tests with a fake Meili manager. No migration added.

    agent · codex:w2:flower/prune-orphaned-chunks
  6. participant joined 3d ago
    system · codex:w2:flower/prune-orphaned-chunks
  7. plan proposed 3d ago

    # Prune orphaned SessionSegment chunks + cascade cleanup (recall-quality)

    **Origin:** feedback #47 → root-caused by flower-ops (cycle 144). Canonical spec: Solo scratchpad #1036 (proj 49); todo #690. Credit ops for the diagnosis + measurement.

    ## Problem (correctness / recall-quality, NOT availability)

    `SessionSegment` rows are **hard-deleted** on session re-ingest / re-segmentation, but the delete does **not cascade** to `chunks` (polymorphic `chunkable`) → `chunk_embeddings` → **Meili documents**. `EmbedChunks::buildChunks()` only rebuilds from *current* segments, so orphaned chunks are never revisited: orphaned `embedded`/`pending` rows stay stuck (the health WARN), and orphaned `indexed` rows **stay live in Meili and are returned by recall search** (stale deleted-segment content).

    ## Measured scope (project flower / id 16, 2026-07-02)

    - SessionSegment chunks: **11,223**; map to a live segment: 2,430; **orphaned: 8,793 (~78%)**.
    - Orphaned with `meili_id` (stale in Meili): **8,768**.
    - Orphaned embedding status: `indexed` **8,628** (≈57% of all 15,052 indexed embeddings, i.e. silent recall pollution), `embedded` 159 + `pending` 6 (= the visible "165 not indexed yet" WARN).

    ## Fix

    **Part A: one-time prune command** `flower:prune-orphaned-chunks {--dry-run} {--limit=} {--type=}`:

    1. Find chunks whose polymorphic `chunkable` no longer exists (start with SessionSegment; make it generic across chunkable types and report any other orphaned types found).
    2. Delete their **Meili docs by `meili_id`** in bounded batches (respect Meili payload limits) **before** deleting DB rows.
    3. Delete `chunk_embeddings` then `chunks` in bounded batches.
    4. Report found / meili-deleted / db-deleted. `--dry-run` counts only.

    Run `--dry-run` first, confirm ~8,793 / ~8,768, then execute. Clears both the WARN and the stale-search pollution.

    **Part B: self-heal (durable fix):**

    - Cascade cleanup at segment-deletion time: a `SessionSegment` `deleting` model event/observer (or FK ON DELETE CASCADE for DB rows + an observer for the Meili side) that removes the segment's chunks + chunk_embeddings + **Meili docs**. Route the re-segmentation delete path through it.
    - Safety-net **scheduled reconcile** (extend `SessionIngestStateReconciler` or a small new job) that periodically prunes orphaned chunks in bounded batches, so any future gap self-heals.
    - Meili deletion MUST be in both paths; the current gap is that nothing removes the Meili doc.

    **Do NOT use `flower:reindex-from-vectors`**: it upserts to Meili but never stamps `status=indexed` (can't clear the WARN) and would re-index orphans.

    ## Verify

    - Post-prune: orphan count → 0; health WARN clears; `recall_search` no longer returns deleted-segment content.
    - Re-ingest/re-segment a test session → its OLD segments' chunks + embeddings + Meili docs are gone (no new orphans).
    - `php artisan test` green; pint clean.

    ## Dispatch / execution notes

    - BUILD (command + observer + reconcile + tests) is safe in a worktree on sqlite; dispatchable to a Codex or Claude lane (ops recommends Codex).
    - **The one-time prune EXECUTION against MAIN is destructive** (deletes ~8.8k Meili docs + ~8.8k DB rows) and stays with the ORCHESTRATOR on MAIN, run only after: dry-run reviewed + operator greenlight. Workers must NOT run the prune against live data.

    agent · flower-orchestrator
  8. note added 3d ago

    Root-caused by flower-ops (cycle 144) from feedback #47; full spec in Solo scratchpad #1036 (proj 49), tracked as todo #690. SessionSegment hard-deletes on re-segmentation don't cascade to chunks → chunk_embeddings → Meili docs, so 8,793 orphaned chunks (78% of 11,223) persist; 8,628 are still status=indexed and live in Meili = ~57% of the searchable corpus is stale deleted-segment content polluting recall. The 165 embed-WARN is just the visible tip (159 embedded + 6 pending, all orphaned). Fix = (A) bounded, dry-run-first prune command + (B) cascade-on-delete + scheduled reconcile self-heal. NOT a re-push (my #47 framing was wrong — re-pushing would re-index deleted segments). Full spec to follow.

    agent · flower-orchestrator
  9. participant joined 3d ago
    system · flower-orchestrator
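
The Part A ordering in the plan (trace entry 7) can be illustrated with a minimal, self-contained sketch. It is written in Python against an in-memory SQLite database purely so it runs standalone; it is not the PHP code from commit 71b144d. Everything named here is an assumption made for illustration: the table and column layout (`session_segments`, `chunks.chunkable_type`, `chunks.chunkable_id`, `chunk_embeddings.chunk_id`), the `'SessionSegment'` type string, the `FakeMeili` class, the `batches_by_payload` helper, and the `max_bytes` limit. The sketch shows three things from the plan: orphan detection by a missing parent row, search-index deletion in payload-bounded batches before any database delete, and a dry run that only counts.

```python
import json
import sqlite3


class FakeMeili:
    """Hypothetical stand-in for a search-index client; records deletions."""

    def __init__(self, doc_ids):
        self.docs = set(doc_ids)
        self.batches = []

    def delete_documents_by_ids(self, ids):
        self.batches.append(list(ids))
        self.docs -= set(ids)


def batches_by_payload(ids, max_bytes):
    """Yield lists of ids whose JSON encoding stays within max_bytes.

    A single id larger than max_bytes is still sent alone in its own batch.
    """
    batch = []
    for doc_id in ids:
        if batch and len(json.dumps(batch + [doc_id]).encode()) > max_bytes:
            yield batch
            batch = []
        batch.append(doc_id)
    if batch:
        yield batch


def prune_orphans(db, meili, dry_run=True, limit=None, max_bytes=64):
    """Find chunks whose parent segment is gone; optionally delete them."""
    sql = """
        SELECT c.id, c.meili_id
        FROM chunks c
        LEFT JOIN session_segments s ON s.id = c.chunkable_id
        WHERE c.chunkable_type = 'SessionSegment' AND s.id IS NULL
        ORDER BY c.id
    """
    if limit is not None:
        sql += f" LIMIT {int(limit)}"  # bounds one run, like --limit
    rows = db.execute(sql).fetchall()
    report = {"found": len(rows), "meili_deleted": 0, "db_deleted": 0}
    if dry_run or not rows:
        return report  # dry run counts only

    # 1) Search-index documents first, so no stale doc outlives its DB row.
    meili_ids = [meili_id for _, meili_id in rows if meili_id is not None]
    for batch in batches_by_payload(meili_ids, max_bytes):
        meili.delete_documents_by_ids(batch)
        report["meili_deleted"] += len(batch)

    # 2) Then embeddings, then the chunks themselves.
    chunk_ids = [(chunk_id,) for chunk_id, _ in rows]
    db.executemany("DELETE FROM chunk_embeddings WHERE chunk_id = ?", chunk_ids)
    db.executemany("DELETE FROM chunks WHERE id = ?", chunk_ids)
    db.commit()
    report["db_deleted"] = len(chunk_ids)
    return report


def demo_db():
    """Build a tiny fixture: one live segment, three orphaned chunks."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE session_segments (id INTEGER PRIMARY KEY);
        CREATE TABLE chunks (id INTEGER PRIMARY KEY, chunkable_type TEXT,
                             chunkable_id INTEGER, meili_id TEXT);
        CREATE TABLE chunk_embeddings (id INTEGER PRIMARY KEY,
                                       chunk_id INTEGER, status TEXT);
        INSERT INTO session_segments VALUES (1);
        INSERT INTO chunks VALUES (10, 'SessionSegment', 1, 'doc-10'),
                                  (11, 'SessionSegment', 2, 'doc-11'),
                                  (12, 'SessionSegment', 3, 'doc-12'),
                                  (13, 'SessionSegment', 4, NULL);
        INSERT INTO chunk_embeddings VALUES (1, 10, 'indexed'),
                                            (2, 11, 'indexed'),
                                            (3, 12, 'embedded'),
                                            (4, 13, 'pending');
    """)
    return db
```

Calling `prune_orphans(demo_db(), FakeMeili([...]), dry_run=True)` only reports counts, which mirrors the plan's instruction to review a dry run before executing. Deleting index documents before database rows means an interrupted run leaves rows that a later run can still find and retry, rather than index documents with no remaining row pointing at them.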

epic · dependencies

Relationships

epic parent

depends on

No dependencies — dispatchable once planned.

agents · waves

Participants

  • flower-orchestrator participant · active
  • codex:w2:flower/prune-orphaned-chunks participant · active
  • system:commit-trailer participant · active
  • flower-ops participant · active

trace · graph

Links

  • Commit #1592 execution

scope

Projects

  • flower · primary

dogfood · read-only

Agent’s-eye view

The literal recall_brief payload an agent gets — same service path as the MCP tool.