The ingest pipeline
flower turns raw agent transcripts into searchable memory in four stages. Each stage is a queued job, so the pipeline is resilient and observable.
 transcript           ingest             segment            embed             index
┌──────────┐      ┌────────────┐     ┌─────────────┐    ┌───────────┐    ┌─────────────┐
│ Claude / │─────▶│ parse into │────▶│ LLM splits  │───▶│ vectorize │───▶│ Meilisearch │
│ Codex /  │      │ events,    │     │ into goal-  │    │ each      │    │ hybrid      │
│ Pi log   │      │ usage,     │     │ scoped      │    │ chunk     │    │ (keyword +  │
└──────────┘      │ artifacts  │     │ segments    │    └───────────┘    │ vector)     │
                  └────────────┘     │ + summaries │                     └─────────────┘
                                     └─────────────┘
IngestSession → SegmentSession → EmbedChunks
1. Ingest
IngestSession reads a session's transcript through the right harness
adapter (claude / codex / pi) and stores structured records: events,
tool calls, token usage, and artifacts (the files it touched). This is
pure parsing — no LLM yet.
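To make the adapter idea concrete, here is a minimal sketch in Python of dispatching a transcript to a per-harness parser. It is an illustration, not flower's implementation: the `ParsedSession` fields, the JSON-lines input format, the record keys (`type`, `path`, `tokens`), and the `ADAPTERS` registry are all assumptions made for the example.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ParsedSession:
    """Structured records produced by ingest (field names are illustrative)."""
    events: list = field(default_factory=list)
    tool_calls: list = field(default_factory=list)
    token_usage: int = 0
    artifacts: set = field(default_factory=set)


def parse_jsonl(raw: str) -> ParsedSession:
    """Hypothetical adapter: one JSON object per line. Pure parsing, no LLM."""
    session = ParsedSession()
    for line in raw.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        session.events.append(record)
        if record.get("type") == "tool_call":
            session.tool_calls.append(record)
            if "path" in record:
                session.artifacts.add(record["path"])  # file the tool touched
        session.token_usage += record.get("tokens", 0)
    return session


# Registry keyed by harness name. Real transcript formats differ per harness;
# a single parser is reused here only to keep the sketch short.
ADAPTERS = {"claude": parse_jsonl, "codex": parse_jsonl, "pi": parse_jsonl}


def ingest(harness: str, raw: str) -> ParsedSession:
    """Pick the adapter for the harness and parse the raw transcript."""
    try:
        adapter = ADAPTERS[harness]
    except KeyError:
        raise ValueError(f"no adapter for harness {harness!r}") from None
    return adapter(raw)
```

Keeping the adapters behind one registry means that supporting another harness only adds an entry, and everything downstream sees the same record shape.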
2. Segment
SegmentSession runs an LLM over the ingested session to split it into
segments: goal-scoped units, each with a summary, status, and next_steps.
Large sessions are summarized map/reduce style (chunked, summarized per chunk,
then combined), so oversized inputs don't stall the provider.
Segmentation runs on an isolated long queue with generous timeouts, so heavy summaries never starve the live pipeline.
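The map/reduce summarization described above could look roughly like this sketch. The `summarize` callable stands in for the LLM call, and `split_by_size` and the character budget are hypothetical; the source does not say how flower sizes its chunks.

```python
from typing import Callable, List


def split_by_size(events: List[str], max_chars: int) -> List[List[str]]:
    """Group consecutive events into chunks of at most max_chars characters.

    A single event longer than max_chars still gets a chunk of its own.
    """
    chunks: List[List[str]] = []
    current: List[str] = []
    size = 0
    for event in events:
        if current and size + len(event) > max_chars:
            chunks.append(current)
            current, size = [], 0
        current.append(event)
        size += len(event)
    if current:
        chunks.append(current)
    return chunks


def summarize_session(
    events: List[str],
    summarize: Callable[[str], str],
    max_chars: int = 8000,  # assumed budget, not a documented value
) -> str:
    """Summarize directly when the session fits, else map/reduce over chunks."""
    chunks = split_by_size(events, max_chars)
    if len(chunks) <= 1:
        return summarize("\n".join(events))
    partials = [summarize("\n".join(chunk)) for chunk in chunks]  # map
    return summarize("\n".join(partials))  # reduce (a single pass here)
```

This sketch does one reduce pass; if the joined partial summaries could themselves exceed the budget, the reduce step would need to repeat.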
3. Embed
EmbedChunks turns each chunkable record into a vector embedding. Payloads are
byte-bounded before they're sent, so a giant session can't blow past provider or
index limits.
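One way to byte-bound a payload is to truncate on a UTF-8 character boundary, as in the sketch below. The function name and the `max_bytes` parameter are illustrative; the actual limits flower applies are not stated in this document.

```python
def bound_bytes(text: str, max_bytes: int) -> str:
    """Return text truncated so its UTF-8 encoding is at most max_bytes long.

    The cut never splits a multi-byte character: a partial trailing
    character is dropped rather than emitted as invalid bytes.
    """
    if max_bytes < 0:
        raise ValueError("max_bytes must be non-negative")
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # errors="ignore" discards the incomplete character left by the slice.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")
```

Counting bytes rather than characters matters because a character can take up to four bytes in UTF-8, so a character limit alone does not bound the payload size.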
4. Index
Embeddings and text land in Meilisearch, which serves hybrid search:
keyword and vector together. That's the corpus that recall_search and /search
both query. Because agents and the UI share the exact same query path, they see
identical results.
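As a rough illustration of a shared query path, the sketch below builds one Meilisearch search request body that both entry points could reuse. It assumes flower uses Meilisearch's `hybrid` search parameter (`semanticRatio` and `embedder`); the function names, the default ratio, and the embedder name `"default"` are hypothetical, and the real request flower sends is not shown in this document.

```python
def build_search_body(
    query: str,
    limit: int = 10,
    semantic_ratio: float = 0.5,   # 0.0 = keyword only, 1.0 = vector only
    embedder: str = "default",     # assumed embedder name
    keyword_only: bool = False,    # degraded mode: skip the vector side
) -> dict:
    """Build the JSON body for a Meilisearch search request."""
    if not 0.0 <= semantic_ratio <= 1.0:
        raise ValueError("semantic_ratio must be between 0 and 1")
    body = {"q": query, "limit": limit}
    if not keyword_only:
        body["hybrid"] = {"semanticRatio": semantic_ratio, "embedder": embedder}
    return body


def agent_search_body(query: str) -> dict:
    """Hypothetical agent-side entry point (e.g. behind recall_search)."""
    return build_search_body(query)


def ui_search_body(query: str) -> dict:
    """Hypothetical UI-side entry point (e.g. behind /search)."""
    return build_search_body(query)
```

Because both wrappers delegate to the same builder, the same query text yields the same request, which is the property that keeps agent and UI results identical.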
Watching it run
Every stage is a Horizon job, so failures are visible:
- Horizon: queue/job failures and retries. If recall looks stale, a worker likely died or a job is failing here.
- flower_pipeline_status (a debug-mode MCP tool): session counts by ingest state, segment counts, and chunk totals, so you can see exactly where things are stuck.
- storage/logs/laravel.log: graceful-degradation paths log warnings here (e.g. hybrid search fell back to keyword-only) instead of throwing.
- /storage: the corpus's footprint over time (MySQL / Meilisearch / Redis), sampled on a schedule. A quick read on how fast the index is growing, and the groundwork for retention / auto-purge.
More on the debug tools and where to look: Using the flower MCP.