The ingest pipeline

flower turns raw agent transcripts into searchable memory in four stages. Each stage runs as a queued job, so a failed stage can be retried on its own and its progress can be inspected (see Watching it run).

  transcript          ingest             segment             embed              index
 ┌──────────┐     ┌────────────┐     ┌─────────────┐     ┌───────────┐     ┌─────────────┐
 │ Claude / │────▶│ parse into │────▶│ LLM splits  │────▶│ vectorize │────▶│ Meilisearch │
 │ Codex /  │     │ events,    │     │ into goal-  │     │ each      │     │ hybrid      │
 │ Pi log   │     │ usage,     │     │ scoped      │     │ chunk     │     │ (keyword +  │
 └──────────┘     │ artifacts  │     │ segments    │     └───────────┘     │  vector)    │
                  └────────────┘     │ + summaries │                       └─────────────┘
                                     └─────────────┘
                  IngestSession  →   SegmentSession  →   EmbedChunks

1. Ingest

IngestSession reads a session's transcript through the harness adapter that matches its source (claude / codex / pi) and stores structured records: events, tool calls, token usage, and artifacts (the files the session touched). This is pure parsing; no LLM is involved yet.
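
The adapter idea can be sketched as a small dispatch table. This is an illustration in Python, not flower's code: the JSON-lines transcript shape, the field names (`type`, `kind`, `event`, `path`, `tokens`), and the `ADAPTERS` registry are all assumptions made up for the example.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ParsedSession:
    """Structured records produced by parsing one transcript."""
    events: list = field(default_factory=list)
    tool_calls: list = field(default_factory=list)
    tokens: int = 0
    artifacts: set = field(default_factory=set)


def parse_jsonl(lines, type_key):
    """Hypothetical parser: one JSON object per line, no LLM involved."""
    session = ParsedSession()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        entry = json.loads(line)
        session.events.append(entry)
        session.tokens += entry.get("tokens", 0)
        if entry.get(type_key) == "tool_call":
            session.tool_calls.append(entry)
            if "path" in entry:
                session.artifacts.add(entry["path"])  # file the session touched
    return session


# Hypothetical registry: each harness names its event-type field differently.
ADAPTERS = {
    "claude": lambda lines: parse_jsonl(lines, "type"),
    "codex": lambda lines: parse_jsonl(lines, "kind"),
    "pi": lambda lines: parse_jsonl(lines, "event"),
}


def ingest(harness, lines):
    """Pick the adapter for the harness and parse the transcript."""
    if harness not in ADAPTERS:
        raise ValueError(f"no adapter for harness {harness!r}")
    return ADAPTERS[harness](lines)
```

Keeping the harness-specific details inside the adapter means everything downstream sees one record shape, whichever agent produced the log.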

2. Segment

SegmentSession runs an LLM over the ingested session to split it into segments: goal-scoped units, each with a summary, status, and next_steps. Large sessions are summarized map/reduce style (chunked first, then combined), so oversized inputs don't stall the provider.

Segmentation runs on an isolated long queue with generous timeouts, so heavy summaries never starve the live pipeline.
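
A minimal sketch of map/reduce summarization, assuming a character budget and a caller-supplied `summarize` function standing in for the LLM call. The names, the budget, and the recursion strategy are illustrative assumptions; flower's actual chunking rules are not described in this guide.

```python
from typing import Callable, List


def chunk_text(text: str, max_chars: int) -> List[str]:
    """Split text into consecutive pieces of at most max_chars characters."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def map_reduce_summary(text: str,
                       summarize: Callable[[str], str],
                       max_chars: int = 8000) -> str:
    """Summarize text that may exceed the provider's input budget."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if len(text) <= max_chars:
        return summarize(text)  # small enough: a single call

    # Map: summarize each chunk independently.
    partials = [summarize(chunk) for chunk in chunk_text(text, max_chars)]
    combined = "\n".join(partials)

    # Guard: if the summaries did not shrink the input, stop recursing.
    if len(combined) >= len(text):
        return summarize(combined[:max_chars])

    # Reduce: summarize the combined partial summaries (recursing if needed).
    return map_reduce_summary(combined, summarize, max_chars)
```

No single call ever receives more than `max_chars` characters, which is the property that keeps an oversized session from stalling the provider.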

3. Embed

EmbedChunks turns each chunkable record into a vector embedding. Payloads are byte-bounded before they're sent, so a giant session can't blow past provider or index limits.
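
Byte-bounding is subtly different from truncating by characters, because a multi-byte UTF-8 character must not be cut in half. The sketch below shows one way to do it; the function name and the idea of a single `max_bytes` budget are assumptions, not flower's implementation.

```python
def truncate_utf8(text: str, max_bytes: int) -> str:
    """Return the longest prefix of text whose UTF-8 encoding fits in max_bytes."""
    if max_bytes < 0:
        raise ValueError("max_bytes must be non-negative")
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # Cutting the byte string may split a multi-byte character at the end;
    # errors="ignore" drops that incomplete trailing sequence.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")
```

For example, `truncate_utf8("héllo", 2)` returns `"h"`, because `é` needs two bytes and only one is left in the budget.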

4. Index

Embeddings and text land in Meilisearch, which serves hybrid search: keyword and vector together. This is the corpus that both recall_search and /search query. Because agents and the UI share the same query path, they see identical results.
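
The shared query path can be sketched as one function that builds the search request, called from both entry points. The `hybrid` object with `semanticRatio` and `embedder` is part of Meilisearch's search parameters; the embedder name, the ratio, the limit, and the two wrapper function names are assumptions for illustration.

```python
def build_hybrid_query(q: str,
                       semantic_ratio: float = 0.5,
                       embedder: str = "default",
                       limit: int = 10) -> dict:
    """Build a Meilisearch search body mixing keyword and vector results.

    semantic_ratio: 0.0 is keyword-only, 1.0 is vector-only.
    """
    if not 0.0 <= semantic_ratio <= 1.0:
        raise ValueError("semantic_ratio must be between 0 and 1")
    return {
        "q": q,
        "limit": limit,
        "hybrid": {"semanticRatio": semantic_ratio, "embedder": embedder},
    }


def recall_search_tool(q: str) -> dict:
    """Hypothetical agent-facing entry point."""
    return build_hybrid_query(q)


def search_page(q: str) -> dict:
    """Hypothetical UI-facing entry point."""
    return build_hybrid_query(q)
```

Because both entry points delegate to the same builder, the same query text always produces the same request, so the results cannot drift apart.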

Watching it run

Every stage is a Horizon job, so failures are visible:

  • Horizon — queue/job failures and retries. If recall looks stale, a worker likely died or a job is failing here.
  • flower_pipeline_status (a debug-mode MCP tool) — session counts by ingest state, segment counts, chunk totals: see exactly where things are stuck.
  • storage/logs/laravel.log — graceful-degradation paths log warnings here (e.g. hybrid-search fell back to keyword-only) instead of throwing.
  • /storage — the corpus's footprint over time (MySQL / Meilisearch / Redis), sampled on a schedule. A quick read on how fast the index is growing, and the groundwork for retention / auto-purge.
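
When checking storage/logs/laravel.log for degradation warnings, a small filter can help. This sketch assumes Laravel's default single-line log format (`[timestamp] channel.LEVEL: message`); the sample messages in the usage note are invented for illustration and are not flower's actual log text.

```python
import re

# Matches lines such as: [2025-01-01 10:00:00] production.WARNING: some message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<channel>[\w-]+)\.(?P<level>[A-Z]+): (?P<message>.*)$"
)


def warnings_from(lines):
    """Return (timestamp, message) pairs for WARNING-level log lines."""
    found = []
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match and match.group("level") == "WARNING":
            found.append((match.group("ts"), match.group("message")))
    return found
```

Usage, with hypothetical log lines:

```python
sample = [
    "[2025-01-01 10:00:00] production.WARNING: hybrid search fell back to keyword-only",
    "[2025-01-01 10:00:05] production.ERROR: job failed",
]
for ts, message in warnings_from(sample):
    print(ts, message)
```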

More on the debug tools and where to look: Using the flower MCP.