FLOWER-16 endgame: adaptive summarize subdivide + per-session failed-attempt circuit breaker

hand-off · dispatch

Dispatch

This brief is complete — dispatch is closed.

provenance · append-only

Trace

  1. link added 3d ago
    agent · system:commit-trailer
  2. link added 3d ago
    agent · system:commit-trailer
  3. participant joined 3d ago
    system · system:commit-trailer
  4. status change 3d ago
    agent · flower-orchestrator
  5. comment 3d ago

    Implemented Brief #58 on branch flower/flower16-subdivide.

    Commits:
    - e346ae2 Parameterize adaptive summarizer subdivision
    - 88ab091 Add summarize failure circuit breaker

    Scope:
    - Confirmed that adaptive timeout-only subdivision was already present on master; added the missing configurable chunk_timeout_subdivide_factor and regression coverage for non-timeout map errors.
    - Added a migration for sessions.summarize_failures and sessions.last_summarize_failed_at, plus the matching Session casts.
    - Added SummarizeFailurePolicy with max_failures and failure_retry_after_minutes config.
    - SegmentSession now skips capped or recently failed sessions in preflight, increments the counter and stamps the failure time on failure, and resets the counters on success.
    - IngestSession::chainSegmentation now skips dispatch when the failure policy reports the session as capped or retry-debounced.
    - flower:ingest-backlog now surfaces counts and samples for circuit-open summary failures and retry-debounced summaries, and excludes those sessions from rekick.

    Verification:
    - ~/bin/php vendor/bin/pint app/Console/Commands/IngestBacklog.php app/Jobs/IngestSession.php app/Jobs/SegmentSession.php app/Models/Session.php app/Services/Summarize/AiSegmentSummarizer.php app/Services/Summarize/SummarizeFailurePolicy.php config/flower.php database/migrations/2026_07_02_083000_add_summarize_failures_to_sessions_table.php tests/Feature/ChunkedSummarizerTest.php tests/Feature/FoundationTest.php tests/Feature/IngestBacklogCommandTest.php tests/Feature/SegmentSessionTest.php tests/Feature/TranscriptIngestTest.php
    - ~/bin/php artisan test --filter='ChunkedSummarizerTest|SegmentSessionTest|TranscriptIngestTest|IngestBacklogCommandTest|FoundationTest|ReconcileIngestStateCommandTest'
    - ~/bin/php artisan test

    Safety: sqlite tests only. Did not run migrate, live LLM calls, backfill, or mutating tinker against the worktree/live DB.

    Handoff flag: a migration and Horizon job code (SegmentSession, IngestSession, summarizer) changed; MAIN needs migrate plus a graceful Horizon reload after merge.

    agent · codex:w2:flower/flower16-subdivide
  6. comment 3d ago

    Started implementation on branch flower/flower16-subdivide from master. Grounded with recall_brief #58 and prior adaptive chunking memory. Safety boundary acknowledged: sqlite php artisan test only; will not run live LLM, migrate, backfill, or mutating tinker against the worktree/live DB. Will flag MAIN migrate + graceful Horizon reload because this touches Session/IngestSession/SegmentSession/summarizer job code.

    agent · codex:w2:flower/flower16-subdivide
  7. participant joined 3d ago
    system · codex:w2:flower/flower16-subdivide
  8. status change 3d ago
    agent · flower-orchestrator
  9. note added 3d ago

    ## Context

    FLOWER-16 residual; ops cycles 126/128; operator-directed 2026-07-02.

    After #53's giant guard (>50M-token sessions skipped as too_large), the ongoing failed_jobs bleed comes from MODERATE sessions (e.g. 115, 406, 897: 1.7–5.4k events, UNDER the 50M ceiling) whose summarization chunk requests fail with **cURL error 28 @ 60s** (provider stalls).

    Confirmed mechanics in `IngestSession::chainSegmentation()` (app/Jobs/IngestSession.php:289): `flower:watch` re-ingests a session whenever its transcript file signature changes; IngestSession re-summarizes UNLESS (a) the session is too_large, or (b) a SessionSegment was created in the last 10 min (a debounce). **A perpetually FAILING session never creates a SessionSegment, so the (success-keyed) debounce never trips → it re-dispatches SegmentSession on every watch tick and fails forever.**

    Two coordinated fixes:

    ## Fix 1: Adaptive reduce-on-timeout (subdivide) in the summarizer

    When a summarization HTTP request fails with a TIMEOUT (cURL-28 / ConnectException / provider stall), **recursively subdivide that chunk** into smaller sub-chunks and retry each, down to a configurable floor. This is the planned FLOWER-16 endgame ("adaptive reduce-on-timeout"). Wire it into the chunked map/reduce summarizer (SummarizerService / the SegmentSession chunk loop). Respect config/flower.php `summarize.*` (chunk thresholds/sizes/max_tokens); add config for the subdivide factor and the min-chunk floor. Goal: large-but-under-ceiling sessions (115/406/897) SUCCEED instead of timing out. Distinguish a timeout (subdivide) from a genuine hard error (don't infinitely subdivide non-timeout failures).

    ## Fix 2: Per-session failed-attempt counter + enforcement (circuit breaker)

    So that a session that still can't be summarized STOPS churning:
    - Migration (sqlite-safe): add `summarize_failures` (int, default 0) and `last_summarize_failed_at` (nullable timestamp) to `sessions`.
    - SegmentSession: on failure, increment `summarize_failures` and stamp `last_summarize_failed_at`; on success, reset `summarize_failures=0`.
    - ENFORCE in `IngestSession::chainSegmentation()` (and/or SegmentSession preflight): if `summarize_failures >= config('flower.summarize.max_failures', 3)`, do NOT re-dispatch; skip cleanly like the too_large circuit-break (no new failed_jobs). Surface these in `flower:ingest-backlog` counts (a distinct 'summary_failed' state or a counter-based bucket) so they're VISIBLE, not silently dropped.
    - Also debounce FAILED attempts by `last_summarize_failed_at` (don't retry a failing session more than once per config window, even before the cap), so retries are spaced out instead of firing on every tick.

    ## Acceptance

    - A large-but-under-ceiling session that previously timed out now summarizes via subdivide (unit test: simulated timeout on the large chunk → succeeds after subdivide).
    - A session that fails `max_failures` times stops being re-dispatched (no new failed_jobs) and is counted/visible in ingest-backlog; a later success resets the counter.
    - Active/normal sessions are unaffected (still ~10-min debounced; subdivide only triggers on timeout).
    - Tests (sqlite): subdivide-on-timeout success; counter increment + reset; enforcement caps re-dispatch; non-timeout error does NOT subdivide. Suite green; pint.
    - FLAG: SegmentSession/IngestSession/summarizer are Horizon job code → graceful Horizon reload on merge; migration → orchestrator runs migrate on MAIN.

    ## Guardrails

    ⚠️ sqlite `php artisan test` ONLY; no live LLM/DB from the worktree. Orchestrator runs migrate + Horizon reload + any reconcile on MAIN.

    ## Provenance

    FLOWER-16 endgame. Operator-directed 2026-07-02: "fix FLOWER-16 with the adaptive subdivide + put a counter for failed attempts to summarize on the model(s) and enforce that." Answers the "re-summarizing active sessions?" question: yes, but 10-min debounced (success-keyed); the churn is FAILING sessions bypassing that debounce, and this counter closes it.

    agent · flower-orchestrator
  10. participant joined 3d ago
    system · flower-orchestrator
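
Fix 1 in the orchestrator's note (trace entry 9) describes reduce-on-timeout: when a chunk request times out, split the chunk and retry the pieces down to a floor, while non-timeout errors fail immediately. The real change lives in the PHP summarizer (commit e346ae2 made the subdivide factor configurable as chunk_timeout_subdivide_factor). The following is only a language-neutral Python sketch of that control flow. It assumes a chunk is a list of events and that the caller maps the provider's timeout (e.g. cURL error 28) onto an exception; `SummarizeTimeout`, `SummarizeHardError`, `split_chunk`, `summarize_adaptive`, and the `min_chunk` floor parameter are hypothetical names.

```python
from typing import Callable, List


class SummarizeTimeout(Exception):
    """Hypothetical: raised when a chunk request stalls (e.g. cURL error 28)."""


class SummarizeHardError(Exception):
    """Hypothetical: any non-timeout provider failure; never subdivided."""


def split_chunk(events: List[str], factor: int) -> List[List[str]]:
    """Split events into `factor` contiguous parts whose sizes differ by at most 1."""
    base, extra = divmod(len(events), factor)
    parts, start = [], 0
    for i in range(factor):
        size = base + (1 if i < extra else 0)
        parts.append(events[start:start + size])
        start += size
    return parts


def summarize_adaptive(
    events: List[str],
    summarize: Callable[[List[str]], str],
    factor: int = 2,
    min_chunk: int = 1,
) -> List[str]:
    """Map step with reduce-on-timeout: one partial summary per chunk that succeeded.

    Only SummarizeTimeout triggers subdivision; any other exception
    (e.g. SummarizeHardError) propagates on the first attempt.
    """
    try:
        return [summarize(events)]
    except SummarizeTimeout:
        # Re-raise the timeout if splitting would create parts below the floor.
        if factor < 2 or len(events) // factor < max(min_chunk, 1):
            raise
        partials: List[str] = []
        for part in split_chunk(events, factor):
            partials.extend(summarize_adaptive(part, summarize, factor, min_chunk))
        return partials
```

The partial summaries would then feed the existing reduce step unchanged. Re-raising at the floor matters: a chunk that still times out at minimum size should surface as a failure, so the per-session counter from Fix 2 can count it.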
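
Fix 2's first bullet asks for a sqlite-safe migration. The actual change is a Laravel migration (database/migrations/2026_07_02_083000_add_summarize_failures_to_sessions_table.php, per trace entry 5); the Python `sqlite3` snippet here only illustrates why these column shapes are SQLite-friendly: `ADD COLUMN ... NOT NULL` is accepted when a non-null default is supplied, and existing rows read back that default. The toy `sessions` table and the `add_summarize_failure_columns` helper are hypothetical.

```python
import sqlite3


def add_summarize_failure_columns(conn: sqlite3.Connection) -> None:
    """Hypothetical stand-in for the migration's up() step, in raw SQLite DDL."""
    # NOT NULL is allowed on ADD COLUMN only because a non-null default is given.
    conn.execute(
        "ALTER TABLE sessions ADD COLUMN summarize_failures INTEGER NOT NULL DEFAULT 0"
    )
    # Nullable timestamp: stays NULL until the first failed summarization.
    conn.execute("ALTER TABLE sessions ADD COLUMN last_summarize_failed_at TIMESTAMP")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO sessions (id) VALUES (?)", [(1,), (2,)])
add_summarize_failure_columns(conn)

# Pre-existing rows pick up the default counter and a NULL timestamp.
rows = conn.execute(
    "SELECT id, summarize_failures, last_summarize_failed_at FROM sessions ORDER BY id"
).fetchall()
```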
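
The counter lifecycle in Fix 2 (increment and stamp on failure, reset on success, cap at `max_failures`, space retries by `last_summarize_failed_at`) corresponds to the SummarizeFailurePolicy reported in trace entry 5. A minimal Python sketch follows; apart from the column and config names, everything in it is hypothetical: `SessionState`, `FailurePolicy`, `skip_reason`, `record_failure`, `record_success`, and the `"circuit_open"` / `"retry_debounced"` reason strings. It also assumes the retry debounce applies only while the counter is non-zero, so a session whose counter was reset by a success is not held back by an old failure timestamp.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class SessionState:
    # Mirrors the two columns added to `sessions`.
    summarize_failures: int = 0
    last_summarize_failed_at: Optional[datetime] = None


@dataclass
class FailurePolicy:
    failure_retry_after_minutes: int
    max_failures: int = 3  # the brief's config('flower.summarize.max_failures', 3) default

    def skip_reason(self, s: SessionState, now: datetime) -> Optional[str]:
        """Return why a (re)dispatch should be skipped, or None if it may proceed."""
        if s.summarize_failures >= self.max_failures:
            return "circuit_open"
        # Assumption: only debounce while there are outstanding failures.
        if s.summarize_failures > 0 and s.last_summarize_failed_at is not None:
            window = timedelta(minutes=self.failure_retry_after_minutes)
            if now - s.last_summarize_failed_at < window:
                return "retry_debounced"
        return None


def record_failure(s: SessionState, now: datetime) -> None:
    s.summarize_failures += 1
    s.last_summarize_failed_at = now


def record_success(s: SessionState) -> None:
    # The brief only requires resetting the counter; the timestamp is kept as history.
    s.summarize_failures = 0
```

Checking the cap before the debounce means a capped session always reports as circuit-open, which keeps the two states distinct when they are surfaced in `flower:ingest-backlog`.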
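
The Context section of the orchestrator's note explains the churn: the only debounce is keyed on a recently created SessionSegment, which a perpetually failing session never produces, so every watch tick re-dispatches SegmentSession. A small Python simulation makes the before/after difference concrete. It assumes a session whose transcript signature changes every minute and whose summarization always fails; `should_dispatch`, `simulate_failing_session`, and the 30-minute retry window are hypothetical, while the 10-minute segment debounce and the default cap of 3 come from the brief. The too_large check is left out for brevity.

```python
from datetime import datetime, timedelta
from typing import Optional

SEGMENT_DEBOUNCE = timedelta(minutes=10)  # success-keyed debounce described in the brief


def should_dispatch(
    now: datetime,
    last_segment_at: Optional[datetime],
    failures: int,
    last_failed_at: Optional[datetime],
    *,
    breaker: bool,
    max_failures: int = 3,
    retry_after: timedelta = timedelta(minutes=30),
) -> bool:
    """Hypothetical model of the chainSegmentation gate (too_large check omitted)."""
    # Existing rule: skip if a SessionSegment was created in the last 10 minutes.
    if last_segment_at is not None and now - last_segment_at < SEGMENT_DEBOUNCE:
        return False
    if breaker:
        # New rule 1: hard cap on failures (circuit open).
        if failures >= max_failures:
            return False
        # New rule 2: space out retries after a failure.
        if failures > 0 and last_failed_at is not None and now - last_failed_at < retry_after:
            return False
    return True


def simulate_failing_session(ticks: int, *, breaker: bool) -> int:
    """Count SegmentSession dispatches for a session whose summarization always fails."""
    start = datetime(2026, 7, 2)
    failures, last_failed_at, dispatches = 0, None, 0
    for minute in range(ticks):
        now = start + timedelta(minutes=minute)
        # A failing session never creates a SessionSegment, so last_segment_at stays None.
        if should_dispatch(now, None, failures, last_failed_at, breaker=breaker):
            dispatches += 1
            failures += 1  # the dispatched job fails...
            last_failed_at = now  # ...and stamps the failure time
    return dispatches
```

Without the breaker, the failing session is dispatched on every one of 120 one-minute ticks; with it, dispatches land at minutes 0, 30, and 60 and then stop at the cap, no matter how long the watcher keeps running.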

epic · dependencies

Relationships

epic parent

depends on

No dependencies — dispatchable once planned.

agents · waves

Participants

  • flower-orchestrator participant · active
  • codex:w2:flower/flower16-subdivide participant · active
  • system:commit-trailer participant · active

trace · graph

Links

  • Commit #1579 execution
  • Commit #1580 execution

scope

Projects

  • flower · primary