from feedback #123 · Coordination queue not draining after orchestrator r...

Coordination-drain resilience: auto-complete a reset's own signals + stale-signal safety-net sweep + guarantee non-orch resets within one heartbeat (fb #123)

Dispatch

canonical · plan

Spec

markdown

hand-off · dispatch

Dispatch

Auto-dispatch

when it reaches planned

Design-loop

design pass before build

Direct dispatch — no refine required. The packet tells the agent to ask questions only if the request is blocked by ambiguity.

kind

No dispatch requests yet — dispatch above to generate a copy-paste packet.

provenance · append-only

Trace

live
  1. link added 19h ago
    agent · flower-orchestrator
  2. note added 19h ago

    From feedback #123 (refine + ops diagnosis, fix-spec scratchpad 1095). The acute incident is already remediated by hand (2026-07-05 00:34: drained the stuck queue: ops reset #129 → daemon_start_reset spawned ops successor 41; vestigial #127/#128 cleared; #118 parent-misfire failed). This brief is the DURABLE fix so it can't recur by luck.

    ## Root cause

    Two compounding factors:

    (1) The orchestrator (daemon 40) processed auto_dispatch signals opportunistically *while dispatching briefs* but did NOT run a full `recall_signals` drain each heartbeat, so RESET-kind + route + successor_ready signals piled up (ops's routine reset stranded ~1.7h; ops climbed 48%→66%).

    (2) Feedback #127 (interactive AskUserQuestion) blocked the work loop for the final stretch, worsening it. The #127 fix (no interactive questions) is already filed/memoried; this brief is the RECOVER half.

    ## Deliverable (recover / resilience)

    1. **Auto-complete a reset's own coordination signals.** When a reset completes (predecessor retired via daemon_retire_predecessor), auto-complete its `reset` (#127-style) and `successor_ready` (#128-style) signals so they don't linger pending forever (they stranded across BOTH 28→35 and 36→40 resets). Same for a non-orch reset once the successor retires the predecessor.

    2. **Stale-signal safety-net sweep.** A scheduled sweep (mirror `flower:reconcile-auto-dispatch`) that detects coordination signals pending beyond a threshold (esp. `reset` / `successor_ready`; a routine reset should never wait >1 heartbeat) and surfaces/recovers them: re-target signals whose `target_daemon_id` is a retired predecessor to the live orchestrator, and alert (recall_health warning + optional daemon_poke) when a reset request sits unactioned. Auto_dispatch already has a reconcile sweep; coordination-reset has none.

    3. **Guarantee non-orch resets within one heartbeat.** Ensure the baton-holding orchestrator's drain executes `daemon_start_reset` for a pending ops/refine reset promptly regardless of stale target ids or a big dispatch wave in progress (reset-kind should not sit behind an auto_dispatch backlog).

    4. **Regression tests:** ops requests reset → orchestrator executes daemon_start_reset within one heartbeat; a completed reset leaves no pending #127/#128; the stale sweep re-targets/clears predecessor-targeted signals.

    ## Constraints / refs

    Additive; don't change the make-before-break lifecycle or the baton/single-merge invariant. Charter side (drain-every-heartbeat discipline + no-interactive-questions) folds into the #243 charter-refresh brief.

    Refs: fb #123, fb #127, #226 (daemon lifecycle), fix-spec scratchpad 1095, todo 393/713.

    agent · flower-orchestrator
  3. participant joined 19h ago
    system · flower-orchestrator
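
Illustrative sketch (not part of the trace): deliverables 1 to 3 in the note above could look roughly like the following. This is a minimal in-memory model, not flower's actual code. The `Signal` class, its `reset_id`, `created_at`, and `status` fields, and all three function names are hypothetical; only the signal kinds `reset` / `successor_ready` and the `target_daemon_id` field come from the brief. The heartbeat length is passed in as a parameter because the brief does not state it.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

# Signal kinds named in the brief as reset coordination signals.
RESET_KINDS = {"reset", "successor_ready"}


@dataclass
class Signal:
    """Hypothetical stand-in for a coordination signal row."""
    id: int
    kind: str
    target_daemon_id: int
    created_at: float               # seconds; assumed field
    status: str = "pending"         # assumed values: "pending" / "completed"
    reset_id: Optional[int] = None  # assumed link to the reset that owns the signal


def complete_reset_signals(signals: List[Signal], reset_id: int) -> List[int]:
    """Deliverable 1: once the predecessor is retired, complete the
    reset's own pending reset / successor_ready signals."""
    completed = []
    for s in signals:
        if s.status == "pending" and s.reset_id == reset_id and s.kind in RESET_KINDS:
            s.status = "completed"
            completed.append(s.id)
    return completed


def sweep_stale(
    signals: List[Signal],
    now: float,
    heartbeat_s: float,
    retired_ids: Set[int],
    live_orchestrator_id: int,
) -> Tuple[List[int], List[int]]:
    """Deliverable 2: re-target pending reset-kind signals aimed at a
    retired predecessor, and report those pending longer than one heartbeat."""
    retargeted, overdue = [], []
    for s in signals:
        if s.status != "pending" or s.kind not in RESET_KINDS:
            continue
        if s.target_daemon_id in retired_ids:
            s.target_daemon_id = live_orchestrator_id
            retargeted.append(s.id)
        if now - s.created_at > heartbeat_s:
            overdue.append(s.id)  # caller would raise the warning / poke here
    return retargeted, overdue


def drain_order(signals: List[Signal]) -> List[Signal]:
    """Deliverable 3: pending reset-kind signals drain first (oldest first),
    ahead of any auto_dispatch backlog and regardless of target id."""
    pending = [s for s in signals if s.status == "pending"]
    return sorted(pending, key=lambda s: (s.kind not in RESET_KINDS, s.created_at))
```

In this model the sweep only reports overdue signals and leaves alerting to the caller, which keeps it additive in the sense the constraints section asks for. Sorting by kind before age is one simple way to keep a reset from waiting behind a dispatch wave; a separate reset-only pass at the start of each heartbeat would achieve the same ordering.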

epic · dependencies

Relationships

epic parent

depends on

No dependencies — dispatchable once planned.

agents · waves

Participants

  • flower-orchestrator participant · active

trace · graph

Links

  • Feedback #123 seed

scope

Projects

  • flower · primary

dogfood · read-only

Agent’s-eye view

The literal recall_brief payload an agent gets — same service path as the MCP tool.