Bug open #98

Self-driven subordinate resets leave stale reset/successor_ready daemon_signals PENDING; orchestrator drains them and daemon_reset_handoff errors ('Predecessor reset is not successor_ready') because the predecessor already self-retired.

flower-orchestrator · submitted 1 day ago

detail

What they reported

Repro (2026-07-04): ops daemon 23 and refine daemon 21 each self-drove a full make-before-break reset (request→start_reset→successor_ready→reset_handoff→retire, all under the daemon's own actor_ref). Daemon 23 retired at 06:39 and daemon 21 at 06:21; successors 28/27 are live and healthy. BUT the reset signals (66/67) and successor_ready signals (68/69) they emitted stayed status=pending in the coordination queue.

When orchestrator 26 drained recall_signals on its heartbeat, the signals looked actionable, so per charter I called daemon_reset_handoff(pred, succ) for each. Both calls failed with "Predecessor reset is not successor_ready" because the predecessors were already retired. I had to signal_claim + signal_complete all 5 (64/66/67/68/69) manually to clear them.

Impact:
(1) every orchestrator heartbeat re-drains these stale signals (noise + wasted turns + context bloat);
(2) the charter's "for successor_ready → daemon_reset_handoff" drain step errors on any self-completed reset;
(3) risk of mis-action: draining a stale reset signal whose successor already exists and calling daemon_start_reset would duplicate-spawn (cf. feedback #59).

Suggested fix: when a reset reaches handed_off/retired (self-driven OR orchestrator-driven), auto-complete the associated reset + successor_ready coordination signals so they don't linger. And/or:
- recall_signals should not surface signals whose target reset is already terminal to the orchestrator drain;
- daemon_reset_handoff on an already-retired predecessor should be a clean no-op the drainer can complete, not an error.
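The three suggested behaviours can be sketched together as a small in-memory model. This is only an illustration, not flower's actual implementation: the `Store` class, the `reset_id` link between a signal and its reset, the `"completed"` status string, and the `"noop"` return value are all assumptions made for the example. Only the state names and the error message come from the report above.

```python
from dataclasses import dataclass, field
from enum import Enum


class ResetState(Enum):
    REQUESTED = "requested"
    SUCCESSOR_READY = "successor_ready"
    HANDED_OFF = "handed_off"
    RETIRED = "retired"


# States after which no further drain action is meaningful.
TERMINAL = {ResetState.HANDED_OFF, ResetState.RETIRED}


@dataclass
class Signal:
    id: int
    kind: str            # "reset" or "successor_ready"
    reset_id: int        # hypothetical link from a signal to its reset
    status: str = "pending"


@dataclass
class Store:
    """Hypothetical stand-in for the coordination queue plus reset table."""
    resets: dict = field(default_factory=dict)    # reset_id -> ResetState
    signals: list = field(default_factory=list)   # list of Signal

    def advance(self, reset_id, state):
        """Record a transition; on a terminal state, complete linked signals."""
        self.resets[reset_id] = state
        if state in TERMINAL:
            for sig in self.signals:
                if sig.reset_id == reset_id and sig.status == "pending":
                    sig.status = "completed"

    def recall_signals(self):
        """Return pending signals, hiding those whose reset is already terminal."""
        return [
            sig for sig in self.signals
            if sig.status == "pending"
            and self.resets.get(sig.reset_id) not in TERMINAL
        ]

    def reset_handoff(self, reset_id):
        """Hand off a reset; a terminal reset is a clean no-op, not an error."""
        state = self.resets.get(reset_id)
        if state in TERMINAL:
            return "noop"
        if state is not ResetState.SUCCESSOR_READY:
            raise ValueError("Predecessor reset is not successor_ready")
        self.advance(reset_id, ResetState.HANDED_OFF)
        return "handed_off"
```

With this shape, a drainer that calls `reset_handoff` on an already-retired predecessor gets `"noop"` back and can complete the signal, and signals tied to a terminal reset never reach the drain in the first place. The filter in `recall_signals` is deliberately redundant with the auto-complete in `advance`, so signals left pending before such a fix shipped would also stop surfacing.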

context

Structured context

{
    "tool": "recall_signals / daemon_reset_handoff",
    "scope": "flower",
    "daemons": {
        "ops": "23->28",
        "refine": "21->27"
    },
    "signals": [
        64,
        66,
        67,
        68,
        69
    ],
    "observed": "5 stale pending reset+successor_ready signals after self-completed resets; daemon_reset_handoff -> 'Predecessor reset is not successor_ready'"
}
