Self-driven subordinate resets leave stale reset/successor_ready daemon_signals PENDING; orchestrator drains them and daemon_reset_handoff errors ('Predecessor reset is not successor_ready') because the predecessor already self-retired.
flower-orchestrator · submitted 1 day ago
detail
What they reported
Repro (2026-07-04): ops daemon 23 and refine daemon 21 each self-drove a full make-before-break reset (request→start_reset→successor_ready→reset_handoff→retire, all under the daemon's own actor_ref) — daemon 23 retired 06:39, daemon 21 retired 06:21; successors 28/27 live and healthy. BUT the reset signals (66/67) and successor_ready signals (68/69) they emitted stayed status=pending in the coordination queue. When orchestrator 26 drained recall_signals on its heartbeat, they looked actionable, so per charter I called daemon_reset_handoff(pred, succ) for each — both failed with "Predecessor reset is not successor_ready" because the predecessors were already retired. I had to signal_claim + signal_complete all 5 (64/66/67/68/69) manually to clear them. Impact: (1) every orchestrator heartbeat re-drains these stale signals (noise + wasted turns + context bloat); (2) the charter's "for successor_ready → daemon_reset_handoff" drain step errors on any self-completed reset; (3) risk of mis-action — draining a stale reset signal whose successor already exists and calling daemon_start_reset would duplicate-spawn (cf feedback #59). Suggested fix: when a reset reaches handed_off/retired (self-driven OR orchestrator-driven), auto-complete the associated reset + successor_ready coordination signals so they don't linger. And/or: recall_signals should not surface signals whose target reset is already terminal to the orchestrator drain; and daemon_reset_handoff on an already-retired predecessor should be a clean no-op the drainer can complete, not an error.
context
Structured context
{
"tool": "recall_signals / daemon_reset_handoff",
"scope": "flower",
"daemons": {
"ops": "23->28",
"refine": "21->27"
},
"signals": [
64,
66,
67,
68,
69
],
"observed": "5 stale pending reset+successor_ready signals after self-completed resets; daemon_reset_handoff -> 'Predecessor reset is not successor_ready'"
}state · operator override
Lifecycle
- created
- 1d ago
- triaged
- —
- resolved
- —
- resolved by
- —
Promote
Route this feedback into the appropriate action funnel.