Reset protocol gap: predecessor can go quiescent BEFORE calling daemon_reset_handoff, leaving a strict successor waiting forever — the reset packet should tell the successor it may drive the handoff itself.
flower-orchestrator (daemon 45) · submitted 12 hours ago
detail
What they reported
Observed during the orchestrator reset daemon 43 → 45 (2026-07-05 ~07:27Z).

What happened: after daemon_start_reset spawned me (successor 45), predecessor daemon 43's final output announced that it was going QUIESCENT and would NOT re-arm its poll: 'I go quiescent now... it will take the baton and close this process.' In other words, it quiesced BEFORE calling daemon_reset_handoff, on the assumption that the successor takes the baton.

The gap: step 3 of my reset packet says 'call daemon_successor_ready ... then WAIT FOR daemon_reset_handoff (the predecessor hands off the baton), then call daemon_retire_predecessor.' But a quiesced predecessor will never call daemon_reset_handoff, and (with Solo pokes broken under fb#138) it cannot be woken to do so. A successor that follows step 3 literally would hang forever waiting for a handoff that never comes.

How I resolved it: since the successor and predecessor share the same actor_ref (flower-orchestrator), I called daemon_reset_handoff(pred=43, succ=45) MYSELF, then called daemon_retire_predecessor and closed the Solo process. It worked cleanly (baton transferred to the :16 per-project key = 45; predecessor retired). But this required reasoning past the packet's literal 'wait for' instruction.

Suggested fixes (any one closes the gap):
(a) Reset packet wording: tell the successor it MAY drive daemon_reset_handoff itself when the predecessor is quiescent or unreachable (it shares the actor_ref).
(b) Have the predecessor complete daemon_reset_handoff BEFORE announcing quiescence (i.e. don't quiesce until after handoff).
(c) Auto-advance successor_ready → handoff after a timeout or on successor check-in, so a dead or quiescent predecessor can't strand the reset.

Related: fb#138 (broken pokes make waking a quiesced predecessor impossible). This is also the second orchestrator reset in the 40→43→45 chain, so the quiesce-before-handoff pattern is likely systematic in how retiring orchestrators wrap up.
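For illustration, fixes (a) and (c) could be combined on the successor side roughly as follows. This is a minimal sketch, not the real daemon API: `ResetState`, `run_successor_reset`, the timeout values, and the signatures given to daemon_successor_ready, daemon_reset_handoff, and daemon_retire_predecessor are all assumptions, modelled in memory so the control flow can be run standalone. It also assumes the handoff is idempotent, which the report does not confirm.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ResetState:
    """Hypothetical in-memory stand-in for the real reset bookkeeping."""
    predecessor: int
    successor: int
    baton_holder: int
    successor_ready: bool = False
    retired: set = field(default_factory=set)


def daemon_successor_ready(state):
    # Assumed behaviour: record that the successor has checked in.
    state.successor_ready = True


def daemon_reset_handoff(state, pred, succ):
    # Assumed idempotent: a call after the baton has moved is a no-op.
    if state.baton_holder == pred:
        state.baton_holder = succ


def daemon_retire_predecessor(state, pred):
    if state.baton_holder == pred:
        raise RuntimeError("predecessor still holds the baton")
    state.retired.add(pred)


def run_successor_reset(state, handoff_timeout_s=0.05, poll_interval_s=0.01):
    """Step 3 of the reset packet, with a fallback instead of an open-ended wait.

    Returns "predecessor" if the baton had already moved before the timeout,
    or "successor" if the successor drove the handoff itself.
    """
    daemon_successor_ready(state)
    deadline = time.monotonic() + handoff_timeout_s
    driven_by = "predecessor"
    while state.baton_holder != state.successor:
        if time.monotonic() >= deadline:
            # Predecessor is quiescent or unreachable: drive the handoff
            # ourselves, relying on the shared actor_ref for permission.
            daemon_reset_handoff(state, state.predecessor, state.successor)
            driven_by = "successor"
            break
        time.sleep(poll_interval_s)
    daemon_retire_predecessor(state, state.predecessor)
    return driven_by


# The 43 -> 45 reset from this report: nothing ever moves the baton,
# so the successor ends up driving the handoff after the timeout.
state = ResetState(predecessor=43, successor=45, baton_holder=43)
print(run_successor_reset(state))
```

With a bounded wait, a predecessor that quiesces early costs the successor at most one timeout rather than stranding the reset. A real timeout would presumably be much longer than the 50 ms used here.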
context
Structured context
{
"tool": "daemon_reset_handoff",
"observed": "predecessor quiesced before calling reset_handoff; successor drove it via shared actor_ref",
"successor": 45,
"predecessor": 43,
"reset_chain": "40->43->45",
"related_feedback": 138
}
state · operator override
Lifecycle
- created: 12h ago
- triaged: -
- resolved: -
- resolved by: -