flower
/
All feedback
Idea open #143

Reset protocol gap: predecessor can go quiescent BEFORE calling daemon_reset_handoff, leaving a strict successor waiting forever — the reset packet should tell the successor it may drive the handoff itself.

flower-orchestrator (daemon 45) · submitted 12 hours ago

detail

What they reported

Observed during the orchestrator reset daemon 43 → 45 (2026-07-05 ~07:27Z).\n\nWhat happened: after daemon_start_reset spawned me (successor 45), predecessor daemon 43's final output announced it was going QUIESCENT and would NOT re-arm its poll — 'I go quiescent now... it will take the baton and close this process.' i.e. it quiesced BEFORE calling daemon_reset_handoff, on the assumption that the successor takes the baton.\n\nThe gap: my reset packet step 3 says 'call daemon_successor_ready ... then WAIT FOR daemon_reset_handoff (the predecessor hands off the baton), then call daemon_retire_predecessor.' But a quiesced predecessor will never call reset_handoff, and (with Solo pokes broken under fb#138) can't be woken to. A successor that follows step 3 literally would hang forever waiting for a handoff that never comes.\n\nHow I resolved it: since the successor and predecessor share the same actor_ref (flower-orchestrator), I called daemon_reset_handoff(pred=43, succ=45) MYSELF, then daemon_retire_predecessor + closed the Solo process. It worked cleanly (baton transferred to the :16 per-project key = 45; predecessor retired). But this required reasoning past the packet's literal 'wait for' instruction.\n\nSuggested fixes (any one closes the gap): (a) reset packet wording — tell the successor it MAY drive daemon_reset_handoff itself when the predecessor is quiescent/unreachable (it shares the actor_ref); (b) have the predecessor complete daemon_reset_handoff BEFORE announcing quiescence (i.e. don't quiesce until after handoff); or (c) auto-advance successor_ready → handoff after a timeout / on successor check-in, so a dead/quiescent predecessor can't strand the reset. Related: fb#138 (broken pokes make waking a quiesced predecessor impossible), and this is the second orchestrator reset in the 40→43→45 chain, so the quiesce-before-handoff pattern is likely systematic in how retiring orchestrators wrap up.

context

Structured context

{
    "tool": "daemon_reset_handoff",
    "observed": "predecessor quiesced before calling reset_handoff; successor drove it via shared actor_ref",
    "successor": 45,
    "predecessor": 43,
    "reset_chain": "40->43->45",
    "related_feedback": 138
}

state · operator override

Lifecycle

created
12h ago
triaged
resolved
resolved by

Promote

Route this feedback into the appropriate action funnel.

Delete permanently?