complete draft note flower
epic · Notes for the /roster view: 1) Let's add a 'Go to r...

Daemon self-reset / rolling replacement (palette refresh)

canonical · plan

Spec

markdown

hand-off · dispatch

Dispatch

Auto-dispatch

when it reaches planned

Design-loop

design pass before build

This brief is complete — dispatch is closed.

provenance · append-only

Trace

live
  1. note added 2d ago

    CRITICAL DESIGN CONSTRAINT, discovered during the first live orchestrator handoff (2026-07-03): the successor must NOT be spawned by the predecessor agent.

    When the outgoing orchestrator (969) spawned its successor (996) via agent-level `mcp__solo__spawn_agent`, Solo made 996 a SUB-PROCESS of 969. Consequences: (a) N (the new daemon, here the child) may be unable to close O (the old daemon, here the parent), and (b) closing O may cascade-kill N, leaving a headless fleet. The N-closes-O handshake only works when N is NOT a child of O.

    FIX for #76's automated reset: spawn the successor via the APP's SpawnDaemonBridge → `solo-cli processes spawn` (which creates a TOP-LEVEL Solo-managed process owned by Solo, not a child of the calling agent), NOT via an agent calling `spawn_agent`. The DaemonResetService already references SpawnDaemonBridge, which is good; the constraint is that the LIVE reset must go through that app path, and a self-resetting daemon must NOT spawn its own successor as an agent-child.

    In the manual handoff we mitigated by leaving the predecessor (969) inert (no baton, heartbeat cancelled) and retiring it externally rather than via N-closes-O.

    Add a test/guard + spec note: successor spawn path = SpawnDaemonBridge (top-level), and document the "don't close your own parent / don't spawn your successor as your child" rule in AgentConventions.
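
The test/guard requested in the note above could look roughly like the sketch below. Everything in it is an assumption made for illustration: `ProcessInfo`, `SuccessorSpawnError`, `assert_successor_is_top_level` and `may_close` are hypothetical names, and the model of a process as an id plus an optional parent id is a simplification, not Solo's actual data model or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProcessInfo:
    """Hypothetical view of a Solo-managed process (not Solo's real schema)."""
    process_id: int
    parent_id: Optional[int]  # None is assumed to mean "top-level, owned by Solo"


class SuccessorSpawnError(Exception):
    """Raised when a successor was spawned as somebody's child."""


def assert_successor_is_top_level(predecessor: ProcessInfo,
                                  successor: ProcessInfo) -> None:
    """Guard for the rule "don't spawn your successor as your child"."""
    if successor.parent_id == predecessor.process_id:
        raise SuccessorSpawnError(
            f"successor {successor.process_id} is a child of predecessor "
            f"{predecessor.process_id}; spawn it via the app path instead"
        )
    if successor.parent_id is not None:
        raise SuccessorSpawnError(
            f"successor {successor.process_id} is not a top-level process"
        )


def may_close(closer: ProcessInfo, target: ProcessInfo) -> bool:
    """Guard for the rule "don't close your own parent"."""
    return closer.parent_id != target.process_id
```

With the ids from the handoff described above, a successor `ProcessInfo(996, parent_id=969)` would be rejected against predecessor `ProcessInfo(969, parent_id=None)`, while `ProcessInfo(996, parent_id=None)` would pass and be allowed to close 969.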

    agent · flower-orchestrator
  2. link added 2d ago
    agent · system:commit-trailer
  3. participant joined 2d ago
    system · system:commit-trailer
  4. status change 2d ago
    agent · flower-orchestrator
  5. link added 2d ago
    agent · flower-orchestrator
  6. status change 2d ago
    agent · flower-orchestrator
  7. plan proposed 2d ago

    # Daemon self-reset / rolling replacement (make-before-break)

    Child of #66 (66f). Composes 66c (spawn) + 66d (wind-down/close) + #84 (signals). Design LOCKED in operator session 2026-07-03. **Full scope, not a v1.**

    ## Purpose

    Reset a daemon to a FRESH MCP/tool palette + clean context without a coverage gap. Needed because a daemon's loaded MCP tools / slash commands go stale as the app ships new ones (only negotiated at session start). General primitive for ANY daemon; the platform-orchestrator self-reset is the hard case.

    ## State + baton

    - `reset_state` on `daemon_agents`: `none → resetting → successor_spawned → successor_ready → handed_off → retired` (+ `failed`), with timestamps.
    - **Baton** = single-orchestrator invariant. Implement however is cleanest (a `Setting` `active_orchestrator_daemon_id`, or a `daemon_agents.holds_orchestrator_baton` bool; designer's choice). Exactly ONE orchestrator holds it; **draining signals + merging to master are GATED on holding the baton** (split-brain guard). Transferred atomically at handoff.

    ## Handshake: make-before-break (predecessor P, successor S)

    1. **Trigger** (manual/auto): P sets `reset_state=resetting`, writes its handoff packet (66d resume-safe scratchpad: goal, in-flight work, pending signals, open briefs). **P keeps running.**
    2. **Spawn S** via SpawnDaemonBridge (66c) with a "successor to P + handoff pointer" packet. `reset_state=successor_spawned`.
    3. **S boots fresh** (current palette by construction), checks in, reads the handoff, emits a `successor_ready` signal (#84). `reset_state=successor_ready`.
    4. **Confirm gate**: P sees `successor_ready` + verifies S checked in / live. Do NOT over-verify the palette: a fresh session gets current tools by construction; if it somehow didn't, it's no worse off than P was (operator call).
    5. **Baton pass** (orchestrator case): P flushes in-flight state, atomically transfers the baton to S, `reset_state=handed_off`. Non-orchestrator daemons: no baton, just handoff.
    6. **S closes P** ← the key move (resolves who-closes-the-closer): S, holding the baton + CloseDaemon (66d), closes P via `solo-cli processes stop`. `reset_state=retired`.
    7. **S takes over**: sole orchestrator, fresh palette, continues from the handoff.

    ## Triggers (auto AND manual)

    - **Manual**: a "reset/refresh" button per daemon on /roster (next to poke/retire) → enqueue a `reset` signal.
    - **Auto (palette drift)**: `flower:mcp-tool-drift` detects the daemon's loaded palette is stale vs current → enqueue a `reset` signal.
    - **Auto (context threshold)**: OWNED BY THE SEPARATE SCHEDULER BRIEF (context ≥ threshold → reset after finishing beneficial work). #76 exposes the reset trigger; the scheduler invokes it.

    ## Failure / retry (config-driven)

    - If S doesn't emit `successor_ready` within `reset.successor_boot_timeout` (~2-3 min default): RETRY, immediate on first retry, then backoff. Config: `reset.max_retries` + `reset.backoff_base_seconds`/`reset.backoff_multiplier` (reasonable defaults).
    - After max retries: ABORT. P stays live (never closed), `reset_state=failed→none`, alert operator. No headless gap (P never left).
    - Watchdog (edge): if P dies mid-swap before handoff, S (once live) sees no baton-holder and self-promotes.

    ## Deliverables

    - Migration: `reset_state` + timestamps on daemon_agents; the baton marker.
    - A reset-orchestration service (the make-before-break state machine) built on SpawnDaemonBridge + CloseDaemon + CoordinationQueue.
    - Signals/tools: add `reset` + `successor_ready` signal kinds (extend #84); a `daemon_request_reset` enqueue path + the successor-ready ack; baton transfer.
    - Roster UI: "reset/refresh" button per daemon.
    - AgentConventions: the reset protocol (predecessor: write handoff + manage swap; successor: read handoff + emit successor_ready + close predecessor; baton + drain/merge gating).
    - Config: `reset.*` (boot timeout, max_retries, backoff, defaults).

    ## Guardrails

    - Worktree, sqlite tests, MOCK spawn/close/signals. NEVER perform a real orchestrator swap. Migration written, run on MAIN by the orchestrator. The live self-reset is an operator/orchestrator action on MAIN.

    ## Tests (sqlite)

    State-machine transitions; baton single-holder + drain/merge gating; successor_ready gate; retry/backoff on boot timeout; abort-keeps-predecessor-live; N-closes-O path (mocked close); split-brain guard.

    ## Done =

    Full make-before-break reset (any daemon + orchestrator self-reset via N-closes-O) with baton, auto+manual triggers, retry/backoff, tests green, committed. Live swap + migration are the orchestrator's.
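
The handshake in this plan can be sketched as a small in-memory state machine. This is only an illustration under stated assumptions: the names `Daemon`, `Baton` and `run_handshake` are hypothetical, `reset_state` is assumed to be tracked on the predecessor's row, the set of states allowed to move to `failed` is a guess, and spawn, check-in and close are mocked (in line with the guardrails) rather than calling SpawnDaemonBridge or `solo-cli`. A real implementation would make the baton transfer atomic in the database; here it is a plain attribute assignment.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set

# Transition table following the plan's reset_state chain. Which states may
# move to "failed" is an assumption; the plan only lists "(+ failed)".
TRANSITIONS: Dict[str, Set[str]] = {
    "none": {"resetting"},
    "resetting": {"successor_spawned", "failed"},
    "successor_spawned": {"successor_ready", "failed"},
    "successor_ready": {"handed_off", "failed"},
    "handed_off": {"retired"},
    "retired": set(),
    "failed": {"none"},
}


class InvalidTransition(Exception):
    pass


class BatonRequired(Exception):
    pass


@dataclass
class Daemon:
    daemon_id: int
    reset_state: str = "none"

    def advance(self, new_state: str) -> None:
        """Move to new_state only if the transition table allows it."""
        if new_state not in TRANSITIONS[self.reset_state]:
            raise InvalidTransition(f"{self.reset_state} -> {new_state}")
        self.reset_state = new_state


@dataclass
class Baton:
    holder_id: Optional[int] = None  # exactly one orchestrator holds the baton

    def require(self, daemon_id: int) -> None:
        """Gate for baton-only actions (drain signals, merge, close P)."""
        if self.holder_id != daemon_id:
            raise BatonRequired(f"daemon {daemon_id} does not hold the baton")

    def transfer(self, from_id: int, to_id: int) -> None:
        self.require(from_id)  # only the current holder may pass it on
        self.holder_id = to_id


def run_handshake(p: Daemon, s: Daemon, baton: Baton,
                  successor_checked_in: bool) -> None:
    """Walk predecessor P through the make-before-break steps (all mocked)."""
    p.advance("resetting")            # step 1: P writes handoff, keeps running
    p.advance("successor_spawned")    # step 2: S spawned via the app path
    if not successor_checked_in:      # abort path: P stays live, never closed
        p.advance("failed")
        p.advance("none")
        return
    p.advance("successor_ready")      # steps 3-4: S ready, P confirms
    baton.transfer(p.daemon_id, s.daemon_id)
    p.advance("handed_off")           # step 5: baton pass
    baton.require(s.daemon_id)        # step 6: S must hold the baton to close P
    p.advance("retired")
```

On the happy path, `run_handshake(Daemon(969), Daemon(996), Baton(holder_id=969), True)` leaves the predecessor in `retired` with the baton held by 996; with `successor_checked_in=False` the predecessor returns to `none` and keeps the baton.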

    agent · flower-orchestrator
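
The retry policy in the plan above ("immediate on first retry, then backoff") could translate into a delay schedule like the one below. The function name `retry_delays` is hypothetical, and reading "backoff" as exponential growth of `reset.backoff_base_seconds` by `reset.backoff_multiplier` is an assumption; the plan does not fix the formula or the default values.

```python
from typing import List


def retry_delays(max_retries: int,
                 backoff_base_seconds: float,
                 backoff_multiplier: float) -> List[float]:
    """Seconds to wait before each retry of the successor boot.

    Assumed reading of the plan: the first retry is immediate, and each
    later retry waits base * multiplier ** (retry_index - 1) seconds.
    """
    delays: List[float] = []
    for retry_index in range(max_retries):
        if retry_index == 0:
            delays.append(0.0)  # first retry: immediate
        else:
            delays.append(
                float(backoff_base_seconds * backoff_multiplier ** (retry_index - 1))
            )
    return delays
```

With illustrative values (not defaults from the plan), `retry_delays(4, 30, 2)` returns `[0.0, 30.0, 60.0, 120.0]`. Once the list is exhausted, the plan's abort path applies: the predecessor stays live and the operator is alerted.
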
  8. unblocked 2d ago

    Unblocked — #84 reached complete.

    system · flower-orchestrator
  9. dependency added 2d ago

    Now depends on #84 (Orchestrator coordination queue (DB substrate) — wire poke (66b) + fix feedback→orchestrator routing (#49)).

    agent · flower-orchestrator
  10. unblocked 2d ago

    Unblocked — #73 reached complete.

    system · flower-orchestrator
  11. dependency added 3d ago

    Now depends on #73 (Roster: graceful wind-down — 'Retire' / 'Turn out the lights').

    agent · flower-orchestrator
  12. dependency added 3d ago

    Now depends on #72 (Roster: ensure daemons — 'Water the garden' / 'And there was light').

    agent · flower-orchestrator
  13. parent set 3d ago

    Grouped under epic #66.

    agent · flower-orchestrator
  14. note added 3d ago

    Child of #66 (new, from the self-reset discussion, 2026-07-02). Composes 66c (spawn) + 66d (wind-down) into a make-before-break self-replacement: spawn the replacement seeded with the handoff packet → confirm it is live + checked in → hand off → close self.

    The fresh session boots with the CURRENT MCP + slash-command palette, which is the clean fix for "a new MCP tool/command shipped but running daemons can't see it until restart" (MCP + commands negotiate only at session start). Enabled by #67 (the replacement self-resolves identity from env + context from ingested usage → correct roster correlation with zero reconfig).

    Trigger: manual "refresh" button or auto via flower:mcp-tool-drift (hashes the tool set; a daemon compares its loaded palette to current).

    HARD CASE: the platform orchestrator resetting ITSELF. This needs a successor↔predecessor handshake (old spawns successor with "take over from me; signal when live + checked in" → successor tells predecessor to close), the same "orchestrator closes last" invariant with a replacement spliced in.

    Deps: 66c + 66d. L.
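
The drift check mentioned in this note (hash the tool set, compare loaded vs current) might look like the sketch below. It is not the actual `flower:mcp-tool-drift` implementation: the function names, the choice of SHA-256, and hashing only the sorted tool names (rather than, say, full tool schemas) are all assumptions.

```python
import hashlib
from typing import Iterable


def palette_hash(tool_names: Iterable[str]) -> str:
    """Order-independent fingerprint of a set of tool names (assumed scheme)."""
    canonical = "\n".join(sorted(set(tool_names)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def palette_is_stale(loaded: Iterable[str], current: Iterable[str]) -> bool:
    """True when the daemon's loaded palette differs from the current one."""
    return palette_hash(loaded) != palette_hash(current)
```

For example, a daemon that loaded only `recall_brief` would be flagged as stale once `daemon_request_reset` ships, while a mere reordering of the same tool names would not trigger a reset.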

    agent · flower-orchestrator
  15. participant joined 3d ago
    system · flower-orchestrator

epic · dependencies

Relationships

depends on

agents · waves

Participants

  • flower-orchestrator participant · active
  • system:commit-trailer participant · active

trace · graph

Links

  • Commit #1615 execution
  • Scratchpad #346 execution

scope

Projects

  • flower · primary

dogfood · read-only

Agent’s-eye view

The literal recall_brief payload an agent gets — same service path as the MCP tool.