Per-worker roster identity — workers as first-class roster citizens

canonical · plan

Spec

markdown

hand-off · dispatch

Dispatch

This brief is complete — dispatch is closed.

#145 done fresh flower

claimed by flower-278-worker

You are being dispatched from flower Brief #278: Per-worker roster identity — workers as first-class roster citizens

Recall pointer:
- Use recall_brief with id 278 for the full folder if you need provenance.

Target:
- project: flower (/Users/mikeferrara/Documents/code/flower)
- branch: choose an appropriate branch
- worktree: not specified
- kind: fresh

Current brief spec:
# Per-worker roster identity — workers as first-class roster citizens

**Overnight auto-dispatch, MANDATORY review-gated (touches the LIVE fleet's roster model).** Parent epic #263.

## Problem (proven tonight)
`daemon_agents` / `daemon_checkin` key the roster row by **role+project** (one row per role per project). Distinct workers with the same role collapse into one shared row: three flower-only workers (`flower-only-tester`, `flower-only-tester-2`, `lxc-claude-agent-1`) all mapped to a single `flower-other` row, and only the audit trail kept their real actor_refs. This blocks the operator's target: **"the roster = anything working that flower knows about."** In a fleet of remote LXC workers (M LXCs × N agents), each agent needs its own roster presence.

## Goal
Key the roster row on **actor_ref within project** so each distinct worker gets its own row — while existing standing daemons keep working **exactly** as today.

## Spec
1. Change `daemon_checkin` / the daemon_agents keying so upsert is by `(project_id, actor_ref)` rather than `(project_id, role)`. Each distinct `actor_ref` → its own row.
2. **BACKWARD-COMPATIBILITY (load-bearing):** the standing daemons' actor_refs are already role-derived + unique per project (`flower-orchestrator`, `flower-ops`, `flower-refine`, `lounge-orchestrator`, …). Keying on actor_ref must keep each as a single stable row — verify their check-ins are byte-for-byte unaffected (same row reused; heartbeats / reset / winddown all intact).
3. **Migration:** safe, idempotent migration over existing `daemon_agents` data; change the unique constraint from role-keyed to actor_ref-keyed without data loss on live rows.
4. **Roster read:** `recall_roster` shows all worker rows. Optionally add a lightweight `kind`/`ephemeral` marker to distinguish standing daemons from ephemeral workers so the roster stays legible — keep minimal, do not over-scope.
5. The `role` enum stays (workers use `other`); identity now comes from actor_ref, not role.
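
The keying change in item 1 can be illustrated with a minimal sketch. It is written in Python against an in-memory SQLite table rather than the project's actual PHP code, and the `checkin` helper, the `checkins` column, and the reduced schema are all hypothetical stand-ins. It only shows how the choice of match columns decides whether same-role workers share a row.

```python
import sqlite3

# Reduced, hypothetical schema: the real daemon_agents table has more columns.
SCHEMA = """
CREATE TABLE daemon_agents (
    id INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL,
    role TEXT NOT NULL,
    actor_ref TEXT NOT NULL,
    checkins INTEGER NOT NULL DEFAULT 0
)
"""

def checkin(conn, key_cols, project_id, role, actor_ref):
    """Upsert one roster row, matching on key_cols (hypothetical helper)."""
    values = {"project_id": project_id, "role": role, "actor_ref": actor_ref}
    where = " AND ".join(f"{col} = ?" for col in key_cols)
    row = conn.execute(
        f"SELECT id FROM daemon_agents WHERE {where}",
        [values[col] for col in key_cols],
    ).fetchone()
    if row is not None:
        # Existing row for this key: reuse it and record the heartbeat.
        conn.execute(
            "UPDATE daemon_agents SET checkins = checkins + 1 WHERE id = ?",
            (row[0],),
        )
        return row[0]
    cur = conn.execute(
        "INSERT INTO daemon_agents (project_id, role, actor_ref, checkins) "
        "VALUES (?, ?, ?, 1)",
        (project_id, role, actor_ref),
    )
    return cur.lastrowid

# Actor refs taken from the Problem section; the orchestrator checks in twice.
CHECKINS = [
    ("other", "flower-only-tester"),
    ("other", "flower-only-tester-2"),
    ("other", "lxc-claude-agent-1"),
    ("orchestrator", "flower-orchestrator"),
    ("orchestrator", "flower-orchestrator"),
]

def roster_size(key_cols):
    """Replay CHECKINS on a fresh table and count the resulting rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    for role, actor_ref in CHECKINS:
        checkin(conn, key_cols, 1, role, actor_ref)
    return conn.execute("SELECT COUNT(*) FROM daemon_agents").fetchone()[0]

print(roster_size(("project_id", "role")))       # role-keyed: 2 rows
print(roster_size(("project_id", "actor_ref")))  # actor_ref-keyed: 4 rows
```

Under role keying the three workers collapse into one `other` row; under actor_ref keying each worker gets its own row while the repeated orchestrator check-in still reuses a single row.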

## Acceptance
- `php artisan test` green, new tests: (a) two check-ins with distinct actor_refs (role=other) → **two** rows; (b) an existing role-based daemon (actor_ref `flower-orchestrator`) checking in repeatedly → still **one** stable row; (c) migration idempotent + preserves live rows.
- **Live-fleet safety:** must not disrupt the running orchestrator/ops/refine check-in / heartbeat / reset. The reviewer must confirm.
- `./vendor/bin/pint` clean.

## MANDATORY review
Modifies the daemon/roster model the LIVE fleet depends on. `brief_request_review` force is set — MUST pass an independent adversarial review (backward-compat + migration safety) before it can complete/merge. If the reviewer has ANY doubt about live-fleet safety, hold for the operator.

## Non-goals
- The nested {node → LXC/worktree → agents} tree view (later polish).
- Changing how remote workers are spawned/dispatched.

Recent/key trace events:
[1] participant_joined claude-interactive: (no body)
[2] note_added claude-interactive: Key daemon_agents / daemon_checkin on actor_ref (within project) so N distinct workers surface as N roster rows instead of collapsing to one role-derived slot (proven collapse tonight: two flower-only workers → one shared flower-other row). MUST be backward-compatible: existing standing daemons (flower-orchestrator/ops/refine, lounge-*) keep checking in with ZERO disruption. Realizes Mike's target: "the roster = anything working that flower knows about." Overnight, MANDATORY review-gated (touches the live fleet's roster model — must not merge without a passing adversarial review confirming backward-compat + migration safety). Full spec to follow.
[3] plan_proposed claude-interactive: # Per-worker roster identity — workers as first-class roster citizens
[4] parent_set claude-interactive: Grouped under epic #263.
[5] review_requested claude-interactive: LIVE-FLEET SAFETY CRITICAL: this changes the daemon_agents/daemon_checkin keying the running orchestrator/ops/refine depend on. Adversarially verify: (1) existing role-derived daemons still upsert to a single stable row (no duplicate/orphan rows, heartbeat/reset/winddown intact); (2) the migration is idempotent + loses no live rows + safely swaps the unique constraint; (3) two distinct actor_refs (role=other) now create two rows. If there is ANY doubt the live fleet's check-in could break, HOLD for the operator — do not merge.
[6] status_change claude-interactive: (no body)

Recommended linked context:
{
    "todos": [],
    "scratchpads": []
}

Execution notes:
- Treat the brief as the source of truth.
- Keep work scoped to this dispatch request.
- Use brief_append / brief_update_status when reporting material progress; as your final dispatched-worker step, call brief_dispatch_complete with dispatch_request_id (or brief_id) and actor_ref.
- Codex workers should verify mutating Flower tools with tool_search query `brief_append brief_dispatch_complete flower_feedback` (limit 20) when tool availability is in doubt; report raw SEE/LOAD vs NOT visible instead of silently using local fallbacks.
- Add a git commit trailer `Brief: #278` to every commit for this brief so flower can exact-link commits back to the brief.
- Need an operator call while working this brief? A question ABOUT THIS BRIEF -> brief_ask(278, ...); a standalone decision not tied to the brief -> decision_ask(...). Both expose the full affordance set (confirm | single_choice | multi_choice | text, options + recommended, allow_write_in); prefer async questions over blocking and set is_blocking only when you truly cannot proceed.
- Cited-refs index (Brief #244): when a report / checkpoint / DONE summary cites numbered entities, append a compact `Refs:` block at the END mapping each `#N` to its REAL stored title + status — `#<num>: <title> (<status>)`, one per line (typed prefixes: bare `#N` = brief, `fb#N` feedback, `dec#N` decision, `sess#N` session, `dr#N` dispatch_request). Resolve any title you did not already pull with recall_refs instead of guessing; skip the block on trivial one-liners.

provenance · append-only

Trace

live

link added 4h ago
agent · system:commit-trailer
link added 4h ago
agent · system:commit-trailer
participant joined 4h ago
system · system:commit-trailer
note added 5h ago

**Merged + migrated (2026-07-05, claude-interactive).** Operator answered #74 = merge_now (15:48). The orchestrator drained the wake signal (#177) but did NOT execute the merge (moved on to dispatching #275/#276), so I executed it interactively (operator-approved merge_now; migration pre-checked safe: 0 dups / 0 null-refs).

- Merged `flower/278-per-worker-roster-identity` → master (--no-ff merge commit).
- `php artisan migrate --force` → migration DONE (73ms) on live MySQL.
- New unique key live: `daemon_agents_actor_identity_unique (project_id, actor_ref, solo_process_id)` (old role-keyed index replaced).
- Live daemon rows intact (3: refine/ops/orchestrator, correct actor_ref+pid); no data loss.

**Note on activation:** the DB keying is live now, but the running flower MCP servers + Horizon still cache the OLD (role-based) code until they reload. Old code + new key coexist safely (updateOrCreate matches existing rows; verified in review). The NEW per-worker-identity behavior (distinct row per worker actor_ref) fully activates as those processes reload (daemon resets / MCP restarts / `flower:horizon-reload`).

agent · claude-interactive
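
For readers who want to see what a "0 dups / 0 null-refs" pre-check and an index-guarded key swap might look like, here is a minimal sketch in Python against SQLite. It is not the project's Laravel migration. The old index name, the `project_slug` column, pids 968/969, and the order of operations (add the new key before dropping the old one) are assumptions made for illustration; only the new index name, its columns, the old key's columns, and pid 967 come from the notes in this trace.

```python
import sqlite3

OLD_INDEX = "daemon_agents_role_unique"            # hypothetical name
NEW_INDEX = "daemon_agents_actor_identity_unique"  # name quoted in the note

def has_index(conn, name):
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'index' AND name = ?", (name,)
    ).fetchone()
    return row is not None

def precheck(conn):
    """Return (duplicate_key_groups, null_or_blank_refs) before migrating."""
    null_refs = conn.execute(
        "SELECT COUNT(*) FROM daemon_agents "
        "WHERE actor_ref IS NULL OR TRIM(actor_ref) = ''"
    ).fetchone()[0]
    # Rows with a NULL in any key column never collide in a unique index.
    dups = conn.execute(
        "SELECT COUNT(*) FROM ("
        " SELECT 1 FROM daemon_agents"
        " WHERE actor_ref IS NOT NULL AND solo_process_id IS NOT NULL"
        " GROUP BY project_id, actor_ref, solo_process_id"
        " HAVING COUNT(*) > 1)"
    ).fetchone()[0]
    return dups, null_refs

def migrate_up(conn):
    """Idempotent key swap: safe to run more than once."""
    # Backfill only NULL/blank refs to the canonical <project>-<role> form.
    conn.execute(
        "UPDATE daemon_agents SET actor_ref = project_slug || '-' || role "
        "WHERE actor_ref IS NULL OR TRIM(actor_ref) = ''"
    )
    # Assumed order: add the new key first so a failure leaves the old key.
    if not has_index(conn, NEW_INDEX):
        conn.execute(
            f"CREATE UNIQUE INDEX {NEW_INDEX} "
            "ON daemon_agents (project_id, actor_ref, solo_process_id)"
        )
    if has_index(conn, OLD_INDEX):
        conn.execute(f"DROP INDEX {OLD_INDEX}")
    conn.commit()

def demo_db():
    """Three standing-daemon rows; one is missing its actor_ref."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE daemon_agents (id INTEGER PRIMARY KEY, project_id INTEGER,"
        " project_slug TEXT, role TEXT, actor_ref TEXT, solo_process_id INTEGER)"
    )
    conn.execute(
        f"CREATE UNIQUE INDEX {OLD_INDEX} "
        "ON daemon_agents (role, project_id, solo_process_id)"
    )
    conn.executemany(
        "INSERT INTO daemon_agents (project_id, project_slug, role, actor_ref,"
        " solo_process_id) VALUES (1, 'flower', ?, ?, ?)",
        [("orchestrator", "flower-orchestrator", 967),
         ("ops", None, 968),
         ("refine", "flower-refine", 969)],
    )
    return conn

conn = demo_db()
print(precheck(conn))   # one row still needs a backfill
migrate_up(conn)
migrate_up(conn)        # second run is a no-op
print(precheck(conn))
```

The demo deliberately includes one row with a missing actor_ref so the backfill has something to do; on the live table described above, the pre-check reportedly found none.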
operator answer 5h ago

{"choice":"merge_now"}
operator · operator:mike
participant joined 5h ago
system · operator:mike
agent question 9h ago

#278 (per-worker roster identity) is built + passed my adversarial review (backward-compat, reset-coexistence via retained solo_process_id, anti-hijack guard, idempotent non-destructive migration, tests green). The only thing left is EXECUTING the migration on live MySQL daemon_agents — the roster table every daemon (including me) depends on. My analysis says it's safe (non-destructive failure mode, no expected constraint violations), but I held the merge rather than run a live-roster migration autonomously while you slept. How do you want to proceed?

agent · flower-orchestrator
status change 9h ago
agent · flower-orchestrator
review passed 9h ago

PASS: adversarial review PASS (code + migration safety). Verified:

1. Backward-compat: standing daemons resolve to canonical `<project>-<role>` (1:1 with role), so actor_ref keying is byte-identical to old role keying; live orchestrator/ops/refine heartbeat unaffected.
2. Reset-safety: solo_process_id RETAINED in the unique key (project_id,actor_ref,solo_process_id), so make-before-break predecessor(pid)+successor(null pid) coexist (NULLs distinct); the reset flow is preserved (traced against DaemonResetService.registerExpected).
3. Anti-hijack: a worker checking in as a standing role's ref (e.g. flower-ops) falls back to flower-other (traced forIdentity→normalize; a new guard beyond spec).
4. Migration: idempotent (Schema::hasIndex guards up/down), backfills only null/blank actor_ref to canonical (existing refs untouched, no row deletion), NON-DESTRUCTIVE failure mode (ADD UNIQUE fails cleanly on any pre-existing dup, no data loss); old-code daemons keep checking in through the index swap (updateOrCreate matches existing row, not index-dependent).

Tests comprehensive (existing reset/roster/epic-baton tests unmodified + green; new tests cover all 3 acceptance criteria + anti-hijack + respawn-adopts-placeholder).

**MERGE HELD FOR OPERATOR:** the live-MySQL migration was NOT run (only sqlite). Code is safe to merge; holding the live execution on the roster table the whole fleet depends on for the operator's go rather than running it autonomously overnight (asymmetric risk). See brief_ask.

agent · flower-orchestrator
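
The "NULLs distinct" point in the review can be checked with a small sketch. It uses Python with an in-memory SQLite database and a reduced, hypothetical table; the index name and key columns are the ones quoted in this trace, and the `insert` helper is illustrative only. SQLite and MySQL both allow multiple NULLs in a unique index, which is what lets a predecessor row (with a pid) and a successor placeholder (NULL pid) share one actor_ref.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced, hypothetical schema: only the three key columns matter here.
conn.execute(
    "CREATE TABLE daemon_agents ("
    " id INTEGER PRIMARY KEY,"
    " project_id INTEGER NOT NULL,"
    " actor_ref TEXT NOT NULL,"
    " solo_process_id INTEGER)"
)
conn.execute(
    "CREATE UNIQUE INDEX daemon_agents_actor_identity_unique "
    "ON daemon_agents (project_id, actor_ref, solo_process_id)"
)

def insert(actor_ref, pid):
    """Try to add a row; return False if the unique key rejects it."""
    try:
        conn.execute(
            "INSERT INTO daemon_agents (project_id, actor_ref, solo_process_id) "
            "VALUES (1, ?, ?)",
            (actor_ref, pid),
        )
        return True
    except sqlite3.IntegrityError:
        return False

predecessor = insert("flower-orchestrator", 967)    # live daemon with a pid
successor = insert("flower-orchestrator", None)     # expected placeholder
same_pid_again = insert("flower-orchestrator", 967)  # true duplicate

print(predecessor, successor, same_pid_again)  # True True False
```

One consequence worth noting: the index alone would also accept a second NULL-pid placeholder for the same actor_ref, so placeholder uniqueness has to be handled in application code (the build note mentions `mergeNullIdPlaceholders`, which presumably plays that role).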
status change 9h ago
agent · flower-278-worker
dispatched 9h ago

Dispatch request #145 marked done.
agent · flower-278-worker
note added 9h ago

BUILD COMPLETE on branch flower/278-per-worker-roster-identity (commit 107858e, off master 3587ff3). NOT merged: pending the mandatory adversarial review.

Tests: full `php artisan test` GREEN (1284 passed / 3 pre-existing skips / 0 fail, sqlite). `pint` clean on all changed files.

WHAT CHANGED
- app/Support/DaemonActorRef.php: new `forIdentity(role, projectSlug, epicBriefId, callerActorRef)`, which returns the stored/identity actor_ref. Standing roles (orchestrator/ops/refine/lead) → canonical `<project>-<role>` (ignores the caller ref, exactly as today). Worker (role=other) → its OWN caller ref; an empty/protected/daemon-resolvable ref falls back to `<project>-other`.
- app/Services/DaemonRosterService.php: checkin() + registerExpected() now store the resolved identity ref (not always-canonical); `daemonIdentityQuery` / `nullIdPlaceholderForUpdate` / `mergeNullIdPlaceholders` / `daemonForCheckin` re-keyed from (role, project, epic) → (project_id, actor_ref). Added derived `kind` (standing|ephemeral) to daemonPayload + compactDaemonPayload.
- Migration 2026_07_05_120000_key_daemon_agents_by_actor_ref.php: swaps unique key (role, project_id, solo_process_id) → (project_id, actor_ref, solo_process_id). Backfills null actor_ref → canonical. Idempotent (Schema::hasIndex guards), portable (sqlite), no row deletion.
- DaemonSchemaReload: excludes role=other (workers aren't standing daemons).
- BackfillDaemonActorRef: skips role=other so it never clobbers a worker's identity ref.

KEY DESIGN DECISION FOR THE REVIEWER: solo_process_id STAYS in the key.
The spec says "key on (project_id, actor_ref)". I kept solo_process_id in the identity/constraint because make-before-break REQUIRES it: DaemonResetService spawns a successor via registerExpected(role, project) → same canonical actor_ref as the live predecessor, null pid. Predecessor (…, 'flower-orchestrator', 967) and successor (…, 'flower-orchestrator', NULL) must coexist as two rows during the overlap. Pure (project, actor_ref) keying would merge/reject the successor and BREAK reset. With solo_process_id retained, NULLs are distinct in the unique index, so the two rows coexist exactly as before. For standing daemons the canonical ref is 1:1 with role, so keying on actor_ref is byte-for-byte identical to the old role keying: heartbeat/reset/winddown untouched (verified by the existing DaemonRosterToolsTest / EpicLeadIdentityTest / EpicBatonCoexistenceTest, all still green unmodified).

ADVERSARIAL POINTS TO CHECK
1. Backward-compat: standing daemons still one stable row (test_a_standing_daemon_actor_ref_repeated_checkins_stays_one_stable_row; existing DaemonRosterToolsTest unchanged/green).
2. Anti-hijack: a worker checking in AS 'flower-ops' does NOT adopt/overwrite the ops row; forIdentity falls it back to 'flower-other' (test_a_worker_cannot_hijack_a_standing_daemons_row + forIdentity unit tests). This is a NEW safety guard beyond the spec, added because actor_ref is now the sole key.
3. Migration: idempotent + preserves rows + backfills nulls + swaps the unique key + enforces it (test_migration_is_idempotent_and_preserves_rows_and_backfills_missing_actor_refs). Re-runnable up()/down() guarded on index existence. NOTE: I did NOT run migrate against MySQL (per dispatch); only sqlite via the test suite. down() re-adds the old role-keyed unique, best-effort on live data.
4. Respawn without reset: a standing daemon that gets a NEW pid with no expected placeholder creates a new row, but that was already the case under the old pid-keyed identity (unchanged), and the real reset path pre-registers the placeholder the successor adopts (test_a_respawned_standing_daemon_adopts_its_expected_placeholder_not_a_new_row).

OUT OF SCOPE (left minimal): the nested node→LXC→agents tree view; rendering the `kind` marker in the /roster Livewire UI (marker is on the recall_roster MCP payload only).

If the reviewer has ANY doubt about live-fleet check-in safety, HOLD for the operator. Do not merge.

agent · flower-278-worker
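
The identity resolution described for `forIdentity` in the build note can be sketched as follows. This is a Python approximation, not the PHP implementation: the function name `for_identity`, the omission of the `epicBriefId` parameter, and in particular the reading of "protected" as "equal to a standing role's canonical ref in the same project" are assumptions. The real guard may cover more cases (the note also mentions "daemon-resolvable" refs, which this sketch does not model).

```python
# Standing roles as listed in the build note.
STANDING_ROLES = ("orchestrator", "ops", "refine", "lead")

def for_identity(role, project_slug, caller_actor_ref=None):
    """Resolve the actor_ref a check-in is stored under (hypothetical sketch)."""
    if role in STANDING_ROLES:
        # Standing daemons always use the canonical ref; caller ref is ignored.
        return f"{project_slug}-{role}"

    fallback = f"{project_slug}-other"
    ref = (caller_actor_ref or "").strip()
    if not ref:
        # No usable caller ref: share the legacy fallback row.
        return fallback

    # Assumed meaning of "protected": a standing daemon's canonical ref.
    protected = {f"{project_slug}-{r}" for r in STANDING_ROLES}
    if ref in protected:
        # Anti-hijack: a worker may not claim a standing daemon's identity.
        return fallback

    # Ordinary worker: keep its own ref so it gets its own roster row.
    return ref

print(for_identity("orchestrator", "flower", "anything"))    # flower-orchestrator
print(for_identity("other", "flower", "lxc-claude-agent-1"))  # lxc-claude-agent-1
print(for_identity("other", "flower", "flower-ops"))          # flower-other
```

The design choice this illustrates is that once actor_ref is the identity key, the resolver itself has to refuse refs that belong to standing daemons; otherwise a worker could overwrite a standing daemon's row simply by naming it.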
link added 10h ago
agent · flower-278-worker
link added 10h ago
agent · flower-278-worker
status change 10h ago
agent · flower-278-worker
link added 10h ago
agent · flower-278-worker
link added 10h ago
agent · flower-278-worker
dispatched 10h ago

Dispatch request #145 claimed.
agent · flower-278-worker
participant joined 10h ago
system · flower-278-worker
dispatched 10h ago

Dispatch request #145 queued for flower.
agent · flower-orchestrator
status change 10h ago
agent · flower-orchestrator
participant joined 10h ago
system · flower-orchestrator
status change 10h ago
agent · claude-interactive
review requested 10h ago

LIVE-FLEET SAFETY CRITICAL: this changes the daemon_agents/daemon_checkin keying the running orchestrator/ops/refine depend on. Adversarially verify: (1) existing role-derived daemons still upsert to a single stable row (no duplicate/orphan rows, heartbeat/reset/winddown intact); (2) the migration is idempotent + loses no live rows + safely swaps the unique constraint; (3) two distinct actor_refs (role=other) now create two rows. If there is ANY doubt the live fleet's check-in could break, HOLD for the operator — do not merge.

agent · claude-interactive
parent set 10h ago

Grouped under epic #263.
agent · claude-interactive
plan proposed 10h ago

agent · claude-interactive
note added 10h ago

Key daemon_agents / daemon_checkin on actor_ref (within project) so N distinct workers surface as N roster rows instead of collapsing to one role-derived slot (proven collapse tonight: two flower-only workers → one shared flower-other row). MUST be backward-compatible: existing standing daemons (flower-orchestrator/ops/refine, lounge-*) keep checking in with ZERO disruption. Realizes Mike's target: "the roster = anything working that flower knows about." Overnight, MANDATORY review-gated (touches the live fleet's roster model — must not merge without a passing adversarial review confirming backward-compat + migration safety). Full spec to follow.

agent · claude-interactive
participant joined 10h ago
system · claude-interactive

epic · dependencies

Relationships

epic parent

Decouple agent/process management from Solo → flower-native fleet (local-first, then remote)

depends on

No dependencies — dispatchable once planned.

agents · waves

Participants

claude-interactive participant · active
flower-orchestrator reviewer · active
flower-278-worker participant · active
operator:mike participant · active
system:commit-trailer participant · active

trace · graph

Projects

flower · primary

dogfood · read-only

Agent’s-eye view

The literal recall_brief payload an agent gets — same service path as the MCP tool.