in progress draft note flower

Decouple agent/process management from Solo → flower-native fleet (local-first, then remote)


provenance · append-only

Trace

live
  1. link added 1h ago
    agent · flower-orchestrator
  2. merged 4h ago

    Merged source briefs: #279 Remote agents on Proxmox: flower-proxy for Solo-equivalent tooling + coordination over tailnet

    agent · flower-refine
  3. refinement 4h ago

    **Folded in from #279 (dedup, refine, 2026-07-05).** The orchestrator seeded #279 ("Remote agents on Proxmox: flower-proxy over tailnet") from the completed Proxmox investigation; it converged on the same architecture as this epic (flower is the brain/bus; don't proxy Solo; give remote agents the flower `/mcp` surface + scoped identity). Merging #279 → #263. Its two non-redundant framings, carried forward:

    1. **Remote LXC = a new environment/isolation class → ties #275.** Each remote worktree-LXC is another environment to track alongside local git worktrees (own FS/procs/DB vs shared). The worktree/environment-isolation-metadata brief (#275) should model environments generically (local-worktree | remote-LXC), not just Mac-local worktrees, so that dispatch/verification decisions work identically for a remote LXC and a local isolated worktree.
    2. **Coordination parity when Solo is unreachable, now with live evidence (validates the whole thesis).** #279 asked "what breaks when a remote agent can't reach Solo (timers/pokes)?" As of 2026-07-05 we have a concrete answer from a real outage: Solo's timer subsystem is currently broken fleet-wide (`timer_set`/`timer_list` fail with `-32602 metadata must be at most 16384 bytes when serialized`; ~1330 accumulated unpurged timers overflow a 16 KB response envelope; flower bug #130). During the outage the fleet stayed alive because the **flower-native** heartbeat (`php artisan flower:daemon-checkin`, HTTP-reachable) kept working, while the **Solo-native loop timer / queued-poke transport** did not; daemons bridged their wake via a non-Solo mechanism (harness ScheduleWakeup). This is exactly the remote failure mode #279 anticipated, happening locally: **the loop-wake/poke transport is the last Solo dependency in the coordination path.**

    Concrete requirement for this epic: a **flower-native heartbeat + wake/poke** so an agent (remote, or local when Solo degrades) stays a live roster citizen without any Solo timer. The `daemon_checkin` command already proves the heartbeat half is Solo-independent; the missing half is a durable flower-driven wake to replace the Solo loop timer / `timer_set` poke.

    agent · flower-refine
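The "flower-native heartbeat + wake/poke" requirement in the refinement above can be pictured as a single loop that checks in on a schedule and drains queued pokes, with no Solo timer anywhere in the path. The following is a minimal sketch under stated assumptions: `checkin` and `fetch_pokes` are hypothetical callables standing in for whatever transport flower ends up exposing, and both intervals are illustrative values, not ones taken from the brief.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class WakeLoop:
    """Heartbeat plus poke polling, with no Solo timer involved.

    `checkin` and `fetch_pokes` are hypothetical stand-ins for flower's
    check-in and poke surfaces; their signatures are assumptions.
    """
    checkin: Callable[[], None]
    fetch_pokes: Callable[[], List[str]]
    heartbeat_every: float = 60.0  # seconds between check-ins (assumed)
    poll_every: float = 5.0        # seconds between poke polls (assumed)
    clock: Callable[[], float] = time.monotonic
    sleep: Callable[[float], None] = time.sleep
    handled: List[str] = field(default_factory=list)

    def run(self, max_ticks: int) -> None:
        next_beat = self.clock()  # check in immediately on start
        for _ in range(max_ticks):
            now = self.clock()
            if now >= next_beat:
                self.checkin()
                next_beat = now + self.heartbeat_every
            # Every queued poke is a wake reason; record it for the caller.
            self.handled.extend(self.fetch_pokes())
            self.sleep(self.poll_every)
```

Injecting `clock` and `sleep` keeps the loop testable without waiting in real time. A durable version would also need the poke queue to survive an agent restart, which this sketch does not attempt.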
  4. participant joined 4h ago
    system · flower-refine
  5. note added 4h ago

    **Morning status (2026-07-05, operator back).** Overnight autonomous run:

    - **#277 (/mcp auth): COMPLETE + MERGED** to master (commit f2f9b8f + merge c291d05). Config-gated bearer middleware on `Mcp::web('/mcp')`; stdio untouched; open until `FLOWER_MCP_HTTP_TOKEN` is set (no-op, backward-compatible). Adversarial review PASS; 5 new tests + full suite green.
    - **#278 (per-worker roster identity): BUILT + adversarial review PASS, MERGE HELD for operator** (branch `flower/278-per-worker-roster-identity`, commit 107858e, NOT merged to master). The worker/orchestrator correctly refused to run the live `daemon_agents` migration autonomously (asymmetric risk) → **open decision #74** (hold / merge-now / merge-check-dups). Design: unique key `(project_id, actor_ref, solo_process_id)`, keeping `solo_process_id` so make-before-break reset still works; anti-hijack guard (a worker can't adopt a standing daemon's row); idempotent non-destructive migration; added a `kind` (standing|ephemeral) marker to the `recall_roster` payload.
    - **Migration pre-check (this session):** live `daemon_agents` = 11 rows, **ZERO** violating dups, **ZERO** null/empty actor_refs → the migration is safe to run.
    - **Fleet healthy:** orchestrator reset cleanly overnight (now roster id 53), all 3 daemons (orchestrator/ops/refine) live, roster clean.
    - Held infra items unchanged (Pi pass-2 / PHP / #110 tailscale-serve / thin_pool_autoextend). LXC artifacts intact (CT 310 template + CT 311 flower-wt1 running).
    - **Recommendation:** merge `flower/278` + run the migration now (operator present + pre-check clean), then verify fleet check-in. Resolves #74.

    agent · claude-interactive
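The migration pre-check in the note above (no duplicate rows under the new unique key, no null or empty `actor_ref`) could be expressed roughly as follows. This is a sketch only: it assumes the table is reachable through Python's `sqlite3` module and that the three columns are named exactly as in the unique key quoted above; the actual database engine and the query the session ran are not stated in the brief.

```python
import sqlite3
from typing import Dict


def precheck_daemon_agents(conn: sqlite3.Connection) -> Dict[str, int]:
    """Count rows that would block a unique key on
    (project_id, actor_ref, solo_process_id)."""
    total = conn.execute("SELECT COUNT(*) FROM daemon_agents").fetchone()[0]

    # GROUP BY treats NULLs as equal, while most engines treat NULLs as
    # distinct inside a unique index, so this count errs on the strict side.
    dup_groups = conn.execute(
        """
        SELECT COUNT(*) FROM (
            SELECT 1 FROM daemon_agents
            GROUP BY project_id, actor_ref, solo_process_id
            HAVING COUNT(*) > 1
        )
        """
    ).fetchone()[0]

    blank_refs = conn.execute(
        "SELECT COUNT(*) FROM daemon_agents "
        "WHERE actor_ref IS NULL OR TRIM(actor_ref) = ''"
    ).fetchone()[0]

    return {"rows": total, "dup_groups": dup_groups, "blank_actor_refs": blank_refs}


def safe_to_migrate(report: Dict[str, int]) -> bool:
    # Safe only when nothing would violate the new constraint.
    return report["dup_groups"] == 0 and report["blank_actor_refs"] == 0
```

Because the check is read-only, it can be run against the live table before the migration without the asymmetric risk that made the worker hold the merge.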
  6. note added 9h ago

    **Overnight auto-dispatch set up (2026-07-05, operator asleep).** Two safe, review-gated code briefs flagged `auto_dispatch_on_planned=1` + `planned` for the orchestrator to pick up automatically:

    - **#277** flower /mcp HTTP auth (config-gated bearer middleware): closes the unauthenticated-`/mcp` gap; the server half of standing remote reach.
    - **#278** per-worker roster identity: workers as first-class roster rows (the `flower-other` collapse fix).

    Both have `brief_request_review` **force** → MANDATORY adversarial review before merge. Worst case = implemented + **held at in_review** for the operator (safe failure mode; nothing merges blind). Fleet confirmed live + auto-dispatching (#248 claimed at set-up time); cap = 4 concurrent.

    **Held for the operator (NOT auto-dispatched; need creds / host-infra / supervision):** Pi pass-2 (needs Pi model keys), PHP + re-template the base (host op), `tailscale serve` flower.test / #110 (infra + security call), `thin_pool_autoextend` on the R710 (host). #110 is separately `planned` but NOT auto-flagged.

    **Dependency:** the fleet runs on MAIN, so the Mac must stay awake for the orchestrator to work overnight.

    **Live artifacts on the R710:** `flower-agent-base` template (CT 310) + `flower-wt1` linked clone (CT 311, running).

    agent · claude-interactive
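The "config-gated bearer middleware" behaviour described for #277 (open while no token is configured, enforced once `FLOWER_MCP_HTTP_TOKEN` is set) can be illustrated with a small predicate. The real middleware lives in flower's PHP code and is not shown in this brief; the Python below is only a sketch of the described behaviour, and the function name and header handling are assumptions.

```python
import hmac
from typing import Mapping, Optional


def mcp_request_allowed(headers: Mapping[str, str],
                        configured_token: Optional[str]) -> bool:
    """Decide whether an HTTP /mcp request passes the bearer gate.

    Hypothetical helper: mirrors the described behaviour, not the real code.
    """
    if not configured_token:
        # No token configured: the gate is a no-op (backward-compatible).
        return True

    scheme, _, presented = headers.get("Authorization", "").partition(" ")
    if scheme.lower() != "bearer":
        return False

    # Constant-time comparison avoids leaking the token through timing.
    return hmac.compare_digest(presented.strip().encode(),
                               configured_token.encode())
```

In use, the configured value would come from the environment, for example `mcp_request_allowed(request_headers, os.environ.get("FLOWER_MCP_HTTP_TOKEN"))`. Treating an empty string the same as an unset variable is a design choice of this sketch.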
  7. note added 11h ago

    **Target architecture (Mike, 2026-07-05): parallel worktree-environments.**

    - **LXC per worktree**, each an isolated environment running **1+ flower-only agents**. The `flower-agent` base image (#270) becomes a **Proxmox template**; clone it per worktree to spin up parallel environments fast.
    - On homelab-desktop's **ZFS**, **linked clones from the template are near-instant + space-efficient** → cheap parallel envs. Build path: full-clone a base → provision → `pct template` → **linked-clone per worker**.
    - This is the **worktree-manager pattern lifted to the LXC level**: stronger isolation than Mac-local git worktrees (separate FS/procs/resource caps, disposable).
    - **1 agent/worktree = clean default** (no shared-tree edit conflicts); **N/LXC supported**, but shared-checkout agents must coordinate edits (or use sub-worktrees inside the LXC).
    - **Reinforces the per-worker-identity requirement (#265):** M LXCs × N agents = M×N workers, all needing distinct roster identities. Target roster view = the tree `node → LXC/worktree → agents`. This is now a load-bearing prerequisite for the parallel vision, not a nicety.
    - **Correction:** Claude Code needs **node ≥18 (use node 22 LTS)**; the ≥24.15 in earlier notes was the my-pi/pi-extension constraint, not Claude Code, so don't over-provision the base on that account.

    agent · claude-interactive
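The M×N identity requirement in the architecture note above can be made concrete with a toy roster tree. This sketch assumes a hypothetical identity format of `node/lxc/agent-k`; the brief does not define how roster identities are actually spelled, so the names and the format are illustrative only.

```python
from typing import Dict, List

RosterTree = Dict[str, Dict[str, List[str]]]


def build_roster_tree(node: str, agents_per_lxc: Dict[str, int]) -> RosterTree:
    """Build the node -> LXC/worktree -> agents tree with distinct identities.

    The `node/lxc/agent-k` identity format is an assumption for illustration.
    """
    tree: RosterTree = {node: {}}
    for lxc, count in agents_per_lxc.items():
        tree[node][lxc] = [f"{node}/{lxc}/agent-{k}" for k in range(1, count + 1)]
    return tree


def all_workers(tree: RosterTree) -> List[str]:
    # Flatten the tree; every leaf is one worker needing its own roster row.
    return [agent
            for lxcs in tree.values()
            for agents in lxcs.values()
            for agent in agents]
```

With M = 3 LXCs of N = 2 agents each, `all_workers` yields 6 entries, and the path-style identity keeps them distinct even when agents in different LXCs share a local name.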
  8. status change 16h ago
    agent · claude-interactive
  9. plan proposed 16h ago

    # Epic: Decouple agent/process management from Solo → flower-native fleet

    ## Vision

    Move flower's fleet so a coding agent's **coordination** no longer depends on Solo's MCP. Then an agent can run **anywhere** (a local worktree today, a remote Proxmox LXC later) while **flower is the single brain/bus**. Solo stays as the operator's local terminal UI and (optionally) the process spawner, but is no longer *required* for an agent to be a first-class fleet participant.

    ## Why now

    We want build agents off the laptop (Proxmox homelab) to reclaim local resources. Investigation (2026-07-04 design session) showed the true prerequisite for "remote" is **not networking**; it's **decoupling agent coordination from Solo**, which is identical work whether the agent is local or remote. So prove it locally first; it's immediately useful on its own.

    ## Key verified findings (2026-07-04 session)

    1. **Solo has two separate planes.** (a) The agent toolset `mcp__solo__*` is a **macOS-only Rust stdio binary** (`/Applications/Solo.app/Contents/MacOS/mcp`) over a **unix domain socket + local SQLite**, registered globally in `~/.claude.json`; it has **no HTTP client**. (b) The HTTP API `127.0.0.1:24678` (bearer token; used by `solo-cli` + flower's `SoloClient`) is a *different* plane exposing only projects/processes/scratchpads/todos, with **no locks/kv/timers/send_input** (those are MCP-only). Consequence: you cannot tunnel a remote agent onto Solo's *toolset* plane the way Solo Discord users tunnel the HTTP plane on Windows; on macOS the agent-MCP is a unix socket + a macOS binary that can't run on Linux.
    2. **A decoupled agent does not need Solo's toolset.** flower's `/mcp` (`Mcp::web('/mcp', FlowerServer::class)`, `routes/ai.php`, built "for remote/web MCP clients") already covers every *worker* coordination need: identity/roster (`daemon_checkin`, `recall_roster`, `recall_active`), durable work + status + deps + provenance (`brief_*`), single-writer claim (`signal_claim`, `brief_claim`), async messaging (`daemon_poke`, `signal_*`, `decision_*`), dispatch/handoff (`brief_dispatch`). The only Solo-only tools (`send_input`, `get_process_output`, in-band `spawn_agent`, PTY timers) act **only on Solo PTYs on the Mac**: they are orchestrator functions, not worker functions; a worker delegates them via the dispatch queue.
    3. **A flower→Solo proxy is possible but not the mechanism.** flower already owns the transport (`SoloClient` shells `solo-cli --json`), but **attribution is impossible**: no Solo write endpoint accepts an on-behalf-of actor, so every proxied write collapses to flower's identity (soft attribution only), and it can't proxy locks/kv/timers/send_input (MCP-only). At most optional polish (mirror todos/scratchpads into Solo's UI); never the core.
    4. **Subscription auth for remote Claude Code is clean; Codex is the weak link.** `claude setup-token` → a 1-year `CLAUDE_CODE_OAUTH_TOKEN` env var bills the Pro/Max **subscription**, set-and-forget (do NOT copy `~/.claude/.credentials.json`: headless auto-refresh bug #50743). Codex CLI has **no headless subscription-token** (OpenAI recommends API keys for programmatic use) → keep Codex local or accept API billing remotely. Subscription **quota is itself a concurrency ceiling** (N agents burn N× the shared Max pool; verify the 2026-06-15 Agent-SDK separate-credit change against the account).
    5. **Homelab substrate exists** (read from docs, not live-verified): two non-clustered Proxmox nodes on Tailscale, namely R710 (96GB ECC, slow 2009 Xeon → RAM-rich, good for I/O-bound workspaces) and AMD 5800X3D (fast, NVMe, RTX 4070S, but 32GB non-ECC + 3 documented lockups → cap at 1-2 build workspaces, fix Redis CT 206 maxmemory first). Clonable LXC templates already exist (`cog-builder` w/ Docker; `lounge-worker` toolchain). flower.test-over-Tailscale is planned but may not be live yet (brief #110).

    ## Structure: two tracks (so nothing false-blocks)

    - **Track A (critical path): coordination decoupling.** An agent coordinates purely via flower `/mcp`, no Solo MCP. **Phase 1 (local, #264) → Phase 2 (remote, #265).**
    - **Track B (parallel, optional): flowerd process supervision.** flower becomes "the hands" (spawn/kill/supervise), replacing Solo's process layer (todos #334–336). **Explicitly does NOT gate Track A or remote**; Solo can keep spawning the ssh PTY. Pursued for its own value (operator independence from Solo), on its own schedule. No child brief yet.

    ## Phase map

    - **Phase 1: Local flower-managed agents (#264).** ← start here. Feasibility spike, then a local harness session coordinating entirely via flower `/mcp`. Proves the thesis, immediately dogfoodable.
    - **Phase 2: Remote agent environment (#265).** Depends on #264. Proxmox LXC + Tailscale + subscription auth + transcript ship-back. Carries findings #4/#5.
    - **Track B / flowerd.** Future, optional, parallel; not gated.

    ## Non-goals

    - Replacing Solo's operator UI (the interactive terminal grid is genuinely valuable and stays).
    - Building flowerd before Phase 1 proves coordination-decoupling.
    - A general, faithful flower→Solo tool proxy (attribution makes it unfaithful).

    ## Open questions (retired progressively via phase spikes, not up-front)

    - Does `/mcp` actually serve callable tools end-to-end? (Phase 1 spike #1, LINCHPIN)
    - Auth seam on `/mcp` (currently none): tailnet-only + bearer. (Phase 1 spike #2 / build)
    - Is flower.test live over Tailscale yet, or still brief #110? (Phase 1 spike #3)
    - Real current homelab capacity + reachability. (Phase 1 spike #4, feeds Phase 2)
    - Remote transcript ingestion into recall (else remote agents' own work is invisible). (Phase 2)
    - Subscription concurrency vs quota / ToS. (Phase 2)

    ## Provenance

    Distilled from the 2026-07-04 design session (Mike + claude-interactive): initial homelab/Solo/coupling recon, Solo two-plane + injection verification, the DiGi/Solo-Discord token-rotation evidence, coordination-overlap analysis, and the Claude Code headless subscription-auth check.

    agent · claude-interactive
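The linchpin open question in the plan above ("does `/mcp` actually serve callable tools end-to-end?") comes down to whether the endpoint answers two standard MCP JSON-RPC methods: `tools/list` and `tools/call`. The sketch below only builds those request bodies; it assumes nothing about flower's transport beyond the MCP message shape, and the `actor_ref` argument passed to `daemon_checkin` in the usage note is a hypothetical parameter name, since the tool's real input schema is not given in the brief.

```python
import itertools
import json
from typing import Any, Dict, Optional

_request_ids = itertools.count(1)


def rpc_request(method: str, params: Optional[Dict[str, Any]] = None) -> str:
    """Serialize one JSON-RPC 2.0 request with a fresh id."""
    message: Dict[str, Any] = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": method,
    }
    if params is not None:
        message["params"] = params
    return json.dumps(message)


def list_tools_request() -> str:
    # Spike step 1: ask the server which tools it exposes.
    return rpc_request("tools/list")


def call_tool_request(name: str, arguments: Dict[str, Any]) -> str:
    # Spike step 2: invoke one tool by name with its arguments.
    return rpc_request("tools/call", {"name": name, "arguments": arguments})
```

A spike could send `list_tools_request()` and then `call_tool_request("daemon_checkin", {"actor_ref": "spike-worker"})` to the `/mcp` endpoint and confirm both return results rather than errors. A real MCP session also performs an `initialize` handshake first, which this sketch leaves out.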
  10. note added 16h ago

    Epic: decouple a coding agent's COORDINATION (and optionally its process supervision) from Solo, so agents can run anywhere while flower is the single brain/bus. Track A (critical path): coordination decoupling — Phase 1 local → Phase 2 remote Proxmox. Track B (parallel, optional): flowerd process supervision (replaces Solo's spawn/kill; does NOT gate Track A). Captures verified findings from the 2026-07-04 design session. Full spec to follow.

    agent · claude-interactive
  11. participant joined 16h ago
    system · claude-interactive

epic · dependencies

Relationships

epic parent

depends on

No dependencies — dispatchable once planned.

agents · waves

Participants

  • claude-interactive participant · active
  • flower-orchestrator participant · active
  • operator:mike participant · active
  • flower-refine participant · active

trace · graph

Links

  • Scratchpad #399 execution

scope

Projects

  • flower · primary
