planned draft note flower

LLM/Model/Provider routing decisions/guidance Just something I want t

Dispatch

canonical · plan

Spec

markdown

hand-off · dispatch

Dispatch

Auto-dispatch

when it reaches planned

Design-loop

design pass before build

kind

No dispatch requests yet — dispatch above to generate a copy-paste packet.

provenance · append-only

Trace

live
  1. link added 1d ago
    agent · flower-refine
  2. link added 2d ago
    agent · flower-refine
  3. status change 2d ago
    agent · flower-refine
  4. refinement 2d ago

    ## Objective

    Design + recommendation (brainstorm brief, NOT implementation): a **central guidance model for routing work** to the right LLM/model/provider (and effort tier) by task type + quality/speed/cost needs, so we can start using Pi and other models (openrouter, deepseek, etc.) where xhigh Codex / Opus-4.8 quality isn't required. Deliverable is a recommendation to **discuss before building**.

    ## Problem / current state (grounding-corrected 2026-07-03)

    Routing today is ad hoc AND only partial. Corrected against the code:

    - **Config-as-data already exists** in `config/flower.php`: `summarize.models[]` (`{key,label,provider,model,mode,cost_per_mtok}`; the *active* summarizer = the `prompt_templates(kind=segment_summary)` row), `briefs.default_model` (= `deepseek/deepseek-v4-flash`), and `pricing.models{}` (per-model in/out/cache rates). Embedder models live in the `embedders` table.
    - A per-brief `model` override column exists, but it is consumed **ONLY by `BriefRefiner::resolveModel()`** (the in-process refine LLM call). **Dispatch and spawn ignore it entirely**: they select only a **harness (`agent_tool`)**, never a model or effort tier.
    - So there is **no per-task-type routing today, and no model/effort selection at dispatch/spawn at all.** This brief **INTRODUCES** routing; it does not centralize something already distributed. Framing it as "introduce" (not "centralize") is important for the recommendation.
    - The Solo agentTool id map (Claude=3 / Codex=4 / Pi=10 / …) is real; harness selection persists a "last-chosen" per project/actor (`SoloAgentToolService`).

    ## Design questions (recommend on each)

    1. **Routing taxonomy.** Dimensions (task type: orchestrate / refine / design / code / review / summarize; required quality tier; latency + cost; risk) → a proposed default routing table/tree (task → model/provider + effort + harness).
    2. **Representation + where it lives.** A versioned doc, config-as-data (`config/flower.php` style), a DB table, and/or a UI. How agents CONSULT it at decision time (charter guidance vs a tool/endpoint). Concrete substrate: **extend the refine-only `brief.model` resolution + the config model settings toward dispatch/spawn** (which today have no model concept).
    3. **Provider/model inventory + capability tags.** Claude tiers, Codex, Pi/Composer, openrouter models like deepseek; tag each with strengths / cost / speed.
    4. **"Good enough" testing.** A lightweight way to trial a cheaper model on a task class and judge quality (ties to #143 adversarial review as a quality gate).

    ## Prerequisite role (grounding 2026-07-03)

    **#142 is effectively a prerequisite for the model-forcing bits of #41 (high-effort Deep Review) and #143 (independent reviewer model).** Both assume you can select a model/effort at dispatch, which does not exist yet (dispatch/spawn pick only a harness). Either sequence #41/#143's model-forcing behind #142, or scope model-forcing out of them.

    ## Deliverable

    A recommendation covering **(routing taxonomy + default table/tree, representation + where it lives + how agents consult it, provider/model capability inventory, a "good-enough" testing approach)** + a small proposed change set as thin slices. To discuss before building.

    ## Constraints / non-goals

    - Brainstorm/discovery: get it on record + recommend; do NOT build yet.
    - Keep it incremental: a guidance doc/config first, not a routing engine.
    - Implementation out of scope here → follow-up brief(s) after discussion.

    ## Definition of done

    Routing guidance recommendation written up for operator discussion; brief ready for the build decision.
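As a rough illustration of the routing table in design question 1 and the substrate in question 2, the sketch below models a task-type table plus a per-brief override. It is written in Python for brevity rather than as `config/flower.php` code. `Route`, `ROUTING_TABLE`, `FALLBACK`, `resolve_route`, the effort labels, the `placeholder/...` model ids, and the task-to-harness pairings are all hypothetical; only `deepseek/deepseek-v4-flash` (the `briefs.default_model` value) and the harness names come from the brief. The precedence shown (brief override beats table entry, table entry beats fallback) is an assumption, not a decided design.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Route:
    """One routing decision: which model, at what effort, via which harness."""
    model: str    # provider/model id
    effort: str   # hypothetical effort tier label
    harness: str  # harness (agent_tool) name, e.g. claude / codex / pi


# briefs.default_model as stated in the brief.
DEFAULT_MODEL = "deepseek/deepseek-v4-flash"

# Hypothetical default table: task type -> route. All pairings are
# placeholders for discussion, not recommendations.
ROUTING_TABLE = {
    "summarize": Route(DEFAULT_MODEL, "low", "pi"),
    "refine": Route(DEFAULT_MODEL, "medium", "pi"),
    "code": Route("placeholder/strong-coder", "high", "codex"),
    "review": Route("placeholder/strong-reviewer", "high", "claude"),
}

# Used when a task type has no table entry (assumed behaviour).
FALLBACK = Route(DEFAULT_MODEL, "medium", "claude")


def resolve_route(task_type: str, brief_model: Optional[str] = None) -> Route:
    """Pick the table entry for the task type, then apply a per-brief override.

    The override replaces only the model; effort and harness still come
    from the table. That split is an assumption of this sketch.
    """
    route = ROUTING_TABLE.get(task_type, FALLBACK)
    if brief_model:
        route = replace(route, model=brief_model)
    return route
```

For example, `resolve_route("code", brief_model="some/model")` keeps the `codex` harness and `high` effort from the table while swapping in the brief's model. Keeping the table as plain data is what would let it live in a config file or a DB row without changing the lookup.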

    agent · flower-refine
  5. spec snapshot 2d ago

    ## Objective

    Design + recommendation (brainstorm brief, NOT implementation): a **central guidance model for routing work** to the right LLM/model/provider (and effort tier) by task type + quality/speed/cost needs, so we can start using Pi and other models (openrouter, deepseek, etc.) where xhigh Codex / Opus-4.8 quality isn't required. Deliverable is a recommendation to **discuss before building**.

    ## Problem / intent

    Today routing is ad hoc: we piggyback on the Solo playbook's Claude-vs-Codex heuristic, always run Claude as orchestrator, and have a per-brief `model` override. The operator wants a central, explicit routing guidance (document / decision tree / priority set) that says where each type of work goes and how it's handled, and to test whether cheaper/faster models (e.g. deepseek v4 flash, Pi/Composer) are "good enough" for some task classes.

    ## Design questions (recommend on each)

    1. **Routing taxonomy.** What dimensions drive routing (task type: orchestrate / refine / design / code / review / summarize; required quality tier; latency + cost sensitivity; risk)? Produce a proposed default routing table/tree mapping (task → model/provider + effort).
    2. **Representation + where it lives.** A versioned doc, a config-as-data table (`config/flower.php` style), a DB table, and/or a UI. How do agents CONSULT it at decision time (charter guidance vs a tool/endpoint)? Substrate to build on: the existing per-brief `model` override + `config/flower.php` model settings + the Solo agentTool id map (Claude/Codex/Pi/…).
    3. **Provider/model inventory + capability tags.** Which providers/models are in play (Claude tiers, Codex, Pi/Composer, openrouter models like deepseek) and how to tag each with strengths / cost / speed.
    4. **"Good enough" testing.** A lightweight way to trial a cheaper model on a task class and judge quality (ties to #143 adversarial review as a quality gate).

    ## Investigation scope (light)

    - Current model-config surface: `config/flower.php` (`summarize.*` + any model settings), the per-brief `brief_model` override, the Solo agentTool id map.
    - How briefs/dispatch already select a model (`brief.model`): extend vs central policy.
    - Solo playbook routing heuristics currently in use.

    ## Deliverable

    A recommendation covering **(routing taxonomy + default table/tree, representation + where it lives + how agents consult it, provider/model capability inventory, a "good-enough" testing approach)** + a small proposed change set as thin slices. To discuss before building.

    ## Constraints / non-goals

    - Brainstorm/discovery: get it on record + recommend; do NOT build yet.
    - Keep it incremental: a guidance doc/config first, not a routing engine.
    - Implementation out of scope here → follow-up brief(s) after discussion.

    ## Definition of done

    Routing guidance recommendation written up for operator discussion; brief ready for the build decision.
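One way the "good enough" question (design question 4) could be made concrete is a simple comparison of review scores between a cheaper candidate model and the current baseline on the same task class. The sketch below assumes scores in the range 0 to 1 produced by some reviewer (for instance the adversarial review in #143); the function name `good_enough`, the `max_drop` tolerance, and the `min_samples` floor are all hypothetical choices for discussion, not part of the brief.

```python
from statistics import mean
from typing import Sequence


def good_enough(
    candidate_scores: Sequence[float],
    baseline_scores: Sequence[float],
    max_drop: float = 0.05,
    min_samples: int = 5,
) -> bool:
    """Judge a cheaper model against the baseline on one task class.

    Returns True only when both runs have at least `min_samples` scored
    tasks and the candidate's mean score is no more than `max_drop`
    below the baseline's mean. Thresholds are illustrative defaults.
    """
    if len(candidate_scores) < min_samples or len(baseline_scores) < min_samples:
        # Too little evidence to switch models; stay on the baseline.
        return False
    return mean(baseline_scores) - mean(candidate_scores) <= max_drop
```

A trial would then be a handful of tasks from one class (say, summarize) run through both models, scored by the reviewer, and passed to `good_enough`. Comparing means is deliberately crude; a per-task paired comparison or a minimum-score floor might be worth discussing if a single bad output matters more than the average.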

    system · flower-refine
  6. refinement 2d ago


    agent · flower-refine
  7. spec snapshot 2d ago

    LLM/Model/Provider routing decisions/guidance Just something I want to get on record/start brainstorming how we might implement it - right now I presume we're piggybacking off of our Solo playbook about whether to send work to Claude or Codex sessions - we always use Claude as our orchestrator... but I'd like to start testing with Pi and some other models via openrouter or other providers. Either for things like added speed where we don't need xhigh/codex/opus 4.8 levels of quality as well as to do some testing in terms of if models like deepseek v4 flash are "good enough" for some tasks/what those might be/etc. I think it would make sense to have some sort of central guidance/document/tree/priority set in terms of where we send what type of work and how that's all routed/handled. Thoughts on that?

    system · flower-refine
  8. participant joined 2d ago
    system · flower-refine
  9. status change 2d ago
    agent · operator:mike
  10. note added 2d ago


    operator · operator:mike
  11. participant joined 2d ago
    system · operator:mike

epic · dependencies

Relationships

epic parent

depends on

No dependencies — dispatchable once planned.

agents · waves

Participants

  • operator:mike participant · active
  • flower-refine participant · active

trace · graph

Links

  • Scratchpad #378 execution
  • Scratchpad #375 execution

scope

Projects

  • flower · primary

dogfood · read-only

Agent’s-eye view

The literal recall_brief payload an agent gets — same service path as the MCP tool.