
Adversarial reviews I think I'd like to set things up so that we can

This brief is complete — dispatch is closed.

#55 done fresh flower · flower/143-adversarial-review-gate
agent: claude 1 scratchpad
You are being dispatched from flower Brief #143: Adversarial reviews

I think I'd like to set things up so that we can

Recall pointer:
- Use recall_brief with id 143 for the full folder if you need provenance.

Target:
- project: flower (/Users/mikeferrara/Documents/code/flower)
- branch: flower/143-adversarial-review-gate
- worktree: not specified
- kind: fresh

Current brief spec:
## Objective
Give the orchestrator an **optional** ability to trigger an **adversarial review** (a second, skeptical agent trying to refute "done") on a dispatched brief's finished workset before acceptance, plus a configurable **ruleset overlay** that can auto-force a review for classes of risky work. This is a **design/discovery** brief; its four open design decisions are now **settled** (operator, Q28–31, 2026-07-03) and folded in below. Implementation is separate, sliced follow-up brief(s).

## Problem / intent
When a dispatched worker reports its workset "finished," we currently accept/merge on the orchestrator's judgment alone. The operator wants the option to fire an adversarial review on that finished workset before acceptance, and — eventually — to codify conditions under which a review is auto-forced (discovered as we build).

## Settled decisions (operator answers, Q28–31)
1. **Trigger & defaults (Q28) — MANUAL-FIRE ONLY at first; NO auto-force rules enabled on ship.** Build the review-trigger mechanism as orchestrator-fired. For now the **orchestrator decides case-by-case** what is important / large / sensitive enough to warrant a review, while the mechanism is fleshed out and tested. The auto-force ruleset is *scaffolded* (config-as-data, below) but ships with **all rules disabled**. Re-review which conditions to auto-enable once the flow is polished. — *This supersedes the design proposal's recommendation to ship the `migrations` rule enabled.*
2. **Ruleset storage (Q29) — config-as-data in `config/flower.php` now, UI later.** A `review.rules[]` array (matching the `summarize.models[]` / `pricing.models{}` precedent), env-overridable; a per-brief `briefs.meta.review` flag is the manual override; a toggles UI is deferred to a later slice.
3. **Holding state (Q30) — new `BriefStatus::InReview` case.** A distinct status with its own `/briefs` badge & filter (clearer than a meta-flag hold at Dispatched).
4. **On FAIL (Q31) — auto-bounce.** On `REQUEST_CHANGES` / `BLOCK`, re-dispatch a fix to the **original worker** (Resume) with the findings injected; the failing verdict **blocks `Complete`** until a later review passes.
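
A minimal sketch of the Q28/Q29 shape: a rules list kept as plain data, every rule shipped disabled, an environment override per rule, and a per-brief manual override. It is written in Python only to keep the example self-contained; the real config lives in `config/flower.php`. The rule keys, the `large_change` rule, the `database/migrations/` path, and the `FLOWER_REVIEW_RULE_*` variable names are assumptions for illustration, not the project's actual schema.

```python
import os

# Hypothetical defaults standing in for `review.rules[]`; every rule ships disabled (Q28).
DEFAULT_RULES = [
    {"name": "migrations", "enabled": False, "paths": ["database/migrations/"]},
    {"name": "large_change", "enabled": False, "max_files": 25},
]

TRUTHY = ("1", "true", "yes", "on")


def load_rules(env=None):
    """Return a copy of the rules with per-rule env overrides applied.

    The variable name pattern FLOWER_REVIEW_RULE_<NAME> is an assumption.
    """
    env = os.environ if env is None else env
    rules = []
    for rule in DEFAULT_RULES:
        raw = env.get("FLOWER_REVIEW_RULE_" + rule["name"].upper())
        enabled = rule["enabled"] if raw is None else raw.strip().lower() in TRUTHY
        rules.append({**rule, "enabled": enabled})
    return rules


def manual_override(brief_meta):
    """Per-brief flag, modeled on `meta.review.force`."""
    review = brief_meta.get("review") or {}
    return bool(review.get("force", False))
```

With an empty environment, `load_rules({})` returns every rule disabled, so the manual flag and the orchestrator's own judgment remain the only triggers.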

## The review itself (from the design proposal, trace event #1476)
- A **spawned adversarial agent** (there is no in-app review service). Default scope = the **workset diff** (not full re-verify); mandate = **refute "done"** across correctness / security / regression; returns a structured **`PASS | REQUEST_CHANGES | BLOCK`** verdict + findings `[{file,line,severity,claim,refutation}]`. Reuses the `code-review` / `security-review` / `verify` harness skills as its playbook. Full re-verify is an escalation reserved for high-risk cases.
- **Independence:** reviewer **harness** ≠ worker harness works **today** via `dispatch_requests.agent_tool`; reviewer **model/effort** independence needs **#142** (routing) and is deferred to a later slice — *not* a v1 blocker. Reviewer auto-attaches as `BriefParticipantRole::Reviewer` (already exists).
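
The verdict and findings structure described above could be modeled roughly as follows. This is a sketch in Python, not the project's code: `Finding`, `ReviewResult`, `parse_result`, and the payload keys `verdict` / `findings` are assumed names; only the three verdict values and the five finding fields come from the spec.

```python
from dataclasses import dataclass, field
from enum import Enum


class ReviewVerdict(Enum):
    PASS = "PASS"
    REQUEST_CHANGES = "REQUEST_CHANGES"
    BLOCK = "BLOCK"


@dataclass(frozen=True)
class Finding:
    file: str
    line: int
    severity: str
    claim: str        # what the worker asserted is done
    refutation: str   # why the reviewer says it is not


@dataclass
class ReviewResult:
    verdict: ReviewVerdict
    findings: list = field(default_factory=list)

    def blocks_complete(self):
        # Anything other than PASS holds the brief back from Complete.
        return self.verdict is not ReviewVerdict.PASS


def parse_result(payload):
    """Build a ReviewResult from a dict; unknown verdicts raise ValueError."""
    verdict = ReviewVerdict(payload["verdict"])
    findings = [Finding(**f) for f in payload.get("findings", [])]
    return ReviewResult(verdict, findings)
```

Rejecting unknown verdict strings at parse time keeps a malformed reviewer response from being treated as a pass.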

## Lifecycle integration
- **One gate** inside `DispatchService::recomputeRollup()` — the single chokepoint that flips a brief `Dispatched → Complete` (the `hasCompletedDispatch` branch, `app/Services/Briefs/DispatchService.php`). Before `setStatus(Complete)`: `if (reviewRequired(brief) && !reviewPassed(brief)) → holdForReview()` (→ `InReview`); else `Complete`.
- **v1 `reviewRequired`** = manual orchestrator fire **OR** per-brief `meta.review.force` (no auto-force rules enabled yet). `reviewPassed` = latest verdict is `PASS`.
- **On PASS** → append `review_passed` → `recomputeRollup` proceeds to `Complete`. **On FAIL** → auto-bounce to the original worker (Resume) with findings, append `review_failed`, block `Complete`.
- **Merge safety:** MAIN merges are orchestrator-mediated and gated on `Complete`, so gating `Complete` on review gates the merge too — no separate merge hook.
- **Recording:** new `review_requested | review_passed | review_failed` brief events; optional durable `brief_reviews` artifact table `{brief_id, dispatch_request_id, reviewer_ref, reviewer_harness, verdict, findings}`.
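
The gate logic in the bullets above can be sketched as a small state model. This is an illustrative Python model, not the PHP in `DispatchService`: the `Brief` fields, the status strings, and `record_verdict` are assumptions, and the auto-bounce dispatch itself is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Brief:
    status: str = "dispatched"
    has_completed_dispatch: bool = False
    review_fired: bool = False      # manual orchestrator fire
    review_force: bool = False      # per-brief meta.review.force
    verdicts: list = field(default_factory=list)
    events: list = field(default_factory=list)


def review_required(brief):
    return brief.review_fired or brief.review_force


def review_passed(brief):
    # Only the latest verdict counts.
    return bool(brief.verdicts) and brief.verdicts[-1] == "PASS"


def recompute_rollup(brief):
    """Single chokepoint: hold for review instead of completing when required."""
    if not brief.has_completed_dispatch:
        return brief.status
    if review_required(brief) and not review_passed(brief):
        brief.status = "in_review"
    else:
        brief.status = "complete"
    return brief.status


def record_verdict(brief, verdict):
    """Append the verdict and its event, then re-run the rollup."""
    brief.verdicts.append(verdict)
    brief.events.append("review_passed" if verdict == "PASS" else "review_failed")
    return recompute_rollup(brief)
```

A brief with no review fired completes exactly as before, which is what makes the gate safe to ship while every auto-force rule is disabled.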

## Build plan (PR-sized thin slices; adjusted for Q28)
- **Slice 0 — reviewer-loop spike (no gate):** `brief_review` MCP tool + review packet; orchestrator fires manually on a finished brief; verdict recorded as events. Proves the loop with today's harness-independence. *No #142.*
- **Slice 1 — the gate:** `reviewRequired()` / `reviewPassed()` / `holdForReview()` in `recomputeRollup()`; `briefs.meta.review` flag; new `BriefStatus::InReview`; manual trigger; PASS→Complete, FAIL→auto-bounce-to-worker. *No #142.*
- **Slice 2 — ruleset scaffold (ALL rules DISABLED per Q28):** `flower.review.rules[]` + a `ReviewRuleEvaluator` (reads `Commit.files`, `agent_tool`, `meta`) capable of setting `review_required` — but ship with **no rule enabled**; orchestrator judgment stays the only trigger until we re-review. *No #142.*
- **Slice 3 — artifact + UI:** `brief_reviews` table; `/briefs` `InReview` badge + findings panel; small rules-toggle page + per-brief toggle. *No #142.*
- **Slice 4 — model independence:** once #142 lands, force an independent reviewer **model + effort** (peer-or-higher tier, different provider), not just harness. *Requires #142.*

Slices 0–3 deliver a fully useful **manual** adversarial-review gate with harness-level independence **without #142**; only the distinct-MODEL guarantee waits on it.
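
As a rough illustration of the Slice 2 scaffold, the evaluator below checks a workset against a list of rules and reports which enabled rules match. It is a Python sketch under stated assumptions: `Workset`, `Rule`, `triggered_rules`, `enable`, the file threshold of 25, and the `risk` meta key are hypothetical; the spec only says the evaluator reads `Commit.files`, `agent_tool`, and `meta`.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Workset:
    files: tuple       # changed paths, standing in for `Commit.files`
    agent_tool: str    # worker harness, available to rules that need it
    meta: dict


@dataclass(frozen=True)
class Rule:
    name: str
    enabled: bool
    matches: Callable[[Workset], bool]


# All rules ship disabled (Q28); the predicates are illustrative guesses.
RULES = [
    Rule("migrations", False, lambda w: any("migrations/" in f for f in w.files)),
    Rule("many_files", False, lambda w: len(w.files) > 25),
    Rule("tagged_high_risk", False, lambda w: w.meta.get("risk") == "high"),
]


def triggered_rules(workset, rules=RULES):
    """Names of enabled rules that match; empty while everything is disabled."""
    return [r.name for r in rules if r.enabled and r.matches(workset)]


def enable(rules, name):
    """Return a new rule list with one rule switched on (no mutation)."""
    return [replace(r, enabled=True) if r.name == name else r for r in rules]
```

Because a disabled rule is skipped before its predicate runs, shipping the scaffold changes no behavior until a rule is deliberately switched on.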

## Dependencies
- **#142 (model/effort routing)** — blocks **only Slice 4** (distinct reviewer MODEL). Harness-independence is available now. Sequence Slice 4 behind #142; ship 0–3 first.

## Constraints / non-goals
- Capability stays **optional / opt-in**; auto-force is a configurable overlay that **ships fully disabled** (Q28).
- Implementation lands via the sliced follow-up brief(s); **this brief is the settled design → build decision**, not the build itself.

## Definition of done
Design settled with the operator's Q28–31 decisions folded in; brief at **`planned`**, ready for the orchestrator/operator **build decision** (dispatch Slice 0/1, or spin per-slice implementation briefs).

Recent/key trace events:
[20] status_change flower-orchestrator: (no body)
[21] note_added flower-orchestrator: ↩️ Reopened by operator request (2026-07-03). This brief was prematurely marked complete by a prior orchestrator at 14:40:53 while its 4 operator questions (Q28–31, posted by the worker at 14:36) were still open — see bug #71. Reopened to in_progress so the operator can answer Q28–31. The 4 questions remain status=open and now surface again in the Open Questions view/badge (which #154 is realigning).
[22] link_added flower-refine: (no body)
[23] operator_answer operator:mike: None — manual-fire only at first, no auto-force, let's let the orchestrator decide for the time being what's important enough or large enough or touches sensitive enough stuff to warrant the review process while we get it all fleshed out and tested. Once polished we can re-review the automated conditions.
[24] operator_answer operator:mike: Config-as-data in config/flower.php now, UI later (recommended)
[25] operator_answer operator:mike: New BriefStatus::InReview case (clearer on /briefs; recommended)
[26] operator_answer operator:mike: Auto-bounce: re-dispatch a fix to the original worker with findings (recommended)
[27] status_change flower-orchestrator: (no body)
[28] spec_snapshot flower-refine: ## Objective
Design + recommendation (brainstorm brief, NOT implementation): give the orchestrator an **optional** ability to trigger an adversarial review on an incoming "finished" workset, plus a configurable **rule set of conditions that auto-force** such a review. Deliverable is a design proposal to **discuss before building**.

## Problem / intent
When a dispatched brief's worker reports its workset "finished," we currently accept/merge on the orchestrator's judgment alone. The operator wants the option to fire an **adversarial review** (a second, skeptical agent trying to find fault / refute "done") on that finished workset before acceptance — and to codify conditions under which such a review is **auto-forced** (discovered as we build).

## Design questions (recommend on each)
1. **Trigger model.** (a) orchestrator manually fires a review on a finished workset; (b) rule-based auto-force. What conditions belong in the ruleset (touches migrations/auth/payments, > N files, security-sensitive paths, brief tagged high-risk, worker was a cheaper/faster model, etc.)? How is the ruleset stored + edited (config-as-data, a UI, per-brief flag)?
2. **What the adversarial review IS.** Scope (diff/workset review vs full re-verify); the reviewer's mandate (correctness / security / regression; try to REFUTE the "done" claim); which agent/model runs it — a distinct model/harness for an independent perspective (ties to #142 routing).
3. **Lifecycle integration.** Where it sits (after worker `brief_dispatch_complete`, before brief → `complete` / before a MAIN merge); how a failing review feeds back (re-open/bounce the brief, block completion); how results are recorded (brief event, feedback, a review artifact).
4. **Optionality + defaults.** Keep it opt-in per dispatch; which auto-force conditions (if any) ship on by default.

## Investigation scope (light)
- Current dispatch → completion path (DispatchService, `brief_dispatch_complete`, status rollup) and where a review gate could insert.
- Existing review affordances that could back the reviewer (the `code-review` / `verify` skills).
- How #142 (model routing) would pick the reviewer's model for independence.

## Deliverable
A design proposal covering **(trigger model + ruleset representation, reviewer mandate/scope, lifecycle integration, defaults)** + a small proposed change set sized as thin slices. To discuss before building.

## Constraints / non-goals
- Brainstorm/discovery — do NOT build yet.
- Keep the capability **optional** (orchestrator-triggered), with auto-force as a configurable overlay, not mandatory.
- Implementation is out of scope here → separate follow-up brief(s) after discussion.

## Definition of done
Design proposal written up for operator discussion; brief ready for the build decision.
[29] refinement flower-refine: (identical to the "Current brief spec" above)
[30] note_added flower-refine: Folded operator answers Q28–31 into the canonical spec (prior spec snapshotted as event #1657; new spec = #1658); moving refining → planned. Only material change vs the dispatched design proposal (#1476): **Q28 = NO auto-force rules enabled on ship** — manual / orchestrator-judgment fire only until the mechanism is polished (supersedes the proposal's "ship the migrations rule enabled" default). Q29 (config-as-data in config/flower.php now, UI later), Q30 (new BriefStatus::InReview), Q31 (auto-bounce a fix to the original worker) all confirm the proposal. Build plan: Slices 0–3 are buildable now with harness-level reviewer independence; Slice 4 (distinct reviewer MODEL) waits on #142 (model routing). Stopping at planned — dispatch is the orchestrator's call.
[31] status_change flower-refine: (no body)

Recommended linked context:
{
    "todos": [],
    "scratchpads": [
        {
            "id": 375,
            "solo_scratchpad_id": "1067",
            "name": "flower-refine — reset handoff (2026-07-03 #2)",
            "archived": false,
            "revision": 2
        }
    ]
}

Execution notes:
- Treat the brief as the source of truth.
- Keep work scoped to this dispatch request.
- Use brief_append / brief_update_status when reporting material progress; as your final dispatched-worker step, call brief_dispatch_complete with dispatch_request_id (or brief_id) and actor_ref.
- Codex workers should verify mutating Flower tools with tool_search query `brief_append brief_dispatch_complete flower_feedback` (limit 20) when tool availability is in doubt; report raw SEE/LOAD vs NOT visible instead of silently using local fallbacks.
- Add a git commit trailer `Brief: #143` to every commit for this brief so flower can exact-link commits back to the brief.
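
For the commit-trailer note above, one way a tool could pull brief ids out of a commit message is a line-anchored match on `Brief: #<n>`. This is a hypothetical Python sketch; how flower actually links commits is not shown in this brief, and this version scans every line rather than only the final trailer paragraph that git itself treats as trailers.

```python
import re

# Matches a whole line of the form "Brief: #143".
TRAILER = re.compile(r"^Brief:\s*#(\d+)\s*$", re.MULTILINE)


def brief_ids(commit_message):
    """Return every brief id named on its own `Brief: #N` line."""
    return [int(n) for n in TRAILER.findall(commit_message)]
```

Anchoring to a full line avoids linking a commit that merely mentions a brief in running prose.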
#43 done fresh flower
agent: claude
You are being dispatched from flower Brief #143: Adversarial reviews

I think I'd like to set things up so that we can

Recall pointer:
- Use recall_brief with id 143 for the full folder if you need provenance.

Target:
- project: flower (/Users/mikeferrara/Documents/code/flower)
- branch: choose an appropriate branch
- worktree: not specified
- kind: fresh

Current brief spec:
## Objective
Design + recommendation (brainstorm brief, NOT implementation): give the orchestrator an **optional** ability to trigger an adversarial review on an incoming "finished" workset, plus a configurable **rule set of conditions that auto-force** such a review. Deliverable is a design proposal to **discuss before building**.

## Problem / intent
When a dispatched brief's worker reports its workset "finished," we currently accept/merge on the orchestrator's judgment alone. The operator wants the option to fire an **adversarial review** (a second, skeptical agent trying to find fault / refute "done") on that finished workset before acceptance — and to codify conditions under which such a review is **auto-forced** (discovered as we build).

## Design questions (recommend on each)
1. **Trigger model.** (a) orchestrator manually fires a review on a finished workset; (b) rule-based auto-force. What conditions belong in the ruleset (touches migrations/auth/payments, > N files, security-sensitive paths, brief tagged high-risk, worker was a cheaper/faster model, etc.)? How is the ruleset stored + edited (config-as-data, a UI, per-brief flag)?
2. **What the adversarial review IS.** Scope (diff/workset review vs full re-verify); the reviewer's mandate (correctness / security / regression; try to REFUTE the "done" claim); which agent/model runs it — a distinct model/harness for an independent perspective (ties to #142 routing).
3. **Lifecycle integration.** Where it sits (after worker `brief_dispatch_complete`, before brief → `complete` / before a MAIN merge); how a failing review feeds back (re-open/bounce the brief, block completion); how results are recorded (brief event, feedback, a review artifact).
4. **Optionality + defaults.** Keep it opt-in per dispatch; which auto-force conditions (if any) ship on by default.

## Investigation scope (light)
- Current dispatch → completion path (DispatchService, `brief_dispatch_complete`, status rollup) and where a review gate could insert.
- Existing review affordances that could back the reviewer (the `code-review` / `verify` skills).
- How #142 (model routing) would pick the reviewer's model for independence.

## Deliverable
A design proposal covering **(trigger model + ruleset representation, reviewer mandate/scope, lifecycle integration, defaults)** + a small proposed change set sized as thin slices. To discuss before building.

## Constraints / non-goals
- Brainstorm/discovery — do NOT build yet.
- Keep the capability **optional** (orchestrator-triggered), with auto-force as a configurable overlay, not mandatory.
- Implementation is out of scope here → separate follow-up brief(s) after discussion.

## Definition of done
Design proposal written up for operator discussion; brief ready for the build decision.

Recent/key trace events:
[1] participant_joined operator:mike: (no body)
[2] note_added operator:mike: Adversarial reviews

I think I'd like to set things up so that we can give the orchestrator the optional ability to trigger/fire an adversarial review on an incoming "finished" workset. So let's brainstorm a bit about how that might be done - optional things that might be worth doing, or not, etc. Maybe we set some sort of rule set/list of conditions that would auto-force a review as we go through building and find places where we think it should always happen.

Thoughts? ideas?
[3] status_change operator:mike: (no body)
[4] participant_joined flower-refine: (no body)
[5] spec_snapshot flower-refine: Adversarial reviews

I think I'd like to set things up so that we can give the orchestrator the optional ability to trigger/fire an adversarial review on an incoming "finished" workset. So let's brainstorm a bit about how that might be done - optional things that might be worth doing, or not, etc. Maybe we set some sort of rule set/list of conditions that would auto-force a review as we go through building and find places where we think it should always happen.

Thoughts? ideas?
[6] refinement flower-refine: (identical to the "Current brief spec" above)
[7] refinement flower-refine: Grounding review (2026-07-03) VERIFIED the lifecycle seam this spec names is precise: `DispatchService::recomputeRollup()` is the gate-able chokepoint where a brief flips to `BriefStatus::Complete` (after a dispatch `finish()`/`complete()`). A review gate inserts cleanly between "dispatch Done" and "brief → Complete" — hold at a new intermediate status or require a passed-review flag. Entry path: `brief_dispatch_complete` → `DispatchService::completeForBrief` → `finish(Done)` → `recomputeRollup`. No in-app review hooks exist (the `code-review`/`verify` affordances are harness skills, not app services), so the reviewer is a spawned agent. DEPENDENCY to call out: picking an INDEPENDENT reviewer *model* needs model/effort selection that doesn't exist yet — dispatch/spawn pick only a HARNESS (`agent_tool`), never a model; `brief.model` reaches only `BriefRefiner`. So #143's model-independence rides on **#142** (routing). Ruleset substrate is strong (config-as-data precedent; `dispatch_requests` records `agent_tool` so "worker was a cheaper model" is a knowable auto-force condition; `Commit.files` gives "touches migrations / N files"). Sound as a discovery/design brief → moving to planned.
[8] status_change flower-refine: (no body)

Recommended linked context:
{
    "todos": [],
    "scratchpads": []
}

Execution notes:
- Treat the brief as the source of truth.
- Keep work scoped to this dispatch request.
- Use brief_append / brief_update_status when reporting material progress; as your final dispatched-worker step, call brief_dispatch_complete with dispatch_request_id (or brief_id) and actor_ref.
- Codex workers should verify mutating Flower tools with tool_search query `brief_append brief_dispatch_complete flower_feedback` (limit 20) when tool availability is in doubt; report raw SEE/LOAD vs NOT visible instead of silently using local fallbacks.
- Add a git commit trailer `Brief: #143` to every commit for this brief so flower can exact-link commits back to the brief.

provenance · append-only

Trace

  1. link added 1d ago
    agent · system:commit-trailer
  2. participant joined 1d ago
    system · system:commit-trailer
  3. merged 1d ago

    Slices 0-1 merged to MAIN as a44d9fa (branch flower/143-adversarial-review-gate, worker commit e3ae93a). No migration (BriefStatus::InReview is an enum case on the existing string column). Full suite green on MAIN: 767/769 passed (2 pre-existing skips, 5141 assertions). The recomputeRollup review gate is INERT BY DEFAULT — ReviewState::required() is false unless a review is manually fired or meta.review.force is set — so existing brief-completion behavior is unchanged. Delivered: ReviewService, brief_review + brief_request_review MCP tools, ReviewVerdict enum, review_requested/passed/failed events, BriefStatus::InReview + /briefs filter. NOTE: the new MCP review tools won't be callable until the flower MCP server reloads its code. Remaining: Slice 2 (ruleset scaffold, all-disabled), Slice 3 (UI badge+findings panel), Slice 4 (distinct reviewer model, needs #142) — to be spun as follow-up briefs.

    agent · flower-orchestrator
  4. status change 1d ago
    agent · flower-143
  5. dispatched 1d ago

    Dispatch request #55 marked done.

    agent · flower-143
  6. note added 1d ago

    Built Slices 0–1 (manual adversarial-review gate) on branch flower/143-adversarial-review-gate (commit e3ae93a). Slices 2–4 NOT built, per dispatch.

    Slice 0 (reviewer-loop spike) + Slice 1 (the gate), manual-fire only per Q28 (no auto-force rules):
    - BriefStatus::InReview added (Q30) — distinct hold state with a /briefs badge entry (STATUS_META). ReviewVerdict enum (PASS|REQUEST_CHANGES|BLOCK). New brief events review_requested/review_passed/review_failed.
    - ReviewState (app/Support/Reviews) = pure meta.review reader (required()/passed()/apply()) — the shared read seam so the gate needs no dependency on the review service (avoids a DispatchService↔ReviewService cycle). ReviewService owns the write-path and depends on DispatchService one-way.
    - Gate in DispatchService::recomputeRollup() (hasCompletedDispatch branch, before setStatus(Complete)): if reviewRequired && !reviewPassed → holdForReview() → InReview; also catches a review fired retroactively on an already-Complete brief. reviewRequired = manual orchestrator fire (meta.review.state=requested) OR per-brief meta.review.force; reviewPassed = latest verdict PASS.
    - On PASS → append review_passed → rollup proceeds to Complete. On REQUEST_CHANGES/BLOCK (Q31) → append review_failed → auto-bounce a fix dispatch to the ORIGINAL worker (Resume when a worker session id is on the prior Done dispatch, else Fresh fallback to the same routing); findings ride the review_failed trace event that the bounce packet renders; failing verdict blocks Complete until a later review passes. Full fail→fix→pass loop verified.
    - MCP surface: brief_request_review (orchestrator manual-fire; records review_requested, returns the reviewer packet — mandate to refute "done", workset/branch pointer, code-review/security-review/verify playbook, harness-independence hint, how to submit) and brief_review (reviewer verdict; auto-attaches BriefParticipantRole::Reviewer). Both registered on FlowerServer's first page.

    Notes/decisions:
    - No brief_reviews table and no DispatchKind::Review (both are Slice 3+ scope): verdicts are recorded as brief events + a meta.review blob on the brief.
    - Reviewer HARNESS-independence works today (packet hint + reviewer_harness param); reviewer MODEL/effort independence still waits on #142 (Slice 4).
    - Resume-to-original-worker uses the prior Done dispatch's resume_session_id ?? session_id; when neither is present it falls back to a Fresh dispatch to the same harness/worktree/branch (the worker's real session id isn't reliably captured on dispatch_requests today — a known gap, flagged for later).

    Tests: added tests/Feature/Briefs/ReviewGateTest.php (7) + tests/Feature/Mcp/ReviewToolsTest.php (5). Full suite green: 767 passed / 2 pre-existing skips / 0 failures (769 total). ./vendor/bin/pint clean on all touched files.

    agent · flower-143
  7. participant joined 1d ago
    system · flower-143
  8. dispatched 1d ago

    Dispatch request #55 queued for flower.

    agent · flower-orchestrator
  9. status change 1d ago
    agent · flower-orchestrator
  10. status change 1d ago
    agent · flower-refine
  11. note added 1d ago

    Folded operator answers Q28–31 into the canonical spec (prior spec snapshotted as event #1657; new spec = #1658); moving refining → planned. Only material change vs the dispatched design proposal (#1476): **Q28 = NO auto-force rules enabled on ship** — manual / orchestrator-judgment fire only until the mechanism is polished (supersedes the proposal's "ship the migrations rule enabled" default). Q29 (config-as-data in config/flower.php now, UI later), Q30 (new BriefStatus::InReview), Q31 (auto-bounce a fix to the original worker) all confirm the proposal. Build plan: Slices 0–3 are buildable now with harness-level reviewer independence; Slice 4 (distinct reviewer MODEL) waits on #142 (model routing). Stopping at planned — dispatch is the orchestrator's call.

    agent · flower-refine
  12. refinement 1d ago

    ## Objective
    Give the orchestrator an **optional** ability to trigger an **adversarial review** (a second, skeptical agent trying to refute "done") on a dispatched brief's finished workset before acceptance, plus a configurable **ruleset overlay** that can auto-force a review for classes of risky work. This is a **design/discovery** brief; its four open design decisions are now **settled** (operator, Q28–31, 2026-07-03) and folded in below. Implementation is separate, sliced follow-up brief(s).

    ## Problem / intent
    When a dispatched worker reports its workset "finished," we currently accept/merge on the orchestrator's judgment alone. The operator wants the option to fire an adversarial review on that finished workset before acceptance, and — eventually — to codify conditions under which a review is auto-forced (discovered as we build).

    ## Settled decisions (operator answers, Q28–31)
    1. **Trigger & defaults (Q28) — MANUAL-FIRE ONLY at first; NO auto-force rules enabled on ship.** Build the review-trigger mechanism as orchestrator-fired. For now the **orchestrator decides case-by-case** what is important / large / sensitive enough to warrant a review, while the mechanism is fleshed out and tested. The auto-force ruleset is *scaffolded* (config-as-data, below) but ships with **all rules disabled**. Re-review which conditions to auto-enable once the flow is polished. — *This supersedes the design proposal's recommendation to ship the `migrations` rule enabled.*
    2. **Ruleset storage (Q29) — config-as-data in `config/flower.php` now, UI later.** A `review.rules[]` array (matching the `summarize.models[]` / `pricing.models{}` precedent), env-overridable; a per-brief `briefs.meta.review` flag is the manual override; a toggles UI is deferred to a later slice.
    3. **Holding state (Q30) — new `BriefStatus::InReview` case.** A distinct status with its own `/briefs` badge & filter (clearer than a meta-flag hold at Dispatched).
    4. **On FAIL (Q31) — auto-bounce.** On `REQUEST_CHANGES` / `BLOCK`, re-dispatch a fix to the **original worker** (Resume) with the findings injected; the failing verdict **blocks `Complete`** until a later review passes.

    ## The review itself (from the design proposal, trace event #1476)
    - A **spawned adversarial agent** (there is no in-app review service). Default scope = the **workset diff** (not full re-verify); mandate = **refute "done"** across correctness / security / regression; returns a structured **`PASS | REQUEST_CHANGES | BLOCK`** verdict + findings `[{file,line,severity,claim,refutation}]`. Reuses the `code-review` / `security-review` / `verify` harness skills as its playbook. Full re-verify is an escalation reserved for high-risk cases.
    - **Independence:** reviewer **harness** ≠ worker harness works **today** via `dispatch_requests.agent_tool`; reviewer **model/effort** independence needs **#142** (routing) and is deferred to a later slice — *not* a v1 blocker. Reviewer auto-attaches as `BriefParticipantRole::Reviewer` (already exists).

    ## Lifecycle integration
    - **One gate** inside `DispatchService::recomputeRollup()` — the single chokepoint that flips a brief `Dispatched → Complete` (the `hasCompletedDispatch` branch, `app/Services/Briefs/DispatchService.php`). Before `setStatus(Complete)`: `if (reviewRequired(brief) && !reviewPassed(brief)) → holdForReview()` (→ `InReview`); else `Complete`.
    - **v1 `reviewRequired`** = manual orchestrator fire **OR** per-brief `meta.review.force` (no auto-force rules enabled yet). `reviewPassed` = latest verdict is `PASS`.
    - **On PASS** → append `review_passed` → `recomputeRollup` proceeds to `Complete`. **On FAIL** → auto-bounce to the original worker (Resume) with findings, append `review_failed`, block `Complete`.
    - **Merge safety:** MAIN merges are orchestrator-mediated and gated on `Complete`, so gating `Complete` on review gates the merge too — no separate merge hook.
    - **Recording:** new `review_requested | review_passed | review_failed` brief events; optional durable `brief_reviews` artifact table `{brief_id, dispatch_request_id, reviewer_ref, reviewer_harness, verdict, findings}`.

    ## Build plan (PR-sized thin slices; adjusted for Q28)
    - **Slice 0 — reviewer-loop spike (no gate):** `brief_review` MCP tool + review packet; orchestrator fires manually on a finished brief; verdict recorded as events. Proves the loop with today's harness-independence. *No #142.*
    - **Slice 1 — the gate:** `reviewRequired()` / `reviewPassed()` / `holdForReview()` in `recomputeRollup()`; `briefs.meta.review` flag; new `BriefStatus::InReview`; manual trigger; PASS→Complete, FAIL→auto-bounce-to-worker. *No #142.*
    - **Slice 2 — ruleset scaffold (ALL rules DISABLED per Q28):** `flower.review.rules[]` + a `ReviewRuleEvaluator` (reads `Commit.files`, `agent_tool`, `meta`) capable of setting `review_required` — but ship with **no rule enabled**; orchestrator judgment stays the only trigger until we re-review. *No #142.*
    - **Slice 3 — artifact + UI:** `brief_reviews` table; `/briefs` `InReview` badge + findings panel; small rules-toggle page + per-brief toggle. *No #142.*
    - **Slice 4 — model independence:** once #142 lands, force an independent reviewer **model + effort** (peer-or-higher tier, different provider), not just harness. *Requires #142.*

    Slices 0–3 deliver a fully useful **manual** adversarial-review gate with harness-level independence **without #142**; only the distinct-MODEL guarantee waits on it.

    ## Dependencies
    - **#142 (model/effort routing)** — blocks **only Slice 4** (distinct reviewer MODEL). Harness-independence is available now. Sequence Slice 4 behind #142; ship 0–3 first.

    ## Constraints / non-goals
    - Capability stays **optional / opt-in**; auto-force is a configurable overlay that **ships fully disabled** (Q28).
    - Implementation lands via the sliced follow-up brief(s); **this brief is the settled design → build decision**, not the build itself.

    ## Definition of done
    Design settled with the operator's Q28–31 decisions folded in; brief at **`planned`**, ready for the orchestrator/operator **build decision** (dispatch Slice 0/1, or spin per-slice implementation briefs).
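
The verdict shape and the PASS/FAIL handling in this spec can be illustrated with a small model. It is a sketch in Python rather than the app's PHP. The verdict values, the finding fields `{file,line,severity,claim,refutation}`, and the event names `review_passed` / `review_failed` are taken from the spec. `Finding`, `ReviewOutcome`, and `record_verdict` are hypothetical names, and the field types are assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ReviewVerdict(Enum):
    PASS = "PASS"
    REQUEST_CHANGES = "REQUEST_CHANGES"
    BLOCK = "BLOCK"


@dataclass
class Finding:
    # Field names follow the spec; the types are assumed.
    file: str
    line: int
    severity: str
    claim: str
    refutation: str


@dataclass
class ReviewOutcome:
    event: str              # brief event to append
    bounce_to_worker: bool  # re-dispatch a fix to the original worker
    blocks_complete: bool   # brief may not reach Complete yet
    findings: List[Finding] = field(default_factory=list)


def record_verdict(verdict: ReviewVerdict, findings: List[Finding]) -> ReviewOutcome:
    if verdict is ReviewVerdict.PASS:
        # A pass unblocks the rollup; nothing is bounced.
        return ReviewOutcome("review_passed", False, False, findings)
    # REQUEST_CHANGES and BLOCK are both failing verdicts: the findings ride
    # along with the review_failed event and a fix is bounced to the worker.
    return ReviewOutcome("review_failed", True, True, findings)
```

Treating `REQUEST_CHANGES` and `BLOCK` identically here reflects only what the spec states; any difference in how the two are escalated is not defined in this brief.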

    agent · flower-refine
  13. spec snapshot 1d ago

    ## Objective Design + recommendation (brainstorm brief, NOT implementation): give the orchestrator an **optional** ability to trigger an adversarial review on an incoming "finished" workset, plus a configurable **rule set of conditions that auto-force** such a review. Deliverable is a design proposal to **discuss before building**. ## Problem / intent When a dispatched brief's worker reports its workset "finished," we currently accept/merge on the orchestrator's judgment alone. The operator wants the option to fire an **adversarial review** (a second, skeptical agent trying to find fault / refute "done") on that finished workset before acceptance — and to codify conditions under which such a review is **auto-forced** (discovered as we build). ## Design questions (recommend on each) 1. **Trigger model.** (a) orchestrator manually fires a review on a finished workset; (b) rule-based auto-force. What conditions belong in the ruleset (touches migrations/auth/payments, > N files, security-sensitive paths, brief tagged high-risk, worker was a cheaper/faster model, etc.)? How is the ruleset stored + edited (config-as-data, a UI, per-brief flag)? 2. **What the adversarial review IS.** Scope (diff/workset review vs full re-verify); the reviewer's mandate (correctness / security / regression; try to REFUTE the "done" claim); which agent/model runs it — a distinct model/harness for an independent perspective (ties to #142 routing). 3. **Lifecycle integration.** Where it sits (after worker `brief_dispatch_complete`, before brief → `complete` / before a MAIN merge); how a failing review feeds back (re-open/bounce the brief, block completion); how results are recorded (brief event, feedback, a review artifact). 4. **Optionality + defaults.** Keep it opt-in per dispatch; which auto-force conditions (if any) ship on by default. 
## Investigation scope (light) - Current dispatch → completion path (DispatchService, `brief_dispatch_complete`, status rollup) and where a review gate could insert. - Existing review affordances that could back the reviewer (the `code-review` / `verify` skills). - How #142 (model routing) would pick the reviewer's model for independence. ## Deliverable A design proposal covering **(trigger model + ruleset representation, reviewer mandate/scope, lifecycle integration, defaults)** + a small proposed change set sized as thin slices. To discuss before building. ## Constraints / non-goals - Brainstorm/discovery — do NOT build yet. - Keep the capability **optional** (orchestrator-triggered), with auto-force as a configurable overlay, not mandatory. - Implementation is out of scope here → separate follow-up brief(s) after discussion. ## Definition of done Design proposal written up for operator discussion; brief ready for the build decision.

    system · flower-refine
  14. status change 2d ago
    agent · flower-orchestrator
  15. operator answer 2d ago

    Auto-bounce: re-dispatch a fix to the original worker with findings (recommended)

    operator · operator:mike
  16. operator answer 2d ago

    New BriefStatus::InReview case (clearer on /briefs; recommended)

    operator · operator:mike
  17. operator answer 2d ago

    Config-as-data in config/flower.php now, UI later (recommended)

    operator · operator:mike
  18. operator answer 2d ago

    None — manual-fire only at first, no auto-force, let's let the orchestrator decide for the time being what's important enough or large enough or touches sensitive enough stuff to warrant the review process while we get it all fleshed out and tested. Once polished we can re-review the automated conditions.

    operator · operator:mike
  19. link added 2d ago
    agent · flower-refine
  20. note added 2d ago

    ↩️ Reopened by operator request (2026-07-03). This brief was prematurely marked complete by a prior orchestrator at 14:40:53 while its 4 operator questions (Q28–31, posted by the worker at 14:36) were still open — see bug #71. Reopened to in_progress so the operator can answer Q28–31. The 4 questions remain status=open and now surface again in the Open Questions view/badge (which #154 is realigning).

    agent · flower-orchestrator
  21. status change 2d ago
    agent · flower-orchestrator
  22. status change 2d ago
    agent · flower-orchestrator
  23. dispatched 2d ago

    Dispatch request #43 marked done.

    agent · flower-orchestrator
  24. agent question 2d ago

    When a review returns REQUEST_CHANGES/BLOCK, what should happen automatically?

    agent · flower-design-143
  25. agent question 2d ago

    How should a brief be held while its review is pending, rather than flipping to Complete?

    agent · flower-design-143
  26. agent question 2d ago

    How should the auto-force ruleset be stored and edited?

    agent · flower-design-143
  27. agent question 2d ago

    Which auto-force conditions should ship ENABLED by default (rest stay defined-but-disabled, turned on as we learn)?

    agent · flower-design-143
  28. plan proposed 2d ago

    # Brief #143 — Adversarial Reviews: Design Proposal
    *Design/discovery deliverable (flower-design-143, Solo proc 1032, wt2). No code written. To discuss before building.*

    ## TL;DR — recommendation
    Give the orchestrator an **optional** "second skeptic" pass on a finished workset, gated at the exact place a brief becomes `Complete`, with a **config-as-data ruleset** that can *auto-force* the pass for classes of risky work.
    1. **One gate, two triggers.** Insert a review check inside `DispatchService::recomputeRollup()` — the single chokepoint that flips a brief `Dispatched → Complete`. Both a **manual** orchestrator fire and a **rule-based auto-force** converge on the same `review_required` flag the gate reads.
    2. **The review is a spawned adversarial agent** (there is no in-app review service). It reviews the **workset diff**, is mandated to **REFUTE "done"** on correctness / security / regression, and returns a structured **PASS / REQUEST_CHANGES / BLOCK** verdict recorded as brief events. Reuse the existing `code-review` / `security-review` / `verify` harness skills as its playbook.
    3. **Ruleset = config-as-data** (`config/flower.php` → `review.rules[]`), env-overridable, matching the `summarize.models[]` precedent; a **per-brief `briefs.meta.review` flag** is the manual override; a small toggles UI is deferred.
    4. **Ship conservative.** Capability **off by default**, opt-in per dispatch. Exactly **one** auto-force rule enabled on ship — *touches migrations* — with the rest **defined-but-disabled**, so we turn them on "as we find places where it should always happen" (the operator's framing).
    5. **#142 dependency is partial, not blocking.** Reviewer **harness**-independence (Codex reviews Claude) works **today** via `dispatch_requests.agent_tool`. Reviewer **model/effort**-independence needs #142 and is scoped to a later slice.

    ## Grounding — what the code does today (read-only investigation)
    **Completion funnel:** `brief_dispatch_complete` → `DispatchService::completeForBrief()` → `complete()` → `finish($req, DispatchStatus::Done)` → **`recomputeRollup($brief)`**.

    **The gate — `recomputeRollup()` (`app/Services/Briefs/DispatchService.php:167`):** if non-terminal dispatch → stay Dispatched; **elif hasCompletedDispatch → setStatus(Complete) [lines 176–182, THE FLIP]**; elif Dispatched → Planned. Every completion route lands here → **one insertion point gates all of them** (confirms the refine daemon's grounding note, event #7).

    **Substrate present today (no big lift):**
    - `dispatch_requests.agent_tool` records the worker's **harness** → both a "cheap-harness" auto-force condition *and* reviewer harness-independence.
    - `Commit.files` is a **JSON array** of paths; commits link to briefs via `brief_links` → "touches migrations/auth/payments", ">N files" are knowable today.
    - `briefs.meta` is a **JSON column** → per-brief flags (`review.force`, `review.state`) with **no migration**.
    - **Config-as-data precedent**: `flower.summarize.models[]`, `flower.briefs.default_model`, `flower.pricing.models{}` → a `flower.review.rules[]` array fits the house style.
    - **`BriefParticipantRole::Reviewer` already exists** → the model already anticipates a reviewer participant; reviewer auto-attaches in that role.
    - **No in-app review service** — `code-review`/`verify`/`security-review` are *harness skills* → reviewer must be a **spawned agent** driven by a dispatch packet.
    - `DispatchKind` = `Fresh|Resume`; `BriefStatus` has no `in_review` case yet.

    **#142 gap (confirmed):** dispatch/spawn select only a **harness** (`agent_tool`), never a model/effort; `brief.model` reaches only `BriefRefiner`. Distinct reviewer MODEL waits on #142; distinct reviewer HARNESS is available now.

    **Prior art (recall):** the adversarial-reviewer loop is already used manually in flower's own history — e.g. the design-doc consensus loop (verdict `partial` + blocking B1) and the `reddit:import-archive` review where **Codex 900 did the work, Claude 903 adversarially reviewed it**, returned `REQUEST CHANGES` with a P0, and fixes were bounced back to Codex. This brief **codifies that proven pattern** into the dispatch lifecycle.

    ## Q1 — Trigger model + ruleset representation
    **Two triggers, one gate.** Both set a `review_required` intent `recomputeRollup()` honors:
    - **(a) Manual** orchestrator fire (always available, even with no rule match — the *optional* capability);
    - **(b) Auto-force** — ruleset evaluated at completion; a match sets `review_required` before the flip.

    **Candidate conditions (knowable today?):**
    - touches migrations — `Commit.files` glob `database/migrations/**` ✅
    - touches auth/payments/security paths — configurable path-globs ✅
    - `>N` files — `count(Commit.files)` ✅
    - worker on cheap/fast **harness** — `dispatch_requests.agent_tool` ✅ (proxy for model)
    - worker on below-tier **model** — ⛔ needs **#142**
    - brief tagged high-risk — `briefs.meta.tags`/`priority≥threshold` ✅
    - project flagged sensitive — config allowlist ✅

    **Storage (recommendation):** rules → **config-as-data** `config/flower.php` `review.rules[] = [{id,label,when:{files_glob|files_gt|agent_tool_in|brief_tag},enabled}]` + `review.enabled` master switch (default OFF); per-brief override → **`briefs.meta.review`** `{force:true|false}` (trumps rules); **UI deferred** to a later slice. Rules are findings we accrete over time → git-versioned config fits better than a DB row.

    ## Q2 — What the adversarial review IS
    **Scope (default = workset diff, NOT full re-verify):** review the diff of the dispatch's workset (linked commits/branch diff) + brief spec + the "done" claim. Cheaper, focused, matches the manual pattern. **Full re-verify** (exercise the flow via `/verify`) is an *escalation* reserved for high-risk rule matches.

    **Reviewer mandate — refute "done":** assume the work is wrong until proven right; try to break the claim across **correctness** (requirements, edge cases), **security** (authz/authn, injection, secrets, sensitive-path regressions), **regression** (adjacent behavior, meaningful-not-tautological tests).

    **Verdict (structured):** `PASS | REQUEST_CHANGES | BLOCK` + findings `[{file,line,severity,claim,refutation}]`, mirroring the `code-review` finding shape. Packet points the reviewer at `/code-review high`, `/security-review`, `/verify`.

    **Independence axes:** **harness (today)** — pick `agent_tool` ≠ worker's `agent_tool` (known from `dispatch_requests.agent_tool`); **model/effort (needs #142)** — force peer-or-higher tier + different provider family. Reviewer auto-attaches as `BriefParticipantRole::Reviewer`.

    ## Q3 — Lifecycle integration
    **Insertion point:** inside `recomputeRollup()`, in the `hasCompletedDispatch` branch, **before** `setStatus(Complete)`: `if (reviewRequired(brief) && !reviewPassed(brief)) return holdForReview(brief);` else `setStatus(Complete)`. `reviewRequired` = `meta.review.force===true` OR any enabled rule matches OR a manual review was requested; `reviewPassed` = latest verdict is PASS.

    **Holding state (operator choice):** (A) new **`BriefStatus::InReview`** case — clearest on `/briefs` (own badge/filter), costs enum + label/switch churn; (B) **meta-flag hold** — keep `Dispatched`/`Blocked` + `meta.review.state=pending|passed|failed`, smallest slice. *Recommend (A)*, with (B) as the thin-slice fallback.

    **Firing:** the review is itself a spawn/dispatch — a review request (a `dispatch_request` with a new `DispatchKind::Review`, or a dedicated `brief_reviews` row) targeting an independent harness with a review packet (spec + diff + "done" claim + refute mandate); reviewer submits via a new `brief_review_submit(verdict,findings,actor_ref)` tool.

    **On PASS:** mark passed, append `review_passed`, `recomputeRollup()` proceeds → `Complete`.

    **On FAIL (`REQUEST_CHANGES`/`BLOCK`) — bounce:** flip back to a workable status (`in_progress`/`refining`), append `review_failed` with findings, re-open a **fix dispatch to the ORIGINAL worker** (Resume) with findings injected (the Codex-fixes-its-own-work loop). Failing verdict **blocks `Complete`** until a later review passes; optionally auto-file Feedback.

    **Recording:** new `BriefEventKind` `review_requested|review_passed|review_failed` (thin-slice can reuse `comment`+meta); optional durable `brief_reviews` artifact table `{brief_id,dispatch_request_id,reviewer_ref,reviewer_harness,verdict,findings,created_at}`.

    **Merge safety:** MAIN merges are orchestrator-mediated and gated on `Complete`, so gating `Complete` on review gates the merge too — **no separate merge hook needed.**

    ## Q4 — Optionality + defaults
    - Capability **OFF by default** (`review.enabled=false`), opt-in per dispatch (`review:true` option or `meta.review.force`).
    - Auto-force rules ship **DISABLED except `migrations`** (schema = high-blast-radius, trivially detectable), which ships **enabled**. Rest defined-but-disabled → grow the ruleset empirically ("turn it on where we find it should always happen").

    ## Proposed thin slices (PR-sized)
    - **Slice 0 — reviewer-loop spike (no gate):** `brief_review` MCP tool + review packet; orchestrator fires manually on a finished brief; verdict recorded as events. Proves the loop with today's harness-independence. *No #142.*
    - **Slice 1 — the gate:** `reviewRequired()`/`reviewPassed()`/`holdForReview()` in `recomputeRollup()`; `briefs.meta.review` flag; PASS→Complete, FAIL→bounce-to-worker; manual trigger; introduce `InReview` (or meta-hold). *No #142.*
    - **Slice 2 — the ruleset:** `flower.review.rules[]` + `ReviewRuleEvaluator` (reads `Commit.files`, `agent_tool`, `meta`) sets `review_required` at completion; ship migrations-rule enabled. *No #142.*
    - **Slice 3 — artifact + UI:** `brief_reviews` table; `/briefs` badge + findings panel; small rules-toggle page + per-brief toggle. *No #142.*
    - **Slice 4 — model independence:** once #142 lands, force an independent reviewer **model+effort** (peer-or-higher tier, different provider), not just harness. **Requires #142.**

    Slices 0–3 deliver a fully useful adversarial-review gate with harness-level independence **without #142**; only the "distinct MODEL" guarantee waits on it.

    ## #142 dependency (explicit)
    - **Not a hard blocker.** Reviewer harness-independence works now via `dispatch_requests.agent_tool`.
    - **Blocks only Slice 4** — reviewer model/effort selection (dispatch/spawn can't pick a model today). Sequence Slice 4 behind #142; ship 0–3 first.

    ## Open decisions (see brief_ask)
    1. Which auto-force conditions ship **enabled** by default? (recommend: migrations only)
    2. Ruleset representation: **config-as-data now + UI later** (recommended) vs UI-first vs per-brief-only?
    3. Holding state: new **`InReview`** status vs **meta-flag hold at Dispatched**?
    4. On FAIL: **auto-bounce/re-dispatch to original worker** vs **park at InReview for operator**?

    *(Full copy also saved to Solo scratchpad in project 56.)*
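
The ruleset shape proposed under Q1 (`{id,label,when:{files_glob|files_gt|agent_tool_in|brief_tag},enabled}`, a master switch, and a per-brief `force` override) can be sketched as a small evaluator. This is a Python illustration, not the proposed `ReviewRuleEvaluator`. The function names, the `large-change` rule, its threshold of 20, and the sample file paths are hypothetical. Two behaviours are assumptions rather than settled design: each rule carries a single `when` condition, and `force` takes precedence over the master switch as well as over the rules.

```python
from fnmatch import fnmatchcase
from typing import List, Optional

# Rules as data, shipped disabled (the operator's Q28 answer).
RULES = [
    {"id": "migrations", "label": "Touches migrations",
     "when": {"files_glob": "database/migrations/**"}, "enabled": False},
    {"id": "large-change", "label": "More than N files",
     "when": {"files_gt": 20}, "enabled": False},
]


def rule_matches(when: dict, files: List[str], agent_tool: str, tags: List[str]) -> bool:
    # One condition per rule in this sketch.
    if "files_glob" in when:
        return any(fnmatchcase(path, when["files_glob"]) for path in files)
    if "files_gt" in when:
        return len(files) > when["files_gt"]
    if "agent_tool_in" in when:
        return agent_tool in when["agent_tool_in"]
    if "brief_tag" in when:
        return when["brief_tag"] in tags
    return False


def review_required(rules: List[dict], files: List[str], agent_tool: str,
                    tags: List[str], force: Optional[bool] = None,
                    enabled: bool = True) -> bool:
    if force is not None:
        return force  # per-brief override trumps everything else
    if not enabled:
        return False  # master switch off: no rule can fire
    return any(
        rule["enabled"] and rule_matches(rule["when"], files, agent_tool, tags)
        for rule in rules
    )
```

With every rule disabled, `review_required` returns `False` for any workset unless `force` is set, which leaves the orchestrator's manual fire as the only trigger. Enabling a rule later is a one-field config change.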

    agent · flower-design-143
  29. participant joined 2d ago
    system · flower-design-143
  30. dispatched 2d ago

    Dispatch request #43 queued for flower.

    agent · flower-orchestrator
  31. status change 2d ago
    agent · flower-orchestrator
  32. participant joined 2d ago
    system · flower-orchestrator
  33. status change 2d ago
    agent · flower-refine
  34. refinement 2d ago

    Grounding review (2026-07-03) VERIFIED the lifecycle seam this spec names is precise: `DispatchService::recomputeRollup()` is the gate-able chokepoint where a brief flips to `BriefStatus::Complete` (after a dispatch `finish()`/`complete()`). A review gate inserts cleanly between "dispatch Done" and "brief → Complete" — hold at a new intermediate status or require a passed-review flag. Entry path: `brief_dispatch_complete` → `DispatchService::completeForBrief` → `finish(Done)` → `recomputeRollup`. No in-app review hooks exist (the `code-review`/`verify` affordances are harness skills, not app services), so the reviewer is a spawned agent. DEPENDENCY to call out: picking an INDEPENDENT reviewer *model* needs model/effort selection that doesn't exist yet — dispatch/spawn pick only a HARNESS (`agent_tool`), never a model; `brief.model` reaches only `BriefRefiner`. So #143's model-independence rides on **#142** (routing). Ruleset substrate is strong (config-as-data precedent; `dispatch_requests` records `agent_tool` so "worker was a cheaper model" is a knowable auto-force condition; `Commit.files` gives "touches migrations / N files"). Sound as a discovery/design brief → moving to planned.

    agent · flower-refine
  35. refinement 2d ago

    ## Objective Design + recommendation (brainstorm brief, NOT implementation): give the orchestrator an **optional** ability to trigger an adversarial review on an incoming "finished" workset, plus a configurable **rule set of conditions that auto-force** such a review. Deliverable is a design proposal to **discuss before building**. ## Problem / intent When a dispatched brief's worker reports its workset "finished," we currently accept/merge on the orchestrator's judgment alone. The operator wants the option to fire an **adversarial review** (a second, skeptical agent trying to find fault / refute "done") on that finished workset before acceptance — and to codify conditions under which such a review is **auto-forced** (discovered as we build). ## Design questions (recommend on each) 1. **Trigger model.** (a) orchestrator manually fires a review on a finished workset; (b) rule-based auto-force. What conditions belong in the ruleset (touches migrations/auth/payments, > N files, security-sensitive paths, brief tagged high-risk, worker was a cheaper/faster model, etc.)? How is the ruleset stored + edited (config-as-data, a UI, per-brief flag)? 2. **What the adversarial review IS.** Scope (diff/workset review vs full re-verify); the reviewer's mandate (correctness / security / regression; try to REFUTE the "done" claim); which agent/model runs it — a distinct model/harness for an independent perspective (ties to #142 routing). 3. **Lifecycle integration.** Where it sits (after worker `brief_dispatch_complete`, before brief → `complete` / before a MAIN merge); how a failing review feeds back (re-open/bounce the brief, block completion); how results are recorded (brief event, feedback, a review artifact). 4. **Optionality + defaults.** Keep it opt-in per dispatch; which auto-force conditions (if any) ship on by default. 
## Investigation scope (light) - Current dispatch → completion path (DispatchService, `brief_dispatch_complete`, status rollup) and where a review gate could insert. - Existing review affordances that could back the reviewer (the `code-review` / `verify` skills). - How #142 (model routing) would pick the reviewer's model for independence. ## Deliverable A design proposal covering **(trigger model + ruleset representation, reviewer mandate/scope, lifecycle integration, defaults)** + a small proposed change set sized as thin slices. To discuss before building. ## Constraints / non-goals - Brainstorm/discovery — do NOT build yet. - Keep the capability **optional** (orchestrator-triggered), with auto-force as a configurable overlay, not mandatory. - Implementation is out of scope here → separate follow-up brief(s) after discussion. ## Definition of done Design proposal written up for operator discussion; brief ready for the build decision.

    agent · flower-refine
  36. spec snapshot 2d ago

    Adversarial reviews I think I'd like to set things up so that we can give the orchestrator the optional ability to trigger/fire an adversarial review on an incoming "finished" workset. So let's brainstorm a bit about how that might be done - optional things that might be worth doing, or not, etc. Maybe we set some sort of rule set/list of conditions that would auto-force a review as we go through building and find places where we think it should always happen. Thoughts? ideas?

    system · flower-refine
  37. participant joined 2d ago
    system · flower-refine
  38. status change 2d ago
    agent · operator:mike
  39. note added 2d ago

    Adversarial reviews I think I'd like to set things up so that we can give the orchestrator the optional ability to trigger/fire an adversarial review on an incoming "finished" workset. So let's brainstorm a bit about how that might be done - optional things that might be worth doing, or not, etc. Maybe we set some sort of rule set/list of conditions that would auto-force a review as we go through building and find places where we think it should always happen. Thoughts? ideas?

    operator · operator:mike
  40. participant joined 2d ago
    system · operator:mike

epic · dependencies

Relationships

epic parent

depends on

No dependencies — dispatchable once planned.

agents · waves

Participants

  • operator:mike participant · active
  • flower-refine participant · active
  • flower-orchestrator participant · active
  • flower-design-143 participant · active
  • flower-143 participant · active
  • system:commit-trailer participant · active

trace · graph

Links

  • Commit #1729 execution
  • Scratchpad #375 execution

scope

Projects

  • flower · primary

dogfood · read-only

Agent’s-eye view

The literal recall_brief payload an agent gets — same service path as the MCP tool.