recall_search: bound result size (per-hit snippet cap + total-payload budget) so it never exceeds the MCP token cap


This brief is complete — dispatch is closed.

#74 done fresh flower · flower/188-recall-search-result-cap

agent: claude

You are being dispatched from flower Brief #188: recall_search: bound result size (per-hit snippet cap + total-payload budget) so it never exceeds the MCP token cap

Recall pointer:
- Use recall_brief with id 188 for the full folder if you need provenance.

Target:
- project: flower (/Users/mikeferrara/Documents/code/flower)
- branch: flower/188-recall-search-result-cap
- worktree: not specified
- kind: fresh

Current brief spec:
## Ask
`recall_search` can return a result that exceeds the MCP tool-result token cap and gets dumped to a file, making it unusable inline. Reported (feedback #87, flower-123 claude): `recall_search(limit:12, scope=project, project=flower, broad query)` returned **70,709 chars** → tripped the "exceeds maximum allowed tokens" guard → written to a tool-results file instead of returned inline. Contrast: `recall_brief(id:123)` in the same session was perfectly shaped. So this is specific to recall_search result sizing on a corpus with large chunks (session_segment / doc / brief chunks are multi-KB each; result size ≈ limit × snippet-length, unbounded on both axes).

## Fix (prefer 1+2 together so no query/limit combo can exceed the cap)
1. **Total-payload budget (preferred):** cap the serialized result at a safe ceiling (~24KB, well under the MCP cap). When over budget, trim per-hit snippets proportionally and/or drop lowest-ranked hits, and append a visible truncation marker (e.g. `"N more hits truncated — narrow the query or lower limit"`) so truncation is never silent.
2. **Per-hit snippet cap:** truncate each hit snippet to a max length (~400–600 chars) with an ellipsis. Optionally expose a `snippet_chars` param (and/or a `body:false` lean mode) so callers can opt into leaner results.
3. **Clamp the effective limit** when snippets are large so the total stays under the cap.

Keep hits ranked; preserve the existing result shape/fields (just bounded). Match the existing `app/Mcp/Tools` shape and the recall serializer path the web UI shares.
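
The interaction of fixes 1 and 2 can be sketched as follows. This is a minimal, language-neutral illustration in Python, not the PHP implementation: the function names, the `hits`/`snippet`/`truncated` keys, and the constant values (a 24,000-byte ceiling, a 500-char snippet cap, a 120-char floor) are all assumptions chosen to match the rough numbers in this brief. It caps each snippet first, shrinks the cap if the serialized payload is still over budget, then drops lowest-ranked hits and records how many were dropped.

```python
import json

# All values below are assumptions for illustration, not the real config.
MAX_BYTES = 24_000       # total-payload ceiling (~24KB per the brief)
SNIPPET_CHARS = 500      # per-hit snippet cap (brief suggests ~400-600)
MIN_SNIPPET_CHARS = 120  # hypothetical floor so snippets stay readable
ELLIPSIS = "…"


def cap_snippet(text, max_chars):
    """Truncate text to at most max_chars, ending with an ellipsis if cut."""
    if len(text) <= max_chars:
        return text
    return text[: max_chars - 1].rstrip() + ELLIPSIS


def serialized_size(payload):
    """Size in bytes of the JSON-serialized payload."""
    return len(json.dumps(payload, ensure_ascii=False).encode("utf-8"))


def bound_search_payload(hits, max_bytes=MAX_BYTES,
                         snippet_chars=SNIPPET_CHARS,
                         min_snippet_chars=MIN_SNIPPET_CHARS):
    """Bound a ranked hit list (best first) to max_bytes when serialized."""
    original = list(hits)

    def capped(limit):
        # Copy each hit so the caller's data is untouched; only the snippet changes.
        return [dict(h, snippet=cap_snippet(h["snippet"], limit)) for h in original]

    def build(kept):
        payload = {"hits": kept}
        dropped = len(original) - len(kept)
        if dropped:
            # Visible marker so dropped hits are never silent.
            payload["truncated"] = (
                f"{dropped} more hits truncated; narrow the query or lower limit"
            )
        return payload

    # Step 1: per-hit cap. Step 2: halve the cap while still over budget.
    limit = snippet_chars
    kept = capped(limit)
    while serialized_size(build(kept)) > max_bytes and limit > min_snippet_chars:
        limit = max(min_snippet_chars, limit // 2)
        kept = capped(limit)

    # Step 3: drop lowest-ranked hits (the tail) until the payload fits.
    while serialized_size(build(kept)) > max_bytes and kept:
        kept = kept[:-1]
    return build(kept)
```

Because the size check runs on the serialized payload including the marker, no combination of `limit` and snippet length can push the result over the ceiling in this sketch; ranking order is preserved because hits are only ever removed from the tail.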

## Verify
- Re-run the reported case: `recall_search(limit:12, scope=project, project=flower)` with a broad query that previously returned ~70KB → result returns **inline under the cap**, still ranked, with a clear truncation signal when trimmed.
- Add a test asserting the serialized recall_search result stays under the ceiling for a large-snippet corpus (seed a few multi-KB chunks).
- `php artisan test` green; run pint on changed files.
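
The regression test described above might take roughly this shape. It is a Python stand-in for the eventual PHP test, under stated assumptions: `seed_corpus`, `unbounded_search`, `bounded_search`, and `check_bounded` are hypothetical helpers, and the 24,000-byte ceiling and 500-char cap are illustrative values only.

```python
import json

CEILING_BYTES = 24_000  # assumed ceiling, mirroring the ~24KB in the brief


def seed_corpus(n_chunks=12, chunk_kb=6):
    """Build ranked hits whose snippets are multi-KB, like large recall chunks."""
    return [
        {
            "id": i,
            "score": round(1.0 - 0.05 * i, 2),          # strictly descending
            "snippet": ("x" * 1023 + " ") * chunk_kb,   # chunk_kb * 1024 chars
        }
        for i in range(n_chunks)
    ]


def unbounded_search(corpus, limit):
    """Stand-in for the pre-fix behaviour: full snippets, no budget."""
    return {"hits": corpus[:limit]}


def bounded_search(corpus, limit, snippet_chars=500):
    """Stand-in for a fixed search: each snippet cut to snippet_chars."""
    return {
        "hits": [dict(h, snippet=h["snippet"][:snippet_chars]) for h in corpus[:limit]]
    }


def check_bounded(result, ceiling=CEILING_BYTES):
    """Report whether a result is under the ceiling and still ranked."""
    size = len(json.dumps(result, ensure_ascii=False).encode("utf-8"))
    scores = [h["score"] for h in result["hits"]]
    return {
        "size": size,
        "under_ceiling": size <= ceiling,
        "ranked": scores == sorted(scores, reverse=True),
    }
```

Running `check_bounded` against `unbounded_search(seed_corpus(), 12)` reproduces the over-cap failure (twelve 6KB snippets), while the bounded variant passes both the size and ranking checks.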

## Constraints
- Backend-only change. Work ONLY inside your assigned worktree; NEVER edit under MAIN (`/Users/mikeferrara/Documents/code/flower`). Use relative paths.
- Add commit trailer `Brief: #188` to every commit.

## Meta
Severity: medium (recall usability/quality, NOT app-availability). Autonomous. Source: feedback #87 / fix-spec Solo scratchpad 1074 / signal #59 (routed by flower-ops). Recurring pain — also referenced in #123/#121 worker feedback and related to #186.

Recent/key trace events:
[1] participant_joined flower-orchestrator: (no body)
[2] note_added flower-orchestrator: (body matches the current brief spec above, minus the Constraints section)
[3] link_added flower-orchestrator: (no body)
[4] plan_proposed flower-orchestrator: (body matches the current brief spec above, including the Constraints section)
[5] status_change flower-orchestrator: (no body)

Recommended linked context:
{
    "todos": [],
    "scratchpads": []
}

Execution notes:
- Treat the brief as the source of truth.
- Keep work scoped to this dispatch request.
- Use brief_append / brief_update_status when reporting material progress; as your final dispatched-worker step, call brief_dispatch_complete with dispatch_request_id (or brief_id) and actor_ref.
- Codex workers should verify mutating Flower tools with tool_search query `brief_append brief_dispatch_complete flower_feedback` (limit 20) when tool availability is in doubt; report raw SEE/LOAD vs NOT visible instead of silently using local fallbacks.
- Add a git commit trailer `Brief: #188` to every commit for this brief so flower can exact-link commits back to the brief.

provenance · append-only

Trace

live

link added 1d ago
agent · system:commit-trailer
participant joined 1d ago
system · system:commit-trailer
status change 1d ago
agent · flower-orchestrator
dispatched 1d ago

Dispatch request #74 marked done.
agent · flower-orchestrator
merged 1d ago

Merged flower/188-recall-search-result-cap → master on MAIN (merge commit e17e61d, over worker commit aa27683). No conflicts, no migration. Files changed: app/Services/Recall/RecallService.php (boundSearchPayload, snippet and byte caps, truncation marker); app/Mcp/Tools/RecallSearchTool.php (snippet_chars param); config/flower.php (recall.search.{max_bytes, snippet_chars, min_snippet_chars}); tests/Feature/Recall/RecallSearchPayloadBudgetTest.php (3 tests). Post-merge on MAIN: pint clean; full suite 918 passed / 1 skipped / 0 failures. Note: recall_search is served via the flower MCP, so the cap takes effect only for new or reconnected MCP sessions. Running agents were not force-reloaded, since this is a medium-severity usability fix.

agent · flower-orchestrator
dispatched 1d ago

Spawned Claude worker proc 1085 (`flower-w188-recall-search`) in the backend worktree (Solo project 53, /Users/mikeferrara/Documents/code/worktrees/flower/backend), branch flower/188-recall-search-result-cap (based off master c3f4957). Kicked off 04:34 with a hard worktree pin (verify pwd/branch, never touch MAIN, relative paths, Brief:#188 trailer). Dispatch request #74. Completion watch armed.

agent · flower-orchestrator
dispatched 1d ago

Dispatch request #74 queued for flower.
agent · flower-orchestrator
status change 1d ago
agent · flower-orchestrator
status change 1d ago
agent · flower-orchestrator
plan proposed 1d ago

Plan body matches the current brief spec above (Ask, Fix, Verify, Constraints, Meta).

agent · flower-orchestrator
link added 1d ago
agent · flower-orchestrator
note added 1d ago

Note body matches the current brief spec above, without the Constraints section.

agent · flower-orchestrator
participant joined 1d ago
system · flower-orchestrator

epic · dependencies

Relationships

epic parent

depends on

No dependencies — dispatchable once planned.

agents · waves

Participants

flower-orchestrator participant · active
system:commit-trailer participant · active

trace · graph

Projects

flower · primary

dogfood · read-only
