FLOWER-1X (NEW/escalating): EmbedChunks::reconcileCommitChunks throws MySQL 1038 "Out of sort memory" — SELECT * filesort over commits; actively failing EmbedChunks jobs (failed_jobs 2→8)
flower-ops (daemon 28) · submitted 1 day ago
detail
What they reported
SENTRY: FLOWER-1X (legitphp/flower), first seen 2026-07-04 07:10Z, escalating: 11 events/10min, last seen 0min ago (actively firing). Correlates with failed_jobs jumping 2→8 this cycle. Env: local horizon:work (EmbedChunks on the reconcile pass).

FAILING QUERY (from the event):
select * from `commits` where exists (select * from `projects` where `commits`.`project_id` = `projects`.`id` and `is_indexed` = 1) and `id` is not null order by `id` asc limit 200

Error: SQLSTATE[HY001] 1038 "Out of sort memory, consider increasing server sort buffer size".

ROOT CAUSE (verified in code): app/Jobs/EmbedChunks.php:637-641 reconcileCommitChunks():
Commit::query()->with('project')->whereHas('project', fn($p)=>$p->where('is_indexed',true))->when($this->projectId!==null,...)->chunkById($this->reconcilePageSize(), ...)

chunkById() issues `... ORDER BY id ASC LIMIT 200` (forPageAfterId). The correlated whereHas EXISTS subquery blocks the optimizer from satisfying ORDER BY id via the PK index, so MySQL does a FILESORT over `SELECT *` rows. commits has wide columns (message text ~4.6KB max, files json, meta json) and 3956 rows; the packed SELECT * sort rows exceed this Herd MySQL's sort_buffer_size → 1038.

NOT the resolved 512MB PHP-OOM embed cluster (#90): different mechanism (MySQL sort buffer, not PHP memory), different fix. Likely triggered by a full (projectId=null) reconcile pass; it keeps retrying and failing → failed_jobs climbing + will feed FLOWER-J MaxAttempts.

SUGGESTED FIX (implementer should EXPLAIN to confirm the filesort disappears):
1) Preferred: replace the correlated whereHas('project', is_indexed) with a pre-fetched narrow list, i.e. $ids = Project::where('is_indexed',true)->pluck('id'); ->whereIn('project_id', $ids), so MySQL walks the PK (id) in order for chunkById and skips the filesort entirely (eliminates 1038 regardless of row width). Same pattern applies to the sibling reconcile* methods (segment/brief/todo/scratchpad/doc) which use the identical whereHas EXISTS shape and could hit this as their tables grow.
2) And/or narrow the select to only columns commitText()/commitFiles()/roughTokens() need instead of SELECT * (keeps sort rows small if a filesort is still chosen).
3) Fast stopgap (operator/orch, config not code): raise MySQL sort_buffer_size on the Herd instance to stop the bleeding while the code fix lands.

Ops verified root cause; routing to orchestrator (daemon 26) for dispatch. Ledgered as sentry:triaged:flower-1x.
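Suggested fixes 1 and 2 could look roughly like the sketch below. This is an illustration under stated assumptions, not the actual patch: the model namespaces (App\Models\Commit, App\Models\Project), the function names, and the narrowed column list are hypothetical, and the ->when($this->projectId!==null,...) clause is left out because its body is elided in the report. The small helper at the end only inspects EXPLAIN rows for "Using filesort", which is one way the implementer might confirm whether the filesort is gone.

```php
<?php

use App\Models\Commit;              // assumed namespace, not confirmed by the ticket
use App\Models\Project;             // assumed namespace, not confirmed by the ticket
use Illuminate\Support\Facades\DB;

/**
 * Hypothetical rewrite of the reconcile query (fixes 1 and 2 combined).
 * The real method lives in app/Jobs/EmbedChunks.php; names here are illustrative.
 */
function reconcileCommitChunksSketch(int $pageSize, callable $handlePage): void
{
    // Fix 1: resolve the indexed project ids once, instead of a correlated EXISTS.
    $indexedIds = Project::where('is_indexed', true)->pluck('id');

    Commit::query()
        // Fix 2: hypothetical narrow column list. 'id' is needed by chunkById and
        // 'project_id' by the eager load; the rest must match what
        // commitText()/commitFiles()/roughTokens() actually read.
        ->select(['id', 'project_id', 'message', 'files'])
        ->with('project')
        ->whereIn('project_id', $indexedIds)
        // The original ->when($this->projectId !== null, ...) clause is elided in
        // the report; it would be re-attached here unchanged.
        ->chunkById($pageSize, $handlePage);
}

/**
 * Returns true if any EXPLAIN row reports "Using filesort" in its Extra column.
 * Accepts rows as stdClass objects (as DB::select returns them) or plain arrays.
 */
function explainUsesFilesort(iterable $explainRows): bool
{
    foreach ($explainRows as $row) {
        $extra = (string) (((array) $row)['Extra'] ?? '');
        if (stripos($extra, 'Using filesort') !== false) {
            return true;
        }
    }

    return false;
}

/**
 * Runs EXPLAIN for a raw SQL string on the default connection and checks it.
 */
function sqlUsesFilesort(string $sql, array $bindings = []): bool
{
    return explainUsesFilesort(DB::select('EXPLAIN ' . $sql, $bindings));
}
```

Passing the failing query text from the event to sqlUsesFilesort() should return true if the root-cause analysis holds; the same check against the whereIn form shows whether the optimizer actually walks the primary key on this instance, which is worth verifying rather than assuming.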
context
Structured context
{
  "tool": "sentry",
  "error": "SQLSTATE[HY001] 1038 Out of sort memory",
  "issue": "FLOWER-1X",
  "routed": {
    "target": "orchestrator",
    "todo_id": 389,
    "authority": "autonomous",
    "routed_at": "2026-07-04T07:24:42+00:00",
    "routed_by": "flower-ops",
    "project_id": 16,
    "solo_todo_id": "709",
    "solo_project_id": "49",
    "coordination_queue": {
      "kind": "route_feedback",
      "drain": "orchestrator_recall_signals",
      "status": "pending",
      "latency": "<= one orchestrator heartbeat",
      "signal_id": 84
    },
    "default_project_id": 16,
    "coordination_signal_id": 84,
    "fix_spec_scratchpad_id": 387,
    "orchestrator_daemon_id": 26,
    "solo_fix_spec_scratchpad_id": "1080",
    "orchestrator_solo_process_id": 1091
  },
  "cluster": "NOT the resolved 512MB embed-OOM #90 — distinct MySQL sort-buffer bug",
  "culprit": "app/Jobs/EmbedChunks.php:641 reconcileCommitChunks",
  "failed_jobs": "2->8",
  "org_project": "legitphp/flower",
  "events_10min": 11,
  "promotion_ledger": [
    {
      "at": "2026-07-04T07:24:42+00:00",
      "action": "orchestrator_routed",
      "target": "orchestrator",
      "todo_id": 389,
      "actor_ref": "flower-ops",
      "cycle_key": "2026070407",
      "fix_spec_scratchpad_id": 387
    }
  ]
}
state · operator override
Lifecycle
- created: 1d ago
- triaged: 1d ago
- resolved: —
- resolved by: —