Bug planned #99 routed · orchestrator

FLOWER-1X (NEW/escalating): EmbedChunks::reconcileCommitChunks throws MySQL 1038 "Out of sort memory" — SELECT * filesort over commits; actively failing EmbedChunks jobs (failed_jobs 2→8)

flower-ops (daemon 28) · submitted 1 day ago

detail

What they reported

SENTRY: FLOWER-1X (legitphp/flower), first seen 2026-07-04 07:10Z, escalating: 11 events/10min, last seen 0min ago (actively firing). Correlates with failed_jobs jumping 2→8 this cycle. Env: local horizon:work (EmbedChunks on the reconcile pass).

FAILING QUERY (from the event):
select * from `commits` where exists (select * from `projects` where `commits`.`project_id` = `projects`.`id` and `is_indexed` = 1) and `id` is not null order by `id` asc limit 200

Error: SQLSTATE[HY001] 1038 "Out of sort memory, consider increasing server sort buffer size".

ROOT CAUSE (verified in code): app/Jobs/EmbedChunks.php:637-641 reconcileCommitChunks():
Commit::query()->with('project')->whereHas('project', fn($p)=>$p->where('is_indexed',true))->when($this->projectId!==null,...)->chunkById($this->reconcilePageSize(), ...)

chunkById() issues `... ORDER BY id ASC LIMIT 200` (forPageAfterId). The correlated whereHas EXISTS subquery blocks the optimizer from satisfying ORDER BY id via the PK index, so MySQL does a FILESORT over `SELECT *` rows. commits has wide columns (message text ~4.6KB max, files json, meta json) and 3956 rows; the packed SELECT * sort rows exceed this Herd MySQL's sort_buffer_size → 1038.

NOT the resolved 512MB PHP-OOM embed cluster (#90): different mechanism (MySQL sort buffer, not PHP memory), different fix. Likely triggered by a full (projectId=null) reconcile pass; it keeps retrying and failing → failed_jobs climbing + will feed FLOWER-J MaxAttempts.

SUGGESTED FIX (implementer should EXPLAIN to confirm the filesort disappears):

1) Preferred: replace the correlated whereHas('project', is_indexed) with a pre-fetched narrow list, i.e. $ids = Project::where('is_indexed',true)->pluck('id'); ->whereIn('project_id', $ids), so MySQL walks the PK (id) in order for chunkById and skips the filesort entirely (eliminates 1038 regardless of row width). Same pattern applies to the sibling reconcile* methods (segment/brief/todo/scratchpad/doc), which use the identical whereHas EXISTS shape and could hit this as their tables grow.
2) And/or narrow the select to only the columns commitText()/commitFiles()/roughTokens() need instead of SELECT * (keeps sort rows small if a filesort is still chosen).
3) Fast stopgap (operator/orch, config not code): raise MySQL sort_buffer_size on the Herd instance to stop the bleeding while the code fix lands.

Ops verified root cause; routing to orchestrator (daemon 26) for dispatch. Ledgered as sentry:triaged:flower-1x.
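
VERIFICATION SKETCH (not part of the original report): the report asks the implementer to EXPLAIN the query before and after the change. A minimal way to do that might look like the statements below, run against the affected MySQL instance. The first query is copied from the Sentry event. The second is an assumption about what whereIn('project_id', $ids) would generate, and the ids in the IN list (1, 2, 3) are placeholders, not real project ids. The sort_buffer_size value is illustrative only.

```sql
-- 1) Current shape, taken from the event. Per the report, the Extra column
--    is expected to show "Using filesort".
EXPLAIN
SELECT * FROM `commits`
WHERE EXISTS (
        SELECT * FROM `projects`
        WHERE `commits`.`project_id` = `projects`.`id`
          AND `is_indexed` = 1
      )
  AND `id` IS NOT NULL
ORDER BY `id` ASC
LIMIT 200;

-- 2) Proposed shape (assumed): the correlated EXISTS is replaced by a literal
--    IN list of pre-fetched project ids. The ids 1, 2, 3 are placeholders.
--    If Extra still shows "Using filesort", the optimizer did not pick the
--    PRIMARY key for ordering and the fix needs another look.
EXPLAIN
SELECT * FROM `commits`
WHERE `project_id` IN (1, 2, 3)
  AND `id` IS NOT NULL
ORDER BY `id` ASC
LIMIT 200;

-- 3) Stopgap: inspect the current sort buffer, then raise it.
--    SET GLOBAL only affects connections opened afterwards and does not
--    survive a server restart. 4194304 (4 MiB) is an illustrative value.
SHOW VARIABLES LIKE 'sort_buffer_size';
SET GLOBAL sort_buffer_size = 4194304;
```

Comparing the `key` and `Extra` columns of the two EXPLAIN outputs is the quickest check: the goal described above is `key = PRIMARY` with no "Using filesort" for the second query.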

context

Structured context

{
    "tool": "sentry",
    "error": "SQLSTATE[HY001] 1038 Out of sort memory",
    "issue": "FLOWER-1X",
    "routed": {
        "target": "orchestrator",
        "todo_id": 389,
        "authority": "autonomous",
        "routed_at": "2026-07-04T07:24:42+00:00",
        "routed_by": "flower-ops",
        "project_id": 16,
        "solo_todo_id": "709",
        "solo_project_id": "49",
        "coordination_queue": {
            "kind": "route_feedback",
            "drain": "orchestrator_recall_signals",
            "status": "pending",
            "latency": "<= one orchestrator heartbeat",
            "signal_id": 84
        },
        "default_project_id": 16,
        "coordination_signal_id": 84,
        "fix_spec_scratchpad_id": 387,
        "orchestrator_daemon_id": 26,
        "solo_fix_spec_scratchpad_id": "1080",
        "orchestrator_solo_process_id": 1091
    },
    "cluster": "NOT the resolved 512MB embed-OOM #90 — distinct MySQL sort-buffer bug",
    "culprit": "app/Jobs/EmbedChunks.php:641 reconcileCommitChunks",
    "failed_jobs": "2->8",
    "org_project": "legitphp/flower",
    "events_10min": 11,
    "promotion_ledger": [
        {
            "at": "2026-07-04T07:24:42+00:00",
            "action": "orchestrator_routed",
            "target": "orchestrator",
            "todo_id": 389,
            "actor_ref": "flower-ops",
            "cycle_key": "2026070407",
            "fix_spec_scratchpad_id": 387
        }
    ]
}

state · operator override

Lifecycle

created: 1d ago
triaged: 1d ago
resolved: —
resolved by: —

