Bug 312688

Summary: YARR Interpreter Omits Named Group from `indices.groups` via Unconditional Tracking-Slot Reset on Backtrack
Product: WebKit Reporter: parkjuny
Component: JavaScriptCore Assignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Minor CC: webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: PC   
OS: Linux   

parkjuny
Reported 2026-04-18 12:24:56 PDT
## Summary

The YARR interpreter resets the duplicate named-group tracking slot to zero on every backtrack. `RegExpMatchesArray.h` gates the `indicesGroups` property write on that slot being non-zero, so the interpreter silently omits the named-group property from `indices.groups`. The JIT retains a stale (non-zero) slot after the same backtrack and accidentally produces the correct result. The property should always be present.

## Bug

### Summary

For a duplicate named capture group that partially matches and then backtracks, `RegExpMatchesArray.h:228` writes the `"x"` property into `indicesGroups` only when `captureIndex > 0`. The interpreter zeros the tracking slot on backtrack, producing `captureIndex = 0`, so the property is never written and `"x" in m.indices.groups` returns `false`. The JIT does not zero the slot on backtrack, so `captureIndex` remains non-zero and the property is written with value `undefined`, which matches the correct behavior. The JIT output is the correct answer; the interpreter output is wrong.

### Detail

**Root-cause site (`RegExpMatchesArray.h:228`):**

```cpp
groups->putDirect(vm, Identifier::fromString(vm, groupName), value); // always written: correct
if (createIndices && captureIndex > 0) // BUG: skips write when slot is 0
    indicesGroups->putDirect(vm, Identifier::fromString(vm, groupName), indicesArray->getIndexQuickly(captureIndex));
```

`groups` always receives the property (even when `captureIndex == 0`, in which case the value is `jsUndefined()`). `indicesGroups` should mirror `groups`, but the `captureIndex > 0` guard prevents it.

**Interpreter backtrack (`YarrInterpreter.cpp:173-176`, `1134-1137`): restores slot to 0**

For a `{N}` group, the interpreter uses `matchParentheses`/`backtrackParentheses` with a `ParenthesesDisjunctionContext` save/restore mechanism.
Each context saves the current tracking slot value on allocation and zeros it:

```cpp
// YarrInterpreter.cpp:173-176: ParenthesesDisjunctionContext constructor
for (unsigned duplicateNamedGroupId : m_duplicateNamedGroups) {
    subpatternAndGroupIdBackup[...] = output[pattern->offsetForDuplicateNamedGroupId(duplicateNamedGroupId)]; // saves current value
    output[pattern->offsetForDuplicateNamedGroupId(duplicateNamedGroupId)] = 0;
}
```

On backtrack, `resetMatches` → `restoreOutput` puts back the saved value:

```cpp
// YarrInterpreter.cpp:1134-1138
void resetMatches(ByteTerm& term, ParenthesesDisjunctionContext* context)
{
    unsigned firstSubpatternId = term.subpatternId();
    context->restoreOutput(output, firstSubpatternId);
}
```

`recordParenthesesMatch`, which writes the subpatternId to the slot, is called only after all N iterations succeed (`YarrInterpreter.cpp:1450`), so the per-iteration contexts always save a slot value of 0. Backtracking therefore restores the slot to 0. After this, `subpatternIdForGroupName` returns 0 → `captureIndex = 0` → the `indicesGroups` write is skipped.

**JIT backtrack (`YarrJIT.cpp:4885-4888`): does not reset slot**

```cpp
if (shouldRecordSubpatterns() && term->containsAnyCaptures()) {
    for (unsigned subpattern = term->parentheses.subpatternId; subpattern <= term->parentheses.lastSubpatternId; subpattern++)
        clearSubpattern(subpattern); // clears capture start/end; tracking slot left stale
}
```

The stale slot keeps `captureIndex > 0`, so the `indicesGroups` write proceeds and the property is present with value `undefined`.

**`subpatternIdForGroupName` (`RegExp.h:119-130`): reads the tracking slot**

```cpp
return ovector[offsetVectorBaseForNamedCaptures() + it->value[0] - 1]; // tracking slot: 0 or subpatternId
```

### Trigger Conditions

1. Regex has the **`/d` flag**.
2. Pattern contains **duplicate named capture groups** across alternatives.
3. At least one duplicate group is **quantified `{N}` with N ≥ 2** (required for the JIT/interpreter discrepancy; with `{1}` both engines return `false`).
4. That group **partially matches then fails** (FixedCount path is entered and backtracked).
5. The overall match succeeds via an **alternative that does not define the duplicate group**.

## Version

### Reproduced Version

- `main` branch latest commit (2026/04/19): `a4390137a4039d12b4a0843e4f2b37e9ce2b6e6c`

## Reproduction Case

### Release Build

```bash
jsc poc.js                       # JIT (default): true  ← correct
jsc --useRegExpJIT=false poc.js  # Interpreter:   false ← wrong
```

The debug build produces identical output; no assertion fires because the stale slot is not validated by any ASSERT.

### PoC Code

```js
let m = /(?<x>a){2}z|(?<x>b){2}y|c/d.exec("aac");
print("x" in m.indices.groups);
```

## Suggested Patch

### File: `Source/JavaScriptCore/runtime/RegExpMatchesArray.h`

#### Diff

```diff
--- a/Source/JavaScriptCore/runtime/RegExpMatchesArray.h
+++ b/Source/JavaScriptCore/runtime/RegExpMatchesArray.h
@@ -225,8 +225,10 @@ ALWAYS_INLINE JSArray* createRegExpMatchesArray(
                         value = jsUndefined();
                     groups->putDirect(vm, Identifier::fromString(vm, groupName), value);
-                    if (createIndices && captureIndex > 0)
-                        indicesGroups->putDirect(vm, Identifier::fromString(vm, groupName), indicesArray->getIndexQuickly(captureIndex));
+                    if (createIndices) {
+                        JSValue indicesValue = captureIndex > 0 ? indicesArray->getIndexQuickly(captureIndex) : jsUndefined();
+                        indicesGroups->putDirect(vm, Identifier::fromString(vm, groupName), indicesValue);
+                    }
                 }
             }
         }
```

This mirrors how `groups` is built (lines 221–226) and ensures `indicesGroups` always contains a property for every named capture group, with value `undefined` when the group did not participate.
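
The invariant this patch restores, that every name in `groups` also appears in `indices.groups`, can be checked from script. The following is a minimal sketch, not part of the WebKit test suite: the helper name `missingIndicesGroups` is hypothetical, and the sample pattern deliberately avoids duplicate named groups so that it runs on any engine supporting the `/d` flag.

```js
// Hypothetical helper (an assumption for illustration, not WebKit code):
// returns the names present in m.groups but absent from m.indices.groups.
// Per the report, this list should always be empty.
function missingIndicesGroups(m) {
    if (!m || !m.groups || !m.indices || !m.indices.groups)
        return [];
    return Object.keys(m.groups).filter((name) => !(name in m.indices.groups));
}

// A named group that does not participate in the match is still listed
// in both objects, with value undefined.
const sample = /(?<x>a)|(?<y>b)/d.exec("b");
const missing = missingIndicesGroups(sample);
```

Passing the PoC match (`/(?<x>a){2}z|(?<x>b){2}y|c/d.exec("aac")`) to the same helper would, going by the behavior described in this report, be expected to yield `["x"]` on an unpatched interpreter run and an empty array once the patch is applied.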

Note: `YarrJIT.cpp:4885-4888` should separately add `storeDuplicateNamedGroupSubpatternId(duplicateNamedGroupId, 0)` in the FixedCount backtrack loop (matching the `ParenthesesSubpatternOnceBegin` backtrack at lines 4772–4778) to eliminate the stale slot. That is an independent state-hygiene fix and does not affect observable behavior once `RegExpMatchesArray.h` is corrected.

### Credit Information

Reporter credit: Junyoung Park (@candymate) of KAIST Hacking Lab
Radar WebKit Bug Importer
Comment 1 2026-04-19 13:22:15 PDT
Kai Tamkun
Comment 2 2026-04-30 10:52:52 PDT
EWS
Comment 3 2026-05-22 14:30:35 PDT
Committed 313762@main (e1cdfab158f3): <https://commits.webkit.org/313762@main> Reviewed commits have been landed. Closing PR #63982 and removing active labels.