Bug 312688
| Summary: | YARR Interpreter Omits Named Group from `indices.groups` via Unconditional Tracking-Slot Reset on Backtrack | ||
|---|---|---|---|
| Product: | WebKit | Reporter: | parkjuny |
| Component: | JavaScriptCore | Assignee: | Nobody <webkit-unassigned> |
| Status: | RESOLVED FIXED | ||
| Severity: | Minor | CC: | webkit-bug-importer |
| Priority: | P2 | Keywords: | InRadar |
| Version: | WebKit Nightly Build | ||
| Hardware: | PC | ||
| OS: | Linux | ||
parkjuny
## Summary
The YARR interpreter resets the duplicate named-group tracking slot to zero on every backtrack. `RegExpMatchesArray.h` gates the `indicesGroups` property write on that slot being non-zero, so the interpreter silently omits the named-group property from `indices.groups`. The JIT retains a stale (non-zero) slot after the same backtrack and accidentally produces the correct result. The property should always be present.
## Bug
### Summary
For a duplicate named capture group that partially matches and then backtracks, `RegExpMatchesArray.h:228` writes the group's property (`"x"` in the PoC) into `indicesGroups` only when `captureIndex > 0`. The interpreter zeros the tracking slot on backtrack, producing `captureIndex = 0`, so the property is never written and `"x" in m.indices.groups` returns `false`. The JIT does not zero the slot on backtrack, so `captureIndex` remains non-zero and the property is written with value `undefined`. The JIT output is therefore correct, but only by accident; the interpreter output is wrong.
### Detail
**Root-cause site — `RegExpMatchesArray.h:228`:**
```cpp
groups->putDirect(vm, Identifier::fromString(vm, groupName), value); // always written — correct
if (createIndices && captureIndex > 0) // BUG: skips write when slot is 0
    indicesGroups->putDirect(vm, Identifier::fromString(vm, groupName), indicesArray->getIndexQuickly(captureIndex));
```
`groups` always receives the property (even when `captureIndex == 0`, value is `jsUndefined()`). `indicesGroups` should mirror `groups`, but the `captureIndex > 0` guard prevents it.
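The mirroring requirement can be checked from script. The following is a minimal sketch, not taken from the WebKit test suite; the helper name `namesMissingFromIndices` is hypothetical. It deliberately uses a pattern without duplicate names so that it also parses on engines that lack duplicate named group support.

```js
// Hypothetical helper: lists named groups that are keys of m.groups but
// not keys of m.indices.groups. Expected to be empty for any /d match.
function namesMissingFromIndices(m) {
    const groupNames = Object.keys(m.groups ?? {});
    const indexGroups = m.indices?.groups ?? {};
    return groupNames.filter((name) => !(name in indexGroups));
}

// "x" does not participate in this match, yet it should still be a key of
// both objects, with value undefined.
const m = /(?<x>a)|(?<y>b)/d.exec("b");
const missing = namesMissingFromIndices(m);
console.log(missing.length === 0 ? "ok" : "missing: " + missing.join(","));
```

Applying the same helper to the PoC regex under `jsc --useRegExpJIT=false` would be expected to report `x` on an affected build, based on the behavior described in this report.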
**Interpreter backtrack — `YarrInterpreter.cpp:173-176`, `1134-1137` — restores slot to 0:**
For a `{N}` group, the interpreter uses `matchParentheses`/`backtrackParentheses` with a `ParenthesesDisjunctionContext` save/restore mechanism. Each context saves the current tracking slot value on allocation and zeros it:
```cpp
// YarrInterpreter.cpp:173-176 — ParenthesesDisjunctionContext constructor
for (unsigned duplicateNamedGroupId : m_duplicateNamedGroups) {
    subpatternAndGroupIdBackup[...] = output[pattern->offsetForDuplicateNamedGroupId(duplicateNamedGroupId)]; // saves current value
    output[pattern->offsetForDuplicateNamedGroupId(duplicateNamedGroupId)] = 0;
}
```
On backtrack, `resetMatches` → `restoreOutput` puts back the saved value:
```cpp
// YarrInterpreter.cpp:1134-1138
void resetMatches(ByteTerm& term, ParenthesesDisjunctionContext* context)
{
    unsigned firstSubpatternId = term.subpatternId();
    context->restoreOutput(output, firstSubpatternId);
}
```
Because `recordParenthesesMatch` — which writes the subpatternId to the slot — is called only after all N iterations succeed (`YarrInterpreter.cpp:1450`), the per-iteration contexts always save a slot value of 0. Backtracking therefore restores the slot to 0. After this, `subpatternIdForGroupName` returns 0 → `captureIndex = 0` → the `indicesGroups` write is skipped.
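The save, zero, and restore sequence can be modeled in a few lines. This is a toy model under stated assumptions, not the YARR code: `IterationContext`, `runFixedCount`, `SLOT`, and `SUBPATTERN_ID` are hypothetical names, and the output vector is reduced to a single tracking slot.

```js
// Toy model only: names loosely follow the report, not the YARR sources.
const SLOT = 0;           // index of the duplicate-named-group tracking slot
const SUBPATTERN_ID = 1;  // hypothetical subpatternId of the first (?<x>...)

class IterationContext {
    // On allocation: save the current slot value, then zero the slot.
    constructor(output) {
        this.backup = output[SLOT];
        output[SLOT] = 0;
    }

    // On backtrack: put the saved value back.
    restoreOutput(output) {
        output[SLOT] = this.backup;
    }
}

// Models a {N} group: one context per iteration; the slot is written only
// if every iteration succeeds, otherwise all contexts are unwound.
function runFixedCount(output, iterations, allSucceed) {
    const contexts = [];
    for (let i = 0; i < iterations; i++)
        contexts.push(new IterationContext(output));
    if (allSucceed) {
        output[SLOT] = SUBPATTERN_ID; // stands in for recordParenthesesMatch
        return true;
    }
    while (contexts.length)
        contexts.pop().restoreOutput(output); // stands in for resetMatches
    return false;
}

const output = [0];
runFixedCount(output, 2, false);
console.log(output[SLOT]); // prints 0
```

Every context is allocated before the slot is ever written, so each backup holds 0 and unwinding can only restore 0.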
**JIT backtrack — `YarrJIT.cpp:4885-4888` — does not reset slot:**
```cpp
if (shouldRecordSubpatterns() && term->containsAnyCaptures()) {
    for (unsigned subpattern = term->parentheses.subpatternId; subpattern <= term->parentheses.lastSubpatternId; subpattern++)
        clearSubpattern(subpattern); // clears capture start/end; tracking slot left stale
}
```
The stale slot keeps `captureIndex > 0`, so the `indicesGroups` write proceeds and the property is present with value `undefined`.
**`subpatternIdForGroupName` — `RegExp.h:119-130` — reads the tracking slot:**
```cpp
return ovector[offsetVectorBaseForNamedCaptures() + it->value[0] - 1]; // tracking slot: 0 or subpatternId
```
### Trigger Conditions
1. Regex has the **`/d` flag**.
2. Pattern contains **duplicate named capture groups** across alternatives.
3. At least one duplicate group is **quantified `{N}` with N ≥ 2** (required for the JIT/interpreter discrepancy; with `{1}` both engines return `false`).
4. That group **partially matches then fails** (FixedCount path is entered and backtracked).
5. The overall match succeeds via an **alternative that does not define the duplicate group**.
## Version
### Reproduced Version
- `main` branch latest commit (2026/04/19): `a4390137a4039d12b4a0843e4f2b37e9ce2b6e6c`
## Reproduction Case
### Release Build
```bash
jsc poc.js # JIT (default): true ← correct
jsc --useRegExpJIT=false poc.js # Interpreter: false ← wrong
```
A Debug build produces identical output; no assertion fires because no ASSERT validates the stale slot.
### PoC Code
```js
let m = /(?<x>a){2}z|(?<x>b){2}y|c/d.exec("aac");
print("x" in m.indices.groups);
```
## Suggested Patch
### File: `Source/JavaScriptCore/runtime/RegExpMatchesArray.h`
#### Diff
```diff
--- a/Source/JavaScriptCore/runtime/RegExpMatchesArray.h
+++ b/Source/JavaScriptCore/runtime/RegExpMatchesArray.h
@@ -225,8 +225,10 @@ ALWAYS_INLINE JSArray* createRegExpMatchesArray(
value = jsUndefined();
groups->putDirect(vm, Identifier::fromString(vm, groupName), value);
- if (createIndices && captureIndex > 0)
- indicesGroups->putDirect(vm, Identifier::fromString(vm, groupName), indicesArray->getIndexQuickly(captureIndex));
+ if (createIndices) {
+ JSValue indicesValue = captureIndex > 0 ? indicesArray->getIndexQuickly(captureIndex) : jsUndefined();
+ indicesGroups->putDirect(vm, Identifier::fromString(vm, groupName), indicesValue);
+ }
}
}
}
```
This mirrors how `groups` is built (lines 221–226) and ensures `indicesGroups` always contains a property for every named capture group, with value `undefined` when the group did not participate.
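The effect of the guard change can be illustrated with a small model of the loop body. This is a sketch under stated assumptions: plain objects stand in for the JSC structures, and `buildGroups` and its parameters are hypothetical names that do not appear in `RegExpMatchesArray.h`.

```js
// Toy model of the per-name loop body; `patched` selects the new guard.
function buildGroups(names, captureIndexFor, matches, indices, patched) {
    const groups = {};
    const indicesGroups = {};
    for (const name of names) {
        const captureIndex = captureIndexFor(name); // 0 means "did not participate"
        groups[name] = captureIndex > 0 ? matches[captureIndex] : undefined;
        if (patched)
            indicesGroups[name] = captureIndex > 0 ? indices[captureIndex] : undefined;
        else if (captureIndex > 0)
            indicesGroups[name] = indices[captureIndex];
    }
    return { groups, indicesGroups };
}

// Assumed state after the PoC match on "aac" in the interpreter: only the
// bare `c` alternative matched, and the tracking slot for "x" is 0.
const matches = ["c", undefined, undefined];
const indices = [[2, 3], undefined, undefined];
const slotIsZero = () => 0;

const before = buildGroups(["x"], slotIsZero, matches, indices, false);
const after = buildGroups(["x"], slotIsZero, matches, indices, true);
console.log("x" in before.indicesGroups, "x" in after.indicesGroups); // prints false true
```

In the model, `groups` is identical in both runs; only the presence of the key in `indicesGroups` changes.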
Note: `YarrJIT.cpp:4885-4888` should separately add `storeDuplicateNamedGroupSubpatternId(duplicateNamedGroupId, 0)` in the FixedCount backtrack loop (matching the `ParenthesesSubpatternOnceBegin` backtrack at lines 4772–4778) to eliminate the stale slot — but that is an independent state-hygiene fix and does not affect observable behavior once `RegExpMatchesArray.h` is corrected.
### Credit Information
Reporter credit: Junyoung Park (@candymate) of KAIST Hacking Lab
Radar WebKit Bug Importer
<rdar://problem/175122294>
Kai Tamkun
Pull request: https://github.com/WebKit/WebKit/pull/63982
EWS
Committed 313762@main (e1cdfab158f3): <https://commits.webkit.org/313762@main>
Reviewed commits have been landed. Closing PR #63982 and removing active labels.