<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://bugs.webkit.org/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4.1"
          urlbase="https://bugs.webkit.org/"
          
          maintainer="admin@webkit.org"
>

    <bug>
          <bug_id>312690</bug_id>
          
          <creation_ts>2026-04-18 14:49:02 -0700</creation_ts>
          <short_desc>Input Position Corruption in Yarr Backreference Backward Matching via `tryReadBackward` Surrogate Pair Rewind</short_desc>
          <delta_ts>2026-05-11 14:06:45 -0700</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WebKit</product>
          <component>JavaScriptCore</component>
          <version>WebKit Nightly Build</version>
          <rep_platform>PC</rep_platform>
          <op_sys>Linux</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>FIXED</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords>InRadar</keywords>
          <priority>P2</priority>
          <bug_severity>Minor</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter>parkjuny</reporter>
          <assigned_to name="Nobody">webkit-unassigned</assigned_to>
          <cc>karlcow</cc>
    
    <cc>webkit-bug-importer</cc>
          

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>2201878</commentid>
    <comment_count>0</comment_count>
    <who name="">parkjuny</who>
    <bug_when>2026-04-18 14:49:02 -0700</bug_when>
    <thetext>## Summary

A position corruption bug exists in the Yarr regex interpreter&apos;s `tryConsumeBackReference` function when processing greedy backreferences inside lookbehind assertions with supplementary (non-BMP) Unicode characters. The `tryReadBackward` function calls `rewind(1)` when it encounters a UTF-16 trail surrogate preceded by a lead surrogate, but when the backreference match subsequently fails, the Greedy quantifier path in `matchBackReference` does not restore the corrupted input position. This causes subsequent regex term matching to use a wrong position, producing incorrect match results that violate the ECMAScript specification.

## Bug

### Summary

In `YarrInterpreter.cpp`, the function `tryConsumeBackReference()` does not restore the input position (`pos`) when returning `false` in the backward (lookbehind) matching direction. The function `tryReadBackward()` has a side effect: when it encounters a UTF-16 trail surrogate preceded by a lead surrogate (a supplementary character), it calls `rewind(1)` to decrement `pos` by 1 to account for the two-code-unit width of the surrogate pair. If the backreference comparison then fails, `tryConsumeBackReference` returns `false` without restoring `pos`, leaving it corrupted (decremented by 1). The Greedy case in `matchBackReference` unconditionally returns `true` after the while loop exits (when `tryConsumeBackReference` returns `false`), propagating the corrupted position to subsequent term matching.

### Detail

The bug involves three interacting code paths:

**1. `tryReadBackward()` — the position-modifying side effect:**

```cpp
// Source/JavaScriptCore/yarr/YarrInterpreter.cpp:348-360
char32_t NODELETE tryReadBackward(unsigned negativePositionOffest)
{
    if (pos &lt; negativePositionOffest)
        return errorCodePoint;
    unsigned p = pos - negativePositionOffest;
    ASSERT(p &lt; length);
    auto result = input[p];
    if (U16_IS_TRAIL(result) &amp;&amp; decodeSurrogatePairs &amp;&amp; p &gt; 0 &amp;&amp; U16_IS_LEAD(input[p - 1])) {
        rewind(1);  // &lt;-- Modifies pos! Decrements by 1
        return U16_GET_SUPPLEMENTARY(input[p - 1], result);
    }
    return result;
}
```

When reading backward and encountering a trail surrogate preceded by a lead surrogate, `rewind(1)` decrements `pos` by 1 to account for the full surrogate pair. This is correct for successful matches (the extra code unit is tracked via the position decrement), but becomes a bug when the match fails and `pos` is not restored.

**2. `tryConsumeBackReference()` — missing position restoration on backward failure:**

```cpp
// Source/JavaScriptCore/yarr/YarrInterpreter.cpp:641-690
bool NODELETE tryConsumeBackReference(int matchBegin, int matchEnd, ByteTerm&amp; term)
{
    unsigned matchSize = (unsigned)(matchEnd - matchBegin);

    if (term.matchDirection() == Forward) {
        if (!input.checkInput(matchSize))
            return false;
    }

    for (unsigned i = 0; i &lt; matchSize; ++i) {
        unsigned negativeInputOffset = term.inputPosition + matchSize - i;
        if (term.matchDirection() == Backward &amp;&amp; negativeInputOffset &gt; input.getPos())
            return false;  // &lt;-- pos not yet corrupted, but no restoration either

        char32_t oldCh = input.reread(matchBegin + i);
        char32_t ch;
        if (!U_IS_BMP(oldCh)) {
            ch = input.readSurrogatePairChecked(negativeInputOffset);
            ++i;
        } else
            ch = term.matchDirection() == Forward ? input.readCheckedDontAdvance(negativeInputOffset) : input.tryReadBackward(negativeInputOffset);
            // ^^^ tryReadBackward may call rewind(1), modifying pos

        if (oldCh == errorCodePoint || ch == errorCodePoint)
            return false;  // &lt;-- Returns false WITHOUT restoring pos for Backward direction

        if (oldCh == ch)
            continue;

        // ... case-insensitive checks ...

        if (term.matchDirection() == Forward)
            input.uncheckInput(matchSize);  // Only restores for Forward!

        return false;  // &lt;-- Returns false WITHOUT restoring pos for Backward direction
    }

    if (term.matchDirection() == Backward)
        input.uncheckInput(matchSize);  // Only called on SUCCESS

    return true;
}
```

Lines 680–681 handle Forward direction position restoration on failure via `uncheckInput`. There is no equivalent restoration for the Backward direction. Lines 686–687 restore position for the Backward direction, but only on **success** (return true path).

**3. `matchBackReference()` Greedy case — no position save/restore:**

```cpp
// Source/JavaScriptCore/yarr/YarrInterpreter.cpp:1049-1055
case QuantifierType::Greedy: {
    unsigned matchAmount = 0;
    while ((matchAmount &lt; term.atom.quantityMaxCount) &amp;&amp; tryConsumeBackReference(matchBegin, matchEnd, term))
        ++matchAmount;
    backTrack-&gt;matchAmount = matchAmount;
    return true;  // &lt;-- Always returns true, even with corrupted pos
}
```

When `tryConsumeBackReference` returns `false` (match failure), the while loop exits and the function unconditionally returns `true`. The position corruption from `tryReadBackward`&apos;s `rewind(1)` persists, and the caller sees a position that is off by 1.

Compare with FixedCount (lines 1039–1046), which properly restores position via `input.setPos(backTrack-&gt;begin)` on failure:

```cpp
case QuantifierType::FixedCount: {
    for (unsigned matchAmount = 0; matchAmount &lt; term.atom.quantityMaxCount; ++matchAmount) {
        if (!tryConsumeBackReference(matchBegin, matchEnd, term)) {
            input.setPos(backTrack-&gt;begin);  // &lt;-- Properly restores position
            return false;
        }
    }
    return true;
}
```

Also note that at initialization time (lines 1005–1007), the Greedy case does not save `backTrack-&gt;begin`:

```cpp
case QuantifierType::Greedy:
    backTrack-&gt;matchAmount = 0;
    break;  // &lt;-- Does NOT save backTrack-&gt;begin = input.getPos()
```

While FixedCount does save it (lines 1001–1003):

```cpp
case QuantifierType::FixedCount:
    backTrack-&gt;begin = input.getPos();  // &lt;-- Saves position for restoration
    break;
```

### Trigger Conditions

1. A lookbehind assertion (`(?&lt;=...)`) that forces backward matching direction (lookbehinds always use the Yarr interpreter, not JIT).
2. A greedy backreference quantifier (`\N*` or `\N+`) inside the lookbehind.
3. The backward read position lands on a trail surrogate of a supplementary (non-BMP) Unicode character.
4. The backreference match fails (the captured text does not match the supplementary character).
5. The Unicode flag (`u` or `v`) must be set for surrogate pair decoding.

## Version

### Reproduced Version

- `main` branch latest commit (2026/04/18): `a4390137a4038c07661f58e45ee205b5a623f9dd`

## Reproduction Case

### Release Build

```bash
WebKitBuild/JSCOnly/Release/bin/jsc /tmp/poc.js
```

Result:

```
null
b,a
b,a
b,a
```

Line 1 (`null`) is the bug: `/(?&lt;=\u{10000}\1*(a))b/u` should match at index 3. Lines 2–4 are the BMP-only, non-greedy, and fixed-count controls — all match correctly, isolating the bug to greedy backreference + non-BMP character.

Debug build result identical to release. No ASSERT fires — the position corruption produces a silent wrong result; `rewind(1)` leaves `pos` at a valid in-bounds index so no internal sanity check triggers.

### V8 (Chrome)

```
b,a
b,a
b,a
b,a
```

All four cases produce `b,a` — confirming the first case is a JSC-only bug.

### PoC Code

```js
// Yarr: greedy backref in lookbehind with non-BMP char corrupts pos (tryConsumeBackReference)
print(/(?&lt;=\u{10000}\1*(a))b/u.exec(&quot;\u{10000}ab&quot;));   // null (bug; expected: [&quot;b&quot;,&quot;a&quot;,index:3])
print(/(?&lt;=Z\1*(a))b/u.exec(&quot;Zab&quot;));                   // [&quot;b&quot;,&quot;a&quot;,...] — BMP-only, works
print(/(?&lt;=\u{10000}\1*?(a))b/u.exec(&quot;\u{10000}ab&quot;));  // [&quot;b&quot;,&quot;a&quot;,...] — non-greedy, works
print(/(?&lt;=\u{10000}\1{0}(a))b/u.exec(&quot;\u{10000}ab&quot;)); // [&quot;b&quot;,&quot;a&quot;,...] — fixed-count, works
```

## Suggested Patch

### File: `Source/JavaScriptCore/yarr/YarrInterpreter.cpp`

The root cause is that `tryConsumeBackReference` does not save and restore `input.pos` on failure paths when `matchDirection() == Backward`. The fix saves `pos` at function entry and restores it on every early-return `false` path in the backward case.

```diff
--- a/Source/JavaScriptCore/yarr/YarrInterpreter.cpp
+++ b/Source/JavaScriptCore/yarr/YarrInterpreter.cpp
@@ -641,50 +641,59 @@ public:
     bool NODELETE tryConsumeBackReference(int matchBegin, int matchEnd, ByteTerm&amp; term)
     {
         unsigned matchSize = (unsigned)(matchEnd - matchBegin);
 
         if (term.matchDirection() == Forward) {
             if (!input.checkInput(matchSize))
                 return false;
         }
 
+        unsigned savedPos = input.getPos();
+
         for (unsigned i = 0; i &lt; matchSize; ++i) {
             unsigned negativeInputOffset = term.inputPosition + matchSize - i;
-            if (term.matchDirection() == Backward &amp;&amp; negativeInputOffset &gt; input.getPos())
+            if (term.matchDirection() == Backward &amp;&amp; negativeInputOffset &gt; input.getPos()) {
+                input.setPos(savedPos);
                 return false;
+            }
 
             char32_t oldCh = input.reread(matchBegin + i);
             char32_t ch;
             if (!U_IS_BMP(oldCh)) {
                 ch = input.readSurrogatePairChecked(negativeInputOffset);
                 ++i;
             } else
                 ch = term.matchDirection() == Forward ? input.readCheckedDontAdvance(negativeInputOffset) : input.tryReadBackward(negativeInputOffset);
 
-            if (oldCh == errorCodePoint || ch == errorCodePoint)
+            if (oldCh == errorCodePoint || ch == errorCodePoint) {
+                if (term.matchDirection() == Backward)
+                    input.setPos(savedPos);
                 return false;
+            }
 
             if (oldCh == ch)
                 continue;
 
             if (term.ignoreCase()) {
                 // See ES 6.0, 21.2.2.8.2 for the definition of Canonicalize(). For non-Unicode
                 // patterns, Unicode values are never allowed to match against ASCII ones.
                 // For Unicode, we need to check all canonical equivalents of a character.
                 if (isLegacyCompilation() &amp;&amp; (isASCII(oldCh) || isASCII(ch))) {
                     if (toASCIIUpper(oldCh) == toASCIIUpper(ch))
                         continue;
                 } else if (areCanonicallyEquivalent(oldCh, ch, isEitherUnicodeCompilation() ? CanonicalMode::Unicode : CanonicalMode::UCS2))
                     continue;
             }
 
             if (term.matchDirection() == Forward)
                 input.uncheckInput(matchSize);
+            else
+                input.setPos(savedPos);
 
             return false;
         }
 
         if (term.matchDirection() == Backward)
             input.uncheckInput(matchSize);
 
         return true;
     }
```

#### Explanation

The fix saves `input.pos` at the start of `tryConsumeBackReference` and restores it on all failure paths when `matchDirection() == Backward`. This ensures that the `rewind(1)` side effect from `tryReadBackward`&apos;s surrogate pair handling does not leak out of a failed backreference match attempt.

The fix addresses all three `return false` paths in the function:
1. When `negativeInputOffset &gt; input.getPos()` (boundary check, defensive).
2. When `oldCh == errorCodePoint || ch == errorCodePoint` (read error — the critical path where `tryReadBackward` has already called `rewind(1)`).
3. When characters do not match and case-insensitive comparison also fails.

This is consistent with how the Forward direction handles failure via `input.uncheckInput(matchSize)`, and how the FixedCount case in `matchBackReference` handles failure via `input.setPos(backTrack-&gt;begin)`. An alternative fix (Option B) would save/restore position in the Greedy case of `matchBackReference`, but fixing it at the source in `tryConsumeBackReference` is preferable as it also protects any future callers.

### Credit Information

Reporter credit: Junyoung Park (@candymate) of KAIST Hacking Lab</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>2202031</commentid>
    <comment_count>1</comment_count>
    <who name="Radar WebKit Bug Importer">webkit-bug-importer</who>
    <bug_when>2026-04-19 13:22:10 -0700</bug_when>
    <thetext>&lt;rdar://problem/175122467&gt;</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>2205004</commentid>
    <comment_count>2</comment_count>
    <who name="Kai Tamkun">k_tamkun</who>
    <bug_when>2026-04-27 11:13:46 -0700</bug_when>
    <thetext>Pull request: https://github.com/WebKit/WebKit/pull/63722</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>2209840</commentid>
    <comment_count>3</comment_count>
    <who name="EWS">ews-feeder</who>
    <bug_when>2026-05-11 14:06:43 -0700</bug_when>
    <thetext>Committed 313026@main (f44dcfd3c922): &lt;https://commits.webkit.org/313026@main&gt;

Reviewed commits have been landed. Closing PR #63722 and removing active labels.</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>