<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://bugs.webkit.org/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4.1"
          urlbase="https://bugs.webkit.org/"
          
          maintainer="admin@webkit.org"
>

    <bug>
          <bug_id>289567</bug_id>
          
          <creation_ts>2025-03-11 14:39:38 -0700</creation_ts>
          <short_desc>[Yarr] Improve processing of adjacent or near adjacent single characters</short_desc>
          <delta_ts>2025-03-31 12:40:08 -0700</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WebKit</product>
          <component>JavaScriptCore</component>
          <version>Other</version>
          <rep_platform>Unspecified</rep_platform>
          <op_sys>Unspecified</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>FIXED</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords>InRadar</keywords>
          <priority>P2</priority>
          <bug_severity>Normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Michael Saboff">msaboff</reporter>
          <assigned_to name="Michael Saboff">msaboff</assigned_to>
          <cc>webkit-bug-importer</cc>
          

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>2102338</commentid>
    <comment_count>0</comment_count>
    <who name="Michael Saboff">msaboff</who>
    <bug_when>2025-03-11 14:39:38 -0700</bug_when>
    <thetext>There is currently an optimization in the Yarr JIT where we check adjacent single character atoms together with one wide load, compare and branch.  For example, /abcd/ is processed as:
   1:Term PatternCharacter checked-offset:(4) &apos;a&apos;
              &lt;44&gt; 0x12f018b6c:    sub      x17, x0, #4
              &lt;48&gt; 0x12f018b70:    ldr      w17, [x17, x1] 
              &lt;52&gt; 0x12f018b74:    movz     w16, #0x6261
              &lt;56&gt; 0x12f018b78:    movk     w16, #0x6463, lsl #16 -&gt; 0x64636261
              &lt;60&gt; 0x12f018b7c:    cmp      w17, w16
              &lt;64&gt; 0x12f018b80:    b.ne     0x12f018b90 -&gt; &lt;80&gt;
   2:Term PatternCharacter checked-offset:(4) &apos;b&apos; already handled
   3:Term PatternCharacter checked-offset:(4) &apos;c&apos; already handled
   4:Term PatternCharacter checked-offset:(4) &apos;d&apos; already handled
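<![CDATA[
A rough C++ model of what the combined check above computes. This is a sketch, not Yarr code. It assumes an 8-bit (Latin-1) subject string and a little-endian target such as ARM64, and packLiteral and matchesAt are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>

// Sketch only, not Yarr code. Assumes an 8-bit (Latin-1) subject string and
// a little-endian target such as ARM64. packLiteral and matchesAt are
// hypothetical helpers.

// Pack up to 4 literal characters into the 32-bit immediate the JIT compares
// against; the first pattern character lands in the lowest byte.
constexpr uint32_t packLiteral(std::string_view chars)
{
    uint32_t value = 0;
    for (std::size_t i = 0; i < chars.size() && i < 4; ++i)
        value |= static_cast<uint32_t>(static_cast<unsigned char>(chars[i])) << (8 * i);
    return value;
}

// One 4-byte load and one compare stand in for four byte-sized
// load/compare/branch sequences.
bool matchesAt(const unsigned char* input, std::size_t start, uint32_t expected)
{
    uint32_t word;
    std::memcpy(&word, input + start, sizeof(word)); // unaligned-safe load
    return word == expected;
}

// Same immediate as the movz/movk pair in the listing above.
static_assert(packLiteral("abcd") == 0x64636261u, "packed /abcd/");
```

On a big-endian target the byte order of the immediate would be reversed, which is why the sketch states the little-endian assumption explicitly.
]]>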

But if there is something in between, we end up checking the nearly adjacent characters individually or in smaller groups.  For example, /a\dbc/ is currently processed as:
   1:Term PatternCharacter checked-offset:(4) &apos;a&apos;
              &lt;84&gt; 0x12f015054:    sub      x17, x0, #4
              &lt;88&gt; 0x12f015058:    ldrb     w6, [x17, x1]
              &lt;92&gt; 0x12f01505c:    cmp      w6, #97
              &lt;96&gt; 0x12f015060:    b.ne     0x12f015098 -&gt; &lt;152&gt;
   2:Term PatternCharacter checked-offset:(4) &apos;b&apos;
             &lt;100&gt; 0x12f015064:    sub      x17, x0, #2
             &lt;104&gt; 0x12f015068:    ldrh     w6, [x17, x1]
             &lt;108&gt; 0x12f01506c:    movz     w16, #0x6362 -&gt; 25442
             &lt;112&gt; 0x12f015070:    cmp      w6, w16
             &lt;116&gt; 0x12f015074:    b.ne     0x12f015098 -&gt; &lt;152&gt;
   3:Term PatternCharacter checked-offset:(4) &apos;c&apos; already handled
   4:Term PatternCharacterClass checked-offset:(4) &lt;digits&gt;
             ...
Note that an existing optimization moves the matching of character class atoms after the single character atoms, which is why \d is matched last (term 4).
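<![CDATA[
A simplified model of that reordering, inferred from the term order in the listing above rather than taken from Yarr. Term and moveCharacterClassesLast are hypothetical names:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Simplified, hypothetical term model; this is not Yarr's PatternTerm.
struct Term {
    enum class Kind { Character, CharacterClass };
    Kind kind;
    char ch;           // the literal for Character terms, unused otherwise
    unsigned position; // offset of the term within the checked run
};

// Move character class terms after the single character terms while keeping
// the relative order inside each group. Every term keeps its own position,
// so only the order of the checks changes, not which characters are checked.
void moveCharacterClassesLast(std::vector<Term>& terms)
{
    std::stable_partition(terms.begin(), terms.end(), [](const Term& term) {
        return term.kind == Term::Kind::Character;
    });
}
```

Because all the checks must pass and none has side effects, reordering them does not change whether the run matches.
]]>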

For the second case, we could instead load all 4 characters at once and mask out the byte holding the character class character, like:
   1:Term PatternCharacter checked-offset:(4) &apos;a&apos;
              &lt;84&gt; 0x12f014f54:    sub      x17, x0, #4 
              &lt;88&gt; 0x12f014f58:    ldr      w6, [x17, x1]
              &lt;92&gt; 0x12f014f5c:    and      w6, w6, #0xffff00ff
              &lt;96&gt; 0x12f014f60:    movz     w16, #0x61
             &lt;100&gt; 0x12f014f64:    movk     w16, #0x6362, lsl #16 -&gt; 0x63620061
             &lt;104&gt; 0x12f014f68:    cmp      w6, w16
             &lt;108&gt; 0x12f014f6c:    b.ne     0x12f014f90 -&gt; &lt;144&gt;
   2:Term PatternCharacter checked-offset:(4) &apos;b&apos; already handled
   3:Term PatternCharacter checked-offset:(4) &apos;c&apos; already handled
   4:Term PatternCharacterClass checked-offset:(4) &lt;digits&gt;
             ...
This eliminates a load, a compare and a branch, taking this example from 9 instructions to 7.
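<![CDATA[
A rough C++ model of the masked compare, extended to runs of any length by splitting them greedily into 8, 4, 2 and 1 byte loads. This is a sketch, not the landed patch. The greedy split, the '?' marker for class positions, and the names Chunk, planChunks and matchesChunk are all assumptions made for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>
#include <vector>

// Sketch only, not Yarr code. Assumes an 8-bit (Latin-1) subject and a
// 64-bit little-endian target. In this hypothetical encoding '?' marks a
// single-width character class position (such as \d) that is still checked
// separately; every other character is a literal.
struct Chunk {
    std::size_t offset; // byte offset of the load within the run
    std::size_t width;  // load width in bytes: 8, 4, 2 or 1
    uint64_t mask;      // 0xff for literal bytes, 0x00 for class bytes
    uint64_t value;     // expected bytes, with class bytes zeroed
};

// Greedily split the run into the widest loads that fit.
std::vector<Chunk> planChunks(std::string_view run)
{
    std::vector<Chunk> chunks;
    std::size_t offset = 0;
    while (offset < run.size()) {
        std::size_t width = 8;
        while (width > run.size() - offset)
            width /= 2;
        Chunk chunk { offset, width, 0, 0 };
        for (std::size_t i = 0; i < width; ++i) {
            unsigned char c = static_cast<unsigned char>(run[offset + i]);
            if (c == '?')
                continue; // class byte: excluded from both mask and value
            chunk.mask |= uint64_t { 0xff } << (8 * i);
            chunk.value |= uint64_t { c } << (8 * i);
        }
        if (chunk.mask) // an all-class chunk has no literal to compare
            chunks.push_back(chunk);
        offset += width;
    }
    return chunks;
}

// Models load, and, cmp for one chunk.
bool matchesChunk(const unsigned char* input, const Chunk& chunk)
{
    uint64_t word = 0;
    std::memcpy(&word, input + chunk.offset, chunk.width); // little-endian assumed
    return (word & chunk.mask) == chunk.value;
}
```

For /a\dbc/ (written "a?bc" here) this yields the same mask (0xffff00ff) and immediate (0x63620061) as the listing above; the \d byte still needs its own class check. A chunk with no class bytes gets an all-ones mask, so the and can be dropped, as in the /abcd/ listing. Other splits, such as overlapping loads for odd-length runs, may work better; this sketch does not try to choose between them.
]]>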

The more general improvement is to use wider load, compare and branch sequences for runs of single character atoms, including runs with single character width character class atoms mixed in.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>2102339</commentid>
    <comment_count>1</comment_count>
    <who name="Radar WebKit Bug Importer">webkit-bug-importer</who>
    <bug_when>2025-03-11 14:40:17 -0700</bug_when>
    <thetext>&lt;rdar://problem/146795365&gt;</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>2102355</commentid>
    <comment_count>2</comment_count>
    <who name="Michael Saboff">msaboff</who>
    <bug_when>2025-03-11 15:28:33 -0700</bug_when>
    <thetext>Pull request: https://github.com/WebKit/WebKit/pull/42284</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>2102491</commentid>
    <comment_count>3</comment_count>
    <who name="EWS">ews-feeder</who>
    <bug_when>2025-03-12 01:38:01 -0700</bug_when>
    <thetext>Committed 292003@main (1e14cbbdc2f5): &lt;https://commits.webkit.org/292003@main&gt;

Reviewed commits have been landed. Closing PR #42284 and removing active labels.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>2107517</commentid>
    <comment_count>4</comment_count>
    <who name="EWS">ews-feeder</who>
    <bug_when>2025-03-31 12:40:08 -0700</bug_when>
    <thetext>Committed 289651.362@safari-7621-branch (b78009996aa0): &lt;https://commits.webkit.org/289651.362@safari-7621-branch&gt;

Reviewed commits have been landed. Closing PR #2897 and removing active labels.</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>