<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://bugs.webkit.org/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4.1"
          urlbase="https://bugs.webkit.org/"
          
          maintainer="admin@webkit.org"
>

    <bug>
          <bug_id>275279</bug_id>
          
          <creation_ts>2024-06-07 15:06:08 -0700</creation_ts>
          <short_desc>[JSC] Use immediate bit-vectors for character class matching in YarrJIT</short_desc>
          <delta_ts>2024-07-26 15:03:20 -0700</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WebKit</product>
          <component>JavaScriptCore</component>
          <version>WebKit Nightly Build</version>
          <rep_platform>Unspecified</rep_platform>
          <op_sys>Unspecified</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>FIXED</resolution>
          
          <see_also>https://bugs.webkit.org/show_bug.cgi?id=277174</see_also>
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords>InRadar</keywords>
          <priority>P2</priority>
          <bug_severity>Normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="David Degazio">d_degazio</reporter>
          <assigned_to name="David Degazio">d_degazio</assigned_to>
          <cc>webkit-bug-importer</cc>
          

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>2040417</commentid>
    <comment_count>0</comment_count>
    <who name="David Degazio">d_degazio</who>
    <bug_when>2024-06-07 15:06:08 -0700</bug_when>
    <thetext>Currently YarrJIT is relatively naive when it comes to matching character classes. As far as I can tell, ignoring some special cases (e.g. unifying ASCII letter case with a bit flip, and lookup tables for spaces), there are essentially two ways we match characters:
 - Binary search over the codepoint values, which is reasonably effective for classes with many contiguous ranges but relatively branchy.
 - Sequential character-by-character equality checks (!) for the individual, non-range characters in the class.
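
As a rough model of these two strategies, here is a minimal Python sketch. It mirrors only the matching logic, not the machine code YarrJIT actually emits. It assumes (hypothetically) that a class is split into sorted, non-overlapping inclusive (lo, hi) ranges plus a list of single characters, and the function names are illustrative.

```python
def match_by_binary_search(ranges, c):
    """Binary search over sorted, non-overlapping inclusive (lo, hi) ranges.
    Each probe is a data-dependent branch, which is why this is branchy."""
    left, right = 0, len(ranges) - 1
    while right >= left:
        mid = (left + right) // 2
        lo, hi = ranges[mid]
        if lo > c:
            right = mid - 1  # c is below this range: search the lower half
        elif c > hi:
            left = mid + 1  # c is above this range: search the upper half
        else:
            return True  # c is inside [lo, hi]
    return False


def match_by_equality_chain(chars, c):
    """Compare c against each single character in turn: with N characters
    this models up to N independent branch-if-equal checks."""
    for ch in chars:
        if c == ch:
            return True
    return False


# Example class [0-9A-Fa-f_]: three ranges plus one single character.
RANGES = [(ord("0"), ord("9")), (ord("A"), ord("F")), (ord("a"), ord("f"))]
SINGLES = [ord("_")]


def in_class(c):
    return match_by_binary_search(RANGES, c) or match_by_equality_chain(SINGLES, c)


print(in_class(ord("B")))  # True
print(in_class(ord("_")))  # True
print(in_class(ord("g")))  # False
```
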

The latter is especially naive: for a set of N discontiguous characters we are potentially doing N independent branch-if-equal checks.
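
By contrast, when the characters are clustered, one range check, one subtraction, and one bit test are enough, however many characters the set contains. The Python sketch below only models that idea. The helper names and the 64-bit width (standing in for a general-purpose register holding an immediate) are assumptions for illustration, not YarrJIT's actual code or limits.

```python
REGISTER_BITS = 64  # assumed register width; sets spanning more code points do not fit


def build_bitvector(chars):
    """Pack a non-empty set of code points into (base, mask), where bit i of
    mask is set iff base + i is in chars. Returns None if the span is too wide."""
    base = min(chars)
    span = max(chars) - base + 1
    if span > REGISTER_BITS:
        return None
    mask = 0
    for ch in chars:
        mask |= 2 ** (ch - base)  # set the bit for this character
    return base, mask


def match_by_bitvector(base, mask, c):
    """Models one subtraction, one range check and one bit test, independent
    of how many characters the set contains."""
    offset = c - base
    # Range check: reject offsets outside [0, REGISTER_BITS).
    # In machine code this can be a single unsigned compare of c - base.
    if offset not in range(REGISTER_BITS):
        return False
    return (mask >> offset) % 2 == 1  # test bit `offset` of the mask


# Sparse but clustered punctuation: [!#%*+/=?^] spans 62 code points.
PUNCT = [ord(ch) for ch in "!#%*+/=?^"]
base, mask = build_bitvector(PUNCT)
print(match_by_bitvector(base, mask, ord("=")))  # True
print(match_by_bitvector(base, mask, ord("$")))  # False
print(match_by_bitvector(base, mask, ord("a")))  # False
```

The design choice here is that a sparse set costs the same as a dense one as long as its span fits in the register; sets that do not fit (build_bitvector returns None) would need another strategy.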

I think we can improve this by exploiting the fact that the characters in a class are often close together. Even with Unicode, my guess is that the ranges in a character class are quite likely to be near each other in codepoint space, or that codepoints in general cluster within a few tens of each other. In these cases, a single range check, a subtraction, and a bit-vector test are enough to decide whether a character is in some small set, even if the set is sparse. If we keep the sets small enough to fit in a register, we can also avoid any (data) memory overhead by materializing the set directly into a register as an immediate.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>2040418</commentid>
    <comment_count>1</comment_count>
    <who name="Radar WebKit Bug Importer">webkit-bug-importer</who>
    <bug_when>2024-06-07 15:06:19 -0700</bug_when>
    <thetext>&lt;rdar://problem/129419939&gt;</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>2040880</commentid>
    <comment_count>2</comment_count>
    <who name="David Degazio">d_degazio</who>
    <bug_when>2024-06-11 15:34:37 -0700</bug_when>
    <thetext>Pull request: https://github.com/WebKit/WebKit/pull/29727</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>2043446</commentid>
    <comment_count>3</comment_count>
    <who name="EWS">ews-feeder</who>
    <bug_when>2024-06-27 12:56:21 -0700</bug_when>
    <thetext>Committed 280425@main (34b0b047bb64): &lt;https://commits.webkit.org/280425@main&gt;

Reviewed commits have been landed. Closing PR #29727 and removing active labels.</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>