<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://bugs.webkit.org/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4.1"
          urlbase="https://bugs.webkit.org/"
          
          maintainer="admin@webkit.org"
>

    <bug>
          <bug_id>111424</bug_id>
          
          <creation_ts>2013-03-05 04:53:46 -0800</creation_ts>
          <short_desc>Use a bitfield instead of TextPosition to save bytes in CompactToken</short_desc>
          <delta_ts>2023-04-01 00:51:11 -0700</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WebKit</product>
          <component>New Bugs</component>
          <version>528+ (Nightly build)</version>
          <rep_platform>Unspecified</rep_platform>
          <op_sys>Unspecified</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>CONFIGURATION CHANGED</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>Normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          <blocked>111645</blocked>
          <everconfirmed>1</everconfirmed>
          <reporter name="Eric Seidel (no email)">eric</reporter>
          <assigned_to name="Nobody">webkit-unassigned</assigned_to>
          <cc>abarth</cc>
    
    <cc>annevk</cc>
    
    <cc>tonyg</cc>
          

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>847563</commentid>
    <comment_count>0</comment_count>
    <who name="Eric Seidel (no email)">eric</who>
    <bug_when>2013-03-05 04:53:46 -0800</bug_when>
    <thetext>Use a bitfield instead of TextPosition to save bytes in CompactToken

We could use the remaining 25-bits left over from the other bitfields in CompactToken to save 4 bytes in CompactToken.

We have a lot of CompactTokens while parsing a Document, so keeping them small seems important. :)

We&apos;ll have to verify with a perf test (telemetry?) to make sure that this is actually a win.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>848039</commentid>
    <comment_count>1</comment_count>
    <who name="Eric Seidel (no email)">eric</who>
    <bug_when>2013-03-05 13:56:49 -0800</bug_when>
    <thetext>Because the TextPosition is both a column and a line, we can&apos;t split 25 bits between both and have any sort of reasonable sized page handling.

So I think we should use an offset instead.  That&apos;s slightly more complicated, but will allow us to handle basically any sized page with ease.

The idea would be that each chunk would contain a TextPosition, which is the starting position for that chunk.

Each token would contain a 12 bit line-offset (4095 lines) and a 13 bit column offset (8191 columns).

I could imagine pages breaking past those intra-token offset limits with very long character tokens, but I expect that to be very rare.

Saving 8 bytes storage for every token we process is likely to be worth the code complexity, but again, we&apos;d have to perf test.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>848042</commentid>
    <comment_count>2</comment_count>
    <who name="Eric Seidel (no email)">eric</who>
    <bug_when>2013-03-05 14:00:00 -0800</bug_when>
    <thetext>(In reply to comment #1)
&gt; I could imagine pages breaking past those intra-token offset limits with very long character tokens, but I expect that to be very rare.

Oh, I should also note, that to blow past those limits you would need your entire character token to be delivered to the parser in a single chunk from the network (unless it was loaded from the cache or a file:url when the whole document is parsed at once).</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>848043</commentid>
    <comment_count>3</comment_count>
    <who name="Eric Seidel (no email)">eric</who>
    <bug_when>2013-03-05 14:01:50 -0800</bug_when>
    <thetext>This makes me wonder if the tokenizer shouldn&apos;t be smart enough to yield character tokens after a certain size anyway. :)  Probably not worth the if in such a hot branch, but could save a bunch of reallocs for increasingly large buffers.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>848047</commentid>
    <comment_count>4</comment_count>
    <who name="Adam Barth">abarth</who>
    <bug_when>2013-03-05 14:04:33 -0800</bug_when>
    <thetext>Do we know that saving space in CompactHTMLToken is important?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>848055</commentid>
    <comment_count>5</comment_count>
    <who name="Eric Seidel (no email)">eric</who>
    <bug_when>2013-03-05 14:09:12 -0800</bug_when>
    <thetext>(In reply to comment #4)
&gt; Do we know that saving space in CompactHTMLToken is important?

We do not.  That data is a precursor to this work.  For example if we were to see a change from Tony&apos;s XSSInfo pointer removal we might be more exited about this change.

Since we process tokens in 1000 token chunks, this bug is about saving 8 bytes per token or 8k of memory per chunk.  It&apos;s not clear to me if that would be a win or not.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>848058</commentid>
    <comment_count>6</comment_count>
    <who name="Eric Seidel (no email)">eric</who>
    <bug_when>2013-03-05 14:11:47 -0800</bug_when>
    <thetext>One could easily test by just simply removing the TextPosition member and running a benchmark. It&apos;s not clear which benchmark would show this, since the run-perf-test ones are currently blocked on bug 107236.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1945927</commentid>
    <comment_count>7</comment_count>
    <who name="Anne van Kesteren">annevk</who>
    <bug_when>2023-04-01 00:51:11 -0700</bug_when>
    <thetext>This has been refactored quite a bit already.</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>