Bug 111424

Summary: Use a bitfield instead of TextPosition to save bytes in CompactToken
Product: WebKit
Reporter: Eric Seidel (no email) <eric>
Component: New Bugs
Assignee: Nobody <webkit-unassigned>
Status: RESOLVED CONFIGURATION CHANGED
Severity: Normal
CC: abarth, annevk, tonyg
Priority: P2
Version: 528+ (Nightly build)
Hardware: Unspecified
OS: Unspecified
Bug Depends on:
Bug Blocks: 111645

Description Eric Seidel (no email) 2013-03-05 04:53:46 PST
Use a bitfield instead of TextPosition to save bytes in CompactToken

We could use the 25 bits left over from the other bitfields in CompactToken to replace the TextPosition member and save 4 bytes per token.

We have a lot of CompactTokens while parsing a Document, so keeping them small seems important. :)

We'll have to verify with a perf test (telemetry?) to make sure that this is actually a win.
Comment 1 Eric Seidel (no email) 2013-03-05 13:56:49 PST
Because a TextPosition is both a column and a line, we can't split 25 bits between the two and still handle any reasonably sized page.

So I think we should use an offset instead.  That's slightly more complicated, but will allow us to handle basically any sized page with ease.

The idea would be that each chunk would contain a TextPosition, which is the starting position for that chunk.

Each token would contain a 12-bit line offset (up to 4095 lines) and a 13-bit column offset (up to 8191 columns).

I could imagine pages breaking past those intra-token offset limits with very long character tokens, but I expect that to be very rare.

Saving 8 bytes of storage for every token we process is likely to be worth the code complexity, but again, we'd have to perf test.
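The chunk-base-plus-offsets scheme described above could be sketched roughly as follows. This is an illustrative mock-up, not WebKit's actual code: the names ChunkBase, CompactTokenPosition, and absolutePosition are invented here, and the column-reconstruction rule (offset relative to the base column only when the token is on the chunk's first line) is an assumption about how such a scheme might behave.

```cpp
#include <cassert>
#include <cstdint>

// Absolute position of a parsed chunk's first token. Each chunk would
// carry one of these, so individual tokens need only small offsets.
struct ChunkBase {
    uint32_t line;
    uint32_t column;
};

// Per-token offsets packed into the 25 spare bits: 12 bits of line
// offset (0..4095) and 13 bits of column offset (0..8191).
struct CompactTokenPosition {
    uint32_t lineOffset : 12;
    uint32_t columnOffset : 13;
};

struct TextPosition {
    uint32_t line;
    uint32_t column;
};

// Reconstruct an absolute position. Assumption: if the token is on the
// same line as the chunk's first token, its column offset is relative
// to the base column; on any later line, columns restart from zero.
TextPosition absolutePosition(const ChunkBase& base, CompactTokenPosition p)
{
    TextPosition result;
    result.line = base.line + p.lineOffset;
    result.column = (p.lineOffset == 0 ? base.column : 0) + p.columnOffset;
    return result;
}
```

The two bitfields occupy 25 bits of a single 32-bit word, versus the two full integers a TextPosition member would need, which is where the per-token savings would come from.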
Comment 2 Eric Seidel (no email) 2013-03-05 14:00:00 PST
(In reply to comment #1)
> I could imagine pages breaking past those intra-token offset limits with very long character tokens, but I expect that to be very rare.

Oh, I should also note that to blow past those limits, you would need your entire character token to be delivered to the parser in a single chunk from the network (unless it was loaded from the cache or a file: URL, in which case the whole document is parsed at once).
Comment 3 Eric Seidel (no email) 2013-03-05 14:01:50 PST
This makes me wonder if the tokenizer shouldn't be smart enough to yield character tokens after a certain size anyway. :)  Probably not worth adding an if to such a hot path, but it could save a bunch of reallocs for increasingly large buffers.
Comment 4 Adam Barth 2013-03-05 14:04:33 PST
Do we know that saving space in CompactHTMLToken is important?
Comment 5 Eric Seidel (no email) 2013-03-05 14:09:12 PST
(In reply to comment #4)
> Do we know that saving space in CompactHTMLToken is important?

We do not.  That data is a precursor to this work.  For example, if we were to see a change from Tony's XSSInfo pointer removal, we might be more excited about this change.

Since we process tokens in 1000-token chunks, this bug is about saving 8 bytes per token, or roughly 8 KB of memory per chunk.  It's not clear to me whether that would be a win or not.
Comment 6 Eric Seidel (no email) 2013-03-05 14:11:47 PST
One could easily test by simply removing the TextPosition member and running a benchmark. It's not clear which benchmark would show this, since the run-perf-tests ones are currently blocked on bug 107236.
Comment 7 Anne van Kesteren 2023-04-01 00:51:11 PDT
This has been refactored quite a bit already.