Bug 111424 - Use a bitfield instead of TextPosition to save bytes in CompactToken
Summary: Use a bitfield instead of TextPosition to save bytes in CompactToken
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs
Version: 528+ (Nightly build)
Hardware: Unspecified
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks: 111645
Reported: 2013-03-05 04:53 PST by Eric Seidel (no email)
Modified: 2013-03-06 16:45 PST
CC List: 2 users

See Also:


Attachments

Description Eric Seidel (no email) 2013-03-05 04:53:46 PST
Use a bitfield instead of TextPosition to save bytes in CompactToken

We could use the 25 bits left over from the other bitfields in CompactToken to save 4 bytes per token.

We have a lot of CompactTokens while parsing a Document, so keeping them small seems important. :)

We'll have to verify with a perf test (telemetry?) to make sure that this is actually a win.
Comment 1 Eric Seidel (no email) 2013-03-05 13:56:49 PST
Because a TextPosition holds both a line and a column, we can't split 25 bits between the two and still handle reasonably sized pages.

So I think we should use an offset instead.  That's slightly more complicated, but it will allow us to handle a page of practically any size with ease.

The idea would be that each chunk would contain a TextPosition, which is the starting position for that chunk.

Each token would contain a 12-bit line offset (up to 4,095 lines) and a 13-bit column offset (up to 8,191 columns).
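A minimal C++ sketch of that layout (the names `ChunkBasePosition`, `TokenPositionBits`, and `absolutePosition` are illustrative, not WebKit's actual classes; the 7-bit `otherFlags` field stands in for the token's existing bitfields):

```cpp
#include <cassert>
#include <cstdint>

// Each parser chunk stores one full base position.
struct ChunkBasePosition {
    uint32_t line;
    uint32_t column;
};

// Each token stores only small offsets from the chunk base, packed into
// the 25 spare bits: 12 for the line offset, 13 for the column offset.
struct TokenPositionBits {
    uint32_t lineOffset : 12;   // 0..4095
    uint32_t columnOffset : 13; // 0..8191
    uint32_t otherFlags : 7;    // stand-in for the existing bitfields
};

// Reconstruct the absolute position.  If the token starts on a later line
// than the chunk base, its column offset is relative to that line's start;
// otherwise it is relative to the base column.
ChunkBasePosition absolutePosition(const ChunkBasePosition& base,
                                   const TokenPositionBits& bits)
{
    ChunkBasePosition pos;
    pos.line = base.line + bits.lineOffset;
    pos.column = bits.lineOffset ? bits.columnOffset
                                 : base.column + bits.columnOffset;
    return pos;
}
```

On common compilers the three bitfields pack into a single 32-bit word, which is where the per-token savings over an 8-byte TextPosition come from.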

I could imagine pages breaking past those intra-token offset limits with very long character tokens, but I expect that to be very rare.

Saving 8 bytes of storage for every token we process is likely to be worth the code complexity, but again, we'd have to perf test.
Comment 2 Eric Seidel (no email) 2013-03-05 14:00:00 PST
(In reply to comment #1)
> I could imagine pages breaking past those intra-token offset limits with very long character tokens, but I expect that to be very rare.

Oh, I should also note that to blow past those limits you would need your entire character token to be delivered to the parser in a single chunk from the network (unless it was loaded from the cache or a file: URL, in which case the whole document is parsed at once).
Comment 3 Eric Seidel (no email) 2013-03-05 14:01:50 PST
This makes me wonder if the tokenizer shouldn't be smart enough to yield character tokens after a certain size anyway. :)  Probably not worth the extra `if` in such a hot path, but it could save a bunch of reallocs for increasingly large buffers.
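To illustrate the idea (a hypothetical sketch, not WebKit's tokenizer API; `kMaxCharacterTokenSize` and `appendCharacter` are invented for this example):

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical threshold: once the buffered character token reaches this
// size, emit it rather than letting the buffer keep growing.
constexpr size_t kMaxCharacterTokenSize = 8192;

// Append one character; yield the token early when the cap is reached,
// avoiding repeated reallocations of an ever-larger buffer.
void appendCharacter(std::string& buffer, char c,
                     std::vector<std::string>& emittedTokens)
{
    buffer.push_back(c);
    if (buffer.size() >= kMaxCharacterTokenSize) {
        emittedTokens.push_back(std::move(buffer));
        buffer.clear();
    }
}
```

The trade-off the comment describes is exactly this check: one extra comparison per character in a hot loop, in exchange for bounded buffer growth.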
Comment 4 Adam Barth 2013-03-05 14:04:33 PST
Do we know that saving space in CompactHTMLToken is important?
Comment 5 Eric Seidel (no email) 2013-03-05 14:09:12 PST
(In reply to comment #4)
> Do we know that saving space in CompactHTMLToken is important?

We do not.  That data is a precursor to this work.  For example, if we were to see a change from Tony's XSSInfo pointer removal, we might be more excited about this change.

Since we process tokens in 1000-token chunks, this bug is about saving 8 bytes per token, or 8 KB of memory per chunk.  It's not clear to me whether that would be a win.
Comment 6 Eric Seidel (no email) 2013-03-05 14:11:47 PST
One could easily test by simply removing the TextPosition member and running a benchmark.  It's not clear which benchmark would show this, since the run-perf-tests ones are currently blocked on bug 107236.