Bug 111424

Summary: Use a bitfield instead of TextPosition to save bytes in CompactToken
Product: WebKit
Reporter: Eric Seidel (no email) <eric>
Component: New Bugs
Assignee: Nobody <webkit-unassigned>
Status: RESOLVED CONFIGURATION CHANGED
Severity: Normal
CC: abarth, annevk, tonyg
Priority: P2
Version: 528+ (Nightly build)
Hardware: Unspecified
OS: Unspecified
Bug Depends on:
Bug Blocks: 111645

Description Eric Seidel (no email) 2013-03-05 04:53:46 PST
Use a bitfield instead of TextPosition to save bytes in CompactToken

We could use the 25 bits left over from the other bitfields in CompactToken to replace the TextPosition member and save 4 bytes per token.

We have a lot of CompactTokens while parsing a Document, so keeping them small seems important. :)

We'll have to verify with a perf test (telemetry?) to make sure that this is actually a win.
Comment 1 Eric Seidel (no email) 2013-03-05 13:56:49 PST
Because a TextPosition is both a column and a line, we can't split 25 bits between the two and still handle any reasonably sized page.

So I think we should use an offset instead.  That's slightly more complicated, but will allow us to handle basically any sized page with ease.

The idea would be that each chunk would contain a TextPosition, which is the starting position for that chunk.

Each token would contain a 12-bit line offset (up to 4095 lines) and a 13-bit column offset (up to 8191 columns).

I could imagine pages breaking past those intra-token offset limits with very long character tokens, but I expect that to be very rare.

Saving 8 bytes of storage for every token we process is likely to be worth the code complexity, but again, we'd have to perf test.
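The chunk-base-plus-offsets scheme described above could be sketched roughly as follows. This is an illustrative mock-up, not WebKit's actual code: the names ChunkBase, CompactTokenPosition, and absolutePosition are invented here, and the column-reconstruction rule (offset relative to the base column only when the token is on the chunk's first line) is an assumption about how such a scheme might behave.

```cpp
#include <cassert>
#include <cstdint>

// Absolute position of a parsed chunk's first token. Each chunk would
// carry one of these, so individual tokens need only small offsets.
struct ChunkBase {
    uint32_t line;
    uint32_t column;
};

// Per-token offsets packed into the 25 spare bits: 12 bits of line
// offset (0..4095) and 13 bits of column offset (0..8191).
struct CompactTokenPosition {
    uint32_t lineOffset : 12;
    uint32_t columnOffset : 13;
};

struct TextPosition {
    uint32_t line;
    uint32_t column;
};

// Reconstruct an absolute position. Assumption: if the token is on the
// same line as the chunk's first token, its column offset is relative
// to the base column; on any later line, columns restart from zero.
TextPosition absolutePosition(const ChunkBase& base, CompactTokenPosition p)
{
    TextPosition result;
    result.line = base.line + p.lineOffset;
    result.column = (p.lineOffset == 0 ? base.column : 0) + p.columnOffset;
    return result;
}
```

The two bitfields occupy 25 bits of a single 32-bit word, versus the two full integers a TextPosition member would need, which is where the per-token savings would come from.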
Comment 2 Eric Seidel (no email) 2013-03-05 14:00:00 PST
(In reply to comment #1)
> I could imagine pages breaking past those intra-token offset limits with very long character tokens, but I expect that to be very rare.

Oh, I should also note that to blow past those limits, you would need your entire character token to be delivered to the parser in a single chunk from the network (unless it was loaded from the cache or a file: URL, in which case the whole document is parsed at once).
Comment 3 Eric Seidel (no email) 2013-03-05 14:01:50 PST
This makes me wonder if the tokenizer shouldn't be smart enough to yield character tokens after a certain size anyway. :)  Probably not worth adding an if to such a hot path, but it could save a bunch of reallocs for increasingly large buffers.
Comment 4 Adam Barth 2013-03-05 14:04:33 PST
Do we know that saving space in CompactHTMLToken is important?
Comment 5 Eric Seidel (no email) 2013-03-05 14:09:12 PST
(In reply to comment #4)
> Do we know that saving space in CompactHTMLToken is important?

We do not.  That data is a precursor to this work.  For example, if we were to see a change from Tony's XSSInfo pointer removal, we might be more excited about this change.

Since we process tokens in 1000-token chunks, this bug is about saving 8 bytes per token, or roughly 8 KB of memory per chunk.  It's not clear to me whether that would be a win or not.
Comment 6 Eric Seidel (no email) 2013-03-05 14:11:47 PST
One could easily test by simply removing the TextPosition member and running a benchmark. It's not clear which benchmark would show this, since the run-perf-tests ones are currently blocked on bug 107236.
Comment 7 Anne van Kesteren 2023-04-01 00:51:11 PDT
This has been refactored quite a bit already.