Use a bitfield instead of TextPosition to save bytes in CompactToken
We could use the 25 bits left over from the other bitfields in CompactToken to save 4 bytes per token.
We have a lot of CompactTokens while parsing a Document, so keeping them small seems important. :)
We'll have to verify with a perf test (telemetry?) to make sure that this is actually a win.
Because a TextPosition holds both a line and a column, we can't split 25 bits between the two absolute values and still handle reasonably sized pages.
So I think we should use an offset instead. That's slightly more complicated, but will allow us to handle basically any sized page with ease.
The idea would be that each chunk would contain a TextPosition, which is the starting position for that chunk.
Each token would contain a 12-bit line offset (up to 4095 lines) and a 13-bit column offset (up to 8191 columns).
I could imagine pages breaking past those intra-token offset limits with very long character tokens, but I expect that to be very rare.
Saving 8 bytes of storage for every token we process is likely worth the code complexity, but again, we'd have to perf test.
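A minimal sketch of the scheme described above, assuming invented names (TextPosition's layout, CompactToken's fields, and absolutePosition() are illustrative, not the actual WebKit declarations):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of the chunk-base + per-token-offset idea.
// Each parsed chunk records the absolute TextPosition where it starts;
// every token then stores only small offsets relative to that base.

struct TextPosition {
    uint32_t line;
    uint32_t column;
};

struct CompactToken {
    // The 25 spare bits are split 12/13 as proposed above.
    uint32_t lineOffset : 12;   // up to 4095 lines past the chunk start
    uint32_t columnOffset : 13; // up to 8191 columns past the chunk start
    // ... the existing bitfields would share this word ...
};

// Reconstruct a token's absolute position from its chunk's starting position.
// Columns restart at 0 on each new line, so the chunk's base column only
// contributes when the token is still on the chunk's first line.
TextPosition absolutePosition(const TextPosition& chunkStart, const CompactToken& token)
{
    TextPosition result;
    result.line = chunkStart.line + token.lineOffset;
    result.column = token.lineOffset
        ? token.columnOffset
        : chunkStart.column + token.columnOffset;
    return result;
}
```

This is where the "slightly more complicated" part lives: every consumer of a token's position needs the chunk's base position in hand to decode it.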
(In reply to comment #1)
> I could imagine pages breaking past those intra-token offset limits with very long character tokens, but I expect that to be very rare.
Oh, I should also note that to blow past those limits, your entire character token would need to be delivered to the parser in a single chunk from the network (unless the document was loaded from the cache or a file: URL, in which case the whole document is parsed at once).
This makes me wonder if the tokenizer shouldn't be smart enough to yield character tokens after a certain size anyway. :) Probably not worth the extra if check in such a hot branch, but it could save a bunch of reallocs for increasingly large buffers.
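To illustrate the idea, here is a heavily simplified, hypothetical sketch (tokenizeCharacters and the 8192-byte cap are assumptions, not anything from the tokenizer): instead of growing one character token without bound, the loop flushes the token once it hits the cap, which also bounds the offsets any one token can span.

```cpp
#include <string>
#include <vector>

// Assumed cap on a single character token; not a value from the source.
constexpr size_t kMaxCharacterTokenSize = 8192;

// Toy stand-in for the tokenizer's character-accumulation loop: buffer
// characters, but yield the token early once it reaches the size cap.
std::vector<std::string> tokenizeCharacters(const std::string& input)
{
    std::vector<std::string> tokens;
    std::string buffer;
    buffer.reserve(kMaxCharacterTokenSize);
    for (char c : input) {
        buffer.push_back(c);
        if (buffer.size() == kMaxCharacterTokenSize) {
            tokens.push_back(buffer); // yield early, keeping tokens bounded
            buffer.clear();
        }
    }
    if (!buffer.empty())
        tokens.push_back(buffer);
    return tokens;
}
```

The cost is exactly the per-character branch mentioned above; the benefit is that buffer reallocation and token offsets both stay bounded.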
Do we know that saving space in CompactHTMLToken is important?
(In reply to comment #4)
> Do we know that saving space in CompactHTMLToken is important?
We do not. That data is a precursor to this work. For example, if we were to see a change from Tony's XSSInfo pointer removal, we might be more excited about this change.
Since we process tokens in 1000-token chunks, this bug is about saving 8 bytes per token, or 8k of memory per chunk. It's not clear to me whether that would be a win or not.
One could test this easily by simply removing the TextPosition member and running a benchmark. It's not clear which benchmark would show the difference, since the run-perf-test ones are currently blocked on bug 107236.