RESOLVED FIXED 228858
white space atomization during parsing is expensive
https://bugs.webkit.org/show_bug.cgi?id=228858
Summary white space atomization during parsing is expensive
Cameron McCormack (:heycam)
Reported 2021-08-05 22:41:56 PDT
We have a memory optimization where the HTML parser will atomize any text node string that is all white space. The process for this is a bit expensive, since we must loop over all the characters in the string three times: * First, to check that all the characters are white space * Second, to hash the string when looking up the atom hash table * Third, to check the string for equality with any existing atom hash table entry Most white space strings we encounter have a limited form -- they have at most three or four runs of consecutive equal white space characters, e.g. it's common to see a newline followed by a number of space characters. We can take advantage of this by compressing the white space string into a simple run-length encoded form while we check that the string is entirely white space. If we keep a cache of recently atomized white space strings that can be quickly looked up, keyed off the encoded form, we can re-use a previous result of atomizing an identical string and avoid the hashing and hash entry equality checks. I have a WIP patch for this that is showing a 1% improvement on Speedometer 2 overall (due to 2-4% improvements on a few of the subtests) and no change to PLT5 on my local machine.
Attachments
Patch (17.57 KB, patch)
2021-08-06 01:03 PDT, Cameron McCormack (:heycam)
no flags
Patch (17.57 KB, patch)
2021-08-07 17:10 PDT, Cameron McCormack (:heycam)
no flags
Patch (17.76 KB, patch)
2021-08-08 23:52 PDT, Cameron McCormack (:heycam)
no flags
Radar WebKit Bug Importer
Comment 1 2021-08-05 22:42:07 PDT
Cameron McCormack (:heycam)
Comment 2 2021-08-06 01:03:56 PDT
Cameron McCormack (:heycam)
Comment 3 2021-08-07 17:10:47 PDT
Yusuke Suzuki
Comment 4 2021-08-08 23:19:12 PDT
Comment on attachment 435138 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=435138&action=review r=me > Source/WebCore/html/parser/HTMLConstructionSite.cpp:858 > + code |= (end - startOfRun); It should be `character - startOfRun`.
Cameron McCormack (:heycam)
Comment 5 2021-08-08 23:52:34 PDT
EWS
Comment 6 2021-08-09 04:05:13 PDT
Committed r280772 (240356@main): <https://commits.webkit.org/240356@main> All reviewed patches have been landed. Closing bug and clearing flags on attachment 435169 [details].
Yusuke Suzuki
Comment 7 2021-08-09 05:35:22 PDT
Per Arne Vollan
Comment 8 2021-08-19 18:37:29 PDT
Comment on attachment 435169 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=435169&action=review > Source/WebCore/html/parser/HTMLConstructionSite.cpp:896 > + WTFLogAlways("reuse code %llx", code); Will this create a lot of logging? > Source/WebCore/html/parser/HTMLConstructionSite.cpp:901 > + WTFLogAlways("override"); Ditto. > Source/WebCore/html/parser/HTMLConstructionSite.cpp:906 > + WTFLogAlways("replace code %llx", code); Ditto. > Source/WebCore/html/parser/HTMLConstructionSite.cpp:913 > + WTFLogAlways("new code %llx", code); Ditto.
Cameron McCormack (:heycam)
Comment 9 2021-08-19 18:44:07 PDT
(In reply to Per Arne Vollan from comment #8) > Comment on attachment 435169 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=435169&action=review > > > Source/WebCore/html/parser/HTMLConstructionSite.cpp:896 > > + WTFLogAlways("reuse code %llx", code); > > Will this create a lot of logging? Yes, Yusuke removed them shortly after this patch landed.
Note You need to log in before you can comment on or make changes to this bug.