WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
228858
white space atomization during parsing is expensive
https://bugs.webkit.org/show_bug.cgi?id=228858
Summary
white space atomization during parsing is expensive
Cameron McCormack (:heycam)
Reported
2021-08-05 22:41:56 PDT
We have a memory optimization where the HTML parser will atomize any text node string that is all white space. The process for this is a bit expensive, since we must loop over all the characters in the string three times: * First, to check that all the characters are white space * Second, to hash the string when looking up the atom hash table * Third, to check the string for equality with any existing atom hash table entry Most white space strings we encounter have a limited form -- they have at most three or four runs of consecutive equal white space characters, e.g. it's common to see a newline followed by a number of space characters. We can take advantage of this by compressing the white space string into a simple run-length encoded form while we check that the string is entirely white space. If we keep a cache of recently atomized white space strings that can be quickly looked up, keyed off the encoded form, we can re-use a previous result of atomizing an identical string and avoid the hashing and hash entry equality checks. I have a WIP patch for this that is showing a 1% improvement on Speedometer 2 overall (due to 2-4% improvements on a few of the subtests) and no change to PLT5 on my local machine.
Attachments
Patch
(17.57 KB, patch)
2021-08-06 01:03 PDT
,
Cameron McCormack (:heycam)
no flags
Details
Formatted Diff
Diff
Patch
(17.57 KB, patch)
2021-08-07 17:10 PDT
,
Cameron McCormack (:heycam)
no flags
Details
Formatted Diff
Diff
Patch
(17.76 KB, patch)
2021-08-08 23:52 PDT
,
Cameron McCormack (:heycam)
no flags
Details
Formatted Diff
Diff
Show Obsolete
(2)
View All
Add attachment
proposed patch, testcase, etc.
Radar WebKit Bug Importer
Comment 1
2021-08-05 22:42:07 PDT
<
rdar://problem/81601565
>
Cameron McCormack (:heycam)
Comment 2
2021-08-06 01:03:56 PDT
Created
attachment 435057
[details]
Patch
Cameron McCormack (:heycam)
Comment 3
2021-08-07 17:10:47 PDT
Created
attachment 435138
[details]
Patch
Yusuke Suzuki
Comment 4
2021-08-08 23:19:12 PDT
Comment on
attachment 435138
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=435138&action=review
r=me
> Source/WebCore/html/parser/HTMLConstructionSite.cpp:858 > + code |= (end - startOfRun);
It should be `character - startOfRun`.
Cameron McCormack (:heycam)
Comment 5
2021-08-08 23:52:34 PDT
Created
attachment 435169
[details]
Patch
EWS
Comment 6
2021-08-09 04:05:13 PDT
Committed
r280772
(
240356@main
): <
https://commits.webkit.org/240356@main
> All reviewed patches have been landed. Closing bug and clearing flags on
attachment 435169
[details]
.
Yusuke Suzuki
Comment 7
2021-08-09 05:35:22 PDT
Committed
r280773
(
240357@main
): <
https://commits.webkit.org/240357@main
>
Per Arne Vollan
Comment 8
2021-08-19 18:37:29 PDT
Comment on
attachment 435169
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=435169&action=review
> Source/WebCore/html/parser/HTMLConstructionSite.cpp:896 > + WTFLogAlways("reuse code %llx", code);
Will this create a lot of logging?
> Source/WebCore/html/parser/HTMLConstructionSite.cpp:901 > + WTFLogAlways("override");
Ditto.
> Source/WebCore/html/parser/HTMLConstructionSite.cpp:906 > + WTFLogAlways("replace code %llx", code);
Ditto.
> Source/WebCore/html/parser/HTMLConstructionSite.cpp:913 > + WTFLogAlways("new code %llx", code);
Ditto.
Cameron McCormack (:heycam)
Comment 9
2021-08-19 18:44:07 PDT
(In reply to Per Arne Vollan from
comment #8
)
> Comment on
attachment 435169
[details]
> Patch > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=435169&action=review
> > > Source/WebCore/html/parser/HTMLConstructionSite.cpp:896 > > + WTFLogAlways("reuse code %llx", code); > > Will this create a lot of logging?
Yes, Yusuke removed them shortly after this patch landed.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug