WebKit Bugzilla
RESOLVED FIXED
106127
[meta] HTML parser shouldn't block the main thread
https://bugs.webkit.org/show_bug.cgi?id=106127
Adam Barth
Reported
2013-01-04 13:08:22 PST
This is a meta bug for moving the HTML parser off the main thread. We're currently evaluating how much performance there is to be gained from this change. The performance gains might arise in two ways:

1) Moving parsing off the main thread could make web pages more responsive because the main thread is available for handling input events and executing JavaScript.
2) Moving parsing off the main thread could make web pages load more quickly because WebCore can do other work in parallel with parsing HTML (such as parsing CSS or attaching elements to the render tree).

While we investigate these possible performance benefits, we might refactor the parser a bit to remove main-thread dependencies from the core objects (e.g., HTMLTokenizer and HTMLTreeBuilder). Once we have more data, we'll start a discussion on webkit-dev before making any major architectural changes.
Attachments
HTML parser runtime (measured on chromium-mac on a Macbook Pro via inspector instrumentation) (17.46 KB, text/html), 2013-01-08 14:25 PST, Adam Barth
HTML parser runtime (measured on chromium-android on a Nexus 7 via inspector instrumentation) (17.22 KB, text/html), 2013-01-09 16:05 PST, Adam Barth
Adam Barth
Comment 1
2013-01-06 18:42:17 PST
Here's a slide deck from Mozilla related to this topic:
http://people.mozilla.com/~roc/Samsung/MozillaParallelism.pdf
Antti Koivisto
Comment 2
2013-01-07 05:03:08 PST
Over a run of a bunch of real-world web sites, we seem to have ~3% of main thread CPU time in HTML tokenization and parsing (excluding the actual tree building, the most expensive part). There are surely individual cases much worse than that. This is big enough to support architectural changes like this.

The goal should be to eventually have the whole path from networking onward off the main thread and only do the actual tree building there.
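To make the proposed split concrete, here is a minimal Python sketch of a two-stage pipeline, assuming tokenization runs on a background thread and only tree building stays on the main (calling) thread. It is an illustration of the producer/consumer idea, not WebKit's design: `tokenize`, `background_tokenizer`, `main_thread_tree_builder`, `parse`, and the batch size are all hypothetical, and the toy tokenizer assumes each network chunk ends on a token boundary.

```python
import queue
import re
import threading

END_OF_STREAM = None  # sentinel posted by the background thread when input ends

def tokenize(chunk):
    """Toy tokenizer (an assumption, not HTMLTokenizer): splits markup into tags and text.

    Assumes each network chunk ends on a token boundary.
    """
    return re.findall(r"<[^>]*>|[^<]+", chunk)

def background_tokenizer(chunks, out_queue, batch_size=2):
    """Runs off the main thread: tokenizes incoming chunks and posts token batches."""
    batch = []
    for chunk in chunks:
        for token in tokenize(chunk):
            batch.append(token)
            if len(batch) == batch_size:
                out_queue.put(batch)
                batch = []
    if batch:
        out_queue.put(batch)  # flush the final partial batch
    out_queue.put(END_OF_STREAM)

def main_thread_tree_builder(in_queue):
    """Runs on the calling ("main") thread: consumes batches and only builds the tree."""
    tree = []
    while True:
        batch = in_queue.get()
        if batch is END_OF_STREAM:
            return tree
        tree.extend(batch)  # placeholder for real DOM insertion

def parse(chunks):
    token_queue = queue.Queue()
    worker = threading.Thread(target=background_tokenizer, args=(chunks, token_queue))
    worker.start()
    tree = main_thread_tree_builder(token_queue)
    worker.join()
    return tree
```

Batching tokens keeps the cross-thread handoff cheap relative to per-token messaging; in this sketch the main thread never touches raw markup, only ready-made tokens.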
Eric Seidel (no email)
Comment 3
2013-01-07 12:02:50 PST
(In reply to comment #2)
> Over a run of a bunch of real-world web sites, we seem to have ~3% of main thread CPU time in HTML tokenization and parsing (excluding the actual tree building, the most expensive part). There are surely individual cases much worse than that. This is big enough to support architectural changes like this.
>
> The goal should be to eventually have the whole path from networking onward off the main thread and only do the actual tree building there.
I'm very curious about this number! Adam and I briefly looked into generating a number like that last Friday. I had assumed parse time would be larger than 3% of total active main thread time, especially on mobile. Could you share some of your methodology? Or other percentages of main thread usage? I'd be very interested in what you know about how we're spending time on the main thread, and happy to help you reduce it. I assume you just used dtrace + the PLT or similar? Our first-crack plan had been to use inspector timeline events and page cyclers (same idea as the PLT), but I'm less interested in what events the inspector happens to record, and more interested in total time on the main thread and where it's going. We could also use systrace for this, and I might go that route next.
Eric Seidel (no email)
Comment 4
2013-01-07 12:03:45 PST
(In reply to comment #3)
> (In reply to comment #2)
>
> I'm very curious about this number!
We can also discuss this offline or in a separate bug. I don't need to hijack Adam's meta-bug. But I remain very interested in your testing and being able to repeat it/compare numbers/speed-up webkit.
Tony Gentilcore
Comment 5
2013-01-08 10:53:22 PST
(In reply to comment #1)
> Here's a slide deck from Mozilla related to this topic:
> http://people.mozilla.com/~roc/Samsung/MozillaParallelism.pdf
And some more in-depth design discussion:
https://developer.mozilla.org/en/Gecko/HTML_parser_threading
Antti Koivisto
Comment 6
2013-01-08 11:48:25 PST
Note that the Mozilla implementation of this wasn't necessarily that evidence-driven:
https://twitter.com/hsivonen/status/129457178368151552
Adam Barth
Comment 7
2013-01-08 12:44:49 PST
> Over a run of a bunch of real-world web sites, we seem to have ~3% of main thread CPU time in HTML tokenization and parsing (excluding the actual tree building, the most expensive part).
When you say "tree building," do you mean the work done by the HTMLTreeBuilder object or the actual parserAppendChild/attach calls? We should be able to move HTMLTreeBuilder onto the background thread, but we probably would not be able to move parserAppendChild or attach.

nduca did some measurements with Chromium's telemetry profiler (which uses the inspector timeline's notion of what constitutes HTML parsing time). On a selection of 25 popular web sites, he sees the parser using between 2% and 8% of main thread CPU time (with an average of 5%). Some examples on the high end (i.e., >=7%) are games.yahoo.com, www.youtube.com, http://en.wikipedia.org/wiki/Wikipedia, and pinterest.com. These numbers seem consistent with Antti's measurements, given that Antti is likely excluding some amount of tree building work that the inspector is charging to the parser.
Adam Barth
Comment 8
2013-01-08 14:25:53 PST
Created attachment 181766
HTML parser runtime (measured on chromium-mac on a Macbook Pro via inspector instrumentation)

Here are more details from the dataset Nat took on his Macbook Pro. The "ParseHTML" column represents the total amount of time attributed to the HTML parser by the inspector instrumentation. The "ParseHTML_max" column is the largest contiguous chunk of time (in a single load of the page). Looking at the ParseHTML_max column, the parser often seems to consume multiple frames (by which I mean 60 Hz time slices on the main thread). In some cases, such as http://en.wikipedia.org/wiki/Wikipedia and http://games.yahoo.com, the parser creates 7-9 frames of jank.

Note: These measurements were taken on a Macbook Pro. It would be interesting to see how these measurements compare on a mobile device.
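The "frames of jank" figure is a unit conversion: one 60 Hz frame is 1000/60 ≈ 16.7 ms, so a contiguous ParseHTML_max chunk divided by that budget gives the number of frames it blocks. A small Python sketch of that arithmetic (the helper names `frames_spanned` and `ms_for_frames` are hypothetical, not part of any measurement harness):

```python
FRAME_MS = 1000.0 / 60.0  # one 60 Hz frame is roughly 16.67 ms of main-thread time

def frames_spanned(contiguous_ms):
    """How many 60 Hz frames a single uninterrupted main-thread task covers."""
    return contiguous_ms / FRAME_MS

def ms_for_frames(frames):
    """Inverse conversion: the task length that corresponds to a given frame count."""
    return frames * FRAME_MS

# 7-9 frames of jank corresponds to a single parser chunk of roughly 117-150 ms.
low, high = ms_for_frames(7), ms_for_frames(9)
```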
Eric Seidel (no email)
Comment 9
2013-01-08 14:31:47 PST
(In reply to comment #8)
> Created an attachment (id=181766)
> Looking at the ParseHTML_max column, the parser often seems to consume multiple frames (by which I mean 60 Hz time slices on the main thread). In some cases, such as http://en.wikipedia.org/wiki/Wikipedia and http://games.yahoo.com, the parser creates 7-9 frames of jank.

The parser is currently set to yield only every 4000 tokens or 500 ms, which is likely waaay too long on a touch device.
http://trac.webkit.org/browser/trunk/Source/WebCore/html/parser/HTMLParserScheduler.cpp#L34
It would be interesting to build with a much lower threshold (like 30 ms) and see how the web feels. Pulling the parser off the main thread could definitely help with this sort of jank.
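A "yield after N tokens or T milliseconds" policy of the kind described here can be sketched in a few lines of Python. This is a hypothetical illustration, not HTMLParserScheduler's actual code or API: `ParserYieldPolicy`, `pump`, and the injectable `clock` are invented names, and the defaults simply mirror the 4000-token / 500 ms thresholds quoted above. Lowering `max_ms` to about 30 corresponds to the experiment suggested in this comment.

```python
import time

class ParserYieldPolicy:
    """Hypothetical 'yield after N tokens or T ms' policy (not WebKit's API).

    Defaults mirror the thresholds quoted above: 4000 tokens or 500 ms.
    """

    def __init__(self, max_tokens=4000, max_ms=500.0, clock=time.monotonic):
        self.max_tokens = max_tokens
        self.max_ms = max_ms
        self.clock = clock  # returns seconds; injectable so tests can fake time
        self.reset()

    def reset(self):
        """Start a new time slice."""
        self.tokens = 0
        self.start = self.clock()

    def should_yield(self):
        """Called after each token; True once either budget is exhausted."""
        self.tokens += 1
        elapsed_ms = (self.clock() - self.start) * 1000.0
        return self.tokens >= self.max_tokens or elapsed_ms >= self.max_ms

def pump(tokens, policy, process):
    """Processes tokens until the policy asks to yield; returns how many were consumed."""
    policy.reset()
    consumed = 0
    for token in tokens:
        process(token)
        consumed += 1
        if policy.should_yield():
            break
    return consumed
```

Checking the clock after every token is the simplest choice for a sketch; a real scheduler might sample the clock less often to keep the check itself cheap.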
Antti Koivisto
Comment 10
2013-01-08 15:31:54 PST
(In reply to comment #7)
> > Over a run of a bunch of real-world web sites, we seem to have ~3% of main thread CPU time in HTML tokenization and parsing (excluding the actual tree building, the most expensive part).
>
> When you say "tree building," do you mean the work done by the HTMLTreeBuilder object or the actual parserAppendChild/attach calls? We should be able to move HTMLTreeBuilder onto the background thread, but we probably would not be able to move parserAppendChild or attach.
I was pruning out the entire HTMLTreeBuilder::constructTreeFromAtomicToken(). Pruning more carefully (calls to Element functions only) leaves ~3.5% in total.
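The pruning described here can be illustrated on sampled call stacks: count the samples that fall inside the parser, then drop any sample whose stack passes through a pruned frame. The Python sketch below is a hypothetical reconstruction of that style of analysis, not Antti's actual tooling. The sample data is made up, and every function name except HTMLTreeBuilder::constructTreeFromAtomicToken is a placeholder. It shows why pruning only the Element calls leaves a larger share than pruning the whole tree-building subtree.

```python
def parser_share(samples, parser_frames, pruned_frames=frozenset()):
    """Fraction of main-thread samples attributed to parsing, with some subtrees pruned.

    samples: call stacks, each a sequence of function names (outermost first).
    parser_frames: names that mark a sample as parser time.
    pruned_frames: names whose whole subtree is excluded (e.g., tree building).
    """
    if not samples:
        return 0.0
    counted = sum(
        1
        for stack in samples
        if any(f in parser_frames for f in stack)
        and not any(f in pruned_frames for f in stack)
    )
    return counted / len(samples)

# Made-up samples for illustration only; all names except
# HTMLTreeBuilder::constructTreeFromAtomicToken are placeholders.
samples = [
    ("main", "ParserPump", "Tokenizer::nextToken"),
    ("main", "ParserPump", "HTMLTreeBuilder::constructTreeFromAtomicToken", "Element::setAttribute"),
    ("main", "ParserPump", "HTMLTreeBuilder::constructTreeFromAtomicToken", "TreeBuilder::processToken"),
    ("main", "Layout"),
]
```

With these samples, pruning the whole constructTreeFromAtomicToken() subtree leaves 1 of 4 samples, while pruning only the Element frame leaves 2 of 4.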
> nduca did some measurements with Chromium's telemetry profiler (which uses the inspector timeline's notion of what constitutes HTML parsing time). On a selection of 25 popular web sites, he sees the parser using between 2% and 8% of main thread CPU time (with an average of 5%). Some examples on the high end (i.e., >=7%) are games.yahoo.com, www.youtube.com, http://en.wikipedia.org/wiki/Wikipedia, and pinterest.com.
I would like to see measurements done without relying on inspector infrastructure (for example, with simple instrumentation code) so we know exactly what is being measured. As I said, I think this is worth doing based on the current data already. However, it would be good to have a realistic understanding of what kinds of gains to expect.
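As an illustration of "simple instrumentation code," here is a minimal Python sketch that records, per category, both the total time and the longest single contiguous span. These loosely mirror the ParseHTML and ParseHTML_max columns discussed in this bug. Real WebCore instrumentation would be C++; `MainThreadStats` and `measure` are hypothetical names. The sketch assumes spans of the same category are not nested, because nested spans would be counted twice.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

class MainThreadStats:
    """Hand-rolled timers: per-category total time and longest single span (in ms)."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock  # returns seconds; injectable so tests can fake time
        self.total_ms = defaultdict(float)
        self.max_ms = defaultdict(float)

    @contextmanager
    def measure(self, category):
        """Times the enclosed block and charges it to `category`."""
        start = self.clock()
        try:
            yield
        finally:
            elapsed = (self.clock() - start) * 1000.0
            self.total_ms[category] += elapsed
            self.max_ms[category] = max(self.max_ms[category], elapsed)
```

Usage would wrap each parser pump, e.g. `with stats.measure("ParseHTML"): ...`, so the boundaries of what is measured are explicit in the code rather than inherited from the inspector.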
Adam Barth
Comment 11
2013-01-09 16:05:38 PST
Created attachment 182002
HTML parser runtime (measured on chromium-android on a Nexus 7 via inspector instrumentation)

Here are the results from the chromium-android port on a Nexus 7 (using a content_shell build from this afternoon). The parser takes up more main-thread time than on the Macbook Pro. For example, on games.yahoo.com the HTML parser takes up 1.2 seconds. On average, the HTML parser takes 486 ms of main thread time. The "max" times are also considerably worse on the Nexus 7. The average "max" value is about 10 frames (60 Hz time slices), with the worst case being 38 frames.
Adam Barth
Comment 12
2013-01-09 16:07:07 PST
> I would like to see measurements done without relying on inspector infrastructure (for example by simple instrumentation code) so we know what exactly is being measured.
I would prefer to use a tool like Instruments as well, but unfortunately the measurement harness we're using is built out of the inspector instrumentation.
> As I said, I think this is worth doing based on the current data already. However, it would be good to have a realistic understanding of what kinds of gains to expect.
I agree. I'll send an email to webkit-dev.
Tony Gentilcore
Comment 13
2013-01-24 11:53:59 PST
I ran the prototype through the top 25 suite in Telemetry on a Galaxy Nexus (with the exception of Calendar, which didn't load with the threaded parser). This benchmark loads cached sites from a local web page replay instance. Results are preliminary but encouraging:

Metric              Default   Threaded   Improvement
DOMContentLoaded    4972      4304       13%
ParseHTML total     702       593        14%
ParseHTML avg       9         5          44%
ParseHTML max       309       107        65%

Full results:
https://docs.google.com/a/chromium.org/spreadsheet/ccc?key=0AmVDuVhIZxCTdGdLUlhkbnVUaDlCQ01uVm92S05saHc#gid=0
One suspicious thing is that the absolute value of DOMContentLoaded improved more than ParseHTML. Perhaps due to our doc.write bug, we are actually doing less work on some of the pages. I'm also a little surprised the ParseHTML numbers didn't improve more. That suggests tree building is still taking a fair amount of time.
Adam Barth
Comment 14
2013-02-22 13:36:11 PST
The spreadsheet for triaging the remaining test failures is at
https://docs.google.com/spreadsheet/ccc?key=0AlC4tS7Ao1fIdE5IbVJESW00V2F5RUIwRDk3WEhMblE&usp=sharing
Eric Seidel (no email)
Comment 15
2013-03-02 02:02:25 PST
This is now on by default in Chromium Canary:
https://groups.google.com/a/chromium.org/forum/#!topic/chromium-dev/hBUVtg7gacE
See the announcement for details on the (substantial) perf win (even for single-core devices!?). Other ports probably want to wait a couple of days before turning this on, in case there are other bugs we should shake out.
Bug 110937 may also block at least Mac WK1 from enabling this for the time being.
Eric Seidel (no email)
Comment 16
2013-03-05 02:05:06 PST
*** Bug 57376 has been marked as a duplicate of this bug. ***
Eric Seidel (no email)
Comment 17
2013-03-05 02:08:11 PST
The parser was disabled on Chromium Canary due to a couple of crashers we fixed today. It should be back on as of Wednesday's Canary.
Eric Seidel (no email)
Comment 18
2013-03-06 17:34:19 PST
This bug is almost ready to close. Filed bug 111645 for tracking further perf improvements to the threaded parser codepath.
Adam Barth
Comment 19
2013-03-07 12:37:07 PST
IMHO, we should fix bug 109764 before closing this bug.
Adam Barth
Comment 20
2013-03-11 19:52:49 PDT
The parser appears to work. :)