Bug 106127 - [meta] HTML parser shouldn't block the main thread
Summary: [meta] HTML parser shouldn't block the main thread
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc.
Version: 528+ (Nightly build)
Hardware: All All
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords:
Duplicates: 57376
Depends on: 57376 63531 90751 106128 106251 106256 106268 106375 106401 106496 106595 106597 106607 106615 106618 106694 106722 106854 106919 107068 107069 107070 107071 107082 107083 107086 107087 107105 107140 107150 107158 107159 107160 107170 107190 107201 107317 107320 107330 107332 107367 107368 107519 107522 107561 107569 107575 107584 107593 107596 107603 107664 107713 107751 107753 107755 107807 107876 107975 107983 108027 108096 108394 108531 108557 108655 108666 108698 108726 108880 108970 108984 109076 109237 109240 109477 109485 109486 109495 109598 109607 109738 109742 109750 109754 109760 109764 109995 110251 110258 110276 110408 110517 110529 110532 110537 110538 110637 110643 110647 110678 110801 110907 110929 110937 110949 110951 111021 111023 111043 111044 111130 111135 111200 111248 111249 111253 111272 111365 111423 111610 112057
Blocks:
Reported: 2013-01-04 13:08 PST by Adam Barth
Modified: 2013-03-11 19:52 PDT

Attachments
HTML parser runtime (measured on chromium-mac on a Macbook Pro via inspector instrumentation) (17.46 KB, text/html)
2013-01-08 14:25 PST, Adam Barth
HTML parser runtime (measured on chromium-android on a Nexus 7 via inspector instrumentation) (17.22 KB, text/html)
2013-01-09 16:05 PST, Adam Barth

Description Adam Barth 2013-01-04 13:08:22 PST
This is a meta bug for moving the HTML parser off the main thread.

We're currently evaluating how much performance there is to be gained from this change.  The performance gains might arise in two ways:

1) Moving parsing off the main thread could make web pages more responsive because the main thread is available for handling input events and executing JavaScript.
2) Moving parsing off the main thread could make web pages load more quickly because WebCore can do other work in parallel with parsing HTML (such as parsing CSS or attaching elements to the render tree).

While we investigate these possible performance benefits, we might refactor the parser a bit to remove main-thread dependencies from the core objects (e.g., HTMLTokenizer and HTMLTreeBuilder).  Once we have more data, we'll start a discussion on webkit-dev before making any major architectural changes.
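
A minimal Python sketch of the split under evaluation, not WebKit code: a worker thread tokenizes the markup and hands token chunks to the calling ("main") thread, which only consumes them. The names `tokenize` and `parse_off_main_thread`, the regex-based toy tokenizer, and the chunk size are all hypothetical stand-ins for illustration; the real HTMLTokenizer and HTMLTreeBuilder are far more involved.

```python
import queue
import re
import threading

# Hypothetical toy tokenizer: splits markup into tags and text runs.
TOKEN_RE = re.compile(r"<[^>]+>|[^<]+")


def tokenize(markup):
    return TOKEN_RE.findall(markup)


def parse_off_main_thread(markup, chunk_size=2):
    """Tokenize on a worker thread; consume token chunks on the calling thread."""
    chunks = queue.Queue()

    def worker():
        tokens = tokenize(markup)
        for i in range(0, len(tokens), chunk_size):
            chunks.put(tokens[i:i + chunk_size])
        chunks.put(None)  # sentinel: no more chunks

    threading.Thread(target=worker, daemon=True).start()

    consumed = []
    while True:
        # A real event loop would receive each chunk as a posted task
        # instead of blocking here; blocking keeps the sketch short.
        chunk = chunks.get()
        if chunk is None:
            break
        consumed.extend(chunk)  # stand-in for main-thread tree work
    return consumed
```

In this arrangement the main thread does work only when a chunk arrives, which is the property both expected gains rely on.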
Comment 1 Adam Barth 2013-01-06 18:42:17 PST
Here's a slide deck from Mozilla related to this topic:
http://people.mozilla.com/~roc/Samsung/MozillaParallelism.pdf
Comment 2 Antti Koivisto 2013-01-07 05:03:08 PST
Over a run of a bunch of real-world web sites, we seem to spend ~3% of main thread CPU time in HTML tokenization and parsing (excluding the actual tree building, the most expensive part). There are surely individual cases much worse than that. This is big enough to support architectural changes like this.

The goal should be to eventually move the whole path, from networking onward, off the main thread and do only the actual tree building there.
Comment 3 Eric Seidel (no email) 2013-01-07 12:02:50 PST
(In reply to comment #2)
> Over a run of a bunch of real-world web sites, we seem to spend ~3% of main thread CPU time in HTML tokenization and parsing (excluding the actual tree building, the most expensive part). There are surely individual cases much worse than that. This is big enough to support architectural changes like this.
> 
> The goal should be to eventually move the whole path, from networking onward, off the main thread and do only the actual tree building there.

I'm very curious about this number!

Adam and I briefly looked into generating a number like that last Friday.  I had assumed parse time would be larger than 3% of total active main thread time, especially on Mobile.

Could you share some of your methodology?  Or other percentages of main thread usage?  I'd be very interested in what you know about how we're spending time on the main thread, and happy to help you reduce it.

I assume you just used dtrace + the PLT or similar?  Our first-crack plan had been to use inspector timeline events and page cyclers (same idea as the plt), but I'm less interested in what events the inspector happens to record, and more about total time on the main thread and where it's going.  We could also use systrace for this, and I might go that route next.
Comment 4 Eric Seidel (no email) 2013-01-07 12:03:45 PST
(In reply to comment #3)
> (In reply to comment #2)
>
> I'm very curious about this number!

We can also discuss this offline or in a separate bug.  I don't need to hijack Adam's meta-bug.  But I remain very interested in your testing and being able to repeat it/compare numbers/speed-up webkit.
Comment 5 Tony Gentilcore 2013-01-08 10:53:22 PST
(In reply to comment #1)
> Here's a slide deck from Mozilla related to this topic:
> http://people.mozilla.com/~roc/Samsung/MozillaParallelism.pdf

And some more in-depth design discussion:
https://developer.mozilla.org/en/Gecko/HTML_parser_threading
Comment 6 Antti Koivisto 2013-01-08 11:48:25 PST
Note that the Mozilla implementation of this wasn't necessarily that evidence-driven: https://twitter.com/hsivonen/status/129457178368151552
Comment 7 Adam Barth 2013-01-08 12:44:49 PST
> Over a run of a bunch of real-world web sites, we seem to spend ~3% of main thread CPU time in HTML tokenization and parsing (excluding the actual tree building, the most expensive part).

When you say "tree building," do you mean the work done by the HTMLTreeBuilder object or the actual parserAppendChild/attach calls?  We should be able to move HTMLTreeBuilder onto the background thread, but we probably would not be able to move parserAppendChild or attach.

nduca did some measurements with Chromium's telemetry profiler (which uses the inspector timeline's notion of what constitutes HTML parsing time).  On a selection of 25 popular web sites, he sees the parser using between 2% and 8% of main thread CPU time (with an average of 5%).  Some examples on the high end (i.e., >=7%) are games.yahoo.com, www.youtube.com, http://en.wikipedia.org/wiki/Wikipedia, and pinterest.com.

These numbers seem consistent with Antti's measurements given that Antti is likely excluding some amount of tree building work that the inspector is charging to the parser.
Comment 8 Adam Barth 2013-01-08 14:25:53 PST
Created attachment 181766
HTML parser runtime (measured on chromium-mac on a Macbook Pro via inspector instrumentation)

Here are more details from the dataset Nat took on his Macbook Pro.  The "ParseHTML" column represents the total amount of time attributed to the HTML parser by the inspector instrumentation.  The "ParseHTML_max" column is the largest contiguous chunk of time (in a single load of the page).

Looking at the ParseHTML_max column, the parser seems to often consume multiple frames (by which I mean 60 Hz time slices on the main thread).  In some cases, such as http://en.wikipedia.org/wiki/Wikipedia and http://games.yahoo.com the parser creates 7-9 frames of jank.

Note: These measurements were taken on a Macbook Pro.  It would be interesting to see how these measurements compare on a mobile device.
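
To put the frame counts in wall-clock terms: at 60 Hz one frame is about 16.7 ms, so 7-9 frames is roughly 117-150 ms of uninterrupted parsing. A small sketch of the conversion follows; the helper names are hypothetical, and rounding a duration up to whole frames is an assumption about how "consuming a frame" is counted.

```python
import math

FRAME_RATE_HZ = 60  # the 60 Hz time slices referred to above


def frames_to_ms(frames):
    """Wall-clock duration, in milliseconds, of a number of 60 Hz frames."""
    return frames * 1000 / FRAME_RATE_HZ


def frames_consumed(duration_ms):
    """Whole frames at least partly occupied by a task of this length.

    Assumption: a task that spills into a frame counts as consuming it.
    """
    return math.ceil(duration_ms * FRAME_RATE_HZ / 1000)
```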
Comment 9 Eric Seidel (no email) 2013-01-08 14:31:47 PST
(In reply to comment #8)
> Created an attachment (id=181766) [details]
> Looking at the ParseHTML_max column, the parser seems to often consume multiple frames (by which I mean 60 Hz time slices on the main thread).  In some cases, such as http://en.wikipedia.org/wiki/Wikipedia and http://games.yahoo.com the parser creates 7-9 frames of jank.

The parser is currently set to yield only every 4000 tokens or 500ms, which is likely waaay too long on a touch device.

http://trac.webkit.org/browser/trunk/Source/WebCore/html/parser/HTMLParserScheduler.cpp#L34

It would be interesting to build with a much lower threshold (like 30ms) and see how the web feels.

Definitely pulling the parser off the main thread might help with these sorts of jank.
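
A minimal sketch of a "yield after N tokens or T seconds" policy of the kind described above. This is not the HTMLParserScheduler code; the class name, method names, and injectable clock are hypothetical, and the defaults simply mirror the 4000 tokens / 500ms figures quoted in this comment.

```python
import time


class ParserYieldPolicy:
    """Hypothetical policy: end a parsing slice after a token budget
    or a time budget is used up, whichever comes first."""

    def __init__(self, max_tokens=4000, max_seconds=0.5, clock=time.monotonic):
        self.max_tokens = max_tokens
        self.max_seconds = max_seconds
        self.clock = clock
        self.resume()

    def resume(self):
        """Start a new parsing slice."""
        self.tokens = 0
        self.started = self.clock()

    def should_yield(self):
        """Call once per processed token; True means the slice should end."""
        self.tokens += 1
        if self.tokens >= self.max_tokens:
            return True
        return self.clock() - self.started >= self.max_seconds
```

Under these assumptions, `ParserYieldPolicy(max_seconds=0.030)` would model the 30ms experiment suggested above.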
Comment 10 Antti Koivisto 2013-01-08 15:31:54 PST
(In reply to comment #7)
> > Over a run of a bunch of real-world web sites, we seem to spend ~3% of main thread CPU time in HTML tokenization and parsing (excluding the actual tree building, the most expensive part).
> 
> When you say "tree building," do you mean the work done by the HTMLTreeBuilder object or the actual parserAppendChild/attach calls?  We should be able to move HTMLTreeBuilder onto the background thread, but we probably would not be able to move parserAppendChild or attach.

I was pruning out the entire HTMLTreeBuilder::constructTreeFromAtomicToken(). Pruning more carefully (calls to Element functions only) leaves ~3.5% in total.

> nduca did some measurements with Chromium's telemetry profiler (which uses the inspector timeline's notion of what constitutes HTML parsing time).  On a selection of 25 popular web sites, he sees the parser using between 2% and 8% of main thread CPU time (with an average of 5%).  Some examples on the high end (i.e., >=7%) are games.yahoo.com, www.youtube.com, http://en.wikipedia.org/wiki/Wikipedia, and pinterest.com.

I would like to see measurements done without relying on inspector infrastructure (for example by simple instrumentation code) so we know what exactly is being measured. 

As I said, I think this is worth doing based on the current data already. However, it would be good to have a realistic understanding of what kinds of gains to expect.
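
Simple instrumentation of the kind mentioned above could be as small as a scoped timer around the code being measured. The sketch below is a hypothetical Python illustration, not WebKit code: `ParseTimeRecorder` and `measure` are invented names, and its two fields loosely correspond to the "ParseHTML" (total) and "ParseHTML_max" (longest contiguous chunk) columns in the attachments.

```python
import time
from contextlib import contextmanager


class ParseTimeRecorder:
    """Hypothetical recorder: total measured time and longest single slice."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock
        self.total = 0.0
        self.longest = 0.0

    @contextmanager
    def measure(self):
        start = self.clock()
        try:
            yield
        finally:
            # Recorded even if the measured code raises.
            elapsed = self.clock() - start
            self.total += elapsed
            self.longest = max(self.longest, elapsed)
```

Wrapping exactly one function in `with recorder.measure():` makes it unambiguous what is being charged to the parser. Nested uses would double-count, so the sketch assumes the measured regions do not overlap.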
Comment 11 Adam Barth 2013-01-09 16:05:38 PST
Created attachment 182002
HTML parser runtime (measured on chromium-android on a Nexus 7 via inspector instrumentation)

Here are the results from the chromium-android port on a Nexus 7 (using a content_shell build from this afternoon).  The parser takes up more time on the main thread.  For example, on games.yahoo.com the HTML parser takes up 1.2 seconds.  On average, the HTML parser takes 486 ms of main thread time.

The "max" times are also considerably worse on the Nexus 7.  The average "max" value is about 10 frames (60 Hz time slices), with the worst case being 38 frames.
Comment 12 Adam Barth 2013-01-09 16:07:07 PST
> I would like to see measurements done without relying on inspector infrastructure (for example by simple instrumentation code) so we know what exactly is being measured. 

I would prefer to use a tool like instruments as well, but unfortunately the measurement harness we're using is built out of the inspector instrumentation.

> As I said, I think this is worth doing based on the current data already. However, it would be good to have a realistic understanding of what kinds of gains to expect.

I agree.  I'll send an email to webkit-dev.
Comment 13 Tony Gentilcore 2013-01-24 11:53:59 PST
I ran the prototype through the top 25 suite in Telemetry on a Galaxy Nexus (with the exception of Calendar which didn't load with the threaded parser). This benchmark loads cached sites from a local web page replay instance.

Results are preliminary but encouraging:

                  Default  Threaded  Improvement
DOMContentLoaded     4972      4304          13%
ParseHTML total       702       593          14%
ParseHTML avg           9         5          44%
ParseHTML max         309       107          65%

Full results:
https://docs.google.com/a/chromium.org/spreadsheet/ccc?key=0AmVDuVhIZxCTdGdLUlhkbnVUaDlCQ01uVm92S05saHc#gid=0

One suspicious thing is that the absolute value of DOMContentLoaded improved more than ParseHTML. Perhaps due to our doc.write bug, we are actually doing less work on some of the pages. I'm also a little surprised the ParseHTML numbers didn't improve more. That suggests tree building is still taking a fair amount of time.
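
The absolute and relative deltas behind that observation can be recomputed from the table. A small sketch follows; the `improvement` helper is a hypothetical name. Percentages recomputed from the rounded table values may differ by a point from the "Improvement" column, which may have been derived from unrounded data (an assumption).

```python
def improvement(default, threaded):
    """Return (absolute delta, relative improvement in percent)."""
    delta = default - threaded
    return delta, 100.0 * delta / default


# Values copied from the table above (Default, Threaded).
rows = {
    "DOMContentLoaded": (4972, 4304),
    "ParseHTML total": (702, 593),
    "ParseHTML avg": (9, 5),
    "ParseHTML max": (309, 107),
}

for name, (default, threaded) in rows.items():
    delta, pct = improvement(default, threaded)
    print(f"{name:18} delta={delta:5}  improvement={pct:.1f}%")
```

DOMContentLoaded drops by 668 while ParseHTML total drops by only 109 (in the table's units), which is the gap flagged as suspicious.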
Comment 14 Adam Barth 2013-02-22 13:36:11 PST
The spreadsheet for triaging the remaining test failures is at

https://docs.google.com/spreadsheet/ccc?key=0AlC4tS7Ao1fIdE5IbVJESW00V2F5RUIwRDk3WEhMblE&usp=sharing
Comment 15 Eric Seidel (no email) 2013-03-02 02:02:25 PST
This is now on by default in Chromium Canary:
https://groups.google.com/a/chromium.org/forum/#!topic/chromium-dev/hBUVtg7gacE
See the announcement for details on the (substantial) perf win (even for single-core devices!?)

Other ports probably want to wait a couple days before turning this on, in case there are other bugs we should shake out.

Bug 110937 may also block at least Mac WK1 from enabling this for the time being.
Comment 16 Eric Seidel (no email) 2013-03-05 02:05:06 PST
*** Bug 57376 has been marked as a duplicate of this bug. ***
Comment 17 Eric Seidel (no email) 2013-03-05 02:08:11 PST
The parser was disabled on Chromium Canary due to a couple of crashers we fixed today.  It should be back on as of Wednesday's Canary.
Comment 18 Eric Seidel (no email) 2013-03-06 17:34:19 PST
This bug is almost ready to close.  Filed bug 111645 for tracking further perf improvements to the threaded parser codepath.
Comment 19 Adam Barth 2013-03-07 12:37:07 PST
IMHO, we should fix bug 109764 before closing this bug.
Comment 20 Adam Barth 2013-03-11 19:52:49 PDT
The parser appears to work.  :)