Bug 112069

Summary: Threaded HTML Parser should limit speculation to avoid using too much memory
Product: WebKit Reporter: Eric Seidel (no email) <eric>
Component: New BugsAssignee: Eric Seidel (no email) <eric>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, buildbot, dglazkov, esprehn+autocc, ojan.autocc, pfeldman, rniwa, tonyg, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 110547    
Bug Blocks: 111645    
Attachments:
Description Flags
wip, hangs several tests.
none
Patch
none
Patch
none
Patch for landing
none
Patch for landing
none
Patch for landing
none
Patch for landing
none
Patch for landing
none
updated
none
Patch for landing
none
Patch for landing
none
Fixed one test, but still breaks 2 inspector tests somehow
none
Patch for landing none

Description Eric Seidel (no email) 2013-03-11 14:43:00 PDT
Threaded HTML Parser should limit speculation to avoid using too much memory
Comment 1 Eric Seidel (no email) 2013-03-11 14:43:28 PDT
Created attachment 192564 [details]
wip, hangs several tests.
Comment 2 Eric Seidel (no email) 2013-03-11 14:44:13 PDT
Comment on attachment 192564 [details]
wip, hangs several tests.

View in context: https://bugs.webkit.org/attachment.cgi?id=192564&action=review

> Source/WebCore/dom/NodeRenderingContext.cpp:186
> +    // display:none iframes don't need renderers.
> +    // However, if our ownerElement is not yet attached, we can't trust its renderer() being null.
> +    if (m_node->document()->ownerElement()
> +        && m_node->document()->ownerElement()->attached()
> +        && !m_node->document()->ownerElement()->renderer())
> +        return false;

This was from another patch.  ignore.
Comment 3 Eric Seidel (no email) 2013-03-11 17:06:27 PDT
Created attachment 192599 [details]
Patch
Comment 4 Eric Seidel (no email) 2013-03-11 17:09:55 PDT
*** Bug 111893 has been marked as a duplicate of this bug. ***
Comment 5 Adam Barth 2013-03-11 17:23:27 PDT
Comment on attachment 192599 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=192599&action=review

> Source/WebCore/html/parser/HTMLDocumentParser.cpp:-348
> -    if (!m_currentChunk) {
> -        // If there is no m_currentChunk, we must have already called didFailSpeculation
> -        // for this chunk.
> -        // FIXME: In this case, we're losing whatever state has been changed since
> -        // we called didFailSpeculation. See https://bugs.webkit.org/show_bug.cgi?id=110546
> -        return;
> -    }

I'm worried that we're not handling this case anymore.  Maybe this patch fixes it?  We should have test coverage for it.
Comment 6 Eric Seidel (no email) 2013-03-11 18:24:00 PDT
Created attachment 192614 [details]
Patch
Comment 7 Adam Barth 2013-03-11 18:35:09 PDT
Comment on attachment 192614 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=192614&action=review

> Source/WebCore/html/parser/BackgroundHTMLParser.cpp:53
> +static const size_t outstandingCheckpointStopLimit = 10;
> +static const size_t outstandingCheckpointStartLimit = 3;

We should tune these constants based on some performance data.

> Source/WebCore/html/parser/BackgroundHTMLParser.cpp:152
> +    // No need to start speculating until the main thread is nearly caught up.
> +    if (m_input.outstandingCheckpointCount() > outstandingCheckpointStartLimit)
> +        return;

How do you know that we'll actually finish parsing everything?  Don't we need to bypass this check in the finish() call?
Comment 8 Adam Barth 2013-03-11 20:43:03 PDT
(In reply to comment #7)
> (From update of attachment 192614 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=192614&action=review
> 
> > Source/WebCore/html/parser/BackgroundHTMLParser.cpp:53
> > +static const size_t outstandingCheckpointStopLimit = 10;
> > +static const size_t outstandingCheckpointStartLimit = 3;
> 
> We should tune these constants based on some performance data.
> 
> > Source/WebCore/html/parser/BackgroundHTMLParser.cpp:152
> > +    // No need to start speculating until the main thread is nearly caught up.
> > +    if (m_input.outstandingCheckpointCount() > outstandingCheckpointStartLimit)
> > +        return;
> 
> How do you know that we'll actually finish parsing everything?  Don't we need to bypass this check in the finish() call?

This code is correct because the main thread will eventually catch up, which will kick the background thread into action.
Comment 9 WebKit Review Bot 2013-03-11 20:46:22 PDT
Comment on attachment 192614 [details]
Patch

Rejecting attachment 192614 [details] from commit-queue.

Failed to run "['/mnt/git/webkit-commit-queue/Tools/Scripts/webkit-patch', '--status-host=webkit-commit-queue.appspot.com', '--bot-id=gce-cq-04', 'apply-attachment', '--no-update', '--non-interactive', 192614, '--port=chromium-xvfb']" exit_code: 2 cwd: /mnt/git/webkit-commit-queue

Last 500 characters of output:

Hunk #6 succeeded at 486 (offset 2 lines).
Hunk #7 succeeded at 886 (offset 2 lines).
2 out of 7 hunks FAILED -- saving rejects to file Source/WebCore/html/parser/HTMLDocumentParser.cpp.rej
patching file Source/WebCore/html/parser/HTMLDocumentParser.h
Hunk #1 succeeded at 143 (offset 2 lines).
Hunk #2 succeeded at 196 (offset 2 lines).

Failed to run "[u'/mnt/git/webkit-commit-queue/Tools/Scripts/svn-apply', '--force', '--reviewer', 'Adam Barth']" exit_code: 1 cwd: /mnt/git/webkit-commit-queue

Full output: http://webkit-commit-queue.appspot.com/results/17130351
Comment 10 Eric Seidel (no email) 2013-03-11 22:33:37 PDT
Created attachment 192640 [details]
Patch for landing
Comment 11 Eric Seidel (no email) 2013-03-11 22:59:05 PDT
Created attachment 192644 [details]
Patch for landing
Comment 12 Eric Seidel (no email) 2013-03-11 23:04:25 PDT
Created attachment 192646 [details]
Patch for landing
Comment 13 Eric Seidel (no email) 2013-03-11 23:12:46 PDT
Created attachment 192648 [details]
Patch for landing
Comment 14 WebKit Review Bot 2013-03-11 23:17:15 PDT
Comment on attachment 192648 [details]
Patch for landing

Rejecting attachment 192648 [details] from commit-queue.

Failed to run "['/mnt/git/webkit-commit-queue/Tools/Scripts/webkit-patch', '--status-host=webkit-commit-queue.appspot.com', '--bot-id=gce-cq-04', 'validate-changelog', '--non-interactive', 192648, '--port=chromium-xvfb']" exit_code: 1 cwd: /mnt/git/webkit-commit-queue

/mnt/git/webkit-commit-queue/LayoutTests/ChangeLog neither lists a valid reviewer nor contains the string "Unreviewed" or "Rubber stamp" (case insensitive).

Full output: http://webkit-commit-queue.appspot.com/results/17174031
Comment 15 Eric Seidel (no email) 2013-03-11 23:19:28 PDT
Created attachment 192649 [details]
Patch for landing
Comment 16 Eric Seidel (no email) 2013-03-12 04:20:43 PDT
I wonder if this is causing falkiness.  Teh CQ is having trouble landing it.
Comment 17 Eric Seidel (no email) 2013-03-12 04:21:28 PDT
Comment on attachment 192649 [details]
Patch for landing

I'm going to look at this again tomorrow before we land it.
Comment 18 Eric Seidel (no email) 2013-03-13 15:25:00 PDT
Created attachment 193008 [details]
updated
Comment 19 Eric Seidel (no email) 2013-03-13 15:45:45 PDT
Created attachment 193012 [details]
Patch for landing
Comment 20 Eric Seidel (no email) 2013-03-13 16:25:07 PDT
Created attachment 193019 [details]
Patch for landing
Comment 21 WebKit Review Bot 2013-03-13 18:28:09 PDT
Comment on attachment 193019 [details]
Patch for landing

Attachment 193019 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://webkit-commit-queue.appspot.com/results/17194175

New failing tests:
inspector/debugger/dom-breakpoints.html
inspector/audits/audits-panel-functional.html
fast/parser/external-script-document-write.html
http/tests/navigation/lockedhistory-iframe.html
Comment 22 Build Bot 2013-03-13 18:28:47 PDT
Comment on attachment 193019 [details]
Patch for landing

Attachment 193019 [details] did not pass mac-ews (mac):
Output: http://webkit-commit-queue.appspot.com/results/17215030

New failing tests:
fast/parser/external-script-document-write.html
Comment 23 Build Bot 2013-03-13 19:33:30 PDT
Comment on attachment 193019 [details]
Patch for landing

Attachment 193019 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-commit-queue.appspot.com/results/17199079

New failing tests:
fast/parser/external-script-document-write.html
Comment 24 Eric Seidel (no email) 2013-03-13 22:50:47 PDT
The Mac failures are because I failed to add the necessary subresource for the new test in my diff.  I'm not sure about the cr-linux failures.  Investigating.
Comment 25 Eric Seidel (no email) 2013-03-13 23:16:50 PDT
(In reply to comment #21)
> (From update of attachment 193019 [details])
> Attachment 193019 [details] did not pass chromium-ews (chromium-xvfb):
> Output: http://webkit-commit-queue.appspot.com/results/17194175
> 
> New failing tests:
> inspector/debugger/dom-breakpoints.html
> inspector/audits/audits-panel-functional.html
> http/tests/navigation/lockedhistory-iframe.html

These other 3 are real failures from my patch.  I'll investigate.
Comment 26 Eric Seidel (no email) 2013-03-13 23:29:13 PDT
Created attachment 193072 [details]
Fixed one test, but still breaks 2 inspector tests somehow
Comment 27 WebKit Review Bot 2013-03-14 01:10:29 PDT
Comment on attachment 193072 [details]
Fixed one test, but still breaks 2 inspector tests somehow

Attachment 193072 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://webkit-commit-queue.appspot.com/results/17132110

New failing tests:
inspector/debugger/dom-breakpoints.html
inspector/audits/audits-panel-functional.html
http/tests/navigation/lockedhistory-iframe.html
Comment 28 Eric Seidel (no email) 2013-03-14 02:03:36 PDT
I dumped printfs all over the parser.  There are multiple documents being parsed at once in these tests, these limits change the order in which the documents complete.  I think the inspector tests are just racy here, and I'm causing us to now alway lose the race.

If I adjust the outstandingChunkLimit up to 10, then the failures go away. :)
Comment 29 Eric Seidel (no email) 2013-03-14 02:07:15 PDT
Created attachment 193091 [details]
Patch for landing
Comment 30 Eric Seidel (no email) 2013-03-14 02:08:54 PDT
In the end this is just a rather awkward producer/consumer implementation, using the MessageQueue lock to build a semaphore. :)  We could probably change this around to be simpler/more-explicit in future patches.
Comment 31 Eric Seidel (no email) 2013-03-14 02:12:14 PDT
(In reply to comment #28)
> I dumped printfs all over the parser.  There are multiple documents being parsed at once in these tests, these limits change the order in which the documents complete.  I think the inspector tests are just racy here, and I'm causing us to now alway lose the race.
> 
> If I adjust the outstandingChunkLimit up to 10, then the failures go away. :)

I unified the start/stop limits into one.  What was happening was that if your document could parse with fewer than 10 checkpoints it would never yield.  So short documents were beating out longer ones, especially since longer ones got throttled down to 3 checkpoints after the first one. :)

So I removed the separate start/stop limits, and that magically made the larger document able to parse again before the smaller one as it did not have to wait for 7 checkpoints to get processed before it could start pumping again.

Again, I believe this to be a race in the inspector tests for expecting one document to not load before the other but it's not longer possible to trigger it again for the meanwhile. :)
Comment 32 WebKit Review Bot 2013-03-14 02:27:16 PDT
Comment on attachment 193091 [details]
Patch for landing

Clearing flags on attachment: 193091

Committed r145797: <http://trac.webkit.org/changeset/145797>
Comment 33 WebKit Review Bot 2013-03-14 02:27:21 PDT
All reviewed patches have been landed.  Closing bug.
Comment 34 Eric Seidel (no email) 2013-03-14 02:27:36 PDT
To restate:  The previous code worked that if your main document had more than 10 checkpoints (aka chunks) in it (1000 tokens, or a </script> tag cause a chunk/checkpoint) then it would yield, and continue yielding until the main thread had processed enough of those chunks until only 3 outstanding checkpoints/chunks were remaining.  In that time, if it caused a sub-resource load and that load happend to fit under 10 chunks (or better yet, under 3) then it likely would be processed to completion before the main load was.  This is what seemed to be biting the inspector tests.

I didn't dig deep enough to figure out which page (it could even have been the inspector page itself) was being loaded "out of order", but such an investigation would be possible using an earlier version of my patch and some printfs in HTMLDocumentParser.

It was important that the start and stop limits were separate, as that gave time for "little" documents to easily complete parsing in one go before the larger document (which had yielded the background thread) would have processed stop - start tokens and be able to continue/finish parsing.
Comment 35 Eric Seidel (no email) 2013-03-14 02:35:20 PDT
*** Bug 110547 has been marked as a duplicate of this bug. ***