Bug 110547

Summary: Free up background parser's checkpoints when speculation succeeds
Product: WebKit Reporter: Tony Gentilcore <tonyg>
Component: WebCore Misc.Assignee: Eric Seidel (no email) <eric>
Status: RESOLVED DUPLICATE    
Severity: Normal CC: abarth, eric, esprehn+autocc, ojan.autocc, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 112170    
Bug Blocks: 111645, 111893, 112069    
Attachments:
Description Flags
wip
none
Builds, but crashes fast/preloader tests
none
Patch
none
Hopefully addressed Adam's concerns none

Description Tony Gentilcore 2013-02-21 19:56:30 PST
When speculation fails, we send a resumeFrom message to the background thread parser which causes it to free up all its checkpoints. We should send a message when speculation succeeds so that it can free up memory used for previous checkpoints.
Comment 1 Eric Seidel (no email) 2013-03-05 02:14:07 PST
Created attachment 191440 [details]
wip
Comment 2 Eric Seidel (no email) 2013-03-05 02:14:38 PST
I'm not even sure that compiles yet (my laptop is slow).  But I welcome your feedback.  I'm likely to go to sleep before finishing it tonight.
Comment 3 Eric Seidel (no email) 2013-03-05 02:52:43 PST
Created attachment 191448 [details]
Builds, but crashes fast/preloader tests
Comment 4 Eric Seidel (no email) 2013-03-05 02:53:13 PST
Comment on attachment 191448 [details]
Builds, but crashes fast/preloader tests

Old habits. :)  Didn't mean to mark for review.
Comment 5 Eric Seidel (no email) 2013-03-05 03:50:09 PST
Created attachment 191458 [details]
Patch
Comment 6 Eric Seidel (no email) 2013-03-05 04:07:28 PST
Comment on attachment 191458 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=191458&action=review

> Source/WebCore/ChangeLog:10
> +        pages, as we will no longer hold multiple copies of every
> +        source byte during the whole parse.

This is slightly misleading, as this is really only letting go of one copy.  The m_segments and m_checkpoint.input hold onto refcounted StringImpls, so there is just one shared copy which we're now releasing earlier.  I expect this will only effect particularly large loads.
Comment 7 Eric Seidel (no email) 2013-03-06 22:55:50 PST
I believe this patch is ready for review and landing?
Comment 8 Daniel Bates 2013-03-06 23:45:18 PST
Comment on attachment 191458 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=191458&action=review

> Source/WebCore/ChangeLog:12
> +        Many LayoutTests exercise this code path, and I've manually (debugger and printf)

I think you're missing the word "verified" in this sentence.

> Source/WebCore/html/parser/BackgroundHTMLInputStream.cpp:58
> +    const Checkpoint& checkpoint = m_checkpoints[checkpointIndex];

Would it make sense to assert the invariant checkpointIndex < m_checkpoints.size() in this function?
Comment 9 Adam Barth 2013-03-07 00:06:31 PST
Comment on attachment 191458 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=191458&action=review

I should look at this again when I'm less tired.  It's not as beautiful as I had hoped.

> Source/WebCore/html/parser/BackgroundHTMLInputStream.cpp:60
> +    for (size_t i = 0; i < checkpoint.numberOfSegmentsAlreadyAppended; ++i)
> +        m_segments[i] = String();

Does this work correctly in the case where there are multiple checkpoints per append?  Probably.

Also, this algorithm is n^2.  Should we keep some state for the smallest index that has actual data and start the loop from there?

> Source/WebCore/html/parser/HTMLDocumentParser.cpp:462
> +        checkpointIndex = std::max(chunk->inputCheckpoint, checkpointIndex);

Why is this needed?

> Source/WebCore/html/parser/HTMLDocumentParser.cpp:470
> +        // Processing a chunk can call document.write, causing us to invalidate any remaining speculations.
> +        // Similarly, if we're stopped just abort and don't bother to tell the parser about passing this checkpoint.
> +        if (m_speculations.isEmpty() || isStopped()) {
> +            checkpointIndex = 0;
> +            break;
> +        }

This is pretty ugly....
Comment 10 Adam Barth 2013-03-08 11:16:28 PST
Comment on attachment 191458 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=191458&action=review

I realize my complaints below are mostly aesthetic...

> Source/WebCore/html/parser/BackgroundHTMLInputStream.cpp:56
> +void BackgroundHTMLInputStream::releaseDataUpThroughCheckpoint(HTMLInputCheckpoint checkpointIndex)

invalidateCheckpointsBefore ?

>> Source/WebCore/html/parser/BackgroundHTMLInputStream.cpp:58
>> +    const Checkpoint& checkpoint = m_checkpoints[checkpointIndex];
> 
> Would it make sense to assert the invariant checkpointIndex < m_checkpoints.size() in this function?

We should ASSERT that checkpoint hasn't been invalidated yet.

>> Source/WebCore/html/parser/BackgroundHTMLInputStream.cpp:60
>> +        m_segments[i] = String();
> 
> Does this work correctly in the case where there are multiple checkpoints per append?  Probably.
> 
> Also, this algorithm is n^2.  Should we keep some state for the smallest index that has actual data and start the loop from there?

Please fix the n^2 issue.

>> Source/WebCore/html/parser/HTMLDocumentParser.cpp:470
>> +        // Processing a chunk can call document.write, causing us to invalidate any remaining speculations.
>> +        // Similarly, if we're stopped just abort and don't bother to tell the parser about passing this checkpoint.
>> +        if (m_speculations.isEmpty() || isStopped()) {
>> +            checkpointIndex = 0;
>> +            break;
>> +        }
> 
> This is pretty ugly....

There must be a way to structure this code so that it's less subtle.  Maybe we should do this work in validateSpeculations instead?
Comment 11 Eric Seidel (no email) 2013-03-08 11:36:18 PST
Comment on attachment 191458 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=191458&action=review

>> Source/WebCore/html/parser/BackgroundHTMLInputStream.cpp:56
>> +void BackgroundHTMLInputStream::releaseDataUpThroughCheckpoint(HTMLInputCheckpoint checkpointIndex)
> 
> invalidateCheckpointsBefore ?

Happy to name it whatever.  The tricky part for me was that it's inclusive of the checkpoint you pass it invalidateCheckpointsUntil(checkpoint)?

>>> Source/WebCore/html/parser/BackgroundHTMLInputStream.cpp:60
>>> +        m_segments[i] = String();
>> 
>> Does this work correctly in the case where there are multiple checkpoints per append?  Probably.
>> 
>> Also, this algorithm is n^2.  Should we keep some state for the smallest index that has actual data and start the loop from there?
> 
> Please fix the n^2 issue.

OK.  I'll keep a m_lastInvalidSegment or similar.

>> Source/WebCore/html/parser/HTMLDocumentParser.cpp:462
>> +        checkpointIndex = std::max(chunk->inputCheckpoint, checkpointIndex);
> 
> Why is this needed?

Since we can assume the chunks have checkpoints in ascending order, it's not.  We just need the last one.

>>> Source/WebCore/html/parser/HTMLDocumentParser.cpp:470
>>> +        }
>> 
>> This is pretty ugly....
> 
> There must be a way to structure this code so that it's less subtle.  Maybe we should do this work in validateSpeculations instead?

We need to know when the chunk is processed.  I'll have to check when validateSpeculations is called.
Comment 12 Eric Seidel (no email) 2013-03-08 15:16:24 PST
Created attachment 192294 [details]
Hopefully addressed Adam's concerns
Comment 13 Adam Barth 2013-03-08 15:21:02 PST
Comment on attachment 192294 [details]
Hopefully addressed Adam's concerns

View in context: https://bugs.webkit.org/attachment.cgi?id=192294&action=review

I can't tell if you changed HTMLDocumentParser::pumpPendingSpeculations or if I've just read it enough times to understand what it's doing.  :)

> Source/WebCore/html/parser/BackgroundHTMLInputStream.h:72
> +    // Note: These indicies may === vector.size(), in which case there are no valid checkpoints/segments at this time.

===   ->    ==

(Triple-equals is JavaScript silliness.)
Comment 14 Adam Barth 2013-03-08 15:22:36 PST
Comment on attachment 192294 [details]
Hopefully addressed Adam's concerns

View in context: https://bugs.webkit.org/attachment.cgi?id=192294&action=review

> Source/WebCore/html/parser/HTMLDocumentParser.cpp:484
> +        HTMLParserThread::shared()->postTask(bind(&BackgroundHTMLParser::passedCheckpoint, m_backgroundParser, lastCheckpointPassed));

The only problem with this patch is that we'll never call passedCheckpoint if we never call pumpPendingSpeculations.  We only call pumpPendingSpeculations if we ended up queuing speculations.  It's entirely possible that we can process every chunk from the background parser immediately after receiving them.
Comment 15 Eric Seidel (no email) 2013-03-08 15:27:51 PST
(In reply to comment #14)
> (From update of attachment 192294 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=192294&action=review
> 
> > Source/WebCore/html/parser/HTMLDocumentParser.cpp:484
> > +        HTMLParserThread::shared()->postTask(bind(&BackgroundHTMLParser::passedCheckpoint, m_backgroundParser, lastCheckpointPassed));
> 
> The only problem with this patch is that we'll never call passedCheckpoint if we never call pumpPendingSpeculations.  We only call pumpPendingSpeculations if we ended up queuing speculations.  It's entirely possible that we can process every chunk from the background parser immediately after receiving them.

That's true.  I'm not sure if I should do that in a follow-up, or go another round with this one.  I'm open to either.
Comment 16 Eric Seidel (no email) 2013-03-08 15:29:04 PST
(In reply to comment #15)
> (In reply to comment #14)
> > (From update of attachment 192294 [details] [details])
> > View in context: https://bugs.webkit.org/attachment.cgi?id=192294&action=review
> > 
> > > Source/WebCore/html/parser/HTMLDocumentParser.cpp:484
> > > +        HTMLParserThread::shared()->postTask(bind(&BackgroundHTMLParser::passedCheckpoint, m_backgroundParser, lastCheckpointPassed));
> > 
> > The only problem with this patch is that we'll never call passedCheckpoint if we never call pumpPendingSpeculations.  We only call pumpPendingSpeculations if we ended up queuing speculations.  It's entirely possible that we can process every chunk from the background parser immediately after receiving them.
> 
> That's true.  I'm not sure if I should do that in a follow-up, or go another round with this one.  I'm open to either.

It seems inefficient to tell the background parser to release a chunk after each chunk.  But maybe I'm optimizing prematurely.  I have no data to suggest that messaging is expensive at all. :)
Comment 17 Adam Barth 2013-03-08 15:32:14 PST
Followup patch is fine.
Comment 18 WebKit Review Bot 2013-03-08 15:41:32 PST
Comment on attachment 192294 [details]
Hopefully addressed Adam's concerns

Clearing flags on attachment: 192294

Committed r145277: <http://trac.webkit.org/changeset/145277>
Comment 19 WebKit Review Bot 2013-03-08 15:41:36 PST
All reviewed patches have been landed.  Closing bug.
Comment 20 Eric Seidel (no email) 2013-03-11 17:08:50 PDT
This work continued in bug 112069.
Comment 21 WebKit Review Bot 2013-03-12 11:12:05 PDT
Re-opened since this is blocked by bug 112170
Comment 22 Eric Seidel (no email) 2013-03-14 02:35:20 PDT

*** This bug has been marked as a duplicate of bug 112069 ***