Summary: | Free up background parser's checkpoints when speculation succeeds | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Tony Gentilcore <tonyg> | ||||||||||
Component: | WebCore Misc. | Assignee: | Eric Seidel (no email) <eric> | ||||||||||
Status: | RESOLVED DUPLICATE | ||||||||||||
Severity: | Normal | CC: | abarth, eric, esprehn+autocc, ojan.autocc, webkit.review.bot | ||||||||||
Priority: | P2 | ||||||||||||
Version: | 528+ (Nightly build) | ||||||||||||
Hardware: | Unspecified | ||||||||||||
OS: | Unspecified | ||||||||||||
Bug Depends on: | 112170 | ||||||||||||
Bug Blocks: | 111645, 111893, 112069 | ||||||||||||
Attachments: |
|
Description
Tony Gentilcore
2013-02-21 19:56:30 PST
Created attachment 191440 [details]
wip
I'm not even sure that compiles yet (my laptop is slow). But I welcome your feedback. I'm likely to go to sleep before finishing it tonight. Created attachment 191448 [details]
Builds, but crashes fast/preloader tests
Comment on attachment 191448 [details]
Builds, but crashes fast/preloader tests
Old habits. :) Didn't mean to mark for review.
Created attachment 191458 [details]
Patch
Comment on attachment 191458 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=191458&action=review > Source/WebCore/ChangeLog:10 > + pages, as we will no longer hold multiple copies of every > + source byte during the whole parse. This is slightly misleading, as this is really only letting go of one copy. The m_segments and m_checkpoint.input hold onto refcounted StringImpls, so there is just one shared copy which we're now releasing earlier. I expect this will only effect particularly large loads. I believe this patch is ready for review and landing? Comment on attachment 191458 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=191458&action=review > Source/WebCore/ChangeLog:12 > + Many LayoutTests exercise this code path, and I've manually (debugger and printf) I think you're missing the word "verified" in this sentence. > Source/WebCore/html/parser/BackgroundHTMLInputStream.cpp:58 > + const Checkpoint& checkpoint = m_checkpoints[checkpointIndex]; Would it make sense to assert the invariant checkpointIndex < m_checkpoints.size() in this function? Comment on attachment 191458 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=191458&action=review I should look at this again when I'm less tired. It's not as beautiful as I had hoped. > Source/WebCore/html/parser/BackgroundHTMLInputStream.cpp:60 > + for (size_t i = 0; i < checkpoint.numberOfSegmentsAlreadyAppended; ++i) > + m_segments[i] = String(); Does this work correctly in the case where there are multiple checkpoints per append? Probably. Also, this algorithm is n^2. Should we keep some state for the smallest index that has actual data and start the loop from there? > Source/WebCore/html/parser/HTMLDocumentParser.cpp:462 > + checkpointIndex = std::max(chunk->inputCheckpoint, checkpointIndex); Why is this needed? > Source/WebCore/html/parser/HTMLDocumentParser.cpp:470 > + // Processing a chunk can call document.write, causing us to invalidate any remaining speculations. > + // Similarly, if we're stopped just abort and don't bother to tell the parser about passing this checkpoint. > + if (m_speculations.isEmpty() || isStopped()) { > + checkpointIndex = 0; > + break; > + } This is pretty ugly.... Comment on attachment 191458 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=191458&action=review I realize my complaints below are mostly aesthetic... > Source/WebCore/html/parser/BackgroundHTMLInputStream.cpp:56 > +void BackgroundHTMLInputStream::releaseDataUpThroughCheckpoint(HTMLInputCheckpoint checkpointIndex) invalidateCheckpointsBefore ? >> Source/WebCore/html/parser/BackgroundHTMLInputStream.cpp:58 >> + const Checkpoint& checkpoint = m_checkpoints[checkpointIndex]; > > Would it make sense to assert the invariant checkpointIndex < m_checkpoints.size() in this function? We should ASSERT that checkpoint hasn't been invalidated yet. >> Source/WebCore/html/parser/BackgroundHTMLInputStream.cpp:60 >> + m_segments[i] = String(); > > Does this work correctly in the case where there are multiple checkpoints per append? Probably. > > Also, this algorithm is n^2. Should we keep some state for the smallest index that has actual data and start the loop from there? Please fix the n^2 issue. >> Source/WebCore/html/parser/HTMLDocumentParser.cpp:470 >> + // Processing a chunk can call document.write, causing us to invalidate any remaining speculations. >> + // Similarly, if we're stopped just abort and don't bother to tell the parser about passing this checkpoint. >> + if (m_speculations.isEmpty() || isStopped()) { >> + checkpointIndex = 0; >> + break; >> + } > > This is pretty ugly.... There must be a way to structure this code so that it's less subtle. Maybe we should do this work in validateSpeculations instead? Comment on attachment 191458 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=191458&action=review >> Source/WebCore/html/parser/BackgroundHTMLInputStream.cpp:56 >> +void BackgroundHTMLInputStream::releaseDataUpThroughCheckpoint(HTMLInputCheckpoint checkpointIndex) > > invalidateCheckpointsBefore ? Happy to name it whatever. The tricky part for me was that it's inclusive of the checkpoint you pass it invalidateCheckpointsUntil(checkpoint)? >>> Source/WebCore/html/parser/BackgroundHTMLInputStream.cpp:60 >>> + m_segments[i] = String(); >> >> Does this work correctly in the case where there are multiple checkpoints per append? Probably. >> >> Also, this algorithm is n^2. Should we keep some state for the smallest index that has actual data and start the loop from there? > > Please fix the n^2 issue. OK. I'll keep a m_lastInvalidSegment or similar. >> Source/WebCore/html/parser/HTMLDocumentParser.cpp:462 >> + checkpointIndex = std::max(chunk->inputCheckpoint, checkpointIndex); > > Why is this needed? Since we can assume the chunks have checkpoints in ascending order, it's not. We just need the last one. >>> Source/WebCore/html/parser/HTMLDocumentParser.cpp:470 >>> + } >> >> This is pretty ugly.... > > There must be a way to structure this code so that it's less subtle. Maybe we should do this work in validateSpeculations instead? We need to know when the chunk is processed. I'll have to check when validateSpeculations is called. Created attachment 192294 [details]
Hopefully addressed Adam's concerns
Comment on attachment 192294 [details] Hopefully addressed Adam's concerns View in context: https://bugs.webkit.org/attachment.cgi?id=192294&action=review I can't tell if you changed HTMLDocumentParser::pumpPendingSpeculations or if I've just read it enough times to understand what it's doing. :) > Source/WebCore/html/parser/BackgroundHTMLInputStream.h:72 > + // Note: These indicies may === vector.size(), in which case there are no valid checkpoints/segments at this time. === -> == (Triple-equals is JavaScript silliness.) Comment on attachment 192294 [details] Hopefully addressed Adam's concerns View in context: https://bugs.webkit.org/attachment.cgi?id=192294&action=review > Source/WebCore/html/parser/HTMLDocumentParser.cpp:484 > + HTMLParserThread::shared()->postTask(bind(&BackgroundHTMLParser::passedCheckpoint, m_backgroundParser, lastCheckpointPassed)); The only problem with this patch is that we'll never call passedCheckpoint if we never call pumpPendingSpeculations. We only call pumpPendingSpeculations if we ended up queuing speculations. It's entirely possible that we can process every chunk from the background parser immediately after receiving them. (In reply to comment #14) > (From update of attachment 192294 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=192294&action=review > > > Source/WebCore/html/parser/HTMLDocumentParser.cpp:484 > > + HTMLParserThread::shared()->postTask(bind(&BackgroundHTMLParser::passedCheckpoint, m_backgroundParser, lastCheckpointPassed)); > > The only problem with this patch is that we'll never call passedCheckpoint if we never call pumpPendingSpeculations. We only call pumpPendingSpeculations if we ended up queuing speculations. It's entirely possible that we can process every chunk from the background parser immediately after receiving them. That's true. I'm not sure if I should do that in a follow-up, or go another round with this one. I'm open to either. (In reply to comment #15) > (In reply to comment #14) > > (From update of attachment 192294 [details] [details]) > > View in context: https://bugs.webkit.org/attachment.cgi?id=192294&action=review > > > > > Source/WebCore/html/parser/HTMLDocumentParser.cpp:484 > > > + HTMLParserThread::shared()->postTask(bind(&BackgroundHTMLParser::passedCheckpoint, m_backgroundParser, lastCheckpointPassed)); > > > > The only problem with this patch is that we'll never call passedCheckpoint if we never call pumpPendingSpeculations. We only call pumpPendingSpeculations if we ended up queuing speculations. It's entirely possible that we can process every chunk from the background parser immediately after receiving them. > > That's true. I'm not sure if I should do that in a follow-up, or go another round with this one. I'm open to either. It seems inefficient to tell the background parser to release a chunk after each chunk. But maybe I'm optimizing prematurely. I have no data to suggest that messaging is expensive at all. :) Followup patch is fine. Comment on attachment 192294 [details] Hopefully addressed Adam's concerns Clearing flags on attachment: 192294 Committed r145277: <http://trac.webkit.org/changeset/145277> All reviewed patches have been landed. Closing bug. This work continued in bug 112069. Re-opened since this is blocked by bug 112170 *** This bug has been marked as a duplicate of bug 112069 *** |