Bug 14757

Summary: REGRESSION: HTMLTokenizer::processingData implementation is incorrect
Product: WebKit Reporter: Matt Perry <mpComplete>
Component: Page LoadingAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: ddkilzer, mitz
Priority: P1 Keywords: InRadar, Regression
Version: 523.x (Safari 3)   
Hardware: All   
OS: All   
Attachments:
Description Flags
patch to HTMLTokenizer
hyatt: review-
above patch with test hang fix
darin: review-
fixed patch with test
none
patch with assertion fix
beidson: review-
Final patch with tweak I suggested to previous patch darin: review+

Description Matt Perry 2007-07-24 14:01:44 PDT
Here is the implementation:
bool HTMLTokenizer::processingData() const
{
    return m_timer.isActive();
}

The problem is that this function can be called from within m_timer's fired callback.  A non-repeating timer that is being fired is not active (maybe *this* is the real bug?).  However, the tokenizer actually is processing data while inside its Timer::fired callback.  This function returns an incorrect result in that case.

I saw this problem surface when loading a page with frames on a slow machine.  Here's the sequence of events:
- The parsing of the HTML was deferred and put on a timer (HTMLTokenizer::m_timer).
- When that timer fired, one of the subframes was created.
- Initializing that subframe caused the FrameLoaders to do a recursiveCheckLoadComplete.
- Since HTMLTokenizer::processingData returned false incorrectly, the toplevel frame was assumed to have been done loading, and marked as such incorrectly.
- Frame load events were then dispatched out of order (dispatchDidFinishLoad was called for the toplevel frame before subframes).

And here's the corresponding sample callstack from a run of Safari on Mac:

#0  WebCore::TimerBase::isActive (this=0x28bded8) at /Users/mpcomplete/src/WebKit/WebCore/platform/Timer.cpp:188
#1  0x0101e13a in WebCore::HTMLTokenizer::processingData (this=0x28bde00) at /Users/mpcomplete/src/WebKit/WebCore/html/HTMLTokenizer.cpp:1497
#2  0x0138dcd6 in WebCore::DocumentLoader::isLoadingInAPISense (this=0x2926200) at /Users/mpcomplete/src/WebKit/WebCore/loader/DocumentLoader.cpp:468
#3  0x01383ac2 in WebCore::FrameLoader::checkLoadCompleteForThisFrame (this=0x281fa00) at /Users/mpcomplete/src/WebKit/WebCore/loader/FrameLoader.cpp:2777
#4  0x01383d38 in WebCore::FrameLoader::recursiveCheckLoadComplete (this=0x281fa00) at /Users/mpcomplete/src/WebKit/WebCore/loader/FrameLoader.cpp:2899
#5  0x01383dd6 in WebCore::FrameLoader::checkLoadComplete (this=0x2927c00) at /Users/mpcomplete/src/WebKit/WebCore/loader/FrameLoader.cpp:2910
#6  0x01388e72 in WebCore::FrameLoader::checkCompleted (this=0x2927c00) at /Users/mpcomplete/src/WebKit/WebCore/loader/FrameLoader.cpp:1185
#7  0x0138b93f in WebCore::FrameLoader::endIfNotLoadingMainResource (this=0x2927c00) at /Users/mpcomplete/src/WebKit/WebCore/loader/FrameLoader.cpp:1006
#8  0x0138b9b5 in WebCore::FrameLoader::end (this=0x2927c00) at /Users/mpcomplete/src/WebKit/WebCore/loader/FrameLoader.cpp:978
#9  0x0138f85c in WebCore::DocumentLoader::finishedLoading (this=0x28c4800) at /Users/mpcomplete/src/WebKit/WebCore/loader/DocumentLoader.cpp:319
#10 0x0138bbb2 in WebCore::FrameLoader::init (this=0x2927c00) at /Users/mpcomplete/src/WebKit/WebCore/loader/FrameLoader.cpp:265
#11 0x010b6d58 in WebCore::Frame::init (this=0x15e74360) at /Users/mpcomplete/src/WebKit/WebCore/page/Frame.cpp:219
#12 0x0032b4b4 in -[WebFrameBridge finishInitializingWithPage:frameName:frameView:ownerElement:] (self=0x15e35410, _cmd=0x3ac664, page=0x2148060, name=0x15e6d0f0, frameView=0x15e6dc10, ownerElement=0x15e758e0) at /Users/mpcomplete/src/WebKit/WebKit/WebCoreSupport/WebFrameBridge.mm:147
#13 0x00327f1c in -[WebFrameBridge initSubframeWithOwnerElement:frameName:frameView:] (self=0x15e35410, _cmd=0x3ac6d0, ownerElement=0x15e758e0, name=0x15e6d0f0, frameView=0x15e6dc10) at /Users/mpcomplete/src/WebKit/WebKit/WebCoreSupport/WebFrameBridge.mm:162
#14 0x0032b21f in -[WebFrameBridge createChildFrameNamed:withURL:referrer:ownerElement:allowsScrolling:marginWidth:marginHeight:] (self=0x2147160, _cmd=0x3ac834, frameName=0x15e6d0f0, URL=0x15e75970, referrer=@0xbfffe49c, ownerElement=0x15e758e0, allowsScrolling=1 '\001', width=-1, height=-1) at /Users/mpcomplete/src/WebKit/WebKit/WebCoreSupport/WebFrameBridge.mm:349
#15 0x003900a3 in WebFrameLoaderClient::createFrame (this=0x21464d0, url=@0xbfffe56c, name=@0x15e75954, ownerElement=0x15e758e0, referrer=@0xbfffe49c, allowsScrolling=1, marginWidth=-1, marginHeight=-1) at /Users/mpcomplete/src/WebKit/WebKit/WebCoreSupport/WebFrameLoaderClient.mm:1163
#16 0x0138c045 in WebCore::FrameLoader::loadSubframe (this=0x281fa00, ownerElement=0x15e758e0, url=@0xbfffe56c, name=@0x15e75954, referrer=@0x281fb24) at /Users/mpcomplete/src/WebKit/WebCore/loader/FrameLoader.cpp:452
#17 0x0138d32b in WebCore::FrameLoader::requestFrame (this=0x281fa00, ownerElement=0x15e758e0, urlString=@0x15e75950, frameName=@0x15e75954) at /Users/mpcomplete/src/WebKit/WebCore/loader/FrameLoader.cpp:422
#18 0x0136c092 in WebCore::HTMLFrameElementBase::openURL (this=0x15e758e0) at /Users/mpcomplete/src/WebKit/WebCore/html/HTMLFrameElementBase.cpp:107
#19 0x0136c5af in WebCore::HTMLFrameElementBase::openURLCallback (n=0x15e758e0) at /Users/mpcomplete/src/WebKit/WebCore/html/HTMLFrameElementBase.cpp:158
#20 0x010d72f5 in WebCore::ContainerNode::attach (this=0x15e758e0) at /Users/mpcomplete/src/WebKit/WebCore/dom/ContainerNode.cpp:605
#21 0x0122f338 in WebCore::Element::attach (this=0x15e758e0) at /Users/mpcomplete/src/WebKit/WebCore/dom/Element.cpp:664
#22 0x0136b7e7 in WebCore::HTMLFrameElementBase::attach (this=0x15e758e0) at /Users/mpcomplete/src/WebKit/WebCore/html/HTMLFrameElementBase.cpp:192
#23 0x0126ee31 in WebCore::HTMLFrameElement::attach (this=0x15e758e0) at /Users/mpcomplete/src/WebKit/WebCore/html/HTMLFrameElement.cpp:66
#24 0x0101d18d in WebCore::HTMLParser::insertNode (this=0x15e3e8a0, n=0x15e758e0, flat=false) at /Users/mpcomplete/src/WebKit/WebCore/html/HTMLParser.cpp:328
#25 0x0101da35 in WebCore::HTMLParser::parseToken (this=0x15e3e8a0, t=0x28bde14) at /Users/mpcomplete/src/WebKit/WebCore/html/HTMLParser.cpp:250
#26 0x0101f818 in WebCore::HTMLTokenizer::processToken (this=0x28bde00) at /Users/mpcomplete/src/WebKit/WebCore/html/HTMLTokenizer.cpp:1641
#27 0x010229ed in WebCore::HTMLTokenizer::parseTag (this=0x28bde00, src=@0x28be304, state={static EntityShift = 4, m_bits = 0}) at /Users/mpcomplete/src/WebKit/WebCore/html/HTMLTokenizer.cpp:1206
#28 0x010233c4 in WebCore::HTMLTokenizer::write (this=0x28bde00, str=@0xbfffebdc, appendData=true) at /Users/mpcomplete/src/WebKit/WebCore/html/HTMLTokenizer.cpp:1437
#29 0x0101f20b in WebCore::HTMLTokenizer::timerFired (this=0x28bde00) at /Users/mpcomplete/src/WebKit/WebCore/html/HTMLTokenizer.cpp:1517
#30 0x014a7b57 in WebCore::Timer<WebCore::HTMLTokenizer>::fired (this=0x28bded8) at Timer.h:96
#31 0x011f1026 in WebCore::TimerBase::fireTimers (fireTime=1185310107.954752, firingTimers=@0xbfffecbc) at /Users/mpcomplete/src/WebKit/WebCore/platform/Timer.cpp:336
#32 0x011f10c3 in WebCore::TimerBase::sharedTimerFired () at /Users/mpcomplete/src/WebKit/WebCore/platform/Timer.cpp:353
#33 0x011f0763 in timerFired () at /Users/mpcomplete/src/WebKit/WebCore/platform/mac/SharedTimerMac.cpp:48
Comment 1 Matt Perry 2007-07-24 15:54:53 PDT
Created attachment 15674 [details]
patch to HTMLTokenizer

I'm going the simple route and changing HTMLTokenizer::processingToken().  It should also return true when inside the ::write() call (which it will be when the timer fires).
Comment 2 Darin Adler 2007-07-25 09:58:47 PDT
Can you make a test case to demonstrate the issue here? What would we need added to the testing infrastructure to demonstrate the problem this causes?
Comment 3 Matt Perry 2007-07-25 15:32:33 PDT
To be clear, the symptom of this problem that I saw was that WebFrameLoaderClient::dispatchDidFinishLoad was called for the toplevel frame before it was called for its subframes (and thus, obviously before it was actually finished loading).

To add a test case, we would need some way to force HTMLTokenizer to defer its processing using m_timer, so that it processed the <frame> element (thus creating the Frame object) from the timer callback.  Then we would need a way to check that dispatchDidFinishLoad for the toplevel frame was called after dispatchDidFinishLoad for a subframe.

However, it seems to me that a new test is unnecessary, since processingData() should obviously return true if HTMLTokenizer is still inside the write() function.
Comment 4 Dave Hyatt 2007-07-25 16:17:10 PDT
Comment on attachment 15674 [details]
patch to HTMLTokenizer

r=me
Comment 5 Matt Perry 2007-07-25 16:23:06 PDT
Damn, hang on.  http/tests/misc/window-dot-stop.html hangs with this patch.  Investigating.
Comment 6 Dave Hyatt 2007-07-25 16:29:35 PDT
Comment on attachment 15674 [details]
patch to HTMLTokenizer

Clearing flag.
Comment 7 Dave Hyatt 2007-07-25 16:30:06 PDT
Minusing while mpComplete investigates.

Comment 8 Matt Perry 2007-07-25 18:23:56 PDT
Ok, this test hangs for the same reason that I was seeing the bug to begin with.  There is a JS call to window.stop() in the body of this test, which gets executed synchronously while parsing the document.  That means that we cancel the load of the page from within HTMLTokenizer::write.  As soon as we cancel, we check if the load is complete - since we're still parsing, this check fails.

I believe a reasonable fix would be to do the checkLoadComplete call asynchronously in this case.  (Of course, we'd still have to cancel the load mid-parsing.)  If you guys agree, I'll have a patch ready shortly.
Comment 9 Matt Perry 2007-07-25 18:54:18 PDT
Created attachment 15688 [details]
above patch with test hang fix

This patch does what I described in my previous comment - FrameLoader::stopForUserCancel (called from window.stop) does the loadComplete check asynchronously.
Comment 10 Darin Adler 2007-07-25 20:05:10 PDT
Comment on attachment 15688 [details]
above patch with test hang fix

(In reply to comment #3)
> However, it seems to me that a new test is unnecessary, since processingData()
> should obviously return true if HTMLTokenizer is still inside the write()
> function.

Sorry, I think there's a misunderstanding here.

A new test is necessary. Proving that the fix is correct is only one reason to add a test. The more important reason for a test is to prevent the bug from being reintroduced later on.

We definitely need to add a test for this unless it's impractical. The patch should include that test.

+    //return m_timer.isActive();

We don't check in commented-out code.
Comment 11 Maciej Stachowiak 2007-07-25 22:11:17 PDT
I agree with Darin that this needs a regression test. It should be possible to make one. You can trigger use of the tokenizer timer deliberately with a long-running script and then more content after it. If you make it an http test, you can even control what individual chunks of content get fed back by making a CGI script that pauses as needed. You should be able to observe the symptom as "load" events getting delivered out of order. With an http/tests/loading test, you could even observe the order of delegate methods.
Comment 12 Dave Hyatt 2007-07-26 13:24:13 PDT
This is a pretty bad regression.  Elevating to P1.  We need a corresponding Radar for this issue.
Comment 13 Matt Perry 2007-07-26 14:02:12 PDT
Created attachment 15693 [details]
fixed patch with test

Here's the above patch, with regression test.

Turns out the test was easier to make than I expected.  Thanks for the advice, Maciej!  I didn't know there was already a framework for dumping the frame load callbacks.

And sorry about the commented code.  I was in a rush and didn't look closely at the previous patch.
Comment 14 Matt Perry 2007-07-26 17:14:55 PDT
Wait, there's a problem with this patch.  I hit an assert in FrameLoader::checkLoadComplete sometimes when closing a window.  It seems stopForUserCancel can be called while destroying the WebView.  Since checkLoadComplete is now called asynchronously, the WebView is gone by that time, which it doesn't like.
Comment 15 David Kilzer (:ddkilzer) 2007-07-26 22:04:51 PDT
<rdar://problem/5364692>
Comment 16 Matt Perry 2007-07-27 12:49:04 PDT
I've been brainstorming a few solutions to the assertion problem I had, and I'm not sure what the right approach is.  Maybe someone knows the answer.

Here are the options:
1. Just check if m_client->hasWebView() (what the assertion asserts) before calling checkLoadComplete.  Feels like a bandaid.  Also, this would mean we wouldn't fire the cancel load events when closing a webview while loading.
2. Add some more logic to HTMLTokenizer::processingData to detect that the load has been cancelled mid-parse (set a flag in stopParsing), and if so, return false.  I think this might mean the load events would still be fired out of order for a cancelled load, but I haven't tested.
3. Add some logic to FrameLoader::stopForUserCancel that decides whether to checkLoadComplete asynchronously or synchronously, based on whether the tokenizer is still parsing.  I think this would work for all cases, but it feels hacky.
4. Leave stopForUserCancel alone, and do the async checkLoadComplete only for the window.stop case.  This would fix the hang I described in comment #8, while also fixing the original problem.

Hmm.. now that I've talked myself through this, I think I like option 4 the best.
Comment 17 Matt Perry 2007-07-27 14:09:59 PDT
Created attachment 15709 [details]
patch with assertion fix

This is like my previous patch, except:
- FrameLoader::stopForUserCancel does what it always did.
- DOMWindow::stop (window.stop) no longer calls stopForUserCancel.  It cancels the load and then schedules a load-complete check asynchronously.

This fixes the original bug while still passing all tests and not hitting the assertion I had in comment #8.
Comment 18 Brady Eidson 2007-07-31 12:43:52 PDT
Comment on attachment 15709 [details]
patch with assertion fix

I think this fix is pretty darned good.  
r- for one issue.
You replace the stopForUserCancel() call with a stopAllLoaders() and scheduleCheckLoadComplete() call.
It is not obvious to me that those two lines replace the behavior of "stopForUserCancell()" but just do it asynchronously.
I think it'd be better for future maintainability if stopForUserCancel() had an asynchronous flag added, and that method decides whether to do the load complete check now, or later.
Comment 19 Brady Eidson 2007-07-31 13:00:01 PDT
Created attachment 15763 [details]
Final patch with tweak I suggested to previous patch
Comment 20 Darin Adler 2007-07-31 13:16:43 PDT
Comment on attachment 15763 [details]
Final patch with tweak I suggested to previous patch

+    if (m_frame->page())
+        checkLoadComplete();

The checkLoadComplete() function checks to see if the page is 0 -- why do we need an additional check in the caller? Is the problem with the assertion? If so, maybe we should move the assertion inside the if statement.

+    if (m_frame->page()) {
+        if (asynchronous)
+            scheduleCheckLoadComplete();
+        else
+            checkLoadComplete();
+    }

Also seems unnecessary to check for page of 0 here.

I don't understand the meaning of the "asynchronous" parameter in stopForUserCancel. The stop itself seems to be immediate no matter what, so the only that's deferred is the "check load complete". Given that, I think the parameter name is quite unclear.

Otherwise looks, good, and these are really nitpicks, so r=me.
Comment 21 Brady Eidson 2007-07-31 13:34:42 PDT
I changed the name of the async parameter, and made a small tweak to address one of the page() null check concerns.  I decided against making further changes since they were not necessary and might have unforeseen consequences that we'd rather not deal with.

Committed revision 24798.