Bug 9001

Summary: Javascript stops running before replacement page data arrives
Product: WebKit Reporter: Geoffrey Garen <ggaren>
Component: Page LoadingAssignee: Alexey Proskuryakov <ap>
Status: RESOLVED FIXED    
Severity: Normal CC: grahamperrin, ian
Priority: P2 Keywords: InRadar
Version: 420+   
Hardware: Mac   
OS: OS X 10.4   
Attachments:
Description Flags
HTML file demonstrating bug
none
Helper file for demonstrating bug
none
David's latest patch
ggaren: review-
test case ggaren: review+

Geoffrey Garen
Reported 2006-05-19 09:56:42 PDT
This bug is also in Radar as <rdar://4477632> * SUMMARY I'm working on a combination of JavaScript and CGI that gives a progress bar while a file is being uploaded. I can't do it in the background with XMLHttpRequest, because Javascript doesn't allow access to file contents. (Grrr.) Thus, the only workaround is to track progress on the server and use client-side code to poll it periodically during the form submission. Problem is... it doesn't work in Safari. It works fine in Firefox, but in Safari as soon as you click a link or start a form submission, all Javascript execution stops cold. It's really annoying, and the only workaround I can find (using a pop-up window for status reporting) is a bit of a kludge. I've attached a simple test page that demonstrates the problem. To test this, load the page timertest.html, then click the link to start either the internal (same page) or external (pop-up) timer, then click to start the test. In Firefox, you get he expected behavior葉he number continues to increment once a second while the new page tries to load. In Safari, the number stops changing as soon as you click the test link. ------------------------------------------- <GMT15-Mar-2006 01:21:05GMT> David Gatwood: I just checked, and this mechanism also works on IE5 (Mac), Netscape 7.1 (Mac), and Opera (Mac), as well as IE 6 (Win) and Netscape 7.1 (Win). So it looks like Safari is the only major browser where this code doesn't work. <GMT22-Mar-2006 23:35:35GMT> David Gatwood: Proposed patch attached. The problem was caused by the call to Frame::pauseTimeouts from within saveDocumentToPageCache, which was, in turn, called from WebFrame's _createPageCacheForItem method. Because this was being called before the data source was committed, the javascript code stopped functioning, causing my JavaScript progress bar to stand still. This patch wraps the _createPageCacheForItem message in a test for (_private->state == WebFrameStateCommittedPage) instead of (_private->state == WebFrameStateProvisional), thus causing the JavaScript code to continue executing until the last practical moment. This two line change fixes the misbehavior noted. Backwards/forwards caching appears to still work, at least in very limited testing. <GMT28-Mar-2006 23:54:11GMT> Alice Liu: Don't worry David - the milestone being "Later" doesn't preclude your fix, if approved, from being integrated into Leopard. =) Geoff please review David's patch. <GMT30-Mar-2006 00:39:58GMT> Geoff Garen: David, It's great that you took this bug into your own hands. However, I don't think the fix you proposed will work. A load becomes committed when its receives its first byte of data. At that time, it's almost certain that isLoading is true. Therefore, your patch would result in WebKit never adding items to the back/forward cache. I confirmed this theory by setting a breakpoint on _createPageCacheForItem and browsing around www.google.com. I'm assigning this back to you in case you want to take another shot at it. <GMT30-Mar-2006 03:49:06GMT> David Gatwood: At the point where this happens, previousItem is null for the first page, so we obviously should be looking at currentItem. However, when I do this, backwards caching appears to work, but going back forwards results in a bus error. So it looks like _createPageCacheForItem will have to take an alternate data source, cached from the previous one, and if there is no previous data source, no cache entry should get stored. It is definitely using the caching with these changes, as I can disconnect from the network and go backwards and forwards without any problems. I ended up fixing a NULL pointer dereference in scheduleRelayout in which it checks to see if m_frame->document is NULL, but doesn't check to see if m_frame is NULL, and for some reason, it is. I'm not sure why m_frame is ending up being NULL, though, and that concerns me that this may be introducing some very subtle regression in back/forward cache behavior. I sure can't see it if it is, though. I've attached the updated bug as javascript_hang-revised.patch. I'm not sure how concerned to be about the m_frame issue. With the updated changes, it still passes all the regression tests except two that failed before I started touching things. Let me know what you think. <GMT01-Apr-2006 23:09:32GMT> Geoff Garen: This patch worries me. I don't think either of us fully understands its repercussions. It's very easy to cause nasty regressions in this party of the code (I speak from experience here), and none of our regression tests cover it. (In fact, DumpRenderTree runs with the back/forward cache turned off.) Some comments: 1. I think you only need to track previousDataSource and previousDocumentView. You can use those later to discover whether the previous data source was loading, etc. 2. The commented-out code in Window::Alert is just confusing. 3. Was the change to CursorMac.mm intentional? What does it do? 4. To match "_private->previousDataSource = NULL;", you should probably use "[_private->previousDataSource release];" instead of "[dataSource release];" In fact, it would probably be clearer if you just used "_private->previousDataSource" instead of "dataSource". 5. The log is probably only appropriate in the case where _private->previousDataSource exists. The fact that we don't save pages to the b/f cache when you first arrive is not news; it's only if a page fails to save when you leave that's news. So "else if (_private->previousDataSource) {" is probably a better way to go. Generally, this approach seems sound. I don't think we can land it until we know why m_frame is now NULL in a situation where it wasn't before, though. <GMT03-Apr-2006 18:29:40GMT> David Gatwood: Re #1: good point. That will make things a lot cleaner. Ignore #2. This was me trying to figure out whether the script execution had actually stopped or the results simply were not being displayed. Ignore #3, also. This was me trying to work around a compile problem of some sort that turned out to just be a a case of me using the wrong gcc version.... (BTW, the webkit pages should really address which compiler to use. :-) Will take care of #4 and #5 and will try to figure out what's causing the null pointer regression. <GMT04-Apr-2006 01:39:48GMT> David Gatwood: Okay, I'm still not sure whether the failure in that call would actually cause any problems or not, but this revised patch avoids the question entirely by moving the caching code earlier in the chain of events that lead to committing the new data---far enough back that backing data hasn't been obliterated, but far enough forward that it doesn't stop the Javascript interpreter until data has actually been received. I've attached the updated patch as javascript_hang-revised-3.patch. <GMT04-Apr-2006 16:54:55GMT> Geoff Garen: I'm confused. javascript_hang-revised-3.patch doesn't seem to address #4 or #5, and it still includes the NULL check for m_frame. Generally, I'm reluctant to add things to _checkLoadCompleteForThisFrame that do more than check whether the load is complete and notify the right folks. It's true that some code is guilty of that already, but we really need to clean that up, clarifying load responsibilities and making regressions due to poorly understood dependencies less likely. <GMT04-Apr-2006 19:02:47GMT> David Gatwood: I'm a little confused by your second issue. I didn't touch _checkLoadCompleteForThisFrame. The call to doCache is at the start of _transitionToCommitted. Oh, son of a... I just figured out why m_frame was NULL when the caching code was executed later.... After that one line change, we can go back to caching it where I tried to cache it in the original patch. *slams head into wall repeatedly* New (greatly simplified) patch attached as javascript_hang-revised-6.patch. <GMT05-Apr-2006 14:54:08GMT> Geoff Garen: if (_private->previousDataSource && In Objective-C, messages sent to nil objects just return nil, so the nil check is unnecessary. [_private->bridge canCachePage] What does this call do? Should it be called the previous bridge now, not the current one? else { - LOG(PageCache, "NOT saving page to back/forward cache, %@\n", [[self dataSource] _URL]); + if (_private->previousDataSource) { + LOG(PageCache, "NOT saving page to back/forward cache, %@\n", [_private->previousDataSource _URL]); + } The WebCore style guidelines say only to use braces around ifs if the ifs enclose more than one line. So you can eliminate three lines of code here by nixing the 2 braces and making the else \n if into an else if. Why do we refuse to cache when _private->quickRedirectComing is true? With your new scheme, do we need to know if a quick redirect *was* coming for the previous page? _canCachePage is the most confusing indirection. Let's just get rid of it and put the one line of code it encloses at the call site. (Not new to your patch, but it would be a great thing to clean up along the way.) + /* DAG: Delayed caching code until the last possible moment + for more consistent JavaScript behavior. */ We usually put this kind of exposition in the ChangeLog. (You can generate a skeleton ChangeLog with the prepare-ChangeLog script, then fill in the blanks. You'll need one before this patch can land.) <GMT05-Apr-2006 18:58:50GMT> David Gatwood: Cool about the null call being safe. I'm used to C++ where calling a function with a NULL "this" pointer isn't a bright idea. :-) Re: the bridge's canCachePage... since it's calling through the bridge into the C++ bigs, that call is probably using the wrong m_frame by this point in the code no matter what we do. Thus, I'm thinking that the cacheable status of the old page should be stored in _private->previousCanCachePage instead of the web view's notion (which isn't going to change anyway). Re: the quick redirect, yes, this is still needed, I think. If the previous page contained a meta refresh with... I think a second or less... something like that... then the page doesn't go in the cache because it is considered worthless... or at least that's the way I heard it explained to me. But the previous page still was loaded, so the old page info should still be stored in _private->previousWhatever, so if we don't check that, we risk breaking that caching behavior. Re: style, as long as I'm fixing coding style in that spot, I'm moving the 'else' statements to be on the same line as the prior closing brace, as per recommended style. It is a little confusing visually to have a nested else statement without braces followed immediately by the closing brace of the enclosing 'if' statement, though. I'm wondering if maybe that should be an exception where even a single line 'else' clause should have braces anyway. Just a thought. Anyway, updated patch attached. <GMT19-May-2006 16:52:15GMT> Geoff Garen: I still don't think this is quite right. I'm going to post your patch on bugzilla to get more eyes on it.
Attachments
HTML file demonstrating bug (804 bytes, text/html)
2006-05-19 09:59 PDT, Geoffrey Garen
no flags
Helper file for demonstrating bug (578 bytes, text/html)
2006-05-19 09:59 PDT, Geoffrey Garen
no flags
David's latest patch (6.22 KB, patch)
2006-05-19 10:00 PDT, Geoffrey Garen
ggaren: review-
test case (2.97 KB, patch)
2008-11-23 03:13 PST, Alexey Proskuryakov
ggaren: review+
Geoffrey Garen
Comment 1 2006-05-19 09:59:00 PDT
Created attachment 8406 [details] HTML file demonstrating bug
Geoffrey Garen
Comment 2 2006-05-19 09:59:32 PDT
Created attachment 8407 [details] Helper file for demonstrating bug
Geoffrey Garen
Comment 3 2006-05-19 10:00:53 PDT
Created attachment 8408 [details] David's latest patch I don't think this patch is quite right -- I'd like to get some more opinions.
Geoffrey Garen
Comment 4 2006-06-09 11:55:10 PDT
To be specific, this is what I think might be wrong: Our old behavior was to make all decisions based on the current page. This patch changes behavior to make certain decisions based on a previous page (_private->previousCanCachePage, for example), while still making others based on the current page (!_private->quickRedirectComing, for example). I'm worried that these decisions will end up out of sync with each other. I would also like to see some testing and confirmation that (1) pages get loaded into the back/forward cache; and (2) all the possible branches, like the quickRedirectComing branch, work appropriately. Since nobody else has said anything, I'm going to mark this r- for now, and hope that somebody picks it up later.
David Kilzer (:ddkilzer)
Comment 5 2006-06-09 14:38:08 PDT
I hope this is the same David Gatwood I'm adding to the CC list!
Geoffrey Garen
Comment 6 2006-06-09 16:16:51 PDT
Actually, it was an Apple employee.
David A. Gatwood
Comment 7 2006-07-31 10:01:41 PDT
Same me.
Alexey Proskuryakov
Comment 8 2008-11-23 03:13:59 PST
Created attachment 25396 [details] test case This bug is no longer reproducible with Safari 3.1.1 (possibly, even earlier). Adding a test case.
Alexey Proskuryakov
Comment 9 2008-11-23 03:28:31 PST
This test is tricky - it only works because CFNetwork caches data before giving it to WebKit. So, it doesn't pass in Firefox. But if hang-connection.php didn't write anything, the connection would have remained frozen even after canceled by WebKit, breaking all future tests.
Geoffrey Garen
Comment 10 2008-11-24 22:29:18 PST
Comment on attachment 25396 [details] test case > + function updateclock() Should be "updateClock". + usleep(100000); Is this 1s? If so, OK. But let's try not to pause DRT for more than 1s on this test. r=me
Alexey Proskuryakov
Comment 11 2008-11-25 02:26:22 PST
Committed revision 38751. (In reply to comment #10) > + usleep(100000); > > Is this 1s? If so, OK. But let's try not to pause DRT for more than 1s on this > test. Even better, it's 0.1s.
Note You need to log in before you can comment on or make changes to this bug.