Bug 30839 - REGRESSION: crashes in WebCore::RedirectScheduler::timerFired(WebCore::Timer<WebCore::RedirectScheduler>*)
Summary: REGRESSION: crashes in WebCore::RedirectScheduler::timerFired(WebCore::Timer<...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Page Loading (show other bugs)
Version: 528+ (Nightly build)
Hardware: Mac OS X 10.5
: P2 Normal
Assignee: Nobody
URL: http://www.linkinpark.it/forum/index.php
Keywords:
Depends on:
Blocks:
 
Reported: 2009-10-27 15:51 PDT by Simon Fraser (smfr)
Modified: 2009-10-28 17:22 PDT (History)
5 users (show)

See Also:


Attachments
Patch v1 (1.13 KB, patch)
2009-10-28 01:08 PDT, Adam Barth
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Simon Fraser (smfr) 2009-10-27 15:51:13 PDT
Nameless on #webkit is reporting crashes here:

Exception Type:  EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_INVALID_ADDRESS at 0x0000000000000092
Crashed Thread:  0  Dispatch queue: com.apple.main-thread

Thread 0 Crashed:  Dispatch queue: com.apple.main-thread
0   com.apple.WebCore                     0x0000000100dc289a WebCore::RedirectScheduler::timerFired(WebCore::Timer<WebCore::RedirectScheduler>*) + 42
1   com.apple.WebCore                     0x0000000100fc6767 WebCore::ThreadTimers::sharedTimerFiredInternal() + 151
2   com.apple.WebCore                     0x0000000100ed9e05 WebCore::timerFired(__CFRunLoopTimer*, void*) + 53
3   com.apple.CoreFoundation              0x00007fff85003a78 __CFRunLoopRun + 5480
4   com.apple.CoreFoundation              0x00007fff8500203f CFRunLoopRunSpecific + 575
5   com.apple.HIToolbox                   0x00007fff837ffc4e RunCurrentEventLoopInMode + 333
6   com.apple.HIToolbox                   0x00007fff837ffa53 ReceiveNextEventCommon + 310
7   com.apple.HIToolbox                   0x00007fff837ff90c BlockUntilNextEventMatchingListInMode + 59
8   com.apple.AppKit                      0x00007fff82804520 _DPSNextEvent + 718
9   com.apple.AppKit                      0x00007fff82803e89 -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] + 155
10  com.apple.Safari                      0x000000010000bcf0 0x100000000 + 48368
11  com.apple.AppKit                      0x00007fff827c9a7d -[NSApplication run] + 395
12  com.apple.AppKit                      0x00007fff827c2798 NSApplicationMain + 364
13  com.apple.Safari                      0x0000000100001d0c 0x100000000 + 7436

(s)he says it happens at http://www.linkinpark.it/forum/index.php, in various places in those fora.

Regression from bug 30424?
Comment 1 Simon Fraser (smfr) 2009-10-27 15:52:49 PDT
It's not clear that bug 30424 affected this.
Comment 2 Brady Eidson 2009-10-27 15:53:41 PDT
Probably not a regression from https://bugs.webkit.org/show_bug.cgi?id=30424 - we've had this crash in radar pretty much since Adam added the redirect scheduler.

(see <rdar://problem/7313414>)
Comment 3 Nameless 2009-10-27 16:04:04 PDT
I am running Version 4.0.3 (6531.9, r50124)
I've updated the browser 3 times, but the problem persists.
The Segmentation Fault happens when browsing the phpbb3 forum, but it's general.
Sometimes happens when i'm only browsing, others when i'm replying to a post.
I don't really know
Comment 4 Nameless 2009-10-27 16:08:54 PDT
(In reply to comment #3)
> I am running Version 4.0.3 (6531.9, r50124)
> I've updated the browser 3 times, but the problem persists.
> The Segmentation Fault happens when browsing the phpbb3 forum, but it's
> general.
> Sometimes happens when i'm only browsing, others when i'm replying to a post.
> I don't really know

Oh i forgot. i'm running Snow Leopard
Comment 5 Brady Eidson 2009-10-27 16:51:14 PDT
I'm on a debug build of r50166 and I simply can't reproduce this.

I haven't tried replying to a comment yet, but according to the reporter that doesn't actually reliably reproduce it for him, either.
Comment 6 Darin Adler 2009-10-27 17:28:11 PDT
The invalid address at 0x92 indicates we’re dereferencing a null pointer with an offset of 0x92. And defersLoading is at an offset of 0x92. So I’m pretty sure this crash means that RedirectScheduler::timerFired is being called when m_frame->page() is 0.

One thing we can do about this right away is to add a return statement for cases where page is 0 after the assertion. Then we can still look for this bug in debug builds, but possible render it harmless in release builds.

Another thing we can do is do the thought experiment about how m_frame->page() could become zero without telling the RedirectScheduler to stop its timer.
Comment 7 Darin Adler 2009-10-27 17:41:14 PDT
I see nothing that would guarantee that RedirectScheduleer::clear() is called on the frames in a page when a page is destroyed.

The Frame::pageDestroyed() function calls loader()->checkLoadComplete() on the parent frame but otherwise does nothing to stop the loader or clear the redirect scheduler.

There’s also CachedFrame::destroy(). My guess is that case already guarantees the loading has stopped somehow, including clearing the redirect scheduler, but I'm not sure about that case.

The lowest level way to fix this would be to just check for 0 in the timerFired function. Moving up the food chain, we could add a call to RedirectScheduler::clear() in Frame::detachFromPage(). Or we could add one to Frame::pageDestroyed() if we satisfy ourselves it’s not needed in CachedFrame::destroy().

If we could figure out a clean existing or new higher level concept of when we should stop loading due to the page going away, including clearing the scheduled redirect, then we could just fix that to work properly.

Given we can’t reproduce the crash reliably, we may not be able to make a test case, but this is trivial to fix!
Comment 8 Adam Barth 2009-10-28 00:08:14 PDT
Huh...  I thought I commented on this bug already.  Anyway, I'm looking into it.
Comment 9 Adam Barth 2009-10-28 01:08:31 PDT
Created attachment 42004 [details]
Patch v1
Comment 10 Adam Barth 2009-10-28 01:12:02 PDT
Adding the null check seems more robust than making sure some elaborate set of cancelations happens correctly.

How can I test this behavior?  Is there a good way to hold on to a Frame after it's page has died?  Nothing jumped out at me when I grepped the code base.  The most promising thing I saw was Database callbacks...
Comment 11 Darin Adler 2009-10-28 07:39:39 PDT
Comment on attachment 42004 [details]
Patch v1

I would have put the Page* into a local variable. But Maciej has chided me about that and told me to just trust the compiler to optimize the common subexpression in a simple, inlined case like this.
Comment 12 Darin Adler 2009-10-28 07:41:47 PDT
(In reply to comment #10)
> How can I test this behavior?  Is there a good way to hold on to a Frame after
> it's page has died?  Nothing jumped out at me when I grepped the code base. 
> The most promising thing I saw was Database callbacks...

I really have no idea. We could try to read some of the code at <http://www.linkinpark.it/forum/index.php> for clues. For the most part, I know about these "frame afterlife" issues from crash traces.
Comment 13 Adam Barth 2009-10-28 12:45:29 PDT
Comment on attachment 42004 [details]
Patch v1

Long term, we might want to consider adding some testing specific APIs to WebKit like "please hold a reference to this Frame and keep it alive."

Issues like this are easier to test with unit tests.  Maybe we should consider unit tests in addition to LayoutTests?  I like the LayoutTest model, but there are a lot of branches for null pages.  It's sad if we don't have any way of testing them.
Comment 14 Darin Adler 2009-10-28 16:25:47 PDT
(In reply to comment #13)
> I like the LayoutTest model, but there
> are a lot of branches for null pages. It's sad if we don't have any way of
> testing them.

We can probably crack the code and figure out how to test them. I’ll think about it. One of the easiest ways to find a way to trigger a condition is to stick in an assertion and then run the browser under the debugger for a while.
Comment 15 WebKit Commit Bot 2009-10-28 17:22:23 PDT
Comment on attachment 42004 [details]
Patch v1

Clearing flags on attachment: 42004

Committed r50251: <http://trac.webkit.org/changeset/50251>
Comment 16 WebKit Commit Bot 2009-10-28 17:22:27 PDT
All reviewed patches have been landed.  Closing bug.