Bug 29832

Summary: Crash in DOMWindow::clearTimeout etc when DOMWindow is not connected to frame
Product: WebKit Reporter: Nate Chapin <japhet>
Component: WebCore Misc.Assignee: Nate Chapin <japhet>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, ap, commit-queue, darin, dglazkov, dimich, eric, ggaren, levin, psolanki, sam
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: PC   
OS: OS X 10.5   
Attachments:
Description Flags
Test file reproducing the crash in Safari
none
Crash dump
none
Proposed fix.
fishd: review-
Updated patch.
none
Updated patch. none

Description Nate Chapin 2009-09-28 15:41:47 PDT
The Chromium port has been seeing some crashes due to DOMWindow::clearTimeout and DOMWindow::clearInterval passing a null Document* as the ScriptExecutionContext into DOMTimer::removeById().

I'd like to add a null check and early return to DOMTimer::removeById(). Since ScriptExecutionContexts should delete all related timers when they are destructed, we shouldn't need to do anything if told to remove a timer from a null ScriptExecutionContext.

In addition, the attempted fix in the V8 bindings needs to be removed, as it was placed in custom bindings that aren't actually in use.
Comment 1 Sam Weinig 2009-09-28 15:50:36 PDT
I think it would be much better to find out why null pointers are being passed then to just add a work-around.  Has there been investigation into the cause?
Comment 2 Nate Chapin 2009-09-28 15:58:02 PDT
Some, but nothing definite has been concluded.  It appears based on the fact that DOMWindow::document() is returning 0 and not failing any assertions, it seems either DOMWindow:m_frame is getting zeroed, or m_frame->domWindow() is pointing to a different DOMWindow than this.

Unfortuantely, I don't know how to deterministically reproduce the crash, but I'll do some more digging before submitting a patch to this bug.
Comment 3 Sam Weinig 2009-09-28 16:02:31 PDT
A null document here is probably indicative of a navigation.  I would guess that the timers are not getting correctly suspended in some case.
Comment 4 Dmitry Titov 2009-10-02 22:47:00 PDT
Created attachment 40564 [details]
Test file reproducing the crash in Safari

Interesting. I found one way to reproduce it - see attached file. Crashes Safari with WebKit ToT.

I found this repro looking  for incorrect suspend/resume of timers per Sam's suggestion. Didn't find those, but found at least one where timer in one window refers to another window, and PageCahe expiration starts to close cached pages. The DOMWindow.m_frame is NULL and JS continues to call window.clearTimeout.

I'm weak in JSC details, but here is what I see in the debugger:
Apparently, DOMWindow w/o frame happens as result of navigation or window close - in these cases the  JSDOMWindow::getOwnPropertySlot starts to tell JS that window only has 'close' and 'closed', and throws TypeError otherwise. JSDOMWindow::getOwnPropertySlot is not called all the time, the result seems to be cached (not yet sure how). So page destruction and navigation code paths use ScriptController::clearWindowShell() and this seems to have an effect of re-creating JSDOMWindow wrapper and invalidating the caches - so the properties/methods get re-resolved.

Adding ScriptController::clearWindowShell() to Frame destructor right after it disconnects from the DOMWindow (so if JSDOMWindowShell is still referenced by some script in other pages, it'll start act as 'closed' window) fixes the crash...
Comment 5 Dmitry Titov 2009-10-05 10:53:07 PDT
Created attachment 40643 [details]
Crash dump
Comment 6 Dmitry Titov 2009-10-05 10:58:40 PDT
Created attachment 40644 [details]
Proposed fix.

I think this might be a fix. Clearing WindowShell after Frame disconnects from DOMWindow (and renders it unable to execute most JS callbacks) will clear global script data and cause re-resolution of properties. I can't think of a DRT test for that, the one I have takes ~10 seconds to run which is too long.
Comment 7 Darin Fisher (:fishd, Google) 2009-10-05 23:32:08 PDT
Comment on attachment 40644 [details]
Proposed fix.

> +        Reviewed by NOBODY (OOPS!).
> +
> +        Crash Crash in DOMWindow::clearTimeout etc when DOMWindow is not connected to frame.

extra "Crash"


> +        https://bugs.webkit.org/show_bug.cgi?id=29832
> +
> +		Need to make sure the script caches are reset when frame gets disconnected from still-alive DOMWindow.
> +		This will prevent JS from calling DOMWindow methods that can not be completed w/o the frame.

oops ^^^ tabs need to be changed to spaces
Comment 8 Dmitry Titov 2009-10-05 23:54:47 PDT
Created attachment 40695 [details]
Updated patch.

Oops. Updated :-) Thanks for looking!
Comment 9 Dmitry Titov 2009-10-05 23:56:01 PDT
Created attachment 40696 [details]
Updated patch.
Comment 10 Eric Seidel (no email) 2009-10-06 09:57:49 PDT
Sam is your man here.
Comment 11 Dmitry Titov 2009-10-07 18:00:58 PDT
I hope Sam will have time to take a look. Or anybody else who feel ok with it.

Obviously, alternative fix would be to simply check for m_frame being NULL in DOMWindow set/clearTimeout, set/clearInterval methods - consistently with another couple dozen methods on DOMWindow that already do that. Perhaps that would be easier to review too.

I thought it would be nice to avoid conditions when JS is using DOMWindow which is disconnected. But if the fix looks risky, we could just check for null :-)
Comment 12 Adam Barth 2009-10-07 19:00:19 PDT
> I thought it would be nice to avoid conditions when JS is using DOMWindow which
> is disconnected. But if the fix looks risky, we could just check for null :-)

Drive-by comment: we can't avoid JS playing with detached DOMWindows.
Comment 13 Dmitry Titov 2009-10-07 20:23:41 PDT
(In reply to comment #12)
> Drive-by comment: we can't avoid JS playing with detached DOMWindows.

I think we can - with WidowShell->JSDOMWindow->DOMWindow triplet we should be able to have a check in a single place (and this is what we have now already at the level of JSDOMWindow custom getters)
Comment 14 Adam Barth 2009-10-18 22:39:57 PDT
Comment on attachment 40696 [details]
Updated patch.

This change is fine.  I'm sad that we can't test it, but I believe you when you say it's requires a 10 second test.
Comment 15 Yong Li 2009-10-19 08:48:09 PDT
Comment on attachment 40696 [details]
Updated patch.

Let commit bot land it
Comment 16 WebKit Commit Bot 2009-10-19 09:00:28 PDT
Comment on attachment 40696 [details]
Updated patch.

Clearing flags on attachment: 40696

Committed r49786: <http://trac.webkit.org/changeset/49786>
Comment 17 WebKit Commit Bot 2009-10-19 09:00:32 PDT
All reviewed patches have been landed.  Closing bug.
Comment 18 Darin Adler 2009-10-19 15:41:32 PDT
(In reply to comment #14)
> This change is fine.  I'm sad that we can't test it, but I believe you when you
> say it's requires a 10 second test.

Normally in such cases we put the test into WebCore/manual-tests with instructions on how to run it.
Comment 19 Dmitry Titov 2009-10-19 15:51:25 PDT
(In reply to comment #18)
> (In reply to comment #14)
> > This change is fine.  I'm sad that we can't test it, but I believe you when you
> > say it's requires a 10 second test.
> 
> Normally in such cases we put the test into WebCore/manual-tests with
> instructions on how to run it.

Good point! The bug and a patch with manual test is coming.
Comment 20 Geoffrey Garen 2011-05-15 16:27:28 PDT
> Good point! The bug and a patch with manual test is coming.

Did you land a manual test?
Comment 21 Geoffrey Garen 2011-05-15 16:32:20 PDT
(In reply to comment #12)
> > I thought it would be nice to avoid conditions when JS is using DOMWindow which
> > is disconnected. But if the fix looks risky, we could just check for null :-)
> 
> Drive-by comment: we can't avoid JS playing with detached DOMWindows.

FWIW, it turned out that we can't avoid JS playing with detached DOMWindows, and Alexey had to re-fix this bug with a null check. See bug 33815.
Comment 22 Geoffrey Garen 2011-05-15 16:36:01 PDT
Also, the fix here caused this performance problem: https://bugs.webkit.org/show_bug.cgi?id=33815.
Comment 23 Alexey Proskuryakov 2011-05-15 18:14:22 PDT
> Also, the fix here caused this performance problem: https://bugs.webkit.org/show_bug.cgi?id=33815.

Geoff, is that the right bug#?
Comment 24 Pratik Solanki 2011-05-16 15:25:15 PDT
(In reply to comment #23)
> > Also, the fix here caused this performance problem: https://bugs.webkit.org/show_bug.cgi?id=33815.
> 
> Geoff, is that the right bug#?

I believe the performance issue (which Geoff fixed) was <https://bugs.webkit.org/show_bug.cgi?id=59699>.
Comment 25 Geoffrey Garen 2011-05-16 20:33:08 PDT
Oops! Pratik is right -- the performance regression was https://bugs.webkit.org/show_bug.cgi?id=59699.
Comment 26 Dmitry Titov 2011-05-17 10:30:34 PDT
The manual test for this crash was landed in http://trac.webkit.org/changeset/49824

Great to see the better fix!