Bug 74533

Summary: REGRESSION(r102619): Reproducible crash closing window with video + poster image inside an object element
Product: WebKit Reporter: Brady Eidson <beidson>
Component: Page LoadingAssignee: Brady Eidson <beidson>
Status: RESOLVED FIXED    
Severity: Normal CC: ap, darin, kbr, mrowe, ossy, tkent, webkit.review.bot, zherczeg
Priority: P2 Keywords: InRadar
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 74607    
Bug Blocks: 74555    
Attachments:
Description Flags
Patch v1 - Fix + layout test
beidson: review-
Patch v2 - Style fix and better layout test
darin: review+
Patch v3 - Rename "document activation callbacks" to "page cache suspension" callbacks and use them darin: review+

Description Brady Eidson 2011-12-14 13:11:59 PST
REGRESSION(r102619): Reproducible crash closing window with video + poster image inside an object element

0   com.apple.WebCore             	0x000000010ca60dac WebCore::FrameLoader::loadedResourceFromMemoryCache(WebCore::CachedResource*) + 76
1   com.apple.WebCore             	0x000000010c92fb15 WebCore::CachedResourceLoader::requestResource(WebCore::CachedResource::Type, WebCore::ResourceRequest&, WTF::String const&, WebCore::ResourceLoaderOptions const&, WebCore::ResourceLoadPriority, bool) + 821
2   com.apple.WebCore             	0x000000010c92f702 WebCore::CachedResourceLoader::requestImage(WebCore::ResourceRequest&) + 402
3   com.apple.WebCore             	0x000000010c42d1d1 WebCore::ImageLoader::updateFromElement() + 769
4   com.apple.WebCore             	0x000000010c470389 WebCore::HTMLVideoElement::attach() + 121
5   com.apple.WebCore             	0x000000010c30ec5c WebCore::ContainerNode::attach() + 28
6   com.apple.WebCore             	0x000000010c31d9d9 WebCore::Element::attach() + 105
7   com.apple.WebCore             	0x000000010c30ec5c WebCore::ContainerNode::attach() + 28
8   com.apple.WebCore             	0x000000010c31d9d9 WebCore::Element::attach() + 105
9   com.apple.WebCore             	0x000000010c34e520 WebCore::Element::recalcStyle(WebCore::Node::StyleChange) + 592
10  com.apple.WebCore             	0x000000010ca9ea2b WebCore::HTMLPlugInImageElement::documentWillBecomeInactive() + 203
11  com.apple.WebCore             	0x000000010c375479 WebCore::Document::detach() + 409
12  com.apple.WebCore             	0x000000010c300c48 WebCore::Frame::setView(WTF::PassRefPtr<WebCore::FrameView>) + 72
13  com.apple.WebCore             	0x000000010c373a9a WebCore::FrameLoader::detachFromParent() + 314
14  com.apple.WebKit2             	0x000000010bd7f59a WebKit::WebPage::close() + 306

This is due to running what was intended to be "suspend" code at teardown time.

Fix is forthcoming.

In radar as <rdar://problem/10576732>
Comment 1 Brady Eidson 2011-12-14 13:54:58 PST
Created attachment 119290 [details]
Patch v1 - Fix + layout test
Comment 2 WebKit Review Bot 2011-12-14 13:58:57 PST
Attachment 119290 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'LayoutTests/ChangeLog', u'LayoutTests/medi..." exit_code: 1

Source/WebCore/html/HTMLPlugInImageElement.h:26:  Alphabetical sorting problem.  [build/include_order] [4]
Total errors found: 1 in 7 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 3 Brady Eidson 2011-12-14 13:59:58 PST
(In reply to comment #2)
> Attachment 119290 [details] did not pass style-queue:
> 
> Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'LayoutTests/ChangeLog', u'LayoutTests/medi..." exit_code: 1
> 
> Source/WebCore/html/HTMLPlugInImageElement.h:26:  Alphabetical sorting problem.  [build/include_order] [4]
> Total errors found: 1 in 7 files
> 
> 
> If any of these errors are false positives, please file a bug against check-webkit-style.

Will fix this, and clean up the layout test some more - Just realized it had unnecessary remnants from a past version.
Comment 4 Brady Eidson 2011-12-14 14:27:57 PST
Created attachment 119297 [details]
Patch v2 - Style fix and better layout test
Comment 5 Darin Adler 2011-12-14 14:32:15 PST
Comment on attachment 119297 [details]
Patch v2 - Style fix and better layout test

View in context: https://bugs.webkit.org/attachment.cgi?id=119297&action=review

> Source/WebCore/html/HTMLPlugInImageElement.h:59
> +    virtual bool canSuspend() const OVERRIDE;
> +    virtual void suspend(ReasonForSuspension) OVERRIDE;
> +    virtual void resume() OVERRIDE;

Do these need to be public? Why can’t they be protected?
Comment 6 Brady Eidson 2011-12-14 14:47:15 PST
(In reply to comment #5)
> (From update of attachment 119297 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=119297&action=review
> 
> > Source/WebCore/html/HTMLPlugInImageElement.h:59
> > +    virtual bool canSuspend() const OVERRIDE;
> > +    virtual void suspend(ReasonForSuspension) OVERRIDE;
> > +    virtual void resume() OVERRIDE;
> 
> Do these need to be public? Why can’t they be protected?

Good point, fixing before landing.  Thanks for the review!
Comment 7 Brady Eidson 2011-12-14 14:52:33 PST
http://trac.webkit.org/changeset/102829
Comment 8 Alexey Proskuryakov 2011-12-14 15:32:00 PST
I don't think that using ActiveDOMObject here is right. In particular, it looks like the suspend() function can fire events, breaking iteration over the objects in ScriptExecutionContext.

ActiveDOMObject was meant as a means to keep non-node objects alive, and using it with nodes might still be causing issues along the lines of bug 45309. Also, as you mentioned on IRC, the semantics of moving ActiveDOMObject nodes between documents are unclear.
Comment 9 Brady Eidson 2011-12-14 15:40:50 PST
(In reply to comment #8)
> I don't think that using ActiveDOMObject here is right. In particular, it looks like the suspend() function can fire events, breaking iteration over the objects in ScriptExecutionContext.

Breaking how?  Can't the ScriptExecutionContext iterate in a manner that won't be broken?

> ActiveDOMObject was meant as a means to keep non-node objects alive, and using it with nodes might still be causing issues along the lines of bug 45309.

That might have been one of it's purposes, but doesn't it also do much more than fits the Document activation callback use case quite well?  I think that's why over time more elements have found themselves moving to ActiveDOMObject.

>Also, as you mentioned on IRC, the semantics of moving ActiveDOMObject nodes between documents are unclear.

Indeed.  Such semantics are only clear for elements that override willMoveToNewOwnerDocument and didMoveToNewOwnerDocument, which is dreadfully few.  In general our support for moving objects between Documents is horrible.  Making ActiveDOMObject aware of this would be useful.
Comment 10 Alexey Proskuryakov 2011-12-14 15:51:36 PST
> Breaking how?  Can't the ScriptExecutionContext iterate in a manner that won't be broken?

An event handler can add or remove ActiveDOMObjects.

This was previously discussed at length in the context of IndexedDB, and no, it doesn't seem easy to change the code in a way that would be sufficiently robust. No matter how we change iteration in ScriptExecutionContext, there will be cases that don't work right in general case.

Since this patch almost certainly introduces a security bug, it should probably be rolled out.
Comment 11 Brady Eidson 2011-12-14 16:03:20 PST
(In reply to comment #10)
> > Breaking how?  Can't the ScriptExecutionContext iterate in a manner that won't be broken?
> 
> An event handler can add or remove ActiveDOMObjects.
> 
> This was previously discussed at length in the context of IndexedDB, and no, it doesn't seem easy to change the code in a way that would be sufficiently robust. No matter how we change iteration in ScriptExecutionContext, there will be cases that don't work right in general case.
> 
> Since this patch almost certainly introduces a security bug, it should probably be rolled out.

Can you explain why this patch almost certainly introduces a security bug whereas the Media and Marquee element stuff do not, and should not be rolled out?

I'm still unclear on what the actual security bug is.

Why can't we copy the set of objects to a vector before iterating them...?

Going forward we should have precisely *one* mechanism for handling suspend/resume instead of the current two mechanisms.  I think the right course is to fix the shortcomings in the obviously superior mechanism so we can get rid of the obviously inferior one (which I already filed as https://bugs.webkit.org/show_bug.cgi?id=74545)
Comment 12 Kenneth Russell 2011-12-14 16:05:02 PST
Another point: this patch caused the layout test fast/images/move-image-to-new-document.html to start timing out, which would be another point in favor of rolling out and trying another approach.
Comment 13 Alexey Proskuryakov 2011-12-14 16:12:48 PST
> Can you explain why this patch almost certainly introduces a security bug whereas the Media and Marquee element stuff do not, and should not be rolled out?

HTMLPlugInImageElement::suspend does a style recalculation, which can do things like instantiating plug-ins or firing events. I haven't looked at the other active nodes now, perhaps they also do something terrible like that.

Someone who wants to redesign ActiveDOMObject can follow discussions in 52719, bug 52366 and related bugs. My take from those discussions is that ActiveDOMObject should be avoided. Given that it often causes such strong desire to use it, it should probably be made less prominent or removed altogether.
Comment 14 Kenneth Russell 2011-12-14 16:17:04 PST
(In reply to comment #12)
> Another point: this patch caused the layout test fast/images/move-image-to-new-document.html to start timing out, which would be another point in favor of rolling out and trying another approach.

To be clear, the regression in that test case looks pretty serious -- load events aren't firing for an object element when it's moved from one page to another.
Comment 15 Zoltan Herczeg 2011-12-15 01:50:08 PST
Test fix seems broke fast/images/move-image-to-new-document.html on several bots (at least on Mac, GTK and QT). Any idea how to fix it?
Comment 16 Csaba Osztrogonác 2011-12-15 07:08:26 PST
New bug report for the new regression: https://bugs.webkit.org/show_bug.cgi?id=74607
Comment 17 Brady Eidson 2011-12-15 08:54:06 PST
(In reply to comment #12)
> Another point: this patch caused the layout test fast/images/move-image-to-new-document.html to start timing out, which would be another point in favor of rolling out and trying another approach.

Covered by https://bugs.webkit.org/show_bug.cgi?id=74555 I'll take action on this today.

(In reply to comment #14)
> (In reply to comment #12)
> To be clear, the regression in that test case looks pretty serious -- load events aren't firing for an object element when it's moved from one page to another.

Actually our support for moving nodes between documents is spotty at best.  I'm surprised I happened to touch a cross section of the code where it *was* handled properly.

Reproducible crashers are also serious, but obviously it's not okay to cause a known regression either.
Comment 18 Brady Eidson 2011-12-15 10:51:04 PST
Reverted r102829 for reason:

Caused https://bugs.webkit.org/show_bug.cgi?id=74555

Committed r102962: <http://trac.webkit.org/changeset/102962>
Comment 19 Brady Eidson 2011-12-15 15:37:15 PST
I have a patch for a better fix, but I can't test it easily right now as http://trac.webkit.org/changeset/102866 is causing a crash.

Covered by https://bugs.webkit.org/show_bug.cgi?id=74655
Comment 20 Brady Eidson 2011-12-16 10:54:04 PST
Created attachment 119634 [details]
Patch v3 - Rename "document activation callbacks" to "page cache suspension" callbacks and use them

This changes the "document will become inactive" callbacks from being called both on page cache suspension and document detachment in to "page cache suspension" callbacks that are absolutely only called when the document goes in to the page cache.
Comment 21 Darin Adler 2011-12-16 16:06:56 PST
Comment on attachment 119634 [details]
Patch v3 - Rename "document activation callbacks" to "page cache suspension" callbacks and use them

View in context: https://bugs.webkit.org/attachment.cgi?id=119634&action=review

> Source/WebCore/ChangeLog:11
> +        Elements that used to register for "document activation callbacks" now register for "page cache suspension" callbacks.
> +        The Document only dispatches these callbacks specifically when it suspended in to the page cache or resumed from it.

I think this description is unnecessarily oblique. The real issue is that we don’t need to do all this work when tearing down documents; only when suspending documents we plan to resume later. Maybe there’s a more straightforward way of saying that.

> Source/WebCore/dom/Document.h:982
> +    enum ReasonForInactivation {
> +        Suspension,
> +        Detach
> +    };

I’d rather tree this all on one line. I don’t think that “suspension” and “detach” are parallel. Maybe call it “render tree destruction” instead of “detach”? I wonder what the name is for a process when a document goes into a state where it’s no longer showed in a frame, but nodes in it may still exist.
Comment 22 Darin Adler 2011-12-16 16:08:19 PST
Comment on attachment 119634 [details]
Patch v3 - Rename "document activation callbacks" to "page cache suspension" callbacks and use them

View in context: https://bugs.webkit.org/attachment.cgi?id=119634&action=review

> Source/WebCore/dom/Document.cpp:3922
> -void Document::documentWillBecomeInactive() 
> +void Document::documentWillBecomeInactive(ReasonForInactivation reason)

I think this should just be two functions. One can call the other. There’s no reason to use an enum here instead of a second function.
Comment 23 Brady Eidson 2011-12-16 16:33:15 PST
Landed in http://trac.webkit.org/changeset/103130