13203 – REGRESSION: Repro crash in -[WebBaseNetscapePluginView(WebNPPCallbacks) destroyStream:reason:] navigating away from page with DivX movie plug-in

RESOLVED FIXED 13203

REGRESSION: Repro crash in -[WebBaseNetscapePluginView(WebNPPCallbacks) destroyStream:reason:] navigating away from page with DivX movie plug-in

https://bugs.webkit.org/show_bug.cgi?id=13203

Summary REGRESSION: Repro crash in -[WebBaseNetscapePluginView(WebNPPCallbacks) destr...

Patricia Warwick

Reported 2007-03-27 06:31:47 PDT

I am attempting to test embedding a DIVX movie into a web page. This works correctly in Safari, but when I try to load the same page under Webkit two things happen. 1) Webkit loads an older version of the page (presumably from cache) and as a result cannot find the DIVX file. 2) When I click on the Reload button on the toolbar, Webkit crashes.

Attachments
Safari Crash Log (143.61 KB, text/plain) 2007-03-27 08:54 PDT, Patricia Warwick	no flags	Details
patch (7.91 KB, patch) 2007-03-27 19:21 PDT, Geoffrey Garen	darin: review-	Details Formatted Diff Diff
patch (8.01 KB, patch) 2007-03-28 17:26 PDT, Geoffrey Garen	darin: review+	Details Formatted Diff Diff
Show Obsolete (1) View All Add attachment proposed patch, testcase, etc.

David Kilzer (:ddkilzer)

Comment 1 2007-03-27 07:59:06 PDT

Hi Patricia, please post the stack trace to this bug. Also, if possible, please give the URL of the page causing the crash, or the URL of the main web site if it's a password-protected page. Thanks!

Patricia Warwick

Comment 2 2007-03-27 08:54:44 PDT

Created attachment 13829 [details] Safari Crash Log

Patricia Warwick

Comment 3 2007-03-27 08:56:41 PDT

The crash occurs when I reload this page: http://patriciawarwick.com.hosting.domaindirect.com/viewmovie.html

Mark Rowe (bdash)

Comment 4 2007-03-27 09:46:53 PDT

I can reproduce this crash with ToT. With r19968 I do not experience a crash, but after reloading the page then closing the tab Safari starts using a *huge* amount of memory. I killed it after it reached 1.5GB, but reproduced the behaviour a second time. I'm seeing what looks to be user-controlled data being dereferenced which triggers the crash. Patricia's crash log shows several instances of crashing while dereferencing 0x50504320 ("PPC ") and I have seen other values such as 0xf97223d1 or 0x56e58975. The two most common address I am seeing dereferenced causing the crash are 0xc0000000 and 0x00000020. I have also seen a third failure mode that appears at random: 2007-03-28 02:27:29.057 Safari[39550:117] *** -[NSConcreteMutableData errorForReason:]: selector not recognized [self = 0x187f6350] ASSERTION FAILED: Uncaught exception - *** -[NSConcreteMutableData errorForReason:]: selector not recognized [self = 0x187f6350] 0 (WebCore/platform/mac/BlockExceptions.mm:36 void ReportBlockedObjCException(NSException*)) Program received signal EXC_BAD_ACCESS, Could not access memory. Reason: KERN_INVALID_ADDRESS at address: 0xbbadbeef 0x0132a9b3 in ReportBlockedObjCException (localException=0x181a6a60) at WebCore/platform/mac/BlockExceptions.mm:36 36 ASSERT_WITH_MESSAGE(0, "Uncaught exception - %@", localException); (gdb)

Mark Rowe (bdash)

Comment 5 2007-03-27 09:48:35 PDT

<rdar://problem/5091330>

Mark Rowe (bdash)

Comment 6 2007-03-27 09:50:17 PDT

And after posting that comment I saw a crash at 0x65707954 ("epyT"). It seems like either we or the divx.com plugin are accessing memory which has been deallocated and repurposed :-(

Geoffrey Garen

Comment 7 2007-03-27 14:38:16 PDT

Changing title to focus on the most important issue -- the crash. Once we fix that, we should re-check for other problems.

Geoffrey Garen

Comment 8 2007-03-27 14:38:41 PDT

I'm working on this.

Geoffrey Garen

Comment 9 2007-03-27 15:15:16 PDT

You need the DivX player and plug-in to repro: http://www.divx.com/divx/mac/

Geoffrey Garen

Comment 10 2007-03-27 15:22:29 PDT

Crash also happens if you just navigate away from the page.

Geoffrey Garen

Comment 11 2007-03-27 15:48:10 PDT

Ironically, MallocScribble fixes this crash, so be sure to turn it off!

Geoffrey Garen

Comment 12 2007-03-27 15:50:39 PDT

Strike that. It's NSZombieEnabled that fixes this crash.

Geoffrey Garen

Comment 13 2007-03-27 15:57:18 PDT

The interesting part of the crash is here: #0 0x0041a028 in -[WebBaseNetscapePluginView(WebNPPCallbacks) destroyStream:reason:] at WebBaseNetscapePluginView.mm:2323 #1 0x00426ebd in NPN_DestroyStream at npapi.m:110 #2 0x17907c38 in nsDivXBrowserPlugin::cancelDownload #3 0x17914d5f in DivXBrowserPlugin::Close #4 0x1791698a in DivXBrowserPlugin::~DivXBrowserPlugin #5 0x17909596 in nsDivXBrowserPlugin::~nsDivXBrowserPlugin #6 0x1790dc94 in Mozilla_NPP_Destroy #7 0x1790e06b in NPP_Destroy #8 0x0041ab90 in -[WebBaseNetscapePluginView(Internal) _destroyPlugin] at WebBaseNetscapePluginView.mm:2593 When we destroy the DivX plug-in, it cancels its plug-in stream. However, the stream it passes to WebKit to cancel is either already destroyed or complete garbage: p stream $10 = (NPStream *) 0x178b5b10 (gdb) p *(WebBaseNetscapePluginStream *)stream->ndata $11 = { { isa = 0x0 }, [...many more NULL fields...]

Geoffrey Garen

Comment 14 2007-03-27 16:16:27 PDT

According to MallocScribble, the NPStream* the plug-in passes to NPN_DestroyStream points to freed memory.

Geoffrey Garen

Comment 15 2007-03-27 17:35:30 PDT

gdb confirms that this is an already destroyed stream, not just random garbage: NPP_DestroyStream: 0x17794180, {pdata = 0x17554e70, ndata = 0x17794160, url = 0x17738d90 "http://patriciawarwick.com.hosting.domaindirect.com/weaving.divx", end = 52260076, lastmodified = 1175003384, notifyData = 0x1} NPN_DestroyStream: 0x17794180, {pdata = 0x55555555, ndata = 0x55555555, url = 0x55555555 <Address 0x55555555 out of bounds>, end = 1431655765, lastmodified = 1431655765, notifyData = 0x55555555} So, we notify the plug-in that we're destroying an NPStream (NPP_DestroyStream), and then we destroy it, and subsequently the plug-in asks us to destroy it again (NPN_DestroyStream). This seems like a plug-in bug that we'll have to work around.

Mark Rowe (bdash)

Comment 16 2007-03-27 17:39:01 PDT

Geoff, the initial report indicates this works correctly in Safari 2.0.4. I recall we had some fixes recently related to freeing plugin streams earlier to prevent holding on to their memory for too long -- perhaps the fix is incorrect, or alternatively the plugin could have been making bogus assumptions about the lifetime of the stream.

Geoffrey Garen

Comment 17 2007-03-27 19:16:00 PDT

Based on Anders' comments, I believe we never experienced this bug before because we never destroyed plug-in streams, period. That's why I think we didn't crash before. NPP_DestroyStream is the callback the browser is supposed to make to notify a plug-in that a stream is dead (http://developer.mozilla.org/en/docs/Gecko_Plugin_API_Reference:Plug-in_Side_Plug-in_API). The gdb log I posted above shows that the plug-in makes its NPN_DestroyStream call after the browser has called NPP_DestroyStream (it turns out to be long after, actually), so I think this bug is about a bogus assumption in the plug-in. Specifically, I think the plug-in's destructor unconditionally destroys its stream, without checking whether the stream has been destroyed already.

Geoffrey Garen

Comment 18 2007-03-27 19:21:23 PDT

Created attachment 13839 [details] patch This patch guards against bogus NPStreams in plug-in callbacks.

Darin Adler

Comment 19 2007-03-27 21:51:26 PDT

Comment on attachment 13839 [details] patch This fix is incorrect. There's no guarantee that we won't create a new stream with the same pointer value as an old stream. While this change greatly decreases the chance of the bug happening, it would be better to find a fix that fixes the problem completely. To do a correct fix, we need to understand more about exactly what the DivX plug-in is doing. Does it destroy the same stream twice? Does it destroy a stream after we destroy it? A correct fix for this will presumably involve keeping the NPStream objects around for a longer time. + char* filename = reinterpret_cast<char*>(malloc(bufSize)); These kinds of casts are done with static_cast, not reinterpret_cast. The malloc function returns a void* and you can and should cast from one of those without using reinterpret_cast.

Darin Adler

Comment 20 2007-03-27 22:12:53 PDT

(In reply to comment #19) > (From update of attachment 13839 [details] [edit]) > This fix is incorrect. There's no guarantee that we won't create a new stream > with the same pointer value as an old stream. While this change greatly > decreases the chance of the bug happening, it would be better to find a fix > that fixes the problem completely. > > To do a correct fix, we need to understand more about exactly what the DivX > plug-in is doing. Does it destroy the same stream twice? Does it destroy a > stream after we destroy it? > > A correct fix for this will presumably involve keeping the NPStream objects > around for a longer time. I'm sorry. I should have read the bug report more carefully before reviewing. Your comments indicate that you consider this a guard against incorrect code in the DivX plug-in. In that context, I suppose this fix might be acceptable. But I'd like to understand why this is not a problem in other browsers. If this is really a workaround for incorrect plug-in code, then it should write out a log message in debug versions. I'd also like to understand who destroyed the stream in the first place. Did we destroy the stream? Why? Was that correct? It's possible that failures are not supposed to destroy the stream, and it's our code that's incorrect. I'm suspicious that we don't yet fully understand what's happening. Protecting against use a bad pointer like this should be a last resort. Do we really know that this is a bug in the plug-in and not something we're doing wrong? A cleaner way to do the workaround would be to close streams, but not actually deallocate the stream object itself until the plug-in is deallocated. This would have the downside that the stream objects would leak while a plug-in was working, but the upside that it would not be susceptible to aliasing the way the attached fix is. By making a closed stream discard most of its state, the cost for each stream could be a single small memory block. Or possibly this is the correct semantic. It may be that we are destroying streams when really we should just be putting them in a closed state. It's possible that only the plug-in itself is supposed to actually be able to deallocate a stream by calling NPN_DestroyStream.

Geoffrey Garen

Comment 21 2007-03-27 23:11:02 PDT

> This fix is incorrect. There's no guarantee that we won't create a new stream with the same pointer value as an old stream. Streams are per-plug-in. So, in the unlikely aliasing case, a buggy plug-in would just end up canceling one of its own streams. We could go to great lengths to accomodate that sort of bug, but I don't know of any real-world compatibility problems that would solve. I'm sympathetic to your desire to make this patch perfect. I have that desire too. I'm just not sure it's the right thing to do here. > Does it destroy the same stream twice? No. > Does it destroy a stream after we destroy it? Yes. > I'd like to understand why this is not a problem in other browsers. I didn't see an obvious answer to this question in the FF source code. It could be random chance regarding when other browsers destroy plug-in streams. It's also possible that other browsers simply don't implement NPN_DestroyStream. I know of at least two other browsers that don't. A simpler solution to this bug might be to remove our implementation of NPN_DestroyStream. I went back through SVN history, and it doesn't look like it was added for any specific reason. I didn't go that route because I thought it was a bit of a blind leap, but it's an option. > If this is really a workaround for incorrect plug-in code, then it should write out a log message in debug versions. Sounds good. > Did we destroy the stream? Yes. > Why? Because the user navigated away from the page, we canceled all loaders. > Was that correct? That depends. There's no documentation saying it's incorrect. Certainly the browser needs to cancel loaders at some time. Page tear-down seems like a reasonable time. We could go out of our way to destroy all plug-ins before canceling loaders. That would be another work-around, but I think it would be hard to keep that sort of dependency in sync. > It's possible that failures are not supposed to destroy the stream, and it's our code that's incorrect. The Mozilla documentation says this: "The browser allocates and initializes the NPStream object... The browser cannot delete the object until after it calls NPP_DestroyStream or the plug-in calls NPN_DestroyStream... The browser informs the plug-in when the stream is about to be deleted through NPP_DestroyStream, after which the NPStream object is no longer valid." This is a browser-created stream. My reading of the documentation is that either the plug-in or the browser can decide when the stream is no longer needed, but it's ultimately up to the browser to decide when to free the memory. According to gdb, we correctly call NPP_DestroyStream to notify that plug-in that the stream is being destroyed. That's why I think the plug-in's later use of the stream is an error. > Do we really know that this is a bug in the plug-in and not something we're doing wrong? Yes. A plug-in should not use a stream after it receives the NPP_DestroyStream callback. > A cleaner way to do the workaround would be to close streams, but not actually deallocate the stream object itself until the plug-in is deallocated. I'm not sure I would call this implementation "cleaner." I would call it more complicated, with the benefit of guaranteeing that a buggy plug-in couldn't accidentally cancel one of its own streams. Do you think the benefit is worth the cost? > Or possibly this is the correct semantic. It may be that we are destroying streams when really we should just be putting them in a closed state. I don't think the documentation supports that claim. > It's possible that only the plug-in itself is supposed to actually be able to deallocate a stream by calling NPN_DestroyStream. I don't think the documentation supports that claim, either.

Darin Adler

Comment 22 2007-03-28 09:45:22 PDT

> > I'd like to understand why this is not a problem in other browsers. > > I didn't see an obvious answer to this question in the FF source code. It could > be random chance regarding when other browsers destroy plug-in streams. What you call "random chance" here could be the essence of the real problem. > A simpler solution to this bug might be to remove our implementation of > NPN_DestroyStream. I went back through SVN history, and it doesn't look like it > was added for any specific reason. I didn't go that route because I thought it > was a bit of a blind leap, but it's an option. Seems worth considering. > Because the user navigated away from the page, we canceled all loaders. Do other browsers destroy streams before destroying plug-ins when navigating away from a page? > > Was that correct? > > That depends. There's no documentation saying it's incorrect. Certainly the > browser needs to cancel loaders at some time. Page tear-down seems like a > reasonable time. We could go out of our way to destroy all plug-ins before > canceling loaders. That would be another work-around, but I think it would be > hard to keep that sort of dependency in sync. I don't agree with your assessment here. It doesn't sound hard to destroy the plug-ins before destroying the plug-in streams. > > Do we really know that this is a bug in the plug-in and not something we're doing wrong? > > Yes. A plug-in should not use a stream after it receives the NPP_DestroyStream > callback. But perhaps there's an implicit contract that the browser won't do this. And perhaps no other browser does it. In which case I'm not sure I'd call this a bug in the plug-in. > > A cleaner way to do the workaround would be to close streams, but not actually deallocate the stream object itself until the plug-in is deallocated. > > I'm not sure I would call this implementation "cleaner." I would call it more > complicated, with the benefit of guaranteeing that a buggy plug-in couldn't > accidentally cancel one of its own streams. Do you think the benefit is worth > the cost? I don't agree with your assessment of the cost. I think having a notion of a closed stream is very simple to implement. I strongly prefer a solution that does not involve a pointer validity check.

Geoffrey Garen

Comment 23 2007-03-28 10:55:25 PDT

Anders found this comment in the Mozilla source code, which seems to confirm that their behavior truly is random. It looks like Firefox may never release browser-created streams: 1180 // FIXME: http://bugzilla.mozilla.org/show_bug.cgi?id=240131 1181 // 1182 // Is it ok to leave pstream->ndata set here, and who releases it 1183 // (or is it even properly ref counted)? And who closes the stream 1184 // etc? (http://lxr.mozilla.org/seamonkey/source/modules/plugin/base/src/ns4xPlugin.cpp)

Geoffrey Garen

Comment 24 2007-03-28 13:29:34 PDT

After further reflection, I don't think we want to eliminate NPN_DestoryStream. It's an important networking optimization in the case of a plug-in that wants to do its own loading. The plug-in first calls NPN_DestroyStream to cancel the browser's loading. In that case, we don't want to continue the browser's loading, slowing down the real load.

Geoffrey Garen

Comment 25 2007-03-28 13:32:24 PDT

If we decide to keep NPStream objects around for the lifetime of a plug-in, I think we need to answer the question of what to do in the case of, e.g., a Flash-based email client, which might create hundreds or even thousands of streams in one sitting. Do we leak all that memory? Cut it off at some limit? Use a white list or black list?

Geoffrey Garen

Comment 26 2007-03-28 16:44:06 PDT

> I don't agree with your assessment here. It doesn't sound hard to destroy the > plug-ins before destroying the plug-in streams. That seems really non-trivial to me. We can't make stream destruction destroy a plug-in because it's normal for a plug-in to outlast a stream. We can't make the code that destroys all streams destroy all plug-ins, since we wouldn't want stopping a page load, for example, to have the side-efect of unloading the page's plug-ins. It seems to me that we would need to re-order the loader state machine to wait until a page had torn down before canceling its loaders. In addition to being very tricky, that solution would degrade networking performance in the case of a navigation that interrupted another navigation, since both navigations would remain live until the new navigation became committed.

Geoffrey Garen

Comment 27 2007-03-28 17:26:06 PDT

Created attachment 13855 [details] patch This patch eschews an "Is this stream valid?" check in favor of a "Does this stream belong to the plug-in using it?" check. So it avoids introducing a function that can create subtle bugs by promising to do more than it actually can, even though It doesn't do much more in the way of guarding against bad behavior.

Darin Adler

Comment 28 2007-03-28 17:40:25 PDT

Comment on attachment 13855 [details] patch You wore me down, man. r=me

Geoffrey Garen

Comment 29 2007-03-29 08:35:04 PDT

Committed revision 20571.

Patricia Warwick

Comment 30 2007-03-29 11:47:33 PDT

It is not totally clear from the Mark's comments about the huge amounts of memory being used ... did that apply to Webkit or Safari? The person for whom I was developing this page tested it with Safari and reported that it caused his system to come to a halt. Could that be a related bug in Safari?

Mark Rowe (bdash)

Comment 31 2007-03-29 12:05:01 PDT

Patricia, my comment was about r19968 of WebKit. It'll be worth retesting this site with Geoff's fix to see if the large memory increase is still present. If it is, that should be filed in a new bug report.

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P1

Severity Major

Classification Unclassified

Version 523.x (Safari 3)

Hardware Mac

OS OS X 10.4

Product WebKit

Component New Bugs

Assignee

Geoffrey Garen

Reported

2007-03-27 06:31 PDT

Modified

2007-03-29 12:05 PDT History

CC List

1 user Show

URL

http://patriciawarwick.com.hosting.domaindirect.com/viewmovie.html

Keywords InRadar, Regression

Depends on

Blocks