WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
13203
REGRESSION: Repro crash in -[WebBaseNetscapePluginView(WebNPPCallbacks) destroyStream:reason:] navigating away from page with DivX movie plug-in
https://bugs.webkit.org/show_bug.cgi?id=13203
Summary
REGRESSION: Repro crash in -[WebBaseNetscapePluginView(WebNPPCallbacks) destr...
Patricia Warwick
Reported
2007-03-27 06:31:47 PDT
I am attempting to test embedding a DIVX movie into a web page. This works correctly in Safari, but when I try to load the same page under Webkit two things happen. 1) Webkit loads an older version of the page (presumably from cache) and as a result cannot find the DIVX file. 2) When I click on the Reload button on the toolbar, Webkit crashes.
Attachments
Safari Crash Log
(143.61 KB, text/plain)
2007-03-27 08:54 PDT
,
Patricia Warwick
no flags
Details
patch
(7.91 KB, patch)
2007-03-27 19:21 PDT
,
Geoffrey Garen
darin
: review-
Details
Formatted Diff
Diff
patch
(8.01 KB, patch)
2007-03-28 17:26 PDT
,
Geoffrey Garen
darin
: review+
Details
Formatted Diff
Diff
Show Obsolete
(1)
View All
Add attachment
proposed patch, testcase, etc.
David Kilzer (:ddkilzer)
Comment 1
2007-03-27 07:59:06 PDT
Hi Patricia, please post the stack trace to this bug. Also, if possible, please give the URL of the page causing the crash, or the URL of the main web site if it's a password-protected page. Thanks!
Patricia Warwick
Comment 2
2007-03-27 08:54:44 PDT
Created
attachment 13829
[details]
Safari Crash Log
Patricia Warwick
Comment 3
2007-03-27 08:56:41 PDT
The crash occurs when I reload this page:
http://patriciawarwick.com.hosting.domaindirect.com/viewmovie.html
Mark Rowe (bdash)
Comment 4
2007-03-27 09:46:53 PDT
I can reproduce this crash with ToT. With
r19968
I do not experience a crash, but after reloading the page then closing the tab Safari starts using a *huge* amount of memory. I killed it after it reached 1.5GB, but reproduced the behaviour a second time. I'm seeing what looks to be user-controlled data being dereferenced which triggers the crash. Patricia's crash log shows several instances of crashing while dereferencing 0x50504320 ("PPC ") and I have seen other values such as 0xf97223d1 or 0x56e58975. The two most common address I am seeing dereferenced causing the crash are 0xc0000000 and 0x00000020. I have also seen a third failure mode that appears at random: 2007-03-28 02:27:29.057 Safari[39550:117] *** -[NSConcreteMutableData errorForReason:]: selector not recognized [self = 0x187f6350] ASSERTION FAILED: Uncaught exception - *** -[NSConcreteMutableData errorForReason:]: selector not recognized [self = 0x187f6350] 0 (WebCore/platform/mac/BlockExceptions.mm:36 void ReportBlockedObjCException(NSException*)) Program received signal EXC_BAD_ACCESS, Could not access memory. Reason: KERN_INVALID_ADDRESS at address: 0xbbadbeef 0x0132a9b3 in ReportBlockedObjCException (localException=0x181a6a60) at WebCore/platform/mac/BlockExceptions.mm:36 36 ASSERT_WITH_MESSAGE(0, "Uncaught exception - %@", localException); (gdb)
Mark Rowe (bdash)
Comment 5
2007-03-27 09:48:35 PDT
<
rdar://problem/5091330
>
Mark Rowe (bdash)
Comment 6
2007-03-27 09:50:17 PDT
And after posting that comment I saw a crash at 0x65707954 ("epyT"). It seems like either we or the divx.com plugin are accessing memory which has been deallocated and repurposed :-(
Geoffrey Garen
Comment 7
2007-03-27 14:38:16 PDT
Changing title to focus on the most important issue -- the crash. Once we fix that, we should re-check for other problems.
Geoffrey Garen
Comment 8
2007-03-27 14:38:41 PDT
I'm working on this.
Geoffrey Garen
Comment 9
2007-03-27 15:15:16 PDT
You need the DivX player and plug-in to repro:
http://www.divx.com/divx/mac/
Geoffrey Garen
Comment 10
2007-03-27 15:22:29 PDT
Crash also happens if you just navigate away from the page.
Geoffrey Garen
Comment 11
2007-03-27 15:48:10 PDT
Ironically, MallocScribble fixes this crash, so be sure to turn it off!
Geoffrey Garen
Comment 12
2007-03-27 15:50:39 PDT
Strike that. It's NSZombieEnabled that fixes this crash.
Geoffrey Garen
Comment 13
2007-03-27 15:57:18 PDT
The interesting part of the crash is here: #0 0x0041a028 in -[WebBaseNetscapePluginView(WebNPPCallbacks) destroyStream:reason:] at WebBaseNetscapePluginView.mm:2323 #1 0x00426ebd in NPN_DestroyStream at npapi.m:110 #2 0x17907c38 in nsDivXBrowserPlugin::cancelDownload #3 0x17914d5f in DivXBrowserPlugin::Close #4 0x1791698a in DivXBrowserPlugin::~DivXBrowserPlugin #5 0x17909596 in nsDivXBrowserPlugin::~nsDivXBrowserPlugin #6 0x1790dc94 in Mozilla_NPP_Destroy #7 0x1790e06b in NPP_Destroy #8 0x0041ab90 in -[WebBaseNetscapePluginView(Internal) _destroyPlugin] at WebBaseNetscapePluginView.mm:2593 When we destroy the DivX plug-in, it cancels its plug-in stream. However, the stream it passes to WebKit to cancel is either already destroyed or complete garbage: p stream $10 = (NPStream *) 0x178b5b10 (gdb) p *(WebBaseNetscapePluginStream *)stream->ndata $11 = { { isa = 0x0 }, [...many more NULL fields...]
Geoffrey Garen
Comment 14
2007-03-27 16:16:27 PDT
According to MallocScribble, the NPStream* the plug-in passes to NPN_DestroyStream points to freed memory.
Geoffrey Garen
Comment 15
2007-03-27 17:35:30 PDT
gdb confirms that this is an already destroyed stream, not just random garbage: NPP_DestroyStream: 0x17794180, {pdata = 0x17554e70, ndata = 0x17794160, url = 0x17738d90 "
http://patriciawarwick.com.hosting.domaindirect.com/weaving.divx
", end = 52260076, lastmodified = 1175003384, notifyData = 0x1} NPN_DestroyStream: 0x17794180, {pdata = 0x55555555, ndata = 0x55555555, url = 0x55555555 <Address 0x55555555 out of bounds>, end = 1431655765, lastmodified = 1431655765, notifyData = 0x55555555} So, we notify the plug-in that we're destroying an NPStream (NPP_DestroyStream), and then we destroy it, and subsequently the plug-in asks us to destroy it again (NPN_DestroyStream). This seems like a plug-in bug that we'll have to work around.
Mark Rowe (bdash)
Comment 16
2007-03-27 17:39:01 PDT
Geoff, the initial report indicates this works correctly in Safari 2.0.4. I recall we had some fixes recently related to freeing plugin streams earlier to prevent holding on to their memory for too long -- perhaps the fix is incorrect, or alternatively the plugin could have been making bogus assumptions about the lifetime of the stream.
Geoffrey Garen
Comment 17
2007-03-27 19:16:00 PDT
Based on Anders' comments, I believe we never experienced this bug before because we never destroyed plug-in streams, period. That's why I think we didn't crash before. NPP_DestroyStream is the callback the browser is supposed to make to notify a plug-in that a stream is dead (
http://developer.mozilla.org/en/docs/Gecko_Plugin_API_Reference:Plug-in_Side_Plug-in_API
). The gdb log I posted above shows that the plug-in makes its NPN_DestroyStream call after the browser has called NPP_DestroyStream (it turns out to be long after, actually), so I think this bug is about a bogus assumption in the plug-in. Specifically, I think the plug-in's destructor unconditionally destroys its stream, without checking whether the stream has been destroyed already.
Geoffrey Garen
Comment 18
2007-03-27 19:21:23 PDT
Created
attachment 13839
[details]
patch This patch guards against bogus NPStreams in plug-in callbacks.
Darin Adler
Comment 19
2007-03-27 21:51:26 PDT
Comment on
attachment 13839
[details]
patch This fix is incorrect. There's no guarantee that we won't create a new stream with the same pointer value as an old stream. While this change greatly decreases the chance of the bug happening, it would be better to find a fix that fixes the problem completely. To do a correct fix, we need to understand more about exactly what the DivX plug-in is doing. Does it destroy the same stream twice? Does it destroy a stream after we destroy it? A correct fix for this will presumably involve keeping the NPStream objects around for a longer time. + char* filename = reinterpret_cast<char*>(malloc(bufSize)); These kinds of casts are done with static_cast, not reinterpret_cast. The malloc function returns a void* and you can and should cast from one of those without using reinterpret_cast.
Darin Adler
Comment 20
2007-03-27 22:12:53 PDT
(In reply to
comment #19
)
> (From update of
attachment 13839
[details]
[edit]) > This fix is incorrect. There's no guarantee that we won't create a new stream > with the same pointer value as an old stream. While this change greatly > decreases the chance of the bug happening, it would be better to find a fix > that fixes the problem completely. > > To do a correct fix, we need to understand more about exactly what the DivX > plug-in is doing. Does it destroy the same stream twice? Does it destroy a > stream after we destroy it? > > A correct fix for this will presumably involve keeping the NPStream objects > around for a longer time.
I'm sorry. I should have read the bug report more carefully before reviewing. Your comments indicate that you consider this a guard against incorrect code in the DivX plug-in. In that context, I suppose this fix might be acceptable. But I'd like to understand why this is not a problem in other browsers. If this is really a workaround for incorrect plug-in code, then it should write out a log message in debug versions. I'd also like to understand who destroyed the stream in the first place. Did we destroy the stream? Why? Was that correct? It's possible that failures are not supposed to destroy the stream, and it's our code that's incorrect. I'm suspicious that we don't yet fully understand what's happening. Protecting against use a bad pointer like this should be a last resort. Do we really know that this is a bug in the plug-in and not something we're doing wrong? A cleaner way to do the workaround would be to close streams, but not actually deallocate the stream object itself until the plug-in is deallocated. This would have the downside that the stream objects would leak while a plug-in was working, but the upside that it would not be susceptible to aliasing the way the attached fix is. By making a closed stream discard most of its state, the cost for each stream could be a single small memory block. Or possibly this is the correct semantic. It may be that we are destroying streams when really we should just be putting them in a closed state. It's possible that only the plug-in itself is supposed to actually be able to deallocate a stream by calling NPN_DestroyStream.
Geoffrey Garen
Comment 21
2007-03-27 23:11:02 PDT
> This fix is incorrect. There's no guarantee that we won't create a new stream with the same pointer value as an old stream.
Streams are per-plug-in. So, in the unlikely aliasing case, a buggy plug-in would just end up canceling one of its own streams. We could go to great lengths to accomodate that sort of bug, but I don't know of any real-world compatibility problems that would solve. I'm sympathetic to your desire to make this patch perfect. I have that desire too. I'm just not sure it's the right thing to do here.
> Does it destroy the same stream twice?
No.
> Does it destroy a stream after we destroy it?
Yes.
> I'd like to understand why this is not a problem in other browsers.
I didn't see an obvious answer to this question in the FF source code. It could be random chance regarding when other browsers destroy plug-in streams. It's also possible that other browsers simply don't implement NPN_DestroyStream. I know of at least two other browsers that don't. A simpler solution to this bug might be to remove our implementation of NPN_DestroyStream. I went back through SVN history, and it doesn't look like it was added for any specific reason. I didn't go that route because I thought it was a bit of a blind leap, but it's an option.
> If this is really a workaround for incorrect plug-in code, then it should write out a log message in debug versions.
Sounds good.
> Did we destroy the stream?
Yes.
> Why?
Because the user navigated away from the page, we canceled all loaders.
> Was that correct?
That depends. There's no documentation saying it's incorrect. Certainly the browser needs to cancel loaders at some time. Page tear-down seems like a reasonable time. We could go out of our way to destroy all plug-ins before canceling loaders. That would be another work-around, but I think it would be hard to keep that sort of dependency in sync.
> It's possible that failures are not supposed to destroy the stream, and it's our code that's incorrect.
The Mozilla documentation says this: "The browser allocates and initializes the NPStream object... The browser cannot delete the object until after it calls NPP_DestroyStream or the plug-in calls NPN_DestroyStream... The browser informs the plug-in when the stream is about to be deleted through NPP_DestroyStream, after which the NPStream object is no longer valid." This is a browser-created stream. My reading of the documentation is that either the plug-in or the browser can decide when the stream is no longer needed, but it's ultimately up to the browser to decide when to free the memory. According to gdb, we correctly call NPP_DestroyStream to notify that plug-in that the stream is being destroyed. That's why I think the plug-in's later use of the stream is an error.
> Do we really know that this is a bug in the plug-in and not something we're doing wrong?
Yes. A plug-in should not use a stream after it receives the NPP_DestroyStream callback.
> A cleaner way to do the workaround would be to close streams, but not actually deallocate the stream object itself until the plug-in is deallocated.
I'm not sure I would call this implementation "cleaner." I would call it more complicated, with the benefit of guaranteeing that a buggy plug-in couldn't accidentally cancel one of its own streams. Do you think the benefit is worth the cost?
> Or possibly this is the correct semantic. It may be that we are destroying streams when really we should just be putting them in a closed state.
I don't think the documentation supports that claim.
> It's possible that only the plug-in itself is supposed to actually be able to deallocate a stream by calling NPN_DestroyStream.
I don't think the documentation supports that claim, either.
Darin Adler
Comment 22
2007-03-28 09:45:22 PDT
> > I'd like to understand why this is not a problem in other browsers. > > I didn't see an obvious answer to this question in the FF source code. It could > be random chance regarding when other browsers destroy plug-in streams.
What you call "random chance" here could be the essence of the real problem.
> A simpler solution to this bug might be to remove our implementation of > NPN_DestroyStream. I went back through SVN history, and it doesn't look like it > was added for any specific reason. I didn't go that route because I thought it > was a bit of a blind leap, but it's an option.
Seems worth considering.
> Because the user navigated away from the page, we canceled all loaders.
Do other browsers destroy streams before destroying plug-ins when navigating away from a page?
> > Was that correct? > > That depends. There's no documentation saying it's incorrect. Certainly the > browser needs to cancel loaders at some time. Page tear-down seems like a > reasonable time. We could go out of our way to destroy all plug-ins before > canceling loaders. That would be another work-around, but I think it would be > hard to keep that sort of dependency in sync.
I don't agree with your assessment here. It doesn't sound hard to destroy the plug-ins before destroying the plug-in streams.
> > Do we really know that this is a bug in the plug-in and not something we're doing wrong? > > Yes. A plug-in should not use a stream after it receives the NPP_DestroyStream > callback.
But perhaps there's an implicit contract that the browser won't do this. And perhaps no other browser does it. In which case I'm not sure I'd call this a bug in the plug-in.
> > A cleaner way to do the workaround would be to close streams, but not actually deallocate the stream object itself until the plug-in is deallocated. > > I'm not sure I would call this implementation "cleaner." I would call it more > complicated, with the benefit of guaranteeing that a buggy plug-in couldn't > accidentally cancel one of its own streams. Do you think the benefit is worth > the cost?
I don't agree with your assessment of the cost. I think having a notion of a closed stream is very simple to implement. I strongly prefer a solution that does not involve a pointer validity check.
Geoffrey Garen
Comment 23
2007-03-28 10:55:25 PDT
Anders found this comment in the Mozilla source code, which seems to confirm that their behavior truly is random. It looks like Firefox may never release browser-created streams: 1180 // FIXME:
http://bugzilla.mozilla.org/show_bug.cgi?id=240131
1181 // 1182 // Is it ok to leave pstream->ndata set here, and who releases it 1183 // (or is it even properly ref counted)? And who closes the stream 1184 // etc? (
http://lxr.mozilla.org/seamonkey/source/modules/plugin/base/src/ns4xPlugin.cpp
)
Geoffrey Garen
Comment 24
2007-03-28 13:29:34 PDT
After further reflection, I don't think we want to eliminate NPN_DestoryStream. It's an important networking optimization in the case of a plug-in that wants to do its own loading. The plug-in first calls NPN_DestroyStream to cancel the browser's loading. In that case, we don't want to continue the browser's loading, slowing down the real load.
Geoffrey Garen
Comment 25
2007-03-28 13:32:24 PDT
If we decide to keep NPStream objects around for the lifetime of a plug-in, I think we need to answer the question of what to do in the case of, e.g., a Flash-based email client, which might create hundreds or even thousands of streams in one sitting. Do we leak all that memory? Cut it off at some limit? Use a white list or black list?
Geoffrey Garen
Comment 26
2007-03-28 16:44:06 PDT
> I don't agree with your assessment here. It doesn't sound hard to destroy the > plug-ins before destroying the plug-in streams.
That seems really non-trivial to me. We can't make stream destruction destroy a plug-in because it's normal for a plug-in to outlast a stream. We can't make the code that destroys all streams destroy all plug-ins, since we wouldn't want stopping a page load, for example, to have the side-efect of unloading the page's plug-ins. It seems to me that we would need to re-order the loader state machine to wait until a page had torn down before canceling its loaders. In addition to being very tricky, that solution would degrade networking performance in the case of a navigation that interrupted another navigation, since both navigations would remain live until the new navigation became committed.
Geoffrey Garen
Comment 27
2007-03-28 17:26:06 PDT
Created
attachment 13855
[details]
patch This patch eschews an "Is this stream valid?" check in favor of a "Does this stream belong to the plug-in using it?" check. So it avoids introducing a function that can create subtle bugs by promising to do more than it actually can, even though It doesn't do much more in the way of guarding against bad behavior.
Darin Adler
Comment 28
2007-03-28 17:40:25 PDT
Comment on
attachment 13855
[details]
patch You wore me down, man. r=me
Geoffrey Garen
Comment 29
2007-03-29 08:35:04 PDT
Committed revision 20571.
Patricia Warwick
Comment 30
2007-03-29 11:47:33 PDT
It is not totally clear from the Mark's comments about the huge amounts of memory being used ... did that apply to Webkit or Safari? The person for whom I was developing this page tested it with Safari and reported that it caused his system to come to a halt. Could that be a related bug in Safari?
Mark Rowe (bdash)
Comment 31
2007-03-29 12:05:01 PDT
Patricia, my comment was about
r19968
of WebKit. It'll be worth retesting this site with Geoff's fix to see if the large memory increase is still present. If it is, that should be filed in a new bug report.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug