We're getting downstream reports of a crash in NPN_InvokeDefault [1] [2]. The first recorded crash is with 2.5.3, but that's the first 2.5 series release that was packaged for Fedora.
So far, everyone who provided a comment says the crash happened on extensions.gnome.org:
"I tried to upgrade the World Clock extension at extensions.gnome.org using GNOME Web."
"I surfed on extensions.gnome.org using epiphany."
From the detailed backtrace [3], it looks like the first user to report the bug was also on extensions.gnome.org.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1147314
[2] https://retrace.fedoraproject.org/faf/reports/427972/
[3] https://bugzilla.redhat.com/attachment.cgi?id=942098
I'm checking this, and I still think it's a bug in the plugin calling invokeDefault with a null NPObject, but we shouldn't trust plugins anyway. I think we should protect all public plugins api. Firefox seems to do something like this in invokeDefault, for example:
if (!npp || !npobj || !npobj->_class || !npobj->_class->invokeDefault)
return false;
I think we could do something similar to what we do in retainNPObject, releaseNPObject, etc. Add ASSERTS to catch those issues in Debug builds, but also early returns to not crash in Release.
Since plug-ins are a cross browser API we need to be consistent with other browsers about how we handle null. I don’t think an assert is appropriate if this value is allowed. I think it’s fine to tolerate nulls for these arguments if the other browsers do.
I think the language about “trusting” plug-ins isn’t really quite right. If a plug-in does something wrong there’s not much we can do about it. Whether we “trust” it or not doesn’t matter.
On the other hand, if other browsers tolerate null but we dereference the null and crash, then there’s a real interoperability problem and we should fix to be consistent.
(In reply to comment #7)
> Still getting reports of this with latest GNOME Shell and WebKitGTK+.
3621 reports of this in Fedora so far. This one's old; reports go back to 2013, so it's not a recent regression.
Comment on attachment 293142[details]
Patch
View in context: https://bugs.webkit.org/attachment.cgi?id=293142&action=review
Thanks very much for working on this, it will cut my downstream bug queue by about a quarter.
Unfortunately it broke macOS tests, which will need to be investigated:
Regressions: Unexpected text-only failures (1)
plugins/browser-funcs-invalid-args.html [ Failure ]
Regressions: Unexpected timeouts (1)
plugins/npruntime/npruntime-calls-with-null-npp.html [ Timeout ]
> Source/WebKit2/ChangeLog:9
> + the same approach than Firefox.
than -> as
(In reply to comment #11)
> Comment on attachment 293142[details]
> Patch
>
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=293142&action=review
>
> Thanks very much for working on this, it will cut my downstream bug queue by
> about a quarter.
Note that this is not enough, the bug in in the plugin.
> Unfortunately it broke macOS tests, which will need to be investigated:
There's nothing to investigate, as I said the test needs to be added to xcode, I knew it was not going to work.
> Regressions: Unexpected text-only failures (1)
> plugins/browser-funcs-invalid-args.html [ Failure ]
>
> Regressions: Unexpected timeouts (1)
> plugins/npruntime/npruntime-calls-with-null-npp.html [ Timeout ]
>
> > Source/WebKit2/ChangeLog:9
> > + the same approach than Firefox.
>
> than -> as
English is hard.
(In reply to comment #12)
> (In reply to comment #11)
> > Unfortunately it broke macOS tests, which will need to be investigated:
>
> There's nothing to investigate, as I said the test needs to be added to
> xcode, I knew it was not going to work.
I’d like to help or get someone to help you. I don’t understand, though.
I believe you are pointing out that BrowserFunctionsInvalidArguments.cpp needs to be added to an Xcode project, and someone who develops on a Mac should help you add that to your patch.
But how does this cause the browser-funcs-invalid-args.html and npruntime-calls-with-null-npp.html tests to fail?
(In reply to comment #13)
> (In reply to comment #12)
> > (In reply to comment #11)
> > > Unfortunately it broke macOS tests, which will need to be investigated:
> >
> > There's nothing to investigate, as I said the test needs to be added to
> > xcode, I knew it was not going to work.
>
> I’d like to help or get someone to help you. I don’t understand, though.
Thanks.
> I believe you are pointing out that BrowserFunctionsInvalidArguments.cpp
> needs to be added to an Xcode project, and someone who develops on a Mac
> should help you add that to your patch.
Right.
> But how does this cause the browser-funcs-invalid-args.html and
> npruntime-calls-with-null-npp.html tests to fail?
browser-funcs-invalid-args.html is added in this patch and requires BrowserFunctionsInvalidArguments.cpp to work. I didn't realize npruntime-calls-with-null-npp.html was also failing, Michael is right and I need to investigate that one. But in any case I need BrowserFunctionsInvalidArguments.cpp to be built on mac.
(In reply to comment #14)
> And anyway: do we still want this?
Yes, gnome-shell plugin is not the only plugin in the world, if a plugin calls a browser funcs with a npobject we shouldn't crash. Also for browsers interoperability as Darin said in comment #5, that's the reason why I added the early returns in the very same places that firefox.
(In reply to comment #11)
> Regressions: Unexpected timeouts (1)
> plugins/npruntime/npruntime-calls-with-null-npp.html [ Timeout ]
Ok, this is timing out because my patch changed the behavior. In current code we handle null npp in browser funcs, but in a slightly different way. NetscapePlugin::fromNPP() returns nullptr if the given instance is nullptr, and then PluginDestructionProtector does nothing if the instance is nullptr. So we end up actually calling the browser function on the given npobject even when the plugin instance is nullptr. With the new patch, we return early if npp is nullptr and the function returns false, which is not expected by npruntime-calls-with-null-npp.html. When fixing this I noticed that firefox always returns early from all the browser funcs when npp is nullptr, and I planned to check this also in WebKit in a follow up patch. I will update the test to make it work after this patch, and then I'll do the same for all other methods to keep consistency.
I also noticed that I should probably move browser-funcs-invalid-args.html to the npruntime subdir as well.
Created attachment 293340[details]
Updated patch
NOTE: still needs the xcode changes, so npruntime-calls-with-null-npobject.html is going to fail for sure in mac EWS.
Renamed the test as npruntime-calls-with-null-npobject.html for consistency with npruntime-calls-with-null-npp.html. Also updated the latter because the behavior changed.
Created attachment 293342[details]
Archive of layout-test-results from ews102 for mac-yosemite
The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews102 Port: mac-yosemite Platform: Mac OS X 10.10.5
Created attachment 293343[details]
Archive of layout-test-results from ews107 for mac-yosemite-wk2
The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews107 Port: mac-yosemite-wk2 Platform: Mac OS X 10.10.5
Created attachment 293344[details]
Archive of layout-test-results from ews113 for mac-yosemite
The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews113 Port: mac-yosemite Platform: Mac OS X 10.10.5
Attachment 293400[details] did not pass style-queue:
ERROR: Source/WebCore/bridge/NP_jsobject.cpp:168: _NPN_InvokeDefault is incorrectly named. Don't use underscores in your identifier names. [readability/naming/underscores] [4]
ERROR: Source/WebCore/bridge/NP_jsobject.cpp:290: _NPN_GetProperty is incorrectly named. Don't use underscores in your identifier names. [readability/naming/underscores] [4]
ERROR: Source/WebCore/bridge/NP_jsobject.cpp:328: _NPN_SetProperty is incorrectly named. Don't use underscores in your identifier names. [readability/naming/underscores] [4]
ERROR: Source/WebCore/bridge/NP_jsobject.cpp:360: _NPN_RemoveProperty is incorrectly named. Don't use underscores in your identifier names. [readability/naming/underscores] [4]
ERROR: Source/WebCore/bridge/NP_jsobject.cpp:403: _NPN_HasProperty is incorrectly named. Don't use underscores in your identifier names. [readability/naming/underscores] [4]
ERROR: Source/WebCore/bridge/NP_jsobject.cpp:436: _NPN_HasMethod is incorrectly named. Don't use underscores in your identifier names. [readability/naming/underscores] [4]
ERROR: Source/WebCore/bridge/NP_jsobject.cpp:473: _NPN_Enumerate is incorrectly named. Don't use underscores in your identifier names. [readability/naming/underscores] [4]
ERROR: Source/WebCore/bridge/NP_jsobject.cpp:517: _NPN_Construct is incorrectly named. Don't use underscores in your identifier names. [readability/naming/underscores] [4]
Total errors found: 8 in 16 files
If any of these errors are false positives, please file a bug against check-webkit-style.
Comment on attachment 293400[details]
Updated patch
View in context: https://bugs.webkit.org/attachment.cgi?id=293400&action=review> Source/WebCore/bridge/NP_jsobject.cpp:171
> + if (!npp || !o || !o->_class || !o->_class->invokeDefault)
> + return false;
This is a change in behavior. Before we would return true with a void value in result when the invokeDefault pointer was null. I don’t see any test coverage for this change in behavior.
> Source/WebCore/bridge/NP_jsobject.cpp:173
> if (o->_class == NPScriptObjectClass) {
This is broken now, because invokeDefault is null in NPScriptObjectClass, so this code will never run. No test coverage?
> Source/WebCore/bridge/NP_jsobject.cpp:213
> + if (!npp || !o || !o->_class || !o->_class->invoke)
> + return false;
This is a change in behavior. Before we would return true with a void value for result when the invoke pointer was null. I don’t see any test coverage for this change in behavior.
> Source/WebCore/bridge/NP_jsobject.cpp:215
> if (o->_class == NPScriptObjectClass) {
Also broken like the other case above.
> Source/WebCore/bridge/NP_jsobject.cpp:293
> + if (!npp || !o || !o->_class || !o->_class->getProperty)
> + return false;
This is a change in behavior. Before we would set result to a void value when the getProperty pointer was null. Maybe we should test this?
> Source/WebCore/bridge/NP_jsobject.cpp:295
> if (o->_class == NPScriptObjectClass) {
Also broken like the other cases above.
> Source/WebCore/bridge/NP_jsobject.cpp:322
> - if (o->_class->hasProperty && o->_class->getProperty) {
> - if (o->_class->hasProperty(o, propertyName))
> - return o->_class->getProperty(o, propertyName, variant);
> - return false;
> - }
> + if (o->_class->hasProperty(o, propertyName))
> + return o->_class->getProperty(o, propertyName, variant);
Why is it OK to drop the check of o->_class->hasProperty for null? Do we have a test case covering that?
> Source/WebCore/bridge/NP_jsobject.cpp:333
> if (o->_class == NPScriptObjectClass) {
Also broken like the other cases above.
> Source/WebCore/bridge/NP_jsobject.cpp:408
> if (o->_class == NPScriptObjectClass) {
Also broken like the other cases above.
> Source/WebCore/bridge/NP_jsobject.cpp:441
> if (o->_class == NPScriptObjectClass) {
Also broken like the other cases above.
> Source/WebCore/bridge/NP_jsobject.cpp:522
> if (o->_class == NPScriptObjectClass) {
Also broken like the other cases above.
(In reply to comment #28)
> Comment on attachment 293400[details]
> Updated patch
>
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=293400&action=review
>
> > Source/WebCore/bridge/NP_jsobject.cpp:171
> > + if (!npp || !o || !o->_class || !o->_class->invokeDefault)
> > + return false;
>
> This is a change in behavior. Before we would return true with a void value
> in result when the invokeDefault pointer was null. I don’t see any test
> coverage for this change in behavior.
Yes, this is not covered by tests, I'm afraid, I just copied the early return condition from firefox, but only null npp and null npobject is covered by the tests (I think, I'll check anyway).
> > Source/WebCore/bridge/NP_jsobject.cpp:173
> > if (o->_class == NPScriptObjectClass) {
>
> This is broken now, because invokeDefault is null in NPScriptObjectClass, so
> this code will never run. No test coverage?
Ah, I didn't notice that.
> > Source/WebCore/bridge/NP_jsobject.cpp:213
> > + if (!npp || !o || !o->_class || !o->_class->invoke)
> > + return false;
>
> This is a change in behavior. Before we would return true with a void value
> for result when the invoke pointer was null. I don’t see any test coverage
> for this change in behavior.
>
> > Source/WebCore/bridge/NP_jsobject.cpp:215
> > if (o->_class == NPScriptObjectClass) {
>
> Also broken like the other case above.
>
> > Source/WebCore/bridge/NP_jsobject.cpp:293
> > + if (!npp || !o || !o->_class || !o->_class->getProperty)
> > + return false;
>
> This is a change in behavior. Before we would set result to a void value
> when the getProperty pointer was null. Maybe we should test this?
>
> > Source/WebCore/bridge/NP_jsobject.cpp:295
> > if (o->_class == NPScriptObjectClass) {
>
> Also broken like the other cases above.
>
> > Source/WebCore/bridge/NP_jsobject.cpp:322
> > - if (o->_class->hasProperty && o->_class->getProperty) {
> > - if (o->_class->hasProperty(o, propertyName))
> > - return o->_class->getProperty(o, propertyName, variant);
> > - return false;
> > - }
> > + if (o->_class->hasProperty(o, propertyName))
> > + return o->_class->getProperty(o, propertyName, variant);
>
> Why is it OK to drop the check of o->_class->hasProperty for null? Do we
> have a test case covering that?
I went too fast and thought this was only checking getProperty and removed the if because getProperty was already null checked before.
> > Source/WebCore/bridge/NP_jsobject.cpp:333
> > if (o->_class == NPScriptObjectClass) {
>
> Also broken like the other cases above.
>
> > Source/WebCore/bridge/NP_jsobject.cpp:408
> > if (o->_class == NPScriptObjectClass) {
>
> Also broken like the other cases above.
>
> > Source/WebCore/bridge/NP_jsobject.cpp:441
> > if (o->_class == NPScriptObjectClass) {
>
> Also broken like the other cases above.
>
> > Source/WebCore/bridge/NP_jsobject.cpp:522
> > if (o->_class == NPScriptObjectClass) {
>
> Also broken like the other cases above.
I'll update the patch.
Attachment 293405[details] did not pass style-queue:
ERROR: Source/WebCore/bridge/NP_jsobject.cpp:168: _NPN_InvokeDefault is incorrectly named. Don't use underscores in your identifier names. [readability/naming/underscores] [4]
ERROR: Source/WebCore/bridge/NP_jsobject.cpp:297: _NPN_GetProperty is incorrectly named. Don't use underscores in your identifier names. [readability/naming/underscores] [4]
ERROR: Source/WebCore/bridge/NP_jsobject.cpp:338: _NPN_SetProperty is incorrectly named. Don't use underscores in your identifier names. [readability/naming/underscores] [4]
ERROR: Source/WebCore/bridge/NP_jsobject.cpp:373: _NPN_RemoveProperty is incorrectly named. Don't use underscores in your identifier names. [readability/naming/underscores] [4]
ERROR: Source/WebCore/bridge/NP_jsobject.cpp:416: _NPN_HasProperty is incorrectly named. Don't use underscores in your identifier names. [readability/naming/underscores] [4]
ERROR: Source/WebCore/bridge/NP_jsobject.cpp:452: _NPN_HasMethod is incorrectly named. Don't use underscores in your identifier names. [readability/naming/underscores] [4]
ERROR: Source/WebCore/bridge/NP_jsobject.cpp:492: _NPN_Enumerate is incorrectly named. Don't use underscores in your identifier names. [readability/naming/underscores] [4]
ERROR: Source/WebCore/bridge/NP_jsobject.cpp:533: _NPN_Construct is incorrectly named. Don't use underscores in your identifier names. [readability/naming/underscores] [4]
Total errors found: 8 in 16 files
If any of these errors are false positives, please file a bug against check-webkit-style.
Created attachment 293408[details]
Archive of layout-test-results from ews103 for mac-yosemite
The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews103 Port: mac-yosemite Platform: Mac OS X 10.10.5
Created attachment 293409[details]
Archive of layout-test-results from ews116 for mac-yosemite
The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews116 Port: mac-yosemite Platform: Mac OS X 10.10.5
Created attachment 293410[details]
Archive of layout-test-results from ews106 for mac-yosemite-wk2
The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews106 Port: mac-yosemite-wk2 Platform: Mac OS X 10.10.5
I don't understand why plugins/npruntime/npruntime-calls-with-null-npp.html still fails in WebKit1, according to the results the methods are returning true in case of null NPP.
Still happening as of 2.16.1:
Truncated backtrace:
Thread no. 1 (10 frames)
#0 WebKit::NPN_InvokeDefault at /usr/src/debug/webkitgtk-2.16.1/Source/WebKit2/WebProcess/Plugins/Netscape/NetscapeBrowserFuncs.cpp:689
#1 on_shell_signal at browser-plugin.c:179
#2 ffi_call_unix64 at ../src/x86/unix64.S:76
#3 ffi_call at ../src/x86/ffi64.c:525
#4 g_cclosure_marshal_generic at gclosure.c:1490
#9 on_signal_received at gdbusproxy.c:917
#10 emit_signal_instance_in_idle_cb at gdbusconnection.c:3705
#16 WTF::RunLoop::run at /usr/src/debug/webkitgtk-2.16.1/Source/WTF/wtf/glib/RunLoopGLib.cpp:94
#17 WebKit::ChildProcessMain<WebKit::PluginProcess, WebKit::PluginProcessMain> at /usr/src/debug/webkitgtk-2.16.1/Source/WebKit2/Shared/unix/ChildProcessMain.h:61
#19 _start
2016-10-28 04:12 PDT, Carlos Garcia Campos
2016-10-30 02:25 PDT, Carlos Garcia Campos
2016-10-30 03:26 PDT, Build Bot
2016-10-30 03:32 PDT, Build Bot
2016-10-30 03:41 PDT, Build Bot
2016-10-31 01:01 PDT, Carlos Garcia Campos
2016-10-31 01:47 PDT, Carlos Garcia Campos
2016-10-31 02:39 PDT, Build Bot
2016-10-31 02:54 PDT, Build Bot
2016-10-31 02:55 PDT, Build Bot
2018-03-28 10:17 PDT, Michael Catanzaro