Bug 222943

Summary: REGRESSION(r272469): [WPE][GTK] Epiphany UI process crashes when downloading PDFs, WebKitSecurityOrigin should use SecurityOriginData
Product: WebKit Reporter: Michael Catanzaro <mcatanzaro>
Component: WebCore Misc.Assignee: Michael Catanzaro <mcatanzaro>
Status: RESOLVED FIXED    
Severity: Normal CC: achristensen, aperez, berto, bugs-noreply, cgarcia, ews-watchlist, gustavo, mcatanzaro, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: PC   
OS: Linux   
See Also: https://bugs.webkit.org/show_bug.cgi?id=162254
https://bugs.webkit.org/show_bug.cgi?id=223077
https://bugs.webkit.org/show_bug.cgi?id=223140
Attachments:
Description Flags
Patch
none
Patch none

Description Michael Catanzaro 2021-03-08 15:08:20 PST
Since r272469, the Epiphany UI process crashes when downloading any PDF. webkit_security_origin_new_for_uri() returns NULL for ephy-pdf: URIs, but the return value is not nullable, so Epiphany correctly dereferences it and crashes. This probably indicates a GTK API layer bug, but let's focus on the regression first.

"""
In order to allow things like web extensions to continue to work, we allow non-null origins for schemes for which a WKURLSchemeHandler has been registered. We learned this lesson 4 years ago when we tried this change. This also makes sense conceptually because those schemes will be handled by the containing application, so they can be an "origin" for a page.
"""

Epiphany registers ephy-pdf: using webkit_web_context_register_uri_scheme(), so in theory it ought to work, but in fact it only works in the web process because r272469 checks the scheme using LegacySchemeRegistry. But the scheme is registered with LegacySchemeRegistry only in the web process, not in the UI process! r272469 made WebPage::registerURLSchemeHandler register all modern scheme handlers with LegacySchemeRegistry, but WebPageProxy::setURLSchemeHandlerForScheme does not itself do so.

This patch fixes the crashes:

diff --git a/Source/WebKit/UIProcess/WebPageProxy.cpp b/Source/WebKit/UIProcess/WebPageProxy.cpp
index 8a1d3bd188b6..8a906a43f744 100644
--- a/Source/WebKit/UIProcess/WebPageProxy.cpp
+++ b/Source/WebKit/UIProcess/WebPageProxy.cpp
@@ -162,6 +162,7 @@
 #include <WebCore/FrameLoader.h>
 #include <WebCore/GlobalFrameIdentifier.h>
 #include <WebCore/GlobalWindowIdentifier.h>
+#include <WebCore/LegacySchemeRegistry.h>
 #include <WebCore/LengthBox.h>
 #include <WebCore/MIMETypeRegistry.h>
 #include <WebCore/MediaStreamRequest.h>
@@ -9549,6 +9550,9 @@ void WebPageProxy::setURLSchemeHandlerForScheme(Ref<WebURLSchemeHandler>&& handl
     ASSERT_UNUSED(identifierResult, identifierResult.isNewEntry);
 
     send(Messages::WebPage::RegisterURLSchemeHandler(identifier, canonicalizedScheme.value()));
+
+    WebCore::LegacySchemeRegistry::registerURLSchemeAsHandledBySchemeHandler(scheme);
+    WebCore::LegacySchemeRegistry::registerURLSchemeAsCORSEnabled(scheme);
 }
 
 WebURLSchemeHandler* WebPageProxy::urlSchemeHandlerForScheme(const String& scheme)


I added the call to registerURLSchemeAsCORSEnabled simply because that is what WebPage::registerURLSchemeHandler does (again, since r272469). I'm not sure if this is the direction we want to take here though. Hi Alex, what do you think?

TODO: I need to write a GLib API test for this, since it should be easy to do so.
Comment 1 Alex Christensen 2021-03-09 12:16:29 PST
I think I'd prefer you put it in webkit_web_context_register_uri_scheme and webkitWebContextCreatePageForWebView (which should probably have their common code put in a common place) unless we find that this is also needed in cocoa platforms.
Comment 2 Michael Catanzaro 2021-03-09 13:27:31 PST
Hm, well we could do that, sure... but that seems pretty fragile? Both for Cocoa platforms -- it seems like a potentially-serious footgun that SecurityOrigin may return different results for the same security origin depending on which process it's used from -- and for GLib platforms, since the code moves an implementation detail up to the GLib API layer. (If we want it to be platform-specific, I would even prefer #ifdefs in WebPageProxy rather than putting it in WebKitWebContext.cpp. Why should WebKitWebContext know about LegacySchemeRegistry?)

Anyway, I'll implement it this way if that's really what you prefer. Just seems riskier to me.
Comment 3 Michael Catanzaro 2021-03-10 06:17:40 PST
I think I'll put this in WebPageProxy, since WebKitWebContext is just not the right place for it. But I'll use PLATFORM(GTK) || PLATFORM(WPE) conditionals. I think it would be better without the platform conditionals, but Alex doesn't want to do this for Cocoa.
Comment 4 Alex Christensen 2021-03-10 08:18:06 PST
You may have also noticed in r272469 that we are using SecurityOriginData instead of SecurityOrigin in a few more places.  We have a long term goal of using SecurityOrigin only in the web process (or in the network process in the context of a web process or page) so that we can get rid of LegacySchemeRegistry in the coming decade or so.  We have made API::SecurityOrigin a wrapper of WebCore::SecurityOriginData instead of WebCore::SecurityOrigin.  You should consider doing the same for webkitSecurityOriginCreate and webkitSecurityOriginGetSecurityOrigin.  I think it's a bad idea to increase the use of LegacySchemeRegistry, and I've done so only reluctantly, realizing it increases the amount of work to remove it.
Comment 5 Michael Catanzaro 2021-03-10 11:08:33 PST
OK.

(In reply to Alex Christensen from comment #4)
> We have made
> API::SecurityOrigin a wrapper of WebCore::SecurityOriginData instead of
> WebCore::SecurityOrigin.  You should consider doing the same for
> webkitSecurityOriginCreate and webkitSecurityOriginGetSecurityOrigin.

If it works for API::SecurityOrigin, it will probably also work for WebKitSecurityOrigin. I will try.
Comment 6 Michael Catanzaro 2021-03-10 15:18:48 PST
It works!

(In reply to Michael Catanzaro from comment #0)
> Since r272469, the Epiphany UI process crashes when downloading any PDF.
> webkit_security_origin_new_for_uri() returns NULL for ephy-pdf: URIs, but
> the return value is not nullable, so Epiphany correctly dereferences it and
> crashes.

BTW this was wrong, webkit_security_origin_new_for_uri() never returns NULL. Epiphany was really crashing because webkit_security_origin_get_protocol() was unexpectedly returning NULL.
Comment 7 Michael Catanzaro 2021-03-10 15:30:25 PST
Created attachment 422882 [details]
Patch
Comment 8 EWS Watchlist 2021-03-10 15:30:59 PST
Thanks for the patch. If this patch contains new public API please make sure it follows the guidelines for new WebKit2 GTK+ API. See https://trac.webkit.org/wiki/WebKitGTK/AddingNewWebKit2API
Comment 9 Michael Catanzaro 2021-03-10 15:32:54 PST
Created attachment 422883 [details]
Patch
Comment 10 Michael Catanzaro 2021-03-10 15:35:10 PST
Comment on attachment 422883 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=422883&action=review

> Source/WebKit/UIProcess/API/glib/WebKitSecurityOrigin.cpp:234
> + * Deprecated: 2.32

I know deprecating APIs three days before GNOME's hard code freeze is not cool. If undesired, I could change this to 2.34 and we could just revert r272469 on the stable branch. I think it's safe, though, since Epiphany is the only app in Debian using this API, and it won't break.
Comment 11 EWS 2021-03-11 01:09:25 PST
Committed r274270: <https://commits.webkit.org/r274270>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 422883 [details].
Comment 12 Radar WebKit Bug Importer 2021-03-11 01:11:14 PST
<rdar://problem/75306566>
Comment 13 Adrian Perez 2021-03-11 01:47:05 PST
Comment on attachment 422883 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=422883&action=review

> Source/WebKit/UIProcess/API/glib/WebKitSecurityOrigin.cpp:34
> + * hostname, and an optional port number.

It would be more correct to write “consists of a scheme and an
optional authority, where the authority is typically a hostname,
though it may contain user information and a port number as well”.

Otherwise we may end up with people confused because they have CORS
failures due to using e.g. different port numbers so it would be
a good thing to aim for completeness here—though maybe my suggested
wording can be further improved.
Comment 14 Michael Catanzaro 2021-03-11 06:33:22 PST
(In reply to Adrian Perez from comment #13) 
> It would be more correct to write “consists of a scheme and an
> optional authority, where the authority is typically a hostname,
> though it may contain user information and a port number as well”.

It cannot contain username, though. Only <protocol, host, port>.

Scheme == protocol

Authority == host + optional port

So I don't see the difference? Except that I know what host and port are, but had to use DuckDuckGo to figure out for sure what "authority" meant. :P

> Otherwise we may end up with people confused because they have CORS
> failures due to using e.g. different port numbers so it would be
> a good thing to aim for completeness here—though maybe my suggested
> wording can be further improved.

Ah, you want me to clarify that two security origins with different ports are truly different? Or that no port means default port? I guess neither is entirely clear from the documentation.
Comment 15 Adrian Perez 2021-03-11 09:05:45 PST
(In reply to Michael Catanzaro from comment #14)
> (In reply to Adrian Perez from comment #13) 
> > It would be more correct to write “consists of a scheme and an
> > optional authority, where the authority is typically a hostname,
> > though it may contain user information and a port number as well”.
> 
> It cannot contain username, though. Only <protocol, host, port>.
> 
> Scheme == protocol
> 
> Authority == host + optional port
> 
> So I don't see the difference? Except that I know what host and port are,
> but had to use DuckDuckGo to figure out for sure what "authority" meant. :P

Okay, I've alwayts thought that the userinfo part of an URI was taken
as a component of a security origin, but thanks for making me doubt…
Reading https://developer.mozilla.org/en-US/docs/Web/Security/Same-origin_policy
it turns out that you are right, and mentioning <scheme, host, port> is
the correct thing.

And it's true that usually a scheme corresponds to a protocol, but it may
not be true, for example one could register a custom URI scheme handler
that loads files from disk, and it kind of would behave like “file://”
but as the scheme names are different they are always in different security
origins even if the protocol could be argued that is the same (“protocol”
here being “load files from disk”) — dunno, maybe I am getting a bit
philosophical here 🙃️

> > Otherwise we may end up with people confused because they have CORS
> > failures due to using e.g. different port numbers so it would be
> > a good thing to aim for completeness here—though maybe my suggested
> > wording can be further improved.
> 
> Ah, you want me to clarify that two security origins with different ports
> are truly different? Or that no port means default port? I guess neither is
> entirely clear from the documentation.

Yes, those both would be good clarifications 👍️
Comment 16 Adrian Perez 2021-03-11 09:09:46 PST
Also I think this patch introduced the following documentation build
failure, because the opening angle brackets should have been escaped
as “&lt;” (yes, it's horrendous to have all this SGML/XML late-90s
nightmare that ios Gtk-Doc, but it's what we have for now 🤷️):

html/webkitdomgtk-4.0/webkitdo/home/aperez/devel/WebKit/build-gtk/Documentation/webkit2gtk-4.0/xml/WebKitSecurityOrigin.xml:271: parser error : error parsing attribute name
wrapper around a <protocol, host, port> triplet, and no longer
                          ^
Comment 17 Michael Catanzaro 2021-03-11 09:13:42 PST
Alas, I'll land a follow-up.