Bug 205157 - Ignore URL host for schemes that are not using host information
Summary: Ignore URL host for schemes that are not using host information
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Page Loading (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: youenn fablet
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2019-12-12 04:47 PST by youenn fablet
Modified: 2019-12-30 09:10 PST (History)
15 users (show)

See Also:


Attachments
Patch (12.18 KB, patch)
2019-12-12 04:51 PST, youenn fablet
no flags Details | Formatted Diff | Diff
Removing host inside Document class (12.12 KB, patch)
2019-12-24 05:35 PST, youenn fablet
no flags Details | Formatted Diff | Diff
Updated according review (15.06 KB, patch)
2019-12-30 02:50 PST, youenn fablet
no flags Details | Formatted Diff | Diff
Fix Win build (15.04 KB, patch)
2019-12-30 07:02 PST, youenn fablet
no flags Details | Formatted Diff | Diff
Patch (15.04 KB, patch)
2019-12-30 08:11 PST, youenn fablet
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description youenn fablet 2019-12-12 04:47:05 PST
Ignore URL host for schemes that are not using host information, like file, data or about
Comment 1 youenn fablet 2019-12-12 04:51:36 PST
Created attachment 385489 [details]
Patch
Comment 2 youenn fablet 2019-12-13 07:31:43 PST
<rdar://problem/57825963>
Comment 3 Brent Fulgham 2019-12-16 09:12:53 PST
Comment on attachment 385489 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=385489&action=review

> Source/WebCore/ChangeLog:14
> +        The resource loading will still be done based on the initial URL with host, but the document creation, including its location will do as if the host is empty.

"will ACT as if the host is empty"?

> Source/WebCore/loader/DocumentLoader.cpp:1631
> +        url.setHost({ });

This seems to be the only functional change in this patch. Is 'documentURL' used during the creation of the Document object (which your ChangeLog indicates), or do we need to modify Document::Document() to ensure we don't get a Document with a host that should be ignored?

For example, could JS create a Document object with a data URL that might bypass your change?
Comment 4 youenn fablet 2019-12-17 08:43:53 PST
(In reply to Brent Fulgham from comment #3)
> Comment on attachment 385489 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=385489&action=review
> 
> > Source/WebCore/ChangeLog:14
> > +        The resource loading will still be done based on the initial URL with host, but the document creation, including its location will do as if the host is empty.
> 
> "will ACT as if the host is empty"?

Sounds good.
If a user is loading file://example.org/XX, the address bar will still have that URL, but 


> > Source/WebCore/loader/DocumentLoader.cpp:1631
> > +        url.setHost({ });
> 
> This seems to be the only functional change in this patch. Is 'documentURL'
> used during the creation of the Document object (which your ChangeLog
> indicates), or do we need to modify Document::Document() to ensure we don't
> get a Document with a host that should be ignored?

This is used for documents created by DocumentLoader/DocumentWriter, this change will basically apply whenever navigation happens.
This is the case when opening a link/iframe or using window.location.

If a user loads file://example.org/XX, location.host will be empty instead of "example.org" as done currently.

> For example, could JS create a Document object with a data URL that might
> bypass your change?

This is mostly targeting file URLs. Hosts in data URLs interact with parsing of the mime type.

JS could try to create a Document with XHR but in that case, this would be considered a cross origin request and would normally fail.
There is the case of SVG documents but they should also be same-origin.

The stronger way is to do the same modification within Document constructor.
The impact is wider though, for instance it impacts HTMLDocument::createSynthesizedDocument.
Maybe we should try this approach.
Comment 5 Alex Christensen 2019-12-17 12:01:28 PST
Comment on attachment 385489 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=385489&action=review

> Source/WebCore/ChangeLog:4
> +        https://bugs.webkit.org/show_bug.cgi?id=205157

Please include radar.
Comment 6 youenn fablet 2019-12-24 05:35:28 PST
Created attachment 386390 [details]
Removing host inside Document class
Comment 7 Darin Adler 2019-12-26 20:39:17 PST
Comment on attachment 386390 [details]
Removing host inside Document class

View in context: https://bugs.webkit.org/attachment.cgi?id=386390&action=review

> Source/WebCore/dom/Document.cpp:3233
> +    if (SecurityOrigin::shouldIgnoreHost(m_url) && !m_url.host().isEmpty())
> +        m_url.setHost({ });

I think this should do something more like setHostAndPort instead of setHost. I don’t think it makes sense to leave a port number behind. But setHostAndPort doesn’t seem to handle the empty string correctly. So might need to change it in the URL class or even add a new removeHostAndPort function. If we did add a new removeHostAndPort function, we could optimize it to quickly do nothing when it’s already empty, and then we wouldn’t need the host().isEmpty() check here.

> Source/WebCore/page/DOMWindow.h:179
> +    WEBCORE_EXPORT Location& location();

We also have WEBCORE_TESTSUPPORT_EXPORT. Not sure if we are using it consistently.

> Source/WebCore/page/Location.h:55
> +    WEBCORE_EXPORT String host() const;

Ditto.

> Source/WebCore/page/SecurityOrigin.cpp:61
> +    return url.protocolIs("data") || url.protocolIs("about") || url.protocolIs("file");

How did you determine this set of protocols?

Could use protocolIsAbout() and protocolIsData() rather than writing those two strings out.

Could consider using isLocalFile() for the "file" one, but not sure if that’s the correct abstraction.

I don’t understand how we determined this list. What about "blob"? What about "javascript"? How are they different from "data"?
Comment 8 youenn fablet 2019-12-30 02:44:54 PST
Comment on attachment 386390 [details]
Removing host inside Document class

View in context: https://bugs.webkit.org/attachment.cgi?id=386390&action=review

>> Source/WebCore/dom/Document.cpp:3233
>> +        m_url.setHost({ });
> 
> I think this should do something more like setHostAndPort instead of setHost. I don’t think it makes sense to leave a port number behind. But setHostAndPort doesn’t seem to handle the empty string correctly. So might need to change it in the URL class or even add a new removeHostAndPort function. If we did add a new removeHostAndPort function, we could optimize it to quickly do nothing when it’s already empty, and then we wouldn’t need the host().isEmpty() check here.

OK, will add removeHostAndPort

>> Source/WebCore/page/SecurityOrigin.cpp:61
>> +    return url.protocolIs("data") || url.protocolIs("about") || url.protocolIs("file");
> 
> How did you determine this set of protocols?
> 
> Could use protocolIsAbout() and protocolIsData() rather than writing those two strings out.
> 
> Could consider using isLocalFile() for the "file" one, but not sure if that’s the correct abstraction.
> 
> I don’t understand how we determined this list. What about "blob"? What about "javascript"? How are they different from "data"?

We can only remove host information for the schemes we know hosts are not useful, custom protocols might rely on host information for instance.
From the known schemes, http/ftp cannot be in the list. "blob" is tied to the origin that created the blob URL so cannot be in the list.
I will add "javascript" in the list since it is similar to data URL.
Comment 9 youenn fablet 2019-12-30 02:50:22 PST
Created attachment 386519 [details]
Updated according review
Comment 10 youenn fablet 2019-12-30 07:02:06 PST
Created attachment 386531 [details]
Fix Win build
Comment 11 WebKit Commit Bot 2019-12-30 07:48:32 PST
Comment on attachment 386531 [details]
Fix Win build

Rejecting attachment 386531 [details] from commit-queue.

Failed to run "['/Volumes/Data/EWS/WebKit/Tools/Scripts/webkit-patch', '--status-host=webkit-queues.webkit.org', '--bot-id=webkit-cq-01', 'validate-changelog', '--check-oops', '--non-interactive', 386531, '--port=mac']" exit_code: 1 cwd: /Volumes/Data/EWS/WebKit

ChangeLog entry in Source/WTF/ChangeLog contains OOPS!.

Full output: https://webkit-queues.webkit.org/results/13296948
Comment 12 youenn fablet 2019-12-30 08:11:44 PST
Created attachment 386534 [details]
Patch
Comment 13 WebKit Commit Bot 2019-12-30 09:09:20 PST
The commit-queue encountered the following flaky tests while processing attachment 386534 [details]:

inspector/debugger/breakpoint-scope.html bug 205646 (authors: drousso@apple.com and joepeck@webkit.org)
The commit-queue is continuing to process your patch.
Comment 14 WebKit Commit Bot 2019-12-30 09:10:20 PST
Comment on attachment 386534 [details]
Patch

Clearing flags on attachment: 386534

Committed r253946: <https://trac.webkit.org/changeset/253946>
Comment 15 WebKit Commit Bot 2019-12-30 09:10:22 PST
All reviewed patches have been landed.  Closing bug.