Bug 234543 - Archived subresource loads fail if m_allowedNetworkHosts doesn't include the remote URL
Summary: Archived subresource loads fail if m_allowedNetworkHosts doesn't include the ...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Page Loading (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Alex Christensen
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2021-12-20 20:22 PST by Matt Woodrow
Modified: 2021-12-23 15:11 PST (History)
6 users (show)

See Also:


Attachments
Patch (7.04 KB, patch)
2021-12-20 20:28 PST, Matt Woodrow
no flags Details | Formatted Diff | Diff
Patch (7.94 KB, patch)
2021-12-21 12:48 PST, Matt Woodrow
no flags Details | Formatted Diff | Diff
Patch (10.26 KB, patch)
2021-12-22 15:19 PST, Matt Woodrow
no flags Details | Formatted Diff | Diff
Patch (10.06 KB, patch)
2021-12-22 19:30 PST, Matt Woodrow
no flags Details | Formatted Diff | Diff
Patch (17.64 KB, patch)
2021-12-23 11:55 PST, Alex Christensen
no flags Details | Formatted Diff | Diff
Patch (17.93 KB, patch)
2021-12-23 12:00 PST, Alex Christensen
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Matt Woodrow 2021-12-20 20:22:01 PST
rdar://83970256

If WebKit is configured with a set of allowed network hosts, then subresource loads will be cancelled if the URL isn't within the list. This happens before the check to see if the subresource was already within the archived content, and wouldn't have actually needed a network load.
Comment 1 Matt Woodrow 2021-12-20 20:28:57 PST
Created attachment 447679 [details]
Patch
Comment 2 Matt Woodrow 2021-12-21 12:48:44 PST
Created attachment 447746 [details]
Patch
Comment 3 Alexey Proskuryakov 2021-12-21 15:50:18 PST
Not sure if the proposed change is OK given the initial intent of this SPI (disallow any 3rd party content). It's still 3rd party content even when it's loaded via an archive, and having it can still have security implications!

CC Alex to help untangle this.
Comment 4 Alex Christensen 2021-12-21 16:03:39 PST
Comment on attachment 447746 [details]
Patch

This is problematic for many reasons:
1. It doesn't contain a unit test to verify we won't break it in the future.
2. It does not contain a good description of what is going on or under what circumstances this changes behavior.
3. Being an "archive load" is not an inherent property of a resource request, which ought to be basically just a URL and header fields
4.  It substantially changes the behavior of when scheduleArchiveLoad is called, which almost certainly breaks web archive loading somehow.
5. There is no link to a radar or something motivating this change.

I believe the intent of this is to make allowsLoadFromURL return true if we know it will only load from a web archive and not from the network when _allowedNetworkHosts is being used.  Such a change should clearly not affect anything else.
Comment 5 Matt Woodrow 2021-12-21 16:43:03 PST
(In reply to Alex Christensen from comment #4)
> Comment on attachment 447746 [details]
> Patch
> 
> This is problematic for many reasons:
> 1. It doesn't contain a unit test to verify we won't break it in the future.

Agreed, can you please point me to related tests so that I have an idea where to start?
I saw WKURLSchemeHandler-1.mm, but I'm not sure where to start with loading sub resources from an archive.

> 2. It does not contain a good description of what is going on or under what
> circumstances this changes behavior.

Apologies, looks like --update-changelogs changed the description text. Is what's in the first comment of the bug sufficient?

> 3. Being an "archive load" is not an inherent property of a resource
> request, which ought to be basically just a URL and header fields

Would a change in naming to 'will load from archive' be satisfactory here?

> 4.  It substantially changes the behavior of when scheduleArchiveLoad is
> called, which almost certainly breaks web archive loading somehow.

Indeed it does. I was trying to avoid checking 'will load from archive' during the network host check, in advance of actually scheduling the archive load, since that feels overly fragile. It felt preferable to just schedule, and condition the other behaviour on the result.

I tried to audit possible behaviour changes from calling it early, and couldn't find anything, plus it passes test. It's possible that this is better suited to someone who knows this code.


> 5. There is no link to a radar or something motivating this change.

Comment 0 links to rdar://83970256, apologies if that's not the correct form of linkage.
Comment 6 Alex Christensen 2021-12-21 17:21:25 PST
(In reply to Matt Woodrow from comment #5)
> (In reply to Alex Christensen from comment #4)
> > Comment on attachment 447746 [details]
> > Patch
> > 
> > This is problematic for many reasons:
> > 1. It doesn't contain a unit test to verify we won't break it in the future.
> 
> Agreed, can you please point me to related tests so that I have an idea
> where to start?
> I saw WKURLSchemeHandler-1.mm, but I'm not sure where to start with loading
> sub resources from an archive.
Yes, that's where to find existing unit tests that use _allowedNetworkHosts.  Search for TEST(LoadWebArchive, HTTPSUpgrade) to see how to load a web archive.  You probably won't need to create a web archive dynamically for this, but you could.  Once you tell WKWebView to load a web archive, it will automatically load sub resources from the web archive.

> > 2. It does not contain a good description of what is going on or under what
> > circumstances this changes behavior.
> 
> Apologies, looks like --update-changelogs changed the description text. Is
> what's in the first comment of the bug sufficient?
It could use more clear language.  "Using WKWebViewConfiguration._allowedNetworkHosts prevents loads from web archives even when the network will not be used" makes it quite clear what is going on.

> Would a change in naming to 'will load from archive' be satisfactory here?
No, I don't think it should be a member of ResourceRequest.  I think it should be another parameter to Page::allowsLoadFromURL

> > 5. There is no link to a radar or something motivating this change.
> 
> Comment 0 links to rdar://83970256, apologies if that's not the correct form
> of linkage.
Sorry, I didn't see that.  That's a good link.  Put it in the Change Log.
Comment 7 Alex Christensen 2021-12-21 17:26:01 PST
I would fix this bug by moving the m_allowedNetworkHosts part of the allowsLoadFromURL check from ResourceLoadNotifier::dispatchWillSendRequest to WebLoaderStrategy::scheduleLoadFromNetworkProcess or somewhere like that.
Comment 8 Matt Woodrow 2021-12-22 15:19:34 PST
Created attachment 447837 [details]
Patch
Comment 9 Alex Christensen 2021-12-22 16:37:59 PST
Comment on attachment 447837 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=447837&action=review

nicer

> Tools/TestWebKitAPI/Tests/mac/LoadWebArchive.mm:259
> +    NSData *data = [NSPropertyListSerialization dataFromPropertyList:archive format:NSPropertyListBinaryFormat_v1_0 errorDescription:nil];

This is all duplicate code with the test before it.  Could we instead share it and make a static function that returns an NSData?
Comment 10 Matt Woodrow 2021-12-22 19:30:51 PST
Created attachment 447852 [details]
Patch
Comment 11 Alex Christensen 2021-12-23 11:55:18 PST
Created attachment 447894 [details]
Patch
Comment 12 Alex Christensen 2021-12-23 12:00:04 PST
Created attachment 447896 [details]
Patch
Comment 13 Alex Christensen 2021-12-23 12:01:41 PST
Comment on attachment 447896 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=447896&action=review

> Source/WebKit/WebProcess/Network/WebLoaderStrategy.cpp:311
> +            RunLoop::main().dispatch([resourceLoader = Ref { resourceLoader }] {
> +                resourceLoader->didFail(resourceLoader->blockedError());

This changed to call didFail on another run loop iteration to make this behave more like a real load that was blocked and make the other existing tests pass.

> Source/WebKit/WebProcess/Network/WebResourceLoader.cpp:116
> +            if (!page->allowsLoadFromURL(proposedRequest.url(), mainFrameMainResource))

I added this to prevent an allowed host from redirecting to a forbidden host.  ResourceLoadNotifier::dispatchWillSendRequest was called when redirecting and when preparing to send the initial request, so after moving it we need to do this in two places.
Comment 14 EWS 2021-12-23 15:11:01 PST
Committed r287414 (245549@main): <https://commits.webkit.org/245549@main>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 447896 [details].