Bug 236062 - Setup a custom URL scheme for PDF.js resources
Summary: Setup a custom URL scheme for PDF.js resources
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Page Loading (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Tim Nguyen (:ntim)
URL:
Keywords: InRadar
Depends on:
Blocks: 235969
  Show dependency treegraph
 
Reported: 2022-02-03 00:22 PST by Tim Nguyen (:ntim)
Modified: 2022-02-21 10:40 PST (History)
12 users (show)

See Also:


Attachments
Patch (18.53 KB, patch)
2022-02-10 02:31 PST, Tim Nguyen (:ntim)
no flags Details | Formatted Diff | Diff
Patch (13.51 KB, patch)
2022-02-10 07:38 PST, Tim Nguyen (:ntim)
ews-feeder: commit-queue-
Details | Formatted Diff | Diff
Patch (19.23 KB, patch)
2022-02-10 07:40 PST, Tim Nguyen (:ntim)
no flags Details | Formatted Diff | Diff
Patch (19.31 KB, patch)
2022-02-10 07:43 PST, Tim Nguyen (:ntim)
youennf: review+
Details | Formatted Diff | Diff
[fast-cq] Patch for landing (19.12 KB, patch)
2022-02-10 09:43 PST, Tim Nguyen (:ntim)
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Tim Nguyen (:ntim) 2022-02-03 00:22:56 PST
A custom URL scheme
Comment 1 Radar WebKit Bug Importer 2022-02-03 00:23:31 PST
<rdar://problem/88422488>
Comment 2 Tim Nguyen (:ntim) 2022-02-10 02:31:47 PST
Created attachment 451509 [details]
Patch
Comment 3 youenn fablet 2022-02-10 04:29:10 PST
Comment on attachment 451509 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=451509&action=review

> Source/WebCore/ChangeLog:11
> +        https://bugs.webkit.org/show_bug.cgi?id=235981 takes care of populating this folder at build-time.

Why not doing https://bugs.webkit.org/show_bug.cgi?id=235981 first and then https://bugs.webkit.org/show_bug.cgi?id=236062
This would allow to enable tests when landing this patch.

> Source/WebCore/loader/MixedContentChecker.cpp:68
> +    if (frame.ownerElement() && frame.ownerElement()->document().isPDFDocument() && url.protocolIs("webkit-pdfjs-viewer"))

There are a lot of protocolIs("webkit-pdfjs-viewer"), might be worth adding a isPDFJsURL(const URL&) for instance.

> Source/WebCore/loader/MixedContentChecker.cpp:95
> +    if (frame.document()->isPDFDocument() && url.protocolIs("webkit-pdfjs-viewer"))

Maybe we can teach WebKit to consider webkit-pdfjs-viewer is secure through our legacy scheme handlers.

> Source/WebCore/loader/cocoa/PDFJSResourceLoader.mm:55
> +    loadQueue().dispatch([protectedLoader = Ref { loader }, url]() mutable {

Need to isolatedCopy the url to be thread safe.
Something like url = url.isolatedCopy().

> Source/WebCore/loader/cocoa/PDFJSResourceLoader.mm:56
> +        NSString *relativePath = [@"pdfjs/" stringByAppendingString: (static_cast<NSURL*>(url).path)];

It seems this is mostly PDFJS agnostic. Maybe we could generalise this by adding a few parameters to loadResource, and rename it to something like loadResourceFromBundle.

> Source/WebCore/loader/cocoa/PDFJSResourceLoader.mm:60
> +        NSData *data = [NSData dataWithContentsOfFile:path];

auto * for all 3 variables?

> Source/WebCore/loader/cocoa/PDFJSResourceLoader.mm:63
> +            RunLoop::main().dispatch([protectedLoader = WTFMove(protectedLoader), url] {

Ditto for isolating copy the URL.
Something like url = WTFMove(url).isolatedCopy() to remove a memory copy.
Or we could try merging both dispatch in one by dispatch capturing a RefPtr<FragmentedSharedBuffer>

> Source/WebCore/loader/cocoa/PDFJSResourceLoader.mm:69
> +        RunLoop::main().dispatch([protectedLoader = WTFMove(protectedLoader), url, buffer = FragmentedSharedBuffer::create(data)] {

Can probably use SharedBuffer instead of FragmentedSharedBuffer.
Ditto for isolating copy the URL.

> Source/WebCore/platform/LegacySchemeRegistry.cpp:83
> +            "webkit-pdfjs-viewer",

Why is this needed?
Maybe we can try making webkit-pdfjs-viewer a little bit more than that. Something like part of builtinSecureSchemes and builtinSchemesWithUniqueOrigins.
Worth some additional thoughts there.

> Source/WebKit/WebProcess/Network/WebLoaderStrategy.cpp:279
> +        return false;

We could try to fail webkit-pdfjs-viewer loads here if load is a=subresource load and its document is not PDF.

Doing those checks here might allow removing some of the other checks in WebCore.
It might be worth adding some tests like: fetch("webkit-pdfjs-viewer://xxx") and see whether it fails for regular http/https loads.
Comment 4 Tim Nguyen (:ntim) 2022-02-10 07:33:04 PST
Comment on attachment 451509 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=451509&action=review

>> Source/WebCore/ChangeLog:11
>> +        https://bugs.webkit.org/show_bug.cgi?id=235981 takes care of populating this folder at build-time.
> 
> Why not doing https://bugs.webkit.org/show_bug.cgi?id=235981 first and then https://bugs.webkit.org/show_bug.cgi?id=236062
> This would allow to enable tests when landing this patch.

https://bugs.webkit.org/show_bug.cgi?id=235981 is blocked on bundle size concerns. We'd like to land this first for prototyping, and to unblock other subtasks too.
Comment 5 Tim Nguyen (:ntim) 2022-02-10 07:38:18 PST
Created attachment 451531 [details]
Patch
Comment 6 Tim Nguyen (:ntim) 2022-02-10 07:40:52 PST
Created attachment 451532 [details]
Patch
Comment 7 Tim Nguyen (:ntim) 2022-02-10 07:43:39 PST
Created attachment 451533 [details]
Patch
Comment 8 youenn fablet 2022-02-10 08:03:57 PST
Comment on attachment 451533 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=451533&action=review

> Source/WebCore/loader/PolicyChecker.cpp:173
> +    bool isInPDFDocumentFrame = m_frame.ownerElement() && m_frame.ownerElement()->document().isPDFDocument();

What about main frame loads?

> Source/WebCore/loader/PolicyChecker.cpp:174
> +    if (isInPDFDocumentFrame && !request.isNull() && request.url().protocolIs("webkit-pdfjs-viewer")) {

I would guess there cannot be a case where request is null but request url protocol is matching viewer, so we can probably remove !request.isNull() check.

> Source/WebCore/loader/ResourceLoader.cpp:881
> +    }

Maybe we can write it this way:
if (options().mode == FetchOptions::Mode::Navigate)
    return true;

auto* frame = resourceLoader.frame();
auto* document = frame ? frame->document() : nullptr;
return document ? document->isPDFDocument() : false;

> Source/WebCore/loader/cocoa/BundleResourceLoader.h:30
> +void loadResourceFromBundle(WebCore::ResourceLoader&, const String&);

s/WebCore:://.
and forward declare ResourceLoader. Might want to add String forward declaration as well.

> Source/WebCore/loader/cocoa/BundleResourceLoader.mm:55
> +    loadQueue().dispatch([protectedLoader = Ref { loader }, url = url.isolatedCopy(), subdirectory = subdirectory.isolatedCopy()]() mutable {

url = loader.request().url().isolatedCopy() to reduce count churning.

> Source/WebCore/loader/cocoa/BundleResourceLoader.mm:69
> +            auto mimeType = MIMETypeRegistry::mimeTypeForPath(url.path().toString());

Given mimeTypeForPath is only looking at the suffix, can we do: MIMETypeRegistry::mimeTypeForPath(url.string()) as a small optimization?

> Source/WebCore/loader/cocoa/BundleResourceLoader.mm:74
> +            response.addHTTPHeaderField(HTTPHeaderName::AccessControlAllowOrigin, "*");

Is it needed?
AccessControlAllowOrigin is only used for XHR/fetch and maybe the loads will be considered same origin.

> Source/WebCore/loader/cocoa/BundleResourceLoader.mm:75
> +            protectedLoader->deliverResponseAndData(response, buffer.ptr());

WTFMove(buffer) and make the lambda mutable.

> Source/WebKit/WebProcess/Network/WebLoaderStrategy.cpp:275
> +bool WebLoaderStrategy::tryLoadingUsingPDFJSHandler(ResourceLoader& resourceLoader, const WebResourceLoader::TrackingParameters& trackingParameters)

trackingParameters not needed.

> Source/WebKit/WebProcess/Network/WebLoaderStrategy.cpp:277
> +    if (!resourceLoader.isPDFJSResourceLoad())

Given we do this check in ResourceLoader, I would only do the scheme check here.
Comment 9 youenn fablet 2022-02-10 08:08:26 PST
Comment on attachment 451533 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=451533&action=review

> Source/WebCore/ChangeLog:11
> +        https://bugs.webkit.org/show_bug.cgi?id=235981 takes care of populating this folder at build-time.

Please update change log according latest patch.

> Source/WebKit/ChangeLog:8
> +        The code handling the scheme is under WebCore::PDFJSResourceLoader.

Ditto.

> Source/WebCore/loader/ResourceLoader.h:162
> +    WEBCORE_EXPORT bool isPDFJSResourceLoad() const;

Could be #if PLATFORM(COCOA)?
Comment 10 Tim Nguyen (:ntim) 2022-02-10 09:36:51 PST
(In reply to youenn fablet from comment #8)
> Comment on attachment 451533 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=451533&action=review
> 
> > Source/WebCore/loader/PolicyChecker.cpp:173
> > +    bool isInPDFDocumentFrame = m_frame.ownerElement() && m_frame.ownerElement()->document().isPDFDocument();
> 
> What about main frame loads?

We want to disallow main frame loads.

> > Source/WebCore/loader/PolicyChecker.cpp:174
> > +    if (isInPDFDocumentFrame && !request.isNull() && request.url().protocolIs("webkit-pdfjs-viewer")) {
> 
> I would guess there cannot be a case where request is null but request url
> protocol is matching viewer, so we can probably remove !request.isNull()
> check.

Yep, removed the check.

> > Source/WebCore/loader/ResourceLoader.cpp:881
> > +    }
> 
> Maybe we can write it this way:
> if (options().mode == FetchOptions::Mode::Navigate)
>     return true;
> 
> auto* frame = resourceLoader.frame();
> auto* document = frame ? frame->document() : nullptr;
> return document ? document->isPDFDocument() : false;

Simplified the existing method.

> > Source/WebCore/loader/cocoa/BundleResourceLoader.h:30
> > +void loadResourceFromBundle(WebCore::ResourceLoader&, const String&);
> 
> s/WebCore:://.
> and forward declare ResourceLoader. Might want to add String forward
> declaration as well.

Done.

> > Source/WebCore/loader/cocoa/BundleResourceLoader.mm:55
> > +    loadQueue().dispatch([protectedLoader = Ref { loader }, url = url.isolatedCopy(), subdirectory = subdirectory.isolatedCopy()]() mutable {
> 
> url = loader.request().url().isolatedCopy() to reduce count churning.

Done.

> > Source/WebCore/loader/cocoa/BundleResourceLoader.mm:69
> > +            auto mimeType = MIMETypeRegistry::mimeTypeForPath(url.path().toString());
> 
> Given mimeTypeForPath is only looking at the suffix, can we do:
> MIMETypeRegistry::mimeTypeForPath(url.string()) as a small optimization?

This doesn't seem to work. I'll investigate.

> > Source/WebCore/loader/cocoa/BundleResourceLoader.mm:74
> > +            response.addHTTPHeaderField(HTTPHeaderName::AccessControlAllowOrigin, "*");
> 
> Is it needed?
> AccessControlAllowOrigin is only used for XHR/fetch and maybe the loads will
> be considered same origin.

Yes, otherwise images can't load.

> > Source/WebCore/loader/cocoa/BundleResourceLoader.mm:75
> > +            protectedLoader->deliverResponseAndData(response, buffer.ptr());
> 
> WTFMove(buffer) and make the lambda mutable.

Done.

> > Source/WebKit/WebProcess/Network/WebLoaderStrategy.cpp:275
> > +bool WebLoaderStrategy::tryLoadingUsingPDFJSHandler(ResourceLoader& resourceLoader, const WebResourceLoader::TrackingParameters& trackingParameters)
> 
> trackingParameters not needed.

They are actually needed for the
`WEBLOADERSTRATEGY_RELEASE_LOG("tryLoadingUsingPDFJSHandler: URL will be scheduled with the PDFJS url scheme handler");` macro.

> > Source/WebKit/WebProcess/Network/WebLoaderStrategy.cpp:277
> > +    if (!resourceLoader.isPDFJSResourceLoad())
> 
> Given we do this check in ResourceLoader, I would only do the scheme check
> here.

Done.
Comment 11 Tim Nguyen (:ntim) 2022-02-10 09:38:45 PST
(In reply to youenn fablet from comment #9)
> Comment on attachment 451533 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=451533&action=review
> 
> > Source/WebCore/ChangeLog:11
> > +        https://bugs.webkit.org/show_bug.cgi?id=235981 takes care of populating this folder at build-time.
> 
> Please update change log according latest patch.
> 
> > Source/WebKit/ChangeLog:8
> > +        The code handling the scheme is under WebCore::PDFJSResourceLoader.
> 
> Ditto.

Done.

> > Source/WebCore/loader/ResourceLoader.h:162
> > +    WEBCORE_EXPORT bool isPDFJSResourceLoad() const;
> 
> Could be #if PLATFORM(COCOA)?

I don't feel strongly, though this could be cross platform if other platforms implement a BundleResourceLoader too. Right now, this function returns false for non Cocoa platforms which should be good enough.
Comment 12 Tim Nguyen (:ntim) 2022-02-10 09:43:11 PST
Created attachment 451553 [details]
[fast-cq] Patch for landing
Comment 13 EWS 2022-02-10 09:50:42 PST
Committed r289549 (247080@main): <https://commits.webkit.org/247080@main>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 451553 [details].
Comment 14 Alex Christensen 2022-02-10 09:52:25 PST
View in context: https://bugs.webkit.org/attachment.cgi?id=451553&action=review

> Source/WebCore/loader/SubresourceLoader.h:55
> +    SecurityOrigin* origin() const { return m_origin.get(); }

const SecurityOrigin* origin() const

> Source/WebCore/loader/cocoa/BundleResourceLoader.mm:45
> +    static auto& queue = WorkQueue::create("org.WebCore.BundleResourceLoader", WorkQueue::QOS::Utility).leakRef();

org.WebKit.BundleResourceLoader
Comment 15 Tim Nguyen (:ntim) 2022-02-10 10:48:01 PST
(In reply to Alex Christensen from comment #14)
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=451553&action=review
> 
> > Source/WebCore/loader/SubresourceLoader.h:55
> > +    SecurityOrigin* origin() const { return m_origin.get(); }
> 
> const SecurityOrigin* origin() const

This caused other build problems.

> > Source/WebCore/loader/cocoa/BundleResourceLoader.mm:45
> > +    static auto& queue = WorkQueue::create("org.WebCore.BundleResourceLoader", WorkQueue::QOS::Utility).leakRef();
> 
> org.WebKit.BundleResourceLoader

Fixed in https://commits.webkit.org/r289553
Comment 16 Aakash Jain 2022-02-12 07:19:03 PST
(In reply to EWS from comment #13)
> Committed r289549 (247080@main): <https://commits.webkit.org/247080@main>

Seems like this broke a layout-test on ios-wk2 https://results.webkit.org/?suite=layout-tests&test=imported%2Fw3c%2Fweb-platform-tests%2Furl%2Ftoascii.window.html

e.g.:
https://build.webkit.org/results/Apple-iPadOS-15-Simulator-Release-WK2-Tests/r289589%20(1019)/results.html
Diff:
https://build.webkit.org/results/Apple-iPadOS-15-Simulator-Release-WK2-Tests/r289589%20(1019)/imported/w3c/web-platform-tests/url/toascii.window-diff.txt
-FAIL xn--tešla (using URL) assert_throws_js: function "() => makeURL("url", hostTest.input)" did not throw
-FAIL xn--tešla (using URL.host) assert_equals: expected "x" but got "xn--teala"
-FAIL xn--tešla (using URL.hostname) assert_equals: expected "x" but got "xn--teala"
-FAIL xn--tešla (using <a>) assert_equals: expected "" but got "xn--teala"
-FAIL xn--tešla (using <a>.host) assert_equals: expected "x" but got "xn--teala"
-FAIL xn--tešla (using <a>.hostname) assert_equals: expected "x" but got "xn--teala"
-FAIL xn--tešla (using <area>) assert_equals: expected "" but got "xn--teala"
-FAIL xn--tešla (using <area>.host) assert_equals: expected "x" but got "xn--teala"
-FAIL xn--tešla (using <area>.hostname) assert_equals: expected "x" but got "xn--teala"
+PASS xn--tešla (using URL)
+PASS xn--tešla (using URL.host)
+PASS xn--tešla (using URL.hostname)
+PASS xn--tešla (using <a>)
+PASS xn--tešla (using <a>.host)
+PASS xn--tešla (using <a>.hostname)
+PASS xn--tešla (using <area>)
+PASS xn--tešla (using <area>.host)
+PASS xn--tešla (using <area>.hostname)
 PASS xn--zca.xn--zca (using URL)
Comment 17 Alex Christensen 2022-02-15 09:42:08 PST
(In reply to Tim Nguyen (:ntim) from comment #15)
> (In reply to Alex Christensen from comment #14)
> > View in context:
> > https://bugs.webkit.org/attachment.cgi?id=451553&action=review
> > 
> > > Source/WebCore/loader/SubresourceLoader.h:55
> > > +    SecurityOrigin* origin() const { return m_origin.get(); }
> > 
> > const SecurityOrigin* origin() const
> 
> This caused other build problems.
We need to fix those other build problems.  Does this help?

const SecurityOrigin* origin() const { return m_origin.get(); }
SecurityOrigin* origin() { return m_origin.get(); }
Comment 18 Alex Christensen 2022-02-21 10:40:13 PST
I'm doing this in https://bugs.webkit.org/show_bug.cgi?id=236983