Bug 237513 - [GTK][WPE] Add initial support for PDF documents using PDF.js
Summary: [GTK][WPE] Add initial support for PDF documents using PDF.js
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKitGTK (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Nobody
URL:
Keywords: Gtk
Depends on:
Blocks:
 
Reported: 2022-03-07 01:49 PST by Carlos Garcia Campos
Modified: 2022-05-17 15:43 PDT (History)
11 users (show)

See Also:


Attachments
Patch (14.93 KB, patch)
2022-03-07 01:58 PST, Carlos Garcia Campos
mcatanzaro: review+
ews-feeder: commit-queue-
Details | Formatted Diff | Diff
Patch for landing (16.93 KB, patch)
2022-03-09 02:57 PST, Carlos Garcia Campos
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Carlos Garcia Campos 2022-03-07 01:49:08 PST
Use PDF.js from third party.
Comment 1 Carlos Garcia Campos 2022-03-07 01:58:24 PST
Created attachment 453949 [details]
Patch

This is an initial patch, loading PDF documents works, but there are a few things to fix in follow up patches:

 - Figure out a way to load platform styles. style-cocoa.css is currently added unconditionally (we just get a 404 now and default style is used).
 - I'm seeing a couple of cross-origin errors in the console, even though I don't see anything broken.
 - Add a style for GTK/WPE (adwaita), to remove UI elements we don't want, like the open button, and customize the UI a bit too.
Comment 2 Michael Catanzaro 2022-03-07 08:43:42 PST
(In reply to Carlos Garcia Campos from comment #1)
>  - I'm seeing a couple of cross-origin errors in the console, even though I
> don't see anything broken.

Epiphany has a pretty gross hack to avoid these, but it's bad and you don't want to use it. For us, these errors indicated that stuff really was broken. Jan-Michael would probably remember for sure what it was.

(In reply to Carlos Garcia Campos from comment #1)
>  - Add a style for GTK/WPE (adwaita), to remove UI elements we don't want,
> like the open button, and customize the UI a bit too.

I think we removed the open button because it didn't work, not because we didn't want it. Should probably hook it up to a WebKitFileChooserRequest.
Comment 3 Brent Fulgham 2022-03-07 08:56:58 PST
Comment on attachment 453949 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=453949&action=review

> Source/WebKit/UIProcess/WebPageProxy.cpp:1915
> +        return true;

@ntim: Should we have done this for our port? Will this break something for Cocoa ports if we choose not to ship with it enabled by default yet?

> Source/WebKit/WebProcess/WebPage/WebPage.cpp:6460
> +        return true;

Ditto:
Comment 4 Michael Catanzaro 2022-03-07 09:26:09 PST
Comment on attachment 453949 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=453949&action=review

Do you plan to investigate adapting Epiphany for this? We'll be able to remove a lot of code there....

> Source/WebCore/loader/ResourceLoader.cpp:872
> -#if PLATFORM(COCOA)
> +#if PLATFORM(COCOA) || PLATFORM(GTK) || PLATFORM(WPE)

No need for #if here: we have a runtime setting PdfJSViewerEnabled, so should check that instead.

> Source/WebCore/page/SecurityOrigin.cpp:124
> +        || url.protocolIs("webkit-pdfjs-viewer")

This is surely required for all ports using PDF.js, not just GTK and WPE. Better check PdfJSViewerEnabled instead of using #if.

> Source/WebCore/platform/LegacySchemeRegistry.cpp:148
> -#if PLATFORM(COCOA)
> +#if PLATFORM(COCOA) || PLATFORM(GTK) || PLATFORM(WPE)
>          "webkit-pdfjs-viewer"_s,
>  #endif

If you change this from std::array to std::vector, then you can check the runtime setting PdfJSViewerEnabled here too.

> Source/WebKit/UIProcess/WebPageProxy.cpp:1915
> +    if (m_preferences->pdfJSViewerEnabled() && MIMETypeRegistry::isPDFMIMEType(mimeType))
> +        return true;

Note: this requires owner approval to land.

> Source/WebKit/WebProcess/WebPage/WebPage.cpp:6460
> +    if (m_page->settings().pdfJSViewerEnabled() && MIMETypeRegistry::isPDFMIMEType(mimeType))
> +        return true;

This too.

> Tools/glib/generate-pdfjs-gresource-manifest.py:35
> +        if os.path.splitext(resource)[1] not in VALID_EXTENSIONS:
> +            return True

Hm, I don't like this: we'll likely drop desired files by mistake during future updates of PDF.js. I'd either (a) drop this altogether and just include everything, or (b) use a denylist containing just '.map' as a sanity-check to ensure we don't include those pre-generated map files by mistake.

> Tools/glib/generate-pdfjs-gresource-manifest.py:54
> +            # separator, is properly replaced

Nit: no comma here
Comment 5 Michael Catanzaro 2022-03-07 09:28:28 PST
(In reply to Brent Fulgham from comment #3)
> > Source/WebKit/UIProcess/WebPageProxy.cpp:1915
> > +        return true;
> 
> @ntim: Should we have done this for our port? Will this break something for
> Cocoa ports if we choose not to ship with it enabled by default yet?

Note the runtime setting defaults to off for Cocoa ports, so that shouldn't affect you unless the runtime setting is toggled. (GTK/WPE do not actually have any way to toggle these runtime settings. I guess you probably do have a way, though?)

BTW: we need owner approval for these lines, hint hint. ;)
Comment 6 Michael Catanzaro 2022-03-07 09:29:56 PST
Comment on attachment 453949 [details]
Patch

Also there's three test failures to deal with before landing:

fast/replaced/encrypted-pdf-as-object-and-embed.html
http/tests/cache/disk-cache/shattered-deduplication.html
imported/w3c/web-platform-tests/html/semantics/embedded-content/the-iframe-element/sandbox_004.htm
Comment 7 Carlos Garcia Campos 2022-03-08 00:29:47 PST
Comment on attachment 453949 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=453949&action=review

>> Source/WebCore/page/SecurityOrigin.cpp:124
>> +        || url.protocolIs("webkit-pdfjs-viewer")
> 
> This is surely required for all ports using PDF.js, not just GTK and WPE. Better check PdfJSViewerEnabled instead of using #if.

It's enabled unconditionally for GTK and WPE. We can discuss if we want to add api to enable/disable it, though. But for now, I think it's easier this way. Cocoa doesn't do this because they have :

// Allow images to load.
response.addHTTPHeaderField(HTTPHeaderName::AccessControlAllowOrigin, "*");

in the bundle resource loader. I  did it this way for consistency with gresources, since webkit-pdfjs-viewer are gresources too in the end.

>> Source/WebCore/platform/LegacySchemeRegistry.cpp:148
>>  #endif
> 
> If you change this from std::array to std::vector, then you can check the runtime setting PdfJSViewerEnabled here too.

And same here.

>> Tools/glib/generate-pdfjs-gresource-manifest.py:35
>> +            return True
> 
> Hm, I don't like this: we'll likely drop desired files by mistake during future updates of PDF.js. I'd either (a) drop this altogether and just include everything, or (b) use a denylist containing just '.map' as a sanity-check to ensure we don't include those pre-generated map files by mistake.

I think it's unlikely new file types are added in new versions. I've tried hard to not add any file we don't need to the bundle, because it's already big enough. If it breaks after an upgrade we can easily fix it.
Comment 8 Carlos Garcia Campos 2022-03-08 00:31:00 PST
(In reply to Michael Catanzaro from comment #6)
> Comment on attachment 453949 [details]
> Patch
> 
> Also there's three test failures to deal with before landing:
> 
> fast/replaced/encrypted-pdf-as-object-and-embed.html
> http/tests/cache/disk-cache/shattered-deduplication.html
> imported/w3c/web-platform-tests/html/semantics/embedded-content/the-iframe-
> element/sandbox_004.htm

Yes, I'll check. Maybe we can just disable pdf support in WTR for now.
Comment 9 Carlos Garcia Campos 2022-03-08 01:17:54 PST
Rather than API to enable/disable it, maybe we want a build option to allow embedders who don't need pdf support to opt-out, because pdfjs adds a few MB to the library.
Comment 10 Tim Nguyen (:ntim) 2022-03-08 06:02:46 PST
Comment on attachment 453949 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=453949&action=review

>> Source/WebCore/loader/ResourceLoader.cpp:872
>> +#if PLATFORM(COCOA) || PLATFORM(GTK) || PLATFORM(WPE)
> 
> No need for #if here: we have a runtime setting PdfJSViewerEnabled, so should check that instead.

The goal here is to target platforms that implement the webkit-pdfjs-viewer scheme, which may be correlated to PdfJSViewerEnabled, but not necessarily. Perhaps we should have a WEBKIT-PDFJS-VIEWER-SCHEME macro?

>>> Source/WebCore/page/SecurityOrigin.cpp:124
>>> +        || url.protocolIs("webkit-pdfjs-viewer")
>> 
>> This is surely required for all ports using PDF.js, not just GTK and WPE. Better check PdfJSViewerEnabled instead of using #if.
> 
> It's enabled unconditionally for GTK and WPE. We can discuss if we want to add api to enable/disable it, though. But for now, I think it's easier this way. Cocoa doesn't do this because they have :
> 
> // Allow images to load.
> response.addHTTPHeaderField(HTTPHeaderName::AccessControlAllowOrigin, "*");
> 
> in the bundle resource loader. I  did it this way for consistency with gresources, since webkit-pdfjs-viewer are gresources too in the end.

If we can make this consistent across platforms, that would be great. I'm open to removing response.addHTTPHeaderField(HTTPHeaderName::AccessControlAllowOrigin, "*"); too.

Can you add a comment either way to explain why PDF.js needs this?

>>> Source/WebCore/platform/LegacySchemeRegistry.cpp:148
>>>  #endif
>> 
>> If you change this from std::array to std::vector, then you can check the runtime setting PdfJSViewerEnabled here too.
> 
> And same here.

ditto here, maybe a macro type thing.

>>> Source/WebKit/UIProcess/WebPageProxy.cpp:1915
>>> +        return true;
>> 
>> @ntim: Should we have done this for our port? Will this break something for Cocoa ports if we choose not to ship with it enabled by default yet?
> 
> Note: this requires owner approval to land.

Brent, Cocoa is covered by the 5 lines above:

#if PLATFORM(COCOA)
// On Mac, we can show PDFs.
if (MIMETypeRegistry::isPDFOrPostScriptMIMEType(mimeType) && !WebProcessPool::omitPDFSupport()
  return true;
#endif // PLATFORM(COCOA)

I think we should add this, but put it above the Cocoa check, although I'm not sure if respecting the `WebProcessPool::omitPDFSupport()` pref is important in the context of PDF.js.

>>> Source/WebKit/WebProcess/WebPage/WebPage.cpp:6460
>>> +        return true;
>> 
>> Ditto:
> 
> This too.

I'm not sure what this adds? Can you describe why you're adding this?
Comment 11 Michael Catanzaro 2022-03-08 06:19:01 PST
(In reply to Carlos Garcia Campos from comment #7)
> It's enabled unconditionally for GTK and WPE. We can discuss if we want to
> add api to enable/disable it, though. But for now, I think it's easier this
> way. Cocoa doesn't do this because they have :
> 
> // Allow images to load.
> response.addHTTPHeaderField(HTTPHeaderName::AccessControlAllowOrigin, "*");
> 
> in the bundle resource loader. I  did it this way for consistency with
> gresources, since webkit-pdfjs-viewer are gresources too in the end.

Hmmm, OK, I see why you did this. I'm not actually certain, but I *suspect* this wasn't the best approach. There are a few other considerations:

 * A webkit-pdf-viewer: URI has a host component and therefore it should probably not be treated as a unique origin, i.e. shouldTreatAsUniqueOrigin should return false (...right?). Your change accomplishes this only for GTK and WPE, but leaves Cocoa ports with the opposite result. Understanding that Cocoa ports don't *need* it for PDF.js to work, I think they still *want* this. (I assume LegacySchemeRegistry::schemeIsHandledBySchemeHandler will always return true for webkit-pdf-viewer, causing shouldTreatAsUniqueOrigin to return false if we don't have a special case for webkit-pdfjs-viewer URIs.)

 * Matching the Cocoa port behavior allows us to share the same set of bugs between ports, which is valuable.

 * You mention you are seeing CORS errors in the console, but surely those will go away if we add AccessControlAllowOrigin=* to match Cocoa ports, so we should probably do that too, right? In fact, I wonder if maybe we should do that for all GResources. We spent ages trying to make PDF.js dodge CORS, and even wound up adding new public API webkit_web_view_set_cors_allowlist() to facilitate it. None of that would have been needed if GResources just did the right thing. Resource content should be trusted and allowed to embed content from any origin it wants, right?

So even if your way works, I think it'd be nice to match behavior between ports anyway. Normally we use #if only when something either *has* to be different between ports, or at least really *wants* to be different. I don't think that's the case here. I won't block you on this, though (you still have r+ regardless).

> I think it's unlikely new file types are added in new versions. I've tried
> hard to not add any file we don't need to the bundle, because it's already
> big enough. If it breaks after an upgrade we can easily fix it.

What files do we not need besides the *.map and the example PDF? If you want to maintain an allowlist of file extensions, then please maintain a denylist as well, and fail the build if a file doesn't match one list or the other, so we'll be sure to notice when something is wrong. It's not much extra code now, but will make things easier for us in the future if it's really needed.

I'm sure whoever winds up updating PDF.js now that it has moved to WebKit is not going to want to deal with important files discarded with no warning... *especially* as that will only happen for GTK/WPE, so someone from Apple would really have to chance to avoid breaking us if there is no build failure.
Comment 12 Carlos Garcia Campos 2022-03-08 06:41:19 PST
(In reply to Michael Catanzaro from comment #11)
> (In reply to Carlos Garcia Campos from comment #7)
> > It's enabled unconditionally for GTK and WPE. We can discuss if we want to
> > add api to enable/disable it, though. But for now, I think it's easier this
> > way. Cocoa doesn't do this because they have :
> > 
> > // Allow images to load.
> > response.addHTTPHeaderField(HTTPHeaderName::AccessControlAllowOrigin, "*");
> > 
> > in the bundle resource loader. I  did it this way for consistency with
> > gresources, since webkit-pdfjs-viewer are gresources too in the end.
> 
> Hmmm, OK, I see why you did this. I'm not actually certain, but I *suspect*
> this wasn't the best approach. There are a few other considerations:
> 
>  * A webkit-pdf-viewer: URI has a host component and therefore it should
> probably not be treated as a unique origin, i.e. shouldTreatAsUniqueOrigin
> should return false (...right?). Your change accomplishes this only for GTK
> and WPE, but leaves Cocoa ports with the opposite result. Understanding that
> Cocoa ports don't *need* it for PDF.js to work, I think they still *want*
> this. (I assume LegacySchemeRegistry::schemeIsHandledBySchemeHandler will
> always return true for webkit-pdf-viewer, causing shouldTreatAsUniqueOrigin
> to return false if we don't have a special case for webkit-pdfjs-viewer
> URIs.)
> 
>  * Matching the Cocoa port behavior allows us to share the same set of bugs
> between ports, which is valuable.
> 
>  * You mention you are seeing CORS errors in the console, but surely those
> will go away if we add AccessControlAllowOrigin=* to match Cocoa ports, so
> we should probably do that too, right? In fact, I wonder if maybe we should
> do that for all GResources. We spent ages trying to make PDF.js dodge CORS,
> and even wound up adding new public API webkit_web_view_set_cors_allowlist()
> to facilitate it. None of that would have been needed if GResources just did
> the right thing. Resource content should be trusted and allowed to embed
> content from any origin it wants, right?

No, the initial CORS errors prevented the resourced from being loaded, those are fixed either adding the HTTP header or making the scheme as not unique. The errors I'0m still seeing happen with the HTTP header too. I'll submit a new bug for those.

> So even if your way works, I think it'd be nice to match behavior between
> ports anyway. Normally we use #if only when something either *has* to be
> different between ports, or at least really *wants* to be different. I don't
> think that's the case here. I won't block you on this, though (you still
> have r+ regardless).

I'm fine switching to cocoa behavior, I haver to check it also works for the inspector resources.

> > I think it's unlikely new file types are added in new versions. I've tried
> > hard to not add any file we don't need to the bundle, because it's already
> > big enough. If it breaks after an upgrade we can easily fix it.
> 
> What files do we not need besides the *.map and the example PDF? If you want
> to maintain an allowlist of file extensions, then please maintain a denylist
> as well, and fail the build if a file doesn't match one list or the other,
> so we'll be sure to notice when something is wrong. It's not much extra code
> now, but will make things easier for us in the future if it's really needed.
> 
> I'm sure whoever winds up updating PDF.js now that it has moved to WebKit is
> not going to want to deal with important files discarded with no warning...
> *especially* as that will only happen for GTK/WPE, so someone from Apple
> would really have to chance to avoid breaking us if there is no build
> failure.
Comment 13 Michael Catanzaro 2022-03-08 07:23:21 PST
(In reply to Carlos Garcia Campos from comment #12)
> I'm fine switching to cocoa behavior, I haver to check it also works for the
> inspector resources.

Please read my last comment again: I think we want both behaviors for all ports, not one or the other.
Comment 14 Tim Nguyen (:ntim) 2022-03-08 11:05:51 PST
Comment on attachment 453949 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=453949&action=review

> Tools/glib/generate-pdfjs-gresource-manifest.py:88
> +        if not filename.startswith('build/') and not filename.startswith('web/'):
> +            alias = 'extras/' + filename
> +            line += ' alias="%s"' % alias
> +        line += '>%s</file>\n' % filename

fwiw, pdfjs-extras is copied as as subfolder of the pdfjs folder called "extras" in the bundle. We don't do this type of aliasing based on the folder name.
Comment 15 Carlos Garcia Campos 2022-03-09 02:57:36 PST
Created attachment 454213 [details]
Patch for landing
Comment 16 Carlos Garcia Campos 2022-03-10 01:00:34 PST
Committed r291094 (248256@trunk): <https://commits.webkit.org/248256@trunk>
Comment 17 Michael Catanzaro 2022-05-17 15:43:58 PDT
(In reply to Michael Catanzaro from comment #11)
> What files do we not need besides the *.map and the example PDF? If you want
> to maintain an allowlist of file extensions, then please maintain a denylist
> as well, and fail the build if a file doesn't match one list or the other,
> so we'll be sure to notice when something is wrong. It's not much extra code
> now, but will make things easier for us in the future if it's really needed.
> 
> I'm sure whoever winds up updating PDF.js now that it has moved to WebKit is
> not going to want to deal with important files discarded with no warning...
> *especially* as that will only happen for GTK/WPE, so someone from Apple
> would really have to chance to avoid breaking us if there is no build
> failure.

Followed up on this in bug #240536.