RESOLVED FIXED 237513
[GTK][WPE] Add initial support for PDF documents using PDF.js
https://bugs.webkit.org/show_bug.cgi?id=237513
Summary [GTK][WPE] Add initial support for PDF documents using PDF.js
Carlos Garcia Campos
Reported Monday, March 7, 2022 9:49:08 AM UTC
Use PDF.js from third party.
Attachments
Patch (14.93 KB, patch)
2022-03-07 01:58 PST, Carlos Garcia Campos
mcatanzaro: review+
ews-feeder: commit-queue-
Patch for landing (16.93 KB, patch)
2022-03-09 02:57 PST, Carlos Garcia Campos
no flags
Carlos Garcia Campos
Comment 1 Monday, March 7, 2022 9:58:24 AM UTC
Created attachment 453949 [details] Patch This is an initial patch, loading PDF documents works, but there are a few things to fix in follow up patches: - Figure out a way to load platform styles. style-cocoa.css is currently added unconditionally (we just get a 404 now and default style is used). - I'm seeing a couple of cross-origin errors in the console, even though I don't see anything broken. - Add a style for GTK/WPE (adwaita), to remove UI elements we don't want, like the open button, and customize the UI a bit too.
Michael Catanzaro
Comment 2 Monday, March 7, 2022 4:43:42 PM UTC
(In reply to Carlos Garcia Campos from comment #1) > - I'm seeing a couple of cross-origin errors in the console, even though I > don't see anything broken. Epiphany has a pretty gross hack to avoid these, but it's bad and you don't want to use it. For us, these errors indicated that stuff really was broken. Jan-Michael would probably remember for sure what it was. (In reply to Carlos Garcia Campos from comment #1) > - Add a style for GTK/WPE (adwaita), to remove UI elements we don't want, > like the open button, and customize the UI a bit too. I think we removed the open button because it didn't work, not because we didn't want it. Should probably hook it up to a WebKitFileChooserRequest.
Brent Fulgham
Comment 3 Monday, March 7, 2022 4:56:58 PM UTC
Comment on attachment 453949 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=453949&action=review > Source/WebKit/UIProcess/WebPageProxy.cpp:1915 > + return true; @ntim: Should we have done this for our port? Will this break something for Cocoa ports if we choose not to ship with it enabled by default yet? > Source/WebKit/WebProcess/WebPage/WebPage.cpp:6460 > + return true; Ditto:
Michael Catanzaro
Comment 4 Monday, March 7, 2022 5:26:09 PM UTC
Comment on attachment 453949 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=453949&action=review Do you plan to investigate adapting Epiphany for this? We'll be able to remove a lot of code there.... > Source/WebCore/loader/ResourceLoader.cpp:872 > -#if PLATFORM(COCOA) > +#if PLATFORM(COCOA) || PLATFORM(GTK) || PLATFORM(WPE) No need for #if here: we have a runtime setting PdfJSViewerEnabled, so should check that instead. > Source/WebCore/page/SecurityOrigin.cpp:124 > + || url.protocolIs("webkit-pdfjs-viewer") This is surely required for all ports using PDF.js, not just GTK and WPE. Better check PdfJSViewerEnabled instead of using #if. > Source/WebCore/platform/LegacySchemeRegistry.cpp:148 > -#if PLATFORM(COCOA) > +#if PLATFORM(COCOA) || PLATFORM(GTK) || PLATFORM(WPE) > "webkit-pdfjs-viewer"_s, > #endif If you change this from std::array to std::vector, then you can check the runtime setting PdfJSViewerEnabled here too. > Source/WebKit/UIProcess/WebPageProxy.cpp:1915 > + if (m_preferences->pdfJSViewerEnabled() && MIMETypeRegistry::isPDFMIMEType(mimeType)) > + return true; Note: this requires owner approval to land. > Source/WebKit/WebProcess/WebPage/WebPage.cpp:6460 > + if (m_page->settings().pdfJSViewerEnabled() && MIMETypeRegistry::isPDFMIMEType(mimeType)) > + return true; This too. > Tools/glib/generate-pdfjs-gresource-manifest.py:35 > + if os.path.splitext(resource)[1] not in VALID_EXTENSIONS: > + return True Hm, I don't like this: we'll likely drop desired files by mistake during future updates of PDF.js. I'd either (a) drop this altogether and just include everything, or (b) use a denylist containing just '.map' as a sanity-check to ensure we don't include those pre-generated map files by mistake. > Tools/glib/generate-pdfjs-gresource-manifest.py:54 > + # separator, is properly replaced Nit: no comma here
Michael Catanzaro
Comment 5 Monday, March 7, 2022 5:28:28 PM UTC
(In reply to Brent Fulgham from comment #3) > > Source/WebKit/UIProcess/WebPageProxy.cpp:1915 > > + return true; > > @ntim: Should we have done this for our port? Will this break something for > Cocoa ports if we choose not to ship with it enabled by default yet? Note the runtime setting defaults to off for Cocoa ports, so that shouldn't affect you unless the runtime setting is toggled. (GTK/WPE do not actually have any way to toggle these runtime settings. I guess you probably do have a way, though?) BTW: we need owner approval for these lines, hint hint. ;)
Michael Catanzaro
Comment 6 Monday, March 7, 2022 5:29:56 PM UTC
Comment on attachment 453949 [details] Patch Also there's three test failures to deal with before landing: fast/replaced/encrypted-pdf-as-object-and-embed.html http/tests/cache/disk-cache/shattered-deduplication.html imported/w3c/web-platform-tests/html/semantics/embedded-content/the-iframe-element/sandbox_004.htm
Carlos Garcia Campos
Comment 7 Tuesday, March 8, 2022 8:29:47 AM UTC
Comment on attachment 453949 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=453949&action=review >> Source/WebCore/page/SecurityOrigin.cpp:124 >> + || url.protocolIs("webkit-pdfjs-viewer") > > This is surely required for all ports using PDF.js, not just GTK and WPE. Better check PdfJSViewerEnabled instead of using #if. It's enabled unconditionally for GTK and WPE. We can discuss if we want to add api to enable/disable it, though. But for now, I think it's easier this way. Cocoa doesn't do this because they have : // Allow images to load. response.addHTTPHeaderField(HTTPHeaderName::AccessControlAllowOrigin, "*"); in the bundle resource loader. I did it this way for consistency with gresources, since webkit-pdfjs-viewer are gresources too in the end. >> Source/WebCore/platform/LegacySchemeRegistry.cpp:148 >> #endif > > If you change this from std::array to std::vector, then you can check the runtime setting PdfJSViewerEnabled here too. And same here. >> Tools/glib/generate-pdfjs-gresource-manifest.py:35 >> + return True > > Hm, I don't like this: we'll likely drop desired files by mistake during future updates of PDF.js. I'd either (a) drop this altogether and just include everything, or (b) use a denylist containing just '.map' as a sanity-check to ensure we don't include those pre-generated map files by mistake. I think it's unlikely new file types are added in new versions. I've tried hard to not add any file we don't need to the bundle, because it's already big enough. If it breaks after an upgrade we can easily fix it.
Carlos Garcia Campos
Comment 8 Tuesday, March 8, 2022 8:31:00 AM UTC
(In reply to Michael Catanzaro from comment #6) > Comment on attachment 453949 [details] > Patch > > Also there's three test failures to deal with before landing: > > fast/replaced/encrypted-pdf-as-object-and-embed.html > http/tests/cache/disk-cache/shattered-deduplication.html > imported/w3c/web-platform-tests/html/semantics/embedded-content/the-iframe- > element/sandbox_004.htm Yes, I'll check. Maybe we can just disable pdf support in WTR for now.
Carlos Garcia Campos
Comment 9 Tuesday, March 8, 2022 9:17:54 AM UTC
Rather than API to enable/disable it, maybe we want a build option to allow embedders who don't need pdf support to opt-out, because pdfjs adds a few MB to the library.
Tim Nguyen (:ntim)
Comment 10 Tuesday, March 8, 2022 2:02:46 PM UTC
Comment on attachment 453949 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=453949&action=review >> Source/WebCore/loader/ResourceLoader.cpp:872 >> +#if PLATFORM(COCOA) || PLATFORM(GTK) || PLATFORM(WPE) > > No need for #if here: we have a runtime setting PdfJSViewerEnabled, so should check that instead. The goal here is to target platforms that implement the webkit-pdfjs-viewer scheme, which may be correlated to PdfJSViewerEnabled, but not necessarily. Perhaps we should have a WEBKIT-PDFJS-VIEWER-SCHEME macro? >>> Source/WebCore/page/SecurityOrigin.cpp:124 >>> + || url.protocolIs("webkit-pdfjs-viewer") >> >> This is surely required for all ports using PDF.js, not just GTK and WPE. Better check PdfJSViewerEnabled instead of using #if. > > It's enabled unconditionally for GTK and WPE. We can discuss if we want to add api to enable/disable it, though. But for now, I think it's easier this way. Cocoa doesn't do this because they have : > > // Allow images to load. > response.addHTTPHeaderField(HTTPHeaderName::AccessControlAllowOrigin, "*"); > > in the bundle resource loader. I did it this way for consistency with gresources, since webkit-pdfjs-viewer are gresources too in the end. If we can make this consistent across platforms, that would be great. I'm open to removing response.addHTTPHeaderField(HTTPHeaderName::AccessControlAllowOrigin, "*"); too. Can you add a comment either way to explain why PDF.js needs this? >>> Source/WebCore/platform/LegacySchemeRegistry.cpp:148 >>> #endif >> >> If you change this from std::array to std::vector, then you can check the runtime setting PdfJSViewerEnabled here too. > > And same here. ditto here, maybe a macro type thing. >>> Source/WebKit/UIProcess/WebPageProxy.cpp:1915 >>> + return true; >> >> @ntim: Should we have done this for our port? Will this break something for Cocoa ports if we choose not to ship with it enabled by default yet? > > Note: this requires owner approval to land. Brent, Cocoa is covered by the 5 lines above: #if PLATFORM(COCOA) // On Mac, we can show PDFs. if (MIMETypeRegistry::isPDFOrPostScriptMIMEType(mimeType) && !WebProcessPool::omitPDFSupport() return true; #endif // PLATFORM(COCOA) I think we should add this, but put it above the Cocoa check, although I'm not sure if respecting the `WebProcessPool::omitPDFSupport()` pref is important in the context of PDF.js. >>> Source/WebKit/WebProcess/WebPage/WebPage.cpp:6460 >>> + return true; >> >> Ditto: > > This too. I'm not sure what this adds? Can you describe why you're adding this?
Michael Catanzaro
Comment 11 Tuesday, March 8, 2022 2:19:01 PM UTC
(In reply to Carlos Garcia Campos from comment #7) > It's enabled unconditionally for GTK and WPE. We can discuss if we want to > add api to enable/disable it, though. But for now, I think it's easier this > way. Cocoa doesn't do this because they have : > > // Allow images to load. > response.addHTTPHeaderField(HTTPHeaderName::AccessControlAllowOrigin, "*"); > > in the bundle resource loader. I did it this way for consistency with > gresources, since webkit-pdfjs-viewer are gresources too in the end. Hmmm, OK, I see why you did this. I'm not actually certain, but I *suspect* this wasn't the best approach. There are a few other considerations: * A webkit-pdf-viewer: URI has a host component and therefore it should probably not be treated as a unique origin, i.e. shouldTreatAsUniqueOrigin should return false (...right?). Your change accomplishes this only for GTK and WPE, but leaves Cocoa ports with the opposite result. Understanding that Cocoa ports don't *need* it for PDF.js to work, I think they still *want* this. (I assume LegacySchemeRegistry::schemeIsHandledBySchemeHandler will always return true for webkit-pdf-viewer, causing shouldTreatAsUniqueOrigin to return false if we don't have a special case for webkit-pdfjs-viewer URIs.) * Matching the Cocoa port behavior allows us to share the same set of bugs between ports, which is valuable. * You mention you are seeing CORS errors in the console, but surely those will go away if we add AccessControlAllowOrigin=* to match Cocoa ports, so we should probably do that too, right? In fact, I wonder if maybe we should do that for all GResources. We spent ages trying to make PDF.js dodge CORS, and even wound up adding new public API webkit_web_view_set_cors_allowlist() to facilitate it. None of that would have been needed if GResources just did the right thing. Resource content should be trusted and allowed to embed content from any origin it wants, right? So even if your way works, I think it'd be nice to match behavior between ports anyway. Normally we use #if only when something either *has* to be different between ports, or at least really *wants* to be different. I don't think that's the case here. I won't block you on this, though (you still have r+ regardless). > I think it's unlikely new file types are added in new versions. I've tried > hard to not add any file we don't need to the bundle, because it's already > big enough. If it breaks after an upgrade we can easily fix it. What files do we not need besides the *.map and the example PDF? If you want to maintain an allowlist of file extensions, then please maintain a denylist as well, and fail the build if a file doesn't match one list or the other, so we'll be sure to notice when something is wrong. It's not much extra code now, but will make things easier for us in the future if it's really needed. I'm sure whoever winds up updating PDF.js now that it has moved to WebKit is not going to want to deal with important files discarded with no warning... *especially* as that will only happen for GTK/WPE, so someone from Apple would really have to chance to avoid breaking us if there is no build failure.
Carlos Garcia Campos
Comment 12 Tuesday, March 8, 2022 2:41:19 PM UTC
(In reply to Michael Catanzaro from comment #11) > (In reply to Carlos Garcia Campos from comment #7) > > It's enabled unconditionally for GTK and WPE. We can discuss if we want to > > add api to enable/disable it, though. But for now, I think it's easier this > > way. Cocoa doesn't do this because they have : > > > > // Allow images to load. > > response.addHTTPHeaderField(HTTPHeaderName::AccessControlAllowOrigin, "*"); > > > > in the bundle resource loader. I did it this way for consistency with > > gresources, since webkit-pdfjs-viewer are gresources too in the end. > > Hmmm, OK, I see why you did this. I'm not actually certain, but I *suspect* > this wasn't the best approach. There are a few other considerations: > > * A webkit-pdf-viewer: URI has a host component and therefore it should > probably not be treated as a unique origin, i.e. shouldTreatAsUniqueOrigin > should return false (...right?). Your change accomplishes this only for GTK > and WPE, but leaves Cocoa ports with the opposite result. Understanding that > Cocoa ports don't *need* it for PDF.js to work, I think they still *want* > this. (I assume LegacySchemeRegistry::schemeIsHandledBySchemeHandler will > always return true for webkit-pdf-viewer, causing shouldTreatAsUniqueOrigin > to return false if we don't have a special case for webkit-pdfjs-viewer > URIs.) > > * Matching the Cocoa port behavior allows us to share the same set of bugs > between ports, which is valuable. > > * You mention you are seeing CORS errors in the console, but surely those > will go away if we add AccessControlAllowOrigin=* to match Cocoa ports, so > we should probably do that too, right? In fact, I wonder if maybe we should > do that for all GResources. We spent ages trying to make PDF.js dodge CORS, > and even wound up adding new public API webkit_web_view_set_cors_allowlist() > to facilitate it. None of that would have been needed if GResources just did > the right thing. Resource content should be trusted and allowed to embed > content from any origin it wants, right? No, the initial CORS errors prevented the resourced from being loaded, those are fixed either adding the HTTP header or making the scheme as not unique. The errors I'0m still seeing happen with the HTTP header too. I'll submit a new bug for those. > So even if your way works, I think it'd be nice to match behavior between > ports anyway. Normally we use #if only when something either *has* to be > different between ports, or at least really *wants* to be different. I don't > think that's the case here. I won't block you on this, though (you still > have r+ regardless). I'm fine switching to cocoa behavior, I haver to check it also works for the inspector resources. > > I think it's unlikely new file types are added in new versions. I've tried > > hard to not add any file we don't need to the bundle, because it's already > > big enough. If it breaks after an upgrade we can easily fix it. > > What files do we not need besides the *.map and the example PDF? If you want > to maintain an allowlist of file extensions, then please maintain a denylist > as well, and fail the build if a file doesn't match one list or the other, > so we'll be sure to notice when something is wrong. It's not much extra code > now, but will make things easier for us in the future if it's really needed. > > I'm sure whoever winds up updating PDF.js now that it has moved to WebKit is > not going to want to deal with important files discarded with no warning... > *especially* as that will only happen for GTK/WPE, so someone from Apple > would really have to chance to avoid breaking us if there is no build > failure.
Michael Catanzaro
Comment 13 Tuesday, March 8, 2022 3:23:21 PM UTC
(In reply to Carlos Garcia Campos from comment #12) > I'm fine switching to cocoa behavior, I haver to check it also works for the > inspector resources. Please read my last comment again: I think we want both behaviors for all ports, not one or the other.
Tim Nguyen (:ntim)
Comment 14 Tuesday, March 8, 2022 7:05:51 PM UTC
Comment on attachment 453949 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=453949&action=review > Tools/glib/generate-pdfjs-gresource-manifest.py:88 > + if not filename.startswith('build/') and not filename.startswith('web/'): > + alias = 'extras/' + filename > + line += ' alias="%s"' % alias > + line += '>%s</file>\n' % filename fwiw, pdfjs-extras is copied as as subfolder of the pdfjs folder called "extras" in the bundle. We don't do this type of aliasing based on the folder name.
Carlos Garcia Campos
Comment 15 Wednesday, March 9, 2022 10:57:36 AM UTC
Created attachment 454213 [details] Patch for landing
Carlos Garcia Campos
Comment 16 Thursday, March 10, 2022 9:00:34 AM UTC
Michael Catanzaro
Comment 17 Tuesday, May 17, 2022 11:43:58 PM UTC
(In reply to Michael Catanzaro from comment #11) > What files do we not need besides the *.map and the example PDF? If you want > to maintain an allowlist of file extensions, then please maintain a denylist > as well, and fail the build if a file doesn't match one list or the other, > so we'll be sure to notice when something is wrong. It's not much extra code > now, but will make things easier for us in the future if it's really needed. > > I'm sure whoever winds up updating PDF.js now that it has moved to WebKit is > not going to want to deal with important files discarded with no warning... > *especially* as that will only happen for GTK/WPE, so someone from Apple > would really have to chance to avoid breaking us if there is no build > failure. Followed up on this in bug #240536.
Note You need to log in before you can comment on or make changes to this bug.