RESOLVED FIXED 236525
PDF.js viewer should work for all kinds of URLs
https://bugs.webkit.org/show_bug.cgi?id=236525
Summary PDF.js viewer should work for all kinds of URLs
Tim Nguyen (:ntim)
Reported 2022-02-11 13:34:16 PST
Attachments
Patch (2.83 KB, patch)
2022-02-11 17:32 PST, pascoe@apple.com
no flags
Patch (7.65 KB, patch)
2022-02-14 16:10 PST, pascoe@apple.com
no flags
Patch (7.26 KB, patch)
2022-02-15 08:49 PST, pascoe@apple.com
ews-feeder: commit-queue-
Patch (6.76 KB, patch)
2022-02-15 09:42 PST, pascoe@apple.com
no flags
Patch (7.33 KB, patch)
2022-02-15 14:08 PST, pascoe@apple.com
no flags
Patch for landing (7.77 KB, patch)
2022-02-17 14:16 PST, pascoe@apple.com
ews-feeder: commit-queue-
Patch for landing (7.77 KB, patch)
2022-02-17 14:24 PST, pascoe@apple.com
no flags
Radar WebKit Bug Importer
Comment 1 2022-02-11 13:34:25 PST Comment hidden (obsolete)
pascoe@apple.com
Comment 2 2022-02-11 17:32:55 PST
Tim Nguyen (:ntim)
Comment 3 2022-02-11 23:54:11 PST
Tim Nguyen (:ntim)
Comment 4 2022-02-12 00:30:48 PST
Comment on attachment 451762 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=451762&action=review This is a great start, thanks! I know this is just a WIP but here are a few initial comments. > Source/WebCore/ChangeLog:12 > + by calling the PDFJS viewer's open function. More work is needed to > + disable loading of the default option and potentially present the > + data as a PDFDataRangeTransport. I think we might want to remove the `?file=` parameter logic from `PDFDocument::createDocumentStructure()` with this change as well. Not sure how to disable the example PDF from loading, but will look into that. > Source/WebCore/html/PDFDocument.cpp:114 > if (event.type() == eventNames().loadEvent) { If the PDF finishes loading after the iframe finishes loading, this will be problematic. `PDFDocumentParser::finish( )` or `PDFDocument::finishedParsing()` is a place where we are guaranteed to finish parsing the PDF bytes. I think we need a flag for when the PDF loads, and one for when the iframe loads, then we execute this code when both of these are satisfied (whichever is first) to avoid a race. Also, PDFJS has its own `webviewerloaded` event too, so not sure if we need to care about that. > Source/WebCore/html/PDFDocument.cpp:117 > + auto* iframe = dynamicDowncast<HTMLIFrameElement>(event.target()); nit: this is already defined above. > Source/WebCore/html/PDFDocument.cpp:132 > + JSLockHolder lock(globalObject->vm()); > + auto callData = JSC::getCallData(globalObject->vm(), openFunction); > + ASSERT(callData.type != JSC::CallData::Type::None); > + JSC::MarkedArgumentBuffer arguments; > + auto nativeBuffer = m_document->loader()->mainResourceData()->tryCreateArrayBuffer(); > + ArrayBufferSharingMode sharingMode = nativeBuffer->sharingMode(); > + arguments.append(JSArrayBuffer::create(globalObject->vm(), globalObject->arrayBufferStructure(sharingMode), WTFMove(nativeBuffer))); > + ASSERT(!arguments.hasOverflowed()); > + > + JSC::call(globalObject, openFunction, callData, pdfJSApplication, arguments); I wonder if we could run window.postMessage() with a message containing the array buffer instead, and have `content-script.js` listen for the message and do the rest. It might be slightly more convoluted, but I want to do this for a few reasons: - `PDFViewerApplication.open` can break with PDF.js API updates, and ideally I'd like this type of code to stay contained in the `pdfjs-extras` folder (with `PDFDocument` should staying clear from that) - Might be slightly easier if we have to wait for `webviewerloaded` as well, or do anything else with the buffer later on.
pascoe@apple.com
Comment 5 2022-02-14 12:21:45 PST
(In reply to Tim Nguyen (:ntim) from comment #4) > Comment on attachment 451762 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=451762&action=review > > This is a great start, thanks! I know this is just a WIP but here are a few > initial comments. > > > Source/WebCore/ChangeLog:12 > > + by calling the PDFJS viewer's open function. More work is needed to > > + disable loading of the default option and potentially present the > > + data as a PDFDataRangeTransport. > > I think we might want to remove the `?file=` parameter logic from > `PDFDocument::createDocumentStructure()` with this change as well. Not sure > how to disable the example PDF from loading, but will look into that. I've found leaving just ?file=, prevents the default example pdf from loading. > > Source/WebCore/html/PDFDocument.cpp:114 > > if (event.type() == eventNames().loadEvent) { > > If the PDF finishes loading after the iframe finishes loading, this will be > problematic. > > `PDFDocumentParser::finish( )` or `PDFDocument::finishedParsing()` is a > place where we are guaranteed to finish parsing the PDF bytes. I think we > need a flag for when the PDF loads, and one for when the iframe loads, then > we execute this code when both of these are satisfied (whichever is first) > to avoid a race. > > Also, PDFJS has its own `webviewerloaded` event too, so not sure if we need > to care about that. I've tried registering a listener for that event both from the PDFDocument.cpp and within content-script.js but have been unable to get it to work. > > Source/WebCore/html/PDFDocument.cpp:117 > > + auto* iframe = dynamicDowncast<HTMLIFrameElement>(event.target()); > > nit: this is already defined above. > > > Source/WebCore/html/PDFDocument.cpp:132 > > + JSLockHolder lock(globalObject->vm()); > > + auto callData = JSC::getCallData(globalObject->vm(), openFunction); > > + ASSERT(callData.type != JSC::CallData::Type::None); > > + JSC::MarkedArgumentBuffer arguments; > > + auto nativeBuffer = m_document->loader()->mainResourceData()->tryCreateArrayBuffer(); > > + ArrayBufferSharingMode sharingMode = nativeBuffer->sharingMode(); > > + arguments.append(JSArrayBuffer::create(globalObject->vm(), globalObject->arrayBufferStructure(sharingMode), WTFMove(nativeBuffer))); > > + ASSERT(!arguments.hasOverflowed()); > > + > > + JSC::call(globalObject, openFunction, callData, pdfJSApplication, arguments); > > I wonder if we could run window.postMessage() with a message containing the > array buffer instead, and have `content-script.js` listen for the message > and do the rest. It might be slightly more convoluted, but I want to do this > for a few reasons: > > - `PDFViewerApplication.open` can break with PDF.js API updates, and ideally > I'd like this type of code to stay contained in the `pdfjs-extras` folder > (with `PDFDocument` should staying clear from that) I think PDFViewerApplication.open will stay stable based on this comment: https://github.com/mozilla/pdf.js/issues/9487#issuecomment-372036644 > - Might be slightly easier if we have to wait for `webviewerloaded` as well, > or do anything else with the buffer later on. See above comment regarding listening to webviewerloaded.
Tim Nguyen (:ntim)
Comment 6 2022-02-14 12:39:13 PST
(In reply to j_pascoe@apple.com from comment #5) > (In reply to Tim Nguyen (:ntim) from comment #4) > > Comment on attachment 451762 [details] > > Patch > > > > View in context: > > https://bugs.webkit.org/attachment.cgi?id=451762&action=review > > > > This is a great start, thanks! I know this is just a WIP but here are a few > > initial comments. > > > > > Source/WebCore/ChangeLog:12 > > > + by calling the PDFJS viewer's open function. More work is needed to > > > + disable loading of the default option and potentially present the > > > + data as a PDFDataRangeTransport. > > > > I think we might want to remove the `?file=` parameter logic from > > `PDFDocument::createDocumentStructure()` with this change as well. Not sure > > how to disable the example PDF from loading, but will look into that. > > I've found leaving just ?file=, prevents the default example pdf from > loading. That sounds good to me. We should probably add a comment that explains it, and also stop appending the PDF URL after this parameter too. > > > Source/WebCore/html/PDFDocument.cpp:114 > > > if (event.type() == eventNames().loadEvent) { > > > > If the PDF finishes loading after the iframe finishes loading, this will be > > problematic. > > > > `PDFDocumentParser::finish( )` or `PDFDocument::finishedParsing()` is a > > place where we are guaranteed to finish parsing the PDF bytes. I think we > > need a flag for when the PDF loads, and one for when the iframe loads, then > > we execute this code when both of these are satisfied (whichever is first) > > to avoid a race. > > > > Also, PDFJS has its own `webviewerloaded` event too, so not sure if we need > > to care about that. > I've tried registering a listener for that event both from the > PDFDocument.cpp and within content-script.js but have been unable to get it > to work. I think we could care about this specific event later. But I'd still like to avoid a race between PDFDocument::finishedParsing() and iframe.onload though. > > > Source/WebCore/html/PDFDocument.cpp:117 > > > + auto* iframe = dynamicDowncast<HTMLIFrameElement>(event.target()); > > > > nit: this is already defined above. > > > > > Source/WebCore/html/PDFDocument.cpp:132 > > > + JSLockHolder lock(globalObject->vm()); > > > + auto callData = JSC::getCallData(globalObject->vm(), openFunction); > > > + ASSERT(callData.type != JSC::CallData::Type::None); > > > + JSC::MarkedArgumentBuffer arguments; > > > + auto nativeBuffer = m_document->loader()->mainResourceData()->tryCreateArrayBuffer(); > > > + ArrayBufferSharingMode sharingMode = nativeBuffer->sharingMode(); > > > + arguments.append(JSArrayBuffer::create(globalObject->vm(), globalObject->arrayBufferStructure(sharingMode), WTFMove(nativeBuffer))); > > > + ASSERT(!arguments.hasOverflowed()); > > > + > > > + JSC::call(globalObject, openFunction, callData, pdfJSApplication, arguments); > > > > I wonder if we could run window.postMessage() with a message containing the > > array buffer instead, and have `content-script.js` listen for the message > > and do the rest. It might be slightly more convoluted, but I want to do this > > for a few reasons: > > > > - `PDFViewerApplication.open` can break with PDF.js API updates, and ideally > > I'd like this type of code to stay contained in the `pdfjs-extras` folder > > (with `PDFDocument` should staying clear from that) > I think PDFViewerApplication.open will stay stable based on this comment: > https://github.com/mozilla/pdf.js/issues/9487#issuecomment-372036644 I still find it preferable to have the JS logic contained in the content-script.js with a postMessage based system for consistency. There are more things we'd want to use postMessage() for, e.g. sending find in page messages for instance, where we may want to do more than just one function call. In general, PDFDocument should be as viewer-agnostic as possible, e.g. it should be possible to swap another viewer that watches for the same messages.
pascoe@apple.com
Comment 7 2022-02-14 16:10:20 PST
Tim Nguyen (:ntim)
Comment 8 2022-02-14 23:52:57 PST
Comment on attachment 451956 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=451956&action=review > Source/WebCore/html/PDFDocument.cpp:151 > +class ScriptInjectedEventListener final : public EventListener { > +public: > + static Ref<ScriptInjectedEventListener> create(PDFDocument& pdfDocument) { return adoptRef(*new ScriptInjectedEventListener(pdfDocument)); } > + > +private: > + ScriptInjectedEventListener(PDFDocument& pdfDocument) > + : EventListener(PDFJSScriptInjectionEventListenerType) > + , m_pdfDocument(pdfDocument) > + { > + } > + > + bool operator==(const EventListener&) const override; > + void handleEvent(ScriptExecutionContext&, Event&) override; > + > + WeakPtr<PDFDocument> m_pdfDocument; > +}; > + > +void ScriptInjectedEventListener::handleEvent(ScriptExecutionContext&, Event&) > +{ > + m_pdfDocument->setContentsScriptLoaded(true); > + if (m_pdfDocument->isFullyParsed()) > + m_pdfDocument->sendBlob(); > +} > + > +bool ScriptInjectedEventListener::operator==(const EventListener& other) const > +{ > + // All PDFJSScriptInjectionEventListenerType objects compare as equal; OK since there is only one per document. > + return other.type() == PDFJSScriptInjectionEventListenerType; > +} You can just re-use the other event listener, and check what the event target is in handleEvent. I don't think we want to add more event listener types. ImageDocument uses one event listener for 2 different events for instance. > Source/WebCore/html/PDFDocument.cpp:187 > + m_iframe = iframe.ptr(); `m_iframe = WTFMove(iframe)` > Source/WebCore/html/PDFDocument.cpp:201 > + if (this->m_contentScriptsLoaded) > + this->sendBlob(); nit: omit `this->` > Source/WebCore/html/PDFDocument.cpp:209 > + auto openFunction = frame->script().executeScriptIgnoringException("(data) => PDFJSContentScript.open(data)").getObject(); nit: auto openFunction = frame->script().executeScriptIgnoringException("PDFJSContentScript.open").getObject(); > Source/WebCore/html/PDFDocument.cpp:210 > + auto globalObject = this->globalObject(); nit: you can omit this-> here and below. > Source/WebCore/html/PDFDocument.cpp:215 > + JSC::MarkedArgumentBuffer arguments; `JSC::` can be omitted since you have `using namespace JSC` > Source/WebCore/html/PDFDocument.cpp:225 > void PDFDocument::injectContentScript(Document& contentDocument) You can stop passing the contentDocument around if you store a member with m_iframe. > Source/WebCore/html/PDFDocument.h:45 > + void sendBlob(); nit: sendPDFArrayBuffer(); ?
pascoe@apple.com
Comment 9 2022-02-15 08:49:54 PST
Tim Nguyen (:ntim)
Comment 10 2022-02-15 09:14:42 PST
Comment on attachment 452029 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=452029&action=review > Source/WebCore/dom/EventListener.h:50 > + PDFJSScriptInjectionEventListenerType, nit: remove this > Source/WebCore/html/PDFDocument.cpp:112 > + ASSERT(iframe, "Should have event target"); You can probably remove this assert > Source/WebCore/html/PDFDocument.cpp:118 > } and add: ``` else ASSERT_NOT_REACHED(); ``` > Source/WebCore/html/PDFDocument.cpp:179 > +void PDFDocument::sendPDFArrayBuffer() I still would like to use postMessage() here if possible. > Source/WebCore/html/PDFDocument.cpp:185 > + auto globalObject = this->globalObject(); redundant this-> > Source/WebCore/html/PDFDocument.cpp:191 > + auto nativeBuffer = this->loader()->mainResourceData()->tryCreateArrayBuffer(); here too
Tim Nguyen (:ntim)
Comment 11 2022-02-15 09:29:31 PST
Comment on attachment 452029 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=452029&action=review >> Source/WebCore/html/PDFDocument.cpp:185 >> + auto globalObject = this->globalObject(); > > redundant this-> oops this is not actually redundant, the variable name matches.
pascoe@apple.com
Comment 12 2022-02-15 09:42:41 PST
Chris Dumez
Comment 13 2022-02-15 11:16:52 PST
Comment on attachment 452040 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=452040&action=review > Source/WebCore/html/PDFDocument.cpp:152 > auto iframe = HTMLIFrameElement::create(HTMLNames::iframeTag, *this); Any reason we are not setting m_iframe right away here. > Source/WebCore/html/PDFDocument.cpp:155 > body->appendChild(iframe); Note that this may fire events and thus run JS AFAIK. > Source/WebCore/html/PDFDocument.cpp:157 > auto listener = PDFDocumentEventListener::create(*this); ditto. > Source/WebCore/html/PDFDocument.cpp:160 > + m_iframe = WTFMove(iframe); instead of waiting all the way until here? > Source/WebCore/html/PDFDocument.cpp:182 > + auto* frame = downcast<Frame>(m_iframe->contentWindow()->frame()); Why isn't this just calling m_iframe->contentFrame() ? Seems odd to go via the WindowProxy. > Source/WebCore/html/PDFDocument.cpp:187 > + auto callData = JSC::getCallData(globalObject->vm(), openFunction); Why all the JSC:: in this function given that you use 'using namespace JSC;' above? > Source/WebCore/html/PDFDocument.cpp:204 > + auto script = HTMLScriptElement::create(scriptTag, *contentDocument, false, false); Last parameter is optional and can be omitted. > Source/WebCore/html/PDFDocument.h:47 > + bool isFullyParsed() { return m_finishedParsing; } Should be marked as const. Also would be nice if the getter and the data member names would match. > Source/WebCore/html/PDFDocument.h:56 > + bool m_finishedParsing { false }; Boolean variables need to start with a prefix (such as "is") as per WebKit coding style. > Source/WebCore/html/PDFDocument.h:57 > + bool m_contentScriptsLoaded { false }; ditto.
Alexey Shvayka
Comment 14 2022-02-15 11:30:52 PST
Comment on attachment 452040 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=452040&action=review > Source/WebCore/html/PDFDocument.cpp:185 > + nit: auto& vm = globalObject->vm(); > Source/WebCore/html/PDFDocument.cpp:190 > + auto nativeBuffer = loader()->mainResourceData()->tryCreateArrayBuffer(); Should we handle tryCreateArrayBuffer() returning nullptr?
Tim Nguyen (:ntim)
Comment 15 2022-02-15 12:05:38 PST
Comment on attachment 452040 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=452040&action=review > Source/WebCore/ChangeLog:3 > + Collect bytes during parsing and pass data to PDF.js as a blob Can you correct the changelog title to be descriptive of what this patch actually does? (perhaps the bug/radar titles should be changed too)
Yusuke Suzuki
Comment 16 2022-02-15 12:13:03 PST
Comment on attachment 452040 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=452040&action=review > Source/WebCore/html/PDFDocument.cpp:184 > + auto openFunction = frame->script().executeScriptIgnoringException("PDFJSContentScript.open").getObject(); > + auto globalObject = this->globalObject(); If we have globalObject, then how about just getting a property instead of evaluating a script here? auto pdfjs = globalObject->get(globalObject, Identifier::fromString(vm, "PDFJSContentScript")); .... exception handling .... auto open = asObject(pdfjs)->get(globalObject, Identifier::fromString(vm, "open")); .... exception handling .... The above is much faster and efficient than evaluating script snippet. If we do that, do it after taking a JSLock.
pascoe@apple.com
Comment 17 2022-02-15 14:08:44 PST
Tim Horton
Comment 18 2022-02-15 14:31:57 PST
(In reply to Tim Nguyen (:ntim) from comment #15) > Comment on attachment 452040 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=452040&action=review > > > Source/WebCore/ChangeLog:3 > > + Collect bytes during parsing and pass data to PDF.js as a blob > > Can you correct the changelog title to be descriptive of what this patch > actually does? (perhaps the bug/radar titles should be changed too) Ideally the bug title would describe the problem, not the solution.
Tim Nguyen (:ntim)
Comment 19 2022-02-15 15:47:15 PST
Comment on attachment 452089 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=452089&action=review > Source/WebCore/html/PDFDocument.cpp:192 > + auto nativeBuffer = loader()->mainResourceData()->tryCreateArrayBuffer(); > + if (nativeBuffer == nullptr) { auto arrayBuffer = ... if (!arrayBuffer) { > Source/WebCore/html/PDFDocument.cpp:197 > + ArrayBufferSharingMode sharingMode = nativeBuffer->sharingMode(); `auto sharingMode =` should work here.
Tim Nguyen (:ntim)
Comment 20 2022-02-15 17:09:15 PST
Comment on attachment 452089 [details] Patch r=me for this initial iteration. We should switch to postMessage in bug 236668, since the iframe and the PDFDocument origins are not the same.
Tim Nguyen (:ntim)
Comment 21 2022-02-15 17:10:11 PST
Comment on attachment 452089 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=452089&action=review > Source/WebCore/html/PDFDocument.cpp:181 > + // FIXME: https://bugs.webkit.org/show_bug.cgi?id=236668 - Use globalObject->get // FIXME: webkit.org/b/236668 - Use postMessage instead.
Tim Nguyen (:ntim)
Comment 22 2022-02-15 17:13:36 PST
Comment on attachment 452089 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=452089&action=review > Source/WebCore/ChangeLog:15 > + * html/PDFDocument.cpp: > + (WebCore::PDFDocumentEventListener::handleEvent): The changelog needs to be regenerated, there are more functions that have been changed since.
Tim Nguyen (:ntim)
Comment 23 2022-02-15 17:13:39 PST Comment hidden (duplicate, obsolete)
pascoe@apple.com
Comment 24 2022-02-17 14:16:52 PST
Created attachment 452432 [details] Patch for landing
pascoe@apple.com
Comment 25 2022-02-17 14:24:41 PST
Created attachment 452433 [details] Patch for landing
EWS
Comment 26 2022-02-17 16:29:16 PST
Committed r290089 (247444@main): <https://commits.webkit.org/247444@main> All reviewed patches have been landed. Closing bug and clearing flags on attachment 452433 [details].
Note You need to log in before you can comment on or make changes to this bug.