Bug 210208

Summary: Fix handling non-linearized PDFs when incremental PDF loading is enabled
Product: WebKit Reporter: Brady Eidson <beidson>
Component: WebKit Misc.Assignee: Brady Eidson <beidson>
Status: RESOLVED FIXED    
Severity: Normal CC: ggaren, thorton
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
Patch
none
Patch none

Brady Eidson
Reported 2020-04-08 11:55:41 PDT
Fix handling non-linearized PDFs when incremental PDF loading is enabled <rdar://problem/60619506>
Attachments
Patch (7.84 KB, patch)
2020-04-08 12:16 PDT, Brady Eidson
no flags
Patch (8.38 KB, patch)
2020-04-08 13:40 PDT, Brady Eidson
no flags
Patch (8.37 KB, patch)
2020-04-08 13:49 PDT, Brady Eidson
no flags
Brady Eidson
Comment 1 2020-04-08 12:16:39 PDT
Geoffrey Garen
Comment 2 2020-04-08 12:24:07 PDT
/Volumes/Data/worker/macOS-Mojave-Debug-Build-EWS/build/Source/WebKit/WebProcess/Plugins/PDF/PDFPlugin.mm:1536:5: error: use of undeclared identifier 'pdfLog'
Tim Horton
Comment 3 2020-04-08 12:27:45 PDT
Comment on attachment 395852 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=395852&action=review > Source/WebKit/WebProcess/Plugins/PDF/PDFPlugin.mm:142 > +static const int32_t maximumRangeRequestPosition = std::numeric_limits<int32_t>::max(); INT32_MAX is a reasonable maximum? Seems ... big.
Geoffrey Garen
Comment 4 2020-04-08 12:46:42 PDT
Comment on attachment 395852 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=395852&action=review > Source/WebKit/ChangeLog:9 > + When we try to load a non-linearized PDF with PDFKit, it makes an outlandishly large range request > + to try to verify the PDF file size. Are we working around a bug in PDFKit here? If so, let's cite the Radar for that bug. > Source/WebKit/WebProcess/Plugins/PDF/PDFPlugin.mm:142 > +static const int32_t maximumRangeRequestPosition = std::numeric_limits<int32_t>::max(); Should this be off_t, since the position argument passed to dataProviderGetBytesAtPositionCallback is off_t? What is the actual request we get from PDFKit in practice? Maybe we can bump this number up, and do a direct comparison instead of a > comparison. This limit would prohibit streaming of any > 4GB PDF resource. While 4GB is very big, it doesn't strike me as impossibly big, and refusing to stream something because it was big would be a sad paradox. > Source/WebKit/WebProcess/Plugins/PDF/PDFPlugin.mm:701 > +void PDFPlugin::receivedInvalidRangeRequest() I think this code would be clearer if we named and/or commented it according to the behavior we are working around / modeling. Maybe something like maximumRangeRequestPosition => nonLinearizedPDFSentinel receivedInvalidRangeRequest => receivedNonLinearizedPDFSentinel > Source/WebKit/WebProcess/Plugins/PDF/PDFPlugin.mm:741 > + Ref<PDFPlugin> plugin = *((PDFPlugin*)info); Let's put this at the top of the function so we can remove the casting above. > Source/WebKit/WebProcess/Plugins/PDF/PDFPlugin.mm:744 > + // It's possible we previously encountered an invalid range and therefore disabled incremental loading, > + // but PDFDocument is still trying to read data using a different strategy. Does this happen? > Source/WebKit/WebProcess/Plugins/PDF/PDFPlugin.mm:750 > + auto debugPluginRef = plugin.copyRef(); I think you can just use plugin here. > Source/WebKit/WebProcess/Plugins/PDF/PDFPlugin.mm:859 > + // The main thread dispatch below removes the last reference to the PDF thread. > + // It must be the last code executed in this function. > + callOnMainThread([protectedPlugin = WTFMove(protectedPlugin)] { }); callOnMainThread can execute its function concurrently at any time; so, this is a suspicious lifetime idiom, and probably not fully correct. What are we trying to accomplish here?
Geoffrey Garen
Comment 5 2020-04-08 12:47:20 PDT
Comment on attachment 395852 [details] Patch I accidentally overwrote thorton's r+, but now it's on purpose cuz EWS is angry.
Brady Eidson
Comment 6 2020-04-08 13:04:09 PDT
(In reply to Geoffrey Garen from comment #4) > Comment on attachment 395852 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=395852&action=review > > > Source/WebKit/ChangeLog:9 > > + When we try to load a non-linearized PDF with PDFKit, it makes an outlandishly large range request > > + to try to verify the PDF file size. > > Are we working around a bug in PDFKit here? If so, let's cite the Radar for > that bug. > > > Source/WebKit/WebProcess/Plugins/PDF/PDFPlugin.mm:142 > > +static const int32_t maximumRangeRequestPosition = std::numeric_limits<int32_t>::max(); > > Should this be off_t, since the position argument passed to > dataProviderGetBytesAtPositionCallback is off_t? > > What is the actual request we get from PDFKit in practice? Maybe we can bump > this number up, and do a direct comparison instead of a > comparison. It's not constant, and it's closer to int64_t max. > This limit would prohibit streaming of any > 4GB PDF resource. While 4GB is very > big, it doesn't strike me as impossibly big, and refusing to stream > something because it was big would be a sad paradox. I can make it "bigger than 4gb" > > > Source/WebKit/WebProcess/Plugins/PDF/PDFPlugin.mm:701 > > +void PDFPlugin::receivedInvalidRangeRequest() > > I think this code would be clearer if we named and/or commented it according > to the behavior we are working around / modeling. Maybe something like > > maximumRangeRequestPosition => nonLinearizedPDFSentinel > receivedInvalidRangeRequest => receivedNonLinearizedPDFSentinel I'm fine with these > > > Source/WebKit/WebProcess/Plugins/PDF/PDFPlugin.mm:741 > > + Ref<PDFPlugin> plugin = *((PDFPlugin*)info); > > Let's put this at the top of the function so we can remove the casting above. > > > Source/WebKit/WebProcess/Plugins/PDF/PDFPlugin.mm:744 > > + // It's possible we previously encountered an invalid range and therefore disabled incremental loading, > > + // but PDFDocument is still trying to read data using a different strategy. > > Does this happen? Yes. > > Source/WebKit/WebProcess/Plugins/PDF/PDFPlugin.mm:750 > > + auto debugPluginRef = plugin.copyRef(); > > I think you can just use plugin here. Need a non-moved copy of plugin down at the bottom of the function for logging. > > > Source/WebKit/WebProcess/Plugins/PDF/PDFPlugin.mm:859 > > + // The main thread dispatch below removes the last reference to the PDF thread. > > + // It must be the last code executed in this function. > > + callOnMainThread([protectedPlugin = WTFMove(protectedPlugin)] { }); > > callOnMainThread can execute its function concurrently at any time; so, this > is a suspicious lifetime idiom, and probably not fully correct. What are we > trying to accomplish here? It's a scope exit - It only runs the functor as the enclosing scope (e.g. ::threadEntry()) finishes up. This is just moving the final "callOnMainThread" to a scope exit because there's now multiple return paths from this function.
Brady Eidson
Comment 7 2020-04-08 13:40:49 PDT
Geoffrey Garen
Comment 8 2020-04-08 13:44:09 PDT
> > callOnMainThread can execute its function concurrently at any time; so, this > > is a suspicious lifetime idiom, and probably not fully correct. What are we > > trying to accomplish here? > > It's a scope exit - It only runs the functor as the enclosing scope (e.g. > ::threadEntry()) finishes up. > > This is just moving the final "callOnMainThread" to a scope exit because > there's now multiple return paths from this function. How about changing the comment to "Keep PDFPlugin alive until the end of this function and the end of the last main thread task submitted by this function".
Brady Eidson
Comment 9 2020-04-08 13:49:36 PDT
EWS
Comment 10 2020-04-08 15:15:22 PDT
Committed r259760: <https://trac.webkit.org/changeset/259760> All reviewed patches have been landed. Closing bug and clearing flags on attachment 395863 [details].
Note You need to log in before you can comment on or make changes to this bug.