RESOLVED FIXED 246454
PDFPlugin hangs in deallocation when data load contends with waiting for thread completion
https://bugs.webkit.org/show_bug.cgi?id=246454
Summary PDFPlugin hangs in deallocation when data load contends with waiting for thre...
Daniel Jalkut
Reported 2022-10-13 05:40:48 PDT
Created attachment 462960 [details] example source file to reproduce the hang in Safari When a PDFPlugin instance is destroyed while it's still waiting for data to load, a deadlock may occur while the data loading thread attempts to dispatch to the main thread, and the main thread waits for the data loading thread to complete. The specific contention seems to be between the code at https://github.com/WebKit/WebKit/blob/main/Source/WebKit/WebProcess/Plugins/PDF/PDFPlugin.mm#L879 which requires an async dispatch to the main thread, while PDFPlugin destructor on the main thread: https://github.com/WebKit/WebKit/blob/main/Source/WebKit/WebProcess/Plugins/PDF/PDFPlugin.mm#L1877 waits for the thread to terminate. In real world scenarios this bug was observed in NetNewsWire, when viewing RSS articles that have PDF embedded in them. The ability in NNW to rapidly flip through multiple content loads by arrowing through articles exacerbates the problem. A concise example of this bug can be produced without referencing any actual PDF data but by loading a simple HTML document like (also attached as test.html): <textarea>This page will hang if reloaded repeatedly</textarea> <object type="application/pdf" style="width:100%;height:100%"></object> And then reloading it. It seems that the particular case of not supplying a data parameter is even better at exacerbating the problem than the real-world example of a PDF that might just take a while to load. I suspect a good fix would be to build in some mechanism so that PDFPlugin::unconditionalCompleteOutstandingRangeRequests can also signal the PDF document thread to stop waiting on its local dataSemaphore. I tried to prototype such a fix but my C++ skills are not great and the right fix might be a bit more nuanced than I'm imagining. I wonder if the pattern using m_abortSemaphore in SourceBufferPrivateAVFObjC might be a good model to follow? One complication is that the PDFPlugin seems to have a few separate data loading semaphore code paths so it might not be as simple as covering the case I higlighted here.
Attachments
example source file to reproduce the hang in Safari (218 bytes, text/html)
2022-10-13 05:40 PDT, Daniel Jalkut
no flags
Daniel Jalkut
Comment 1 2022-10-13 05:44:30 PDT
The underlying NetNewsWire bug that inspired me to locate this bug: https://github.com/Ranchero-Software/NetNewsWire/issues/3716
Radar WebKit Bug Importer
Comment 2 2022-10-13 16:04:59 PDT
Darin Adler
Comment 3 2022-10-14 15:07:20 PDT
The simplest fix I can think of is this inside the for loop at https://github.com/WebKit/WebKit/blob/main/Source/WebKit/WebProcess/Plugins/PDF/PDFPlugin.mm#L879: if (plugin->isBeingDestroyed()) dataSemaphore.signal(); else { plugin->getResourceBytesAtPosition( ... ... } Given both destroy and the for loop are running on the main thread, I don’t think we have to do anything more sophisticated.
Daniel Jalkut
Comment 4 2022-10-14 16:22:54 PDT
Unfortunately I don't think the proposed fix will work, because the deadlock prevents the code that might signal the dataSemaphore from ever being reached. So the RunLoop::main().dispatch(...) that gets invoked right before waiting on the semaphore, is prevented from having its block run if/when the main thread is already waiting on the thread to complete in PDFPlugin::destroy. I tried to adapt the proposed solution to the semaphore wait phase, by replacing dataSemaphore.wait() with while (true) { if (monitorPluginRef->isBeingDestroyed()) { break; } if (dataSemaphore.waitFor(100_ms)) { break; } } This seems to ameliorate the problem, and seems thread safe since the isBeingDestroyed property is only ever going to be true once, and never go back to not being true. But implementing this solution led me to other nuanced issues with exceptions being thrown when the main runloop based code did finally get dispatched, presumably because the plugin is truly gone by that point. If I put the ->isBeingDestroyed check both inside the main runloop closure and in the semaphore wait construct as above, it seems to escape the code paths without hanging but eventually crashes in an assertion in WebPage::SandboxExtensionTracker::didStartProvisionalLoad. Not really sure how that is connected, but it's a good example of why this is difficult for me to pursue unless we can come up with a "silver bullet" like Darin tried to do here! I think this line of thinking is the right idea: how to ensure when destroying the plugin that any current OR upcoming attempt to wait on data will be ignored, without leaving dangling closures that might have trouble with that.
Darin Adler
Comment 5 2022-10-14 16:40:05 PDT
I’ll keep thinking about this although eventually we will want to await the arrival of an expert when I almost certainly don’t succeed. One thing that probably isn’t right about the adapted code above is that it’s inherently racy to access isBeingDestroyed from a non-main thread. And of course we don’t want that type of polling loop in our completed solution.
Chris Dumez
Comment 6 2024-03-04 13:19:23 PST
EWS
Comment 7 2024-03-05 12:52:20 PST
Committed 275707@main (e604d7c70127): <https://commits.webkit.org/275707@main> Reviewed commits have been landed. Closing PR #25448 and removing active labels.
Note You need to log in before you can comment on or make changes to this bug.