Bug 246454 - PDFPlugin hangs in deallocation when data load contends with waiting for thread completion
Summary: PDFPlugin hangs in deallocation when data load contends with waiting for thre...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: PDF (show other bugs)
Version: WebKit Local Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Chris Dumez
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2022-10-13 05:40 PDT by Daniel Jalkut
Modified: 2024-03-05 12:52 PST (History)
7 users (show)

See Also:


Attachments
example source file to reproduce the hang in Safari (218 bytes, text/html)
2022-10-13 05:40 PDT, Daniel Jalkut
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Daniel Jalkut 2022-10-13 05:40:48 PDT
Created attachment 462960 [details]
example source file to reproduce the hang in Safari

When a PDFPlugin instance is destroyed while it's still waiting for data to load, a deadlock may occur while the data loading thread attempts to dispatch to the main thread, and the main thread waits for the data loading thread to complete.

The specific contention seems to be between the code at

https://github.com/WebKit/WebKit/blob/main/Source/WebKit/WebProcess/Plugins/PDF/PDFPlugin.mm#L879

which requires an async dispatch to the main thread, while PDFPlugin destructor on the main thread:

https://github.com/WebKit/WebKit/blob/main/Source/WebKit/WebProcess/Plugins/PDF/PDFPlugin.mm#L1877

waits for the thread to terminate.

In real world scenarios this bug was observed in NetNewsWire, when viewing RSS articles that have PDF embedded in them. The ability in NNW to rapidly flip through multiple content loads by arrowing through articles exacerbates the problem.

A concise example of this bug can be produced without referencing any actual PDF data but by loading a simple HTML document like (also attached as test.html):

<textarea>This page will hang if reloaded repeatedly</textarea>
<object type="application/pdf" style="width:100%;height:100%"></object>                                                                                  

And then reloading it. It seems that the particular case of not supplying a data parameter is even better at exacerbating the problem than the real-world example of a PDF that might just take a while to load.

I suspect a good fix would be to build in some mechanism so that PDFPlugin::unconditionalCompleteOutstandingRangeRequests can also signal the PDF document thread to stop waiting on its local dataSemaphore. I tried to prototype such a fix but my C++ skills are not great and the right fix might be a bit more nuanced than I'm imagining.

I wonder if the pattern using m_abortSemaphore in SourceBufferPrivateAVFObjC might be a good model to follow? One complication is that the PDFPlugin seems to have a few separate data loading semaphore code paths so it might not be as simple as covering the case I higlighted here.
Comment 1 Daniel Jalkut 2022-10-13 05:44:30 PDT
The underlying NetNewsWire bug that inspired me to locate this bug: https://github.com/Ranchero-Software/NetNewsWire/issues/3716
Comment 2 Radar WebKit Bug Importer 2022-10-13 16:04:59 PDT
<rdar://problem/101147811>
Comment 3 Darin Adler 2022-10-14 15:07:20 PDT
The simplest fix I can think of is this inside the for loop at https://github.com/WebKit/WebKit/blob/main/Source/WebKit/WebProcess/Plugins/PDF/PDFPlugin.mm#L879:

    if (plugin->isBeingDestroyed())
        dataSemaphore.signal();
    else {
        plugin->getResourceBytesAtPosition( ...
        ...
    }

Given both destroy and the for loop are running on the main thread, I don’t think we have to do anything more sophisticated.
Comment 4 Daniel Jalkut 2022-10-14 16:22:54 PDT
Unfortunately I don't think the proposed fix will work, because the deadlock prevents the code that might signal the dataSemaphore from ever being reached. So the RunLoop::main().dispatch(...) that gets invoked right before waiting on the semaphore, is prevented from having its block run if/when the main thread is already waiting on the thread to complete in PDFPlugin::destroy.

I tried to adapt the proposed solution to the semaphore wait phase, by replacing dataSemaphore.wait() with 

    while (true) {
	if (monitorPluginRef->isBeingDestroyed()) {
            break;
	}
        if (dataSemaphore.waitFor(100_ms)) {
            break;
        }
    }

This seems to ameliorate the problem, and seems thread safe since the isBeingDestroyed property is only ever going to be true once, and never go back to not being true. But implementing this solution led me to other nuanced issues with exceptions being thrown when the main runloop based code did finally get dispatched, presumably because the plugin is truly gone by that point.

If I put the ->isBeingDestroyed check both inside the main runloop closure and in the semaphore wait construct as above, it seems to escape the code paths without hanging but eventually crashes in an assertion in WebPage::SandboxExtensionTracker::didStartProvisionalLoad. Not really sure how that is connected, but it's a good example of why this is difficult for me to pursue unless we can come up with a "silver bullet" like Darin tried to do here!

I think this line of thinking is the right idea: how to ensure when destroying the plugin that any current OR upcoming attempt to wait on data will be ignored, without leaving dangling closures that might have trouble with that.
Comment 5 Darin Adler 2022-10-14 16:40:05 PDT
I’ll keep thinking about this although eventually we will want to await the arrival of an expert when I almost certainly don’t succeed.

One thing that probably isn’t right about the adapted code above is that it’s inherently racy to access isBeingDestroyed from a non-main thread. And of course we don’t want that type of polling loop in our completed solution.
Comment 6 Chris Dumez 2024-03-04 13:19:23 PST
Pull request: https://github.com/WebKit/WebKit/pull/25448
Comment 7 EWS 2024-03-05 12:52:20 PST
Committed 275707@main (e604d7c70127): <https://commits.webkit.org/275707@main>

Reviewed commits have been landed. Closing PR #25448 and removing active labels.