Bug 210284 - [GStreamer] Rework WebKitWebSrc threading
Summary: [GStreamer] Rework WebKitWebSrc threading
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKitGTK (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Alicia Boya García
URL:
Keywords:
: 203194 204654 206162 (view as bug list)
Depends on:
Blocks:
 
Reported: 2020-04-09 09:48 PDT by Alicia Boya García
Modified: 2020-10-05 02:07 PDT (History)
10 users (show)

See Also:


Attachments
Patch (68.63 KB, patch)
2020-04-10 09:53 PDT, Alicia Boya García
no flags Details | Formatted Diff | Diff
Patch (68.88 KB, patch)
2020-04-15 09:02 PDT, Alicia Boya García
no flags Details | Formatted Diff | Diff
Patch (69.09 KB, patch)
2020-04-24 06:26 PDT, Alicia Boya García
no flags Details | Formatted Diff | Diff
Patch for landing (69.09 KB, patch)
2020-04-27 07:03 PDT, Alicia Boya García
no flags Details | Formatted Diff | Diff
Patch for landing (69.14 KB, patch)
2020-04-27 07:36 PDT, Alicia Boya García
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Alicia Boya García 2020-04-09 09:48:01 PDT
WebKitWebSrc as it is in master has a number of race conditions
leading to occasional starvation (due to cancelling the wrong request)
or data corruption (due to pushing data from a cancelled request).

The threading situation wasn't easy to follow, as it wasn't clear
access to what members should be protected by what mutex, in what
circumstances. Also, some parts of the design were also introducing
addicional complexity, such as the first request being sent from the
main thread whereas the rest were being sent from the streaming thread
or basesrc async start.

In response, this patch reworks all the locking in WebKitWebSrc to use
WTF::DataMutex. This ensures all accesses to its (now explicit)
protected members are locked. The two mutexes and condition variables
have been simplified into one, as there was no obvious need or benefit
for two of each in this case.

Requests have been numbered, which allows to safely and atomically
ignore results from cancelled requests, avoiding data corruption
races, and makes following them in debug logs much easier.

The conditions for making and cancelling requests have been simplified
to a simpler and safer model: There is at most only one active request
at anytime, flushes cancel the request, and the first create() call
always makes the new request (both at startup and after a flush).
Debug asserts and notes about the flow of operations during basesrc
seeks have been provided.

As this effort needed a review of the entire WebKitWebSrc, cleanups,
corrections and documentation comments have been provided where
appropriate.

This patch introduces no visible behavior changes, just stability
improvements.
Comment 1 Alicia Boya García 2020-04-10 09:53:17 PDT
Created attachment 396095 [details]
Patch
Comment 2 Xabier Rodríguez Calvar 2020-04-13 02:51:11 PDT
Comment on attachment 396095 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=396095&action=review

Other than a couple of comments, LGTM, but I would like Phil to have a look at this as well since this is a big rework.

> Source/WebCore/platform/graphics/gstreamer/WebKitWebSourceGStreamer.cpp:105
> -                if (resource) {
> -                    auto* client = static_cast<CachedResourceStreamingClient*>(resource->client());
> -                    if (client)
> -                        client->setSourceElement(nullptr);
> -
> -                    resource->setClient(nullptr);
> -                }
> -                loader = nullptr;
> +                DataMutex<WebKitWebSrcPrivate::StreamingMembers>::LockedWrapper members(dataMutex);
> +                ASSERT(!members->resource);
> +                members->loader = nullptr;

Is this right? It looks like we are forgetting to set the client to null. Then we are also checking for resources and then we set the loader to null?

> Source/WebCore/platform/graphics/gstreamer/WebKitWebSourceGStreamer.cpp:329
> -    case PROP_KEEP_ALIVE:
> +    case PROP_KEEP_ALIVE: {
>          src->priv->keepAlive = g_value_get_boolean(value);
>          break;
> +    }

This does not seem necessary.

> Source/WebCore/platform/graphics/gstreamer/WebKitWebSourceGStreamer.cpp:480
> +            if (gst_pad_peer_query(GST_BASE_SRC_PAD(baseSrc), query.get())) {

You seem to be sending the context query just in one way and doc says we should send it upstream and downstream, why?

OTOH, there are no other GStreamer elements that are going to answer this message because this is a very specific request, I think we do nothing by performing a context query and can go directly to posting the message.

> Source/WebCore/platform/graphics/gstreamer/WebKitWebSourceGStreamer.cpp:826
> +        RunLoop::main().dispatch([protector, resource = WTFMove(members->resource), requestNumber = members->requestNumber] {

Why don't you create the protector here? If you leave it like this, you are reffing at the beginning for the function and again here.
Comment 3 Philippe Normand 2020-04-13 06:46:05 PDT
Comment on attachment 396095 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=396095&action=review

>> Source/WebCore/platform/graphics/gstreamer/WebKitWebSourceGStreamer.cpp:480
>> +            if (gst_pad_peer_query(GST_BASE_SRC_PAD(baseSrc), query.get())) {
> 
> You seem to be sending the context query just in one way and doc says we should send it upstream and downstream, why?
> 
> OTOH, there are no other GStreamer elements that are going to answer this message because this is a very specific request, I think we do nothing by performing a context query and can go directly to posting the message.

Well this is a src element, so there's no sink pad, hence no real upstream I suppose :)
Comment 4 Xabier Rodríguez Calvar 2020-04-13 07:23:33 PDT
(In reply to Philippe Normand from comment #3)
> Well this is a src element, so there's no sink pad, hence no real upstream I
> suppose :)

If I were dumber, I wouldn't have head already.
Comment 5 Philippe Normand 2020-04-14 04:52:56 PDT
Comment on attachment 396095 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=396095&action=review

>>> Source/WebCore/platform/graphics/gstreamer/WebKitWebSourceGStreamer.cpp:480
>>> +            if (gst_pad_peer_query(GST_BASE_SRC_PAD(baseSrc), query.get())) {
>> 
>> You seem to be sending the context query just in one way and doc says we should send it upstream and downstream, why?
>> 
>> OTOH, there are no other GStreamer elements that are going to answer this message because this is a very specific request, I think we do nothing by performing a context query and can go directly to posting the message.
> 
> Well this is a src element, so there's no sink pad, hence no real upstream I suppose :)

Calvaris is right, I agree there's not much point of sending a query no one would answer anyway.
Comment 6 Alicia Boya García 2020-04-15 06:56:33 PDT
Comment on attachment 396095 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=396095&action=review

>> Source/WebCore/platform/graphics/gstreamer/WebKitWebSourceGStreamer.cpp:105
>> +                members->loader = nullptr;
> 
> Is this right? It looks like we are forgetting to set the client to null. Then we are also checking for resources and then we set the loader to null?

Note the ASSERT(!members->resource). Reached this point, resource is null already. We were never entering into the if. The resource is already cancelled during UnLock(), which will be called during tear down by basesrc. On the other hand, the loader is preserved in the keep-alive case.

Setting `loader` to null inside the notifyAndWait() ensures that it's destroyed from the main thread, which errs in the safe side since this is WebKit network API.

>>>> Source/WebCore/platform/graphics/gstreamer/WebKitWebSourceGStreamer.cpp:480
>>>> +            if (gst_pad_peer_query(GST_BASE_SRC_PAD(baseSrc), query.get())) {
>>> 
>>> You seem to be sending the context query just in one way and doc says we should send it upstream and downstream, why?
>>> 
>>> OTOH, there are no other GStreamer elements that are going to answer this message because this is a very specific request, I think we do nothing by performing a context query and can go directly to posting the message.
>> 
>> Well this is a src element, so there's no sink pad, hence no real upstream I suppose :)
> 
> Calvaris is right, I agree there's not much point of sending a query no one would answer anyway.

We are a source element, we only have downstream indeed.

Bins catch and store contexts, so after the message has been posted once -- even in a different WebKitWebSrc in the same pipeline, the query is indeed answered. (Note this is done to satisfy adaptivedemux cases where there are several HTTP source elements hanging around.)

Also, this code is not new, it has just been reindented into `members.runUnlocked()`.

>> Source/WebCore/platform/graphics/gstreamer/WebKitWebSourceGStreamer.cpp:826
>> +        RunLoop::main().dispatch([protector, resource = WTFMove(members->resource), requestNumber = members->requestNumber] {
> 
> Why don't you create the protector here? If you leave it like this, you are reffing at the beginning for the function and again here.

Inertia. I could replace it with a generalized lambda capture like [protector = WTF::ensureGRef(src)] here and in every other case in the file actually.
Comment 7 Alicia Boya García 2020-04-15 09:02:37 PDT
Created attachment 396541 [details]
Patch
Comment 8 Alicia Boya García 2020-04-15 09:11:53 PDT
The updated version removes the usage of MainThreadNotifier after I identified a race condition consequence of its usage: if two tasks to make an HTTP request are passed to notify in quick succession, MainThreadNotifier discards the newest one, never sending the request.

Removing MainThreadNotifier usages also simplifies the WebKitWebSrc private destruction code.
Comment 9 Alicia Boya García 2020-04-24 06:26:08 PDT
Created attachment 397449 [details]
Patch
Comment 10 Alicia Boya García 2020-04-24 06:34:33 PDT
Comment on attachment 397449 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=397449&action=review

> Source/WebCore/platform/graphics/gstreamer/WebKitWebSourceGStreamer.cpp:446
> +    if (members->readPosition == members->size) {
> +        GST_TRACE_OBJECT(src, "just downloaded the last chunk in the file, loadFinished() is about to be called");
> +        return;
> +    }

I uploaded a revision of the patch after identifying and fixing a bug, fixed by these lines of code. Before they were causing Resource cancellation, and therefore, loadFinished() wasn't called and EOS wasn't sent, causing the video not to play because as far as the queues could know in that state, they were starved.
Comment 11 EWS 2020-04-27 06:53:49 PDT
/Volumes/Data/worker/Commit-Queue/build/LayoutTests/ChangeLog neither lists a valid reviewer nor contains the string "Unreviewed" or "Rubber stamp" (case insensitive).
Comment 12 Alicia Boya García 2020-04-27 07:03:53 PDT
Created attachment 397678 [details]
Patch for landing
Comment 13 EWS 2020-04-27 07:04:30 PDT
ChangeLog entry in Source/WebCore/ChangeLog contains OOPS!.
Comment 14 Alicia Boya García 2020-04-27 07:36:22 PDT
Created attachment 397680 [details]
Patch for landing
Comment 15 EWS 2020-04-27 08:30:07 PDT
Committed r260755: <https://trac.webkit.org/changeset/260755>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 397680 [details].
Comment 16 Alicia Boya García 2020-05-08 09:54:55 PDT
*** Bug 209719 has been marked as a duplicate of this bug. ***
Comment 17 Alicia Boya García 2020-05-12 05:41:48 PDT
*** Bug 206162 has been marked as a duplicate of this bug. ***
Comment 18 Alicia Boya García 2020-05-12 07:21:26 PDT
*** Bug 203194 has been marked as a duplicate of this bug. ***
Comment 19 Alicia Boya García 2020-05-14 01:42:32 PDT
*** Bug 204654 has been marked as a duplicate of this bug. ***
Comment 20 Philippe Normand 2020-10-05 02:07:16 PDT
Comment on attachment 397680 [details]
Patch for landing

View in context: https://bugs.webkit.org/attachment.cgi?id=397680&action=review

> Source/WebCore/platform/graphics/gstreamer/WebKitWebSourceGStreamer.cpp:-1090
> -        if (response.httpStatusCode() == 200) {
> -            // Range request didn't have a ranged response; resetting offset.
> -            priv->readPosition = 0;
> -        } else if (response.httpStatusCode() != 206) {

Why was this removed?