RESOLVED FIXED 210284
[GStreamer] Rework WebKitWebSrc threading
https://bugs.webkit.org/show_bug.cgi?id=210284
Summary [GStreamer] Rework WebKitWebSrc threading
Alicia Boya García
Reported 2020-04-09 09:48:01 PDT
WebKitWebSrc as it is in master has a number of race conditions leading to occasional starvation (due to cancelling the wrong request) or data corruption (due to pushing data from a cancelled request). The threading situation wasn't easy to follow, as it wasn't clear access to what members should be protected by what mutex, in what circumstances. Also, some parts of the design were also introducing addicional complexity, such as the first request being sent from the main thread whereas the rest were being sent from the streaming thread or basesrc async start. In response, this patch reworks all the locking in WebKitWebSrc to use WTF::DataMutex. This ensures all accesses to its (now explicit) protected members are locked. The two mutexes and condition variables have been simplified into one, as there was no obvious need or benefit for two of each in this case. Requests have been numbered, which allows to safely and atomically ignore results from cancelled requests, avoiding data corruption races, and makes following them in debug logs much easier. The conditions for making and cancelling requests have been simplified to a simpler and safer model: There is at most only one active request at anytime, flushes cancel the request, and the first create() call always makes the new request (both at startup and after a flush). Debug asserts and notes about the flow of operations during basesrc seeks have been provided. As this effort needed a review of the entire WebKitWebSrc, cleanups, corrections and documentation comments have been provided where appropriate. This patch introduces no visible behavior changes, just stability improvements.
Attachments
Patch (68.63 KB, patch)
2020-04-10 09:53 PDT, Alicia Boya García
no flags
Patch (68.88 KB, patch)
2020-04-15 09:02 PDT, Alicia Boya García
no flags
Patch (69.09 KB, patch)
2020-04-24 06:26 PDT, Alicia Boya García
no flags
Patch for landing (69.09 KB, patch)
2020-04-27 07:03 PDT, Alicia Boya García
no flags
Patch for landing (69.14 KB, patch)
2020-04-27 07:36 PDT, Alicia Boya García
no flags
Alicia Boya García
Comment 1 2020-04-10 09:53:17 PDT
Xabier Rodríguez Calvar
Comment 2 2020-04-13 02:51:11 PDT
Comment on attachment 396095 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=396095&action=review Other than a couple of comments, LGTM, but I would like Phil to have a look at this as well since this is a big rework. > Source/WebCore/platform/graphics/gstreamer/WebKitWebSourceGStreamer.cpp:105 > - if (resource) { > - auto* client = static_cast<CachedResourceStreamingClient*>(resource->client()); > - if (client) > - client->setSourceElement(nullptr); > - > - resource->setClient(nullptr); > - } > - loader = nullptr; > + DataMutex<WebKitWebSrcPrivate::StreamingMembers>::LockedWrapper members(dataMutex); > + ASSERT(!members->resource); > + members->loader = nullptr; Is this right? It looks like we are forgetting to set the client to null. Then we are also checking for resources and then we set the loader to null? > Source/WebCore/platform/graphics/gstreamer/WebKitWebSourceGStreamer.cpp:329 > - case PROP_KEEP_ALIVE: > + case PROP_KEEP_ALIVE: { > src->priv->keepAlive = g_value_get_boolean(value); > break; > + } This does not seem necessary. > Source/WebCore/platform/graphics/gstreamer/WebKitWebSourceGStreamer.cpp:480 > + if (gst_pad_peer_query(GST_BASE_SRC_PAD(baseSrc), query.get())) { You seem to be sending the context query just in one way and doc says we should send it upstream and downstream, why? OTOH, there are no other GStreamer elements that are going to answer this message because this is a very specific request, I think we do nothing by performing a context query and can go directly to posting the message. > Source/WebCore/platform/graphics/gstreamer/WebKitWebSourceGStreamer.cpp:826 > + RunLoop::main().dispatch([protector, resource = WTFMove(members->resource), requestNumber = members->requestNumber] { Why don't you create the protector here? If you leave it like this, you are reffing at the beginning for the function and again here.
Philippe Normand
Comment 3 2020-04-13 06:46:05 PDT
Comment on attachment 396095 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=396095&action=review >> Source/WebCore/platform/graphics/gstreamer/WebKitWebSourceGStreamer.cpp:480 >> + if (gst_pad_peer_query(GST_BASE_SRC_PAD(baseSrc), query.get())) { > > You seem to be sending the context query just in one way and doc says we should send it upstream and downstream, why? > > OTOH, there are no other GStreamer elements that are going to answer this message because this is a very specific request, I think we do nothing by performing a context query and can go directly to posting the message. Well this is a src element, so there's no sink pad, hence no real upstream I suppose :)
Xabier Rodríguez Calvar
Comment 4 2020-04-13 07:23:33 PDT
(In reply to Philippe Normand from comment #3) > Well this is a src element, so there's no sink pad, hence no real upstream I > suppose :) If I were dumber, I wouldn't have head already.
Philippe Normand
Comment 5 2020-04-14 04:52:56 PDT
Comment on attachment 396095 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=396095&action=review >>> Source/WebCore/platform/graphics/gstreamer/WebKitWebSourceGStreamer.cpp:480 >>> + if (gst_pad_peer_query(GST_BASE_SRC_PAD(baseSrc), query.get())) { >> >> You seem to be sending the context query just in one way and doc says we should send it upstream and downstream, why? >> >> OTOH, there are no other GStreamer elements that are going to answer this message because this is a very specific request, I think we do nothing by performing a context query and can go directly to posting the message. > > Well this is a src element, so there's no sink pad, hence no real upstream I suppose :) Calvaris is right, I agree there's not much point of sending a query no one would answer anyway.
Alicia Boya García
Comment 6 2020-04-15 06:56:33 PDT
Comment on attachment 396095 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=396095&action=review >> Source/WebCore/platform/graphics/gstreamer/WebKitWebSourceGStreamer.cpp:105 >> + members->loader = nullptr; > > Is this right? It looks like we are forgetting to set the client to null. Then we are also checking for resources and then we set the loader to null? Note the ASSERT(!members->resource). Reached this point, resource is null already. We were never entering into the if. The resource is already cancelled during UnLock(), which will be called during tear down by basesrc. On the other hand, the loader is preserved in the keep-alive case. Setting `loader` to null inside the notifyAndWait() ensures that it's destroyed from the main thread, which errs in the safe side since this is WebKit network API. >>>> Source/WebCore/platform/graphics/gstreamer/WebKitWebSourceGStreamer.cpp:480 >>>> + if (gst_pad_peer_query(GST_BASE_SRC_PAD(baseSrc), query.get())) { >>> >>> You seem to be sending the context query just in one way and doc says we should send it upstream and downstream, why? >>> >>> OTOH, there are no other GStreamer elements that are going to answer this message because this is a very specific request, I think we do nothing by performing a context query and can go directly to posting the message. >> >> Well this is a src element, so there's no sink pad, hence no real upstream I suppose :) > > Calvaris is right, I agree there's not much point of sending a query no one would answer anyway. We are a source element, we only have downstream indeed. Bins catch and store contexts, so after the message has been posted once -- even in a different WebKitWebSrc in the same pipeline, the query is indeed answered. (Note this is done to satisfy adaptivedemux cases where there are several HTTP source elements hanging around.) Also, this code is not new, it has just been reindented into `members.runUnlocked()`. >> Source/WebCore/platform/graphics/gstreamer/WebKitWebSourceGStreamer.cpp:826 >> + RunLoop::main().dispatch([protector, resource = WTFMove(members->resource), requestNumber = members->requestNumber] { > > Why don't you create the protector here? If you leave it like this, you are reffing at the beginning for the function and again here. Inertia. I could replace it with a generalized lambda capture like [protector = WTF::ensureGRef(src)] here and in every other case in the file actually.
Alicia Boya García
Comment 7 2020-04-15 09:02:37 PDT
Alicia Boya García
Comment 8 2020-04-15 09:11:53 PDT
The updated version removes the usage of MainThreadNotifier after I identified a race condition consequence of its usage: if two tasks to make an HTTP request are passed to notify in quick succession, MainThreadNotifier discards the newest one, never sending the request. Removing MainThreadNotifier usages also simplifies the WebKitWebSrc private destruction code.
Alicia Boya García
Comment 9 2020-04-24 06:26:08 PDT
Alicia Boya García
Comment 10 2020-04-24 06:34:33 PDT
Comment on attachment 397449 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=397449&action=review > Source/WebCore/platform/graphics/gstreamer/WebKitWebSourceGStreamer.cpp:446 > + if (members->readPosition == members->size) { > + GST_TRACE_OBJECT(src, "just downloaded the last chunk in the file, loadFinished() is about to be called"); > + return; > + } I uploaded a revision of the patch after identifying and fixing a bug, fixed by these lines of code. Before they were causing Resource cancellation, and therefore, loadFinished() wasn't called and EOS wasn't sent, causing the video not to play because as far as the queues could know in that state, they were starved.
EWS
Comment 11 2020-04-27 06:53:49 PDT
/Volumes/Data/worker/Commit-Queue/build/LayoutTests/ChangeLog neither lists a valid reviewer nor contains the string "Unreviewed" or "Rubber stamp" (case insensitive).
Alicia Boya García
Comment 12 2020-04-27 07:03:53 PDT
Created attachment 397678 [details] Patch for landing
EWS
Comment 13 2020-04-27 07:04:30 PDT
ChangeLog entry in Source/WebCore/ChangeLog contains OOPS!.
Alicia Boya García
Comment 14 2020-04-27 07:36:22 PDT
Created attachment 397680 [details] Patch for landing
EWS
Comment 15 2020-04-27 08:30:07 PDT
Committed r260755: <https://trac.webkit.org/changeset/260755> All reviewed patches have been landed. Closing bug and clearing flags on attachment 397680 [details].
Alicia Boya García
Comment 16 2020-05-08 09:54:55 PDT
*** Bug 209719 has been marked as a duplicate of this bug. ***
Alicia Boya García
Comment 17 2020-05-12 05:41:48 PDT
*** Bug 206162 has been marked as a duplicate of this bug. ***
Alicia Boya García
Comment 18 2020-05-12 07:21:26 PDT
*** Bug 203194 has been marked as a duplicate of this bug. ***
Alicia Boya García
Comment 19 2020-05-14 01:42:32 PDT
*** Bug 204654 has been marked as a duplicate of this bug. ***
Philippe Normand
Comment 20 2020-10-05 02:07:16 PDT
Comment on attachment 397680 [details] Patch for landing View in context: https://bugs.webkit.org/attachment.cgi?id=397680&action=review > Source/WebCore/platform/graphics/gstreamer/WebKitWebSourceGStreamer.cpp:-1090 > - if (response.httpStatusCode() == 200) { > - // Range request didn't have a ranged response; resetting offset. > - priv->readPosition = 0; > - } else if (response.httpStatusCode() != 206) { Why was this removed?
Note You need to log in before you can comment on or make changes to this bug.