Bug 260796

Summary: [GStreamer] Crash after 10 seconds on watchdog thread due to deadlock deep inside GStreamer
Product: WebKit Reporter: Michael Catanzaro <mcatanzaro>
Component: MediaAssignee: Philippe Normand <philn>
Status: RESOLVED FIXED    
Severity: Normal CC: mcatanzaro, philn, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: PC   
OS: Linux   
See Also: https://bugs.webkit.org/show_bug.cgi?id=259504
https://bugs.webkit.org/show_bug.cgi?id=257551
https://bugs.webkit.org/show_bug.cgi?id=264824
Attachments:
Description Flags
All-threads backtrace and full all-threads backtrace
none
Newer backtrace
none
Backtrace of similar issue none

Description Michael Catanzaro 2023-08-28 08:00:28 PDT
Created attachment 467457 [details]
All-threads backtrace and full all-threads backtrace

Here's another crash after 10 seconds on the web process watchdog thread due to a deadlock deep inside GStreamer that has locked up the main thread. This is using Ephy Tech Preview with WebKitGTK 2.41.91 and GStreamer 1.22.5. This one looks different from the various other bugs I've reported about GStreamer deadlocking the main thread.

The top frames on the main thread are:

#0  futex_wait (private=0, expected=2, futex_word=0x7fe814223b50) at ../sysdeps/nptl/futex-internal.h:146
#1  __GI___lll_lock_wait (futex=futex@entry=0x7fe814223b50, private=0) at lowlevellock.c:49
#2  0x00007fe900092527 in lll_mutex_lock_optimized (mutex=0x7fe814223b50) at pthread_mutex_lock.c:48
#3  ___pthread_mutex_lock (mutex=0x7fe814223b50) at pthread_mutex_lock.c:128
#4  0x00007fe8fbf1658d in g_rec_mutex_lock (mutex=mutex@entry=0x7fe8142239b0) at ../glib/gthread-posix.c:397
#5  0x00007fe8fcb4b90f in gst_pad_pause_task (pad=0x7fe814223940 [GstPad|sink]) at ../gst/gstpad.c:6396
#6  0x00007fe850726307 in gst_qtdemux_do_seek (event=0x55d65103c4e0 [GstEvent], pad=0x7fe808002bc0 [GstPad|video_0], qtdemux=0x7fe814223370 [GstQTDemux|qtdemux0]) at ../gst/isomp4/qtdemux.c:1536
#7  gst_qtdemux_handle_src_event (pad=0x7fe808002bc0 [GstPad|video_0], parent=0x7fe814223370 [GstQTDemux|qtdemux0], event=0x55d65103c4e0 [GstEvent]) at ../gst/isomp4/qtdemux.c:1683
#8  0x00007fe8fcb4762d in gst_pad_send_event_unchecked (pad=pad@entry=0x7fe808002bc0 [GstPad|video_0], event=event@entry=0x55d65103c4e0 [GstEvent], type=<optimized out>, type@entry=GST_PAD_PROBE_TYPE_EVENT_UPSTREAM) at ../gst/gstpad.c:5939
#9  0x00007fe8fcb47d13 in gst_pad_push_event_unchecked (pad=pad@entry=0x7fe80800aa80 [GstMultiQueuePad|sink_0], event=event@entry=0x55d65103c4e0 [GstEvent], type=<optimized out>, type@entry=GST_PAD_PROBE_TYPE_EVENT_UPSTREAM) at ../gst/gstpad.c:5572
#10 0x00007fe8fcb4b186 in gst_pad_push_event (pad=pad@entry=0x7fe80800aa80 [GstMultiQueuePad|sink_0], event=event@entry=0x55d65103c4e0 [GstEvent]) at ../gst/gstpad.c:5711

Another thread that's also using GstPad:

Thread 28 (Thread 0x7fe8227fc6c0 (LWP 99)):
#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x00007fe8fbf172fd in g_cond_wait (cond=cond@entry=0x7fe818028528, mutex=mutex@entry=0x7fe818028518) at ../glib/gthread-posix.c:1552
#2  0x00007fe8683a45a3 in gst_download_buffer_wait_for_data (length=<optimized out>, offset=<optimized out>, dlbuf=<optimized out>) at ../plugins/elements/gstdownloadbuffer.c:725
#3  gst_download_buffer_read_buffer (dlbuf=dlbuf@entry=0x7fe818028340 [GstDownloadBuffer|downloadbuffer0], offset=offset@entry=193539, length=<optimized out>, length@entry=16, buffer=buffer@entry=0x7fe8227fb1e8) at ../plugins/elements/gstdownloadbuffer.c:808
#4  0x00007fe8683a8009 in gst_download_buffer_get_range (pad=<optimized out>, parent=0x7fe818028340 [GstDownloadBuffer|downloadbuffer0], offset=193539, length=16, buffer=0x7fe8227fb1e8) at ../plugins/elements/gstdownloadbuffer.c:1651
#5  0x00007fe8fcb49f6b in gst_pad_get_range_unchecked (pad=pad@entry=0x7fe818028b90 [GstPad|src], offset=offset@entry=193539, size=size@entry=16, buffer=buffer@entry=0x7fe8227fb2b8) at ../gst/gstpad.c:4948
#6  0x00007fe8fcb4a8da in gst_pad_pull_range (pad=0x7fe8180257a0 [GstGhostPad|sink], offset=193539, size=16, buffer=0x7fe8227fb3a8) at ../gst/gstpad.c:5193
#7  0x00007fe8fcb2adb7 in gst_proxy_pad_getrange_default (pad=<optimized out>, parent=<optimized out>, offset=193539, size=16, buffer=0x7fe8227fb3a8) at ../gst/gstghostpad.c:185
#8  0x00007fe8fcb49f6b in gst_pad_get_range_unchecked (pad=pad@entry=0x7fe818025b50 [GstProxyPad|proxypad10], offset=offset@entry=193539, size=size@entry=16, buffer=buffer@entry=0x7fe8227fb478) at ../gst/gstpad.c:4948
#9  0x00007fe8fcb4a8da in gst_pad_pull_range (pad=0x7fe818023c80 [GstPad|sink], offset=193539, size=16, buffer=0x7fe8227fb538) at ../gst/gstpad.c:5193
#10 0x00007fe8fcb49f6b in gst_pad_get_range_unchecked (pad=pad@entry=0x7fe818024180 [GstPad|src], offset=offset@entry=193539, size=size@entry=16, buffer=buffer@entry=0x7fe8227fb608) at ../gst/gstpad.c:4948

Another thread:

Thread 19 (Thread 0x7fe8217fa6c0 (LWP 101)):
#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x00007fe8fbf172fd in g_cond_wait (cond=cond@entry=0x7fe810006db8, mutex=mutex@entry=0x7fe810006da8) at ../glib/gthread-posix.c:1552
#2  0x00007fe8683a45a3 in gst_download_buffer_wait_for_data (length=<optimized out>, offset=<optimized out>, dlbuf=<optimized out>) at ../plugins/elements/gstdownloadbuffer.c:725
#3  gst_download_buffer_read_buffer (dlbuf=dlbuf@entry=0x7fe810006bd0 [GstDownloadBuffer|downloadbuffer1], offset=offset@entry=1391565, length=<optimized out>, length@entry=371, buffer=buffer@entry=0x7fe8217f9238) at ../plugins/elements/gstdownloadbuffer.c:808
#4  0x00007fe8683a8009 in gst_download_buffer_get_range (pad=<optimized out>, parent=0x7fe810006bd0 [GstDownloadBuffer|downloadbuffer1], offset=1391565, length=371, buffer=0x7fe8217f9238) at ../plugins/elements/gstdownloadbuffer.c:1651
#5  0x00007fe8fcb49f6b in gst_pad_get_range_unchecked (pad=pad@entry=0x7fe8100073b0 [GstPad|src], offset=offset@entry=1391565, size=size@entry=371, buffer=buffer@entry=0x7fe8217f9308) at ../gst/gstpad.c:4948
#6  0x00007fe8fcb4a8da in gst_pad_pull_range (pad=0x7fe810005df0 [GstGhostPad|sink], offset=1391565, size=371, buffer=0x7fe8217f93f8) at ../gst/gstpad.c:5193
#7  0x00007fe8fcb2adb7 in gst_proxy_pad_getrange_default (pad=<optimized out>, parent=<optimized out>, offset=1391565, size=371, buffer=0x7fe8217f93f8) at ../gst/gstghostpad.c:185
#8  0x00007fe8fcb49f6b in gst_pad_get_range_unchecked (pad=pad@entry=0x7fe810006170 [GstProxyPad|proxypad12], offset=offset@entry=1391565, size=size@entry=371, buffer=buffer@entry=0x7fe8217f94c8) at ../gst/gstpad.c:4948
#9  0x00007fe8fcb4a8da in gst_pad_pull_range (pad=0x7fe810004070 [GstPad|sink], offset=1391565, size=371, buffer=0x7fe8217f9588) at ../gst/gstpad.c:5193
#10 0x00007fe8fcb49f6b in gst_pad_get_range_unchecked (pad=pad@entry=0x7fe810004570 [GstPad|src], offset=offset@entry=1391565, size=size@entry=371, buffer=buffer@entry=0x7fe8217f9658) at ../gst/gstpad.c:4948

I'll attach the all-threads backtrace first, followed by the full all-threads backtrace.
Comment 1 Philippe Normand 2023-09-01 06:08:02 PDT
If you still have the coredump, can you check which thread owns the mutex locked from thread 31?

t 31
f 2
p *mutex

and i'm interested on the "owner" field.
Comment 2 Michael Catanzaro 2023-11-14 17:11:17 PST
Looks like I failed to reply to your previous comment, oops. But fear not! because I have just hit this crash again just now. The top frames of the main thread:

Thread 21 (Thread 0x7f7a2967df40 (LWP 2)):
#0  futex_wait (private=0, expected=2, futex_word=0x7f77bc069180) at ../sysdeps/nptl/futex-internal.h:146
#1  __GI___lll_lock_wait (futex=futex@entry=0x7f77bc069180, private=0) at lowlevellock.c:49
#2  0x00007f7a316a5527 in lll_mutex_lock_optimized (mutex=0x7f77bc069180) at pthread_mutex_lock.c:48
#3  ___pthread_mutex_lock (mutex=0x7f77bc069180) at pthread_mutex_lock.c:128
#4  0x00007f7a2d3139ad in g_rec_mutex_lock (mutex=mutex@entry=0x7f77bc068fe0) at ../glib/gthread-posix.c:397
#5  0x00007f7a2e01f94f in gst_pad_pause_task (pad=0x7f77bc068f70 [GstPad|sink]) at ../gst/gstpad.c:6396
#6  0x00007f7974780347 in gst_qtdemux_do_seek (event=0x56401d0e2dc0 [GstEvent], pad=0x7f776000a400 [GstPad|video_0], qtdemux=0x7f77bc068b40 [GstQTDemux|qtdemux3]) at ../gst/isomp4/qtdemux.c:1536
#7  gst_qtdemux_handle_src_event (pad=0x7f776000a400 [GstPad|video_0], parent=0x7f77bc068b40 [GstQTDemux|qtdemux3], event=0x56401d0e2dc0 [GstEvent]) at ../gst/isomp4/qtdemux.c:1683
#8  0x00007f7a2e01b66d in gst_pad_send_event_unchecked (pad=pad@entry=0x7f776000a400 [GstPad|video_0], event=event@entry=0x56401d0e2dc0 [GstEvent], type=<optimized out>, type@entry=GST_PAD_PROBE_TYPE_EVENT_UPSTREAM) at ../gst/gstpad.c:5939
#9  0x00007f7a2e01bd53 in gst_pad_push_event_unchecked (pad=pad@entry=0x7f776000ed70 [GstMultiQueuePad|sink_0], event=event@entry=0x56401d0e2dc0 [GstEvent], type=<optimized out>, type@entry=GST_PAD_PROBE_TYPE_EVENT_UPSTREAM) at ../gst/gstpad.c:5572
#10 0x00007f7a2e01f1c6 in gst_pad_push_event (pad=pad@entry=0x7f776000ed70 [GstMultiQueuePad|sink_0], event=event@entry=0x56401d0e2dc0 [GstEvent]) at ../gst/gstpad.c:5711

And here is that owner field you were interested in:

(gdb) f 2
#2  0x00007f7a316a5527 in lll_mutex_lock_optimized (mutex=0x7f77bc069180) at pthread_mutex_lock.c:48
48	    lll_lock (mutex->__data.__lock, private);
(gdb) p *mutex
$1 = {__data = {__lock = 2, __count = 1, __owner = 367, __nusers = 1, __kind = 1, __spins = 0, __elision = 0, 
    __list = {__prev = 0x0, __next = 0x0}}, 
  __size = "\002\000\000\000\001\000\000\000o\001\000\000\001\000\000\000\001", '\000' <repeats 22 times>, 
  __align = 4294967298}

This is a really cool trick. I had no clue you could do that. Anyway, owner is 367, so here's the backtrace for that thread, which is blocking on a condition variable in gst_download_buffer_wait_for_data():

Thread 20 (Thread 0x7f7767fff6c0 (LWP 367)):
#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x00007f7a2d31471d in g_cond_wait (cond=cond@entry=0x7f77d0004d48, mutex=mutex@entry=0x7f77d0004d38) at ../glib/gthread-posix.c:1552
#2  0x00007f79b80a65a3 in gst_download_buffer_wait_for_data (length=<optimized out>, offset=<optimized out>, dlbuf=<optimized out>) at ../plugins/elements/gstdownloadbuffer.c:725
#3  gst_download_buffer_read_buffer (dlbuf=dlbuf@entry=0x7f77d0004b60 [GstDownloadBuffer|downloadbuffer3], offset=offset@entry=3156835, length=<optimized out>, length@entry=36802, buffer=buffer@entry=0x7f7767ffe3d8) at ../plugins/elements/gstdownloadbuffer.c:808
#4  0x00007f79b80aa009 in gst_download_buffer_get_range (pad=<optimized out>, parent=0x7f77d0004b60 [GstDownloadBuffer|downloadbuffer3], offset=3156835, length=36802, buffer=0x7f7767ffe3d8) at ../plugins/elements/gstdownloadbuffer.c:1651
#5  0x00007f7a2e01dfab in gst_pad_get_range_unchecked (pad=pad@entry=0x7f77d00336b0 [GstPad|src], offset=offset@entry=3156835, size=size@entry=36802, buffer=buffer@entry=0x7f7767ffe4a8) at ../gst/gstpad.c:4948
#6  0x00007f7a2e01e91a in gst_pad_pull_range (pad=0x7f77d0001a60 [GstGhostPad|sink], offset=3156835, size=36802, buffer=0x7f7767ffe598) at ../gst/gstpad.c:5193
#7  0x00007f7a2dffedf7 in gst_proxy_pad_getrange_default (pad=<optimized out>, parent=<optimized out>, offset=3156835, size=36802, buffer=0x7f7767ffe598) at ../gst/gstghostpad.c:185
#8  0x00007f7a2e01dfab in gst_pad_get_range_unchecked (pad=pad@entry=0x7f77d0012060 [GstProxyPad|proxypad54], offset=offset@entry=3156835, size=size@entry=36802, buffer=buffer@entry=0x7f7767ffe668) at ../gst/gstpad.c:4948
#9  0x00007f7a2e01e91a in gst_pad_pull_range (pad=0x7f77d0000bb0 [GstPad|sink], offset=3156835, size=36802, buffer=0x7f7767ffe728) at ../gst/gstpad.c:5193
#10 0x00007f7a2e01dfab in gst_pad_get_range_unchecked (pad=pad@entry=0x7f77d002be20 [GstPad|src], offset=offset@entry=3156835, size=size@entry=36802, buffer=buffer@entry=0x7f7767ffe7f8) at ../gst/gstpad.c:4948
#11 0x00007f7a2e01e91a in gst_pad_pull_range (pad=0x7f77bc068f70 [GstPad|sink], offset=offset@entry=3156835, size=size@entry=36802, buffer=buffer@entry=0x7f7767ffe9f0) at ../gst/gstpad.c:5193
#12 0x00007f797477e0c5 in gst_qtdemux_pull_atom (qtdemux=0x7f77bc068b40 [GstQTDemux|qtdemux3], offset=3156835, size=36802, buf=0x7f7767ffe9f0) at ../gst/isomp4/qtdemux.c:562
#13 0x00007f797478c1c1 in gst_qtdemux_loop_state_movie (qtdemux=0x7f77bc068b40 [GstQTDemux|qtdemux3]) at ../gst/isomp4/qtdemux.c:6776
#14 gst_qtdemux_loop (pad=0x7f77bc068f70 [GstPad|sink]) at ../gst/isomp4/qtdemux.c:6878
#15 0x00007f7a2e04a204 in gst_task_func (task=0x7f77bc08a180 [GstTask|qtdemux3:sink]) at ../gst/gsttask.c:384
#16 0x00007f7a2d2e5b22 in g_thread_pool_thread_proxy (data=<optimized out>) at ../glib/gthreadpool.c:336
#17 0x00007f7a2d2e5029 in g_thread_proxy (data=0x7f7a1c0034b0) at ../glib/gthread.c:817
#18 0x00007f7a316a1e39 in start_thread (arg=<optimized out>) at pthread_create.c:444
#19 0x00007f7a317298c4 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100
Warning: the current language does not match this frame.

(We also see gst_download_buffer_wait_for_data() twice in my first comment above.)

Guess: the main thread blocks for a download to finish on a secondary thread? That's not good.
Comment 3 Michael Catanzaro 2023-11-14 17:11:54 PST
Created attachment 468598 [details]
Newer backtrace
Comment 4 Philippe Normand 2023-11-15 02:37:21 PST
Thanks! I think this is a GStreamer bug, I filed https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/3129

Let's keep this open for now though.
Comment 5 Michael Catanzaro 2024-02-07 07:24:51 PST
Created attachment 469758 [details]
Backtrace of similar issue

Here's one that's similar but not identical. Do you want a separate bug report? It occurs 100% of the time when loading cnn.com and causes the UI process to hang (presumably due to outstanding sync IPC).
Comment 6 Philippe Normand 2024-02-07 07:49:39 PST
Which thread is deadlocked with thread 36 ?
Comment 7 Michael Catanzaro 2024-02-07 08:13:26 PST
Not sure, but I guess thread 12? It uses the same GstQTDemux.
Comment 8 Philippe Normand 2024-03-04 09:10:20 PST
Pull request: https://github.com/WebKit/WebKit/pull/25423
Comment 9 EWS 2024-03-07 04:33:52 PST
Committed 275786@main (755898f07a51): <https://commits.webkit.org/275786@main>

Reviewed commits have been landed. Closing PR #25423 and removing active labels.
Comment 10 Radar WebKit Bug Importer 2024-03-07 04:34:16 PST
<rdar://problem/124203978>