Bug 200437 - Web process crashes on cnn.com
Summary: Web process crashes on cnn.com
Status: RESOLVED MOVED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKitGTK
Version: WebKit Nightly Build
Hardware: All Linux
Importance: P2 Major
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-08-05 09:42 PDT by Yury Semikhatsky
Modified: 2020-10-28 16:07 PDT

See Also:


Attachments
Web process backtraces (203.93 KB, text/plain)
2019-09-19 10:02 PDT, Yury Semikhatsky
no flags Details
MiniBrowser UI process bt (14.58 KB, text/plain)
2019-09-19 10:03 PDT, Yury Semikhatsky
no flags Details
Xorg bt (1.82 KB, text/plain)
2019-09-19 10:03 PDT, Yury Semikhatsky
no flags Details
core dump of segfaulting web process (534.95 KB, text/plain)
2019-09-19 12:53 PDT, Yury Semikhatsky
no flags Details
bt on a hanging web process (300.67 KB, text/plain)
2019-09-19 13:07 PDT, Yury Semikhatsky
no flags Details
double free or corruption (fasttop) (442.40 KB, text/plain)
2019-09-23 16:43 PDT, Yury Semikhatsky
no flags Details
assertion in nouveau/pushbuf.c:723 (436.47 KB, text/plain)
2019-09-23 16:45 PDT, Yury Semikhatsky
no flags Details

Description Yury Semikhatsky 2019-08-05 09:42:18 PDT
Steps to reproduce on Ubuntu 18.04.2 LTS with WebKit@248268 

1. Start mini browser 
Tools/Scripts/run-minibrowser --gtk

2. Inspect an element on the default page (https://www.webkitgtk.org/) to open web inspector.

3. Try to navigate to https://www.cnn.com/


Result: browser hangs, some web processes start utilizing 100% CPU and segfault.


I see the following in the console:
nouveau: kernel rejected pushbuf: Device or resource busy


The same crash happens if I try to inspect cnn.com in Epiphany.
Comment 1 Yury Semikhatsky 2019-08-05 09:47:09 PDT
OK, it looks like this happens regardless of Web Inspector: cnn.com simply crashes the page with the following message in the console:

WebKitWebProcess: ../nouveau/pushbuf.c:723: nouveau_pushbuf_data: Assertion `kref' failed.
Comment 2 Yury Semikhatsky 2019-09-18 12:20:33 PDT
It's a problem with Nvidia proprietary driver, mitigated by compiling WebKit with -DENABLE_OPENGL=OFF for now.
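
The build-time mitigation could look roughly like the sketch below. It assumes the command is run from the top of a WebKit checkout and that the build is driven through Tools/Scripts/build-webkit, whose --cmakeargs option forwards the quoted flags to CMake. Neither detail is stated in this comment, so treat both as assumptions.

```shell
# Sketch: rebuild the GTK port with OpenGL support switched off.
# Assumption: the current directory is the top of a WebKit checkout.
builder=Tools/Scripts/build-webkit
if [ -x "$builder" ]; then
    # --cmakeargs passes the quoted options through to CMake.
    "$builder" --gtk --cmakeargs="-DENABLE_OPENGL=OFF"
else
    echo "run this from the top of a WebKit checkout" >&2
fi
```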
Comment 3 Carlos Alberto Lopez Perez 2019-09-18 14:01:07 PDT
I can't reproduce this on Debian 10 and WebKit r250039 with an Nvidia card and the open source graphics drivers (Mesa Nouveau NV117).

(In reply to Yury Semikhatsky from comment #2)
> It's a problem with Nvidia proprietary driver

Are you sure you are using the proprietary driver? Your error logs contain the word "nouveau", which is the name of the open source driver.

> mitigated by compiling WebKit
> with -DENABLE_OPENGL=OFF for now.

Instead of disabling it at build time, you can also disable it at run time with this environment variable: WEBKIT_DISABLE_COMPOSITING_MODE=1

See: https://trac.webkit.org/wiki/EnvironmentVariables
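
As a minimal sketch of the run-time alternative, the variable can be set for a single MiniBrowser launch so that nothing else in the session is affected. The sketch assumes the current directory is the top of a WebKit checkout and that run-minibrowser forwards a trailing URL to MiniBrowser. Both are assumptions, not something confirmed in this bug.

```shell
# Sketch: disable accelerated compositing for one MiniBrowser run only.
# Assumptions: the current directory is the top of a WebKit checkout, and
# the trailing URL is forwarded to MiniBrowser.
launcher=Tools/Scripts/run-minibrowser
if [ -x "$launcher" ]; then
    # The variable is set only in the environment of this one command.
    WEBKIT_DISABLE_COMPOSITING_MODE=1 "$launcher" --gtk https://www.cnn.com/
else
    echo "run this from the top of a WebKit checkout" >&2
fi
```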
Comment 4 Yury Semikhatsky 2019-09-19 10:02:10 PDT
(In reply to Carlos Alberto Lopez Perez from comment #3)
> I can't reproduce this on Debian 10 and WebKit r250039 with an Nvidia card
> and the open source graphics drivers (Mesa Nouveau NV117)
> 
> (In reply to Yury Semikhatsky from comment #2)
> > It's a problem with Nvidia proprietary driver
> 
> Are you sure you are using the proprietary driver? Your error logs contain
> the word "nouveau", which is the name of the open source driver.
You are right, I'm using nouveau, i.e. an open source driver. Thanks for the correction. I can reliably reproduce it. I also have another scenario with booking.com; the symptoms are the same: the browser hangs, and the entire desktop hangs too until I kill the WebProcess.

Running top yields the following:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
40180 yurys     20   0 95.062g 976924 164432 S 169.7  0.7   0:52.08 WebKitWebProces
40162 yurys     20   0 82.089g  91856  74060 S  43.1  0.1   0:11.79 MiniBrowser
51690 yurys     20   0  661668 137592  71772 S  20.4  0.1  21:36.06 Xorg


I collected some backtraces for each of the web process, MiniBrowser and Xorg, if that helps. This one in the WebProcess seems most relevant:

Thread 126 (Thread 0x7f9e6b7c6700 (LWP 40558)):
#0  0x00007f9ff2212e57 in sched_yield () at ../sysdeps/unix/syscall-template.S:78
#1  0x00007f9f6ac239c9 in ?? () from /usr/lib/x86_64-linux-gnu/dri/nouveau_dri.so
#2  0x00007f9f6b205b26 in ?? () from /usr/lib/x86_64-linux-gnu/dri/nouveau_dri.so
#3  0x00007f9f6ade9533 in ?? () from /usr/lib/x86_64-linux-gnu/dri/nouveau_dri.so
#4  0x00007f9f6adeb376 in ?? () from /usr/lib/x86_64-linux-gnu/dri/nouveau_dri.so
#5  0x00007f9f6ada76ab in ?? () from /usr/lib/x86_64-linux-gnu/dri/nouveau_dri.so
#6  0x00007f9f6ada9a30 in ?? () from /usr/lib/x86_64-linux-gnu/dri/nouveau_dri.so
#7  0x00007f9ff65aebf3 in _new_texture () at ../../Source/gst-plugins-base-1.16.0/gst-libs/gst/gl/gstglmemory.c:225
#8  _gl_tex_create () at ../../Source/gst-plugins-base-1.16.0/gst-libs/gst/gl/gstglmemory.c:255
#9  0x00007f9ff65afbde in _gl_mem_create () at ../../Source/gst-plugins-base-1.16.0/gst-libs/gst/gl/gstglmemorypbo.c:189
#10 0x00007f9ff659ab9c in _mem_create_gl () at ../../Source/gst-plugins-base-1.16.0/gst-libs/gst/gl/gstglbasememory.c:99
#11 0x00007f9ff65c4223 in _run_message_sync () at ../../Source/gst-plugins-base-1.16.0/gst-libs/gst/gl/gstglwindow.c:573
#12 0x00007f9ff65c4282 in _run_message_async () at ../../Source/gst-plugins-base-1.16.0/gst-libs/gst/gl/gstglwindow.c:640
#13 0x00007f9ff5075bc5 in g_main_dispatch () at ../../Source/glib-2.58.1/glib/gmain.c:3182
#14 g_main_context_dispatch () at ../../Source/glib-2.58.1/glib/gmain.c:3847
#15 0x00007f9ff5075f90 in g_main_context_iterate () at ../../Source/glib-2.58.1/glib/gmain.c:3920
#16 0x00007f9ff50762a2 in g_main_loop_run () at ../../Source/glib-2.58.1/glib/gmain.c:4116
#17 0x00007f9ff65c4022 in gst_gl_window_default_run () at ../../Source/gst-plugins-base-1.16.0/gst-libs/gst/gl/gstglwindow.c:499
#18 0x00007f9ff65a329c in gst_gl_context_create_thread () at ../../Source/gst-plugins-base-1.16.0/gst-libs/gst/gl/gstglcontext.c:1305
#19 0x00007f9ff509e715 in g_thread_proxy () at ../../Source/glib-2.58.1/glib/gthread.c:784
#20 0x00007f9ff5a536db in start_thread (arg=0x7f9e6b7c6700) at pthread_create.c:463
#21 0x00007f9ff223088f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
   


> Instead of disabling at build-time, you can also disable it at run-time with
> this environment variable WEBKIT_DISABLE_COMPOSITING_MODE=1 
> 
> check: https://trac.webkit.org/wiki/EnvironmentVariables
Thanks for the pointer!
Comment 5 Yury Semikhatsky 2019-09-19 10:02:34 PDT
Created attachment 379140 [details]
Web process backtraces
Comment 6 Yury Semikhatsky 2019-09-19 10:03:13 PDT
Created attachment 379142 [details]
MiniBrowser UI process bt
Comment 7 Yury Semikhatsky 2019-09-19 10:03:26 PDT
Created attachment 379143 [details]
Xorg bt
Comment 8 Yury Semikhatsky 2019-09-19 10:07:23 PDT
I can also reproduce it reliably with booking.com. Here are a couple of stack traces from that scenario; they contain WebCore functions which could be used as a starting point for further debugging. I'd be happy to help investigate this further, but I'd need some pointers from experts on what to look for.



29      ../sysdeps/unix/sysv/linux/poll.c: No such file or directory.
(gdb) bt
#0  0x00007fcfd7449bf9 in __GI___poll (fds=0x7ffcacf81da8, nfds=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007fcfd5f31747 in ?? () from /usr/lib/x86_64-linux-gnu/libxcb.so.1
#2  0x00007fcfd5f3306f in ?? () from /usr/lib/x86_64-linux-gnu/libxcb.so.1
#3  0x00007fcfd5f331ef in xcb_wait_for_reply64 () from /usr/lib/x86_64-linux-gnu/libxcb.so.1
#4  0x00007fcfd8dcd6e8 in _XReply () from /usr/lib/x86_64-linux-gnu/libX11.so.6
#5  0x00007fcfd49fe004 in XIGetClientPointer () from /usr/lib/x86_64-linux-gnu/libXi.so.6
#6  0x00007fcfda4b0c1e in gdk_x11_display_get_default_seat ()
    at /home/yurys/WebKit/WebKitBuild/DependenciesGTK/Source/gtk+-3.22.11/gdk/x11/gdkdisplay-x11.c:2889
#7  0x00007fcfe3a21b95 in WebCore::screenHasTouchDevice() () from /home/yurys/WebKit/WebKitBuild/Release/lib/libwebkit2gtk-4.0.so.37
#8  0x00007fcfe38c55cd in WebCore::RuntimeEnabledFeatures::touchEventsEnabled() const ()
   from /home/yurys/WebKit/WebKitBuild/Release/lib/libwebkit2gtk-4.0.so.37
#9  0x00007fcfe29671f1 in WebCore::JSDOMWindow::finishCreation(JSC::VM&, WebCore::JSWindowProxy*) ()
   from /home/yurys/WebKit/WebKitBuild/Release/lib/libwebkit2gtk-4.0.so.37
#10 0x00007fcfe31238ff in WebCore::JSWindowProxy::setWindow(WebCore::AbstractDOMWindow&) ()
   from /home/yurys/WebKit/WebKitBuild/Release/lib/libwebkit2gtk-4.0.so.37
#11 0x00007fcfe3123ca6 in WebCore::JSWindowProxy::create(JSC::VM&, WebCore::AbstractDOMWindow&, WebCore::DOMWrapperWorld&) ()
   from /home/yurys/WebKit/WebKitBuild/Release/lib/libwebkit2gtk-4.0.so.37
#12 0x00007fcfe314a8df in WebCore::WindowProxy::createJSWindowProxy(WebCore::DOMWrapperWorld&) ()
   from /home/yurys/WebKit/WebKitBuild/Release/lib/libwebkit2gtk-4.0.so.37
#13 0x00007fcfe314ac68 in WebCore::WindowProxy::createJSWindowProxyWithInitializedScript(WebCore::DOMWrapperWorld&) ()
   from /home/yurys/WebKit/WebKitBuild/Release/lib/libwebkit2gtk-4.0.so.37
#14 0x00007fcfe3123f97 in WebCore::toJS(JSC::ExecState*, WebCore::WindowProxy&) ()
   from /home/yurys/WebKit/WebKitBuild/Release/lib/libwebkit2gtk-4.0.so.37
#15 0x00007fcfdfae749e in llint_slow_path_get_by_id () from /home/yurys/WebKit/WebKitBuild/Release/lib/libjavascriptcoregtk-4.0.so.18
#16 0x00007fcfdfad220f in llint_op_get_by_id () from /home/yurys/WebKit/WebKitBuild/Release/lib/libjavascriptcoregtk-4.0.so.18
#17 0xffff000000000002 in ?? ()
#18 0x00007fcfdfacccd1 in llint_op_enter_wide32 () from /home/yurys/WebKit/WebKitBuild/Release/lib/libjavascriptcoregtk-4.0.so.18
#19 0x0000000000000000 in ?? ()




Thread 16 (Thread 0x7fcf332fd700 (LWP 36208)):
#0  0x00007fcfd7438e57 in sched_yield () at ../sysdeps/unix/syscall-template.S:78
#1  0x00007fcf5cc4c9c9 in ?? () from /usr/lib/x86_64-linux-gnu/dri/nouveau_dri.so
#2  0x00007fcf5c996809 in ?? () from /usr/lib/x86_64-linux-gnu/dri/nouveau_dri.so
#3  0x00007fcf5eadf583 in glPrimitiveBoundingBox () from /usr/lib/x86_64-linux-gnu/libGLX_mesa.so.0
#4  0x00007fcfe25d24b5 in WebKit::ThreadedCompositor::renderLayerTree() () from /home/yurys/WebKit/WebKitBuild/Release/lib/libwebkit2gtk-4.0.so.37
#5  0x00007fcfe0036871 in WTF::RunLoop::TimerBase::TimerBase(WTF::RunLoop&)::$_2::__invoke(void*) ()
   from /home/yurys/WebKit/WebKitBuild/Release/lib/libjavascriptcoregtk-4.0.so.18
#6  0x00007fcfda29bbc5 in g_main_dispatch () at ../../Source/glib-2.58.1/glib/gmain.c:3182
#7  g_main_context_dispatch () at ../../Source/glib-2.58.1/glib/gmain.c:3847
#8  0x00007fcfda29bf90 in g_main_context_iterate () at ../../Source/glib-2.58.1/glib/gmain.c:3920
#9  0x00007fcfda29c2a2 in g_main_loop_run () at ../../Source/glib-2.58.1/glib/gmain.c:4116
#10 0x00007fcfe0036338 in WTF::RunLoop::run() () from /home/yurys/WebKit/WebKitBuild/Release/lib/libjavascriptcoregtk-4.0.so.18
#11 0x00007fcfdffe3def in WTF::Thread::entryPoint(WTF::Thread::NewThreadContext*) ()
   from /home/yurys/WebKit/WebKitBuild/Release/lib/libjavascriptcoregtk-4.0.so.18
#12 0x00007fcfe0036ec6 in WTF::wtfThreadEntryPoint(void*) () from /home/yurys/WebKit/WebKitBuild/Release/lib/libjavascriptcoregtk-4.0.so.18
#13 0x00007fcfdac796db in start_thread (arg=0x7fcf332fd700) at pthread_create.c:463
#14 0x00007fcfd745688f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Comment 9 Carlos Alberto Lopez Perez 2019-09-19 10:46:16 PDT
(In reply to Yury Semikhatsky from comment #8)
> I can also reproduce it reliably with booking.com Here is a couple of stack
> traces on that scenario, they contain WebCore functions which could be used
> as starting point for further debugging. I'd be happy to help investigating
> this further but I'd need some pointers from experts what to look for.
> 
> 
> 

It's not clear to me from those backtraces where the crash happens.

Can you try this?

1. Enable coredumps by executing "ulimit -c unlimited"
2. Reproduce the crash by running MiniBrowser from that same terminal; you should get a file named "core" in your working directory
3. Get a backtrace from it with this:

cd $WebKitBaseDir
Tools/jhbuild/jhbuild-wrapper --gtk run gdb --batch -ex "thread apply all bt full" WebKitBuild/Release/bin/WebKitWebProcess /path/to/file/core &> backtrace.txt

And that should get a backtrace with info about the crash.
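
Before reproducing, it may help to check where core dumps will actually end up. The sketch below reads the kernel's core_pattern setting: when it starts with "|", dumps are piped to a helper program (on Ubuntu this is often apport) and no file named "core" appears in the working directory. The variable names are illustrative only.

```shell
# Sketch: check where the kernel will write core dumps before reproducing.
# If the pattern starts with "|", cores are piped to a helper program
# instead of being written as ./core.
pattern=$(cat /proc/sys/kernel/core_pattern 2>/dev/null)
case "$pattern" in
    "|"*) core_target="piped to helper: ${pattern#|}" ;;
    "")   core_target="unknown (could not read core_pattern)" ;;
    *)    core_target="file pattern: $pattern" ;;
esac
echo "core dumps: $core_target"

# Lift the size limit for this shell; processes started from it inherit it.
ulimit -c unlimited 2>/dev/null || echo "could not raise core size limit" >&2
echo "core size limit: $(ulimit -c)"
```

If the dumps are piped to a helper, the core file has to be retrieved from wherever that helper stores it before running the gdb command above.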

If you can get a debug build and reproduce the crash with the debug build and get the backtrace with it, that would be even better.

See also: https://trac.webkit.org/wiki/WebKitGTK/Debugging
Comment 10 Yury Semikhatsky 2019-09-19 12:53:10 PDT
Created attachment 379154 [details]
core dump of segfaulting web process

Here is the dump collected from a release build; I'll repeat that with a debug build too. Perhaps it's worth building gst-plugins-base-1.16.0 in debug mode too.
Comment 11 Yury Semikhatsky 2019-09-19 13:07:37 PDT
Created attachment 379157 [details]
bt on a hanging web process

The process wouldn't die this time, just hung, so I collected the traces by running: 
Tools/jhbuild/jhbuild-wrapper --gtk run gdb --batch -ex "thread apply all bt full" WebKitBuild/Release/bin/WebKitWebProcess  37323 &> live-bt.txt

It looks similar to the above.
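
The PID for a hung web process can be looked up instead of read off top. A minimal sketch, assuming pgrep from procps is installed: it prints one gdb command per running web process rather than executing it, and redirects with the portable "> file 2>&1" form instead of the bash-only "&>". The output file names are hypothetical.

```shell
# Sketch: print a gdb command line for every running web process.
# The kernel truncates process names to 15 characters, so the name to
# match is "WebKitWebProces" (as in the top output), not the full name.
pids=$(pgrep -x WebKitWebProces 2>/dev/null || true)
if [ -z "$pids" ]; then
    echo "no WebKitWebProcess is running"
fi
for pid in $pids; do
    # Printed rather than executed, so the command can be reviewed first.
    echo "Tools/jhbuild/jhbuild-wrapper --gtk run gdb --batch -ex 'thread apply all bt full' WebKitBuild/Release/bin/WebKitWebProcess $pid > live-bt-$pid.txt 2>&1"
done
```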
Comment 12 Yury Semikhatsky 2019-09-19 13:33:13 PDT
MiniBrowser is crashing on an assertion failure in debug mode: http://webkit.org/b/202001
Comment 13 Carlos Alberto Lopez Perez 2019-09-19 16:54:39 PDT
(In reply to Yury Semikhatsky from comment #10)
> Created attachment 379154 [details]
> core dump of segfaulting web process
> 
> Here is the dump collected from a release build, I'll repeat that with debug
> too. Perhaps it's worth building gst-plugins-base-1.16.0 in debug mode too.

From that backtrace I can see that the crash happens in the graphics driver code (nouveau) and not in WebKit code.

> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x00007f98054310b3 in ?? () from /usr/lib/x86_64-linux-gnu/libdrm_nouveau.so.2
> [Current thread is 1 (Thread 0x7f7f14cff700 (LWP 35521))]

Perhaps this is a bug in the Mesa/Nouveau code and is unrelated to WebKit itself? Maybe it just happens that WebKit triggers the bug.

If you install (via apt) the debug symbols package (-dbg or -dbgsym) for the package that provides /usr/lib/x86_64-linux-gnu/libdrm_nouveau.so.2 and generate the backtrace again, it should show in which function of the nouveau driver it crashes.
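
One way to find the right symbols package is to ask dpkg which package owns the library. The sketch below only prints names and installs nothing. The "-dbgsym" suffix follows the usual Ubuntu naming for debug symbol packages and assumes the debug symbol archive is enabled in the apt sources, so the printed candidate is an assumption to verify, not a confirmed package name.

```shell
# Sketch: find which package owns the library, then derive the name of a
# likely debug symbols package for it.
lib=/usr/lib/x86_64-linux-gnu/libdrm_nouveau.so.2
if command -v dpkg >/dev/null 2>&1 && [ -e "$lib" ]; then
    # dpkg -S prints "package[:arch]: /path"; keep only the package name.
    pkg=$(dpkg -S "$lib" 2>/dev/null | cut -d: -f1)
    echo "owning package: $pkg"
    echo "candidate symbols package: ${pkg}-dbgsym"
else
    echo "dpkg or $lib is not available on this system" >&2
fi
```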
Comment 14 Yury Semikhatsky 2019-09-23 16:43:00 PDT
Created attachment 379406 [details]
double free or corruption (fasttop)

This looks like memory corruption somewhere:


Thread 1 (Thread 0x7f16c0dfe700 (LWP 10807)):
#3  0x00007f174188190a in malloc_printerr (str=str@entry=0x7f17419a9828 "double free or corruption (fasttop)") at malloc.c:5350
No locals.
#4  0x00007f1741889004 in _int_free (have_lock=0, p=0x7f16b408c790, av=0x7f16b4000020) at malloc.c:4230
#5  __GI___libc_free (mem=mem@entry=0x7f16b408c7a0) at malloc.c:3124
#6  0x00007f16b9661de6 in nouveau_bo_del (bo=0x7f16b408c7a0) at ../nouveau/nouveau.c:618
#7  nouveau_bo_ref (bo=bo@entry=0x0, pref=pref@entry=0x7f16c0dfd860) at ../nouveau/nouveau.c:784
#8  0x00007f16b9662f00 in pushbuf_flush (push=push@entry=0x7f16b4091290) at ../nouveau/pushbuf.c:413
        nvpb = <optimized out>
        krec = 0x7f16b4091860
        kref = 0x7f16b4091890
        bctx = <optimized out>
        btmp = <optimized out>
        bo = 0x7f16b408c7a0
        ret = -2
        i = 1
#9  0x00007f16b9663a40 in nouveau_pushbuf_kick (push=0x7f16b4091290, chan=<optimized out>) at ../nouveau/pushbuf.c:775
No locals.
#10 0x00007f16ba47f116 in ?? () from /usr/lib/x86_64-linux-gnu/dri/nouveau_dri.so
No symbol table info available.
#11 0x00007f16ba5c5e6b in ?? () from /usr/lib/x86_64-linux-gnu/dri/nouveau_dri.so
No symbol table info available.
#12 0x00007f16ba14382a in ?? () from /usr/lib/x86_64-linux-gnu/dri/nouveau_dri.so
No symbol table info available.
#13 0x00007f16c03de583 in glPrimitiveBoundingBox () from /usr/lib/x86_64-linux-gnu/libGLX_mesa.so.0
No symbol table info available.
#14 0x00007f174ca7b395 in WebKit::ThreadedCompositor::renderLayerTree() () from /home/yurys/WebKit/WebKitBuild/Release/lib/libwebkit2gtk-4.0.so.37
No symbol table info available.
#15 0x00007f174a4e7ae1 in WTF::RunLoop::TimerBase::TimerBase(WTF::RunLoop&)::$_2::__invoke(void*) () from /home/yurys/WebKit/WebKitBuild/Release/lib/libjavascriptcoregtk-4.0.so.18
No symbol table info available.
#16 0x00007f1744757bc5 in g_main_dispatch () at ../../Source/glib-2.58.1/glib/gmain.c:3182
No locals.
#17 g_main_context_dispatch () at ../../Source/glib-2.58.1/glib/gmain.c:3847
No locals.
#18 0x00007f1744757f90 in g_main_context_iterate () at ../../Source/glib-2.58.1/glib/gmain.c:3920
No locals.
#19 0x00007f17447582a2 in g_main_loop_run () at ../../Source/glib-2.58.1/glib/gmain.c:4116
No locals.
#20 0x00007f174a4e75a8 in WTF::RunLoop::run() () from /home/yurys/WebKit/WebKitBuild/Release/lib/libjavascriptcoregtk-4.0.so.18
No symbol table info available.
#21 0x00007f174a49505f in WTF::Thread::entryPoint(WTF::Thread::NewThreadContext*) () from /home/yurys/WebKit/WebKitBuild/Release/lib/libjavascriptcoregtk-4.0.so.18
No symbol table info available.
#22 0x00007f174a4e8136 in WTF::wtfThreadEntryPoint(void*) () from /home/yurys/WebKit/WebKitBuild/Release/lib/libjavascriptcoregtk-4.0.so.18
No symbol table info available.
#23 0x00007f17451356db in start_thread (arg=0x7f16c0dfe700) at pthread_create.c:463
        pd = 0x7f16c0dfe700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139735701907200, 887403434324897123, 139735701904704, 0, 139735727867440, 140725081343720, -973568036388360861, -973276017472721565}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#24 0x00007f174191288f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Comment 15 Yury Semikhatsky 2019-09-23 16:45:31 PDT
Created attachment 379407 [details]
assertion in nouveau/pushbuf.c:723

This one fails on an assertion.
Comment 16 Yury Semikhatsky 2019-09-23 16:59:22 PDT
Filed https://bugs.freedesktop.org/show_bug.cgi?id=111793
Comment 17 Sergio Villar Senin 2020-08-05 07:48:42 PDT
I guess we can close this now, right?
Comment 18 Adrian Perez 2020-10-28 16:07:40 PDT
(In reply to Sergio Villar Senin from comment #17)
> I guess we can close this now, right?

I think so; the Nouveau people have acknowledged that this seems to
be a threading issue on their side. Note that the Nouveau bug tracker
has been moved; this is the new URL of the driver bug:

  https://gitlab.freedesktop.org/xorg/driver/xf86-video-nouveau/-/issues/504