Bug 165848

Summary: [GTK] WebProcess from WebKitGtk+ 2.15.2 SIGSEGVs in std::unique_ptr<SoupBuffer, WTF::GPtrDeleter<SoupBuffer> >::get() const () at /usr/include/c++/6/bits/unique_ptr.h:305
Product: WebKit Reporter: Andres Gomez Garcia <agomez>
Component: WebKitGTKAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: bugs-noreply, cgarcia, clopez, mcatanzaro, sabouhallawa, zan
Priority: P2    
Version: WebKit Nightly Build   
Hardware: PC   
OS: Linux   
See Also: https://bugs.webkit.org/show_bug.cgi?id=166838
Attachments:
Description Flags
BT from gdb
none
Another similar BT from gdb
none
Yet another similar BT from gdb
none
BT from gdb for the UIProcess
none
And yet another similar BT from gdb
none
Another BT from gdb for the UIProcess
none
BT from gdb, compiled with -g0
none
BT from gdb, compiled with -g
none
BT from gdb for the UIProcess with -g
none
BT from gdb with LockHolder moved
none
Patch
mcatanzaro: review+
BT from gdb for the WebProcess, with the new patch
none
BT from gdb for the UIProcess, with the new patch none

Description Andres Gomez Garcia 2016-12-14 07:48:48 PST
Created attachment 297087 [details]
BT from gdb

I'm using WebKitGtk+ with my own JHBuild setting:
https://github.com/tanty/jhbuild-epiphany/tree/master

Epiphany 3.20.3 and WebKit 2.15.2 with the attached patches for bug 164049, bug 165200, bug 165283 and bug 164052, applied.

I'm running Epiphany with the dconf key:

"process-model" = "shared-secondary-process"

And the env variable:

"export LIBGL_DRI3_DISABLE=1"

The compilation was done with CMake args:

'-DUSE_LD_GOLD=OFF -DPORT=GTK -DCMAKE_BUILD_TYPE=Release -DENABLE_MINIBROWSER=ON -DCMAKE_C_FLAGS_RELEASE="-O0 -g1 -DNDEBUG -DG_DEBUG=fatal-criticals -DG_DISABLE_CAST_CHECKS" -DCMAKE_CXX_FLAGS_RELEASE="-O0 -g1 -DNDEBUG -DG_DEBUG=fatal-criticals -DG_DISABLE_CAST_CHECKS"'

After visiting several pages, eventually, the WebProcess hits a SIGSEV.

This bug is not reproducible in a predictable way.
Comment 1 Andres Gomez Garcia 2016-12-15 01:23:26 PST
Created attachment 297186 [details]
Another similar BT from gdb
Comment 2 Andres Gomez Garcia 2016-12-15 01:44:36 PST
This bug is happening very recurrently.

FTR, the compilation also has '-DENABLE_THREADED_COMPOSITOR=OFF', I forgot to mention that.
Comment 3 Zan Dobersek 2016-12-16 01:51:57 PST
When you hit this again, in gdb, run these commands:

$ info registers
$ info proc mappings
$ disassemble _ZNKSt10unique_ptrI10SoupBufferN3WTF11GPtrDeleterIS0_EEE3getEv
$ disassemble _ZNK7WebCore12SharedBuffer4dataEv
$ disassemble _ZNK14GIFImageReader4dataEm

The SharedBuffer object on the GIFImageReader might be null.
Comment 4 Andres Gomez Garcia 2016-12-16 09:32:52 PST
(In reply to comment #3)
> When you hit this again, in gdb, run these commands:
> 
> $ info registers

This is already included in the attached BTs

> $ info proc mappings
> $ disassemble _ZNKSt10unique_ptrI10SoupBufferN3WTF11GPtrDeleterIS0_EEE3getEv
> $ disassemble _ZNK7WebCore12SharedBuffer4dataEv
> $ disassemble _ZNK14GIFImageReader4dataEm

I'll do this.
Comment 5 Andres Gomez Garcia 2016-12-17 05:24:22 PST
Created attachment 297401 [details]
Yet another similar BT from gdb

This BT additionally includes info for:

$ info proc mappings
$ disassemble _ZNKSt10unique_ptrI10SoupBufferN3WTF11GPtrDeleterIS0_EEE3getEv
$ disassemble _ZNK7WebCore12SharedBuffer4dataEv
$ disassemble _ZNK14GIFImageReader4dataEm
Comment 6 Andres Gomez Garcia 2016-12-17 05:26:49 PST
Created attachment 297402 [details]
BT from gdb for the UIProcess

After the last SIGSEV, the UIProcess also SIGSEVed. This is its BT.

Have into account that the code was modified accordingly to the suggestion at:
https://bugs.webkit.org/show_bug.cgi?id=165656#c18
Comment 7 Andres Gomez Garcia 2016-12-17 16:19:10 PST
Created attachment 297416 [details]
And yet another similar BT from gdb

This BT additionally includes info for:

$ info proc mappings
$ disassemble _ZNKSt10unique_ptrI10SoupBufferN3WTF11GPtrDeleterIS0_EEE3getEv
$ disassemble _ZNK7WebCore12SharedBuffer4dataEv
$ disassemble _ZNK14GIFImageReader4dataEm
Comment 8 Andres Gomez Garcia 2016-12-17 16:20:24 PST
Created attachment 297417 [details]
Another BT from gdb for the UIProcess

After the last SIGSEV, the UIProcess also SIGSEVed. This is its BT.

Have into account that the code was modified accordingly to the suggestion at:
https://bugs.webkit.org/show_bug.cgi?id=165656#c18
Comment 9 Zan Dobersek 2016-12-18 11:59:41 PST
The hope was that upon the crash the instruction pointer would stay somewhere in _ZNKSt10unique_ptrI10SoupBufferN3WTF11GPtrDeleterIS0_EEE3getEv. That doesn't happen, but the backtrace still reports the crash occurring there.

Assuming the backtrace is correct, with the rdi register having the 0x20 value, it seems that the std::unique_ptr<> getter is called on a std::unique_ptr<> object that's at address 0x20. That's the same value as the offset at which the std::unique_ptr<SoupBuffer> object is positioned inside the SharedBuffer class. So I think the SharedBuffer object in the GIFImageReader class is null.

Also, looking at the memory mappings -- I really want to know what you're loading in the browser, for starters.
Comment 10 Carlos Garcia Campos 2016-12-19 02:31:01 PST
Maybe this is the same I fixed in r208881
Comment 11 Andres Gomez Garcia 2016-12-19 03:18:49 PST
(In reply to comment #10)
> Maybe this is the same I fixed in r208881

Sounds like it. I will apply that patch and check ...
Comment 12 Andres Gomez Garcia 2016-12-19 03:21:48 PST
(In reply to comment #11)
> (In reply to comment #10)
> > Maybe this is the same I fixed in r208881
> 
> Sounds like it. I will apply that patch and check ...

Actually, this is already in 2.15.2
Comment 13 Andres Gomez Garcia 2016-12-19 03:22:20 PST
(In reply to comment #9)
> Also, looking at the memory mappings -- I really want to know what you're
> loading in the browser, for starters.

I'll try to pass you the crashed session next time it happens.
Comment 14 Michael Catanzaro 2016-12-19 04:41:04 PST
(In reply to comment #12)
> Actually, this is already in 2.15.2

Yeah, I checked that commit when I saw Zan's comment yesterday. :(
Comment 15 Andres Gomez Garcia 2016-12-19 05:45:38 PST
Created attachment 297457 [details]
BT from gdb, compiled with -g0
Comment 16 Andres Gomez Garcia 2016-12-21 04:40:14 PST
Created attachment 297584 [details]
BT from gdb, compiled with -g
Comment 17 Andres Gomez Garcia 2016-12-21 04:41:54 PST
Created attachment 297585 [details]
BT from gdb for the UIProcess with -g
Comment 18 Carlos Garcia Campos 2016-12-21 05:15:47 PST
(In reply to comment #12)
> (In reply to comment #11)
> > (In reply to comment #10)
> > > Maybe this is the same I fixed in r208881
> > 
> > Sounds like it. I will apply that patch and check ...
> 
> Actually, this is already in 2.15.2

I didn't take into account that size() calls isSizeAvailable() that is virtual, and in the case of GIF, it's overriden and also calls decode(). I think the solution should be to make decoders thread-safe, but in the meantime, could you try moving the LockHolder locker(m_lock); in ImageDecoder::createFrameImageAtIndex() to the beginning? right before the size() call?
Comment 19 Andres Gomez Garcia 2017-01-02 09:17:24 PST
Created attachment 297900 [details]
BT from gdb with LockHolder moved

I already applied a patch to move LockHolder locker(m_lock); in ImageDecoder::createFrameImageAtIndex() to the beginning
Comment 20 Andres Gomez Garcia 2017-01-04 02:01:01 PST
Going to this URL and scrolling down reproduces the crash in a 100%

http://wiki.inkscape.org/wiki/index.php/Release_notes/0.92#Important_changes
Comment 21 Zan Dobersek 2017-01-04 09:35:06 PST
(In reply to comment #20)
> Going to this URL and scrolling down reproduces the crash in a 100%
> 
> http://wiki.inkscape.org/wiki/index.php/Release_notes/0.92#Important_changes

Confirmed with r210274.
Comment 22 Zan Dobersek 2017-01-04 09:55:28 PST
(In reply to comment #21)
> (In reply to comment #20)
> > Going to this URL and scrolling down reproduces the crash in a 100%
> > 
> > http://wiki.inkscape.org/wiki/index.php/Release_notes/0.92#Important_changes
> 
> Confirmed with r210274.

That's using MiniBrowser. I get a pretty good crash rate if I remove the ~/.cache/MiniBrowser cache before each run.
Comment 23 Carlos Garcia Campos 2017-01-05 04:13:17 PST
Created attachment 298091 [details]
Patch

Could you please try this new patch?
Comment 24 Zan Dobersek 2017-01-05 11:40:58 PST
(In reply to comment #23)
> Created attachment 298091 [details]
> Patch
> 
> Could you please try this new patch?

I am not able to reproduce the crash with it, while I can still easily reproduce the crash without the patch applied.
Comment 25 Andres Gomez Garcia 2017-01-07 14:04:20 PST
Created attachment 298283 [details]
BT from gdb for the WebProcess, with the new patch
Comment 26 Andres Gomez Garcia 2017-01-07 14:04:44 PST
(In reply to comment #25)
> Created attachment 298283 [details]
> BT from gdb for the WebProcess, with the new patch

Keeps crashing ...
Comment 27 Andres Gomez Garcia 2017-01-07 14:07:33 PST
Created attachment 298284 [details]
BT from gdb for the UIProcess, with the new patch

Immediately after the SIGSEV in the WebProcess, we have this SIGSEV in the UIProcess.

This is the message in the console:


-- 8< -- 8< -- 8< -- 8< -- 8< -- 8< -- 8< -- 8< -- 8< -- 8< --

The program with pid 29593 received an X Window System error.
The error was 'BadDamage (invalid Damage parameter)'.

-- 8< -- 8< -- 8< -- 8< -- 8< -- 8< -- 8< -- 8< -- 8< -- 8< --
Comment 28 Carlos Garcia Campos 2017-01-08 23:50:54 PST
(In reply to comment #25)
> Created attachment 298283 [details]
> BT from gdb for the WebProcess, with the new patch

This bt proves the patch works indeed, main thread is waiting in the mutex, while the decoder thread is the only one decoding the gif image.
Comment 29 Carlos Garcia Campos 2017-01-08 23:54:33 PST
(In reply to comment #26)
> (In reply to comment #25)
> > Created attachment 298283 [details]
> > BT from gdb for the WebProcess, with the new patch
> 
> Keeps crashing ...

Due to an assert when accessing passed in rowBuffer:

const unsigned char sourceValue = rowBuffer[(m_scaled ? m_scaledColumns[x] : x) - frameContext->xOffset];

it seems that can overflow, which looks unrelated to the threading issue, I think, I would need to look at it in more detail to be sure.
Comment 30 Carlos Garcia Campos 2017-01-08 23:55:16 PST
(In reply to comment #27)
> Created attachment 298284 [details]
> BT from gdb for the UIProcess, with the new patch
> 
> Immediately after the SIGSEV in the WebProcess, we have this SIGSEV in the
> UIProcess.
> 
> This is the message in the console:
> 
> 
> -- 8< -- 8< -- 8< -- 8< -- 8< -- 8< -- 8< -- 8< -- 8< -- 8< --
> 
> The program with pid 29593 received an X Window System error.
> The error was 'BadDamage (invalid Damage parameter)'.
> 
> -- 8< -- 8< -- 8< -- 8< -- 8< -- 8< -- 8< -- 8< -- 8< -- 8< --

This is also a different issue.
Comment 31 Carlos Garcia Campos 2017-01-09 04:13:26 PST
Committed r210501: <http://trac.webkit.org/changeset/210501>