Bug 167304 - [GTK] WebProcess from WebKitGtk+ 2.15.x SIGSEVs in GIFLZWContext::doLZW(unsigned char const*, unsigned long) at Source/WebCore/platform/image-decoders/gif/GIFImageReader.cpp:303
Summary: [GTK] WebProcess from WebKitGtk+ 2.15.x SIGSEVs in GIFLZWContext::doLZW(unsig...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKitGTK (show other bugs)
Version: WebKit Nightly Build
Hardware: PC Linux
: P2 Normal
Assignee: Miguel Gomez
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-01-23 06:48 PST by Andres Gomez Garcia
Modified: 2017-03-06 07:18 PST (History)
7 users (show)

See Also:


Attachments
BT from gdb for the WebProcess (207.20 KB, text/plain)
2017-01-23 06:48 PST, Andres Gomez Garcia
no flags Details
Another BT from gdb for the WebProcess (295.44 KB, text/plain)
2017-02-07 07:04 PST, Andres Gomez Garcia
no flags Details
Yet another similar BT from gdb (276.05 KB, text/plain)
2017-02-10 00:46 PST, Andres Gomez Garcia
no flags Details
And yet another similar BT from gdb (5.56 KB, text/plain)
2017-02-15 02:18 PST, Andres Gomez Garcia
no flags Details
And yet another similar BT from gdb (149.68 KB, text/plain)
2017-02-15 02:19 PST, Andres Gomez Garcia
no flags Details
And another similar BT from gdb (149.29 KB, text/plain)
2017-02-15 02:26 PST, Andres Gomez Garcia
no flags Details
A sane BT from gdb for the WebProcess (244.80 KB, text/plain)
2017-02-16 06:11 PST, Andres Gomez Garcia
no flags Details
Another sane BT from gdb for the WebProcess (203.10 KB, text/plain)
2017-02-16 06:12 PST, Andres Gomez Garcia
no flags Details
BT from gdb for the WebProcess 2.15.90 (136.53 KB, text/plain)
2017-02-28 04:00 PST, Andres Gomez Garcia
no flags Details
Patch (1.96 KB, patch)
2017-03-06 05:10 PST, Miguel Gomez
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Andres Gomez Garcia 2017-01-23 06:48:40 PST
Created attachment 299514 [details]
BT from gdb for the WebProcess

Epiphany 3.20.6 and WebKit 2.15.3

I'm running Epiphany with the dconf key:

"process-model" = "shared-secondary-process"

And the env variable:

"export LIBGL_DRI3_DISABLE=1"

The compilation was done with CMake args:

'-DPORT=GTK -DCMAKE_BUILD_TYPE=Release -DENABLE_MINIBROWSER=ON -DCMAKE_C_FLAGS_RELEASE="-O0 -g -DNDEBUG -DG_DEBUG=fatal-criticals -DG_DISABLE_CAST_CHECKS" -DCMAKE_CXX_FLAGS_RELEASE="-O0 -g -DNDEBUG -DG_DEBUG=fatal-criticals -DG_DISABLE_CAST_CHECKS"'

After visiting several pages, eventually, the WebProcess hits a SIGSEV.

This bug is not reproducible in a predictable way.
Comment 1 Andres Gomez Garcia 2017-02-07 07:04:43 PST
Created attachment 300809 [details]
Another BT from gdb for the WebProcess
Comment 2 Andres Gomez Garcia 2017-02-10 00:46:10 PST
Created attachment 301139 [details]
Yet another similar BT from gdb

This is completely a different BT but I suppose it is related.
Comment 3 Andres Gomez Garcia 2017-02-10 01:04:30 PST
(In reply to comment #2)
> Created attachment 301139 [details]
> Yet another similar BT from gdb
> 
> This is completely a different BT but I suppose it is related.

I'm hitting this one very easily by visiting:
https://mujeryciencia.fundaciontelefonica.com/
Comment 4 Andres Gomez Garcia 2017-02-15 02:18:03 PST
Created attachment 301597 [details]
And yet another similar BT from gdb

This is with:

"process-model" = "one-secondary-process-per-web-view"
Comment 5 Andres Gomez Garcia 2017-02-15 02:19:03 PST
Created attachment 301598 [details]
And yet another similar BT from gdb
Comment 6 Andres Gomez Garcia 2017-02-15 02:26:26 PST
Created attachment 301599 [details]
And another similar BT from gdb
Comment 7 Andres Gomez Garcia 2017-02-16 06:11:56 PST
Created attachment 301755 [details]
A sane BT from gdb for the WebProcess
Comment 8 Andres Gomez Garcia 2017-02-16 06:12:28 PST
Created attachment 301756 [details]
Another sane BT from gdb for the WebProcess
Comment 9 Andres Gomez Garcia 2017-02-28 04:00:07 PST
Created attachment 302930 [details]
BT from gdb for the WebProcess 2.15.90

This is still reproducible with Epiphany 3.22.6 and WebKit 2.15.90.

The rest of the dependencies are all provided from Debian Testing.

The compilation was done with CMake args:

'-DPORT=GTK -DCMAKE_BUILD_TYPE=Release -DENABLE_MINIBROWSER=ON -DCMAKE_C_FLAGS_RELEASE="-O0 -g -DNDEBUG  -DG_DISABLE_CAST_CHECKS" -DCMAKE_CXX_FLAGS_RELEASE="-O0 -g -DNDEBUG -DG_DISABLE_CAST_CHECKS"'
Comment 10 Miguel Gomez 2017-03-03 09:03:27 PST
The problem here seems to come from the usage of a decoding thread to decode gif images.

I've been able to reproduce the cash and what seems to be happening is:

- We want to show a gif image, so we create ImageFrameCache decides that the decoding of that image will happen in a secondary thread.
It creates a WorkQueue and, for each of the frames requested, the WorkQueue calls m_decoder->createFrameImageAtIndex() in the decoding thread.

- For some reason (maybe because it's hidden because of scrolling), the BitmapImage containing the gif gets resetAnimation() called. This stops the WorkQueue with the frame decodings and then it destroys all the data that has been decoded, and sometimes even the decoder used. Theoretically this is fine.

- But what I'm seeing is that the WorkQueue is indeed being stopped, and no new frames are requested for decoding. But if there's an ongoing decoding in the seconday thread, it won't stop until it finishes. Due to this, there's the possibility that the decoder gets destroyed while it's still being used in the decoding thread, and this seems to be reason of the crashes, both this one and the similar ones reported.
Comment 11 Miguel Gomez 2017-03-06 05:04:52 PST
More details.

There are actually 2 different crashes, both of them happening because something unexpected happens while there's a decoding ongoing on the decoding thread.

* The fist one affects only WebKitGtk+ code. It happens because GIFImageDecoder::clearFrameBufferCache() gets called, and that function deletes the GIFImageReader that is being used for decoding in the decoding thread. I'll upload a patch that avoid this.

* The second crash affects multiplatform code, and it happens because the decoder being used in the decoding thread can be deleted while there's still decoding happening.
The decoder is a unique_ptr stored in ImageSource, and that class is the one that sets it to ImageFrameCache, which is the one using it (it's stored as a simple pointer in ImageFrameCache).
When ImageSource::clear() gets called, it sets the ImageFrameCache decoder to null, without ensuring somehow that it's not being used at that point by the decoding thread, and this causes the crash.

I see 2 ways to fix this:

* Using a lock to ensure that ImageFrameCache::stopAsyncDecodingQueue() doesn't return until the decoding thread has stopped processing the requested frames. That function gets called before ImageSource::clear() and its mission is to stop the decoding in the secondary thread, but currently it's possible that there's a frame being decoded when that function returns. If we add a lock there and in the function that performs the decoding in the secondary thread, we can be sure that there's no decoding happening and we can delete the decoder safely. I've tested this fix and it seems to work properly.

* Another option is to turn the decoder into a RefPtr inside ImageSource, pass that RefPtr to ImageFrameCache and use it inside the function that performs the decoding in the secondary thread. This way ImageSource can unref the decoder when it wants, but it won't be destroyed until the decoding thread finishes using it.

Which do you think is the best option?
Comment 12 Miguel Gomez 2017-03-06 05:10:07 PST
Created attachment 303509 [details]
Patch
Comment 13 Carlos Garcia Campos 2017-03-06 05:33:49 PST
(In reply to comment #11)
> More details.
> 
> There are actually 2 different crashes, both of them happening because
> something unexpected happens while there's a decoding ongoing on the
> decoding thread.
> 
> * The fist one affects only WebKitGtk+ code. It happens because
> GIFImageDecoder::clearFrameBufferCache() gets called, and that function
> deletes the GIFImageReader that is being used for decoding in the decoding
> thread. I'll upload a patch that avoid this.

Thanks!

> * The second crash affects multiplatform code, and it happens because the
> decoder being used in the decoding thread can be deleted while there's still
> decoding happening.
> The decoder is a unique_ptr stored in ImageSource, and that class is the one
> that sets it to ImageFrameCache, which is the one using it (it's stored as a
> simple pointer in ImageFrameCache).
> When ImageSource::clear() gets called, it sets the ImageFrameCache decoder
> to null, without ensuring somehow that it's not being used at that point by
> the decoding thread, and this causes the crash.
> 
> I see 2 ways to fix this:
> 
> * Using a lock to ensure that ImageFrameCache::stopAsyncDecodingQueue()
> doesn't return until the decoding thread has stopped processing the
> requested frames. That function gets called before ImageSource::clear() and
> its mission is to stop the decoding in the secondary thread, but currently
> it's possible that there's a frame being decoded when that function returns.
> If we add a lock there and in the function that performs the decoding in the
> secondary thread, we can be sure that there's no decoding happening and we
> can delete the decoder safely. I've tested this fix and it seems to work
> properly.
> 
> * Another option is to turn the decoder into a RefPtr inside ImageSource,
> pass that RefPtr to ImageFrameCache and use it inside the function that
> performs the decoding in the secondary thread. This way ImageSource can
> unref the decoder when it wants, but it won't be destroyed until the
> decoding thread finishes using it.
> 
> Which do you think is the best option?

Let's file a new bug report to discuss this one, then. I would say we should make ImageDecoder refcounted. Said, what do you think?
Comment 14 WebKit Commit Bot 2017-03-06 06:19:43 PST
Comment on attachment 303509 [details]
Patch

Clearing flags on attachment: 303509

Committed r213448: <http://trac.webkit.org/changeset/213448>
Comment 15 WebKit Commit Bot 2017-03-06 06:19:48 PST
All reviewed patches have been landed.  Closing bug.
Comment 16 Michael Catanzaro 2017-03-06 07:09:17 PST
Miguel, could you please check the See Also bugs to see if any of those are duplicates?
Comment 17 Carlos Garcia Campos 2017-03-06 07:12:12 PST
(In reply to comment #16)
> Miguel, could you please check the See Also bugs to see if any of those are
> duplicates?

All of them are the same issue, I would say, but maybe they are not fixed by this one, but by #169199
Comment 18 Miguel Gomez 2017-03-06 07:18:20 PST
(In reply to comment #16)
> Miguel, could you please check the See Also bugs to see if any of those are
> duplicates?

As Carlos said, all of them are the same issue, but they are not completely fixed by the patch in this bug. They'll be fixed when 169199 gets fixed.

We can close them as duplicates in the meantime if you want. I was planning on doing so when closing 169199