Created attachment 299514 [details] BT from gdb for the WebProcess Epiphany 3.20.6 and WebKit 2.15.3 I'm running Epiphany with the dconf key: "process-model" = "shared-secondary-process" And the env variable: "export LIBGL_DRI3_DISABLE=1" The compilation was done with CMake args: '-DPORT=GTK -DCMAKE_BUILD_TYPE=Release -DENABLE_MINIBROWSER=ON -DCMAKE_C_FLAGS_RELEASE="-O0 -g -DNDEBUG -DG_DEBUG=fatal-criticals -DG_DISABLE_CAST_CHECKS" -DCMAKE_CXX_FLAGS_RELEASE="-O0 -g -DNDEBUG -DG_DEBUG=fatal-criticals -DG_DISABLE_CAST_CHECKS"' After visiting several pages, eventually, the WebProcess hits a SIGSEV. This bug is not reproducible in a predictable way.
Created attachment 300809 [details] Another BT from gdb for the WebProcess
Created attachment 301139 [details] Yet another similar BT from gdb This is completely a different BT but I suppose it is related.
(In reply to comment #2) > Created attachment 301139 [details] > Yet another similar BT from gdb > > This is completely a different BT but I suppose it is related. I'm hitting this one very easily by visiting: https://mujeryciencia.fundaciontelefonica.com/
Created attachment 301597 [details] And yet another similar BT from gdb This is with: "process-model" = "one-secondary-process-per-web-view"
Created attachment 301598 [details] And yet another similar BT from gdb
Created attachment 301599 [details] And another similar BT from gdb
Created attachment 301755 [details] A sane BT from gdb for the WebProcess
Created attachment 301756 [details] Another sane BT from gdb for the WebProcess
Created attachment 302930 [details] BT from gdb for the WebProcess 2.15.90 This is still reproducible with Epiphany 3.22.6 and WebKit 2.15.90. The rest of the dependencies are all provided from Debian Testing. The compilation was done with CMake args: '-DPORT=GTK -DCMAKE_BUILD_TYPE=Release -DENABLE_MINIBROWSER=ON -DCMAKE_C_FLAGS_RELEASE="-O0 -g -DNDEBUG -DG_DISABLE_CAST_CHECKS" -DCMAKE_CXX_FLAGS_RELEASE="-O0 -g -DNDEBUG -DG_DISABLE_CAST_CHECKS"'
The problem here seems to come from the usage of a decoding thread to decode gif images. I've been able to reproduce the cash and what seems to be happening is: - We want to show a gif image, so we create ImageFrameCache decides that the decoding of that image will happen in a secondary thread. It creates a WorkQueue and, for each of the frames requested, the WorkQueue calls m_decoder->createFrameImageAtIndex() in the decoding thread. - For some reason (maybe because it's hidden because of scrolling), the BitmapImage containing the gif gets resetAnimation() called. This stops the WorkQueue with the frame decodings and then it destroys all the data that has been decoded, and sometimes even the decoder used. Theoretically this is fine. - But what I'm seeing is that the WorkQueue is indeed being stopped, and no new frames are requested for decoding. But if there's an ongoing decoding in the seconday thread, it won't stop until it finishes. Due to this, there's the possibility that the decoder gets destroyed while it's still being used in the decoding thread, and this seems to be reason of the crashes, both this one and the similar ones reported.
More details. There are actually 2 different crashes, both of them happening because something unexpected happens while there's a decoding ongoing on the decoding thread. * The fist one affects only WebKitGtk+ code. It happens because GIFImageDecoder::clearFrameBufferCache() gets called, and that function deletes the GIFImageReader that is being used for decoding in the decoding thread. I'll upload a patch that avoid this. * The second crash affects multiplatform code, and it happens because the decoder being used in the decoding thread can be deleted while there's still decoding happening. The decoder is a unique_ptr stored in ImageSource, and that class is the one that sets it to ImageFrameCache, which is the one using it (it's stored as a simple pointer in ImageFrameCache). When ImageSource::clear() gets called, it sets the ImageFrameCache decoder to null, without ensuring somehow that it's not being used at that point by the decoding thread, and this causes the crash. I see 2 ways to fix this: * Using a lock to ensure that ImageFrameCache::stopAsyncDecodingQueue() doesn't return until the decoding thread has stopped processing the requested frames. That function gets called before ImageSource::clear() and its mission is to stop the decoding in the secondary thread, but currently it's possible that there's a frame being decoded when that function returns. If we add a lock there and in the function that performs the decoding in the secondary thread, we can be sure that there's no decoding happening and we can delete the decoder safely. I've tested this fix and it seems to work properly. * Another option is to turn the decoder into a RefPtr inside ImageSource, pass that RefPtr to ImageFrameCache and use it inside the function that performs the decoding in the secondary thread. This way ImageSource can unref the decoder when it wants, but it won't be destroyed until the decoding thread finishes using it. Which do you think is the best option?
Created attachment 303509 [details] Patch
(In reply to comment #11) > More details. > > There are actually 2 different crashes, both of them happening because > something unexpected happens while there's a decoding ongoing on the > decoding thread. > > * The fist one affects only WebKitGtk+ code. It happens because > GIFImageDecoder::clearFrameBufferCache() gets called, and that function > deletes the GIFImageReader that is being used for decoding in the decoding > thread. I'll upload a patch that avoid this. Thanks! > * The second crash affects multiplatform code, and it happens because the > decoder being used in the decoding thread can be deleted while there's still > decoding happening. > The decoder is a unique_ptr stored in ImageSource, and that class is the one > that sets it to ImageFrameCache, which is the one using it (it's stored as a > simple pointer in ImageFrameCache). > When ImageSource::clear() gets called, it sets the ImageFrameCache decoder > to null, without ensuring somehow that it's not being used at that point by > the decoding thread, and this causes the crash. > > I see 2 ways to fix this: > > * Using a lock to ensure that ImageFrameCache::stopAsyncDecodingQueue() > doesn't return until the decoding thread has stopped processing the > requested frames. That function gets called before ImageSource::clear() and > its mission is to stop the decoding in the secondary thread, but currently > it's possible that there's a frame being decoded when that function returns. > If we add a lock there and in the function that performs the decoding in the > secondary thread, we can be sure that there's no decoding happening and we > can delete the decoder safely. I've tested this fix and it seems to work > properly. > > * Another option is to turn the decoder into a RefPtr inside ImageSource, > pass that RefPtr to ImageFrameCache and use it inside the function that > performs the decoding in the secondary thread. This way ImageSource can > unref the decoder when it wants, but it won't be destroyed until the > decoding thread finishes using it. > > Which do you think is the best option? Let's file a new bug report to discuss this one, then. I would say we should make ImageDecoder refcounted. Said, what do you think?
Comment on attachment 303509 [details] Patch Clearing flags on attachment: 303509 Committed r213448: <http://trac.webkit.org/changeset/213448>
All reviewed patches have been landed. Closing bug.
Miguel, could you please check the See Also bugs to see if any of those are duplicates?
(In reply to comment #16) > Miguel, could you please check the See Also bugs to see if any of those are > duplicates? All of them are the same issue, I would say, but maybe they are not fixed by this one, but by #169199
(In reply to comment #16) > Miguel, could you please check the See Also bugs to see if any of those are > duplicates? As Carlos said, all of them are the same issue, but they are not completely fixed by the patch in this bug. They'll be fixed when 169199 gets fixed. We can close them as duplicates in the meantime if you want. I was planning on doing so when closing 169199