Bug 111179

Summary: [Cairo] Surface pointer passed to asNewNativeImage() might be freed.
Product: WebKit Reporter: Zoltan Herczeg <zherczeg>
Component: PlatformAssignee: Martin Robinson <mrobinson>
Status: NEW    
Severity: Normal CC: alex, allan.jensen, alp, bugs-noreply, buildbot, cand, ed, mcatanzaro, mrobinson, noam, obzhirov, rniwa, ssseintr2, WebKit, xytsoft
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
early draft patch
none
patch
none
refcount-imageframe-data.patch
none
imageframe-use-refcounted-array.patch
none
Workaround patch
none
Archive of layout-test-results from webkit-ews-10 for mac-mountainlion-wk2
none
Spinning cube files for demonstrating crashes.
none
Patch
none
Patch none

Zoltan Herczeg
Reported 2013-03-01 07:12:21 PST
Created attachment 190964 [details] early draft patch Hi Martin, you made a patch a year ago which refactored how Cairo wrap surfaces. The side effect of your "just a refactoring" patch is that when the frame data is moved, the cairo surface does not know about it. For example, this might happen in GIFImageDecoder.cpp:90 m_frameBufferCache.resize(reader.images_count); // Reallocates all pixel buffers. When the following image is loaded: <html><body> <img width=90 height=90 src="http://images.animationfactory.com/thw/thw17/AFCT/20090116/FTW/i/12316066.gif?mouse_new_year_2012_confetti_sm_wm"> </body></html> This was actually a really nasty bug, which took me a week to debug it, because it unfortunately disappears when valgrind or gdb is used. It appears only when WebKit execution is fast and the download speed is slow (some kind of racing condition). So I finally used mprotect to force a crash on memory write. A draft patch is attached, but I don't think this is the best solution. Any suggestions are welcome.
Attachments
early draft patch (866 bytes, patch)
2013-03-01 07:12 PST, Zoltan Herczeg
no flags
patch (2.14 KB, patch)
2013-03-12 03:19 PDT, Zoltan Herczeg
no flags
refcount-imageframe-data.patch (2.83 KB, patch)
2014-04-10 10:17 PDT, Ed Catmur
no flags
imageframe-use-refcounted-array.patch (2.36 KB, patch)
2014-04-10 10:20 PDT, Ed Catmur
no flags
Workaround patch (1.78 KB, patch)
2014-06-16 02:23 PDT, cand
no flags
Archive of layout-test-results from webkit-ews-10 for mac-mountainlion-wk2 (598.34 KB, application/zip)
2014-06-16 16:13 PDT, Build Bot
no flags
Spinning cube files for demonstrating crashes. (281.72 KB, application/zip)
2015-08-09 09:53 PDT, Jeremy Zerfas
no flags
Patch (2.55 KB, patch)
2016-03-07 14:38 PST, Martin Robinson
no flags
Patch (2.52 KB, patch)
2016-03-09 19:08 PST, Martin Robinson
no flags
Martin Robinson
Comment 1 2013-03-01 09:35:03 PST
Thanks for finding this! I think it's probably safe to just rip out NativeImageCairo altogether. That's the patch you are referring to right? I think that we can instead do all per-backend caching via Cairo surface data and PlatformContextCairo.
Zoltan Herczeg
Comment 2 2013-03-01 14:19:09 PST
Sorry, I forgot the bug report: https://bugs.webkit.org/show_bug.cgi?id=83611 What do you mean by ripping out NativeImageCairo? What should we use instead?
Martin Robinson
Comment 3 2013-03-05 10:25:43 PST
(In reply to comment #2) > Sorry, I forgot the bug report: > https://bugs.webkit.org/show_bug.cgi?id=83611 > > What do you mean by ripping out NativeImageCairo? What should we use instead? Essentially I mean that we could replace the NativeImageCairo class with a pointer to a raw cairo_surface_t in the way it was before I introduced NativeImageCairo.
Zoltan Herczeg
Comment 4 2013-03-08 05:55:34 PST
> Essentially I mean that we could replace the NativeImageCairo class with a pointer to a raw cairo_surface_t in the way it was before I introduced NativeImageCairo. I realized this will not help. The problem is, that ImageFrame has a Vector<PixelData> m_backingStore member, which is reallocated when the frame is copied, so the original buffer becomes invalid. However, the image surface created by the ImageFrame still uses the old buffer. The point is, in case of resize (and only in this case!) the m_backingStore should keep its value.
Zoltan Herczeg
Comment 5 2013-03-12 03:19:35 PDT
Created attachment 192693 [details] patch This is the best solution I can come up with. Please review.
Martin Robinson
Comment 6 2013-03-18 10:42:21 PDT
There are a couple things that I don't understand here. The first is why simply reverting my original patch doesn't fix the issue, if it did cause it. The second is why this isn't a problem for other platforms.
Zoltan Herczeg
Comment 7 2013-03-18 11:46:41 PDT
(In reply to comment #6) > There are a couple things that I don't understand here. The first is why simply reverting my original patch doesn't fix the issue, if it did cause it. The second is why this isn't a problem for other platforms. Q1: the surface points directly to the data() member of m_backingStore. So it doesn't matter if we have a NativeImage or we simply use a cairo surface, the issue is that the rgba array is freed. I don't think your patch caused it, perhaps just noone noticed that before. Q2: Because they probably duplicate the buffer. That is safer, but the consumed memory is duplicated as well. I like the approach of cairo, it is clever.
Noam Rosenthal
Comment 8 2013-04-26 02:26:21 PDT
(In reply to comment #6) > There are a couple things that I don't understand here. The first is why simply reverting my original patch doesn't fix the issue, if it did cause it. The second is why this isn't a problem for other platforms. The same problem occurs for Qt.
Noam Rosenthal
Comment 9 2013-04-26 02:27:29 PDT
Comment on attachment 192693 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=192693&action=review How about using a SegmentedVector for m_frameBufferCache instead? > Source/WebCore/platform/image-decoders/ImageDecoder.h:245 > +#if USE(CAIRO) Since this occurs for Qt as well, maybe we should have something more general.
Zoltan Herczeg
Comment 10 2013-05-03 03:50:50 PDT
> How about using a SegmentedVector for m_frameBufferCache instead? Needs a lot of changes. Iterators, missing members (such as resize), etc. Probably most of them are trivial, but there are many of them. Do you think it is worth it?
Allan Sandfeld Jensen
Comment 11 2013-06-13 02:46:37 PDT
As far as I can tell, Qt now does a full copy of the data when creating a QPixmap, so we don't have the same bug currently.
Noam Rosenthal
Comment 12 2013-06-18 03:06:20 PDT
Martin, could you give this another look?
Martin Robinson
Comment 13 2013-06-20 12:27:28 PDT
(In reply to comment #11) > As far as I can tell, Qt now does a full copy of the data when creating a QPixmap, so we don't have the same bug currently. If both Qt and CG are doing this, it's probably much safer to follow the crowd. While this might be a minor optimization in Cairo's case, it seems that we should maintain consistent class behavior across platforms. If CG doesn't copy the buffer, we probably need to understand why it isn't an issue there.
Anton Obzhirov
Comment 14 2013-07-08 08:00:39 PDT
I've tried this patch for https://bugs.webkit.org/show_bug.cgi?id=16200 - seems to fix the issue. However I think it might be not the optimal solution as it involves extra copying of the image buffer. Did you guys consider some extra referencing mechanism for the data inside image frame?
Zoltan Herczeg
Comment 15 2013-07-08 23:39:19 PDT
(In reply to comment #14) > I've tried this patch for https://bugs.webkit.org/show_bug.cgi?id=16200 - seems to fix the issue. However I think it might be not the optimal solution as it involves extra copying of the image buffer. Did you guys consider some extra referencing mechanism for the data inside image frame? No, it actually makes the extra copy unnecessary, which happens when the frames array is resized. The copy constructor is the not optimal solution in this case.
Andreas Kling
Comment 16 2014-02-05 11:16:48 PST
Comment on attachment 192693 [details] patch Clearing review flag on patches from before 2014. If this patch is still relevant, please reset the r? flag.
Ed Catmur
Comment 17 2014-04-10 09:22:28 PDT
Is this patch likely to be accepted? The only alternative I can see is on Cairo (and only on Cairo) to replace m_backingStore with a PassNativeImagePtr that gets refcount-incremented when the ImageFrame is copied (when the frame buffer cache is resized). I can code up an alternative patch if it would be preferred.
Ed Catmur
Comment 18 2014-04-10 10:17:21 PDT
Created attachment 229057 [details] refcount-imageframe-data.patch Alternative 1: use a refcounted Cairo image surface
Martin Robinson
Comment 19 2014-04-10 10:20:16 PDT
This reason this hasn't been fixed is that it's simply missing a minimal reproducible test case. If someone could indicate how to reproduce this issue I'd be happy to take a closer look.
Ed Catmur
Comment 20 2014-04-10 10:20:37 PDT
Created attachment 229059 [details] imageframe-use-refcounted-array.patch Alternative 2: use a RefCountedArray instead of Vector to hold the image data This is probably more straightforward. copyImageData is a little inefficient (it unnecessarily memsets the array) but we'd need to add to RefCountedArray (maybe a clone() method) to fix that.
Ed Catmur
Comment 21 2014-04-10 10:25:23 PDT
(In reply to comment #19) > This reason this hasn't been fixed is that it's simply missing a minimal reproducible test case. If someone could indicate how to reproduce this issue I'd be happy to take a closer look. This page reliably does it for me: http://math.stackexchange.com/questions/733754/visually-stunning-math-concepts-which-are-easy-to-explain Essentially you need a page with large animations with plenty of frames and where the frame buffer cache gets resized while the animation is drawing.
Martin Robinson
Comment 22 2014-04-10 10:48:00 PDT
(In reply to comment #21) > Essentially you need a page with large animations with plenty of frames and where the frame buffer cache gets resized while the animation is drawing. I cannot reproduce just by viewing the page in MiniBrowser. Do I need a particular port or version of Cairo to see the crash?
Ed Catmur
Comment 23 2014-04-10 14:52:28 PDT
(In reply to comment #22) > I cannot reproduce just by viewing the page in MiniBrowser. Do I need a particular port or version of Cairo to see the crash? Shouldn't think so - the crash occurs when Cairo reads from freed memory, so it won't make any difference what the version of Cairo is. It's more likely to depend on what allocator you're using - it might help to switch to the system allocator (#define USE_SYSTEM_MALLOC 1). You also need to give the allocator a chance to return pages to the OS; either that or switch to an allocator that poisons freed memory.
Zoltan Herczeg
Comment 24 2014-04-11 00:12:41 PDT
You need an animated image with more than one frame to see invalid result or crash. The default size of image list is 1, since most images are still images, so two frames are enough. However, if the resize happens in place, or the resized area contains pointers to valid memory regions (already mapped to the process address space) you will not see the crash. In the former case, the engine seems work correctly. I would suggest a small (8x8) animated gif with 100+ frames (to avoid in place resizes), and do a pixel dump (invalid output can be detected).
cand
Comment 25 2014-06-16 01:19:50 PDT
(In reply to comment #20) > Created an attachment (id=229059) [details] > imageframe-use-refcounted-array.patch > > Alternative 2: use a RefCountedArray instead of Vector to hold the image data I'm using this patch, and google.com still randomly crashes. Doesn't really help that they change the gif each day. Segfault in ImageFrame::copyBitmapData on the memcpy line. Only happens standalone and in gdb, I could not get it to happen in valgrind. It's called from the resize line in GIFImageDecoder, but the backtrace is not really useful beyond that.
cand
Comment 26 2014-06-16 01:58:07 PDT
The alternative#2 patch also had visible corruption in the gif before the crash. The "make a copy" approach from https://bugs.webkit.org/show_bug.cgi?id=16200#c11 works, but drops performance to unacceptably low numbers. There is no corruption or crash, but the gif plays at 1/4 the speed it should. Next I will test alt#1.
cand
Comment 27 2014-06-16 02:02:26 PDT
(In reply to comment #18) > Created an attachment (id=229057) [details] > refcount-imageframe-data.patch > > Alternative 1: use a refcounted Cairo image surface This patch crashes in the same place as alt#2.
cand
Comment 28 2014-06-16 02:04:31 PDT
(In reply to comment #27) > (In reply to comment #18) > > Created an attachment (id=229057) [details] [details] > > refcount-imageframe-data.patch > > > > Alternative 1: use a refcounted Cairo image surface > > This patch crashes in the same place as alt#2. Sorry, clarifying: it also crashes in memcpy called from resize. Obviously the memcpy is in a different place.
cand
Comment 29 2014-06-16 02:23:48 PDT
Created attachment 233153 [details] Workaround patch This is what I will deploy for now. No crashes, and once the gif is loaded, normal performance. It disables the 5mb/ios 2mb animated limit, removing memcopies from runtime, but using more RAM. The GIF decoder is simply too buggy to handle the runtime resize in its current state. It also seems to decode all images fully, when only a small part may be needed. But there are other bugs open on the optimization of the gif decoder.
Martin Robinson
Comment 30 2014-06-16 08:46:13 PDT
Comment on attachment 233153 [details] Workaround patch View in context: https://bugs.webkit.org/attachment.cgi?id=233153&action=review > Source/WebCore/platform/image-decoders/cairo/ImageDecoderCairo.cpp:39 > + cairo_surface_t* surface = cairo_image_surface_create(CAIRO_FORMAT_ARGB32, width(), height()); > + unsigned char* data = cairo_image_surface_get_data(surface); > + memcpy(data, m_bytes, width() * height() * sizeof(PixelData)); > + cairo_surface_mark_dirty(surface); > + return adoptRef(surface); I don't know what to say about the patch, but I do know this is wrong because you aren't taking into account the image surface stride.
Build Bot
Comment 31 2014-06-16 16:13:17 PDT
Comment on attachment 233153 [details] Workaround patch Attachment 233153 [details] did not pass mac-wk2-ews (mac-wk2): Output: http://webkit-queues.appspot.com/results/6613769119596544 New failing tests: compositing/absolute-inside-out-of-view-fixed.html animations/3d/change-transform-in-end-event.html accessibility/accessibility-node-memory-management.html animations/added-while-suspended.html canvas/philip/tests/2d.canvas.readonly.html
Build Bot
Comment 32 2014-06-16 16:13:22 PDT
Created attachment 233200 [details] Archive of layout-test-results from webkit-ews-10 for mac-mountainlion-wk2 The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews. Bot: webkit-ews-10 Port: mac-mountainlion-wk2 Platform: Mac OS X 10.8.5
Jeremy Zerfas
Comment 33 2015-08-09 09:50:53 PDT
(In reply to comment #22) > I cannot reproduce just by viewing the page in MiniBrowser. Do I need a > particular port or version of Cairo to see the crash? The occurrence of these crashes also seem to be influenced by the bit rate of the GIF, the transfer rate from the server, and whether the GIF is offscreen while it's loading which might explain why you're having trouble reproducing these crashes. At least that seems to be the case based on my testing of WinLauncher.exe from a release build of the Windows Cairo port on Windows 10 x86_64. The GIF linked to from the original bug is pretty small and doesn't seem to reliably produce any crashes. However if the transfer rate is limited to less than 200 KBps or so it does seem to reliably produce a corrupt animation (images are offset or have random noise in them). The animated GIFs on the Stack Exchange page won't cause any problems either unless the window size is large enough to make the animated GIF in the first answer visible while the page is loading. I've made a spinning cube animation that should reliably demonstrate this problem, you can view it at http://www.jeremyzerfas.com/WebKit/Cairo_Animated_GIF_Crashing/Spinning_Cube_500x500.gif . To demonstrate that the crashes only occur when the animated GIFs are onscreen, I've also created a page at http://www.jeremyzerfas.com/WebKit/Cairo_Animated_GIF_Crashing/Spinning_Cube_Page_with_Animation_at_the_Bottom.html that includes the same animation but shows it on a long page that you need to scroll down in order to make the animation visible. I'll attach the two files in case I remove these files later but keep in mind that the crashes also don't seem to occur if the animation is loaded from the file system instead of from a web server. Zoltan's patch in attachment 192693 [details] seems to work well for fixing this bug and looks like a good fix for now.
Jeremy Zerfas
Comment 34 2015-08-09 09:53:11 PDT
Created attachment 258593 [details] Spinning cube files for demonstrating crashes.
Martin Robinson
Comment 35 2016-03-07 14:38:52 PST
Martin Robinson
Comment 36 2016-03-07 14:39:46 PST
Jeremy, sorry for the super late response. Do you mind testing the patch that I have posted? Copying the memory might be a performance hit, but avoiding the copying is a much more complicated task and it's better to prevent crashes now.
Darin Adler
Comment 37 2016-03-08 09:27:48 PST
Comment on attachment 273214 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=273214&action=review Seems really sad and bad for performance to be copying the image data all the time instead of being able to reference-count the already allocated buffer so in the normal case no copying is necessary. But that’s presumably a bigger project. > Source/WebCore/ChangeLog:11 > + No new tests. This issue is incredibly illusive and I haven't been > + able to reproduce it locally. It depends very much on the network speed > + and the size of the animated GIF. The code in question seems obviously > + wrong. Typo: we mean "elusive" instead of "illusive" Sad to hear we weren’t able to create a test. > Source/WebCore/platform/image-decoders/cairo/ImageDecoderCairo.cpp:36 > + PixelData* copiedData = static_cast<PixelData*>(fastMalloc(width() * height() * sizeof(PixelData))); > + memcpy(copiedData, m_bytes, width() * height() * sizeof(PixelData)); Might consider new/delete instead of fastMalloc/fastFree. PixelData* copiedData = new PixelData[width() * height()]; Then, below: delete [] static_cast<PixelData*>(data); Might also consider putting the fastMalloc closer to the fastFree by allocating the copied data *after* creating the nativeImage. > Source/WebCore/platform/image-decoders/cairo/ImageDecoderCairo.cpp:40 > + RefPtr<cairo_surface_t> nativeImage = adoptRef(cairo_image_surface_create_for_data( > + reinterpret_cast<unsigned char*>(const_cast<PixelData*>(copiedData)), > CAIRO_FORMAT_ARGB32, width(), height(), width() * sizeof(PixelData))); Can this ever return null? The old code would return nullptr, but the new code will crash in this case. I also suggest auto for the local variable type here rather than explicitly stating RefPtr<cairo_surface_t>.
Martin Robinson
Comment 38 2016-03-08 10:07:29 PST
(In reply to comment #37) > Comment on attachment 273214 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=273214&action=review > > Seems really sad and bad for performance to be copying the image data all > the time instead of being able to reference-count the already allocated > buffer so in the normal case no copying is necessary. But that’s presumably > a bigger project. Thanks for the review. I'm going to hold off committing until I can confirm that this fixes the original issue (probably I should have prevented the r? flag from activating). As a side note, I have investigated using reference counting to preserve the memory. It is possible, but it has potential negatively implications for image decoding performance elsewhere. The issue is that currently the image decoder first tries to resize the backing store vector when the frame size changes. I believe that Vector can sometimes do this without reallocating, but if we use a shared block of memory we always need to reallocate. I'm not sure what is the correct performance tradeoff. > > > Source/WebCore/ChangeLog:11 > > + No new tests. This issue is incredibly illusive and I haven't been > > + able to reproduce it locally. It depends very much on the network speed > > + and the size of the animated GIF. The code in question seems obviously > > + wrong. > > Typo: we mean "elusive" instead of "illusive" Yikes! Thanks. > Sad to hear we weren’t able to create a test. > > > Source/WebCore/platform/image-decoders/cairo/ImageDecoderCairo.cpp:36 > > + PixelData* copiedData = static_cast<PixelData*>(fastMalloc(width() * height() * sizeof(PixelData))); > > + memcpy(copiedData, m_bytes, width() * height() * sizeof(PixelData)); > > Might consider new/delete instead of fastMalloc/fastFree. > > PixelData* copiedData = new PixelData[width() * height()]; > > Then, below: > > delete [] static_cast<PixelData*>(data); > > Might also consider putting the fastMalloc closer to the fastFree by > allocating the copied data *after* creating the nativeImage. > > > Source/WebCore/platform/image-decoders/cairo/ImageDecoderCairo.cpp:40 > > + RefPtr<cairo_surface_t> nativeImage = adoptRef(cairo_image_surface_create_for_data( > > + reinterpret_cast<unsigned char*>(const_cast<PixelData*>(copiedData)), > > CAIRO_FORMAT_ARGB32, width(), height(), width() * sizeof(PixelData))); > > Can this ever return null? The old code would return nullptr, but the new > code will crash in this case. > > I also suggest auto for the local variable type here rather than explicitly > stating RefPtr<cairo_surface_t>. Okay. I will make all these changes.
cand
Comment 39 2016-03-08 10:28:20 PST
Martin, I find it funny you complained about my patch not taking stride into account, and now your patch, which is clearly based on mine, does not do so either. I've used my patch for over a year now without issues. FWIW, the issue was very easy to trigger for me, taking just seconds and a couple refreshes of google.com.
Martin Robinson
Comment 40 2016-03-08 10:43:26 PST
(In reply to comment #39) > Martin, I find it funny you complained about my patch not taking stride into > account, and now your patch, which is clearly based on mine, does not do so > either. After looking more closely at this code and understanding the problem more thoroughly, it's clear that strides that are not 32-bits per pixel will fail anyway. We should likely have a compile-time check in the image decoder code for this, if we can. Sorry for the misunderstanding. I'm going to take a look at your reference counting patch again though.
Martin Robinson
Comment 41 2016-03-09 19:02:04 PST
(In reply to comment #37) > Comment on attachment 273214 [details] > Patch > Might also consider putting the fastMalloc closer to the fastFree by > allocating the copied data *after* creating the nativeImage. I'm actually not sure this is possible. We need to have a pointer to the data when we call cairo_image_surface_create_for_data to construct the surface. There also is no way (as far as I know) to create the surface and then set the data later. > > Source/WebCore/platform/image-decoders/cairo/ImageDecoderCairo.cpp:40 > > + RefPtr<cairo_surface_t> nativeImage = adoptRef(cairo_image_surface_create_for_data( > > + reinterpret_cast<unsigned char*>(const_cast<PixelData*>(copiedData)), > > CAIRO_FORMAT_ARGB32, width(), height(), width() * sizeof(PixelData))); > > Can this ever return null? The old code would return nullptr, but the new > code will crash in this case. A creation failure should return a Cairo error surface, so this shouldn't crash.
Martin Robinson
Comment 42 2016-03-09 19:08:23 PST
Jeremy Zerfas
Comment 43 2016-03-12 07:36:30 PST
(In reply to comment #36) > Do you mind testing the patch that I have posted? I've tested the patches in attachment 273214 [details] and attachment 273528 [details] and both patches do work for fixing the crashes.
Michael Catanzaro
Comment 44 2017-03-06 10:34:40 PST
*** Bug 16200 has been marked as a duplicate of this bug. ***
Michael Catanzaro
Comment 45 2017-03-06 10:34:48 PST
*** Bug 48895 has been marked as a duplicate of this bug. ***
Michael Catanzaro
Comment 46 2017-03-06 10:35:00 PST
*** Bug 80754 has been marked as a duplicate of this bug. ***
Michael Catanzaro
Comment 47 2017-03-06 10:37:53 PST
The patch looks correct. Martin, were you planning to commit this? I guess it's probably rotted a bit by now, but couldn't hurt to try commit-queue...?
Note You need to log in before you can comment on or make changes to this bug.