WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
NEW
111179
[Cairo] Surface pointer passed to asNewNativeImage() might be freed.
https://bugs.webkit.org/show_bug.cgi?id=111179
Summary
[Cairo] Surface pointer passed to asNewNativeImage() might be freed.
Zoltan Herczeg
Reported
2013-03-01 07:12:21 PST
Created
attachment 190964
[details]
early draft patch Hi Martin, you made a patch a year ago which refactored how Cairo wrap surfaces. The side effect of your "just a refactoring" patch is that when the frame data is moved, the cairo surface does not know about it. For example, this might happen in GIFImageDecoder.cpp:90 m_frameBufferCache.resize(reader.images_count); // Reallocates all pixel buffers. When the following image is loaded: <html><body> <img width=90 height=90 src="
http://images.animationfactory.com/thw/thw17/AFCT/20090116/FTW/i/12316066.gif?mouse_new_year_2012_confetti_sm_wm
"> </body></html> This was actually a really nasty bug, which took me a week to debug it, because it unfortunately disappears when valgrind or gdb is used. It appears only when WebKit execution is fast and the download speed is slow (some kind of racing condition). So I finally used mprotect to force a crash on memory write. A draft patch is attached, but I don't think this is the best solution. Any suggestions are welcome.
Attachments
early draft patch
(866 bytes, patch)
2013-03-01 07:12 PST
,
Zoltan Herczeg
no flags
Details
Formatted Diff
Diff
patch
(2.14 KB, patch)
2013-03-12 03:19 PDT
,
Zoltan Herczeg
no flags
Details
Formatted Diff
Diff
refcount-imageframe-data.patch
(2.83 KB, patch)
2014-04-10 10:17 PDT
,
Ed Catmur
no flags
Details
Formatted Diff
Diff
imageframe-use-refcounted-array.patch
(2.36 KB, patch)
2014-04-10 10:20 PDT
,
Ed Catmur
no flags
Details
Formatted Diff
Diff
Workaround patch
(1.78 KB, patch)
2014-06-16 02:23 PDT
,
cand
no flags
Details
Formatted Diff
Diff
Archive of layout-test-results from webkit-ews-10 for mac-mountainlion-wk2
(598.34 KB, application/zip)
2014-06-16 16:13 PDT
,
Build Bot
no flags
Details
Spinning cube files for demonstrating crashes.
(281.72 KB, application/zip)
2015-08-09 09:53 PDT
,
Jeremy Zerfas
no flags
Details
Patch
(2.55 KB, patch)
2016-03-07 14:38 PST
,
Martin Robinson
no flags
Details
Formatted Diff
Diff
Patch
(2.52 KB, patch)
2016-03-09 19:08 PST
,
Martin Robinson
no flags
Details
Formatted Diff
Diff
Show Obsolete
(6)
View All
Add attachment
proposed patch, testcase, etc.
Martin Robinson
Comment 1
2013-03-01 09:35:03 PST
Thanks for finding this! I think it's probably safe to just rip out NativeImageCairo altogether. That's the patch you are referring to right? I think that we can instead do all per-backend caching via Cairo surface data and PlatformContextCairo.
Zoltan Herczeg
Comment 2
2013-03-01 14:19:09 PST
Sorry, I forgot the bug report:
https://bugs.webkit.org/show_bug.cgi?id=83611
What do you mean by ripping out NativeImageCairo? What should we use instead?
Martin Robinson
Comment 3
2013-03-05 10:25:43 PST
(In reply to
comment #2
)
> Sorry, I forgot the bug report: >
https://bugs.webkit.org/show_bug.cgi?id=83611
> > What do you mean by ripping out NativeImageCairo? What should we use instead?
Essentially I mean that we could replace the NativeImageCairo class with a pointer to a raw cairo_surface_t in the way it was before I introduced NativeImageCairo.
Zoltan Herczeg
Comment 4
2013-03-08 05:55:34 PST
> Essentially I mean that we could replace the NativeImageCairo class with a pointer to a raw cairo_surface_t in the way it was before I introduced NativeImageCairo.
I realized this will not help. The problem is, that ImageFrame has a Vector<PixelData> m_backingStore member, which is reallocated when the frame is copied, so the original buffer becomes invalid. However, the image surface created by the ImageFrame still uses the old buffer. The point is, in case of resize (and only in this case!) the m_backingStore should keep its value.
Zoltan Herczeg
Comment 5
2013-03-12 03:19:35 PDT
Created
attachment 192693
[details]
patch This is the best solution I can come up with. Please review.
Martin Robinson
Comment 6
2013-03-18 10:42:21 PDT
There are a couple things that I don't understand here. The first is why simply reverting my original patch doesn't fix the issue, if it did cause it. The second is why this isn't a problem for other platforms.
Zoltan Herczeg
Comment 7
2013-03-18 11:46:41 PDT
(In reply to
comment #6
)
> There are a couple things that I don't understand here. The first is why simply reverting my original patch doesn't fix the issue, if it did cause it. The second is why this isn't a problem for other platforms.
Q1: the surface points directly to the data() member of m_backingStore. So it doesn't matter if we have a NativeImage or we simply use a cairo surface, the issue is that the rgba array is freed. I don't think your patch caused it, perhaps just noone noticed that before. Q2: Because they probably duplicate the buffer. That is safer, but the consumed memory is duplicated as well. I like the approach of cairo, it is clever.
Noam Rosenthal
Comment 8
2013-04-26 02:26:21 PDT
(In reply to
comment #6
)
> There are a couple things that I don't understand here. The first is why simply reverting my original patch doesn't fix the issue, if it did cause it. The second is why this isn't a problem for other platforms.
The same problem occurs for Qt.
Noam Rosenthal
Comment 9
2013-04-26 02:27:29 PDT
Comment on
attachment 192693
[details]
patch View in context:
https://bugs.webkit.org/attachment.cgi?id=192693&action=review
How about using a SegmentedVector for m_frameBufferCache instead?
> Source/WebCore/platform/image-decoders/ImageDecoder.h:245 > +#if USE(CAIRO)
Since this occurs for Qt as well, maybe we should have something more general.
Zoltan Herczeg
Comment 10
2013-05-03 03:50:50 PDT
> How about using a SegmentedVector for m_frameBufferCache instead?
Needs a lot of changes. Iterators, missing members (such as resize), etc. Probably most of them are trivial, but there are many of them. Do you think it is worth it?
Allan Sandfeld Jensen
Comment 11
2013-06-13 02:46:37 PDT
As far as I can tell, Qt now does a full copy of the data when creating a QPixmap, so we don't have the same bug currently.
Noam Rosenthal
Comment 12
2013-06-18 03:06:20 PDT
Martin, could you give this another look?
Martin Robinson
Comment 13
2013-06-20 12:27:28 PDT
(In reply to
comment #11
)
> As far as I can tell, Qt now does a full copy of the data when creating a QPixmap, so we don't have the same bug currently.
If both Qt and CG are doing this, it's probably much safer to follow the crowd. While this might be a minor optimization in Cairo's case, it seems that we should maintain consistent class behavior across platforms. If CG doesn't copy the buffer, we probably need to understand why it isn't an issue there.
Anton Obzhirov
Comment 14
2013-07-08 08:00:39 PDT
I've tried this patch for
https://bugs.webkit.org/show_bug.cgi?id=16200
- seems to fix the issue. However I think it might be not the optimal solution as it involves extra copying of the image buffer. Did you guys consider some extra referencing mechanism for the data inside image frame?
Zoltan Herczeg
Comment 15
2013-07-08 23:39:19 PDT
(In reply to
comment #14
)
> I've tried this patch for
https://bugs.webkit.org/show_bug.cgi?id=16200
- seems to fix the issue. However I think it might be not the optimal solution as it involves extra copying of the image buffer. Did you guys consider some extra referencing mechanism for the data inside image frame?
No, it actually makes the extra copy unnecessary, which happens when the frames array is resized. The copy constructor is the not optimal solution in this case.
Andreas Kling
Comment 16
2014-02-05 11:16:48 PST
Comment on
attachment 192693
[details]
patch Clearing review flag on patches from before 2014. If this patch is still relevant, please reset the r? flag.
Ed Catmur
Comment 17
2014-04-10 09:22:28 PDT
Is this patch likely to be accepted? The only alternative I can see is on Cairo (and only on Cairo) to replace m_backingStore with a PassNativeImagePtr that gets refcount-incremented when the ImageFrame is copied (when the frame buffer cache is resized). I can code up an alternative patch if it would be preferred.
Ed Catmur
Comment 18
2014-04-10 10:17:21 PDT
Created
attachment 229057
[details]
refcount-imageframe-data.patch Alternative 1: use a refcounted Cairo image surface
Martin Robinson
Comment 19
2014-04-10 10:20:16 PDT
This reason this hasn't been fixed is that it's simply missing a minimal reproducible test case. If someone could indicate how to reproduce this issue I'd be happy to take a closer look.
Ed Catmur
Comment 20
2014-04-10 10:20:37 PDT
Created
attachment 229059
[details]
imageframe-use-refcounted-array.patch Alternative 2: use a RefCountedArray instead of Vector to hold the image data This is probably more straightforward. copyImageData is a little inefficient (it unnecessarily memsets the array) but we'd need to add to RefCountedArray (maybe a clone() method) to fix that.
Ed Catmur
Comment 21
2014-04-10 10:25:23 PDT
(In reply to
comment #19
)
> This reason this hasn't been fixed is that it's simply missing a minimal reproducible test case. If someone could indicate how to reproduce this issue I'd be happy to take a closer look.
This page reliably does it for me:
http://math.stackexchange.com/questions/733754/visually-stunning-math-concepts-which-are-easy-to-explain
Essentially you need a page with large animations with plenty of frames and where the frame buffer cache gets resized while the animation is drawing.
Martin Robinson
Comment 22
2014-04-10 10:48:00 PDT
(In reply to
comment #21
)
> Essentially you need a page with large animations with plenty of frames and where the frame buffer cache gets resized while the animation is drawing.
I cannot reproduce just by viewing the page in MiniBrowser. Do I need a particular port or version of Cairo to see the crash?
Ed Catmur
Comment 23
2014-04-10 14:52:28 PDT
(In reply to
comment #22
)
> I cannot reproduce just by viewing the page in MiniBrowser. Do I need a particular port or version of Cairo to see the crash?
Shouldn't think so - the crash occurs when Cairo reads from freed memory, so it won't make any difference what the version of Cairo is. It's more likely to depend on what allocator you're using - it might help to switch to the system allocator (#define USE_SYSTEM_MALLOC 1). You also need to give the allocator a chance to return pages to the OS; either that or switch to an allocator that poisons freed memory.
Zoltan Herczeg
Comment 24
2014-04-11 00:12:41 PDT
You need an animated image with more than one frame to see invalid result or crash. The default size of image list is 1, since most images are still images, so two frames are enough. However, if the resize happens in place, or the resized area contains pointers to valid memory regions (already mapped to the process address space) you will not see the crash. In the former case, the engine seems work correctly. I would suggest a small (8x8) animated gif with 100+ frames (to avoid in place resizes), and do a pixel dump (invalid output can be detected).
cand
Comment 25
2014-06-16 01:19:50 PDT
(In reply to
comment #20
)
> Created an attachment (id=229059) [details] > imageframe-use-refcounted-array.patch > > Alternative 2: use a RefCountedArray instead of Vector to hold the image data
I'm using this patch, and google.com still randomly crashes. Doesn't really help that they change the gif each day. Segfault in ImageFrame::copyBitmapData on the memcpy line. Only happens standalone and in gdb, I could not get it to happen in valgrind. It's called from the resize line in GIFImageDecoder, but the backtrace is not really useful beyond that.
cand
Comment 26
2014-06-16 01:58:07 PDT
The alternative#2 patch also had visible corruption in the gif before the crash. The "make a copy" approach from
https://bugs.webkit.org/show_bug.cgi?id=16200#c11
works, but drops performance to unacceptably low numbers. There is no corruption or crash, but the gif plays at 1/4 the speed it should. Next I will test alt#1.
cand
Comment 27
2014-06-16 02:02:26 PDT
(In reply to
comment #18
)
> Created an attachment (id=229057) [details] > refcount-imageframe-data.patch > > Alternative 1: use a refcounted Cairo image surface
This patch crashes in the same place as alt#2.
cand
Comment 28
2014-06-16 02:04:31 PDT
(In reply to
comment #27
)
> (In reply to
comment #18
) > > Created an attachment (id=229057) [details] [details] > > refcount-imageframe-data.patch > > > > Alternative 1: use a refcounted Cairo image surface > > This patch crashes in the same place as alt#2.
Sorry, clarifying: it also crashes in memcpy called from resize. Obviously the memcpy is in a different place.
cand
Comment 29
2014-06-16 02:23:48 PDT
Created
attachment 233153
[details]
Workaround patch This is what I will deploy for now. No crashes, and once the gif is loaded, normal performance. It disables the 5mb/ios 2mb animated limit, removing memcopies from runtime, but using more RAM. The GIF decoder is simply too buggy to handle the runtime resize in its current state. It also seems to decode all images fully, when only a small part may be needed. But there are other bugs open on the optimization of the gif decoder.
Martin Robinson
Comment 30
2014-06-16 08:46:13 PDT
Comment on
attachment 233153
[details]
Workaround patch View in context:
https://bugs.webkit.org/attachment.cgi?id=233153&action=review
> Source/WebCore/platform/image-decoders/cairo/ImageDecoderCairo.cpp:39 > + cairo_surface_t* surface = cairo_image_surface_create(CAIRO_FORMAT_ARGB32, width(), height()); > + unsigned char* data = cairo_image_surface_get_data(surface); > + memcpy(data, m_bytes, width() * height() * sizeof(PixelData)); > + cairo_surface_mark_dirty(surface); > + return adoptRef(surface);
I don't know what to say about the patch, but I do know this is wrong because you aren't taking into account the image surface stride.
Build Bot
Comment 31
2014-06-16 16:13:17 PDT
Comment on
attachment 233153
[details]
Workaround patch
Attachment 233153
[details]
did not pass mac-wk2-ews (mac-wk2): Output:
http://webkit-queues.appspot.com/results/6613769119596544
New failing tests: compositing/absolute-inside-out-of-view-fixed.html animations/3d/change-transform-in-end-event.html accessibility/accessibility-node-memory-management.html animations/added-while-suspended.html canvas/philip/tests/2d.canvas.readonly.html
Build Bot
Comment 32
2014-06-16 16:13:22 PDT
Created
attachment 233200
[details]
Archive of layout-test-results from webkit-ews-10 for mac-mountainlion-wk2 The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews. Bot: webkit-ews-10 Port: mac-mountainlion-wk2 Platform: Mac OS X 10.8.5
Jeremy Zerfas
Comment 33
2015-08-09 09:50:53 PDT
(In reply to
comment #22
)
> I cannot reproduce just by viewing the page in MiniBrowser. Do I need a > particular port or version of Cairo to see the crash?
The occurrence of these crashes also seem to be influenced by the bit rate of the GIF, the transfer rate from the server, and whether the GIF is offscreen while it's loading which might explain why you're having trouble reproducing these crashes. At least that seems to be the case based on my testing of WinLauncher.exe from a release build of the Windows Cairo port on Windows 10 x86_64. The GIF linked to from the original bug is pretty small and doesn't seem to reliably produce any crashes. However if the transfer rate is limited to less than 200 KBps or so it does seem to reliably produce a corrupt animation (images are offset or have random noise in them). The animated GIFs on the Stack Exchange page won't cause any problems either unless the window size is large enough to make the animated GIF in the first answer visible while the page is loading. I've made a spinning cube animation that should reliably demonstrate this problem, you can view it at
http://www.jeremyzerfas.com/WebKit/Cairo_Animated_GIF_Crashing/Spinning_Cube_500x500.gif
. To demonstrate that the crashes only occur when the animated GIFs are onscreen, I've also created a page at
http://www.jeremyzerfas.com/WebKit/Cairo_Animated_GIF_Crashing/Spinning_Cube_Page_with_Animation_at_the_Bottom.html
that includes the same animation but shows it on a long page that you need to scroll down in order to make the animation visible. I'll attach the two files in case I remove these files later but keep in mind that the crashes also don't seem to occur if the animation is loaded from the file system instead of from a web server. Zoltan's patch in
attachment 192693
[details]
seems to work well for fixing this bug and looks like a good fix for now.
Jeremy Zerfas
Comment 34
2015-08-09 09:53:11 PDT
Created
attachment 258593
[details]
Spinning cube files for demonstrating crashes.
Martin Robinson
Comment 35
2016-03-07 14:38:52 PST
Created
attachment 273214
[details]
Patch
Martin Robinson
Comment 36
2016-03-07 14:39:46 PST
Jeremy, sorry for the super late response. Do you mind testing the patch that I have posted? Copying the memory might be a performance hit, but avoiding the copying is a much more complicated task and it's better to prevent crashes now.
Darin Adler
Comment 37
2016-03-08 09:27:48 PST
Comment on
attachment 273214
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=273214&action=review
Seems really sad and bad for performance to be copying the image data all the time instead of being able to reference-count the already allocated buffer so in the normal case no copying is necessary. But that’s presumably a bigger project.
> Source/WebCore/ChangeLog:11 > + No new tests. This issue is incredibly illusive and I haven't been > + able to reproduce it locally. It depends very much on the network speed > + and the size of the animated GIF. The code in question seems obviously > + wrong.
Typo: we mean "elusive" instead of "illusive" Sad to hear we weren’t able to create a test.
> Source/WebCore/platform/image-decoders/cairo/ImageDecoderCairo.cpp:36 > + PixelData* copiedData = static_cast<PixelData*>(fastMalloc(width() * height() * sizeof(PixelData))); > + memcpy(copiedData, m_bytes, width() * height() * sizeof(PixelData));
Might consider new/delete instead of fastMalloc/fastFree. PixelData* copiedData = new PixelData[width() * height()]; Then, below: delete [] static_cast<PixelData*>(data); Might also consider putting the fastMalloc closer to the fastFree by allocating the copied data *after* creating the nativeImage.
> Source/WebCore/platform/image-decoders/cairo/ImageDecoderCairo.cpp:40 > + RefPtr<cairo_surface_t> nativeImage = adoptRef(cairo_image_surface_create_for_data( > + reinterpret_cast<unsigned char*>(const_cast<PixelData*>(copiedData)), > CAIRO_FORMAT_ARGB32, width(), height(), width() * sizeof(PixelData)));
Can this ever return null? The old code would return nullptr, but the new code will crash in this case. I also suggest auto for the local variable type here rather than explicitly stating RefPtr<cairo_surface_t>.
Martin Robinson
Comment 38
2016-03-08 10:07:29 PST
(In reply to
comment #37
)
> Comment on
attachment 273214
[details]
> Patch > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=273214&action=review
> > Seems really sad and bad for performance to be copying the image data all > the time instead of being able to reference-count the already allocated > buffer so in the normal case no copying is necessary. But that’s presumably > a bigger project.
Thanks for the review. I'm going to hold off committing until I can confirm that this fixes the original issue (probably I should have prevented the r? flag from activating). As a side note, I have investigated using reference counting to preserve the memory. It is possible, but it has potential negatively implications for image decoding performance elsewhere. The issue is that currently the image decoder first tries to resize the backing store vector when the frame size changes. I believe that Vector can sometimes do this without reallocating, but if we use a shared block of memory we always need to reallocate. I'm not sure what is the correct performance tradeoff.
> > > Source/WebCore/ChangeLog:11 > > + No new tests. This issue is incredibly illusive and I haven't been > > + able to reproduce it locally. It depends very much on the network speed > > + and the size of the animated GIF. The code in question seems obviously > > + wrong. > > Typo: we mean "elusive" instead of "illusive"
Yikes! Thanks.
> Sad to hear we weren’t able to create a test. > > > Source/WebCore/platform/image-decoders/cairo/ImageDecoderCairo.cpp:36 > > + PixelData* copiedData = static_cast<PixelData*>(fastMalloc(width() * height() * sizeof(PixelData))); > > + memcpy(copiedData, m_bytes, width() * height() * sizeof(PixelData)); > > Might consider new/delete instead of fastMalloc/fastFree. > > PixelData* copiedData = new PixelData[width() * height()]; > > Then, below: > > delete [] static_cast<PixelData*>(data); > > Might also consider putting the fastMalloc closer to the fastFree by > allocating the copied data *after* creating the nativeImage. > > > Source/WebCore/platform/image-decoders/cairo/ImageDecoderCairo.cpp:40 > > + RefPtr<cairo_surface_t> nativeImage = adoptRef(cairo_image_surface_create_for_data( > > + reinterpret_cast<unsigned char*>(const_cast<PixelData*>(copiedData)), > > CAIRO_FORMAT_ARGB32, width(), height(), width() * sizeof(PixelData))); > > Can this ever return null? The old code would return nullptr, but the new > code will crash in this case. > > I also suggest auto for the local variable type here rather than explicitly > stating RefPtr<cairo_surface_t>.
Okay. I will make all these changes.
cand
Comment 39
2016-03-08 10:28:20 PST
Martin, I find it funny you complained about my patch not taking stride into account, and now your patch, which is clearly based on mine, does not do so either. I've used my patch for over a year now without issues. FWIW, the issue was very easy to trigger for me, taking just seconds and a couple refreshes of google.com.
Martin Robinson
Comment 40
2016-03-08 10:43:26 PST
(In reply to
comment #39
)
> Martin, I find it funny you complained about my patch not taking stride into > account, and now your patch, which is clearly based on mine, does not do so > either.
After looking more closely at this code and understanding the problem more thoroughly, it's clear that strides that are not 32-bits per pixel will fail anyway. We should likely have a compile-time check in the image decoder code for this, if we can. Sorry for the misunderstanding. I'm going to take a look at your reference counting patch again though.
Martin Robinson
Comment 41
2016-03-09 19:02:04 PST
(In reply to
comment #37
)
> Comment on
attachment 273214
[details]
> Patch
> Might also consider putting the fastMalloc closer to the fastFree by > allocating the copied data *after* creating the nativeImage.
I'm actually not sure this is possible. We need to have a pointer to the data when we call cairo_image_surface_create_for_data to construct the surface. There also is no way (as far as I know) to create the surface and then set the data later.
> > Source/WebCore/platform/image-decoders/cairo/ImageDecoderCairo.cpp:40 > > + RefPtr<cairo_surface_t> nativeImage = adoptRef(cairo_image_surface_create_for_data( > > + reinterpret_cast<unsigned char*>(const_cast<PixelData*>(copiedData)), > > CAIRO_FORMAT_ARGB32, width(), height(), width() * sizeof(PixelData))); > > Can this ever return null? The old code would return nullptr, but the new > code will crash in this case.
A creation failure should return a Cairo error surface, so this shouldn't crash.
Martin Robinson
Comment 42
2016-03-09 19:08:23 PST
Created
attachment 273528
[details]
Patch
Jeremy Zerfas
Comment 43
2016-03-12 07:36:30 PST
(In reply to
comment #36
)
> Do you mind testing the patch that I have posted?
I've tested the patches in
attachment 273214
[details]
and
attachment 273528
[details]
and both patches do work for fixing the crashes.
Michael Catanzaro
Comment 44
2017-03-06 10:34:40 PST
***
Bug 16200
has been marked as a duplicate of this bug. ***
Michael Catanzaro
Comment 45
2017-03-06 10:34:48 PST
***
Bug 48895
has been marked as a duplicate of this bug. ***
Michael Catanzaro
Comment 46
2017-03-06 10:35:00 PST
***
Bug 80754
has been marked as a duplicate of this bug. ***
Michael Catanzaro
Comment 47
2017-03-06 10:37:53 PST
The patch looks correct. Martin, were you planning to commit this? I guess it's probably rotted a bit by now, but couldn't hurt to try commit-queue...?
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug