Bug 165751 - ImageBufferCairo: cairo_image_surface should use bmalloc-allocated memory
Summary: ImageBufferCairo: cairo_image_surface should use bmalloc-allocated memory
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Zan Dobersek
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-12-12 02:40 PST by Zan Dobersek
Modified: 2017-01-26 01:23 PST (History)
5 users (show)

See Also:


Attachments
Patch (3.10 KB, patch)
2016-12-12 03:49 PST, Zan Dobersek
no flags Details | Formatted Diff | Diff
Patch (3.00 KB, patch)
2017-01-25 08:39 PST, Zan Dobersek
no flags Details | Formatted Diff | Diff
nytimes.com analysis (17.34 KB, text/plain)
2017-01-25 08:40 PST, Zan Dobersek
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Zan Dobersek 2016-12-12 02:40:45 PST
ImageBufferCairo: cairo_image_surface should use bmalloc-allocated memory
Comment 1 Zan Dobersek 2016-12-12 03:49:07 PST
Created attachment 296911 [details]
Patch
Comment 2 Michael Catanzaro 2016-12-21 12:35:59 PST
Comment on attachment 296911 [details]
Patch

The problem I see with this patch is that the cairo_surface_t is refcounted, but the data buffer is not, so it's not hard to imagine it being destroyed before the cairo_surface_t itself if any other part of the code ever decides to ref the cairo_surface_t, which would not be unreasonable. If you think this provides a significant performance enhancement then you should find some way to tie the lifetime of the buffer to that of the cairo_surface_t, instead of assuming that the ImageBufferDataCairo is the only object that can ever hold a reference to it. Unfortunately I don't see an easy way to do that; the cairo_surface_t is not a GObject, so you can't use a weak reference to tell you when it's destroyed, for instance. One non-ideal solution would be to add a manual call to cairo_surface_finish directly in the destructor, to guaranteed that happens even if there is somehow a ref outstanding, at least other code using the cairo_surface_t would turn into a CAIRO_STATUS_SURFACE_FINISHED error rather than a use after free vulnerability.

There's also a behavior change here, in that previously we create an empty/null/zero/default CairoImageSurface, but now we create a CarioImageSurface that's initialized using uninitialized data. (Note that MallocPtr is using fastMalloc and not fastZeroedMalloc.) Was that intentional? Even if that's not a problem for whatever reason (e.g. if it's always painted over right away), it does not seem like a good idea to pass uninitialized memory to cairo; does it not cause any problems with valgrind? Maybe you should ditch the MallocPtr and manually use fastZeroedMalloc instead? If zeroing the memory does not achieve satisfactory  performance, then I'd like to see an explanation of why it's not a problem to use uninitialized memory here, and your assurance that it doesn't trip asan or valgrind.
Comment 3 Michael Catanzaro 2016-12-21 12:38:06 PST
(I'm assuming that bmalloc gives a significant performance improvement here, for some value of "significant." If you don't see a significant improvement, then it's much simpler to just not change anything and let cairo own the memory.)
Comment 4 Michael Catanzaro 2016-12-21 12:40:04 PST
(In reply to comment #2)
> to guaranteed that happens even if there is somehow a ref
> outstanding

I meant to write: "...to guarantee that even if there is..."

> There's also a behavior change here, in that previously we create an
> empty/null/zero/default CairoImageSurface, but now we create a
> CarioImageSurface that's initialized using uninitialized data.

I meant to write cario_surface_t, not CairoImageSurface.
Comment 5 Carlos Garcia Campos 2016-12-22 00:05:06 PST
(In reply to comment #2)
> Comment on attachment 296911 [details]
> Patch
> 
> The problem I see with this patch is that the cairo_surface_t is refcounted,
> but the data buffer is not, so it's not hard to imagine it being destroyed
> before the cairo_surface_t itself if any other part of the code ever decides
> to ref the cairo_surface_t, which would not be unreasonable. If you think
> this provides a significant performance enhancement then you should find
> some way to tie the lifetime of the buffer to that of the cairo_surface_t,
> instead of assuming that the ImageBufferDataCairo is the only object that
> can ever hold a reference to it. Unfortunately I don't see an easy way to do
> that; the cairo_surface_t is not a GObject, so you can't use a weak
> reference to tell you when it's destroyed, for instance. One non-ideal
> solution would be to add a manual call to cairo_surface_finish directly in
> the destructor, to guaranteed that happens even if there is somehow a ref
> outstanding, at least other code using the cairo_surface_t would turn into a
> CAIRO_STATUS_SURFACE_FINISHED error rather than a use after free
> vulnerability.

Cairo already provides a way to do this. cairo_surface_set_user_data(). That's what we use in ShareableBitmapCairo, for example. Maybe we can make this more generic and add a helper to CairoUtilities to create image surfaces. If the size is big enough, we allocate the memory and use cairo_image_surface_create_for_data + cairo_surface_set_user_data. Then we can replace any call to cairo_image_surface_create() with the helper.

> There's also a behavior change here, in that previously we create an
> empty/null/zero/default CairoImageSurface, but now we create a
> CarioImageSurface that's initialized using uninitialized data. (Note that
> MallocPtr is using fastMalloc and not fastZeroedMalloc.) Was that
> intentional? Even if that's not a problem for whatever reason (e.g. if it's
> always painted over right away), it does not seem like a good idea to pass
> uninitialized memory to cairo; does it not cause any problems with valgrind?
> Maybe you should ditch the MallocPtr and manually use fastZeroedMalloc
> instead? If zeroing the memory does not achieve satisfactory  performance,
> then I'd like to see an explanation of why it's not a problem to use
> uninitialized memory here, and your assurance that it doesn't trip asan or
> valgrind.
Comment 6 Michael Catanzaro 2016-12-22 07:27:28 PST
(In reply to comment #5)
> Cairo already provides a way to do this. cairo_surface_set_user_data().

Great, that looks like a perfect solution to the first problem.
Comment 7 Zan Dobersek 2017-01-01 10:03:44 PST
Hum, missing the cairo_surface object being ref-counted was stupidity on my side.

I'll come back with a fixed patch and memory consumption numbers.
Comment 8 Zan Dobersek 2017-01-25 08:39:19 PST
Created attachment 299707 [details]
Patch
Comment 9 WebKit Commit Bot 2017-01-25 08:40:43 PST
Attachment 299707 [details] did not pass style-queue:


ERROR: Source/WebCore/platform/graphics/cairo/ImageBufferCairo.cpp:228:  More than one command on the same line  [whitespace/newline] [4]
Total errors found: 1 in 2 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 10 Zan Dobersek 2017-01-25 08:40:54 PST
Created attachment 299708 [details]
nytimes.com analysis
Comment 11 Carlos Garcia Campos 2017-01-25 08:48:09 PST
Comment on attachment 299707 [details]
Patch

Awesome, thanks!
Comment 12 Zan Dobersek 2017-01-25 08:55:45 PST
(In reply to comment #10)
> Created attachment 299708 [details]
> nytimes.com analysis

This shows the memory consumption of WebProcess when loading nytimes.com through MiniBrowser in three modes -- AC completely disabled, AC enabled without the proposed patch, and AC enabled with the proposed patch.

nytimes.com is an example of a Web page that does a lot of painting. As such, a lot of UpdateAtlas objects and the underlying ImageBuffer objects will be created.

By default, cairo_image_surface will allocate the necessary pixel memory through the libc allocator (your usual malloc()), but the patch proposes to do it through FastMalloc (i.e. bmalloc).

To measure the memory consumption, nytimes.com was first completely loaded, and then slowly scrolled through to the bottom of the page. I waited for some additional time on order for the garbage collection to kick in.

Regarding the USS, PSS and RSS columns -- pick one, but PSS is the most accurate measurement since it also proportionally assigns shared memory to each process that shares it, though that doesn't affect anything here.

Anyway, at the end of the test, with AC disabled the PSS memory consumption stabilizes at 202-207 MB. With AC enabled and without using FastMalloc, it's at 264-276 MB -- that's a serious increase. With AC enabled and with using FastMalloc, it's at much more acceptable 215-218 MB. That brings the difference down a lot, and take note that 4 MB of that difference is due to one UpdateAtlas (and thus one ImageBuffer object and the underlying 4 MB of pixel memory) being cached indefinitely in CompositingCoordinator.

While I'm happy with that improvement, this is really a very limited (or rather focused) test case. There's possibly other areas of memory consumption we might be able to improve, but this was the most obvious one in the massif dumps.

Although it should be noted that technically in this case we're not decreasing the amount of memory that's being used, we're just making sure that it's allocated in the most efficient way possible.
Comment 13 Zan Dobersek 2017-01-26 01:23:05 PST
Comment on attachment 299707 [details]
Patch

Clearing flags on attachment: 299707

Committed r211206: <http://trac.webkit.org/changeset/211206>
Comment 14 Zan Dobersek 2017-01-26 01:23:14 PST
All reviewed patches have been landed.  Closing bug.