Bug 27538 - Qt memory consumption on site with many images
Summary: Qt memory consumption on site with many images
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Images (show other bugs)
Version: 528+ (Nightly build)
Hardware: PC OS X 10.5
: P2 Normal
Assignee: Nobody
URL:
Keywords: Performance, Qt
Depends on:
Blocks: Qt46
  Show dependency treegraph
 
Reported: 2009-07-22 04:57 PDT by Holger Freyther
Modified: 2009-10-08 01:57 PDT (History)
7 users (show)

See Also:


Attachments
Change the Qt image decoding to be closer to cairo (31.84 KB, patch)
2009-07-22 05:01 PDT, Holger Freyther
no flags Details | Formatted Diff | Diff
Readd the checking and extension code (2.16 KB, patch)
2009-07-22 05:02 PDT, Holger Freyther
no flags Details | Formatted Diff | Diff
Allow to compile with WebCore decoders (3.61 KB, patch)
2009-07-22 05:02 PDT, Holger Freyther
no flags Details | Formatted Diff | Diff
Use clock_gettime on Linux to get a monotonic increasing clock (830 bytes, patch)
2009-07-22 05:02 PDT, Holger Freyther
no flags Details | Formatted Diff | Diff
Add HACK to force pruning liveObjects even during paint if painting took too long (1010 bytes, patch)
2009-07-22 05:02 PDT, Holger Freyther
no flags Details | Formatted Diff | Diff
backtrace (4.41 KB, text/plain)
2009-07-28 05:39 PDT, Balazs Kelemen
no flags Details
stress test (128.04 KB, text/html)
2009-07-30 09:58 PDT, Balazs Kelemen
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Holger Freyther 2009-07-22 04:57:39 PDT
one of the biggest issues with QtWebKit is performance. I had a look at it recently and would discuss various changes...
Comment 1 Holger Freyther 2009-07-22 05:01:22 PDT
Created attachment 33252 [details]
Change the Qt image decoding to be closer to cairo

Introduce an RGBA32Buffer buffer implementation for Qt. The
actualy buffer is a WTF::Vector (to use fastMalloc) but we
have a image on top of this we are using for manipulation,
width/heigt.

Re-Implement ImageDecoderQt on top of the ImageDecoder using
the RGBA32Buffer to hold the final image and attempt to delay
image decoding until the actual image data is requested. This
is failing for gif were we need to decode the images in frameCount()
due QImageReader limitations. Add a side-channel to RGBA32Buffer
because QImageReader can not decode to the QImage we give it.

Change the ImageSourceQt.cpp to be close cairo and change
ImageQt.cpp to free the native image pointer (QPixmap) now that
the ownership of QPixmap's is the same as in other ports.

This change makes the ownership of RGBA32Buffer and native
images/pixmaps follow other ports and allows to easily use
the WebCore image decoders.
---
 8 files changed, 414 insertions(+), 321 deletions(-)
Comment 2 Holger Freyther 2009-07-22 05:02:06 PDT
Created attachment 33253 [details]
Readd the checking and extension code


---
 3 files changed, 15 insertions(+), 4 deletions(-)
Comment 3 Holger Freyther 2009-07-22 05:02:20 PDT
Created attachment 33254 [details]
Allow to compile with WebCore decoders


---
 2 files changed, 78 insertions(+), 0 deletions(-)
Comment 4 Holger Freyther 2009-07-22 05:02:42 PDT
Created attachment 33255 [details]
Use clock_gettime on Linux to get a monotonic increasing clock

For the cache we expect to have a monotonic increasing. Use
clock_gettime(CLOCK_MONOTONIC, &tp) over gettimeofday on Linux. In
contrast to gettimeofday clock_gettime(CLOCK_MONOTONIC) can only
go forward.
---
 1 files changed, 12 insertions(+), 0 deletions(-)
Comment 5 Holger Freyther 2009-07-22 05:02:58 PDT
Created attachment 33256 [details]
Add HACK to force pruning liveObjects even during paint if painting took too long


---
 1 files changed, 5 insertions(+), 3 deletions(-)
Comment 6 Kenneth Rohde Christiansen 2009-07-22 07:22:13 PDT
What is the memory implications with these patches?
Comment 7 Holger Freyther 2009-07-22 08:58:54 PDT
It is hard to tell.... because of:
     - the Cache
     - pruning will happen as part of the drawing
     - you will have to use xrestop to see what one has placed in the xserver...

For the WebCore decoders one can say they perform better than QImageReader. First of all QImageReader is doing many allocations, e.g. starting to parse the image once we ask for the size (where the WebCore decoders decode when the image is needed)... and the old and new caching strategy is a bit different which makes comparing heard.

I'm currently running two QtLauncher's in memprof opening two pages round robin for ~2000 times and then will look at the numbers.
Comment 8 Kenneth Rohde Christiansen 2009-07-23 06:21:59 PDT
Did you look at the numbers?

Btw, the patch misses a ChangeLog so that is a r-, though I'm no reviewer.
Comment 9 Balazs Kelemen 2009-07-27 07:09:23 PDT
What is the revision is these patches for?
Unfortunately, I could not apply them on ToT.
Comment 10 Balazs Kelemen 2009-07-28 04:05:20 PDT
I have applied first and second patches by hand on ToT but it crashes for me.
Comment 11 Holger Freyther 2009-07-28 04:49:42 PDT
(In reply to comment #10)
> I have applied first and second patches by hand on ToT but it crashes for me.

Well, did you consider attaching a backtrace?
Comment 12 Balazs Kelemen 2009-07-28 05:39:29 PDT
Created attachment 33617 [details]
backtrace

The first two patch was applied to r46417.
The backtrace was produced by "gdb ./QtLauncher http://www.index.hu".
Comment 13 Balazs Kelemen 2009-07-30 09:58:44 PDT
Created attachment 33790 [details]
stress test

A test what I used to measuring the effects of the changes.
Comment 14 Balazs Kelemen 2009-07-30 10:21:46 PDT
I have compared the staff with an all-in patch (contains all changes from these patches) against r46520. I used the _localized_ version of the stress test what I attached (every picture was downloaded, the urls were changed to the local files). I measured the time what is needed for loading the images by the time command, and I measured the maximum resident set size with getrusage from <sys/resource.h> on a linux system with a patched kernel what is able to give this info like BSD-s (with a little hack on QtLauncher). Results:

base (r46520)
-------------
max-rss: 171728 KB
speed:
  real    1m39.696s
  user    0m3.968s
  sys     0m1.296s

zecke's changes without enabling webcore image-decoders
-------------------------------------------------------
max-rss: 45580 KB
speed:
  real    1m34.465s
  user    0m2.664s
  sys     0m1.120s

zecke's changes with enabling webcore image-decoders
----------------------------------------------------
max-rss: 44924 KB
speed:
  real    1m37.507s
  user    0m2.800s
  sys     0m1.020s
Comment 15 Kenneth Rohde Christiansen 2009-08-04 07:11:30 PDT
Any update on these patches? Holger, why are these patches not up for review?
Comment 16 Holger Freyther 2009-08-04 08:13:59 PDT
(In reply to comment #15)
> Any update on these patches? Holger, why are these patches not up for review?

Look at what you said at comment #8. Besides that there are some things I want to and need to do first..


1.) finish unrelated paid work..

2.) be able to measure the change.
  2.1) measure total memory comsumption, measure peak.
  2.2) measure time to go through a benchmark of loading and rendering pages with images.
  2.3) setup testing so everyone can redo these benchmarks.


For 2nd) I have landed two benchmark tests, we need a system to collect and compare the data though, hook up memprof or memusage... have you seen the chromium system? we need something like this too and now is the time to have it.
Comment 17 Peter Kasting 2009-08-30 22:21:18 PDT
On bug 27965 comment 28 I was asked to comment on how some ongoing refactoring work on ImageSource, ImageDecoder, etc. would affect the patches on this bug.  At a glance, it looks like it might make a few of them a little simpler, but probably would not have a huge effect.  The main things I have done recently are reducing ImageSourceQt.cpp to only a few functions and combining most of the ImageDecoder*.cpp files into one set.

At some point soon I intend to break RGBA32Buffer out of ImageDecoder.h and into its own interface in platform/graphics, probably called ImageFrame.  I have been considering changing the interface between ImageDecoder and the putative ImageFrame to make the ImageFrame completely isolated from other callers besides ImageDecoder (right now it's also accessed by the non-QT versions of ImageSource) in hopes of making it possible to merge the rest of the current ImageSourceQt.cpp with ImageSource.cpp.  The patches here would obviate that need, but such a change still might make things cleaner.
Comment 18 Yongjun Zhang 2009-09-28 07:29:21 PDT
Any plan to get this patchset reviewed and landed?  I believe it would be very useful for QtWebKit in S60 platform and other mobile platforms as well.
Comment 19 Simon Hausmann 2009-09-28 07:41:56 PDT
Here's a copy of Holger's earlier explanations about the current work:

[10:53] let me start with theory of operation. ImageSource has a ImageDecoder, ImageDecoder gets the encoded data and has access to the decoded frames
[10:54] the WebCore::Cache will ask the ImageSource to drop decoded image data to safe memory, then ImageSource will ask the decoder to free things
[10:56] Qt is working a bit differently. The QPixmaps will never be freed
[10:56] and the decoded frame (QImage) will be freed when we ask for the QPixmap
[10:58] so everyone has a delete in FrameData::clear, but Qt doesn't...
[10:58] the problem with this are: QPixmap/QImage are not subject to the cache control (ofcourse when the ImageSource says bye bye... the QPixmap/QImage will go away too)
[10:58] and mixing WebCore decoders and Qt decoders is not possible
[11:00] the Cache Control can be seen on sites with many pictures (e.g. the test case I sent around)

 for the patches
[12:23] the overall goal is: a) Control image caching through WebCore::Cache b) be able to use webcore image decoders
[12:23] for a) CachedImage is doing a calculation like: width*height*depth bytes are allocated by ImageSource
[12:24] then CachedImage asks ImageSource to clear the cache, FrameData gets cleared, the QPixmap stays
[12:24] the cache thinks that width*height*depth bytes are allocated less... which is not really true for QPixmap on QWS, and not nice to abuse the xserver
[12:25] to make that change, we will need to delete the pixmap/NativeImagePtr m_frame in FrameData::clear (ImageQt.cpp) and change the ownership of the QPixmap (from ImageDecoderQt) to ImageSource
[12:26] my patch is doing it in a probably too radical way, I replace the old ImageDecoderQt crap with new one. :}

[12:29] so my changes are doing the following: Remove ImageDecoderQt the ImageHandler, start to decode the image only when someone asks for the frame or the size
[12:29] and use the RGBA32Buffer class to store them, to more or less behave like a normal ImageDecoder (e.g. the GIF one)

[13:27] <tronical> zecke_: otherwise the changes look good. the only thing I'm wondering is how hard it'll be to add support for qt image decoders for other unsupported formats.
[13:28] <zecke_> tronical: by default QImageReader/ImageDecoderQt is used
[13:29] <zecke_> tronical: so the place I have put the code is like. If it is configured that way, then use GIF, JPEG, PNG, ICO, BMP from WebCore and fallback to ImageDecoderQt then

[14:06] <zecke_> tronical: and the idea is that we have a WTF::Vector for the image data and QImage on top, for the WebCore decoders we could do color space conversion on decoding (to device depth), for QImageReader we have to convert/copy...
[14:07] <zecke_> tronical: the main benefits that I can not yet prove are: Less decoding during loading, better cache behavior
Comment 20 Holger Freyther 2009-10-01 20:41:17 PDT
Here a small status update. I was able to measure:

The good news:
     - The new code is faster, WebCore image decoders are generally faster than Qt
     - My assumption is that it is mostly due doing the decoding on demand, e.g. when the size is queried instead of decoding it right away.

The bad news:
     - On one test case 30mb more memory is allocated (i have no number of what is mapped)
     - I will have to figure out which test is failing and then why.
Comment 21 Holger Freyther 2009-10-01 23:03:16 PDT
More news on the stress test. The old code takes 125secs to load on a local link, the new one only 25 secs and the memory usage is in the peak ~500mb vs. 3gb.

Balazs could you create the stress test without so much nudity, I would love to include this in the mirror test suite.
Comment 22 Balazs Kelemen 2009-10-02 02:04:19 PDT
Sure, on Monday or at least Tuesday I (or probably Zoltan) will do a maxRSS based measurement.
Comment 23 Holger Freyther 2009-10-06 07:15:41 PDT
Last words before regressions come in:

The changes remove dead code from ImageDecoderQt, replace various Progressive Loading future support features as QImageReader does not support progressive loading. The ImageState internal class is replaced with the RGBA32Buffer allowing us to share the ImageSource and the final patch is doing the big change. It is replacing the ReadContext with a new implementation that attempts to postpone decoding as far as possible. This will make QtWebKit only decode images that are displayed instead of everything. In the benchmarks this is great for memory usage and loading speed. We will set the QImage in the RGBA32Buffer even if it doesn't have the 32 bits in depth... but that is great for the memory consumption.

The last tests were only ran on the szeged test site but it can be reloaded 100 times without memory leaks and consuming a lot less than before.


Future work will focus on QImageReader with animated images and Caching, E.g. when starting to scroll the website the non displayed items should be pruned.
Comment 24 Simon Hausmann 2009-10-08 01:31:16 PDT
Holger, is it okay to close this bug as resolved?
Comment 25 Holger Freyther 2009-10-08 01:57:38 PDT
Regressions and further bugs should go to new reports. Closing this one.