RESOLVED FIXED Bug 27538
Qt memory consumption on site with many images
https://bugs.webkit.org/show_bug.cgi?id=27538
Summary Qt memory consumption on site with many images
Holger Freyther
Reported 2009-07-22 04:57:39 PDT
one of the biggest issues with QtWebKit is performance. I had a look at it recently and would discuss various changes...
Attachments
Change the Qt image decoding to be closer to cairo (31.84 KB, patch)
2009-07-22 05:01 PDT, Holger Freyther
no flags
Readd the checking and extension code (2.16 KB, patch)
2009-07-22 05:02 PDT, Holger Freyther
no flags
Allow to compile with WebCore decoders (3.61 KB, patch)
2009-07-22 05:02 PDT, Holger Freyther
no flags
Use clock_gettime on Linux to get a monotonic increasing clock (830 bytes, patch)
2009-07-22 05:02 PDT, Holger Freyther
no flags
Add HACK to force pruning liveObjects even during paint if painting took too long (1010 bytes, patch)
2009-07-22 05:02 PDT, Holger Freyther
no flags
backtrace (4.41 KB, text/plain)
2009-07-28 05:39 PDT, Balazs Kelemen
no flags
stress test (128.04 KB, text/html)
2009-07-30 09:58 PDT, Balazs Kelemen
no flags
Holger Freyther
Comment 1 2009-07-22 05:01:22 PDT
Created attachment 33252 [details] Change the Qt image decoding to be closer to cairo Introduce an RGBA32Buffer buffer implementation for Qt. The actualy buffer is a WTF::Vector (to use fastMalloc) but we have a image on top of this we are using for manipulation, width/heigt. Re-Implement ImageDecoderQt on top of the ImageDecoder using the RGBA32Buffer to hold the final image and attempt to delay image decoding until the actual image data is requested. This is failing for gif were we need to decode the images in frameCount() due QImageReader limitations. Add a side-channel to RGBA32Buffer because QImageReader can not decode to the QImage we give it. Change the ImageSourceQt.cpp to be close cairo and change ImageQt.cpp to free the native image pointer (QPixmap) now that the ownership of QPixmap's is the same as in other ports. This change makes the ownership of RGBA32Buffer and native images/pixmaps follow other ports and allows to easily use the WebCore image decoders. --- 8 files changed, 414 insertions(+), 321 deletions(-)
Holger Freyther
Comment 2 2009-07-22 05:02:06 PDT
Created attachment 33253 [details] Readd the checking and extension code --- 3 files changed, 15 insertions(+), 4 deletions(-)
Holger Freyther
Comment 3 2009-07-22 05:02:20 PDT
Created attachment 33254 [details] Allow to compile with WebCore decoders --- 2 files changed, 78 insertions(+), 0 deletions(-)
Holger Freyther
Comment 4 2009-07-22 05:02:42 PDT
Created attachment 33255 [details] Use clock_gettime on Linux to get a monotonic increasing clock For the cache we expect to have a monotonic increasing. Use clock_gettime(CLOCK_MONOTONIC, &tp) over gettimeofday on Linux. In contrast to gettimeofday clock_gettime(CLOCK_MONOTONIC) can only go forward. --- 1 files changed, 12 insertions(+), 0 deletions(-)
Holger Freyther
Comment 5 2009-07-22 05:02:58 PDT
Created attachment 33256 [details] Add HACK to force pruning liveObjects even during paint if painting took too long --- 1 files changed, 5 insertions(+), 3 deletions(-)
Kenneth Rohde Christiansen
Comment 6 2009-07-22 07:22:13 PDT
What is the memory implications with these patches?
Holger Freyther
Comment 7 2009-07-22 08:58:54 PDT
It is hard to tell.... because of: - the Cache - pruning will happen as part of the drawing - you will have to use xrestop to see what one has placed in the xserver... For the WebCore decoders one can say they perform better than QImageReader. First of all QImageReader is doing many allocations, e.g. starting to parse the image once we ask for the size (where the WebCore decoders decode when the image is needed)... and the old and new caching strategy is a bit different which makes comparing heard. I'm currently running two QtLauncher's in memprof opening two pages round robin for ~2000 times and then will look at the numbers.
Kenneth Rohde Christiansen
Comment 8 2009-07-23 06:21:59 PDT
Did you look at the numbers? Btw, the patch misses a ChangeLog so that is a r-, though I'm no reviewer.
Balazs Kelemen
Comment 9 2009-07-27 07:09:23 PDT
What is the revision is these patches for? Unfortunately, I could not apply them on ToT.
Balazs Kelemen
Comment 10 2009-07-28 04:05:20 PDT
I have applied first and second patches by hand on ToT but it crashes for me.
Holger Freyther
Comment 11 2009-07-28 04:49:42 PDT
(In reply to comment #10) > I have applied first and second patches by hand on ToT but it crashes for me. Well, did you consider attaching a backtrace?
Balazs Kelemen
Comment 12 2009-07-28 05:39:29 PDT
Created attachment 33617 [details] backtrace The first two patch was applied to r46417. The backtrace was produced by "gdb ./QtLauncher http://www.index.hu".
Balazs Kelemen
Comment 13 2009-07-30 09:58:44 PDT
Created attachment 33790 [details] stress test A test what I used to measuring the effects of the changes.
Balazs Kelemen
Comment 14 2009-07-30 10:21:46 PDT
I have compared the staff with an all-in patch (contains all changes from these patches) against r46520. I used the _localized_ version of the stress test what I attached (every picture was downloaded, the urls were changed to the local files). I measured the time what is needed for loading the images by the time command, and I measured the maximum resident set size with getrusage from <sys/resource.h> on a linux system with a patched kernel what is able to give this info like BSD-s (with a little hack on QtLauncher). Results: base (r46520) ------------- max-rss: 171728 KB speed: real 1m39.696s user 0m3.968s sys 0m1.296s zecke's changes without enabling webcore image-decoders ------------------------------------------------------- max-rss: 45580 KB speed: real 1m34.465s user 0m2.664s sys 0m1.120s zecke's changes with enabling webcore image-decoders ---------------------------------------------------- max-rss: 44924 KB speed: real 1m37.507s user 0m2.800s sys 0m1.020s
Kenneth Rohde Christiansen
Comment 15 2009-08-04 07:11:30 PDT
Any update on these patches? Holger, why are these patches not up for review?
Holger Freyther
Comment 16 2009-08-04 08:13:59 PDT
(In reply to comment #15) > Any update on these patches? Holger, why are these patches not up for review? Look at what you said at comment #8. Besides that there are some things I want to and need to do first.. 1.) finish unrelated paid work.. 2.) be able to measure the change. 2.1) measure total memory comsumption, measure peak. 2.2) measure time to go through a benchmark of loading and rendering pages with images. 2.3) setup testing so everyone can redo these benchmarks. For 2nd) I have landed two benchmark tests, we need a system to collect and compare the data though, hook up memprof or memusage... have you seen the chromium system? we need something like this too and now is the time to have it.
Peter Kasting
Comment 17 2009-08-30 22:21:18 PDT
On bug 27965 comment 28 I was asked to comment on how some ongoing refactoring work on ImageSource, ImageDecoder, etc. would affect the patches on this bug. At a glance, it looks like it might make a few of them a little simpler, but probably would not have a huge effect. The main things I have done recently are reducing ImageSourceQt.cpp to only a few functions and combining most of the ImageDecoder*.cpp files into one set. At some point soon I intend to break RGBA32Buffer out of ImageDecoder.h and into its own interface in platform/graphics, probably called ImageFrame. I have been considering changing the interface between ImageDecoder and the putative ImageFrame to make the ImageFrame completely isolated from other callers besides ImageDecoder (right now it's also accessed by the non-QT versions of ImageSource) in hopes of making it possible to merge the rest of the current ImageSourceQt.cpp with ImageSource.cpp. The patches here would obviate that need, but such a change still might make things cleaner.
Yongjun Zhang
Comment 18 2009-09-28 07:29:21 PDT
Any plan to get this patchset reviewed and landed? I believe it would be very useful for QtWebKit in S60 platform and other mobile platforms as well.
Simon Hausmann
Comment 19 2009-09-28 07:41:56 PDT
Here's a copy of Holger's earlier explanations about the current work: [10:53] let me start with theory of operation. ImageSource has a ImageDecoder, ImageDecoder gets the encoded data and has access to the decoded frames [10:54] the WebCore::Cache will ask the ImageSource to drop decoded image data to safe memory, then ImageSource will ask the decoder to free things [10:56] Qt is working a bit differently. The QPixmaps will never be freed [10:56] and the decoded frame (QImage) will be freed when we ask for the QPixmap [10:58] so everyone has a delete in FrameData::clear, but Qt doesn't... [10:58] the problem with this are: QPixmap/QImage are not subject to the cache control (ofcourse when the ImageSource says bye bye... the QPixmap/QImage will go away too) [10:58] and mixing WebCore decoders and Qt decoders is not possible [11:00] the Cache Control can be seen on sites with many pictures (e.g. the test case I sent around) for the patches [12:23] the overall goal is: a) Control image caching through WebCore::Cache b) be able to use webcore image decoders [12:23] for a) CachedImage is doing a calculation like: width*height*depth bytes are allocated by ImageSource [12:24] then CachedImage asks ImageSource to clear the cache, FrameData gets cleared, the QPixmap stays [12:24] the cache thinks that width*height*depth bytes are allocated less... which is not really true for QPixmap on QWS, and not nice to abuse the xserver [12:25] to make that change, we will need to delete the pixmap/NativeImagePtr m_frame in FrameData::clear (ImageQt.cpp) and change the ownership of the QPixmap (from ImageDecoderQt) to ImageSource [12:26] my patch is doing it in a probably too radical way, I replace the old ImageDecoderQt crap with new one. :} [12:29] so my changes are doing the following: Remove ImageDecoderQt the ImageHandler, start to decode the image only when someone asks for the frame or the size [12:29] and use the RGBA32Buffer class to store them, to more or less behave like a normal ImageDecoder (e.g. the GIF one) [13:27] <tronical> zecke_: otherwise the changes look good. the only thing I'm wondering is how hard it'll be to add support for qt image decoders for other unsupported formats. [13:28] <zecke_> tronical: by default QImageReader/ImageDecoderQt is used [13:29] <zecke_> tronical: so the place I have put the code is like. If it is configured that way, then use GIF, JPEG, PNG, ICO, BMP from WebCore and fallback to ImageDecoderQt then [14:06] <zecke_> tronical: and the idea is that we have a WTF::Vector for the image data and QImage on top, for the WebCore decoders we could do color space conversion on decoding (to device depth), for QImageReader we have to convert/copy... [14:07] <zecke_> tronical: the main benefits that I can not yet prove are: Less decoding during loading, better cache behavior
Holger Freyther
Comment 20 2009-10-01 20:41:17 PDT
Here a small status update. I was able to measure: The good news: - The new code is faster, WebCore image decoders are generally faster than Qt - My assumption is that it is mostly due doing the decoding on demand, e.g. when the size is queried instead of decoding it right away. The bad news: - On one test case 30mb more memory is allocated (i have no number of what is mapped) - I will have to figure out which test is failing and then why.
Holger Freyther
Comment 21 2009-10-01 23:03:16 PDT
More news on the stress test. The old code takes 125secs to load on a local link, the new one only 25 secs and the memory usage is in the peak ~500mb vs. 3gb. Balazs could you create the stress test without so much nudity, I would love to include this in the mirror test suite.
Balazs Kelemen
Comment 22 2009-10-02 02:04:19 PDT
Sure, on Monday or at least Tuesday I (or probably Zoltan) will do a maxRSS based measurement.
Holger Freyther
Comment 23 2009-10-06 07:15:41 PDT
Last words before regressions come in: The changes remove dead code from ImageDecoderQt, replace various Progressive Loading future support features as QImageReader does not support progressive loading. The ImageState internal class is replaced with the RGBA32Buffer allowing us to share the ImageSource and the final patch is doing the big change. It is replacing the ReadContext with a new implementation that attempts to postpone decoding as far as possible. This will make QtWebKit only decode images that are displayed instead of everything. In the benchmarks this is great for memory usage and loading speed. We will set the QImage in the RGBA32Buffer even if it doesn't have the 32 bits in depth... but that is great for the memory consumption. The last tests were only ran on the szeged test site but it can be reloaded 100 times without memory leaks and consuming a lot less than before. Future work will focus on QImageReader with animated images and Caching, E.g. when starting to scroll the website the non displayed items should be pruned.
Simon Hausmann
Comment 24 2009-10-08 01:31:16 PDT
Holger, is it okay to close this bug as resolved?
Holger Freyther
Comment 25 2009-10-08 01:57:38 PDT
Regressions and further bugs should go to new reports. Closing this one.
Note You need to log in before you can comment on or make changes to this bug.