Summary: [Qt] Parallel imagedecoders
Product: WebKit
Component: Images
Status: RESOLVED INVALID
Severity: Normal
Priority: P2
Version: 528+ (Nightly build)
Hardware: All
OS: Linux
Reporter: Zoltan Horvath <zoltan>
Assignee: Nobody <webkit-unassigned>
CC: hausmann, kbalazs, kevin.simons, kling, loki, skyul
Bug Depends on: 71555
Description: Zoltan Horvath, 2011-10-06 04:09:18 PDT
Some thoughts and questions after a brief overview:

- It is a serious limitation that we can only paint the image once it has been fully downloaded. We should discuss whether this is acceptable from a user-experience point of view.
- This approach needs to be generalized across ports. I guess every port has a thread-safe image format, so it should be possible.
- Do you think Qt5 has a way to make using QImage less slow?
- Do you think it would be possible to still use QPixmap for smaller images that are decoded on the main thread, and QImage for the big ones?

I measured the overhead of the patch by starting the test browser with the "-graphicssystem raster" parameter; that brings the overhead down to a negligible level, so the considerable overhead comes from the QPixmap -> QImage modification.

(In reply to comment #1)
> - Do you think it would be possible to still use QPixmap for smaller images that are decoded on the main thread, and QImage for the big ones?

It's possible; we would need to turn some functions into template functions or duplicate code. I wonder whether it is worth the powder and shot.

Simon, what is your opinion on getting rid of the QImageDecoder.* code and switching to WebCore's image decoders? Both use the same system libraries and produce the same results on benchmarks. Is there any blocker (API, compatibility, etc.) to switching to WebCore's image decoder implementation? I have a bug for the WebCore vs. QtImageDecoder comparison: bug #71555.

I made some measurements with libjpeg-turbo. They are somewhat related to this topic, so I am sharing my results. About libjpeg-turbo (http://libjpeg-turbo.virtualgl.org/): "libjpeg-turbo is a derivative of libjpeg that uses SIMD instructions (MMX, SSE2, NEON) to accelerate baseline JPEG compression and decompression on x86, x86-64, and ARM systems. On such systems, libjpeg-turbo is generally 2-4x as fast as the unmodified version of libjpeg, all else being equal."
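The split proposed in the last bullet above (keep QPixmap decoding on the main thread for small images, move big images to a worker thread where the thread-safe QImage can be used) could be sketched roughly like this. All types, names, and the size threshold below are illustrative placeholders, not WebKit's or Qt's actual API:

```cpp
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical decoded-image placeholder, standing in for QImage/QPixmap.
struct DecodedImage { std::vector<unsigned char> pixels; };

// Stand-in for the real decoder work (libjpeg etc.): here it just
// copies the bytes through so the sketch stays self-contained.
DecodedImage decodeNow(const std::vector<unsigned char>& encoded) {
    return DecodedImage{encoded};
}

// Assumed cutoff between "small, decode inline" and "big, decode on a
// worker thread". A real port would tune this empirically.
constexpr std::size_t kParallelThreshold = 64 * 1024;

std::future<DecodedImage> decodeImage(const std::vector<unsigned char>& encoded) {
    if (encoded.size() < kParallelThreshold) {
        // Small image: decode synchronously on the caller's (main) thread
        // and hand back an already-satisfied future.
        std::promise<DecodedImage> ready;
        ready.set_value(decodeNow(encoded));
        return ready.get_future();
    }
    // Big image: decode on a worker thread; the caller blocks only if it
    // asks for the pixels before the worker finishes.
    return std::async(std::launch::async, decodeNow, encoded);
}
```

As the thread notes, doing this for real would mean templating or duplicating the code paths that currently assume a single image type, which is part of why it may not be "worth powder and shot".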
Sounds delicious :) Let's see the numbers.

PC: Intel(R) Core(TM)2 Duo CPU E6550 @ 2.33GHz (2 cores)

Slackware 13.1 - 32bit, Qt 4.8, r91038, http://zoltan.sed.hu/WebKit/methanol_imgdecoder/fire.html?iter=1
- libjpeg-turbo (1.1.0): avg 14 339 ms (+/-4.8%), min 13 195 ms, max 14 897 ms
- libjpeg (v8a): avg 22 423 ms (+/-7.1%), min 20 926 ms, max 25 140 ms

libjpeg-turbo is 36.1% faster than libjpeg on my imgdecoder-specific benchmark.

Slackware 13.1 - 32bit, Qt 4.8, r91038, http://zoltan.sed.hu/WebKit/methanolx/fire.html?iter=1
- libjpeg-turbo (1.1.0): avg 8 055 ms (+/-15.5%), min 6 572 ms, max 9 768 ms
- libjpeg (v8a): avg 8 946 ms (+/-11.7%), min 7 455 ms, max 9 824 ms

libjpeg-turbo is 10% faster than libjpeg on the Methanol benchmark, and it may perform even better with more than 2 cores. Impressive gain even on small pictures! What is your opinion, guys?

The current design delays decoding until painting, but your patch seems to decode an image as soon as all of its data has been received. This means the new parallel image decoder eagerly decodes all images, even though many of them never end up being drawn to the screen. Is this an acceptable change considering the performance gain?

(In reply to comment #6)
> The current design delays decoding until painting, but your patch seems to decode an image as soon as all data is received. It means the new parallel image decoder eagerly decodes all the images even though many of them end up not being drawn to screen. Is this an acceptable change considering the performance gain?

I don't think this is a blocking problem. The main problem is that with this approach we can't achieve a satisfying performance gain. For big images it works perfectly, but for the commonly used, relatively small images it is too slow (you can't decide which image should be decoded first), and in the case of user events (drag & drop) you need a lot of extra copying. Monil Parmar was trying to implement this on the GTK side, and he finished his examination with the same result.
I think we can improve on the library side. E.g., as far as I know, Chromium switched to libjpeg-turbo last week. :)

(In reply to comment #7)
> I think this is not a blocker problem. The main problem is that with this approach we can't achieve satisfying performance gain. For big images this works perfectly, but for the usually used relatively small images it's too slow (you can't make a decision about which image should be decoded first), furthermore in case of user events (drag & drop) you need a lot of extra copying. Monil Parmar was trying to implement this on the gtk side, but he finished his examinations with the same result.

Yes, I agree. Hyung Song did the same experiment last November on the WebKit SDL port and got exactly the same result: http://dev.dorothybrowser.com/?p=276

> We can improve on the library side I think. E.g. as I know Chromium changed to libjpeg-turbo last week. :)

It looks promising. Thanks for the information.

Based on my and others' experiments, we could not achieve an improvement in the common cases, so I am closing this bug as invalid.

*** Bug 40159 has been marked as a duplicate of this bug. ***
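For reference, the decode-on-paint design that comment #6 describes (decode lazily the first time an image is painted, so images that are never drawn are never decoded) can be sketched in a few lines. The class and names here are illustrative, not WebCore's actual types:

```cpp
#include <optional>
#include <utility>
#include <vector>

// Illustrative stand-ins for a decoded frame and the real decoder work.
struct Pixels { std::vector<unsigned char> data; };

Pixels expensiveDecode(const std::vector<unsigned char>& encoded) {
    return Pixels{encoded}; // placeholder for the real (slow) decode
}

// Lazy, decode-on-paint image: the encoded bytes are kept around, and
// decoding happens at most once, on the first paint() call.
class LazyImage {
public:
    explicit LazyImage(std::vector<unsigned char> encoded)
        : m_encoded(std::move(encoded)) {}

    // Called from the paint path; decodes on first use, then caches.
    const Pixels& paint() {
        if (!m_decoded)
            m_decoded = expensiveDecode(m_encoded);
        return *m_decoded;
    }

    bool isDecoded() const { return m_decoded.has_value(); }

private:
    std::vector<unsigned char> m_encoded;
    std::optional<Pixels> m_decoded;
};
```

The tension the thread settles on is exactly this: the eager parallel decoder trades away the "never drawn, never decoded" property, and in practice the win on typical small images was not large enough to justify that trade.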