Greetings, JPEGImageDecoder::outputScanlines() copies row pixels returned by jpeg_read_scanlines() calls from a vector to a ImageFrame because libjpeg cannot output RGBA pixels used by WebKit. It improves the performance of decoding JPEG data to remove these copies when using libjpeg-turbo because it can output RGBA pixels used by WebKit. (It improves the rendering performance of JPEG images by >30%, especially huge ones as written in the following test results.) 1. Before removing memory copies Before removing memory copies, it takes ~50s for our page_cycler_tests to render test images in <http://www.imagecompression.info/test_images/>. Note: Google Test filter = PageCyclerTest.JPEGFile [==========] Running 1 test from 1 test case. [----------] Global test environment set-up. [----------] 1 test from PageCyclerTest [ RUN ] PageCyclerTest.JPEGFile *RESULT vm_size_final_b: vm_size_f_b= 417281 bytes *RESULT vm_rss_final_b: vm_rss_f_b= 15 bytes *RESULT vm_size_final_r: vm_size_f_r= 417281 bytes *RESULT vm_rss_final_r: vm_rss_f_r= 15 bytes *RESULT vm_size_final_t: vm_size_f_t= 834562 bytes *RESULT vm_rss_final_t: vm_rss_f_t= 30 bytes RESULT processes: proc_= 2 RESULT read_op_b: r_op_b= 0 RESULT write_op_b: w_op_b= 0 RESULT other_op_b: o_op_b= 0 *RESULT total_op_b: IO_op_b= 0 RESULT read_byte_b: r_b= 0 kb RESULT write_byte_b: w_b= 0 kb RESULT other_byte_b: o_b= 0 kb *RESULT total_byte_b: IO_b= 0 kb RESULT read_op_r: r_op_r= 0 RESULT write_op_r: w_op_r= 0 RESULT other_op_r: o_op_r= 0 *RESULT total_op_r: IO_op_r= 0 RESULT read_byte_r: r_r= 0 kb RESULT write_byte_r: w_r= 0 kb RESULT other_byte_r: o_r= 0 kb *RESULT total_byte_r: IO_r= 0 kb RESULT commit_charge: cc= 78680 kb Pages: [artificial,big_building,big_tree,bridge,cathedral,deer,fireworks,flower_foveon,hdr,leaves_iso_1600,leaves_iso_200,nightshot_iso_100,nightshot_iso_1600,spider_web] *RESULT times: t= [110,27,43,38,11,12,11,9,10,15,53,12,13,11,10,23,43,39,11,12,10,10,10,14,13,12,13,10,10,23,43,37,10,12,10,9,11,14,14,12,13,11,9,23,44,38,12,12,10,9,11,15,12,12,13,13,11,23,44,39,11,12,10,9,10,14,14,12,13,14,10,22,43,38,11,13,10,9,10,14,139,403,13,11,11,23,44,38,11,12,10,9,11,14,13,15,12,11,9,23,44,39,12,12,10,9,10,14,13,11,12,11,11,23,43,38,12,12,10,9,10,14,14,11,13,11,11,22,43,38,11,12,10,9,10,14,13,12,12,12] ms [ OK ] PageCyclerTest.JPEGFile (49039 ms) [----------] 1 test from PageCyclerTest (49040 ms total) [----------] Global test environment tear-down [==========] 1 test from 1 test case ran. (49040 ms total) [ PASSED ] 1 test. YOU HAVE 15 FLAKY TESTS 2. After removing memory copies After removing memory copies, it takes ~30s for our page_cycler_tests to render the same test images. Note: Google Test filter = PageCyclerTest.JPEGFile [==========] Running 1 test from 1 test case. [----------] Global test environment set-up. [----------] 1 test from PageCyclerTest [ RUN ] PageCyclerTest.JPEGFile *RESULT vm_size_final_b: vm_size_f_b= 416961 bytes *RESULT vm_rss_final_b: vm_rss_f_b= 15 bytes *RESULT vm_size_final_r: vm_size_f_r= 416961 bytes *RESULT vm_rss_final_r: vm_rss_f_r= 15 bytes *RESULT vm_size_final_t: vm_size_f_t= 833922 bytes *RESULT vm_rss_final_t: vm_rss_f_t= 30 bytes RESULT processes: proc_= 2 RESULT read_op_b: r_op_b= 0 RESULT write_op_b: w_op_b= 0 RESULT other_op_b: o_op_b= 0 *RESULT total_op_b: IO_op_b= 0 RESULT read_byte_b: r_b= 0 kb RESULT write_byte_b: w_b= 0 kb RESULT other_byte_b: o_b= 0 kb *RESULT total_byte_b: IO_b= 0 kb RESULT read_op_r: r_op_r= 0 RESULT write_op_r: w_op_r= 0 RESULT other_op_r: o_op_r= 0 *RESULT total_op_r: IO_op_r= 0 RESULT read_byte_r: r_r= 0 kb RESULT write_byte_r: w_r= 0 kb RESULT other_byte_r: o_r= 0 kb *RESULT total_byte_r: IO_r= 0 kb RESULT commit_charge: cc= 78116 kb Pages: [artificial,big_building,big_tree,bridge,cathedral,deer,fireworks,flower_foveon,hdr,leaves_iso_1600,leaves_iso_200,nightshot_iso_100,nightshot_iso_1600,spider_web] *RESULT times: t= [55,26,19,13,11,12,10,9,10,14,13,12,12,11,10,23,19,13,11,12,12,13,142,14,12,12,12,12,10,23,47,13,11,12,10,9,10,14,41,16,12,10,10,24,45,14,11,12,10,9,10,14,12,12,12,11,10,24,47,14,11,12,10,9,10,14,13,12,13,11,10,23,44,13,11,13,11,9,10,14,12,12,13,11,10,23,45,14,11,12,10,9,11,14,13,12,13,11,10,24,44,14,11,13,10,10,11,14,13,11,12,18,52,23,44,14,12,14,10,10,11,14,12,12,17,11,11,22,45,14,12,13,10,9,10,14,13,11,13,11] ms [ OK ] PageCyclerTest.JPEGFile (30283 ms) [----------] 1 test from PageCyclerTest (30283 ms total) [----------] Global test environment tear-down [==========] 1 test from 1 test case ran. (30283 ms total) [ PASSED ] 1 test. YOU HAVE 15 FLAKY TESTS Regards, Hironori Bono
Created attachment 91421 [details] A quick fix v0 Greetings, This is the quick fix used for measuring the performance. Regards, Hironori Bono
Tamura-san, Is it possible to tell me the good reviewer for this change? Regards, Hironori Bono
> Is it possible to tell me the good reviewer for this change? Adam, Eric, could you review this please?
The friend declaration seems wrong. Also I'd like pasting to take a look first...
Comment on attachment 91421 [details] A quick fix v0 Yup. If Peter likes it, I'm happy to r+ it.
Comment on attachment 91421 [details] A quick fix v0 View in context: https://bugs.webkit.org/attachment.cgi?id=91421&action=review > Source/WebCore/ChangeLog:9 > + to copy row pixels from a vector to a ImageFrame because libjpeg-turbo Nit: "a ImageFrame" -> "an ImageFrame" > Source/WebCore/ChangeLog:10 > + can output RBA pixels used by WebKit. This change writes the output Nit: RBA -> RGBA > Source/WebCore/ChangeLog:12 > + this memory copies. Nit: "this memory copies" -> "the extra copy" > Source/WebCore/platform/image-decoders/ImageDecoder.h:56 > + friend class JPEGImageDecoder; This isn't the right way to do this. I'm concerned about blindly writing into the buffer. Do all platforms guarantee the image format is 32bpp ARGB and that the stride is not larger than the actual line width? If so, you can probably just make getAddr() public. If not, you'll need a function which returns a pointer to the image data only if the above are met, and NULL otherwise. > Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:220 > + m_info.out_color_space = JCS_EXT_BGRX; I believe this is only correct for little-endian machines. For big-endian I think you need XRGB, because libjpeg-turbo writes using byte offsets rather than bit shifts, so it doesn't auto-swap with machine endianness. (I'm saying this based on trying to decipher http://sourceforge.net/mailarchive/forum.php?thread_name=4D850B59.1050201%40users.sourceforge.net&forum_name=tigervnc-users ).
I think the three tests would be better combined into one test. Filed bug 72234 about it and posted a patch. Tamura-san could you review please?
Examining the history of ImageFrame suggests that it is in fact an abstraction of an RGBA32 pixel buffer. I think it's fine to make getAddr() public, filed bug 72321 with a patch.
Related tests: fast/images/rgb-jpeg-endian-pixels.html fast/images/cmyk-jpeg-with-color-profile.html fast/images/gray-scale-jpeg-with-color-profile.html Looking green now on the bots: http://test-results.appspot.com/dashboards/flakiness_dashboard.html#tests=rgb-jpeg-endian-pixels.html%2Ccmyk-jpeg-with-color-profile.html%2Cgray-scale-jpeg-with-color-profile.html&showExpectations=true&group=%40ToT%20-%20webkit.org
JCS_EXTENSIONS is only defined if libjpeg-turbo is compiled in and Chromium is the only port using libjpeg-turbo. Next the endianness question ... Chromium is designed for little endian machines only, reading from src/build/build-config.h. http://www.google.com/codesearch#OAMlx_jo-ck/src/build/build_config.h&type=cs&l=86 Brett Wilson made the same point in his recent reply about Chromium endianness on chrome-dev@. http://groups.google.com/a/chromium.org/group/chromium-dev/browse_thread/thread/c6c4a23a33c155f4?fwc=1&pli=1 Chrome Skia code states: "All little-endian Chrome platforms agree: BGRA is the optimal pixel layout." http://www.google.com/codesearch#OAMlx_jo-ck/src/skia/config/SkUserConfig.h&type=cs&l=184 so a BGRX swizzle makes sense to me. Another concern I had - the decoders m_scaled member. Talked to Peter off-line, it's enabled by flag ENABLE(IMAGE_DECODER_DOWN_SAMPLING). Chromium does not enable that flag. Next step: code all the above in a WIP patch.
Created attachment 115373 [details] Patch
Above is a WIP Patch, baking it in the queues.
Comment on attachment 115373 [details] Patch Attachment 115373 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/10484787 New failing tests: tables/mozilla/bugs/bug29314.html tables/mozilla/bugs/bug13169.html tables/mozilla/bugs/bug10565.html tables/mozilla/bugs/bug11026.html fast/repaint/backgroundSizeRepaint.html fast/repaint/block-layout-inline-children-replaced.html fast/repaint/clipped-relative.html fast/repaint/selected-replaced.html tables/mozilla/bugs/bug12908-1.html
These are regressions, and I reproduced on chrome-linux/win32 locally. The test image resources were created by XV Version 3.10a Rev: 12/29/94 http://trac.webkit.org/browser/trunk/LayoutTests/tables/mozilla/images/ant.jpg http://trac.webkit.org/browser/trunk/LayoutTests/fast/repaint/resources/apple.jpg They might be more than 10 years old, but they work fine with libjpeg. The ant.jpg image shows spot red/white pixel noise, or swathes of while pixel errors.
Created attachment 115556 [details] Regression: red pixel noise, white swathe errors tables/mozilla/bugs/bug12908-1.html regression.
Seems it's a libjpeg-turbo bug, a workaround is to exclude JSC_GRAYSCALE images.
Created attachment 115564 [details] Patch - Don't swizzle grayscale JPEG images.
Created attachment 115565 [details] Patch - Don't swizzle grayscale JPEG images.
Greetings, (In reply to comment #16) > Seems it's a libjpeg-turbo bug, a workaround is to exclude JSC_GRAYSCALE images. Thank you for taking over this issue and bug report. Even though I'm on a trip and cannot afford to investigate this libjpeg-turbo issue, I will investigate it when I'm back from the trip. Regards, Hironori Bono
Hironori-san, thank you for reporting the bug - the work you did here was important. Enjoy your trip. I look forward to talking with you again on your return.
Created attachment 115756 [details] Patch
Created attachment 115775 [details] Patch
Comment on attachment 115775 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=115775&action=review Thanks for pushing this through. A little more build infrastructure work is needed. > Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:73 > +#if PLATFORM(CHROMIUM) && USE(SKIA) && ASSUME_LITTLE_ENDIAN This #if test is too fragile. It might be the case that Chromium stops using libjpeg-turbo. See in particular src/third_party/libjpeg_turbo/libjpeg.gyp; it looks like on Chrome OS the system JPEG library is used, which I think would result in the wrong code path being taken here. If that happened, would this code break? Alternatively, other ports might start using libjpeg-turbo. I think this should be written as "#if USE(LIBJPEG_TURBO) && ...". Conditionally define WTF_USE_LIBJPEG_TURBO in Source/WebKit/chromium/WebKit.gyp based on a gyp variable set in libjpeg.gyp.
(In reply to comment #23) > > Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:73 > > +#if PLATFORM(CHROMIUM) && USE(SKIA) && ASSUME_LITTLE_ENDIAN > > This #if test is too fragile. It might be the case that Chromium stops using libjpeg-turbo. See in particular src/third_party/libjpeg_turbo/libjpeg.gyp; it looks like on Chrome OS the system JPEG library is used, which I think would result in the wrong code path being taken here. If that happened, would this code break? Alternatively, other ports might start using libjpeg-turbo. > Yes, too fragile and my try jobs broke on chrome-mac. I had assumed libjpeg-turbo was not enabled there, and yes, what if other ports or variants (chrome-os) want to use it? My #if test is too restrictive. > I think this should be written as "#if USE(LIBJPEG_TURBO) && ...". Conditionally define WTF_USE_LIBJPEG_TURBO in Source/WebKit/chromium/WebKit.gyp based on a gyp variable set in libjpeg.gyp. If any port switches to using libjepg-turbo, it'll be via build file or other means, but #include "jpeglib.h" will load libjepg-turbo's version of that file, which #defines JCS_EXTENSIONS. So I think JCS_EXTENSIONS can be used to detect libjepg-turbo? Anyway, I've gone that way for now to continue my testing with a much simpler #if test.
chrome try jobs -- look got to me. http://build.chromium.org/p/tryserver.chromium/builders/linux/builds/4683 http://build.chromium.org/p/tryserver.chromium/builders/win/builds/5019 http://build.chromium.org/p/tryserver.chromium/builders/mac/builds/4624 chrome-os try jobs -- look good to me. http://build.chromium.org/p/tryserver.chromium/builders/linux_chromeos/builds/1611 http://build.chromium.org/p/tryserver.chromium/builders/cros_tegra2/builds/42 http://build.chromium.org/p/tryserver.chromium/builders/cros_arm/builds/63 http://build.chromium.org/p/tryserver.chromium/builders/cros_x86/builds/64
Created attachment 116577 [details] Patch
Greetings, Thank you for your bug report. This is a bug of gray->rgbx conversion in libjpeg-turbo. I will file a upstream bug and upload its fix. Regards, Hironori Bono (In reply to comment #20) > Hironori-san, thank you for reporting the bug - the work you did here was important. Enjoy your trip. I look forward to talking with you again on your return.
Comment on attachment 116577 [details] Patch Looks good. Testing for JCS_EXTENSIONS does seem to be a reasonable discriminator between libjpeg-turbo and the original libjpeg. r=me
Comment on attachment 116577 [details] Patch Clearing flags on attachment: 116577 Committed r101286: <http://trac.webkit.org/changeset/101286>
All reviewed patches have been landed. Closing bug.
(In reply to comment #27) > Thank you for your bug report. > This is a bug of gray->rgbx conversion in libjpeg-turbo. I will file a upstream bug and upload its fix. Filed bug 75189 about the grayscale issue.