Bug 59670

Summary: [chromium] Use data decoding swizzle for turbo JPEG image decoding
Product: WebKit Reporter: Hironori Bono <hbono>
Component: ImagesAssignee: noel gordon <noel.gordon>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, dglazkov, eric, jbauman, kbr, noel.gordon, pkasting, tkent, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Bug Depends on: 72234, 72240, 72321, 73284, 74286, 75189, 75861, 79457    
Bug Blocks: 78323    
Attachments:
Description Flags
A quick fix v0
none
Patch
none
Regression: red pixel noise, white swathe errors
none
Patch - Don't swizzle grayscale JPEG images.
none
Patch - Don't swizzle grayscale JPEG images.
none
Patch
none
Patch
none
Patch none

Description Hironori Bono 2011-04-27 20:41:41 PDT
Greetings,

JPEGImageDecoder::outputScanlines() copies row pixels returned by jpeg_read_scanlines() calls from a vector to a ImageFrame because libjpeg cannot output RGBA pixels used by WebKit. It improves the performance of decoding JPEG data to remove these copies when using libjpeg-turbo because it can output RGBA pixels used by WebKit. (It improves the rendering performance of JPEG images by >30%, especially huge ones as written in the following test results.)

1. Before removing memory copies

Before removing memory copies, it takes ~50s for our page_cycler_tests to render test images in <http://www.imagecompression.info/test_images/>.

Note: Google Test filter = PageCyclerTest.JPEGFile
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from PageCyclerTest
[ RUN      ] PageCyclerTest.JPEGFile
*RESULT vm_size_final_b: vm_size_f_b= 417281 bytes
*RESULT vm_rss_final_b: vm_rss_f_b= 15 bytes
*RESULT vm_size_final_r: vm_size_f_r= 417281 bytes
*RESULT vm_rss_final_r: vm_rss_f_r= 15 bytes
*RESULT vm_size_final_t: vm_size_f_t= 834562 bytes
*RESULT vm_rss_final_t: vm_rss_f_t= 30 bytes
RESULT processes: proc_= 2
RESULT read_op_b: r_op_b= 0
RESULT write_op_b: w_op_b= 0
RESULT other_op_b: o_op_b= 0
*RESULT total_op_b: IO_op_b= 0
RESULT read_byte_b: r_b= 0 kb
RESULT write_byte_b: w_b= 0 kb
RESULT other_byte_b: o_b= 0 kb
*RESULT total_byte_b: IO_b= 0 kb
RESULT read_op_r: r_op_r= 0
RESULT write_op_r: w_op_r= 0
RESULT other_op_r: o_op_r= 0
*RESULT total_op_r: IO_op_r= 0
RESULT read_byte_r: r_r= 0 kb
RESULT write_byte_r: w_r= 0 kb
RESULT other_byte_r: o_r= 0 kb
*RESULT total_byte_r: IO_r= 0 kb
RESULT commit_charge: cc= 78680 kb
Pages: [artificial,big_building,big_tree,bridge,cathedral,deer,fireworks,flower_foveon,hdr,leaves_iso_1600,leaves_iso_200,nightshot_iso_100,nightshot_iso_1600,spider_web]
*RESULT times: t= [110,27,43,38,11,12,11,9,10,15,53,12,13,11,10,23,43,39,11,12,10,10,10,14,13,12,13,10,10,23,43,37,10,12,10,9,11,14,14,12,13,11,9,23,44,38,12,12,10,9,11,15,12,12,13,13,11,23,44,39,11,12,10,9,10,14,14,12,13,14,10,22,43,38,11,13,10,9,10,14,139,403,13,11,11,23,44,38,11,12,10,9,11,14,13,15,12,11,9,23,44,39,12,12,10,9,10,14,13,11,12,11,11,23,43,38,12,12,10,9,10,14,14,11,13,11,11,22,43,38,11,12,10,9,10,14,13,12,12,12] ms
[       OK ] PageCyclerTest.JPEGFile (49039 ms)
[----------] 1 test from PageCyclerTest (49040 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (49040 ms total)
[  PASSED  ] 1 test.
  YOU HAVE 15 FLAKY TESTS

2. After removing memory copies

After removing memory copies, it takes ~30s for our page_cycler_tests to render the same test images.

Note: Google Test filter = PageCyclerTest.JPEGFile
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from PageCyclerTest
[ RUN      ] PageCyclerTest.JPEGFile
*RESULT vm_size_final_b: vm_size_f_b= 416961 bytes
*RESULT vm_rss_final_b: vm_rss_f_b= 15 bytes
*RESULT vm_size_final_r: vm_size_f_r= 416961 bytes
*RESULT vm_rss_final_r: vm_rss_f_r= 15 bytes
*RESULT vm_size_final_t: vm_size_f_t= 833922 bytes
*RESULT vm_rss_final_t: vm_rss_f_t= 30 bytes
RESULT processes: proc_= 2
RESULT read_op_b: r_op_b= 0
RESULT write_op_b: w_op_b= 0
RESULT other_op_b: o_op_b= 0
*RESULT total_op_b: IO_op_b= 0
RESULT read_byte_b: r_b= 0 kb
RESULT write_byte_b: w_b= 0 kb
RESULT other_byte_b: o_b= 0 kb
*RESULT total_byte_b: IO_b= 0 kb
RESULT read_op_r: r_op_r= 0
RESULT write_op_r: w_op_r= 0
RESULT other_op_r: o_op_r= 0
*RESULT total_op_r: IO_op_r= 0
RESULT read_byte_r: r_r= 0 kb
RESULT write_byte_r: w_r= 0 kb
RESULT other_byte_r: o_r= 0 kb
*RESULT total_byte_r: IO_r= 0 kb
RESULT commit_charge: cc= 78116 kb
Pages: [artificial,big_building,big_tree,bridge,cathedral,deer,fireworks,flower_foveon,hdr,leaves_iso_1600,leaves_iso_200,nightshot_iso_100,nightshot_iso_1600,spider_web]
*RESULT times: t= [55,26,19,13,11,12,10,9,10,14,13,12,12,11,10,23,19,13,11,12,12,13,142,14,12,12,12,12,10,23,47,13,11,12,10,9,10,14,41,16,12,10,10,24,45,14,11,12,10,9,10,14,12,12,12,11,10,24,47,14,11,12,10,9,10,14,13,12,13,11,10,23,44,13,11,13,11,9,10,14,12,12,13,11,10,23,45,14,11,12,10,9,11,14,13,12,13,11,10,24,44,14,11,13,10,10,11,14,13,11,12,18,52,23,44,14,12,14,10,10,11,14,12,12,17,11,11,22,45,14,12,13,10,9,10,14,13,11,13,11] ms
[       OK ] PageCyclerTest.JPEGFile (30283 ms)
[----------] 1 test from PageCyclerTest (30283 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (30283 ms total)
[  PASSED  ] 1 test.
  YOU HAVE 15 FLAKY TESTS

Regards,

Hironori Bono
Comment 1 Hironori Bono 2011-04-27 21:03:44 PDT
Created attachment 91421 [details]
A quick fix v0

Greetings,

This is the quick fix used for measuring the performance.

Regards,

Hironori Bono
Comment 2 Hironori Bono 2011-04-27 21:30:56 PDT
Tamura-san,

Is it possible to tell me the good reviewer for this change?

Regards,

Hironori Bono
Comment 3 Kent Tamura 2011-04-30 23:29:01 PDT
> Is it possible to tell me the good reviewer for this change?

Adam, Eric, could you review this please?
Comment 4 Adam Barth 2011-04-30 23:48:11 PDT
The friend declaration seems wrong.  Also I'd like pasting to take a look first...
Comment 5 Eric Seidel (no email) 2011-05-01 09:41:36 PDT
Comment on attachment 91421 [details]
A quick fix v0

Yup. If Peter likes it, I'm happy to r+ it.
Comment 6 Peter Kasting 2011-05-02 09:45:01 PDT
Comment on attachment 91421 [details]
A quick fix v0

View in context: https://bugs.webkit.org/attachment.cgi?id=91421&action=review

> Source/WebCore/ChangeLog:9
> +        to copy row pixels from a vector to a ImageFrame because libjpeg-turbo

Nit: "a ImageFrame" -> "an ImageFrame"

> Source/WebCore/ChangeLog:10
> +        can output RBA pixels used by WebKit. This change writes the output

Nit: RBA -> RGBA

> Source/WebCore/ChangeLog:12
> +        this memory copies.

Nit: "this memory copies" -> "the extra copy"

> Source/WebCore/platform/image-decoders/ImageDecoder.h:56
> +        friend class JPEGImageDecoder;

This isn't the right way to do this.

I'm concerned about blindly writing into the buffer.  Do all platforms guarantee the image format is 32bpp ARGB and that the stride is not larger than the actual line width?  If so, you can probably just make getAddr() public.  If not, you'll need a function which returns a pointer to the image data only if the above are met, and NULL otherwise.

> Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:220
> +                m_info.out_color_space = JCS_EXT_BGRX;

I believe this is only correct for little-endian machines.  For big-endian I think you need XRGB, because libjpeg-turbo writes using byte offsets rather than bit shifts, so it doesn't auto-swap with machine endianness.  (I'm saying this based on trying to decipher http://sourceforge.net/mailarchive/forum.php?thread_name=4D850B59.1050201%40users.sourceforge.net&forum_name=tigervnc-users ).
Comment 7 noel gordon 2011-11-13 20:08:13 PST
I think the three tests would be better combined into one test.  Filed bug 72234 about it and posted a patch.  Tamura-san could you review please?
Comment 8 noel gordon 2011-11-14 15:43:39 PST
Examining the history of ImageFrame suggests that it is in fact an abstraction of an RGBA32 pixel
buffer.  I think it's fine to make getAddr() public, filed bug 72321 with a patch.
Comment 9 noel gordon 2011-11-16 02:04:18 PST
Related tests:
  fast/images/rgb-jpeg-endian-pixels.html
  fast/images/cmyk-jpeg-with-color-profile.html
  fast/images/gray-scale-jpeg-with-color-profile.html

Looking green now on the bots:
http://test-results.appspot.com/dashboards/flakiness_dashboard.html#tests=rgb-jpeg-endian-pixels.html%2Ccmyk-jpeg-with-color-profile.html%2Cgray-scale-jpeg-with-color-profile.html&showExpectations=true&group=%40ToT%20-%20webkit.org
Comment 10 noel gordon 2011-11-16 06:31:09 PST
JCS_EXTENSIONS is only defined if libjpeg-turbo is compiled in and Chromium is the only port
using libjpeg-turbo.  Next the endianness question ...

Chromium is designed for little endian machines only, reading from src/build/build-config.h.
http://www.google.com/codesearch#OAMlx_jo-ck/src/build/build_config.h&type=cs&l=86

Brett Wilson made the same point in his recent reply about Chromium endianness on chrome-dev@.
http://groups.google.com/a/chromium.org/group/chromium-dev/browse_thread/thread/c6c4a23a33c155f4?fwc=1&pli=1

Chrome Skia code states: "All little-endian Chrome platforms agree: BGRA is the optimal pixel layout."
http://www.google.com/codesearch#OAMlx_jo-ck/src/skia/config/SkUserConfig.h&type=cs&l=184
so a BGRX swizzle makes sense to me.

Another concern I had - the decoders m_scaled member. Talked to Peter off-line, it's enabled
by flag ENABLE(IMAGE_DECODER_DOWN_SAMPLING).  Chromium does not enable that flag.

Next step: code all the above in a WIP patch.
Comment 11 noel gordon 2011-11-16 06:51:28 PST
Created attachment 115373 [details]
Patch
Comment 12 noel gordon 2011-11-16 07:00:36 PST
Above is a WIP Patch, baking it in the queues.
Comment 13 WebKit Review Bot 2011-11-16 07:31:20 PST
Comment on attachment 115373 [details]
Patch

Attachment 115373 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/10484787

New failing tests:
tables/mozilla/bugs/bug29314.html
tables/mozilla/bugs/bug13169.html
tables/mozilla/bugs/bug10565.html
tables/mozilla/bugs/bug11026.html
fast/repaint/backgroundSizeRepaint.html
fast/repaint/block-layout-inline-children-replaced.html
fast/repaint/clipped-relative.html
fast/repaint/selected-replaced.html
tables/mozilla/bugs/bug12908-1.html
Comment 14 noel gordon 2011-11-17 04:09:13 PST
These are regressions, and I reproduced on chrome-linux/win32 locally.  The test image resources
were created by XV Version 3.10a  Rev: 12/29/94

  http://trac.webkit.org/browser/trunk/LayoutTests/tables/mozilla/images/ant.jpg
  http://trac.webkit.org/browser/trunk/LayoutTests/fast/repaint/resources/apple.jpg

They might be more than 10 years old, but they work fine with libjpeg.  The ant.jpg image shows
spot red/white pixel noise, or swathes of while pixel errors.
Comment 15 noel gordon 2011-11-17 04:12:20 PST
Created attachment 115556 [details]
Regression: red pixel noise, white swathe errors

tables/mozilla/bugs/bug12908-1.html regression.
Comment 16 noel gordon 2011-11-17 04:38:53 PST
Seems it's a libjpeg-turbo bug, a workaround is to exclude JSC_GRAYSCALE images.
Comment 17 noel gordon 2011-11-17 04:53:59 PST
Created attachment 115564 [details]
Patch - Don't swizzle grayscale JPEG images.
Comment 18 noel gordon 2011-11-17 04:58:57 PST
Created attachment 115565 [details]
Patch - Don't swizzle grayscale JPEG images.
Comment 19 Hironori Bono 2011-11-17 09:54:30 PST
Greetings,

(In reply to comment #16)
> Seems it's a libjpeg-turbo bug, a workaround is to exclude JSC_GRAYSCALE images.

Thank you for taking over this issue and bug report. Even though I'm on a trip and cannot afford to investigate this libjpeg-turbo issue, I will investigate it when I'm back from the trip.

Regards,

Hironori Bono
Comment 20 noel gordon 2011-11-18 00:32:17 PST
Hironori-san, thank you for reporting the bug - the work you did here was important.  Enjoy your trip.  I look forward to talking with you again on your return.
Comment 21 noel gordon 2011-11-18 00:37:14 PST
Created attachment 115756 [details]
Patch
Comment 22 noel gordon 2011-11-18 03:00:49 PST
Created attachment 115775 [details]
Patch
Comment 23 Kenneth Russell 2011-11-21 16:24:21 PST
Comment on attachment 115775 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=115775&action=review

Thanks for pushing this through. A little more build infrastructure work is needed.

> Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:73
> +#if PLATFORM(CHROMIUM) && USE(SKIA) && ASSUME_LITTLE_ENDIAN

This #if test is too fragile. It might be the case that Chromium stops using libjpeg-turbo. See in particular src/third_party/libjpeg_turbo/libjpeg.gyp; it looks like on Chrome OS the system JPEG library is used, which I think would result in the wrong code path being taken here. If that happened, would this code break? Alternatively, other ports might start using libjpeg-turbo.

I think this should be written as "#if USE(LIBJPEG_TURBO) && ...". Conditionally define WTF_USE_LIBJPEG_TURBO in Source/WebKit/chromium/WebKit.gyp based on a gyp variable set in libjpeg.gyp.
Comment 24 noel gordon 2011-11-25 00:05:23 PST
(In reply to comment #23)
> > Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:73
> > +#if PLATFORM(CHROMIUM) && USE(SKIA) && ASSUME_LITTLE_ENDIAN
> 
> This #if test is too fragile. It might be the case that Chromium stops using libjpeg-turbo. See in particular src/third_party/libjpeg_turbo/libjpeg.gyp; it looks like on Chrome OS the system JPEG library is used, which I think would result in the wrong code path being taken here. If that happened, would this code break? Alternatively, other ports might start using libjpeg-turbo.
>

Yes, too fragile and my try jobs broke on chrome-mac.  I had assumed libjpeg-turbo was not enabled there, and
yes, what if other ports or variants (chrome-os) want to use it?  My #if test is too restrictive.

> I think this should be written as "#if USE(LIBJPEG_TURBO) && ...". Conditionally define WTF_USE_LIBJPEG_TURBO in Source/WebKit/chromium/WebKit.gyp based on a gyp variable set in libjpeg.gyp.

If any port switches to using libjepg-turbo, it'll be via build file or other means, but #include "jpeglib.h"
will load libjepg-turbo's version of that file, which #defines JCS_EXTENSIONS.  So I think JCS_EXTENSIONS can
be used to detect libjepg-turbo?

Anyway, I've gone that way for now to continue my testing with a much simpler #if test.
Comment 26 noel gordon 2011-11-25 00:13:59 PST
Created attachment 116577 [details]
Patch
Comment 27 Hironori Bono 2011-11-25 01:09:30 PST
Greetings,

Thank you for your bug report.
This is a bug of gray->rgbx conversion in libjpeg-turbo. I will file a upstream bug and upload its fix.

Regards,

Hironori Bono

(In reply to comment #20)
> Hironori-san, thank you for reporting the bug - the work you did here was important.  Enjoy your trip.  I look forward to talking with you again on your return.
Comment 28 Kenneth Russell 2011-11-28 13:10:21 PST
Comment on attachment 116577 [details]
Patch

Looks good. Testing for JCS_EXTENSIONS does seem to be a reasonable discriminator between libjpeg-turbo and the original libjpeg. r=me
Comment 29 WebKit Review Bot 2011-11-28 14:55:48 PST
Comment on attachment 116577 [details]
Patch

Clearing flags on attachment: 116577

Committed r101286: <http://trac.webkit.org/changeset/101286>
Comment 30 WebKit Review Bot 2011-11-28 14:55:55 PST
All reviewed patches have been landed.  Closing bug.
Comment 31 noel gordon 2011-12-24 05:21:21 PST
(In reply to comment #27)
> Thank you for your bug report.
> This is a bug of gray->rgbx conversion in libjpeg-turbo. I will file a upstream bug and upload its fix.

Filed bug 75189 about the grayscale issue.