Bug 88424

Summary: Optimization in image decoding
Product: WebKit Reporter: Misha <mtutunik>
Component: ImagesAssignee: Misha <mtutunik>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, bfulgham, dglazkov, gustavo, laszlo.gombos, noam, noel.gordon, ostap73, philn, pkasting, webkit.review.bot, xan.lopez, zoltan
Priority: P2 Keywords: Performance
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Bug Depends on: 103463, 103216, 103401    
Bug Blocks:    
Attachments:
Description Flags
kcachegrind screen shot before patch
none
kcachegrind screen shot after patch
none
patch for review
gyuyoung.kim: commit-queue-
revised patch
webkit.review.bot: commit-queue-
Another attempt
none
Attempt to fix cr build
none
Revised patch again
webkit.review.bot: commit-queue-
Patch for review
none
Addressed comments
none
Addressed last comments
none
new kcachegrind screen screen shot
none
Patch
none
Patch
none
Simple benchmark for ImageDecoder.
none
Patch
bfulgham: review-
GIF part
none
JPEG part
bfulgham: review-
GIF part without inlines
none
jpeg part, turbo issue fixed.
none
Updated patch.
bfulgham: review+
Patch for landing.
none
Patch for landing
none
Patch none

Misha
Reported 2012-06-06 09:41:58 PDT
Created attachment 146045 [details] kcachegrind screen shot before patch This patch is addressing functions that are getting called for each pixel during image decoding. In such cases it's better to avoid as much as possible branching inside loops and multiplications. The improvements should be significant when image cache reaches it's limit. To simulate this I set to 1M when I did my testing. To test the patch I just scroll locally saved page back and forth under oprofile. Then I used: opreport -gdf | op2calltree and kcachegrind to see the results. For locally saved nytimes.com the difference is quite significant. I'm attaching two screen shots of kcachegrind view before and after.
Attachments
kcachegrind screen shot before patch (385.94 KB, image/png)
2012-06-06 09:41 PDT, Misha
no flags
kcachegrind screen shot after patch (451.88 KB, image/png)
2012-06-06 09:42 PDT, Misha
no flags
patch for review (12.36 KB, patch)
2012-06-06 09:50 PDT, Misha
gyuyoung.kim: commit-queue-
revised patch (12.62 KB, patch)
2012-06-06 11:27 PDT, Misha
webkit.review.bot: commit-queue-
Another attempt (12.82 KB, patch)
2012-06-07 08:40 PDT, Misha
no flags
Attempt to fix cr build (12.82 KB, patch)
2012-06-07 09:05 PDT, Misha
no flags
Revised patch again (12.38 KB, patch)
2012-06-07 10:50 PDT, Misha
webkit.review.bot: commit-queue-
Patch for review (12.37 KB, patch)
2012-06-07 15:52 PDT, Misha
no flags
Addressed comments (12.55 KB, patch)
2012-06-08 13:40 PDT, Misha
no flags
Addressed last comments (14.07 KB, patch)
2012-07-12 09:45 PDT, Misha
no flags
new kcachegrind screen screen shot (383.01 KB, image/png)
2012-07-12 09:47 PDT, Misha
no flags
Patch (15.87 KB, patch)
2012-11-20 18:21 PST, Viatcheslav Ostapenko
no flags
Patch (15.53 KB, patch)
2012-11-20 20:08 PST, Viatcheslav Ostapenko
no flags
Simple benchmark for ImageDecoder. (1.53 KB, application/zip)
2012-11-26 14:39 PST, Viatcheslav Ostapenko
no flags
Patch (12.30 KB, patch)
2012-11-26 20:08 PST, Viatcheslav Ostapenko
bfulgham: review-
GIF part (4.74 KB, patch)
2012-11-27 17:54 PST, Viatcheslav Ostapenko
no flags
JPEG part (8.25 KB, patch)
2012-11-27 17:59 PST, Viatcheslav Ostapenko
bfulgham: review-
GIF part without inlines (4.72 KB, patch)
2012-11-27 21:29 PST, Viatcheslav Ostapenko
no flags
jpeg part, turbo issue fixed. (8.24 KB, patch)
2012-11-27 23:19 PST, Viatcheslav Ostapenko
no flags
Updated patch. (8.28 KB, patch)
2012-11-28 12:53 PST, Viatcheslav Ostapenko
bfulgham: review+
Patch for landing. (8.39 KB, patch)
2012-11-29 13:59 PST, Viatcheslav Ostapenko
no flags
Patch for landing (8.61 KB, patch)
2012-11-29 16:42 PST, Viatcheslav Ostapenko
no flags
Patch (4.60 KB, patch)
2012-11-30 17:34 PST, Viatcheslav Ostapenko
no flags
Misha
Comment 1 2012-06-06 09:42:40 PDT
Created attachment 146046 [details] kcachegrind screen shot after patch
Misha
Comment 2 2012-06-06 09:50:04 PDT
Created attachment 146050 [details] patch for review
WebKit Review Bot
Comment 3 2012-06-06 09:54:50 PDT
Attachment 146050 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCor..." exit_code: 1 Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:544: Should have only a single space after a punctuation in a comment. [whitespace/comments] [5] Total errors found: 1 in 6 files If any of these errors are false positives, please file a bug against check-webkit-style.
Gyuyoung Kim
Comment 4 2012-06-06 11:03:54 PDT
Comment on attachment 146050 [details] patch for review Attachment 146050 [details] did not pass efl-ews (efl): Output: http://queues.webkit.org/results/12901894
Misha
Comment 5 2012-06-06 11:27:15 PDT
Created attachment 146071 [details] revised patch
WebKit Review Bot
Comment 6 2012-06-06 11:29:17 PDT
Attachment 146071 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCor..." exit_code: 1 Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:543: Should have only a single space after a punctuation in a comment. [whitespace/comments] [5] Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.h:34: Alphabetical sorting problem. [build/include_order] [4] Total errors found: 2 in 6 files If any of these errors are false positives, please file a bug against check-webkit-style.
WebKit Review Bot
Comment 7 2012-06-06 22:15:37 PDT
Comment on attachment 146071 [details] revised patch Attachment 146071 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/12912129
Misha
Comment 8 2012-06-07 08:40:04 PDT
Created attachment 146294 [details] Another attempt
WebKit Review Bot
Comment 9 2012-06-07 08:45:18 PDT
Attachment 146294 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCor..." exit_code: 1 Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:543: Should have only a single space after a punctuation in a comment. [whitespace/comments] [5] Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.h:34: Alphabetical sorting problem. [build/include_order] [4] Total errors found: 2 in 6 files If any of these errors are false positives, please file a bug against check-webkit-style.
Misha
Comment 10 2012-06-07 09:05:45 PDT
Created attachment 146299 [details] Attempt to fix cr build
WebKit Review Bot
Comment 11 2012-06-07 09:10:25 PDT
Attachment 146299 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCor..." exit_code: 1 Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:543: Should have only a single space after a punctuation in a comment. [whitespace/comments] [5] Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.h:34: Alphabetical sorting problem. [build/include_order] [4] Total errors found: 2 in 6 files If any of these errors are false positives, please file a bug against check-webkit-style.
Misha
Comment 12 2012-06-07 10:50:47 PDT
Created attachment 146322 [details] Revised patch again
WebKit Review Bot
Comment 13 2012-06-07 14:44:58 PDT
Comment on attachment 146322 [details] Revised patch again Attachment 146322 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/12907749
Misha
Comment 14 2012-06-07 15:52:55 PDT
Created attachment 146402 [details] Patch for review
Zoltan Horvath
Comment 15 2012-06-08 11:03:53 PDT
Comment on attachment 146402 [details] Patch for review View in context: https://bugs.webkit.org/attachment.cgi?id=146402&action=review > Source/WebCore/platform/image-decoders/ImageDecoder.h:168 > + inline PixelData* data() const { return (PixelData*) m_bitmap.bitmap().getPixels(); } You shouldn't use C style casts. > Source/WebCore/platform/image-decoders/gif/GIFImageDecoder.cpp:233 > + ImageFrame::PixelData* curAddr = buffer.data() + yBegin * buffer.width() + xBegin; You shouldn't use abbreviation. Just write currentAddress e.g. > Source/WebCore/platform/image-decoders/gif/GIFImageDecoder.cpp:252 > + curAddr++; I prefer ++ as prefix. > Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:517 > + ImageFrame::PixelData* curAddr = buffer.data() + destY * buffer.width(); Same naming here. And several other places. > Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:599 > + bool ret = true; We should also find a better name for this. e.g. isScanningSuccessful, or something talkative. > Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.h:64 > + bool outputScanlinesRGB(ImageFrame& buffer); The difference between the the two functions is only the 'for' loop inside. Can't we just make 1 function with an additional parameter?
Misha
Comment 16 2012-06-08 13:37:31 PDT
Thanks for comments. Will address them, but I don't think I agree with last one. The purpose of the patch is to reduce branching for the places that got invoked a lot. Combining two functions into the one with parameter will give us pretty much the same code as we have currently in the trunk. This parameter will be checked for every row which is not optimal. (In reply to comment #15) > (From update of attachment 146402 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=146402&action=review > > > Source/WebCore/platform/image-decoders/ImageDecoder.h:168 > > + inline PixelData* data() const { return (PixelData*) m_bitmap.bitmap().getPixels(); } > > You shouldn't use C style casts. > > > Source/WebCore/platform/image-decoders/gif/GIFImageDecoder.cpp:233 > > + ImageFrame::PixelData* curAddr = buffer.data() + yBegin * buffer.width() + xBegin; > > You shouldn't use abbreviation. Just write currentAddress e.g. > > > Source/WebCore/platform/image-decoders/gif/GIFImageDecoder.cpp:252 > > + curAddr++; > > I prefer ++ as prefix. > > > Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:517 > > + ImageFrame::PixelData* curAddr = buffer.data() + destY * buffer.width(); > > Same naming here. And several other places. > > > Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:599 > > + bool ret = true; > > We should also find a better name for this. e.g. isScanningSuccessful, or something talkative. > > > Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.h:64 > > + bool outputScanlinesRGB(ImageFrame& buffer); > > The difference between the the two functions is only the 'for' loop inside. Can't we just make 1 function with an additional parameter?
Misha
Comment 17 2012-06-08 13:40:54 PDT
Created attachment 146642 [details] Addressed comments
Peter Kasting
Comment 18 2012-06-13 13:25:18 PDT
(In reply to comment #16) > Thanks for comments. Will address them, but I don't think I agree with last one. The purpose of the patch is to reduce branching for the places that got invoked a lot. Combining two functions into the one with parameter will give us pretty much the same code as we have currently in the trunk. This parameter will be checked for every row which is not optimal. How about this then: Make the function templated with a bool or enum argument, and write the conditional as checking the template argument. This means the conditional result is fixed at compile-time and the compiler can presumably optimize it out. Double-check with your profiler that this doesn't cost perf, and write a comment about why you're doing this. The goal here is to get the perf win while maintaining the significant clarity and maintenance benefits of the current code.
Misha
Comment 19 2012-07-12 09:45:41 PDT
Created attachment 151976 [details] Addressed last comments
Misha
Comment 20 2012-07-12 09:47:16 PDT
Created attachment 151977 [details] new kcachegrind screen screen shot
Peter Kasting
Comment 21 2012-07-12 10:38:06 PDT
Comment on attachment 151976 [details] Addressed last comments View in context: https://bugs.webkit.org/attachment.cgi?id=151976&action=review > Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:555 > +template <bool isCMYK> > +void setScanLines(ImageFrame& buffer, int x, ImageFrame::PixelData* currentAddress, JSAMPARRAY samples, int column); How about: template <bool isCMYK> void setScanLines(ImageFrame& buffer, int x, ImageFrame::PixelData* currentAddress, JSAMPARRAY samples, int column) { JSAMPLE* jsample = *samples + column * (isCMYK ? 4 : 3); unsigned r = jsample[0]; unsigned g = jsample[1]; unsigned b = jsample[2]; if (isCMYK) { // Source is 'Inverted CMYK', output is RGB. // See: http://www.easyrgb.com/math.php?MATH=M12#text12 // Or: http://www.ilkeratalay.com/colorspacesfaq.php#rgb // From CMYK to CMY: // X = X * (1 - K ) + K [for X = C, M, or Y] // Thus, from Inverted CMYK to CMY is: // X = (1-iX) * (1 - (1-iK)) + (1-iK) => 1 - iX*iK // From CMY (0..1) to RGB (0..1): // R = 1 - C => 1 - (1 - iC*iK) => iC*iK [G and B similar] unsigned k = jsample[3]; r = r * k / 255; g = g * k / 255; b = b * k / 255; } buffer.setRGBA(currentAddress, r, g, b, 0xFF); } > Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:583 > +template <bool isScaled> int getColumn(Vector<int> columns, int x); How about: template <bool isScaled> int getColumn(Vector<int> columns, int x) { return isScaled ? columns[x] : x; } > Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:658 > + ASSERT(!m_scaled); > + while (info->output_scanline < info->output_height) { > + unsigned char* row = reinterpret_cast<unsigned char*>(buffer.getAddr(0, info->output_scanline)); > + if (jpeg_read_scanlines(info, &row, 1) != 1) > + return false; The additional indenting here is wrong. > Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:674 > + case JCS_RGB: How about: // outputScanlines() uses template arguments to fix the comparison values at // compile time so the compiler can avoid generating code with branches. if (info->out_color_space == JCS_RGB) return m_scaled ? outputScanlines<false, true>(buffer) : outputScanlines<false, false>(buffer); if (info->out_color_space == JCS_CMYK) return m_scaled ? outputScanlines<true, true>(buffer) : outputScanlines<true, false>(buffer); ASSERT_NOT_REACHED(); return setFailed();
Viatcheslav Ostapenko
Comment 22 2012-11-20 18:21:52 PST
WebKit Review Bot
Comment 23 2012-11-20 19:08:15 PST
Comment on attachment 175317 [details] Patch Attachment 175317 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/14933541 New failing tests: inspector-protocol/debugger-terminate-dedicated-worker-while-paused.html
Zoltan Horvath
Comment 24 2012-11-20 19:37:38 PST
Comment on attachment 175317 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=175317&action=review Some very little nits below. Otherwise looks okay for me. > Source/WebCore/ChangeLog:3 > + Optimization in image decoding I would use a more accurate name as a bug title. Something like: Reduce branching and multiplications in image decoding loops and functions > Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:718 > + - Extra line. > Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:791 > + - Extra line.
Viatcheslav Ostapenko
Comment 25 2012-11-20 20:08:55 PST
Adam Barth
Comment 26 2012-11-21 00:14:21 PST
Comment on attachment 175328 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=175328&action=review > Source/WebCore/ChangeLog:10 > + Optimization in image decoding. > + Reduce branching and multiplications in image decoding loops and functions. > + Rebase and update of original patch by Misha Tyutyunik <michael.tyuytunik@nokia.com> Do you have data on which benchmarks this improves and by how much?
Viatcheslav Ostapenko
Comment 27 2012-11-21 07:07:19 PST
(In reply to comment #26) > (From update of attachment 175328 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=175328&action=review > > > Source/WebCore/ChangeLog:10 > > + Optimization in image decoding. > > + Reduce branching and multiplications in image decoding loops and functions. > > + Rebase and update of original patch by Misha Tyutyunik <michael.tyuytunik@nokia.com> > > Do you have data on which benchmarks this improves and by how much? This improved loading time a bit on a low end device for pages with big amount of images, but it was some kind of side effect. Originally this patch tried to solve the problem of very slow update during scrolling of the page with big number of images on the device with limited image cache. Profiling showed that hotspot in this case was image decoding. After implementing this patch tile updates became visually faster and image decoding functions moved from top of kcachegrind below the painting functions. To make profiling of image decoding functions easier image cache was set to the very low limit. Large page was loaded and scrolled up/down under profiler. kcachegrind screenshots are attached.
noel gordon
Comment 28 2012-11-26 05:16:13 PST
(In reply to comment #27) I have no way to reproduce the kcachegrind images, nor do I understand what they mean when I compare them. All Comment #1 mentioned was ... > This patch is addressing functions that are getting called for each pixel during image decoding. In such cases it's better to avoid as much as possible branching inside loops and multiplications. Assuming an image from the browser cache, the image decoding cost is 1) the decoder library API time, and 2) the time to write the decoded rows to the frame buffer. The patch appears to be dealing with 2). I was curious to find out how much time 2) takes when decoding PNG images. See bug 103216 where the results suggest that the row write time is ~15% of the PNG image decoding cost. The row writer calls buffer.setRGBA(x,y), which requires an integer multiply-add for each pixel x in row y. Since row pixels are written in order, x in [0..row.width) typically, it might be faster to write to the frame buffer *address++ for each x. I tried that on bug 103216 and the answer is yes - the overall PNG decoding time is reduced by a few percent. Seems those multiply-adds on each pixel write do slow decoding more than necessary. I plan to submit the patch on bug 103216 to address PNG. A similar patch would work for GIF, and I'll do that next.
Viatcheslav Ostapenko
Comment 29 2012-11-26 07:26:52 PST
(In reply to comment #28) > (In reply to comment #27) > > The row writer calls buffer.setRGBA(x,y), which requires an integer multiply-add for each pixel x in row y. Since row pixels are written in order, x in [0..row.width) typically, it might be faster to write to the frame buffer *address++ for each x. I tried that on bug 103216 and the answer is yes - the overall PNG decoding time is reduced by a few percent. Actually, that's the part of what patch does. It replaces call to setRGBA(x, y) with setRGBA(address) which would be inlined by compiler. This patch does the same as 103216 for all image formats and has some additional optimizations. They way to reproduce cache grid is to limit image cache in order to force continuous decoding of images during page scrolling. Without this patch image decoding functions were at the top meaning that they were using more CPU time than other functions (were hot spots), but after this patch they disappeared from cachegrind top meaning that they don't use so much time anymore. It is difficult to say exactly how much improvement is achieved, but moving from 16% CPU usage to about 1% I would consider significant.
Brent Fulgham
Comment 30 2012-11-26 10:24:06 PST
Comment on attachment 175328 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=175328&action=review > Source/WebCore/platform/image-decoders/ImageDecoder.h:187 > + inline PixelData* data() const { return m_bytes; } Do we need the 'inline' keyword here? I thought that it was unnecessary if the implementation was inline. > Source/WebCore/platform/image-decoders/ImageDecoder.h:188 > + inline int width() const { return m_size.width(); } Ditto. > Source/WebCore/platform/image-decoders/ImageDecoder.h:189 > + inline int height() const { return m_size.height(); } Ditto. > Source/WebCore/platform/image-decoders/ImageDecoder.h:191 > + inline PixelData* data() const { return static_cast<PixelData*>(m_bitmap.bitmap().getPixels()); } Ditto.
Brent Fulgham
Comment 31 2012-11-26 10:28:17 PST
(In reply to comment #29) > (In reply to comment #28) > > (In reply to comment #27) > > > > The row writer calls buffer.setRGBA(x,y), which requires an integer multiply-add for each pixel x in row y. Since row pixels are written in order, x in [0..row.width) typically, it might be faster to write to the frame buffer *address++ for each x. I tried that on bug 103216 and the answer is yes - the overall PNG decoding time is reduced by a few percent. > > Actually, that's the part of what patch does. It replaces call to setRGBA(x, y) with setRGBA(address) which would be inlined by compiler. This patch does the same as 103216 for all image formats and has some additional optimizations. > > They way to reproduce cache grid is to limit image cache in order to force continuous decoding of images during page scrolling. Without this patch image decoding functions were at the top meaning that they were using more CPU time than other functions (were hot spots), but after this patch they disappeared from cachegrind top meaning that they don't use so much time anymore. It is difficult to say exactly how much improvement is achieved, but moving from 16% CPU usage to about 1% I would consider significant. I would like to use this bug as an overarching bug to address these issues, and land the individual implementations (PNG already covered by the new 103216 bug), JPEG, etc. Noel, can you update your existing patch to include the inlining of the width/height and mention this bug in your ChangeLog?
Viatcheslav Ostapenko
Comment 32 2012-11-26 14:39:23 PST
Created attachment 176070 [details] Simple benchmark for ImageDecoder.
Viatcheslav Ostapenko
Comment 33 2012-11-26 14:47:36 PST
(In reply to comment #32) > Created an attachment (id=176070) [details] > Simple benchmark for ImageDecoder. I've created simple benchmark, that loads file specified in command line and decodes it 1000 times. Results for couple images: Before patch: time ./ImageDecoderPerf img.jpg Done real 0m14.184s user 0m14.109s sys 0m0.048s time ./ImageDecoderPerf img.png Done real 0m49.146s user 0m48.967s sys 0m0.056s After patch: time ./ImageDecoderPerf img.jpg Done real 0m13.254s user 0m13.177s sys 0m0.048s ./ImageDecoderPerf img.png Done real 0m47.481s user 0m47.343s sys 0m0.048s For JPEG decoder improvement is about 6.6% and for PNG is about 3.4% from total decoding time.
Brent Fulgham
Comment 34 2012-11-26 16:45:31 PST
(In reply to comment #33) > (In reply to comment #32) > > Created an attachment (id=176070) [details] [details] > > Simple benchmark for ImageDecoder. > > I've created simple benchmark, that loads file specified in command line and decodes it 1000 times. > [...] > > For JPEG decoder improvement is about 6.6% and for PNG is about 3.4% from total decoding time. Very nice! I know this issue has gone through a number of iterations, and you are probably tired of all of this back-and-forth. Thank you for your patience. :-) Would you mind creating a new bug for your image decoder utility? We could make it part of the Tools directory, and could add build rules for all of the various platforms to allow similar benchmarks on all of our targets. Also, could you please rebaseline the patch after 103216 closes out since it incorporates several of the changes you propose in your patch?
Brent Fulgham
Comment 35 2012-11-26 16:47:03 PST
Comment on attachment 175328 [details] Patch Much of this patch is being committed as part of Bug 103216. r- so that you can rebaseline after that change.
noel gordon
Comment 36 2012-11-26 16:54:10 PST
(In reply to comment #33) > I've created simple benchmark, that loads file specified in command line and decodes it 1000 times. > ... > For JPEG decoder improvement is about 6.6% and for PNG is about 3.4% from total decoding time. Debugging sucks, testing is better.
noel gordon
Comment 37 2012-11-26 17:02:42 PST
(In reply to comment #31) > I would like to use this bug as an overarching bug to address these issues, and land the individual implementations (PNG already covered by the new 103216 bug), JPEG, etc. Agree. Seems the original patch author no longer works on webkit, so we need to retest these changes. > Noel, can you update your existing patch to include the inlining of the width/height and mention this bug in your ChangeLog? Let's do that last and in a separate bug given the PNG patch is now cq+.
Brent Fulgham
Comment 38 2012-11-26 17:14:12 PST
(In reply to comment #37) > (In reply to comment #31) > > Noel, can you update your existing patch to include the inlining of the width/height and mention this bug in your ChangeLog? > > Let's do that last and in a separate bug given the PNG patch is now cq+. Sounds good. Looks like the patch failed to apply for some reason...
Brent Fulgham
Comment 39 2012-11-26 17:20:52 PST
(In reply to comment #33) > time ./ImageDecoderPerf img.jpg > Done > > real 0m13.254s > user 0m13.177s > sys 0m0.048s It goes without saying (of course), but we should really assess performance using gprof, Instruments, Valgrind, or similar. This is a good first approximation, but all kinds of things might affect these values, especially if we are in the 0-5% difference range. We might be better off alternating the two implementations in a loop and get a mean/std. deviation of the run times to get a better feel for the change.
Viatcheslav Ostapenko
Comment 40 2012-11-26 18:21:11 PST
(In reply to comment #39) > (In reply to comment #33) > > > time ./ImageDecoderPerf img.jpg > > Done > > > > real 0m13.254s > > user 0m13.177s > > sys 0m0.048s > > It goes without saying (of course), but we should really assess performance using gprof, Instruments, Valgrind, or similar. oprofile/kcachegrind screenshots are attached from the very beginning half year old. > This is a good first approximation, but all kinds of things might affect these values, especially if we are in the 0-5% difference range. That's on total image decoding, including decoder work itself and on intel desktop CPUs with good branch prediction. This think was tested on ARM and difference was bigger. > We might be better off alternating the two implementations in a loop and get a mean/std. deviation of the run times to get a better feel for the change. Was it done here: https://bugs.webkit.org/show_bug.cgi?id=103216 ?
Viatcheslav Ostapenko
Comment 41 2012-11-26 18:28:20 PST
(In reply to comment #35) > (From update of attachment 175328 [details]) > Much of this patch is being committed as part of Bug 103216. r- so that you can rebaseline after that change. Not much(In reply to comment #36) > (In reply to comment #33) > > > I've created simple benchmark, that loads file specified in command line and decodes it 1000 times. > > ... > > For JPEG decoder improvement is about 6.6% and for PNG is about 3.4% from total decoding time. > > Debugging sucks, testing is better. What do you mean? Have you looked into benchmarked I attached?
Viatcheslav Ostapenko
Comment 42 2012-11-26 18:29:30 PST
(In reply to comment #37) > (In reply to comment #31) > > I would like to use this bug as an overarching bug to address these issues, and land the individual implementations (PNG already covered by the new 103216 bug), JPEG, etc. > > Agree. Seems the original patch author no longer works on webkit, so we need to retest these changes. Test is attached. Isn't it? > > Noel, can you update your existing patch to include the inlining of the width/height and mention this bug in your ChangeLog? > > Let's do that last and in a separate bug given the PNG patch is now cq+. What a rush!
Viatcheslav Ostapenko
Comment 43 2012-11-26 18:50:42 PST
(In reply to comment #34) > (In reply to comment #33) > > (In reply to comment #32) > > > Created an attachment (id=176070) [details] [details] [details] > > > Simple benchmark for ImageDecoder. > > > > I've created simple benchmark, that loads file specified in command line and decodes it 1000 times. > > > [...] > > > > For JPEG decoder improvement is about 6.6% and for PNG is about 3.4% from total decoding time. > > Very nice! > > I know this issue has gone through a number of iterations, and you are probably tired of all of this back-and-forth. Thank you for your patience. :-) > > Would you mind creating a new bug for your image decoder utility? We could make it part of the Tools directory, and could add build rules for all of the various platforms to allow similar benchmarks on all of our targets. > > Also, could you please rebaseline the patch after 103216 closes out since it incorporates several of the changes you propose in your patch? Not all platforms export those symbols. EFL normally also doesn't, but it has special dev build with separate libraries and symbols exported, which I used for testing. Normally it wouldn't work.
Viatcheslav Ostapenko
Comment 44 2012-11-26 20:08:41 PST
Brent Fulgham
Comment 45 2012-11-27 12:38:18 PST
Comment on attachment 175328 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=175328&action=review > Source/WebCore/platform/image-decoders/ImageDecoder.h:163 > inline void setRGBA(PixelData* dest, unsigned r, unsigned g, unsigned b, unsigned a) I wonder if we could use pkasting's idea here, and make this a template that encodes the decision about premultiplication and bounds-checking of the alpha term. This would remove two branches, the bounds check, and possibly the calculation of the color band scaling factor.
Viatcheslav Ostapenko
Comment 46 2012-11-27 13:12:22 PST
(In reply to comment #45) > (From update of attachment 175328 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=175328&action=review > > > Source/WebCore/platform/image-decoders/ImageDecoder.h:163 > > inline void setRGBA(PixelData* dest, unsigned r, unsigned g, unsigned b, unsigned a) > > I wonder if we could use pkasting's idea here, and make this a template that encodes the decision about premultiplication and bounds-checking of the alpha term. This would remove two branches, the bounds check, and possibly the calculation of the color band scaling factor. Yes, I'm planning another commit for this.
Brent Fulgham
Comment 47 2012-11-27 16:56:12 PST
Comment on attachment 176155 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=176155&action=review As I mentioned previously, I'd like to land the GIF part first, and the JPEG part separately. Could you please revise this patch to just be the GIF part (and the inlining), and save the JPEG for a second step? Also, I think we can avoid exporting more symbols by just using the getAddr call (rather than making width() and height() public). Let's do this, unless you have a strong reason for wanting to expose them. r- for these reasons. > Source/WebCore/platform/image-decoders/ImageDecoder.h:-184 > - private: Can we please keep width() and height() private (see below comment). > Source/WebCore/platform/image-decoders/gif/GIFImageDecoder.cpp:233 > + ImageFrame::PixelData* currentAddress = buffer.data() + yBegin * buffer.width() + xBegin; Note: As Noel Gordon points out, we can avoid exposing the width and height accessors entirely by using the getAddr method: ImageFrame::PixelData* address = buffer.getAddr(0, y);
Brent Fulgham
Comment 48 2012-11-27 16:58:12 PST
Viatcheslav: I split your profiler class into a separate bug (see 103463).
Viatcheslav Ostapenko
Comment 49 2012-11-27 17:54:00 PST
Created attachment 176377 [details] GIF part
Viatcheslav Ostapenko
Comment 50 2012-11-27 17:59:33 PST
Created attachment 176379 [details] JPEG part
Brent Fulgham
Comment 51 2012-11-27 21:25:10 PST
Comment on attachment 176377 [details] GIF part Let's get rid of the inlines that were done as part of https://bugs.webkit.org/show_bug.cgi?id=103463.
Viatcheslav Ostapenko
Comment 52 2012-11-27 21:29:02 PST
Created attachment 176393 [details] GIF part without inlines
Brent Fulgham
Comment 53 2012-11-27 21:32:25 PST
Isn't that just the same patch you submitted previously?
Viatcheslav Ostapenko
Comment 54 2012-11-27 21:34:29 PST
(In reply to comment #53) > Isn't that just the same patch you submitted previously? I removed inline before width/height.
Brent Fulgham
Comment 55 2012-11-27 21:41:08 PST
Comment on attachment 176393 [details] GIF part without inlines Looks good, aside from the code that just got landed in https://bugs.webkit.org/show_bug.cgi?id=103401. I'll fix it up and land.
Brent Fulgham
Comment 56 2012-11-27 21:42:13 PST
Brent Fulgham
Comment 57 2012-11-27 21:42:36 PST
No, webkit-patch! This is not closed yet.
Viatcheslav Ostapenko
Comment 58 2012-11-27 21:49:36 PST
(In reply to comment #57) > No, webkit-patch! This is not closed yet. That's strange. Usually it is able to detect, that not all patches were landed.
Brent Fulgham
Comment 59 2012-11-27 22:00:04 PST
Comment on attachment 176379 [details] JPEG part View in context: https://bugs.webkit.org/attachment.cgi?id=176379&action=review > Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:657 > + switch (J_COLOR_SPACE(colorSpace)) { Could this switch be handled as template specializations to avoid branching? Would it even make a difference in performance?
Viatcheslav Ostapenko
Comment 60 2012-11-27 22:15:36 PST
Comment on attachment 176379 [details] JPEG part View in context: https://bugs.webkit.org/attachment.cgi?id=176379&action=review >> Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:657 >> + switch (J_COLOR_SPACE(colorSpace)) { > > Could this switch be handled as template specializations to avoid branching? Would it even make a difference in performance? It is actually. colorSpace is parameter of template. It makes switch on constant and compiler optimizes it out. Originally it was several template specializations in 1st version of patch, but Peter Kasting suggested just to use if/switch as in many other places in code. We've tried and it worked - compiler creates several versions of function killing unreachable branches.
Brent Fulgham
Comment 61 2012-11-27 22:28:49 PST
(In reply to comment #60) > (From update of attachment 176379 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=176379&action=review > > >> Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:657 > >> + switch (J_COLOR_SPACE(colorSpace)) { > > > > Could this switch be handled as template specializations to avoid branching? Would it even make a difference in performance? > > It is actually. colorSpace is parameter of template. It makes switch on constant and compiler optimizes it out. > Originally it was several template specializations in 1st version of patch, but Peter Kasting suggested just to use if/switch as in many other places in code. We've tried and it worked - compiler creates several versions of function killing unreachable branches. Oh, excellent!
Brent Fulgham
Comment 62 2012-11-27 22:32:07 PST
Comment on attachment 176393 [details] GIF part without inlines Clearing these flags since it is already landed, so that it does not show up in searches.
Brent Fulgham
Comment 63 2012-11-27 23:12:35 PST
Comment on attachment 176379 [details] JPEG part View in context: https://bugs.webkit.org/attachment.cgi?id=176379&action=review Based on discussion on IRC with ostap and noel, this current patch is missing some new functionality from the libjpeg-turbo update. > Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:652 > + JSAMPLE* jsample = *samples + column * (J_COLOR_SPACE(colorSpace) == JCS_CMYK ? 4 : 3); As Noel pointed out on IRC, this breaks libjpeg-turbo. See http://trac.webkit.org/browser/trunk/Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp?annotate=blame&rev=135976#L706 for details of the break. More specifically: JSAMPLE* jsample = *samples + (m_scaled ? m_scaledColumns[x] : x) * ((info->out_color_space == JCS_RGB) ? 3 : 4); > Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:-706 > - JSAMPLE* jsample = *samples + (m_scaled ? m_scaledColumns[x] : x) * ((info->out_color_space == JCS_RGB) ? 3 : 4); This line is not dealt with properly in this change.
Viatcheslav Ostapenko
Comment 64 2012-11-27 23:19:10 PST
Created attachment 176408 [details] jpeg part, turbo issue fixed.
noel gordon
Comment 65 2012-11-28 05:55:20 PST
Comment on attachment 176408 [details] jpeg part, turbo issue fixed. View in context: https://bugs.webkit.org/attachment.cgi?id=176408&action=review > Source/WebCore/ChangeLog:19 > + * platform/image-decoders/jpeg/JPEGImageDecoder.cpp: > + (WebCore): > + (WebCore::setScanLines): > + (WebCore::JPEGImageDecoder::outputScanlines): > + * platform/image-decoders/jpeg/JPEGImageDecoder.h: > + (JPEGImageDecoder): This area of the ChangeLog needs comments. > Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:650 > +void setScanLines(ImageFrame& buffer, ImageFrame::PixelData* currentAddress, JSAMPARRAY samples, int column) setScanLines? Perhaps setPixel would be a better function name? > Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:652 > + JSAMPLE* jsample = *samples + column * (J_COLOR_SPACE(colorSpace) == JCS_RGB ? 3 : 4); This fixes the libjpeg-turbo error mentioned on IRC with the previous patch, thank you. The bug was not obvious. Peter mentioned something about this in Comment #18 (see below). The J_COLOR_SPACE() is not needed here, I believe: write it as JSAMPLE* jsample = *samples + column * (colorSpace == JCS_RGB ? 3 : 4); I use the Clang C++ compiler: it does not complain about this comparison. Peter's Comment #18 > The goal here is to get the perf win while maintaining the significant clarity and maintenance benefits of the current code. I would add that the clarity he seeks also helps us analyse this code in terms of security. The image decoder code is security sensitive code. Too many templates tend to obfuscate the code, making security much harder to assess. With that in mind ... > Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:655 > + unsigned r = jsample[0]; > + unsigned g = jsample[1]; > + unsigned b = jsample[2]; I would prefer we got rid of these temporary variables ... > Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:682 > + switch (J_COLOR_SPACE(colorSpace)) { > +#if defined(TURBO_JPEG_RGB_SWIZZLE) > + case JCS_EXT_BGRA: > + std::swap(r, b); > + break; > + case JCS_EXT_RGBA: // Fallback to JSC_RGB case here. > +#endif > + case JCS_RGB: > + // Empty to avoid compiler warning about unhandled case in switch. > + break; > + case JCS_CMYK: > + // Source is 'Inverted CMYK', output is RGB. > + // See: http://www.easyrgb.com/math.php?MATH=M12#text12 > + // Or: http://www.ilkeratalay.com/colorspacesfaq.php#rgb > + // From CMYK to CMY: > + // X = X * (1 - K ) + K [for X = C, M, or Y] > + // Thus, from Inverted CMYK to CMY is: > + // X = (1-iX) * (1 - (1-iK)) + (1-iK) => 1 - iX*iK > + // From CMY (0..1) to RGB (0..1): > + // R = 1 - C => 1 - (1 - iC*iK) => iC*iK [G and B similar] > + unsigned k = jsample[3]; > + r = r * k / 255; > + g = g * k / 255; > + b = b * k / 255; > + break; > + } ... and instead used the if statements from the original code here: if (colorSpace == JCS_RGB) buffer.setRGBA(currentAddress, jsample[0], jsample[1], jsample[2], 0xFF); #if defined(TURBO_JPEG_RGB_SWIZZLE) else if (colorSpace == JCS_EXT_RGBA) buffer.setRGBA(currentAddress, jsample[0], jsample[1], jsample[2], 0xFF); else if (colorSpace == JCS_EXT_BGRA) buffer.setRGBA(currentAddress, jsample[2], jsample[1], jsample[0], 0xFF); #endif else if (colorSpace == JCS_CMYK) { // long comment about cmyk ... unsigned k = jsample[3]; buffer.setRGBA(currentAddress, jsample[0] * k / 255, jsample[1] * k / 255, jsample[2] * k / 255, 0xFF); } } It's shorter, and easier to read and verify against the previous code. This function is templated on colorSpace so the compiler will write a sensible template function for each colorSpace value as you noted in Comment #60. > Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:688 > +template <int colorSpace, bool isScaled> > +bool JPEGImageDecoder::outputScanlines(ImageFrame& buffer) > +{ This is a member function, but we pass its frame buffer member as an argument to this member function! That seems somewhat odd to me. Why is this not bool JPEGImageDecoder::outputScanlines() with normal access to the buffer member in the function body? Maybe your profiling detected a difference? I'm not sure: the ChangeLog doesn't mention it. > Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:705 > +#if USE(QCMSLIB) > + if (m_reader->colorTransform() && info->out_color_space == JCS_RGB) This function is templated on colorSpace, so why use info->out_color_space here? I think you meant to write if (m_reader->colorTransform() && colorSpace == JCS_RGB) That would allow the compiler to template specialize this code even more. > Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:711 > + int width = m_scaled ? m_scaledColumns.size() : info->output_width; This function is templated on isScaled, yet you use m_scaled here. Again, I think you meant to write this in terms of the template parameter int width = isScaled ? m_scaledColumns.size() : info->output_width; Also, why recompute width every time through this loop? It's a loop invariant, right? I'd move it out of the loop body. > Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:713 > + for (int x = 0; x < width; ++x) { > + setScanLines<colorSpace>(buffer, currentAddress, m_reader->samples(), isScaled ? m_scaledColumns[x] : x); Use samples instead of m_reader->samples() here. A post increment on currentAddress might also be useful: that'd remove 3 more lines. > Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:724 > +template <int colorSpace> > +bool JPEGImageDecoder::outputScanlines(ImageFrame& buffer) > +{ > + return m_scaled ? outputScanlines<colorSpace, true>(buffer) : outputScanlines<colorSpace, false>(buffer); > +} This seems unnecessary. You could inline it I suppose, but I'd remove this function (saving a function call) and use it's body at the call site. Peter suggested this in Comment #21 and even provided the prototype code ... > Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:764 > + bool isScanSuccessful = true; ... so let's replace these changes here and below with Peter's suggestion: // outputScanlines() uses template arguments to fix the comparison values at compile-time // so the compiler can avoid generating code with branches. if (info->out_color_space == JCS_RGB) return m_scaled ? outputScanlines<JCS_RGB, true>(buffer) : outputScanlines<JCS_RGB, false>(buffer); #if defined(TURBO_JPEG_RGB_SWIZZLE) if (info->out_color_space == JCS_EXT_RGBA) return m_scaled ? outputScanlines<JCS_EXT_RGBA, true>(buffer) : outputScanlines<JCS_EXT_RGBA, false>(buffer); if (info->out_color_space == JCS_EXT_BGRA) return m_scaled ? outputScanlines<JCS_EXT_BGRA, true>(buffer) : outputScanlines<JCS_EXT_BGRA, false>(buffer); #endif if (info->out_color_space == JCS_CMYK) return m_scaled ? outputScanlines<JCS_CMYK, true>(buffer) : outputScanlines<JCS_CMYK, false>(buffer); ASSERT_NOT_REACHED(); return setFailed(); } Works for me, thank you Peter.
Viatcheslav Ostapenko
Comment 66 2012-11-28 11:52:00 PST
Comment on attachment 176408 [details] jpeg part, turbo issue fixed. View in context: https://bugs.webkit.org/attachment.cgi?id=176408&action=review >> Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:688 >> +{ > > This is a member function, but we pass its frame buffer member as an argument to this member function! That seems somewhat odd to me. Why is this not bool JPEGImageDecoder::outputScanlines() with normal access to the buffer member in the function body? Maybe your profiling detected a difference? I'm not sure: the ChangeLog doesn't mention it. IMHO, it is odd to write every time m_frameBufferCache[0]. This function is called once for the whole decoding and it makes no difference from performance point of view. >> Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:724 >> +} > > This seems unnecessary. You could inline it I suppose, but I'd remove this function (saving a function call) and use it's body at the call site. Peter suggested this in Comment #21 and even provided the prototype code ... I prefer it this way to avoid writing repeating code. Also this function is a place holder for some other code in future commits. >> Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:764 >> + bool isScanSuccessful = true; > > ... so let's replace these changes here and below with Peter's suggestion: > > // outputScanlines() uses template arguments to fix the comparison values at compile-time > // so the compiler can avoid generating code with branches. > > if (info->out_color_space == JCS_RGB) > return m_scaled ? outputScanlines<JCS_RGB, true>(buffer) : outputScanlines<JCS_RGB, false>(buffer); > #if defined(TURBO_JPEG_RGB_SWIZZLE) > if (info->out_color_space == JCS_EXT_RGBA) > return m_scaled ? outputScanlines<JCS_EXT_RGBA, true>(buffer) : outputScanlines<JCS_EXT_RGBA, false>(buffer); > if (info->out_color_space == JCS_EXT_BGRA) > return m_scaled ? outputScanlines<JCS_EXT_BGRA, true>(buffer) : outputScanlines<JCS_EXT_BGRA, false>(buffer); > #endif > if (info->out_color_space == JCS_CMYK) > return m_scaled ? outputScanlines<JCS_CMYK, true>(buffer) : outputScanlines<JCS_CMYK, false>(buffer); > > ASSERT_NOT_REACHED(); > return setFailed(); > } > > Works for me, thank you Peter. I prefer to use switch instead of if chains because it looks cleaner and usually is better optimized by compiler.
Viatcheslav Ostapenko
Comment 67 2012-11-28 12:53:37 PST
Created attachment 176555 [details] Updated patch.
Brent Fulgham
Comment 68 2012-11-29 09:52:39 PST
Comment on attachment 176555 [details] Updated patch. View in context: https://bugs.webkit.org/attachment.cgi?id=176555&action=review I think this looks really good. I had a couple of questions/comments, none of which are critical. > Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:652 > + JSAMPLE* jsample = *samples + column * (J_COLOR_SPACE(colorSpace) == JCS_RGB ? 3 : 4); Noel pointed out that (at least under Clang) that this can just be: JSAMPLE* jsample = *samples + column * (colorSpace == JCS_RGB ? 3 : 4); Does that not work on your compiler? > Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:674 > + unsigned k = jsample[3]; It really seems like making "k = sample[3] / 255" (and getting rid of the three divides below) would be better, but I don't know offhand if sample[3] is floating point or not. If it is, that might change the result of the division/multiply combinations. As Noel pointed out, Clang might be emitting code that caches the k/255 so we don't pay the division cost three times, so it may be a moot point. > Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:705 > + int width = isScaled ? m_scaledColumns.size() : info->output_width; I think Noel made a good point about the width calculation. Isn't it a loop invariant, so that it could be defined outside the loop? Or does m_scaledColumns or info->output_width change based on the calls to jpeg_read_scanlines or something? Could you please clarify (or change the code)? > Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:761 > + // That is why use teplate and template specializations here so nit: "That is why *WE* use *TEMPLATE* ..." > Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:762 > + // the proper code will be generated in compile time. nit: ".. at compile time"
Viatcheslav Ostapenko
Comment 69 2012-11-29 11:37:04 PST
Comment on attachment 176555 [details] Updated patch. View in context: https://bugs.webkit.org/attachment.cgi?id=176555&action=review >> Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:652 >> + JSAMPLE* jsample = *samples + column * (J_COLOR_SPACE(colorSpace) == JCS_RGB ? 3 : 4); > > Noel pointed out that (at least under Clang) that this can just be: > > JSAMPLE* jsample = *samples + column * (colorSpace == JCS_RGB ? 3 : 4); > > Does that not work on your compiler? I remember gcc gave some warning here, which treated as error on many builds. enum(int) conversion used in many places, so I followed the crowed. This doesn't produce any extra code - just type conversion. >> Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:674 >> + unsigned k = jsample[3]; > > It really seems like making "k = sample[3] / 255" (and getting rid of the three divides below) would be better, but I don't know offhand if sample[3] is floating point or not. If it is, that might change the result of the division/multiply combinations. > > As Noel pointed out, Clang might be emitting code that caches the k/255 so we don't pay the division cost three times, so it may be a moot point. I don't know what he tried. I wrote a test and it makes a difference replacing division with less costly operations, but this difference might be not seen in this case for integer division. There is much bigger difference of replacing /255 in setRGBA because it has 4x conversions to float, float division, 3x float multiplications and 3x conversions from float. I'd like to make separate patch when this goes in. >> Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:705 >> + int width = isScaled ? m_scaledColumns.size() : info->output_width; > > I think Noel made a good point about the width calculation. Isn't it a loop invariant, so that it could be defined outside the loop? Or does m_scaledColumns or info->output_width change based on the calls to jpeg_read_scanlines or something? Could you please clarify (or change the code)? Sorry, I've missed that. If probably compiler will optimize this out of the loop in any case because isScale is template parameter and size() and output_width are simple inlined functions in mostly inlined code. But let's move it out to make it clear.
Brent Fulgham
Comment 70 2012-11-29 11:46:23 PST
(In reply to comment #69) > >> Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:652 > >> + JSAMPLE* jsample = *samples + column * (J_COLOR_SPACE(colorSpace) == JCS_RGB ? 3 : 4); > > > > Noel pointed out that (at least under Clang) that this can just be: > > > > JSAMPLE* jsample = *samples + column * (colorSpace == JCS_RGB ? 3 : 4); > > > > Does that not work on your compiler? > > I remember gcc gave some warning here, which treated as error on many builds. enum(int) conversion used in many places, so I followed the crowed. This doesn't produce any extra code - just type conversion. Let's please do it as a C++-style cast in accordance with the coding conventions. I actually thought it was some kind of libjpeg macro or something based on the way it was written: JSAMPLE* jsample = *samples + column * (static_cast<J_COLOR_SPACE>(colorSpace) == JCS_RGB ? 3 : 4); > >> Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:674 > There is much bigger difference of replacing /255 in setRGBA because it has 4x conversions to float, float division, 3x float multiplications and 3x conversions from float. > I'd like to make separate patch when this goes in. Great -- just please remember to include this one at the same time, just to be consistent :-) Thanks!
Viatcheslav Ostapenko
Comment 71 2012-11-29 13:59:47 PST
Created attachment 176813 [details] Patch for landing.
WebKit Review Bot
Comment 72 2012-11-29 14:31:42 PST
Comment on attachment 176813 [details] Patch for landing. Rejecting attachment 176813 [details] from commit-queue. Failed to run "['/mnt/git/webkit-commit-queue/Tools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '-..." exit_code: 2 Last 500 characters of output: mit-queue/Source/WebKit/chromium/third_party/skia/gyp --revision 6555 --non-interactive --force --accept theirs-conflict --ignore-externals' in '/mnt/git/webkit-commit-queue/Source/WebKit/chromium' 46>At revision 6555. ________ running '/usr/bin/python tools/clang/scripts/update.py --mac-only' in '/mnt/git/webkit-commit-queue/Source/WebKit/chromium' ________ running '/usr/bin/python gyp_webkit' in '/mnt/git/webkit-commit-queue/Source/WebKit/chromium' Updating webkit projects from gyp files... Full output: http://queues.webkit.org/results/15029960
Brent Fulgham
Comment 73 2012-11-29 15:47:18 PST
(In reply to comment #72) > (From update of attachment 176813 [details]) > Rejecting attachment 176813 [details] from commit-queue. It has to be r+'d by someone before the commit queue will accept it.
Viatcheslav Ostapenko
Comment 74 2012-11-29 16:37:53 PST
(In reply to comment #73) > (In reply to comment #72) > > (From update of attachment 176813 [details] [details]) > > Rejecting attachment 176813 [details] [details] from commit-queue. > > It has to be r+'d by someone before the commit queue will accept it. No, I've inserted "Reviewed by". It complained about OOPS in Changelog. Sorry, missed it.
Viatcheslav Ostapenko
Comment 75 2012-11-29 16:42:58 PST
Created attachment 176848 [details] Patch for landing
WebKit Review Bot
Comment 76 2012-11-29 17:16:06 PST
Comment on attachment 176848 [details] Patch for landing Clearing flags on attachment: 176848 Committed r136189: <http://trac.webkit.org/changeset/136189>
WebKit Review Bot
Comment 77 2012-11-29 17:16:15 PST
All reviewed patches have been landed. Closing bug.
noel gordon
Comment 78 2012-11-30 00:06:36 PST
(In reply to comment #70) > > I remember gcc gave some warning here, which treated as error on many builds. enum(int) conversion used in many places, so I followed the crowed. This doesn't produce any extra code - just type conversion. > > Let's please do it as a C++-style cast in accordance with the coding conventions. I actually thought it was some kind of libjpeg macro or something based on the way it was written: > > JSAMPLE* jsample = *samples + column * (static_cast<J_COLOR_SPACE>(colorSpace) == JCS_RGB ? 3 : 4); % cat test.cpp typedef enum { JCS_UNKNOWN, JCS_RGB, JCS_EXT_RGBA, JCS_EXT_BGRA } J_COLOR_SPACE; int colorSpace() { return 1; } int main() { if (colorSpace() == JCS_EXT_RGBA) return 1; return 0; } % g++ -Wall --pedantic test.cpp No warnings. Maybe your respective g++'s are broken?
Peter Kasting
Comment 79 2012-11-30 13:01:09 PST
Comment on attachment 176848 [details] Patch for landing View in context: https://bugs.webkit.org/attachment.cgi?id=176848&action=review > Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:652 > + JSAMPLE* jsample = *samples + column * (static_cast<J_COLOR_SPACE>(colorSpace) == JCS_RGB ? 3 : 4); Maybe we should make the template arg types J_COLOR_SPACE instead of int? > Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:701 > + if (m_reader->colorTransform() && colorSpace == JCS_RGB) Weird that you cast in the previous function but not here. We should be doing both or neither, I'd think. > Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:718 > + return m_scaled ? outputScanlines<colorSpace, true>(buffer) : outputScanlines<colorSpace, false>(buffer); Can the compiler still manage to compile this if we just write as "return outputScanlines<colorSpace, m_scaled>(buffer);"? I'm guessing it's not smart enough to figure that out, but if it is that'd be clearer... At that point it'd probably make sense to inline this into any callsites.
Viatcheslav Ostapenko
Comment 80 2012-11-30 13:31:45 PST
(In reply to comment #79) > (From update of attachment 176848 [details]) It is already committed. Should I make another patch? > View in context: https://bugs.webkit.org/attachment.cgi?id=176848&action=review > > > Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:652 > > + JSAMPLE* jsample = *samples + column * (static_cast<J_COLOR_SPACE>(colorSpace) == JCS_RGB ? 3 : 4); > > Maybe we should make the template arg types J_COLOR_SPACE instead of int? It requires including jpeglib.h into JPEGImageDecoder.h . I think it doesn't worth it. It works just fine on current compilers without cast. IMHO older compiler versions were giving some warning > > Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:701 > > + if (m_reader->colorTransform() && colorSpace == JCS_RGB) > > Weird that you cast in the previous function but not here. We should be doing both or neither, I'd think. That's from recent changes. On gcc 2.6 it works fine and cast is not necessary. Originally this patch comes from mobile device and I think we had some older compiler version. It was giving error like this: http://stackoverflow.com/questions/8703743/using-enum-says-invalid-conversion-from-int-to-type > > Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:718 > > + return m_scaled ? outputScanlines<colorSpace, true>(buffer) : outputScanlines<colorSpace, false>(buffer); > > Can the compiler still manage to compile this if we just write as "return outputScanlines<colorSpace, m_scaled>(buffer);"? I'm guessing it's not smart enough to figure that out, but if it is that'd be clearer... No, it doesn't.
Peter Kasting
Comment 81 2012-11-30 13:37:28 PST
(In reply to comment #80) > > Maybe we should make the template arg types J_COLOR_SPACE instead of int? > > It requires including jpeglib.h into JPEGImageDecoder.h . Is that bad? Does this break a layering rule? > I think it doesn't worth it. It works just fine on current compilers without cast. IMHO older compiler versions were giving some warning If we can safely include the header, I think it'd be clearer this way. It resolves the question of static-casting in the implementation and it eliminates an implicit cast on the call side. It's not a huge deal, I just think it's slightly nicer. I don't really see a downside.
Brent Fulgham
Comment 82 2012-11-30 14:29:01 PST
(In reply to comment #81) > (In reply to comment #80) > > > Maybe we should make the template arg types J_COLOR_SPACE instead of int? > > > > It requires including jpeglib.h into JPEGImageDecoder.h . > > Is that bad? Does this break a layering rule? > > > I think it doesn't worth it. It works just fine on current compilers without cast. IMHO older compiler versions were giving some warning > > If we can safely include the header, I think it'd be clearer this way. It resolves the question of static-casting in the implementation and it eliminates an implicit cast on the call side. It's not a huge deal, I just think it's slightly nicer. I don't really see a downside. I agree with Peter on this. WebKit is a big enough beast that the added overhead of including jpeglib.h probably doesn't create any appreciable increase in compile times. Do the LIBJPEG-TURBO headers have different names, or is it still just jpeglib.h?
Viatcheslav Ostapenko
Comment 83 2012-11-30 17:34:41 PST
Reopening to attach new patch.
Viatcheslav Ostapenko
Comment 84 2012-11-30 17:34:47 PST
Viatcheslav Ostapenko
Comment 85 2012-11-30 17:36:20 PST
(In reply to comment #82) > (In reply to comment #81) > > (In reply to comment #80) > > > > Maybe we should make the template arg types J_COLOR_SPACE instead of int? > > > > > > It requires including jpeglib.h into JPEGImageDecoder.h . > > > > Is that bad? Does this break a layering rule? > > > > > I think it doesn't worth it. It works just fine on current compilers without cast. IMHO older compiler versions were giving some warning > > > > If we can safely include the header, I think it'd be clearer this way. It resolves the question of static-casting in the implementation and it eliminates an implicit cast on the call side. It's not a huge deal, I just think it's slightly nicer. I don't really see a downside. > > I agree with Peter on this. WebKit is a big enough beast that the added overhead of including jpeglib.h probably doesn't create any appreciable increase in compile times. > > Do the LIBJPEG-TURBO headers have different names, or is it still just jpeglib.h? No, it's all the same. I've attached the patch.
noel gordon
Comment 86 2012-11-30 19:07:38 PST
(In reply to comment #80) > Originally this patch comes from mobile device and I think we had some older compiler version. > It was giving error like this: http://stackoverflow.com/questions/8703743/using-enum-says-invalid-conversion-from-int-to-type That article is about assigning an integer to an enumerated type. Color n = 1; // compile-error
noel gordon
Comment 87 2012-11-30 19:22:58 PST
(In reply to comment #80) > That's from recent changes. On gcc 2.6 it works fine and cast is not necessary. > Originally this patch comes from mobile device and I think we had some older compiler version. "some older compiler version"? Which compiler? And which version of it? We are talking about integer comparison. Enumerated types are integral types and always have been. They are compared in expressions using C++ integer promotion rules. Refer to http://msdn.microsoft.com/en-us/library/bd77ckhw(v=vs.71).aspx All the casting code added for the enum in question is superfluous noise.
Brent Fulgham
Comment 88 2012-12-03 10:14:18 PST
Comment on attachment 177063 [details] Patch Looks great! r=me.
WebKit Review Bot
Comment 89 2012-12-03 10:19:17 PST
Comment on attachment 177063 [details] Patch Clearing flags on attachment: 177063 Committed r136410: <http://trac.webkit.org/changeset/136410>
WebKit Review Bot
Comment 90 2012-12-03 10:19:27 PST
All reviewed patches have been landed. Closing bug.
Note You need to log in before you can comment on or make changes to this bug.