Bug 88424 - Optimization in image decoding
Summary: Optimization in image decoding
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Images (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Misha
URL:
Keywords: Performance
Depends on: 103463 103216 103401
Blocks:
  Show dependency treegraph
 
Reported: 2012-06-06 09:41 PDT by Misha
Modified: 2012-12-03 10:19 PST (History)
13 users (show)

See Also:


Attachments
kcachegrind screen shot before patch (385.94 KB, image/png)
2012-06-06 09:41 PDT, Misha
no flags Details
kcachegrind screen shot after patch (451.88 KB, image/png)
2012-06-06 09:42 PDT, Misha
no flags Details
patch for review (12.36 KB, patch)
2012-06-06 09:50 PDT, Misha
gyuyoung.kim: commit-queue-
Details | Formatted Diff | Diff
revised patch (12.62 KB, patch)
2012-06-06 11:27 PDT, Misha
webkit.review.bot: commit-queue-
Details | Formatted Diff | Diff
Another attempt (12.82 KB, patch)
2012-06-07 08:40 PDT, Misha
no flags Details | Formatted Diff | Diff
Attempt to fix cr build (12.82 KB, patch)
2012-06-07 09:05 PDT, Misha
no flags Details | Formatted Diff | Diff
Revised patch again (12.38 KB, patch)
2012-06-07 10:50 PDT, Misha
webkit.review.bot: commit-queue-
Details | Formatted Diff | Diff
Patch for review (12.37 KB, patch)
2012-06-07 15:52 PDT, Misha
no flags Details | Formatted Diff | Diff
Addressed comments (12.55 KB, patch)
2012-06-08 13:40 PDT, Misha
no flags Details | Formatted Diff | Diff
Addressed last comments (14.07 KB, patch)
2012-07-12 09:45 PDT, Misha
no flags Details | Formatted Diff | Diff
new kcachegrind screen screen shot (383.01 KB, image/png)
2012-07-12 09:47 PDT, Misha
no flags Details
Patch (15.87 KB, patch)
2012-11-20 18:21 PST, Viatcheslav Ostapenko
no flags Details | Formatted Diff | Diff
Patch (15.53 KB, patch)
2012-11-20 20:08 PST, Viatcheslav Ostapenko
no flags Details | Formatted Diff | Diff
Simple benchmark for ImageDecoder. (1.53 KB, application/zip)
2012-11-26 14:39 PST, Viatcheslav Ostapenko
no flags Details
Patch (12.30 KB, patch)
2012-11-26 20:08 PST, Viatcheslav Ostapenko
bfulgham: review-
Details | Formatted Diff | Diff
GIF part (4.74 KB, patch)
2012-11-27 17:54 PST, Viatcheslav Ostapenko
no flags Details | Formatted Diff | Diff
JPEG part (8.25 KB, patch)
2012-11-27 17:59 PST, Viatcheslav Ostapenko
bfulgham: review-
Details | Formatted Diff | Diff
GIF part without inlines (4.72 KB, patch)
2012-11-27 21:29 PST, Viatcheslav Ostapenko
no flags Details | Formatted Diff | Diff
jpeg part, turbo issue fixed. (8.24 KB, patch)
2012-11-27 23:19 PST, Viatcheslav Ostapenko
no flags Details | Formatted Diff | Diff
Updated patch. (8.28 KB, patch)
2012-11-28 12:53 PST, Viatcheslav Ostapenko
bfulgham: review+
Details | Formatted Diff | Diff
Patch for landing. (8.39 KB, patch)
2012-11-29 13:59 PST, Viatcheslav Ostapenko
no flags Details | Formatted Diff | Diff
Patch for landing (8.61 KB, patch)
2012-11-29 16:42 PST, Viatcheslav Ostapenko
no flags Details | Formatted Diff | Diff
Patch (4.60 KB, patch)
2012-11-30 17:34 PST, Viatcheslav Ostapenko
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Misha 2012-06-06 09:41:58 PDT
Created attachment 146045 [details]
kcachegrind screen shot before patch

This patch is addressing functions that are getting called for each pixel during image decoding. In such cases it's better to avoid as much as possible branching inside loops and multiplications. The improvements should be significant when image cache reaches it's limit.
To simulate this I set to 1M when I did my testing. To test the patch I just scroll locally saved page back and forth under oprofile. 
Then I used: opreport -gdf | op2calltree and kcachegrind to see the results.

For locally saved nytimes.com the difference is quite significant. I'm attaching two screen shots of kcachegrind view before and after.
Comment 1 Misha 2012-06-06 09:42:40 PDT
Created attachment 146046 [details]
 kcachegrind screen shot after patch
Comment 2 Misha 2012-06-06 09:50:04 PDT
Created attachment 146050 [details]
patch for review
Comment 3 WebKit Review Bot 2012-06-06 09:54:50 PDT
Attachment 146050 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCor..." exit_code: 1
Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:544:  Should have only a single space after a punctuation in a comment.  [whitespace/comments] [5]
Total errors found: 1 in 6 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 4 Gyuyoung Kim 2012-06-06 11:03:54 PDT
Comment on attachment 146050 [details]
patch for review

Attachment 146050 [details] did not pass efl-ews (efl):
Output: http://queues.webkit.org/results/12901894
Comment 5 Misha 2012-06-06 11:27:15 PDT
Created attachment 146071 [details]
revised patch
Comment 6 WebKit Review Bot 2012-06-06 11:29:17 PDT
Attachment 146071 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCor..." exit_code: 1
Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:543:  Should have only a single space after a punctuation in a comment.  [whitespace/comments] [5]
Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.h:34:  Alphabetical sorting problem.  [build/include_order] [4]
Total errors found: 2 in 6 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 7 WebKit Review Bot 2012-06-06 22:15:37 PDT
Comment on attachment 146071 [details]
revised patch

Attachment 146071 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/12912129
Comment 8 Misha 2012-06-07 08:40:04 PDT
Created attachment 146294 [details]
Another attempt
Comment 9 WebKit Review Bot 2012-06-07 08:45:18 PDT
Attachment 146294 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCor..." exit_code: 1
Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:543:  Should have only a single space after a punctuation in a comment.  [whitespace/comments] [5]
Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.h:34:  Alphabetical sorting problem.  [build/include_order] [4]
Total errors found: 2 in 6 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 10 Misha 2012-06-07 09:05:45 PDT
Created attachment 146299 [details]
Attempt to fix cr build
Comment 11 WebKit Review Bot 2012-06-07 09:10:25 PDT
Attachment 146299 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCor..." exit_code: 1
Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:543:  Should have only a single space after a punctuation in a comment.  [whitespace/comments] [5]
Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.h:34:  Alphabetical sorting problem.  [build/include_order] [4]
Total errors found: 2 in 6 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 12 Misha 2012-06-07 10:50:47 PDT
Created attachment 146322 [details]
Revised patch again
Comment 13 WebKit Review Bot 2012-06-07 14:44:58 PDT
Comment on attachment 146322 [details]
Revised patch again

Attachment 146322 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/12907749
Comment 14 Misha 2012-06-07 15:52:55 PDT
Created attachment 146402 [details]
Patch for review
Comment 15 Zoltan Horvath 2012-06-08 11:03:53 PDT
Comment on attachment 146402 [details]
Patch for review

View in context: https://bugs.webkit.org/attachment.cgi?id=146402&action=review

> Source/WebCore/platform/image-decoders/ImageDecoder.h:168
> +        inline PixelData* data() const { return (PixelData*) m_bitmap.bitmap().getPixels(); }

You shouldn't use C style casts.

> Source/WebCore/platform/image-decoders/gif/GIFImageDecoder.cpp:233
> +    ImageFrame::PixelData* curAddr = buffer.data() + yBegin * buffer.width() + xBegin;

You shouldn't use abbreviation. Just write currentAddress e.g.

> Source/WebCore/platform/image-decoders/gif/GIFImageDecoder.cpp:252
> +        curAddr++;

I prefer ++ as prefix.

> Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:517
> +        ImageFrame::PixelData* curAddr = buffer.data() + destY * buffer.width();

Same naming here. And several other places.

> Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:599
> +    bool ret = true;

We should also find a better name for this. e.g. isScanningSuccessful, or something talkative.

> Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.h:64
> +        bool outputScanlinesRGB(ImageFrame& buffer);

The difference between the the two functions is only the 'for' loop inside. Can't we just make 1 function with an additional parameter?
Comment 16 Misha 2012-06-08 13:37:31 PDT
Thanks for comments. Will address them, but I don't think I agree with last one. The purpose of the patch is to reduce branching for the places that got invoked a lot. Combining two functions into the one with parameter will give us pretty much the same code as we have currently in the trunk. This parameter will be checked for every row which is not optimal.




(In reply to comment #15)
> (From update of attachment 146402 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=146402&action=review
> 
> > Source/WebCore/platform/image-decoders/ImageDecoder.h:168
> > +        inline PixelData* data() const { return (PixelData*) m_bitmap.bitmap().getPixels(); }
> 
> You shouldn't use C style casts.
> 
> > Source/WebCore/platform/image-decoders/gif/GIFImageDecoder.cpp:233
> > +    ImageFrame::PixelData* curAddr = buffer.data() + yBegin * buffer.width() + xBegin;
> 
> You shouldn't use abbreviation. Just write currentAddress e.g.
> 
> > Source/WebCore/platform/image-decoders/gif/GIFImageDecoder.cpp:252
> > +        curAddr++;
> 
> I prefer ++ as prefix.
> 
> > Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:517
> > +        ImageFrame::PixelData* curAddr = buffer.data() + destY * buffer.width();
> 
> Same naming here. And several other places.
> 
> > Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:599
> > +    bool ret = true;
> 
> We should also find a better name for this. e.g. isScanningSuccessful, or something talkative.
> 
> > Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.h:64
> > +        bool outputScanlinesRGB(ImageFrame& buffer);
> 
> The difference between the the two functions is only the 'for' loop inside. Can't we just make 1 function with an additional parameter?
Comment 17 Misha 2012-06-08 13:40:54 PDT
Created attachment 146642 [details]
Addressed comments
Comment 18 Peter Kasting 2012-06-13 13:25:18 PDT
(In reply to comment #16)
> Thanks for comments. Will address them, but I don't think I agree with last one. The purpose of the patch is to reduce branching for the places that got invoked a lot. Combining two functions into the one with parameter will give us pretty much the same code as we have currently in the trunk. This parameter will be checked for every row which is not optimal.

How about this then: Make the function templated with a bool or enum argument, and write the conditional as checking the template argument.  This means the conditional result is fixed at compile-time and the compiler can presumably optimize it out.  Double-check with your profiler that this doesn't cost perf, and write a comment about why you're doing this.

The goal here is to get the perf win while maintaining the significant clarity and maintenance benefits of the current code.
Comment 19 Misha 2012-07-12 09:45:41 PDT
Created attachment 151976 [details]
Addressed last comments
Comment 20 Misha 2012-07-12 09:47:16 PDT
Created attachment 151977 [details]
new kcachegrind screen screen shot
Comment 21 Peter Kasting 2012-07-12 10:38:06 PDT
Comment on attachment 151976 [details]
Addressed last comments 

View in context: https://bugs.webkit.org/attachment.cgi?id=151976&action=review

> Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:555
> +template <bool isCMYK>
> +void setScanLines(ImageFrame& buffer, int x, ImageFrame::PixelData* currentAddress, JSAMPARRAY samples, int column);

How about:

template <bool isCMYK>
void setScanLines(ImageFrame& buffer, int x, ImageFrame::PixelData* currentAddress, JSAMPARRAY samples, int column)
{
    JSAMPLE* jsample = *samples + column * (isCMYK ? 4 : 3);

    unsigned r = jsample[0];
    unsigned g = jsample[1];
    unsigned b = jsample[2];
    if (isCMYK) {
        // Source is 'Inverted CMYK', output is RGB.
        // See: http://www.easyrgb.com/math.php?MATH=M12#text12
        // Or: http://www.ilkeratalay.com/colorspacesfaq.php#rgb
        // From CMYK to CMY:
        // X =   X    * (1 -   K   ) +   K  [for X = C, M, or Y]
        // Thus, from Inverted CMYK to CMY is:
        // X = (1-iX) * (1 - (1-iK)) + (1-iK) => 1 - iX*iK
        // From CMY (0..1) to RGB (0..1):
        // R = 1 - C => 1 - (1 - iC*iK) => iC*iK  [G and B similar]
        unsigned k = jsample[3];
        r = r * k / 255;
        g = g * k / 255;
        b = b * k / 255;
    }
    buffer.setRGBA(currentAddress, r, g, b, 0xFF);
}

> Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:583
> +template <bool isScaled> int getColumn(Vector<int> columns, int x);

How about:

template <bool isScaled>
int getColumn(Vector<int> columns, int x)
{
    return isScaled ? columns[x] : x;
}

> Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:658
> +         ASSERT(!m_scaled);
> +         while (info->output_scanline < info->output_height) {
> +             unsigned char* row = reinterpret_cast<unsigned char*>(buffer.getAddr(0, info->output_scanline));
> +             if (jpeg_read_scanlines(info, &row, 1) != 1)
> +                  return false;

The additional indenting here is wrong.

> Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:674
> +    case JCS_RGB:

How about:

    // outputScanlines() uses template arguments to fix the comparison values at
    // compile time so the compiler can avoid generating code with branches.
    if (info->out_color_space == JCS_RGB)
        return m_scaled ? outputScanlines<false, true>(buffer) : outputScanlines<false, false>(buffer);
    if (info->out_color_space == JCS_CMYK)
        return m_scaled ? outputScanlines<true, true>(buffer) : outputScanlines<true, false>(buffer);
    ASSERT_NOT_REACHED();
    return setFailed();
Comment 22 Viatcheslav Ostapenko 2012-11-20 18:21:52 PST
Created attachment 175317 [details]
Patch
Comment 23 WebKit Review Bot 2012-11-20 19:08:15 PST
Comment on attachment 175317 [details]
Patch

Attachment 175317 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/14933541

New failing tests:
inspector-protocol/debugger-terminate-dedicated-worker-while-paused.html
Comment 24 Zoltan Horvath 2012-11-20 19:37:38 PST
Comment on attachment 175317 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=175317&action=review

Some very little nits below. Otherwise looks okay for me.

> Source/WebCore/ChangeLog:3
> +        Optimization in image decoding

I would use a more accurate name as a bug title. Something like: Reduce branching and multiplications in image decoding loops and functions

> Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:718
> +

- Extra line.

> Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:791
> +

- Extra line.
Comment 25 Viatcheslav Ostapenko 2012-11-20 20:08:55 PST
Created attachment 175328 [details]
Patch
Comment 26 Adam Barth 2012-11-21 00:14:21 PST
Comment on attachment 175328 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=175328&action=review

> Source/WebCore/ChangeLog:10
> +        Optimization in image decoding.
> +        Reduce branching and multiplications in image decoding loops and functions.
> +        Rebase and update of original patch by Misha Tyutyunik <michael.tyuytunik@nokia.com>

Do you have data on which benchmarks this improves and by how much?
Comment 27 Viatcheslav Ostapenko 2012-11-21 07:07:19 PST
(In reply to comment #26)
> (From update of attachment 175328 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=175328&action=review
> 
> > Source/WebCore/ChangeLog:10
> > +        Optimization in image decoding.
> > +        Reduce branching and multiplications in image decoding loops and functions.
> > +        Rebase and update of original patch by Misha Tyutyunik <michael.tyuytunik@nokia.com>
> 
> Do you have data on which benchmarks this improves and by how much?

This improved loading time a bit on a low end device for pages with big amount of images, but it was some kind of side effect.
Originally this patch tried to solve the problem of very slow update during scrolling of the page with big number of images on the device with limited image cache. Profiling showed that hotspot in this case was image decoding. After implementing this patch tile updates became visually faster and image decoding functions moved from top of kcachegrind below the painting functions. To make profiling of image decoding functions easier image cache was set to the very low limit. Large page was loaded and scrolled up/down under profiler. kcachegrind screenshots are attached.
Comment 28 noel gordon 2012-11-26 05:16:13 PST
(In reply to comment #27)

I have no way to reproduce the kcachegrind images, nor do I understand what they mean when I compare them.  All Comment #1 mentioned was ...

> This patch is addressing functions that are getting called for each pixel during image decoding. In such cases it's better to avoid as much as possible branching inside loops and multiplications.

Assuming an image from the browser cache, the image decoding cost is 1) the decoder library API time, and 2) the time to write the decoded rows to the frame buffer.  The patch appears to be dealing with 2).

I was curious to find out how much time 2) takes when decoding PNG images.  See bug 103216 where the results suggest that the row write time is ~15% of the PNG image decoding cost.

The row writer calls buffer.setRGBA(x,y), which requires an integer multiply-add for each pixel x in row y.  Since row pixels are written in order, x in [0..row.width) typically, it might be faster to write to the frame buffer *address++ for each x.  I tried that on bug 103216 and the answer is yes - the overall PNG decoding time is reduced by a few percent.

Seems those multiply-adds on each pixel write do slow decoding more than necessary.  I plan to submit the patch on bug 103216 to address PNG.  A similar patch would work for GIF, and I'll do that next.
Comment 29 Viatcheslav Ostapenko 2012-11-26 07:26:52 PST
(In reply to comment #28)
> (In reply to comment #27)
> 
> The row writer calls buffer.setRGBA(x,y), which requires an integer multiply-add for each pixel x in row y.  Since row pixels are written in order, x in [0..row.width) typically, it might be faster to write to the frame buffer *address++ for each x.  I tried that on bug 103216 and the answer is yes - the overall PNG decoding time is reduced by a few percent.

Actually, that's the part of what patch does. It replaces call to setRGBA(x, y) with setRGBA(address) which would be inlined by compiler. This patch does the same as 103216 for all image formats and has some additional optimizations. 

They way to reproduce cache grid is to limit image cache in order to force continuous decoding of images during page scrolling. Without this patch image decoding functions were at the top meaning that they were using more CPU time than other functions (were hot spots), but after this patch they disappeared from cachegrind top meaning that they don't use so much time anymore. It is difficult to say exactly how much improvement is achieved, but moving from 16% CPU usage to about 1% I would consider significant.
Comment 30 Brent Fulgham 2012-11-26 10:24:06 PST
Comment on attachment 175328 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=175328&action=review

> Source/WebCore/platform/image-decoders/ImageDecoder.h:187
> +        inline PixelData* data() const { return m_bytes; }

Do we need the 'inline' keyword here?  I thought that it was unnecessary if the implementation was inline.

> Source/WebCore/platform/image-decoders/ImageDecoder.h:188
> +        inline int width() const { return m_size.width(); }

Ditto.

> Source/WebCore/platform/image-decoders/ImageDecoder.h:189
> +        inline int height() const { return m_size.height(); }

Ditto.

> Source/WebCore/platform/image-decoders/ImageDecoder.h:191
> +        inline PixelData* data() const { return static_cast<PixelData*>(m_bitmap.bitmap().getPixels()); }

Ditto.
Comment 31 Brent Fulgham 2012-11-26 10:28:17 PST
(In reply to comment #29)
> (In reply to comment #28)
> > (In reply to comment #27)
> > 
> > The row writer calls buffer.setRGBA(x,y), which requires an integer multiply-add for each pixel x in row y.  Since row pixels are written in order, x in [0..row.width) typically, it might be faster to write to the frame buffer *address++ for each x.  I tried that on bug 103216 and the answer is yes - the overall PNG decoding time is reduced by a few percent.
> 
> Actually, that's the part of what patch does. It replaces call to setRGBA(x, y) with setRGBA(address) which would be inlined by compiler. This patch does the same as 103216 for all image formats and has some additional optimizations. 
> 
> They way to reproduce cache grid is to limit image cache in order to force continuous decoding of images during page scrolling. Without this patch image decoding functions were at the top meaning that they were using more CPU time than other functions (were hot spots), but after this patch they disappeared from cachegrind top meaning that they don't use so much time anymore. It is difficult to say exactly how much improvement is achieved, but moving from 16% CPU usage to about 1% I would consider significant.

I would like to use this bug as an overarching bug to address these issues, and land the individual implementations (PNG already covered by the new 103216 bug), JPEG, etc.

Noel, can you update your existing patch to include the inlining of the width/height and mention this bug in your ChangeLog?
Comment 32 Viatcheslav Ostapenko 2012-11-26 14:39:23 PST
Created attachment 176070 [details]
Simple benchmark for ImageDecoder.
Comment 33 Viatcheslav Ostapenko 2012-11-26 14:47:36 PST
(In reply to comment #32)
> Created an attachment (id=176070) [details]
> Simple benchmark for ImageDecoder.

I've created simple benchmark, that loads file specified in command line and decodes it 1000 times.

Results for couple images:

Before patch:
time ./ImageDecoderPerf img.jpg
Done

real	0m14.184s
user	0m14.109s
sys	0m0.048s


time ./ImageDecoderPerf img.png
Done

real	0m49.146s
user	0m48.967s
sys	0m0.056s


After patch:
time ./ImageDecoderPerf img.jpg
Done

real	0m13.254s
user	0m13.177s
sys	0m0.048s

./ImageDecoderPerf img.png
Done

real	0m47.481s
user	0m47.343s
sys	0m0.048s


For JPEG decoder improvement is about 6.6% and for PNG is about 3.4% from total decoding time.
Comment 34 Brent Fulgham 2012-11-26 16:45:31 PST
(In reply to comment #33)
> (In reply to comment #32)
> > Created an attachment (id=176070) [details] [details]
> > Simple benchmark for ImageDecoder.
> 
> I've created simple benchmark, that loads file specified in command line and decodes it 1000 times.
> 
[...]
> 
> For JPEG decoder improvement is about 6.6% and for PNG is about 3.4% from total decoding time.

Very nice!

I know this issue has gone through a number of iterations, and you are probably tired of all of this back-and-forth.  Thank you for your patience. :-)

Would you mind creating a new bug for your image decoder utility?  We could make it part of the Tools directory, and could add build rules for all of the various platforms to allow similar benchmarks on all of our targets.

Also, could you please rebaseline the patch after 103216 closes out since it incorporates several of the changes you propose in your patch?
Comment 35 Brent Fulgham 2012-11-26 16:47:03 PST
Comment on attachment 175328 [details]
Patch

Much of this patch is being committed as part of Bug 103216.  r- so that you can rebaseline after that change.
Comment 36 noel gordon 2012-11-26 16:54:10 PST
(In reply to comment #33)

> I've created simple benchmark, that loads file specified in command line and decodes it 1000 times.
> ...
> For JPEG decoder improvement is about 6.6% and for PNG is about 3.4% from total decoding time.

Debugging sucks, testing is better.
Comment 37 noel gordon 2012-11-26 17:02:42 PST
(In reply to comment #31)
> I would like to use this bug as an overarching bug to address these issues, and land the individual implementations (PNG already covered by the new 103216 bug), JPEG, etc.

Agree.  Seems the original patch author no longer works on webkit, so we need to retest these changes.

> Noel, can you update your existing patch to include the inlining of the width/height and mention this bug in your ChangeLog?

Let's do that last and in a separate bug given the PNG patch is now cq+.
Comment 38 Brent Fulgham 2012-11-26 17:14:12 PST
(In reply to comment #37)
> (In reply to comment #31)
> > Noel, can you update your existing patch to include the inlining of the width/height and mention this bug in your ChangeLog?
> 
> Let's do that last and in a separate bug given the PNG patch is now cq+.

Sounds good.  Looks like the patch failed to apply for some reason...
Comment 39 Brent Fulgham 2012-11-26 17:20:52 PST
(In reply to comment #33)

> time ./ImageDecoderPerf img.jpg
> Done
> 
> real    0m13.254s
> user    0m13.177s
> sys    0m0.048s

It goes without saying (of course), but we should really assess performance using gprof, Instruments, Valgrind, or similar.  This is a good first approximation, but all kinds of things might affect these values, especially if we are in the 0-5% difference range.

We might be better off alternating the two implementations in a loop and get a mean/std. deviation of the run times to get a better feel for the change.
Comment 40 Viatcheslav Ostapenko 2012-11-26 18:21:11 PST
(In reply to comment #39)
> (In reply to comment #33)
> 
> > time ./ImageDecoderPerf img.jpg
> > Done
> > 
> > real    0m13.254s
> > user    0m13.177s
> > sys    0m0.048s
> 
> It goes without saying (of course), but we should really assess performance using gprof, Instruments, Valgrind, or similar. 

oprofile/kcachegrind screenshots are attached from the very beginning half year old.

> This is a good first approximation, but all kinds of things might affect these values, especially if we are in the 0-5% difference range.

That's on total image decoding, including decoder work itself and on intel desktop CPUs with good branch prediction. This think was tested on ARM and difference was bigger.

> We might be better off alternating the two implementations in a loop and get a mean/std. deviation of the run times to get a better feel for the change.

Was it done here: https://bugs.webkit.org/show_bug.cgi?id=103216 ?
Comment 41 Viatcheslav Ostapenko 2012-11-26 18:28:20 PST
(In reply to comment #35)
> (From update of attachment 175328 [details])
> Much of this patch is being committed as part of Bug 103216.  r- so that you can rebaseline after that change.

Not much(In reply to comment #36)
> (In reply to comment #33)
> 
> > I've created simple benchmark, that loads file specified in command line and decodes it 1000 times.
> > ...
> > For JPEG decoder improvement is about 6.6% and for PNG is about 3.4% from total decoding time.
> 
> Debugging sucks, testing is better.

What do you mean? Have you looked into benchmarked I attached?
Comment 42 Viatcheslav Ostapenko 2012-11-26 18:29:30 PST
(In reply to comment #37)
> (In reply to comment #31)
> > I would like to use this bug as an overarching bug to address these issues, and land the individual implementations (PNG already covered by the new 103216 bug), JPEG, etc.
> 
> Agree.  Seems the original patch author no longer works on webkit, so we need to retest these changes.

Test is attached. Isn't it?

> > Noel, can you update your existing patch to include the inlining of the width/height and mention this bug in your ChangeLog?
> 
> Let's do that last and in a separate bug given the PNG patch is now cq+.

What a rush!
Comment 43 Viatcheslav Ostapenko 2012-11-26 18:50:42 PST
(In reply to comment #34)
> (In reply to comment #33)
> > (In reply to comment #32)
> > > Created an attachment (id=176070) [details] [details] [details]
> > > Simple benchmark for ImageDecoder.
> > 
> > I've created simple benchmark, that loads file specified in command line and decodes it 1000 times.
> > 
> [...]
> > 
> > For JPEG decoder improvement is about 6.6% and for PNG is about 3.4% from total decoding time.
> 
> Very nice!
> 
> I know this issue has gone through a number of iterations, and you are probably tired of all of this back-and-forth.  Thank you for your patience. :-)
> 
> Would you mind creating a new bug for your image decoder utility?  We could make it part of the Tools directory, and could add build rules for all of the various platforms to allow similar benchmarks on all of our targets.
> 
> Also, could you please rebaseline the patch after 103216 closes out since it incorporates several of the changes you propose in your patch?

Not all platforms export those symbols. EFL normally also doesn't, but it has special dev build with separate libraries and symbols exported, which I used for testing. Normally it wouldn't work.
Comment 44 Viatcheslav Ostapenko 2012-11-26 20:08:41 PST
Created attachment 176155 [details]
Patch
Comment 45 Brent Fulgham 2012-11-27 12:38:18 PST
Comment on attachment 175328 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=175328&action=review

> Source/WebCore/platform/image-decoders/ImageDecoder.h:163
>          inline void setRGBA(PixelData* dest, unsigned r, unsigned g, unsigned b, unsigned a)

I wonder if we could use pkasting's idea here, and make this a template that encodes the decision about premultiplication and bounds-checking of the alpha term.  This would remove two branches, the bounds check, and possibly the calculation of the color band scaling factor.
Comment 46 Viatcheslav Ostapenko 2012-11-27 13:12:22 PST
(In reply to comment #45)
> (From update of attachment 175328 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=175328&action=review
> 
> > Source/WebCore/platform/image-decoders/ImageDecoder.h:163
> >          inline void setRGBA(PixelData* dest, unsigned r, unsigned g, unsigned b, unsigned a)
> 
> I wonder if we could use pkasting's idea here, and make this a template that encodes the decision about premultiplication and bounds-checking of the alpha term.  This would remove two branches, the bounds check, and possibly the calculation of the color band scaling factor.

Yes, I'm planning another commit for this.
Comment 47 Brent Fulgham 2012-11-27 16:56:12 PST
Comment on attachment 176155 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=176155&action=review

As I mentioned previously, I'd like to land the GIF part first, and the JPEG part separately.  Could you please revise this patch to just be the GIF part (and the inlining), and save the JPEG for a second step?

Also, I think we can avoid exporting more symbols by just using the getAddr call (rather than making width() and height() public).  Let's do this, unless you have a strong reason for wanting to expose them.

r- for these reasons.

> Source/WebCore/platform/image-decoders/ImageDecoder.h:-184
> -    private:

Can we please keep width() and height() private (see below comment).

> Source/WebCore/platform/image-decoders/gif/GIFImageDecoder.cpp:233
> +    ImageFrame::PixelData* currentAddress = buffer.data() + yBegin * buffer.width() + xBegin;

Note: As Noel Gordon points out, we can avoid exposing the width and height accessors entirely by using the getAddr method:

ImageFrame::PixelData* address = buffer.getAddr(0, y);
Comment 48 Brent Fulgham 2012-11-27 16:58:12 PST
Viatcheslav: I split your profiler class into a separate bug (see 103463).
Comment 49 Viatcheslav Ostapenko 2012-11-27 17:54:00 PST
Created attachment 176377 [details]
GIF part
Comment 50 Viatcheslav Ostapenko 2012-11-27 17:59:33 PST
Created attachment 176379 [details]
JPEG part
Comment 51 Brent Fulgham 2012-11-27 21:25:10 PST
Comment on attachment 176377 [details]
GIF part

Let's get rid of the inlines that were done as part of https://bugs.webkit.org/show_bug.cgi?id=103463.
Comment 52 Viatcheslav Ostapenko 2012-11-27 21:29:02 PST
Created attachment 176393 [details]
GIF part without inlines
Comment 53 Brent Fulgham 2012-11-27 21:32:25 PST
Isn't that just the same patch you submitted previously?
Comment 54 Viatcheslav Ostapenko 2012-11-27 21:34:29 PST
(In reply to comment #53)
> Isn't that just the same patch you submitted previously?

I removed inline before width/height.
Comment 55 Brent Fulgham 2012-11-27 21:41:08 PST
Comment on attachment 176393 [details]
GIF part without inlines

Looks good, aside from the code that just got landed in https://bugs.webkit.org/show_bug.cgi?id=103401.  I'll fix it up and land.
Comment 56 Brent Fulgham 2012-11-27 21:42:13 PST
Committed r135976: <http://trac.webkit.org/changeset/135976>
Comment 57 Brent Fulgham 2012-11-27 21:42:36 PST
No, webkit-patch!  This is not closed yet.
Comment 58 Viatcheslav Ostapenko 2012-11-27 21:49:36 PST
(In reply to comment #57)
> No, webkit-patch!  This is not closed yet.

That's strange. Usually it is able to detect, that not all patches were landed.
Comment 59 Brent Fulgham 2012-11-27 22:00:04 PST
Comment on attachment 176379 [details]
JPEG part

View in context: https://bugs.webkit.org/attachment.cgi?id=176379&action=review

> Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:657
> +    switch (J_COLOR_SPACE(colorSpace)) {

Could this switch be handled as template specializations to avoid branching?  Would it even make a difference in performance?
Comment 60 Viatcheslav Ostapenko 2012-11-27 22:15:36 PST
Comment on attachment 176379 [details]
JPEG part

View in context: https://bugs.webkit.org/attachment.cgi?id=176379&action=review

>> Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:657
>> +    switch (J_COLOR_SPACE(colorSpace)) {
> 
> Could this switch be handled as template specializations to avoid branching?  Would it even make a difference in performance?

It is actually. colorSpace is parameter of template. It makes switch on constant and compiler optimizes it out.
Originally it was several template specializations in 1st version of patch, but Peter Kasting suggested just to use if/switch as in many other places in code. We've tried and it worked - compiler creates several versions of function killing unreachable branches.
Comment 61 Brent Fulgham 2012-11-27 22:28:49 PST
(In reply to comment #60)
> (From update of attachment 176379 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=176379&action=review
> 
> >> Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:657
> >> +    switch (J_COLOR_SPACE(colorSpace)) {
> > 
> > Could this switch be handled as template specializations to avoid branching?  Would it even make a difference in performance?
> 
> It is actually. colorSpace is parameter of template. It makes switch on constant and compiler optimizes it out.
> Originally it was several template specializations in 1st version of patch, but Peter Kasting suggested just to use if/switch as in many other places in code. We've tried and it worked - compiler creates several versions of function killing unreachable branches.

Oh, excellent!
Comment 62 Brent Fulgham 2012-11-27 22:32:07 PST
Comment on attachment 176393 [details]
GIF part without inlines

Clearing these flags since it is already landed, so that it does not show up in searches.
Comment 63 Brent Fulgham 2012-11-27 23:12:35 PST
Comment on attachment 176379 [details]
JPEG part

View in context: https://bugs.webkit.org/attachment.cgi?id=176379&action=review

Based on discussion on IRC with ostap and noel, this current patch is missing some new functionality from the libjpeg-turbo update.

> Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:652
> +    JSAMPLE* jsample = *samples + column * (J_COLOR_SPACE(colorSpace) == JCS_CMYK ? 4 : 3);

As Noel pointed out on IRC, this breaks libjpeg-turbo.  See http://trac.webkit.org/browser/trunk/Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp?annotate=blame&rev=135976#L706 for details of the break.  More specifically:

JSAMPLE* jsample = *samples + (m_scaled ? m_scaledColumns[x] : x) * ((info->out_color_space == JCS_RGB) ? 3 : 4);

> Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:-706
> -            JSAMPLE* jsample = *samples + (m_scaled ? m_scaledColumns[x] : x) * ((info->out_color_space == JCS_RGB) ? 3 : 4);

This line is not dealt with properly in this change.
Comment 64 Viatcheslav Ostapenko 2012-11-27 23:19:10 PST
Created attachment 176408 [details]
jpeg part, turbo issue fixed.
Comment 65 noel gordon 2012-11-28 05:55:20 PST
Comment on attachment 176408 [details]
jpeg part, turbo issue fixed.

View in context: https://bugs.webkit.org/attachment.cgi?id=176408&action=review

> Source/WebCore/ChangeLog:19
> +        * platform/image-decoders/jpeg/JPEGImageDecoder.cpp:
> +        (WebCore):
> +        (WebCore::setScanLines):
> +        (WebCore::JPEGImageDecoder::outputScanlines):
> +        * platform/image-decoders/jpeg/JPEGImageDecoder.h:
> +        (JPEGImageDecoder):

This area of the ChangeLog needs comments.

> Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:650
> +void setScanLines(ImageFrame& buffer, ImageFrame::PixelData* currentAddress, JSAMPARRAY samples, int column)

setScanLines?  Perhaps setPixel would be a better function name?

> Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:652
> +    JSAMPLE* jsample = *samples + column * (J_COLOR_SPACE(colorSpace) == JCS_RGB ? 3 : 4);

This fixes the libjpeg-turbo error mentioned on IRC with the previous patch, thank you.  The bug was not obvious.  Peter mentioned something about this in Comment #18 (see below).

The J_COLOR_SPACE() is not needed here, I believe: write it as

   JSAMPLE* jsample = *samples + column * (colorSpace == JCS_RGB ? 3 : 4);

I use the Clang C++ compiler: it does not complain about this comparison.


Peter's Comment #18
> The goal here is to get the perf win while maintaining the significant clarity and maintenance benefits of the current code.

I would add that the clarity he seeks also helps us analyse this code in terms of security.  The image decoder code is security sensitive code.  Too many templates tend to obfuscate the code, making security much harder to assess.

With that in mind ...

> Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:655
> +    unsigned r = jsample[0];
> +    unsigned g = jsample[1];
> +    unsigned b = jsample[2];

I would prefer we got rid of these temporary variables ...

> Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:682
> +    switch (J_COLOR_SPACE(colorSpace)) {
> +#if defined(TURBO_JPEG_RGB_SWIZZLE)
> +    case JCS_EXT_BGRA:
> +        std::swap(r, b);
> +        break;
> +    case JCS_EXT_RGBA: // Fallback to JSC_RGB case here.
> +#endif
> +    case JCS_RGB:
> +        // Empty to avoid compiler warning about unhandled case in switch.
> +        break;
> +    case JCS_CMYK:
> +        // Source is 'Inverted CMYK', output is RGB.
> +        // See: http://www.easyrgb.com/math.php?MATH=M12#text12
> +        // Or: http://www.ilkeratalay.com/colorspacesfaq.php#rgb
> +        // From CMYK to CMY:
> +        // X =   X    * (1 -   K   ) +   K  [for X = C, M, or Y]
> +        // Thus, from Inverted CMYK to CMY is:
> +        // X = (1-iX) * (1 - (1-iK)) + (1-iK) => 1 - iX*iK
> +        // From CMY (0..1) to RGB (0..1):
> +        // R = 1 - C => 1 - (1 - iC*iK) => iC*iK  [G and B similar]
> +        unsigned k = jsample[3];
> +        r = r * k / 255;
> +        g = g * k / 255;
> +        b = b * k / 255;
> +        break;
> +    }

... and instead used the if statements from the original code here:

    if (colorSpace == JCS_RGB)
        buffer.setRGBA(currentAddress, jsample[0], jsample[1], jsample[2], 0xFF);
#if defined(TURBO_JPEG_RGB_SWIZZLE)
    else if (colorSpace == JCS_EXT_RGBA)
        buffer.setRGBA(currentAddress, jsample[0], jsample[1], jsample[2], 0xFF);
    else if (colorSpace == JCS_EXT_BGRA)
        buffer.setRGBA(currentAddress, jsample[2], jsample[1], jsample[0], 0xFF);
#endif
    else if (colorSpace == JCS_CMYK) {
        // long comment about cmyk ...
        unsigned k = jsample[3];
        buffer.setRGBA(currentAddress, jsample[0] * k / 255, jsample[1] * k / 255, jsample[2] * k / 255, 0xFF);
    }
}

It's shorter, and easier to read and verify against the previous code.  This function is templated on colorSpace so the compiler will write a sensible template function for each colorSpace value as you noted in Comment #60.

> Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:688
> +template <int colorSpace, bool isScaled>
> +bool JPEGImageDecoder::outputScanlines(ImageFrame& buffer)
> +{

This is a member function, but we pass its frame buffer member as an argument to this member function!  That seems somewhat odd to me.  Why is this not bool JPEGImageDecoder::outputScanlines() with normal access to the buffer member in the function body?  Maybe your profiling detected a difference?  I'm not sure: the ChangeLog doesn't mention it.

> Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:705
> +#if USE(QCMSLIB)
> +        if (m_reader->colorTransform() && info->out_color_space == JCS_RGB)

This function is templated on colorSpace, so why use info->out_color_space here?  I think you meant to write

    if (m_reader->colorTransform() && colorSpace == JCS_RGB)

That would allow the compiler to template specialize this code even more.

> Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:711
> +        int width = m_scaled ? m_scaledColumns.size() : info->output_width;

This function is templated on isScaled, yet you use m_scaled here.  Again, I think you meant to write this in terms of the template parameter

    int width = isScaled ? m_scaledColumns.size() : info->output_width;

Also, why recompute width every time through this loop?  It's a loop invariant, right?  I'd move it out of the loop body.

> Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:713
> +        for (int x = 0; x < width; ++x) {
> +            setScanLines<colorSpace>(buffer, currentAddress, m_reader->samples(), isScaled ? m_scaledColumns[x] : x);

Use samples instead of m_reader->samples() here.  A post increment on currentAddress might also be useful: that'd remove 3 more lines.

> Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:724
> +template <int colorSpace>
> +bool JPEGImageDecoder::outputScanlines(ImageFrame& buffer)
> +{
> +    return m_scaled ? outputScanlines<colorSpace, true>(buffer) : outputScanlines<colorSpace, false>(buffer);
> +}

This seems unnecessary.  You could inline it I suppose, but I'd remove this function (saving a function call) and use it's body at the call site.  Peter suggested this in Comment #21 and even provided the prototype code ...

> Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:764
> +    bool isScanSuccessful = true;

... so let's replace these changes here and below with Peter's suggestion:

    // outputScanlines() uses template arguments to fix the comparison values at compile-time
    // so the compiler can avoid generating code with branches.

    if (info->out_color_space == JCS_RGB)
        return m_scaled ? outputScanlines<JCS_RGB, true>(buffer) : outputScanlines<JCS_RGB, false>(buffer);
#if defined(TURBO_JPEG_RGB_SWIZZLE)
    if (info->out_color_space == JCS_EXT_RGBA)
        return m_scaled ? outputScanlines<JCS_EXT_RGBA, true>(buffer) : outputScanlines<JCS_EXT_RGBA, false>(buffer);
    if (info->out_color_space == JCS_EXT_BGRA)
        return m_scaled ? outputScanlines<JCS_EXT_BGRA, true>(buffer) : outputScanlines<JCS_EXT_BGRA, false>(buffer);
#endif
    if (info->out_color_space == JCS_CMYK)
        return m_scaled ? outputScanlines<JCS_CMYK, true>(buffer) : outputScanlines<JCS_CMYK, false>(buffer);

    ASSERT_NOT_REACHED();
    return setFailed();
}

Works for me, thank you Peter.
Comment 66 Viatcheslav Ostapenko 2012-11-28 11:52:00 PST
Comment on attachment 176408 [details]
jpeg part, turbo issue fixed.

View in context: https://bugs.webkit.org/attachment.cgi?id=176408&action=review

>> Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:688
>> +{
> 
> This is a member function, but we pass its frame buffer member as an argument to this member function!  That seems somewhat odd to me.  Why is this not bool JPEGImageDecoder::outputScanlines() with normal access to the buffer member in the function body?  Maybe your profiling detected a difference?  I'm not sure: the ChangeLog doesn't mention it.

IMHO, it is odd to write every time m_frameBufferCache[0]. This function is called once for the whole decoding and it makes no difference from performance point of view.

>> Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:724
>> +}
> 
> This seems unnecessary.  You could inline it I suppose, but I'd remove this function (saving a function call) and use it's body at the call site.  Peter suggested this in Comment #21 and even provided the prototype code ...

I prefer it this way to avoid writing repeating code. Also this function is a place holder for some other code in future commits.

>> Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:764
>> +    bool isScanSuccessful = true;
> 
> ... so let's replace these changes here and below with Peter's suggestion:
> 
>     // outputScanlines() uses template arguments to fix the comparison values at compile-time
>     // so the compiler can avoid generating code with branches.
> 
>     if (info->out_color_space == JCS_RGB)
>         return m_scaled ? outputScanlines<JCS_RGB, true>(buffer) : outputScanlines<JCS_RGB, false>(buffer);
> #if defined(TURBO_JPEG_RGB_SWIZZLE)
>     if (info->out_color_space == JCS_EXT_RGBA)
>         return m_scaled ? outputScanlines<JCS_EXT_RGBA, true>(buffer) : outputScanlines<JCS_EXT_RGBA, false>(buffer);
>     if (info->out_color_space == JCS_EXT_BGRA)
>         return m_scaled ? outputScanlines<JCS_EXT_BGRA, true>(buffer) : outputScanlines<JCS_EXT_BGRA, false>(buffer);
> #endif
>     if (info->out_color_space == JCS_CMYK)
>         return m_scaled ? outputScanlines<JCS_CMYK, true>(buffer) : outputScanlines<JCS_CMYK, false>(buffer);
> 
>     ASSERT_NOT_REACHED();
>     return setFailed();
> }
> 
> Works for me, thank you Peter.

I prefer to use switch instead of if chains because it looks cleaner and usually is better optimized by compiler.
Comment 67 Viatcheslav Ostapenko 2012-11-28 12:53:37 PST
Created attachment 176555 [details]
Updated patch.
Comment 68 Brent Fulgham 2012-11-29 09:52:39 PST
Comment on attachment 176555 [details]
Updated patch.

View in context: https://bugs.webkit.org/attachment.cgi?id=176555&action=review

I think this looks really good.  I had a couple of questions/comments, none of which are critical.

> Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:652
> +    JSAMPLE* jsample = *samples + column * (J_COLOR_SPACE(colorSpace) == JCS_RGB ? 3 : 4);

Noel pointed out that (at least under Clang) that this can just be:

   JSAMPLE* jsample = *samples + column * (colorSpace == JCS_RGB ? 3 : 4);

Does that not work on your compiler?

> Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:674
> +        unsigned k = jsample[3];

It really seems like making "k = sample[3] / 255" (and getting rid of the three divides below) would be better, but I don't know offhand if sample[3] is floating point or not.  If it is, that might change the result of the division/multiply combinations.

As Noel pointed out, Clang might be emitting code that caches the k/255 so we don't pay the division cost three times, so it may be a moot point.

> Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:705
> +        int width = isScaled ? m_scaledColumns.size() : info->output_width;

I think Noel made a good point about the width calculation.  Isn't it a loop invariant, so that it could be defined outside the loop?  Or does m_scaledColumns or info->output_width change based on the calls to jpeg_read_scanlines or something?  Could you please clarify (or change the code)?

> Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:761
> +    // That is why use teplate and template specializations here so

nit: "That is why *WE* use *TEMPLATE* ..."

> Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:762
> +    // the proper code will be generated in compile time.

nit:  ".. at compile time"
Comment 69 Viatcheslav Ostapenko 2012-11-29 11:37:04 PST
Comment on attachment 176555 [details]
Updated patch.

View in context: https://bugs.webkit.org/attachment.cgi?id=176555&action=review

>> Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:652
>> +    JSAMPLE* jsample = *samples + column * (J_COLOR_SPACE(colorSpace) == JCS_RGB ? 3 : 4);
> 
> Noel pointed out that (at least under Clang) that this can just be:
> 
>    JSAMPLE* jsample = *samples + column * (colorSpace == JCS_RGB ? 3 : 4);
> 
> Does that not work on your compiler?

I remember gcc gave some warning here, which treated as error on many builds. enum(int) conversion used in many places, so I followed the crowed. This doesn't produce any extra code - just type conversion.

>> Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:674
>> +        unsigned k = jsample[3];
> 
> It really seems like making "k = sample[3] / 255" (and getting rid of the three divides below) would be better, but I don't know offhand if sample[3] is floating point or not.  If it is, that might change the result of the division/multiply combinations.
> 
> As Noel pointed out, Clang might be emitting code that caches the k/255 so we don't pay the division cost three times, so it may be a moot point.

I don't know what he tried.
I wrote a test and it makes a difference replacing division with less costly operations, but this difference might be not seen in this case for integer division.
There is much bigger difference of replacing /255 in setRGBA because it has 4x conversions to float, float division, 3x float multiplications and 3x conversions from float.
I'd like to make separate patch when this goes in.

>> Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:705
>> +        int width = isScaled ? m_scaledColumns.size() : info->output_width;
> 
> I think Noel made a good point about the width calculation.  Isn't it a loop invariant, so that it could be defined outside the loop?  Or does m_scaledColumns or info->output_width change based on the calls to jpeg_read_scanlines or something?  Could you please clarify (or change the code)?

Sorry, I've missed that.
If probably compiler will optimize this out of the loop in any case because isScale is template parameter and size() and output_width are simple inlined functions in mostly inlined code. But let's move it out to make it clear.
Comment 70 Brent Fulgham 2012-11-29 11:46:23 PST
(In reply to comment #69)
> >> Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:652
> >> +    JSAMPLE* jsample = *samples + column * (J_COLOR_SPACE(colorSpace) == JCS_RGB ? 3 : 4);
> > 
> > Noel pointed out that (at least under Clang) that this can just be:
> > 
> >    JSAMPLE* jsample = *samples + column * (colorSpace == JCS_RGB ? 3 : 4);
> > 
> > Does that not work on your compiler?
> 
> I remember gcc gave some warning here, which treated as error on many builds. enum(int) conversion used in many places, so I followed the crowed. This doesn't produce any extra code - just type conversion.

Let's please do it as a C++-style cast in accordance with the coding conventions.  I actually thought it was some kind of libjpeg macro or something based on the way it was written:

    JSAMPLE* jsample = *samples + column * (static_cast<J_COLOR_SPACE>(colorSpace) == JCS_RGB ? 3 : 4);

> >> Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:674
> There is much bigger difference of replacing /255 in setRGBA because it has 4x conversions to float, float division, 3x float multiplications and 3x conversions from float.
> I'd like to make separate patch when this goes in.

Great -- just please remember to include this one at the same time, just to be consistent :-)
 
Thanks!
Comment 71 Viatcheslav Ostapenko 2012-11-29 13:59:47 PST
Created attachment 176813 [details]
Patch for landing.
Comment 72 WebKit Review Bot 2012-11-29 14:31:42 PST
Comment on attachment 176813 [details]
Patch for landing.

Rejecting attachment 176813 [details] from commit-queue.

Failed to run "['/mnt/git/webkit-commit-queue/Tools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '-..." exit_code: 2

Last 500 characters of output:
mit-queue/Source/WebKit/chromium/third_party/skia/gyp --revision 6555 --non-interactive --force --accept theirs-conflict --ignore-externals' in '/mnt/git/webkit-commit-queue/Source/WebKit/chromium'
46>At revision 6555.

________ running '/usr/bin/python tools/clang/scripts/update.py --mac-only' in '/mnt/git/webkit-commit-queue/Source/WebKit/chromium'

________ running '/usr/bin/python gyp_webkit' in '/mnt/git/webkit-commit-queue/Source/WebKit/chromium'
Updating webkit projects from gyp files...

Full output: http://queues.webkit.org/results/15029960
Comment 73 Brent Fulgham 2012-11-29 15:47:18 PST
(In reply to comment #72)
> (From update of attachment 176813 [details])
> Rejecting attachment 176813 [details] from commit-queue.

It has to be r+'d by someone before the commit queue will accept it.
Comment 74 Viatcheslav Ostapenko 2012-11-29 16:37:53 PST
(In reply to comment #73)
> (In reply to comment #72)
> > (From update of attachment 176813 [details] [details])
> > Rejecting attachment 176813 [details] [details] from commit-queue.
> 
> It has to be r+'d by someone before the commit queue will accept it.

No, I've inserted "Reviewed by".
It complained about OOPS in Changelog. Sorry, missed it.
Comment 75 Viatcheslav Ostapenko 2012-11-29 16:42:58 PST
Created attachment 176848 [details]
Patch for landing
Comment 76 WebKit Review Bot 2012-11-29 17:16:06 PST
Comment on attachment 176848 [details]
Patch for landing

Clearing flags on attachment: 176848

Committed r136189: <http://trac.webkit.org/changeset/136189>
Comment 77 WebKit Review Bot 2012-11-29 17:16:15 PST
All reviewed patches have been landed.  Closing bug.
Comment 78 noel gordon 2012-11-30 00:06:36 PST
(In reply to comment #70)

> > I remember gcc gave some warning here, which treated as error on many builds. enum(int) conversion used in many places, so I followed the crowed. This doesn't produce any extra code - just type conversion.
> 
> Let's please do it as a C++-style cast in accordance with the coding conventions.  I actually thought it was some kind of libjpeg macro or something based on the way it was written:
> 
>     JSAMPLE* jsample = *samples + column * (static_cast<J_COLOR_SPACE>(colorSpace) == JCS_RGB ? 3 : 4);

% cat test.cpp

typedef enum {
    JCS_UNKNOWN, JCS_RGB, JCS_EXT_RGBA, JCS_EXT_BGRA
} J_COLOR_SPACE;

int colorSpace() { return 1; }

int main() {
  if (colorSpace() == JCS_EXT_RGBA)
      return 1;
  return 0;
}

% g++ -Wall --pedantic test.cpp

No warnings.  Maybe your respective g++'s are broken?
Comment 79 Peter Kasting 2012-11-30 13:01:09 PST
Comment on attachment 176848 [details]
Patch for landing

View in context: https://bugs.webkit.org/attachment.cgi?id=176848&action=review

> Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:652
> +    JSAMPLE* jsample = *samples + column * (static_cast<J_COLOR_SPACE>(colorSpace) == JCS_RGB ? 3 : 4);

Maybe we should make the template arg types J_COLOR_SPACE instead of int?

> Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:701
> +        if (m_reader->colorTransform() && colorSpace == JCS_RGB)

Weird that you cast in the previous function but not here.  We should be doing both or neither, I'd think.

> Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:718
> +    return m_scaled ? outputScanlines<colorSpace, true>(buffer) : outputScanlines<colorSpace, false>(buffer);

Can the compiler still manage to compile this if we just write as "return outputScanlines<colorSpace, m_scaled>(buffer);"?  I'm guessing it's not smart enough to figure that out, but if it is that'd be clearer...

At that point it'd probably make sense to inline this into any callsites.
Comment 80 Viatcheslav Ostapenko 2012-11-30 13:31:45 PST
(In reply to comment #79)
> (From update of attachment 176848 [details])

It is already committed. Should I make another patch?

> View in context: https://bugs.webkit.org/attachment.cgi?id=176848&action=review
> 
> > Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:652
> > +    JSAMPLE* jsample = *samples + column * (static_cast<J_COLOR_SPACE>(colorSpace) == JCS_RGB ? 3 : 4);
> 
> Maybe we should make the template arg types J_COLOR_SPACE instead of int?

It requires including jpeglib.h into JPEGImageDecoder.h .
I think it doesn't worth it. It works just fine on current compilers without cast. IMHO older compiler versions were giving some warning
 
> > Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:701
> > +        if (m_reader->colorTransform() && colorSpace == JCS_RGB)
> 
> Weird that you cast in the previous function but not here.  We should be doing both or neither, I'd think.

That's from recent changes. On gcc 2.6 it works fine and cast is not necessary.
Originally this patch comes from mobile device and I think we had some older compiler version.
It was giving error like this: http://stackoverflow.com/questions/8703743/using-enum-says-invalid-conversion-from-int-to-type

 
> > Source/WebCore/platform/image-decoders/jpeg/JPEGImageDecoder.cpp:718
> > +    return m_scaled ? outputScanlines<colorSpace, true>(buffer) : outputScanlines<colorSpace, false>(buffer);
> 
> Can the compiler still manage to compile this if we just write as "return outputScanlines<colorSpace, m_scaled>(buffer);"?  I'm guessing it's not smart enough to figure that out, but if it is that'd be clearer...

No, it doesn't.
Comment 81 Peter Kasting 2012-11-30 13:37:28 PST
(In reply to comment #80)
> > Maybe we should make the template arg types J_COLOR_SPACE instead of int?
> 
> It requires including jpeglib.h into JPEGImageDecoder.h .

Is that bad?  Does this break a layering rule?

> I think it doesn't worth it. It works just fine on current compilers without cast. IMHO older compiler versions were giving some warning

If we can safely include the header, I think it'd be clearer this way.  It resolves the question of static-casting in the implementation and it eliminates an implicit cast on the call side.  It's not a huge deal, I just think it's slightly nicer.  I don't really see a downside.
Comment 82 Brent Fulgham 2012-11-30 14:29:01 PST
(In reply to comment #81)
> (In reply to comment #80)
> > > Maybe we should make the template arg types J_COLOR_SPACE instead of int?
> > 
> > It requires including jpeglib.h into JPEGImageDecoder.h .
> 
> Is that bad?  Does this break a layering rule?
> 
> > I think it doesn't worth it. It works just fine on current compilers without cast. IMHO older compiler versions were giving some warning
> 
> If we can safely include the header, I think it'd be clearer this way.  It resolves the question of static-casting in the implementation and it eliminates an implicit cast on the call side.  It's not a huge deal, I just think it's slightly nicer.  I don't really see a downside.

I agree with Peter on this.  WebKit is a big enough beast that the added overhead of including jpeglib.h probably doesn't create any appreciable increase in compile times.

Do the LIBJPEG-TURBO headers have different names, or is it still just jpeglib.h?
Comment 83 Viatcheslav Ostapenko 2012-11-30 17:34:41 PST
Reopening to attach new patch.
Comment 84 Viatcheslav Ostapenko 2012-11-30 17:34:47 PST
Created attachment 177063 [details]
Patch
Comment 85 Viatcheslav Ostapenko 2012-11-30 17:36:20 PST
(In reply to comment #82)
> (In reply to comment #81)
> > (In reply to comment #80)
> > > > Maybe we should make the template arg types J_COLOR_SPACE instead of int?
> > > 
> > > It requires including jpeglib.h into JPEGImageDecoder.h .
> > 
> > Is that bad?  Does this break a layering rule?
> > 
> > > I think it doesn't worth it. It works just fine on current compilers without cast. IMHO older compiler versions were giving some warning
> > 
> > If we can safely include the header, I think it'd be clearer this way.  It resolves the question of static-casting in the implementation and it eliminates an implicit cast on the call side.  It's not a huge deal, I just think it's slightly nicer.  I don't really see a downside.
> 
> I agree with Peter on this.  WebKit is a big enough beast that the added overhead of including jpeglib.h probably doesn't create any appreciable increase in compile times.
> 
> Do the LIBJPEG-TURBO headers have different names, or is it still just jpeglib.h?

No, it's all the same.
I've attached the patch.
Comment 86 noel gordon 2012-11-30 19:07:38 PST
(In reply to comment #80)
> Originally this patch comes from mobile device and I think we had some older compiler version.
> It was giving error like this: http://stackoverflow.com/questions/8703743/using-enum-says-invalid-conversion-from-int-to-type

That article is about assigning an integer to an enumerated type.  Color n = 1;  // compile-error
Comment 87 noel gordon 2012-11-30 19:22:58 PST
(In reply to comment #80)
> That's from recent changes. On gcc 2.6 it works fine and cast is not necessary.
> Originally this patch comes from mobile device and I think we had some older compiler version.

"some older compiler version"?  Which compiler?  And which version of it?

We are talking about integer comparison.  Enumerated types are integral types and always have been.  They are compared in expressions using C++ integer promotion rules.  Refer to

  http://msdn.microsoft.com/en-us/library/bd77ckhw(v=vs.71).aspx

All the casting code added for the enum in question is superfluous noise.
Comment 88 Brent Fulgham 2012-12-03 10:14:18 PST
Comment on attachment 177063 [details]
Patch

Looks great!  r=me.
Comment 89 WebKit Review Bot 2012-12-03 10:19:17 PST
Comment on attachment 177063 [details]
Patch

Clearing flags on attachment: 177063

Committed r136410: <http://trac.webkit.org/changeset/136410>
Comment 90 WebKit Review Bot 2012-12-03 10:19:27 PST
All reviewed patches have been landed.  Closing bug.