Bug 50804 - [chromium] Reduce canvas.toDataURL("image/jpeg") run-time cost by 10%
Summary: [chromium] Reduce canvas.toDataURL("image/jpeg") run-time cost by 10%
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Canvas (show other bugs)
Version: 528+ (Nightly build)
Hardware: PC Windows 7
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-12-10 00:38 PST by noel gordon
Modified: 2010-12-17 01:49 PST (History)
6 users (show)

See Also:


Attachments
patch (2.46 KB, patch)
2010-12-10 00:58 PST, noel gordon
no flags Details | Formatted Diff | Diff
patch (2.45 KB, patch)
2010-12-10 01:06 PST, noel gordon
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description noel gordon 2010-12-10 00:38:57 PST
Reduce canvas.toDataURL("image/jpeg") run-time cost by 10%
Comment 1 noel gordon 2010-12-10 00:58:34 PST
Created attachment 76173 [details]
patch
Comment 2 WebKit Review Bot 2010-12-10 01:00:16 PST
Attachment 76173 [details] did not pass style-queue:

Failed to run "['WebKitTools/Scripts/check-webkit-style', '--diff-files', u'WebCore/ChangeLog', u'WebCore/platform/image-encoders/skia/JPEGImageEncoder.cpp']" exit_code: 1
WebCore/platform/image-encoders/skia/JPEGImageEncoder.cpp:86:  One or more unexpected \r (^M) found; better to use only a \n  [whitespace/carriage_return] [1]
Suppressing further [whitespace/carriage_return] reports for this file.
WebCore/platform/image-encoders/skia/JPEGImageEncoder.cpp:37:  Alphabetical sorting problem.  [build/include_order] [4]
WebCore/platform/image-encoders/skia/JPEGImageEncoder.cpp:90:  Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons.  [readability/comparison_to_zero] [5]
Total errors found: 12 in 2 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 3 noel gordon 2010-12-10 01:06:04 PST
Created attachment 76174 [details]
patch

style: can't avoid the non-alphabetical include ordering nit.
Comment 4 noel gordon 2010-12-10 01:07:48 PST
so I expected to nitted again.
Comment 5 WebKit Review Bot 2010-12-10 01:08:49 PST
Attachment 76174 [details] did not pass style-queue:

Failed to run "['WebKitTools/Scripts/check-webkit-style', '--diff-files', u'WebCore/ChangeLog', u'WebCore/platform/image-encoders/skia/JPEGImageEncoder.cpp']" exit_code: 1
WebCore/platform/image-encoders/skia/JPEGImageEncoder.cpp:37:  Alphabetical sorting problem.  [build/include_order] [4]
WebCore/platform/image-encoders/skia/JPEGImageEncoder.cpp:90:  Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons.  [readability/comparison_to_zero] [5]
Total errors found: 2 in 2 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 6 Eric Seidel (no email) 2010-12-10 01:38:23 PST
Comment on attachment 76174 [details]
patch

You should consider using webkit-patch upload to upload your patches so that they get marked properly as patch files.
Comment 7 Eric Seidel (no email) 2010-12-10 01:40:51 PST
Comment on attachment 76174 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=76174&action=review

It's a little surprising that this unroll makes things faster, but OK.   Are you certain in the stability of your numbers?

> WebCore/platform/image-encoders/skia/JPEGImageEncoder.cpp:90
> +        if ((alpha != 0) && (alpha != 255)) {

We generally use just "alpha" instead of "alpha != 0", but it might be OK here.
Comment 8 noel gordon 2010-12-12 17:40:43 PST
(In reply to comment #7)
> (From update of attachment 76174 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=76174&action=review
> 
> It's a little surprising that this unroll makes things faster, but OK.   Are you certain in the stability of your numbers?

Test image is 1800 x 1350 px JPEG.  I load it into a <canvas> and then I do back-to-back canvas.toDataURL("image/jpeg")
calls in js, logging the time as I go.  My results are:

Safari 5.0.2 win32 (7533.18.5):
 12:15:45:062 extract image/jpeg 145 ms length: 667763 bytes
 12:15:45:214 extract image/jpeg 146 ms length: 667763 bytes
 12:15:45:366 extract image/jpeg 146 ms length: 667763 bytes
 12:15:45:517 extract image/jpeg 145 ms length: 667763 bytes
 12:15:45:668 extract image/jpeg 145 ms length: 667763 bytes
 12:15:45:819 extract image/jpeg 144 ms length: 667763 bytes
 12:15:45:970 extract image/jpeg 144 ms length: 667763 bytes
 12:15:46:122 extract image/jpeg 145 ms length: 667763 bytes
 12:15:46:273 extract image/jpeg 145 ms length: 667763 bytes
 12:15:46:424 extract image/jpeg 145 ms length: 667763 bytes
 12:15:46:575 extract image/jpeg 144 ms length: 667763 bytes

Chrome ToT win32:
 12:09:16:144 extract image/jpeg 129 ms length: 630179 bytes
 12:09:16:301 extract image/jpeg 130 ms length: 630179 bytes
 12:09:16:458 extract image/jpeg 130 ms length: 630179 bytes
 12:09:16:617 extract image/jpeg 130 ms length: 630179 bytes
 12:09:16:752 extract image/jpeg 130 ms length: 630179 bytes
 12:09:16:909 extract image/jpeg 129 ms length: 630179 bytes
 12:09:17:044 extract image/jpeg 129 ms length: 630179 bytes
 12:09:17:203 extract image/jpeg 130 ms length: 630179 bytes
 12:09:17:339 extract image/jpeg 130 ms length: 630179 bytes
 12:09:17:474 extract image/jpeg 130 ms length: 630179 bytes
 12:09:17:632 extract image/jpeg 130 ms length: 630179 bytes

Chrome ToT win32 with this change:
 12:13:14:649 extract image/jpeg 115 ms length: 630179 bytes
 12:13:14:769 extract image/jpeg 115 ms length: 630179 bytes
 12:13:14:891 extract image/jpeg 114 ms length: 630179 bytes
 12:13:15:011 extract image/jpeg 114 ms length: 630179 bytes
 12:13:15:131 extract image/jpeg 115 ms length: 630179 bytes
 12:13:15:253 extract image/jpeg 115 ms length: 630179 bytes
 12:13:15:373 extract image/jpeg 115 ms length: 630179 bytes
 12:13:15:494 extract image/jpeg 115 ms length: 630179 bytes
 12:13:15:617 extract image/jpeg 115 ms length: 630179 bytes
 12:13:15:738 extract image/jpeg 115 ms length: 630179 bytes
 12:13:15:857 extract image/jpeg 114 ms length: 630179 bytes

worth it?

> > WebCore/platform/image-encoders/skia/JPEGImageEncoder.cpp:90
> > +        if ((alpha != 0) && (alpha != 255)) {
> 
> We generally use just "alpha" instead of "alpha != 0", but it might be OK here.

agree, OK here.
Comment 9 Eric Seidel (no email) 2010-12-13 00:20:34 PST
Comment on attachment 76174 [details]
patch

Those are definitely small numbers. :)  But 10% is still 10%.  Woud be even better if we had a benchmark for this sort of thing, but 115ms is pretty small already. :)  WebCore/benchmarks is where such a benchmark would go.
Comment 10 WebKit Review Bot 2010-12-13 01:12:15 PST
Comment on attachment 76174 [details]
patch

Clearing flags on attachment: 76174

Committed r73890: <http://trac.webkit.org/changeset/73890>
Comment 11 WebKit Review Bot 2010-12-13 01:12:21 PST
All reviewed patches have been landed.  Closing bug.
Comment 12 Stephen White 2010-12-13 07:46:02 PST
Comment on attachment 76174 [details]
patch

I'm fine with this change, but I'm surprised that putting the conditional in the loop doesn't make performance worse.  Perhaps your tests are mostly alpha 255 or 0, and are benefiting disproportionately from branch prediction?

You could also look at writing an int at a time to the output, rather than a char at a time.  Of course, this would require unrolling 3X or so in order to write RGBR GBRG BRGB, kind of a pain, but usually nets a good speedup.

If this continues to be a bottleneck, it might be worth looking at an SSE2 version as well.
Comment 13 noel gordon 2010-12-13 20:02:07 PST
> Those are definitely small numbers. :)  But 10% is still 10%.  Woud be even better if we had a benchmark for this sort of thing, but 115ms is pretty small already. :)  WebCore/benchmarks is where such a benchmark would go.

Happy to take the 10%, we are about to get faster still.  hbono is is working
on bug 50054, which should make the encoding phase 3x faster.
Comment 14 noel gordon 2010-12-13 20:37:32 PST
(In reply to comment #12)
> I'm fine with this change, but I'm surprised that putting the conditional in the loop doesn't make performance worse.
> Perhaps your tests are mostly alpha 255 or 0, and are benefiting disproportionately from branch prediction?

Apply the first statement to canvas.getImageData and canvas.putImageData :)  But yes, I believe the else branch
benefits from alpha 0|255 and that it also exposes the cost of the existing "lookup" table.  This is not new idea;
see the PNG ncoder implementation in the skia repository for example.

 
> You could also look at writing an int at a time to the output, rather than a char at a time.  Of course, this would
> require unrolling 3X or so in order to write RGBR GBRG BRGB, kind of a pain, but usually nets a good speedup.

Yes a pain.  Would it be better to build premultipication/unpremultiplication tables?

http://mxr.mozilla.org/mozilla2.0/source/content/canvas/src/nsCanvasRenderingContext2D.cpp#3917
http://mxr.mozilla.org/mozilla2.0/source/content/canvas/src/nsCanvasRenderingContext2D.cpp#3996


> If this continues to be a bottleneck, it might be worth looking at an SSE2 version as well.

Indeed, hbono suggested exactly that approach, in particular that he could implement the (un)premultiplication via
SSE or  MMX.  I'm awaiting the result of his bug 50054 (I'm not synced to it yet) to measure the expected encoding
phase speed-up (3x), and to consider if the unpremultiplication is a "bottleneck" worth more optimization ...
Comment 15 Hironori Bono 2010-12-14 18:21:02 PST
(In reply to comment #13)
> Happy to take the 10%, we are about to get faster still.  hbono is is working
> on bug 50054, which should make the encoding phase 3x faster.

This is just for your information. I have been integrating premultiply/unpremultiply operations into libjpeg-turbo so we can directly use skia pixels, i.e. we don't need Vector<JSAMPLE>. For those who are interested in it, I have uploaded a part of this change (only unpremultiply operations for SSE) to Rietveld: <http://codereview.chromium.org/5862001/>.

Regards,
Comment 16 noel gordon 2010-12-14 21:44:02 PST
(In reply to comment #14)
> (In reply to comment #12)
> > I'm fine with this change, but I'm surprised that putting the conditional in the loop doesn't make performance worse.
> > Perhaps your tests are mostly alpha 255 or 0, and are benefiting disproportionately from branch prediction?
> 
> Apply the first statement to canvas.getImageData and canvas.putImageData :)  

correction, those are templated.
Comment 17 noel gordon 2010-12-14 23:59:43 PST
(In reply to comment #15)
@hbono thanks for http://codereview.chromium.org/5862001, sounds good.  I should note, be aware of 
http://www.w3.org/Bugs/Public/show_bug.cgi?id=11431
Comment 18 noel gordon 2010-12-17 01:49:09 PST
http://www.w3.org/Bugs/Public/show_bug.cgi?id=11431 is easy to resolve - the spec is correct.

Opened bug 51237 to correct chromium.  It's related to this bug, and we maintain the speedup
mentioned here, but there's no need to unpremultiply RGB.  See bug 51237 for details.