WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
Bug 103216
PNG decode performance: avoid using frame buffer.setRGBA(x,y)
https://bugs.webkit.org/show_bug.cgi?id=103216
Summary
PNG decode performance: avoid using frame buffer.setRGBA(x,y)
noel gordon
Reported
2012-11-25 23:01:07 PST
Writing decoded row pixels to the frame buffer with buffer.setRGBA(x,y) is slow compared to writing direct to the frame buffer. I disabled color profiles in the PNG decoder, and instrumented the total decode time and time taken to write row data to the frame buffer. The amount in ()'s is the percentage of decode time spent writing row data.
http://duke.kenai.com/eco/DukeTreeHuggerSmall.png
row write time 0.002204 PNG decode ends 0.016935 (13%)
http://duke.kenai.com/eco/DukeTreeHuggerSmall.png
row write time 0.022714 PNG decode ends 0.141308 (16%)
http://trac.webkit.org/browser/trunk/LayoutTests/fast/images/resources/red-at-12-oclock-with-color-profile.png
row write time 0.001901 PNG decode ends 0.012777 (14%)
Attachments
Patch
(5.68 KB, patch)
2012-11-25 23:05 PST
,
noel gordon
no flags
Details
Formatted Diff
Diff
Patch for landing
(5.62 KB, patch)
2012-11-26 17:21 PST
,
noel gordon
no flags
Details
Formatted Diff
Diff
png.write.time.git.diff
(3.59 KB, text/plain)
2012-11-26 18:41 PST
,
noel gordon
no flags
Details
png.decoder.time.git.diff
(4.33 KB, text/plain)
2012-11-26 18:44 PST
,
noel gordon
no flags
Details
Show Obsolete
(1)
View All
Add attachment
proposed patch, testcase, etc.
noel gordon
Comment 1
2012-11-25 23:05:23 PST
Created
attachment 175930
[details]
Patch
noel gordon
Comment 2
2012-11-25 23:12:25 PST
The patch avoids using buffer.setRGBA(x,y), and writes direct to the frame buffer. This reduces decoding time 2% or more for these images.
http://duke.kenai.com/eco/DukeTreeHuggerSmall.png
row write time 0.001829 PNG decode ends 0.016632 (10%)
http://duke.kenai.com/eco/DukeTreeHuggerSmall.png
row write time 0.017736 PNG decode ends 0.136413 (13%)
http://trac.webkit.org/browser/trunk/LayoutTests/fast/images/resources/red-at-12-oclock-with-color-profile.png
row write time 0.001005 PNG decode ends 0.012086 (8%)
Brent Fulgham
Comment 3
2012-11-26 00:26:01 PST
Comment on
attachment 175930
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=175930&action=review
Looks good. I'll r+ after the EWS finishes running.
> Source/WebCore/platform/image-decoders/ImageDecoder.h:134 > + void reportMemoryUsage(MemoryObjectInfo*) const;
Why was this moved?
> Source/WebCore/platform/image-decoders/ImageDecoder.h:170 > + float alphaPercent = a / 255.0f;
Divides are costly. It might save ~20 cycles to multiply a by a precomputed "1/255.0f" I'm not sure if this would create any significant inaccuracies due to the vagaries of floating point math...
noel gordon
Comment 4
2012-11-26 02:47:24 PST
(In reply to
comment #3
)
> > Source/WebCore/platform/image-decoders/ImageDecoder.h:134 > > + void reportMemoryUsage(MemoryObjectInfo*) const; > > Why was this moved?
It was in the middle of routines dealing with frame buffer writing. Seemed misplaced to me, so I moved it elsewhere.
> > Source/WebCore/platform/image-decoders/ImageDecoder.h:170 > > + float alphaPercent = a / 255.0f; > > Divides are costly. It might save ~20 cycles to multiply a by a precomputed "1/255.0f" I'm not sure if this would create any significant inaccuracies due to the vagaries of floating point math...
Indeed. We have inaccuracies as the code stands for other reasons, and a change here might introduce more. Not looking to change behavior on this bug.
Viatcheslav Ostapenko
Comment 5
2012-11-26 07:29:51 PST
This is basically duplicate of
bug 88424
and recent patch there does the same for all image formats with some extra.
Viatcheslav Ostapenko
Comment 6
2012-11-26 07:53:22 PST
(In reply to
comment #3
)
> (From update of
attachment 175930
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=175930&action=review
> > Looks good. I'll r+ after the EWS finishes running.
I would ask you too look at the complete patch which has this as a part:
https://bugs.webkit.org/attachment.cgi?id=175328&action=review
Brent Fulgham
Comment 7
2012-11-26 10:01:13 PST
(In reply to
comment #6
)
> (In reply to
comment #3
) > > (From update of
attachment 175930
[details]
[details]) > > View in context:
https://bugs.webkit.org/attachment.cgi?id=175930&action=review
> > > > Looks good. I'll r+ after the EWS finishes running. > > I would ask you too look at the complete patch which has this as a part:
https://bugs.webkit.org/attachment.cgi?id=175328&action=review
Battle of the competing patches! I must admit that I prefer Viatcheslav's handling of the advancement of the address position, but otherwise I think that committing these as individual changes is better from a review standpoint. Noel, do you have committer rights?
Viatcheslav Ostapenko
Comment 8
2012-11-26 11:20:40 PST
(In reply to
comment #7
)
> (In reply to
comment #6
) > > (In reply to
comment #3
) > > > (From update of
attachment 175930
[details]
[details] [details]) > > > View in context:
https://bugs.webkit.org/attachment.cgi?id=175930&action=review
> > > > > > Looks good. I'll r+ after the EWS finishes running. > > > > I would ask you too look at the complete patch which has this as a part:
https://bugs.webkit.org/attachment.cgi?id=175328&action=review
> > Battle of the competing patches! > > I must admit that I prefer Viatcheslav's
Actually it is not my patch. I just rebaselined original patch from a person who submitted it half a year ago, but doesn't work on webkit anymore. This upstreamed patch showed great improvement on mobile devices where CPU branch prediction is not so good as on desktop processors.
noel gordon
Comment 9
2012-11-26 15:39:19 PST
(In reply to
comment #5
)
> This is basically duplicate of
bug 88424
and recent patch there does the same for all image formats with some extra.
(In reply to
comment #6
)
> > I would ask you too look at the complete patch which has this as a part:
https://bugs.webkit.org/attachment.cgi?id=175328&action=review
I would r- that patch: insufficient ChangeLog, it breaks JPEG on some ports, and is poorly tested, yet you claim it greatly improves performance on mobile mobile devices. We are shipping code in a production browser environments: production is the wrong place to test webkit patches.
noel gordon
Comment 10
2012-11-26 15:41:44 PST
(In reply to
comment #7
)
> I must admit that I prefer Viatcheslav's handling of the advancement of the address position, but otherwise I think that committing these as individual changes is better from a review standpoint.
Moreover, individual changes are easier to test, and roll out.
> Noel, do you have committer rights?
Yes.
Brent Fulgham
Comment 11
2012-11-26 16:38:15 PST
Comment on
attachment 175930
[details]
Patch Let's get this started!
Brent Fulgham
Comment 12
2012-11-26 16:40:17 PST
(In reply to
comment #10
)
> (In reply to
comment #7
) > > > I must admit that I prefer Viatcheslav's handling of the advancement of the address position, but otherwise I think that committing these as individual changes is better from a review standpoint. > > Moreover, individual changes are easier to test, and roll out. > > > Noel, do you have committer rights? > > Yes.
Okay. I marked this as r+, and cq+. I'll update 88424 to rebaseline after this lands, as those changes will no longer be necessary. Incidentally, how are you instrumenting the code to run your performance checks? If this is documented somewhere I'd like to get up-to-speed with it for some future changes I'd like to try.
WebKit Review Bot
Comment 13
2012-11-26 16:58:45 PST
Comment on
attachment 175930
[details]
Patch Rejecting
attachment 175930
[details]
from commit-queue. Failed to run "['/mnt/git/webkit-commit-queue/Tools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '-..." exit_code: 2 Last 500 characters of output: (content): Merge conflict in Source/WebKit/chromium/ChangeLog Failed to merge in the changes. Patch failed at 0001 RenderStyle: Move 'list-style-image' to rare inherited data. When you have resolved this problem run "git rebase --continue". If you would prefer to skip this patch, instead run "git rebase --skip". To restore the original branch and stop rebasing run "git rebase --abort". rebase refs/remotes/origin/master: command returned error: 1 Died at Tools/Scripts/update-webkit line 154. Full output:
http://queues.webkit.org/results/14989805
noel gordon
Comment 14
2012-11-26 17:21:21 PST
Created
attachment 176121
[details]
Patch for landing
Brent Fulgham
Comment 15
2012-11-26 17:25:20 PST
Comment on
attachment 176121
[details]
Patch for landing Thanks for rebaselining.
WebKit Review Bot
Comment 16
2012-11-26 18:11:04 PST
Comment on
attachment 176121
[details]
Patch for landing Clearing flags on attachment: 176121 Committed
r135799
: <
http://trac.webkit.org/changeset/135799
>
WebKit Review Bot
Comment 17
2012-11-26 18:11:08 PST
All reviewed patches have been landed. Closing bug.
noel gordon
Comment 18
2012-11-26 18:15:57 PST
(In reply to
comment #0
) The second image URL was wrong: should be
http://duke.kenai.com/eco/GreenThumbLarge.png
> Writing decoded row pixels to the frame buffer with buffer.setRGBA(x,y) is slow compared to writing direct to the frame buffer. > > I disabled color profiles in the PNG decoder, and instrumented the total decode time and time taken to write row data to the frame buffer. The amount in ()'s is the percentage of decode time spent writing row data.
http://duke.kenai.com/eco/DukeTreeHuggerSmall.png
row write time 0.002204 PNG decode ends 0.016935 (13%)
http://duke.kenai.com/eco/GreenThumbLarge.png
row write time 0.022714 PNG decode ends 0.141308 (16%)
http://trac.webkit.org/browser/trunk/LayoutTests/fast/images/resources/red-at-12-oclock-with-color-profile.png
row write time 0.001901 PNG decode ends 0.012777 (14%) (In reply to
comment #2
)
> The patch avoids using buffer.setRGBA(x,y), and writes direct to the frame buffer. This reduces decoding time 2% or more for these images.
http://duke.kenai.com/eco/DukeTreeHuggerSmall.png
row write time 0.001829 PNG decode ends 0.016632 (10%)
http://duke.kenai.com/eco/GreenThumbLarge.png
row write time 0.017736 PNG decode ends 0.136413 (13%)
http://trac.webkit.org/browser/trunk/LayoutTests/fast/images/resources/red-at-12-oclock-with-color-profile.png
row write time 0.001005 PNG decode ends 0.012086 (8%)
noel gordon
Comment 19
2012-11-26 18:39:50 PST
(In reply to
comment #12
)
> Incidentally, how are you instrumenting the code to run your performance checks? If this is documented somewhere I'd like to get up-to-speed with it for some future changes I'd like to try.
My only documentation is a patch.
noel gordon
Comment 20
2012-11-26 18:41:28 PST
Created
attachment 176140
[details]
png.write.time.git.diff
noel gordon
Comment 21
2012-11-26 18:44:02 PST
Created
attachment 176143
[details]
png.decoder.time.git.diff This is the timing patch I used for the test results on this bug. Enjoy.
noel gordon
Comment 22
2012-11-27 03:22:22 PST
(In reply to
comment #3
)
> Divides are costly. It might save ~20 cycles to multiply a by a precomputed "1/255.0f" I'm not sure if this would create any significant inaccuracies due to the vagaries of floating point math...
Given the above timing patch, one can try multiplying by the constant factor 1/255.0 to avoid the division. My result on Chromium Mac is no speed improvement :/
Brent Fulgham
Comment 23
2012-11-27 10:43:27 PST
(In reply to
comment #22
)
> (In reply to
comment #3
) > > > Divides are costly. It might save ~20 cycles to multiply a by a precomputed "1/255.0f" I'm not sure if this would create any significant inaccuracies due to the vagaries of floating point math... > > Given the above timing patch, one can try multiplying by the constant factor 1/255.0 to avoid the division. My result on Chromium Mac is no speed improvement :/
Thanks for checking -- it was worth a try. I've seen this shave a few percent off of tight loops in the past, but we must not be hitting this hard enough for typical use cases. There are other places where we do this kind of pass over the entire contents of a buffer. Converting to multiplies across the board would probably help some, and taking advantage of SSE vectorizing would potentially be an even bigger win, assuming alignment is dealt with properly.
noel gordon
Comment 24
2012-11-27 14:25:55 PST
(In reply to
comment #23
) I've similarly seen a few percent shaved off tight loops by converting a div to a mul (example below). Perhaps Clang automatically computes the inverse.
> Converting to multiplies across the board would probably help some, and taking advantage of SSE vectorizing would potentially be an even bigger win ...
Indeed. Skia has alpha pre-multiplication converters that avoid the division by using shifts
http://code.google.com/searchframe#OAMlx_jo-ck/src/third_party/skia/src/core/SkMathPriv.h&exact_package=chromium&q=SkMulDiv255&type=cs&l=72
and SSE would be faster still. However, doing it right either way requires a understanding of whether to round, truncate, or ceiling the result.
noel gordon
Comment 25
2012-11-27 14:31:18 PST
Also you asked for ImageFrame width() and height() to be inlined in this bug and I prepared a separate patch for that:
bug 103401
-- did I misunderstand
https://bugs.webkit.org/show_bug.cgi?id=88424#c38
??
Brent Fulgham
Comment 26
2012-11-27 14:49:03 PST
(In reply to
comment #25
)
> Also you asked for ImageFrame width() and height() to be inlined in this bug and I prepared a separate patch for that:
bug 103401
-- did I misunderstand
https://bugs.webkit.org/show_bug.cgi?id=88424#c38
??
No -- you didn't misunderstand, though it may not be worth doing in its own patch. I think ostap's current patch (rebaselined after landing this bug) includes the inlining. I think it would be good to combine the width/height inlining with a possible templatized setRGBA implementation that gets rid of the branching for premultiplication, since we know outside the main loop whether we need to apply those transformations or not. No need to test for each pixel.
noel gordon
Comment 27
2012-11-27 16:05:33 PST
(In reply to
comment #26
)
> (In reply to
comment #25
) > > Also you asked for ImageFrame width() and height() to be inlined in this bug and I prepared a separate patch for that:
bug 103401
-- did I misunderstand
https://bugs.webkit.org/show_bug.cgi?id=88424#c38
?? > > No -- you didn't misunderstand, though it may not be worth doing in its own patch. I think ostap's current patch (rebaselined after landing this bug) includes the inlining.
But as public symbols. Can we please not do that.
> I think it would be good to combine the width/height inlining with a possible templatized setRGBA implementation that gets rid of the branching for premultiplication, since we know outside the main loop whether we need to apply those transformations or not. No need to test for each pixel.
Noting that the 3d canvas (WebGL) performance and tests will need to be looked as a result. Perhaps one simple step at a time.
Brent Fulgham
Comment 28
2012-11-27 16:51:53 PST
(In reply to
comment #27
)
> But as public symbols. Can we please not do that.
Oh! Right, they don't need to be exposed at all.
> Noting that the 3d canvas (WebGL) performance and tests will need to be looked as a result. Perhaps one simple step at a time.
Sounds good. By the way, I've created a patch to track adding ostap's profiling utility (
https://bugs.webkit.org/show_bug.cgi?id=103463
).
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug