RESOLVED FIXED Bug 139310
[Apple] Use Accelerate framework to speed-up FEGaussianBlur
https://bugs.webkit.org/show_bug.cgi?id=139310
Summary [Apple] Use Accelerate framework to speed-up FEGaussianBlur
Dean Jackson
Reported 2014-12-05 13:19:45 PST
Accelerate.framework provides a better speedup than parallelJobs on Apple platforms, assuming the kernel is compatible. We should use it. <rdar://problem/18434594>
Attachments
Patch (12.88 KB, patch)
2014-12-05 14:01 PST, Dean Jackson
no flags
Archive of layout-test-results from webkit-ews-10 for mac-mountainlion-wk2 (572.95 KB, application/zip)
2014-12-05 16:19 PST, Build Bot
no flags
Archive of layout-test-results from webkit-ews-12 for mac-mountainlion-wk2 (511.67 KB, application/zip)
2014-12-05 18:06 PST, Build Bot
no flags
Archive of layout-test-results from webkit-ews-04 for mac-mountainlion (588.94 KB, application/zip)
2014-12-05 19:24 PST, Build Bot
no flags
Archive of layout-test-results from webkit-ews-07 for mac-mountainlion (652.90 KB, application/zip)
2014-12-05 21:38 PST, Build Bot
no flags
Patch (15.89 KB, patch)
2014-12-08 15:04 PST, Dean Jackson
simon.fraser: review+
Dean Jackson
Comment 1 2014-12-05 14:01:26 PST
Tim Horton
Comment 2 2014-12-05 14:28:03 PST
Comment on attachment 242659 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=242659&action=review > Source/WebCore/platform/graphics/filters/FEGaussianBlur.cpp:33 > +#if PLATFORM(COCOA) && HAVE(ACCELERATE) Anders thinks this should just be HAVE(ACCELERATE) (and elsewhere) > Source/WebCore/platform/graphics/filters/FEGaussianBlur.cpp:290 > + vImageBoxConvolve_ARGB8888(&effectInBuffer, &effectOutBuffer, 0, 0, 0, radius, radius, 0, kvImageEdgeExtend); > + vImageBoxConvolve_ARGB8888(&effectOutBuffer, &effectInBuffer, 0, 0, 0, radius, radius, 0, kvImageEdgeExtend); > + vImageBoxConvolve_ARGB8888(&effectInBuffer, &effectOutBuffer, 0, 0, 0, radius, radius, 0, kvImageEdgeExtend); We should consider using a shared temp buffer here instead of allocating three. There are instructs in the docs for figuring out how big to make it. > Source/WebCore/platform/graphics/filters/FEGaussianBlur.cpp:304 > + else { > +#endif Can we write this differently so we don't have a brace inside an ifdef with its closing brace in a different one? > ManualTests/blur-filter-timing.html:1 > +<!DOCTYPE html> Should be in PerformanceTests/Interactive, I think? > ManualTests/blur-filter-timing.html:13 > + var WIDTH = 600; why so many caps?
Simon Fraser (smfr)
Comment 3 2014-12-05 14:28:18 PST
Comment on attachment 242659 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=242659&action=review > Source/WebCore/ChangeLog:14 > + especially on retina devices. The problem was blur, and that we > + fixed a bug about 9 months ago that computed filters at the Would be good to reference the exact revision. > Source/WebCore/ChangeLog:34 > + FWIW, I also tried a CoreImage implementation, but the expense of > + copying back from the GPU was too high. When we have a fully accelerated > + filter chain (without needing compositing) I expect this approach > + will win. This belongs in the bug, but not in the changelog. > Source/WebCore/ChangeLog:36 > + There is a manual timing test: ManualTests/blur-filter-timing.html I think you should add a perf test. > Source/WebCore/platform/graphics/filters/FEGaussianBlur.cpp:274 > + // The accelerated box convolve approximation only works with an odd radius. > + if (radius % 2 != 1) > + radius += 1; So this doesn't exactly match the other impl? Is this visible?
Build Bot
Comment 4 2014-12-05 16:19:26 PST
Comment on attachment 242659 [details] Patch Attachment 242659 [details] did not pass mac-wk2-ews (mac-wk2): Output: http://webkit-queues.appspot.com/results/5812017482956800 New failing tests: svg/filters/svg-gaussianblur-edgeMode-duplicate.svg
Build Bot
Comment 5 2014-12-05 16:19:30 PST
Created attachment 242673 [details] Archive of layout-test-results from webkit-ews-10 for mac-mountainlion-wk2 The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews. Bot: webkit-ews-10 Port: mac-mountainlion-wk2 Platform: Mac OS X 10.8.5
Build Bot
Comment 6 2014-12-05 18:06:39 PST
Comment on attachment 242659 [details] Patch Attachment 242659 [details] did not pass mac-wk2-ews (mac-wk2): Output: http://webkit-queues.appspot.com/results/6422366695456768 New failing tests: svg/filters/svg-gaussianblur-edgeMode-duplicate.svg
Build Bot
Comment 7 2014-12-05 18:06:43 PST
Created attachment 242690 [details] Archive of layout-test-results from webkit-ews-12 for mac-mountainlion-wk2 The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews. Bot: webkit-ews-12 Port: mac-mountainlion-wk2 Platform: Mac OS X 10.8.5
Build Bot
Comment 8 2014-12-05 19:24:38 PST
Comment on attachment 242659 [details] Patch Attachment 242659 [details] did not pass mac-ews (mac): Output: http://webkit-queues.appspot.com/results/5611548273803264 New failing tests: svg/filters/svg-gaussianblur-edgeMode-duplicate.svg
Build Bot
Comment 9 2014-12-05 19:24:42 PST
Created attachment 242695 [details] Archive of layout-test-results from webkit-ews-04 for mac-mountainlion The attached test failures were seen while running run-webkit-tests on the mac-ews. Bot: webkit-ews-04 Port: mac-mountainlion Platform: Mac OS X 10.8.5
Build Bot
Comment 10 2014-12-05 21:38:32 PST
Comment on attachment 242659 [details] Patch Attachment 242659 [details] did not pass mac-ews (mac): Output: http://webkit-queues.appspot.com/results/6322940886908928 New failing tests: svg/filters/svg-gaussianblur-edgeMode-duplicate.svg
Build Bot
Comment 11 2014-12-05 21:38:35 PST
Created attachment 242701 [details] Archive of layout-test-results from webkit-ews-07 for mac-mountainlion The attached test failures were seen while running run-webkit-tests on the mac-ews. Bot: webkit-ews-07 Port: mac-mountainlion Platform: Mac OS X 10.8.5
Dean Jackson
Comment 12 2014-12-08 11:52:11 PST
Comment on attachment 242659 [details] Patch I think Simon meant r-. Either way, updated patch is coming.
Dean Jackson
Comment 13 2014-12-08 12:00:11 PST
Comment on attachment 242659 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=242659&action=review >> Source/WebCore/ChangeLog:14 >> + fixed a bug about 9 months ago that computed filters at the > > Would be good to reference the exact revision. OK. >> Source/WebCore/ChangeLog:34 >> + will win. > > This belongs in the bug, but not in the changelog. OK. >> Source/WebCore/ChangeLog:36 >> + There is a manual timing test: ManualTests/blur-filter-timing.html > > I think you should add a perf test. OK. >> Source/WebCore/platform/graphics/filters/FEGaussianBlur.cpp:33 >> +#if PLATFORM(COCOA) && HAVE(ACCELERATE) > > Anders thinks this should just be HAVE(ACCELERATE) (and elsewhere) OK. >> Source/WebCore/platform/graphics/filters/FEGaussianBlur.cpp:274 >> + radius += 1; > > So this doesn't exactly match the other impl? Is this visible? This code comes from an example posted at WWDC by the Accelerate team (it was called UIImage+ImageEffects, but seems to have disappeared from the Apple site). In that example it is emulating a Gaussian blur, and references the SVG specification for how to do it with 3 box blurs. http://www.w3.org/TR/SVG/filters.html#feGaussianBlurElement That includes "... if d is odd, use three box-blurs of size 'd', centered on the output pixel." This is also the same approach that our software path uses, so we're ok here. >> Source/WebCore/platform/graphics/filters/FEGaussianBlur.cpp:290 >> + vImageBoxConvolve_ARGB8888(&effectInBuffer, &effectOutBuffer, 0, 0, 0, radius, radius, 0, kvImageEdgeExtend); > > We should consider using a shared temp buffer here instead of allocating three. There are instructs in the docs for figuring out how big to make it. Great point. >> Source/WebCore/platform/graphics/filters/FEGaussianBlur.cpp:304 >> +#endif > > Can we write this differently so we don't have a brace inside an ifdef with its closing brace in a different one? I tried a few ways to do this, but they were all slightly ugly. <rant>I have no idea how a method could be called platformApplyGeneric :) It makes no sense!</rant> I'll fix this up by splitting both the new code and the old code into different static methods, to see if that makes things cleaner. >> ManualTests/blur-filter-timing.html:13 >> + var WIDTH = 600; > > why so many caps? Because that's what I do in JS to represent constants :)
Dean Jackson
Comment 14 2014-12-08 14:57:40 PST
A similar approach could speed up some of the other filters, but blur is the one that has the biggest performance suck. FWIW, I also tried a CoreImage implementation, but the expense of copying back from the GPU was too high. When we have a fully accelerated filter chain (without needing compositing) I expect this approach will win.
Dean Jackson
Comment 15 2014-12-08 15:04:32 PST
Simon Fraser (smfr)
Comment 16 2014-12-08 16:56:02 PST
Comment on attachment 242852 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=242852&action=review > Source/WebCore/ChangeLog:17 > + We were getting reports of page slowdown using CSS filters, > + especially on retina devices. The problem was blur, and that we > + fixed a bug about 7 months ago that computed filters at the > + correct resolution (r168577). When not composited, we were now operating > + on 4x the pixels, and performing an expensive operation multiple > + times. This is still a bit of a "sit down, let me tell you a story" > Source/WebCore/platform/graphics/filters/FEGaussianBlur.cpp:52 > +inline void kernelPosition(int boxBlur, unsigned& std, int& dLeft, int& dRight) std, dLeft, dRight are terrible names. > Source/WebCore/platform/graphics/filters/FEGaussianBlur.cpp:55 > + switch (boxBlur) { is boxBlur really the step? Could use a better name, like blurStep. > Source/WebCore/platform/graphics/filters/FEGaussianBlur.cpp:322 > + RefPtr<Uint8ClampedArray> tmpBuffer = Uint8ClampedArray::createUninitialized(tmpBufferSize); Can't the buffer just be a malloc?
Dean Jackson
Comment 17 2014-12-08 17:31:41 PST
Carlos Alberto Lopez Perez
Comment 18 2014-12-10 05:46:25 PST
(In reply to comment #17) > Committed r177000: <http://trac.webkit.org/changeset/177000> The perf test Interactive/blur-filter-timing.html added by this commit is failing on ALL platforms except Apple Yosemite. GTK: https://build.webkit.org/builders/GTK%20Linux%2064-bit%20Release%20%28Perf%29 Apple MountainLion: https://build.webkit.org/builders/Apple%20MountainLion%20Release%20%28Perf%29 Apple Mavericks: https://build.webkit.org/builders/Apple%20Mavericks%20Release%20%28Perf%29 EFL: https://build.webkit.org/builders/EFL%20Linux%2064-bit%20Release%20WK2%20%28Perf%29 For example, on GTK it fails with: $ Tools/Scripts/run-perf-tests --no-show-results --platform gtk --release -2 Interactive/blur-filter-timing.html Running 1 tests Running Interactive/blur-filter-timing.html (1 of 1) ERROR: layer at (0,0) size 785x673 FAILED Finished: 2.328078 s Also, it seems is testing for a feature that is specific to Mac, so can we make it run only on the platforms where is supposed to work?
David Kilzer (:ddkilzer)
Comment 19 2014-12-23 22:30:23 PST
(In reply to comment #18) > (In reply to comment #17) > > Committed r177000: <http://trac.webkit.org/changeset/177000> > > The perf test Interactive/blur-filter-timing.html added by this commit is > failing on ALL platforms except Apple Yosemite. > > GTK: > https://build.webkit.org/builders/GTK%20Linux%2064-bit%20Release%20%28Perf%29 > Apple MountainLion: > https://build.webkit.org/builders/Apple%20MountainLion%20Release%20%28Perf%29 > Apple Mavericks: > https://build.webkit.org/builders/Apple%20Mavericks%20Release%20%28Perf%29 > EFL: > https://build.webkit.org/builders/EFL%20Linux%2064- > bit%20Release%20WK2%20%28Perf%29 > > For example, on GTK it fails with: > > $ Tools/Scripts/run-perf-tests --no-show-results --platform gtk --release -2 > Interactive/blur-filter-timing.html > Running 1 tests > Running Interactive/blur-filter-timing.html (1 of 1) > ERROR: layer at (0,0) size 785x673 > FAILED > Finished: 2.328078 s > > Also, it seems is testing for a feature that is specific to Mac, so can we > make it run only on the platforms where is supposed to work? Test fixed by: Bug 139462: Blur filter performance test doesn't provide results <https://bugs.webkit.org/show_bug.cgi?id=139462> <http://trac.webkit.org/changeset/177088>
Benjamin Poulain
Comment 20 2014-12-24 13:44:43 PST
Comment on attachment 242852 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=242852&action=review > Source/WebCore/platform/graphics/filters/FEGaussianBlur.cpp:381 > +#if HAVE(ACCELERATE) Did you test ARMv7? I'd be curious to know the difference between accelerate and Neon on ARMv7. The amortized perf of accelerate is okay but it tends to be bad on small loads. Here it is even worst because accelerate needs to go multiple time over the buffer. Is that really faster than the NEON intrinsics? ARM64 has AdvSIMD which is very close do Neon. It would be easy to make a fast implementation directly instead of using accelerate.
Note You need to log in before you can comment on or make changes to this bug.