Accelerate.framework provides a better speedup than parallelJobs on Apple platforms, assuming the kernel is compatible. We should use it. <rdar://problem/18434594>
Created attachment 242659 [details] Patch
Comment on attachment 242659 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=242659&action=review > Source/WebCore/platform/graphics/filters/FEGaussianBlur.cpp:33 > +#if PLATFORM(COCOA) && HAVE(ACCELERATE) Anders thinks this should just be HAVE(ACCELERATE) (and elsewhere) > Source/WebCore/platform/graphics/filters/FEGaussianBlur.cpp:290 > + vImageBoxConvolve_ARGB8888(&effectInBuffer, &effectOutBuffer, 0, 0, 0, radius, radius, 0, kvImageEdgeExtend); > + vImageBoxConvolve_ARGB8888(&effectOutBuffer, &effectInBuffer, 0, 0, 0, radius, radius, 0, kvImageEdgeExtend); > + vImageBoxConvolve_ARGB8888(&effectInBuffer, &effectOutBuffer, 0, 0, 0, radius, radius, 0, kvImageEdgeExtend); We should consider using a shared temp buffer here instead of allocating three. There are instructs in the docs for figuring out how big to make it. > Source/WebCore/platform/graphics/filters/FEGaussianBlur.cpp:304 > + else { > +#endif Can we write this differently so we don't have a brace inside an ifdef with its closing brace in a different one? > ManualTests/blur-filter-timing.html:1 > +<!DOCTYPE html> Should be in PerformanceTests/Interactive, I think? > ManualTests/blur-filter-timing.html:13 > + var WIDTH = 600; why so many caps?
Comment on attachment 242659 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=242659&action=review > Source/WebCore/ChangeLog:14 > + especially on retina devices. The problem was blur, and that we > + fixed a bug about 9 months ago that computed filters at the Would be good to reference the exact revision. > Source/WebCore/ChangeLog:34 > + FWIW, I also tried a CoreImage implementation, but the expense of > + copying back from the GPU was too high. When we have a fully accelerated > + filter chain (without needing compositing) I expect this approach > + will win. This belongs in the bug, but not in the changelog. > Source/WebCore/ChangeLog:36 > + There is a manual timing test: ManualTests/blur-filter-timing.html I think you should add a perf test. > Source/WebCore/platform/graphics/filters/FEGaussianBlur.cpp:274 > + // The accelerated box convolve approximation only works with an odd radius. > + if (radius % 2 != 1) > + radius += 1; So this doesn't exactly match the other impl? Is this visible?
Comment on attachment 242659 [details] Patch Attachment 242659 [details] did not pass mac-wk2-ews (mac-wk2): Output: http://webkit-queues.appspot.com/results/5812017482956800 New failing tests: svg/filters/svg-gaussianblur-edgeMode-duplicate.svg
Created attachment 242673 [details] Archive of layout-test-results from webkit-ews-10 for mac-mountainlion-wk2 The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews. Bot: webkit-ews-10 Port: mac-mountainlion-wk2 Platform: Mac OS X 10.8.5
Comment on attachment 242659 [details] Patch Attachment 242659 [details] did not pass mac-wk2-ews (mac-wk2): Output: http://webkit-queues.appspot.com/results/6422366695456768 New failing tests: svg/filters/svg-gaussianblur-edgeMode-duplicate.svg
Created attachment 242690 [details] Archive of layout-test-results from webkit-ews-12 for mac-mountainlion-wk2 The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews. Bot: webkit-ews-12 Port: mac-mountainlion-wk2 Platform: Mac OS X 10.8.5
Comment on attachment 242659 [details] Patch Attachment 242659 [details] did not pass mac-ews (mac): Output: http://webkit-queues.appspot.com/results/5611548273803264 New failing tests: svg/filters/svg-gaussianblur-edgeMode-duplicate.svg
Created attachment 242695 [details] Archive of layout-test-results from webkit-ews-04 for mac-mountainlion The attached test failures were seen while running run-webkit-tests on the mac-ews. Bot: webkit-ews-04 Port: mac-mountainlion Platform: Mac OS X 10.8.5
Comment on attachment 242659 [details] Patch Attachment 242659 [details] did not pass mac-ews (mac): Output: http://webkit-queues.appspot.com/results/6322940886908928 New failing tests: svg/filters/svg-gaussianblur-edgeMode-duplicate.svg
Created attachment 242701 [details] Archive of layout-test-results from webkit-ews-07 for mac-mountainlion The attached test failures were seen while running run-webkit-tests on the mac-ews. Bot: webkit-ews-07 Port: mac-mountainlion Platform: Mac OS X 10.8.5
Comment on attachment 242659 [details] Patch I think Simon meant r-. Either way, updated patch is coming.
Comment on attachment 242659 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=242659&action=review >> Source/WebCore/ChangeLog:14 >> + fixed a bug about 9 months ago that computed filters at the > > Would be good to reference the exact revision. OK. >> Source/WebCore/ChangeLog:34 >> + will win. > > This belongs in the bug, but not in the changelog. OK. >> Source/WebCore/ChangeLog:36 >> + There is a manual timing test: ManualTests/blur-filter-timing.html > > I think you should add a perf test. OK. >> Source/WebCore/platform/graphics/filters/FEGaussianBlur.cpp:33 >> +#if PLATFORM(COCOA) && HAVE(ACCELERATE) > > Anders thinks this should just be HAVE(ACCELERATE) (and elsewhere) OK. >> Source/WebCore/platform/graphics/filters/FEGaussianBlur.cpp:274 >> + radius += 1; > > So this doesn't exactly match the other impl? Is this visible? This code comes from an example posted at WWDC by the Accelerate team (it was called UIImage+ImageEffects, but seems to have disappeared from the Apple site). In that example it is emulating a Gaussian blur, and references the SVG specification for how to do it with 3 box blurs. http://www.w3.org/TR/SVG/filters.html#feGaussianBlurElement That includes "... if d is odd, use three box-blurs of size 'd', centered on the output pixel." This is also the same approach that our software path uses, so we're ok here. >> Source/WebCore/platform/graphics/filters/FEGaussianBlur.cpp:290 >> + vImageBoxConvolve_ARGB8888(&effectInBuffer, &effectOutBuffer, 0, 0, 0, radius, radius, 0, kvImageEdgeExtend); > > We should consider using a shared temp buffer here instead of allocating three. There are instructs in the docs for figuring out how big to make it. Great point. >> Source/WebCore/platform/graphics/filters/FEGaussianBlur.cpp:304 >> +#endif > > Can we write this differently so we don't have a brace inside an ifdef with its closing brace in a different one? I tried a few ways to do this, but they were all slightly ugly. <rant>I have no idea how a method could be called platformApplyGeneric :) It makes no sense!</rant> I'll fix this up by splitting both the new code and the old code into different static methods, to see if that makes things cleaner. >> ManualTests/blur-filter-timing.html:13 >> + var WIDTH = 600; > > why so many caps? Because that's what I do in JS to represent constants :)
A similar approach could speed up some of the other filters, but blur is the one that has the biggest performance suck. FWIW, I also tried a CoreImage implementation, but the expense of copying back from the GPU was too high. When we have a fully accelerated filter chain (without needing compositing) I expect this approach will win.
Created attachment 242852 [details] Patch
Comment on attachment 242852 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=242852&action=review > Source/WebCore/ChangeLog:17 > + We were getting reports of page slowdown using CSS filters, > + especially on retina devices. The problem was blur, and that we > + fixed a bug about 7 months ago that computed filters at the > + correct resolution (r168577). When not composited, we were now operating > + on 4x the pixels, and performing an expensive operation multiple > + times. This is still a bit of a "sit down, let me tell you a story" > Source/WebCore/platform/graphics/filters/FEGaussianBlur.cpp:52 > +inline void kernelPosition(int boxBlur, unsigned& std, int& dLeft, int& dRight) std, dLeft, dRight are terrible names. > Source/WebCore/platform/graphics/filters/FEGaussianBlur.cpp:55 > + switch (boxBlur) { is boxBlur really the step? Could use a better name, like blurStep. > Source/WebCore/platform/graphics/filters/FEGaussianBlur.cpp:322 > + RefPtr<Uint8ClampedArray> tmpBuffer = Uint8ClampedArray::createUninitialized(tmpBufferSize); Can't the buffer just be a malloc?
Committed r177000: <http://trac.webkit.org/changeset/177000>
(In reply to comment #17) > Committed r177000: <http://trac.webkit.org/changeset/177000> The perf test Interactive/blur-filter-timing.html added by this commit is failing on ALL platforms except Apple Yosemite. GTK: https://build.webkit.org/builders/GTK%20Linux%2064-bit%20Release%20%28Perf%29 Apple MountainLion: https://build.webkit.org/builders/Apple%20MountainLion%20Release%20%28Perf%29 Apple Mavericks: https://build.webkit.org/builders/Apple%20Mavericks%20Release%20%28Perf%29 EFL: https://build.webkit.org/builders/EFL%20Linux%2064-bit%20Release%20WK2%20%28Perf%29 For example, on GTK it fails with: $ Tools/Scripts/run-perf-tests --no-show-results --platform gtk --release -2 Interactive/blur-filter-timing.html Running 1 tests Running Interactive/blur-filter-timing.html (1 of 1) ERROR: layer at (0,0) size 785x673 FAILED Finished: 2.328078 s Also, it seems is testing for a feature that is specific to Mac, so can we make it run only on the platforms where is supposed to work?
(In reply to comment #18) > (In reply to comment #17) > > Committed r177000: <http://trac.webkit.org/changeset/177000> > > The perf test Interactive/blur-filter-timing.html added by this commit is > failing on ALL platforms except Apple Yosemite. > > GTK: > https://build.webkit.org/builders/GTK%20Linux%2064-bit%20Release%20%28Perf%29 > Apple MountainLion: > https://build.webkit.org/builders/Apple%20MountainLion%20Release%20%28Perf%29 > Apple Mavericks: > https://build.webkit.org/builders/Apple%20Mavericks%20Release%20%28Perf%29 > EFL: > https://build.webkit.org/builders/EFL%20Linux%2064- > bit%20Release%20WK2%20%28Perf%29 > > For example, on GTK it fails with: > > $ Tools/Scripts/run-perf-tests --no-show-results --platform gtk --release -2 > Interactive/blur-filter-timing.html > Running 1 tests > Running Interactive/blur-filter-timing.html (1 of 1) > ERROR: layer at (0,0) size 785x673 > FAILED > Finished: 2.328078 s > > Also, it seems is testing for a feature that is specific to Mac, so can we > make it run only on the platforms where is supposed to work? Test fixed by: Bug 139462: Blur filter performance test doesn't provide results <https://bugs.webkit.org/show_bug.cgi?id=139462> <http://trac.webkit.org/changeset/177088>
Comment on attachment 242852 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=242852&action=review > Source/WebCore/platform/graphics/filters/FEGaussianBlur.cpp:381 > +#if HAVE(ACCELERATE) Did you test ARMv7? I'd be curious to know the difference between accelerate and Neon on ARMv7. The amortized perf of accelerate is okay but it tends to be bad on small loads. Here it is even worst because accelerate needs to go multiple time over the buffer. Is that really faster than the NEON intrinsics? ARM64 has AdvSIMD which is very close do Neon. It would be easy to make a fast implementation directly instead of using accelerate.