Bug 139310 - [Apple] Use Accelerate framework to speed-up FEGaussianBlur
Summary: [Apple] Use Accelerate framework to speed-up FEGaussianBlur
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Layout and Rendering (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Dean Jackson
URL:
Keywords: InRadar
Depends on:
Blocks: 139462
  Show dependency treegraph
 
Reported: 2014-12-05 13:19 PST by Dean Jackson
Modified: 2014-12-24 13:44 PST (History)
11 users (show)

See Also:


Attachments
Patch (12.88 KB, patch)
2014-12-05 14:01 PST, Dean Jackson
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from webkit-ews-10 for mac-mountainlion-wk2 (572.95 KB, application/zip)
2014-12-05 16:19 PST, Build Bot
no flags Details
Archive of layout-test-results from webkit-ews-12 for mac-mountainlion-wk2 (511.67 KB, application/zip)
2014-12-05 18:06 PST, Build Bot
no flags Details
Archive of layout-test-results from webkit-ews-04 for mac-mountainlion (588.94 KB, application/zip)
2014-12-05 19:24 PST, Build Bot
no flags Details
Archive of layout-test-results from webkit-ews-07 for mac-mountainlion (652.90 KB, application/zip)
2014-12-05 21:38 PST, Build Bot
no flags Details
Patch (15.89 KB, patch)
2014-12-08 15:04 PST, Dean Jackson
simon.fraser: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Dean Jackson 2014-12-05 13:19:45 PST
Accelerate.framework provides a better speedup than parallelJobs on Apple platforms, assuming the kernel is compatible. We should use it.

<rdar://problem/18434594>
Comment 1 Dean Jackson 2014-12-05 14:01:26 PST
Created attachment 242659 [details]
Patch
Comment 2 Tim Horton 2014-12-05 14:28:03 PST
Comment on attachment 242659 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=242659&action=review

> Source/WebCore/platform/graphics/filters/FEGaussianBlur.cpp:33
> +#if PLATFORM(COCOA) && HAVE(ACCELERATE)

Anders thinks this should just be HAVE(ACCELERATE) (and elsewhere)

> Source/WebCore/platform/graphics/filters/FEGaussianBlur.cpp:290
> +    vImageBoxConvolve_ARGB8888(&effectInBuffer, &effectOutBuffer, 0, 0, 0, radius, radius, 0, kvImageEdgeExtend);
> +    vImageBoxConvolve_ARGB8888(&effectOutBuffer, &effectInBuffer, 0, 0, 0, radius, radius, 0, kvImageEdgeExtend);
> +    vImageBoxConvolve_ARGB8888(&effectInBuffer, &effectOutBuffer, 0, 0, 0, radius, radius, 0, kvImageEdgeExtend);

We should consider using a shared temp buffer here instead of allocating three. There are instructs in the docs for figuring out how big to make it.

> Source/WebCore/platform/graphics/filters/FEGaussianBlur.cpp:304
> +    else {
> +#endif

Can we write this differently so we don't have a brace inside an ifdef with its closing brace in a different one?

> ManualTests/blur-filter-timing.html:1
> +<!DOCTYPE html>

Should be in PerformanceTests/Interactive, I think?

> ManualTests/blur-filter-timing.html:13
> +  var WIDTH = 600;

why so many caps?
Comment 3 Simon Fraser (smfr) 2014-12-05 14:28:18 PST
Comment on attachment 242659 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=242659&action=review

> Source/WebCore/ChangeLog:14
> +        especially on retina devices. The problem was blur, and that we
> +        fixed a bug about 9 months ago that computed filters at the

Would be good to reference the exact revision.

> Source/WebCore/ChangeLog:34
> +        FWIW, I also tried a CoreImage implementation, but the expense of
> +        copying back from the GPU was too high. When we have a fully accelerated
> +        filter chain (without needing compositing) I expect this approach
> +        will win.

This belongs in the bug, but not in the changelog.

> Source/WebCore/ChangeLog:36
> +        There is a manual timing test: ManualTests/blur-filter-timing.html

I think you should add a perf test.

> Source/WebCore/platform/graphics/filters/FEGaussianBlur.cpp:274
> +    // The accelerated box convolve approximation only works with an odd radius.
> +    if (radius % 2 != 1)
> +        radius += 1;

So this doesn't exactly match the other impl? Is this visible?
Comment 4 Build Bot 2014-12-05 16:19:26 PST
Comment on attachment 242659 [details]
Patch

Attachment 242659 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.appspot.com/results/5812017482956800

New failing tests:
svg/filters/svg-gaussianblur-edgeMode-duplicate.svg
Comment 5 Build Bot 2014-12-05 16:19:30 PST
Created attachment 242673 [details]
Archive of layout-test-results from webkit-ews-10 for mac-mountainlion-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: webkit-ews-10  Port: mac-mountainlion-wk2  Platform: Mac OS X 10.8.5
Comment 6 Build Bot 2014-12-05 18:06:39 PST
Comment on attachment 242659 [details]
Patch

Attachment 242659 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.appspot.com/results/6422366695456768

New failing tests:
svg/filters/svg-gaussianblur-edgeMode-duplicate.svg
Comment 7 Build Bot 2014-12-05 18:06:43 PST
Created attachment 242690 [details]
Archive of layout-test-results from webkit-ews-12 for mac-mountainlion-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: webkit-ews-12  Port: mac-mountainlion-wk2  Platform: Mac OS X 10.8.5
Comment 8 Build Bot 2014-12-05 19:24:38 PST
Comment on attachment 242659 [details]
Patch

Attachment 242659 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.appspot.com/results/5611548273803264

New failing tests:
svg/filters/svg-gaussianblur-edgeMode-duplicate.svg
Comment 9 Build Bot 2014-12-05 19:24:42 PST
Created attachment 242695 [details]
Archive of layout-test-results from webkit-ews-04 for mac-mountainlion

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: webkit-ews-04  Port: mac-mountainlion  Platform: Mac OS X 10.8.5
Comment 10 Build Bot 2014-12-05 21:38:32 PST
Comment on attachment 242659 [details]
Patch

Attachment 242659 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.appspot.com/results/6322940886908928

New failing tests:
svg/filters/svg-gaussianblur-edgeMode-duplicate.svg
Comment 11 Build Bot 2014-12-05 21:38:35 PST
Created attachment 242701 [details]
Archive of layout-test-results from webkit-ews-07 for mac-mountainlion

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: webkit-ews-07  Port: mac-mountainlion  Platform: Mac OS X 10.8.5
Comment 12 Dean Jackson 2014-12-08 11:52:11 PST
Comment on attachment 242659 [details]
Patch

I think Simon meant r-. Either way, updated patch is coming.
Comment 13 Dean Jackson 2014-12-08 12:00:11 PST
Comment on attachment 242659 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=242659&action=review

>> Source/WebCore/ChangeLog:14
>> +        fixed a bug about 9 months ago that computed filters at the
> 
> Would be good to reference the exact revision.

OK.

>> Source/WebCore/ChangeLog:34
>> +        will win.
> 
> This belongs in the bug, but not in the changelog.

OK.

>> Source/WebCore/ChangeLog:36
>> +        There is a manual timing test: ManualTests/blur-filter-timing.html
> 
> I think you should add a perf test.

OK.

>> Source/WebCore/platform/graphics/filters/FEGaussianBlur.cpp:33
>> +#if PLATFORM(COCOA) && HAVE(ACCELERATE)
> 
> Anders thinks this should just be HAVE(ACCELERATE) (and elsewhere)

OK.

>> Source/WebCore/platform/graphics/filters/FEGaussianBlur.cpp:274
>> +        radius += 1;
> 
> So this doesn't exactly match the other impl? Is this visible?

This code comes from an example posted at WWDC by the Accelerate team (it was called UIImage+ImageEffects, but seems to have disappeared from the Apple site).

In that example it is emulating a Gaussian blur, and references the SVG specification for how to do it with 3 box blurs. http://www.w3.org/TR/SVG/filters.html#feGaussianBlurElement

That includes "... if d is odd, use three box-blurs of size 'd', centered on the output pixel."

This is also the same approach that our software path uses, so we're ok here.

>> Source/WebCore/platform/graphics/filters/FEGaussianBlur.cpp:290
>> +    vImageBoxConvolve_ARGB8888(&effectInBuffer, &effectOutBuffer, 0, 0, 0, radius, radius, 0, kvImageEdgeExtend);
> 
> We should consider using a shared temp buffer here instead of allocating three. There are instructs in the docs for figuring out how big to make it.

Great point.

>> Source/WebCore/platform/graphics/filters/FEGaussianBlur.cpp:304
>> +#endif
> 
> Can we write this differently so we don't have a brace inside an ifdef with its closing brace in a different one?

I tried a few ways to do this, but they were all slightly ugly.

<rant>I have no idea how a method could be called platformApplyGeneric :) It makes no sense!</rant>

I'll fix this up by splitting both the new code and the old code into different static methods, to see if that makes things cleaner.

>> ManualTests/blur-filter-timing.html:13
>> +  var WIDTH = 600;
> 
> why so many caps?

Because that's what I do in JS to represent constants :)
Comment 14 Dean Jackson 2014-12-08 14:57:40 PST
        A similar approach could speed up some of the other filters, but
        blur is the one that has the biggest performance suck.

        FWIW, I also tried a CoreImage implementation, but the expense of
        copying back from the GPU was too high. When we have a fully accelerated
        filter chain (without needing compositing) I expect this approach
        will win.
Comment 15 Dean Jackson 2014-12-08 15:04:32 PST
Created attachment 242852 [details]
Patch
Comment 16 Simon Fraser (smfr) 2014-12-08 16:56:02 PST
Comment on attachment 242852 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=242852&action=review

> Source/WebCore/ChangeLog:17
> +        We were getting reports of page slowdown using CSS filters,
> +        especially on retina devices. The problem was blur, and that we
> +        fixed a bug about 7 months ago that computed filters at the
> +        correct resolution (r168577). When not composited, we were now operating
> +        on 4x the pixels, and performing an expensive operation multiple
> +        times.

This is still a bit of a "sit down, let me tell you a story"

> Source/WebCore/platform/graphics/filters/FEGaussianBlur.cpp:52
> +inline void kernelPosition(int boxBlur, unsigned& std, int& dLeft, int& dRight)

std, dLeft, dRight are terrible names.

> Source/WebCore/platform/graphics/filters/FEGaussianBlur.cpp:55
> +    switch (boxBlur) {

is boxBlur really the step? Could use a better name, like blurStep.

> Source/WebCore/platform/graphics/filters/FEGaussianBlur.cpp:322
> +    RefPtr<Uint8ClampedArray> tmpBuffer = Uint8ClampedArray::createUninitialized(tmpBufferSize);

Can't the buffer just be a malloc?
Comment 17 Dean Jackson 2014-12-08 17:31:41 PST
Committed r177000: <http://trac.webkit.org/changeset/177000>
Comment 18 Carlos Alberto Lopez Perez 2014-12-10 05:46:25 PST
(In reply to comment #17)
> Committed r177000: <http://trac.webkit.org/changeset/177000>

The perf test Interactive/blur-filter-timing.html added by this commit is failing on ALL platforms except Apple Yosemite.

GTK: https://build.webkit.org/builders/GTK%20Linux%2064-bit%20Release%20%28Perf%29
Apple MountainLion: https://build.webkit.org/builders/Apple%20MountainLion%20Release%20%28Perf%29
Apple Mavericks: https://build.webkit.org/builders/Apple%20Mavericks%20Release%20%28Perf%29
EFL: https://build.webkit.org/builders/EFL%20Linux%2064-bit%20Release%20WK2%20%28Perf%29

For example, on GTK it fails with:

$ Tools/Scripts/run-perf-tests --no-show-results --platform gtk --release -2 Interactive/blur-filter-timing.html
Running 1 tests
Running Interactive/blur-filter-timing.html (1 of 1)
ERROR: layer at (0,0) size 785x673
FAILED
Finished: 2.328078 s

Also, it seems is testing for a feature that is specific to Mac, so can we make it run only on the platforms where is supposed to work?
Comment 19 David Kilzer (:ddkilzer) 2014-12-23 22:30:23 PST
(In reply to comment #18)
> (In reply to comment #17)
> > Committed r177000: <http://trac.webkit.org/changeset/177000>
> 
> The perf test Interactive/blur-filter-timing.html added by this commit is
> failing on ALL platforms except Apple Yosemite.
> 
> GTK:
> https://build.webkit.org/builders/GTK%20Linux%2064-bit%20Release%20%28Perf%29
> Apple MountainLion:
> https://build.webkit.org/builders/Apple%20MountainLion%20Release%20%28Perf%29
> Apple Mavericks:
> https://build.webkit.org/builders/Apple%20Mavericks%20Release%20%28Perf%29
> EFL:
> https://build.webkit.org/builders/EFL%20Linux%2064-
> bit%20Release%20WK2%20%28Perf%29
> 
> For example, on GTK it fails with:
> 
> $ Tools/Scripts/run-perf-tests --no-show-results --platform gtk --release -2
> Interactive/blur-filter-timing.html
> Running 1 tests
> Running Interactive/blur-filter-timing.html (1 of 1)
> ERROR: layer at (0,0) size 785x673
> FAILED
> Finished: 2.328078 s
> 
> Also, it seems is testing for a feature that is specific to Mac, so can we
> make it run only on the platforms where is supposed to work?

Test fixed by:

Bug 139462: Blur filter performance test doesn't provide results
<https://bugs.webkit.org/show_bug.cgi?id=139462>
<http://trac.webkit.org/changeset/177088>
Comment 20 Benjamin Poulain 2014-12-24 13:44:43 PST
Comment on attachment 242852 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=242852&action=review

> Source/WebCore/platform/graphics/filters/FEGaussianBlur.cpp:381
> +#if HAVE(ACCELERATE)

Did you test ARMv7?
I'd be curious to know the difference between accelerate and Neon on ARMv7.

The amortized perf of accelerate is okay but it tends to be bad on small loads. Here it is even worst because accelerate needs to go multiple time over the buffer.
Is that really faster than the NEON intrinsics?

ARM64 has AdvSIMD which is very close do Neon. It would be easy to make a fast implementation directly instead of using accelerate.