Bug 50881 - Optimize FEGaussian blur
Summary: Optimize FEGaussian blur
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: SVG (show other bugs)
Version: 528+ (Nightly build)
Hardware: PC OS X 10.5
: P2 Normal
Assignee: Simon Fraser (smfr)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-12-11 19:06 PST by Simon Fraser (smfr)
Modified: 2014-03-21 12:03 PDT (History)
13 users (show)

See Also:


Attachments
Patch (11.53 KB, patch)
2010-12-11 19:14 PST, Simon Fraser (smfr)
no flags Details | Formatted Diff | Diff
Patch (11.43 KB, patch)
2010-12-11 19:42 PST, Simon Fraser (smfr)
no flags Details | Formatted Diff | Diff
Patch (11.42 KB, patch)
2014-03-20 15:02 PDT, Adenilson Cavalcanti Silva
no flags Details | Formatted Diff | Diff
6 executions (3 vanilla, 3 patched) (72.65 KB, image/png)
2014-03-20 15:05 PDT, Adenilson Cavalcanti Silva
no flags Details
filters (css3/svg) and fast tests results in Vanilla (trunk of today) (5.37 KB, text/plain)
2014-03-21 10:57 PDT, Adenilson Cavalcanti Silva
no flags Details
Patched resultsfilters (css3/svg) and fast tests results (5.76 KB, text/plain)
2014-03-21 10:58 PDT, Adenilson Cavalcanti Silva
no flags Details
Patch (11.51 KB, patch)
2014-03-21 11:20 PDT, Adenilson Cavalcanti Silva
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Simon Fraser (smfr) 2010-12-11 19:06:32 PST
I've been playing with FEGaussian blur for CSS shadows, and have some optimizations.
Comment 1 Simon Fraser (smfr) 2010-12-11 19:14:43 PST
Created attachment 76311 [details]
Patch
Comment 2 WebKit Review Bot 2010-12-11 19:20:36 PST
Attachment 76311 [details] did not build on chromium:
Build output: http://queues.webkit.org/results/7000058
Comment 3 Simon Fraser (smfr) 2010-12-11 19:42:17 PST
Created attachment 76312 [details]
Patch
Comment 4 Build Bot 2010-12-11 20:00:39 PST
Attachment 76311 [details] did not build on win:
Build output: http://queues.webkit.org/results/6918061
Comment 5 Simon Fraser (smfr) 2010-12-11 20:57:18 PST
Using the fixed-point math (BlurSumShift) from ContextShadow gets us to 2x performance, with some very minor pixel differences.
Comment 6 Sam Weinig 2010-12-11 22:39:12 PST
Comment on attachment 76312 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=76312&action=review

> WebCore/ChangeLog:15
> +        These changes giave a 1.42x performance improvement.

Typo: giave.

> WebCore/platform/graphics/filters/FEGaussianBlur.cpp:133
> +inline void boxBlurAlphaOnly(ByteArray* srcPixelArray, ByteArray* dstPixelArray,
> +                    unsigned dx, int dxLeft, int dxRight, int stride, int strideLine, int effectWidth, int effectHeight)

Weird alignment here.
Comment 7 Zoltan Herczeg 2010-12-12 01:35:11 PST
How and on what test case you measured the perf impovement? You could also add a new WebCore/manual-test.
Comment 8 Dirk Schulze 2010-12-12 08:35:37 PST
I'd also like to see a perf test for gaussian blur in general, not only for this patch.

(In reply to comment #5)
> Using the fixed-point math (BlurSumShift) from ContextShadow gets us to 2x performance, with some very minor pixel differences.

Did your last patch still cause some pixel differences? Alex and others spend a lot of time in making FEGaussianBlur as compatible and comparable  as possible with other viewers. Would like to know if the difference is noticeable.
Even so, can you update the pixel results please?

PS: Thanks for working on filters :-).
Comment 9 Simon Fraser (smfr) 2010-12-12 09:08:04 PST
The patch posted here is math-identical with the current code.

I'll update pixel results if I post another patch with the fixed-point math.
Comment 10 Simon Fraser (smfr) 2010-12-12 09:09:26 PST
I was measuring perf with some custom code that blurred a 382x382 rectangle.
Comment 11 Eric Seidel (no email) 2010-12-14 15:14:20 PST
Attachment 76312 [details] was posted by a committer and has review+, assigning to Simon Fraser for commit.
Comment 12 Dirk Schulze 2011-05-22 10:10:50 PDT
Any news on this bug? Simon, do you land the patch?
Comment 13 Zoltan Herczeg 2011-05-22 10:36:56 PDT
I doubt it is still apply...
Comment 14 Darin Adler 2011-06-18 12:44:50 PDT
Comment on attachment 76312 [details]
Patch

Clearing review flag on this old patch.
Comment 15 Simon Fraser (smfr) 2013-09-05 21:27:56 PDT
Bah, I pretty much rewrote this patch from scratch today.
Comment 16 Dirk Schulze 2013-09-05 23:36:26 PDT
(In reply to comment #15)
> Bah, I pretty much rewrote this patch from scratch today.

Where is this patch? Can we mark this one as duplicated or do you reuse this bug report?
Comment 17 Zoltan Herczeg 2013-09-05 23:50:41 PDT
Looks nice.

I am just thinking that the following part:

// Fill the kernel
for (int i = 0; i < maxKernelSize; ++i) {
    unsigned offset = line + i * stride;
    unsigned char* srcPtr = srcPixelArray->data() + offset;
    sumR += *srcPtr++;
    sumG += *srcPtr++;
    sumB += *srcPtr++;
    sumA += *srcPtr;
}

could be changed to:

unsigned char* srcPtr = srcPixelArray->data() + line;
for (int i = 0; i < maxKernelSize; ++i, srcPtr += stride) {
    sumR += srcPtr[0];
    sumG += srcPtr[1];
    sumB += srcPtr[2];
    sumA += srcPtr[3];
} 

or:
unsigned char* srcPtr = srcPixelArray->data() + line;
unsigned char* end = srcPtr + maxKernelSize * stride;
while (srcPtr < end) {
    sumR += srcPtr[0];
    sumG += srcPtr[1];
    sumB += srcPtr[2];
    sumA += srcPtr[3];
    srcPtr += stride
}

Something similar could be done in the internal loop.
Comment 18 Simon Fraser (smfr) 2013-09-06 08:55:28 PDT
The real killer right now is bug 120813, which swamps any other optimization twiddling we try to do in this function.
Comment 19 Adenilson Cavalcanti Silva 2014-03-20 15:02:48 PDT
Created attachment 227342 [details]
Patch
Comment 20 Adenilson Cavalcanti Silva 2014-03-20 15:03:44 PDT
I observed an improvement of 7 to 10% in SVG perf test MtSaintHelens.
Comment 21 Adenilson Cavalcanti Silva 2014-03-20 15:05:07 PDT
Created attachment 227343 [details]
6 executions (3 vanilla, 3 patched)
Comment 22 Dirk Schulze 2014-03-21 10:02:45 PDT
Comment on attachment 227342 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=227342&action=review

smfr or dino, do you have further input?

> Source/WebCore/platform/graphics/filters/FEGaussianBlur.cpp:94
> +inline void boxBlurAlphaOnly(Uint8ClampedArray* srcPixelArray, Uint8ClampedArray* dstPixelArray,

is it really not possible to combine that with the code following? Looks sad that we have a separate code path just for SourceAlpha cases.

Ah, looked at the following code again... So you save iterating 4 times through the image data.

> Source/WebCore/platform/graphics/filters/FEGaussianBlur.cpp:142
> +    if (alphaImage) {
> +        return boxBlurAlphaOnly(srcPixelArray, dstPixelArray, dx, dxLeft, dxRight, stride, strideLine,
> +            effectWidth, effectHeight, maxKernelSize);
> +    }

You can omit braces here.

> Source/WebCore/platform/graphics/filters/FEGaussianBlur.cpp:225
> +

no unnecessary new lines

> Source/WebCore/platform/graphics/filters/FEGaussianBlur.cpp:244
> +

no unnecessary new lines

> Source/WebCore/platform/graphics/filters/FEGaussianBlur.cpp:259
> +

Ditto.
Comment 23 Simon Fraser (smfr) 2014-03-21 10:09:20 PDT
Comment on attachment 227342 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=227342&action=review

> Source/WebCore/platform/graphics/filters/FEGaussianBlur.cpp:95
> +inline void boxBlurAlphaOnly(Uint8ClampedArray* srcPixelArray, Uint8ClampedArray* dstPixelArray,
> +    unsigned dx, int& dxLeft, int& dxRight, int& stride, int& strideLine, int& effectWidth, int& effectHeight, const int& maxKernelSize)

The Uint8ClampedArray* should be Uint8ClampedArray& (in the caller too).
Comment 24 Dirk Schulze 2014-03-21 10:18:02 PDT
(In reply to comment #23)
> (From update of attachment 227342 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=227342&action=review
> 
> > Source/WebCore/platform/graphics/filters/FEGaussianBlur.cpp:95
> > +inline void boxBlurAlphaOnly(Uint8ClampedArray* srcPixelArray, Uint8ClampedArray* dstPixelArray,
> > +    unsigned dx, int& dxLeft, int& dxRight, int& stride, int& strideLine, int& effectWidth, int& effectHeight, const int& maxKernelSize)
> 
> The Uint8ClampedArray* should be Uint8ClampedArray& (in the caller too).

Adenilson investigates in a potential regression. If it turns out to be a false negative r=me as well. Otherwise we need to investigate the reason first.
Comment 25 Adenilson Cavalcanti Silva 2014-03-21 10:57:01 PDT
Created attachment 227457 [details]
filters (css3/svg) and fast tests results in Vanilla (trunk of today)
Comment 26 Adenilson Cavalcanti Silva 2014-03-21 10:58:00 PDT
Created attachment 227459 [details]
Patched resultsfilters (css3/svg) and fast tests results
Comment 27 Adenilson Cavalcanti Silva 2014-03-21 11:20:59 PDT
Created attachment 227465 [details]
Patch
Comment 28 WebKit Commit Bot 2014-03-21 12:03:03 PDT
Comment on attachment 227465 [details]
Patch

Clearing flags on attachment: 227465

Committed r166084: <http://trac.webkit.org/changeset/166084>
Comment 29 WebKit Commit Bot 2014-03-21 12:03:11 PDT
All reviewed patches have been landed.  Closing bug.