WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
91398
Fix blend filter for autovectorizing
https://bugs.webkit.org/show_bug.cgi?id=91398
Summary
Fix blend filter for autovectorizing
Allan Sandfeld Jensen
Reported
2012-07-16 09:05:32 PDT
The blend filter is currently written in such a way that it can not be autovectorized by the GCC auto-vectorizer, by removing conditions from the inner of the loop, GCC will be able to auto-vectorizing giving a large potential speedup on all platforms with SIMD instructions.
Attachments
Patch
(10.74 KB, patch)
2012-07-16 09:12 PDT
,
Allan Sandfeld Jensen
no flags
Details
Formatted Diff
Diff
Patch
(10.22 KB, patch)
2012-07-24 02:44 PDT
,
Allan Sandfeld Jensen
no flags
Details
Formatted Diff
Diff
Patch
(10.30 KB, patch)
2012-07-24 04:56 PDT
,
Allan Sandfeld Jensen
no flags
Details
Formatted Diff
Diff
Patch
(6.66 KB, patch)
2012-07-25 01:48 PDT
,
Allan Sandfeld Jensen
no flags
Details
Formatted Diff
Diff
Patch
(6.93 KB, patch)
2012-07-25 02:53 PDT
,
Allan Sandfeld Jensen
no flags
Details
Formatted Diff
Diff
Show Obsolete
(4)
View All
Add attachment
proposed patch, testcase, etc.
Allan Sandfeld Jensen
Comment 1
2012-07-16 09:12:30 PDT
Created
attachment 152547
[details]
Patch
Allan Sandfeld Jensen
Comment 2
2012-07-16 09:29:57 PDT
Bonus info I missed in the ChangeLog, even without auto-vectorization the improvements themselves yield a 1.5x speed-up.
Nikolas Zimmermann
Comment 3
2012-07-16 10:48:10 PDT
Comment on
attachment 152547
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=152547&action=review
> Source/WebCore/ChangeLog:10 > + To support auto-vectorizing, the loop had to be unswitched, and regular arrays used that > + did not do boundary-checks in the inner loop. Finally the integer division by 255 > + was optimized not use integer division intrinsics.
Nice speed-up, I'd like to hear Zoltans opinion on the patch as well, he's our master optimizer...
> Source/WebCore/platform/graphics/filters/FEBlend.cpp:68 > + // This is an approximate algorithm for division by 255, but it gives accurate results for 16bit values.
Perfect mixture of a what & why comment :-)
> Source/WebCore/platform/graphics/filters/FEBlend.cpp:114 > + for (unsigned pixelOffset = 0; pixelOffset < len; pixelOffset++) { > + unsigned char alphaA = sourcePixelA[3]; > + unsigned char alphaB = sourcePixelB[3]; > + for (unsigned channel = 0; channel < 3; ++channel) > + destinationPixel[channel] = normal(sourcePixelA[channel], sourcePixelB[channel], alphaA, alphaB); > + destinationPixel[3] = 255 - div255((255 - alphaA) * (255 - alphaB)); > + sourcePixelA += 4; > + sourcePixelB += 4; > + destinationPixel += 4; > + }
It should still be possible to share this in a static inline helper function w/o loss of performance, if you pass in the operation as function pointer (&normal, &multiply, etc.)
Allan Sandfeld Jensen
Comment 4
2012-07-16 11:08:04 PDT
(In reply to
comment #3
)
> > Source/WebCore/platform/graphics/filters/FEBlend.cpp:114 > > + for (unsigned pixelOffset = 0; pixelOffset < len; pixelOffset++) { > > + unsigned char alphaA = sourcePixelA[3]; > > + unsigned char alphaB = sourcePixelB[3]; > > + for (unsigned channel = 0; channel < 3; ++channel) > > + destinationPixel[channel] = normal(sourcePixelA[channel], sourcePixelB[channel], alphaA, alphaB); > > + destinationPixel[3] = 255 - div255((255 - alphaA) * (255 - alphaB)); > > + sourcePixelA += 4; > > + sourcePixelB += 4; > > + destinationPixel += 4; > > + } > > It should still be possible to share this in a static inline helper function w/o loss of performance, if you pass in the operation as function pointer (&normal, &multiply, etc.)
Maybe, I thought it would result in a call by function-pointer which would make vectorization impossible, but if the functions is fully inlined first it might work. Too bad we can not instantiate templates with functions (without by using wrapper classes).
Zoltan Herczeg
Comment 5
2012-07-16 11:13:07 PDT
Nice speedup. I agree with Niko's comments.
Allan Sandfeld Jensen
Comment 6
2012-07-24 02:44:56 PDT
Created
attachment 153999
[details]
Patch
Nikolas Zimmermann
Comment 7
2012-07-24 03:37:19 PDT
Comment on
attachment 153999
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=153999&action=review
Looks better, I still have some suggestions though:
> Source/WebCore/platform/graphics/filters/FEBlend.cpp:66 > +static inline unsigned char div255(uint16_t value)
nitpick: fastDivideBy255, to avoid abbreviations.
> Source/WebCore/platform/graphics/filters/FEBlend.cpp:124 > +template<typename BlendFunctor> > +static void platformApply(unsigned char* sourcePixelA, unsigned char* sourcePixelB, > + unsigned char* destinationPixel, unsigned pixelArrayLength)
I'm not sure why you need a templatified solution. I would have defined: typedef void (*BlendModefunction)(uchar colorA, uchar colorB, uchar alphaA, uchar alphaB) for the use in static void platformApply(uchar* sourcePixelA, uchar* sourcePixelB, uchar* destPixel, unsigned pixelArrayLength, BlendModeFunction blendModeFunction) { ... destinationPixel[0] = (*blendModeFunction)(sourcePixelA.... } This way you don't need to wrap the existing darken/lighten/.. methods in dummy classes. Please elaborate if I'm overlooking something which makes this impossible.
> Source/WebCore/platform/graphics/filters/FEBlend.cpp:160 > + platformApply<BlendUnknown>(sourcePixelA, sourcePixelB, destinationPixel, pixelArrayLength);
If you go with my proposed solution above, you can pass 0 as function-pointer here for the unknown case.
Allan Sandfeld Jensen
Comment 8
2012-07-24 04:18:25 PDT
(In reply to
comment #7
)
> (From update of
attachment 153999
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=153999&action=review
> > Looks better, I still have some suggestions though: > > > Source/WebCore/platform/graphics/filters/FEBlend.cpp:66 > > +static inline unsigned char div255(uint16_t value) > > nitpick: fastDivideBy255, to avoid abbreviations. >
Will do.
> I'm not sure why you need a templatified solution.
Because I want the functions to be inlined, if they are not inlined the loop that calls them can not be vectorized, and a function-pointer is a run-time parameter, while template parameters are compile-time parameters. We need the function to be fully defined at compile time for the compiler to be able to take advantage of every possible optimization.
> > Source/WebCore/platform/graphics/filters/FEBlend.cpp:160 > > + platformApply<BlendUnknown>(sourcePixelA, sourcePixelB, destinationPixel, pixelArrayLength); > > If you go with my proposed solution above, you can pass 0 as function-pointer here for the unknown case.
I don't see what that would gain. We would need to special case the null-pointer function anyway or it would segfault when called.
Nikolas Zimmermann
Comment 9
2012-07-24 04:27:45 PDT
(In reply to
comment #8
)
> Because I want the functions to be inlined, if they are not inlined the loop that calls them can not be vectorized, and a function-pointer is a run-time parameter, while template parameters are compile-time parameters. We need the function to be fully defined at compile time for the compiler to be able to take advantage of every possible optimization.
Ouch, you're right, I take back the criticism. If you upload a new patch I'll r+ it.
> I don't see what that would gain. We would need to special case the null-pointer function anyway or it would segfault when called.
I thought of special casing, but it's really not worth it, especially if you can't use it anyways.
Allan Sandfeld Jensen
Comment 10
2012-07-24 04:56:22 PDT
Created
attachment 154021
[details]
Patch
Nikolas Zimmermann
Comment 11
2012-07-24 05:33:14 PDT
Comment on
attachment 154021
[details]
Patch Thanks Allan, r=me.
WebKit Review Bot
Comment 12
2012-07-24 06:38:05 PDT
Comment on
attachment 154021
[details]
Patch Clearing flags on attachment: 154021 Committed
r123467
: <
http://trac.webkit.org/changeset/123467
>
WebKit Review Bot
Comment 13
2012-07-24 06:38:09 PDT
All reviewed patches have been landed. Closing bug.
Nikolas Zimmermann
Comment 14
2012-07-25 00:15:58 PDT
(In reply to
comment #9
)
> (In reply to
comment #8
) > > Because I want the functions to be inlined, if they are not inlined the loop that calls them can not be vectorized, and a function-pointer is a run-time parameter, while template parameters are compile-time parameters. We need the function to be fully defined at compile time for the compiler to be able to take advantage of every possible optimization. > Ouch, you're right, I take back the criticism. If you upload a new patch I'll r+ it.
Hm, I slept a night over this. What you could have done is: template<void myFunction()> void doSomething() { myFunction(); } This still allows the compile-time optimizations, but avoids the classes wrapping apply() functions. What do you think?
Allan Sandfeld Jensen
Comment 15
2012-07-25 00:43:41 PDT
(In reply to
comment #14
)
> (In reply to
comment #9
) > > (In reply to
comment #8
) > > > Because I want the functions to be inlined, if they are not inlined the loop that calls them can not be vectorized, and a function-pointer is a run-time parameter, while template parameters are compile-time parameters. We need the function to be fully defined at compile time for the compiler to be able to take advantage of every possible optimization. > > Ouch, you're right, I take back the criticism. If you upload a new patch I'll r+ it. > > Hm, I slept a night over this. What you could have done is: > > template<void myFunction()> > void doSomething() { myFunction(); } > > This still allows the compile-time optimizations, but avoids the classes wrapping apply() functions. > What do you think?
This is what I wanted to do, but I have tried the same approach before in other contexts, and it appears functions are not allowed as template argument for some reason. Maybe it would work in C++11 though.
Nikolas Zimmermann
Comment 16
2012-07-25 00:51:33 PDT
(In reply to
comment #15
)
> This is what I wanted to do, but I have tried the same approach before in other contexts, and it appears functions are not allowed as template argument for some reason. Maybe it would work in C++11 though.
I'm quite sure that is a valid c++ construct, even before c++11. A quick grep shows this is actually used in our HTML5 parser: class HTMLTreeBuilder::ExternalCharacterTokenBuffer { WTF_MAKE_NONCOPYABLE(ExternalCharacterTokenBuffer); public: ... String takeLeadingWhitespace() { return takeLeading<isHTMLSpace>(); } .. template<bool characterPredicate(UChar)> void skipLeading() { ASSERT(!isEmpty()); while (characterPredicate(*m_current)) { if (++m_current == m_end) return; } } template<bool characterPredicate(UChar)> String takeLeading() { ASSERT(!isEmpty()); const UChar* start = m_current; skipLeading<characterPredicate>(); if (start == m_current) return String(); return String(start, m_current - start); } ... Can you give it another try?
Allan Sandfeld Jensen
Comment 17
2012-07-25 01:48:46 PDT
Created
attachment 154292
[details]
Patch You are right, this actually works. I wonder what has changed, or what I did wrong last time I tried this.
Allan Sandfeld Jensen
Comment 18
2012-07-25 01:50:29 PDT
(In reply to
comment #17
)
> Created an attachment (id=154292) [details] > Patch > > You are right, this actually works. I wonder what has changed, or what I did wrong last time I tried this.
Well, static functions are not allowed. So I had to removed the static argument from the functions. This might could cause trouble with all-in-one compilations, if it does we need to add an extra namespace to these functions.
Nikolas Zimmermann
Comment 19
2012-07-25 02:22:52 PDT
Comment on
attachment 154292
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=154292&action=review
I'd r+ this as-is, though it doesn't apply on the bots. Still r=me, please watch the bots to make sure it builds everywhere. Thanks Allan, I'm now fully satisfied with your solution!
> Source/WebCore/platform/graphics/filters/FEBlend.cpp:74 > +inline unsigned char normal(unsigned char colorA, unsigned char colorB, unsigned char alphaA, unsigned char)
You could just rename this to "feBlendNormal", this will reduce the likelihood of clashes in all-in-one builds a lot.
Nikolas Zimmermann
Comment 20
2012-07-25 02:23:27 PDT
(In reply to
comment #17
)
> Created an attachment (id=154292) [details] > Patch > > You are right, this actually works. I wonder what has changed, or what I did wrong last time I tried this.
Your patch is 6.66k, evil, but works ;-)
Allan Sandfeld Jensen
Comment 21
2012-07-25 02:49:50 PDT
Reopen to get the EWS to check the patch.
Allan Sandfeld Jensen
Comment 22
2012-07-25 02:53:48 PDT
Created
attachment 154307
[details]
Patch
Allan Sandfeld Jensen
Comment 23
2012-07-25 04:01:05 PDT
Comment on
attachment 154307
[details]
Patch Builds on all platforms, so please land it.
WebKit Review Bot
Comment 24
2012-07-25 04:16:10 PDT
Comment on
attachment 154307
[details]
Patch Clearing flags on attachment: 154307 Committed
r123602
: <
http://trac.webkit.org/changeset/123602
>
WebKit Review Bot
Comment 25
2012-07-25 04:16:15 PDT
All reviewed patches have been landed. Closing bug.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug