Bug 59447 - Optimizing gaussian blur filter to ARM-neon SIMD instruction set
Summary: Optimizing gaussian blur filter to ARM-neon SIMD instruction set
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: SVG (show other bugs)
Version: 528+ (Nightly build)
Hardware: PC All
: P2 Normal
Assignee: Zoltan Herczeg
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-04-26 04:51 PDT by Zoltan Herczeg
Modified: 2011-04-28 07:19 PDT (History)
12 users (show)

See Also:


Attachments
patch (26.34 KB, patch)
2011-04-26 05:07 PDT, Zoltan Herczeg
krit: review-
Details | Formatted Diff | Diff
clever patch (33.54 KB, patch)
2011-04-27 00:44 PDT, Zoltan Herczeg
no flags Details | Formatted Diff | Diff
next try (57.54 KB, patch)
2011-04-27 04:15 PDT, Zoltan Herczeg
no flags Details | Formatted Diff | Diff
mac fix (63.69 KB, patch)
2011-04-27 07:05 PDT, Zoltan Herczeg
no flags Details | Formatted Diff | Diff
final patch (65.31 KB, patch)
2011-04-27 23:54 PDT, Zoltan Herczeg
no flags Details | Formatted Diff | Diff
patch (65.37 KB, patch)
2011-04-28 03:56 PDT, Zoltan Herczeg
zimmermann: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Zoltan Herczeg 2011-04-26 04:51:09 PDT
Faster shadow casting.
Comment 1 Zoltan Herczeg 2011-04-26 05:07:10 PDT
Created attachment 91092 [details]
patch

More arm magic.
Comment 2 WebKit Review Bot 2011-04-26 05:09:45 PDT
Attachment 91092 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCor..." exit_code: 1

Source/WebCore/platform/graphics/filters/arm/FEGaussianBlurNEON.cpp:182:  The parameter name "NL" adds no information, so it should be removed.  [readability/parameter_name] [5]
Total errors found: 1 in 8 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 3 Dirk Schulze 2011-04-26 05:40:00 PDT
Comment on attachment 91092 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=91092&action=review

> Source/WebCore/ChangeLog:10
> +        This patch contains two sub-routines, one for speeding up
> +        alpha channel only gaussian blur (by 2.5 times) and one
> +        for speeding up all channel blur (by 4 times).

I wish you could move more code to the NEON files. I fear that we run into the same disaster like SVG and HTML Canvas were three years ago. Everyone is adding his platform dependent code to the main file and no one is able to work with the code. What do you think about a similar  design like GraphicsContext. GraphicsContext.cpp/.h interacts as Layer between WebCore and the platform dependent code. E.g. GraphicsContext::setStroke calls setPlatformStroke and so on. In your case the apply function could call platformApply. Every code that could be shared would still be part of apply or other functions in FEGaussianBlur. The rest moves to <platform>/FEGaussianBlur<platform>.cpp
This way we could avoid all the #if PLATFORM clauses in FEGaussianBlue.cpp. What do you think?

> Source/WebCore/platform/graphics/filters/FEGaussianBlur.cpp:202
> +    int widthMultipliedByFour = 4 * paintSize.width();

this can be const
Comment 4 Zoltan Herczeg 2011-04-26 06:06:45 PDT
> I wish you could move more code to the NEON files. I fear that we run into the same disaster like SVG and HTML Canvas were three years ago. Everyone is adding his platform dependent code to the main file and no one is able to work with the code. What do you think about a similar  design like GraphicsContext. GraphicsContext.cpp/.h interacts as Layer between WebCore and the platform dependent code. E.g. GraphicsContext::setStroke calls setPlatformStroke and so on. In your case the apply function could call platformApply. Every code that could be shared would still be part of apply or other functions in FEGaussianBlur. The rest moves to <platform>/FEGaussianBlur<platform>.cpp
> This way we could avoid all the #if PLATFORM clauses in FEGaussianBlue.cpp. What do you think?

I like the idea although I have no idea how to implement it. We usually replace something in the middle of a big chunk of code, which depend on several local variables. Would be good if we could use inline functions... And perhaps dynamic CPU feature detection (later).

So would it be a good idea to move the current code into an inline functions? Or how?
Comment 5 Nikolas Zimmermann 2011-04-26 10:59:28 PDT
(In reply to comment #4)
> > I wish you could move more code to the NEON files. I fear that we run into the same disaster like SVG and HTML Canvas were three years ago. Everyone is adding his platform dependent code to the main file and no one is able to work with the code. What do you think about a similar  design like GraphicsContext. GraphicsContext.cpp/.h interacts as Layer between WebCore and the platform dependent code. E.g. GraphicsContext::setStroke calls setPlatformStroke and so on. In your case the apply function could call platformApply. Every code that could be shared would still be part of apply or other functions in FEGaussianBlur. The rest moves to <platform>/FEGaussianBlur<platform>.cpp
> > This way we could avoid all the #if PLATFORM clauses in FEGaussianBlue.cpp. What do you think?

I think Dirks idea is very wise, we should go that way.

> 
> I like the idea although I have no idea how to implement it. We usually replace something in the middle of a big chunk of code, which depend on several local variables. Would be good if we could use inline functions... And perhaps dynamic CPU feature detection (later).
I'd say we just add a "void platformApply(xxx)" method, in eg. FEGaussianBlur.h where xxx are the needed arguments, which shall be passed from the apply() method -> apply() should call plaformApply(xxx).
In your patch, the first ifdef section (for CPU(NEON)), would go into FEGaussianBlurNEON.cpp, and the #else part into FEGaussianBlurDefault.cpp. The Neon version compiles in FEGaussianBlur.cpp & FEGaussianBlurNeon.cpp, and the non-Neon builds FEGaussiaunBlur.cpp & FEGaussianBlurDefault.cpp.

> So would it be a good idea to move the current code into an inline functions? Or how?
Does my approach make sense?
Comment 6 Eric Seidel (no email) 2011-04-26 15:14:05 PDT
Krit, would you please review this today as part of the reviewathon?
Comment 7 Dirk Schulze 2011-04-26 15:19:46 PDT
Comment on attachment 91092 [details]
patch

I'm more for discussing patches if there is something to discuss  than giving r- and discuss it afterwards. But I fear Eric ;-). r- See comments of Niko and me.
Comment 8 Eric Seidel (no email) 2011-04-26 18:28:48 PDT
Bwahahaha!

Thanks Dirk! :)
Comment 9 Zoltan Herczeg 2011-04-26 21:56:06 PDT
> > So would it be a good idea to move the current code into an inline functions? Or how?
> Does my approach make sense?

I fear the code duplications and performance losses. For example, the non-NEON lighting filter code is a simple loop, since all the utility functions are available. If I would move it to a separate directory I would have to duplicite them, since we only speed-up the internal pixel drawing. The border pixels are still drawn in a common way (and those loops using those common functions). If I would make them non-inline (to avoid code duplication), it would be a huge performance loss. We want to speed up drawing not slowing it down :)

I am totally agree about that we need to avoid ugly code, but unlike GraphicsContext, we have a lot of common core code we want to share among the platforms.
Comment 10 Nikolas Zimmermann 2011-04-26 23:46:04 PDT
(In reply to comment #9)
> > > So would it be a good idea to move the current code into an inline functions? Or how?
> > Does my approach make sense?
> 
> I fear the code duplications and performance losses. For example, the non-NEON lighting filter code is a simple loop, since all the utility functions are available. If I would move it to a separate directory I would have to duplicite them, since we only speed-up the internal pixel drawing. The border pixels are still drawn in a common way (and those loops using those common functions). If I would make them non-inline (to avoid code duplication), it would be a huge performance loss. We want to speed up drawing not slowing it down :)
Okay, but why not leave every code which can be shared in FEGaussianBlur itself?
We don't need to move the inner loop and duplicate it.
Why shall we make them non-inline? No need to, eh?
Comment 11 Zoltan Herczeg 2011-04-26 23:48:08 PDT
> Okay, but why not leave every code which can be shared in FEGaussianBlur itself?
> We don't need to move the inner loop and duplicate it.
> Why shall we make them non-inline? No need to, eh?

There are several possible solutions, and I can't choose among them :) If you are online, could we discuss this in #irc?
Comment 12 Zoltan Herczeg 2011-04-27 00:44:06 PDT
Created attachment 91248 [details]
clever patch

How about this solution?
Comment 13 WebKit Review Bot 2011-04-27 00:46:45 PDT
Attachment 91248 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCor..." exit_code: 1

Source/WebCore/platform/graphics/filters/FEGaussianBlur.cpp:31:  Alphabetical sorting problem.  [build/include_order] [4]
Total errors found: 1 in 11 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 14 Build Bot 2011-04-27 01:24:58 PDT
Attachment 91248 [details] did not build on win:
Build output: http://queues.webkit.org/results/8508897
Comment 15 Nikolas Zimmermann 2011-04-27 01:30:15 PDT
Comment on attachment 91248 [details]
clever patch

This looks great to me, doesn't build on win/efl though, can you look into that first?
Comment 16 WebKit Review Bot 2011-04-27 01:36:50 PDT
Attachment 91248 [details] did not build on mac:
Build output: http://queues.webkit.org/results/8511308
Comment 17 Zoltan Herczeg 2011-04-27 04:15:09 PDT
Created attachment 91266 [details]
next try
Comment 18 WebKit Review Bot 2011-04-27 04:41:26 PDT
Attachment 91266 [details] did not build on mac:
Build output: http://queues.webkit.org/results/8509942
Comment 19 Zoltan Herczeg 2011-04-27 07:05:45 PDT
Created attachment 91280 [details]
mac fix
Comment 20 Zoltan Herczeg 2011-04-27 08:15:30 PDT
Wow all build work! I know the Changelog should be updated, but otherwise, this patch is ready for review!
Comment 21 Dirk Schulze 2011-04-27 08:24:12 PDT
(In reply to comment #20)
> Wow all build work! I know the Changelog should be updated, but otherwise, this patch is ready for review!

Gtk is not ready yet ;-)
Comment 22 Dirk Schulze 2011-04-27 08:32:12 PDT
Comment on attachment 91280 [details]
mac fix

View in context: https://bugs.webkit.org/attachment.cgi?id=91280&action=review

Much better! I'd like to let Niko make the final review. He may has more suggestions. The graphics code looks sane to me.

> Source/WebCore/platform/graphics/filters/FEGaussianBlur.cpp:141
> +inline void FEGaussianBlur::platformApply(ByteArray* srcPixelArray, ByteArray* tmpPixelArray, unsigned kernelSizeX, unsigned kernelSizeY, IntSize& paintSize)
> +{
> +    // The selection here eventually should happen dynamically on some platforms.
> +#if CPU(ARM_NEON) && COMPILER(GCC)
> +    platformApplyNeon(srcPixelArray, tmpPixelArray, kernelSizeX, kernelSizeY, paintSize);
> +#else
> +    platformApplyGeneric(srcPixelArray, tmpPixelArray, kernelSizeX, kernelSizeY, paintSize);
> +#endif
> +}

I'd like to see this in the header as well. This way we would have a platform independent cpp file.

> Source/WebCore/platform/graphics/filters/FELighting.cpp:252
> +inline void FELighting::platformApply(LightingData& data, LightSource::PaintingData& paintingData)
> +{
> +    // The selection here eventually should happen dynamically on some platforms.
> +#if CPU(ARM_NEON) && COMPILER(GCC)
> +    platformApplyNeon(data, paintingData);
> +#else
> +    platformApplyGeneric(data, paintingData);
> +#endif
> +}

Ditto.
Comment 23 Collabora GTK+ EWS bot 2011-04-27 08:34:29 PDT
Attachment 91280 [details] did not build on gtk:
Build output: http://queues.webkit.org/results/8507989
Comment 24 Zoltan Herczeg 2011-04-27 23:54:49 PDT
Created attachment 91437 [details]
final patch

If WildFox likes it xD
Comment 25 Early Warning System Bot 2011-04-28 00:10:35 PDT
Attachment 91437 [details] did not build on qt:
Build output: http://queues.webkit.org/results/8519271
Comment 26 WebKit Review Bot 2011-04-28 00:12:44 PDT
Attachment 91437 [details] did not build on chromium:
Build output: http://queues.webkit.org/results/8518286
Comment 27 WebKit Review Bot 2011-04-28 00:50:03 PDT
Attachment 91437 [details] did not build on chromium:
Build output: http://queues.webkit.org/results/8520264
Comment 28 Zoltan Herczeg 2011-04-28 01:11:00 PDT
(In reply to comment #22)
> (From update of attachment 91280 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=91280&action=review
> 
> Much better! I'd like to let Niko make the final review. He may has more suggestions. The graphics code looks sane to me.
> 
> > Source/WebCore/platform/graphics/filters/FEGaussianBlur.cpp:141
> > +inline void FEGaussianBlur::platformApply(ByteArray* srcPixelArray, ByteArray* tmpPixelArray, unsigned kernelSizeX, unsigned kernelSizeY, IntSize& paintSize)
> > +{
> > +    // The selection here eventually should happen dynamically on some platforms.
> > +#if CPU(ARM_NEON) && COMPILER(GCC)
> > +    platformApplyNeon(srcPixelArray, tmpPixelArray, kernelSizeX, kernelSizeY, paintSize);
> > +#else
> > +    platformApplyGeneric(srcPixelArray, tmpPixelArray, kernelSizeX, kernelSizeY, paintSize);
> > +#endif
> > +}
> 
> I'd like to see this in the header as well. This way we would have a platform independent cpp file.

Seems some compilers does not like the idea :(
Comment 29 WebKit Review Bot 2011-04-28 01:50:01 PDT
Attachment 91437 [details] did not build on mac:
Build output: http://queues.webkit.org/results/8520276
Comment 30 Zoltan Herczeg 2011-04-28 03:56:38 PDT
Created attachment 91465 [details]
patch
Comment 31 Nikolas Zimmermann 2011-04-28 05:11:04 PDT
Comment on attachment 91465 [details]
patch

Looks great to me! As always I can't say anything regarding the assembler part, but I'll trust you since you're the expert in this area! :-)
Comment 32 Zoltan Herczeg 2011-04-28 07:19:59 PDT
Landed in http://trac.webkit.org/changeset/85180