Bug 90949

Summary: Optimizing blend filter to ARM-NEON with intrinsics
Product: WebKit Reporter: Gabor Rapcsanyi <rgabor>
Component: SVGAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: allan.jensen, rahul.g, rakuco, webkit.review.bot, zimmermann
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
patch
none
patch2
none
test case
none
SVG test case for ARM_NEON none

Description Gabor Rapcsanyi 2012-07-11 01:06:31 PDT
Fasten up the blending filter.
Comment 1 Gabor Rapcsanyi 2012-07-12 00:46:25 PDT
Created attachment 151874 [details]
patch
Comment 2 Zoltan Herczeg 2012-07-12 02:10:38 PDT
Comment on attachment 151874 [details]
patch

Nice patch! Few comments:

View in context: https://bugs.webkit.org/attachment.cgi?id=151874&action=review

> Source/WebCore/ChangeLog:11
> +        The feBlend SVG filter modes can be greatly fasten up with ARM-NEON since
> +        we are able to calculate with 2 pixels (8 channels) at the same time.
> +        The code is written with NEON intrinsics and it doesn't affect the
> +        general - it has the same behaviour as the original algorithm.

Please mention the speedup of this patch somewhere.

> Source/WebCore/platform/graphics/filters/FEBlend.cpp:159
> +    if (pixelArrayLength > 4)
> +        platformApplyNEON(srcPixelArrayA, srcPixelArrayB, dstPixelArray, pixelArrayLength);
> +    else
> +        platformApplyGeneric(srcPixelArrayA, srcPixelArrayB, dstPixelArray, pixelArrayLength);

I think a single pixel case is rare, and would not worth to include the object code of the generic just because of this. It would be better to create an array with 2 pixels, copy there the one pixel, perform the algorithm and copy back the result.

> Source/WebCore/platform/graphics/filters/arm/FEBlendNEON.h:39
> +static inline uint16x8_t div255(uint16x8_t num, uint16x8_t sixteenConst255, uint16x8_t sixteenConstOne)

I don't like introducing global symbols like these. Can\t we simply move them inside the body of the function or define them as part of FEBlend?
Comment 3 Gabor Rapcsanyi 2012-07-13 06:08:04 PDT
Created attachment 152235 [details]
patch2

(In reply to comment #2)
> (From update of attachment 151874 [details])
> Nice patch! Few comments:
> 
> View in context: https://bugs.webkit.org/attachment.cgi?id=151874&action=review
> 
> > Source/WebCore/ChangeLog:11
> > +        The feBlend SVG filter modes can be greatly fasten up with ARM-NEON since
> > +        we are able to calculate with 2 pixels (8 channels) at the same time.
> > +        The code is written with NEON intrinsics and it doesn't affect the
> > +        general - it has the same behaviour as the original algorithm.
> 
> Please mention the speedup of this patch somewhere.
> 

With this NEON optimization the blending is ~4.5 times faster for each mode.

> > Source/WebCore/platform/graphics/filters/FEBlend.cpp:159
> > +    if (pixelArrayLength > 4)
> > +        platformApplyNEON(srcPixelArrayA, srcPixelArrayB, dstPixelArray, pixelArrayLength);
> > +    else
> > +        platformApplyGeneric(srcPixelArrayA, srcPixelArrayB, dstPixelArray, pixelArrayLength);
> 
> I think a single pixel case is rare, and would not worth to include the object code of the generic just because of this. It would be better to create an array with 2 pixels, copy there the one pixel, perform the algorithm and copy back the result.
> 

Ok I solved that case with platformApplyNEON as well.

> > Source/WebCore/platform/graphics/filters/arm/FEBlendNEON.h:39
> > +static inline uint16x8_t div255(uint16x8_t num, uint16x8_t sixteenConst255, uint16x8_t sixteenConstOne)
> 
> I don't like introducing global symbols like these. Can\t we simply move them inside the body of the function or define them as part of FEBlend?

I wrapped the div255 and the others into FEBlendUtilitiesNEON class so they are not global symbols anymore.
Comment 4 Zoltan Herczeg 2012-07-13 08:07:46 PDT
Comment on attachment 152235 [details]
patch2

r=me

View in context: https://bugs.webkit.org/attachment.cgi?id=152235&action=review

> Source/WebCore/ChangeLog:11
> +        The feBlend SVG filter modes can be greatly fasten up with ARM-NEON since
> +        we are able to calculate with 2 pixels (8 channels) at the same time.
> +        The code is written with NEON intrinsics and it doesn't affect the
> +        general - it has the same behaviour as the original algorithm.

Please mention the speedup.

> Source/WebCore/platform/graphics/filters/FEBlend.cpp:157
> +    else { // if there is just one pixel

Please use full sentences for comments.
Comment 5 Gabor Rapcsanyi 2012-07-13 08:35:30 PDT
(In reply to comment #4)
> (From update of attachment 152235 [details])
> r=me
> 
> View in context: https://bugs.webkit.org/attachment.cgi?id=152235&action=review
> 
> > Source/WebCore/ChangeLog:11
> > +        The feBlend SVG filter modes can be greatly fasten up with ARM-NEON since
> > +        we are able to calculate with 2 pixels (8 channels) at the same time.
> > +        The code is written with NEON intrinsics and it doesn't affect the
> > +        general - it has the same behaviour as the original algorithm.
> 
> Please mention the speedup.
> 
> > Source/WebCore/platform/graphics/filters/FEBlend.cpp:157
> > +    else { // if there is just one pixel
> 
> Please use full sentences for comments.

I corrected those, thanks for the review.
Patch landed: http://trac.webkit.org/changeset/122582
Comment 6 Allan Sandfeld Jensen 2012-07-14 06:37:39 PDT
Could the same be written using the generic GCC vector functions, so that it would work on more than one platform?

What test-case are you using to tests speed-up?
Comment 7 Gabor Rapcsanyi 2012-07-14 10:41:13 PDT
Created attachment 152427 [details]
test case

(In reply to comment #6)
> Could the same be written using the generic GCC vector functions, so that it would work on more than one platform?
> 

I'm not very familiar with GCC vector functions but I think it could work as well.

> What test-case are you using to tests speed-up?

I uploaded the test case which I used. But I've just measured platformApplyGeneric() and platformApplyNEON() itself inside WebKit.
Comment 8 Rahul Gupta 2017-05-11 04:45:46 PDT
Created attachment 309706 [details]
SVG test case for ARM_NEON
Comment 9 Rahul Gupta 2017-05-11 04:46:50 PDT
ARM_NEON_INTRINSICS Optimization causes WebkitWebProcess to crash, if  pixelArrayLength is not in multiple of 8, i.e pixelArray contains ODD NUMBER of pixels.

Please find attached test case (SVG test case for ARM_NEON).