WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
90949
Optimizing blend filter to ARM-NEON with intrinsics
https://bugs.webkit.org/show_bug.cgi?id=90949
Summary
Optimizing blend filter to ARM-NEON with intrinsics
Gabor Rapcsanyi
Reported
2012-07-11 01:06:31 PDT
Fasten up the blending filter.
Attachments
patch
(18.83 KB, patch)
2012-07-12 00:46 PDT
,
Gabor Rapcsanyi
no flags
Details
Formatted Diff
Diff
patch2
(19.50 KB, patch)
2012-07-13 06:08 PDT
,
Gabor Rapcsanyi
no flags
Details
Formatted Diff
Diff
test case
(908 bytes, image/svg+xml)
2012-07-14 10:41 PDT
,
Gabor Rapcsanyi
no flags
Details
SVG test case for ARM_NEON
(5.55 KB, image/svg+xml)
2017-05-11 04:45 PDT
,
Rahul Gupta
no flags
Details
Show Obsolete
(1)
View All
Add attachment
proposed patch, testcase, etc.
Gabor Rapcsanyi
Comment 1
2012-07-12 00:46:25 PDT
Created
attachment 151874
[details]
patch
Zoltan Herczeg
Comment 2
2012-07-12 02:10:38 PDT
Comment on
attachment 151874
[details]
patch Nice patch! Few comments: View in context:
https://bugs.webkit.org/attachment.cgi?id=151874&action=review
> Source/WebCore/ChangeLog:11 > + The feBlend SVG filter modes can be greatly fasten up with ARM-NEON since > + we are able to calculate with 2 pixels (8 channels) at the same time. > + The code is written with NEON intrinsics and it doesn't affect the > + general - it has the same behaviour as the original algorithm.
Please mention the speedup of this patch somewhere.
> Source/WebCore/platform/graphics/filters/FEBlend.cpp:159 > + if (pixelArrayLength > 4) > + platformApplyNEON(srcPixelArrayA, srcPixelArrayB, dstPixelArray, pixelArrayLength); > + else > + platformApplyGeneric(srcPixelArrayA, srcPixelArrayB, dstPixelArray, pixelArrayLength);
I think a single pixel case is rare, and would not worth to include the object code of the generic just because of this. It would be better to create an array with 2 pixels, copy there the one pixel, perform the algorithm and copy back the result.
> Source/WebCore/platform/graphics/filters/arm/FEBlendNEON.h:39 > +static inline uint16x8_t div255(uint16x8_t num, uint16x8_t sixteenConst255, uint16x8_t sixteenConstOne)
I don't like introducing global symbols like these. Can\t we simply move them inside the body of the function or define them as part of FEBlend?
Gabor Rapcsanyi
Comment 3
2012-07-13 06:08:04 PDT
Created
attachment 152235
[details]
patch2 (In reply to
comment #2
)
> (From update of
attachment 151874
[details]
) > Nice patch! Few comments: > > View in context:
https://bugs.webkit.org/attachment.cgi?id=151874&action=review
> > > Source/WebCore/ChangeLog:11 > > + The feBlend SVG filter modes can be greatly fasten up with ARM-NEON since > > + we are able to calculate with 2 pixels (8 channels) at the same time. > > + The code is written with NEON intrinsics and it doesn't affect the > > + general - it has the same behaviour as the original algorithm. > > Please mention the speedup of this patch somewhere. >
With this NEON optimization the blending is ~4.5 times faster for each mode.
> > Source/WebCore/platform/graphics/filters/FEBlend.cpp:159 > > + if (pixelArrayLength > 4) > > + platformApplyNEON(srcPixelArrayA, srcPixelArrayB, dstPixelArray, pixelArrayLength); > > + else > > + platformApplyGeneric(srcPixelArrayA, srcPixelArrayB, dstPixelArray, pixelArrayLength); > > I think a single pixel case is rare, and would not worth to include the object code of the generic just because of this. It would be better to create an array with 2 pixels, copy there the one pixel, perform the algorithm and copy back the result. >
Ok I solved that case with platformApplyNEON as well.
> > Source/WebCore/platform/graphics/filters/arm/FEBlendNEON.h:39 > > +static inline uint16x8_t div255(uint16x8_t num, uint16x8_t sixteenConst255, uint16x8_t sixteenConstOne) > > I don't like introducing global symbols like these. Can\t we simply move them inside the body of the function or define them as part of FEBlend?
I wrapped the div255 and the others into FEBlendUtilitiesNEON class so they are not global symbols anymore.
Zoltan Herczeg
Comment 4
2012-07-13 08:07:46 PDT
Comment on
attachment 152235
[details]
patch2 r=me View in context:
https://bugs.webkit.org/attachment.cgi?id=152235&action=review
> Source/WebCore/ChangeLog:11 > + The feBlend SVG filter modes can be greatly fasten up with ARM-NEON since > + we are able to calculate with 2 pixels (8 channels) at the same time. > + The code is written with NEON intrinsics and it doesn't affect the > + general - it has the same behaviour as the original algorithm.
Please mention the speedup.
> Source/WebCore/platform/graphics/filters/FEBlend.cpp:157 > + else { // if there is just one pixel
Please use full sentences for comments.
Gabor Rapcsanyi
Comment 5
2012-07-13 08:35:30 PDT
(In reply to
comment #4
)
> (From update of
attachment 152235
[details]
) > r=me > > View in context:
https://bugs.webkit.org/attachment.cgi?id=152235&action=review
> > > Source/WebCore/ChangeLog:11 > > + The feBlend SVG filter modes can be greatly fasten up with ARM-NEON since > > + we are able to calculate with 2 pixels (8 channels) at the same time. > > + The code is written with NEON intrinsics and it doesn't affect the > > + general - it has the same behaviour as the original algorithm. > > Please mention the speedup. > > > Source/WebCore/platform/graphics/filters/FEBlend.cpp:157 > > + else { // if there is just one pixel > > Please use full sentences for comments.
I corrected those, thanks for the review. Patch landed:
http://trac.webkit.org/changeset/122582
Allan Sandfeld Jensen
Comment 6
2012-07-14 06:37:39 PDT
Could the same be written using the generic GCC vector functions, so that it would work on more than one platform? What test-case are you using to tests speed-up?
Gabor Rapcsanyi
Comment 7
2012-07-14 10:41:13 PDT
Created
attachment 152427
[details]
test case (In reply to
comment #6
)
> Could the same be written using the generic GCC vector functions, so that it would work on more than one platform? >
I'm not very familiar with GCC vector functions but I think it could work as well.
> What test-case are you using to tests speed-up?
I uploaded the test case which I used. But I've just measured platformApplyGeneric() and platformApplyNEON() itself inside WebKit.
Rahul Gupta
Comment 8
2017-05-11 04:45:46 PDT
Created
attachment 309706
[details]
SVG test case for ARM_NEON
Rahul Gupta
Comment 9
2017-05-11 04:46:50 PDT
ARM_NEON_INTRINSICS Optimization causes WebkitWebProcess to crash, if pixelArrayLength is not in multiple of 8, i.e pixelArray contains ODD NUMBER of pixels. Please find attached test case (SVG test case for ARM_NEON).
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug