WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
Bug 48212
SVG FELighting performance issues
https://bugs.webkit.org/show_bug.cgi?id=48212
Summary
SVG FELighting performance issues
Dirk Schulze
Reported
2010-10-24 14:28:30 PDT
FELighting has some performance issues. The most obvious issue is the -> operator for the get and set calls on CanvasPixelArray. The ByteArray is ref counted, so we have to use this operator on every read or write operation. This can be avoided by using a ByteArray pointer instead of the CanvasPixelArray like on the other effects. The other big perf issue if FloatPoint3D. The most time is spend on setPixel and mainly in the normalize() function of FloatPoint3D. normalize() makes three divisions and a dot product of itself. The dot product seems really slow and calls x() y() and z() on every call. Can we speed this up? Can't we write a selfDot function? Is it possible to avoid the call of normalize() more? Other issues depend on the kind of Lighting effect and are located at the update functions.
Attachments
draft patch
(28.40 KB, patch)
2010-10-27 00:07 PDT
,
Zoltan Herczeg
krit
: review-
Details
Formatted Diff
Diff
Example
(1.02 KB, image/svg+xml)
2010-10-27 00:07 PDT
,
Zoltan Herczeg
no flags
Details
Draft patch 2
(29.37 KB, patch)
2010-10-27 05:10 PDT
,
Zoltan Herczeg
no flags
Details
Formatted Diff
Diff
ByteArray rewriting
(29.33 KB, patch)
2010-10-27 05:12 PDT
,
Zoltan Herczeg
no flags
Details
Formatted Diff
Diff
final patch
(31.37 KB, patch)
2010-11-02 07:51 PDT
,
Zoltan Herczeg
krit
: review+
Details
Formatted Diff
Diff
Show Obsolete
(3)
View All
Add attachment
proposed patch, testcase, etc.
Zoltan Herczeg
Comment 1
2010-10-27 00:07:03 PDT
Created
attachment 71991
[details]
draft patch Faster light filter. I will upload an example, where the speed is improved from 55 to 73 fps (on a mac leopard)! The patch contains heavy optimizations, which you may not like, that is why it is a draft atm.
Zoltan Herczeg
Comment 2
2010-10-27 00:07:50 PDT
Created
attachment 71992
[details]
Example
Dirk Schulze
Comment 3
2010-10-27 01:02:35 PDT
Comment on
attachment 71991
[details]
draft patch View in context:
https://bugs.webkit.org/attachment.cgi?id=71991&action=review
Have you measured the performance with ByteArray*? I couldn't see a difference between usigned char* and ByteArray* on feGaussianBlur with shark. Of course you save some address calculations on using unsigned char* (pixels[0] = ... pixels[1]), but would still interest me you see a difference. The changes on FloatPoint3D are ok.
> WebCore/platform/graphics/filters/FELighting.cpp:95 > +inline void FELighting::LightingData::topRight(int offset, IntPair& intPair) > { > - return static_cast<int>(pixels->get(offset - widthMultipliedByPixelSize + cPixelSize + cAlphaChannelOffset)); > + unsigned char* pixel = pixels + offset; > + int left = static_cast<int>(pixel[-cPixelSize + cAlphaChannelOffset]); > + int center = static_cast<int>(pixel[cAlphaChannelOffset]); > + pixel += widthMultipliedByPixelSize; > + int bottomLeft = static_cast<int>(pixel[-cPixelSize + cAlphaChannelOffset]); > + int bottom = static_cast<int>(pixel[cAlphaChannelOffset]); > + intPair.normalX = -(left << 1) + (center << 1) - bottomLeft + bottom; > + intPair.normalY = -left - (center << 1) + bottomLeft + (bottom << 1); > }
Is there no way to share more of the code with the other edge functions?
> WebCore/platform/graphics/filters/FELighting.cpp:196 > + if (m_specularExponent == 1.0f)
== 1
> WebCore/platform/graphics/filters/FELighting.cpp:205 > + normalVector.setZ(1.0f);
just ...setZ(1);
> WebCore/platform/graphics/filters/FELighting.cpp:214 > + if (m_specularExponent == 1.0f)
== 1
> WebCore/platform/graphics/filters/FELighting.cpp:-128 > - if (data.lightStrength > 1.0f) > - data.lightStrength = 1.0f; > - if (data.lightStrength < 0.0f) > - data.lightStrength = 0.0f;
1 and 0
> WebCore/platform/graphics/filters/FELighting.cpp:237 > + IntPair intPair; > + int offset;
Please initialize these values. And declare them on the first use.
> WebCore/platform/graphics/filters/FELighting.cpp:258 > + setPixel(offset, data, paintingData, 0, 0, -2.0f / 3.0f, -2.0f / 3.0f, intPair);
Normally the compiler should optimize it, but can you add a static const for -2 / 3.f ? This is reused multiple times. The same for some other constant calculations, a name is also easier to understand.
> WebCore/platform/graphics/filters/FELighting.cpp:346 > RefPtr<ImageData> srcImageData(in->resultImage()->getUnmultipliedImageData(effectDrawingRect));
change srcImageData(...) to srcImageData = ... please
> WebCore/platform/graphics/filters/FELighting.h:57 > + struct IntPair { > + int normalX; > + int normalY; > + };
Do you really see a noticeable performance win compared to IntPoint?
Zoltan Herczeg
Comment 4
2010-10-27 05:10:43 PDT
Created
attachment 72014
[details]
Draft patch 2 This one contains the requested changes except the ByteArray* rewriting. This is the fastest so far, can achive 74 fps.
Zoltan Herczeg
Comment 5
2010-10-27 05:12:34 PDT
Created
attachment 72015
[details]
ByteArray rewriting This one contains the ByteArray rewriting. It is slower, only 72 fps.
Dirk Schulze
Comment 6
2010-10-27 05:44:45 PDT
Thanks for taking the bother to check ByteArray, have you used callgrind (kcachegrind) or shark to look why it is slower on ByteArray? Is it the get and set method? If so, we might follow kling's suggestion and add a unsafeGet/Set method. I'd also like to know what takes most of the time after your changes.
Zoltan Herczeg
Comment 7
2010-10-27 06:35:26 PDT
(In reply to
comment #6
)
> Thanks for taking the bother to check ByteArray, have you used callgrind (kcachegrind) or shark to look why it is slower on ByteArray? Is it the get and set method? If so, we might follow kling's suggestion and add a unsafeGet/Set method. I'd also like to know what takes most of the time after your changes.
Assembly magic. The pixels->get(offset) is turned to indexed addressing modes [reg + reg], while the other [reg, immediate] is supported by more architectures, and results a simplified machine code. Valgrind is not useful for performance measurements, since it totally recompiles the code, even if you don't touch the instruction stream. Valgrind is a great tool, and excellent for searching memory leaks (and other things unrealted to performance). There is no unsafeGet method for ByteArray. Get is unsafe enough :) unsigned char get(unsigned index) const { ASSERT(index < m_size); return m_data[index]; }
Dirk Schulze
Comment 8
2010-10-27 07:15:52 PDT
(In reply to
comment #7
)
> (In reply to
comment #6
) > > Thanks for taking the bother to check ByteArray, have you used callgrind (kcachegrind) or shark to look why it is slower on ByteArray? Is it the get and set method? If so, we might follow kling's suggestion and add a unsafeGet/Set method. I'd also like to know what takes most of the time after your changes. > > Assembly magic. The pixels->get(offset) is turned to indexed addressing modes [reg + reg], while the other [reg, immediate] is supported by more architectures, and results a simplified machine code. > > Valgrind is not useful for performance measurements, since it totally recompiles the code, even if you don't touch the instruction stream. Valgrind is a great tool, and excellent for searching memory leaks (and other things unrealted to performance). > > There is no unsafeGet method for ByteArray. Get is unsafe enough :) > > unsigned char get(unsigned index) const > { > ASSERT(index < m_size); > return m_data[index]; > }
You don't need to recompile for using callgrind, and it is very useful to see how much time is spend in which function. Valgrind is a tool chain, and consists of different tools, not only cachegrind (I guess you were just thinking about this). kcachegrind2 on the other hand is just a graphical tool and makes it quiet easy to visualize the output of callgrind. Yes, get is pretty simple, but set has a bit more "complexity", that might get visible on the massive calls of it for pixel manipulation.
Zoltan Herczeg
Comment 9
2010-10-27 09:47:13 PDT
> You don't need to recompile for using callgrind, and it is very useful to see how much time is spend in which function. Valgrind is a tool chain, and consists of different tools, not only cachegrind (I guess you were just thinking about this). kcachegrind2 on the other hand is just a graphical tool and makes it quiet easy to visualize the output of callgrind.
I like valgrind, I even wrote valgrind tools myself. Valgrind using a recompiler called libVEX. It reads the machine instructions from the binary, convert them to a high level assembly language, where a tool can modify this stream (thus it is really easy to insert calls into the stream, that is what most tools do). Later, this high level assembly language is compiled back to binary form on-the-fly by libVEX again. If your tool does not touch the stream, you get a 3 times slower binary code. Hence it is useless for performance related things. But you can follow the control flow (inserting calls before function entries, and such), you can insert checks before each memory reads/writes (memcheck), and so on. If you need info, but you don't care about speed -> valgring (libVEX) is your tool. If speed matters, use cycle accurate simulators (we also have one for arm:
http://www.inf.u-szeged.hu/xeemu/
)
Zoltan Herczeg
Comment 10
2010-10-29 04:44:24 PDT
Krit, which version you choose? (Unfortunately the lighting filter is a big chunk of code because of inline functions, and cannot be analized properly)
Dirk Schulze
Comment 11
2010-10-29 05:19:59 PDT
(In reply to
comment #10
)
> Krit, which version you choose? (Unfortunately the lighting filter is a big chunk of code because of inline functions, and cannot be analized properly)
Oh sorry, for consistency to other effects and given the fact that the difference is not that high, Id prefer ByteArray for the moment. Is it possible to share more code on bottomLeft, borromRight,... functions?
Zoltan Herczeg
Comment 12
2010-11-02 07:51:59 PDT
Created
attachment 72664
[details]
final patch
Zoltan Herczeg
Comment 13
2010-11-02 07:53:37 PDT
> Oh sorry, for consistency to other effects and given the fact that the difference is not that high, Id prefer ByteArray for the moment. > Is it possible to share more code on bottomLeft, borromRight,... functions?
Did it. There maybe some template magic which could reduce the number of functions without performance loss, but it looks difficult.
Dirk Schulze
Comment 14
2010-11-02 14:25:54 PDT
Comment on
attachment 72664
[details]
final patch View in context:
https://bugs.webkit.org/attachment.cgi?id=72664&action=review
r=me with little style changes.
> WebCore/platform/graphics/filters/FELighting.cpp:178 > +inline void FELighting::inlineSetPixel(int offset, LightingData& data, LightSource::PaintingData& paintingData, > + int lightX, int lightY, float factorX, float factorY, IntPoint& normal2DVector)
align the int's
> WebCore/platform/graphics/filters/FELighting.cpp:227 > +void FELighting::setPixel(int offset, LightingData& data, LightSource::PaintingData& paintingData, > + int lightX, int lightY, float factorX, float factorY, IntPoint& normalVector)
align the int's.
> WebCore/platform/graphics/filters/FELighting.h:76 > + inline void inlineSetPixel(int offset, LightingData&, LightSource::PaintingData&, > + int lightX, int lightY, float factorX, float factorY, IntPoint& normalVector);
please align the ints
> WebCore/platform/graphics/filters/FELighting.h:80 > + void setPixel(int offset, LightingData&, LightSource::PaintingData&, > + int lightX, int lightY, float factorX, float factorY, IntPoint& normalVector);
Ditto.
Zoltan Herczeg
Comment 15
2010-11-03 01:44:19 PDT
Landed in
http://trac.webkit.org/changeset/71222
Closing bug.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug