The arithmetic composite filter can currently not be auto-vectorized by GCC, but with a few changes many common cases could take a path that can.
Created attachment 154078 [details] Patch
Comment on attachment 154078 [details] Patch Attachment 154078 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/13323652 New failing tests: svg/filters/filter-placement-issue.svg
Created attachment 154086 [details] Archive of layout-test-results from gce-cr-linux-01 The attached test failures were seen while running run-webkit-tests on the chromium-ews. Bot: gce-cr-linux-01 Port: <class 'webkitpy.common.config.ports.ChromiumXVFBPort'> Platform: Linux-2.6.39-gcg-201203291735-x86_64-with-Ubuntu-10.04-lucid
Created attachment 154293 [details] Patch Fixing the detection of whether the unclamped version can be used. This has become more specific, but still represents many common use-cases.
Comment on attachment 154293 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=154293&action=review Looks good, r=me with minor suggestions: > Source/WebCore/platform/graphics/filters/FEComposite.cpp:167 > + scaledK1 = k1 / 255.f; Nitpick, we use 255.0f in WebCore/SVG code. > Source/WebCore/platform/graphics/filters/FEComposite.cpp:189 > + float upperLimit = fmaxf(0.f, k1) + fmaxf(0.f, k2) + fmaxf(0.f, k3) + k4; Any specific reason for not using std::max here? Both parameters are floats. > Source/WebCore/platform/graphics/filters/FEComposite.cpp:193 > + if (k4 >= 0.f && k4 <= 1.f > + && upperLimit >= 0.f && upperLimit <= 1.f > + && lowerLimit >= 0.f && lowerLimit <= 1.f) { No need for additional lines here, we don't care for line lengths in SVG code. > Source/WebCore/platform/graphics/filters/FEComposite.cpp:198 > + if (!k4) { > + if (!k1) > + computeArithmeticPixelsUnclamped<0, 0>(source, destination, pixelArrayLength, k1, k2, k3, k4); > + else > + computeArithmeticPixelsUnclamped<1, 0>(source, destination, pixelArrayLength, k1, k2, k3, k4); Can you swap all of these? if (k4) { if (k1) { ...} to avoid the negations. > Source/WebCore/platform/graphics/filters/FEComposite.cpp:208 > + if (!k4) { Ditto.
(In reply to comment #0) > The arithmetic composite filter can currently not be auto-vectorized by GCC, but with a few changes many common cases could take a path that can. Out of curiosity: how do you test that the auto-vectorization is applied? Do you inspect assembly or are there other tricks to let gcc show you what kind of optimizations it applied?
(In reply to comment #6) > (In reply to comment #0) > > The arithmetic composite filter can currently not be auto-vectorized by GCC, but with a few changes many common cases could take a path that can. > > Out of curiosity: how do you test that the auto-vectorization is applied? Do you inspect assembly or are there other tricks to let gcc show you what kind of optimizations it applied? I use -ftree-vectorizer-verbose, =2 tells you if it is applied, but I normally use =5 or =6, which tells me why it is not applied when it isn't.
(In reply to comment #7) > (In reply to comment #6) > > (In reply to comment #0) > > > The arithmetic composite filter can currently not be auto-vectorized by GCC, but with a few changes many common cases could take a path that can. > > > > Out of curiosity: how do you test that the auto-vectorization is applied? Do you inspect assembly or are there other tricks to let gcc show you what kind of optimizations it applied? > > I use -ftree-vectorizer-verbose, =2 tells you if it is applied, but I normally use =5 or =6, which tells me why it is not applied when it isn't. Btw, I was hoping to be able to use GCC pragma, or function attributes to mark functions that should be auto-vectorized, but I can't seem to get neither #pragma GCC optimize("tree-vectorize") nor __attribute__((optimize("tree-vectorize))) to do anything. I will probably open GCC bug on that, but once that works, these and other functions that could benefit from vectorization could be specifically marked, so only they are optimized aggresively and we do not risk library-bloat from aggresively optimizing irrelevant parts of WebCore.
Committed r123605: <http://trac.webkit.org/changeset/123605>