Bug 68478

Summary: feGaussianBlur CoreImage implementation
Product: WebKit Reporter: Tim Horton <thorton>
Component: SVGAssignee: Tim Horton <thorton>
Status: ASSIGNED ---    
Severity: Normal CC: dglazkov, dino, krit, sam, senorblanco, simon.fraser, webkit-bug-importer, webkit.review.bot, zherczeg, zimmermann
Priority: P2 Keywords: InRadar
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on:    
Bug Blocks: 68469, 68479    
Attachments:
Description Flags
initial (i.e. no changelogs, some tests fail, etc.) patch
none
style fixes
none
new patch, with Simon's changes and WKSI update
webkit.review.bot: commit-queue-
example of brokenness
none
"example of brokenness" SVG source none

Description Tim Horton 2011-09-20 14:31:26 PDT
Still very rough, I just wanted to see how it'd go. Haven't tested to see if it's actually faster yet, either, though I suspect it is.

Also moves feGaussianBlur to using compiled-in-per-platform platformApply like everywhere else in WebCore, instead of dispatching to suffix'd functions.
Comment 1 Tim Horton 2011-09-20 14:31:57 PDT
Created attachment 108053 [details]
initial (i.e. no changelogs, some tests fail, etc.) patch
Comment 2 Radar WebKit Bug Importer 2011-09-20 14:31:57 PDT
<rdar://problem/10155876>
Comment 3 Tim Horton 2011-09-20 14:36:30 PDT
Created attachment 108055 [details]
style fixes
Comment 4 Tim Horton 2011-09-20 14:42:04 PDT
I seem to need to update :-\

Also, some of the blurs in the tests are slightly different, I'm not sure what that's about. It's like... the gamma of the blur, or something, if that makes sense.
Comment 5 Simon Fraser (smfr) 2011-09-20 14:43:47 PDT
Comment on attachment 108055 [details]
style fixes

View in context: https://bugs.webkit.org/attachment.cgi?id=108055&action=review

> Source/WebCore/platform/graphics/filters/ci/FECITools.h:1
> +/*

I suggest filters/coreimage/, not 'ci'

> Source/WebCore/platform/graphics/filters/ci/FECITools.h:36
> +#import <QuartzCore/CIImagePrivate.h> // TODO: WKSI? (this is for kCIFormatRGBA8)

Yep.

> Source/WebCore/platform/graphics/filters/ci/FECITools.mm:33
> +CIImage* ciImageFromByteArray(ByteArray* src, const IntSize& size)

This should return a RetainPtr

> Source/WebCore/platform/graphics/filters/ci/FEGaussianBlurCI.mm:38
> +    if(kernelSizeX != kernelSizeY)

Space after if, brace on new line.
Comment 6 Sam Weinig 2011-09-20 16:17:21 PDT
Is there a measurable performance gain? If so, what is it?
Comment 7 Tim Horton 2011-09-20 16:57:45 PDT
(In reply to comment #6)
> Is there a measurable performance gain? If so, what is it?

The frame rate bump I'm seeing on a 300x300 ellipse being blurred with stdDeviation=20 appears to be a little over 2x, on otherwise equivalent release builds, on my MacBook Pro.

Also appears to be a significant decrease (80% of a core to 40% of a core) in CPU usage.
Comment 8 Dean Jackson 2011-09-20 17:37:27 PDT
(In reply to comment #4)

> Also, some of the blurs in the tests are slightly different, I'm not sure what that's about. It's like... the gamma of the blur, or something, if that makes sense.

premultiplied vs non?

Looking at this I wonder if we need a way to store the CIFilter between applications. That would save us the compile and linking step (although maybe CI is really smart about it).
Comment 9 Tim Horton 2011-09-20 17:40:03 PDT
(In reply to comment #8)
> (In reply to comment #4)
> 
> > Also, some of the blurs in the tests are slightly different, I'm not sure what that's about. It's like... the gamma of the blur, or something, if that makes sense.
> 
> premultiplied vs non?

Possible, good call, I'll check that.

> Looking at this I wonder if we need a way to store the CIFilter between applications. That would save us the compile and linking step (although maybe CI is really smart about it).

Hadn't considered that; I had wondered if it would be nice if we could pass the CIImage (as platform-specific userdata ala other parts of WebCore) through the filter chain instead of recreating it for each, but that might get messy, so I'm avoiding it for now.
Comment 10 Tim Horton 2011-09-20 17:55:58 PDT
Created attachment 108092 [details]
new patch, with Simon's changes and WKSI update
Comment 11 WebKit Review Bot 2011-09-20 18:22:59 PDT
Comment on attachment 108092 [details]
new patch, with Simon's changes and WKSI update

Attachment 108092 [details] did not pass cr-mac-ews (chromium):
Output: http://queues.webkit.org/results/9772487
Comment 12 Zoltan Herczeg 2011-09-20 18:59:27 PDT
Tim, the purpose of platformApply, platformApplyGeneric grouping is dynamicaly detectable features.

I.e:

platformApply:
  if (RunOnARM && hasNEON)
    platformApplyNEON(...)
  if (hasOpenCLI)
    platformApplyOpenCLI(...)
  platformApplyGeneric(...)

I know this is not yet implemented, but planned work this way. So the generic code always call platformApply which might call a single function on some platforms, or do feature dependent selection on others.

So your man function could look like:
platformApply:
#if MAC
  if (kernelX == kernelY)
    platformApplyCF()
#endif
  platformApplyGeneric(...)
Comment 13 Tim Horton 2011-09-21 11:21:57 PDT
(In reply to comment #12)
> Tim, the purpose of platformApply, platformApplyGeneric grouping is dynamicaly detectable features.
> 
> I.e:
> 
> platformApply:
>   if (RunOnARM && hasNEON)
>     platformApplyNEON(...)
>   if (hasOpenCLI)
>     platformApplyOpenCLI(...)
>   platformApplyGeneric(...)
> 
> I know this is not yet implemented, but planned work this way. So the generic code always call platformApply which might call a single function on some platforms, or do feature dependent selection on others.
> 
> So your man function could look like:
> platformApply:
> #if MAC
>   if (kernelX == kernelY)
>     platformApplyCF()
> #endif
>   platformApplyGeneric(...)

Ah, ok. I was quite confused because it seemed both a) different than everywhere else, and b) unnecessary, but it makes sense knowing that there were eventually plans to use it more seriously. I'll switch that part back for now.
Comment 14 Tim Horton 2011-09-21 14:46:10 PDT
Created attachment 108237 [details]
example of brokenness

Attaching an example of what's going wrong. Here there's a white rectangle and a black rectangle on a gray background, the two rects overlap and are *together* in a blurred group.

It's not a premultiplication problem here since there's no alpha involved.

Not sure what's up.
Comment 15 Tim Horton 2011-09-21 14:46:43 PDT
Err, maybe I'm just getting the radius wrong. But I'm not sure why.
Comment 16 Tim Horton 2011-09-21 14:48:51 PDT
No, nevermind, the radius is fine, obviously, since only tests with overlapping elements blurred together fail.
Comment 17 Tim Horton 2011-09-21 14:54:32 PDT
Created attachment 108240 [details]
"example of brokenness" SVG source
Comment 18 Zoltan Herczeg 2011-09-21 22:58:30 PDT
This is really interesting. If you check the curve at the bottom-right corner of the white box, the "+patch" image clearly has an extra blurred circle.

I mean the blur looks like:

                  |
Normal:           |
                 /
           _____/


                  |
+Path:            /
        ___      /
           \____/
Comment 19 Dirk Schulze 2011-09-22 00:33:34 PDT
Even if 2x speedup is quite good. I'd expect better results with a OpenCL implementation of filters (with a better usage of the GPU). OpenCL gets also supported by more and more SoCs. I wouldn't be surprised if the next iPhone ships with support for OpenCL. But we can try to go both ways.
Comment 20 Dean Jackson 2011-09-22 03:40:15 PDT
(In reply to comment #19)
> Even if 2x speedup is quite good. I'd expect better results with a OpenCL implementation of filters (with a better usage of the GPU). OpenCL gets also supported by more and more SoCs. I wouldn't be surprised if the next iPhone ships with support for OpenCL. But we can try to go both ways.

I would take a bet that we'd get a bigger speedup if we could avoid copying the CIImage back for rendering by CG. However this would require some surgery in SVG rendering.
Comment 21 Dirk Schulze 2011-09-22 03:47:57 PDT
(In reply to comment #20)
> (In reply to comment #19)
> > Even if 2x speedup is quite good. I'd expect better results with a OpenCL implementation of filters (with a better usage of the GPU). OpenCL gets also supported by more and more SoCs. I wouldn't be surprised if the next iPhone ships with support for OpenCL. But we can try to go both ways.
> 
> I would take a bet that we'd get a bigger speedup if we could avoid copying the CIImage back for rendering by CG. However this would require some surgery in SVG rendering.

Maybe yes, but you'd need to do a compete new filters implementation. Means you need a new filter chain implemented in parallel. If you like to do that, it would be very hard to share any filter effect related work with other platforms (even with Safari Win). However, the code to do that is still in the revision history of SVN. You wouldn't need to start from scratch. IIRC it was removed even before I started working on WebKit (more than 5 years ago?).
Comment 22 Zoltan Herczeg 2011-09-30 01:13:48 PDT
Might be related: https://bugs.webkit.org/show_bug.cgi?id=68899
Comment 23 Stephen White 2011-11-09 12:48:32 PST
I've noticed that some of the filters (e.g., FEOffset) use GraphicsContext and ImageBuffer directly.  Since several ports already have accelerated implementations of GraphicsContext already, I think it's possible to trivially accelerate these filters today, simply by setting the "Accelerated" flag on ImageBuffer creation.

Inspired by that, for the remaining filters, what do people think about either placing the required new APIs (e.g., gaussianBlur) on GraphicsContext itself? If that seems out of place on GraphicsContext (most of these operations would be essentially image-to-image, and would ignore most GraphicsContext state), 
another alternative would be to provide a platform-independant CoreImage-like API for other ports to implement.  For example, we could have a class, call it ImageContext, which wraps a CIContext as GraphicsContext wraps a CGContext.

Among other things, I think this would allow the Mac port to do efficient CGImage -> CIImage conversions, and also allow other ports to implement the filters using other backends, with only platform-independent code in the platform/graphics/filters directory.  For example, the Skia backend has some image processing capabilities, but does not have a distinction between images used for image processing or drawing, so it would be nice to take advantage of that if possible.