Bug 70099 - OpenCL implementation of W3C Filter Effects Master Bug
Summary: OpenCL implementation of W3C Filter Effects Master Bug
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: SVG
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on: 70350 99829 107444 109580 110193 110752
Blocks: 68469 68479 103398
Reported: 2011-10-14 04:54 PDT by Dirk Schulze
Modified: 2014-03-02 10:21 PST
CC List: 26 users

See Also:


Attachments
Example implementation - OpenCL (40.86 KB, patch)
2011-10-18 09:40 PDT, Dirk Schulze
Set of new files for OpenCL support (33.94 KB, patch)
2011-11-25 03:58 PST, Giulio Urlini
Set of modified common files for OpenCL support (1.71 KB, patch)
2011-11-25 04:00 PST, Giulio Urlini
Makefiles and configuration for OpenCL optional support (9.03 KB, patch)
2011-11-25 04:02 PST, Giulio Urlini
Set of SVG filters modified with optional OpenCL support (51.57 KB, patch)
2011-11-25 04:03 PST, Giulio Urlini
OCL kernels for modified SVG filters (35.89 KB, patch)
2011-11-28 01:16 PST, Giulio Urlini
Additional patch to the OpenCL filter patches posted earlier by Giulio Urlini (1.54 KB, patch)
2011-12-01 01:31 PST, Surinder-Pal Singh
OCL kernels for modified SVG filters (37.96 KB, patch)
2011-12-02 00:41 PST, Giulio Urlini
opencl (26.37 KB, application/octet-stream)
2012-01-17 01:41 PST, Tamas Czene
Updated OpenCL patch and performance figures. (69.26 KB, application/octet-stream)
2012-02-08 03:12 PST, Himal Ghimiray
Example implementation - OpenCL (98.97 KB, application/octet-stream)
2012-05-11 06:01 PDT, Tamas Czene
Example implementation - OpenCL (137.01 KB, patch)
2012-09-12 08:34 PDT, Tamas Czene

Description Dirk Schulze 2011-10-14 04:54:31 PDT
I plan to use OpenCL to HW accelerate SVG and CSS Filters [1]. I'm targeting OpenCL 1.1, which consists of two profiles: 'full' for the desktop and 'embedded' for embedded devices like mobile phones. For filters I'll use OpenCL's image processing facilities from the 'full' profile. Most graphics chip manufacturers, such as Imagination [2],[3], support even this part, which is optional in the 'embedded' profile.

Right now I plan to reuse the current structure of our filters implementation as much as possible, so we can continue to calculate and use the smallest intermediate image buffers. I'll make sure that we don't have unnecessary data transfers between host and computing device: only the transfer of the source image to the device and the transfer of the result back to the host will be used. I therefore propose to proceed with the following steps:

1. Make FilterEffect::apply() independent from any imageBuffer / imageData management (this will benefit other HW acceleration implementations like CIFilters or OpenGL ES as well).
2. Add a compiler flag to disable/enable OpenCL.
3. Implement a basic FilterOpenCL object that manages the image processing, the kernels, the memory objects and the devices.
4. Create kernels for filter effects. This will be done for every single filter effect in follow-up patches. At the beginning, every filter effect just allocates empty memory on the device so that it does not block already implemented effects. I'll start with implementing SourceGraphic, feOffset and feColorMatrix.
5. Implement a fallback to the existing software rendering (or other HW accelerated rendering) if no OpenCL capable device was found.

[1] https://dvcs.w3.org/hg/FXTF/raw-file/tip/filters/publish/Filters.html
[2] http://www.imgtec.com/news/Release/index.asp?NewsID=516
[3] http://www.khronos.org/conformance/adopters/conformant-products/
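
As a rough illustration of the per-pixel work that the first kernels from step 4 would have to perform, here is a CPU reference sketch for feOffset and feColorMatrix (type "matrix"). It is not taken from any attached patch: RGBAImage, applyOffsetReference and applyColorMatrixReference are hypothetical names, and the sketch assumes tightly packed, non-premultiplied RGBA8 data and ignores color space conversion. An OpenCL kernel would run the body of the inner loop once per work-item instead of looping.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an intermediate filter buffer: tightly packed RGBA8.
struct RGBAImage {
    int width;
    int height;
    std::vector<std::uint8_t> pixels; // width * height * 4 bytes, zero-initialized

    RGBAImage(int w, int h)
        : width(w), height(h), pixels(static_cast<std::size_t>(w) * static_cast<std::size_t>(h) * 4, 0) { }

    std::uint8_t* at(int x, int y)
    {
        assert(x >= 0 && x < width && y >= 0 && y < height);
        return &pixels[(static_cast<std::size_t>(y) * width + x) * 4];
    }
    const std::uint8_t* at(int x, int y) const
    {
        assert(x >= 0 && x < width && y >= 0 && y < height);
        return &pixels[(static_cast<std::size_t>(y) * width + x) * 4];
    }
};

// feOffset: result(x, y) = source(x - dx, y - dy); pixels shifted in from
// outside the source stay transparent black.
RGBAImage applyOffsetReference(const RGBAImage& source, int dx, int dy)
{
    RGBAImage result(source.width, source.height);
    for (int y = 0; y < result.height; ++y) {
        for (int x = 0; x < result.width; ++x) {
            int sx = x - dx;
            int sy = y - dy;
            if (sx < 0 || sy < 0 || sx >= source.width || sy >= source.height)
                continue;
            std::copy_n(source.at(sx, sy), 4, result.at(x, y));
        }
    }
    return result;
}

// feColorMatrix type="matrix": a row-major 4x5 matrix applied to the
// normalized vector [R G B A 1]; the fifth column is a constant offset.
using ColorMatrix = std::array<float, 20>;

RGBAImage applyColorMatrixReference(const RGBAImage& source, const ColorMatrix& matrix)
{
    RGBAImage result(source.width, source.height);
    for (std::size_t i = 0; i + 3 < source.pixels.size(); i += 4) {
        float in[4];
        for (int c = 0; c < 4; ++c)
            in[c] = source.pixels[i + c] / 255.0f;
        for (int row = 0; row < 4; ++row) {
            const float* r = &matrix[row * 5];
            float value = r[0] * in[0] + r[1] * in[1] + r[2] * in[2] + r[3] * in[3] + r[4];
            value = std::min(1.0f, std::max(0.0f, value)); // clamp to [0, 1]
            result.pixels[i + row] = static_cast<std::uint8_t>(value * 255.0f + 0.5f);
        }
    }
    return result;
}
```

Chaining the two calls, for example applyColorMatrixReference(applyOffsetReference(image, 1, 0), matrix), mirrors how one effect consumes the result of its input effect.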
Comment 1 Chris Marrin 2011-10-14 13:55:01 PDT
Why use OpenCL rather than just using the existing WebGL backend (GraphicsContext3D) that already exists? What advantage would OpenCL give you?

Using GraphicsContext3D gives you several advantages:

1) it's already there. OpenCL is far from universally available. And even though mobile hardware theoretically supports it, I know of no hardware yet that is shipping with it. WebGL runs on top of many platforms. It runs on desktop OpenGL implementations on Windows, OSX and Linux. It runs on OpenGL ES implementations on iOS and Android. And it runs on top of DirectX courtesy of ANGLE.

2) GraphicsContext3D has already solved the "how do I get a buffer on the GPU" issue.

3) The buffers GraphicsContext3D uses already have a path for compositing on the page.

4) There is a new CSS Shaders proposal from Adobe which uses WebGL shaders to do its work. Since SVG and CSS filters are (at least in theory) sharing an implementation, the CSS Shaders implementation would be complicated by an OpenCL backend.
Comment 2 Zoltan Herczeg 2011-10-14 14:13:54 PDT
(In reply to comment #1)

I think the purpose of this work is not proving that something is possible. Rather that something can be done efficiently.

OpenCL has those nice kernel functions which allow an efficient implementation of light or turbulence magic. If your GraphicsContext3D allows a similar drawing performance and capabilities, it might be a better choice. So how would you implement http://www.w3.org/TR/SVG/filters.html#feTurbulenceElement with G3D?
Comment 3 Simon Fraser (smfr) 2011-10-14 14:17:18 PDT
Maybe do some work on a branch to prove that it's a win, before landing stuff on TOT.
Comment 4 Chris Marrin 2011-10-14 16:30:14 PDT
(In reply to comment #2)
> (In reply to comment #1)
> 
> I think the purpose of this work is not proving that something is possible. Rather that something can be done efficiently.
> 
> OpenCL has those nice kernel functions which allow an efficient implementation of light or turbulence magic. If your GraphicsContext3D allows a similar drawing performance and capabilities, it might be a better choice. So how would you implement http://www.w3.org/TR/SVG/filters.html#feTurbulenceElement with G3D?

I would find a shader on the web to do it :-)

GLSL, OpenCL and Apple's CoreImage shading language are all just high level language interfaces to the GPU. So I believe what you can do with one you can do with the other. OpenCL may end up having a nicer syntax for some effects, but I think you can ultimately do the same thing with any of them. And since we're not talking about exposing the shader syntax to the author (yet!), I'm not sure it matters.
Comment 5 Dirk Schulze 2011-10-14 22:29:37 PDT
(In reply to comment #4)
> (In reply to comment #2)
> > (In reply to comment #1)
> > 
> > I think the purpose of this work is not proving that something is possible. Rather that something can be done efficiently.
> > 
> > OpenCL has those nice kernel functions which allow an efficient implementation of light or turbulence magic. If your GraphicsContext3D allows a similar drawing performance and capabilities, it might be a better choice. So how would you implement http://www.w3.org/TR/SVG/filters.html#feTurbulenceElement with G3D?
> 
> I would find a shader on the web to do it :-)
> 
> GLSL, OpenCL and Apple's CoreImage shading language are all just high level language interfaces to the GPU. So I believe what you can do with one you can do with the other. OpenCL may end up having a nicer syntax for some effects, but I think you can ultimately do the same thing with any of them. And since we're not talking about exposing the shader syntax to the author (yet!), I'm not sure it matters.

First of all, why do you suggest WebGL? I assume you mean OpenGL ES. I don't want to give web authors the possibility to apply kernels or shaders at this point.

I'm pretty sure that it is possible with OpenGL ES as well. The advantage of OpenCL is that it can run on different devices at the same time. You can use a CPU, GPU and a DSP all together to calculate the filters. I don't believe that OpenCL device support is a big problem. There is support for OpenCL on all platforms (Linux, Windows, macOS) and by all bigger graphics chip manufacturers, and even on embedded SoC solutions (HW and driver).

OpenCL can work together with OpenGL contexts, so we can still use GraphicsContext3D but do the calculations with OpenCL. So I don't see a performance loss. As a secondary effect, OpenCL kernels are easy to program and therefore easier to maintain, in my opinion.

Another benefit of using OpenCL at this point is that we can start implementing without modifying filters to use GraphicsContext3D. Also, I'm not sure it is a good idea to always use GraphicsContext3D for every filter (see https://bugs.webkit.org/show_bug.cgi?id=68479#c8).

Because OpenCL runs on CPU as well, it could still work on devices where the GPU is not OpenCL capable.

However, the first step is a benefit and necessary for all HW accelerated solutions. We can try different approaches and choose the most effective implementation that also runs on most devices. Implementations that are not used can be removed later, as we do for unused ports.
Comment 6 Vangelis Kokkevis 2011-10-17 09:42:13 PDT
One more data point:

It's unlikely that Chrome will be able to use an OpenCL implementation of filters, at least not without lots of additional work. The reason is that Chrome's security model prevents WebKit from directly accessing the GPU. All graphics calls are proxied over to a separate process for validation and execution. The proxying mechanism is hidden behind our GraphicsContext3D implementation so any code using GC3D can take advantage of it. Adding support for executing sufficiently validated OpenCL code on a secondary process won't be trivial.

For this and all the reasons that Chris mentioned, I would also favor implementing filters using GC3D and GLSL instead of OpenCL.
Comment 7 Dirk Schulze 2011-10-17 10:48:12 PDT
(In reply to comment #6)
> One more data point:
> 
> It's unlikely that Chrome will be able to use an OpenCL implementation of filters, at least not without lots of additional work. The reason is that Chrome's security model prevents WebKit from directly accessing the GPU. All graphics calls are proxied over to a separate process for validation and execution. The proxying mechanism is hidden behind our GraphicsContext3D implementation so any code using GC3D can take advantage of it. Adding support for executing sufficiently validated OpenCL code on a secondary process won't be trivial.
> 
> For this and all the reasons that Chris mentioned, I would also favor implementing filters using GC3D and GLSL instead of OpenCL.

OpenCL was created from the experience of implementations like CUDA from Nvidia. I strongly believe that OpenCL will replace GLSL in the long term. It has the benefit of working not only with the GPU, but with all GPUs, CPUs and DSPs of a system.

The GC3D usage is no barrier. The opposite is the case. Many CL objects have objects with the same meaning in GL and can be used by OpenCL as well:

CL Buffer <-> GL Buffer
CL Image <-> GL Texture / GL RenderBuffers
CL Events <-> GL Sync
CL Context <-> GL Context
...

(The same with D3D objects btw.).

The bigger benefit: we can use HW acceleration for SVG in the short term. This isn't possible for OpenGL right now, because we don't support GC3D on SVG. And there is still a lot to do to make SVG use GC3D IMHO.

But again, I don't have any opposition against GLSL. But we can just use it for HTML at the beginning, while SVG Filters is the only standardized specification at the moment.

I'm not very familiar with OpenGL. Is the shading language used by OpenGL ES 2.0 the same as GLSL?
Comment 8 Chris Marrin 2011-10-17 10:51:19 PDT
(In reply to comment #5)
> ...
> 
> First of all, why do you suggest WebGL? I assume you mean OpenGL ES. I don't want to give web authors the possibility to apply kernels or shaders at this point.

I say WebGL because GraphicsContext3D is the underlying implementation of WebGL. The API and shading language used is OpenGL ES with restrictions for WebGL compliance. So if you think of your implementation as using WebGL, you can understand the constraints within which you must operate.

> 
> I'm pretty sure that it is possible with OpenGL ES as well. The advantage of OpenCL is that it can run on different devices at the same time. You can use a CPU, GPU and a DSP all together to calculate the filters. I don't believe that OpenCL device support is a big problem. There is support for OpenCL on all platforms (Linux, Windows, macOS) and by all bigger graphics chip manufacturers, and even on embedded SoC solutions (HW and driver).

The universality of OpenCL is all very theoretical. As Vangelis says, getting it to work with Chrome would at least be non-trivial. The same is true of other platforms. On all current mobile platforms I am aware of, there is not even an available OpenCL implementation. And if you're talking about using a software OpenCL implementation, that's possible, but wouldn't be very practical, especially on mobile hardware. Since GC3D already works today in many WebKit ports and on many platforms with multiple underlying APIs, it just seems to make sense to use it.

> 
> OpenCL can work together with OpenGL contexts, so we can still use GraphicsContext3D but do the calculations with OpenCL. So I don't see a performance loss. As a secondary effect, OpenCL kernels are easy to program and therefore easier to maintain, in my opinion.
> 
> Another benefit of using OpenCL at this point is that we can start implementing without modifying filters to use GraphicsContext3D. Also, I'm not sure it is a good idea to always use GraphicsContext3D for every filter (see https://bugs.webkit.org/show_bug.cgi?id=68479#c8).
> 
> Because OpenCL runs on CPU as well, it could still work on devices where the GPU is not OpenCL capable.

Maybe this is just my focus, but it doesn't seem to me to be very practical to change SVG filters to use OpenCL just to run on a software implementation. We already have a software implementation of SVG filters. Do you have evidence to show that a software OpenCL based implementation would give significant performance improvements?
Comment 9 Chris Marrin 2011-10-17 10:56:28 PDT
(In reply to comment #7)
...
> The GC3D usage is no barrier. The opposite is the case. Many CL objects have objects with the same meaning in GL and can be used by OpenCL as well:
> 
> CL Buffer <-> GL Buffer
> CL Image <-> GL Texture / GL RenderBuffers
> CL Events <-> GL Sync
> CL Context <-> GL Context
> ...
> 
> (The same with D3D objects btw.).
> 
> The bigger benefit: we can use HW acceleration for SVG in the short term. This isn't possible for OpenGL right now, because we don't support GC3D on SVG. And there is still a lot to do to make SVG use GC3D IMHO.

But in order to use a hardware implementation of OpenCL without a layer, won't you have to send the filter to the hardware (usually GPU), run the OpenCL filter and then copy the results back to the CPU? That would have significant overhead and in many cases would cancel the benefits of doing the filtering in hardware.

> 
> But again, I don't have any opposition against GLSL. But we can just use it for HTML at the beginning, while SVG Filters is the only standardized specification at the moment.
> 
> I'm not very familiar with OpenGL. Is the shading language used by OpenGL ES 2.0 the same as GLSL?

If you look at it as a WebGL variant of GLSL, then it is a subset of both OpenGL and OpenGL ES. There are only a few restrictions relative to GLSL ES, and GLSL ES is a pretty heavily reduced subset of GLSL.
Comment 10 Stephen White 2011-10-18 07:09:30 PDT
It sounds like what's needed is an abstraction layer at the platform/graphics level.  Then we could have OpenCL, GLES, GraphicsContext3D or other implementations of the filters.

This could either be by new methods on GraphicsContext, or a new interface entirely.  A new interface would have the advantage of being more modular, so ports could choose a filter backend independently of the choice of GraphicsContext backend.

It might also be possible to refactor the existing FilterEffect hierarchy to have multiple implementations.  That would be a bit tricky, though, since there are some dependencies on ByteArrays, and other CPU-specific details even in the base class which would have to be abstracted away.
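
To make the "new interface" option concrete, here is a minimal sketch of a filter backend abstraction with a virtual interface and a runtime fallback chain. It is only an illustration of the idea, not existing WebKit code: FilterBackend, SoftwareBackend, AcceleratedBackend and FilterBackendChain are hypothetical names, and the actual image processing is left out so that only the selection logic is shown.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface; none of these names exist in WebKit.
class FilterBackend {
public:
    virtual ~FilterBackend() = default;
    virtual std::string name() const = 0;
    // True if the backend can run on this system (device found, driver usable).
    virtual bool isAvailable() const = 0;
    // True if the backend implements the given filter effect.
    virtual bool supportsEffect(const std::string& effect) const = 0;
};

// The software path is always available and implements every effect.
class SoftwareBackend final : public FilterBackend {
public:
    std::string name() const override { return "Software"; }
    bool isAvailable() const override { return true; }
    bool supportsEffect(const std::string&) const override { return true; }
};

// Stand-in for an OpenCL, GLES or GraphicsContext3D backend.
class AcceleratedBackend final : public FilterBackend {
public:
    AcceleratedBackend(std::string name, bool available, std::vector<std::string> effects)
        : m_name(std::move(name)), m_available(available), m_effects(std::move(effects)) { }

    std::string name() const override { return m_name; }
    bool isAvailable() const override { return m_available; }
    bool supportsEffect(const std::string& effect) const override
    {
        return std::find(m_effects.begin(), m_effects.end(), effect) != m_effects.end();
    }

private:
    std::string m_name;
    bool m_available;
    std::vector<std::string> m_effects;
};

// Backends are tried in the order they were added; the first usable one wins.
class FilterBackendChain {
public:
    void add(std::unique_ptr<FilterBackend> backend)
    {
        assert(backend);
        m_backends.push_back(std::move(backend));
    }

    const FilterBackend* select(const std::string& effect) const
    {
        for (const auto& backend : m_backends) {
            if (backend->isAvailable() && backend->supportsEffect(effect))
                return backend.get();
        }
        return nullptr; // Only reachable if no software backend was added.
    }

private:
    std::vector<std::unique_ptr<FilterBackend>> m_backends;
};
```

A port would register its preferred backends first and a SoftwareBackend last, so that select() falls back to software for any effect the accelerated backends do not cover. The cost of this design is one virtual call per effect; the benefit is that the fallback decision can be made at runtime rather than at compile time.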
Comment 11 Dirk Schulze 2011-10-18 07:21:56 PDT
(In reply to comment #10)
> It sounds like what's needed is an abstraction layer at the platform/graphics level.  Then we could have OpenCL, GLES, GraphicsContext3D or other implementations of the filters.
We already started that with the ARM implementation, I'd just continue on that way.

> 
> This could either be by new methods on GraphicsContext, or a new interface entirely.  A new interface would have the advantage of being more modular, so ports could choose a filter backend independently of the choice of GraphicsContext backend.
We would just add an apply method and various platformApply functions that get called. However, we can't divide the individual ports easily, since we might want to fall back to other implementations (and in the end to software rendering).

> 
> It might also be possible to refactor the existing FilterEffect hierarchy to have multiple implementations.  That would be a bit tricky, though, since there are some dependencies on ByteArrays, and other CPU-specific details even in the base class which would have to be abstracted away.
That's point one on my list and doesn't need a lot of refactoring. We just need to make the apply function independent of the pixel buffers/imageBuffers. Not a big deal. Also it would be the first step for every implementation: OpenCL, OpenGL (WebGL), CI. That's why I start here independent of the further discussion. I'll upload my basic idea of the OpenCL implementation to this bug in a couple of days, just to demonstrate how the different implementations can interact with filters.
Comment 12 Stephen White 2011-10-18 08:13:04 PDT
(In reply to comment #11)
> (In reply to comment #10)
> > It sounds like what's needed is an abstraction layer at the platform/graphics level.  Then we could have OpenCL, GLES, GraphicsContext3D or other implementations of the filters.
> We already started that with the ARM implementation, I'd just continue on that way.

OK, so I'm guessing your plan is to essentially put the OpenCL-specific declarations directly in the header file (e.g., FEGaussianBlur.h), with separate concrete implementations in separate files (e.g., FEGaussianBlurOpenCL.cpp or whatever), but as non-virtual member functions, not as subclasses?  That is consistent with the GraphicsContext approach.  It's fairly lightweight in terms of lines of code/files, but it does have the disadvantage of not being able to do runtime fallbacks in a clean way, since there's no virtual interface to do runtime dispatch on (the specific fallback path would have to be decided at compile time).  I think that's probably ok for now, though.

Unlike the Neon case which just takes ByteArrays, presumably the platformApply() functions would need to take some OpenCL-specific datatype, so this code would have to be #ifdef'ed.  Either that, or some abstraction of the images should be used.  Would it be possible or make sense to use Image and/or ImageBuffer here?  Maybe even for the CPU path?  In Chrome we already have code to back these with GPU resources as sources and sinks, so it would really make life easier for us.  Failing that, we could do that in a different flavour of platformApply() which uses Image and ImageBuffer.

> > This could either be by new methods on GraphicsContext, or a new interface entirely.  A new interface would have the advantage of being more modular, so ports could choose a filter backend independently of the choice of GraphicsContext backend.
> We would just add an apply method and various platformApply functions that get called. However, we can't divide the individual ports easily, since we might want to fall back to other implementations (and in the end to software rendering).

Not sure I completely understand this.  There's already a virtual apply method; would we need to subclass here?  Or #ifdef different versions of it to handle the different fallback paths?

> > It might also be possible to refactor the existing FilterEffect hierarchy to have multiple implementations.  That would be a bit tricky, though, since there are some dependencies on ByteArrays, and other CPU-specific details even in the base class which would have to be abstracted away.
> That's point one on my list and doesn't need a lot of refactoring. We just need to make the apply function independent of the pixel buffers/imageBuffers. Not a big deal. Also it would be the first step for every implementation: OpenCL, OpenGL (WebGL), CI. That's why I start here independent of the further discussion. I'll upload my basic idea of the OpenCL implementation to this bug in a couple of days, just to demonstrate how the different implementations can interact with filters.

That would be great.  I'm definitely interested in doing an alternate implementation, so having your approach as a reference would be helpful.
Comment 13 Dirk Schulze 2011-10-18 09:33:00 PDT
(In reply to comment #12)
> (In reply to comment #11)
> > (In reply to comment #10)
> > > It sounds like what's needed is an abstraction layer at the platform/graphics level.  Then we could have OpenCL, GLES, GraphicsContext3D or other implementations of the filters.
> > We already started that with the ARM implementation, I'd just continue on that way.
> 
> OK, so I'm guessing your plan is to essentially put the OpenCL-specific declarations directly in the header file (e.g., FEGaussianBlur.h), with separate concrete implementations in separate files (e.g., FEGaussianBlurOpenCL.cpp or whatever), but as non-virtual member functions, not as subclasses?  That is consistent with the GraphicsContext approach.  It's fairly lightweight in terms of lines of code/files, but it does have the disadvantage of not being able to do runtime fallbacks in a clean way, since there's no virtual interface to do runtime dispatch on (the specific fallback path would have to be decided at compile time).  I think that's probably ok for now, though.
> 
> Unlike the Neon case which just takes ByteArrays, presumably the platformApply() functions would need to take some OpenCL-specific datatype, so this code would have to be #ifdef'ed.  Either that, or some abstraction of the images should be used.  Would it be possible or make sense to use Image and/or ImageBuffer here?  Maybe even for the CPU path?  In Chrome we already have code to back these with GPU resources as sources and sinks, so it would really make life easier for us.  Failing that, we could do that in a different flavour of platformApply() which uses Image and ImageBuffer.
> 
> > > This could either be by new methods on GraphicsContext, or a new interface entirely.  A new interface would have the advantage of being more modular, so ports could choose a filter backend independently of the choice of GraphicsContext backend.
> > We would just add an apply method and various platformApply functions that get called. However, we can't divide the individual ports easily, since we might want to fall back to other implementations (and in the end to software rendering).
> 
> Not sure I completely understand this.  There's already a virtual apply method; would we need to subclass here?  Or #ifdef different versions of it to handle the different fallback paths?
> 
> > > It might also be possible to refactor the existing FilterEffect hierarchy to have multiple implementations.  That would be a bit tricky, though, since there are some dependencies on ByteArrays, and other CPU-specific details even in the base class which would have to be abstracted away.
> > That's point one on my list and doesn't need a lot of refactoring. We just need to make the apply function independent of the pixel buffers/imageBuffers. Not a big deal. Also it would be the first step for every implementation: OpenCL, OpenGL (WebGL), CI. That's why I start here independent of the further discussion. I'll upload my basic idea of the OpenCL implementation to this bug in a couple of days, just to demonstrate how the different implementations can interact with filters.
> 
> That would be great.  I'm definitely interested in doing an alternate implementation, so having your approach as a reference would be helpful.

I don't feel happy publishing something in this early state. The patch that I'll upload is just a proof of concept: it won't compile or work at the moment, is not the final concept and might contain some errors. I just prototyped it to answer your questions.

In general we have a function called apply() in FilterEffect. This function doesn't need to be virtual anymore. But for the prototype I take FEOffset as an example:

void FEOffset::apply()
{
    if (hasResult())
        return;
    FilterEffect* in = inputEffect(0);
    in->apply();
    if (!in->hasResult())
        return;
    determineAbsolutePaintRect();

    // systemSupportsHWAcceleration():
    // * must be set once before calling FilterEffect::apply()
    // * can be static
    // * is an enumeration with different supportable HW acceleration possibilities.
#if USE(OPENCL)
    if (systemSupportsHWAcceleration() == OpenCLSupport) {
        platformApplyOpenCL();
        return;
    }
#endif
#if USE(OPENGL) // Placeholder for any other HW acceleration implementation.
    if (systemSupportsHWAcceleration() == OpenGLSupport) { // Placeholder enumerator.
        platformApplyOpenGL();
        return;
    }
#endif
    // No HW acceleration available: use the existing software path.
    platformApplyGeneric();
}

As you can see, there is an enumeration that contains all HW acceleration implementations supported by the system. This needs to be checked once before applying the filter and can be static. The OpenCL implementation does not use the intermediate ImageBuffers or ByteArrays. Instead I introduced a FilterOpenCL object that manages all mem allocations on the devices. The platform-specific apply functions need to be virtual and contain the implementation-aware code (which can be moved to external files like FEOffsetOpenCL.cpp). For OpenCL, the function either creates the kernels (as for FEColorMatrix) or copies data from one mem object on the device to a new mem object on the same device. No data will be copied via the CPU if the GPU was chosen. For OpenGL, the platformApplyOpenGL function would create a shader and might use a FilterOpenGL object for data and shader management.

If no HW acceleration is supported, we fall back to software rendering with platformApplyGeneric().
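
The "checked once, can be static" part of this scheme can be sketched in isolation as follows. This is a self-contained illustration rather than code from the attached patch: HWAcceleration, SystemCapabilities, AccelerationSelector, AppliedPath and applyOffsetSketch are hypothetical names, and the capability probe is reduced to two booleans where a real implementation would query the OpenCL platform and the OpenGL context.

```cpp
#include <cassert>

// Hypothetical enumeration of the acceleration paths discussed above.
enum class HWAcceleration { None, OpenCL, OpenGL };

// Hypothetical probe result; a real probe would ask the drivers.
struct SystemCapabilities {
    bool hasOpenCLDevice;
    bool hasOpenGLContext;
};

// Probes the system lazily and caches the answer, so that every later
// apply() call only pays for reading the cached enumerator.
class AccelerationSelector {
public:
    explicit AccelerationSelector(SystemCapabilities capabilities)
        : m_capabilities(capabilities) { }

    HWAcceleration get()
    {
        if (!m_probed) {
            m_value = probe();
            m_probed = true;
        }
        assert(m_probeCount == 1); // The probe must have run exactly once.
        return m_value;
    }

    int probeCount() const { return m_probeCount; }

private:
    HWAcceleration probe()
    {
        ++m_probeCount;
        if (m_capabilities.hasOpenCLDevice)
            return HWAcceleration::OpenCL;
        if (m_capabilities.hasOpenGLContext)
            return HWAcceleration::OpenGL;
        return HWAcceleration::None;
    }

    SystemCapabilities m_capabilities;
    HWAcceleration m_value = HWAcceleration::None;
    bool m_probed = false;
    int m_probeCount = 0;
};

// Which platformApply*() variant an effect would end up calling.
enum class AppliedPath { OpenCL, OpenGL, Generic };

AppliedPath applyOffsetSketch(AccelerationSelector& selector)
{
    switch (selector.get()) {
    case HWAcceleration::OpenCL:
        return AppliedPath::OpenCL;
    case HWAcceleration::OpenGL:
        return AppliedPath::OpenGL;
    case HWAcceleration::None:
        break;
    }
    return AppliedPath::Generic; // Software rendering fallback.
}
```

In this sketch the preference order (OpenCL first, then OpenGL, then software) is an arbitrary choice made for the example; the discussion above leaves that order open.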
Comment 14 Dirk Schulze 2011-10-18 09:40:18 PDT
Created attachment 111453 [details]
Example implementation - OpenCL

Example implementation of OpenCL - proof of concept - doesn't compile :)
Comment 15 Nikolas Zimmermann 2011-10-19 02:37:05 PDT
(In reply to comment #1)
> Why use OpenCL rather than just using the existing WebGL backend (GraphicsContext3D) that already exists? What advantage would OpenCL give you?
I thought about this as well, and think we should start using what-we-already-have, and that is the shading functionality in GC3D, to get started with hw accelerating filters.

My view on this is:
1) Introduce a hardware "platformApplyOpenGL" code path to platform/graphics/filters/*
2) Separate the software rendering fallback into "platformApplySoftware"
3) Write FEGaussianBlurOpenGL.cpp, and let it directly use GC3D, create your shaders, transfer the start ImageData, process the sample, extract an ImageBuffer result.

It may sound awkward to use GC3D in platform/graphics/filters, though we can always refactor that existing code and extract e.g. the shading stuff into a FilterOpenGL class that both Filter and GC3D could use - but keep in mind, I'm just thinking about this; I didn't try it, nor do I know if it's feasible.

> Using GraphicsContext3D gives you several advantages:
> 
> 1) it's already there. OpenCL is far from universally available. And even though mobile hardware theoretically supports it, I know of no hardware yet that is shipping with it. WebGL runs on top of many platforms. It runs on desktop OpenGL implementations on Windows, OSX and Linux. It runs on OpenGL ES implementations on iOS and Android. And it runs on top of DirectX courtesy of ANGLE.
What we all want: get HW accelerated filters as fast as possible, now that the software fallback path can be considered complete. The likelihood of getting this turned on in trunk is much higher if we use proven existing code like GC3D.

> 
> 2) GraphicsContext3D has already solved the "how do I get a buffer on the GPU" issue.
Agreed.

> 
> 3) The buffers GraphicsContext3D uses already have a path for compositing on the page.
Agreed.

> 
> 4) There is a new CSS Shaders proposal from Adobe which uses WebGL shaders to do its work. Since SVG and CSS filters are (at least in theory) sharing an implementation, the CSS Shaders implementation would be complicated by an OpenCL backend.
That's an important point, and deserves attention. As CSS Shaders will need WebGL shaders, it would be a step backward to introduce another new layer OpenCL at the moment.

CSS Shaders can use the existing GC3D code as well as SVG/CSS filters. As I said above, if it turns out to be awkward to use GC3D right in eg. CSS Shaders or SVG filters (too heavy class, does work we don't need for just using shaders, etc.) we can go and refactor the relevant bits out of GC3D.

Does that sound like a reasonable route?
Comment 16 Zoltan Herczeg 2011-10-19 03:05:58 PDT
We have decided to start working on the OpenGL shader based filter implementation, and we won't abandon the OpenCL work either. We have enough resources to do both.
Comment 17 Oliver Varga 2011-10-19 10:51:49 PDT
Yes, I have started to work on the OpenGL shader based filter implementation.

(In reply to comment #16)
> We have decided to start working on the OpenGL shader based filter implementation, and we won't abandon the OpenCL work either. We have enough resources to do both.
Comment 18 James Robinson 2011-10-25 17:17:47 PDT
If this behavior is based on doing a readback for every element I can't imagine it performing well since it will destroy any frame overlap and parallelism that you would normally get between the GPU and CPU.  Please guard this with a new #ifdef at least so we can evaluate it independently from other features.
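
The cost being pointed at here can be put into numbers with a toy model. The sketch below only counts host/device transfers for a linear chain of single-input effects applied to one source image; it says nothing about transfer sizes or about the lost CPU/GPU overlap. TransferCount, readbackPerEffect, residentOnDevice and transfersSaved are hypothetical names introduced for this example.

```cpp
#include <cassert>
#include <cstddef>

// Number of host-to-device uploads and device-to-host downloads.
struct TransferCount {
    std::size_t uploads = 0;
    std::size_t downloads = 0;
    std::size_t total() const { return uploads + downloads; }
};

// Every effect uploads its input and reads its result back to the host.
TransferCount readbackPerEffect(std::size_t effectCount)
{
    TransferCount count;
    count.uploads = effectCount;
    count.downloads = effectCount;
    return count;
}

// Intermediate results stay on the device: one upload of the source
// image and one download of the final result, whatever the chain length.
TransferCount residentOnDevice(std::size_t effectCount)
{
    TransferCount count;
    if (effectCount > 0) {
        count.uploads = 1;
        count.downloads = 1;
    }
    return count;
}

// Transfers avoided by keeping intermediate results on the device.
std::size_t transfersSaved(std::size_t effectCount)
{
    TransferCount perEffect = readbackPerEffect(effectCount);
    TransferCount resident = residentOnDevice(effectCount);
    assert(perEffect.total() >= resident.total());
    return perEffect.total() - resident.total();
}
```

Under these assumptions a chain of n effects needs 2n transfers with a readback per effect and 2 transfers when the data stays on the device, so the difference grows linearly with the length of the filter chain.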
Comment 19 Oliver Varga 2011-11-07 08:28:30 PST
I created a bugreport for "OpenGL shader based implementation of the SVG filters".
https://bugs.webkit.org/show_bug.cgi?id=71656
Comment 20 Giulio Urlini 2011-11-21 13:21:12 PST
(In reply to comment #19)
> I created a bugreport for "OpenGL shader based implementation of the SVG filters".
> https://bugs.webkit.org/show_bug.cgi?id=71656

Dear Kirk, dear all,
	I'm Giulio Urlini and I work at ST-Microelectronics. In a joint activity between us and ST-Ericsson, we are developing a set of OpenCL kernels to accelerate the SVG filters available in WebKit.
We already have some results on a quad core x86 platform with an Nvidia Quadro NVS 290. The benchmarks are very promising.
The changes to the original WebKit code are minimal, and can be enabled with preprocessor macros without affecting the current implementation if the target platform does not support OpenCL.
I would like to share our development, but I'm quite new to the WebKit development community. So what's the preferred way to share some code for a review?

Thanks in advance.
Best Regards,

Giulio
Comment 21 Giulio Urlini 2011-11-21 13:23:02 PST
(In reply to comment #20)
> (In reply to comment #19)
> > I created a bug report for "OpenGL shader based implementation of the SVG filters".
> > https://bugs.webkit.org/show_bug.cgi?id=71656
> 
> Dear Kirk, dear all,
>     I'm Giulio Urlini and I'm working in ST-Microelectronics. In a common activity between us and ST-Ericsson we are developing a set of OpenCL kernels in order to accelerate the SVG filters available in WebKit.
> We already have some results on a quad core x86 platform with an Nvidia Quadro NVS 290. The benchmarks are very promising.
> The changes in the original WebKit code are minimal, and can be enabled with preprocessor macros without affecting the actual implementation if the target platform does not support OpenCL.
> I would like to share our development, but I'm quite new in the WebKit development community. So what's the preferred way to share some code for a review?
> 
> Thanks in advance.
> Best Regards,
> 
> Giulio

Dear Dirk, sorry for the typo
Comment 22 Dirk Schulze 2011-11-21 22:59:58 PST
(In reply to comment #21)
> (In reply to comment #20)
> > (In reply to comment #19)
> > > I created a bug report for "OpenGL shader based implementation of the SVG filters".
> > > https://bugs.webkit.org/show_bug.cgi?id=71656
> > 
> > Dear Kirk, dear all,
> >     I'm Giulio Urlini and I'm working in ST-Microelectronics. In a common activity between us and ST-Ericsson we are developing a set of OpenCL kernels in order to accelerate the SVG filters available in WebKit.
> > We already have some results on a quad core x86 platform with an Nvidia Quadro NVS 290. The benchmarks are very promising.
> > The changes in the original WebKit code are minimal, and can be enabled with preprocessor macros without affecting the actual implementation if the target platform does not support OpenCL.
> > I would like to share our development, but I'm quite new in the WebKit development community. So what's the preferred way to share some code for a review?
> > 
> > Thanks in advance.
> > Best Regards,
> > 
> > Giulio
> 
> Dear Dirk, sorry for the typo

No problem :) If you want to share the code somehow, just post a link to the patch or upload the patch to this bug report (without setting the review flag). I think we just need to agree how we can implement it once the code is ready for landing. Either by setting up a new branch, like Simon Fraser suggested, or directly to trunk with OpenCL compiler flags.
Comment 23 Giulio Urlini 2011-11-25 03:58:34 PST
Created attachment 116590 [details]
Set of new files for OpenCL support
Comment 24 Giulio Urlini 2011-11-25 04:00:00 PST
Created attachment 116591 [details]
Set of modified common files for OpenCL support
Comment 25 Giulio Urlini 2011-11-25 04:02:31 PST
Created attachment 116593 [details]
Makefiles and configuration for OpenCL optional support
Comment 26 Giulio Urlini 2011-11-25 04:03:29 PST
Created attachment 116594 [details]
Set of SVG filters modified with optional OpenCL support
Comment 27 Giulio Urlini 2011-11-25 04:04:10 PST
Dear all,
     following my previous message I would like to propose our development to integrate the OpenCL support for some SVG filters.
I've divided my patches into several parts:
- config_make_files.patch where the makefiles and configure have been updated to add the compilation option for OpenCL
- common_files_modified.patch where the activation of OpenCL is inserted
- common_files_added.patch a set of files added in order to provide a set of common functions for the OpenCL usage
- filters_modified.patch are the filters actually modified

The preprocessor macro ENABLE(OPENCL) is activated by configure.
The preprocessor macros USE_C_PROFILING and USE_OPENCL are activated manually in the file oclHelper.h described in common_files_added.patch.
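
A minimal sketch of how a guard of this kind might select a code path at compile time. The ENABLE() macro below is a simplified stand-in for WebKit's feature-test macro, and applyBlur() and the returned path names are hypothetical, not taken from the attached patches.

```cpp
#include <string>

// Normally set by configure (for example -DENABLE_OPENCL=1); off by default here.
#ifndef ENABLE_OPENCL
#define ENABLE_OPENCL 0
#endif

// Simplified stand-in for WebKit's ENABLE() feature-test macro (assumption).
#define ENABLE(FEATURE) (ENABLE_##FEATURE)

// Hypothetical filter entry point: returns the name of the path that was compiled in.
std::string applyBlur()
{
#if ENABLE(OPENCL)
    return "opencl";   // the OpenCL kernel would be dispatched here
#else
    return "software"; // the existing C++ implementation is left untouched
#endif
}
```

Built without -DENABLE_OPENCL=1, applyBlur() takes the software branch, so a platform without OpenCL never compiles the accelerated code.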

Best Regards,

Giulio
Comment 28 Tamas Czene 2011-11-25 05:24:26 PST
(In reply to comment #27)
> Dear all,
>      following my previous message I would like to propose our development to integrate the OpenCL support for some SVG filters.
> I've divided my patches in several parts:
> - config_make_files.patch where the makefiles and configure have been updated to add the compilation option for OpenCL
> - common_files_modified.patch where the activation of OpenCL is inserted
> - common_files_added.patch a set of files added in order to provide a set of common functions for the OpenCL usage
> - filters_modified.patch are the filters actually modified
> 
> The preprocessor ENABLE(OPENCL) is activated by configure
> the preprocessor USE_C_PROFILING and USE_OPENCL are activated manually in the file oclHelper.h described in common_files_added.patch
> 
> Best Regards,
> 
> Giulio


Hi!
I tried to test your patch, but I can't. Can you upload the kernel files?
Comment 29 Giulio Urlini 2011-11-28 01:16:40 PST
Created attachment 116710 [details]
OCL kernels for modified SVG filters
Comment 30 Surinder-Pal Singh 2011-12-01 01:31:24 PST
Created attachment 117378 [details]
Additional patch to the OpenCL filter patches posted earlier by Giulio Urlini

Hi,

This patch is a continuation of the patches posted earlier by Giulio from STMicroelectronics and adds a copyright notice as required by the Nvidia OpenCL SDK EULA, since the blur filter OpenCL kernel has been adapted from their OpenCL SDK. Please note that we added the current blur kernel filter, but it does not exactly match the output of the original C++ filter implementation, since this was more of a work-in-progress experiment. In the meantime we also have an implementation that matches the C++ filter output exactly, and we will post that patch shortly as well.

-Surinder
Comment 31 James Robinson 2011-12-01 11:18:28 PST
Comment on attachment 116710 [details]
OCL kernels for modified SVG filters

These files do not appear to have license header blocks. We require that all code contributed to the project be licensed under a BSD or LGPL 2.1 license, and typically encode this information in comments within the source.
Comment 32 Giulio Urlini 2011-12-02 00:29:23 PST
We are not particularly keen on a specific license. We would prefer GPLv2 or v3 but even BSD or LGPL is ok. Whichever the webkit team would prefer is ok for us. Actually I'll re-post the code with a GPL license statement, but it can be changed if you have a different opinion.
Comment 33 Giulio Urlini 2011-12-02 00:41:12 PST
Created attachment 117584 [details]
OCL kernels for modified SVG filters

This new version of OpenCL kernels contains a proper license statement
Comment 34 Zoltan Herczeg 2011-12-02 01:14:03 PST
(In reply to comment #32)
> We are not particularly keen on a specific license. We would prefer GPLv2 or v3 but even BSD or LGPL is ok. Whichever the webkit team would prefer is ok for us. Actually I'll re-post the code with a GPL license statement, but it can be changed if you have a different opinion.

When someone becomes a committer, he/she has to sign a paper about code contribution rules, and it clearly states the allowed licenses (BSD or LGPLv2). Basically just copy a license block from a file in the same directory.

Useful for new contributors: http://www.webkit.org/coding/contributing.html
Comment 35 Simon Fraser (smfr) 2011-12-02 11:06:20 PST
(In reply to comment #33)
> Created an attachment (id=117584) [details]
> OCL kernels for modified SVG filters
> 
> This new version of OpenCL kernels contains a proper license statement

Can you use svn or git patches please?
Comment 36 Chris Marrin 2011-12-02 11:19:12 PST
(In reply to comment #18)
> If this behavior is based on doing a readback for every element I can't imagine it performing well since it will destroy any frame overlap and parallelism that you would normally get between the GPU and CPU.  Please guard this with a new #ifdef at least so we can evaluate it independently from other features.

I want to second James' comment about the rationale behind this approach. Uploading an image to the GPU, running an OpenCL shader on it, and then reading the result back into the CPU will have a big enough performance impact that I would be willing to bet that for many filters, just doing it in software would be faster, especially if vector instructions were employed.
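
A rough sketch of the trade-off described above, assuming an RGBA8 image (4 bytes per pixel) and a fixed bus bandwidth. The struct, the function names and any numbers passed in are illustrative assumptions, not measurements from this bug.

```cpp
#include <cstddef>

// One upload + kernel + readback round trip, in milliseconds (illustrative only).
struct RoundTripEstimate {
    double uploadMs;
    double kernelMs;
    double readbackMs;
    double totalMs() const { return uploadMs + kernelMs + readbackMs; }
};

// Estimates the time to move one RGBA8 image across the bus in one direction.
double transferMs(std::size_t width, std::size_t height, double bytesPerMs)
{
    const double bytes = static_cast<double>(width) * height * 4.0; // 4 bytes per pixel
    return bytes / bytesPerMs;
}

// The accelerated path only pays off if the whole round trip beats the CPU filter.
bool gpuPathWins(const RoundTripEstimate& gpu, double cpuFilterMs)
{
    return gpu.totalMs() < cpuFilterMs;
}
```

Under this model a very fast kernel can still lose to the software path when the two transfers dominate the total.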

WebKit has an architecture for hardware accelerating bitmap layers. Any use of the GPU should go through that mechanism. I don't think it's in the best interest of the WebKit project to embed this sort of functionality throughout the high-level filter logic.

There are system level filter changes currently underway. In https://bugs.webkit.org/show_bug.cgi?id=68479 I will be adding logic to the RenderLayer level to create layers when an element has a filter. I will also add API to GraphicsLayer to pass in info about adding a filter to the layer and animating that filter. There are implementations of GraphicsLayer that use OpenGL contexts for rendering, so it should not be hard to incorporate OpenCL based filter rendering into that.
Comment 37 Tamas Czene 2012-01-17 01:41:40 PST
Created attachment 122736 [details]
opencl

I used Giulio Urlini's patch.

As a next step, I will try to eliminate unnecessary memory copies between the OpenCL device and the CPU. The purpose is to avoid copying the filter data back and forth while it is still in use.
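
One possible way to think about removing those copies, sketched without any real OpenCL calls: the image remembers which copy (host or device) is current, so a chain of filters uploads once and reads back once. FilterImage and its methods are hypothetical and are not part of the attached patch.

```cpp
#include <utility>
#include <vector>

// Hypothetical image that may live on the host, on the OpenCL device, or both.
// The counters only record where a real implementation would copy data.
class FilterImage {
public:
    explicit FilterImage(std::vector<unsigned char> pixels)
        : m_pixels(std::move(pixels))
    {
    }

    // Runs a (simulated) kernel; uploads only if the device copy is stale.
    void applyFilterOnDevice()
    {
        if (!m_deviceCopyValid) {
            ++m_uploads;
            m_deviceCopyValid = true;
        }
        ++m_kernelRuns;
        m_hostCopyValid = false; // the result stays on the device
    }

    // Copies back to the host only when the CPU actually needs the pixels.
    const std::vector<unsigned char>& readBack()
    {
        if (!m_hostCopyValid) {
            ++m_readbacks;
            m_hostCopyValid = true;
        }
        return m_pixels;
    }

    int uploads() const { return m_uploads; }
    int readbacks() const { return m_readbacks; }
    int kernelRuns() const { return m_kernelRuns; }

private:
    std::vector<unsigned char> m_pixels;
    bool m_hostCopyValid = true;
    bool m_deviceCopyValid = false;
    int m_uploads = 0;
    int m_readbacks = 0;
    int m_kernelRuns = 0;
};
```

With this bookkeeping, three chained filters followed by a single readBack() cost one upload and one readback instead of three of each.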
Comment 38 Himal Ghimiray 2012-02-08 03:12:34 PST
Created attachment 126041 [details]
Updated OpenCL patch and performance figures.

Dear Czene,
I am attaching an updated patch for the OpenCL SVG filters. This patch basically fixes some bugs and does some minor cleanup.

I've also attached a PDF with performance figures, i.e. GPU vs CPU.
Cheers,
Himal
Comment 39 Himal Ghimiray 2012-02-08 03:21:17 PST
(In reply to comment #38)
> Created an attachment (id=126041) [details]
> Updated OpenCL patch and performance figures.
> Dear Czene,
> I am attaching an updated patch for OpenCL svg filters. This patch basically fixes some bugs and minor cleanup. 
> I've also attached a pdf with performance figures ie. GPU vs CPU.
> Cheers,
> Himal

Dear All,
In the performance figures document, the speed-up formula used is CPU timing / OpenCL timing (including data transfer). All the graphs are based on it.
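
Written out, the formula reads as follows. The split of the OpenCL timing into a kernel term and a transfer term is an assumption about how "including data transfer" is meant; the symbols are not from the attached document.

```latex
S = \frac{T_{\mathrm{CPU}}}{T_{\mathrm{OpenCL}}},
\qquad
T_{\mathrm{OpenCL}} = T_{\mathrm{kernel}} + T_{\mathrm{transfer}}
```

Under this reading, S > 1 means the OpenCL path is faster than the CPU path even after paying for the data transfer.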

Cheers,
Himal
Comment 40 Tamas Czene 2012-05-11 06:01:11 PDT
Created attachment 141390 [details]
Example implementation - OpenCL

The following patch is the OpenCL based implementation of the SVG filters. There are no performance measurements yet.
Comment 41 Tamas Czene 2012-09-12 08:34:57 PDT
Created attachment 163638 [details]
Example implementation - OpenCL