SourceAlpha, SourceGraphics and FETurbulence filter need to be implemented on OpenCL.
Created attachment 169600 [details] proposed patch This patch contains the OpenCL implementation of SourceAlpha, SourceGraphic, FETurbulence and a simple OpenCL environment. This is only included in Qt linux build. After applying this patch, SVG/Turbulence tests can be sped up greatly.
Attachment 169600 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCor..." exit_code: 1 Source/WebCore/platform/graphics/filters/OpenCL/OpenCLHandle.h:33: cl_mem is incorrectly named. Don't use underscores in your identifier names. [readability/naming/underscores] [4] Total errors found: 1 in 14 files If any of these errors are false positives, please file a bug against check-webkit-style.
Created attachment 169601 [details] svg files Here are some tests results: Before applying: Methanol Benchmark (version: 3%) svg/turbulence.svg: 42.386 (10.832363160749017%) svg/turbulence2.svg: 103.58 (4.411911339871702%) After applying: Methanol Benchmark (version: 3%) svg/turbulence.svg: 15.48 (8.185547318582563%) svg/turbulence2.svg: 27.668 (4.040529884401583%)
(In reply to comment #3) > Created an attachment (id=169601) [details] > svg files > > Here are some tests results: > Before applying: > > Methanol Benchmark (version: 3%) > > svg/turbulence.svg: 42.386 (10.832363160749017%) > > svg/turbulence2.svg: 103.58 (4.411911339871702%) > > After applying: > > Methanol Benchmark (version: 3%) > > svg/turbulence.svg: 15.48 (8.185547318582563%) > > svg/turbulence2.svg: 27.668 (4.040529884401583%) Well, without more details these data don't make sense. What HW do you have? Did it run on GPU or CPU? What do the numbers mean? Time in seconds? What does 3% mean, what is the relation? After all you need to discuss on the mailing list webkit-dev if there is an agreement to add it to WebKit. This is new functionality and must be maintained. It is less likely to get maintained when it just runs on one platform.
Comment on attachment 169600 [details] proposed patch Makes lots of svg tests fail.
(In reply to comment #5) > (From update of attachment 169600 [details]) > Makes lots of svg tests fail. You are right, last time when i checked it all test was succesfull. (In reply to comment #4) > (In reply to comment #3) > > Created an attachment (id=169601) [details] [details] > > svg files > > > > Here are some tests results: > > Before applying: > > > > Methanol Benchmark (version: 3%) > > > > svg/turbulence.svg: 42.386 (10.832363160749017%) > > > > svg/turbulence2.svg: 103.58 (4.411911339871702%) > > > > After applying: > > > > Methanol Benchmark (version: 3%) > > > > svg/turbulence.svg: 15.48 (8.185547318582563%) > > > > svg/turbulence2.svg: 27.668 (4.040529884401583%) > Well, without more details these data don't make sense. What HW do you have? Did it run on GPU or CPU? What do the numbers mean? Time in seconds? What does 3% mean, what is the relation? > I will create a more detailed test on GPU and CPU. > After all you need to discuss on the mailing list webkit-dev if there is an agreement to add it to WebKit. This is new functionality and must be maintained. It is less likely to get maintained when it just runs on one platform. I can build it only Qt, but it is easy to port to every platforms.
Created attachment 175437 [details] proposed patch
Created attachment 175439 [details] svg file Benchmark results: Qt Linux: GPU: NVIDIA GeForce GT 440 CUDA Cores 96 CPU: Intel(R) Core(TM) i5-2320 CPU @ 3.00GHz 4 Core Software: 3-4 fps OpenCL GPU: 15-17 fps Mac CPU: Intel(R) Xeon(R) X5670 CPU @ 2.93GHz version: OpenCL 1.1 Compute Units: 24 ATI Radeon HD 5770 version: OpenCL 1.1 Compute Units: 10 400 Processing Units Shader Clock: 650 MHz Software: 16-18 fps OpenCL CPU: 35-37 fps OpenCL GPU: 47-50 fps
Attachment 175437 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCor..." exit_code: 1 Source/WebCore/platform/graphics/gpu/opencl/OpenCLHandle.h:33: cl_mem is incorrectly named. Don't use underscores in your identifier names. [readability/naming/underscores] [4] Total errors found: 1 in 14 files If any of these errors are false positives, please file a bug against check-webkit-style.
Comment on attachment 175437 [details] proposed patch Attachment 175437 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/14908975 New failing tests: inspector-protocol/debugger-terminate-dedicated-worker-while-paused.html
We are waiting for comments, thoughts, etc. about the patch.
View in context: https://bugs.webkit.org/attachment.cgi?id=175437&action=review > Source/WebCore/Target.pri:4091 > + HEADERS += platform/graphics/gpu/opencl/OpenCLHandle.h > + HEADERS += platform/graphics/gpu/opencl/FilterContextOpenCL.h > + SOURCES += platform/graphics/gpu/opencl/FilterContextOpenCL.cpp > + SOURCES += platform/graphics/gpu/opencl/OpenCLFESourceAlpha.cpp > + SOURCES += platform/graphics/gpu/opencl/OpenCLFESourceGraphic.cpp > + SOURCES += platform/graphics/gpu/opencl/OpenCLFETurbulence.cpp += for all files looks a little overkill. You could use something like this (like in any other place): HEADERS += .../OpenCLHandle.h \ .../Filter... and same for the SOURCES
Comment on attachment 175437 [details] proposed patch Thanks for all comments, and thanks for the patch. r=me with a few comments: View in context: https://bugs.webkit.org/attachment.cgi?id=175437&action=review > Source/WebCore/ChangeLog:9 > + This patch contains the OpenCL implementation of SourceAlpha, SourceGraphic, FETurbulence and a simple OpenCL environment. > + This is only included in Qt linux build. Please mention here the speedup. And a little more verbose about the patch. > Source/WebCore/platform/graphics/gpu/opencl/FilterContextOpenCL.cpp:38 > + static bool initFailed = false; This name should int alreadyInitialized > Source/WebCore/platform/graphics/gpu/opencl/FilterContextOpenCL.cpp:52 > + size_t deviceBufferSize = -1; Here deviceBufferSize is unsigned... > Source/WebCore/platform/graphics/gpu/opencl/FilterContextOpenCL.cpp:67 > + if (deviceBufferSize <= 0) And you compare it with <= 0. I think you could simply set it to 0, and change this comparison to !deviceBufferSize. > Source/WebCore/platform/graphics/gpu/opencl/FilterContextOpenCL.cpp:70 > + devices = reinterpret_cast<cl_device_id *>(fastMalloc(deviceBufferSize)); cl_device_id* without space > Source/WebCore/platform/graphics/gpu/opencl/FilterContextOpenCL.cpp:120 > +void FilterContextOpenCL::openCLTransformColorSpace(OpenCLHandle& source, IntRect sourceSize, ColorSpace srcColorSpace, ColorSpace dstColorSpace) This is just a note, but perhaps you could try sometimes the performance of doing the computation in the kernel without passing an array. > Source/WebCore/platform/graphics/gpu/opencl/OpenCLHandle.h:30 > + OpenCLHandle() : m_openclMemory(0) { } openCLImage and m_openclMemory names are not consistent. Use openCL everywhere (there are some other identifiers where this renaming is needed).
Created attachment 175749 [details] proposed patch
Comment on attachment 175749 [details] proposed patch Rejecting attachment 175749 [details] from commit-queue. Failed to run "['/mnt/git/webkit-commit-queue/Tools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '-..." exit_code: 1 ERROR: /mnt/git/webkit-commit-queue/Source/WebCore/ChangeLog neither lists a valid reviewer nor contains the string "Unreviewed" or "Rubber stamp" (case insensitive). Full output: http://queues.webkit.org/results/14967646
View in context: https://bugs.webkit.org/attachment.cgi?id=175437&action=review > Source/WebCore/platform/graphics/gpu/opencl/FilterContextOpenCL.cpp:141 > + These look like they don't account for the possibility that the kernel will fail to build. If the build fails, you'll be setting arguments / trying to run a null kernel > Source/WebCore/platform/graphics/gpu/opencl/FilterContextOpenCL.h:32 > +#include "CL/cl.h" On OS X This is under <OpenCL/cl.h> > Source/WebCore/platform/graphics/gpu/opencl/FilterContextOpenCL.h:104 > + { > + clFinish(m_context->m_commandQueue); > + clEnqueueNDRangeKernel(m_context->m_commandQueue, m_kernel, 2, 0, m_globalSize, 0, 0, 0, 0); > + } These should really be checked. These are pretty likely to fail. I also don't approve of using the default group size since that tends to not perform well (although choosing an appropriate size is unfortunately somewhat difficult) > Source/WebCore/platform/graphics/gpu/opencl/OpenCLFETurbulence.cpp:58 > + return a + t * (b - a); return mad(b - a, t, a);? > Source/WebCore/platform/graphics/gpu/opencl/OpenCLFETurbulence.cpp:86 > + I suspect that a lot of your indexing calculations are candidates for mad24() if you know your sizes require < 24 bits. > Source/WebCore/platform/graphics/gpu/opencl/OpenCLFETurbulence.cpp:114 > + StitchData stitchData; > + stitchData.width = 0; > + stitchData.wrapX = 0; > + stitchData.height = 0; > + stitchData.wrapY = 0; How about StitchData stitchData = { 0, 0, 0, 0 };? > Source/WebCore/platform/graphics/gpu/opencl/OpenCLFETurbulence.cpp:126 > + if (baseFrequencyX / lowFrequency < highFrequency / baseFrequencyX) > + baseFrequencyX = lowFrequency; > + else > + baseFrequencyX = highFrequency; For conditional assignments like this it would probably be better to use ternary operator or select() function. > Source/WebCore/platform/graphics/gpu/opencl/OpenCLFETurbulence.cpp:135 > + if (baseFrequencyY / lowFrequency < highFrequency / baseFrequencyY) > + baseFrequencyY = lowFrequency; > + else > + baseFrequencyY = highFrequency; > + } Same. > Source/WebCore/platform/graphics/gpu/opencl/OpenCLFETurbulence.cpp:161 > + if (type == 1) { > + turbulenceFunctionResult.x += noise2D(redComponent, latticeSelector, stitchData, noiseVectorX, noiseVectorY, stitchTiles) / ratio; > + turbulenceFunctionResult.y += noise2D(greenComponent, latticeSelector, stitchData, noiseVectorX, noiseVectorY, stitchTiles) / ratio; > + turbulenceFunctionResult.z += noise2D(blueComponent, latticeSelector, stitchData, noiseVectorX, noiseVectorY, stitchTiles) / ratio; > + turbulenceFunctionResult.w += noise2D(alphaComponent, latticeSelector, stitchData, noiseVectorX, noiseVectorY, stitchTiles) / ratio; > + } else { > + turbulenceFunctionResult.x += fabs(noise2D(redComponent, latticeSelector, stitchData, noiseVectorX, noiseVectorY, stitchTiles)) / ratio; > + turbulenceFunctionResult.y += fabs(noise2D(greenComponent, latticeSelector, stitchData, noiseVectorX, noiseVectorY, stitchTiles)) / ratio; > + turbulenceFunctionResult.z += fabs(noise2D(blueComponent, latticeSelector, stitchData, noiseVectorX, noiseVectorY, stitchTiles)) / ratio; > + turbulenceFunctionResult.w += fabs(noise2D(alphaComponent, latticeSelector, stitchData, noiseVectorX, noiseVectorY, stitchTiles)) / ratio; > + } Move the common noise2D part ahead of the if, and then instead of the if do something like turbuenceFunctionResult += (type == 1) ? result : fabs(result) > Source/WebCore/platform/graphics/gpu/opencl/OpenCLFETurbulence.cpp:169 > + stitchData.wrapX = 2 * stitchData.wrapX - s_perlinNoise; How about stitchData.wrapX = mad(2, stitchData.wrapX, -s_perlineNoise)? > Source/WebCore/platform/graphics/gpu/opencl/OpenCLFETurbulence.cpp:176 > + turbulenceFunctionResult = turbulenceFunctionResult * 0.5f + 0.5f; mad(0.5f, turbulenceFunctionResult, 0.5f)? > Source/WebCore/platform/graphics/gpu/opencl/OpenCLFETurbulence.cpp:223 > + kernel.run(); Where do you wait for the kernel to complete? The only clFinish/clWaitForEvents I see is before a clEnqueueNDRangeKernel
Attachment 175749 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCor..." exit_code: 1 Source/WebCore/platform/graphics/gpu/opencl/OpenCLHandle.h:33: cl_mem is incorrectly named. Don't use underscores in your identifier names. [readability/naming/underscores] [4] Total errors found: 1 in 14 files If any of these errors are false positives, please file a bug against check-webkit-style.
You have some good ideas. > These look like they don't account for the possibility that the kernel will > fail to build. If the build fails, you'll be setting arguments / trying to > run a null kernel The source code of the kernel should be correct, so the only fail possibility here is the out-of-resource allocation, which could be handled by a CRASH() macro. > > Source/WebCore/platform/graphics/gpu/opencl/FilterContextOpenCL.h:32 > > +#include "CL/cl.h" > > On OS X This is under <OpenCL/cl.h> Mac port comes later. > > > Source/WebCore/platform/graphics/gpu/opencl/FilterContextOpenCL.h:104 > > + { > > + clFinish(m_context->m_commandQueue); > > + clEnqueueNDRangeKernel(m_context->m_commandQueue, m_kernel, 2, 0, m_globalSize, 0, 0, 0, 0); > > + } > > These should really be checked. These are pretty likely to fail. Why? > Where do you wait for the kernel to complete? The only clFinish/clWaitForEvents I see is before a clEnqueueNDRangeKernel Opencl supports async execution, so we don't need to wait here.
> > Source/WebCore/platform/graphics/gpu/opencl/OpenCLFETurbulence.cpp:114 > > + StitchData stitchData; > > + stitchData.width = 0; > > + stitchData.wrapX = 0; > > + stitchData.height = 0; > > + stitchData.wrapY = 0; > > How about StitchData stitchData = { 0, 0, 0, 0 };? C++ preprocessor will complain abut it, since comma is also an argument separator for the preprocessor.
(In reply to comment #18) > You have some good ideas. > > > These look like they don't account for the possibility that the kernel will > fail to build. If the build fails, you'll be setting arguments / trying to > run a null kernel > > The source code of the kernel should be correct, so the only fail possibility here is the out-of-resource allocation, which could be handled by a CRASH() macro. The kernel might be correct, but the compiler might not. My experience with OpenCL is taking one kernel that builds on one implementation won't work on another. I wouldn't put so much faith in the compiler. > > > > > Source/WebCore/platform/graphics/gpu/opencl/FilterContextOpenCL.h:104 > > > + { > > > + clFinish(m_context->m_commandQueue); > > > + clEnqueueNDRangeKernel(m_context->m_commandQueue, m_kernel, 2, 0, m_globalSize, 0, 0, 0, 0); > > > + } > > > > These should really be checked. These are pretty likely to fail. > > Why? Such is the way with OpenCL. On the Nvidia platform in particular most problems end up manifesting themselves as an error from clFinish. > > > Where do you wait for the kernel to complete? The only clFinish/clWaitForEvents I see is before a clEnqueueNDRangeKernel > > Opencl supports async execution, so we don't need to wait here. You do when you'll need the result (which I guess I'm missing)
(In reply to comment #19) > > > Source/WebCore/platform/graphics/gpu/opencl/OpenCLFETurbulence.cpp:114 > > > + StitchData stitchData; > > > + stitchData.width = 0; > > > + stitchData.wrapX = 0; > > > + stitchData.height = 0; > > > + stitchData.wrapY = 0; > > > > How about StitchData stitchData = { 0, 0, 0, 0 };? > > C++ preprocessor will complain abut it, since comma is also an argument separator for the preprocessor. If your stringify macro is #define STRINGIFY(...) # __VA_ARGS__ it should work
(In reply to comment #20) > > Why? > Another reason is buffer allocation is usually delayed until first use. e.g. Create output buffer, bind to kernel, and then the memory resource allocation will fail on the enqueuendrange
> The kernel might be correct, but the compiler might not. My experience with > OpenCL is taking one kernel that builds on one implementation won't work on > another. I wouldn't put so much faith in the compiler. Any example for that? We does not use anything which is not specified in the standard. Btw this patch contains a feature, which should be tested before enabled, so in general the only thing which could happen here is the out-of-resource condition. > Such is the way with OpenCL. On the Nvidia platform in particular most problems end up manifesting themselves as an error from clFinish. In general WK does not have a mechanism to detect out-of-resource conditions. It calls CRASH(), or just simply crashes. > > > Where do you wait for the kernel to complete? The only clFinish/clWaitForEvents I see is before a clEnqueueNDRangeKernel > > > > Opencl supports async execution, so we don't need to wait here. > > You do when you'll need the result (which I guess I'm missing) That is performed by clEnqueueReadImage() (and ideally only happens for the last image). The standard says: All commands that use this image object have finished execution before the read command begins execution. The general idea is that we just enque OpenCL commands, and let WK and OCL do their work, and when the result is needed we read it. Up to that point they can run parallelly.
> > > Why? > > > > Another reason is buffer allocation is usually delayed until first use. e.g. Create output buffer, bind to kernel, and then the memory resource allocation will fail on the enqueuendrange Ok so this is another out-of-resource condition.
(In reply to comment #16) > View in context: https://bugs.webkit.org/attachment.cgi?id=175437&action=review > > Source/WebCore/platform/graphics/gpu/opencl/OpenCLFETurbulence.cpp:169 > > + stitchData.wrapX = 2 * stitchData.wrapX - s_perlinNoise; > > How about stitchData.wrapX = mad(2, stitchData.wrapX, -s_perlineNoise)? The suggested code didn't work, probably because stitchData.wrapX is type of int.
Created attachment 175808 [details] proposed patch
Attachment 175808 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCor..." exit_code: 1 Source/WebCore/platform/graphics/gpu/opencl/OpenCLHandle.h:33: cl_mem is incorrectly named. Don't use underscores in your identifier names. [readability/naming/underscores] [4] Total errors found: 1 in 14 files If any of these errors are false positives, please file a bug against check-webkit-style.
Created attachment 175967 [details] proposed patch
Attachment 175967 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCor..." exit_code: 1 Source/WebCore/platform/graphics/gpu/opencl/OpenCLHandle.h:33: cl_mem is incorrectly named. Don't use underscores in your identifier names. [readability/naming/underscores] [4] Total errors found: 1 in 14 files If any of these errors are false positives, please file a bug against check-webkit-style.
Created attachment 175996 [details] proposed patch
Attachment 175996 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCor..." exit_code: 1 Source/WebCore/platform/graphics/gpu/opencl/OpenCLHandle.h:33: cl_mem is incorrectly named. Don't use underscores in your identifier names. [readability/naming/underscores] [4] Total errors found: 1 in 14 files If any of these errors are false positives, please file a bug against check-webkit-style.
Created attachment 176005 [details] proposed patch
Attachment 176005 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCor..." exit_code: 1 Source/WebCore/platform/graphics/gpu/opencl/OpenCLHandle.h:33: cl_mem is incorrectly named. Don't use underscores in your identifier names. [readability/naming/underscores] [4] Total errors found: 1 in 14 files If any of these errors are false positives, please file a bug against check-webkit-style.
Comment on attachment 176005 [details] proposed patch Thanks for all comments. I think this patch is ready now. r=me.
Comment on attachment 176005 [details] proposed patch Clearing flags on attachment: 176005 Committed r135733: <http://trac.webkit.org/changeset/135733>
All reviewed patches have been landed. Closing bug.
Would you add OPENCL flag to https://trac.webkit.org/wiki/FeatureFlags please?
(In reply to comment #37) > Would you add OPENCL flag to https://trac.webkit.org/wiki/FeatureFlags please? Thank you for reminding me. I also noticed that some other flags are missing, I will update the text on tomorrow morning.