WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
Bug 43903
Speeding up SVG filters with multicore (SMP) support
https://bugs.webkit.org/show_bug.cgi?id=43903
Summary
Speeding up SVG filters with multicore (SMP) support
Zoltan Herczeg
Reported
2010-08-12 02:46:50 PDT
This patch is just show new ways of exploiting multicore cpus in WebKit. Filters usually requires a huge amount of CPU work, especially for animated SVGs. Any feedback would be welcome.
Attachments
parallel svg light filter
(9.38 KB, patch)
2010-08-12 02:49 PDT
,
Zoltan Herczeg
no flags
Details
Formatted Diff
Diff
updated patch
(17.22 KB, patch)
2010-11-30 05:07 PST
,
Zoltan Herczeg
no flags
Details
Formatted Diff
Diff
Speeding up FETurbulence with SMP
(23.58 KB, patch)
2011-02-22 07:06 PST
,
Gabor Loki
ossy
: review-
Details
Formatted Diff
Diff
A big SVG to test turbulence filter
(625 bytes, image/svg+xml)
2011-03-02 03:44 PST
,
Gabor Loki
no flags
Details
A simple test case to benchmark the SMP-based turbulence filter
(2.54 KB, image/svg+xml)
2011-03-02 04:24 PST
,
Gabor Loki
no flags
Details
Measurement result with Methanol
(1.12 KB, text/plain)
2011-03-09 07:41 PST
,
Gabor Loki
no flags
Details
Speeding up FETurbulence with SMP
(30.44 KB, patch)
2011-04-07 01:06 PDT
,
Gabor Loki
no flags
Details
Formatted Diff
Diff
Speeding up FETurbulence with SMP
(30.46 KB, patch)
2011-04-07 05:19 PDT
,
Gabor Loki
no flags
Details
Formatted Diff
Diff
Speeding up FETurbulence with SMP
(41.75 KB, patch)
2011-04-12 00:41 PDT
,
Gabor Loki
no flags
Details
Formatted Diff
Diff
Speeding up FETurbulence with SMP
(42.72 KB, patch)
2011-04-12 06:26 PDT
,
Gabor Loki
no flags
Details
Formatted Diff
Diff
Speeding up FETurbulence with SMP
(42.15 KB, patch)
2011-04-26 01:15 PDT
,
Gabor Loki
no flags
Details
Formatted Diff
Diff
Show Obsolete
(7)
View All
Add attachment
proposed patch, testcase, etc.
Zoltan Herczeg
Comment 1
2010-08-12 02:49:59 PDT
Created
attachment 64198
[details]
parallel svg light filter Light filter is easy to split up for multiple CPUs. The initial plan is to add one helper thread, and two functions (execute / waitForCompletition) to the Filter base class.
Zoltan Herczeg
Comment 2
2010-11-30 05:07:48 PST
Created
attachment 75125
[details]
updated patch
Nikolas Zimmermann
Comment 3
2010-11-30 05:48:36 PST
Comment on
attachment 75125
[details]
updated patch View in context:
https://bugs.webkit.org/attachment.cgi?id=75125&action=review
The idea looks great, the video you showed is also very convincing! :-) Some early review comments:
> WebCore/platform/graphics/filters/FELighting.cpp:240 > + data.interior(offset, normalVector);
Hm, not related to this patch, but 'interior' is a bad name, what does it do?
> WebCore/platform/graphics/filters/FELighting.cpp:246 > +#ifdef FILTER_THREAD
This should use the USE(FILTER_THREADS) or ENABLE(FILTER_THREADS) logic. (not sure which one to use, need to read Maciejs mail again regarding that topic, when to use USE or ENABLE, but I guess USE is approriate here).
> WebCore/platform/graphics/filters/FELighting.cpp:249 > + PaintThreadData& data = *reinterpret_cast<PaintThreadData*>(parameter);
How about adding ASSERT(parameter) as first argument.
> WebCore/platform/graphics/filters/FELighting.cpp:280 > + if (width >= 3 && height >= 10 && width * height > 4000) {
I'd like a comment here, that this is emperical data.
> WebCore/platform/graphics/filters/FELighting.h:79 > + {
An ASSERT(filter) can't hurt. It looks awkward to use filter(filter) and lightingData(lightingData) - is that guarenteed to work? I was not aware that this is possible!
> WebCore/platform/graphics/filters/FETurbulence.cpp:335 > +#ifdef FILTER_THREAD
Use USE(FILTER_THREADS).
> WebCore/platform/graphics/filters/FETurbulence.cpp:338 > + PaintThreadData& data = *reinterpret_cast<PaintThreadData*>(parameter);
Maybe ASSERT(parameter) first.
> WebCore/platform/graphics/filters/FETurbulence.cpp:360 > +#ifdef FILTER_THREAD
Same comment, use USE(FILTER_THREADS), won't repeat it for the other occourences.
> WebCore/platform/graphics/filters/FETurbulence.cpp:362 > + if (imageRect.height() >= 6 && imageRect.width() * imageRect.height() > 4000) {
Emperical data comment...
> WebCore/platform/graphics/filters/FETurbulence.h:95 > + {
ASSERT(filter)
> WebCore/platform/graphics/filters/FilterEffect.cpp:33 > +#ifdef FILTER_THREAD > +ThreadIdentifier FilterEffect::s_threadID; > +FilterEffect::ThreadShared* FilterEffect::s_threadShared; > +bool FilterEffect::s_stopThread; > +#endif
We generally disallow globals, so how about putting these in functions? static ThreadIdentifier threadIdentifier() { static ThreadIdentifier s_threadID; return s_threadID; } Are you avoiding the use of DEFINE_STATIC_LOCAL on purpose? I have never used threads in WebCore so far, forgive my ignorance if I'm overlooking something obvious.
> WebCore/platform/graphics/filters/FilterEffect.cpp:141 > + s_threadShared->m_cond.wait(s_threadShared->m_mutex);
Is this prefereedd style when using threads? Are shall we expose accesors for m_cond, m_start, m_finish, m_function, m_mutex etc.?
Gabor Loki
Comment 4
2011-02-22 07:06:48 PST
Created
attachment 83307
[details]
Speeding up FETurbulence with SMP I have redesigned the previous concept of threading from Zoltan's suggestion and introduced a framework for parallel tasks/jobs. The ParallelJobs framework is based on WebKit's threading infrastructure and/or OpenMP's API. The usage of ParallelJobs is the following: 1.- Create ParallelJobs object - which allocates the worker threads and necessary objects. 2.- Fill the parameter array - allocated by ParallelJobs, assign different sub-tasks for the workers 3.- Execute the workers 4.- Combine the necessary information [optional] This patch contains: 1.- ParallelJobs framework 2.- SMP version of FETurbulence One more note: I have measured 60%-80% performance progression on dual-core systems on ARM and x86 as well.
Benjamin Poulain
Comment 5
2011-02-22 07:34:42 PST
May I suggest to add benchmarks (even if only Qt benchmarks) to see how does that fly? I am also not a fan of OpenMP for having had some strange performance numbers out of it (like x1.1 boost where direct threading with pthread gave me x1.9). If we want to use more SMP inside WebKit, I would really prefer something more like TBB, because you have a lot more control over the way things are parallelized.
Build Bot
Comment 6
2011-02-22 07:49:08 PST
Attachment 83307
[details]
did not build on win: Build output:
http://queues.webkit.org/results/7946220
WebKit Review Bot
Comment 7
2011-02-22 08:36:41 PST
Attachment 83307
[details]
did not build on mac: Build output:
http://queues.webkit.org/results/7946234
Gabor Loki
Comment 8
2011-02-22 08:42:56 PST
> May I suggest to add benchmarks (even if only Qt benchmarks) to see how does that fly?
Where to add benchmarks? If you send me some pointers I will happily measure the effect of this patch, but I have to mention that in general the benchmarks mainly give an overall insight about the performance. So, if you really would like to measure the effect of this kind of parallelization, you should create a basic svg with turbulence filter. For example: you can use the Source/WebCore/manual-tests/svg-filter-animation.svg as a simple benchmark after all non-related filters are removed from the SVG.
> I would really prefer something more like TBB, because you have a lot more control over the way things are parallelized.
Good idea, but I feel this could be another patch where we will introduce TBB for another threading model of the WebKit's threading interface.
Benjamin Poulain
Comment 9
2011-02-22 08:58:43 PST
(In reply to
comment #8
)
> > May I suggest to add benchmarks (even if only Qt benchmarks) to see how does that fly? > > Where to add benchmarks? If you send me some pointers I will happily measure the effect of this patch, but I have to mention that in general the benchmarks mainly give an overall insight about the performance. So, if you really would like to measure the effect of this kind of parallelization, you should create a basic svg with turbulence filter. For example: you can use the Source/WebCore/manual-tests/svg-filter-animation.svg as a simple benchmark after all non-related filters are removed from the SVG.
The benchmarks for Qt are in Source/WebKit/qt/tests/benchmarks/ This is really a performance optimization. You don't need SMP to make the code cleaner/simpler, quite the contrary. So I think the performance gain is very relevant for this task. If I understant correctly, you are doing research. So getting accurate numbers is probably interesting for your papers as well. When we considered parallelization for Qt's CPU rasterization, we looked for the boundaries of input for which SMP is helpful VS is harmful. I think this kind of analysis in both interesting for the WebKit project, and for research. (and note that we ended up not using parallelization for most primitives of Qt because it was more harmful than helpful for general rendering).
Gabor Loki
Comment 10
2011-02-24 00:19:50 PST
> If I understant correctly, you are doing research. So getting accurate numbers is probably interesting for your papers as well.
I would be the most happiest people if this topic was a research.
> When we considered parallelization for Qt's CPU rasterization, we looked for the boundaries of input for which SMP is helpful VS is harmful. I think this kind of analysis in both interesting for the WebKit project, and for research. (and note that we ended up not using parallelization for most primitives of Qt because it was more harmful than helpful for general rendering).
I agree with you about the importance of boundaries. That's why I left the s_minimalRectDimension for platforms. All other code depends on the number of requested threads. So, if the platform/OS/architecture is able to decide its boundaries this will be an easy task to implement their needs. Of course lots of benchmarking are needed to achieve this, but for introducing this parallelization approach the current limit is fine. Fine tuning of parallel turbulence filter should be done later with the help of benchmarks. Anyway I will check Qt's benchmark environment. Is the time measuring method of this environment based on clock_t or wall-time (for example: clock_gettime)?
Benjamin Poulain
Comment 11
2011-02-24 02:07:14 PST
(In reply to
comment #10
)
> Anyway I will check Qt's benchmark environment. Is the time measuring method of this environment based on clock_t or wall-time (for example: clock_gettime)?
You choose the kind of timing with a parameter when running the benchmark:
http://doc.qt.nokia.com/latest/qtestlib-manual.html#creating-a-benchmark
Eric Seidel (no email)
Comment 12
2011-02-24 02:20:48 PST
crazy.
Eric Seidel (no email)
Comment 13
2011-02-24 02:21:43 PST
If we're going to use this parallel jobs thing widely, it's eventually going to need a GCG implementation:
http://developer.apple.com/technologies/mac/snowleopard/gcd.html
Simon Fraser (smfr)
Comment 14
2011-02-24 22:17:21 PST
Comment on
attachment 83307
[details]
Speeding up FETurbulence with SMP View in context:
https://bugs.webkit.org/attachment.cgi?id=83307&action=review
> Source/JavaScriptCore/ChangeLog:10 > + too complex. Using the power of Simmetric Multi Processing (SMP) we
symmetric
Simon Fraser (smfr)
Comment 15
2011-02-24 22:17:55 PST
I think it would be better to try to speed things up with SSE before we go to multithreading.
Zoltan Herczeg
Comment 16
2011-02-24 23:13:07 PST
I think some template magic could make this more user frendly. We could remove some static_casts and sizeof operations in that way.
Zoltan Herczeg
Comment 17
2011-02-24 23:15:19 PST
(In reply to
comment #15
)
> I think it would be better to try to speed things up with SSE before we go to multithreading.
See
https://bugs.webkit.org/show_bug.cgi?id=54456
(ARM Neon is similar to x86 SSE). Could you help me reviewing that?
Gabor Loki
Comment 18
2011-02-25 01:06:18 PST
(In reply to
comment #15
)
> I think it would be better to try to speed things up with SSE before we go to multithreading.
The SSE and NEON are very cool features which should be used in WebKit directly or indirectly, but they are architecture dependent features. We can use them to speed up something directly, like Zoltan's patch for FELighting, or design a common framework (something like JSC's macro assembler) which drops many performance opportunities because of the high level abstraction - like in JSC. Anyhow, we should consider SIMD extensions, but the scope of SIMD and SMP are different. Any of them can live without the other, and can work with the other.
Balazs Kelemen
Comment 19
2011-03-01 01:27:07 PST
(In reply to
comment #11
)
> (In reply to
comment #10
) > > Anyway I will check Qt's benchmark environment. Is the time measuring method of this environment based on clock_t or wall-time (for example: clock_gettime)? > > You choose the kind of timing with a parameter when running the benchmark:
http://doc.qt.nokia.com/latest/qtestlib-manual.html#creating-a-benchmark
You can find another version of the benchmarks on gitorious:
http://gitorious.org/qtwebkit/performance
. In this repo the benchmarks based on wall-time with higher accuracy.
Gabor Loki
Comment 20
2011-03-02 00:55:30 PST
> You can find another version of the benchmarks on gitorious:
http://gitorious.org/qtwebkit/performance
.
It looks like this performance tool has some issue running as a benchmark. I used several tools from this project, but they provided strange values to me. I create a simple svg with only one filter (on a BIG area). It takes about 5 sec to compute the filter with QtWebKit. And here is a sample result of cycler test: ********* Start testing of tst_Cycler ********* Config: Using QTest library 4.7.0, Qt 4.7.0 PASS : tst_Cycler::initTestCase() PASS : tst_Cycler::load() PASS : tst_Cycler::cleanupTestCase() Totals: 3 passed, 0 failed, 0 skipped ********* Finished testing of tst_Cycler ********* Printing result for: total benchmark: cycler::load
http://127.0.0.1/~smp/svg/turbulence.svg
mean: 3 msecs +/- 3239 msecs, +/- 107967 % avg: 1685 msecs 5741 3 3 3 2 2 2 5464 5636 3 Total mean: 3 msecs per test, avg: 1685 And another one: mean: 3 msecs +/- 3850 msecs, +/- 128333 % avg: 2312 msecs 5875 5682 5875 5679 3 2 2 2 2 2 Btw, using WebKit's threading, the results: mean: 2 msecs +/- 1588 msecs, +/- 79400 % avg: 676 msecs 3408 3336 3 3 2 2 2 2 2 2 I think there is some kind of caching issue. I feel this environment cannot be used as a benchmark now. Benjamin, do you have any idea what is happening here? Why did the environment produce this strange numbers? Btw, the derivation of the valid numbers are also too high.
Benjamin Poulain
Comment 21
2011-03-02 03:00:57 PST
> Benjamin, do you have any idea what is happening here? Why did the environment produce this strange numbers?
Nope, I have no idea what is going on here. The deviation is just insane, it is usually around 1%. But I would not use that particular benchmark. I would use tst_Painting::fullPagePaint(). Because what you really care about is the rendering speed. The cycler will also take into account loading and layout time.
Gabor Loki
Comment 22
2011-03-02 03:44:45 PST
Created
attachment 84394
[details]
A big SVG to test turbulence filter
Gabor Loki
Comment 23
2011-03-02 03:47:12 PST
> I would use tst_Painting::fullPagePaint(). Because what you really care about is the rendering speed.
In this test the calculation of SVG is not included. ********* Finished testing of tst_Painting ********* Printing result for: total benchmark: painting::paint
http://127.0.0.1/~smp/svg/turbulence.svg
mean: 46 msecs +/- 0 msecs, +/- 0 % avg: 46 msecs 47 47 47 46 46 46 46 46 46 46 benchmark: painting::fullPagePaint
http://127.0.0.1/~smp/svg/turbulence.svg
mean: 58 msecs +/- 0 msecs, +/- 0 % avg: 58 msecs 58 58 58 58 58 58 58 58 59 58 Total mean: 52 msecs per test, avg: 52
Gabor Loki
Comment 24
2011-03-02 04:24:18 PST
Created
attachment 84399
[details]
A simple test case to benchmark the SMP-based turbulence filter This test case is able to show the effect of SMP-based turbulence filter. I know the JS is also taken account in this test, but its effect for the execution time is not significant (according to filter calculation). It is not an ultimate benchmark, it is just a simple test. If anyone has a good test or a benchmark for SVG, please let me know!
Balazs Kelemen
Comment 25
2011-03-02 04:40:51 PST
(In reply to
comment #21
)
> > Benjamin, do you have any idea what is happening here? Why did the environment produce this strange numbers? > > Nope, I have no idea what is going on here. The deviation is just insane, it is usually around 1%. >
Not belongs strictly to this bug (should we create another?) but I have also tried those tests (the gitorious version) and the deviation was not acceptable. I guess the trunk version also suffering from this in wall-time mode. I do not think this is only a problem with SVG related pages but it is caused by the event based approach of our benchmark framework. The really good thing in this framework is that one can measure the performance of specific subsystems like layouting or painting but we cannot use it to validate optimizations as long as we cannot make it producing stable results.
Gabor Loki
Comment 26
2011-03-09 07:41:32 PST
Created
attachment 85174
[details]
Measurement result with Methanol I have measured the performance progression of this SMP framework with our page loading benchmark (called Methanol:
http://gitorious.org/methanol
). Two measurement scenarios were used to validate the framework (an artificial test-suite and a collection of real websites). Scenario 1: - Real websites: - Contains the first 25 unique websites from Alexa Top 500. - Does not contain any SVG. - Result - Measured on x86 and on ARM - It is not surprising that there is no difference between the reference and the two SMP-based approaches. Scenario 2: - Artificial test-suite: - Methanol contains a small test-suite for SVG. Some test cases contain only one SVG filter with simple shapes, others are using combined filters. - Result - Measured on x86 and ARM - Methanol reported 30% - 80% performance progression on turbulence-related tests. Others were not changed. See the attached file. - An important though - The 'turbSmall' test contains several small elements. The turbulence filter is called on these elements independently. The drawing area is for each elements is 100x100 (which is under the initial SMP-limit for turbulence). You can see the performance numbers were the same for each measurements. While the 'turbBig' test performs better with SMP (in this case the elements' size are 400x400). If someone has another SVG-related benchmark, I would be happy to measure the SMP framework with it.
Gabor Loki
Comment 27
2011-03-28 23:53:27 PDT
> Created an attachment (id=83307) [details] > Speeding up FETurbulence with SMP > ... > - Methanol reported 30% - 80% performance progression on turbulence-related tests. Others were not changed. See the attached file.
Any other thoughts or suggestions?
Csaba Osztrogonác
Comment 28
2011-04-04 08:51:22 PDT
Comment on
attachment 83307
[details]
Speeding up FETurbulence with SMP View in context:
https://bugs.webkit.org/attachment.cgi?id=83307&action=review
On the whole looks good to me, but please fix the following things: - I agree with Zoltan (
https://bugs.webkit.org/show_bug.cgi?id=43903#c16
), we need template for parameter passing to avoid ugly reinterpret_casts and using sizof. - Please fix the typo detected by smfr. (s/Simmetric/Symmetric) - And what about the other build systems? You should add the new files for all build system. ( Or at least disable PARALLEL_JOBS if !PLATFORM(QT) ) For example it doesn't build on Windows because of missing include path. r- for the present due to my comments above and below.
> Source/JavaScriptCore/wtf/ParallelJobs.cpp:135 > + ASSERT(m_numberOfJobs > 0);
m_numberOfJobs is always greater than 0 if !ENABLE(OPENMP)
> Source/JavaScriptCore/wtf/ParallelJobs.h:44 > +// ParallelJobs parallelJobs(&worker, sizeof(Parameters));
s/worker/workerFunction
> Source/JavaScriptCore/wtf/Platform.h:1162 > +#if !defined(ENABLE_PARALLEL_JOBS) && ENABLE(OPENMP) > +#define ENABLE_PARALLEL_JOBS 1 > +#endif
We shouldn't enable parallel jobs if ENABLE(SINGLE_THREADED) is true, so we need an additional !ENABLE(SINGLE_THREADED) guard.
Gabor Loki
Comment 29
2011-04-05 06:26:21 PDT
> On the whole looks good to me, but please fix the following things: > > - I agree with Zoltan (
https://bugs.webkit.org/show_bug.cgi?id=43903#c16
), we need > template for parameter passing to avoid ugly reinterpret_casts and using sizof.
Well, it seems an easy task, but the ParallelJobs has a static s_threadPool member in case of using WebKit's threading. If ParallelJobs is changed to template<typename Type>ParallelJobs, we will end up in more s_threadPool members (one for each different template type). This is not what we want, we should have only one thread pool. A possible solution is to create a base class (for example ParallelEnvironment) which contains the s_threadPool member and the related ThreadPrivate class. I am still not convinced which solution is better - a template class with a base pool, or passing the size of the parameter.
Zoltan Herczeg
Comment 30
2011-04-05 06:31:58 PDT
> I am still not convinced which solution is better - a template class with a base pool, or passing the size of the parameter.
I vote for the base class. You have to do it only once. The other one is ugly.
Alexey Proskuryakov
Comment 31
2011-04-05 08:50:46 PDT
Please consider that inheritance should only be used to express "is a" relationships, and never to share code or data.
Gabor Loki
Comment 32
2011-04-07 01:06:10 PDT
Created
attachment 88589
[details]
Speeding up FETurbulence with SMP Well, I redesigned the patch using template (and fixed the other issues as well). The difference of the usage of ParallelJobs looks very small, but the framework itself is getting more complex.
Gabor Loki
Comment 33
2011-04-07 05:19:11 PDT
Created
attachment 88609
[details]
Speeding up FETurbulence with SMP Speculative fix for vcproj file
Alexey Proskuryakov
Comment 34
2011-04-07 08:58:14 PDT
+ programming. The framework is based on WebKit's threading infrastructure + and Open Multi-Processing's (OpenMP) API. While this is certainly an interesting experiment and an extremely promising direction, I'm not sure that adding an OpenMP-inspired threading code to WebKit is a good idea. The code doesn't fit with the rest of WebKit, and it doesn't fit well with some target platforms (e.g. on OS X, one should use libdispatch for parallel jobs).
Zoltan Herczeg
Comment 35
2011-04-07 10:25:45 PDT
> The code doesn't fit with the rest of WebKit, and it doesn't fit well with some target platforms (e.g. on OS X, one should use libdispatch for parallel jobs).
As far as I see the central question of the patch is the generic WebKit interface for parallel jobs. The technical details (which platfom how implements them) is secondary. The header contains a simple, portable interface, which has 3 steps: - request threads, passing the optimal number of threads - system returns with the number of allocated threads (always <= optimal, but >= l), - fill the thread local data - execute the jobs, and wait for finish Alexey, would you do that a different way? Again, the purpose of this work seems to find a simple, yet powerful parallel processing interface. The openmp thing is just an implementation, nothing more.
Alexey Proskuryakov
Comment 36
2011-04-07 11:11:39 PDT
Libdispatch is different in recognizing that no single application knows what happens with the rest of the system, so no application can know how many concurrent threads to use. Instead, you supply small tasks, and it makes sure to optimize parallelism globally and dynamically. Since it's open source, you can use it on other operating systems, although the benefits are smaller without kernel support.
Zoltan Herczeg
Comment 37
2011-04-07 11:21:15 PDT
(In reply to
comment #36
)
> Libdispatch is different in recognizing that no single application knows what happens with the rest of the system, so no application can know how many concurrent threads to use. Instead, you supply small tasks, and it makes sure to optimize parallelism globally and dynamically. > > Since it's open source, you can use it on other operating systems, although the benefits are smaller without kernel support.
This is good news, it fits to the current approach. Basically we would return the optimal number of threads and let the system execute them in any order.
Alexey Proskuryakov
Comment 38
2011-04-07 11:36:06 PDT
What I'm saying is that libdispatch changes the number of threads dynamically. It's not determined at the time tasks are scheduled.
Zoltan Herczeg
Comment 39
2011-04-07 11:52:08 PDT
(In reply to
comment #38
)
> What I'm saying is that libdispatch changes the number of threads dynamically. It's not determined at the time tasks are scheduled.
I understand that, still I don't see a problem here. Basically even with libdispatch, you need to create those small tasks, isn't it? And their number is fixed when you created them, and they were executed somehow (doesn't matter how).
http://developer.apple.com/library/mac/#documentation/General/Conceptual/ConcurrencyProgrammingGuide/ConcurrencyandApplicationDesign/ConcurrencyandApplicationDesign.html#//apple_ref/doc/uid/TP40008091-CH100-SW1
Even with libdispatch, you determine the number of sub-tasks (since you create them), and their number is never infinte. Thus, the ParrallelJobs returns the number you passed as maxThreads, you create those sub-tasks, and libdispatch execute them.
Alexey Proskuryakov
Comment 40
2011-04-07 12:14:33 PDT
So, are you saying that we'd just pass INT_MAX as optimal number of threads? Even if that works, it sounds like an abuse. I think that using parallelism more would be a good topic for WebKit contributor meeting discussion this month.
Zoltan Herczeg
Comment 41
2011-04-07 12:48:32 PDT
(In reply to
comment #40
)
> So, are you saying that we'd just pass INT_MAX as optimal number of threads? Even if that works, it sounds like an abuse.
hmmmm, not exactly. I would rather say that the optimal number would depend on the input, like we want to start a job for each 5000 pixels (don't think every pixel should have an own thread...), so the optimal would be round_up(width * height / 5000)
> I think that using parallelism more would be a good topic for WebKit contributor meeting discussion this month.
Absolutely true. Unfortunately we can't be there in person. I hope we could follow you somehow, like mails, or something...
Nikolas Zimmermann
Comment 42
2011-04-08 23:51:08 PDT
(In reply to
comment #35
)
> > The code doesn't fit with the rest of WebKit, and it doesn't fit well with some target platforms (e.g. on OS X, one should use libdispatch for parallel jobs). > > As far as I see the central question of the patch is the generic WebKit interface for parallel jobs. The technical details (which platfom how implements them) is secondary. The header contains a simple, portable interface, which has 3 steps: > - request threads, passing the optimal number of threads > - system returns with the number of allocated threads (always <= optimal, but >= l), > - fill the thread local data > - execute the jobs, and wait for finish > > Alexey, would you do that a different way? Again, the purpose of this work seems to find a simple, yet powerful parallel processing interface. The openmp thing is just an implementation, nothing more.
Can we maybe split up everything openmp related into a different header, to really make it a detail of the implementation. I think the patch is already great as-is, though I'm absolutely no expert in this area, and like the hear from others how to go on :-)
Zoltan Herczeg
Comment 43
2011-04-09 00:27:42 PDT
> Can we maybe split up everything openmp related into a different header, to really make it a detail of the implementation.
The patch needs to be redesigned :(, because the ^{ block } construction required by libdispatch combined with templates results an GCC internal compiler error. Perhaps someone should report this to Apple as it seems ^{ block } is a special extension introduced by them.
Gabor Loki
Comment 44
2011-04-09 02:39:08 PDT
It looks like everything is discussed without me. ;) That's great! First of all the ParallelJobs framework should look like the Threading one. So splitting it into different headers is a good idea. About OpenMP, libdispatch and other parallel processing API. Every platform has its own favorite parallel processing method/API. I have only introduced one extra (OpenMP) beside the generic WebKit's threading infrastructure. The libdispatch is also a very good feature, but I am not familiar with it. So, It would be nice if someone helps in this area. About the framework. The WebKit's ParallelJobs framework should be simple and powerful for each parallel processing method/API. It is simple if the developers think it is ;) . It is powerful if it is as close to the platform dependent parallel processing API as possible. My solution is close to satisfy those two requirement. Well, I do not think I invented a brand new method for parallel processing. The basics are the same as you can see in any API. - Allocate / request some threads (depending on the input) - Fill the threads' local data - Execute the threads and wait for finish This is a very basic method using parallel processing and every API provides a way to execute tasks based on the above topics. As I see libdispatch also supports this way (
http://libdispatch.macosforge.org/trac/wiki/tutorial#Puttingworkinthepipeline:Queues
) About the technical details. Using template makes a big mess. Especially when I try to add libdispatch support. As Zoltan said libdispatch is not supported in a template. The GCC 4.2 fails with internal compiler error on Mac. So, I have to redesign the whole framework. - First of all, I would like to create separate headers for each parallel processing method/API and a common one which will contain the public API of WebKit's ParallelJobs framework (thanks to Nikolas). - The other important thing should be the base class (so-called ParallelEnviroment). It will be something like in my first patch, will contain a constructor for initialization, functions to be able to fill the local data and an execute function. - The third part should be the template class. The purpose of this class is to make the framework be as simple as possible. Any more thoughts?
Geoffrey Garen
Comment 45
2011-04-11 11:22:09 PDT
Is it really a good idea to add the complexity of threading to the mix when so many basic bugs and crashes in SVG filters remain?
Zoltan Herczeg
Comment 46
2011-04-11 12:26:47 PDT
(In reply to
comment #45
)
> Is it really a good idea to add the complexity of threading to the mix when so many basic bugs and crashes in SVG filters remain?
So many is a little exaggerated :) SVG is getting more attention than in the past, and the code improves rapidly.
Brent Fulgham
Comment 47
2011-04-11 13:36:18 PDT
Comment on
attachment 88609
[details]
Speeding up FETurbulence with SMP View in context:
https://bugs.webkit.org/attachment.cgi?id=88609&action=review
This is a very interesting project, and I'd love to be able to benchmark it under WinCairo to see what improvements it might provide. It would be nice (in the future) to allow GCD/OpenMP backends for generic parallelization (perhaps in a 'platform/libdispatch' and 'platform/openmp' structure).
> Source/JavaScriptCore/wtf/ParallelJobs.cpp:135 > +
Couldn't the m_numberOfJobs calculation and the threadpool setup be done in a method, hiding the knowledge of OPENMP use? Ditto for maxNumberOfThreads (above), though probably less important.
> Source/JavaScriptCore/wtf/ParallelJobs.h:73 > +static const int maxParallelThreads = 2;
Can't this be hidden inside the implementation file? Is it accessed by other compilation units?
> Source/JavaScriptCore/wtf/ParallelJobs.h:149 > + void executeJobs()
For clarity, could this implementation be hidden inside the cpp file?
> Source/JavaScriptCore/wtf/ParallelJobs.h:158 > + int i;
Why is this declared outside the for?
Geoffrey Garen
Comment 48
2011-04-11 14:56:10 PDT
> So many is a little exaggerated :) SVG is getting more attention than in the past, and the code improves rapidly.
Given the rapid improvement you're expecting, would you be willing to hold off on changes like this until all reproducible crashers in SVG filters have been fixed?
Dirk Schulze
Comment 49
2011-04-12 00:01:40 PDT
(In reply to
comment #48
)
> > So many is a little exaggerated :) SVG is getting more attention than in the past, and the code improves rapidly. > > Given the rapid improvement you're expecting, would you be willing to hold off on changes like this until all reproducible crashers in SVG filters have been fixed?
I just could just find of two crashes in bugzilla. IIRC Zoltan is working on one of these crashes. The second one is more a implementation problem of SVGImage, that affects SVG filter. So SVGImage should get fixed instead of filter. If you know of more bugs, please let me know. If you know of more crashes, file a bug and add me and Zoltan please.
Gabor Loki
Comment 50
2011-04-12 00:41:58 PDT
Created
attachment 89166
[details]
Speeding up FETurbulence with SMP Well, I have updated the path as I promised; using separate headers, base classes try to be close to the platform dependent parallel processing APIs, the template class is as generic as possible. Although I am not familiar with Mac, I have introduced a possible usage of libdistpatch API. I hope there will be someone else who is able to fine tune this part for Mac.
Gabor Loki
Comment 51
2011-04-12 00:46:00 PDT
> For clarity, could this implementation be hidden inside the cpp file?
Hmm. Why do you want to hide the implementation details? For example, in JavaScriptCore almost the whole assembler is public.
Gabor Loki
Comment 52
2011-04-12 03:00:30 PDT
Comment on
attachment 89166
[details]
Speeding up FETurbulence with SMP I am going to update the build systems to make svn apply happy.
Gabor Loki
Comment 53
2011-04-12 06:26:06 PDT
Created
attachment 89195
[details]
Speeding up FETurbulence with SMP
WebKit Review Bot
Comment 54
2011-04-12 06:27:33 PDT
Attachment 89195
[details]
did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1 Source/JavaScriptCore/wtf/ParallelJobsLibdispatch.h:75: More than one command on the same line [whitespace/newline] [4] Source/JavaScriptCore/wtf/ParallelJobsLibdispatch.h:75: Missing space before { [whitespace/braces] [5] Source/JavaScriptCore/wtf/ParallelJobsLibdispatch.h:78: More than one command on the same line [whitespace/newline] [4] Source/JavaScriptCore/wtf/ParallelJobsLibdispatch.h:78: Missing space before { [whitespace/braces] [5] Total errors found: 4 in 16 files If any of these errors are false positives, please file a bug against check-webkit-style.
Gabor Loki
Comment 55
2011-04-26 01:15:08 PDT
Created
attachment 91082
[details]
Speeding up FETurbulence with SMP I got a feedback about the container of parameters in private; the destructor of parameters should be called during the destruction of their container. Well, in order to solve this issue the ByteArray is changed to Vector, the container of the parameters and the ParallelEnviroment become the members of ParallelJobs class.
WebKit Review Bot
Comment 56
2011-04-26 01:19:27 PDT
Attachment 91082
[details]
did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1 Source/JavaScriptCore/wtf/ParallelJobsLibdispatch.h:67: More than one command on the same line [whitespace/newline] [4] Source/JavaScriptCore/wtf/ParallelJobsLibdispatch.h:67: Missing space before { [whitespace/braces] [5] Source/JavaScriptCore/wtf/ParallelJobsLibdispatch.h:72: More than one command on the same line [whitespace/newline] [4] Source/JavaScriptCore/wtf/ParallelJobsLibdispatch.h:72: Missing space before { [whitespace/braces] [5] Total errors found: 4 in 16 files If any of these errors are false positives, please file a bug against check-webkit-style.
Csaba Osztrogonác
Comment 57
2011-04-26 04:59:16 PDT
Comment on
attachment 91082
[details]
Speeding up FETurbulence with SMP You fixed all things what other asked, and there weren't any other objections, so it is ready for landing now. r=me
Eric Seidel (no email)
Comment 58
2011-04-26 06:57:53 PDT
Style?
Gabor Loki
Comment 59
2011-04-26 07:02:43 PDT
> Style?
:) I guess style bot is not prepared for the 'newly-supported feature of the C language'. See here:
http://libdispatch.macosforge.org/trac/wiki/tutorial#Definingworkitems:FunctionsandBlocks
Eric Seidel (no email)
Comment 60
2011-04-26 07:36:13 PDT
Please file a bug against the style-bot. :)
WebKit Commit Bot
Comment 61
2011-04-26 07:38:07 PDT
The commit-queue encountered the following flaky tests while processing
attachment 91082
[details]
: http/tests/misc/favicon-loads-with-icon-loading-override.html
bug 58412
(author:
alice.liu@apple.com
) The commit-queue is continuing to process your patch.
WebKit Commit Bot
Comment 62
2011-04-26 07:41:04 PDT
Comment on
attachment 91082
[details]
Speeding up FETurbulence with SMP Clearing flags on attachment: 91082 Committed
r84911
: <
http://trac.webkit.org/changeset/84911
>
WebKit Commit Bot
Comment 63
2011-04-26 07:41:13 PDT
All reviewed patches have been landed. Closing bug.
WebKit Review Bot
Comment 64
2011-04-26 09:24:31 PDT
http://trac.webkit.org/changeset/84911
might have broken SnowLeopard Intel Release (WebKit2 Tests)
Nikolas Zimmermann
Comment 65
2011-04-26 10:47:37 PDT
Excellent work, looking forward to test it :-)
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug