Bug 66378 - libdispatch based ParallelJobs is not enough parallel
Summary: libdispatch based ParallelJobs is not enough parallel
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Web Template Framework (show other bugs)
Version: 528+ (Nightly build)
Hardware: Mac Unspecified
: P2 Normal
Assignee: Balazs Kelemen
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-08-17 07:56 PDT by Balazs Kelemen
Modified: 2011-10-03 06:07 PDT (History)
8 users (show)

See Also:


Attachments
Patch (2.41 KB, patch)
2011-08-17 08:15 PDT, Balazs Kelemen
no flags Details | Formatted Diff | Diff
WIP patch (3.00 KB, patch)
2011-08-25 01:45 PDT, Balazs Kelemen
no flags Details | Formatted Diff | Diff
Patch (3.45 KB, patch)
2011-09-08 03:09 PDT, Balazs Kelemen
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Balazs Kelemen 2011-08-17 07:56:21 PDT
We use the wrong API. We should use a concurrent dispatch queue with a dispatch group instead of using a serial queue.
Comment 1 Balazs Kelemen 2011-08-17 08:15:19 PDT
Created attachment 104175 [details]
Patch
Comment 2 Balazs Kelemen 2011-08-17 08:20:04 PDT
Performance results on methanol (https://gitorious.org/methanol/) with the svg test set:

baseline:              2798.2 with 7.6% deviance
parallel jobs enabled: 2663.9 with 15.5% deviance
parallel jobs + patch: 1634.6 with 11.6% deviance

I don't know where are the huge deviance is coming from.
However I believe the results are convincing even with that.
Comment 3 WebKit Review Bot 2011-08-17 08:23:49 PDT
Attachment 104175 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1

Source/JavaScriptCore/wtf/ParallelJobsLibdispatch.h:67:  More than one command on the same line  [whitespace/newline] [4]
Source/JavaScriptCore/wtf/ParallelJobsLibdispatch.h:67:  Missing space before {  [whitespace/braces] [5]
Total errors found: 2 in 2 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 4 Balazs Kelemen 2011-08-17 11:42:11 PDT
Note that the style failure is because of the parallel libdispatch syle block: ^{(*m_threadFunction)(parameters);}
Comment 5 Mark Rowe (bdash) 2011-08-17 15:04:53 PDT
(In reply to comment #4)
> Note that the style failure is because of the parallel libdispatch syle block: ^{(*m_threadFunction)(parameters);}

You should have whitespace inside of the braces. In other words:

^{ (*m_threadFunction)(parameters); }
Comment 6 Mark Rowe (bdash) 2011-08-17 15:10:13 PDT
Comment on attachment 104175 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=104175&action=review

> Source/JavaScriptCore/ChangeLog:12
> +        implementatoin was almost identical (less then 5% win).

*implementation* was almost identical (less *than* 5% win).
Comment 7 Gabor Loki 2011-08-17 23:22:14 PDT
Comment on attachment 104175 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=104175&action=review

> Source/JavaScriptCore/wtf/ParallelJobsLibdispatch.h:63
> -        dispatch_queue_t parallelJobsQueue = dispatch_queue_create("ParallelJobs", 0);
> +        static dispatch_queue_t globalQueue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);

Good catch! However, could we use dispatch_queue_create("ParallelJobs", DISPATCH_QUEUE_CONCURRENT) instead of the global concurrent queue? I do not know what is the main difference between those (in terms of design and performance) ?
Comment 8 Balazs Kelemen 2011-08-18 02:14:08 PDT
(In reply to comment #7)
> (From update of attachment 104175 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=104175&action=review
> 
> > Source/JavaScriptCore/wtf/ParallelJobsLibdispatch.h:63
> > -        dispatch_queue_t parallelJobsQueue = dispatch_queue_create("ParallelJobs", 0);
> > +        static dispatch_queue_t globalQueue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
> 
> Good catch! However, could we use dispatch_queue_create("ParallelJobs", DISPATCH_QUEUE_CONCURRENT) instead of the global concurrent queue? I do not know what is the main difference between those (in terms of design and performance) ?

I don't know which solution is better. Maybe there is an overhead of creating our own queue each time we have been called. Mark, could you share your knowledge about that with us?
Comment 9 Mark Rowe (bdash) 2011-08-18 10:34:16 PDT
One obvious point is that DISPATCH_QUEUE_CONCURRENT is only available on 10.7 and newer.
Comment 10 Mark Rowe (bdash) 2011-08-22 02:05:55 PDT
By the way, couldn’t the explicit loop be replaced with a call to dispatch_apply?
Comment 11 Balazs Kelemen 2011-08-22 02:26:14 PDT
(In reply to comment #10)
> By the way, couldn’t the explicit loop be replaced with a call to dispatch_apply?

Yes, the same idea came to my mind after a bit more documentation reading.
I will do that in the next patch.
Comment 12 Balazs Kelemen 2011-08-25 01:44:41 PDT
I have a new WIP patch with dispatch_apply that turns on ENABLE(THREADING_LIBDISPATCH) for testing but it crashes. If I'm not doing smg wrong than it seems like there is an issue with FELightning. According to the crash log there are 4 threads and all of them crashing in FELightning::LightningData::interior. I'm going to upload the WIP patch in a moment.
1   WTF::ByteArray::get(unsigned int) const
1   WTF::ByteArray::get(unsigned int) const
1   WTF::ByteArray::get(unsigned int) const
1   WTF::ByteArray::get(unsigned int) const
2   WebCore::FELighting::LightingData::interior(int, WebCore::IntPoint&)
2   WebCore::FELighting::LightingData::interior(int, WebCore::IntPoint&)
2   WebCore::FELighting::LightingData::interior(int, WebCore::IntPoint&)
2   WebCore::FELighting::LightingData::interior(int, WebCore::IntPoint&)
3   WebCore::FELighting::platformApplyGenericPaint(WebCore::FELighting::LightingData&, WebCore::LightSource::PaintingData&, int, int)
3   WebCore::FELighting::platformApplyGenericPaint(WebCore::FELighting::LightingData&, WebCore::LightSource::PaintingData&, int, int)
3   WebCore::FELighting::platformApplyGenericPaint(WebCore::FELighting::LightingData&, WebCore::LightSource::PaintingData&, int, int)
3   WebCore::FELighting::platformApplyGenericPaint(WebCore::FELighting::LightingData&, WebCore::LightSource::PaintingData&, int, int)
4   WebCore::FELighting::platformApplyGenericWorker(WebCore::FELighting::PlatformApplyGenericParameters*)
4   WebCore::FELighting::platformApplyGenericWorker(WebCore::FELighting::PlatformApplyGenericParameters*)
4   WebCore::FELighting::platformApplyGenericWorker(WebCore::FELighting::PlatformApplyGenericParameters*)
4   WebCore::FELighting::platformApplyGenericWorker(WebCore::FELighting::PlatformApplyGenericParameters*)
5   __execute_block_invoke_1
5   __execute_block_invoke_1
5   __execute_block_invoke_1
5   __execute_block_invoke_1
6   _dispatch_apply2
6   _dispatch_apply2
6   _dispatch_apply2
6   _dispatch_apply2
7   dispatch_apply_f
7   _dispatch_worker_thread2
7   _dispatch_worker_thread2
7   _dispatch_worker_thread2
8   WTF::ParallelEnvironment::execute(unsigned char*)
8   _pthread_wqthread
8   _pthread_wqthread
8   _pthread_wqthread
9   WTF::ParallelJobs<WebCore::FELighting::PlatformApplyGenericParameters>::execute()
9   start_wqthread
9   start_wqthread
9   start_wqthread
Comment 13 Balazs Kelemen 2011-08-25 01:45:55 PDT
Created attachment 105145 [details]
WIP patch
Comment 14 Balazs Kelemen 2011-08-25 02:08:33 PDT
There is some debugging info for SVG experts. The parameters given to the threads are seems to be valid so I have no clue about what's going on.

(gdb) f 3
#3  0x0000000101aa93bb in WebCore::FELighting::platformApplyGenericWorker (parameters=0x13c8010d0) at /Users/balazs/master_clean/Source/WebCore/platform/graphics/filters/FELighting.cpp:248
248         parameters->filter->platformApplyGenericPaint(parameters->data, parameters->paintingData, parameters->yStart, parameters->yEnd);
(gdb) p parameters
$2 = ('WebCore::FELighting::PlatformApplyGenericParameters' *) 0x13c8010d0
(gdb) p *parameters
$3 = {
  filter = 0x12b851130, 
  data = {
    pixels = 0x1338b2000, 
    surfaceScale = 0.0392156877, 
    widthMultipliedByPixelSize = 2036, 
    widthDecreasedByOne = 508, 
    heightDecreasedByOne = 355
  }, 
  paintingData = {
    lightVector = {
      m_x = -558, 
      m_y = -354, 
      m_z = 147.215683
    }, 
    colorVector = {
      m_x = 166.03772, 
      m_y = 60.3773499, 
      m_z = 0
    }, 
    lightVectorLength = 677.017334, 
    directionVector = {
      m_x = 0.624695003, 
      m_y = 0.624695003, 
      m_z = -0.468521267
    }, 
    privateColorVector = {
      m_x = 176, 
      m_y = 64, 
      m_z = 0
    }, 
    coneCutOffLimit = -0.342020124, 
    coneFullLight = -0.358020127, 
    specularExponent = 1
  }, 
  yStart = 407, 
  yEnd = 436
}
(gdb) f 0
#0  0x0000000101a9e12c in WTF::ByteArray::get (this=0x1338b2000, index=828655) at ByteArray.h:67
67                  ASSERT(index < m_size);
(gdb) p index
$4 = 828655
(gdb) p m_size
$5 = 724816
Comment 15 Balazs Kelemen 2011-09-07 03:14:32 PDT
I tried it on Linux with '#define ENABLE_THREADING_GENERIC 1' in Platform.h, and I increased the max number of threads from 2 to 8 (in ParallelJobsGeneric.h). I runned the methanol svg test suite on a 16 core machine and the same crash happened than on Mac with the WIP patch:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffa24ce700 (LWP 3978)]
0x00007ffff738cfc9 in WebCore::FELighting::platformApplyGenericWorker(WebCore::FELighting::PlatformApplyGenericParameters*) ()
   from /home/balazs/master_clean/WebKitBuild/Release/bin/../lib/libQtWebKit.so.4
(gdb) bt 10
#0  0x00007ffff738cfc9 in WebCore::FELighting::platformApplyGenericWorker(WebCore::FELighting::PlatformApplyGenericParameters*) ()
   from /home/balazs/master_clean/WebKitBuild/Release/bin/../lib/libQtWebKit.so.4
#1  0x00007ffff74698b9 in WTF::ParallelEnvironment::ThreadPrivate::workerThread(void*) ()
   from /home/balazs/master_clean/WebKitBuild/Release/bin/../lib/libQtWebKit.so.4
#2  0x00007ffff746841b in WTF::ThreadPrivate::run() () from /home/balazs/master_clean/WebKitBuild/Release/bin/../lib/libQtWebKit.so.4

Something is definitely wrong with FELightning.
Comment 16 Tim Horton 2011-09-07 11:02:36 PDT
(In reply to comment #15)
> Something is definitely wrong with FELightning.

Looks like somebody swapped "width" in for "height" in FELighting::platformApplyGeneric, when determining the yStep. I'll file a separate bug about that.
Comment 17 Tim Horton 2011-09-07 11:20:12 PDT
https://bugs.webkit.org/show_bug.cgi?id=67719 is the crash
Comment 18 Tim Horton 2011-09-07 12:06:31 PDT
The speedup on the methanol benchmark with this patch is quite impressive! I'm excited for this to land :-)
Comment 19 Balazs Kelemen 2011-09-07 15:44:10 PDT
(In reply to comment #18)
> The speedup on the methanol benchmark with this patch is quite impressive! I'm excited for this to land :-)

Thank you for the fix! I think it's time to utilize the infrastructure. First I will update the patch without actually enabling PARALLEL_JOBS by default. The next step will be enabling it - the libdispatch implementation for Mac and the generic for other platforms. I hope these plans sounds good to everyone.
Comment 20 Balazs Kelemen 2011-09-08 02:36:02 PDT
Current methanol results on a MacPro5,1 (6 cores) with SL Server:

ref -           2582.733 (16.12% deviance)
parallel jobs - 982.06 (19.9% diviance)
Comment 21 Balazs Kelemen 2011-09-08 03:09:43 PDT
Created attachment 106718 [details]
Patch
Comment 22 Balazs Kelemen 2011-09-19 08:35:53 PDT
Ping ...
Comment 23 Zoltan Herczeg 2011-10-03 02:02:31 PDT
Comment on attachment 106718 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=106718&action=review

> Source/JavaScriptCore/ChangeLog:12
> +        was a misusage of the API). Enabling PARALLEL_JOBS is now

misuse

r=me with the typo fix above
Comment 24 Balazs Kelemen 2011-10-03 06:06:07 PDT
Committed r96492: <http://trac.webkit.org/changeset/96492>