79184 – Implement SpinLockMutexLocker on WTF and optimize some parts of JavaScriptCore using it

UNCONFIRMED 79184

Implement SpinLockMutexLocker on WTF and optimize some parts of JavaScriptCore using it

https://bugs.webkit.org/show_bug.cgi?id=79184

Summary Implement SpinLockMutexLocker on WTF and optimize some parts of JavaScriptCor...

Kwonjin Jeong

Reported 2012-02-21 19:08:58 PST

Implement SpinLockLocker on WTF and optimize some parts of JavaScriptCore using SpinLockLocker

Attachments
Patch (9.04 KB, patch) 2012-02-21 19:34 PST, Kwonjin Jeong	eric: review-	Details Formatted Diff Diff
View All Add attachment proposed patch, testcase, etc.

Kwonjin Jeong

Comment 1 2012-02-21 19:18:11 PST

Change the class name to SpinLockMutexLocker.

Kwonjin Jeong

Comment 2 2012-02-21 19:34:45 PST

Created attachment 128114 [details] Patch

Alexey Proskuryakov

Comment 3 2012-02-22 10:17:01 PST

How did you confirm that this patch is an optimization?

Kwonjin Jeong

Comment 4 2012-02-22 18:18:03 PST

(In reply to comment #3) > How did you confirm that this patch is an optimization? At first, I compare the performance just between MutexLocker and SpinLockMutexLocker. Upon examination, SpinLockMutexLocker is two times faster than MutexLocker. I created a performance test program that has a loop within a simple calculation and the Lockers. Then, the program loops one billion times. And then, I applied SpinLockMutexLocker on some parts of JavaScriptCore, then ran 'run-sunspider', and then compare the results. Actually, the result using SpinLockMutexLocker is 0.5% ~ 1.0% faster than MutexLocker's - it's within the margin of error. But I guess that the performance can be better by using SpinLockMutexLocker on some more parts. My test environment and test result is as written below. Port: QtWebKit OS: Linux (Ubuntu 11.10) CPU: Intel(R) Xeon(R) CPU X5650 @ 2.67GHz (6 core) RAM: 12GiB Loop test result: Mutex : 17047.357910 ms SpinLockMutex: 7897.659180 ms Simple Loop : 656.346680 ms Sum : 565039360 Sunspider result: The test program is WebKit/Source/JavaScriptCore/jsc.cpp with some modification. I just modify jscmain() function. int jscmain(int argc, char** argv, JSGlobalData* globalData) { int i; int sum = 0; Mutex mutex; SpinLockMutex spinLockMutex; double startTime; double endTimeMutexLocker; double endTimeSpinLock; double endTimeSimpleLoop; startTime = currentTime(); for (i = 0; i < 1000000000; i++) { MutexLocker locker(mutex); sum += i; } endTimeMutexLocker = currentTime(); for (i = 0; i < 1000000000; i++) { SpinLockMutexLocker locker(spinLockMutex); sum += i; } endTimeSpinLock = currentTime(); for (i = 0; i < 1000000000; i++) { sum += i; } endTimeSimpleLoop = currentTime(); printf("Mutex : %lf ms\n", endTimeMutexLocker * 1000.0 - startTime * 1000.0); printf("SpinLockMutex: %lf ms\n", endTimeSpinLock * 1000.0 - endTimeMutexLocker * 1000.0); printf("Simple Loop : %lf ms\n", endTimeSimpleLoop * 1000.0 - endTimeSpinLock * 1000.0); printf("Sum : %d\n", sum); return 0; }

Kwonjin Jeong

Comment 5 2012-02-22 18:42:05 PST

(In reply to comment #3) > How did you confirm that this patch is an optimization? I attach the sunspider results below. MutexLocker ============================================ RESULTS (means and 95% confidence intervals) -------------------------------------------- Total: 188.7ms +/- 0.5% -------------------------------------------- 3d: 26.9ms +/- 1.5% cube: 8.0ms +/- 0.0% morph: 10.1ms +/- 2.2% raytrace: 8.8ms +/- 3.4% access: 17.7ms +/- 2.0% binary-trees: 2.0ms +/- 0.0% fannkuch: 8.3ms +/- 4.2% nbody: 4.0ms +/- 0.0% nsieve: 3.4ms +/- 10.9% bitops: 15.8ms +/- 1.9% 3bit-bits-in-byte: 1.0ms +/- 0.0% bits-in-byte: 5.8ms +/- 5.2% bitwise-and: 3.0ms +/- 0.0% nsieve-bits: 6.0ms +/- 0.0% controlflow: 2.0ms +/- 0.0% recursive: 2.0ms +/- 0.0% crypto: 12.2ms +/- 2.5% aes: 8.1ms +/- 2.8% md5: 2.0ms +/- 0.0% sha1: 2.1ms +/- 10.8% date: 22.8ms +/- 2.0% format-tofte: 12.2ms +/- 2.5% format-xparb: 10.6ms +/- 3.5% math: 23.7ms +/- 2.5% cordic: 8.0ms +/- 0.0% partial-sums: 13.3ms +/- 2.6% spectral-norm: 2.4ms +/- 15.4% regexp: 11.0ms +/- 0.0% dna: 11.0ms +/- 0.0% string: 56.6ms +/- 1.4% base64: 4.6ms +/- 8.0% fasta: 8.1ms +/- 2.8% tagcloud: 14.1ms +/- 1.6% unpack-code: 23.8ms +/- 1.9% validate-input: 6.0ms +/- 0.0% SpinLockMutexLocker ============================================ RESULTS (means and 95% confidence intervals) -------------------------------------------- Total: 187.4ms +/- 0.4% -------------------------------------------- 3d: 27.0ms +/- 1.2% cube: 8.0ms +/- 0.0% morph: 10.1ms +/- 2.2% raytrace: 8.9ms +/- 2.5% access: 17.5ms +/- 2.2% binary-trees: 2.0ms +/- 0.0% fannkuch: 8.1ms +/- 2.8% nbody: 4.0ms +/- 0.0% nsieve: 3.4ms +/- 10.9% bitops: 16.0ms +/- 0.0% 3bit-bits-in-byte: 1.0ms +/- 0.0% bits-in-byte: 6.0ms +/- 0.0% bitwise-and: 3.0ms +/- 0.0% nsieve-bits: 6.0ms +/- 0.0% controlflow: 2.1ms +/- 10.8% recursive: 2.1ms +/- 10.8% crypto: 12.0ms +/- 0.0% aes: 8.0ms +/- 0.0% md5: 2.0ms +/- 0.0% sha1: 2.0ms +/- 0.0% date: 22.3ms +/- 1.5% format-tofte: 12.2ms +/- 2.5% format-xparb: 10.1ms +/- 2.2% math: 23.1ms +/- 1.0% cordic: 8.0ms +/- 0.0% partial-sums: 13.0ms +/- 0.0% spectral-norm: 2.1ms +/- 10.8% regexp: 11.0ms +/- 0.0% dna: 11.0ms +/- 0.0% string: 56.4ms +/- 0.9% base64: 4.4ms +/- 8.4% fasta: 8.0ms +/- 0.0% tagcloud: 14.0ms +/- 0.0% unpack-code: 23.9ms +/- 0.9% validate-input: 6.1ms +/- 3.7%

Kwonjin Jeong

Comment 6 2012-02-22 18:50:17 PST

I'm trying to fix build errors on mac and win port. It's caused by including TCSpinLock.h but the header file isn't in PrivateHeaders.

Alexey Proskuryakov

Comment 7 2012-02-22 19:34:59 PST

Please use SunSpider built-in feature to compare the results - it will tell you if the change is statistically significant. If the new locker is only two times as fast in a tight loop, it might not be worth adding complexity and bypassing all platform specific optimizations OS vendors add to regular mutex.

Kwonjin Jeong

Comment 8 2012-02-22 21:09:41 PST

(In reply to comment #7) > Please use SunSpider built-in feature to compare the results - it will tell you if the change is statistically significant. > > If the new locker is only two times as fast in a tight loop, it might not be worth adding complexity and bypassing all platform specific optimizations OS vendors add to regular mutex. Thank you for your comment. I'll try to use the feature and write the result here.

Eric Seidel (no email)

Comment 9 2012-04-19 16:42:50 PDT

Comment on attachment 128114 [details] Patch Waiting for the sunspider-compare-results results.

Note You need to log in before you can comment on or make changes to this bug.

Status UNCONFIRMED

Resolution

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware All

OS All

Product WebKit

Component Web Template Framework

Assignee

Nobody

Reported

2012-02-21 19:08 PST

Modified

2012-04-19 16:42 PDT History

CC List

7 users Show

URL

Keywords

Depends on

Blocks