WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
NEW
Bug 146448
JavaScriptCore performance is very bad on Windows
https://bugs.webkit.org/show_bug.cgi?id=146448
Summary
JavaScriptCore performance is very bad on Windows
ewmailing
Reported
2015-06-29 19:57:02 PDT
Created
attachment 255809
[details]
How to build JSCore for Windows I’ve been trying to use JavaScriptCore as an embedded scripting language for a video game engine. I’ve been using it on multiple platforms. During my testing, I’ve discovered that the JavaScriptCore Windows performance is very slow. I wanted to alert you in hopes there is something that can be done to fix it. I’ve been getting help on building JavaScriptCore for Windows from a few people in the community. My current build is as of a couple of weeks ago. However, I’ve been seeing the same performance characteristics from a build from a little over a year ago. To prove to you that this is a problem to just JavaScriptCore on Windows, and not a general Windows problem, nor a problem with my code, I have done the following: - I ran the program on Windows, OS X, Linux, iOS, Android, and Raspberry Pi (Linux). - I wrote comparable programs in C and Lua and also ran these on all platforms. - Additionally, I was able to run on the same hardware (dual boot) for Windows/Linux. So here is the quick summary: Ubuntu 12.04 LTS (64-bit) Windows 8.1 (64-bit) =========================================================== C 5600 sprites 27000 sprites Lua 3100 sprites 12000 sprites JSCore 2400 sprites 120 sprites This was on an Intel Core i5-3570K @ 3.40 GHz with Intel 4000 HD graphics. The reason for the big jump in Windows performance is mostly due to the Intel graphics drivers being poor on Linux. However, that 120 sprites is not a typo. Whereas everything else was faster on Windows in this case, JSCore is a magnitude slower. My numbers for the other operating systems tend to follow sensible trends. Windows/JSCore is an outlier. And other programs I’ve run also show JavaScriptCore on Windows has performance problems. You can see this summary with the programs running in a short video I put together and uploaded to YouTube:
https://youtu.be/UDyI1cckDJI
I have uploaded pre-built binaries of my test program. JavaScriptCore is dynamically linked in all cases, so it is easy to swap in another build of JSCore if you want to try things. There is a file called config.txt in each bundle that lets you change the number of sprites. (I did not sign the OS X binaries with my Developer ID because I figured you might want to change the config.txt.) Also, the Lua and JavaScript scripts are in plain text and can be easily modified. And for clarity: All my test programs here are 64-bit versions. I have not actually benchmarked 32-bit except on ARM devices. Windows 7 and up (x64) Zip:
http://playcontrol.net/tempdownload/WebKit/MinimalSDLSpriteBenchmarkC.zip
http://playcontrol.net/tempdownload/WebKit/MinimalSDLSpriteBenchmarkLua.zip
http://playcontrol.net/tempdownload/WebKit/MinimalSDLSpriteBenchmarkJavaScript.zip
Installer:
http://playcontrol.net/tempdownload/WebKit/MinimalSDLSpriteBenchmarkC-1.0.0-win64.exe
http://playcontrol.net/tempdownload/WebKit/MinimalSDLSpriteBenchmarkLua-1.0.0-win64.exe
http://playcontrol.net/tempdownload/WebKit/MinimalSDLSpriteBenchmarkJavaScript-1.0.0-win64.exe
OS X 10.9 and up
http://playcontrol.net/tempdownload/WebKit/MinimalSDLSpriteBenchmarkC.app.zip
http://playcontrol.net/tempdownload/WebKit/MinimalSDLSpriteBenchmarkLua.app.zip
http://playcontrol.net/tempdownload/WebKit/MinimalSDLSpriteBenchmarkJavaScript.app.zip
Linux (x64 Ubuntu 12.04 LTS a.k.a Steam Runtime)
http://playcontrol.net/tempdownload/WebKit/MinimalSDLSpriteBenchmarkC-0.1.1-Linux.tar.gz
http://playcontrol.net/tempdownload/WebKit/MinimalSDLSpriteBenchmarkLua-0.1.1-Linux.tar.gz
http://playcontrol.net/tempdownload/WebKit/MinimalSDLSpriteBenchmarkJavaScript-0.1.1-Linux.tar.gz
I also made the repos to my programs available:
https://bitbucket.org/blurrr/minimalspritebenchmarkc
https://bitbucket.org/blurrr/minimalspritebenchmarklua
https://bitbucket.org/blurrr/minimalspritebenchmarkjavascript
However, it probably won’t be of much use without my SDK because without it, you’ll need to compile all the dependencies yourself and figure out the build system. If you need a copy of my SDK, let me know. I'm also attaching my write up of how to build JavaScriptCore for Windows (64-bit). The existing instructions had problems and with the help of a few in the community, I was able to get through it and the document is what I came up with.
Attachments
How to build JSCore for Windows
(503.63 KB, application/zip)
2015-06-29 19:57 PDT
,
ewmailing
no flags
Details
screen capture of Visual Studio profilers
(281.20 KB, image/jpeg)
2015-06-30 10:21 PDT
,
ewmailing
no flags
Details
Instruments profile for OS X
(508.98 KB, image/png)
2015-06-30 11:12 PDT
,
ewmailing
no flags
Details
Patch
(3.90 KB, patch)
2015-07-14 13:17 PDT
,
peavo
no flags
Details
Formatted Diff
Diff
Patch
(10.22 KB, patch)
2015-07-15 05:42 PDT
,
peavo
no flags
Details
Formatted Diff
Diff
Patch
(9.72 KB, patch)
2015-07-15 06:09 PDT
,
peavo
no flags
Details
Formatted Diff
Diff
Patch
(9.19 KB, patch)
2015-07-16 03:05 PDT
,
peavo
no flags
Details
Formatted Diff
Diff
Patch
(9.17 KB, patch)
2015-07-16 10:57 PDT
,
peavo
no flags
Details
Formatted Diff
Diff
Screenshot of profiling.
(57.89 KB, image/png)
2015-07-17 05:21 PDT
,
peavo
no flags
Details
Patch
(14.00 KB, patch)
2015-07-23 05:23 PDT
,
peavo
no flags
Details
Formatted Diff
Diff
Patch
(8.01 KB, patch)
2015-07-23 06:12 PDT
,
peavo
beidson
: review-
Details
Formatted Diff
Diff
Show Obsolete
(6)
View All
Add attachment
proposed patch, testcase, etc.
Brent Fulgham
Comment 1
2015-06-29 21:52:44 PDT
You mentioned Ubuntu and Windows, and indicated you had run on OS X as well, but I don't see OS X results. Would you mind showing those as well?
ewmailing
Comment 2
2015-06-29 22:33:44 PDT
I excluded the OS X results because my Mac is a different machine with different hardware so it is harder to do direct comparisons and I didn't want to confuse anybody by the results. But the C vs. Lua vs. JavaScriptCore results are still useful for extrapolation. So on an 21.5" iMac Core i3 Mid 2010 @ 3.2 GHz with 16GB of RAM and ATI Radeon HD 5670 OS X 10.10.3 ========================== C 15000 sprites Lua 5500 sprites JSCore 3200 sprites Just a reminder, I did provide an OS X build in my links so you can yourself. So if you have a triple boot machine, you can run all 3 to get more controlled results.
Brent Fulgham
Comment 3
2015-06-29 22:34:42 PDT
<
rdar://problem/21604141
>
peavo
Comment 4
2015-06-30 00:51:35 PDT
Maybe JIT in JSC is not enabled? It should be enabled by default, though.
peavo
Comment 5
2015-06-30 01:07:41 PDT
Can you do some profiling to try to determine where time is spent?
peavo
Comment 6
2015-06-30 02:45:24 PDT
I just built WebKit for Win64, and tested WinLauncher and IE 11 on SunSpider on dromaeo.com. WinLauncher makes 1108.84runs/s (
http://dromaeo.com/?id=238774
) and IE 11 makes 1096.88runs/s (
http://dromaeo.com/?id=238775
), so it seems performance is good on SunSpider at least.
Hyungwook Lee
Comment 7
2015-06-30 06:16:29 PDT
In my investigation, Windows port looks 2x slower than Ubuntu EFL port. WebKit Windows port 32bit Release: 442.4ms (SunSpider 1.0.2) WebKit Windows port 64bit Release: Not yet tested (SunSpider 1.0.2) WebKit2 Ubuntu EFL port 64bit Release: 164.1ms (SunSpider 1.0.2) I will check whether JavaScriptCore FTL makes this difference results or not.
Timothy Hatcher
Comment 8
2015-06-30 09:19:54 PDT
radar://21604141
Timothy Hatcher
Comment 9
2015-06-30 09:21:26 PDT
rdar://problem/21604141
ewmailing
Comment 10
2015-06-30 10:12:42 PDT
(In reply to
comment #4
)
> Maybe JIT in JSC is not enabled? It should be enabled by default, though.
So I don't think this is a JIT specific issue for several reasons. - The number I'm seeing on Windows is way too slow, even without JIT - Remember that Lua (official PUC-Rio) has no JIT at all. (That is one reason I included Lua as a reference point.) - I didn't do a full suite, but I did some spot checks with LuaJIT about a year ago. (Lua 5.3 was still in alpha so it's changed a little bit since then and probably better today.) iMac 3.2GHz Intel Core i3 (OS X 10.9.2) @60fps (some variance, hard to reproduce sometimes due to background processes, etc) C - 16000 sprites Lua 5.3w1 - 5400 sprites JavaScriptCore - 3300 sprites LuaJIT - 9000 sprites On an Intel 3570k (i5) with HD 4000 (3.4GHz): SteamOS (beta from a year ago): C - 5700 sprites Lua 5.3w1 - 2700 sprites JavaScriptCore - 3200 sprites LuaJIT - 3600 sprites Notice that LuaJIT performance really climbs up. I did not do any LuaJIT specific optimizations. Also notice the SteamOS benchmark where JavaScriptCore beats regular Lua. This is not happening on Mac so it got me thinking that JavaScriptCore on Mac does not enable JIT for Mac Apps. - Here is an old benchmark I did on an iPad Mini 1st gen. iPad mini (1st gen, iOS 7.1) @60fps C - 750 sprites Lua 400 sprites (double/long long) and 430 for float/int JavaScriptCore 200 sprites LuJIT 430 sprites I know iOS doesn't enable JIT for JavaScriptCore. JIT also must be disabled on LuaJIT for iOS so it returns to regular Lua-like performance. But notice that JavaScriptCore on a 1st gen iPad mini with no JIT is still beating Windows desktop. So I expect JavaScriptCore, even without JIT to do a lot better than I see on Windows. P.S. You said JIT should be compiled in by default. What are the compiler flags to look for? I'm compiling Android by hand and I don't know how to verify if JIT is on or not. My Android numbers are slightly better than iOS, but I think that's because my Android hardware is better than my iOS hardware and not because of JIT.
ewmailing
Comment 11
2015-06-30 10:21:31 PDT
Created
attachment 255828
[details]
screen capture of Visual Studio profilers I am not knowledgable on how to use the Visual Studio profiler at all. The profile shows that the JavaScript binding to SDL is where most of the time is spent. But this is totally expected. The question is why is Windows 100 times slower overall? The language bindings are exactly the same for all platforms... public JavaScriptCore C API.
peavo
Comment 12
2015-06-30 10:47:08 PDT
(In reply to
comment #10
)
> (In reply to
comment #4
) > > P.S. You said JIT should be compiled in by default. What are the compiler > flags to look for? I'm compiling Android by hand and I don't know how to > verify if JIT is on or not. My Android numbers are slightly better than iOS, > but I think that's because my Android hardware is better than my iOS > hardware and not because of JIT.
ENABLE_JIT in WTF/Platform.h. Also ENABLE_CONCURRENT_JIT (in WTF/Platform.h) is not enabled on Windows, I don't know if that can have any impact on your tests.
peavo
Comment 13
2015-06-30 10:49:18 PDT
(In reply to
comment #11
)
> Created
attachment 255828
[details]
> screen capture of Visual Studio profilers > > I am not knowledgable on how to use the Visual Studio profiler at all. > The profile shows that the JavaScript binding to SDL is where most of the > time is spent. But this is totally expected. The question is why is Windows > 100 times slower overall? The language bindings are exactly the same for all > platforms... public JavaScriptCore C API.
Maybe you can do similar profiling on OSX, and see if you spot any big differences?
ewmailing
Comment 14
2015-06-30 11:12:09 PDT
Created
attachment 255833
[details]
Instruments profile for OS X Here is the Instruments profile on OS X. Nothing really jumps out at me. Perhaps you can see something?
Hyungwook Lee
Comment 15
2015-07-07 20:17:52 PDT
I've disabled ENABLE_CONCURRENT_JIT in WebKit2 EFL port and measured Sun Spider 1.0.2 test as following. WebKit Windows port 32bit Release: 442.4ms (SunSpider 1.0.2) WebKit Windows port 64bit Release: Not yet tested (SunSpider 1.0.2) WebKit2 Ubuntu EFL port 64bit Release: 164.1ms (SunSpider 1.0.2) WebKit2 Ubuntu EFL port 64bit Reelase: 223.0ms (Disabled CONCURRENT_JIT) I think ENABLE_CONCURRENT_JIT make performance improvement but it is not major reason for this issue. Additionally, WebKit2 EFL port doesn't use FTL as below. #if PLATFORM(GTK) && HAVE(LLVM) && ENABLE(JIT) && !defined(ENABLE_FTL_JIT) && CPU(X86_64) #define ENABLE_FTL_JIT 1 #endif It looks current issue doesn't related to JIT and it's related stuff.
peavo
Comment 16
2015-07-14 13:17:10 PDT
Created
attachment 256787
[details]
Patch
WebKit Commit Bot
Comment 17
2015-07-14 13:19:34 PDT
Attachment 256787
[details]
did not pass style-queue: ERROR: Source/JavaScriptCore/runtime/JSLock.cpp:35: _Thrd_current is incorrectly named. Don't use underscores in your identifier names. [readability/naming/underscores] [4] Total errors found: 1 in 5 files If any of these errors are false positives, please file a bug against check-webkit-style.
peavo
Comment 18
2015-07-14 13:20:58 PDT
(In reply to
comment #14
)
> Created
attachment 255833
[details]
> Instruments profile for OS X > > Here is the Instruments profile on OS X. Nothing really jumps out at me. > Perhaps you can see something?
Could you try building with this patch, and check the performance?
peavo
Comment 19
2015-07-15 05:42:28 PDT
Created
attachment 256834
[details]
Patch
peavo
Comment 20
2015-07-15 06:09:20 PDT
Created
attachment 256835
[details]
Patch
ewmailing
Comment 21
2015-07-15 07:52:30 PDT
I applied the patch from
attachment 256835
[details]
. That was a huge improvement in performance. My system is has a bunch of stuff open right now so I can't run a clean benchmark, but JSCore has gone from 120 sprites to about 4100. For comparison, the Lua benchmark currently runs at about 10000 sprites. So this is a terrific fix. I'm still wondering if there is something more that can be done. The differential here between Lua and JavaScriptCore still seems a little wider than the other platforms. But thank you. This finally makes JavaScriptCore usable on Windows.
Alex Christensen
Comment 22
2015-07-15 10:38:35 PDT
This looks good to me. Could someone who works more on JavaScriptCore review this?
Mark Lam
Comment 23
2015-07-15 10:45:15 PDT
Comment on
attachment 256835
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=256835&action=review
> Source/JavaScriptCore/runtime/JSLock.h:84 > +class ThreadId {
My first instinct is to question whether we should be creating a new abstraction here. We already have a WTF::ThreadIdentifier in WTF/wtf/Threading.h. Can we use that instead? I’ll look closer.
peavo
Comment 24
2015-07-16 03:05:57 PDT
Created
attachment 256896
[details]
Patch
peavo
Comment 25
2015-07-16 03:48:58 PDT
(In reply to
comment #23
)
> Comment on
attachment 256835
[details]
> Patch > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=256835&action=review
> > > Source/JavaScriptCore/runtime/JSLock.h:84 > > +class ThreadId { > > My first instinct is to question whether we should be creating a new > abstraction here. We already have a WTF::ThreadIdentifier in > WTF/wtf/Threading.h. Can we use that instead? I’ll look closer.
Thanks for reviewing :) Updated patch.
Mark Lam
Comment 26
2015-07-16 09:11:38 PDT
Comment on
attachment 256896
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=256896&action=review
Per, there are 2 changes here: 1. Replacing use of std::thread::id with WTF::ThreadIdentifier. 2. Optimizing out the computation of stack bounds when still in the same Windows fiber. Can you (or someone) please confirm that (1) actually contributes to the performance difference you’re seeing? These 2 changes should be tested separately to measure their effects.
> Source/JavaScriptCore/runtime/JSLock.cpp:121 > + m_ownerThreadID = WTF::currentThread();
No need to use the WTF::. The convention for WTF APIs is that their header files already provide "using” declarations. We can just use an unqualified currentThread() here.
> Source/JavaScriptCore/runtime/JSLock.cpp:200 > + ASSERT(!m_hasExclusiveThread || (exclusiveThread() == WTF::currentThread()));
Ditto. No need for WTF:: qualifier.
> Source/JavaScriptCore/runtime/JSLock.cpp:203 > + return m_ownerThreadID == WTF::currentThread();
Ditto.
> Source/JavaScriptCore/runtime/JSLock.cpp:210 > + ASSERT(exclusiveThread() == WTF::currentThread());
Ditto.
> Source/WebCore/bindings/js/JSDOMWindowBase.cpp:206 > + vm->setExclusiveThread(WTF::currentThread());
Ditto. No need to WTF:: qualifier.
Anders Carlsson
Comment 27
2015-07-16 10:28:27 PDT
I don't think we should do this. If std::thread::id is slow on Windows then that's a MSVC bug and should be reported through appropriate channels.
Alex Christensen
Comment 28
2015-07-16 10:43:22 PDT
(In reply to
comment #27
)
> I don't think we should do this. If std::thread::id is slow on Windows then > that's a MSVC bug and should be reported through appropriate channels.
That is true, but we've worked around bugs in MSVC before, and I think a 35x speed improvement is worth it.
Anders Carlsson
Comment 29
2015-07-16 10:45:21 PDT
(In reply to
comment #28
)
> (In reply to
comment #27
) > > I don't think we should do this. If std::thread::id is slow on Windows then > > that's a MSVC bug and should be reported through appropriate channels. > That is true, but we've worked around bugs in MSVC before, and I think a 35x > speed improvement is worth it.
We have, but this would be a very invasive change. Do we know if this bug exists in 2015?
Mark Lam
Comment 30
2015-07-16 10:46:23 PDT
(In reply to
comment #28
)
> ... I think a 35x speed improvement is worth it.
First, let's make sure that it the replacement of std::thread::id with ThreadIdentifier is the really the reason for the 35x speed up. There are 2 changes in this patch. We need to have them tested separately for their effect.
peavo
Comment 31
2015-07-16 10:57:28 PDT
Created
attachment 256913
[details]
Patch
peavo
Comment 32
2015-07-16 11:04:47 PDT
(In reply to
comment #26
)
> Comment on
attachment 256896
[details]
> Patch
Thanks again for reviewing :)
> > View in context: >
https://bugs.webkit.org/attachment.cgi?id=256896&action=review
> > Per, there are 2 changes here: > 1. Replacing use of std::thread::id with WTF::ThreadIdentifier. > 2. Optimizing out the computation of stack bounds when still in the same > Windows fiber. > > Can you (or someone) please confirm that (1) actually contributes to the > performance difference you’re seeing? These 2 changes should be tested > separately to measure their effects. >
I first made change 2. The sprite count in this benchmark then went from 120 to ~300. Next I made change 1. The sprite count then increased from 300 to ~3000.
peavo
Comment 33
2015-07-16 11:11:51 PDT
Below is the MSVC implementation of _Thrd_current(), which is called by std::this_thread::get_id(). It's the DuplicateHandle() call which is time consuming according to the profiler. The code for _WIN32_WCE just calls GetCurrentThreadId(), which is faster. This is basically what we do with this patch. _Thrd_t _Thrd_current(void) { /* return _Thrd_t identifying current thread */ _Thrd_t thr; #ifdef _WIN32_WCE thr._Hnd = (HANDLE)GetCurrentThreadId(); #elif defined(_CRT_APP) && !defined(_KERNELX) thr._Hnd = (HANDLE)__crtGetCurrentWinRTThread(); #else /* _WIN32_WCE */ if (DuplicateHandle(GetCurrentProcess(), GetCurrentThread(), GetCurrentProcess(), &thr._Hnd, DUPLICATE_SAME_ACCESS, TRUE, 0)) CloseHandle(thr._Hnd); else thr._Hnd = 0; #endif /* _WIN32_WCE */ #if defined(_CRT_APP) && !defined(_KERNELX) thr._Id = __crtGetCurrentWinRTThreadId(); #else /* defined(_CRT_APP) && !defined(_KERNELX) */ thr._Id = GetCurrentThreadId(); #endif /* defined(_CRT_APP) && !defined(_KERNELX) */ return (thr); }
Mark Lam
Comment 34
2015-07-16 21:23:16 PDT
Per, ewmailing, do you have access to VS2015? Can you confirm if the issue still manifests on the latest VS?
peavo
Comment 35
2015-07-17 05:21:06 PDT
Created
attachment 256968
[details]
Screenshot of profiling. This attachment shows a summary of the profiling of the sprite benchmark with Visual Studio. The _Thrd_current function is called by std::this_thread::get_id(), which is called via JSLock::lock(). The VirtualQuery function is called via VM::updateStackLimit. These are the two issues addressed in this patch.
peavo
Comment 36
2015-07-17 05:24:35 PDT
(In reply to
comment #34
)
> Per, ewmailing, do you have access to VS2015? Can you confirm if the issue > still manifests on the latest VS?
I checked the source code of _Thrd_current in VS2015 RC, and the issues has not been fixed there, according to the source: _Thrd_t _Thrd_current(void) { /* return _Thrd_t identifying current thread */ _Thrd_t thr; #ifdef _WIN32_WCE thr._Hnd = (HANDLE)GetCurrentThreadId(); #else /* _WIN32_WCE */ if (DuplicateHandle(GetCurrentProcess(), GetCurrentThread(), GetCurrentProcess(), &thr._Hnd, DUPLICATE_SAME_ACCESS, TRUE, 0)) CloseHandle(thr._Hnd); else thr._Hnd = 0; #endif /* _WIN32_WCE */ thr._Id = GetCurrentThreadId(); return (thr); }
Mark Lam
Comment 37
2015-07-17 14:18:07 PDT
Comment on
attachment 256913
[details]
Patch r=me Per, I think you should also file a bug against Windows / MSVC and add the url of the MS bug to this bugzilla (and preferably in the ChangeLog as well before landing). Thanks for investigating this issue.
peavo
Comment 38
2015-07-17 14:25:37 PDT
(In reply to
comment #37
)
> Comment on
attachment 256913
[details]
> Patch > > r=me > > Per, I think you should also file a bug against Windows / MSVC and add the > url of the MS bug to this bugzilla (and preferably in the ChangeLog as well > before landing). Thanks for investigating this issue.
Thanks! I will file a bug against Windows/MSVC.
peavo
Comment 39
2015-07-20 01:15:55 PDT
I reported this issue to Microsoft:
https://connect.microsoft.com/VisualStudio/feedback/details/1558211
peavo
Comment 40
2015-07-20 02:25:32 PDT
Committed
r187020
: <
http://trac.webkit.org/changeset/187020
>
peavo
Comment 41
2015-07-20 03:12:06 PDT
(In reply to
comment #40
)
> Committed
r187020
: <
http://trac.webkit.org/changeset/187020
>
This commit have broken the builds: webcore\bindings\js\JSDOMWindowBase.cpp(206): error C2664: 'void JSC::VM::setExclusiveThread(std::thread::id)' : cannot convert argument 1 from 'WTF::ThreadIdentifier' to 'std::thread::id' (..\bindings\js\JSBindingsAllInOne.cpp) [C:\cygwin\home\buildbot\slave\win-debug\build\Source\WebCore\WebCore.vcxproj\WebCore.vcxproj] No constructor could take the source type, or constructor overload resolution was ambiguous But as far as I can see from VM.h, setExclusiveThread has the signature void setExclusiveThread(ThreadIdentifier threadId). I don't quite understand this ...
peavo
Comment 42
2015-07-20 05:57:15 PDT
After an unreviewed debug build fix in
http://trac.webkit.org/changeset/187021
, the builds are looking better, but this patch has caused JSC api tests to fail for the CLoop builds: ASSERTION FAILED: !identifierByPthreadHandle(pthreadHandle) /Volumes/Data/slave/mavericks-cloop-debug/build/Source/WTF/wtf/ThreadingPthreads.cpp(154) : ThreadIdentifier WTF::establishIdentifierForPthreadHandle(const pthread_t &) 1 0x1078e1510 WTFCrash 2 0x10793cac7 WTF::establishIdentifierForPthreadHandle(_opaque_pthread_t* const&) 3 0x10793c730 WTF::createThreadInternal(void (*)(void*), void*, char const*) 4 0x10793af57 WTF::createThread(char const*, std::__1::function<void ()>) 5 0x10793b0b5 WTF::createThread(void (*)(void*), void*, char const*) 6 0x107446317 JSC::GCThreadSharedData::GCThreadSharedData(JSC::VM*) 7 0x107445c7d JSC::GCThreadSharedData::GCThreadSharedData(JSC::VM*) 8 0x107451a44 JSC::Heap::Heap(JSC::VM*, JSC::HeapType) 9 0x1074517b3 JSC::Heap::Heap(JSC::VM*, JSC::HeapType) 10 0x10788b80f JSC::VM::VM(JSC::VM::VMType, JSC::HeapType) 11 0x10788b731 JSC::VM::VM(JSC::VM::VMType, JSC::HeapType) 12 0x10788ea0c JSC::VM::createContextGroup(JSC::HeapType) 13 0x107554ceb JSContextGroupCreate 14 0x107647885 -[JSVirtualMachine init] 15 0x10755356d -[JSContext init] 16 0x1072739dd currentThisInsideBlockGetterTest() 17 0x107294888 testObjectiveCAPIMain() 18 0x1072859e4 testObjectiveCAPI 19 0x10727bec3 main 20 0x7fff9439c5fd start
peavo
Comment 43
2015-07-20 06:44:19 PDT
Correction: it seems the JSC api tests are failing for all Debug builds. The reason for Debug-only failures might be caused by the assert in MachineStackMarker.cpp, line 287: ASSERT(!m_heap->vm()->hasExclusiveThread() || m_heap->vm()->exclusiveThread() == currentThread());
David Kilzer (:ddkilzer)
Comment 44
2015-07-20 08:17:09 PDT
(In reply to
comment #40
)
> Committed
r187020
: <
http://trac.webkit.org/changeset/187020
>
Debug build fix: Committed
r187021
: <
http://trac.webkit.org/changeset/187021
>
peavo
Comment 45
2015-07-20 08:35:30 PDT
I added a possible fix for the JSC api test failures in
https://bugs.webkit.org/show_bug.cgi?id=147110
, but it might be better to revert the two commits related to this bug, and include the fix in this bug?
Mark Lam
Comment 46
2015-07-20 11:15:59 PDT
Rolled out
r187020
and
r187021
in
r187026
: <
http://trac.webkit.org/r187026
>.
Geoffrey Garen
Comment 47
2015-07-21 16:23:27 PDT
I think it's wrong to focus on re-architecting thread id and stack bounds here. You'll notice that re-architecting thread id and stack bounds did not make JavaScriptCore competitive with Lua. That's a sure sign that it did not fix the root problem. The profile shows two things: (1) We're taking the JSLock just to answer the "is null?" question; (2) Taking the JSLock does a lot of work. We can fix (1) by not taking the JSLock just to answer trivial type checking questions. (Similarly, we probably should not take the JSLock just to create trivial types like null.) Then we should profile again to see if JSLock is still expensive relative to other operations. We can probably make JSLock do a lot less work if we need to.
Anders Carlsson
Comment 48
2015-07-21 16:30:05 PDT
FWIW, I looked at the MSVC runtime library in VS2015 and the performance problem with std::this_thread::get_id() has been fixed.
peavo
Comment 49
2015-07-22 05:58:39 PDT
(In reply to
comment #48
)
> FWIW, I looked at the MSVC runtime library in VS2015 and the performance > problem with std::this_thread::get_id() has been fixed.
Thanks for looking into this :) Yes, you are right, I unfortunately only checked the source of the _Thrd_current function in VS2015 RC, but not the performance of the std::this_thread::get_id() function itself. I will update the patch.
peavo
Comment 50
2015-07-22 06:11:53 PDT
(In reply to
comment #47
)
> I think it's wrong to focus on re-architecting thread id and stack bounds > here. > > You'll notice that re-architecting thread id and stack bounds did not make > JavaScriptCore competitive with Lua. That's a sure sign that it did not fix > the root problem. > > The profile shows two things: > > (1) We're taking the JSLock just to answer the "is null?" question; > > (2) Taking the JSLock does a lot of work. > > We can fix (1) by not taking the JSLock just to answer trivial type checking > questions. (Similarly, we probably should not take the JSLock just to create > trivial types like null.) > > Then we should profile again to see if JSLock is still expensive relative to > other operations. We can probably make JSLock do a lot less work if we need > to.
Thanks, these are good points :) I will check the performance without locking in the trivial cases. Maybe we can include the stack bounds optimization on Windows as well?
peavo
Comment 51
2015-07-22 07:07:39 PDT
(In reply to
comment #47
)
> > We can fix (1) by not taking the JSLock just to answer trivial type checking > questions. (Similarly, we probably should not take the JSLock just to create > trivial types like null.) >
I tested removing locking in the trivial cases, and the sprite count increased from ~3000 to ~4400 :) (without changing the rest of the patch) Profiling now shows that JSValueToNumber and JSValueToObject is spending ~20% of the total time taking and releasing the lock. I assume we need to lock in these cases?
Geoffrey Garen
Comment 52
2015-07-22 11:19:23 PDT
> Profiling now shows that JSValueToNumber and JSValueToObject is spending > ~20% of the total time taking and releasing the lock. I assume we need to > lock in these cases?
There's no need to lock if the value is already a number or an object. Perhaps that optimization would also be worthwhile.
peavo
Comment 53
2015-07-22 11:28:11 PDT
Hmm, I noticed that std::mutex::lock() and std::mutex::unlock() spent about 20% of the total time, and tried replacing the std::mutex in JSLock with WTF::Mutex. WTF::Mutex::lock() and WTF::Mutex::unlock() then spent about 10% of the total time, and the sprite count went from ~4400 to ~6000. So it seems the MSVC implementation of std::mutex is not as fast as it could be.
Anders Carlsson
Comment 54
2015-07-22 11:29:33 PDT
(In reply to
comment #53
)
> Hmm, I noticed that std::mutex::lock() and std::mutex::unlock() spent about > 20% of the total time, and tried replacing the std::mutex in JSLock with > WTF::Mutex. WTF::Mutex::lock() and WTF::Mutex::unlock() then spent about 10% > of the total time, and the sprite count went from ~4400 to ~6000. So it > seems the MSVC implementation of std::mutex is not as fast as it could be.
Again, are you testing with VS2015?
peavo
Comment 55
2015-07-22 11:31:53 PDT
(In reply to
comment #54
)
> (In reply to
comment #53
) > > Hmm, I noticed that std::mutex::lock() and std::mutex::unlock() spent about > > 20% of the total time, and tried replacing the std::mutex in JSLock with > > WTF::Mutex. WTF::Mutex::lock() and WTF::Mutex::unlock() then spent about 10% > > of the total time, and the sprite count went from ~4400 to ~6000. So it > > seems the MSVC implementation of std::mutex is not as fast as it could be. > > Again, are you testing with VS2015?
Not yet, I will test there as well :)
peavo
Comment 56
2015-07-23 05:23:35 PDT
Created
attachment 257349
[details]
Patch
peavo
Comment 57
2015-07-23 05:26:28 PDT
(In reply to
comment #56
)
> Created
attachment 257349
[details]
> Patch
There is no need to review this patch. It is just for testing in case ewmailing wants to test if we are getting closer to Lua in performance. I will remove the WTF::ThreadIdentifier and WTF::Mutex changes in the next patch.
peavo
Comment 58
2015-07-23 06:12:09 PDT
Created
attachment 257350
[details]
Patch
Brent Fulgham
Comment 59
2015-08-13 08:56:12 PDT
Now that we've cut over to Visual Studio 2015, do we still need any of these changes?
peavo
Comment 60
2015-08-13 09:26:19 PDT
(In reply to
comment #59
)
> Now that we've cut over to Visual Studio 2015, do we still need any of these > changes?
I believe the changes related to the reduced use of locks, and the stack bounds optimization (by checking current fiber), are still valid.
peavo
Comment 61
2015-08-13 10:36:44 PDT
(In reply to
comment #60
)
> (In reply to
comment #59
) > > Now that we've cut over to Visual Studio 2015, do we still need any of these > > changes? > > I believe the changes related to the reduced use of locks, and the stack > bounds optimization (by checking current fiber), are still valid.
I guess locking will spend some time waiting for the lock in some cases, so removing unneeded locking should be a good thing. Also, updateStackLimit() seems to be called many times from other places than the api locking, so I believe it makes sense to check whether a new fiber is running. Please correct me if I am wrong :)
Darin Adler
Comment 62
2016-03-09 09:07:20 PST
Geoff, could you review and either do review+ or review- on this?
Brady Eidson
Comment 63
2017-04-24 19:11:09 PDT
Comment on
attachment 257350
[details]
Patch This patch has been pending review since 2015 with no recent activity. It seems unlikely that it would even still apply to trunk in its current form. Clearing from the review queue. Feel free to update and resubmit if the patch is still relevant.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug