Bug 146448

Summary: JavaScriptCore performance is very bad on Windows
Product: WebKit Reporter: ewmailing
Component: JavaScriptCoreAssignee: Nobody <webkit-unassigned>
Status: NEW    
Severity: Major CC: achristensen, andersca, beidson, benjamin, bfulgham, cmarcelo, commit-queue, darin, ddkilzer, fpizlo, ggaren, hyungwook.lee, mark.lam, mark.salisbury, msaboff, peavo, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: 528+ (Nightly build)   
Hardware: PC   
OS: Windows 7   
Bug Depends on:    
Bug Blocks: 129105    
Attachments:
Description Flags
How to build JSCore for Windows
none
screen capture of Visual Studio profilers
none
Instruments profile for OS X
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Screenshot of profiling.
none
Patch
none
Patch beidson: review-

ewmailing
Reported 2015-06-29 19:57:02 PDT
Created attachment 255809 [details] How to build JSCore for Windows I’ve been trying to use JavaScriptCore as an embedded scripting language for a video game engine. I’ve been using it on multiple platforms. During my testing, I’ve discovered that the JavaScriptCore Windows performance is very slow. I wanted to alert you in hopes there is something that can be done to fix it. I’ve been getting help on building JavaScriptCore for Windows from a few people in the community. My current build is as of a couple of weeks ago. However, I’ve been seeing the same performance characteristics from a build from a little over a year ago. To prove to you that this is a problem to just JavaScriptCore on Windows, and not a general Windows problem, nor a problem with my code, I have done the following: - I ran the program on Windows, OS X, Linux, iOS, Android, and Raspberry Pi (Linux). - I wrote comparable programs in C and Lua and also ran these on all platforms. - Additionally, I was able to run on the same hardware (dual boot) for Windows/Linux. So here is the quick summary: Ubuntu 12.04 LTS (64-bit) Windows 8.1 (64-bit) =========================================================== C 5600 sprites 27000 sprites Lua 3100 sprites 12000 sprites JSCore 2400 sprites 120 sprites This was on an Intel Core i5-3570K @ 3.40 GHz with Intel 4000 HD graphics. The reason for the big jump in Windows performance is mostly due to the Intel graphics drivers being poor on Linux. However, that 120 sprites is not a typo. Whereas everything else was faster on Windows in this case, JSCore is a magnitude slower. My numbers for the other operating systems tend to follow sensible trends. Windows/JSCore is an outlier. And other programs I’ve run also show JavaScriptCore on Windows has performance problems. You can see this summary with the programs running in a short video I put together and uploaded to YouTube: https://youtu.be/UDyI1cckDJI I have uploaded pre-built binaries of my test program. JavaScriptCore is dynamically linked in all cases, so it is easy to swap in another build of JSCore if you want to try things. There is a file called config.txt in each bundle that lets you change the number of sprites. (I did not sign the OS X binaries with my Developer ID because I figured you might want to change the config.txt.) Also, the Lua and JavaScript scripts are in plain text and can be easily modified. And for clarity: All my test programs here are 64-bit versions. I have not actually benchmarked 32-bit except on ARM devices. Windows 7 and up (x64) Zip: http://playcontrol.net/tempdownload/WebKit/MinimalSDLSpriteBenchmarkC.zip http://playcontrol.net/tempdownload/WebKit/MinimalSDLSpriteBenchmarkLua.zip http://playcontrol.net/tempdownload/WebKit/MinimalSDLSpriteBenchmarkJavaScript.zip Installer: http://playcontrol.net/tempdownload/WebKit/MinimalSDLSpriteBenchmarkC-1.0.0-win64.exe http://playcontrol.net/tempdownload/WebKit/MinimalSDLSpriteBenchmarkLua-1.0.0-win64.exe http://playcontrol.net/tempdownload/WebKit/MinimalSDLSpriteBenchmarkJavaScript-1.0.0-win64.exe OS X 10.9 and up http://playcontrol.net/tempdownload/WebKit/MinimalSDLSpriteBenchmarkC.app.zip http://playcontrol.net/tempdownload/WebKit/MinimalSDLSpriteBenchmarkLua.app.zip http://playcontrol.net/tempdownload/WebKit/MinimalSDLSpriteBenchmarkJavaScript.app.zip Linux (x64 Ubuntu 12.04 LTS a.k.a Steam Runtime) http://playcontrol.net/tempdownload/WebKit/MinimalSDLSpriteBenchmarkC-0.1.1-Linux.tar.gz http://playcontrol.net/tempdownload/WebKit/MinimalSDLSpriteBenchmarkLua-0.1.1-Linux.tar.gz http://playcontrol.net/tempdownload/WebKit/MinimalSDLSpriteBenchmarkJavaScript-0.1.1-Linux.tar.gz I also made the repos to my programs available: https://bitbucket.org/blurrr/minimalspritebenchmarkc https://bitbucket.org/blurrr/minimalspritebenchmarklua https://bitbucket.org/blurrr/minimalspritebenchmarkjavascript However, it probably won’t be of much use without my SDK because without it, you’ll need to compile all the dependencies yourself and figure out the build system. If you need a copy of my SDK, let me know. I'm also attaching my write up of how to build JavaScriptCore for Windows (64-bit). The existing instructions had problems and with the help of a few in the community, I was able to get through it and the document is what I came up with.
Attachments
How to build JSCore for Windows (503.63 KB, application/zip)
2015-06-29 19:57 PDT, ewmailing
no flags
screen capture of Visual Studio profilers (281.20 KB, image/jpeg)
2015-06-30 10:21 PDT, ewmailing
no flags
Instruments profile for OS X (508.98 KB, image/png)
2015-06-30 11:12 PDT, ewmailing
no flags
Patch (3.90 KB, patch)
2015-07-14 13:17 PDT, peavo
no flags
Patch (10.22 KB, patch)
2015-07-15 05:42 PDT, peavo
no flags
Patch (9.72 KB, patch)
2015-07-15 06:09 PDT, peavo
no flags
Patch (9.19 KB, patch)
2015-07-16 03:05 PDT, peavo
no flags
Patch (9.17 KB, patch)
2015-07-16 10:57 PDT, peavo
no flags
Screenshot of profiling. (57.89 KB, image/png)
2015-07-17 05:21 PDT, peavo
no flags
Patch (14.00 KB, patch)
2015-07-23 05:23 PDT, peavo
no flags
Patch (8.01 KB, patch)
2015-07-23 06:12 PDT, peavo
beidson: review-
Brent Fulgham
Comment 1 2015-06-29 21:52:44 PDT
You mentioned Ubuntu and Windows, and indicated you had run on OS X as well, but I don't see OS X results. Would you mind showing those as well?
ewmailing
Comment 2 2015-06-29 22:33:44 PDT
I excluded the OS X results because my Mac is a different machine with different hardware so it is harder to do direct comparisons and I didn't want to confuse anybody by the results. But the C vs. Lua vs. JavaScriptCore results are still useful for extrapolation. So on an 21.5" iMac Core i3 Mid 2010 @ 3.2 GHz with 16GB of RAM and ATI Radeon HD 5670 OS X 10.10.3 ========================== C 15000 sprites Lua 5500 sprites JSCore 3200 sprites Just a reminder, I did provide an OS X build in my links so you can yourself. So if you have a triple boot machine, you can run all 3 to get more controlled results.
Brent Fulgham
Comment 3 2015-06-29 22:34:42 PDT
peavo
Comment 4 2015-06-30 00:51:35 PDT
Maybe JIT in JSC is not enabled? It should be enabled by default, though.
peavo
Comment 5 2015-06-30 01:07:41 PDT
Can you do some profiling to try to determine where time is spent?
peavo
Comment 6 2015-06-30 02:45:24 PDT
I just built WebKit for Win64, and tested WinLauncher and IE 11 on SunSpider on dromaeo.com. WinLauncher makes 1108.84runs/s (http://dromaeo.com/?id=238774) and IE 11 makes 1096.88runs/s (http://dromaeo.com/?id=238775), so it seems performance is good on SunSpider at least.
Hyungwook Lee
Comment 7 2015-06-30 06:16:29 PDT
In my investigation, Windows port looks 2x slower than Ubuntu EFL port. WebKit Windows port 32bit Release: 442.4ms (SunSpider 1.0.2) WebKit Windows port 64bit Release: Not yet tested (SunSpider 1.0.2) WebKit2 Ubuntu EFL port 64bit Release: 164.1ms (SunSpider 1.0.2) I will check whether JavaScriptCore FTL makes this difference results or not.
Timothy Hatcher
Comment 8 2015-06-30 09:19:54 PDT
Timothy Hatcher
Comment 9 2015-06-30 09:21:26 PDT
ewmailing
Comment 10 2015-06-30 10:12:42 PDT
(In reply to comment #4) > Maybe JIT in JSC is not enabled? It should be enabled by default, though. So I don't think this is a JIT specific issue for several reasons. - The number I'm seeing on Windows is way too slow, even without JIT - Remember that Lua (official PUC-Rio) has no JIT at all. (That is one reason I included Lua as a reference point.) - I didn't do a full suite, but I did some spot checks with LuaJIT about a year ago. (Lua 5.3 was still in alpha so it's changed a little bit since then and probably better today.) iMac 3.2GHz Intel Core i3 (OS X 10.9.2) @60fps (some variance, hard to reproduce sometimes due to background processes, etc) C - 16000 sprites Lua 5.3w1 - 5400 sprites JavaScriptCore - 3300 sprites LuaJIT - 9000 sprites On an Intel 3570k (i5) with HD 4000 (3.4GHz): SteamOS (beta from a year ago): C - 5700 sprites Lua 5.3w1 - 2700 sprites JavaScriptCore - 3200 sprites LuaJIT - 3600 sprites Notice that LuaJIT performance really climbs up. I did not do any LuaJIT specific optimizations. Also notice the SteamOS benchmark where JavaScriptCore beats regular Lua. This is not happening on Mac so it got me thinking that JavaScriptCore on Mac does not enable JIT for Mac Apps. - Here is an old benchmark I did on an iPad Mini 1st gen. iPad mini (1st gen, iOS 7.1) @60fps C - 750 sprites Lua 400 sprites (double/long long) and 430 for float/int JavaScriptCore 200 sprites LuJIT 430 sprites I know iOS doesn't enable JIT for JavaScriptCore. JIT also must be disabled on LuaJIT for iOS so it returns to regular Lua-like performance. But notice that JavaScriptCore on a 1st gen iPad mini with no JIT is still beating Windows desktop. So I expect JavaScriptCore, even without JIT to do a lot better than I see on Windows. P.S. You said JIT should be compiled in by default. What are the compiler flags to look for? I'm compiling Android by hand and I don't know how to verify if JIT is on or not. My Android numbers are slightly better than iOS, but I think that's because my Android hardware is better than my iOS hardware and not because of JIT.
ewmailing
Comment 11 2015-06-30 10:21:31 PDT
Created attachment 255828 [details] screen capture of Visual Studio profilers I am not knowledgable on how to use the Visual Studio profiler at all. The profile shows that the JavaScript binding to SDL is where most of the time is spent. But this is totally expected. The question is why is Windows 100 times slower overall? The language bindings are exactly the same for all platforms... public JavaScriptCore C API.
peavo
Comment 12 2015-06-30 10:47:08 PDT
(In reply to comment #10) > (In reply to comment #4) > > P.S. You said JIT should be compiled in by default. What are the compiler > flags to look for? I'm compiling Android by hand and I don't know how to > verify if JIT is on or not. My Android numbers are slightly better than iOS, > but I think that's because my Android hardware is better than my iOS > hardware and not because of JIT. ENABLE_JIT in WTF/Platform.h. Also ENABLE_CONCURRENT_JIT (in WTF/Platform.h) is not enabled on Windows, I don't know if that can have any impact on your tests.
peavo
Comment 13 2015-06-30 10:49:18 PDT
(In reply to comment #11) > Created attachment 255828 [details] > screen capture of Visual Studio profilers > > I am not knowledgable on how to use the Visual Studio profiler at all. > The profile shows that the JavaScript binding to SDL is where most of the > time is spent. But this is totally expected. The question is why is Windows > 100 times slower overall? The language bindings are exactly the same for all > platforms... public JavaScriptCore C API. Maybe you can do similar profiling on OSX, and see if you spot any big differences?
ewmailing
Comment 14 2015-06-30 11:12:09 PDT
Created attachment 255833 [details] Instruments profile for OS X Here is the Instruments profile on OS X. Nothing really jumps out at me. Perhaps you can see something?
Hyungwook Lee
Comment 15 2015-07-07 20:17:52 PDT
I've disabled ENABLE_CONCURRENT_JIT in WebKit2 EFL port and measured Sun Spider 1.0.2 test as following. WebKit Windows port 32bit Release: 442.4ms (SunSpider 1.0.2) WebKit Windows port 64bit Release: Not yet tested (SunSpider 1.0.2) WebKit2 Ubuntu EFL port 64bit Release: 164.1ms (SunSpider 1.0.2) WebKit2 Ubuntu EFL port 64bit Reelase: 223.0ms (Disabled CONCURRENT_JIT) I think ENABLE_CONCURRENT_JIT make performance improvement but it is not major reason for this issue. Additionally, WebKit2 EFL port doesn't use FTL as below. #if PLATFORM(GTK) && HAVE(LLVM) && ENABLE(JIT) && !defined(ENABLE_FTL_JIT) && CPU(X86_64) #define ENABLE_FTL_JIT 1 #endif It looks current issue doesn't related to JIT and it's related stuff.
peavo
Comment 16 2015-07-14 13:17:10 PDT
WebKit Commit Bot
Comment 17 2015-07-14 13:19:34 PDT
Attachment 256787 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/runtime/JSLock.cpp:35: _Thrd_current is incorrectly named. Don't use underscores in your identifier names. [readability/naming/underscores] [4] Total errors found: 1 in 5 files If any of these errors are false positives, please file a bug against check-webkit-style.
peavo
Comment 18 2015-07-14 13:20:58 PDT
(In reply to comment #14) > Created attachment 255833 [details] > Instruments profile for OS X > > Here is the Instruments profile on OS X. Nothing really jumps out at me. > Perhaps you can see something? Could you try building with this patch, and check the performance?
peavo
Comment 19 2015-07-15 05:42:28 PDT
peavo
Comment 20 2015-07-15 06:09:20 PDT
ewmailing
Comment 21 2015-07-15 07:52:30 PDT
I applied the patch from attachment 256835 [details]. That was a huge improvement in performance. My system is has a bunch of stuff open right now so I can't run a clean benchmark, but JSCore has gone from 120 sprites to about 4100. For comparison, the Lua benchmark currently runs at about 10000 sprites. So this is a terrific fix. I'm still wondering if there is something more that can be done. The differential here between Lua and JavaScriptCore still seems a little wider than the other platforms. But thank you. This finally makes JavaScriptCore usable on Windows.
Alex Christensen
Comment 22 2015-07-15 10:38:35 PDT
This looks good to me. Could someone who works more on JavaScriptCore review this?
Mark Lam
Comment 23 2015-07-15 10:45:15 PDT
Comment on attachment 256835 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=256835&action=review > Source/JavaScriptCore/runtime/JSLock.h:84 > +class ThreadId { My first instinct is to question whether we should be creating a new abstraction here. We already have a WTF::ThreadIdentifier in WTF/wtf/Threading.h. Can we use that instead? I’ll look closer.
peavo
Comment 24 2015-07-16 03:05:57 PDT
peavo
Comment 25 2015-07-16 03:48:58 PDT
(In reply to comment #23) > Comment on attachment 256835 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=256835&action=review > > > Source/JavaScriptCore/runtime/JSLock.h:84 > > +class ThreadId { > > My first instinct is to question whether we should be creating a new > abstraction here. We already have a WTF::ThreadIdentifier in > WTF/wtf/Threading.h. Can we use that instead? I’ll look closer. Thanks for reviewing :) Updated patch.
Mark Lam
Comment 26 2015-07-16 09:11:38 PDT
Comment on attachment 256896 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=256896&action=review Per, there are 2 changes here: 1. Replacing use of std::thread::id with WTF::ThreadIdentifier. 2. Optimizing out the computation of stack bounds when still in the same Windows fiber. Can you (or someone) please confirm that (1) actually contributes to the performance difference you’re seeing? These 2 changes should be tested separately to measure their effects. > Source/JavaScriptCore/runtime/JSLock.cpp:121 > + m_ownerThreadID = WTF::currentThread(); No need to use the WTF::. The convention for WTF APIs is that their header files already provide "using” declarations. We can just use an unqualified currentThread() here. > Source/JavaScriptCore/runtime/JSLock.cpp:200 > + ASSERT(!m_hasExclusiveThread || (exclusiveThread() == WTF::currentThread())); Ditto. No need for WTF:: qualifier. > Source/JavaScriptCore/runtime/JSLock.cpp:203 > + return m_ownerThreadID == WTF::currentThread(); Ditto. > Source/JavaScriptCore/runtime/JSLock.cpp:210 > + ASSERT(exclusiveThread() == WTF::currentThread()); Ditto. > Source/WebCore/bindings/js/JSDOMWindowBase.cpp:206 > + vm->setExclusiveThread(WTF::currentThread()); Ditto. No need to WTF:: qualifier.
Anders Carlsson
Comment 27 2015-07-16 10:28:27 PDT
I don't think we should do this. If std::thread::id is slow on Windows then that's a MSVC bug and should be reported through appropriate channels.
Alex Christensen
Comment 28 2015-07-16 10:43:22 PDT
(In reply to comment #27) > I don't think we should do this. If std::thread::id is slow on Windows then > that's a MSVC bug and should be reported through appropriate channels. That is true, but we've worked around bugs in MSVC before, and I think a 35x speed improvement is worth it.
Anders Carlsson
Comment 29 2015-07-16 10:45:21 PDT
(In reply to comment #28) > (In reply to comment #27) > > I don't think we should do this. If std::thread::id is slow on Windows then > > that's a MSVC bug and should be reported through appropriate channels. > That is true, but we've worked around bugs in MSVC before, and I think a 35x > speed improvement is worth it. We have, but this would be a very invasive change. Do we know if this bug exists in 2015?
Mark Lam
Comment 30 2015-07-16 10:46:23 PDT
(In reply to comment #28) > ... I think a 35x speed improvement is worth it. First, let's make sure that it the replacement of std::thread::id with ThreadIdentifier is the really the reason for the 35x speed up. There are 2 changes in this patch. We need to have them tested separately for their effect.
peavo
Comment 31 2015-07-16 10:57:28 PDT
peavo
Comment 32 2015-07-16 11:04:47 PDT
(In reply to comment #26) > Comment on attachment 256896 [details] > Patch Thanks again for reviewing :) > > View in context: > https://bugs.webkit.org/attachment.cgi?id=256896&action=review > > Per, there are 2 changes here: > 1. Replacing use of std::thread::id with WTF::ThreadIdentifier. > 2. Optimizing out the computation of stack bounds when still in the same > Windows fiber. > > Can you (or someone) please confirm that (1) actually contributes to the > performance difference you’re seeing? These 2 changes should be tested > separately to measure their effects. > I first made change 2. The sprite count in this benchmark then went from 120 to ~300. Next I made change 1. The sprite count then increased from 300 to ~3000.
peavo
Comment 33 2015-07-16 11:11:51 PDT
Below is the MSVC implementation of _Thrd_current(), which is called by std::this_thread::get_id(). It's the DuplicateHandle() call which is time consuming according to the profiler. The code for _WIN32_WCE just calls GetCurrentThreadId(), which is faster. This is basically what we do with this patch. _Thrd_t _Thrd_current(void) { /* return _Thrd_t identifying current thread */ _Thrd_t thr; #ifdef _WIN32_WCE thr._Hnd = (HANDLE)GetCurrentThreadId(); #elif defined(_CRT_APP) && !defined(_KERNELX) thr._Hnd = (HANDLE)__crtGetCurrentWinRTThread(); #else /* _WIN32_WCE */ if (DuplicateHandle(GetCurrentProcess(), GetCurrentThread(), GetCurrentProcess(), &thr._Hnd, DUPLICATE_SAME_ACCESS, TRUE, 0)) CloseHandle(thr._Hnd); else thr._Hnd = 0; #endif /* _WIN32_WCE */ #if defined(_CRT_APP) && !defined(_KERNELX) thr._Id = __crtGetCurrentWinRTThreadId(); #else /* defined(_CRT_APP) && !defined(_KERNELX) */ thr._Id = GetCurrentThreadId(); #endif /* defined(_CRT_APP) && !defined(_KERNELX) */ return (thr); }
Mark Lam
Comment 34 2015-07-16 21:23:16 PDT
Per, ewmailing, do you have access to VS2015? Can you confirm if the issue still manifests on the latest VS?
peavo
Comment 35 2015-07-17 05:21:06 PDT
Created attachment 256968 [details] Screenshot of profiling. This attachment shows a summary of the profiling of the sprite benchmark with Visual Studio. The _Thrd_current function is called by std::this_thread::get_id(), which is called via JSLock::lock(). The VirtualQuery function is called via VM::updateStackLimit. These are the two issues addressed in this patch.
peavo
Comment 36 2015-07-17 05:24:35 PDT
(In reply to comment #34) > Per, ewmailing, do you have access to VS2015? Can you confirm if the issue > still manifests on the latest VS? I checked the source code of _Thrd_current in VS2015 RC, and the issues has not been fixed there, according to the source: _Thrd_t _Thrd_current(void) { /* return _Thrd_t identifying current thread */ _Thrd_t thr; #ifdef _WIN32_WCE thr._Hnd = (HANDLE)GetCurrentThreadId(); #else /* _WIN32_WCE */ if (DuplicateHandle(GetCurrentProcess(), GetCurrentThread(), GetCurrentProcess(), &thr._Hnd, DUPLICATE_SAME_ACCESS, TRUE, 0)) CloseHandle(thr._Hnd); else thr._Hnd = 0; #endif /* _WIN32_WCE */ thr._Id = GetCurrentThreadId(); return (thr); }
Mark Lam
Comment 37 2015-07-17 14:18:07 PDT
Comment on attachment 256913 [details] Patch r=me Per, I think you should also file a bug against Windows / MSVC and add the url of the MS bug to this bugzilla (and preferably in the ChangeLog as well before landing). Thanks for investigating this issue.
peavo
Comment 38 2015-07-17 14:25:37 PDT
(In reply to comment #37) > Comment on attachment 256913 [details] > Patch > > r=me > > Per, I think you should also file a bug against Windows / MSVC and add the > url of the MS bug to this bugzilla (and preferably in the ChangeLog as well > before landing). Thanks for investigating this issue. Thanks! I will file a bug against Windows/MSVC.
peavo
Comment 39 2015-07-20 01:15:55 PDT
peavo
Comment 40 2015-07-20 02:25:32 PDT
peavo
Comment 41 2015-07-20 03:12:06 PDT
(In reply to comment #40) > Committed r187020: <http://trac.webkit.org/changeset/187020> This commit have broken the builds: webcore\bindings\js\JSDOMWindowBase.cpp(206): error C2664: 'void JSC::VM::setExclusiveThread(std::thread::id)' : cannot convert argument 1 from 'WTF::ThreadIdentifier' to 'std::thread::id' (..\bindings\js\JSBindingsAllInOne.cpp) [C:\cygwin\home\buildbot\slave\win-debug\build\Source\WebCore\WebCore.vcxproj\WebCore.vcxproj] No constructor could take the source type, or constructor overload resolution was ambiguous But as far as I can see from VM.h, setExclusiveThread has the signature void setExclusiveThread(ThreadIdentifier threadId). I don't quite understand this ...
peavo
Comment 42 2015-07-20 05:57:15 PDT
After an unreviewed debug build fix in http://trac.webkit.org/changeset/187021, the builds are looking better, but this patch has caused JSC api tests to fail for the CLoop builds: ASSERTION FAILED: !identifierByPthreadHandle(pthreadHandle) /Volumes/Data/slave/mavericks-cloop-debug/build/Source/WTF/wtf/ThreadingPthreads.cpp(154) : ThreadIdentifier WTF::establishIdentifierForPthreadHandle(const pthread_t &) 1 0x1078e1510 WTFCrash 2 0x10793cac7 WTF::establishIdentifierForPthreadHandle(_opaque_pthread_t* const&) 3 0x10793c730 WTF::createThreadInternal(void (*)(void*), void*, char const*) 4 0x10793af57 WTF::createThread(char const*, std::__1::function<void ()>) 5 0x10793b0b5 WTF::createThread(void (*)(void*), void*, char const*) 6 0x107446317 JSC::GCThreadSharedData::GCThreadSharedData(JSC::VM*) 7 0x107445c7d JSC::GCThreadSharedData::GCThreadSharedData(JSC::VM*) 8 0x107451a44 JSC::Heap::Heap(JSC::VM*, JSC::HeapType) 9 0x1074517b3 JSC::Heap::Heap(JSC::VM*, JSC::HeapType) 10 0x10788b80f JSC::VM::VM(JSC::VM::VMType, JSC::HeapType) 11 0x10788b731 JSC::VM::VM(JSC::VM::VMType, JSC::HeapType) 12 0x10788ea0c JSC::VM::createContextGroup(JSC::HeapType) 13 0x107554ceb JSContextGroupCreate 14 0x107647885 -[JSVirtualMachine init] 15 0x10755356d -[JSContext init] 16 0x1072739dd currentThisInsideBlockGetterTest() 17 0x107294888 testObjectiveCAPIMain() 18 0x1072859e4 testObjectiveCAPI 19 0x10727bec3 main 20 0x7fff9439c5fd start
peavo
Comment 43 2015-07-20 06:44:19 PDT
Correction: it seems the JSC api tests are failing for all Debug builds. The reason for Debug-only failures might be caused by the assert in MachineStackMarker.cpp, line 287: ASSERT(!m_heap->vm()->hasExclusiveThread() || m_heap->vm()->exclusiveThread() == currentThread());
David Kilzer (:ddkilzer)
Comment 44 2015-07-20 08:17:09 PDT
peavo
Comment 45 2015-07-20 08:35:30 PDT
I added a possible fix for the JSC api test failures in https://bugs.webkit.org/show_bug.cgi?id=147110, but it might be better to revert the two commits related to this bug, and include the fix in this bug?
Mark Lam
Comment 46 2015-07-20 11:15:59 PDT
Geoffrey Garen
Comment 47 2015-07-21 16:23:27 PDT
I think it's wrong to focus on re-architecting thread id and stack bounds here. You'll notice that re-architecting thread id and stack bounds did not make JavaScriptCore competitive with Lua. That's a sure sign that it did not fix the root problem. The profile shows two things: (1) We're taking the JSLock just to answer the "is null?" question; (2) Taking the JSLock does a lot of work. We can fix (1) by not taking the JSLock just to answer trivial type checking questions. (Similarly, we probably should not take the JSLock just to create trivial types like null.) Then we should profile again to see if JSLock is still expensive relative to other operations. We can probably make JSLock do a lot less work if we need to.
Anders Carlsson
Comment 48 2015-07-21 16:30:05 PDT
FWIW, I looked at the MSVC runtime library in VS2015 and the performance problem with std::this_thread::get_id() has been fixed.
peavo
Comment 49 2015-07-22 05:58:39 PDT
(In reply to comment #48) > FWIW, I looked at the MSVC runtime library in VS2015 and the performance > problem with std::this_thread::get_id() has been fixed. Thanks for looking into this :) Yes, you are right, I unfortunately only checked the source of the _Thrd_current function in VS2015 RC, but not the performance of the std::this_thread::get_id() function itself. I will update the patch.
peavo
Comment 50 2015-07-22 06:11:53 PDT
(In reply to comment #47) > I think it's wrong to focus on re-architecting thread id and stack bounds > here. > > You'll notice that re-architecting thread id and stack bounds did not make > JavaScriptCore competitive with Lua. That's a sure sign that it did not fix > the root problem. > > The profile shows two things: > > (1) We're taking the JSLock just to answer the "is null?" question; > > (2) Taking the JSLock does a lot of work. > > We can fix (1) by not taking the JSLock just to answer trivial type checking > questions. (Similarly, we probably should not take the JSLock just to create > trivial types like null.) > > Then we should profile again to see if JSLock is still expensive relative to > other operations. We can probably make JSLock do a lot less work if we need > to. Thanks, these are good points :) I will check the performance without locking in the trivial cases. Maybe we can include the stack bounds optimization on Windows as well?
peavo
Comment 51 2015-07-22 07:07:39 PDT
(In reply to comment #47) > > We can fix (1) by not taking the JSLock just to answer trivial type checking > questions. (Similarly, we probably should not take the JSLock just to create > trivial types like null.) > I tested removing locking in the trivial cases, and the sprite count increased from ~3000 to ~4400 :) (without changing the rest of the patch) Profiling now shows that JSValueToNumber and JSValueToObject is spending ~20% of the total time taking and releasing the lock. I assume we need to lock in these cases?
Geoffrey Garen
Comment 52 2015-07-22 11:19:23 PDT
> Profiling now shows that JSValueToNumber and JSValueToObject is spending > ~20% of the total time taking and releasing the lock. I assume we need to > lock in these cases? There's no need to lock if the value is already a number or an object. Perhaps that optimization would also be worthwhile.
peavo
Comment 53 2015-07-22 11:28:11 PDT
Hmm, I noticed that std::mutex::lock() and std::mutex::unlock() spent about 20% of the total time, and tried replacing the std::mutex in JSLock with WTF::Mutex. WTF::Mutex::lock() and WTF::Mutex::unlock() then spent about 10% of the total time, and the sprite count went from ~4400 to ~6000. So it seems the MSVC implementation of std::mutex is not as fast as it could be.
Anders Carlsson
Comment 54 2015-07-22 11:29:33 PDT
(In reply to comment #53) > Hmm, I noticed that std::mutex::lock() and std::mutex::unlock() spent about > 20% of the total time, and tried replacing the std::mutex in JSLock with > WTF::Mutex. WTF::Mutex::lock() and WTF::Mutex::unlock() then spent about 10% > of the total time, and the sprite count went from ~4400 to ~6000. So it > seems the MSVC implementation of std::mutex is not as fast as it could be. Again, are you testing with VS2015?
peavo
Comment 55 2015-07-22 11:31:53 PDT
(In reply to comment #54) > (In reply to comment #53) > > Hmm, I noticed that std::mutex::lock() and std::mutex::unlock() spent about > > 20% of the total time, and tried replacing the std::mutex in JSLock with > > WTF::Mutex. WTF::Mutex::lock() and WTF::Mutex::unlock() then spent about 10% > > of the total time, and the sprite count went from ~4400 to ~6000. So it > > seems the MSVC implementation of std::mutex is not as fast as it could be. > > Again, are you testing with VS2015? Not yet, I will test there as well :)
peavo
Comment 56 2015-07-23 05:23:35 PDT
peavo
Comment 57 2015-07-23 05:26:28 PDT
(In reply to comment #56) > Created attachment 257349 [details] > Patch There is no need to review this patch. It is just for testing in case ewmailing wants to test if we are getting closer to Lua in performance. I will remove the WTF::ThreadIdentifier and WTF::Mutex changes in the next patch.
peavo
Comment 58 2015-07-23 06:12:09 PDT
Brent Fulgham
Comment 59 2015-08-13 08:56:12 PDT
Now that we've cut over to Visual Studio 2015, do we still need any of these changes?
peavo
Comment 60 2015-08-13 09:26:19 PDT
(In reply to comment #59) > Now that we've cut over to Visual Studio 2015, do we still need any of these > changes? I believe the changes related to the reduced use of locks, and the stack bounds optimization (by checking current fiber), are still valid.
peavo
Comment 61 2015-08-13 10:36:44 PDT
(In reply to comment #60) > (In reply to comment #59) > > Now that we've cut over to Visual Studio 2015, do we still need any of these > > changes? > > I believe the changes related to the reduced use of locks, and the stack > bounds optimization (by checking current fiber), are still valid. I guess locking will spend some time waiting for the lock in some cases, so removing unneeded locking should be a good thing. Also, updateStackLimit() seems to be called many times from other places than the api locking, so I believe it makes sense to check whether a new fiber is running. Please correct me if I am wrong :)
Darin Adler
Comment 62 2016-03-09 09:07:20 PST
Geoff, could you review and either do review+ or review- on this?
Brady Eidson
Comment 63 2017-04-24 19:11:09 PDT
Comment on attachment 257350 [details] Patch This patch has been pending review since 2015 with no recent activity. It seems unlikely that it would even still apply to trunk in its current form. Clearing from the review queue. Feel free to update and resubmit if the patch is still relevant.
Note You need to log in before you can comment on or make changes to this bug.