Bug 146448 - JavaScriptCore performance is very bad on Windows
Summary: JavaScriptCore performance is very bad on Windows
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: PC Windows 7
: P2 Major
Assignee: Nobody
URL:
Keywords: InRadar
Depends on:
Blocks: 129105
  Show dependency treegraph
 
Reported: 2015-06-29 19:57 PDT by ewmailing
Modified: 2017-04-24 19:11 PDT (History)
17 users (show)

See Also:


Attachments
How to build JSCore for Windows (503.63 KB, application/zip)
2015-06-29 19:57 PDT, ewmailing
no flags Details
screen capture of Visual Studio profilers (281.20 KB, image/jpeg)
2015-06-30 10:21 PDT, ewmailing
no flags Details
Instruments profile for OS X (508.98 KB, image/png)
2015-06-30 11:12 PDT, ewmailing
no flags Details
Patch (3.90 KB, patch)
2015-07-14 13:17 PDT, peavo
no flags Details | Formatted Diff | Diff
Patch (10.22 KB, patch)
2015-07-15 05:42 PDT, peavo
no flags Details | Formatted Diff | Diff
Patch (9.72 KB, patch)
2015-07-15 06:09 PDT, peavo
no flags Details | Formatted Diff | Diff
Patch (9.19 KB, patch)
2015-07-16 03:05 PDT, peavo
no flags Details | Formatted Diff | Diff
Patch (9.17 KB, patch)
2015-07-16 10:57 PDT, peavo
no flags Details | Formatted Diff | Diff
Screenshot of profiling. (57.89 KB, image/png)
2015-07-17 05:21 PDT, peavo
no flags Details
Patch (14.00 KB, patch)
2015-07-23 05:23 PDT, peavo
no flags Details | Formatted Diff | Diff
Patch (8.01 KB, patch)
2015-07-23 06:12 PDT, peavo
beidson: review-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description ewmailing 2015-06-29 19:57:02 PDT
Created attachment 255809 [details]
How to build JSCore for Windows

I’ve been trying to use JavaScriptCore as an embedded scripting language for a video game engine. I’ve been using it on multiple platforms. During my testing, I’ve discovered that the JavaScriptCore Windows performance is very slow. I wanted to alert you in hopes there is something that can be done to fix it.


I’ve been getting help on building JavaScriptCore for Windows from a few people in the community. My current build is as of a couple of weeks ago. However, I’ve been seeing the same performance characteristics from a build from a little over a year ago.

To prove to you that this is a problem to just JavaScriptCore on Windows, and not a general Windows problem, nor a problem with my code, I have done the following:

- I ran the program on Windows, OS X, Linux, iOS, Android, and Raspberry Pi (Linux).
- I wrote comparable programs in C and Lua and also ran these on all platforms.
- Additionally, I was able to run on the same hardware (dual boot) for Windows/Linux.

So here is the quick summary:

                   Ubuntu 12.04 LTS (64-bit)                       Windows 8.1 (64-bit)
===========================================================
C                     5600 sprites                                              27000 sprites
Lua                  3100 sprites                                             12000 sprites
JSCore            2400 sprites                                              120 sprites

This was on an Intel Core i5-3570K @ 3.40 GHz with Intel 4000 HD graphics. The reason for the big jump in Windows performance is mostly due to the Intel graphics drivers being poor on Linux. However, that 120 sprites is not a typo. Whereas everything else was faster on Windows in this case, JSCore is a magnitude slower.

My numbers for the other operating systems tend to follow sensible trends. Windows/JSCore is an outlier. And other programs I’ve run also show JavaScriptCore on Windows has performance problems.

You can see this summary with the programs running in a short video I put together and uploaded to YouTube:
https://youtu.be/UDyI1cckDJI



I have uploaded pre-built binaries of my test program. JavaScriptCore is dynamically linked in all cases, so it is easy to swap in another build of JSCore if you want to try things. There is a file called config.txt in each bundle that lets you change the number of sprites. (I did not sign the OS X binaries with my Developer ID because I figured you might want to change the config.txt.) Also, the Lua and JavaScript scripts are in plain text and can be easily modified.


And for clarity: All my test programs here are 64-bit versions. I have not actually benchmarked 32-bit except on ARM devices.


Windows 7 and up (x64) 
Zip:
http://playcontrol.net/tempdownload/WebKit/MinimalSDLSpriteBenchmarkC.zip
http://playcontrol.net/tempdownload/WebKit/MinimalSDLSpriteBenchmarkLua.zip
http://playcontrol.net/tempdownload/WebKit/MinimalSDLSpriteBenchmarkJavaScript.zip
Installer:
http://playcontrol.net/tempdownload/WebKit/MinimalSDLSpriteBenchmarkC-1.0.0-win64.exe
http://playcontrol.net/tempdownload/WebKit/MinimalSDLSpriteBenchmarkLua-1.0.0-win64.exe
http://playcontrol.net/tempdownload/WebKit/MinimalSDLSpriteBenchmarkJavaScript-1.0.0-win64.exe

OS X 10.9 and up
http://playcontrol.net/tempdownload/WebKit/MinimalSDLSpriteBenchmarkC.app.zip
http://playcontrol.net/tempdownload/WebKit/MinimalSDLSpriteBenchmarkLua.app.zip
http://playcontrol.net/tempdownload/WebKit/MinimalSDLSpriteBenchmarkJavaScript.app.zip

Linux (x64 Ubuntu 12.04 LTS a.k.a Steam Runtime)
http://playcontrol.net/tempdownload/WebKit/MinimalSDLSpriteBenchmarkC-0.1.1-Linux.tar.gz
http://playcontrol.net/tempdownload/WebKit/MinimalSDLSpriteBenchmarkLua-0.1.1-Linux.tar.gz
http://playcontrol.net/tempdownload/WebKit/MinimalSDLSpriteBenchmarkJavaScript-0.1.1-Linux.tar.gz



I also made the repos to my programs available:

https://bitbucket.org/blurrr/minimalspritebenchmarkc
https://bitbucket.org/blurrr/minimalspritebenchmarklua
https://bitbucket.org/blurrr/minimalspritebenchmarkjavascript

However, it probably won’t be of much use without my SDK because without it, you’ll need to compile all the dependencies yourself and figure out the build system. If you need a copy of my SDK, let me know. 


I'm also attaching my write up of how to build JavaScriptCore for Windows (64-bit). The existing instructions had problems and with the help of a few in the community, I was able to get through it and the document is what I came up with.
Comment 1 Brent Fulgham 2015-06-29 21:52:44 PDT
You mentioned Ubuntu and Windows, and indicated you had run on OS X as well, but I don't see OS X results.

Would you mind showing those as well?
Comment 2 ewmailing 2015-06-29 22:33:44 PDT
I excluded the OS X results because my Mac is a different machine with different hardware so it is harder to do direct comparisons and I didn't want to confuse anybody by the results. But the C vs. Lua vs. JavaScriptCore results are still useful for extrapolation.

So on an 21.5" iMac Core i3 Mid 2010 @ 3.2 GHz with 16GB of RAM and ATI Radeon HD 5670

          OS X 10.10.3
==========================
C        15000 sprites
Lua       5500 sprites
JSCore    3200 sprites


Just a reminder, I did provide an OS X build in my links so you can yourself. So if you have a triple boot machine, you can run all 3 to get more controlled results.
Comment 3 Brent Fulgham 2015-06-29 22:34:42 PDT
<rdar://problem/21604141>
Comment 4 peavo 2015-06-30 00:51:35 PDT
Maybe JIT in JSC is not enabled? It should be enabled by default, though.
Comment 5 peavo 2015-06-30 01:07:41 PDT
Can you do some profiling to try to determine where time is spent?
Comment 6 peavo 2015-06-30 02:45:24 PDT
I just built WebKit for Win64, and tested WinLauncher and IE 11 on SunSpider on dromaeo.com. WinLauncher makes 1108.84runs/s (http://dromaeo.com/?id=238774) and IE 11 makes 1096.88runs/s (http://dromaeo.com/?id=238775), so it seems performance is good on SunSpider at least.
Comment 7 Hyungwook Lee 2015-06-30 06:16:29 PDT
In my investigation, Windows port looks 2x slower than Ubuntu EFL port.

WebKit Windows port 32bit Release: 442.4ms (SunSpider 1.0.2)
WebKit Windows port 64bit Release: Not yet tested (SunSpider 1.0.2)
WebKit2 Ubuntu EFL port 64bit Release: 164.1ms (SunSpider 1.0.2)

I will check whether JavaScriptCore FTL makes this difference results or not.
Comment 8 Timothy Hatcher 2015-06-30 09:19:54 PDT
radar://21604141
Comment 9 Timothy Hatcher 2015-06-30 09:21:26 PDT
rdar://problem/21604141
Comment 10 ewmailing 2015-06-30 10:12:42 PDT
(In reply to comment #4)
> Maybe JIT in JSC is not enabled? It should be enabled by default, though.

So I don't think this is a JIT specific issue for several reasons.

- The number I'm seeing on Windows is way too slow, even without JIT

- Remember that Lua (official PUC-Rio) has no JIT at all. (That is one reason I included Lua as a reference point.)

- I didn't do a full suite, but I did some spot checks with LuaJIT about a year ago. (Lua 5.3 was still in alpha so it's changed a little bit since then and probably better today.)

iMac 3.2GHz Intel Core i3 (OS X 10.9.2)
@60fps (some variance, hard to reproduce sometimes due to background processes, etc)
C - 16000 sprites
Lua 5.3w1 - 5400 sprites
JavaScriptCore - 3300 sprites
LuaJIT - 9000 sprites

On an Intel 3570k (i5) with HD 4000 (3.4GHz):
SteamOS (beta from a year ago):
C - 5700 sprites
Lua 5.3w1 - 2700 sprites
JavaScriptCore - 3200 sprites
LuaJIT - 3600 sprites


Notice that LuaJIT performance really climbs up. I did not do any LuaJIT specific optimizations. 

Also notice the SteamOS benchmark where JavaScriptCore beats regular Lua. This is not happening on Mac so it got me thinking that JavaScriptCore on Mac does not enable JIT for Mac Apps.


- Here is an old benchmark I did on an iPad Mini 1st gen. 

iPad mini (1st gen, iOS 7.1)
@60fps
C - 750 sprites 
Lua 400 sprites (double/long long) and 430 for float/int
JavaScriptCore 200 sprites
LuJIT 430 sprites

I know iOS doesn't enable JIT for JavaScriptCore. JIT also must be disabled on LuaJIT for iOS so it returns to regular Lua-like performance. But notice that JavaScriptCore on a 1st gen iPad mini with no JIT is still beating Windows desktop.


So I expect JavaScriptCore, even without JIT to do a lot better than I see on Windows.


P.S. You said JIT should be compiled in by default. What are the compiler flags to look for? I'm compiling Android by hand and I don't know how to verify if JIT is on or not. My Android numbers are slightly better than iOS, but I think that's because my Android hardware is better than my iOS hardware and not because of JIT.
Comment 11 ewmailing 2015-06-30 10:21:31 PDT
Created attachment 255828 [details]
screen capture of Visual Studio profilers

I am not knowledgable on how to use the Visual Studio profiler at all.
The profile shows that the JavaScript binding to SDL is where most of the time is spent. But this is totally expected. The question is why is Windows 100 times slower overall? The language bindings are exactly the same for all platforms... public JavaScriptCore C API.
Comment 12 peavo 2015-06-30 10:47:08 PDT
(In reply to comment #10)
> (In reply to comment #4)
> 
> P.S. You said JIT should be compiled in by default. What are the compiler
> flags to look for? I'm compiling Android by hand and I don't know how to
> verify if JIT is on or not. My Android numbers are slightly better than iOS,
> but I think that's because my Android hardware is better than my iOS
> hardware and not because of JIT.

ENABLE_JIT in WTF/Platform.h.

Also ENABLE_CONCURRENT_JIT (in WTF/Platform.h) is not enabled on Windows, I don't know if that can have any impact on your tests.
Comment 13 peavo 2015-06-30 10:49:18 PDT
(In reply to comment #11)
> Created attachment 255828 [details]
> screen capture of Visual Studio profilers
> 
> I am not knowledgable on how to use the Visual Studio profiler at all.
> The profile shows that the JavaScript binding to SDL is where most of the
> time is spent. But this is totally expected. The question is why is Windows
> 100 times slower overall? The language bindings are exactly the same for all
> platforms... public JavaScriptCore C API.

Maybe you can do similar profiling on OSX, and see if you spot any big differences?
Comment 14 ewmailing 2015-06-30 11:12:09 PDT
Created attachment 255833 [details]
Instruments profile for OS X

Here is the Instruments profile on OS X. Nothing really jumps out at me. Perhaps you can see something?
Comment 15 Hyungwook Lee 2015-07-07 20:17:52 PDT
I've disabled ENABLE_CONCURRENT_JIT in WebKit2 EFL port and measured Sun Spider 1.0.2 test as following.

WebKit Windows port 32bit Release: 442.4ms (SunSpider 1.0.2)
WebKit Windows port 64bit Release: Not yet tested (SunSpider 1.0.2)
WebKit2 Ubuntu EFL port 64bit Release: 164.1ms (SunSpider 1.0.2)
WebKit2 Ubuntu EFL port 64bit Reelase: 223.0ms (Disabled CONCURRENT_JIT)

I think ENABLE_CONCURRENT_JIT make performance improvement but it is not major reason for this issue.

Additionally, WebKit2 EFL port doesn't use FTL as below.

#if PLATFORM(GTK) && HAVE(LLVM) && ENABLE(JIT) && !defined(ENABLE_FTL_JIT) && CPU(X86_64)
#define ENABLE_FTL_JIT 1
#endif

It looks current issue doesn't related to JIT and it's related stuff.
Comment 16 peavo 2015-07-14 13:17:10 PDT
Created attachment 256787 [details]
Patch
Comment 17 WebKit Commit Bot 2015-07-14 13:19:34 PDT
Attachment 256787 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/runtime/JSLock.cpp:35:  _Thrd_current is incorrectly named. Don't use underscores in your identifier names.  [readability/naming/underscores] [4]
Total errors found: 1 in 5 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 18 peavo 2015-07-14 13:20:58 PDT
(In reply to comment #14)
> Created attachment 255833 [details]
> Instruments profile for OS X
> 
> Here is the Instruments profile on OS X. Nothing really jumps out at me.
> Perhaps you can see something?

Could you try building with this patch, and check the performance?
Comment 19 peavo 2015-07-15 05:42:28 PDT
Created attachment 256834 [details]
Patch
Comment 20 peavo 2015-07-15 06:09:20 PDT
Created attachment 256835 [details]
Patch
Comment 21 ewmailing 2015-07-15 07:52:30 PDT
I applied the patch from attachment 256835 [details].
That was a huge improvement in performance.

My system is has a bunch of stuff open right now so I can't run a clean benchmark, but JSCore has gone from 120 sprites to about 4100. For comparison, the Lua benchmark currently runs at about 10000 sprites.

So this is a terrific fix. I'm still wondering if there is something more that can be done. The differential here between Lua and JavaScriptCore still seems a little wider than the other platforms.

But thank you. This finally makes JavaScriptCore usable on Windows.
Comment 22 Alex Christensen 2015-07-15 10:38:35 PDT
This looks good to me.  Could someone who works more on JavaScriptCore review this?
Comment 23 Mark Lam 2015-07-15 10:45:15 PDT
Comment on attachment 256835 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=256835&action=review

> Source/JavaScriptCore/runtime/JSLock.h:84
> +class ThreadId {

My first instinct is to question whether we should be creating a new abstraction here.  We already have a WTF::ThreadIdentifier in WTF/wtf/Threading.h.  Can we use that instead?  I’ll look closer.
Comment 24 peavo 2015-07-16 03:05:57 PDT
Created attachment 256896 [details]
Patch
Comment 25 peavo 2015-07-16 03:48:58 PDT
(In reply to comment #23)
> Comment on attachment 256835 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=256835&action=review
> 
> > Source/JavaScriptCore/runtime/JSLock.h:84
> > +class ThreadId {
> 
> My first instinct is to question whether we should be creating a new
> abstraction here.  We already have a WTF::ThreadIdentifier in
> WTF/wtf/Threading.h.  Can we use that instead?  I’ll look closer.

Thanks for reviewing :) Updated patch.
Comment 26 Mark Lam 2015-07-16 09:11:38 PDT
Comment on attachment 256896 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=256896&action=review

Per, there are 2 changes here:
1. Replacing use of std::thread::id with WTF::ThreadIdentifier.
2. Optimizing out the computation of stack bounds when still in the same Windows fiber.

Can you (or someone) please confirm that (1) actually contributes to the performance difference you’re seeing?  These 2 changes should be tested separately to measure their effects.

> Source/JavaScriptCore/runtime/JSLock.cpp:121
> +        m_ownerThreadID = WTF::currentThread();

No need to use the WTF::.  The convention for WTF APIs is that their header files already provide "using” declarations.  We can just use an unqualified currentThread() here.

> Source/JavaScriptCore/runtime/JSLock.cpp:200
> +    ASSERT(!m_hasExclusiveThread || (exclusiveThread() == WTF::currentThread()));

Ditto.  No need for WTF:: qualifier.

> Source/JavaScriptCore/runtime/JSLock.cpp:203
> +    return m_ownerThreadID == WTF::currentThread();

Ditto.

> Source/JavaScriptCore/runtime/JSLock.cpp:210
> +        ASSERT(exclusiveThread() == WTF::currentThread());

Ditto.

> Source/WebCore/bindings/js/JSDOMWindowBase.cpp:206
> +        vm->setExclusiveThread(WTF::currentThread());

Ditto.  No need to WTF:: qualifier.
Comment 27 Anders Carlsson 2015-07-16 10:28:27 PDT
I don't think we should do this. If std::thread::id is slow on Windows then that's a MSVC bug and should be reported through appropriate channels.
Comment 28 Alex Christensen 2015-07-16 10:43:22 PDT
(In reply to comment #27)
> I don't think we should do this. If std::thread::id is slow on Windows then
> that's a MSVC bug and should be reported through appropriate channels.
That is true, but we've worked around bugs in MSVC before, and I think a 35x speed improvement is worth it.
Comment 29 Anders Carlsson 2015-07-16 10:45:21 PDT
(In reply to comment #28)
> (In reply to comment #27)
> > I don't think we should do this. If std::thread::id is slow on Windows then
> > that's a MSVC bug and should be reported through appropriate channels.
> That is true, but we've worked around bugs in MSVC before, and I think a 35x
> speed improvement is worth it.

We have, but this would be a very invasive change. Do we know if this bug exists in 2015?
Comment 30 Mark Lam 2015-07-16 10:46:23 PDT
(In reply to comment #28)
> ... I think a 35x speed improvement is worth it.

First, let's make sure that it the replacement of std::thread::id with ThreadIdentifier is the really the reason for the 35x speed up.  There are 2 changes in this patch.  We need to have them tested separately for their effect.
Comment 31 peavo 2015-07-16 10:57:28 PDT
Created attachment 256913 [details]
Patch
Comment 32 peavo 2015-07-16 11:04:47 PDT
(In reply to comment #26)
> Comment on attachment 256896 [details]
> Patch

Thanks again for reviewing :)

> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=256896&action=review
> 
> Per, there are 2 changes here:
> 1. Replacing use of std::thread::id with WTF::ThreadIdentifier.
> 2. Optimizing out the computation of stack bounds when still in the same
> Windows fiber.
> 
> Can you (or someone) please confirm that (1) actually contributes to the
> performance difference you’re seeing?  These 2 changes should be tested
> separately to measure their effects.
> 

I first made change 2. The sprite count in this benchmark then went from 120 to ~300.
Next I made change 1. The sprite count then increased from 300 to ~3000.
Comment 33 peavo 2015-07-16 11:11:51 PDT
Below is the MSVC implementation of _Thrd_current(), which is called by std::this_thread::get_id(). It's the DuplicateHandle() call which is time consuming according to the profiler. The code for _WIN32_WCE just calls GetCurrentThreadId(), which is faster. This is basically what we do with this patch.

_Thrd_t _Thrd_current(void)
{	/* return _Thrd_t identifying current thread */
	_Thrd_t thr;

#ifdef _WIN32_WCE
	thr._Hnd = (HANDLE)GetCurrentThreadId();
#elif defined(_CRT_APP) && !defined(_KERNELX)
	thr._Hnd = (HANDLE)__crtGetCurrentWinRTThread();
#else /* _WIN32_WCE */
	if (DuplicateHandle(GetCurrentProcess(), GetCurrentThread(),
		GetCurrentProcess(), &thr._Hnd,
		DUPLICATE_SAME_ACCESS, TRUE, 0))
		CloseHandle(thr._Hnd);
	else
		thr._Hnd = 0;
#endif /* _WIN32_WCE */

 #if defined(_CRT_APP) && !defined(_KERNELX)
	thr._Id = __crtGetCurrentWinRTThreadId();
 #else /* defined(_CRT_APP) && !defined(_KERNELX) */
	thr._Id = GetCurrentThreadId();
 #endif /* defined(_CRT_APP) && !defined(_KERNELX) */
	return (thr);
}
Comment 34 Mark Lam 2015-07-16 21:23:16 PDT
Per, ewmailing, do you have access to VS2015?  Can you confirm if the issue still manifests on the latest VS?
Comment 35 peavo 2015-07-17 05:21:06 PDT
Created attachment 256968 [details]
Screenshot of profiling.

This attachment shows a summary of the profiling of the sprite benchmark with Visual Studio. The _Thrd_current function is called by std::this_thread::get_id(), which is called via JSLock::lock(). The VirtualQuery function is called via VM::updateStackLimit. These are the two issues addressed in this patch.
Comment 36 peavo 2015-07-17 05:24:35 PDT
(In reply to comment #34)
> Per, ewmailing, do you have access to VS2015?  Can you confirm if the issue
> still manifests on the latest VS?

I checked the source code of _Thrd_current in VS2015 RC, and the issues has not been fixed there, according to the source:

_Thrd_t _Thrd_current(void)
{	/* return _Thrd_t identifying current thread */
	_Thrd_t thr;

#ifdef _WIN32_WCE
	thr._Hnd = (HANDLE)GetCurrentThreadId();
#else /* _WIN32_WCE */
	if (DuplicateHandle(GetCurrentProcess(), GetCurrentThread(),
		GetCurrentProcess(), &thr._Hnd,
		DUPLICATE_SAME_ACCESS, TRUE, 0))
		CloseHandle(thr._Hnd);
	else
		thr._Hnd = 0;
#endif /* _WIN32_WCE */

	thr._Id = GetCurrentThreadId();
	return (thr);
}
Comment 37 Mark Lam 2015-07-17 14:18:07 PDT
Comment on attachment 256913 [details]
Patch

r=me

Per, I think you should also file a bug against Windows / MSVC and add the url of the MS bug to this bugzilla (and preferably in the ChangeLog as well before landing).  Thanks for investigating this issue.
Comment 38 peavo 2015-07-17 14:25:37 PDT
(In reply to comment #37)
> Comment on attachment 256913 [details]
> Patch
> 
> r=me
> 
> Per, I think you should also file a bug against Windows / MSVC and add the
> url of the MS bug to this bugzilla (and preferably in the ChangeLog as well
> before landing).  Thanks for investigating this issue.

Thanks! I will file a bug against Windows/MSVC.
Comment 39 peavo 2015-07-20 01:15:55 PDT
I reported this issue to Microsoft:

https://connect.microsoft.com/VisualStudio/feedback/details/1558211
Comment 40 peavo 2015-07-20 02:25:32 PDT
Committed r187020: <http://trac.webkit.org/changeset/187020>
Comment 41 peavo 2015-07-20 03:12:06 PDT
(In reply to comment #40)
> Committed r187020: <http://trac.webkit.org/changeset/187020>

This commit have broken the builds:

webcore\bindings\js\JSDOMWindowBase.cpp(206): error C2664: 'void JSC::VM::setExclusiveThread(std::thread::id)' : cannot convert argument 1 from 'WTF::ThreadIdentifier' to 'std::thread::id' (..\bindings\js\JSBindingsAllInOne.cpp) [C:\cygwin\home\buildbot\slave\win-debug\build\Source\WebCore\WebCore.vcxproj\WebCore.vcxproj]
          No constructor could take the source type, or constructor overload resolution was ambiguous

But as far as I can see from VM.h, setExclusiveThread has the signature void setExclusiveThread(ThreadIdentifier threadId).
I don't quite understand this ...
Comment 42 peavo 2015-07-20 05:57:15 PDT
After an unreviewed debug build fix in http://trac.webkit.org/changeset/187021, the builds are looking better, but this patch has caused JSC api tests to fail for the CLoop builds:

ASSERTION FAILED: !identifierByPthreadHandle(pthreadHandle)
/Volumes/Data/slave/mavericks-cloop-debug/build/Source/WTF/wtf/ThreadingPthreads.cpp(154) : ThreadIdentifier WTF::establishIdentifierForPthreadHandle(const pthread_t &)
1   0x1078e1510 WTFCrash
2   0x10793cac7 WTF::establishIdentifierForPthreadHandle(_opaque_pthread_t* const&)
3   0x10793c730 WTF::createThreadInternal(void (*)(void*), void*, char const*)
4   0x10793af57 WTF::createThread(char const*, std::__1::function<void ()>)
5   0x10793b0b5 WTF::createThread(void (*)(void*), void*, char const*)
6   0x107446317 JSC::GCThreadSharedData::GCThreadSharedData(JSC::VM*)
7   0x107445c7d JSC::GCThreadSharedData::GCThreadSharedData(JSC::VM*)
8   0x107451a44 JSC::Heap::Heap(JSC::VM*, JSC::HeapType)
9   0x1074517b3 JSC::Heap::Heap(JSC::VM*, JSC::HeapType)
10  0x10788b80f JSC::VM::VM(JSC::VM::VMType, JSC::HeapType)
11  0x10788b731 JSC::VM::VM(JSC::VM::VMType, JSC::HeapType)
12  0x10788ea0c JSC::VM::createContextGroup(JSC::HeapType)
13  0x107554ceb JSContextGroupCreate
14  0x107647885 -[JSVirtualMachine init]
15  0x10755356d -[JSContext init]
16  0x1072739dd currentThisInsideBlockGetterTest()
17  0x107294888 testObjectiveCAPIMain()
18  0x1072859e4 testObjectiveCAPI
19  0x10727bec3 main
20  0x7fff9439c5fd start
Comment 43 peavo 2015-07-20 06:44:19 PDT
Correction: it seems the JSC api tests are failing for all Debug builds.

The reason for Debug-only failures might be caused by the assert in MachineStackMarker.cpp, line 287:

ASSERT(!m_heap->vm()->hasExclusiveThread() || m_heap->vm()->exclusiveThread() == currentThread());
Comment 44 David Kilzer (:ddkilzer) 2015-07-20 08:17:09 PDT
(In reply to comment #40)
> Committed r187020: <http://trac.webkit.org/changeset/187020>

Debug build fix:

Committed r187021: <http://trac.webkit.org/changeset/187021>
Comment 45 peavo 2015-07-20 08:35:30 PDT
I added a possible fix for the JSC api test failures in https://bugs.webkit.org/show_bug.cgi?id=147110, but it might be better to revert the two commits related to this bug, and include the fix in this bug?
Comment 46 Mark Lam 2015-07-20 11:15:59 PDT
Rolled out r187020 and r187021 in r187026: <http://trac.webkit.org/r187026>.
Comment 47 Geoffrey Garen 2015-07-21 16:23:27 PDT
I think it's wrong to focus on re-architecting thread id and stack bounds here. 

You'll notice that re-architecting thread id and stack bounds did not make JavaScriptCore competitive with Lua. That's a sure sign that it did not fix the root problem.

The profile shows two things:

(1) We're taking the JSLock just to answer the "is null?" question;

(2) Taking the JSLock does a lot of work.

We can fix (1) by not taking the JSLock just to answer trivial type checking questions. (Similarly, we probably should not take the JSLock just to create trivial types like null.)

Then we should profile again to see if JSLock is still expensive relative to other operations. We can probably make JSLock do a lot less work if we need to.
Comment 48 Anders Carlsson 2015-07-21 16:30:05 PDT
FWIW, I looked at the MSVC runtime library in VS2015 and the performance problem with std::this_thread::get_id() has been fixed.
Comment 49 peavo 2015-07-22 05:58:39 PDT
(In reply to comment #48)
> FWIW, I looked at the MSVC runtime library in VS2015 and the performance
> problem with std::this_thread::get_id() has been fixed.

Thanks for looking into this :)
Yes, you are right, I unfortunately only checked the source of the _Thrd_current function in VS2015 RC, but not the performance of the std::this_thread::get_id() function itself. I will update the patch.
Comment 50 peavo 2015-07-22 06:11:53 PDT
(In reply to comment #47)
> I think it's wrong to focus on re-architecting thread id and stack bounds
> here. 
> 
> You'll notice that re-architecting thread id and stack bounds did not make
> JavaScriptCore competitive with Lua. That's a sure sign that it did not fix
> the root problem.
> 
> The profile shows two things:
> 
> (1) We're taking the JSLock just to answer the "is null?" question;
> 
> (2) Taking the JSLock does a lot of work.
> 
> We can fix (1) by not taking the JSLock just to answer trivial type checking
> questions. (Similarly, we probably should not take the JSLock just to create
> trivial types like null.)
> 
> Then we should profile again to see if JSLock is still expensive relative to
> other operations. We can probably make JSLock do a lot less work if we need
> to.

Thanks, these are good points :) I will check the performance without locking in the trivial cases. Maybe we can include the stack bounds optimization on Windows as well?
Comment 51 peavo 2015-07-22 07:07:39 PDT
(In reply to comment #47)
> 
> We can fix (1) by not taking the JSLock just to answer trivial type checking
> questions. (Similarly, we probably should not take the JSLock just to create
> trivial types like null.)
> 

I tested removing locking in the trivial cases, and the sprite count increased from ~3000 to ~4400 :) (without changing the rest of the patch)
Profiling now shows that JSValueToNumber and JSValueToObject is spending ~20% of the total time taking and releasing the lock. I assume we need to lock in these cases?
Comment 52 Geoffrey Garen 2015-07-22 11:19:23 PDT
> Profiling now shows that JSValueToNumber and JSValueToObject is spending
> ~20% of the total time taking and releasing the lock. I assume we need to
> lock in these cases?

There's no need to lock if the value is already a number or an object. Perhaps that optimization would also be worthwhile.
Comment 53 peavo 2015-07-22 11:28:11 PDT
Hmm, I noticed that std::mutex::lock() and std::mutex::unlock() spent about 20% of the total time, and tried replacing the std::mutex in JSLock with WTF::Mutex. WTF::Mutex::lock() and WTF::Mutex::unlock() then spent about 10% of the total time, and the sprite count went from ~4400 to ~6000. So it seems the MSVC implementation of std::mutex is not as fast as it could be.
Comment 54 Anders Carlsson 2015-07-22 11:29:33 PDT
(In reply to comment #53)
> Hmm, I noticed that std::mutex::lock() and std::mutex::unlock() spent about
> 20% of the total time, and tried replacing the std::mutex in JSLock with
> WTF::Mutex. WTF::Mutex::lock() and WTF::Mutex::unlock() then spent about 10%
> of the total time, and the sprite count went from ~4400 to ~6000. So it
> seems the MSVC implementation of std::mutex is not as fast as it could be.

Again, are you testing with VS2015?
Comment 55 peavo 2015-07-22 11:31:53 PDT
(In reply to comment #54)
> (In reply to comment #53)
> > Hmm, I noticed that std::mutex::lock() and std::mutex::unlock() spent about
> > 20% of the total time, and tried replacing the std::mutex in JSLock with
> > WTF::Mutex. WTF::Mutex::lock() and WTF::Mutex::unlock() then spent about 10%
> > of the total time, and the sprite count went from ~4400 to ~6000. So it
> > seems the MSVC implementation of std::mutex is not as fast as it could be.
> 
> Again, are you testing with VS2015?

Not yet, I will test there as well :)
Comment 56 peavo 2015-07-23 05:23:35 PDT
Created attachment 257349 [details]
Patch
Comment 57 peavo 2015-07-23 05:26:28 PDT
(In reply to comment #56)
> Created attachment 257349 [details]
> Patch

There is no need to review this patch. It is just for testing in case ewmailing wants to test if we are getting closer to Lua in performance. I will remove the WTF::ThreadIdentifier and WTF::Mutex changes in the next patch.
Comment 58 peavo 2015-07-23 06:12:09 PDT
Created attachment 257350 [details]
Patch
Comment 59 Brent Fulgham 2015-08-13 08:56:12 PDT
Now that we've cut over to Visual Studio 2015, do we still need any of these changes?
Comment 60 peavo 2015-08-13 09:26:19 PDT
(In reply to comment #59)
> Now that we've cut over to Visual Studio 2015, do we still need any of these
> changes?

I believe the changes related to the reduced use of locks, and the stack bounds optimization (by checking current fiber), are still valid.
Comment 61 peavo 2015-08-13 10:36:44 PDT
(In reply to comment #60)
> (In reply to comment #59)
> > Now that we've cut over to Visual Studio 2015, do we still need any of these
> > changes?
> 
> I believe the changes related to the reduced use of locks, and the stack
> bounds optimization (by checking current fiber), are still valid.

I guess locking will spend some time waiting for the lock in some cases, so removing unneeded locking should be a good thing. Also, updateStackLimit() seems to be called many times from other places than the api locking, so I believe it makes sense to check whether a new fiber is running. Please correct me if I am wrong :)
Comment 62 Darin Adler 2016-03-09 09:07:20 PST
Geoff, could you review and either do review+ or review- on this?
Comment 63 Brady Eidson 2017-04-24 19:11:09 PDT
Comment on attachment 257350 [details]
Patch

This patch has been pending review since 2015 with no recent activity.
It seems unlikely that it would even still apply to trunk in its current form.

Clearing from the review queue.

Feel free to update and resubmit if the patch is still relevant.