RESOLVED FIXED Bug 199866
Better cache our serialization of the outer TDZ environment when creating FunctionExecutables during bytecode generation
https://bugs.webkit.org/show_bug.cgi?id=199866
Summary Better cache our serialization of the outer TDZ environment when creating Fun...
Andrew Brogdon
Reported 2019-07-17 09:04:26 PDT
I've run into an issue attempting to load a large JS file (the compiled-to-JavaScript version of the Dart SDK) in Safari, and I'm unable to replicate the problem in other, non-WebKit browsers. Loading the file as if it were a single script with no requireJS-style wrapper or ES6 module import works perfectly well (~5MB of JS in <300ms). When I add the `define` wrapper needed for requireJS or add an export statement and attempt to load the file via an ES6 module, however, Safari freezes for ~10 seconds during hard reloads. I'm unable to interact with the page at all. Once the script is loaded, I can usually reload the page without incurring the delay (I'm guessing whatever in-memory representation is made of the script file hangs out in between loads). I knocked out a GitHub repo to demonstrate the issue at https://github.com/RedBrogdon/parsing_speed. It has four HTML files, all of which try different techniques for loading the file. They're marked for which have a delay in Safari/WebKit and which do not. I've tested that code with Safari v12.1.1 (14607.2.6.1.1) and the MiniBrowser shipped with build 247494 of WebKit. Both show the same behavior, which I've been unable to replicate with any other browser. I'm not a JS engine expert by any means, but I'm guessing there's a parsing optimization that's not available when the code in my file is moved out of global scope and into a function/module, and it's possible that the behavior I'm seeing is WebKit working as intended. If that's the case and you can suggest some way for me to modify my code (or even point me to a guide that explains when/how parsing can be optimized), I'd appreciate it.
Attachments
Repro for 10x slower initialization time in Figma (2.33 MB, application/zip)
2020-10-20 23:12 PDT, evan.exe
no flags
repro for JSC cli of dart code (4.72 MB, application/x-javascript)
2020-10-21 10:31 PDT, Saam Barati
no flags
WIP (18.25 KB, patch)
2020-10-21 15:56 PDT, Saam Barati
no flags
WIP to fix the Dart issue (18.93 KB, patch)
2020-10-21 17:45 PDT, Saam Barati
no flags
WIP (31.80 KB, patch)
2020-10-21 22:55 PDT, Saam Barati
no flags
WIP (77.63 KB, patch)
2020-10-22 17:11 PDT, Saam Barati
no flags
WIP (78.25 KB, patch)
2020-10-23 12:36 PDT, Saam Barati
no flags
patch (5.68 MB, patch)
2020-10-23 19:43 PDT, Saam Barati
ews-feeder: commit-queue-
patch without huge test case (83.04 KB, patch)
2020-10-23 19:44 PDT, Saam Barati
ews-feeder: commit-queue-
patch (5.68 MB, patch)
2020-10-26 10:21 PDT, Saam Barati
no flags
patch (5.68 MB, patch)
2020-10-26 10:22 PDT, Saam Barati
no flags
same patch, but without huge test case (83.09 KB, patch)
2020-10-26 10:23 PDT, Saam Barati
tzagallo: review+
patch (5.68 MB, patch)
2020-10-26 10:27 PDT, Saam Barati
no flags
patch for landing (4.92 MB, patch)
2020-10-27 17:58 PDT, Saam Barati
no flags
patch for landing (4.92 MB, patch)
2020-10-27 18:02 PDT, Saam Barati
no flags
patch for landing (4.92 MB, patch)
2020-10-27 18:11 PDT, Saam Barati
no flags
Keith Miller
Comment 1 2019-07-19 18:13:39 PDT
Interesting, I tried to take a quick look and it looks like we are spending a lot of time in scope resolution code. I couldn't get the vanilla version to run in the JSC CLI. I'll try to take a look at this soon though.
Radar WebKit Bug Importer
Comment 2 2019-07-19 18:13:53 PDT
Andrew Brogdon
Comment 3 2019-07-22 21:38:53 PDT
(In reply to Keith Miller from comment #1) > I'll try to take a look at this soon though. Awesome, thanks. If there's anything I can do to help track down the issue, let me know.
Andrew Brogdon
Comment 4 2019-08-07 08:39:10 PDT
Just checking in -- has anyone had time to look into the issue further?
Tadeu Zagallo
Comment 5 2019-11-21 12:52:12 PST
*** Bug 204462 has been marked as a duplicate of this bug. ***
Keith Miller
Comment 6 2019-11-21 12:56:25 PST
Wow, sorry about the delay on this! I completely forgot about it... I'll look into this now.
Andrew Brogdon
Comment 7 2019-12-03 13:13:53 PST
Thanks for taking a look at the issue!
Andrew Brogdon
Comment 8 2019-12-03 13:15:26 PST
As I mentioned above, if there's any additional info we can provide, I'm happy to track it down.
Keith Miller
Comment 9 2020-01-22 18:41:21 PST
So, I figured out what the problem is but I haven't had time to implement a fix yet. Basically, the problem is in how we track if we need a TDZ check. If a work around is helpful, I tried changing dart_sdk_es6.js to use "var" everywhere instead of "let"/"const" and it ran instantly in about 300ms in the JSC CLI.
Andrew Brogdon
Comment 10 2020-01-27 11:22:07 PST
(In reply to Keith Miller from comment #9) > So, I figured out what the problem is but I haven't had time to implement a > fix yet. Basically, the problem is in how we track if we need a TDZ check. > > If a work around is helpful, I tried changing dart_sdk_es6.js to use "var" > everywhere instead of "let"/"const" and it ran instantly in about 300ms in > the JSC CLI. Interesting -- and thanks for taking the time to diagnose it. I'll communicate the workaround back to the Dart team.
Andrew Brogdon
Comment 11 2020-03-19 15:49:09 PDT
An update from our end: the Dart SDK team has modified their development compiler to implement the workaround you suggested (replace "let/const" with "var") in its output. That change made its way into DartPad a few days ago, and we're no longer seeing the delay. Thanks for the help!
Keith Miller
Comment 12 2020-09-01 15:44:35 PDT
It would probably be worthwhile to, at least roughly, document what a working solution would be. I think we can have each scope contain a shared (with each nested closure) TDZ epoch object. This object would have the TDZ epoch for each let/const variable declared in that scope. At each closure we would make sure to record the current epoch lexically at that point. Then whenever we load a variable from a scope we would just check if the function's epoch is greater than the variable's. If it's not then we need a TDZ check.
evan.exe
Comment 13 2020-10-20 23:12:48 PDT
Created attachment 411960 [details] Repro for 10x slower initialization time in Figma I'm coming here from https://github.com/evanw/esbuild/issues/478. Attached are two builds of the Figma code base (https://www.figma.com/) bundled with the esbuild bundler. One is a normal build and loads in ~2000ms in Safari. The other one has a workaround where let/const/class statements are replaced with var statements and loads in ~200ms in Safari. See the readme for instructions. I hope this is helpful. Let me know if you need more information from me. (This code is fine to share publicly since it's basically a more minified version of the code that's served on figma.com when you use the app)
Mihai Parparita
Comment 14 2020-10-20 23:25:33 PDT
*** Bug 212962 has been marked as a duplicate of this bug. ***
Saam Barati
Comment 15 2020-10-21 10:31:44 PDT
Created attachment 412007 [details] repro for JSC cli of dart code If you remove the outermost IIFE, we compile bytecode in 40ms. If you put in the IIFE, we compile bytecode in 2000ms.
Saam Barati
Comment 16 2020-10-21 15:56:09 PDT
Created attachment 412040 [details] WIP 2400ms -> 40ms
Saam Barati
Comment 17 2020-10-21 17:37:23 PDT
It appears the Figma issue is different. Looking into it now.
Saam Barati
Comment 18 2020-10-21 17:45:02 PDT
Created attachment 412054 [details] WIP to fix the Dart issue
Saam Barati
Comment 19 2020-10-21 22:55:00 PDT
Created attachment 412068 [details] WIP Fixes our pathologies in both Dart/Sigma. Just requires some final touches to the patch tomorrow, then I'll have it ready for review.
Saam Barati
Comment 20 2020-10-22 16:20:14 PDT
(In reply to Saam Barati from comment #17) > It appears the Figma issue is different. Looking into it now. This is wrong. The fix is the same for both, but there are two things I'm optimizing, and Figma needed both to be fully performant, where Dart needed like 1.5 of the things.
Saam Barati
Comment 21 2020-10-22 17:11:15 PDT
Saam Barati
Comment 22 2020-10-23 12:36:51 PDT
Saam Barati
Comment 23 2020-10-23 19:43:29 PDT
Saam Barati
Comment 24 2020-10-23 19:44:34 PDT
Created attachment 412226 [details] patch without huge test case might be easier to review without the ~5MB test case
Saam Barati
Comment 25 2020-10-26 09:46:32 PDT
Looking into the failures now
Saam Barati
Comment 26 2020-10-26 10:19:06 PDT
I broke TDZ checks as I refactored my patch, forgot to actually assign to a member field inside the constructor of the environment ;-)
Saam Barati
Comment 27 2020-10-26 10:21:10 PDT
Saam Barati
Comment 28 2020-10-26 10:22:39 PDT
Saam Barati
Comment 29 2020-10-26 10:23:18 PDT
Created attachment 412330 [details] same patch, but without huge test case
Saam Barati
Comment 30 2020-10-26 10:27:37 PDT
Tadeu Zagallo
Comment 31 2020-10-26 17:29:34 PDT
Comment on attachment 412330 [details] same patch, but without huge test case View in context: https://bugs.webkit.org/attachment.cgi?id=412330&action=review r=me > Source/JavaScriptCore/bytecompiler/BytecodeGenerator.cpp:366 > + m_parentScopeTDZVariables = parentScopeTDZVariables; This is not necessary, right? All this information is already in m_cachedVariablesUnderTDZ > Source/JavaScriptCore/bytecompiler/BytecodeGenerator.cpp:367 > + m_cachedVariablesUnderTDZ = m_parentScopeTDZVariables; Could this be implemented as just a link to the parent environment?
Saam Barati
Comment 32 2020-10-27 17:39:02 PDT
Comment on attachment 412330 [details] same patch, but without huge test case View in context: https://bugs.webkit.org/attachment.cgi?id=412330&action=review >> Source/JavaScriptCore/bytecompiler/BytecodeGenerator.cpp:366 >> + m_parentScopeTDZVariables = parentScopeTDZVariables; > > This is not necessary, right? All this information is already in m_cachedVariablesUnderTDZ Yeah, we can just turn this into an integer to know where to start the search in m_cachedVariablesUnderTDZ >> Source/JavaScriptCore/bytecompiler/BytecodeGenerator.cpp:367 >> + m_cachedVariablesUnderTDZ = m_parentScopeTDZVariables; > > Could this be implemented as just a link to the parent environment? Tadeu and I spoke offline about this. We could definitely move this to being a linked list/inverse tree like data structure, but we're going to stick with a vector for now, until the data says it's necessary
Saam Barati
Comment 33 2020-10-27 17:58:53 PDT
Created attachment 412487 [details] patch for landing
Saam Barati
Comment 34 2020-10-27 18:02:36 PDT
Created attachment 412488 [details] patch for landing
Saam Barati
Comment 35 2020-10-27 18:11:00 PDT
Created attachment 412490 [details] patch for landing
EWS
Comment 36 2020-10-28 12:25:19 PDT
Committed r269115: <https://trac.webkit.org/changeset/269115> All reviewed patches have been landed. Closing bug and clearing flags on attachment 412490 [details].
Darin Adler
Comment 37 2020-10-28 12:39:15 PDT
Comment on attachment 412330 [details] same patch, but without huge test case View in context: https://bugs.webkit.org/attachment.cgi?id=412330&action=review > Source/WTF/wtf/RefPtr.h:-200 > -template<typename T, typename U, typename V, typename X, typename Y, typename Z, typename = std::enable_if_t<!std::is_same<U, DumbPtrTraits<T>>::value || !std::is_same<Y, DumbPtrTraits<X>>::value>> > -inline void swap(RefPtr<T, U, V>& a, RefPtr<X, Y, Z>& b) > -{ > - a.swap(b); > -} This is intended to be a performance optimization, so calls to swap(a, b) when both are RefPtr don’t ref/deref. The change log says that this function template is no longer necessary. My guess is that it was never "necessary" but maybe it was an effective optimization. Could you clarify how you determined we didn’t need the optimization? Maybe move semantics mean this isn’t as important to optimize as it once was. Or maybe inlining causes all the ref/deref pairs to cancel each other out. Or maybe it is not big deal to churn reference because that additional code and execution time is not enough to be a problem. Or something else perhaps.
Andrew Brogdon
Comment 38 2020-10-29 00:12:05 PDT
Thanks for the fix, y'all!
Saam Barati
Comment 39 2020-10-29 15:58:10 PDT
(In reply to Darin Adler from comment #37) > Comment on attachment 412330 [details] > same patch, but without huge test case > > View in context: > https://bugs.webkit.org/attachment.cgi?id=412330&action=review > > > Source/WTF/wtf/RefPtr.h:-200 > > -template<typename T, typename U, typename V, typename X, typename Y, typename Z, typename = std::enable_if_t<!std::is_same<U, DumbPtrTraits<T>>::value || !std::is_same<Y, DumbPtrTraits<X>>::value>> > > -inline void swap(RefPtr<T, U, V>& a, RefPtr<X, Y, Z>& b) > > -{ > > - a.swap(b); > > -} > > This is intended to be a performance optimization, so calls to swap(a, b) > when both are RefPtr don’t ref/deref. The change log says that this function > template is no longer necessary. My guess is that it was never "necessary" > but maybe it was an effective optimization. Could you clarify how you > determined we didn’t need the optimization? Maybe move semantics mean this > isn’t as important to optimize as it once was. Or maybe inlining causes all > the ref/deref pairs to cancel each other out. Or maybe it is not big deal to > churn reference because that additional code and execution time is not > enough to be a problem. Or something else perhaps. I removed it because it was causing a compilation error. This function was also only available when not using DumbPtrTraits, so I'm not sure how it was ever much of an optimization given the limited use. I'd also expect the performance optimization to be more important for things with custom ref/deref functions, and for ThreadSafeRefCounted. For normal RefCounted, I'd expect the compiler to optimize it out. Let me see if I can get a POC of the compiler error I was encountering. I'm happy add this back if I can get it to compile
Saam Barati
Comment 40 2020-10-29 16:33:31 PDT
Here's a POC. C++ program: ``` #include <vector> #include <algorithm> namespace WTF { template <typename T> class RefPtr { public: RefPtr(T* ptr) : m_ptr(ptr) { } RefPtr& operator=(const RefPtr<T>& other) { m_ptr = other.m_ptr; return *this; } void swap(RefPtr<T>& other) { std::swap(m_ptr, other.m_ptr); } T* m_ptr; }; template<typename T, typename U> inline void swap(RefPtr<T>& a, RefPtr<U>& b) { a.swap(b); } } using WTF::RefPtr; namespace JavaScriptCore { void foo() { std::vector<RefPtr<int>> v; std::sort(v.begin(), v.end(), [&] (auto& a, auto& b) { return a.m_ptr < b.m_ptr; }); } } ``` And how to compile: $ clang++ test.cpp -stdlib=libc++ -std=c++17 And the compile error is many lines looking like this: ``` In file included from test.cpp:1: In file included from <.....>/usr/bin/../include/c++/v1/vector:274: In file included from <.....>/usr/bin/../include/c++/v1/__bit_reference:15: <....>/usr/bin/../include/c++/v1/algorithm:3945:17: error: call to 'swap' is ambiguous swap(*__first, *__last); ^~~~ <....>/usr/bin/../include/c++/v1/algorithm:4125:12: note: in instantiation of function template specialization 'std::__1::__sort<(lambda at test.cpp:43:35) &, WTF::RefPtr<int> *>' requested here _VSTD::__sort<_Comp_ref>(__first, __last, _Comp_ref(__comp)); ^ <....>/usr/bin/../include/c++/v1/algorithm:4158:12: note: in instantiation of function template specialization 'std::__1::sort<WTF::RefPtr<int> *, (lambda at test.cpp:43:35) &>' requested here _VSTD::sort<_Tp*, _Comp_ref>(__first.base(), __last.base(), __comp); ^ test.cpp:43:10: note: in instantiation of function template specialization 'std::__1::sort<WTF::RefPtr<int>, (lambda at test.cpp:43:35)>' requested here std::sort(v.begin(), v.end(), [&] (auto& a, auto& b) { ``` I gave up after a couple hours of trying to resolve this, given I was doubtful that the swap optimization bought us any perf.
Darin Adler
Comment 41 2020-10-30 13:01:43 PDT
Can we discuss this in another bug report? The language/library feature we are trying to take advantage of is here: https://en.cppreference.com/w/cpp/named_req/Swappable
Darin Adler
Comment 42 2020-10-30 13:06:54 PDT
(In reply to Saam Barati from comment #40) > template<typename T, typename U> > inline void swap(RefPtr<T>& a, RefPtr<U>& b) In the test you provided, changing the lines above to these fixes the error and restores the optimization: template<typename T> inline void swap(RefPtr<T>& a, RefPtr<T>& b) Maybe there are other possible solutions; this is just the first thing I tried.
Saam Barati
Comment 43 2020-11-02 12:52:38 PST
(In reply to Darin Adler from comment #41) > Can we discuss this in another bug report? The language/library feature we > are trying to take advantage of is here: > > https://en.cppreference.com/w/cpp/named_req/Swappable I opened https://bugs.webkit.org/show_bug.cgi?id=218454
Note You need to log in before you can comment on or make changes to this bug.