I've run into an issue attempting to load a large JS file (the compiled-to-JavaScript version of the Dart SDK) in Safari, and I'm unable to replicate the problem in other, non-WebKit browsers.

Loading the file as a single script with no requireJS-style wrapper or ES6 module import works perfectly well (~5MB of JS in <300ms). When I add the `define` wrapper needed for requireJS, or add an export statement and attempt to load the file via an ES6 module, however, Safari freezes for ~10 seconds during hard reloads, and I'm unable to interact with the page at all. Once the script has loaded, I can usually reload the page without incurring the delay (I'm guessing whatever in-memory representation is made of the script file sticks around between loads).

I put together a GitHub repo to demonstrate the issue at https://github.com/RedBrogdon/parsing_speed. It has four HTML files, each of which tries a different technique for loading the file, and they're marked to show which exhibit the delay in Safari/WebKit and which do not. I've tested that code with Safari v12.1.1 (14607.2.6.1.1) and the MiniBrowser shipped with build 247494 of WebKit. Both show the same behavior, which I've been unable to replicate in any other browser.

I'm not a JS engine expert by any means, but I'm guessing there's a parsing optimization that's not available when the code in my file is moved out of global scope and into a function/module, and it's possible that the behavior I'm seeing is WebKit working as intended. If that's the case and you can suggest some way for me to modify my code (or even point me to a guide that explains when/how parsing can be optimized), I'd appreciate it.
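For context, roughly what the three loading variants look like, as three separate files. These are assumptions about the shape of the repo's files, not copies of them, and the `dart_sdk` stand-in object is made up for illustration:

```
// --- Variant 1 (no delay): dart_sdk.js loaded with a plain <script> tag ---
// The code runs at global scope. (Tiny stand-in for the real ~5MB file.)
let dart_sdk = { hello: function () { return "hello from Dart"; } };

// --- Variant 2 (delay in Safari): the same code wrapped for requireJS ---
// Moving the code into the define() factory puts it inside a function scope.
define([], function () {
  let dart_sdk = { hello: function () { return "hello from Dart"; } };
  return dart_sdk;
});

// --- Variant 3 (delay in Safari): the same code as an ES6 module ---
// The code stays at module top level and gains an export statement; it is
// then loaded with <script type="module"> or an import.
let dart_sdk_module = { hello: function () { return "hello from Dart"; } };
export { dart_sdk_module };
```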
Interesting. I tried to take a quick look, and it looks like we are spending a lot of time in scope resolution code. I couldn't get the vanilla version to run in the JSC CLI. I'll try to take a look at this soon though.
<rdar://problem/53333108>
(In reply to Keith Miller from comment #1)
> I'll try to take a look at this soon though.

Awesome, thanks. If there's anything I can do to help track down the issue, let me know.
Just checking in -- has anyone had time to look into the issue further?
*** Bug 204462 has been marked as a duplicate of this bug. ***
Wow, sorry about the delay on this! I completely forgot about it... I'll look into this now.
Thanks for taking a look at the issue!
As I mentioned above, if there's any additional info we can provide, I'm happy to track it down.
So, I figured out what the problem is, but I haven't had time to implement a fix yet. Basically, the problem is in how we track whether we need a TDZ check.

If a workaround is helpful: I tried changing dart_sdk_es6.js to use "var" everywhere instead of "let"/"const", and it ran in about 300ms in the JSC CLI.
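To make the workaround concrete, here is a minimal sketch of the shape of the change. This is illustrative only; the names are made up and this is not the actual dart_sdk_es6.js output:

```
// Before: lexical declarations inside the wrapper scope, which hit the slow
// TDZ-tracking path described above.
(function () {
  const core = {};
  let isInitialized = false;
  function init() { isInitialized = true; }
  init();
})();

// After (workaround): the same bindings declared with "var". var bindings are
// never in a temporal dead zone, so no TDZ bookkeeping is needed for them.
(function () {
  var core = {};
  var isInitialized = false;
  function init() { isInitialized = true; }
  init();
})();
```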
(In reply to Keith Miller from comment #9)
> So, I figured out what the problem is, but I haven't had time to implement a
> fix yet. Basically, the problem is in how we track whether we need a TDZ
> check.
>
> If a workaround is helpful: I tried changing dart_sdk_es6.js to use "var"
> everywhere instead of "let"/"const", and it ran in about 300ms in the JSC
> CLI.

Interesting -- and thanks for taking the time to diagnose it. I'll communicate the workaround back to the Dart team.
An update from our end: the Dart SDK team has modified their development compiler to implement the workaround you suggested (replace "let/const" with "var") in its output. That change made its way into DartPad a few days ago, and we're no longer seeing the delay. Thanks for the help!
It would probably be worthwhile to document, at least roughly, what a working solution would look like. I think we can have each scope contain a TDZ epoch object shared with each nested closure. This object would hold the TDZ epoch for each let/const variable declared in that scope. At each closure we would record the current epoch lexically at that point. Then whenever we load a variable from a scope, we would just check whether the function's epoch is greater than the variable's. If it's not, then we need a TDZ check.
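To make the idea a bit more concrete, here is a rough model of that bookkeeping in plain JavaScript. In JSC this would all happen in C++ at bytecode-generation time over the program's lexical structure; the names and data structures below are purely illustrative, not the actual implementation:

```
// One shared epoch table per scope: for each let/const variable declared in
// the scope, record the epoch at which it becomes initialized.
function createScope() {
  return { currentEpoch: 0, initEpoch: new Map() };
}

// Initializing a let/const variable records the current epoch for it, then
// advances the scope's epoch.
function markInitialized(scope, name) {
  scope.initEpoch.set(name, scope.currentEpoch);
  scope.currentEpoch++;
}

// Each nested closure remembers the scope's epoch at the lexical point where
// the closure appears.
function recordClosure(scope) {
  return { scope: scope, epoch: scope.currentEpoch };
}

// A load of `name` inside the closure can skip the TDZ check only when the
// closure's epoch is greater than the variable's, i.e. the variable was
// already initialized (lexically) before the closure was created.
function needsTDZCheck(closure, name) {
  const varEpoch = closure.scope.initEpoch.get(name);
  return varEpoch === undefined || !(closure.epoch > varEpoch);
}

const scope = createScope();
const f = recordClosure(scope);      // closure appears before x is initialized
markInitialized(scope, "x");
const g = recordClosure(scope);      // closure appears after x is initialized
console.log(needsTDZCheck(f, "x")); // true:  must emit a TDZ check in f
console.log(needsTDZCheck(g, "x")); // false: the check can be elided in g
```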
Created attachment 411960 [details]
Repro for 10x slower initialization time in Figma

I'm coming here from https://github.com/evanw/esbuild/issues/478. Attached are two builds of the Figma code base (https://www.figma.com/) bundled with the esbuild bundler. One is a normal build and loads in ~2000ms in Safari. The other has a workaround where let/const/class statements are replaced with var statements, and it loads in ~200ms in Safari. See the readme for instructions.

I hope this is helpful. Let me know if you need more information from me. (This code is fine to share publicly since it's basically a more minified version of the code that's served on figma.com when you use the app.)
*** Bug 212962 has been marked as a duplicate of this bug. ***
Created attachment 412007 [details]
repro for JSC cli of dart code

If you remove the outermost IIFE, we compile bytecode in 40ms. If you put in the IIFE, we compile bytecode in 2000ms.
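A minimal sketch of the difference (the real attachment is the multi-megabyte Dart file; the code below is illustrative only, with made-up names):

```
// Fast case: let/const declarations at the top level of the program.
let counter = 0;
const step = 1;
function tick() { counter += step; }
tick();

// Slow case (before the fix): the identical code wrapped in an IIFE. Every
// nested function now closes over an enclosing lexical scope, and the TDZ
// bookkeeping for those enclosing let/const bindings is the pathological part.
(function () {
  let counter = 0;
  const step = 1;
  function tick() { counter += step; }
  tick();
})();
```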
Created attachment 412040 [details]
WIP

2400ms -> 40ms
It appears the Figma issue is different. Looking into it now.
Created attachment 412054 [details] WIP to fix the Dart issue
Created attachment 412068 [details]
WIP

Fixes our pathologies in both Dart/Figma. It just requires some final touches to the patch tomorrow, then I'll have it ready for review.
(In reply to Saam Barati from comment #17)
> It appears the Figma issue is different. Looking into it now.

This is wrong. The fix is the same for both, but there are two things I'm optimizing, and Figma needed both to be fully performant, whereas Dart needed like 1.5 of the things.
Created attachment 412146 [details] WIP
Created attachment 412206 [details] WIP
Created attachment 412225 [details] patch
Created attachment 412226 [details]
patch without huge test case

Might be easier to review without the ~5MB test case.
Looking into the failures now
I broke TDZ checks as I refactored my patch; I forgot to actually assign to a member field inside the constructor of the environment ;-)
Created attachment 412328 [details] patch
Created attachment 412329 [details] patch
Created attachment 412330 [details] same patch, but without huge test case
Created attachment 412333 [details] patch
Comment on attachment 412330 [details]
same patch, but without huge test case

View in context: https://bugs.webkit.org/attachment.cgi?id=412330&action=review

r=me

> Source/JavaScriptCore/bytecompiler/BytecodeGenerator.cpp:366
> + m_parentScopeTDZVariables = parentScopeTDZVariables;

This is not necessary, right? All this information is already in m_cachedVariablesUnderTDZ.

> Source/JavaScriptCore/bytecompiler/BytecodeGenerator.cpp:367
> + m_cachedVariablesUnderTDZ = m_parentScopeTDZVariables;

Could this be implemented as just a link to the parent environment?
Comment on attachment 412330 [details]
same patch, but without huge test case

View in context: https://bugs.webkit.org/attachment.cgi?id=412330&action=review

>> Source/JavaScriptCore/bytecompiler/BytecodeGenerator.cpp:366
>> + m_parentScopeTDZVariables = parentScopeTDZVariables;
>
> This is not necessary, right? All this information is already in m_cachedVariablesUnderTDZ.

Yeah, we can just turn this into an integer to know where to start the search in m_cachedVariablesUnderTDZ.

>> Source/JavaScriptCore/bytecompiler/BytecodeGenerator.cpp:367
>> + m_cachedVariablesUnderTDZ = m_parentScopeTDZVariables;
>
> Could this be implemented as just a link to the parent environment?

Tadeu and I spoke offline about this. We could definitely move this to being a linked list / inverse-tree-like data structure, but we're going to stick with a vector for now, until the data says it's necessary.
Created attachment 412487 [details] patch for landing
Created attachment 412488 [details] patch for landing
Created attachment 412490 [details] patch for landing
Committed r269115: <https://trac.webkit.org/changeset/269115>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 412490 [details].
Comment on attachment 412330 [details]
same patch, but without huge test case

View in context: https://bugs.webkit.org/attachment.cgi?id=412330&action=review

> Source/WTF/wtf/RefPtr.h:-200
> -template<typename T, typename U, typename V, typename X, typename Y, typename Z, typename = std::enable_if_t<!std::is_same<U, DumbPtrTraits<T>>::value || !std::is_same<Y, DumbPtrTraits<X>>::value>>
> -inline void swap(RefPtr<T, U, V>& a, RefPtr<X, Y, Z>& b)
> -{
> -    a.swap(b);
> -}

This is intended to be a performance optimization, so calls to swap(a, b) when both are RefPtr don't ref/deref. The change log says that this function template is no longer necessary. My guess is that it was never "necessary", but maybe it was an effective optimization. Could you clarify how you determined we didn't need the optimization? Maybe move semantics mean this isn't as important to optimize as it once was. Or maybe inlining causes all the ref/deref pairs to cancel each other out. Or maybe it's no big deal to churn the reference count because the additional code and execution time are not enough to be a problem. Or something else perhaps.
Thanks for the fix, y'all!
(In reply to Darin Adler from comment #37)
> Comment on attachment 412330 [details]
> same patch, but without huge test case
>
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=412330&action=review
>
> > Source/WTF/wtf/RefPtr.h:-200
> > -template<typename T, typename U, typename V, typename X, typename Y, typename Z, typename = std::enable_if_t<!std::is_same<U, DumbPtrTraits<T>>::value || !std::is_same<Y, DumbPtrTraits<X>>::value>>
> > -inline void swap(RefPtr<T, U, V>& a, RefPtr<X, Y, Z>& b)
> > -{
> > -    a.swap(b);
> > -}
>
> This is intended to be a performance optimization, so calls to swap(a, b)
> when both are RefPtr don't ref/deref. The change log says that this function
> template is no longer necessary. My guess is that it was never "necessary",
> but maybe it was an effective optimization. Could you clarify how you
> determined we didn't need the optimization? Maybe move semantics mean this
> isn't as important to optimize as it once was. Or maybe inlining causes all
> the ref/deref pairs to cancel each other out. Or maybe it's no big deal to
> churn the reference count because the additional code and execution time are
> not enough to be a problem. Or something else perhaps.

I removed it because it was causing a compilation error. This function was also only available when not using DumbPtrTraits, so I'm not sure how it was ever much of an optimization given the limited use. I'd also expect the performance optimization to be more important for things with custom ref/deref functions, and for ThreadSafeRefCounted. For normal RefCounted, I'd expect the compiler to optimize it out.

Let me see if I can get a POC of the compiler error I was encountering. I'm happy to add this back if I can get it to compile.
Here's a POC.

C++ program:

```
#include <vector>
#include <algorithm>

namespace WTF {

template <typename T>
class RefPtr {
public:
    RefPtr(T* ptr)
        : m_ptr(ptr)
    { }

    RefPtr& operator=(const RefPtr<T>& other)
    {
        m_ptr = other.m_ptr;
        return *this;
    }

    void swap(RefPtr<T>& other)
    {
        std::swap(m_ptr, other.m_ptr);
    }

    T* m_ptr;
};

template<typename T, typename U>
inline void swap(RefPtr<T>& a, RefPtr<U>& b)
{
    a.swap(b);
}

}

using WTF::RefPtr;

namespace JavaScriptCore {

void foo()
{
    std::vector<RefPtr<int>> v;
    std::sort(v.begin(), v.end(), [&] (auto& a, auto& b) {
        return a.m_ptr < b.m_ptr;
    });
}

}
```

And how to compile:

$ clang++ test.cpp -stdlib=libc++ -std=c++17

And the compile error is many lines looking like this:

```
In file included from test.cpp:1:
In file included from <.....>/usr/bin/../include/c++/v1/vector:274:
In file included from <.....>/usr/bin/../include/c++/v1/__bit_reference:15:
<....>/usr/bin/../include/c++/v1/algorithm:3945:17: error: call to 'swap' is ambiguous
                swap(*__first, *__last);
                ^~~~
<....>/usr/bin/../include/c++/v1/algorithm:4125:12: note: in instantiation of function template specialization 'std::__1::__sort<(lambda at test.cpp:43:35) &, WTF::RefPtr<int> *>' requested here
    _VSTD::__sort<_Comp_ref>(__first, __last, _Comp_ref(__comp));
           ^
<....>/usr/bin/../include/c++/v1/algorithm:4158:12: note: in instantiation of function template specialization 'std::__1::sort<WTF::RefPtr<int> *, (lambda at test.cpp:43:35) &>' requested here
    _VSTD::sort<_Tp*, _Comp_ref>(__first.base(), __last.base(), __comp);
           ^
test.cpp:43:10: note: in instantiation of function template specialization 'std::__1::sort<WTF::RefPtr<int>, (lambda at test.cpp:43:35)>' requested here
    std::sort(v.begin(), v.end(), [&] (auto& a, auto& b) {
```

I gave up after a couple hours of trying to resolve this, given I was doubtful that the swap optimization bought us any perf.
Can we discuss this in another bug report? The language/library feature we are trying to take advantage of is here: https://en.cppreference.com/w/cpp/named_req/Swappable
(In reply to Saam Barati from comment #40)
> template<typename T, typename U>
> inline void swap(RefPtr<T>& a, RefPtr<U>& b)

In the test you provided, changing the lines above to these fixes the error and restores the optimization:

    template<typename T>
    inline void swap(RefPtr<T>& a, RefPtr<T>& b)

Maybe there are other possible solutions; this is just the first thing I tried.
(In reply to Darin Adler from comment #41)
> Can we discuss this in another bug report? The language/library feature we
> are trying to take advantage of is here:
>
> https://en.cppreference.com/w/cpp/named_req/Swappable

I opened https://bugs.webkit.org/show_bug.cgi?id=218454