Bug 199866 - Better cache our serialization of the outer TDZ environment when creating FunctionExecutables during bytecode generation
Summary: Better cache our serialization of the outer TDZ environment when creating Fun...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: WebKit Nightly Build
Hardware: Mac macOS 10.14
: P2 Normal
Assignee: Saam Barati
URL:
Keywords: InRadar
: 204462 212962 (view as bug list)
Depends on:
Blocks:
 
Reported: 2019-07-17 09:04 PDT by Andrew Brogdon
Modified: 2020-11-02 12:52 PST (History)
25 users (show)

See Also:


Attachments
Repro for 10x slower initialization time in Figma (2.33 MB, application/zip)
2020-10-20 23:12 PDT, evan.exe
no flags Details
repro for JSC cli of dart code (4.72 MB, application/x-javascript)
2020-10-21 10:31 PDT, Saam Barati
no flags Details
WIP (18.25 KB, patch)
2020-10-21 15:56 PDT, Saam Barati
no flags Details | Formatted Diff | Diff
WIP to fix the Dart issue (18.93 KB, patch)
2020-10-21 17:45 PDT, Saam Barati
no flags Details | Formatted Diff | Diff
WIP (31.80 KB, patch)
2020-10-21 22:55 PDT, Saam Barati
no flags Details | Formatted Diff | Diff
WIP (77.63 KB, patch)
2020-10-22 17:11 PDT, Saam Barati
no flags Details | Formatted Diff | Diff
WIP (78.25 KB, patch)
2020-10-23 12:36 PDT, Saam Barati
no flags Details | Formatted Diff | Diff
patch (5.68 MB, patch)
2020-10-23 19:43 PDT, Saam Barati
ews-feeder: commit-queue-
Details | Formatted Diff | Diff
patch without huge test case (83.04 KB, patch)
2020-10-23 19:44 PDT, Saam Barati
ews-feeder: commit-queue-
Details | Formatted Diff | Diff
patch (5.68 MB, patch)
2020-10-26 10:21 PDT, Saam Barati
no flags Details | Formatted Diff | Diff
patch (5.68 MB, patch)
2020-10-26 10:22 PDT, Saam Barati
no flags Details | Formatted Diff | Diff
same patch, but without huge test case (83.09 KB, patch)
2020-10-26 10:23 PDT, Saam Barati
tzagallo: review+
Details | Formatted Diff | Diff
patch (5.68 MB, patch)
2020-10-26 10:27 PDT, Saam Barati
no flags Details | Formatted Diff | Diff
patch for landing (4.92 MB, patch)
2020-10-27 17:58 PDT, Saam Barati
no flags Details | Formatted Diff | Diff
patch for landing (4.92 MB, patch)
2020-10-27 18:02 PDT, Saam Barati
no flags Details | Formatted Diff | Diff
patch for landing (4.92 MB, patch)
2020-10-27 18:11 PDT, Saam Barati
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Andrew Brogdon 2019-07-17 09:04:26 PDT
I've run into an issue attempting to load a large JS file (the compiled-to-JavaScript version of the Dart SDK) in Safari, and I'm unable to replicate the problem in other, non-WebKit browsers.

Loading the file as if it were a single script with no requireJS-style wrapper or ES6 module import works perfectly well (~5MB of JS in <300ms). When I add the `define` wrapper needed for requireJS or add an export statement and attempt to load the file via an ES6 module, however, Safari freezes for ~10 seconds during hard reloads. I'm unable to interact with the page at all. Once the script is loaded, I can usually reload the page without incurring the delay (I'm guessing whatever in-memory representation is made of the script file hangs out in between loads).

I knocked out a GitHub repo to demonstrate the issue at https://github.com/RedBrogdon/parsing_speed. It has four HTML files, all of which try different techniques for loading the file. They're marked for which have a delay in Safari/WebKit and which do not. I've tested that code with Safari v12.1.1 (14607.2.6.1.1) and the MiniBrowser shipped with build 247494 of WebKit. Both show the same behavior, which I've been unable to replicate with any other browser.

I'm not a JS engine expert by any means, but I'm guessing there's a parsing optimization that's not available when the code in my file is moved out of global scope and into a function/module, and it's possible that the behavior I'm seeing is WebKit working as intended. If that's the case and you can suggest some way for me to modify my code (or even point me to a guide that explains when/how parsing can be optimized), I'd appreciate it.
Comment 1 Keith Miller 2019-07-19 18:13:39 PDT
Interesting, I tried to take a quick look and it looks like we are spending a lot of time in scope resolution code. I couldn't get the vanilla version to run in the JSC CLI. I'll try to take a look at this soon though.
Comment 2 Radar WebKit Bug Importer 2019-07-19 18:13:53 PDT
<rdar://problem/53333108>
Comment 3 Andrew Brogdon 2019-07-22 21:38:53 PDT
(In reply to Keith Miller from comment #1)
> I'll try to take a look at this soon though.

Awesome, thanks. If there's anything I can do to help track down the issue, let me know.
Comment 4 Andrew Brogdon 2019-08-07 08:39:10 PDT
Just checking in -- has anyone had time to look into the issue further?
Comment 5 Tadeu Zagallo 2019-11-21 12:52:12 PST
*** Bug 204462 has been marked as a duplicate of this bug. ***
Comment 6 Keith Miller 2019-11-21 12:56:25 PST
Wow, sorry about the delay on this! I completely forgot about it... I'll look into this now.
Comment 7 Andrew Brogdon 2019-12-03 13:13:53 PST
Thanks for taking a look at the issue!
Comment 8 Andrew Brogdon 2019-12-03 13:15:26 PST
As I mentioned above, if there's any additional info we can provide, I'm happy to track it down.
Comment 9 Keith Miller 2020-01-22 18:41:21 PST
So, I figured out what the problem is but I haven't had time to implement a fix yet. Basically, the problem is in how we track if we need a TDZ check. 

If a work around is helpful, I tried changing dart_sdk_es6.js to use "var" everywhere instead of "let"/"const" and it ran instantly in about 300ms in the JSC CLI.
Comment 10 Andrew Brogdon 2020-01-27 11:22:07 PST
(In reply to Keith Miller from comment #9)
> So, I figured out what the problem is but I haven't had time to implement a
> fix yet. Basically, the problem is in how we track if we need a TDZ check. 
> 
> If a work around is helpful, I tried changing dart_sdk_es6.js to use "var"
> everywhere instead of "let"/"const" and it ran instantly in about 300ms in
> the JSC CLI.

Interesting -- and thanks for taking the time to diagnose it.

I'll communicate the workaround back to the Dart team.
Comment 11 Andrew Brogdon 2020-03-19 15:49:09 PDT
An update from our end: the Dart SDK team has modified their development compiler to implement the workaround you suggested (replace "let/const" with "var") in its output. That change made its way into DartPad a few days ago, and we're no longer seeing the delay.

Thanks for the help!
Comment 12 Keith Miller 2020-09-01 15:44:35 PDT
It would probably be worthwhile to, at least roughly, document what a working solution would be. I think we can have each scope contain a shared (with each nested closure) TDZ epoch object. This object would have the TDZ epoch for each let/const variable declared in that scope. At each closure we would make sure to record the current epoch lexically at that point. Then whenever we load a variable from a scope we would just check if the function's epoch is greater than the variable's. If it's not then we need a TDZ check.
Comment 13 evan.exe 2020-10-20 23:12:48 PDT
Created attachment 411960 [details]
Repro for 10x slower initialization time in Figma

I'm coming here from https://github.com/evanw/esbuild/issues/478.

Attached are two builds of the Figma code base (https://www.figma.com/) bundled with the esbuild bundler. One is a normal build and loads in ~2000ms in Safari. The other one has a workaround where let/const/class statements are replaced with var statements and loads in ~200ms in Safari. See the readme for instructions.

I hope this is helpful. Let me know if you need more information from me.

(This code is fine to share publicly since it's basically a more minified version of the code that's served on figma.com when you use the app)
Comment 14 Mihai Parparita 2020-10-20 23:25:33 PDT
*** Bug 212962 has been marked as a duplicate of this bug. ***
Comment 15 Saam Barati 2020-10-21 10:31:44 PDT
Created attachment 412007 [details]
repro for JSC cli of dart code

If you remove the outermost IIFE, we compile bytecode in 40ms. If you put in the IIFE, we compile bytecode in 2000ms.
Comment 16 Saam Barati 2020-10-21 15:56:09 PDT
Created attachment 412040 [details]
WIP

2400ms -> 40ms
Comment 17 Saam Barati 2020-10-21 17:37:23 PDT
It appears the Figma issue is different. Looking into it now.
Comment 18 Saam Barati 2020-10-21 17:45:02 PDT
Created attachment 412054 [details]
WIP to fix the Dart issue
Comment 19 Saam Barati 2020-10-21 22:55:00 PDT
Created attachment 412068 [details]
WIP

Fixes our pathologies in both Dart/Sigma.

Just requires some final touches to the patch tomorrow, then I'll have it ready for review.
Comment 20 Saam Barati 2020-10-22 16:20:14 PDT
(In reply to Saam Barati from comment #17)
> It appears the Figma issue is different. Looking into it now.

This is wrong. The fix is the same for both, but there are two things I'm optimizing, and Figma needed both to be fully performant, where Dart needed like 1.5 of the things.
Comment 21 Saam Barati 2020-10-22 17:11:15 PDT
Created attachment 412146 [details]
WIP
Comment 22 Saam Barati 2020-10-23 12:36:51 PDT
Created attachment 412206 [details]
WIP
Comment 23 Saam Barati 2020-10-23 19:43:29 PDT
Created attachment 412225 [details]
patch
Comment 24 Saam Barati 2020-10-23 19:44:34 PDT
Created attachment 412226 [details]
patch without huge test case

might be easier to review without the ~5MB test case
Comment 25 Saam Barati 2020-10-26 09:46:32 PDT
Looking into the failures now
Comment 26 Saam Barati 2020-10-26 10:19:06 PDT
I broke TDZ checks as I refactored my patch, forgot to actually assign to a member field inside the constructor of the environment ;-)
Comment 27 Saam Barati 2020-10-26 10:21:10 PDT
Created attachment 412328 [details]
patch
Comment 28 Saam Barati 2020-10-26 10:22:39 PDT
Created attachment 412329 [details]
patch
Comment 29 Saam Barati 2020-10-26 10:23:18 PDT
Created attachment 412330 [details]
same patch, but without huge test case
Comment 30 Saam Barati 2020-10-26 10:27:37 PDT
Created attachment 412333 [details]
patch
Comment 31 Tadeu Zagallo 2020-10-26 17:29:34 PDT
Comment on attachment 412330 [details]
same patch, but without huge test case

View in context: https://bugs.webkit.org/attachment.cgi?id=412330&action=review

r=me

> Source/JavaScriptCore/bytecompiler/BytecodeGenerator.cpp:366
> +    m_parentScopeTDZVariables = parentScopeTDZVariables;

This is not necessary, right? All this information is already in m_cachedVariablesUnderTDZ

> Source/JavaScriptCore/bytecompiler/BytecodeGenerator.cpp:367
> +    m_cachedVariablesUnderTDZ = m_parentScopeTDZVariables;

Could this be implemented as just a link to the parent environment?
Comment 32 Saam Barati 2020-10-27 17:39:02 PDT
Comment on attachment 412330 [details]
same patch, but without huge test case

View in context: https://bugs.webkit.org/attachment.cgi?id=412330&action=review

>> Source/JavaScriptCore/bytecompiler/BytecodeGenerator.cpp:366
>> +    m_parentScopeTDZVariables = parentScopeTDZVariables;
> 
> This is not necessary, right? All this information is already in m_cachedVariablesUnderTDZ

Yeah, we can just turn this into an integer to know where to start the search in m_cachedVariablesUnderTDZ

>> Source/JavaScriptCore/bytecompiler/BytecodeGenerator.cpp:367
>> +    m_cachedVariablesUnderTDZ = m_parentScopeTDZVariables;
> 
> Could this be implemented as just a link to the parent environment?

Tadeu and I spoke offline about this. We could definitely move this  to being a linked list/inverse tree like data structure, but we're going to stick with a vector for now, until the data says it's necessary
Comment 33 Saam Barati 2020-10-27 17:58:53 PDT
Created attachment 412487 [details]
patch for landing
Comment 34 Saam Barati 2020-10-27 18:02:36 PDT
Created attachment 412488 [details]
patch for landing
Comment 35 Saam Barati 2020-10-27 18:11:00 PDT
Created attachment 412490 [details]
patch for landing
Comment 36 EWS 2020-10-28 12:25:19 PDT
Committed r269115: <https://trac.webkit.org/changeset/269115>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 412490 [details].
Comment 37 Darin Adler 2020-10-28 12:39:15 PDT
Comment on attachment 412330 [details]
same patch, but without huge test case

View in context: https://bugs.webkit.org/attachment.cgi?id=412330&action=review

> Source/WTF/wtf/RefPtr.h:-200
> -template<typename T, typename U, typename V, typename X, typename Y, typename Z, typename = std::enable_if_t<!std::is_same<U, DumbPtrTraits<T>>::value || !std::is_same<Y, DumbPtrTraits<X>>::value>>
> -inline void swap(RefPtr<T, U, V>& a, RefPtr<X, Y, Z>& b)
> -{
> -    a.swap(b);
> -}

This is intended to be a performance optimization, so calls to swap(a, b) when both are RefPtr don’t ref/deref. The change log says that this function template is no longer necessary. My guess is that it was never "necessary" but maybe it was an effective optimization. Could you clarify how you determined we didn’t need the optimization? Maybe move semantics mean this isn’t as important to optimize as it once was. Or maybe inlining causes all the ref/deref pairs to cancel each other out. Or maybe it is not big deal to churn reference because that additional code and execution time is not enough to be a problem. Or something else perhaps.
Comment 38 Andrew Brogdon 2020-10-29 00:12:05 PDT
Thanks for the fix, y'all!
Comment 39 Saam Barati 2020-10-29 15:58:10 PDT
(In reply to Darin Adler from comment #37)
> Comment on attachment 412330 [details]
> same patch, but without huge test case
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=412330&action=review
> 
> > Source/WTF/wtf/RefPtr.h:-200
> > -template<typename T, typename U, typename V, typename X, typename Y, typename Z, typename = std::enable_if_t<!std::is_same<U, DumbPtrTraits<T>>::value || !std::is_same<Y, DumbPtrTraits<X>>::value>>
> > -inline void swap(RefPtr<T, U, V>& a, RefPtr<X, Y, Z>& b)
> > -{
> > -    a.swap(b);
> > -}
> 
> This is intended to be a performance optimization, so calls to swap(a, b)
> when both are RefPtr don’t ref/deref. The change log says that this function
> template is no longer necessary. My guess is that it was never "necessary"
> but maybe it was an effective optimization. Could you clarify how you
> determined we didn’t need the optimization? Maybe move semantics mean this
> isn’t as important to optimize as it once was. Or maybe inlining causes all
> the ref/deref pairs to cancel each other out. Or maybe it is not big deal to
> churn reference because that additional code and execution time is not
> enough to be a problem. Or something else perhaps.

I removed it because it was causing a compilation error. This function was also only available when not using DumbPtrTraits, so I'm not sure how it was ever much of an optimization given the limited use.

I'd also expect the performance optimization to be more important for things with custom ref/deref functions, and for ThreadSafeRefCounted.

For normal RefCounted, I'd expect the compiler to optimize it out.

Let me see if I can get a POC of the compiler error I was encountering. I'm happy add this back if I can get it to compile
Comment 40 Saam Barati 2020-10-29 16:33:31 PDT
Here's a POC.


C++ program:
```
#include <vector>
#include <algorithm>

namespace WTF {

template <typename T>
class RefPtr {
public:
    RefPtr(T* ptr) 
        : m_ptr(ptr)
    {
    }

    RefPtr& operator=(const RefPtr<T>& other) 
    {
        m_ptr = other.m_ptr;
        return *this;
    }

    void swap(RefPtr<T>& other)
    {
        std::swap(m_ptr, other.m_ptr);
    }

    T* m_ptr;
};


template<typename T, typename U>
inline void swap(RefPtr<T>& a, RefPtr<U>& b)
{
    a.swap(b);
}

}

using WTF::RefPtr;

namespace JavaScriptCore {

void foo() {
    std::vector<RefPtr<int>> v;
    std::sort(v.begin(), v.end(), [&] (auto& a, auto& b) {
        return a.m_ptr < b.m_ptr;
    });
}

}
```

And how to compile:
$ clang++ test.cpp -stdlib=libc++ -std=c++17

And the compile error is many lines looking like this:
```
In file included from test.cpp:1:
In file included from <.....>/usr/bin/../include/c++/v1/vector:274:
In file included from <.....>/usr/bin/../include/c++/v1/__bit_reference:15:
<....>/usr/bin/../include/c++/v1/algorithm:3945:17: error: call to 'swap' is ambiguous
                swap(*__first, *__last);
                ^~~~
<....>/usr/bin/../include/c++/v1/algorithm:4125:12: note: in instantiation of function template specialization
      'std::__1::__sort<(lambda at test.cpp:43:35) &, WTF::RefPtr<int> *>' requested here
    _VSTD::__sort<_Comp_ref>(__first, __last, _Comp_ref(__comp));
           ^
<....>/usr/bin/../include/c++/v1/algorithm:4158:12: note: in instantiation of function template specialization
      'std::__1::sort<WTF::RefPtr<int> *, (lambda at test.cpp:43:35) &>' requested here
    _VSTD::sort<_Tp*, _Comp_ref>(__first.base(), __last.base(), __comp);
           ^
test.cpp:43:10: note: in instantiation of function template specialization 'std::__1::sort<WTF::RefPtr<int>, (lambda at test.cpp:43:35)>' requested here
    std::sort(v.begin(), v.end(), [&] (auto& a, auto& b) {
```

I gave up after a couple hours of trying to resolve this, given I was doubtful that the swap optimization bought us any perf.
Comment 41 Darin Adler 2020-10-30 13:01:43 PDT
Can we discuss this in another bug report? The language/library feature we are trying to take advantage of is here:

https://en.cppreference.com/w/cpp/named_req/Swappable
Comment 42 Darin Adler 2020-10-30 13:06:54 PDT
(In reply to Saam Barati from comment #40)
> template<typename T, typename U>
> inline void swap(RefPtr<T>& a, RefPtr<U>& b)

In the test you provided, changing the lines above to these fixes the error and restores the optimization:

    template<typename T>
    inline void swap(RefPtr<T>& a, RefPtr<T>& b)

Maybe there are other possible solutions; this is just the first thing I tried.
Comment 43 Saam Barati 2020-11-02 12:52:38 PST
(In reply to Darin Adler from comment #41)
> Can we discuss this in another bug report? The language/library feature we
> are trying to take advantage of is here:
> 
> https://en.cppreference.com/w/cpp/named_req/Swappable

I opened https://bugs.webkit.org/show_bug.cgi?id=218454