Bug 216172

Summary: [JSC] Use symbols as identifiers for class fields computed names storage
Product: WebKit Reporter: Xan Lopez <xan.lopez>
Component: JavaScriptCoreAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: benjamin, caitp, cdumez, cmarcelo, ews-watchlist, keith_miller, mark.lam, msaboff, saam, ticaiolima, tzagallo, webkit-bug-importer, xan.lopez, ysuzuki
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on:    
Bug Blocks: 194095    
Attachments:
Description Flags
Use symbols as identifiers
none
Use symbols as identifiers
ews-feeder: commit-queue-
Use symbols as identifiers, v3
none
Use symbols as identifiers, v4
none
Use symbols as identifiers, v5
ews-feeder: commit-queue-
Use symbols as identifiers, v6
none
Use symbols as identifiers, v7
none
Use symbols as identifiers, v8
none
Use symbols as identifiers, v9
none
Use symbols as identifiers, v10
none
Use symbols as identifiers, v11
ews-feeder: commit-queue-
Use symbols as identifiers, 12
none
Use symbols as identifiers, v13
none
Use symbols as identifiers, v14
none
Use symbols as identifiers, v15 none

Description Xan Lopez 2020-09-04 02:57:40 PDT
Right now we are using plain numeric identifiers to store the property keys for the field's computed names ("0" for the first one, "1" for the second, etc). Using symbols is cleaner, avoids potential collisions with other features storing data in the class scope, and will allow for a more sane implementation when we land static class fields.
Comment 1 Xan Lopez 2020-09-04 08:27:39 PDT
Created attachment 407975 [details]
Use symbols as identifiers

v1
Comment 2 Caio Lima 2020-09-04 08:45:28 PDT
Comment on attachment 407975 [details]
Use symbols as identifiers

View in context: https://bugs.webkit.org/attachment.cgi?id=407975&action=review

> Source/JavaScriptCore/parser/Parser.cpp:2851
> +    Symbol* symbol = Symbol::create(vm, vm.symbolRegistry().symbolForKey(symbolName));

I think we'd like to use PrivateSymbols here instead. Also, I think it is necessary to use a separate SymbolRegistry for this internal usage, since we can have `Symbol.for("0")` from use code and I don't think it's safe to use something that can be retrieved by arbritrary code. Finally, I don't think we actually need to create a Symbol, since we are only interested into uid and `vm.symbolRegistry().symbolForKey(symbolName)` is already giving us that.
Comment 3 Radar WebKit Bug Importer 2020-09-11 02:58:14 PDT
<rdar://problem/68699684>
Comment 4 Xan Lopez 2020-10-01 02:34:23 PDT
Created attachment 410212 [details]
Use symbols as identifiers

Use private symbols as suggested. I'm keeping the Symbol wrapper (created with Symbol::create) since it's needed to keep the UID alive in the SymbolRegistry. It will be garbage collected as needed, so it's not leaked. Also not using a parallel SymbolRegistry for now, we'll discuss this in a sync-up meeting.
Comment 5 Xan Lopez 2020-10-01 06:16:00 PDT
Created attachment 410226 [details]
Use symbols as identifiers, v3

Fix a mistake with changed default flags for registered symbols. There's still a bug with the bytecode cache, but just to make sure that's the only missing thing.
Comment 6 Xan Lopez 2020-10-01 11:04:10 PDT
(In reply to Xan Lopez from comment #5)
> Created attachment 410226 [details]
> Use symbols as identifiers, v3
> 
> Fix a mistake with changed default flags for registered symbols. There's
> still a bug with the bytecode cache, but just to make sure that's the only
> missing thing.

Will also update the patch to use a different SymbolRegistry, as discussed.
Comment 7 Xan Lopez 2020-10-19 07:54:38 PDT
Created attachment 411746 [details]
Use symbols as identifiers, v4

v4, attempts to fix the bytecode cache bugs (basically by teaching it about registered symbols) and uses a different hashshet for private symbols in SymbolRegistry.
Comment 8 Caio Lima 2020-10-20 04:54:00 PDT
Comment on attachment 411746 [details]
Use symbols as identifiers, v4

View in context: https://bugs.webkit.org/attachment.cgi?id=411746&action=review

> Source/JavaScriptCore/parser/Parser.cpp:2849
> +    auto symbol = Symbol::create(vm, vm.symbolRegistry().privateSymbolForKey(symbolName));

After thinking this in more depth, I'm uncomfortable with the fact we have a short lived heap object to return the uid here. This seems a bad choice for this API, since for a fresh created `SymbolImpl`, the only thing that is handling its ref is the Symbol we created. It means that if GC runs after `getOrCreateSymbolForComputedName`  and before we ever reference SymbolImpl somewhere else, we will end up with an invalid pointer that can cause UAF issues. I think it's not correct relying that such scenario won't happen. Is there a particular reason to have such function as a primitive? 

It sounds more correct if we have a less fine grain API to create/retrieve private identifier on `identifierArena` called `makePrivateIdentifier(VM& vm, unsigned identifier)`. This way, we would be able to avoid a heap allocation (removing `Symbol::create`) and have full control of SymbolImpl lifetime.

> Source/WTF/wtf/text/SymbolRegistry.h:91
> +    HashSet<SymbolRegistryKey> m_privateTable;

I don't think we need to include a `m_privateTable` on SymbolRegistry. I think this concept of private/public is more on the client side of this API than on SymbolRegistry itself. The alternative I'd propose here is to have a new SymbolRegistry on `VM` that store private symbols. This way we would be achieving the same goal we want with current `SymbolRegistry` API we have.
Comment 9 Xan Lopez 2020-10-20 07:42:07 PDT
(In reply to Caio Lima from comment #8)

> It sounds more correct if we have a less fine grain API to create/retrieve
> private identifier on `identifierArena` called `makePrivateIdentifier(VM&
> vm, unsigned identifier)`. This way, we would be able to avoid a heap
> allocation (removing `Symbol::create`) and have full control of SymbolImpl
> lifetime.

This makes a lot of sense, even more so because the pending bytecode cache failures are also related to this! So we need a better API than wrapping things with a heap Symbol.

> 
> > Source/WTF/wtf/text/SymbolRegistry.h:91
> > +    HashSet<SymbolRegistryKey> m_privateTable;
> 
> I don't think we need to include a `m_privateTable` on SymbolRegistry. I
> think this concept of private/public is more on the client side of this API
> than on SymbolRegistry itself. The alternative I'd propose here is to have a
> new SymbolRegistry on `VM` that store private symbols. This way we would be
> achieving the same goal we want with current `SymbolRegistry` API we have.

Cool. What do you think about adding some code in SymbolRegistry so that it can be configured to store public or private symbols, but never both?
Comment 10 Caio Lima 2020-10-20 08:12:35 PDT
(In reply to Xan Lopez from comment #9)
> (In reply to Caio Lima from comment #8)
> 
> > It sounds more correct if we have a less fine grain API to create/retrieve
> > private identifier on `identifierArena` called `makePrivateIdentifier(VM&
> > vm, unsigned identifier)`. This way, we would be able to avoid a heap
> > allocation (removing `Symbol::create`) and have full control of SymbolImpl
> > lifetime.
> 
> This makes a lot of sense, even more so because the pending bytecode cache
> failures are also related to this! So we need a better API than wrapping
> things with a heap Symbol.
> 
> > 
> > > Source/WTF/wtf/text/SymbolRegistry.h:91
> > > +    HashSet<SymbolRegistryKey> m_privateTable;
> > 
> > I don't think we need to include a `m_privateTable` on SymbolRegistry. I
> > think this concept of private/public is more on the client side of this API
> > than on SymbolRegistry itself. The alternative I'd propose here is to have a
> > new SymbolRegistry on `VM` that store private symbols. This way we would be
> > achieving the same goal we want with current `SymbolRegistry` API we have.
> 
> Cool. What do you think about adding some code in SymbolRegistry so that it
> can be configured to store public or private symbols, but never both?

I personally would avoid including any info into SymbolRegistry for this change. We could definitely do that in the future, if we think it's necessary. I can see the advantage of this on cases where we don't want to mix Private and Public Symbols.
Comment 11 Xan Lopez 2020-10-21 01:54:17 PDT
(In reply to Caio Lima from comment #10)
> > Cool. What do you think about adding some code in SymbolRegistry so that it
> > can be configured to store public or private symbols, but never both?
> 
> I personally would avoid including any info into SymbolRegistry for this
> change. We could definitely do that in the future, if we think it's
> necessary. I can see the advantage of this on cases where we don't want to
> mix Private and Public Symbols.

The thing is, SymbolRegistry is the one creating the symbols. If there's no way to configure it we need to subclass it, otherwise it will still create non-private symbols for all keys.
Comment 12 Xan Lopez 2020-10-21 04:27:30 PDT
Created attachment 411971 [details]
Use symbols as identifiers, v5

v5,

use makePrivateIdentifier to create private symbols as identifiers in the parser
use a new symbolregistry in VM, changing the class to create private symbols when asked to (in symbolKey())
(hopefully) fix the bytecode cache bugs
Comment 13 Xan Lopez 2020-10-21 06:36:26 PDT
Created attachment 411979 [details]
Use symbols as identifiers, v6

v6,

since we need to make some changes in SymbolRegistry, set the type (public/private symbols) already in the constructor, that way we ensure we'll always store only one type of symbol and get rid of the extra parameter in symbolForKey()
Comment 14 Caio Lima 2020-10-21 09:41:10 PDT
Comment on attachment 411979 [details]
Use symbols as identifiers, v6

View in context: https://bugs.webkit.org/attachment.cgi?id=411979&action=review

LGTM. I left one comment regarding changes on `Source/WTF/wtf/text/SymbolRegistry.h`

> Source/WTF/wtf/text/SymbolRegistry.h:94
> +    CreateSymbol m_createSymbol { nullptr };

Do you mind measuring mem usage with this change to double check if it's perf neutral?
Comment 15 Xan Lopez 2020-10-22 09:59:58 PDT
I run RAMification with v6 and without it (ToT), assuming that would give me a worst case of what the patch causes.

v6:

Footprint geomean: 58974859 (56.243 MB)
Peak Footprint geomean: 63023929 (60.104 MB)
Score: 60965788 (58.142 MB)

ToT:

Footprint geomean: 59780486 (57.011 MB)
Peak Footprint geomean: 63500091 (60.558 MB)
Score: 61612225 (58.758 MB)

Not sure if I understand how could this be, but I'm happy to run other tests if needed.
Comment 16 Xan Lopez 2020-10-22 10:12:29 PDT
(In reply to Xan Lopez from comment #15)
> I run RAMification with v6 and without it (ToT), assuming that would give me
> a worst case of what the patch causes.
> 
> v6:
> 
> Footprint geomean: 58974859 (56.243 MB)
> Peak Footprint geomean: 63023929 (60.104 MB)
> Score: 60965788 (58.142 MB)
> 
> ToT:
> 
> Footprint geomean: 59780486 (57.011 MB)
> Peak Footprint geomean: 63500091 (60.558 MB)
> Score: 61612225 (58.758 MB)
> 
> Not sure if I understand how could this be, but I'm happy to run other tests
> if needed.

I'm told RAMification.py itself does not have super accurate averaging so just one execution is not enough, so I'll provide better numbers soon.
Comment 17 Xan Lopez 2020-10-26 08:40:53 PDT
(In reply to Xan Lopez from comment #16)
> 
> I'm told RAMification.py itself does not have super accurate averaging so
> just one execution is not enough, so I'll provide better numbers soon.

Here comparing a series of several ToT runs (A) and v6 runs (B). Again it seems somehow the v6 patch is slightly lower than ToT, but in any case if there are no errors it would seem we are good:

A Low Diff: 0.14497648057239587
A high Diff: -0.14497648057239587
B Low Diff: 0.12794076230601092
B high Diff: -0.12794076230601092
Are they different?: False
Mean Baseline: 58.83728571428571
Mean Changes: 58.63028571428571
Diff: 1.0035305985205112
Comment 18 Xan Lopez 2020-10-27 05:02:41 PDT
Created attachment 412410 [details]
Use symbols as identifiers, v7

v7,

rebased
fix a bug uncovered while rebasing static fields on top of this, we were using the temporary private id for the computed property key as the function name in some cases; will update test262 accordingly, since by chance this was not tested properly in the instance test suite
Comment 19 Caitlin Potter (:caitp) 2020-10-27 12:09:30 PDT
Comment on attachment 412410 [details]
Use symbols as identifiers, v7

View in context: https://bugs.webkit.org/attachment.cgi?id=412410&action=review

overall it looks ok to me

> Source/JavaScriptCore/bytecompiler/NodesCodegen.cpp:4827
> +            setFunctionName = true;

it looks like this is only used by ComputedName, so we can probably get rid of it and just add `&& m_type != DefineFieldNode::Type::ComputedName` to the if condition

> Source/JavaScriptCore/bytecompiler/NodesCodegen.cpp:4868
> +        if (setFunctionName)

and replace this with `if (generator.shouldSetFunctionName(m_assign)`
Comment 20 Caitlin Potter (:caitp) 2020-10-27 12:10:09 PDT
Comment on attachment 412410 [details]
Use symbols as identifiers, v7

View in context: https://bugs.webkit.org/attachment.cgi?id=412410&action=review

overall it looks ok to me

>> Source/JavaScriptCore/bytecompiler/NodesCodegen.cpp:4827
>> +            setFunctionName = true;
> 
> it looks like this is only used by ComputedName, so we can probably get rid of it and just add `&& m_type != DefineFieldNode::Type::ComputedName` to the if condition

it looks like this is only used by ComputedName, so we can probably get rid of it and just add `&& m_type != DefineFieldNode::Type::ComputedName` to the if condition

>> Source/JavaScriptCore/bytecompiler/NodesCodegen.cpp:4868
>> +        if (setFunctionName)
> 
> and replace this with `if (generator.shouldSetFunctionName(m_assign)`

and replace this with `if (generator.shouldSetFunctionName(m_assign)`
Comment 21 Xan Lopez 2020-10-28 02:34:56 PDT
Created attachment 412513 [details]
Use symbols as identifiers, v8

v8,
Hrm, what you suggest does not seem to work (new failures introduced); a bit odd, since generator.shouldSetFunctionName() seems to be read-only.
I have rearranged the code a bit so that the boolean is used in both places and we keep a single call to the function. Seems a bit neater to me, let me know if that's fine.
Comment 22 Caio Lima 2020-11-06 10:14:12 PST
Comment on attachment 412513 [details]
Use symbols as identifiers, v8

View in context: https://bugs.webkit.org/attachment.cgi?id=412513&action=review

Patch LGTM. I left a comment below.

> Source/WTF/wtf/text/SymbolRegistry.h:94
> +    CreateSymbol m_createSymbol { nullptr };

I'm wondering if it is really worth it have those 2 new members on `SymbolRegistry`. I'm neutral on `m_symbolType` but I'm not sold on `m_createSymbol`, since we could call either `RegisteredSymbolImpl::createPrivate` or `RegisteredSymbolImpl::create` directly on `symbolForKey`  based on `m_symbolType`.
Comment 23 Caio Lima 2020-11-06 10:17:54 PST
Comment on attachment 412513 [details]
Use symbols as identifiers, v8

View in context: https://bugs.webkit.org/attachment.cgi?id=412513&action=review

> Source/JavaScriptCore/ChangeLog:14
> +        No new tests since no functionality was changed.

I think we are also fixing bug on `emitSetFunctionName` for computed fields, right? If that's the case, I think we should include tests that this patch is fixing
Comment 24 Xan Lopez 2020-11-06 10:46:31 PST
(In reply to Caio Lima from comment #22)
> Comment on attachment 412513 [details]
> Use symbols as identifiers, v8
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=412513&action=review
> 
> Patch LGTM. I left a comment below.
> 
> > Source/WTF/wtf/text/SymbolRegistry.h:94
> > +    CreateSymbol m_createSymbol { nullptr };
> 
> I'm wondering if it is really worth it have those 2 new members on
> `SymbolRegistry`. I'm neutral on `m_symbolType` but I'm not sold on
> `m_createSymbol`, since we could call either
> `RegisteredSymbolImpl::createPrivate` or `RegisteredSymbolImpl::create`
> directly on `symbolForKey`  based on `m_symbolType`.

Yeah, I think a previous version did this. If memory is a concern we could do it again (also the opposite, just set the function once in the ctor and then forget the type).

About the test, there is one in test262 for static fields that verifies this works, but it won't land with this patch. We should add a test262 test for instance fields too, but it also would not land with this patch, right? I'd say doing that is enough coverage, but I could add a test for it locally if you think that's important.
Comment 25 Xan Lopez 2020-11-09 02:12:44 PST
Created attachment 413571 [details]
Use symbols as identifiers, v9

v9,
do not store the symbol creation method in SymbolRegistry. We'll figure out the correct one from the symbol type each time, this function is not called very often so there's no need for the micro-optimization.
Comment 26 Yusuke Suzuki 2020-11-09 03:00:40 PST
Comment on attachment 413571 [details]
Use symbols as identifiers, v9

View in context: https://bugs.webkit.org/attachment.cgi?id=413571&action=review

> Source/WTF/wtf/text/SymbolRegistry.cpp:56
> +    CreateSymbol m_createSymbol;
> +
> +    if (m_symbolType == Type::PrivateSymbol)
> +        m_createSymbol = &RegisteredSymbolImpl::createPrivate;
> +    else
> +        m_createSymbol = &RegisteredSymbolImpl::create;
> +
> +    auto symbol = m_createSymbol(*rep.impl(), *this);

Let's just switch 

if (m_symbolType == Type::PrivateSymbol)
    symbol = RegisteredSymbolImpl::createPrivate(...);
else
    symbol = RegisteredSymbolImpl::create(...);

We don't need to make this a indirect call, and it is slower.
Comment 27 Xan Lopez 2020-11-09 03:34:07 PST
Created attachment 413577 [details]
Use symbols as identifiers, v10

v10,
do not use an indirect call to create symbols in the registry
Comment 28 Yusuke Suzuki 2020-11-09 17:45:21 PST
Comment on attachment 413577 [details]
Use symbols as identifiers, v10

View in context: https://bugs.webkit.org/attachment.cgi?id=413577&action=review

Looks good overall. Some nits.

> Source/JavaScriptCore/parser/Parser.cpp:2840
> +#define INSTANCE_COMPUTED_NAME_PREFIX "instanceComputedName"

Let's use `static constexpr ASCIILiteral instanceComputedNamePrefix { "instanceComputedName"_s };`

> Source/JavaScriptCore/parser/ParserArena.cpp:110
> +const Identifier& IdentifierArena::makePrivateIdentifier(VM& vm, const char* prefix, unsigned identifier)

Let's use `ASCIILiteral` instead of `const char*`.

> Source/WTF/wtf/text/SymbolRegistry.h:81
> +    enum class Type { PublicSymbol, PrivateSymbol };

Make it `enum class Type : uint8_t`

> Source/WTF/wtf/text/SymbolRegistry.h:83
> +    SymbolRegistry(Type type) { m_symbolType = type; };

Set it in initializer list instead of function body.
Comment 29 Xan Lopez 2020-11-10 01:25:39 PST
Created attachment 413678 [details]
Use symbols as identifiers, v11

v11,
fix style issues pointed out in previous comment
Comment 30 Xan Lopez 2020-11-10 01:42:32 PST
Created attachment 413679 [details]
Use symbols as identifiers, 12

v12,
try to make the wincairo build happy with WTF_EXPORT_PRIVATE
Comment 31 Xan Lopez 2020-11-10 02:14:14 PST
Created attachment 413681 [details]
Use symbols as identifiers, v13

v13,
and remove the now unnecessary function body type initialization, forgot about it
Comment 32 Yusuke Suzuki 2020-11-10 16:22:18 PST
Comment on attachment 413681 [details]
Use symbols as identifiers, v13

View in context: https://bugs.webkit.org/attachment.cgi?id=413681&action=review

> Source/JavaScriptCore/bytecompiler/NodesCodegen.cpp:4827
> +        if (m_ident && shouldSetFunctionName && m_type != DefineFieldNode::Type::ComputedName)

It seems that `m_type != DefineFieldNode::Type::ComputedName` is new behavior. Can you add a test for this?
Comment 33 Xan Lopez 2020-11-11 01:15:42 PST
Created attachment 413795 [details]
Use symbols as identifiers, v14

v14,
add a test for computed fields+setFunctionName
(this will be also fully tested in test262 when class fields is feature complete in WebKit)
Comment 34 Caio Lima 2020-11-11 03:51:07 PST
(In reply to Xan Lopez from comment #24)
> (In reply to Caio Lima from comment #22)
> 
> About the test, there is one in test262 for static fields that verifies this
> works, but it won't land with this patch. We should add a test262 test for
> instance fields too, but it also would not land with this patch, right? I'd
> say doing that is enough coverage, but I could add a test for it locally if
> you think that's important.

The problem with relying on Test262 coverage is that we don't run them on EWS yet. This makes our life harder to know when we break things not included on stress tests, since we will only figure out that they broke after landing it.

I think it's a very good practice try to have stress test to increase coverage of every change we make on a patch, with a few exceptions of cases where it's hard to reproduce them.
Comment 35 Caio Lima 2020-11-11 04:05:35 PST
Comment on attachment 413795 [details]
Use symbols as identifiers, v14

View in context: https://bugs.webkit.org/attachment.cgi?id=413795&action=review

> JSTests/stress/class-fields-harmony.js:922
> +}

As I commented above, I think it's valuable have tests asserting that we are setting function name properly. I think there are at least 2 new cases we'd like to include:

1. Computed field with some function expression on assignment.
2. Computed field with class expression on assignment.

So we'd like to assert for those cases `assertSame(c[x].name, <x evaluation to function name>)`.
I'd personally would include cases where `x` is not a trivial primitive, but also an object.
Comment 36 Xan Lopez 2020-11-11 04:40:08 PST
Created attachment 413811 [details]
Use symbols as identifiers, v15

v15,
add a few more explicit tests as suggested by Caio. The previous test was testing setFunctionName but implicitly.
Comment 37 Yusuke Suzuki 2020-11-13 14:00:21 PST
Comment on attachment 413811 [details]
Use symbols as identifiers, v15

r=me
Comment 38 EWS 2020-11-13 14:32:24 PST
Committed r269801: <https://trac.webkit.org/changeset/269801>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 413811 [details].