Bug 229175

Summary: [ARM64] Fix pre-index address mode
Product: WebKit Reporter: Yijia Huang <yijia_huang>
Component: JavaScriptCoreAssignee: Yijia Huang <yijia_huang>
Status: REOPENED ---    
Severity: Normal CC: commit-queue, ews-watchlist, keith_miller, mark.lam, msaboff, saam, tzagallo, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 229609    
Bug Blocks:    
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
saam: review+
Patch for landing ews-feeder: commit-queue-

Description Yijia Huang 2021-08-16 20:04:52 PDT
...
Comment 1 Yijia Huang 2021-08-17 08:13:21 PDT
Created attachment 435686 [details]
Patch
Comment 2 Yijia Huang 2021-08-17 08:16:33 PDT
Created attachment 435688 [details]
Patch
Comment 3 Yijia Huang 2021-08-17 13:04:26 PDT
Created attachment 435703 [details]
Patch
Comment 4 Yijia Huang 2021-08-21 04:02:06 PDT
Created attachment 436068 [details]
Patch
Comment 5 Yijia Huang 2021-08-21 10:29:58 PDT
Created attachment 436077 [details]
Patch
Comment 6 Yijia Huang 2021-08-23 09:09:36 PDT
Created attachment 436201 [details]
Patch
Comment 7 Yijia Huang 2021-08-23 11:01:16 PDT
Created attachment 436211 [details]
Patch
Comment 8 Yijia Huang 2021-08-23 13:10:59 PDT
Created attachment 436226 [details]
Patch
Comment 9 Radar WebKit Bug Importer 2021-08-23 20:05:29 PDT
<rdar://problem/82273878>
Comment 10 Yijia Huang 2021-08-24 10:54:47 PDT
Created attachment 436308 [details]
Patch
Comment 11 Saam Barati 2021-08-24 14:15:33 PDT
Comment on attachment 436308 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=436308&action=review

Mostly LGTM, but I have a lot of minor comments/suggestions, and a bigger comment/suggestion on the main dominance check algorithm.

> Source/JavaScriptCore/b3/B3CanonicalizePrePostIncrements.cpp:61
> +    HashMap<Value*, Vector<Value*>> addToParents;

name nit: I'd call this something like "addUses" or "usersOfAdd"

> Source/JavaScriptCore/b3/B3CanonicalizePrePostIncrements.cpp:77
> +                // Pre-Index Load Pattern:

Might also be worth a comment at the top of this file that strength reduction will leave the IR in the form we're matching now. It'll take an add(x, constant) that's an address and move the offset into the load itself, and that's why we can match this redundancy.

> Source/JavaScriptCore/b3/B3CanonicalizePrePostIncrements.cpp:89
> +                if (!baseOffsetToAddresses.contains(baseOffsetkey))
> +                    break;
> +                for (Value* address : baseOffsetToAddresses.get(baseOffsetkey)) {

this is doing two hash table lookups. I believe the canonical way to do this in WK is:

auto iter = baseOffsetToAddresses.find(baseOffsetkey);
if (iter == baseOffsetToAddresses.end())
    break;
for (Value* address : iter->value) { ... }

> Source/JavaScriptCore/b3/B3CanonicalizePrePostIncrements.cpp:92
> +                    if (addToParents.contains(address) && addToParents.find(address)->value.size() > 0)
> +                        continue;

ditto two hash table lookups here

Also, more importantly, this algorithm looks a bit iffy. Currently, you may be saved by the controlEquivalent check, but I think a better algorithm here would just be to check that the load dominates all users of the add, perhaps in some separate check below after gathering all candidates. Let me give you an example that's saved by your controlEquivalent check, but is the reason why I don't love the current algorithm.

```
bb#0:
x = add()
branch(#1, #2)

bb#1:
load(x)
jump#3

bb#2:
use(x)
jump #3

bb#3:
return
```

Since we're iterating the graph in pre-order, we're guaranteed to visit #0 before #1 or #2. But, we're not guaranteed to visit #2 before #1. If we visit #1 before #2. I think it's much nicer to just directly check that all users of the add are dominated by the load we're moving to.

> Source/JavaScriptCore/b3/B3CanonicalizePrePostIncrements.cpp:126
> +            // Post-Index Load Pattern:
> +            //     memory = Load(base, 0)
> +            //     ...
> +            //     address = Add(base, offset)
> +            //
> +            // Post-Index Store Pattern:
> +            //     memory = Store(value1, base, 0)
> +            //     ...
> +            //     address = Add(base, offset)

nit: I think no need to write both load/store variants in all your comments, you can just be like:

```
Post-Index Pattern:
memory = MemoryValue(base, 0)
...
address = Add(base, offset)
```

> Source/JavaScriptCore/b3/B3CanonicalizePrePostIncrements.cpp:129
> +            if (!baseToMemories.contains(left))
> +                break;
> +            for (MemoryValue* memory : baseToMemories.get(left)) {

ditto: two hash table lookups here, too

> Source/JavaScriptCore/b3/B3CanonicalizePrePostIncrements.cpp:142
> +            if (addToParents.contains(child))
> +                addToParents.find(child)->value.append(value);

ditto, two hash table lookups here

> Source/JavaScriptCore/b3/B3CanonicalizePrePostIncrements.cpp:187
> +                        Value* child = parent->child(i);
> +                        if (child == address)
> +                            parent->child(i) = newAddress;

small nit: I think you could write this without invoking parent->child(i) twice. Not crucial, but a touch nicer IMO, if you just make it:
Value*& child = parent->child(i);
if (child == address)
    child = newAddress;

> Source/JavaScriptCore/b3/B3CanonicalizePrePostIncrements.cpp:208
> +            // Convert Post-Index Load Pattern to the Canonical Form:
> +            //     ...                                  newOffset = Constant
> +            //     ...                                  newAddress = Add(base, newOffset)
> +            //     memory = Load(base, 0)               memory = Load(base, 0)
> +            //     ...                            -->   ...
> +            //     address = Add(base, offset)          address = Identity(newAddress)
> +            //
> +            // Convert Post-Index Store Pattern to the Canonical Form:
> +            //     ...                                  newOffset = Constant
> +            //     ...                                  newAddress = Add(base, newOffset)
> +            //     memory = Store(value, base, 0)       memory = Store(value, base, 0)
> +            //     ...                            -->   ...
> +            //     address = Add(base, offset)          address = Identity(newAddress)

ditto, let's just make this one comment abstracting the difference between store/load, since there is no need to double the size of the comment. Same goes to elsewhere in this file.
Comment 12 Yijia Huang 2021-08-24 16:37:32 PDT
Created attachment 436345 [details]
Patch
Comment 13 Saam Barati 2021-08-24 19:17:04 PDT
Comment on attachment 436345 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=436345&action=review

> Source/JavaScriptCore/b3/B3CanonicalizePrePostIncrements.cpp:163
> +                auto dominateUses = [&] () -> bool {

name nit: dominatesAllAddUses

> Source/JavaScriptCore/b3/B3CanonicalizePrePostIncrements.cpp:165
> +                    if (iter == addUses.end() || !iter->value.size())

I think this will return false if the memory value is the only user of the add. Is that right? If so, that seems wrong.

Actually, re-reading the code, I think the Load will always be put as a user of this. So size() is guaranteed to be >= 1. But see below for comments on dominance.

> Source/JavaScriptCore/b3/B3CanonicalizePrePostIncrements.cpp:169
> +                        if (!dominators.dominates(memory->owner, parent->owner))
> +                            return false;

So this is just checking if the basic block of memory dominates parent, but this won't give you a precise answer if memory->owner == parent->owner. So if they're in the same block, we need to check that the position in that block dominates.

However, maybe we can do this when visiting the Load instruction above. When visiting the Load, if we see that addUses isn't empty, it means there was a user in the same block. So we can remove it from being a candidate.

> Source/JavaScriptCore/b3/B3CanonicalizePrePostIncrements.cpp:212
> +        // This will reset values' indexes if there are any insertions.

why does this matter?
Comment 14 Yijia Huang 2021-08-24 22:49:22 PDT
Comment on attachment 436345 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=436345&action=review

>> Source/JavaScriptCore/b3/B3CanonicalizePrePostIncrements.cpp:212
>> +        // This will reset values' indexes if there are any insertions.
> 
> why does this matter?

Because we preprocess the index information in the memoryToIndex map. Suppose there are several matched patterns in the same basic block. If we process them one by one in the sequence, the latter real value indexes will be changed with a displacement. And the mechanism of updating the insertions in the basic block is basic-block-based. So, we need to collect all matched patterns in one basic block and update the insertions at one time. And this is one of the challenges to implement this algorithm.
Comment 15 Yijia Huang 2021-08-24 22:58:40 PDT
Comment on attachment 436345 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=436345&action=review

Because we preprocess the index information in the memoryToIndex map. Suppose there are several matched patterns in the same basic block. If we process them one by one in the sequence, the latter real value indexes will be changed with a displacement. And the mechanism of updating the insertions in the basic block is basic-block-based. So, we need to collect all matched patterns in one basic block and update the insertions at one time. And this is one of the challenges to implement this algorithm.

>>> Source/JavaScriptCore/b3/B3CanonicalizePrePostIncrements.cpp:212
>>> +        // This will reset values' indexes if there are any insertions.
>> 
>> why does this matter?
> 
> Because we preprocess the index information in the memoryToIndex map. Suppose there are several matched patterns in the same basic block. If we process them one by one in the sequence, the latter real value indexes will be changed with a displacement. And the mechanism of updating the insertions in the basic block is basic-block-based. So, we need to collect all matched patterns in one basic block and update the insertions at one time. And this is one of the challenges to implement this algorithm.

So, insertionSet.insert is just collecting all inserted values but does not actually do the insertions. And insertionSet.execute(basicBlock); is the execution to insert those collected values to the insertionIndex in the target basic block. This is tricky. It took me some time to figure that out since we initially would think that insertionSet.insert will do the real insertions.
Comment 16 Yijia Huang 2021-08-24 23:17:23 PDT
Comment on attachment 436345 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=436345&action=review

>> Source/JavaScriptCore/b3/B3CanonicalizePrePostIncrements.cpp:165
>> +                    if (iter == addUses.end() || !iter->value.size())
> 
> I think this will return false if the memory value is the only user of the add. Is that right? If so, that seems wrong.
> 
> Actually, re-reading the code, I think the Load will always be put as a user of this. So size() is guaranteed to be >= 1. But see below for comments on dominance.

After strength reduction, the address is still in the B3 IR which indicates there must be other users. Here, I want to make sure this scenario is always true. Maybe I should an assertion here.
Comment 17 Yijia Huang 2021-08-24 23:40:53 PDT
Created attachment 436371 [details]
Patch
Comment 18 Yijia Huang 2021-08-25 08:58:56 PDT
Created attachment 436393 [details]
Patch
Comment 19 Yijia Huang 2021-08-25 10:03:28 PDT
Created attachment 436398 [details]
Patch
Comment 20 Saam Barati 2021-08-25 12:06:45 PDT
Comment on attachment 436398 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=436398&action=review

r=me with comments

> Source/JavaScriptCore/b3/B3CanonicalizePrePostIncrements.cpp:93
> +                    auto addUsesIter = addUses.find(address);
> +                    if (addUsesIter != addUses.end() && addUsesIter->value.size() > 0)
> +                        continue;

I think this is a bit more subtle, and deserves a subtle comment. You may want to refine, but something like:

```
The goal is to move the Add to this Load/Store. We only move the Add to this Memory value if we guarantee it dominates all uses.
If there are already other uses of the Add, it guarantees this Memory value doesn't dominate those uses. This is because we're visiting the program
in pre-order traversal, so we visit this Memory value before all the things it dominates. So if the Add has other users, we must not dominate those
users. Therefore, this MemoryValue won't be a candidate.
```

An alternative approach could be to just record indices of all users of the Add. And in your dominance check, you can also add a same-block dominance check, where the index of the memory value is < index of the Add user if they are in the same basic block. That algorithm requires more space, but is less subtle, and more explicit in what it's doing.

> Source/JavaScriptCore/b3/B3CanonicalizePrePostIncrements.cpp:160
> +            // Convert Pre-Index Load/Store Pattern to the Canonical Form:

both loops inside this loop check for "handledValues.contains(memory)". But that check should go here before the loops below.

> Source/JavaScriptCore/b3/B3CanonicalizePrePostIncrements.cpp:185
> +                Value* newAddress = insertionSet.insert<Value>(insertionIndex, Add, memory->origin(), address->child(0), address->child(1));

nit: I forgot about the nicer way to do this. We have a clone(Value*) function on Procedure. So I think the slightly more idiomatic way to do this is:
Value* newAddress = proc.clone(address);

> Source/JavaScriptCore/b3/B3CanonicalizePrePostIncrements.cpp:186
> +                // update all address references with the newAddress

nit: remove this comment, since it's just saying exactly what the code below is doing. We avoid these kinds of comments in WebKit, like:

// do X
X

> Source/JavaScriptCore/b3/B3CanonicalizePrePostIncrements.cpp:219
> +        // This will reset values' indexes if there are any insertions.

I see what you're saying now. I'd just say that after this executes, memoryToIndex no longer contains up to date information for this block. But that's ok because we never touch this block again.

> Source/JavaScriptCore/b3/B3ValueKeyInlines.h:41
> +    // The observation is that when both child->index() and offset are 0's,
> +    // HashMap would not accept the ValueKey.

Isn't it sufficient to just do child->index() + 1? Why also offset + 1?
Comment 21 Yijia Huang 2021-08-25 12:22:32 PDT
Comment on attachment 436398 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=436398&action=review

>> Source/JavaScriptCore/b3/B3ValueKeyInlines.h:41
>> +    // HashMap would not accept the ValueKey.
> 
> Isn't it sufficient to just do child->index() + 1? Why also offset + 1?

Actually, this update is for the Load/Store pair. When we have MemoryValue with base and offset, we can search for:

ValueKey(base, offset + bytes) and ValueKey(base, offset - bytes)

Here, the offset could be 0.
Comment 22 Yijia Huang 2021-08-25 13:22:44 PDT
Created attachment 436412 [details]
Patch for landing
Comment 23 EWS 2021-08-25 14:19:38 PDT
Committed r281587 (240951@main): <https://commits.webkit.org/240951@main>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 436412 [details].
Comment 24 WebKit Commit Bot 2021-08-27 07:40:38 PDT
Re-opened since this is blocked by bug 229609
Comment 25 Yijia Huang 2021-08-27 17:08:36 PDT
Yusuke found that stress/put-to-proto-chain-overrides-put.js.ftl-eager is failing only in OSS bot (https://build.webkit.org/#/builders/102). I bisected and found that this is caused by https://trac.webkit.org/changeset/281587/webkit.

The failure only happens on ARM64. Not in ARM64E (That’s why OSS bot is failing while Internal bot is not failing). The way to reproduce this crash is, In one shell, building WebKit to stress the machine In the other multiple shells (4~ shells), run source testing.zsh

The testing.zsh is:
```
for ((i = 0; i < 10000; i++)); do
    echo "Run $i"
    VM=~/dev/OpenSource/WebKitBuild/Release; DYLD_FRAMEWORK_PATH=$VM $VM/jsc --useFTLJIT\=false --useFunctionDotArguments\=true --validateExceptionChecks\=true --useDollarVM\=true --maxPerThreadStackUsage\=1572864 --airForceBriggsAllocator\=true --useRandomizingExecutableIslandAllocation\=true --forcePolyProto\=true --useDataIC\=true --useFTLJIT\=true --thresholdForJITAfterWarmUp\=10 --thresholdForJITSoon\=10 --thresholdForOptimizeAfterWarmUp\=20 --thresholdForOptimizeAfterLongWarmUp\=20 --thresholdForOptimizeSoon\=20 --thresholdForFTLOptimizeAfterWarmUp\=20 --thresholdForFTLOptimizeSoon\=20 --thresholdForOMGOptimizeAfterWarmUp\=20 --thresholdForOMGOptimizeSoon\=20 --maximumEvalCacheableSourceLength\=150000 --useEagerCodeBlockJettisonTiming\=true --repatchBufferingCountdown\=0 --collectContinuously\=true --useGenerationalGC\=false --verifyGC\=true JSTests/stress/put-to-proto-chain-overrides-put.js
done
```

In the investigation, the problem is related to store-with-post-increment. I have confirmed that the canonicalization works fine.

So, I suspect that the problem is due to the application of store-with-post-increment in the B3LowerToAir.cpp

```
incrementArg = Arg::postIndex(tmp(address), smallOffset);
```
Comment 26 Yijia Huang 2021-08-27 17:09:31 PDT
The detailed discussion is in this slack thread: https://a1391192.slack.com/archives/GM9EK0UGL/p1630059992114800