Bug 226984 - Add a new pattern to instruction selector to utilize UBFX supported by ARM64
Summary: Add a new pattern to instruction selector to utilize UBFX supported by ARM64
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Yijia Huang
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-06-14 12:38 PDT by Yijia Huang
Modified: 2021-06-17 12:04 PDT (History)
8 users (show)

See Also:


Attachments
Patch (12.54 KB, patch)
2021-06-16 14:10 PDT, Yijia Huang
no flags Details | Formatted Diff | Diff
Patch (12.55 KB, patch)
2021-06-16 14:25 PDT, Yijia Huang
no flags Details | Formatted Diff | Diff
Patch (16.68 KB, patch)
2021-06-16 17:42 PDT, Yijia Huang
no flags Details | Formatted Diff | Diff
Patch (16.03 KB, patch)
2021-06-16 21:36 PDT, Yijia Huang
ews-feeder: commit-queue-
Details | Formatted Diff | Diff
Patch (16.22 KB, patch)
2021-06-16 22:24 PDT, Yijia Huang
no flags Details | Formatted Diff | Diff
Patch (16.53 KB, patch)
2021-06-17 09:15 PDT, Yijia Huang
no flags Details | Formatted Diff | Diff
Patch (16.53 KB, patch)
2021-06-17 09:19 PDT, Yijia Huang
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Yijia Huang 2021-06-14 12:38:29 PDT
...
Comment 1 Yijia Huang 2021-06-16 14:10:13 PDT
Created attachment 431600 [details]
Patch
Comment 2 Yijia Huang 2021-06-16 14:25:44 PDT
Created attachment 431603 [details]
Patch
Comment 3 Yijia Huang 2021-06-16 17:42:25 PDT
Created attachment 431625 [details]
Patch
Comment 4 Filip Pizlo 2021-06-16 19:38:09 PDT
Comment on attachment 431625 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=431625&action=review

I think it looks right. I just have stylistic comments. The one about using multiple lines to declare multiple variables is important since that’s the style we use in JSC (though apparently WebKit style not is ok with what you did there). 

I do recommend you try to detect if an int has the low 1 bits in a somewhat more clever way, though the way you did it is right (I think). It’s just there are these cheap tricks for doing it, like mask & (mask + 1).

> Source/JavaScriptCore/b3/B3LowerToAir.cpp:2720
> +                    auto contiguousOnesFromLSB = [&] (int64_t n) -> int64_t {

I feel like you could be a lot more creative in how you implement this and probably come up with something that is easier to deal with. 

You want a value that is one less than a power of two. So, if you add one, you should either get zero (because it was already all 1 bits) or you will get a value where __builtin_popcount(value) == 1. 

Another technique is mask & (mask + 1). If that is zero then you must have had the all-1 pattern in the low bits. 

There may be a function in WTF/wtf/MathExtras.h or StdLibExtras.h that does something like this.

> Source/JavaScriptCore/b3/B3LowerToAir.cpp:2721
> +                        int64_t mask = 1, width = 1;

I think we would usually use separate lines:

int64_t mask = 1;
int64_t width = 1;
Comment 5 Yijia Huang 2021-06-16 21:36:37 PDT
Created attachment 431631 [details]
Patch
Comment 6 Yijia Huang 2021-06-16 21:37:30 PDT
(In reply to Filip Pizlo from comment #4)
> Comment on attachment 431625 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=431625&action=review
> 
> I think it looks right. I just have stylistic comments. The one about using
> multiple lines to declare multiple variables is important since that’s the
> style we use in JSC (though apparently WebKit style not is ok with what you
> did there). 
> 
> I do recommend you try to detect if an int has the low 1 bits in a somewhat
> more clever way, though the way you did it is right (I think). It’s just
> there are these cheap tricks for doing it, like mask & (mask + 1).
> 
> > Source/JavaScriptCore/b3/B3LowerToAir.cpp:2720
> > +                    auto contiguousOnesFromLSB = [&] (int64_t n) -> int64_t {
> 
> I feel like you could be a lot more creative in how you implement this and
> probably come up with something that is easier to deal with. 
> 
> You want a value that is one less than a power of two. So, if you add one,
> you should either get zero (because it was already all 1 bits) or you will
> get a value where __builtin_popcount(value) == 1. 
> 
> Another technique is mask & (mask + 1). If that is zero then you must have
> had the all-1 pattern in the low bits. 
> 
> There may be a function in WTF/wtf/MathExtras.h or StdLibExtras.h that does
> something like this.
> 
> > Source/JavaScriptCore/b3/B3LowerToAir.cpp:2721
> > +                        int64_t mask = 1, width = 1;
> 
> I think we would usually use separate lines:
> 
> int64_t mask = 1;
> int64_t width = 1;

Thank you for the review!
Comment 7 Yijia Huang 2021-06-16 22:24:03 PDT
Created attachment 431633 [details]
Patch
Comment 8 Filip Pizlo 2021-06-17 08:08:34 PDT
Comment on attachment 431633 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=431633&action=review

It still looks good, but I would check that you're not missing an obvious additional matching opportunity.

> Source/JavaScriptCore/b3/B3LowerToAir.cpp:2728
> +                        && lsb + width < (32 << (opcode == Ubfx64))

Is that right?

So, it's not possible to do, say, (src >> 16) & 0xFFFF?

It feels like this should be a <=, not a <.
Comment 9 Yijia Huang 2021-06-17 08:27:53 PDT
(In reply to Filip Pizlo from comment #8)
> Comment on attachment 431633 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=431633&action=review
> 
> It still looks good, but I would check that you're not missing an obvious
> additional matching opportunity.
> 
> > Source/JavaScriptCore/b3/B3LowerToAir.cpp:2728
> > +                        && lsb + width < (32 << (opcode == Ubfx64))
> 
> Is that right?
> 
> So, it's not possible to do, say, (src >> 16) & 0xFFFF?
> 
> It feels like this should be a <=, not a <.

I have been thinking about the same question for a while. The reason why I use < instead of <= is due to the benefit and efficiency. Saam suggested I check with godbolt.org about how GCC optimized code (from c++ program to assembly).

In godbolt.org, I set "ARM64 gcc trunk" with "-O2". Then, I tested the following code:

#include <stdint.h>

uint64_t p64(uint64_t src) {
    char lsb = 34;
    char width = 30;
    return (src >> lsb) & ((1ULL << width) - 1ULL);
}

uint32_t p11(uint32_t src) {
    char lsb = 18;
    char width = 14;
    return (src >> lsb) & ((1U << width) - 1U);
}

The corresponding output provides:

p64(unsigned long):
        lsr     x0, x0, 34
        ret
p11(unsigned int):
        lsr     w0, w0, 18
        ret

So, it seems like the compiler prefers ubfx, when lsb + width < 32 or 64.

Please correct me if I am wrong. And please let me know what should I do for this.

1. Use ubfx for <=
2. Use lsr when ==
3. Skip pattern match when ==
Comment 10 Filip Pizlo 2021-06-17 08:53:11 PDT
I see, in the case where it’s ==, we could have just emitted a right shift without any masking. 

Do we do that?

I would guess we do that anyway because reduceStrength should turn (x >> 16) & 0xffff into just x>> 16. If it doesn’t then we need that rule anyway and that could be a great follow up patch for you. 

So, once we have that strength reduction, if the instruction selector does <= then this rule would only match if reduceStrength missed something. But it’s correct. I think I’ve usually erred on the side of the instruction selector supporting every legal pattern matching opportunity, even ones that are redundant with reduceStrength rules, if doing so doesn’t make the code more complex. 

In this case, using <= definitely doesn’t make the code more complex. It makes it simpler in the sense that anyone looking at this code won’t be scratching their heads thinking we are missing something.
Comment 11 Yijia Huang 2021-06-17 09:15:45 PDT
Created attachment 431679 [details]
Patch
Comment 12 Yijia Huang 2021-06-17 09:19:05 PDT
Created attachment 431680 [details]
Patch
Comment 13 EWS 2021-06-17 10:17:02 PDT
Committed r278994 (238920@main): <https://commits.webkit.org/238920@main>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 431680 [details].
Comment 14 Saam Barati 2021-06-17 12:04:02 PDT
Comment on attachment 431680 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=431680&action=review

nice, lgtm. Just some nits on making the code a bit simpler to read using variable names to express some tricky conditions.

> Source/JavaScriptCore/b3/B3LowerToAir.cpp:2716
> +            // UBFX Pattern: dest = (src >> lsb) & ((1 << width) - 1)

I think it's worth highlighting here that "((1 << width) - 1)" is really just a constant. Maybe say something like:
"dest = (src >> lab) & mask, where mask is of the form ((1 << width) - 1)"

> Source/JavaScriptCore/b3/B3LowerToAir.cpp:2723
> +                    uint8_t width = static_cast<uint8_t>(!(mask & (mask + 1))) * WTF::bitCount(mask);

this code is a bit too tricky for my liking:

Why not something a bit more obvious for less smart people like me:

bool isValidMask = mask && !(mask & (mask + 1)), and just branch on that? You can also remove the "width > 0" check below if you do

> Source/JavaScriptCore/b3/B3LowerToAir.cpp:2728
> +                        && lsb + width <= (32 << (opcode == Ubfx64))

"(32 << (opcode == Ubfx64))" is quite subtle. Not that I don't get it, but why not just help readers here and define a variable above to be:

"maxBitWidth = opcode == Ubfx64 ? 64 : 32";

> Source/JavaScriptCore/b3/B3LowerToAir.cpp:2729
> +                        && isValidForm(opcode, Arg::Tmp, Arg::Imm, Arg::Imm, Arg::Tmp))  {

I'd do this "isValidForm" check way above as the very first thing you branch on, since it should be proven by the compiler as false on x86.