226984 – Add a new pattern to instruction selector to utilize UBFX supported by ARM64

RESOLVED FIXED 226984

Add a new pattern to instruction selector to utilize UBFX supported by ARM64

https://bugs.webkit.org/show_bug.cgi?id=226984

Summary Add a new pattern to instruction selector to utilize UBFX supported by ARM64

Yijia Huang

Reported 2021-06-14 12:38:29 PDT

...

Attachments
Patch (12.54 KB, patch) 2021-06-16 14:10 PDT, Yijia Huang	no flags	Details Formatted Diff Diff
Patch (12.55 KB, patch) 2021-06-16 14:25 PDT, Yijia Huang	no flags	Details Formatted Diff Diff
Patch (16.68 KB, patch) 2021-06-16 17:42 PDT, Yijia Huang	no flags	Details Formatted Diff Diff
Patch (16.03 KB, patch) 2021-06-16 21:36 PDT, Yijia Huang	ews-feeder: commit-queue-	Details Formatted Diff Diff
Patch (16.22 KB, patch) 2021-06-16 22:24 PDT, Yijia Huang	no flags	Details Formatted Diff Diff
Patch (16.53 KB, patch) 2021-06-17 09:15 PDT, Yijia Huang	no flags	Details Formatted Diff Diff
Patch (16.53 KB, patch) 2021-06-17 09:19 PDT, Yijia Huang	no flags	Details Formatted Diff Diff
Show Obsolete (6) View All Add attachment proposed patch, testcase, etc.

Yijia Huang

Comment 1 2021-06-16 14:10:13 PDT

Created attachment 431600 [details] Patch

Yijia Huang

Comment 2 2021-06-16 14:25:44 PDT

Created attachment 431603 [details] Patch

Yijia Huang

Comment 3 2021-06-16 17:42:25 PDT

Created attachment 431625 [details] Patch

Filip Pizlo

Comment 4 2021-06-16 19:38:09 PDT

Comment on attachment 431625 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=431625&action=review I think it looks right. I just have stylistic comments. The one about using multiple lines to declare multiple variables is important since that’s the style we use in JSC (though apparently WebKit style not is ok with what you did there). I do recommend you try to detect if an int has the low 1 bits in a somewhat more clever way, though the way you did it is right (I think). It’s just there are these cheap tricks for doing it, like mask & (mask + 1). > Source/JavaScriptCore/b3/B3LowerToAir.cpp:2720 > + auto contiguousOnesFromLSB = [&] (int64_t n) -> int64_t { I feel like you could be a lot more creative in how you implement this and probably come up with something that is easier to deal with. You want a value that is one less than a power of two. So, if you add one, you should either get zero (because it was already all 1 bits) or you will get a value where __builtin_popcount(value) == 1. Another technique is mask & (mask + 1). If that is zero then you must have had the all-1 pattern in the low bits. There may be a function in WTF/wtf/MathExtras.h or StdLibExtras.h that does something like this. > Source/JavaScriptCore/b3/B3LowerToAir.cpp:2721 > + int64_t mask = 1, width = 1; I think we would usually use separate lines: int64_t mask = 1; int64_t width = 1;

Yijia Huang

Comment 5 2021-06-16 21:36:37 PDT

Created attachment 431631 [details] Patch

Yijia Huang

Comment 6 2021-06-16 21:37:30 PDT

(In reply to Filip Pizlo from comment #4) > Comment on attachment 431625 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=431625&action=review > > I think it looks right. I just have stylistic comments. The one about using > multiple lines to declare multiple variables is important since that’s the > style we use in JSC (though apparently WebKit style not is ok with what you > did there). > > I do recommend you try to detect if an int has the low 1 bits in a somewhat > more clever way, though the way you did it is right (I think). It’s just > there are these cheap tricks for doing it, like mask & (mask + 1). > > > Source/JavaScriptCore/b3/B3LowerToAir.cpp:2720 > > + auto contiguousOnesFromLSB = [&] (int64_t n) -> int64_t { > > I feel like you could be a lot more creative in how you implement this and > probably come up with something that is easier to deal with. > > You want a value that is one less than a power of two. So, if you add one, > you should either get zero (because it was already all 1 bits) or you will > get a value where __builtin_popcount(value) == 1. > > Another technique is mask & (mask + 1). If that is zero then you must have > had the all-1 pattern in the low bits. > > There may be a function in WTF/wtf/MathExtras.h or StdLibExtras.h that does > something like this. > > > Source/JavaScriptCore/b3/B3LowerToAir.cpp:2721 > > + int64_t mask = 1, width = 1; > > I think we would usually use separate lines: > > int64_t mask = 1; > int64_t width = 1; Thank you for the review!

Yijia Huang

Comment 7 2021-06-16 22:24:03 PDT

Created attachment 431633 [details] Patch

Filip Pizlo

Comment 8 2021-06-17 08:08:34 PDT

Comment on attachment 431633 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=431633&action=review It still looks good, but I would check that you're not missing an obvious additional matching opportunity. > Source/JavaScriptCore/b3/B3LowerToAir.cpp:2728 > + && lsb + width < (32 << (opcode == Ubfx64)) Is that right? So, it's not possible to do, say, (src >> 16) & 0xFFFF? It feels like this should be a <=, not a <.

Yijia Huang

Comment 9 2021-06-17 08:27:53 PDT

(In reply to Filip Pizlo from comment #8) > Comment on attachment 431633 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=431633&action=review > > It still looks good, but I would check that you're not missing an obvious > additional matching opportunity. > > > Source/JavaScriptCore/b3/B3LowerToAir.cpp:2728 > > + && lsb + width < (32 << (opcode == Ubfx64)) > > Is that right? > > So, it's not possible to do, say, (src >> 16) & 0xFFFF? > > It feels like this should be a <=, not a <. I have been thinking about the same question for a while. The reason why I use < instead of <= is due to the benefit and efficiency. Saam suggested I check with godbolt.org about how GCC optimized code (from c++ program to assembly). In godbolt.org, I set "ARM64 gcc trunk" with "-O2". Then, I tested the following code: #include <stdint.h> uint64_t p64(uint64_t src) { char lsb = 34; char width = 30; return (src >> lsb) & ((1ULL << width) - 1ULL); } uint32_t p11(uint32_t src) { char lsb = 18; char width = 14; return (src >> lsb) & ((1U << width) - 1U); } The corresponding output provides: p64(unsigned long): lsr x0, x0, 34 ret p11(unsigned int): lsr w0, w0, 18 ret So, it seems like the compiler prefers ubfx, when lsb + width < 32 or 64. Please correct me if I am wrong. And please let me know what should I do for this. 1. Use ubfx for <= 2. Use lsr when == 3. Skip pattern match when ==

Filip Pizlo

Comment 10 2021-06-17 08:53:11 PDT

I see, in the case where it’s ==, we could have just emitted a right shift without any masking. Do we do that? I would guess we do that anyway because reduceStrength should turn (x >> 16) & 0xffff into just x>> 16. If it doesn’t then we need that rule anyway and that could be a great follow up patch for you. So, once we have that strength reduction, if the instruction selector does <= then this rule would only match if reduceStrength missed something. But it’s correct. I think I’ve usually erred on the side of the instruction selector supporting every legal pattern matching opportunity, even ones that are redundant with reduceStrength rules, if doing so doesn’t make the code more complex. In this case, using <= definitely doesn’t make the code more complex. It makes it simpler in the sense that anyone looking at this code won’t be scratching their heads thinking we are missing something.

Yijia Huang

Comment 11 2021-06-17 09:15:45 PDT

Created attachment 431679 [details] Patch

Yijia Huang

Comment 12 2021-06-17 09:19:05 PDT

Created attachment 431680 [details] Patch

EWS

Comment 13 2021-06-17 10:17:02 PDT

Committed r278994 (238920@main): <https://commits.webkit.org/238920@main> All reviewed patches have been landed. Closing bug and clearing flags on attachment 431680 [details].

Saam Barati

Comment 14 2021-06-17 12:04:02 PDT

Comment on attachment 431680 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=431680&action=review nice, lgtm. Just some nits on making the code a bit simpler to read using variable names to express some tricky conditions. > Source/JavaScriptCore/b3/B3LowerToAir.cpp:2716 > + // UBFX Pattern: dest = (src >> lsb) & ((1 << width) - 1) I think it's worth highlighting here that "((1 << width) - 1)" is really just a constant. Maybe say something like: "dest = (src >> lab) & mask, where mask is of the form ((1 << width) - 1)" > Source/JavaScriptCore/b3/B3LowerToAir.cpp:2723 > + uint8_t width = static_cast<uint8_t>(!(mask & (mask + 1))) * WTF::bitCount(mask); this code is a bit too tricky for my liking: Why not something a bit more obvious for less smart people like me: bool isValidMask = mask && !(mask & (mask + 1)), and just branch on that? You can also remove the "width > 0" check below if you do > Source/JavaScriptCore/b3/B3LowerToAir.cpp:2728 > + && lsb + width <= (32 << (opcode == Ubfx64)) "(32 << (opcode == Ubfx64))" is quite subtle. Not that I don't get it, but why not just help readers here and define a variable above to be: "maxBitWidth = opcode == Ubfx64 ? 64 : 32"; > Source/JavaScriptCore/b3/B3LowerToAir.cpp:2729 > + && isValidForm(opcode, Arg::Tmp, Arg::Imm, Arg::Imm, Arg::Tmp)) { I'd do this "isValidForm" check way above as the very first thing you branch on, since it should be proven by the compiler as false on x86.

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version WebKit Nightly Build

Hardware Unspecified

OS Unspecified

Product WebKit

Component JavaScriptCore

Assignee

Yijia Huang

Reported

2021-06-14 12:38 PDT

Modified

2021-06-17 12:04 PDT History

CC List

8 users Show

URL

Keywords

Depends on

Blocks