Bug 199891

Summary: [X86] Emit BT instruction for shift + mask in B3
Product: WebKit Reporter: Justin Michaud <justin_michaud>
Component: JavaScriptCoreAssignee: Justin Michaud <justin_michaud>
Status: RESOLVED FIXED    
Severity: Normal CC: commit-queue, ews-watchlist, fpizlo, keith_miller, mark.lam, msaboff, rmorisset, ryanhaddad, saam, tzagallo, webkit-bot-watchers-bugzilla, webkit-bug-importer, ysuzuki, zan
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
See Also: https://bugs.webkit.org/show_bug.cgi?id=202099
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Archive of layout-test-results from ews115 for mac-highsierra
none
Patch
none
Patch
none
Patch
none
patch
none
Patch
rmorisset: review+
patch none

Description Justin Michaud 2019-07-17 17:15:36 PDT
We should emit the BT instruction for patterns like:
if (a & (1<<b))
if ((a>>b)&1)
if ((~a>>b)&1)
if (~a & (1<<b))
Comment 1 Justin Michaud 2019-07-17 17:25:30 PDT
Created attachment 374357 [details]
Patch
Comment 2 Justin Michaud 2019-07-17 17:34:57 PDT
Created attachment 374358 [details]
Patch
Comment 3 Robin Morisset 2019-07-17 17:57:49 PDT
Comment on attachment 374357 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=374357&action=review

> Source/JavaScriptCore/assembler/testmasm.cpp:1300
> +#endif

Is your code tested at all on 32-bits x86 ?

> Source/JavaScriptCore/b3/B3LowerToAir.cpp:3333
> +                    Air::Opcode opcode = opcodeForType(BranchTestBit32, BranchTestBit64, andValue->type());

What happens to these Air opcodes on ARM?

> Source/JavaScriptCore/b3/B3LowerToAir.cpp:3341
> +                    // Turn if ((val >> x)&1) -> Bt val x

You have three cases in this file:
if ((val >> x)&1) -> Bt val x
if (val & (1<<x)) -> Bt val x
if (~val & (1<<x))
But you have 4 cases in the changelog:
if (a & (1<<b))
if ((a>>b)&1)
if ((~a>>b)&1)
if (~a & (1<<b))
It looks like you are missing if((~a>>b)&1) here.

And in bit-test-non-constant, you test:
if ((number>>bit)&1)
if (((~number)>>(bit+1))&1)
if (number & (1<<(bit-1)))
So you're both testing a case you're not optimizing (~a>>b)&1, and not testing a case that you are optimizing (~a & (1<<b)).

In theory you could even add a fifth case if you want to be exhaustive: ~(a>>b)&1

> Source/JavaScriptCore/b3/B3LowerToAir.cpp:3342
> +                    // For shifts, we always mask the mask with 31/61 so we do not need to worry about large values of x

I am not entirely sure I understand this comment. The semantic of shifts at B3 indeed requires this masking, but where is it actually applied in the code below?
Comment 4 Justin Michaud 2019-07-17 19:34:03 PDT
(In reply to Robin Morisset from comment #3)
> Comment on attachment 374357 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=374357&action=review
> 
> > Source/JavaScriptCore/assembler/testmasm.cpp:1300
> > +#endif
> 
> Is your code tested at all on 32-bits x86 ?
No, I don't have any hardware
 
> > Source/JavaScriptCore/b3/B3LowerToAir.cpp:3333
> > +                    Air::Opcode opcode = opcodeForType(BranchTestBit32, BranchTestBit64, andValue->type());
> 
> What happens to these Air opcodes on ARM?

They will have isValidForm() = false. I will double check that all of the tests run correctly.

> > Source/JavaScriptCore/b3/B3LowerToAir.cpp:3341
> > +                    // Turn if ((val >> x)&1) -> Bt val x
> 
> You have three cases in this file:
> if ((val >> x)&1) -> Bt val x
> if (val & (1<<x)) -> Bt val x
> if (~val & (1<<x))
> But you have 4 cases in the changelog:
> if (a & (1<<b))
> if ((a>>b)&1)
> if ((~a>>b)&1)
> if (~a & (1<<b))
> It looks like you are missing if((~a>>b)&1) here.

The comment is misleading. The case:
>if (bitBase && bitBase->opcode() == BitXor && (bitBase->child(1)->isInt32(-1) || bitBase->child(1)->isInt64(-1l))) {
Runs after bitBase is set, so it works in both cases.

> And in bit-test-non-constant, you test:
> if ((number>>bit)&1)
> if (((~number)>>(bit+1))&1)
> if (number & (1<<(bit-1)))
> So you're both testing a case you're not optimizing (~a>>b)&1, and not
> testing a case that you are optimizing (~a & (1<<b)).

I could add another test for the second case, but I verified the first case does get optimized.

> In theory you could even add a fifth case if you want to be exhaustive:
> ~(a>>b)&1

This complicates the commitInternal logic, so I avoided it. I will take a second look.

> > Source/JavaScriptCore/b3/B3LowerToAir.cpp:3342
> > +                    // For shifts, we always mask the mask with 31/61 so we do not need to worry about large values of x
> 
> I am not entirely sure I understand this comment. The semantic of shifts at
> B3 indeed requires this masking, but where is it actually applied in the
> code below?

The comment should probably be removed. The masking happens in the air opcode / x86 bt instruction
Comment 5 Justin Michaud 2019-07-18 10:28:34 PDT
Created attachment 374394 [details]
Patch
Comment 6 EWS Watchlist 2019-07-18 12:29:25 PDT
Comment on attachment 374394 [details]
Patch

Attachment 374394 [details] did not pass mac-debug-ews (mac):
Output: https://webkit-queues.webkit.org/results/12767403

New failing tests:
storage/indexeddb/pending-version-change-on-exit-private.html
Comment 7 EWS Watchlist 2019-07-18 12:29:27 PDT
Created attachment 374404 [details]
Archive of layout-test-results from ews115 for mac-highsierra

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews115  Port: mac-highsierra  Platform: Mac OS X 10.13.6
Comment 8 Justin Michaud 2019-07-18 12:39:54 PDT
Test failures seem unrelated
Comment 9 Saam Barati 2019-07-22 18:01:13 PDT
Comment on attachment 374394 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=374394&action=review

> Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h:2649
> +        m_assembler.bt_ir(static_cast<unsigned>(bit.m_value)%32, reg);

add a space between ) and % and 3

> Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h:2659
> +        m_assembler.bt_im(static_cast<unsigned>(bit.m_value)%32, bitBase.offset, bitBase.base);

ditto

> Source/JavaScriptCore/assembler/MacroAssemblerX86_64.h:1108
> +        m_assembler.btw_ir(static_cast<unsigned>(bit.m_value)%64, bitBase);

style: spacing

> Source/JavaScriptCore/assembler/MacroAssemblerX86_64.h:1118
> +        m_assembler.btw_im(static_cast<unsigned>(bit.m_value)%64, bitBase.offset, bitBase.base);

ditto

> Source/JavaScriptCore/b3/B3LowerToAir.cpp:3343
> +                        && andValue->opcode() == BitXor && (andValue->child(1)->isInt32(-1) || andValue->child(1)->isInt64(-1l)) && canBeInternal(andValue)

-1l => -1? Seems easier to read

> Source/JavaScriptCore/b3/B3LowerToAir.cpp:3363
> +                        bitOffset = andMask->child(1);

Does this work for RotL if andMask->child(1) is > 32/64? Can you add a test for the expected behavior? Ditto above for RotR

> Source/JavaScriptCore/b3/B3LowerToAir.cpp:3368
> +                    if (!negationNode && bitBase && bitBase->opcode() == BitXor && (bitBase->child(1)->isInt32(-1) || bitBase->child(1)->isInt64(-1l))) {

I think you need to check that bitBase canBeInternal
Comment 10 Justin Michaud 2019-07-22 19:06:00 PDT
(In reply to Saam Barati from comment #9)
> Comment on attachment 374394 [details]
> Patch
> 
> > Source/JavaScriptCore/b3/B3LowerToAir.cpp:3343
> > +                        && andValue->opcode() == BitXor && (andValue->child(1)->isInt32(-1) || andValue->child(1)->isInt64(-1l)) && canBeInternal(andValue)
> 
> -1l => -1? Seems easier to read
Different semantics. -1 is 0x0FFFFFFFF, but -1l is all 1s.
> 
> > Source/JavaScriptCore/b3/B3LowerToAir.cpp:3363
> > +                        bitOffset = andMask->child(1);
> 
> Does this work for RotL if andMask->child(1) is > 32/64? Can you add a test
> for the expected behavior? Ditto above for RotR
Good point, just going to remove it if that's OK with you.
Comment 11 Justin Michaud 2019-07-22 19:50:06 PDT
Created attachment 374666 [details]
Patch
Comment 12 EWS Watchlist 2019-07-22 22:15:21 PDT
Comment on attachment 374666 [details]
Patch

Attachment 374666 [details] did not pass jsc-ews (mac):
Output: https://webkit-queues.webkit.org/results/12792718

New failing tests:
mozilla-tests.yaml/js1_5/Array/regress-101964.js.mozilla-dfg-eager-no-cjit-validate-phases
Comment 13 Filip Pizlo 2019-07-26 11:41:52 PDT
Comment on attachment 374666 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=374666&action=review

> Source/JavaScriptCore/b3/B3LowerToAir.cpp:3388
> +                                    }

This needs an else case for inclusive matching, or it needs to just use the commit unconditionally.

I recommend committing unconditionally since that's just easier.
Comment 14 Justin Michaud 2019-07-26 17:10:45 PDT
Created attachment 375001 [details]
Patch
Comment 15 Keith Miller 2019-07-26 17:58:57 PDT
Comment on attachment 375001 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=375001&action=review

r=me.

> Source/JavaScriptCore/b3/B3LowerToAir.cpp:3335
> +                    Value* bitBase = nullptr;

Can we call this value or testValue? bitBase is kinda confusing.

It would probably be good to change this everywhere.

> Source/JavaScriptCore/b3/B3LowerToAir.cpp:3375
> +                        for (auto& basePromise : Vector<ArgPromise>::from(loadPromise(bitBase), tmpPromise(bitBase))) {

I think you can just do:

for (auto& basePromise : { loadPromise(bitBase), tmpPromise(bitBase) }) {

You also won't do a heap allocation that way. Although, maybe the compiler will elide that?
Comment 16 Justin Michaud 2019-07-26 19:10:43 PDT
(In reply to Keith Miller from comment #15)
> 
> I think you can just do:
> 
> for (auto& basePromise : { loadPromise(bitBase), tmpPromise(bitBase) }) {
> 
> You also won't do a heap allocation that way. Although, maybe the compiler
> will elide that?

It needs to be mutable, so an initializer list won't work :(
Comment 17 Justin Michaud 2019-07-26 19:50:30 PDT
Created attachment 375011 [details]
Patch
Comment 18 Justin Michaud 2019-07-26 19:52:13 PDT
Created attachment 375012 [details]
patch
Comment 19 WebKit Commit Bot 2019-07-27 00:08:08 PDT
Comment on attachment 375012 [details]
patch

Clearing flags on attachment: 375012

Committed r247889: <https://trac.webkit.org/changeset/247889>
Comment 20 WebKit Commit Bot 2019-07-27 00:08:10 PDT
All reviewed patches have been landed.  Closing bug.
Comment 21 Radar WebKit Bug Importer 2019-07-27 00:09:19 PDT
<rdar://problem/53615684>
Comment 22 Ryan Haddad 2019-07-29 09:34:56 PDT
microbenchmarks/bit-test-load.js.ftl-no-cjit-small-pool is timing out on the debug JSC bot. 

It is passing on the release bot, so maybe we need to skip it on debug because it is very slow?
Comment 23 Justin Michaud 2019-07-29 10:45:17 PDT
(In reply to Ryan Haddad from comment #22)
> microbenchmarks/bit-test-load.js.ftl-no-cjit-small-pool is timing out on the
> debug JSC bot. 
> 
> It is passing on the release bot, so maybe we need to skip it on debug
> because it is very slow?

I will just cut the number of iterations down. Just a sec.
Comment 24 Justin Michaud 2019-07-29 10:53:57 PDT
Reopening to attach new patch.
Comment 25 Justin Michaud 2019-07-29 10:53:58 PDT
Created attachment 375084 [details]
Patch
Comment 26 Robin Morisset 2019-07-29 11:08:19 PDT
Comment on attachment 375084 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=375084&action=review

r=me

> JSTests/ChangeLog:8
> +        * microbenchmarks/bit-test-load.js:

Maybe just add a line to the changelog?
Comment 27 Justin Michaud 2019-07-29 11:12:02 PDT
Created attachment 375086 [details]
patch
Comment 28 WebKit Commit Bot 2019-07-29 11:54:52 PDT
Comment on attachment 375086 [details]
patch

Clearing flags on attachment: 375086

Committed r247913: <https://trac.webkit.org/changeset/247913>
Comment 29 WebKit Commit Bot 2019-07-29 11:54:54 PDT
All reviewed patches have been landed.  Closing bug.