Bug 124904

Summary: Add a bunch of early exits and local optimizations to the x86 assembler.
Product: WebKit Reporter: Nadav Rotem <nrotem>
Component: New BugsAssignee: Nadav Rotem <nrotem>
Status: RESOLVED FIXED    
Severity: Normal CC: commit-queue, fpizlo, ggaren, ossy
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
Archive of layout-test-results from webkit-cq-01 for mac-mountainlion
none
Patch none

Description Nadav Rotem 2013-11-26 16:45:18 PST
Add a bunch of early exits and local optimizations to the x86 assembler.
Comment 1 Nadav Rotem 2013-11-26 16:45:29 PST
Created attachment 217911 [details]
Patch
Comment 2 Nadav Rotem 2013-11-26 16:47:05 PST
The Fixup peephole does not catch all of the ASM.js Or-with-zero patterns. This patch gets rid of the rest of them.  Once this patch gets in I will add more patterns and look at other targets.
Comment 3 Filip Pizlo 2013-11-26 17:11:04 PST
Comment on attachment 217911 [details]
Patch

Yup, this looks right. Fwiw, if a client of this API wants flags they have to call branchBlah. I think the arm assemblers may implement bramchBlah in terms of blah so you'd have to be careful there.
Comment 4 Nadav Rotem 2013-11-26 18:57:32 PST
Do we have any ARM bots ?
Comment 5 Filip Pizlo 2013-11-26 19:00:57 PST
(In reply to comment #4)
> Do we have any ARM bots ?

No. :-(
Comment 6 WebKit Commit Bot 2013-11-27 10:14:28 PST
Comment on attachment 217911 [details]
Patch

Rejecting attachment 217911 [details] from commit-queue.

New failing tests:
webaudio/audiobuffersource-playbackrate.html
webaudio/codec-tests/mp3/128kbps-44khz.html
webaudio/codec-tests/wav/24bit-22khz-resample.html
webgl/1.0.1/conformance/glsl/functions/glsl-function-distance.html
webgl/1.0.2/conformance/ogles/GL/any/any_001_to_004.html
webgl/1.0.1/conformance/glsl/functions/glsl-function-clamp-float.html
webaudio/codec-tests/wav/24bit-44khz.html
webgl/1.0.2/conformance/glsl/functions/glsl-function-ceil.html
webgl/1.0.2/conformance/ogles/GL/array/array_001_to_006.html
webgl/1.0.2/conformance/ogles/GL/asin/asin_001_to_006.html
webaudio/audiobuffersource-multi-channels.html
webgl/1.0.2/conformance/ogles/GL/acos/acos_001_to_006.html
webgl/1.0.2/conformance/glsl/functions/glsl-function-clamp-gentype.html
webgl/1.0.2/conformance/ogles/GL/all/all_001_to_004.html
webgl/1.0.2/conformance/ogles/GL/cross/cross_001_to_002.html
webgl/1.0.2/conformance/ogles/GL/cos/cos_001_to_006.html
webaudio/codec-tests/aac/vbr-128kbps-44khz.html
webgl/1.0.2/conformance/ogles/GL/abs/abs_001_to_006.html
webgl/1.0.2/conformance/ogles/GL/atan/atan_009_to_012.html
webgl/1.0.2/conformance/ogles/GL/atan/atan_001_to_008.html
webgl/1.0.1/conformance/glsl/functions/glsl-function-ceil.html
webgl/1.0.1/conformance/glsl/functions/glsl-function-abs.html
webgl/1.0.2/conformance/ogles/GL/ceil/ceil_001_to_006.html
webgl/1.0.2/conformance/glsl/functions/glsl-function-distance.html
webgl/1.0.2/conformance/ogles/GL/biConstants/biConstants_001_to_008.html
webgl/1.0.2/conformance/ogles/GL/control_flow/control_flow_001_to_008.html
webgl/1.0.2/conformance/glsl/functions/glsl-function-clamp-float.html
webgl/1.0.1/conformance/glsl/functions/glsl-function-clamp-gentype.html
webgl/1.0.2/conformance/ogles/GL/biConstants/biConstants_009_to_016.html
webgl/1.0.2/conformance/glsl/functions/glsl-function-abs.html
Full output: http://webkit-queues.appspot.com/results/38628027
Comment 7 WebKit Commit Bot 2013-11-27 10:14:30 PST
Created attachment 217956 [details]
Archive of layout-test-results from webkit-cq-01 for mac-mountainlion

The attached test failures were seen while running run-webkit-tests on the commit-queue.
Bot: webkit-cq-01  Port: mac-mountainlion  Platform: Mac OS X 10.8.5
Comment 8 Filip Pizlo 2013-11-27 10:27:56 PST
Oh, it looks like MacroAssemblerX86_64 uses the branchBlah-implemented-in-terms-of-blah idiom:

    Jump branchAdd32(ResultCondition cond, TrustedImm32 src, AbsoluteAddress dest)
    {
        move(TrustedImmPtr(dest.m_ptr), scratchRegister);
        add32(src, Address(scratchRegister));
        return Jump(m_assembler.jCC(x86Condition(cond)));
    }

That might be at fault for some of those test failures.

I recommend fixing it by just having those kinds of branch methods call m_assembler.addXYZ directly, or by adding a variant of add32/add64 called add32WithFlags.  If you add such a method you can make it protected.
Comment 9 Nadav Rotem 2013-11-27 21:19:37 PST
This is strange. The tests pass locally.  I will go over the branchXXX instructions and look for more violations.
Comment 10 Nadav Rotem 2013-11-27 22:31:06 PST
Created attachment 217982 [details]
Patch
Comment 11 Nadav Rotem 2013-11-27 22:31:28 PST
Added a new patch. Let's see if the bots like this one.
Comment 12 Filip Pizlo 2013-11-27 22:34:52 PST
Comment on attachment 217982 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=217982&action=review

> Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h:1246
> +        m_assembler.addl_rr(src, dest);

Indentation looks weird - are you using tabs?  WebKit hackers use a mix of 4-space-tab editors and 8-space-tab editors so as a compromise we require tabs to be disabled.
Comment 13 Nadav Rotem 2013-11-27 23:10:50 PST
The file looks fine to me. Are you sure there is a problem with line 1246?  Usually webkit-patch warn about  problems like this. 

This is my vimrc, which matches the LLVM coding style. When I hack on webkit I manually add spaces to align the code to 4 spaces.

 set tabstop=2
 set shiftwidth=2
 set expandtab
Comment 14 Filip Pizlo 2013-11-27 23:14:17 PST
(In reply to comment #13)
> The file looks fine to me. Are you sure there is a problem with line 1246?  Usually webkit-patch warn about  problems like this. 
> 
> This is my vimrc, which matches the LLVM coding style. When I hack on webkit I manually add spaces to align the code to 4 spaces.
> 
>  set tabstop=2
>  set shiftwidth=2
>  set expandtab

Actually, that line is fine. :-)  It was a WebKit bug, that caused that line to show up weird in the patch review viewer.
Comment 15 WebKit Commit Bot 2013-11-27 23:37:26 PST
Comment on attachment 217982 [details]
Patch

Clearing flags on attachment: 217982

Committed r159835: <http://trac.webkit.org/changeset/159835>
Comment 16 WebKit Commit Bot 2013-11-27 23:37:28 PST
All reviewed patches have been landed.  Closing bug.
Comment 17 Csaba Osztrogon√°c 2013-11-28 03:55:12 PST
(In reply to comment #15)
> (From update of attachment 217982 [details])
> Clearing flags on attachment: 217982
> 
> Committed r159835: <http://trac.webkit.org/changeset/159835>

FYI, it broke 3 jsc tests and a layout test in debug mode:

** The following JSC stress test failures have been introduced:
	regress/script-tests/stepanov_container.js.default
	regress/script-tests/emscripten-cube2hash.js.no-cjit-validate-phases
	regress/script-tests/emscripten-cube2hash.js.dfg-eager-no-cjit-validate

Regressions: Unexpected crashes (1)
  js/regress/stepanov_container.html [ Crash ]



See the Mac Debug bots for details.
Comment 18 Nadav Rotem 2013-11-28 10:23:02 PST
Oops. I am sorry for breaking the build. I am trying to reproduce it locally. 

My I revert this commit without a review?
Comment 19 Csaba Osztrogon√°c 2013-11-28 10:25:08 PST
Not the build, only some failures in debug mode. If you can 
fix it in a reasonable time, you don't need to rollout it.
Comment 20 Nadav Rotem 2013-11-28 10:25:56 PST
I can reproduce it locally. This looks like the problem:

1374     Jump branchOr32(ResultCondition cond, RegisterID src, RegisterID dest)
1375     {
1376         or32(src, dest);
1377         return Jump(m_assembler.jCC(x86Condition(cond)));
1378     }
Comment 21 Filip Pizlo 2013-11-28 10:29:56 PST
(In reply to comment #20)
> I can reproduce it locally. This looks like the problem:
> 
> 1374     Jump branchOr32(ResultCondition cond, RegisterID src, RegisterID dest)
> 1375     {
> 1376         or32(src, dest);
> 1377         return Jump(m_assembler.jCC(x86Condition(cond)));
> 1378     }

Yup. Rs=me to fix. 

Yes, any committer can roll things out without review. 

"Rs=me" means that your change is "rubber stamped" - so I know roughly what you're going to do and I trust that it'll be fine, and you can land without a formal review.
Comment 22 Nadav Rotem 2013-11-28 10:37:05 PST
Another problem:

178     Jump branchAdd32(ResultCondition cond, TrustedImm32 src, AbsoluteAddress dest)
179     {
180         move(TrustedImmPtr(dest.m_ptr), scratchRegister);
181         add32(src, Address(scratchRegister));
182         return Jump(m_assembler.jCC(x86Condition(cond)));
183     }
Comment 23 Nadav Rotem 2013-11-28 11:05:21 PST
I can't figure it out yet. I plan to revert this patch in https://bugs.webkit.org/show_bug.cgi?id=124988.
Comment 24 Nadav Rotem 2013-11-28 14:38:11 PST
Bah, here is the problem:

                0x10a49e1ce: or $0x0, %eax
                0x10a49e1d1: mov $0xffffffff, %r11
                 0x10a49e1db: cmp %r11, %rax

In 64bits mode we assume that the OR clears the upper part of the register...    We can't just drop the or $0x0 because it will not zero the upper part.
Comment 25 Filip Pizlo 2013-11-28 14:53:20 PST
(In reply to comment #24)
> Bah, here is the problem:
> 
>                 0x10a49e1ce: or $0x0, %eax
>                 0x10a49e1d1: mov $0xffffffff, %r11
>                  0x10a49e1db: cmp %r11, %rax
> 
> In 64bits mode we assume that the OR clears the upper part of the register...    We can't just drop the or $0x0 because it will not zero the upper part.

Oh of course!

So youc. Any convert or 0 into zext, which happens to just be a movl on x86-64. 

Seems to imply that we need to remove or 0 in DFG IR. If Fixup missed some cases then maybe we can move this into CSEPhase. I'm kind of thinking we could make that into a general late-folding phase. Note that in CSEPhase you don't want to convert to Identity; instead you want to just directly use csephase's setReplacement primitive.
Comment 26 Nadav Rotem 2013-11-28 14:54:42 PST
Why don't you want to implement this at the MacroAssembler level ?
Comment 27 Nadav Rotem 2013-11-28 15:36:46 PST
I suggest that we change the MacroAssembler to add abstraction, not only missing functionality.  We can rename the target specific implementations of instructions to a new name and make MacroAssembler use them. For example or32 would become t_or32, and MacroAssembler::or32 would call t_or32, and optimize some of the cases. Does that sound like a good idea ?
Comment 28 Filip Pizlo 2013-11-28 17:01:35 PST
(In reply to comment #27)
> I suggest that we change the MacroAssembler to add abstraction, not only missing functionality.  We can rename the target specific implementations of instructions to a new name and make MacroAssembler use them. For example or32 would become t_or32, and MacroAssembler::or32 would call t_or32, and optimize some of the cases. Does that sound like a good idea ?

What would the semantics of t_or32 be?

Note that the DFG currently assumes that the result of an or32 operation is that the high bits are zero - so this could be a fairly subtle change. In particular, it's profitable for the DFG to know that a genuine BitOr will clear the high bits. This is why I suspect that the most powerful form of this optimization would not be at the masm level.