RESOLVED FIXED 124904
Add a bunch of early exits and local optimizations to the x86 assembler.
https://bugs.webkit.org/show_bug.cgi?id=124904
Summary Add a bunch of early exits and local optimizations to the x86 assembler.
Nadav Rotem
Reported 2013-11-26 16:45:18 PST
Add a bunch of early exits and local optimizations to the x86 assembler.
Attachments
Patch (7.35 KB, patch)
2013-11-26 16:45 PST, Nadav Rotem
no flags
Archive of layout-test-results from webkit-cq-01 for mac-mountainlion (744.15 KB, application/zip)
2013-11-27 10:14 PST, WebKit Commit Bot
no flags
Patch (8.62 KB, patch)
2013-11-27 22:31 PST, Nadav Rotem
no flags
Nadav Rotem
Comment 1 2013-11-26 16:45:29 PST
Nadav Rotem
Comment 2 2013-11-26 16:47:05 PST
The Fixup peephole does not catch all of the ASM.js Or-with-zero patterns. This patch gets rid of the rest of them. Once this patch gets in I will add more patterns and look at other targets.
Filip Pizlo
Comment 3 2013-11-26 17:11:04 PST
Comment on attachment 217911 [details] Patch Yup, this looks right. Fwiw, if a client of this API wants flags they have to call branchBlah. I think the arm assemblers may implement bramchBlah in terms of blah so you'd have to be careful there.
Nadav Rotem
Comment 4 2013-11-26 18:57:32 PST
Do we have any ARM bots ?
Filip Pizlo
Comment 5 2013-11-26 19:00:57 PST
(In reply to comment #4) > Do we have any ARM bots ? No. :-(
WebKit Commit Bot
Comment 6 2013-11-27 10:14:28 PST
Comment on attachment 217911 [details] Patch Rejecting attachment 217911 [details] from commit-queue. New failing tests: webaudio/audiobuffersource-playbackrate.html webaudio/codec-tests/mp3/128kbps-44khz.html webaudio/codec-tests/wav/24bit-22khz-resample.html webgl/1.0.1/conformance/glsl/functions/glsl-function-distance.html webgl/1.0.2/conformance/ogles/GL/any/any_001_to_004.html webgl/1.0.1/conformance/glsl/functions/glsl-function-clamp-float.html webaudio/codec-tests/wav/24bit-44khz.html webgl/1.0.2/conformance/glsl/functions/glsl-function-ceil.html webgl/1.0.2/conformance/ogles/GL/array/array_001_to_006.html webgl/1.0.2/conformance/ogles/GL/asin/asin_001_to_006.html webaudio/audiobuffersource-multi-channels.html webgl/1.0.2/conformance/ogles/GL/acos/acos_001_to_006.html webgl/1.0.2/conformance/glsl/functions/glsl-function-clamp-gentype.html webgl/1.0.2/conformance/ogles/GL/all/all_001_to_004.html webgl/1.0.2/conformance/ogles/GL/cross/cross_001_to_002.html webgl/1.0.2/conformance/ogles/GL/cos/cos_001_to_006.html webaudio/codec-tests/aac/vbr-128kbps-44khz.html webgl/1.0.2/conformance/ogles/GL/abs/abs_001_to_006.html webgl/1.0.2/conformance/ogles/GL/atan/atan_009_to_012.html webgl/1.0.2/conformance/ogles/GL/atan/atan_001_to_008.html webgl/1.0.1/conformance/glsl/functions/glsl-function-ceil.html webgl/1.0.1/conformance/glsl/functions/glsl-function-abs.html webgl/1.0.2/conformance/ogles/GL/ceil/ceil_001_to_006.html webgl/1.0.2/conformance/glsl/functions/glsl-function-distance.html webgl/1.0.2/conformance/ogles/GL/biConstants/biConstants_001_to_008.html webgl/1.0.2/conformance/ogles/GL/control_flow/control_flow_001_to_008.html webgl/1.0.2/conformance/glsl/functions/glsl-function-clamp-float.html webgl/1.0.1/conformance/glsl/functions/glsl-function-clamp-gentype.html webgl/1.0.2/conformance/ogles/GL/biConstants/biConstants_009_to_016.html webgl/1.0.2/conformance/glsl/functions/glsl-function-abs.html Full output: http://webkit-queues.appspot.com/results/38628027
WebKit Commit Bot
Comment 7 2013-11-27 10:14:30 PST
Created attachment 217956 [details] Archive of layout-test-results from webkit-cq-01 for mac-mountainlion The attached test failures were seen while running run-webkit-tests on the commit-queue. Bot: webkit-cq-01 Port: mac-mountainlion Platform: Mac OS X 10.8.5
Filip Pizlo
Comment 8 2013-11-27 10:27:56 PST
Oh, it looks like MacroAssemblerX86_64 uses the branchBlah-implemented-in-terms-of-blah idiom: Jump branchAdd32(ResultCondition cond, TrustedImm32 src, AbsoluteAddress dest) { move(TrustedImmPtr(dest.m_ptr), scratchRegister); add32(src, Address(scratchRegister)); return Jump(m_assembler.jCC(x86Condition(cond))); } That might be at fault for some of those test failures. I recommend fixing it by just having those kinds of branch methods call m_assembler.addXYZ directly, or by adding a variant of add32/add64 called add32WithFlags. If you add such a method you can make it protected.
Nadav Rotem
Comment 9 2013-11-27 21:19:37 PST
This is strange. The tests pass locally. I will go over the branchXXX instructions and look for more violations.
Nadav Rotem
Comment 10 2013-11-27 22:31:06 PST
Nadav Rotem
Comment 11 2013-11-27 22:31:28 PST
Added a new patch. Let's see if the bots like this one.
Filip Pizlo
Comment 12 2013-11-27 22:34:52 PST
Comment on attachment 217982 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=217982&action=review > Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h:1246 > + m_assembler.addl_rr(src, dest); Indentation looks weird - are you using tabs? WebKit hackers use a mix of 4-space-tab editors and 8-space-tab editors so as a compromise we require tabs to be disabled.
Nadav Rotem
Comment 13 2013-11-27 23:10:50 PST
The file looks fine to me. Are you sure there is a problem with line 1246? Usually webkit-patch warn about problems like this. This is my vimrc, which matches the LLVM coding style. When I hack on webkit I manually add spaces to align the code to 4 spaces. set tabstop=2 set shiftwidth=2 set expandtab
Filip Pizlo
Comment 14 2013-11-27 23:14:17 PST
(In reply to comment #13) > The file looks fine to me. Are you sure there is a problem with line 1246? Usually webkit-patch warn about problems like this. > > This is my vimrc, which matches the LLVM coding style. When I hack on webkit I manually add spaces to align the code to 4 spaces. > > set tabstop=2 > set shiftwidth=2 > set expandtab Actually, that line is fine. :-) It was a WebKit bug, that caused that line to show up weird in the patch review viewer.
WebKit Commit Bot
Comment 15 2013-11-27 23:37:26 PST
Comment on attachment 217982 [details] Patch Clearing flags on attachment: 217982 Committed r159835: <http://trac.webkit.org/changeset/159835>
WebKit Commit Bot
Comment 16 2013-11-27 23:37:28 PST
All reviewed patches have been landed. Closing bug.
Csaba Osztrogonác
Comment 17 2013-11-28 03:55:12 PST
(In reply to comment #15) > (From update of attachment 217982 [details]) > Clearing flags on attachment: 217982 > > Committed r159835: <http://trac.webkit.org/changeset/159835> FYI, it broke 3 jsc tests and a layout test in debug mode: ** The following JSC stress test failures have been introduced: regress/script-tests/stepanov_container.js.default regress/script-tests/emscripten-cube2hash.js.no-cjit-validate-phases regress/script-tests/emscripten-cube2hash.js.dfg-eager-no-cjit-validate Regressions: Unexpected crashes (1) js/regress/stepanov_container.html [ Crash ] See the Mac Debug bots for details.
Nadav Rotem
Comment 18 2013-11-28 10:23:02 PST
Oops. I am sorry for breaking the build. I am trying to reproduce it locally. My I revert this commit without a review?
Csaba Osztrogonác
Comment 19 2013-11-28 10:25:08 PST
Not the build, only some failures in debug mode. If you can fix it in a reasonable time, you don't need to rollout it.
Nadav Rotem
Comment 20 2013-11-28 10:25:56 PST
I can reproduce it locally. This looks like the problem: 1374 Jump branchOr32(ResultCondition cond, RegisterID src, RegisterID dest) 1375 { 1376 or32(src, dest); 1377 return Jump(m_assembler.jCC(x86Condition(cond))); 1378 }
Filip Pizlo
Comment 21 2013-11-28 10:29:56 PST
(In reply to comment #20) > I can reproduce it locally. This looks like the problem: > > 1374 Jump branchOr32(ResultCondition cond, RegisterID src, RegisterID dest) > 1375 { > 1376 or32(src, dest); > 1377 return Jump(m_assembler.jCC(x86Condition(cond))); > 1378 } Yup. Rs=me to fix. Yes, any committer can roll things out without review. "Rs=me" means that your change is "rubber stamped" - so I know roughly what you're going to do and I trust that it'll be fine, and you can land without a formal review.
Nadav Rotem
Comment 22 2013-11-28 10:37:05 PST
Another problem: 178 Jump branchAdd32(ResultCondition cond, TrustedImm32 src, AbsoluteAddress dest) 179 { 180 move(TrustedImmPtr(dest.m_ptr), scratchRegister); 181 add32(src, Address(scratchRegister)); 182 return Jump(m_assembler.jCC(x86Condition(cond))); 183 }
Nadav Rotem
Comment 23 2013-11-28 11:05:21 PST
I can't figure it out yet. I plan to revert this patch in https://bugs.webkit.org/show_bug.cgi?id=124988.
Nadav Rotem
Comment 24 2013-11-28 14:38:11 PST
Bah, here is the problem: 0x10a49e1ce: or $0x0, %eax 0x10a49e1d1: mov $0xffffffff, %r11 0x10a49e1db: cmp %r11, %rax In 64bits mode we assume that the OR clears the upper part of the register... We can't just drop the or $0x0 because it will not zero the upper part.
Filip Pizlo
Comment 25 2013-11-28 14:53:20 PST
(In reply to comment #24) > Bah, here is the problem: > > 0x10a49e1ce: or $0x0, %eax > 0x10a49e1d1: mov $0xffffffff, %r11 > 0x10a49e1db: cmp %r11, %rax > > In 64bits mode we assume that the OR clears the upper part of the register... We can't just drop the or $0x0 because it will not zero the upper part. Oh of course! So youc. Any convert or 0 into zext, which happens to just be a movl on x86-64. Seems to imply that we need to remove or 0 in DFG IR. If Fixup missed some cases then maybe we can move this into CSEPhase. I'm kind of thinking we could make that into a general late-folding phase. Note that in CSEPhase you don't want to convert to Identity; instead you want to just directly use csephase's setReplacement primitive.
Nadav Rotem
Comment 26 2013-11-28 14:54:42 PST
Why don't you want to implement this at the MacroAssembler level ?
Nadav Rotem
Comment 27 2013-11-28 15:36:46 PST
I suggest that we change the MacroAssembler to add abstraction, not only missing functionality. We can rename the target specific implementations of instructions to a new name and make MacroAssembler use them. For example or32 would become t_or32, and MacroAssembler::or32 would call t_or32, and optimize some of the cases. Does that sound like a good idea ?
Filip Pizlo
Comment 28 2013-11-28 17:01:35 PST
(In reply to comment #27) > I suggest that we change the MacroAssembler to add abstraction, not only missing functionality. We can rename the target specific implementations of instructions to a new name and make MacroAssembler use them. For example or32 would become t_or32, and MacroAssembler::or32 would call t_or32, and optimize some of the cases. Does that sound like a good idea ? What would the semantics of t_or32 be? Note that the DFG currently assumes that the result of an or32 operation is that the high bits are zero - so this could be a fairly subtle change. In particular, it's profitable for the DFG to know that a genuine BitOr will clear the high bits. This is why I suspect that the most powerful form of this optimization would not be at the masm level.
Note You need to log in before you can comment on or make changes to this bug.