Bug 105719 - Optimize TransformationMatrix::multiply() for x86_64
Summary: Optimize TransformationMatrix::multiply() for x86_64
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc.
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
Importance: P2 Normal
Assignee: Benjamin Poulain
URL:
Keywords:
Depends on: 106019 106025
Blocks:
Reported: 2012-12-24 07:17 PST by Benjamin Poulain
Modified: 2013-01-04 16:35 PST

Attachments
Patch (9.34 KB, patch)
2012-12-24 10:18 PST, Benjamin Poulain
Patch (9.46 KB, patch)
2013-01-04 14:40 PST, Benjamin Poulain

Description Benjamin Poulain 2012-12-24 07:17:42 PST
Use the hardware better :)
Comment 1 Benjamin Poulain 2012-12-24 10:18:08 PST
Created attachment 180678 [details]
Patch
Comment 2 Benjamin Poulain 2013-01-02 13:21:04 PST
Comment on attachment 180678 [details]
Patch

Clearing flags on attachment: 180678

Committed r138640: <http://trac.webkit.org/changeset/138640>
Comment 3 Benjamin Poulain 2013-01-02 13:21:06 PST
All reviewed patches have been landed.  Closing bug.
Comment 4 Dana Jansens 2013-01-03 08:47:44 PST
Looks like this change is causing crashes: http://code.google.com/p/chromium/issues/detail?id=168173

Sorry, but it looks like it needs to be reverted. I've also confirmed locally that reverting this fixes the crashes.
Comment 5 WebKit Review Bot 2013-01-03 08:49:49 PST
Re-opened since this is blocked by bug 106019
Comment 6 Benjamin Poulain 2013-01-03 09:36:35 PST
Did you just revert based on a downstream issue? (This has been discussed recently on the mailing list.)

Can you give me more information about the problem? As far as I know, no WebKit test failed with the patch. How did you rule out your compiler or project settings?

I am honestly more than a little annoyed this was reverted without any information for me to work on.
Comment 7 Dana Jansens 2013-01-03 09:44:54 PST
We ran the test in gdb for quite some time but were unable to get much information about what was causing the crash. As you can see in the backtrace on the linked bug, the crash in gdb happens at the entrance to the method. The pointers going into the method are all fine, of course, and gdb didn't have anything interesting to say about the two matrices; they look valid.

I can provide the contents of them for you though, if you feel that will help.

This seems to point to a problem in the implementation of the method, which is surely going to cause a problem for all ports. These tests just happen to cause it to trigger reliably. This has really nothing to do with "downstream vs upstream" as far as I can tell. I don't know what project settings you're referring to that would change whether multiply() should crash or not given two matrices.

I'm sorry we don't have better test coverage on the webkit bots for functions like this, but I don't think it's that unusual for chromium bots to uncover bugs or problems that the webkit bots do not for that reason. We're looking into running the chromium compositor unit tests on the EWS bot, or on the canary waterfall, which would have helped here.

(gdb) frame 0
#0  0x00007ffff42dce67 in WebCore::TransformationMatrix::multiply (this=0x7fffffffd4c8, mat=...) at ../../third_party/WebKit/Source/WebCore/platform/graphics/transforms/TransformationMatrix.cpp:977
977	{
(gdb) p *this
$1 = {m_matrix =     {[0] =       {[0] = 1,
      [1] = 0,
      [2] = 0,
      [3] = 0},
    [1] =       {[0] = 0,
      [1] = 1,
      [2] = 0,
      [3] = 0},
    [2] =       {[0] = 0,
      [1] = 0,
      [2] = 1,
      [3] = 0},
    [3] =       {[0] = 0,
      [1] = 0,
      [2] = 0,
      [3] = 1}}}
(gdb) p mat
$2 = (const WebCore::TransformationMatrix &) @0xb64920: {m_matrix =     {[0] =       {[0] = 1,
      [1] = 0,
      [2] = 0,
      [3] = 0},
    [1] =       {[0] = 0,
      [1] = 1,
      [2] = 0,
      [3] = 0},
    [2] =       {[0] = 0,
      [1] = 0,
      [2] = 1,
      [3] = 0},
    [3] =       {[0] = 2,
      [1] = 0,
      [2] = 0,
      [3] = 1}}}
Comment 8 Benjamin Poulain 2013-01-03 09:49:57 PST
I suspect you simply have an alignment problem. There are many ways to screw that up despite completely correct code.

Please give me the disassembly at the point of crash and the content of registers. What compiler are you using?
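
The alignment concern can be made concrete with a small check. The sketch below is only illustrative: `PlainMatrix`, `AlignedMatrix`, `isAligned`, and `assertAlignedForSSE` are hypothetical names, not code from the patch or from WebKit. It assumes the optimized path uses aligned 16-byte loads, which fault when the address is not a multiple of 16.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for illustration; not WebKit's TransformationMatrix.
struct PlainMatrix {
    double m[4][4]; // only as aligned as a double (typically 8 bytes on x86_64)
};

struct alignas(16) AlignedMatrix {
    double m[4][4]; // every properly created instance starts on a 16-byte boundary
};

// True when `address` is a multiple of `alignment` (which must be a power of two).
inline bool isAligned(const void* address, std::size_t alignment)
{
    return (reinterpret_cast<std::uintptr_t>(address) & (alignment - 1)) == 0;
}

// Debug-only guard one could place before an aligned SSE load.
inline void assertAlignedForSSE(const void* address)
{
    assert(isAligned(address, 16));
}
```

A guard like this turns a fault at the first aligned load into an assertion failure that names the cause. Note that `alignas` only states the requirement; memory obtained through an allocator or a cast that ignores it can still be misaligned.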
Comment 9 Dana Jansens 2013-01-03 09:53:26 PST
Compiler:
% clang++ --version
clang version 3.3 (trunk 170392)
Target: x86_64-unknown-linux-gnu
Thread model: posix

Registers:
rax            0xb64920	11946272
rbx            0x0	0
rcx            0x0	0
rdx            0x7fffffffd4c8	140737488344264
rsi            0x7fffffffd4c8	140737488344264
rdi            0x7fffffffd4c8	140737488344264
rbp            0x7fffffffd150	0x7fffffffd150
rsp            0x7fffffffc650	0x7fffffffc650
r8             0x0	0
r9             0xfffffff7	4294967287
r10            0x0	0
r11            0x0	0
r12            0x42dce0	4381920
r13            0x7fffffffdb50	140737488345936
r14            0x0	0
r15            0x0	0
rip            0x7ffff42dce67	0x7ffff42dce67 <WebCore::TransformationMatrix::multiply(WebCore::TransformationMatrix const&)+39>
eflags         0x10206	[ PF IF RF ]
cs             0x33	51
ss             0x2b	43
ds             0x0	0
es             0x0	0
fs             0x0	0
gs             0x0	0
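
One way to read these registers against the instruction marked `=>` in the disassembly (`movapd (%rsi),%xmm0`, which requires a 16-byte-aligned memory operand): check the low four bits of the addresses involved. This is a sketch of the arithmetic only, and the constant names are made up for illustration.

```cpp
#include <cstdint>

// Address held in rsi and rdi (the `this` pointer) in the register dump.
constexpr std::uint64_t thisAddress = 0x7fffffffd4c8;

// Address held in rax (the `mat` argument) in the register dump.
constexpr std::uint64_t matAddress = 0xb64920;

// Remainder modulo 16; zero means the address is 16-byte aligned.
constexpr std::uint64_t thisOffset = thisAddress & 0xf;
constexpr std::uint64_t matOffset = matAddress & 0xf;

static_assert(thisOffset == 8, "`this` sits 8 bytes past a 16-byte boundary");
static_assert(matOffset == 0, "`mat` is 16-byte aligned");
```

Under that reading, the `this` matrix on the stack is 8-byte aligned but not 16-byte aligned, which is consistent with a fault on the first aligned load.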

Dump of assembler code for function WebCore::TransformationMatrix::multiply(WebCore::TransformationMatrix const&):
   0x00007ffff42dce40 <+0>:	sub    $0xad8,%rsp
   0x00007ffff42dce47 <+7>:	mov    %rdi,0x90(%rsp)
   0x00007ffff42dce4f <+15>:	mov    %rsi,0x88(%rsp)
   0x00007ffff42dce57 <+23>:	mov    0x90(%rsp),%rsi
   0x00007ffff42dce5f <+31>:	mov    %rsi,0x98(%rsp)
=> 0x00007ffff42dce67 <+39>:	movapd (%rsi),%xmm0
   0x00007ffff42dce6b <+43>:	movapd %xmm0,0x70(%rsp)
   0x00007ffff42dce71 <+49>:	mov    %rsi,%rdi
   0x00007ffff42dce74 <+52>:	add    $0x20,%rdi
   0x00007ffff42dce78 <+56>:	mov    %rdi,0xad0(%rsp)
   0x00007ffff42dce80 <+64>:	movapd 0x20(%rsi),%xmm0
   0x00007ffff42dce85 <+69>:	movapd %xmm0,0x60(%rsp)
   0x00007ffff42dce8b <+75>:	mov    %rsi,%rax
   0x00007ffff42dce8e <+78>:	add    $0x40,%rax
   0x00007ffff42dce92 <+82>:	mov    %rax,0xac8(%rsp)
   0x00007ffff42dce9a <+90>:	movapd 0x40(%rsi),%xmm0
   0x00007ffff42dce9f <+95>:	movapd %xmm0,0x50(%rsp)
   0x00007ffff42dcea5 <+101>:	mov    %rsi,%rcx
   0x00007ffff42dcea8 <+104>:	add    $0x60,%rcx
   0x00007ffff42dceac <+108>:	mov    %rcx,0xac0(%rsp)
   0x00007ffff42dceb4 <+116>:	movapd 0x60(%rsi),%xmm0
   0x00007ffff42dceb9 <+121>:	movapd %xmm0,0x40(%rsp)
   0x00007ffff42dcebf <+127>:	mov    %rsi,%rdx
   0x00007ffff42dcec2 <+130>:	add    $0x10,%rdx
   0x00007ffff42dcec6 <+134>:	mov    %rdx,0xab8(%rsp)
   0x00007ffff42dcece <+142>:	movapd 0x10(%rsi),%xmm0
   0x00007ffff42dced3 <+147>:	movapd %xmm0,0x30(%rsp)
   0x00007ffff42dced9 <+153>:	mov    %rsi,%r8
   0x00007ffff42dcedc <+156>:	add    $0x30,%r8
   0x00007ffff42dcee0 <+160>:	mov    %r8,0xab0(%rsp)
   0x00007ffff42dcee8 <+168>:	movapd 0x30(%rsi),%xmm0
   0x00007ffff42dceed <+173>:	movapd %xmm0,0x20(%rsp)
   0x00007ffff42dcef3 <+179>:	mov    %rsi,%r9
   0x00007ffff42dcef6 <+182>:	add    $0x50,%r9
   0x00007ffff42dcefa <+186>:	mov    %r9,0xaa8(%rsp)
   0x00007ffff42dcf02 <+194>:	movapd 0x50(%rsi),%xmm0
   0x00007ffff42dcf07 <+199>:	movapd %xmm0,0x10(%rsp)
   0x00007ffff42dcf0d <+205>:	mov    %rsi,%r10
   0x00007ffff42dcf10 <+208>:	add    $0x70,%r10
   0x00007ffff42dcf14 <+212>:	mov    %r10,0xaa0(%rsp)
   0x00007ffff42dcf1c <+220>:	movapd 0x70(%rsi),%xmm0
   0x00007ffff42dcf21 <+225>:	movapd %xmm0,(%rsp)
   0x00007ffff42dcf26 <+230>:	mov    0x88(%rsp),%r11
   0x00007ffff42dcf2e <+238>:	movsd  (%r11),%xmm0
   0x00007ffff42dcf33 <+243>:	movsd  %xmm0,0xa98(%rsp)
   0x00007ffff42dcf3c <+252>:	shufpd $0x0,%xmm0,%xmm0
   0x00007ffff42dcf41 <+257>:	movapd %xmm0,0xa80(%rsp)
   0x00007ffff42dcf4a <+266>:	movapd %xmm0,-0x10(%rsp)
   0x00007ffff42dcf50 <+272>:	mov    0x88(%rsp),%r11
   0x00007ffff42dcf58 <+280>:	movsd  0x8(%r11),%xmm0
   0x00007ffff42dcf5e <+286>:	movsd  %xmm0,0xa78(%rsp)
   0x00007ffff42dcf67 <+295>:	shufpd $0x0,%xmm0,%xmm0
   0x00007ffff42dcf6c <+300>:	movapd %xmm0,0xa60(%rsp)
   0x00007ffff42dcf75 <+309>:	movapd %xmm0,-0x20(%rsp)
   0x00007ffff42dcf7b <+315>:	mov    0x88(%rsp),%r11
   0x00007ffff42dcf83 <+323>:	movsd  0x10(%r11),%xmm0
   0x00007ffff42dcf89 <+329>:	movsd  %xmm0,0xa58(%rsp)
   0x00007ffff42dcf92 <+338>:	shufpd $0x0,%xmm0,%xmm0
   0x00007ffff42dcf97 <+343>:	movapd %xmm0,0xa40(%rsp)
   0x00007ffff42dcfa0 <+352>:	movapd %xmm0,-0x30(%rsp)
   0x00007ffff42dcfa6 <+358>:	mov    0x88(%rsp),%r11
   0x00007ffff42dcfae <+366>:	movsd  0x18(%r11),%xmm0
   0x00007ffff42dcfb4 <+372>:	movsd  %xmm0,0xa38(%rsp)
   0x00007ffff42dcfbd <+381>:	shufpd $0x0,%xmm0,%xmm0
   0x00007ffff42dcfc2 <+386>:	movapd %xmm0,0xa20(%rsp)
   0x00007ffff42dcfcb <+395>:	movapd %xmm0,-0x40(%rsp)
   0x00007ffff42dcfd1 <+401>:	movapd 0x70(%rsp),%xmm0
   0x00007ffff42dcfd7 <+407>:	movapd -0x10(%rsp),%xmm1
   0x00007ffff42dcfdd <+413>:	movapd %xmm0,0xa10(%rsp)
   0x00007ffff42dcfe6 <+422>:	movapd %xmm1,0xa00(%rsp)
   0x00007ffff42dcfef <+431>:	movapd 0xa10(%rsp),%xmm0
   0x00007ffff42dcff8 <+440>:	mulpd  %xmm1,%xmm0
   0x00007ffff42dcffc <+444>:	movapd %xmm0,-0x50(%rsp)
   0x00007ffff42dd002 <+450>:	movapd 0x60(%rsp),%xmm0
   0x00007ffff42dd008 <+456>:	movapd -0x20(%rsp),%xmm1
   0x00007ffff42dd00e <+462>:	movapd %xmm0,0x9f0(%rsp)
   0x00007ffff42dd017 <+471>:	movapd %xmm1,0x9e0(%rsp)
   0x00007ffff42dd020 <+480>:	movapd 0x9f0(%rsp),%xmm0
   0x00007ffff42dd029 <+489>:	mulpd  %xmm1,%xmm0
   0x00007ffff42dd02d <+493>:	movapd %xmm0,-0x60(%rsp)
   0x00007ffff42dd033 <+499>:	movapd 0x50(%rsp),%xmm0
   0x00007ffff42dd039 <+505>:	movapd -0x30(%rsp),%xmm1
   0x00007ffff42dd03f <+511>:	movapd %xmm0,0x9d0(%rsp)
   0x00007ffff42dd048 <+520>:	movapd %xmm1,0x9c0(%rsp)
   0x00007ffff42dd051 <+529>:	movapd 0x9d0(%rsp),%xmm0
   0x00007ffff42dd05a <+538>:	mulpd  %xmm1,%xmm0
   0x00007ffff42dd05e <+542>:	movapd %xmm0,-0x70(%rsp)
   0x00007ffff42dd064 <+548>:	movapd 0x40(%rsp),%xmm0
   0x00007ffff42dd06a <+554>:	movapd -0x40(%rsp),%xmm1
   0x00007ffff42dd070 <+560>:	movapd %xmm0,0x9b0(%rsp)
   0x00007ffff42dd079 <+569>:	movapd %xmm1,0x9a0(%rsp)
   0x00007ffff42dd082 <+578>:	movapd 0x9b0(%rsp),%xmm0
   0x00007ffff42dd08b <+587>:	mulpd  %xmm1,%xmm0
   0x00007ffff42dd08f <+591>:	movapd %xmm0,-0x80(%rsp)
   0x00007ffff42dd095 <+597>:	movapd -0x50(%rsp),%xmm0
   0x00007ffff42dd09b <+603>:	movapd -0x60(%rsp),%xmm1
   0x00007ffff42dd0a1 <+609>:	movapd %xmm0,0x990(%rsp)
   0x00007ffff42dd0aa <+618>:	movapd %xmm1,0x980(%rsp)
   0x00007ffff42dd0b3 <+627>:	movapd 0x990(%rsp),%xmm0
   0x00007ffff42dd0bc <+636>:	addpd  %xmm1,%xmm0
   0x00007ffff42dd0c0 <+640>:	movapd %xmm0,-0x50(%rsp)
   0x00007ffff42dd0c6 <+646>:	movapd -0x70(%rsp),%xmm1
   0x00007ffff42dd0cc <+652>:	movapd %xmm0,0x970(%rsp)
   0x00007ffff42dd0d5 <+661>:	movapd %xmm1,0x960(%rsp)
   0x00007ffff42dd0de <+670>:	movapd 0x970(%rsp),%xmm0
   0x00007ffff42dd0e7 <+679>:	addpd  %xmm1,%xmm0
   0x00007ffff42dd0eb <+683>:	movapd %xmm0,-0x50(%rsp)
   0x00007ffff42dd0f1 <+689>:	movapd -0x80(%rsp),%xmm1
   0x00007ffff42dd0f7 <+695>:	movapd %xmm0,0x950(%rsp)
   0x00007ffff42dd100 <+704>:	movapd %xmm1,0x940(%rsp)
   0x00007ffff42dd109 <+713>:	movapd 0x950(%rsp),%xmm0
   0x00007ffff42dd112 <+722>:	addpd  %xmm1,%xmm0
   0x00007ffff42dd116 <+726>:	movapd %xmm0,-0x50(%rsp)
   0x00007ffff42dd11c <+732>:	mov    %rsi,0x938(%rsp)
   0x00007ffff42dd124 <+740>:	movapd %xmm0,0x920(%rsp)
   0x00007ffff42dd12d <+749>:	mov    0x938(%rsp),%r11
   0x00007ffff42dd135 <+757>:	movapd %xmm0,(%r11)
   0x00007ffff42dd13a <+762>:	movapd 0x30(%rsp),%xmm0
   0x00007ffff42dd140 <+768>:	movapd -0x10(%rsp),%xmm1
   0x00007ffff42dd146 <+774>:	movapd %xmm0,0x910(%rsp)
   0x00007ffff42dd14f <+783>:	movapd %xmm1,0x900(%rsp)
   0x00007ffff42dd158 <+792>:	movapd 0x910(%rsp),%xmm0
   0x00007ffff42dd161 <+801>:	mulpd  %xmm1,%xmm0
   0x00007ffff42dd165 <+805>:	movapd %xmm0,-0x50(%rsp)
   0x00007ffff42dd16b <+811>:	movapd 0x20(%rsp),%xmm0
   0x00007ffff42dd171 <+817>:	movapd -0x20(%rsp),%xmm1
   0x00007ffff42dd177 <+823>:	movapd %xmm0,0x8f0(%rsp)
   0x00007ffff42dd180 <+832>:	movapd %xmm1,0x8e0(%rsp)
   0x00007ffff42dd189 <+841>:	movapd 0x8f0(%rsp),%xmm0
   0x00007ffff42dd192 <+850>:	mulpd  %xmm1,%xmm0
   0x00007ffff42dd196 <+854>:	movapd %xmm0,-0x60(%rsp)
   0x00007ffff42dd19c <+860>:	movapd 0x10(%rsp),%xmm0
   0x00007ffff42dd1a2 <+866>:	movapd -0x30(%rsp),%xmm1
   0x00007ffff42dd1a8 <+872>:	movapd %xmm0,0x8d0(%rsp)
   0x00007ffff42dd1b1 <+881>:	movapd %xmm1,0x8c0(%rsp)
   0x00007ffff42dd1ba <+890>:	movapd 0x8d0(%rsp),%xmm0
   0x00007ffff42dd1c3 <+899>:	mulpd  %xmm1,%xmm0
   0x00007ffff42dd1c7 <+903>:	movapd %xmm0,-0x70(%rsp)
   0x00007ffff42dd1cd <+909>:	movapd (%rsp),%xmm0
   0x00007ffff42dd1d2 <+914>:	movapd -0x40(%rsp),%xmm1
   0x00007ffff42dd1d8 <+920>:	movapd %xmm0,0x8b0(%rsp)
   0x00007ffff42dd1e1 <+929>:	movapd %xmm1,0x8a0(%rsp)
   0x00007ffff42dd1ea <+938>:	movapd 0x8b0(%rsp),%xmm0
   0x00007ffff42dd1f3 <+947>:	mulpd  %xmm1,%xmm0
   0x00007ffff42dd1f7 <+951>:	movapd %xmm0,-0x80(%rsp)
   0x00007ffff42dd1fd <+957>:	movapd -0x50(%rsp),%xmm0
   0x00007ffff42dd203 <+963>:	movapd -0x60(%rsp),%xmm1
   0x00007ffff42dd209 <+969>:	movapd %xmm0,0x890(%rsp)
   0x00007ffff42dd212 <+978>:	movapd %xmm1,0x880(%rsp)
   0x00007ffff42dd21b <+987>:	movapd 0x890(%rsp),%xmm0
   0x00007ffff42dd224 <+996>:	addpd  %xmm1,%xmm0
   0x00007ffff42dd228 <+1000>:	movapd %xmm0,-0x50(%rsp)
   0x00007ffff42dd22e <+1006>:	movapd -0x70(%rsp),%xmm1
   0x00007ffff42dd234 <+1012>:	movapd %xmm0,0x870(%rsp)
   0x00007ffff42dd23d <+1021>:	movapd %xmm1,0x860(%rsp)
   0x00007ffff42dd246 <+1030>:	movapd 0x870(%rsp),%xmm0
   0x00007ffff42dd24f <+1039>:	addpd  %xmm1,%xmm0
   0x00007ffff42dd253 <+1043>:	movapd %xmm0,-0x50(%rsp)
   0x00007ffff42dd259 <+1049>:	movapd -0x80(%rsp),%xmm1
   0x00007ffff42dd25f <+1055>:	movapd %xmm0,0x850(%rsp)
   0x00007ffff42dd268 <+1064>:	movapd %xmm1,0x840(%rsp)
   0x00007ffff42dd271 <+1073>:	movapd 0x850(%rsp),%xmm0
   0x00007ffff42dd27a <+1082>:	addpd  %xmm1,%xmm0
   0x00007ffff42dd27e <+1086>:	movapd %xmm0,-0x50(%rsp)
   0x00007ffff42dd284 <+1092>:	mov    %rdx,0x838(%rsp)
   0x00007ffff42dd28c <+1100>:	movapd %xmm0,0x820(%rsp)
   0x00007ffff42dd295 <+1109>:	mov    0x838(%rsp),%rdx
   0x00007ffff42dd29d <+1117>:	movapd %xmm0,(%rdx)
   0x00007ffff42dd2a1 <+1121>:	mov    0x88(%rsp),%rdx
   0x00007ffff42dd2a9 <+1129>:	movsd  0x20(%rdx),%xmm0
   0x00007ffff42dd2ae <+1134>:	movsd  %xmm0,0x818(%rsp)
   0x00007ffff42dd2b7 <+1143>:	shufpd $0x0,%xmm0,%xmm0
   0x00007ffff42dd2bc <+1148>:	movapd %xmm0,0x800(%rsp)
   0x00007ffff42dd2c5 <+1157>:	movapd %xmm0,-0x10(%rsp)
   0x00007ffff42dd2cb <+1163>:	mov    0x88(%rsp),%rdx
   0x00007ffff42dd2d3 <+1171>:	movsd  0x28(%rdx),%xmm0
   0x00007ffff42dd2d8 <+1176>:	movsd  %xmm0,0x7f8(%rsp)
   0x00007ffff42dd2e1 <+1185>:	shufpd $0x0,%xmm0,%xmm0
   0x00007ffff42dd2e6 <+1190>:	movapd %xmm0,0x7e0(%rsp)
   0x00007ffff42dd2ef <+1199>:	movapd %xmm0,-0x20(%rsp)
   0x00007ffff42dd2f5 <+1205>:	mov    0x88(%rsp),%rdx
   0x00007ffff42dd2fd <+1213>:	movsd  0x30(%rdx),%xmm0
   0x00007ffff42dd302 <+1218>:	movsd  %xmm0,0x7d8(%rsp)
   0x00007ffff42dd30b <+1227>:	shufpd $0x0,%xmm0,%xmm0
   0x00007ffff42dd310 <+1232>:	movapd %xmm0,0x7c0(%rsp)
   0x00007ffff42dd319 <+1241>:	movapd %xmm0,-0x30(%rsp)
   0x00007ffff42dd31f <+1247>:	mov    0x88(%rsp),%rdx
   0x00007ffff42dd327 <+1255>:	movsd  0x38(%rdx),%xmm0
   0x00007ffff42dd32c <+1260>:	movsd  %xmm0,0x7b8(%rsp)
   0x00007ffff42dd335 <+1269>:	shufpd $0x0,%xmm0,%xmm0
   0x00007ffff42dd33a <+1274>:	movapd %xmm0,0x7a0(%rsp)
   0x00007ffff42dd343 <+1283>:	movapd %xmm0,-0x40(%rsp)
   0x00007ffff42dd349 <+1289>:	movapd 0x70(%rsp),%xmm0
   0x00007ffff42dd34f <+1295>:	movapd -0x10(%rsp),%xmm1
   0x00007ffff42dd355 <+1301>:	movapd %xmm0,0x790(%rsp)
   0x00007ffff42dd35e <+1310>:	movapd %xmm1,0x780(%rsp)
   0x00007ffff42dd367 <+1319>:	movapd 0x790(%rsp),%xmm0
   0x00007ffff42dd370 <+1328>:	mulpd  %xmm1,%xmm0
   0x00007ffff42dd374 <+1332>:	movapd %xmm0,-0x50(%rsp)
   0x00007ffff42dd37a <+1338>:	movapd 0x60(%rsp),%xmm0
   0x00007ffff42dd380 <+1344>:	movapd -0x20(%rsp),%xmm1
   0x00007ffff42dd386 <+1350>:	movapd %xmm0,0x770(%rsp)
   0x00007ffff42dd38f <+1359>:	movapd %xmm1,0x760(%rsp)
   0x00007ffff42dd398 <+1368>:	movapd 0x770(%rsp),%xmm0
   0x00007ffff42dd3a1 <+1377>:	mulpd  %xmm1,%xmm0
   0x00007ffff42dd3a5 <+1381>:	movapd %xmm0,-0x60(%rsp)
   0x00007ffff42dd3ab <+1387>:	movapd 0x50(%rsp),%xmm0
   0x00007ffff42dd3b1 <+1393>:	movapd -0x30(%rsp),%xmm1
   0x00007ffff42dd3b7 <+1399>:	movapd %xmm0,0x750(%rsp)
   0x00007ffff42dd3c0 <+1408>:	movapd %xmm1,0x740(%rsp)
   0x00007ffff42dd3c9 <+1417>:	movapd 0x750(%rsp),%xmm0
   0x00007ffff42dd3d2 <+1426>:	mulpd  %xmm1,%xmm0
   0x00007ffff42dd3d6 <+1430>:	movapd %xmm0,-0x70(%rsp)
   0x00007ffff42dd3dc <+1436>:	movapd 0x40(%rsp),%xmm0
   0x00007ffff42dd3e2 <+1442>:	movapd -0x40(%rsp),%xmm1
   0x00007ffff42dd3e8 <+1448>:	movapd %xmm0,0x730(%rsp)
   0x00007ffff42dd3f1 <+1457>:	movapd %xmm1,0x720(%rsp)
   0x00007ffff42dd3fa <+1466>:	movapd 0x730(%rsp),%xmm0
   0x00007ffff42dd403 <+1475>:	mulpd  %xmm1,%xmm0
   0x00007ffff42dd407 <+1479>:	movapd %xmm0,-0x80(%rsp)
   0x00007ffff42dd40d <+1485>:	movapd -0x50(%rsp),%xmm0
   0x00007ffff42dd413 <+1491>:	movapd -0x60(%rsp),%xmm1
   0x00007ffff42dd419 <+1497>:	movapd %xmm0,0x710(%rsp)
   0x00007ffff42dd422 <+1506>:	movapd %xmm1,0x700(%rsp)
   0x00007ffff42dd42b <+1515>:	movapd 0x710(%rsp),%xmm0
   0x00007ffff42dd434 <+1524>:	addpd  %xmm1,%xmm0
   0x00007ffff42dd438 <+1528>:	movapd %xmm0,-0x50(%rsp)
   0x00007ffff42dd43e <+1534>:	movapd -0x70(%rsp),%xmm1
   0x00007ffff42dd444 <+1540>:	movapd %xmm0,0x6f0(%rsp)
   0x00007ffff42dd44d <+1549>:	movapd %xmm1,0x6e0(%rsp)
   0x00007ffff42dd456 <+1558>:	movapd 0x6f0(%rsp),%xmm0
   0x00007ffff42dd45f <+1567>:	addpd  %xmm1,%xmm0
   0x00007ffff42dd463 <+1571>:	movapd %xmm0,-0x50(%rsp)
   0x00007ffff42dd469 <+1577>:	movapd -0x80(%rsp),%xmm1
   0x00007ffff42dd46f <+1583>:	movapd %xmm0,0x6d0(%rsp)
   0x00007ffff42dd478 <+1592>:	movapd %xmm1,0x6c0(%rsp)
   0x00007ffff42dd481 <+1601>:	movapd 0x6d0(%rsp),%xmm0
   0x00007ffff42dd48a <+1610>:	addpd  %xmm1,%xmm0
   0x00007ffff42dd48e <+1614>:	movapd %xmm0,-0x50(%rsp)
   0x00007ffff42dd494 <+1620>:	mov    %rdi,0x6b8(%rsp)
   0x00007ffff42dd49c <+1628>:	movapd %xmm0,0x6a0(%rsp)
   0x00007ffff42dd4a5 <+1637>:	mov    0x6b8(%rsp),%rdx
   0x00007ffff42dd4ad <+1645>:	movapd %xmm0,(%rdx)
   0x00007ffff42dd4b1 <+1649>:	movapd 0x30(%rsp),%xmm0
   0x00007ffff42dd4b7 <+1655>:	movapd -0x10(%rsp),%xmm1
   0x00007ffff42dd4bd <+1661>:	movapd %xmm0,0x690(%rsp)
   0x00007ffff42dd4c6 <+1670>:	movapd %xmm1,0x680(%rsp)
   0x00007ffff42dd4cf <+1679>:	movapd 0x690(%rsp),%xmm0
   0x00007ffff42dd4d8 <+1688>:	mulpd  %xmm1,%xmm0
   0x00007ffff42dd4dc <+1692>:	movapd %xmm0,-0x50(%rsp)
   0x00007ffff42dd4e2 <+1698>:	movapd 0x20(%rsp),%xmm0
   0x00007ffff42dd4e8 <+1704>:	movapd -0x20(%rsp),%xmm1
   0x00007ffff42dd4ee <+1710>:	movapd %xmm0,0x670(%rsp)
   0x00007ffff42dd4f7 <+1719>:	movapd %xmm1,0x660(%rsp)
   0x00007ffff42dd500 <+1728>:	movapd 0x670(%rsp),%xmm0
   0x00007ffff42dd509 <+1737>:	mulpd  %xmm1,%xmm0
   0x00007ffff42dd50d <+1741>:	movapd %xmm0,-0x60(%rsp)
   0x00007ffff42dd513 <+1747>:	movapd 0x10(%rsp),%xmm0
   0x00007ffff42dd519 <+1753>:	movapd -0x30(%rsp),%xmm1
   0x00007ffff42dd51f <+1759>:	movapd %xmm0,0x650(%rsp)
   0x00007ffff42dd528 <+1768>:	movapd %xmm1,0x640(%rsp)
   0x00007ffff42dd531 <+1777>:	movapd 0x650(%rsp),%xmm0
   0x00007ffff42dd53a <+1786>:	mulpd  %xmm1,%xmm0
   0x00007ffff42dd53e <+1790>:	movapd %xmm0,-0x70(%rsp)
   0x00007ffff42dd544 <+1796>:	movapd (%rsp),%xmm0
   0x00007ffff42dd549 <+1801>:	movapd -0x40(%rsp),%xmm1
   0x00007ffff42dd54f <+1807>:	movapd %xmm0,0x630(%rsp)
   0x00007ffff42dd558 <+1816>:	movapd %xmm1,0x620(%rsp)
   0x00007ffff42dd561 <+1825>:	movapd 0x630(%rsp),%xmm0
   0x00007ffff42dd56a <+1834>:	mulpd  %xmm1,%xmm0
   0x00007ffff42dd56e <+1838>:	movapd %xmm0,-0x80(%rsp)
   0x00007ffff42dd574 <+1844>:	movapd -0x50(%rsp),%xmm0
   0x00007ffff42dd57a <+1850>:	movapd -0x60(%rsp),%xmm1
   0x00007ffff42dd580 <+1856>:	movapd %xmm0,0x610(%rsp)
   0x00007ffff42dd589 <+1865>:	movapd %xmm1,0x600(%rsp)
   0x00007ffff42dd592 <+1874>:	movapd 0x610(%rsp),%xmm0
   0x00007ffff42dd59b <+1883>:	addpd  %xmm1,%xmm0
   0x00007ffff42dd59f <+1887>:	movapd %xmm0,-0x50(%rsp)
   0x00007ffff42dd5a5 <+1893>:	movapd -0x70(%rsp),%xmm1
   0x00007ffff42dd5ab <+1899>:	movapd %xmm0,0x5f0(%rsp)
   0x00007ffff42dd5b4 <+1908>:	movapd %xmm1,0x5e0(%rsp)
   0x00007ffff42dd5bd <+1917>:	movapd 0x5f0(%rsp),%xmm0
   0x00007ffff42dd5c6 <+1926>:	addpd  %xmm1,%xmm0
   0x00007ffff42dd5ca <+1930>:	movapd %xmm0,-0x50(%rsp)
   0x00007ffff42dd5d0 <+1936>:	movapd -0x80(%rsp),%xmm1
   0x00007ffff42dd5d6 <+1942>:	movapd %xmm0,0x5d0(%rsp)
   0x00007ffff42dd5df <+1951>:	movapd %xmm1,0x5c0(%rsp)
   0x00007ffff42dd5e8 <+1960>:	movapd 0x5d0(%rsp),%xmm0
   0x00007ffff42dd5f1 <+1969>:	addpd  %xmm1,%xmm0
   0x00007ffff42dd5f5 <+1973>:	movapd %xmm0,-0x50(%rsp)
   0x00007ffff42dd5fb <+1979>:	mov    %r8,0x5b8(%rsp)
   0x00007ffff42dd603 <+1987>:	movapd %xmm0,0x5a0(%rsp)
   0x00007ffff42dd60c <+1996>:	mov    0x5b8(%rsp),%rdx
   0x00007ffff42dd614 <+2004>:	movapd %xmm0,(%rdx)
   0x00007ffff42dd618 <+2008>:	mov    0x88(%rsp),%rdx
   0x00007ffff42dd620 <+2016>:	movsd  0x40(%rdx),%xmm0
   0x00007ffff42dd625 <+2021>:	movsd  %xmm0,0x598(%rsp)
   0x00007ffff42dd62e <+2030>:	shufpd $0x0,%xmm0,%xmm0
   0x00007ffff42dd633 <+2035>:	movapd %xmm0,0x580(%rsp)
   0x00007ffff42dd63c <+2044>:	movapd %xmm0,-0x10(%rsp)
   0x00007ffff42dd642 <+2050>:	mov    0x88(%rsp),%rdx
   0x00007ffff42dd64a <+2058>:	movsd  0x48(%rdx),%xmm0
   0x00007ffff42dd64f <+2063>:	movsd  %xmm0,0x578(%rsp)
   0x00007ffff42dd658 <+2072>:	shufpd $0x0,%xmm0,%xmm0
   0x00007ffff42dd65d <+2077>:	movapd %xmm0,0x560(%rsp)
   0x00007ffff42dd666 <+2086>:	movapd %xmm0,-0x20(%rsp)
   0x00007ffff42dd66c <+2092>:	mov    0x88(%rsp),%rdx
   0x00007ffff42dd674 <+2100>:	movsd  0x50(%rdx),%xmm0
   0x00007ffff42dd679 <+2105>:	movsd  %xmm0,0x558(%rsp)
   0x00007ffff42dd682 <+2114>:	shufpd $0x0,%xmm0,%xmm0
   0x00007ffff42dd687 <+2119>:	movapd %xmm0,0x540(%rsp)
   0x00007ffff42dd690 <+2128>:	movapd %xmm0,-0x30(%rsp)
   0x00007ffff42dd696 <+2134>:	mov    0x88(%rsp),%rdx
   0x00007ffff42dd69e <+2142>:	movsd  0x58(%rdx),%xmm0
   0x00007ffff42dd6a3 <+2147>:	movsd  %xmm0,0x538(%rsp)
   0x00007ffff42dd6ac <+2156>:	shufpd $0x0,%xmm0,%xmm0
   0x00007ffff42dd6b1 <+2161>:	movapd %xmm0,0x520(%rsp)
   0x00007ffff42dd6ba <+2170>:	movapd %xmm0,-0x40(%rsp)
   0x00007ffff42dd6c0 <+2176>:	movapd 0x70(%rsp),%xmm0
   0x00007ffff42dd6c6 <+2182>:	movapd -0x10(%rsp),%xmm1
   0x00007ffff42dd6cc <+2188>:	movapd %xmm0,0x510(%rsp)
   0x00007ffff42dd6d5 <+2197>:	movapd %xmm1,0x500(%rsp)
   0x00007ffff42dd6de <+2206>:	movapd 0x510(%rsp),%xmm0
   0x00007ffff42dd6e7 <+2215>:	mulpd  %xmm1,%xmm0
   0x00007ffff42dd6eb <+2219>:	movapd %xmm0,-0x50(%rsp)
   0x00007ffff42dd6f1 <+2225>:	movapd 0x60(%rsp),%xmm0
   0x00007ffff42dd6f7 <+2231>:	movapd -0x20(%rsp),%xmm1
   0x00007ffff42dd6fd <+2237>:	movapd %xmm0,0x4f0(%rsp)
   0x00007ffff42dd706 <+2246>:	movapd %xmm1,0x4e0(%rsp)
   0x00007ffff42dd70f <+2255>:	movapd 0x4f0(%rsp),%xmm0
   0x00007ffff42dd718 <+2264>:	mulpd  %xmm1,%xmm0
   0x00007ffff42dd71c <+2268>:	movapd %xmm0,-0x60(%rsp)
   0x00007ffff42dd722 <+2274>:	movapd 0x50(%rsp),%xmm0
   0x00007ffff42dd728 <+2280>:	movapd -0x30(%rsp),%xmm1
   0x00007ffff42dd72e <+2286>:	movapd %xmm0,0x4d0(%rsp)
   0x00007ffff42dd737 <+2295>:	movapd %xmm1,0x4c0(%rsp)
   0x00007ffff42dd740 <+2304>:	movapd 0x4d0(%rsp),%xmm0
   0x00007ffff42dd749 <+2313>:	mulpd  %xmm1,%xmm0
   0x00007ffff42dd74d <+2317>:	movapd %xmm0,-0x70(%rsp)
   0x00007ffff42dd753 <+2323>:	movapd 0x40(%rsp),%xmm0
   0x00007ffff42dd759 <+2329>:	movapd -0x40(%rsp),%xmm1
   0x00007ffff42dd75f <+2335>:	movapd %xmm0,0x4b0(%rsp)
   0x00007ffff42dd768 <+2344>:	movapd %xmm1,0x4a0(%rsp)
   0x00007ffff42dd771 <+2353>:	movapd 0x4b0(%rsp),%xmm0
   0x00007ffff42dd77a <+2362>:	mulpd  %xmm1,%xmm0
   0x00007ffff42dd77e <+2366>:	movapd %xmm0,-0x80(%rsp)
   0x00007ffff42dd784 <+2372>:	movapd -0x50(%rsp),%xmm0
   0x00007ffff42dd78a <+2378>:	movapd -0x60(%rsp),%xmm1
   0x00007ffff42dd790 <+2384>:	movapd %xmm0,0x490(%rsp)
   0x00007ffff42dd799 <+2393>:	movapd %xmm1,0x480(%rsp)
   0x00007ffff42dd7a2 <+2402>:	movapd 0x490(%rsp),%xmm0
   0x00007ffff42dd7ab <+2411>:	addpd  %xmm1,%xmm0
   0x00007ffff42dd7af <+2415>:	movapd %xmm0,-0x50(%rsp)
   0x00007ffff42dd7b5 <+2421>:	movapd -0x70(%rsp),%xmm1
   0x00007ffff42dd7bb <+2427>:	movapd %xmm0,0x470(%rsp)
   0x00007ffff42dd7c4 <+2436>:	movapd %xmm1,0x460(%rsp)
   0x00007ffff42dd7cd <+2445>:	movapd 0x470(%rsp),%xmm0
   0x00007ffff42dd7d6 <+2454>:	addpd  %xmm1,%xmm0
   0x00007ffff42dd7da <+2458>:	movapd %xmm0,-0x50(%rsp)
   0x00007ffff42dd7e0 <+2464>:	movapd -0x80(%rsp),%xmm1
   0x00007ffff42dd7e6 <+2470>:	movapd %xmm0,0x450(%rsp)
   0x00007ffff42dd7ef <+2479>:	movapd %xmm1,0x440(%rsp)
   0x00007ffff42dd7f8 <+2488>:	movapd 0x450(%rsp),%xmm0
   0x00007ffff42dd801 <+2497>:	addpd  %xmm1,%xmm0
   0x00007ffff42dd805 <+2501>:	movapd %xmm0,-0x50(%rsp)
   0x00007ffff42dd80b <+2507>:	mov    %rax,0x438(%rsp)
   0x00007ffff42dd813 <+2515>:	movapd %xmm0,0x420(%rsp)
   0x00007ffff42dd81c <+2524>:	mov    0x438(%rsp),%rax
   0x00007ffff42dd824 <+2532>:	movapd %xmm0,(%rax)
   0x00007ffff42dd828 <+2536>:	movapd 0x30(%rsp),%xmm0
   0x00007ffff42dd82e <+2542>:	movapd -0x10(%rsp),%xmm1
   0x00007ffff42dd834 <+2548>:	movapd %xmm0,0x410(%rsp)
   0x00007ffff42dd83d <+2557>:	movapd %xmm1,0x400(%rsp)
   0x00007ffff42dd846 <+2566>:	movapd 0x410(%rsp),%xmm0
   0x00007ffff42dd84f <+2575>:	mulpd  %xmm1,%xmm0
   0x00007ffff42dd853 <+2579>:	movapd %xmm0,-0x50(%rsp)
   0x00007ffff42dd859 <+2585>:	movapd 0x20(%rsp),%xmm0
   0x00007ffff42dd85f <+2591>:	movapd -0x20(%rsp),%xmm1
   0x00007ffff42dd865 <+2597>:	movapd %xmm0,0x3f0(%rsp)
   0x00007ffff42dd86e <+2606>:	movapd %xmm1,0x3e0(%rsp)
   0x00007ffff42dd877 <+2615>:	movapd 0x3f0(%rsp),%xmm0
   0x00007ffff42dd880 <+2624>:	mulpd  %xmm1,%xmm0
   0x00007ffff42dd884 <+2628>:	movapd %xmm0,-0x60(%rsp)
   0x00007ffff42dd88a <+2634>:	movapd 0x10(%rsp),%xmm0
   0x00007ffff42dd890 <+2640>:	movapd -0x30(%rsp),%xmm1
   0x00007ffff42dd896 <+2646>:	movapd %xmm0,0x3d0(%rsp)
   0x00007ffff42dd89f <+2655>:	movapd %xmm1,0x3c0(%rsp)
   0x00007ffff42dd8a8 <+2664>:	movapd 0x3d0(%rsp),%xmm0
   0x00007ffff42dd8b1 <+2673>:	mulpd  %xmm1,%xmm0
   0x00007ffff42dd8b5 <+2677>:	movapd %xmm0,-0x70(%rsp)
   0x00007ffff42dd8bb <+2683>:	movapd (%rsp),%xmm0
   0x00007ffff42dd8c0 <+2688>:	movapd -0x40(%rsp),%xmm1
   0x00007ffff42dd8c6 <+2694>:	movapd %xmm0,0x3b0(%rsp)
   0x00007ffff42dd8cf <+2703>:	movapd %xmm1,0x3a0(%rsp)
   0x00007ffff42dd8d8 <+2712>:	movapd 0x3b0(%rsp),%xmm0
   0x00007ffff42dd8e1 <+2721>:	mulpd  %xmm1,%xmm0
   0x00007ffff42dd8e5 <+2725>:	movapd %xmm0,-0x80(%rsp)
   0x00007ffff42dd8eb <+2731>:	movapd -0x50(%rsp),%xmm0
   0x00007ffff42dd8f1 <+2737>:	movapd -0x60(%rsp),%xmm1
   0x00007ffff42dd8f7 <+2743>:	movapd %xmm0,0x390(%rsp)
   0x00007ffff42dd900 <+2752>:	movapd %xmm1,0x380(%rsp)
   0x00007ffff42dd909 <+2761>:	movapd 0x390(%rsp),%xmm0
   0x00007ffff42dd912 <+2770>:	addpd  %xmm1,%xmm0
   0x00007ffff42dd916 <+2774>:	movapd %xmm0,-0x50(%rsp)
   0x00007ffff42dd91c <+2780>:	movapd -0x70(%rsp),%xmm1
   0x00007ffff42dd922 <+2786>:	movapd %xmm0,0x370(%rsp)
   0x00007ffff42dd92b <+2795>:	movapd %xmm1,0x360(%rsp)
   0x00007ffff42dd934 <+2804>:	movapd 0x370(%rsp),%xmm0
   0x00007ffff42dd93d <+2813>:	addpd  %xmm1,%xmm0
   0x00007ffff42dd941 <+2817>:	movapd %xmm0,-0x50(%rsp)
   0x00007ffff42dd947 <+2823>:	movapd -0x80(%rsp),%xmm1
   0x00007ffff42dd94d <+2829>:	movapd %xmm0,0x350(%rsp)
   0x00007ffff42dd956 <+2838>:	movapd %xmm1,0x340(%rsp)
   0x00007ffff42dd95f <+2847>:	movapd 0x350(%rsp),%xmm0
   0x00007ffff42dd968 <+2856>:	addpd  %xmm1,%xmm0
   0x00007ffff42dd96c <+2860>:	movapd %xmm0,-0x50(%rsp)
   0x00007ffff42dd972 <+2866>:	mov    %r9,0x338(%rsp)
   0x00007ffff42dd97a <+2874>:	movapd %xmm0,0x320(%rsp)
   0x00007ffff42dd983 <+2883>:	mov    0x338(%rsp),%rax
   0x00007ffff42dd98b <+2891>:	movapd %xmm0,(%rax)
   0x00007ffff42dd98f <+2895>:	mov    0x88(%rsp),%rax
   0x00007ffff42dd997 <+2903>:	movsd  0x60(%rax),%xmm0
   0x00007ffff42dd99c <+2908>:	movsd  %xmm0,0x318(%rsp)
   0x00007ffff42dd9a5 <+2917>:	shufpd $0x0,%xmm0,%xmm0
   0x00007ffff42dd9aa <+2922>:	movapd %xmm0,0x300(%rsp)
   0x00007ffff42dd9b3 <+2931>:	movapd %xmm0,-0x10(%rsp)
   0x00007ffff42dd9b9 <+2937>:	mov    0x88(%rsp),%rax
   0x00007ffff42dd9c1 <+2945>:	movsd  0x68(%rax),%xmm0
   0x00007ffff42dd9c6 <+2950>:	movsd  %xmm0,0x2f8(%rsp)
   0x00007ffff42dd9cf <+2959>:	shufpd $0x0,%xmm0,%xmm0
   0x00007ffff42dd9d4 <+2964>:	movapd %xmm0,0x2e0(%rsp)
   0x00007ffff42dd9dd <+2973>:	movapd %xmm0,-0x20(%rsp)
   0x00007ffff42dd9e3 <+2979>:	mov    0x88(%rsp),%rax
   0x00007ffff42dd9eb <+2987>:	movsd  0x70(%rax),%xmm0
   0x00007ffff42dd9f0 <+2992>:	movsd  %xmm0,0x2d8(%rsp)
   0x00007ffff42dd9f9 <+3001>:	shufpd $0x0,%xmm0,%xmm0
   0x00007ffff42dd9fe <+3006>:	movapd %xmm0,0x2c0(%rsp)
   0x00007ffff42dda07 <+3015>:	movapd %xmm0,-0x30(%rsp)
   0x00007ffff42dda0d <+3021>:	mov    0x88(%rsp),%rax
   0x00007ffff42dda15 <+3029>:	movsd  0x78(%rax),%xmm0
   0x00007ffff42dda1a <+3034>:	movsd  %xmm0,0x2b8(%rsp)
   0x00007ffff42dda23 <+3043>:	shufpd $0x0,%xmm0,%xmm0
   0x00007ffff42dda28 <+3048>:	movapd %xmm0,0x2a0(%rsp)
   0x00007ffff42dda31 <+3057>:	movapd %xmm0,-0x40(%rsp)
   0x00007ffff42dda37 <+3063>:	movapd 0x70(%rsp),%xmm0
   0x00007ffff42dda3d <+3069>:	movapd -0x10(%rsp),%xmm1
   0x00007ffff42dda43 <+3075>:	movapd %xmm0,0x290(%rsp)
   0x00007ffff42dda4c <+3084>:	movapd %xmm1,0x280(%rsp)
   0x00007ffff42dda55 <+3093>:	movapd 0x290(%rsp),%xmm0
   0x00007ffff42dda5e <+3102>:	mulpd  %xmm1,%xmm0
   0x00007ffff42dda62 <+3106>:	movapd %xmm0,-0x50(%rsp)
   0x00007ffff42dda68 <+3112>:	movapd 0x60(%rsp),%xmm0
   0x00007ffff42dda6e <+3118>:	movapd -0x20(%rsp),%xmm1
   0x00007ffff42dda74 <+3124>:	movapd %xmm0,0x270(%rsp)
   0x00007ffff42dda7d <+3133>:	movapd %xmm1,0x260(%rsp)
   0x00007ffff42dda86 <+3142>:	movapd 0x270(%rsp),%xmm0
   0x00007ffff42dda8f <+3151>:	mulpd  %xmm1,%xmm0
   0x00007ffff42dda93 <+3155>:	movapd %xmm0,-0x60(%rsp)
   0x00007ffff42dda99 <+3161>:	movapd 0x50(%rsp),%xmm0
   0x00007ffff42dda9f <+3167>:	movapd -0x30(%rsp),%xmm1
   0x00007ffff42ddaa5 <+3173>:	movapd %xmm0,0x250(%rsp)
   0x00007ffff42ddaae <+3182>:	movapd %xmm1,0x240(%rsp)
   0x00007ffff42ddab7 <+3191>:	movapd 0x250(%rsp),%xmm0
   0x00007ffff42ddac0 <+3200>:	mulpd  %xmm1,%xmm0
   0x00007ffff42ddac4 <+3204>:	movapd %xmm0,-0x70(%rsp)
   0x00007ffff42ddaca <+3210>:	movapd 0x40(%rsp),%xmm0
   0x00007ffff42ddad0 <+3216>:	movapd -0x40(%rsp),%xmm1
   0x00007ffff42ddad6 <+3222>:	movapd %xmm0,0x230(%rsp)
   0x00007ffff42ddadf <+3231>:	movapd %xmm1,0x220(%rsp)
   0x00007ffff42ddae8 <+3240>:	movapd 0x230(%rsp),%xmm0
   0x00007ffff42ddaf1 <+3249>:	mulpd  %xmm1,%xmm0
   0x00007ffff42ddaf5 <+3253>:	movapd %xmm0,-0x80(%rsp)
   0x00007ffff42ddafb <+3259>:	movapd -0x50(%rsp),%xmm0
   0x00007ffff42ddb01 <+3265>:	movapd -0x60(%rsp),%xmm1
   0x00007ffff42ddb07 <+3271>:	movapd %xmm0,0x210(%rsp)
   0x00007ffff42ddb10 <+3280>:	movapd %xmm1,0x200(%rsp)
   0x00007ffff42ddb19 <+3289>:	movapd 0x210(%rsp),%xmm0
   0x00007ffff42ddb22 <+3298>:	addpd  %xmm1,%xmm0
   0x00007ffff42ddb26 <+3302>:	movapd %xmm0,-0x50(%rsp)
   0x00007ffff42ddb2c <+3308>:	movapd -0x70(%rsp),%xmm1
   0x00007ffff42ddb32 <+3314>:	movapd %xmm0,0x1f0(%rsp)
   0x00007ffff42ddb3b <+3323>:	movapd %xmm1,0x1e0(%rsp)
   0x00007ffff42ddb44 <+3332>:	movapd 0x1f0(%rsp),%xmm0
   0x00007ffff42ddb4d <+3341>:	addpd  %xmm1,%xmm0
   0x00007ffff42ddb51 <+3345>:	movapd %xmm0,-0x50(%rsp)
   0x00007ffff42ddb57 <+3351>:	movapd -0x80(%rsp),%xmm1
   0x00007ffff42ddb5d <+3357>:	movapd %xmm0,0x1d0(%rsp)
   0x00007ffff42ddb66 <+3366>:	movapd %xmm1,0x1c0(%rsp)
   0x00007ffff42ddb6f <+3375>:	movapd 0x1d0(%rsp),%xmm0
   0x00007ffff42ddb78 <+3384>:	addpd  %xmm1,%xmm0
   0x00007ffff42ddb7c <+3388>:	movapd %xmm0,-0x50(%rsp)
   0x00007ffff42ddb82 <+3394>:	mov    %rcx,0x1b8(%rsp)
   0x00007ffff42ddb8a <+3402>:	movapd %xmm0,0x1a0(%rsp)
   0x00007ffff42ddb93 <+3411>:	mov    0x1b8(%rsp),%rax
   0x00007ffff42ddb9b <+3419>:	movapd %xmm0,(%rax)
   0x00007ffff42ddb9f <+3423>:	movapd 0x30(%rsp),%xmm0
   0x00007ffff42ddba5 <+3429>:	movapd -0x10(%rsp),%xmm1
   0x00007ffff42ddbab <+3435>:	movapd %xmm0,0x190(%rsp)
   0x00007ffff42ddbb4 <+3444>:	movapd %xmm1,0x180(%rsp)
   0x00007ffff42ddbbd <+3453>:	movapd 0x190(%rsp),%xmm0
   0x00007ffff42ddbc6 <+3462>:	mulpd  %xmm1,%xmm0
   0x00007ffff42ddbca <+3466>:	movapd %xmm0,-0x50(%rsp)
   0x00007ffff42ddbd0 <+3472>:	movapd 0x20(%rsp),%xmm0
   0x00007ffff42ddbd6 <+3478>:	movapd -0x20(%rsp),%xmm1
   0x00007ffff42ddbdc <+3484>:	movapd %xmm0,0x170(%rsp)
   0x00007ffff42ddbe5 <+3493>:	movapd %xmm1,0x160(%rsp)
   0x00007ffff42ddbee <+3502>:	movapd 0x170(%rsp),%xmm0
   0x00007ffff42ddbf7 <+3511>:	mulpd  %xmm1,%xmm0
   0x00007ffff42ddbfb <+3515>:	movapd %xmm0,-0x60(%rsp)
   0x00007ffff42ddc01 <+3521>:	movapd 0x10(%rsp),%xmm0
   0x00007ffff42ddc07 <+3527>:	movapd -0x30(%rsp),%xmm1
   0x00007ffff42ddc0d <+3533>:	movapd %xmm0,0x150(%rsp)
   0x00007ffff42ddc16 <+3542>:	movapd %xmm1,0x140(%rsp)
   0x00007ffff42ddc1f <+3551>:	movapd 0x150(%rsp),%xmm0
   0x00007ffff42ddc28 <+3560>:	mulpd  %xmm1,%xmm0
   0x00007ffff42ddc2c <+3564>:	movapd %xmm0,-0x70(%rsp)
   0x00007ffff42ddc32 <+3570>:	movapd (%rsp),%xmm0
   0x00007ffff42ddc37 <+3575>:	movapd -0x40(%rsp),%xmm1
   0x00007ffff42ddc3d <+3581>:	movapd %xmm0,0x130(%rsp)
   0x00007ffff42ddc46 <+3590>:	movapd %xmm1,0x120(%rsp)
   0x00007ffff42ddc4f <+3599>:	movapd 0x130(%rsp),%xmm0
   0x00007ffff42ddc58 <+3608>:	mulpd  %xmm1,%xmm0
   0x00007ffff42ddc5c <+3612>:	movapd %xmm0,-0x80(%rsp)
   0x00007ffff42ddc62 <+3618>:	movapd -0x50(%rsp),%xmm0
   0x00007ffff42ddc68 <+3624>:	movapd -0x60(%rsp),%xmm1
   0x00007ffff42ddc6e <+3630>:	movapd %xmm0,0x110(%rsp)
   0x00007ffff42ddc77 <+3639>:	movapd %xmm1,0x100(%rsp)
   0x00007ffff42ddc80 <+3648>:	movapd 0x110(%rsp),%xmm0
   0x00007ffff42ddc89 <+3657>:	addpd  %xmm1,%xmm0
   0x00007ffff42ddc8d <+3661>:	movapd %xmm0,-0x50(%rsp)
   0x00007ffff42ddc93 <+3667>:	movapd -0x70(%rsp),%xmm1
   0x00007ffff42ddc99 <+3673>:	movapd %xmm0,0xf0(%rsp)
   0x00007ffff42ddca2 <+3682>:	movapd %xmm1,0xe0(%rsp)
   0x00007ffff42ddcab <+3691>:	movapd 0xf0(%rsp),%xmm0
   0x00007ffff42ddcb4 <+3700>:	addpd  %xmm1,%xmm0
   0x00007ffff42ddcb8 <+3704>:	movapd %xmm0,-0x50(%rsp)
   0x00007ffff42ddcbe <+3710>:	movapd -0x80(%rsp),%xmm1
   0x00007ffff42ddcc4 <+3716>:	movapd %xmm0,0xd0(%rsp)
   0x00007ffff42ddccd <+3725>:	movapd %xmm1,0xc0(%rsp)
   0x00007ffff42ddcd6 <+3734>:	movapd 0xd0(%rsp),%xmm0
   0x00007ffff42ddcdf <+3743>:	addpd  %xmm1,%xmm0
   0x00007ffff42ddce3 <+3747>:	movapd %xmm0,-0x50(%rsp)
   0x00007ffff42ddce9 <+3753>:	mov    %r10,0xb8(%rsp)
   0x00007ffff42ddcf1 <+3761>:	movapd %xmm0,0xa0(%rsp)
   0x00007ffff42ddcfa <+3770>:	mov    0xb8(%rsp),%rax
   0x00007ffff42ddd02 <+3778>:	movapd %xmm0,(%rax)
   0x00007ffff42ddd06 <+3782>:	mov    %rsi,%rax
   0x00007ffff42ddd09 <+3785>:	add    $0xad8,%rsp
   0x00007ffff42ddd10 <+3792>:	retq
Comment 10 Benjamin Poulain 2013-01-03 10:02:36 PST
There you go, %rsi is unaligned.

It is the this pointer here.

If it was allocated on the stack, you may have either a bug in your compiler, or someone messed up the alignment of the stack (likely at library boundaries).

If it is allocated on the heap, what allocator are you using?
Comment 11 Dana Jansens 2013-01-03 10:07:09 PST
(In reply to comment #10)
> There you go, %rsi is unaligned.
> 
> It is the this pointer here.
> 
> If it was allocated on the stack, you may have either a bug in your compiler, or someone messed up the alignment of the stack (likely at library boundaries).

It's allocated on the stack, in WebTransformOperations.cpp, which is inside the WebKit library.

(gdb) frame 2
#2  0x00007ffff367dd8d in WebKit::WebTransformOperations::apply (this=0xb63778) at ../../third_party/WebKit/Source/WebCore/platform/chromium/support/WebTransformOperations.cpp:95

91	WebTransformationMatrix WebTransformOperations::apply() const
92	{
93	    WebTransformationMatrix toReturn;
94	    for (size_t i = 0; i < m_private->operations.size(); ++i)
95	        toReturn.multiply(m_private->operations[i].matrix);
96	    return toReturn;
97	}

(gdb) frame 1 
#1  0x00007ffff3680e2d in WebKit::WebTransformationMatrix::multiply (this=0x7fffffffd4c8, t=...) at ../../third_party/WebKit/Source/WebCore/platform/chromium/support/WebTransformationMatrix.cpp:97

95	void WebTransformationMatrix::multiply(const WebTransformationMatrix& t)
96	{
97	    m_private.multiply(t.m_private);
98	}
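
The frame 1 output above can be checked by hand: movapd requires a 16-byte aligned memory operand, and this=0x7fffffffd4c8 ends in 0x8. A minimal sketch of that check follows. The helper name isAligned is hypothetical, and the sketch assumes m_private sits at offset zero of WebTransformationMatrix, which the thread does not state.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns true when addr is a multiple of alignment.
// alignment must be a non-zero power of two.
inline bool isAligned(std::uintptr_t addr, std::uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr & (alignment - 1)) == 0;
}
```

With the address from frame 1, isAligned(0x7fffffffd4c8, 16) is false while isAligned(0x7fffffffd4c8, 8) is true, which is consistent with storage that is only 8-byte aligned.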
Comment 12 Dana Jansens 2013-01-03 10:09:15 PST
Would registers/asm at frames 1 and 2 help point it out?
Comment 13 Benjamin Poulain 2013-01-03 10:13:51 PST
(In reply to comment #11)
> (In reply to comment #10)
> > There you go, %rsi is unaligned.
> > 
> > It is the this pointer here.
> > 
> > If it was allocated on the stack, you may have either a bug in your compiler, or someone messed up the alignment of the stack (likely at library boundaries).
> 
> It's allocated on the stack, in WebTransformOperations.cpp, which is inside the WebKit library.

In the patch, the alignment is specified as 16 bytes, including on the stack:
    typedef double Matrix4[4][4] __attribute__((aligned (16)));

For some reason this is not honored in Chromium. Possible causes:
1) A compiler bug (the assembly you pasted looks like quite aggressive debug code; do you have the bug in release?).
2) One of your libraries specifies a different stack alignment?
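
To illustrate what the alignment declaration is expected to guarantee, here is a minimal sketch using the C++11 alignas spelling rather than the GCC attribute from the patch. The names AlignedMatrix4 and stackMatrixIsAligned are hypothetical and do not come from WebKit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the matrix storage. alignas(16) is used here
// as the standard C++11 counterpart of __attribute__((aligned (16))).
struct AlignedMatrix4 {
    alignas(16) double m[4][4];
};

// Compile-time check that the type carries the 16-byte requirement.
static_assert(alignof(AlignedMatrix4) == 16, "matrix must be 16-byte aligned");

// Places a matrix on the stack and reports whether its address is a
// multiple of 16, as aligned SSE2 loads and stores (movapd) require.
inline bool stackMatrixIsAligned()
{
    AlignedMatrix4 matrix = {};
    return reinterpret_cast<std::uintptr_t>(&matrix) % 16 == 0;
}
```

When the type itself carries the requirement, the compiler aligns every object it creates. The guarantee is lost if the object is constructed in storage that some other code reserved with a weaker alignment.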
Comment 14 James Robinson 2013-01-03 11:05:51 PST
(In reply to comment #13)
> 2) One of your libraries specify a different stack alignment?

I think this is the issue.  Will let you know when I verify.

In the future, would you prefer this sort of thing be restricted at compile-time to not run on chromium instead of reverted?  We can't really leave a crash in.
Comment 15 James Robinson 2013-01-03 11:36:33 PST
The issue is that WebTransformationMatrix aliases storage for WebCore::TransformationMatrix across different libraries without enforcing the same alignment requirements.  This is an ugly hack that isn't needed any more, so I'll just make it not alias at all.  This will take a little bit of time (probably not much more than an hour).

Benjamin - feel free to reland this patch with the ASM version guarded behind !PLATFORM(CHROMIUM) if you want to land before I get around to this.
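
The kind of mismatch described above can be sketched as follows. This is only an illustration under assumptions: the thread does not show how WebTransformationMatrix reserved its storage, and the names Inner, LooseHolder, StrictHolder and storageFits are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a class that needs 16-byte alignment.
struct Inner {
    alignas(16) double m[4][4];
};

// Hypothetical wrapper that only reserves enough bytes. An array of double
// is 8-byte aligned on x86_64, so an Inner aliased onto it may start at an
// address ending in 0x8.
struct LooseHolder {
    double storage[sizeof(Inner) / sizeof(double)];
};

// Same size, but the storage also carries Inner's alignment requirement.
struct StrictHolder {
    alignas(Inner) unsigned char storage[sizeof(Inner)];
};

// True when Holder is both large enough and aligned strictly enough for T.
template <typename Holder, typename T>
constexpr bool storageFits()
{
    return sizeof(Holder) >= sizeof(T) && alignof(Holder) >= alignof(T);
}

static_assert(!storageFits<LooseHolder, Inner>(), "size alone is not enough");
static_assert(storageFits<StrictHolder, Inner>(), "size and alignment match");
```

A compile-time check of this kind at the library boundary would catch the mismatch at build time instead of at the first aligned load.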
Comment 16 Benjamin Poulain 2013-01-03 11:58:36 PST
> In the future, would you prefer this sort of thing be restricted at compile-time to not run on chromium instead of reverted?  We can't really leave a crash in.

I think #ifdefing Chromium would have been reasonable given this works everywhere except on that test.

I was annoyed because the patch was reverted without any information. Dana promptly provided the missing pieces so I guess it's ok.

> Benjamin - feel free to reland this patch with the ASM version guarded behind !PLATFORM(CHROMIUM) if you want to land before I get around to this.

I'll wait a bit. It would be nice if Chromium could have the optimization too. I will land tomorrow if I don't hear back from you. Ping me if I can help.

I'll also add an #ifdef for FastMalloc. Strictly speaking, we cannot assume natural alignment from malloc with other allocators.
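
For heap allocations where the allocator's default alignment is unknown, one option is to request the alignment explicitly. The sketch below uses POSIX posix_memalign. It is an illustration only, not what the patch does, and the helper names allocateAligned16 and isAligned16 are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>

// Allocates size bytes on a 16-byte boundary using POSIX posix_memalign.
// Returns nullptr on failure. Release the result with free().
inline void* allocateAligned16(std::size_t size)
{
    void* memory = nullptr;
    if (posix_memalign(&memory, 16, size) != 0)
        return nullptr;
    return memory;
}

// Reports whether p is suitable for aligned SSE2 loads and stores.
inline bool isAligned16(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}
```

A caller would allocate room for sixteen doubles with allocateAligned16(16 * sizeof(double)) and could check the result with isAligned16 before handing the pointer to code that uses aligned instructions.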
Comment 17 Benjamin Poulain 2013-01-04 14:40:19 PST
Created attachment 181377 [details]
Patch
Comment 18 Benjamin Poulain 2013-01-04 16:35:12 PST
Comment on attachment 181377 [details]
Patch

Clearing flags on attachment: 181377

Committed r138866: <http://trac.webkit.org/changeset/138866>
Comment 19 Benjamin Poulain 2013-01-04 16:35:16 PST
All reviewed patches have been landed.  Closing bug.