Bug 112239

Summary: LLint should be able to use x87 instead of SSE for floating point
Product: WebKit
Reporter: Jan <medhefgo>
Component: JavaScriptCore
Assignee: Allan Sandfeld Jensen <allan.jensen>
Status: RESOLVED FIXED
Severity: Critical
CC: allan.jensen, benjamin, cgarcia, cmarcelo, commit-queue, fpizlo, gauvain, hausmann, kadam, mrobinson, ojan.autocc, ossy, webkit.review.bot, wingo, zan, zarvai
Priority: P2
Version: 528+ (Nightly build)
Hardware: PC
OS: Linux
Bug Depends on: 114913
Bug Blocks: 88186, 96286, 103747
Attachments:
Description          Flags
Trace from rekonq    none
Trace from arora     none
/proc/cpuinfo        none
WIP                  none
Patch                none
LLIntAssembly.h      none
Patch                none
Patch                none
Patch                none
my version           none
Patch                none
Patch                none

Description Jan 2013-03-13 02:40:29 PDT
QtWebKit crashes with illegal instruction (I tested rekonq and arora). I'm using Arch Linux on a quite old system. Traces and /proc/cpuinfo attached.
Comment 1 Jan 2013-03-13 02:41:46 PDT
Created attachment 192888 [details]
Trace from rekonq
Comment 2 Jan 2013-03-13 02:42:19 PDT
Created attachment 192889 [details]
Trace from arora
Comment 3 Jan 2013-03-13 02:42:54 PDT
Created attachment 192890 [details]
/proc/cpuinfo
Comment 4 Allan Sandfeld Jensen 2013-03-30 13:52:19 PDT
QtWebKit 2.3 or from Qt 5?

It looks like you have an x86 CPU without SSE2, which QtWebKit uses by default for math on x86. In QtWebKit this can be disabled by using --no-force-sse2.
Comment 5 Jan 2013-04-01 11:19:27 PDT
This is QtWebKit 2.3 using Qt 4. But it also crashes with the one shipped with Qt 5. I (and the Arch package too) used --no-force-sse2 to create the debug build.
Comment 6 Allan Sandfeld Jensen 2013-04-01 13:35:16 PDT
Confirmed. x86.rb, which picks the x86 instructions, uses SSE2 instructions in various places. You appear to hit mulsd in this case, but there are a few other examples of SSE2 instructions.

I will see if this can be fixed, but I am afraid there will be resistance to implementing it the right way since there is a fallback called cloop. Unfortunately cloop is not that fast and also has a lot of bugs since it is only used for otherwise unsupported architectures.
Comment 7 Filip Pizlo 2013-04-01 13:40:18 PDT
(In reply to comment #6)
> Confirmed. X86.rb that picks X86 instructions uses SSE2 instructions in various places. You appear to hit mulsd in this case, but there are a few other examples of SSE2 instructions.
> 
> I will see if this can be fixed, but I am afraid there will be resistance to implementing it the right way since there is a fallback called cloop. Unfortunately cloop is not that fast and also has a lot of bugs since it is only used for otherwise unsupported architectures.

It would be great to fix cloop.

For us, cloop appears to be pretty quick, in the tests we've done - so if it isn't on your platform, it might be easier to fix that problem than modifying x86.rb.

On the other hand, I don't want to get in the way of getting things working for you guys: if you'd rather add x87 support to x86.rb then I'd be happy to help.  I believe it should be straight-forward if you follow Intel's guidance on how to make x87 "look" like a flat register file.
Comment 8 Allan Sandfeld Jensen 2013-04-01 14:01:48 PDT
(In reply to comment #7)
> (In reply to comment #6)
> > Confirmed. X86.rb that picks X86 instructions uses SSE2 instructions in various places. You appear to hit mulsd in this case, but there are a few other examples of SSE2 instructions.
> > 
> > I will see if this can be fixed, but I am afraid there will be resistance to implementing it the right way since there is a fallback called cloop. Unfortunately cloop is not that fast and also has a lot of bugs since it is only used for otherwise unsupported architectures.
> 
> It would be great to fix cloop.
> 
> For us, cloop appears to be pretty quick, in the tests we've done - so if it isn't on your platform, it might be easier to fix that problem than modifying x86.rb.
> 
> On the other hand, I don't want to get in the way of getting things working for you guys: if you'd rather add x87 support to x86.rb then I'd be happy to help.  I believe it should be straight-forward if you follow Intel's guidance on how to make x87 "look" like a flat register file.

I looked further, and it seems the JIT detects that it needs SSE2 for floating-point operations, bails on the blocks where it needs it, and leaves the job to LLInt.

Would it be possible for LLInt to detect the situation, similar to the JIT, and fall back to CLoop for blocks that cannot be interpreted with the native assembler?
Comment 9 Filip Pizlo 2013-04-01 14:05:34 PDT
(In reply to comment #8)
> (In reply to comment #7)
> > (In reply to comment #6)
> > > Confirmed. X86.rb that picks X86 instructions uses SSE2 instructions in various places. You appear to hit mulsd in this case, but there are a few other examples of SSE2 instructions.
> > > 
> > > I will see if this can be fixed, but I am afraid there will be resistance to implementing it the right way since there is a fallback called cloop. Unfortunately cloop is not that fast and also has a lot of bugs since it is only used for otherwise unsupported architectures.
> > 
> > It would be great to fix cloop.
> > 
> > For us, cloop appears to be pretty quick, in the tests we've done - so if it isn't on your platform, it might be easier to fix that problem than modifying x86.rb.
> > 
> > On the other hand, I don't want to get in the way of getting things working for you guys: if you'd rather add x87 support to x86.rb then I'd be happy to help.  I believe it should be straight-forward if you follow Intel's guidance on how to make x87 "look" like a flat register file.
> 
> I looked further and it seems JIT detects it needs SSE2 for floating point operations and bails for block where it needs it and leaves the job to LLint.

I don't think that's true.  The JIT will leave the job to its C code slow paths if SSE2 is not around.  That's actually quite nasty - calling to C code for every double arithmetic op is really bad.

> 
> Would it be possible for LLint to detect the situation similar to JIT and fallback to CLoop for blocks that can not be interpreted with native assembler?

It would be hard to detect it at run-time, and would require a lot of changes to have _both_ the LLInt asm code and the LLInt cloop code in the same executable.  I'm not sure any of us want to deal with the maintenance hassles of that approach! ;-)

So here are our options:

- Have an ability to build LLInt to use x87.  I like this approach the best.

- Have the LLInt do a run-time check on each arithmetic op, and bail to its C slow path, like the JIT does.  This is probably less elegant, and slower, than the previous option.  But it could work.

Anyways, if you're looking for a quick fix I highly recommend you get the cloop working.  The whole point of the cloop is to be portable; if it isn't then we should fix it.

If you want performance, then let's do it right.  There's nothing fundamentally blocking x87 support in the LLInt.
Comment 10 Allan Sandfeld Jensen 2013-04-01 14:11:50 PDT
(In reply to comment #9)
> If you want performance, then let's do it right.  There's nothing fundamentally blocking x87 support in the LLInt.

I don't really care about performance on these old machines. The problem is that on Linux many distributions have a policy of supporting architectures as far back as i686 (or even i486 in extreme cases). So if we use a solution that forces the switch at compile time (like cloop), it would force the distributions to compile all x86 builds with this switch and also slow down more modern x86 processors. Anything at run time would be fine as long as it only hurts the slow machines. Though it would be good if it is at least as fast as the old interpreter, since a regression compared to that is what we should avoid.
Comment 11 Filip Pizlo 2013-04-01 14:13:46 PDT
(In reply to comment #10)
> (In reply to comment #9)
> > If you want performance, then let's do it right.  There's nothing fundamentally blocking x87 support in the LLInt.
> 
> I don't really care about performance on these old machines. The problem is that on Linux many distributions have a policy of supporting architectures as far back as i686 (or even i486 in extreme cases). So if we use a solution that forces the switch on compile time (like cloop) it would force the distributions to compile all x86 with this switch and also slow down more modern x86 processors. Anything on runtime would be fine as long as it only hurts the slow machines. Though it would be good if it is at least as fast as the old interpreter which is what we should avoid regression compared to.

Aha!  Got it.

I would recommend seeing if you can cache the hasSSE2() (or whatever it's called) result in JSGlobalData, and then, only when compiling in this configuration, have LLInt check that flag prior to doing SSE stuff and bail out if it's not available.

It'll be one more check on the double paths of the interpreter.  You should benchmark how this affects performance but I'm guessing it won't be much.

Or just implement x87.
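
The cached-flag idea from this comment could look roughly like the sketch below. It is only an illustration under stated assumptions: `detectSSE2`, `VMFlags`, `vmFlags`, `usesSSE2FastPath` and `addDoubles` are hypothetical names, not JavaScriptCore APIs, and the "fast path" here is ordinary C++ rather than LLInt assembly. The only real interface used is `__get_cpuid` from GCC/Clang's `<cpuid.h>`; CPUID leaf 1 reports SSE2 in EDX bit 26.

```cpp
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

// Hypothetical stand-in for the hasSSE2()-style query mentioned in the comment.
static bool detectSSE2()
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    return (edx >> 26) & 1; // CPUID.01H:EDX bit 26 is the SSE2 feature flag.
#else
    return false; // Not an x86 target: treat SSE2 as unavailable.
#endif
}

// Hypothetical stand-in for a field cached in a per-VM structure.
struct VMFlags {
    bool hasSSE2;
};

// The CPU is queried once; later calls only read the cached flag.
static const VMFlags& vmFlags()
{
    static const VMFlags flags = { detectSSE2() };
    return flags;
}

static bool usesSSE2FastPath()
{
    return vmFlags().hasSSE2;
}

// Placeholder for a C slow path that does not depend on SSE2 in the interpreter.
static double slowPathAdd(double a, double b)
{
    return a + b;
}

// One extra branch on the double path: bail to the slow path without SSE2.
static double addDoubles(double a, double b)
{
    if (!usesSSE2FastPath())
        return slowPathAdd(a, b);
    return a + b; // In the real interpreter this would be the SSE2 (e.g. addsd) path.
}
```

Both paths return the same value; the point is only that the feature test is paid once and each double operation then costs a single predictable branch.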
Comment 12 Allan Sandfeld Jensen 2013-04-04 01:51:42 PDT
Created attachment 196454 [details]
WIP

First patch. Works if enabled on x64, but still has a few failing tests on x86 32bit
Comment 13 Allan Sandfeld Jensen 2013-04-04 02:24:04 PDT
Created attachment 196457 [details]
Patch

Now also working on x86-32
Comment 14 Filip Pizlo 2013-04-04 13:50:43 PDT
Comment on attachment 196457 [details]
Patch

I like this!  Based on looking at the code, R=me.  Could you do me a favor though: could you (a) upload your LLIntAssembly.h file so I could sanity check it, mostly for my own curiosity; and (b) make sure that you run full LayoutTests with JIT runtime disabled (put useJIT() = false into Options.cpp's Options::initialize()) and make sure all is cool?
Comment 15 Filip Pizlo 2013-04-04 13:53:18 PDT
Comment on attachment 196457 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=196457&action=review

> Source/JavaScriptCore/offlineasm/x86.rb:40
> +    when "X86"
> +        true

Interesting choice.  It's what I would have done, also - but it may be surprising to some that x86-32 loses SSE2 in LLInt.  Can you make sure you make it clear in the ChangeLog that this is one of the effects of this change?

(An alternative would have been to have a Platform.h macro that selects whether to use x87, and then wire it into here somehow.  I don't like that, since it's a lot of complexity for probably no measurable gain - I mean, SSE is faster than x87 but not by enough to have a noticeable effect on the interpreter.)
Comment 16 Allan Sandfeld Jensen 2013-04-05 01:34:18 PDT
Created attachment 196596 [details]
LLIntAssembly.h
Comment 17 Allan Sandfeld Jensen 2013-04-05 02:03:33 PDT
Committed r147729: <http://trac.webkit.org/changeset/147729>
Comment 18 Allan Sandfeld Jensen 2013-04-05 04:09:35 PDT
It seems this change caused 52 canvas tests to change subtly, but only on 32-bit (I have tested them with x87 enabled on x64). So I am reopening until I find out what happened.

http://build.webkit.sed.hu/builders/x86-32 Linux Qt Release NRWT/builds/31428
Comment 19 Csaba Osztrogonác 2013-04-05 06:16:56 PDT
I think GTK is interested in this fix too, because they previously disabled LLINT because of this bug - http://trac.webkit.org/changeset/130076
Comment 20 Allan Sandfeld Jensen 2013-04-05 09:04:20 PDT
Created attachment 196640 [details]
Patch
Comment 21 Allan Sandfeld Jensen 2013-04-05 09:06:22 PDT
(In reply to comment #20)
> Created an attachment (id=196640) [details]
> Patch

These are very minor fixups for the patch. They don't fix the failing canvas tests, though. I have also tried forcing rounding to 64-bit precision after every FP operation, and that doesn't help either. So I am currently out of ideas as to how this affects the canvas tests in 32-bit mode, but not in 64-bit mode (with x87 enabled).
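
For readers unfamiliar with what "rounding to 64-bit precision" means here: x87 registers hold extended-precision values, so an intermediate result can carry bits that a 64-bit double cannot. The sketch below is an illustration only, not code from the patch. The names `hasExtendedPrecision`, `roundToDouble` and `extendedResultDiffers` are made up for this example, and it uses `long double` as a stand-in for an x87 register, which is an assumption that holds on typical x86 Linux toolchains but not everywhere.

```cpp
#include <cfloat>
#include <cmath>

// True when long double carries more mantissa bits than double
// (for example the 64-bit mantissa of the x87 extended format).
constexpr bool hasExtendedPrecision()
{
    return LDBL_MANT_DIG > DBL_MANT_DIG;
}

// Storing to a double forces the value to be rounded to a 53-bit mantissa.
double roundToDouble(long double value)
{
    volatile double rounded = static_cast<double>(value);
    return rounded;
}

// 1 + 2^-60 fits in an extended-precision mantissa but not in a double,
// so rounding changes the value exactly when extra precision is available.
bool extendedResultDiffers()
{
    long double extended = 1.0L + std::ldexp(1.0L, -60);
    return extended != static_cast<long double>(roundToDouble(extended));
}
```

As the comment notes, forcing this kind of rounding did not fix the canvas tests, so extra precision was not the cause of the differences.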
Comment 22 Filip Pizlo 2013-04-05 13:40:44 PDT
Rolled out in http://trac.webkit.org/changeset/147794

Sorry about this - it's breaking some internal builds. :-(  The bug should be easy to fix though:

<inline asm>:1267:2: error: ambiguous instructions require an explicit suffix (could be 'ficomps', or 'ficompl')

Do you know how to fix it?

(If you have a speculative fix, feel free to reland your patch with it and I will watch our bots.)
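
For context on the error above: in AT&T syntax, an x87 integer instruction with a memory operand gives the assembler no way to infer the operand size, so the mnemonic needs a suffix (`s` for a 16-bit integer, `l` for a 32-bit integer). The sketch below shows a generator choosing the suffix from the operand size. It is a hypothetical illustration in C++, not the actual fix (the real generator is x86.rb), and both function names are invented for this example.

```cpp
#include <stdexcept>
#include <string>

// Pick an unambiguous AT&T mnemonic for "compare with integer and pop".
std::string x87IntegerCompareMnemonic(int operandBytes)
{
    switch (operandBytes) {
    case 2:
        return "ficomps"; // 16-bit integer memory operand
    case 4:
        return "ficompl"; // 32-bit integer memory operand
    default:
        throw std::invalid_argument("ficomp takes a 16-bit or 32-bit integer operand");
    }
}

// Emit the full instruction text for a given memory operand.
std::string emitIntegerCompare(int operandBytes, const std::string& memoryOperand)
{
    return x87IntegerCompareMnemonic(operandBytes) + " " + memoryOperand;
}
```

For example, `emitIntegerCompare(4, "(%esp)")` yields `ficompl (%esp)`, which an assembler can accept without guessing the size.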
Comment 23 Allan Sandfeld Jensen 2013-04-05 14:03:36 PDT
(In reply to comment #22)
> Rolled out in http://trac.webkit.org/changeset/147794
> 
> Sorry about this - it's breaking some internal builds. :-(  The bug should be easy to fix though:
> 
> <inline asm>:1267:2: error: ambiguous instructions require an explicit suffix (could be 'ficomps', or 'ficompl')
> 
> Do you know how to fix it?
> 
> (If you have a speculative fix, feel free to reland your patch with it and I will watch our bots.)

That is already fixed in the fixup patch I attached above.
Comment 24 Filip Pizlo 2013-04-05 14:04:59 PDT
Comment on attachment 196640 [details]
Patch

r=me

Feel free to reland your previous patch along with this one (preferably land them together, or in quick succession - your call).
Comment 25 Filip Pizlo 2013-04-05 14:05:42 PDT
(In reply to comment #23)
> (In reply to comment #22)
> > Rolled out in http://trac.webkit.org/changeset/147794
> > 
> > Sorry about this - it's breaking some internal builds. :-(  The bug should be easy to fix though:
> > 
> > <inline asm>:1267:2: error: ambiguous instructions require an explicit suffix (could be 'ficomps', or 'ficompl')
> > 
> > Do you know how to fix it?
> > 
> > (If you have a speculative fix, feel free to reland your patch with it and I will watch our bots.)
> 
> That is already fixed in the fixup patch I attached above.

I'm sorry!

I should have looked at that patch before rolling out.
Comment 26 Allan Sandfeld Jensen 2013-04-05 14:07:53 PDT
(In reply to comment #25)
> (In reply to comment #23)
> > (In reply to comment #22)
> > > Rolled out in http://trac.webkit.org/changeset/147794
> > > 
> > > Sorry about this - it's breaking some internal builds. :-(  The bug should be easy to fix though:
> > > 
> > > <inline asm>:1267:2: error: ambiguous instructions require an explicit suffix (could be 'ficomps', or 'ficompl')
> > > 
> > > Do you know how to fix it?
> > > 
> > > (If you have a speculative fix, feel free to reland your patch with it and I will watch our bots.)
> > 
> > That is already fixed in the fixup patch I attached above.
> 
> I'm sorry!
> 
> I should have looked at that patch before rolling out.

No problem. I will wait with landing again until I know what is going on with the last canvas tests in 32bit, and land it all together.
Comment 27 Allan Sandfeld Jensen 2013-04-08 09:17:48 PDT
Created attachment 196862 [details]
Patch
Comment 28 Allan Sandfeld Jensen 2013-04-09 02:23:41 PDT
(In reply to comment #27)
> Created an attachment (id=196862) [details]
> Patch

This patch now works with no test regressions, but calling finit before every C call may cause performance regressions. I will try moving finit to just before the FP instructions are used, though that will require modifying more of the LLInt assembler.
Comment 29 Filip Pizlo 2013-04-09 07:16:36 PDT
(In reply to comment #28)
> (In reply to comment #27)
> > Created an attachment (id=196862) [details]
> > Patch
> 
> While this patch now works with no test regressions. Calling finit before every c-call may cause performance regressions. I will try moving finit to before using fp-instructions, though it will require modifying more llint assembler.

Why do you need to do a full finit?  Would be good to explain what problem this solves. :-)
Comment 30 Allan Sandfeld Jensen 2013-04-09 07:47:42 PDT
(In reply to comment #29)
> (In reply to comment #28)
> > (In reply to comment #27)
> > > Created an attachment (id=196862) [details]
> > > Patch
> > 
> > While this patch now works with no test regressions. Calling finit before every c-call may cause performance regressions. I will try moving finit to before using fp-instructions, though it will require modifying more llint assembler.
> 
> Why do you need to do a full finit?  Would be good to explain what problem this solves. :-)

I traced the problem with the canvas tests to wrong values calculated in C++ functions that were called from LLInt (the input values provided by LLInt were correct, but the output from C++ was wrong). According to the calling convention details I could find, no assumptions should be made about FP registers, but it seems GCC somehow still expects the FPU to be clear when FP-using functions are called. So the finit is simply put there to ensure any mess we made of the FPU state is undone.

If it is possible, perhaps only calling ffree on the used registers would be enough.
Comment 31 Allan Sandfeld Jensen 2013-04-09 08:30:39 PDT
(In reply to comment #30)
> (In reply to comment #29)
> > (In reply to comment #28)
> > > (In reply to comment #27)
> > > > Created an attachment (id=196862) [details]
> > > > Patch
> > > 
> > > While this patch now works with no test regressions. Calling finit before every c-call may cause performance regressions. I will try moving finit to before using fp-instructions, though it will require modifying more llint assembler.
> > 
> > Why do you need to do a full finit?  Would be good to explain what problem this solves. :-)
> 
> I traced the problem with the canvas tests to wrong values calculated in C++ functions that was called from llint (the input values provided by llint was correct, but the output from C++ was wrong). According to the calling convensions details I could find, there should be made no assumptions about FP registers, but it seems like GCC somehow still expect the FPU to be clear when FP-using functions are called. So the finit is simply put there to ensure any mess we made of the FPU state is undone.
> 
> If it was possible to do, perhaps only calling ffree on the used registers would be enough.

Yeah, found it. It wasn't mentioned in any of the common descriptions of the calling convention, but if you go to the source, the System V Intel386 ABI, you find this:

%st(0)
Floating-point return values appear on the top of the floating-point
register stack; there is no difference in the representation
of single- or double-precision values in floating-point registers.
If the function does not return a floating-point value, then this
register must be empty. This register must be empty before
entry to a function.

%st(1) through %st(7)
Floating-point scratch registers have no specified role in the standard
calling sequence. These registers must be empty before
entry and upon exit from a function.

The stuff about %st(0) being used to return values doesn't apply to Linux I think, but it does state the registers must be empty, which is what finit does (or ffree on registers not popped).
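
To see why a register left occupied can corrupt a callee's result, here is a deliberately simplified software model of the x87 register stack. It is a sketch, not a description of the patch: `X87StackModel`, `calleeSumOfEight` and `callWithStaleRegister` are hypothetical names, and the model assumes the masked-exception behaviour in which a push onto a non-empty register stores a NaN instead of the value. Real callees rarely use all eight registers, but the ABI text quoted above entitles the compiler to assume they are all free.

```cpp
#include <array>
#include <cmath>
#include <limits>

// Simplified model: eight registers, a "top" index, and an in-use tag per register.
class X87StackModel {
public:
    // Like fld: move top down by one (modulo 8) and store the value there.
    // Pushing onto a register that is still in use is a stack overflow,
    // which this model represents by storing NaN.
    void push(double value)
    {
        m_top = (m_top + 7) % 8;
        if (m_inUse[m_top])
            value = std::numeric_limits<double>::quiet_NaN();
        m_values[m_top] = value;
        m_inUse[m_top] = true;
    }

    // Like fstp: read the top register, mark it empty, move top up by one.
    double pop()
    {
        double value = m_inUse[m_top] ? m_values[m_top]
                                      : std::numeric_limits<double>::quiet_NaN();
        m_inUse[m_top] = false;
        m_top = (m_top + 1) % 8;
        return value;
    }

    // Like ffree %st(i): mark one register empty without moving top.
    void release(int index) { m_inUse[(m_top + index) % 8] = false; }

    // Like finit, reduced to the part that matters here: every register empty.
    void reset()
    {
        m_inUse.fill(false);
        m_top = 0;
    }

private:
    std::array<double, 8> m_values {};
    std::array<bool, 8> m_inUse {};
    int m_top = 0;
};

// A callee that, as the ABI allows, assumes all eight registers are free.
double calleeSumOfEight(X87StackModel& fpu)
{
    for (int i = 1; i <= 8; ++i)
        fpu.push(i);
    double sum = 0;
    for (int i = 0; i < 8; ++i)
        sum += fpu.pop();
    return sum;
}

// The caller leaves one value behind, optionally releasing it before the call.
double callWithStaleRegister(bool releaseBeforeCall)
{
    X87StackModel fpu;
    fpu.push(42.0);
    if (releaseBeforeCall)
        fpu.release(0);
    return calleeSumOfEight(fpu);
}
```

In this model `callWithStaleRegister(true)` returns 36, while `callWithStaleRegister(false)` returns NaN because the eighth push lands on the stale register. That matches the shape of the problem described in the thread: correct inputs, wrong outputs from the called C++ function.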
Comment 32 Allan Sandfeld Jensen 2013-04-10 03:25:59 PDT
Created attachment 197236 [details]
Patch

Replaced finit with releasing used registers
Comment 33 Filip Pizlo 2013-04-11 12:05:15 PDT
Comment on attachment 197236 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=197236&action=review

> Source/JavaScriptCore/llint/LowLevelInterpreter32_64.asm:708
> +    frelease ft0, ft1

I'm not sure how much I like this new op.  This feels like it could get quite fragile - we probably will be adding more stuff to LLint that uses doubles, and it would be weird to have to remember to call this.

Are you sure it's a speedup over the finit approach?
Comment 34 Filip Pizlo 2013-04-11 12:07:04 PDT
I'm going to run some of our perf tests on the finit approach to see what the impact is.
Comment 35 Filip Pizlo 2013-04-11 12:29:54 PDT
Performance impact in jsc commandline versus ToT in 32-bit mode with and without the X87 patch that does finit, with all JITs enabled:


Benchmark report for SunSpider, V8Spider, Octane, Kraken, and JSRegress on bigmac (MacPro5,1).

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/quartary/OpenSource/WebKitBuild/Release/jsc (r148221)
"X87" at /Volumes/Data/pizlo/secondary/OpenSource/WebKitBuild/Release/jsc (r148221)

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Emitted a call to gc() between sample
measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get
microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds.
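
A note on reading the tables that follow. Judging only from the rows shown, "definitely" appears where the two 95% confidence intervals do not overlap, and "might be" where they do. That rule is an inference from this output, not a documented description of the benchmark tool, and the names `Sample`, `intervalsOverlap`, `compare` and `speedRatio` below are hypothetical. Times are in milliseconds, so lower is better.

```cpp
#include <string>

// A measurement reported as mean +- halfWidth (95% confidence interval).
struct Sample {
    double mean;
    double halfWidth;
};

// Two intervals overlap when neither lies entirely below the other.
bool intervalsOverlap(const Sample& a, const Sample& b)
{
    return a.mean - a.halfWidth <= b.mean + b.halfWidth
        && b.mean - b.halfWidth <= a.mean + a.halfWidth;
}

// Ratio of the larger mean to the smaller one, as printed in the report.
double speedRatio(const Sample& baseline, const Sample& candidate)
{
    return baseline.mean > candidate.mean ? baseline.mean / candidate.mean
                                          : candidate.mean / baseline.mean;
}

// Assumed labelling rule: disjoint intervals are "definitely", overlapping "might be".
std::string compare(const Sample& baseline, const Sample& candidate)
{
    if (baseline.mean == candidate.mean)
        return "same";
    std::string confidence = intervalsOverlap(baseline, candidate) ? "might be" : "definitely";
    std::string direction = candidate.mean < baseline.mean ? " faster" : " slower";
    return confidence + direction;
}
```

Applied to the Octane deltablue row (0.56748+-0.00494 versus 0.55906+-0.00186), the intervals are disjoint and the ratio is about 1.0151, which agrees with the "definitely 1.0151x faster" label in the report.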

                                                     TipOfTree                    X87                                        
SunSpider:
   3d-cube                                         8.4942+-0.1890     ?      8.4965+-0.1056        ?
   3d-morph                                       11.5936+-0.1159           11.4602+-0.1025          might be 1.0116x faster
   3d-raytrace                                    11.0021+-0.2164     ?     11.2055+-0.2798        ? might be 1.0185x slower
   access-binary-trees                             1.8959+-0.0312     ?      1.9282+-0.0331        ? might be 1.0171x slower
   access-fannkuch                                 9.1795+-0.1041     ?      9.2876+-0.1071        ? might be 1.0118x slower
   access-nbody                                    6.3186+-0.0641     ?      6.3783+-0.1399        ?
   access-nsieve                                   4.4097+-0.0756     ?      4.4678+-0.0572        ? might be 1.0132x slower
   bitops-3bit-bits-in-byte                        1.5997+-0.0144     ?      1.6063+-0.0171        ?
   bitops-bits-in-byte                             5.6814+-0.0627            5.6249+-0.0858          might be 1.0100x faster
   bitops-bitwise-and                              1.8749+-0.0576            1.8669+-0.0424        
   bitops-nsieve-bits                              4.9566+-0.0523            4.9548+-0.0510        
   controlflow-recursive                           2.9820+-0.0057     ?      2.9860+-0.0078        ?
   crypto-aes                                      7.9668+-0.0931     ?      8.0028+-0.1023        ?
   crypto-md5                                      4.1935+-0.0919     ?      4.3964+-0.1250        ? might be 1.0484x slower
   crypto-sha1                                     3.2397+-0.0588            3.2360+-0.0570        
   date-format-tofte                              14.1429+-0.3104     ?     14.2059+-0.2927        ?
   date-format-xparb                               9.2639+-0.2593     ?      9.3237+-0.2280        ?
   math-cordic                                     3.6682+-0.0479     ?      3.6949+-0.0203        ?
   math-partial-sums                              11.7925+-0.1135     ?     11.8693+-0.0935        ?
   math-spectral-norm                              2.8165+-0.0448     ?      2.8226+-0.0479        ?
   regexp-dna                                      8.8477+-0.1811     ?      8.9139+-0.1787        ?
   string-base64                                   4.4073+-0.0523            4.4069+-0.0552        
   string-fasta                                   10.8259+-0.1055     ?     11.0169+-0.1527        ? might be 1.0176x slower
   string-tagcloud                                13.4961+-0.1733     ?     13.8031+-0.2359        ? might be 1.0227x slower
   string-unpack-code                             25.3899+-0.3293           25.2025+-0.2995        
   string-validate-input                           7.6714+-0.3432     ?      7.7163+-0.3442        ?

   <arithmetic> *                                  7.6042+-0.0696     ?      7.6490+-0.0755        ? might be 1.0059x slower
   <geometric>                                     6.0906+-0.0565     ?      6.1311+-0.0583        ? might be 1.0066x slower
   <harmonic>                                      4.7875+-0.0430     ?      4.8181+-0.0438        ? might be 1.0064x slower

                                                     TipOfTree                    X87                                        
V8Spider:
   crypto                                         91.6146+-0.5027           91.4137+-0.4699        
   deltablue                                     120.6087+-1.7202          118.8778+-0.7180          might be 1.0146x faster
   earley-boyer                                   76.1154+-0.8634           75.8586+-0.5411        
   raytrace                                       56.1912+-0.1729     ?     56.5184+-0.4265        ?
   regexp                                         89.7104+-0.5339     ?     90.5512+-0.3719        ?
   richards                                      121.3825+-0.7467     ?    122.2395+-1.4907        ?
   splay                                          53.0984+-0.5773     ?     53.3219+-0.3370        ?

   <arithmetic>                                   86.9602+-0.3707     ?     86.9687+-0.4621        ? might be 1.0001x slower
   <geometric> *                                  83.0763+-0.3135     ?     83.1551+-0.4236        ? might be 1.0009x slower
   <harmonic>                                     79.2106+-0.2817     ?     79.3559+-0.3977        ? might be 1.0018x slower

                                                     TipOfTree                    X87                                        
Octane and V8v7:
   encrypt                                        0.48348+-0.00065    ?     0.48353+-0.00053       ?
   decrypt                                        8.90966+-0.00596          8.89872+-0.00920       
   deltablue                             x2       0.56748+-0.00494    ^     0.55906+-0.00186       ^ definitely 1.0151x faster
   earley                                         0.93068+-0.00425          0.92654+-0.00374       
   boyer                                         13.20650+-0.04582         13.19327+-0.04078       
   raytrace                              x2       4.38906+-0.01493          4.37664+-0.01765       
   regexp                                x2      27.44398+-0.08363    ^    27.20890+-0.07996       ^ definitely 1.0086x faster
   richards                              x2       0.31996+-0.00264          0.31936+-0.00241       
   splay                                 x2       0.75715+-0.00909          0.74053+-0.01132         might be 1.0224x faster
   navier-stokes                         x2       9.38862+-0.00818    ?     9.45802+-0.07138       ?
   closure                                        0.31999+-0.00895    ?     0.32463+-0.00928       ? might be 1.0145x slower
   jquery                                         3.88385+-0.49710    ?     3.92851+-0.50244       ? might be 1.0115x slower
   gbemu                                 x2     135.47291+-1.18772        135.38785+-0.68321       
   box2d                                 x2      32.42179+-0.11889         32.42034+-0.10271       

V8v7:
   <arithmetic>                                   6.82893+-0.01033    ^     6.80169+-0.01670       ^ definitely 1.0040x faster
   <geometric> *                                  2.40812+-0.00569    ^     2.39418+-0.00655       ^ definitely 1.0058x faster
   <harmonic>                                     0.97024+-0.00397          0.96270+-0.00396         might be 1.0078x faster

Octane including V8v7:
   <arithmetic>                                  20.42073+-0.10691         20.39530+-0.05372         might be 1.0012x faster
   <geometric> *                                  4.09899+-0.03111          4.08625+-0.02954         might be 1.0031x faster
   <harmonic>                                     1.10206+-0.00837          1.09754+-0.00765         might be 1.0041x faster

                                                     TipOfTree                    X87                                        
Kraken:
   ai-astar                                       472.628+-0.471      ^     465.689+-4.594         ^ definitely 1.0149x faster
   audio-beat-detection                           274.489+-1.197      ?     276.595+-1.695         ?
   audio-dft                                      381.043+-9.137            373.690+-1.184           might be 1.0197x faster
   audio-fft                                      137.559+-0.181      ?     137.815+-0.216         ?
   audio-oscillator                               298.985+-0.725      ?     299.378+-0.733         ?
   imaging-darkroom                               339.991+-0.727      ?     365.113+-30.737        ? might be 1.0739x slower
   imaging-desaturate                             137.357+-0.622            136.682+-0.484         
   imaging-gaussian-blur                          418.365+-1.395            417.264+-0.265         
   json-parse-financial                            78.265+-0.249      ^      75.689+-0.343         ^ definitely 1.0340x faster
   json-stringify-tinderbox                       106.516+-0.297      !     107.743+-0.322         ! definitely 1.0115x slower
   stanford-crypto-aes                            101.634+-0.531            101.034+-0.338         
   stanford-crypto-ccm                            103.308+-1.699            102.352+-1.850         
   stanford-crypto-pbkdf2                         261.911+-1.525            261.581+-2.100         
   stanford-crypto-sha256-iterative               110.083+-0.357      ?     110.181+-0.432         ?

   <arithmetic> *                                 230.152+-0.785      ?     230.772+-2.217         ? might be 1.0027x slower
   <geometric>                                    193.090+-0.575            193.036+-1.085           might be 1.0003x faster
   <harmonic>                                     162.403+-0.501            161.676+-0.605           might be 1.0045x faster

                                                     TipOfTree                    X87                                        
JSRegress:
   adapt-to-double-divide                         19.0074+-0.1913     ^     18.6296+-0.1514        ^ definitely 1.0203x faster
   aliased-arguments-getbyval                      0.8775+-0.0087     ?      0.9004+-0.0162        ? might be 1.0261x slower
   allocate-big-object                             2.0838+-0.0328     ?      2.1084+-0.0401        ? might be 1.0118x slower
   arity-mismatch-inlining                         0.6851+-0.0080     ?      0.7001+-0.0079        ? might be 1.0219x slower
   array-access-polymorphic-structure              6.7216+-0.1357     ?      6.7285+-0.1152        ?
   array-with-double-add                           5.4884+-0.0229            5.4359+-0.0683        
   array-with-double-increment                     4.1975+-0.0137            4.1942+-0.0146        
   array-with-double-mul-add                       6.7421+-0.0724     ?      6.7514+-0.0711        ?
   array-with-double-sum                           7.1602+-0.1074     ?      7.2963+-0.1622        ? might be 1.0190x slower
   array-with-int32-add-sub                       11.7535+-0.0958     ?     11.8917+-0.0825        ? might be 1.0118x slower
   array-with-int32-or-double-sum                  7.2823+-0.1007            7.2463+-0.0909        
   big-int-mul                                     4.8611+-0.0549     ?      4.9735+-0.1023        ? might be 1.0231x slower
   boolean-test                                    3.9271+-0.0224     ?      3.9501+-0.0067        ?
   cast-int-to-double                             20.1325+-0.1357     ?     20.1586+-0.1269        ?
   cell-argument                                  12.1223+-0.1192     ?     12.2541+-0.1462        ? might be 1.0109x slower
   cfg-simplify                                    2.8959+-0.0180     ?      2.8970+-0.0098        ?
   cmpeq-obj-to-obj-other                         11.3564+-0.1334     ?     11.6300+-0.1946        ? might be 1.0241x slower
   constant-test                                   7.5822+-0.0770     ?      7.6811+-0.0850        ? might be 1.0131x slower
   direct-arguments-getbyval                       0.8090+-0.0075     !      0.8241+-0.0066        ! definitely 1.0187x slower
   double-pollution-getbyval                       8.9846+-0.1005     ?      8.9944+-0.1016        ?
   double-pollution-putbyoffset                    6.3751+-0.1142            6.3006+-0.0794          might be 1.0118x faster
   empty-string-plus-int                          10.1471+-0.1503     ?     10.2435+-0.2278        ?
   external-arguments-getbyval                     2.1419+-0.0103     !      2.1730+-0.0146        ! definitely 1.0145x slower
   external-arguments-putbyval                     5.0277+-0.1003     ?      5.0886+-0.0610        ? might be 1.0121x slower
   Float32Array-matrix-mult                       12.3379+-0.1668     ?     12.5577+-0.1459        ? might be 1.0178x slower
   fold-double-to-int                             23.0576+-0.3205           22.7423+-0.1152          might be 1.0139x faster
   function-dot-apply                              2.8390+-0.0041     !      2.8574+-0.0037        ! definitely 1.0065x slower
   function-test                                   5.5198+-0.0918     ?      5.5437+-0.0469        ?
   get-by-id-chain-from-try-block                  6.0694+-0.0337            6.0636+-0.0811        
   HashMap-put-get-iterate-keys                   96.0344+-0.8850     ?     96.0750+-1.1000        ?
   HashMap-put-get-iterate                        97.4174+-0.7436     ?     97.5351+-0.6526        ?
   HashMap-string-put-get-iterate                 71.9545+-0.8233           71.3582+-1.3494        
   indexed-properties-in-objects                   4.9761+-0.0455     ?      4.9909+-0.0647        ?
   inline-arguments-access                         1.0701+-0.0150     ?      1.0779+-0.0079        ?
   inline-arguments-local-escape                  23.9192+-0.3102     ?     24.1633+-0.3372        ? might be 1.0102x slower
   inline-get-scoped-var                           7.8889+-0.1194            7.8187+-0.0998        
   inlined-put-by-id-transition                   13.3434+-0.1641     ?     13.3613+-0.1483        ?
   int-or-other-abs-then-get-by-val                7.6326+-0.1583            7.5191+-0.1008          might be 1.0151x faster
   int-or-other-abs-zero-then-get-by-val          40.3623+-1.0207     ?     40.4948+-0.5783        ?
   int-or-other-add-then-get-by-val                9.5777+-0.1319            9.4847+-0.0905        
   int-or-other-add                                9.6544+-0.1102     ?      9.7780+-0.0944        ? might be 1.0128x slower
   int-or-other-div-then-get-by-val               13.6582+-0.3412     ?     13.8374+-0.2928        ? might be 1.0131x slower
   int-or-other-max-then-get-by-val                8.6050+-0.3335     ?      8.8420+-0.2348        ? might be 1.0275x slower
   int-or-other-min-then-get-by-val                7.0259+-0.0817     ?      7.0536+-0.0841        ?
   int-or-other-mod-then-get-by-val                6.5811+-0.0837     ?      6.6192+-0.0770        ?
   int-or-other-mul-then-get-by-val                6.2142+-0.0797            6.1776+-0.0673        
   int-or-other-neg-then-get-by-val                7.0530+-0.0928     ?      7.0781+-0.1463        ?
   int-or-other-neg-zero-then-get-by-val          40.0904+-0.9424           39.7395+-1.3277        
   int-or-other-sub-then-get-by-val                9.7512+-0.1150     ?      9.8447+-0.0999        ?
   int-or-other-sub                                7.5048+-0.1044            7.4675+-0.0922        
   int-overflow-local                             12.2717+-0.0939     ?     12.3912+-0.1467        ?
   Int16Array-bubble-sort                         47.9568+-0.2409           47.8767+-0.2574        
   Int16Array-load-int-mul                         1.6440+-0.0055     !      1.6625+-0.0054        ! definitely 1.0112x slower
   Int8Array-load                                  4.1016+-0.0089     !      4.1242+-0.0093        ! definitely 1.0055x slower
   integer-divide                                 14.1260+-0.1495           13.9724+-0.1302          might be 1.0110x faster
   integer-modulo                                  2.2314+-0.0247     !      2.5744+-0.0699        ! definitely 1.1537x slower
   make-indexed-storage                            3.8672+-0.0235     ?      3.8971+-0.0257        ?
   method-on-number                               25.8172+-0.4572           25.5702+-0.2097        
   nested-function-parsing-random                362.4207+-8.6741     ?    362.5110+-7.8273        ?
   nested-function-parsing                        44.2154+-1.3448           43.8929+-1.3587        
   new-array-buffer-dead                           3.1131+-0.0344            3.1070+-0.0208        
   new-array-buffer-push                           8.9108+-0.1986            8.8308+-0.1789        
   new-array-dead                                 23.6157+-0.0904     ?     23.7266+-0.1060        ?
   new-array-push                                  7.7680+-0.8281            7.3580+-0.8312          might be 1.0557x faster
   number-test                                     3.9562+-0.0067     !      3.9843+-0.0114        ! definitely 1.0071x slower
   object-closure-call                             7.4360+-0.0873     ?      7.4516+-0.0926        ?
   object-test                                     5.3319+-0.0526     ?      5.3892+-0.0736        ? might be 1.0107x slower
   poly-stricteq                                 125.2238+-0.2312     ?    125.4714+-0.6003        ?
   polymorphic-structure                          20.9966+-0.1812           20.9728+-0.0953        
   polyvariant-monomorphic-get-by-id              10.6855+-0.1418     ?     11.8826+-2.2215        ? might be 1.1120x slower
   rare-osr-exit-on-local                         17.4106+-0.1685     ?     17.4527+-0.1974        ?
   register-pressure-from-osr                     39.8603+-0.1835     ?     39.8854+-0.1673        ?
   simple-activation-demo                         32.8159+-0.1963           32.7628+-0.2312        
   slow-array-profile-convergence                  4.9639+-0.0184            4.9515+-0.0582        
   slow-convergence                                3.4908+-0.0091     !      3.5181+-0.0125        ! definitely 1.0078x slower
   sparse-conditional                              1.0529+-0.0081     ?      1.0669+-0.0082        ? might be 1.0134x slower
   splice-to-remove                               82.7474+-0.5393           82.2995+-0.4044        
   string-concat-object                            2.5940+-0.0951     ?      2.6412+-0.0814        ? might be 1.0182x slower
   string-concat-pair-object                       1.7753+-0.0363     ?      1.8018+-0.0199        ? might be 1.0149x slower
   string-concat-pair-simple                       9.8503+-0.1304     ?     10.0220+-0.1592        ? might be 1.0174x slower
   string-concat-simple                           23.9797+-0.3282     ?     23.9944+-0.4071        ?
   string-cons-repeat                              7.9639+-0.0929     ?      8.0451+-0.1028        ? might be 1.0102x slower
   string-cons-tower                               7.8234+-0.1375     ?      7.8898+-0.1135        ?
   string-equality                               115.8730+-2.2722          114.4837+-1.1632          might be 1.0121x faster
   string-hash                                     2.6324+-0.0090     !      2.6520+-0.0074        ! definitely 1.0074x slower
   string-repeat-arith                            95.8650+-0.4895           95.4210+-0.2657        
   string-sub                                    170.1931+-0.8318     ?    171.0446+-1.0772        ?
   string-test                                     4.0589+-0.0068     !      4.0987+-0.0053        ! definitely 1.0098x slower
   structure-hoist-over-transitions                2.6808+-0.0232     ?      2.7037+-0.0153        ?
   tear-off-arguments-simple                       1.7446+-0.0084     !      1.7646+-0.0075        ! definitely 1.0115x slower
   tear-off-arguments                              3.2545+-0.0078     !      3.2722+-0.0085        ! definitely 1.0054x slower
   temporal-structure                             20.9941+-0.1439           20.9355+-0.1897        
   to-int32-boolean                               27.9474+-0.1637           27.9473+-0.1661        
   undefined-test                                  4.1003+-0.0313     !      4.1557+-0.0195        ! definitely 1.0135x slower

   <arithmetic>                                   22.6581+-0.0684     ?     22.6658+-0.0613        ? might be 1.0003x slower
   <geometric> *                                   9.1816+-0.0388     ?      9.2349+-0.0446        ? might be 1.0058x slower
   <harmonic>                                      4.8062+-0.0292     !      4.8670+-0.0278        ! definitely 1.0127x slower

                                                     TipOfTree                    X87                                        
All benchmarks:
   <arithmetic>                                   40.5379+-0.1082     ?     40.5997+-0.2094        ? might be 1.0015x slower
   <geometric>                                    11.0130+-0.0549     ?     11.0570+-0.0580        ? might be 1.0040x slower
   <harmonic>                                      3.6120+-0.0217     ?      3.6278+-0.0201        ? might be 1.0044x slower

                                                     TipOfTree                    X87                                        
Geomean of preferred means:
   <scaled-result>                                22.2650+-0.1099     ?     22.3189+-0.1192        ? might be 1.0024x slower
Comment 36 Allan Sandfeld Jensen 2013-04-11 12:42:24 PDT
(In reply to comment #33)
> (From update of attachment 197236 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=197236&action=review
> 
> > Source/JavaScriptCore/llint/LowLevelInterpreter32_64.asm:708
> > +    frelease ft0, ft1
> 
> I'm not sure how much I like this new op.  This feels like it could get quite fragile - we probably will be adding more stuff to LLint that uses doubles, and it would be weird to have to remember to call this.
> 
> Are you sure it's a speedup over the finit approach?

From some of the timing info I got for finit, it seemed very slow, though I can't imagine why if the FPU is unused and finit has the effect of a nop.

One thing missing from the finit patch is that it only cleans up when calling deeper functions, not when returning from a function. So if an FP-using function called into llint, the call would return with an unclean FPU state that could lead to wrong behavior in the llint caller. Could we add a finit instruction to the exit thunks of llint?
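
To make the concern concrete, here is a toy Python model of the eight-slot x87 register stack. It is only a sketch: the class and function names are made up for illustration, it ignores the control word and status flags, and it shows the extreme case where a callee leaves the stack completely full. It assumes the invalid-operation exception is masked, so a push onto a full stack delivers a NaN instead of the value.

```python
import math


class X87Stack:
    """Toy model of the x87 register stack: eight slots, push/pop only."""

    SLOTS = 8

    def __init__(self):
        self.regs = []  # regs[-1] plays the role of st(0)

    def fld(self, value):
        # Pushing onto a full stack is a stack overflow. With the exception
        # masked, the deepest slot is lost and a NaN lands in st(0).
        if len(self.regs) == self.SLOTS:
            self.regs.pop(0)
            value = math.nan
        self.regs.append(value)

    def fstp(self):
        # Popping an empty stack is an underflow; model it as a NaN result.
        return self.regs.pop() if self.regs else math.nan

    def finit(self):
        # finit marks every slot empty again.
        self.regs.clear()


def leaky_callee(fpu):
    """Hypothetical callee that returns without popping its temporaries."""
    for i in range(X87Stack.SLOTS):
        fpu.fld(float(i))


def caller_loads(fpu, value):
    """Caller-side code that loads a double and immediately stores it back."""
    fpu.fld(value)
    return fpu.fstp()


# Without cleanup on return, the caller's load overflows the stack.
dirty = X87Stack()
leaky_callee(dirty)
result_without_cleanup = caller_loads(dirty, 1.5)

# With a finit on the exit path, the caller sees an empty stack.
clean = X87Stack()
leaky_callee(clean)
clean.finit()
result_with_cleanup = caller_loads(clean, 1.5)
```

In this model the first result is a NaN and the second is the 1.5 that was loaded, which is the kind of wrong behavior in the caller that a finit on the exit path would avoid.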
Comment 37 Filip Pizlo 2013-04-11 12:46:34 PDT
(In reply to comment #36)
> (In reply to comment #33)
> > (From update of attachment 197236 [details])
> > View in context: https://bugs.webkit.org/attachment.cgi?id=197236&action=review
> > 
> > > Source/JavaScriptCore/llint/LowLevelInterpreter32_64.asm:708
> > > +    frelease ft0, ft1
> > 
> > I'm not sure how much I like this new op.  This feels like it could get quite fragile - we probably will be adding more stuff to LLint that uses doubles, and it would be weird to have to remember to call this.
> > 
> > Are you sure it's a speedup over the finit approach?
> 
> From some of the timing info I got for finit, it seemed very slow, though I can't imagine why if the FPU is unused and finit has the effect of a nop.

Interesting.  I'm doing full LLInt-only perf tests right now, and you're right, finit appears sooooper slow.  It's quite shocking actually!

My tests are still running, I'll post results here shortly!

> 
> One thing missing from the finit patch is that it only cleans up when calling deeper functions, not when returning from a function. So if an FP-using function called into llint, the call would return with an unclean FPU state that could lead to wrong behavior in the llint caller. Could we add a finit instruction to the exit thunks of llint?

I would add it to the ctiTrampoline, which the LLint uses.
Comment 38 Filip Pizlo 2013-04-11 13:32:05 PDT
Wow, that was pretty bad.  This is the finit patch with JITs disabled.


Benchmark report for SunSpider, V8Spider, Octane, Kraken, and JSRegress on bigmac (MacPro5,1).

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/quartary/OpenSource/WebKitBuild/Release/jsc (r148221)
"X87" at /Volumes/Data/pizlo/secondary/OpenSource/WebKitBuild/Release/jsc (r148221)

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Emitted a call to gc() between sample
measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get
microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds.
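
For anyone reading the verdict column, the labels on the per-benchmark rows appear to follow a simple rule. This is inferred from the numbers in the tables, not taken from the benchmark script itself: "definitely" when the two 95% confidence intervals do not overlap, "might be" when they overlap but the ratio of the means exceeds roughly 1.01, and no text otherwise. The summary rows seem to print a ratio regardless. A minimal Python sketch of that inferred rule (the function name and the 1.01 threshold are assumptions):

```python
def verdict(base, base_ci, new, new_ci, threshold=1.01):
    """Label a lower-is-better timing pair using the rule inferred above.

    base/new are mean times in ms; base_ci/new_ci are the +- half-widths.
    """
    slower = new > base
    ratio = new / base if slower else base / new
    word = "slower" if slower else "faster"

    # Two intervals overlap when each one starts before the other one ends.
    overlap = (base - base_ci) <= (new + new_ci) and (new - new_ci) <= (base + base_ci)

    if not overlap:
        return f"definitely {ratio:.4f}x {word}"
    if ratio > threshold:
        return f"might be {ratio:.4f}x {word}"
    return ""


# Rows taken from the tables in this comment.
print(verdict(17.9707, 0.1456, 27.6701, 0.2054))      # 3d-cube
print(verdict(9.4611, 0.1465, 9.6155, 0.0866))        # access-nsieve
print(verdict(78.417, 0.332, 75.490, 0.458))          # json-parse-financial
print(verdict(10705.198, 128.187, 10691.383, 77.066)) # imaging-gaussian-blur
```

Under that rule the four sample rows come out as "definitely 1.5397x slower", "might be 1.0163x slower", "definitely 1.0388x faster", and an empty label, matching what the report prints for them.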

                                                     TipOfTree                    X87                                        
SunSpider:
   3d-cube                                        17.9707+-0.1456     !     27.6701+-0.2054        ! definitely 1.5397x slower
   3d-morph                                       24.8084+-0.6925     !     28.5495+-0.3506        ! definitely 1.1508x slower
   3d-raytrace                                    29.7102+-0.2377     !     41.7799+-0.2842        ! definitely 1.4062x slower
   access-binary-trees                            13.0314+-0.1003     !     16.5678+-0.1654        ! definitely 1.2714x slower
   access-fannkuch                                41.8859+-0.6660     !     44.6337+-0.2818        ! definitely 1.0656x slower
   access-nbody                                   18.7420+-0.2359     !     31.4470+-0.2213        ! definitely 1.6779x slower
   access-nsieve                                   9.4611+-0.1465     ?      9.6155+-0.0866        ? might be 1.0163x slower
   bitops-3bit-bits-in-byte                       16.5080+-0.1824     !     17.8070+-0.1768        ! definitely 1.0787x slower
   bitops-bits-in-byte                            27.3810+-0.2920     !     48.1828+-0.3189        ! definitely 1.7597x slower
   bitops-bitwise-and                             52.0208+-0.2235     !    103.8744+-0.6643        ! definitely 1.9968x slower
   bitops-nsieve-bits                             36.8362+-1.6001     !     58.9099+-0.3699        ! definitely 1.5992x slower
   controlflow-recursive                          18.7358+-0.1621     !     25.4887+-0.1959        ! definitely 1.3604x slower
   crypto-aes                                     18.7230+-0.1685     !     21.4457+-0.1522        ! definitely 1.1454x slower
   crypto-md5                                     18.4726+-0.1677     !     21.1467+-0.1948        ! definitely 1.1448x slower
   crypto-sha1                                    17.2736+-0.2514     !     19.9465+-0.2236        ! definitely 1.1547x slower
   date-format-tofte                              24.4813+-0.1295     !     34.6394+-0.1085        ! definitely 1.4149x slower
   date-format-xparb                              25.3811+-0.2380     !     31.5430+-0.2089        ! definitely 1.2428x slower
   math-cordic                                    37.6304+-0.4022     !     58.6506+-0.1875        ! definitely 1.5586x slower
   math-partial-sums                              34.7646+-0.2600     !     62.4406+-0.2836        ! definitely 1.7961x slower
   math-spectral-norm                             17.7924+-0.2257     !     22.8651+-0.2856        ! definitely 1.2851x slower
   regexp-dna                                      8.7932+-0.1061     ?      9.0205+-0.1783        ? might be 1.0259x slower
   string-base64                                  26.7820+-0.2573     !     36.0657+-0.2347        ! definitely 1.3466x slower
   string-fasta                                   22.7089+-0.2262     !     34.8408+-0.1085        ! definitely 1.5342x slower
   string-tagcloud                                21.8114+-0.2932     !     27.2479+-0.1792        ! definitely 1.2493x slower
   string-unpack-code                             27.8768+-0.1707     !     34.1804+-0.2007        ! definitely 1.2261x slower
   string-validate-input                          19.5878+-0.2980     !     26.9678+-0.3583        ! definitely 1.3768x slower

   <arithmetic> *                                 24.1989+-0.1311     !     34.4434+-0.0844        ! definitely 1.4233x slower
   <geometric>                                    22.3012+-0.1200     !     29.8960+-0.0934        ! definitely 1.3406x slower
   <harmonic>                                     20.4460+-0.1176     !     25.7842+-0.1119        ! definitely 1.2611x slower

                                                     TipOfTree                    X87                                        
V8Spider:
   crypto                                        897.4223+-9.6310     !    955.2835+-6.6001        ! definitely 1.0645x slower
   deltablue                                    2434.8941+-23.6702    !   2915.4419+-17.4317       ! definitely 1.1974x slower
   earley-boyer                                  538.8258+-3.4744     !    778.5211+-5.3388        ! definitely 1.4448x slower
   raytrace                                      284.8305+-1.8339     !    366.3276+-2.0507        ! definitely 1.2861x slower
   regexp                                        121.5716+-0.3513     !    133.2938+-0.4799        ! definitely 1.0964x slower
   richards                                     2478.7269+-24.6796    !   2872.7077+-19.3525       ! definitely 1.1589x slower
   splay                                         205.5477+-0.6517     !    274.4447+-1.2608        ! definitely 1.3352x slower

   <arithmetic>                                  994.5456+-4.1183     !   1185.1458+-3.3382        ! definitely 1.1916x slower
   <geometric> *                                 574.9296+-1.3303     !    701.3212+-0.8083        ! definitely 1.2198x slower
   <harmonic>                                    343.3571+-0.5671     !    414.2277+-0.7583        ! definitely 1.2064x slower
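
As a sanity check on the three summary rows, the means can be recomputed from the per-benchmark TipOfTree times above. This is only a sketch using Python's standard `statistics` module. The report presumably aggregates per sample rather than over the rounded per-benchmark means, so the geometric and harmonic values are expected to agree only approximately.

```python
from statistics import fmean, geometric_mean, harmonic_mean

# TipOfTree mean times (ms) for the seven V8Spider benchmarks listed above.
tip_of_tree = {
    "crypto": 897.4223,
    "deltablue": 2434.8941,
    "earley-boyer": 538.8258,
    "raytrace": 284.8305,
    "regexp": 121.5716,
    "richards": 2478.7269,
    "splay": 205.5477,
}

times = list(tip_of_tree.values())

arithmetic = fmean(times)          # dominated by the slowest benchmarks
geometric = geometric_mean(times)  # weights relative changes equally
harmonic = harmonic_mean(times)    # dominated by the fastest benchmarks

print(f"<arithmetic> {arithmetic:.4f}")
print(f"<geometric>  {geometric:.4f}")
print(f"<harmonic>   {harmonic:.4f}")
```

The arithmetic mean reproduces the reported 994.5456, and the other two land close to the reported 574.9296 and 343.3571. The spread between the three also explains why the same slowdown shows up with different ratios in each summary row.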

                                                     TipOfTree                    X87                                        
Octane and V8v7:
   encrypt                                        5.61052+-0.00662    !     6.06751+-0.00675       ! definitely 1.0815x slower
   decrypt                                      105.82162+-0.13131    !   112.92338+-0.10450       ! definitely 1.0671x slower
   deltablue                             x2      16.04073+-0.12198    !    19.20199+-0.12105       ! definitely 1.1971x slower
   earley                                         6.77426+-0.03344    !     9.01893+-0.02794       ! definitely 1.3314x slower
   boyer                                        130.22991+-0.38527    !   186.68642+-1.14655       ! definitely 1.4335x slower
   raytrace                              x2      47.32154+-0.08259    !    61.00536+-0.10240       ! definitely 1.2892x slower
   regexp                                x2      42.93033+-0.19771    !    46.60571+-0.15538       ! definitely 1.0856x slower
   richards                              x2       7.28965+-0.05698    !     8.28408+-0.03999       ! definitely 1.1364x slower
   splay                                 x2       2.80364+-0.01558    !     3.69599+-0.02228       ! definitely 1.3183x slower
   navier-stokes                         x2      93.41132+-0.18337    !   137.42017+-0.13622       ! definitely 1.4711x slower
   closure                                        0.32450+-0.00902    ?     0.32824+-0.01026       ? might be 1.0115x slower
   jquery                                         3.55439+-0.50068    ?     3.63746+-0.50302       ? might be 1.0234x slower
   gbemu                                 x2     576.01998+-2.52232    !   811.78436+-0.90053       ! definitely 1.4093x slower
   box2d                                 x2     198.39429+-0.41165    !   246.71447+-0.70488       ! definitely 1.2436x slower

V8v7:
   <arithmetic>                                  41.75192+-0.04401    !    54.19518+-0.04514       ! definitely 1.2980x slower
   <geometric> *                                 21.46209+-0.05070    !    26.54716+-0.02672       ! definitely 1.2369x slower
   <harmonic>                                    10.21905+-0.04141    !    12.62915+-0.03699       ! definitely 1.2358x slower

Octane including V8v7:
   <arithmetic>                                 100.94264+-0.22823    !   135.82210+-0.11418       ! definitely 1.3455x slower
   <geometric> *                                 26.95841+-0.19774    !    33.16708+-0.25886       ! definitely 1.2303x slower
   <harmonic>                                     4.44173+-0.11147    !     4.77373+-0.13823       ! definitely 1.0747x slower

                                                     TipOfTree                    X87                                        
Kraken:
   ai-astar                                      2915.956+-11.411     !    4438.416+-11.309        ! definitely 1.5221x slower
   audio-beat-detection                          1355.453+-0.817      !    1827.941+-11.880        ! definitely 1.3486x slower
   audio-dft                                     1080.029+-1.664      !    1398.931+-5.902         ! definitely 1.2953x slower
   audio-fft                                     1254.061+-0.409      !    1692.906+-1.763         ! definitely 1.3499x slower
   audio-oscillator                              1152.206+-16.154     !    1604.303+-2.480         ! definitely 1.3924x slower
   imaging-darkroom                              2009.602+-9.660      !    2894.022+-6.894         ! definitely 1.4401x slower
   imaging-desaturate                            3230.711+-21.289     !    4436.863+-9.205         ! definitely 1.3733x slower
   imaging-gaussian-blur                        10705.198+-128.187        10691.383+-77.066        
   json-parse-financial                            78.417+-0.332      ^      75.490+-0.458         ^ definitely 1.0388x faster
   json-stringify-tinderbox                       106.198+-0.202      !     107.703+-0.363         ! definitely 1.0142x slower
   stanford-crypto-aes                            684.877+-1.218      !     901.412+-3.234         ! definitely 1.3162x slower
   stanford-crypto-ccm                            449.184+-0.355      !     466.431+-1.235         ! definitely 1.0384x slower
   stanford-crypto-pbkdf2                        1855.865+-3.148      !    2550.453+-1.299         ! definitely 1.3743x slower
   stanford-crypto-sha256-iterative               647.911+-1.594      !     868.874+-1.338         ! definitely 1.3410x slower

   <arithmetic> *                                1966.119+-8.148      !    2425.366+-6.062         ! definitely 1.2336x slower
   <geometric>                                   1020.027+-1.289      !    1281.029+-1.316         ! definitely 1.2559x slower
   <harmonic>                                     430.629+-0.870      !     456.109+-1.530         ! definitely 1.0592x slower

                                                     TipOfTree                    X87                                        
JSRegress:
   adapt-to-double-divide                         47.9787+-0.2660     !     94.7094+-0.2438        ! definitely 1.9740x slower
   aliased-arguments-getbyval                      8.4379+-0.1323     !     12.6069+-0.0897        ! definitely 1.4941x slower
   allocate-big-object                            27.6678+-0.3890     !     41.2623+-0.3427        ! definitely 1.4913x slower
   arity-mismatch-inlining                        16.3782+-0.2144     !     28.0538+-0.1212        ! definitely 1.7129x slower
   array-access-polymorphic-structure             54.0342+-0.6186     !     88.3635+-0.5378        ! definitely 1.6353x slower
   array-with-double-add                          33.7265+-0.2040     !     46.6482+-0.2422        ! definitely 1.3831x slower
   array-with-double-increment                    35.6488+-0.1635     !     69.5987+-0.2775        ! definitely 1.9523x slower
   array-with-double-mul-add                      61.1028+-0.3000     !     83.7804+-0.1966        ! definitely 1.3711x slower
   array-with-double-sum                          22.5687+-0.0991     !     31.2035+-0.1861        ! definitely 1.3826x slower
   array-with-int32-add-sub                       63.5340+-1.0317           63.3974+-0.8988        
   array-with-int32-or-double-sum                 22.7125+-0.1767     !     31.3637+-0.2571        ! definitely 1.3809x slower
   big-int-mul                                   132.1791+-1.6020     !    210.6236+-0.6975        ! definitely 1.5935x slower
   boolean-test                                   44.8232+-0.5983     !     80.7421+-1.0616        ! definitely 1.8013x slower
   cast-int-to-double                            268.5540+-1.4758     !    526.1327+-2.1717        ! definitely 1.9591x slower
   cell-argument                                  77.7083+-0.2848     ?     77.9413+-0.2473        ?
   cfg-simplify                                  132.1752+-0.8643     !    225.0296+-1.2993        ! definitely 1.7025x slower
   cmpeq-obj-to-obj-other                        119.7733+-1.0378     !    193.0783+-0.8836        ! definitely 1.6120x slower
   constant-test                                 283.3301+-0.6698     !    545.8044+-2.4496        ! definitely 1.9264x slower
   direct-arguments-getbyval                       3.6786+-0.0274     !      4.8977+-0.0603        ! definitely 1.3314x slower
   double-pollution-getbyval                      35.7608+-0.2526     !     60.3137+-0.2196        ! definitely 1.6866x slower
   double-pollution-putbyoffset                   33.8859+-0.6200     !     56.7275+-0.2107        ! definitely 1.6741x slower
   empty-string-plus-int                          25.9678+-0.3229     !     40.2512+-0.2047        ! definitely 1.5500x slower
   external-arguments-getbyval                     9.2215+-0.0939     !     13.5751+-0.1036        ! definitely 1.4721x slower
   external-arguments-putbyval                    15.4295+-0.1779     !     22.3645+-0.2252        ! definitely 1.4495x slower
   Float32Array-matrix-mult                       67.8031+-0.6321     !    102.6558+-0.7438        ! definitely 1.5140x slower
   fold-double-to-int                            632.2122+-5.5484     !   1078.4247+-4.0731        ! definitely 1.7058x slower
   function-dot-apply                             82.6871+-0.6923     !    128.9930+-0.8548        ! definitely 1.5600x slower
   function-test                                  47.3091+-0.5687     !     85.8363+-0.5746        ! definitely 1.8144x slower
   get-by-id-chain-from-try-block                134.9253+-1.4403     !    162.5439+-1.0927        ! definitely 1.2047x slower
   HashMap-put-get-iterate-keys                  464.1388+-4.4157     !    682.3402+-3.6444        ! definitely 1.4701x slower
   HashMap-put-get-iterate                       427.8627+-1.8751     !    628.6139+-2.5704        ! definitely 1.4692x slower
   HashMap-string-put-get-iterate                305.2206+-1.8602     !    416.6194+-1.4381        ! definitely 1.3650x slower
   indexed-properties-in-objects                  23.2174+-0.1440     !     24.2929+-0.1557        ! definitely 1.0463x slower
   inline-arguments-access                        55.0692+-0.6571     !     84.1111+-0.4045        ! definitely 1.5274x slower
   inline-arguments-local-escape                 104.0482+-0.4016     !    158.8205+-0.7914        ! definitely 1.5264x slower
   inline-get-scoped-var                          96.8276+-13.0027    ?     99.7438+-14.3146       ? might be 1.0301x slower
   inlined-put-by-id-transition                  228.2877+-1.2493     !    297.1468+-1.8769        ! definitely 1.3016x slower
   int-or-other-abs-then-get-by-val              140.1633+-0.8379     !    226.2548+-1.5490        ! definitely 1.6142x slower
   int-or-other-abs-zero-then-get-by-val         282.8424+-1.8545     !    468.3107+-2.5255        ! definitely 1.6557x slower
   int-or-other-add-then-get-by-val              244.1959+-1.1042     !    379.5522+-1.2374        ! definitely 1.5543x slower
   int-or-other-add                              226.3017+-1.5033     !    370.0413+-1.4292        ! definitely 1.6352x slower
   int-or-other-div-then-get-by-val              105.5875+-0.7421     !    167.7434+-0.4612        ! definitely 1.5887x slower
   int-or-other-max-then-get-by-val              139.9950+-0.4990     !    219.6380+-0.7832        ! definitely 1.5689x slower
   int-or-other-min-then-get-by-val              140.4771+-0.6001     !    221.5659+-0.6894        ! definitely 1.5772x slower
   int-or-other-mod-then-get-by-val              103.5357+-0.5079     !    174.2193+-1.1815        ! definitely 1.6827x slower
   int-or-other-mul-then-get-by-val              125.3206+-0.9712     !    203.7862+-0.9880        ! definitely 1.6261x slower
   int-or-other-neg-then-get-by-val              127.5894+-0.7397     !    214.6598+-0.6223        ! definitely 1.6824x slower
   int-or-other-neg-zero-then-get-by-val         284.2008+-1.2491     !    475.4374+-1.7580        ! definitely 1.6729x slower
   int-or-other-sub-then-get-by-val              243.5460+-0.8323     !    377.8331+-1.1872        ! definitely 1.5514x slower
   int-or-other-sub                              228.1011+-1.5760     !    364.1205+-1.6924        ! definitely 1.5963x slower
   int-overflow-local                            174.4003+-1.8327     !    327.9483+-1.4215        ! definitely 1.8804x slower
   Int16Array-bubble-sort                       1092.0413+-16.5536    !   1650.0448+-15.0169       ! definitely 1.5110x slower
   Int16Array-load-int-mul                        27.4004+-0.1134     !     57.4130+-0.4792        ! definitely 2.0953x slower
   Int8Array-load                                 36.5696+-0.4347     !     63.2672+-0.3075        ! definitely 1.7300x slower
   integer-divide                                393.3169+-1.6278     !    671.1779+-1.2598        ! definitely 1.7065x slower
   integer-modulo                                  9.0946+-0.0884     !     16.9393+-0.1593        ! definitely 1.8626x slower
   make-indexed-storage                            9.5870+-0.0822     !     10.3789+-0.1035        ! definitely 1.0826x slower
   method-on-number                               40.0598+-0.5277     !     54.9131+-1.0029        ! definitely 1.3708x slower
   nested-function-parsing-random                393.6571+-8.4825     !    417.0077+-7.8966        ! definitely 1.0593x slower
   nested-function-parsing                        47.6881+-1.3580     !     50.5424+-1.2863        ! definitely 1.0599x slower
   new-array-buffer-dead                        1001.0281+-5.2642     !   1377.5106+-6.0891        ! definitely 1.3761x slower
   new-array-buffer-push                          56.3387+-0.4913     !     77.2520+-0.4672        ! definitely 1.3712x slower
   new-array-dead                               1001.6732+-14.3254    !   1639.4984+-6.7574        ! definitely 1.6368x slower
   new-array-push                                 38.8717+-0.7892     !     60.2341+-0.9933        ! definitely 1.5496x slower
   number-test                                    45.1636+-0.6120     !     79.2752+-0.2523        ! definitely 1.7553x slower
   object-closure-call                           185.5009+-0.6381     !    297.4842+-1.3989        ! definitely 1.6037x slower
   object-test                                    46.7065+-0.5818     !     84.2340+-0.2634        ! definitely 1.8035x slower
   poly-stricteq                                 978.6510+-13.0603    !   1606.3261+-13.4500       ! definitely 1.6414x slower
   polymorphic-structure                        1052.3085+-29.0701    !   1752.2201+-36.4423       ! definitely 1.6651x slower
   polyvariant-monomorphic-get-by-id             521.1988+-9.1206     !    745.0362+-6.9739        ! definitely 1.4295x slower
   rare-osr-exit-on-local                        114.8795+-0.1844     !    210.5250+-0.3408        ! definitely 1.8326x slower
   register-pressure-from-osr                    377.8423+-2.3739     !    510.9493+-1.3184        ! definitely 1.3523x slower
   simple-activation-demo                        250.7149+-0.7627     !    267.2369+-0.9739        ! definitely 1.0659x slower
   slow-array-profile-convergence                 18.7003+-0.1632     !     21.9731+-0.1644        ! definitely 1.1750x slower
   slow-convergence                               15.0791+-0.1335     !     22.0212+-0.4736        ! definitely 1.4604x slower
   sparse-conditional                             44.7012+-1.3341     !     70.5484+-0.2370        ! definitely 1.5782x slower
   splice-to-remove                              108.0986+-0.7546     !    122.5045+-0.4792        ! definitely 1.1333x slower
   string-concat-object                           42.6399+-0.7259     !     63.1830+-0.8520        ! definitely 1.4818x slower
   string-concat-pair-object                      43.1884+-0.9060     !     59.3781+-0.6565        ! definitely 1.3749x slower
   string-concat-pair-simple                     180.0175+-6.6789     !    328.8233+-7.8179        ! definitely 1.8266x slower
   string-concat-simple                          198.0448+-7.2361     !    344.0921+-7.1324        ! definitely 1.7374x slower
   string-cons-repeat                            210.2401+-4.6946     !    357.2951+-4.2432        ! definitely 1.6995x slower
   string-cons-tower                             209.9568+-3.5678     !    242.5502+-0.8849        ! definitely 1.1552x slower
   string-equality                               447.8517+-1.3682     !    739.4193+-4.4193        ! definitely 1.6510x slower
   string-hash                                    67.5674+-0.9066     !    107.6550+-0.6354        ! definitely 1.5933x slower
   string-repeat-arith                           114.0440+-0.5667     !    152.3804+-0.3029        ! definitely 1.3362x slower
   string-sub                                    348.3055+-1.9988     !    524.0287+-1.0700        ! definitely 1.5045x slower
   string-test                                    45.1624+-0.6353     !     79.7162+-0.3508        ! definitely 1.7651x slower
   structure-hoist-over-transitions               33.4064+-0.7943     !     46.0349+-0.3370        ! definitely 1.3780x slower
   tear-off-arguments-simple                      51.7674+-0.4439     !     77.7820+-0.6882        ! definitely 1.5025x slower
   tear-off-arguments                             84.1647+-0.3593     !    127.3961+-0.4406        ! definitely 1.5137x slower
   temporal-structure                           1063.6193+-33.7472    !   1740.9303+-35.1564       ! definitely 1.6368x slower
   to-int32-boolean                              476.3257+-1.7403     !    869.0256+-3.0583        ! definitely 1.8244x slower
   undefined-test                                 44.5179+-0.5381     !     80.0131+-0.5642        ! definitely 1.7973x slower

   <arithmetic>                                  195.5302+-0.9561     !    304.9568+-1.3299        ! definitely 1.5596x slower
   <geometric> *                                  95.7849+-0.4214     !    145.5062+-0.5262        ! definitely 1.5191x slower
   <harmonic>                                     45.2604+-0.1408     !     65.6766+-0.2329        ! definitely 1.4511x slower

                                                     TipOfTree                    X87                                        
All benchmarks:
   <arithmetic>                                  341.8235+-0.9171     !    458.9000+-1.2699        ! definitely 1.3425x slower
   <geometric>                                         ERROR                     ERROR             
   <harmonic>                                     19.5051+-0.3127     !     22.9632+-0.4543        ! definitely 1.1773x slower

                                                     TipOfTree                    X87                                        
Geomean of preferred means:
   <scaled-result>                               147.8406+-0.4181     !    195.1065+-0.4459        ! definitely 1.3197x slower
Comment 39 Filip Pizlo 2013-04-11 13:32:42 PDT
(In reply to comment #37)
> (In reply to comment #36)
> > (In reply to comment #33)
> > > (From update of attachment 197236 [details])
> > > View in context: https://bugs.webkit.org/attachment.cgi?id=197236&action=review
> > > 
> > > > Source/JavaScriptCore/llint/LowLevelInterpreter32_64.asm:708
> > > > +    frelease ft0, ft1
> > > 
> > > I'm not sure how much I like this new op.  This feels like it could get quite fragile - we probably will be adding more stuff to LLint that uses doubles, and it would be weird to have to remember to call this.
> > > 
> > > Are you sure it's a speedup over the finit approach?
> > 
> > From some of the timing info I got for finit it seemed very slow. Though I can't imagine why if the fpu is unused and finit has the effect of a nop.
> 
> Interesting.  I'm doing full LLInt-only perf tests right now, and you're right, finit appears sooooper slow.  It's quite shocking actually!
> 
> My tests are still running, I'll post results here shortly!
> 
> > 
> > One thing missing from the finit patch is that it only cleans up when calling deeper functions, not when returning from a function. So if an FP-using function called into llint, that would return with an unclean fpu state that could lead to wrong behavior in the llint caller. Could we add a finit instruction to the exit thunks of llint?
> 
> I would add it to the ctiTrampoline, which the LLint uses.

Actually, you could put it in the LLInt's prologue!  That might be easier.
Comment 40 Filip Pizlo 2013-04-11 13:48:09 PDT
I hacked Allan's code and replaced finit with just ffree of st(0) and st(1), and limited the x86 backend's set of FP registers to just ft0 and ft1.  This works fine for LLInt right now and probably will continue to work for the foreseeable future.  I think this will be more robust than having to call frelease inline in all of the places where you use floating point.  Here's a quick benchmark run of LLInt-only with x87 instead of SSE, and my ffree hack:


Benchmark report for SunSpider on bigmac (MacPro5,1).

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/quartary/OpenSource/WebKitBuild/Release/jsc (r148221)
"X87" at /Volumes/Data/pizlo/secondary/OpenSource/WebKitBuild/Release/jsc (r148221)

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Emitted a call to gc() between
sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific
preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95%
confidence intervals in milliseconds.

                                  TipOfTree                    X87                                        

3d-cube                        17.9783+-0.1753     !     26.3691+-0.1585        ! definitely 1.4667x slower
3d-morph                       24.3248+-0.9822     ?     25.2635+-0.1758        ? might be 1.0386x slower
3d-raytrace                    29.8836+-0.2624     !     34.3155+-0.3130        ! definitely 1.1483x slower
access-binary-trees            13.0033+-0.1874           12.9120+-0.1259        
access-fannkuch                41.3548+-0.2009     !     46.6845+-0.2882        ! definitely 1.1289x slower
access-nbody                   18.7540+-0.2405     !     31.6527+-0.1826        ! definitely 1.6878x slower
access-nsieve                   9.3333+-0.1018     !      9.9349+-0.1245        ! definitely 1.0645x slower
bitops-3bit-bits-in-byte       16.4463+-0.2274     !     17.8396+-0.1332        ! definitely 1.0847x slower
bitops-bits-in-byte            27.1933+-0.2444     !     28.2384+-0.3981        ! definitely 1.0384x slower
bitops-bitwise-and             50.7059+-0.2201     !     66.2652+-0.2938        ! definitely 1.3069x slower
bitops-nsieve-bits             40.6929+-1.5749     ^     36.8273+-0.1227        ^ definitely 1.1050x faster
controlflow-recursive          18.7057+-0.1341           18.6117+-0.1532        
crypto-aes                     18.9586+-0.1842     ^     18.4243+-0.1710        ^ definitely 1.0290x faster
crypto-md5                     18.4564+-0.2320     ^     17.6882+-0.2845        ^ definitely 1.0434x faster
crypto-sha1                    17.2156+-0.1717           16.7456+-0.3562          might be 1.0281x faster
date-format-tofte              24.7140+-0.1669     ^     24.0985+-0.2515        ^ definitely 1.0255x faster
date-format-xparb              25.1820+-0.2267           24.8168+-0.2614          might be 1.0147x faster
math-cordic                    37.8164+-0.3621     !     44.1493+-0.3623        ! definitely 1.1675x slower
math-partial-sums              35.1842+-0.3965     !     39.5324+-0.3883        ! definitely 1.1236x slower
math-spectral-norm             17.8386+-0.1806     ?     18.3042+-0.4146        ? might be 1.0261x slower
regexp-dna                      8.9110+-0.2140     ?      8.9316+-0.1848        ?
string-base64                  26.8302+-0.2538     ?     27.2993+-0.3646        ? might be 1.0175x slower
string-fasta                   22.7460+-0.1565     !     25.8323+-0.3643        ! definitely 1.1357x slower
string-tagcloud                21.9842+-0.1848           21.8362+-0.1764        
string-unpack-code             27.8313+-0.2588     ?     28.1235+-0.2181        ? might be 1.0105x slower
string-validate-input          19.6671+-0.3223     ?     20.0778+-0.4098        ? might be 1.0209x slower

<arithmetic> *                 24.2966+-0.0980     !     26.5683+-0.0817        ! definitely 1.0935x slower
<geometric>                    22.3661+-0.1046     !     24.0371+-0.0937        ! definitely 1.0747x slower
<harmonic>                     20.4804+-0.1135     !     21.6847+-0.1024        ! definitely 1.0588x slower

I think this is acceptable, particularly since it doesn't show up at all with all JITs enabled:


Benchmark report for SunSpider on bigmac (MacPro5,1).

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/quartary/OpenSource/WebKitBuild/Release/jsc (r148221)
"X87" at /Volumes/Data/pizlo/secondary/OpenSource/WebKitBuild/Release/jsc (r148221)

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Emitted a call to gc() between
sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific
preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95%
confidence intervals in milliseconds.

                                  TipOfTree                    X87                                        

3d-cube                         8.4606+-0.1155     ?      8.5137+-0.1353        ?
3d-morph                       11.6839+-0.1243           11.6104+-0.1301        
3d-raytrace                    11.0022+-0.2307           10.9968+-0.2107        
access-binary-trees             1.9070+-0.0387            1.8977+-0.0334        
access-fannkuch                 9.3917+-0.0751            9.2294+-0.1096          might be 1.0176x faster
access-nbody                    6.2916+-0.0990            6.2757+-0.0806        
access-nsieve                   4.5179+-0.0715            4.4826+-0.0484        
bitops-3bit-bits-in-byte        1.5943+-0.0121     ?      1.5974+-0.0122        ?
bitops-bits-in-byte             5.6738+-0.0634     ?      5.7138+-0.0460        ?
bitops-bitwise-and              1.8976+-0.0699            1.8559+-0.0504          might be 1.0225x faster
bitops-nsieve-bits              4.8670+-0.0680     ?      4.9450+-0.0705        ? might be 1.0160x slower
controlflow-recursive           2.9818+-0.0073            2.9806+-0.0127        
crypto-aes                      7.9970+-0.0659            7.9870+-0.0808        
crypto-md5                      4.1906+-0.0994     ?      4.2745+-0.0909        ? might be 1.0200x slower
crypto-sha1                     3.2123+-0.0590            3.2038+-0.0671        
date-format-tofte              14.1565+-0.2440           14.1345+-0.2333        
date-format-xparb               9.1981+-0.1982     ?      9.2653+-0.1622        ?
math-cordic                     3.6778+-0.0212     ?      3.6819+-0.0187        ?
math-partial-sums              11.8618+-0.1057     ?     11.9583+-0.1261        ?
math-spectral-norm              2.8211+-0.0566            2.8153+-0.0467        
regexp-dna                      8.6979+-0.1307     ?      8.9600+-0.1483        ? might be 1.0301x slower
string-base64                   4.3886+-0.0219     ?      4.4054+-0.0541        ?
string-fasta                   10.9107+-0.1414     ?     10.9863+-0.1428        ?
string-tagcloud                13.7753+-0.2281           13.6759+-0.2051        
string-unpack-code             25.6134+-0.4697           25.5479+-0.4052        
string-validate-input           7.6822+-0.3646            7.6761+-0.3357        

<arithmetic> *                  7.6328+-0.0693     ?      7.6412+-0.0711        ? might be 1.0011x slower
<geometric>                     6.1039+-0.0556     ?      6.1107+-0.0539        ? might be 1.0011x slower
<harmonic>                      4.7953+-0.0441            4.7926+-0.0410          might be 1.0006x faster

I will do more extensive benchmark runs now.
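
For readers unfamiliar with why stale x87 entries matter, here is a toy Ruby model of the eight-slot x87 register stack. It is only a sketch: the class `X87StackModel` and its methods are hypothetical names made up for illustration, it is not code from either patch, and real hardware reports overflow through the status word rather than by raising. It shows that values left behind eventually overflow the stack, while freeing st(0) and st(1) after each use keeps it empty.

```ruby
# Toy model of the x87 register stack: eight slots, each either empty or valid.
# Hypothetical illustration only; not taken from the patch under review.
class X87StackModel
  SLOTS = 8

  def initialize
    @regs = Array.new(SLOTS) # nil marks an empty slot
    @top = 0                 # index of the slot currently acting as st(0)
  end

  # fld: push a value. Pushing onto a slot that is still valid is an overflow.
  def fld(value)
    new_top = (@top - 1) % SLOTS
    raise "x87 stack overflow" unless @regs[new_top].nil?
    @top = new_top
    @regs[@top] = value
  end

  # ffree st(i): mark a slot empty without moving the top of stack.
  def ffree(i)
    @regs[(@top + i) % SLOTS] = nil
  end

  # Number of slots that still hold a value.
  def depth
    @regs.compact.size
  end
end

leaky = X87StackModel.new
8.times { |i| leaky.fld(i.to_f) } # eight pushes with no cleanup fill the stack

clean = X87StackModel.new
100.times do
  clean.fld(1.0)
  clean.fld(2.0)
  clean.ffree(0) # free st(0) and st(1) after use, as in the ffree hack
  clean.ffree(1)
end
```

In this model a ninth `fld` on `leaky` raises, whereas `clean` can repeat the load/free cycle indefinitely.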
Comment 41 Filip Pizlo 2013-04-11 13:50:12 PDT
Created attachment 197667 [details]
my version
Comment 42 Allan Sandfeld Jensen 2013-04-11 14:18:37 PDT
Comment on attachment 197667 [details]
my version

View in context: https://bugs.webkit.org/attachment.cgi?id=197667&action=review

> Source/JavaScriptCore/offlineasm/x86.rb:965
> +                $asm.puts "fld #{operands[0].x87Operand(0)}"
> +                $asm.puts "frndint"
> +                $asm.puts "fucomip #{operands[0].x87Operand(1)}"

I should probably have mentioned it. I changed these lines in the latest patch because frndint didn't behave as expected. To properly cast double to integer, you need to load the integer back from the stack.
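
To illustrate the difference in plain Ruby (a sketch under assumptions, not the code from the patch): rounding a double and comparing it with itself only tells you that it has no fractional part, while converting to an int32, loading it back and comparing also catches values that do not fit. The helper names `integral?` and `exact_int32` are hypothetical, and the sketch does not treat -0.0 specially.

```ruby
INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1

# Models the "round, then compare with the original" idea:
# true when d has no fractional part, regardless of magnitude.
def integral?(d)
  d.finite? && d == d.round(half: :even)
end

# Models "store as int32, load it back, compare":
# returns the Integer when d is exactly representable as an int32, else nil.
def exact_int32(d)
  return nil unless d.finite?
  i = d.truncate
  return nil if i < INT32_MIN || i > INT32_MAX
  i.to_f == d ? i : nil
end

big = 2.0**40 # integral as a double, but far outside the int32 range
```

Here `integral?(big)` is true while `exact_int32(big)` is nil, which is the case a rounding-only check would let through.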
Comment 43 Filip Pizlo 2013-04-11 14:27:22 PDT
(In reply to comment #42)
> (From update of attachment 197667 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=197667&action=review
> 
> > Source/JavaScriptCore/offlineasm/x86.rb:965
> > +                $asm.puts "fld #{operands[0].x87Operand(0)}"
> > +                $asm.puts "frndint"
> > +                $asm.puts "fucomip #{operands[0].x87Operand(1)}"
> 
> I should probably have mentioned it. I changed these lines in the latest patch because frndint didn't behave as expected. To properly cast double to integer, you need to load the integer back from the stack.

Ah, OK!  I will make this change.
Comment 44 Filip Pizlo 2013-04-11 14:27:42 PDT
Comment on attachment 197667 [details]
my version

Clearing r? because I need to integrate Allan's latest change.
Comment 45 Allan Sandfeld Jensen 2013-04-11 14:40:06 PDT
(In reply to comment #43)
> (In reply to comment #42)
> > (From update of attachment 197667 [details])
> > View in context: https://bugs.webkit.org/attachment.cgi?id=197667&action=review
> > 
> > > Source/JavaScriptCore/offlineasm/x86.rb:965
> > > +                $asm.puts "fld #{operands[0].x87Operand(0)}"
> > > +                $asm.puts "frndint"
> > > +                $asm.puts "fucomip #{operands[0].x87Operand(1)}"
> > 
> > I should probably have mentioned it. I changed these lines in the latest patch because frndint didn't behave as expected. To properly cast double to integer, you need to load the integer back from the stack.
> 
> Ah, OK!  I will make this change.

I also removed the changes to the stack pointer to save instructions. It does make valgrind complain and I guess it is bad practice to access below the stack pointer, but it should be safe, right?
Comment 46 Filip Pizlo 2013-04-11 14:42:41 PDT
(In reply to comment #45)
> (In reply to comment #43)
> > (In reply to comment #42)
> > > (From update of attachment 197667 [details])
> > > View in context: https://bugs.webkit.org/attachment.cgi?id=197667&action=review
> > > 
> > > > Source/JavaScriptCore/offlineasm/x86.rb:965
> > > > +                $asm.puts "fld #{operands[0].x87Operand(0)}"
> > > > +                $asm.puts "frndint"
> > > > +                $asm.puts "fucomip #{operands[0].x87Operand(1)}"
> > > 
> > > I should probably have mentioned it. I changed these lines in the latest patch because frndint didn't behave as expected. To properly cast double to integer, you need to load the integer back from the stack.
> > 
> > Ah, OK!  I will make this change.
> 
> I also removed the changes to the stack pointer to save instructions. It does make valgrind complain and I guess it is bad practice to access below the stack pointer, but it should be safe, right?

It should be safe but it depends on how big your red zone is.  The only problem with accessing below the stack pointer is that if a signal fires, it will push onto the stack and possibly clobber things.  But signal handling logic already respects a "red zone" of stack locations beneath the stack pointer that user code is allowed to play with.  The size of it varies by platform and calling convention.

I kind of prefer to not use the red zone just because the risk is often not worth it, but I think you're totally safe here - the red zone ought to always be at least 8 bytes (it's usually much more than that!) and we only have to worry about what happens on X86 platforms.  I'm fine with it.
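
A toy Ruby model of the point above, purely as a sketch: `ToyStack` and its methods are hypothetical, and the sizes are parameters of the model rather than claims about any particular platform (128 bytes is the figure the System V x86-64 ABI specifies; other platforms and calling conventions differ). A value written 8 bytes below the stack pointer survives signal delivery only if the signal frame is placed below the red zone.

```ruby
# Toy downward-growing stack. Hypothetical model; sizes are parameters only.
class ToyStack
  def initialize(size:, red_zone:)
    @mem = Array.new(size, 0)
    @sp = size            # stack pointer starts at the top and grows downward
    @red_zone = red_zone  # bytes below sp that signal delivery must skip
  end

  # User code storing a byte below sp without adjusting sp.
  def write_below_sp(offset, byte)
    @mem[@sp - offset] = byte
  end

  def read_below_sp(offset)
    @mem[@sp - offset]
  end

  # Signal delivery: skip the red zone, then overwrite frame_bytes of stack.
  def deliver_signal(frame_bytes)
    frame_top = @sp - @red_zone
    (frame_top - frame_bytes...frame_top).each { |addr| @mem[addr] = 0xFF }
  end
end

with_zone = ToyStack.new(size: 256, red_zone: 128)
with_zone.write_below_sp(8, 0x42)
with_zone.deliver_signal(64)

no_zone = ToyStack.new(size: 256, red_zone: 0)
no_zone.write_below_sp(8, 0x42)
no_zone.deliver_signal(64)
```

In the model, the byte in `with_zone` is still 0x42 after the signal, while the one in `no_zone` has been overwritten by the signal frame.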
Comment 47 Filip Pizlo 2013-04-11 14:46:57 PDT
Here are the full no-JIT results with the patch.  Note that the "all benchmarks <geometric>" thingy is reporting "ERROR" because I think there's a floating point bug in my harness - lol numerical stability is hard. ;-)
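
As an illustration of how a geometric mean over many benchmarks can break (this is one plausible failure mode, not a diagnosis of the harness, whose code is not shown here): multiplying all the values first can overflow a double to Infinity, whereas averaging logarithms stays in range. The function names below are hypothetical.

```ruby
# Naive geometric mean: multiply everything, then take the n-th root.
def geomean_naive(values)
  product = values.reduce(1.0) { |acc, v| acc * v }
  product**(1.0 / values.size)
end

# Log-domain geometric mean: average the logarithms, then exponentiate.
def geomean_log(values)
  mean_log = values.sum { |v| Math.log(v) } / values.size
  Math.exp(mean_log)
end

# 200 results of 1000 ms each: the product (1e600) does not fit in a double.
samples = Array.new(200, 1000.0)
naive = geomean_naive(samples)
stable = geomean_log(samples)
```

With these inputs the naive version overflows, while the log-domain version stays close to 1000.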

I will run JIT-enabled benchmarks next.


Benchmark report for SunSpider, V8Spider, Octane, Kraken, and JSRegress on bigmac (MacPro5,1).

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/quartary/OpenSource/WebKitBuild/Release/jsc (r148221)
"X87" at /Volumes/Data/pizlo/secondary/OpenSource/WebKitBuild/Release/jsc (r148221)

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Emitted a call to gc() between sample
measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get
microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds.

                                                     TipOfTree                    X87                                        
SunSpider:
   3d-cube                                        17.9544+-0.1384     !     26.2953+-0.2174        ! definitely 1.4646x slower
   3d-morph                                       23.2337+-0.6940     !     25.3911+-0.2052        ! definitely 1.0929x slower
   3d-raytrace                                    29.8920+-0.2649     !     34.0675+-0.1459        ! definitely 1.1397x slower
   access-binary-trees                            12.9582+-0.0859     ?     13.0112+-0.1347        ?
   access-fannkuch                                41.4482+-0.1920     !     46.7403+-0.5399        ! definitely 1.1277x slower
   access-nbody                                   18.5611+-0.1642     !     31.6303+-0.1455        ! definitely 1.7041x slower
   access-nsieve                                   9.4760+-0.1078     !      9.8115+-0.1013        ! definitely 1.0354x slower
   bitops-3bit-bits-in-byte                       16.3962+-0.1308     !     17.8102+-0.1786        ! definitely 1.0862x slower
   bitops-bits-in-byte                            26.9323+-0.1926     ?     27.6365+-0.6107        ? might be 1.0261x slower
   bitops-bitwise-and                             52.1668+-0.3800     !     60.3746+-2.1944        ! definitely 1.1573x slower
   bitops-nsieve-bits                             42.2204+-1.2434     ^     37.5594+-0.3900        ^ definitely 1.1241x faster
   controlflow-recursive                          18.8285+-0.1660     ?     18.8898+-0.1996        ?
   crypto-aes                                     18.9806+-0.2549     ^     18.2652+-0.1559        ^ definitely 1.0392x faster
   crypto-md5                                     18.5364+-0.1919     ^     17.7638+-0.4439        ^ definitely 1.0435x faster
   crypto-sha1                                    17.2322+-0.1944           16.8773+-0.3320          might be 1.0210x faster
   date-format-tofte                              24.9981+-0.2550     ^     24.2617+-0.2743        ^ definitely 1.0304x faster
   date-format-xparb                              25.4501+-0.1946     ^     25.0034+-0.1719        ^ definitely 1.0179x faster
   math-cordic                                    37.6483+-0.3665     !     44.2332+-0.3411        ! definitely 1.1749x slower
   math-partial-sums                              34.9296+-0.2630     !     40.1234+-0.5674        ! definitely 1.1487x slower
   math-spectral-norm                             17.6544+-0.1390     !     19.4503+-0.4917        ! definitely 1.1017x slower
   regexp-dna                                      8.8638+-0.1221            8.8510+-0.1542        
   string-base64                                  27.1159+-0.4925     ?     27.2110+-0.2304        ?
   string-fasta                                   22.6352+-0.1540     !     25.4636+-0.2456        ! definitely 1.1250x slower
   string-tagcloud                                21.7558+-0.1759           21.6285+-0.1623        
   string-unpack-code                             27.8736+-0.2633     ?     28.3440+-0.2486        ? might be 1.0169x slower
   string-validate-input                          19.5678+-0.3874     ?     19.8040+-0.3555        ? might be 1.0121x slower

   <arithmetic> *                                 24.3581+-0.1014     !     26.4038+-0.1161        ! definitely 1.0840x slower
   <geometric>                                    22.3753+-0.0960     !     24.0000+-0.1127        ! definitely 1.0726x slower
   <harmonic>                                     20.4727+-0.0906     !     21.6854+-0.1226        ! definitely 1.0592x slower

                                                     TipOfTree                    X87                                        
V8Spider:
   crypto                                        901.2628+-10.5638    !    937.3151+-11.2742       ! definitely 1.0400x slower
   deltablue                                    2434.4545+-21.2540        2425.3902+-26.9994       
   earley-boyer                                  537.9689+-2.6866     ^    530.5799+-3.1642        ^ definitely 1.0139x faster
   raytrace                                      284.2341+-1.9883          283.2019+-1.4540        
   regexp                                        121.2950+-0.2894     !    122.0735+-0.4668        ! definitely 1.0064x slower
   richards                                     2512.2443+-17.9037    ^   2427.0873+-19.0544       ^ definitely 1.0351x faster
   splay                                         207.0846+-0.6688     !    211.0563+-1.5495        ! definitely 1.0192x slower

   <arithmetic>                                  999.7920+-4.3096     ^    990.9577+-4.3002        ^ definitely 1.0089x faster
   <geometric> *                                 576.4975+-1.2882     ?    577.2247+-1.4423        ? might be 1.0013x slower
   <harmonic>                                    343.6464+-0.6157     !    345.8782+-0.7725        ! definitely 1.0065x slower

                                                     TipOfTree                    X87                                        
Octane and V8v7:
   encrypt                                        5.62788+-0.02001    !     5.85796+-0.01966       ! definitely 1.0409x slower
   decrypt                                      106.24871+-0.38143    !   110.18448+-0.62942       ! definitely 1.0370x slower
   deltablue                             x2      16.10377+-0.11289    ^    15.78929+-0.04329       ^ definitely 1.0199x faster
   earley                                         6.75416+-0.01812    ^     6.63992+-0.03241       ^ definitely 1.0172x faster
   boyer                                        130.16360+-0.45605    ^   127.45871+-0.73964       ^ definitely 1.0212x faster
   raytrace                              x2      47.67922+-0.09124         47.40098+-0.20901       
   regexp                                x2      42.79441+-0.18373    ?    43.11715+-0.18228       ?
   richards                              x2       7.29126+-0.08431    ^     7.01501+-0.06343       ^ definitely 1.0394x faster
   splay                                 x2       2.80017+-0.01659    ?     2.80727+-0.01369       ?
   navier-stokes                         x2      93.59975+-0.11213    !   134.18157+-0.08814       ! definitely 1.4336x slower
   closure                                        0.32426+-0.00951          0.32384+-0.00924       
   jquery                                         3.55382+-0.49951    ?     3.57177+-0.50344       ?
   gbemu                                 x2     576.86359+-1.51066    !   584.87711+-1.80031       ! definitely 1.0139x slower
   box2d                                 x2     198.84708+-0.49096    !   219.52346+-0.72270       ! definitely 1.1040x slower

V8v7:
   <arithmetic>                                  41.83322+-0.03653    !    46.92272+-0.09009       ! definitely 1.1217x slower
   <geometric> *                                 21.49163+-0.04201    !    22.38354+-0.05644       ! definitely 1.0415x slower
   <harmonic>                                    10.21874+-0.03439         10.21714+-0.03402         might be 1.0002x faster

Octane including V8v7:
   <arithmetic>                                 101.11959+-0.12112    !   107.43001+-0.12070       ! definitely 1.0624x slower
   <geometric> *                                 26.99365+-0.20971    !    28.09455+-0.20618       ! definitely 1.0408x slower
   <harmonic>                                     4.43981+-0.11752          4.43780+-0.11243         might be 1.0005x faster

                                                     TipOfTree                    X87                                        
Kraken:
   ai-astar                                      2926.207+-13.127     !    2963.124+-12.578        ! definitely 1.0126x slower
   audio-beat-detection                          1354.818+-0.347      !    1751.951+-0.461         ! definitely 1.2931x slower
   audio-dft                                     1094.455+-23.305     !    1396.675+-5.148         ! definitely 1.2761x slower
   audio-fft                                     1255.999+-2.040      !    1655.995+-1.407         ! definitely 1.3185x slower
   audio-oscillator                              1164.374+-14.177     !    1327.359+-4.681         ! definitely 1.1400x slower
   imaging-darkroom                              2029.454+-6.323      !    2665.679+-3.510         ! definitely 1.3135x slower
   imaging-desaturate                            3255.786+-47.958     !    3580.350+-3.315         ! definitely 1.0997x slower
   imaging-gaussian-blur                        10626.032+-144.452    ?   10745.350+-75.979        ? might be 1.0112x slower
   json-parse-financial                            78.443+-0.425      ^      75.241+-0.452         ^ definitely 1.0426x faster
   json-stringify-tinderbox                       106.984+-0.351      ^     105.948+-0.242         ^ definitely 1.0098x faster
   stanford-crypto-aes                            683.796+-1.363      ^     679.112+-1.086         ^ definitely 1.0069x faster
   stanford-crypto-ccm                            449.706+-0.656      ^     418.933+-0.729         ^ definitely 1.0735x faster
   stanford-crypto-pbkdf2                        1857.815+-5.765      !    1942.681+-2.657         ! definitely 1.0457x slower
   stanford-crypto-sha256-iterative               649.638+-2.537      !     663.995+-0.468         ! definitely 1.0221x slower

   <arithmetic> *                                1966.679+-10.376     !    2140.885+-5.390         ! definitely 1.0886x slower
   <geometric>                                   1023.590+-1.845      !    1118.242+-1.159         ! definitely 1.0925x slower
   <harmonic>                                     432.051+-1.079      ?     432.909+-1.201         ? might be 1.0020x slower

                                                     TipOfTree                    X87                                        
JSRegress:
   adapt-to-double-divide                         47.9152+-0.2307     !     95.5433+-1.0917        ! definitely 1.9940x slower
   aliased-arguments-getbyval                      8.5251+-0.1143            8.2975+-0.1161          might be 1.0274x faster
   allocate-big-object                            27.9000+-0.2032           27.6419+-0.2438        
   arity-mismatch-inlining                        16.4505+-0.2063     ?     16.6662+-0.2167        ? might be 1.0131x slower
   array-access-polymorphic-structure             54.3291+-0.5280     ?     54.4588+-0.3840        ?
   array-with-double-add                          33.7276+-0.1719     !     45.1450+-0.2714        ! definitely 1.3385x slower
   array-with-double-increment                    35.4592+-0.1703     !     40.1362+-0.1328        ! definitely 1.1319x slower
   array-with-double-mul-add                      61.8230+-1.0954     !     81.9887+-0.3190        ! definitely 1.3262x slower
   array-with-double-sum                          22.5105+-0.1474     !     30.6861+-0.1444        ! definitely 1.3632x slower
   array-with-int32-add-sub                       63.6002+-0.9658           63.5210+-0.9274        
   array-with-int32-or-double-sum                 22.7085+-0.1855     !     30.8559+-0.2031        ! definitely 1.3588x slower
   big-int-mul                                   130.4622+-0.7350     ?    132.3166+-1.7035        ? might be 1.0142x slower
   boolean-test                                   44.7516+-0.6647     ?     45.0121+-0.8531        ?
   cast-int-to-double                            269.4681+-1.3547     !    375.0991+-1.8633        ! definitely 1.3920x slower
   cell-argument                                  77.7230+-0.2840     ?     78.6507+-0.7465        ? might be 1.0119x slower
   cfg-simplify                                  133.1694+-1.8531     ?    134.0724+-2.7341        ?
   cmpeq-obj-to-obj-other                        120.8431+-1.1685     ^    116.8870+-1.1027        ^ definitely 1.0338x faster
   constant-test                                 283.4232+-0.6060     !    360.5058+-1.3039        ! definitely 1.2720x slower
   direct-arguments-getbyval                       3.6782+-0.0424            3.5864+-0.0499          might be 1.0256x faster
   double-pollution-getbyval                      35.5837+-0.1561     !     59.4998+-0.2744        ! definitely 1.6721x slower
   double-pollution-putbyoffset                   33.9927+-0.4477     !     39.0707+-0.3272        ! definitely 1.1494x slower
   empty-string-plus-int                          26.3457+-0.6217           26.2105+-0.1600        
   external-arguments-getbyval                     9.0183+-0.1179            8.9002+-0.0899          might be 1.0133x faster
   external-arguments-putbyval                    15.5025+-0.2450     ?     15.7071+-0.1708        ? might be 1.0132x slower
   Float32Array-matrix-mult                       68.1563+-0.6462     ?     69.1521+-0.5622        ? might be 1.0146x slower
   fold-double-to-int                            632.7413+-7.1740     !    681.0175+-4.3578        ! definitely 1.0763x slower
   function-dot-apply                             83.6074+-0.7094           82.5575+-0.5637          might be 1.0127x faster
   function-test                                  46.8864+-0.5865     ?     47.3746+-0.4820        ? might be 1.0104x slower
   get-by-id-chain-from-try-block                135.0482+-1.0815     ^    131.9623+-1.3240        ^ definitely 1.0234x faster
   HashMap-put-get-iterate-keys                  463.8817+-2.3252          456.6323+-5.2464          might be 1.0159x faster
   HashMap-put-get-iterate                       432.5620+-3.3130     ^    423.0020+-2.3462        ^ definitely 1.0226x faster
   HashMap-string-put-get-iterate                305.0262+-1.7646     ^    301.4069+-1.4414        ^ definitely 1.0120x faster
   indexed-properties-in-objects                  23.1630+-0.1067     ?     23.2080+-0.1854        ?
   inline-arguments-access                        55.4069+-0.7658     ^     50.7680+-0.6078        ^ definitely 1.0914x faster
   inline-arguments-local-escape                 106.0205+-2.1873     ^    102.7597+-0.7441        ^ definitely 1.0317x faster
   inline-get-scoped-var                         100.7289+-15.2184          99.3675+-14.3224         might be 1.0137x faster
   inlined-put-by-id-transition                  229.6349+-1.2355          229.4402+-1.5024        
   int-or-other-abs-then-get-by-val              139.3494+-0.6893     ?    140.5268+-0.6068        ?
   int-or-other-abs-zero-then-get-by-val         284.1493+-2.4926          282.4749+-1.4470        
   int-or-other-add-then-get-by-val              242.9923+-1.4659     !    249.6290+-3.2146        ! definitely 1.0273x slower
   int-or-other-add                              226.2495+-1.6327          224.9208+-1.8598        
   int-or-other-div-then-get-by-val              105.2304+-0.6893          105.1252+-0.4331        
   int-or-other-max-then-get-by-val              140.2353+-0.6879     ?    141.5437+-0.6356        ?
   int-or-other-min-then-get-by-val              140.2266+-0.8224     !    144.6763+-2.4675        ! definitely 1.0317x slower
   int-or-other-mod-then-get-by-val              103.8230+-0.6266     ?    104.7759+-0.4527        ?
   int-or-other-mul-then-get-by-val              124.4648+-0.8262     ?    125.0835+-0.9148        ?
   int-or-other-neg-then-get-by-val              127.2243+-1.0448     ?    128.6683+-0.8059        ? might be 1.0113x slower
   int-or-other-neg-zero-then-get-by-val         283.6595+-1.1308     ?    284.3401+-1.9289        ?
   int-or-other-sub-then-get-by-val              241.7250+-1.4257     !    248.6575+-2.5706        ! definitely 1.0287x slower
   int-or-other-sub                              226.2594+-1.3067          226.2140+-1.7022        
   int-overflow-local                            175.9691+-2.6634          175.3229+-0.8638        
   Int16Array-bubble-sort                       1067.7213+-14.2553    !   1091.3273+-2.5565        ! definitely 1.0221x slower
   Int16Array-load-int-mul                        27.3937+-0.1835     !     28.1407+-0.1787        ! definitely 1.0273x slower
   Int8Array-load                                 37.1140+-0.7281           36.1678+-0.5502          might be 1.0262x faster
   integer-divide                                391.2293+-0.9023     !    424.9037+-0.9518        ! definitely 1.0861x slower
   integer-modulo                                  9.0282+-0.0743     !      9.4365+-0.1568        ! definitely 1.0452x slower
   make-indexed-storage                            9.5931+-0.1229     ?      9.7242+-0.1220        ? might be 1.0137x slower
   method-on-number                               39.8947+-0.8139     ?     40.8484+-0.2741        ? might be 1.0239x slower
   nested-function-parsing-random                393.8756+-8.0650     ?    397.6181+-8.0893        ?
   nested-function-parsing                        47.6917+-1.3551           47.6117+-1.2893        
   new-array-buffer-dead                         997.3167+-3.5530     !   1013.4809+-4.7945        ! definitely 1.0162x slower
   new-array-buffer-push                          55.7689+-0.2826     !     57.0448+-0.2980        ! definitely 1.0229x slower
   new-array-dead                               1001.8330+-14.7759    ?   1008.6254+-11.1651       ?
   new-array-push                                 39.5733+-1.1274           38.8385+-0.8491          might be 1.0189x faster
   number-test                                    44.7356+-0.6304           43.9065+-0.3294          might be 1.0189x faster
   object-closure-call                           185.3457+-0.5947     ^    181.3033+-0.7721        ^ definitely 1.0223x faster
   object-test                                    46.5967+-0.4557           46.2241+-0.1728        
   poly-stricteq                                 966.2038+-4.5872     ^    958.9021+-2.1751        ^ definitely 1.0076x faster
   polymorphic-structure                        1079.8636+-27.9441        1062.1190+-29.5844         might be 1.0167x faster
   polyvariant-monomorphic-get-by-id             508.4998+-2.9552     ?    511.9921+-3.0516        ?
   rare-osr-exit-on-local                        114.8582+-0.1350     ^    113.4716+-0.1760        ^ definitely 1.0122x faster
   register-pressure-from-osr                    378.7682+-1.7215     ?    378.8308+-1.3232        ?
   simple-activation-demo                        250.3638+-0.7172     ?    250.7049+-0.7741        ?
   slow-array-profile-convergence                 18.5058+-0.1898           18.1641+-0.1891          might be 1.0188x faster
   slow-convergence                               15.0500+-0.1545     ?     15.1581+-0.1617        ?
   sparse-conditional                             45.0809+-1.3548           43.3927+-0.3652          might be 1.0389x faster
   splice-to-remove                              108.3011+-0.3777     !    109.2319+-0.3910        ! definitely 1.0086x slower
   string-concat-object                           43.3260+-0.8530           41.6709+-1.1433          might be 1.0397x faster
   string-concat-pair-object                      42.8049+-0.9768           41.7154+-1.0172          might be 1.0261x faster
   string-concat-pair-simple                     181.4127+-7.2532     ?    181.5131+-8.0114        ?
   string-concat-simple                          195.2060+-7.2766     ?    195.7453+-7.3555        ?
   string-cons-repeat                            211.2515+-5.9333     ?    214.0855+-6.8960        ? might be 1.0134x slower
   string-cons-tower                             210.6660+-3.0246          210.2556+-1.4762        
   string-equality                               444.1645+-2.8023          439.4118+-2.4630          might be 1.0108x faster
   string-hash                                    66.8630+-0.7461     !     68.3695+-0.6929        ! definitely 1.0225x slower
   string-repeat-arith                           114.7803+-0.5278     ?    115.6372+-0.8842        ?
   string-sub                                    346.0125+-1.0248     !    351.4472+-3.6150        ! definitely 1.0157x slower
   string-test                                    44.6894+-0.4789           44.1954+-0.2385          might be 1.0112x faster
   structure-hoist-over-transitions               33.2790+-0.5486     ^     31.5402+-0.6329        ^ definitely 1.0551x faster
   tear-off-arguments-simple                      51.9610+-0.7462           50.7186+-0.9921          might be 1.0245x faster
   tear-off-arguments                             85.1029+-0.7896     ^     82.2770+-1.0577        ^ definitely 1.0343x faster
   temporal-structure                           1062.1951+-37.4462    ?   1077.2291+-29.3359       ? might be 1.0142x slower
   to-int32-boolean                              476.8701+-2.1493          475.9137+-3.6322        
   undefined-test                                 44.7773+-0.7267           44.7186+-0.5410        

   <arithmetic>                                  195.3075+-1.2274     !    199.7306+-1.1282        ! definitely 1.0226x slower
   <geometric> *                                  95.8224+-0.4718     !     99.0967+-0.4824        ! definitely 1.0342x slower
   <harmonic>                                     45.2477+-0.1644     !     46.4847+-0.1733        ! definitely 1.0273x slower

                                                     TipOfTree                    X87                                        
All benchmarks:
   <arithmetic>                                  342.0177+-0.6858     !    360.3297+-0.9751        ! definitely 1.0535x slower
   <geometric>                                         ERROR                     ERROR             
   <harmonic>                                     19.5023+-0.3298     ?     19.7968+-0.3185        ? might be 1.0151x slower

                                                     TipOfTree                    X87                                        
Geomean of preferred means:
   <scaled-result>                               148.1736+-0.4090     !    155.4717+-0.4907        ! definitely 1.0493x slower
Comment 48 Allan Sandfeld Jensen 2013-04-12 03:19:46 PDT
(In reply to comment #46)
> I kind of prefer to not use the red zone just because the risk is often not worth it, but I think you're totally safe here - the red zone ought to always be at least 8 bytes (it's usually much more than that!) and we only have to worry about what happens on X86 platforms.  I'm fine with it.

An alternative could be to have 8 bytes allocated somewhere and use that as the memory temporary, but it would probably need to be thread-local, which would make it complicated.
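
As a rough illustration of the "8-byte memory temporary" idea, here is a Python analogy, not LLInt code. It assumes the temporary exists only to pass a value through memory so it can be reinterpreted, and it keeps one scratch buffer per thread so concurrent users cannot clobber each other. The names `_scratch`, `scratch_buffer`, `double_to_bits` and `bits_to_double` are hypothetical.

```python
import struct
import threading

_scratch = threading.local()  # hypothetical holder: one 8-byte buffer per thread

def scratch_buffer():
    """Return this thread's 8-byte scratch area, creating it on first use."""
    buf = getattr(_scratch, "buf", None)
    if buf is None:
        buf = _scratch.buf = bytearray(8)
    return buf

def double_to_bits(value):
    """Store a double into the scratch area, then load it back as an integer."""
    buf = scratch_buffer()
    struct.pack_into("<d", buf, 0, value)
    return struct.unpack_from("<Q", buf, 0)[0]

def bits_to_double(bits):
    """Store a 64-bit integer into the scratch area, then load it back as a double."""
    buf = scratch_buffer()
    struct.pack_into("<Q", buf, 0, bits)
    return struct.unpack_from("<d", buf, 0)[0]
```

A single shared buffer would be simpler, but two threads doing the store/load pair at the same time could then read each other's value, which is the complication mentioned above.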
Comment 49 Allan Sandfeld Jensen 2013-04-12 05:26:05 PDT
Created attachment 197748 [details]
Patch

Newest patch with ffree only at exit and function calls
Comment 50 Allan Sandfeld Jensen 2013-04-12 05:26:58 PDT
(In reply to comment #48)
> (In reply to comment #46)
> > I kind of prefer to not use the red zone just because the risk is often not worth it, but I think you're totally safe here - the red zone ought to always be at least 8 bytes (it's usually much more than that!) and we only have to worry about what happens on X86 platforms.  I'm fine with it.
> 
> An alternative could be to have 8 bytes allocated somewhere and use that as the memory temporary, but it would probably need to be thread-local, which would make it complicated.

I actually would prefer this solution, but I haven't figured out how to best do that in llint. Maybe you could change that in a later patch?
Comment 51 Filip Pizlo 2013-04-12 10:30:05 PDT
Performance when all of the JITs are enabled:



Benchmark report for SunSpider, V8Spider, Octane, Kraken, and JSRegress on bigmac (MacPro5,1).

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/quartary/OpenSource/WebKitBuild/Release/jsc (r148221)
"X87" at /Volumes/Data/pizlo/secondary/OpenSource/WebKitBuild/Release/jsc (r148221)

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Emitted a call to gc() between sample
measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get
microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds.
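
The verdict column in these reports can be reproduced from the two mean+-interval pairs. The sketch below is inferred from rows in this report, not taken from the benchmark harness: it assumes "definitely" means the two 95% confidence intervals do not overlap, "might be" means they do, and the factor is the ratio of the means. The function name `compare` is hypothetical, and the harness appears to omit the text for very small "might be" ratios on individual rows, which this sketch does not model.

```python
def compare(base_mean, base_ci, new_mean, new_ci):
    """Classify a timing change (lower is better) from two mean+-CI pairs."""
    base_lo, base_hi = base_mean - base_ci, base_mean + base_ci
    new_lo, new_hi = new_mean - new_ci, new_mean + new_ci
    # Two closed intervals overlap when each starts before the other ends.
    overlap = base_lo <= new_hi and new_lo <= base_hi
    slower = new_mean > base_mean
    ratio = new_mean / base_mean if slower else base_mean / new_mean
    certainty = "might be" if overlap else "definitely"
    direction = "slower" if slower else "faster"
    return f"{certainty} {ratio:.4f}x {direction}"

# Kraken audio-dft row: 375.629+-0.863 vs 372.211+-1.117
print(compare(375.629, 0.863, 372.211, 1.117))  # definitely 1.0092x faster
```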

                                                     TipOfTree                    X87                                        
SunSpider:
   3d-cube                                         8.4713+-0.1406     ?      8.5584+-0.1313        ? might be 1.0103x slower
   3d-morph                                       11.5874+-0.0879     ?     11.6877+-0.1872        ?
   3d-raytrace                                    10.9103+-0.2066     ?     11.2307+-0.2053        ? might be 1.0294x slower
   access-binary-trees                             1.9041+-0.0309            1.8956+-0.0330        
   access-fannkuch                                 9.2455+-0.0914            9.2301+-0.1090        
   access-nbody                                    6.2713+-0.0893     ?      6.3165+-0.0863        ?
   access-nsieve                                   4.4406+-0.0506     ?      4.4999+-0.0658        ? might be 1.0133x slower
   bitops-3bit-bits-in-byte                        1.5981+-0.0122            1.5972+-0.0125        
   bitops-bits-in-byte                             5.7304+-0.0618            5.6920+-0.0716        
   bitops-bitwise-and                              1.8656+-0.0575            1.8476+-0.0357        
   bitops-nsieve-bits                              4.9271+-0.0586     ?      4.9553+-0.0548        ?
   controlflow-recursive                           2.9815+-0.0083     ?      2.9883+-0.0070        ?
   crypto-aes                                      7.9819+-0.0873     ?      8.0065+-0.0879        ?
   crypto-md5                                      4.1950+-0.0987     ?      4.2794+-0.0897        ? might be 1.0201x slower
   crypto-sha1                                     3.2086+-0.0632     ?      3.2259+-0.0615        ?
   date-format-tofte                              14.1293+-0.2135           14.1269+-0.2324        
   date-format-xparb                               9.3211+-0.1630            9.1489+-0.1612          might be 1.0188x faster
   math-cordic                                     3.6830+-0.0249            3.6790+-0.0237        
   math-partial-sums                              11.8966+-0.1070           11.8268+-0.1286        
   math-spectral-norm                              2.8458+-0.0626            2.8215+-0.0532        
   regexp-dna                                      8.8680+-0.1724     ?      9.0054+-0.1928        ? might be 1.0155x slower
   string-base64                                   4.3851+-0.0627            4.3769+-0.0672        
   string-fasta                                   11.0923+-0.1067           11.0117+-0.1260        
   string-tagcloud                                13.6231+-0.1474     ?     13.6900+-0.1933        ?
   string-unpack-code                             25.2536+-0.1884           25.2060+-0.2024        
   string-validate-input                           7.7005+-0.2790            7.6735+-0.3608        

   <arithmetic> *                                  7.6199+-0.0559     ?      7.6376+-0.0644        ? might be 1.0023x slower
   <geometric>                                     6.1017+-0.0483     ?      6.1140+-0.0541        ? might be 1.0020x slower
   <harmonic>                                      4.7920+-0.0367     ?      4.7948+-0.0405        ? might be 1.0006x slower

                                                     TipOfTree                    X87                                        
V8Spider:
   crypto                                         91.5816+-0.4602           91.4182+-0.4677        
   deltablue                                     118.4197+-0.5212     ?    118.8000+-0.5584        ?
   earley-boyer                                   75.8704+-0.4073           75.8014+-0.4896        
   raytrace                                       56.3275+-0.2092           56.1597+-0.2468        
   regexp                                         90.5604+-0.4672           90.2170+-0.4606        
   richards                                      121.4483+-0.9724          121.0089+-0.6860        
   splay                                          53.0720+-0.3530     ?     53.8372+-0.6719        ? might be 1.0144x slower

   <arithmetic>                                   86.7543+-0.3187           86.7489+-0.3281          might be 1.0001x faster
   <geometric> *                                  82.9621+-0.3022     ?     83.0132+-0.3201        ? might be 1.0006x slower
   <harmonic>                                     79.1680+-0.2917     ?     79.2885+-0.3291        ? might be 1.0015x slower

                                                     TipOfTree                    X87                                        
Octane and V8v7:
   encrypt                                        0.48404+-0.00108          0.48315+-0.00049       
   decrypt                                        8.89867+-0.00585    ?     8.90637+-0.01130       ?
   deltablue                             x2       0.56227+-0.00159    ?     0.56709+-0.00567       ?
   earley                                         0.92620+-0.00472          0.91757+-0.00493       
   boyer                                         13.18712+-0.04556    ?    13.23834+-0.05002       ?
   raytrace                              x2       4.37337+-0.00453          4.36113+-0.01516       
   regexp                                x2      27.35892+-0.07119         27.35433+-0.09016       
   richards                              x2       0.31841+-0.00105          0.31696+-0.00086       
   splay                                 x2       0.74958+-0.02239          0.74427+-0.01496       
   navier-stokes                         x2       9.40497+-0.01926          9.38817+-0.00985       
   closure                                        0.32011+-0.00899    ?     0.32026+-0.00915       ?
   jquery                                         3.89066+-0.50684          3.87249+-0.49892       
   gbemu                                 x2     135.58479+-0.72822        134.98093+-0.68585       
   box2d                                 x2      32.55236+-0.05705    ?    32.71983+-0.38062       ?

V8v7:
   <arithmetic>                                   6.81444+-0.01240          6.81309+-0.01427         might be 1.0002x faster
   <geometric> *                                  2.39825+-0.00861          2.39503+-0.00717         might be 1.0013x faster
   <harmonic>                                     0.96455+-0.00413          0.96277+-0.00413         might be 1.0019x faster

Octane including V8v7:
   <arithmetic>                                  20.43255+-0.06356         20.39107+-0.04657         might be 1.0020x faster
   <geometric> *                                  4.08882+-0.03090          4.08436+-0.03230         might be 1.0011x faster
   <harmonic>                                     1.09676+-0.00755          1.09514+-0.00822         might be 1.0015x faster

                                                     TipOfTree                    X87                                        
Kraken:
   ai-astar                                       469.424+-3.803            465.934+-4.717         
   audio-beat-detection                           273.807+-0.886      ?     275.288+-0.958         ?
   audio-dft                                      375.629+-0.863      ^     372.211+-1.117         ^ definitely 1.0092x faster
   audio-fft                                      137.547+-0.156      ?     137.770+-0.457         ?
   audio-oscillator                               299.070+-0.835      ?     299.699+-0.754         ?
   imaging-darkroom                               339.435+-0.786      ?     339.473+-1.241         ?
   imaging-desaturate                             137.537+-0.814            136.756+-0.308         
   imaging-gaussian-blur                          417.133+-0.117      ?     417.337+-0.319         ?
   json-parse-financial                            78.675+-0.389      ^      75.126+-0.273         ^ definitely 1.0472x faster
   json-stringify-tinderbox                       106.582+-0.297            106.288+-0.535         
   stanford-crypto-aes                            101.409+-0.327      ^     100.226+-0.320         ^ definitely 1.0118x faster
   stanford-crypto-ccm                            102.356+-1.681            100.923+-1.711           might be 1.0142x faster
   stanford-crypto-pbkdf2                         264.290+-3.688            262.732+-2.170         
   stanford-crypto-sha256-iterative               110.168+-0.377            110.023+-0.254         

   <arithmetic> *                                 229.504+-0.574            228.556+-0.443           might be 1.0041x faster
   <geometric>                                    192.785+-0.518      ^     191.488+-0.392         ^ definitely 1.0068x faster
   <harmonic>                                     162.290+-0.506      ^     160.508+-0.452         ^ definitely 1.0111x faster

                                                     TipOfTree                    X87                                        
JSRegress:
   adapt-to-double-divide                         18.6620+-0.1528     ?     18.9026+-0.2134        ? might be 1.0129x slower
   aliased-arguments-getbyval                      0.8870+-0.0085            0.8825+-0.0082        
   allocate-big-object                             2.0904+-0.0372            2.0904+-0.0343        
   arity-mismatch-inlining                         0.6860+-0.0082            0.6826+-0.0083        
   array-access-polymorphic-structure              6.6670+-0.1333     ?      6.6884+-0.1328        ?
   array-with-double-add                           5.5231+-0.0562            5.4432+-0.0625          might be 1.0147x faster
   array-with-double-increment                     4.1827+-0.0372     ?      4.1924+-0.0190        ?
   array-with-double-mul-add                       6.8065+-0.0964            6.7474+-0.0651        
   array-with-double-sum                           7.1997+-0.0942     ?      7.3575+-0.0834        ? might be 1.0219x slower
   array-with-int32-add-sub                       11.7241+-0.0699     ?     11.8291+-0.1340        ?
   array-with-int32-or-double-sum                  7.3780+-0.0417            7.2711+-0.0818          might be 1.0147x faster
   big-int-mul                                     4.9054+-0.0568            4.8555+-0.0631          might be 1.0103x faster
   boolean-test                                    3.9151+-0.0052     !      3.9461+-0.0170        ! definitely 1.0079x slower
   cast-int-to-double                             20.0697+-0.1025     ?     20.2041+-0.1600        ?
   cell-argument                                  12.1218+-0.1381     ?     12.1944+-0.1302        ?
   cfg-simplify                                    2.8944+-0.0162     ?      2.8947+-0.0115        ?
   cmpeq-obj-to-obj-other                         11.3485+-0.3009     ?     11.5544+-0.1935        ? might be 1.0181x slower
   constant-test                                   7.5989+-0.0821     ?      7.6972+-0.1676        ? might be 1.0129x slower
   direct-arguments-getbyval                       0.8200+-0.0112            0.8105+-0.0076          might be 1.0117x faster
   double-pollution-getbyval                       9.0295+-0.1284     ?      9.0451+-0.1121        ?
   double-pollution-putbyoffset                    6.3907+-0.1157            6.3683+-0.0735        
   empty-string-plus-int                          10.2419+-0.2578           10.1390+-0.1791          might be 1.0101x faster
   external-arguments-getbyval                     2.1421+-0.0110            2.1319+-0.0120        
   external-arguments-putbyval                     5.1082+-0.0681            5.0393+-0.0451          might be 1.0137x faster
   Float32Array-matrix-mult                       12.3590+-0.1681     ?     12.4051+-0.2283        ?
   fold-double-to-int                             23.1428+-0.2870           22.8783+-0.2126          might be 1.0116x faster
   function-dot-apply                              2.8377+-0.0032     ?      2.8384+-0.0041        ?
   function-test                                   5.5094+-0.1015     ?      5.5402+-0.0929        ?
   get-by-id-chain-from-try-block                  6.1146+-0.0897            6.0519+-0.0668          might be 1.0104x faster
   HashMap-put-get-iterate-keys                   95.8641+-1.2456           95.8528+-1.0248        
   HashMap-put-get-iterate                        97.4573+-0.9539           97.0483+-0.6366        
   HashMap-string-put-get-iterate                 73.3810+-1.5786           71.4995+-0.7505          might be 1.0263x faster
   indexed-properties-in-objects                   4.9999+-0.0537            4.9789+-0.0661        
   inline-arguments-access                         1.0655+-0.0090     ?      1.0659+-0.0076        ?
   inline-arguments-local-escape                  23.9774+-0.1634           23.8001+-0.2629        
   inline-get-scoped-var                           7.7726+-0.0888     ?      7.7958+-0.0971        ?
   inlined-put-by-id-transition                   13.7203+-0.1477           13.3768+-0.2447          might be 1.0257x faster
   int-or-other-abs-then-get-by-val                7.6878+-0.1108            7.6678+-0.0494        
   int-or-other-abs-zero-then-get-by-val          40.3163+-1.0903           39.5318+-0.6394          might be 1.0198x faster
   int-or-other-add-then-get-by-val                9.5297+-0.0970     ?      9.5403+-0.1115        ?
   int-or-other-add                                9.6070+-0.0956     ?      9.6808+-0.1389        ?
   int-or-other-div-then-get-by-val               13.7068+-0.2952           13.6847+-0.2809        
   int-or-other-max-then-get-by-val                8.9716+-0.2515            8.8679+-0.2627          might be 1.0117x faster
   int-or-other-min-then-get-by-val                6.9859+-0.0964            6.9514+-0.0800        
   int-or-other-mod-then-get-by-val                6.5939+-0.0817            6.5403+-0.0918        
   int-or-other-mul-then-get-by-val                6.1754+-0.0673     ?      6.2010+-0.0725        ?
   int-or-other-neg-then-get-by-val                7.0743+-0.0616     ?      7.0794+-0.0886        ?
   int-or-other-neg-zero-then-get-by-val          40.1960+-0.8963           40.1210+-0.8703        
   int-or-other-sub-then-get-by-val                9.7823+-0.1034     ?      9.8527+-0.1058        ?
   int-or-other-sub                                7.4765+-0.0990            7.4220+-0.0832        
   int-overflow-local                             12.3823+-0.1342           12.2641+-0.1071        
   Int16Array-bubble-sort                         47.9325+-0.1673     ?     48.0858+-0.4122        ?
   Int16Array-load-int-mul                         1.6472+-0.0049            1.6449+-0.0057        
   Int8Array-load                                  4.0870+-0.0338     ?      4.1175+-0.0344        ?
   integer-divide                                 14.1285+-0.1962           13.9381+-0.0705          might be 1.0137x faster
   integer-modulo                                  2.2381+-0.0346            2.2331+-0.0256        
   make-indexed-storage                            3.9175+-0.0597            3.8699+-0.0355          might be 1.0123x faster
   method-on-number                               26.2468+-0.3698           25.7263+-0.6429          might be 1.0202x faster
   nested-function-parsing-random                360.3527+-7.5504     ?    364.3197+-7.7778        ? might be 1.0110x slower
   nested-function-parsing                        43.8675+-1.3831           43.8332+-1.3596        
   new-array-buffer-dead                           3.1047+-0.0234            3.0978+-0.0216        
   new-array-buffer-push                           8.9484+-0.2113            8.9037+-0.0935        
   new-array-dead                                 23.6217+-0.1106           23.5961+-0.0979        
   new-array-push                                  8.1822+-0.8248            7.2123+-0.8327          might be 1.1345x faster
   number-test                                     3.9190+-0.0495     ?      3.9740+-0.0245        ? might be 1.0140x slower
   object-closure-call                             7.4267+-0.0850            7.4054+-0.0933        
   object-test                                     5.2685+-0.0856     ?      5.3046+-0.0544        ?
   poly-stricteq                                 125.0701+-0.2014     ?    125.0846+-0.6873        ?
   polymorphic-structure                          20.9994+-0.1814     ?     21.1096+-0.1608        ?
   polyvariant-monomorphic-get-by-id              10.8186+-0.6052     ?     11.3207+-1.0194        ? might be 1.0464x slower
   rare-osr-exit-on-local                         17.4148+-0.1382           17.3323+-0.1217        
   register-pressure-from-osr                     39.7328+-0.2049     ?     39.8345+-0.2256        ?
   simple-activation-demo                         32.7903+-0.1577     ?     32.8377+-0.1830        ?
   slow-array-profile-convergence                  4.9651+-0.0312            4.9423+-0.0428        
   slow-convergence                                3.4965+-0.0096     ?      3.5097+-0.0091        ?
   sparse-conditional                              1.0508+-0.0086            1.0503+-0.0082        
   splice-to-remove                               81.1577+-0.6335     ?     81.3371+-0.7318        ?
   string-concat-object                            2.6037+-0.0571     ?      2.6748+-0.0537        ? might be 1.0273x slower
   string-concat-pair-object                       1.8310+-0.0784            1.7893+-0.0310          might be 1.0233x faster
   string-concat-pair-simple                       9.8376+-0.1170            9.8214+-0.0899        
   string-concat-simple                           24.0914+-0.3094           23.8376+-0.1993          might be 1.0106x faster
   string-cons-repeat                              8.1876+-0.1630            8.0398+-0.1312          might be 1.0184x faster
   string-cons-tower                               7.7395+-0.0931     ?      7.7585+-0.0989        ?
   string-equality                               115.1079+-1.8495     ?    116.4246+-1.3613        ? might be 1.0114x slower
   string-hash                                     2.6314+-0.0075     ?      2.6347+-0.0099        ?
   string-repeat-arith                            95.7668+-0.5033           95.1092+-0.4644        
   string-sub                                    170.1028+-0.5544     ?    170.1274+-0.8092        ?
   string-test                                     4.0592+-0.0092     ?      4.0662+-0.0061        ?
   structure-hoist-over-transitions                2.6915+-0.0275     ?      2.7018+-0.0319        ?
   tear-off-arguments-simple                       1.7485+-0.0082            1.7456+-0.0074        
   tear-off-arguments                              3.2541+-0.0079            3.2530+-0.0092        
   temporal-structure                             20.8301+-0.1698           20.8226+-0.1213        
   to-int32-boolean                               27.8484+-0.1490     ?     27.9343+-0.1371        ?
   undefined-test                                  4.0781+-0.0411            4.0312+-0.0597          might be 1.0116x faster

   <arithmetic>                                   22.6370+-0.0655     ?     22.6372+-0.0562        ? might be 1.0000x slower
   <geometric> *                                   9.2036+-0.0370            9.1799+-0.0368          might be 1.0026x faster
   <harmonic>                                      4.8231+-0.0272            4.8092+-0.0270          might be 1.0029x faster

                                                     TipOfTree                    X87                                        
All benchmarks:
   <arithmetic>                                   40.4653+-0.0889           40.3810+-0.0754          might be 1.0021x faster
   <geometric>                                    11.0256+-0.0478           11.0050+-0.0504          might be 1.0019x faster
   <harmonic>                                      3.6101+-0.0166            3.6034+-0.0198          might be 1.0019x faster

                                                     TipOfTree                    X87                                        
Geomean of preferred means:
   <scaled-result>                                22.2551+-0.0976           22.2334+-0.1039          might be 1.0010x faster
Comment 52 Allan Sandfeld Jensen 2013-04-14 03:15:31 PDT
(In reply to comment #51)
> Performance when all of the JITs are enabled:
> 
That looks very reasonable. One or two percent up or down.
Comment 53 Allan Sandfeld Jensen 2013-04-16 04:55:08 PDT
Does your old r+ still stand or do you want to test the new patch some more?
Comment 54 Filip Pizlo 2013-04-18 17:53:41 PDT
(In reply to comment #53)
> Does your old r+ still stand or do you want to test the new patch some more?

Sorry for the delay!  I was away since Friday.  I just ran tests on your patch and see these new failures:

  sputnik/Conformance/08_Types/8.5_The_Number_Type/S8.5_A2.1.html [ Failure ]
  sputnik/Conformance/08_Types/8.5_The_Number_Type/S8.5_A2.2.html [ Failure ]

Do you see these failures also, or are they Mac-only?  I don't get these failures on trunk, but I do see them when I apply your patch.
Comment 55 Filip Pizlo 2013-04-18 17:55:23 PDT
(In reply to comment #54)
> (In reply to comment #53)
> > Does your old r+ still stand or do you want to test the new patch some more?
> 
> Sorry for the delay!  I was away since Friday.  I just ran tests on your patch and see these new failures:
> 
>   sputnik/Conformance/08_Types/8.5_The_Number_Type/S8.5_A2.1.html [ Failure ]
>   sputnik/Conformance/08_Types/8.5_The_Number_Type/S8.5_A2.2.html [ Failure ]
> 
> Do you see these failures also, or are they Mac-only?  I don't get these failures on trunk, but I do see them when I apply your patch.

For example:


[pizlo@bigmac OpenSource] DYLD_FRAMEWORK_PATH=WebKitBuild/Debug/ WebKitBuild/Debug/DumpRenderTree LayoutTests/sputnik/Conformance/08_Types/8.5_The_Number_Type/S8.5_A2.1.html
Content-Type: text/plain
DumpMalloc: 0
S8.5_A2.1

FAIL SputnikError: #1: var x = 9007199254740994.0; var y = 1.0 - 1/65536.0; var z = x + y; var d = z - x; d === 0. Actual: 2

TEST COMPLETE

#EOF

(i.e. the test was expecting 'd' to be zero but it ended up being 2.)
Comment 56 Filip Pizlo 2013-04-18 17:56:01 PDT
Comment on attachment 197748 [details]
Patch

I think this needs a bit more love.  We should figure out why it causes Sputnik regressions.
Comment 57 Allan Sandfeld Jensen 2013-04-19 05:39:48 PDT
(In reply to comment #56)
> (From update of attachment 197748 [details])
> I think this needs a bit more love.  We should figure out why it causes Sputnik regressions.

It is due to increased precision. Those two tests test that we lose precision.
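
To make the precision argument concrete, here is a minimal sketch that replays the S8.5_A2.1 values from comment #55 with exact rational arithmetic. The helper `round_to_bits` is hypothetical and only models significand rounding (ties to even), ignoring exponent range. It assumes the x87 result is first rounded to a 64-bit significand and then rounded again to 53 bits when stored as a double.

```python
from fractions import Fraction

def round_to_bits(value: Fraction, bits: int) -> Fraction:
    """Round a positive rational to `bits` significant bits, ties to even."""
    # Find e such that 2**e <= value < 2**(e + 1).
    e = value.numerator.bit_length() - value.denominator.bit_length()
    if Fraction(2) ** e > value:
        e -= 1
    ulp = Fraction(2) ** (e - bits + 1)  # spacing of representable values near `value`
    return round(value / ulp) * ulp      # round() on a Fraction rounds half to even

x = Fraction(9007199254740994)  # 2**53 + 2, exactly representable as a double
y = 1 - Fraction(1, 65536)      # also exactly representable as a double
exact = x + y

# One rounding straight to a 53-bit significand (SSE2-style double arithmetic).
d_double = round_to_bits(exact, 53) - x

# Two roundings: 64-bit significand first (x87 extended), then 53 bits on store.
d_x87 = round_to_bits(round_to_bits(exact, 64), 53) - x

print(d_double, d_x87)  # prints "0 2"
```

The single rounding gives d = 0, which is what the test expects. The double rounding first lands on 2^53 + 3, which is a tie at 53 bits and rounds to the even neighbour 2^53 + 4, giving the d = 2 reported in the failure.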
Comment 58 Allan Sandfeld Jensen 2013-04-19 05:41:57 PDT
Created attachment 198846 [details]
Patch

Set the x87 precision to 64-bit (double), since Linux defaults it to 80-bit (extended).
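
For readers unfamiliar with the setting, the sketch below illustrates the precision-control field of the x87 control word as plain bit arithmetic. It does not reproduce the patch and does not touch the real FPU. The constant and function names are made up for illustration; the bit values follow the documented x87 control word layout (bits 8-9, with 0x037F as the initial value).

```python
# Precision-control (PC) field of the x87 FPU control word, bits 8-9.
PC_MASK = 0x0300
PC_SINGLE = 0x0000    # 24-bit significand
PC_DOUBLE = 0x0200    # 53-bit significand (IEEE double)
PC_EXTENDED = 0x0300  # 64-bit significand (80-bit extended format)

X87_INITIAL_CW = 0x037F  # control word after FINIT: extended precision, exceptions masked

def significand_bits(control_word: int) -> int:
    """Return the significand width selected by the PC field."""
    widths = {PC_SINGLE: 24, PC_DOUBLE: 53, PC_EXTENDED: 64}
    return widths[control_word & PC_MASK]  # 0x0100 is reserved and raises KeyError

def with_double_precision(control_word: int) -> int:
    """Return the control word with PC set to double, other bits unchanged."""
    return (control_word & ~PC_MASK) | PC_DOUBLE

cw = with_double_precision(X87_INITIAL_CW)
print(hex(cw), significand_bits(X87_INITIAL_CW), significand_bits(cw))  # prints "0x27f 64 53"
```

Precision control only narrows the significand of results; the exponent range of the x87 registers stays extended, so this setting is not a complete emulation of IEEE double arithmetic.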
Comment 59 Filip Pizlo 2013-04-19 13:24:22 PDT
Comment on attachment 198846 [details]
Patch

r=me!  This fixes Sputnik.
Comment 60 WebKit Commit Bot 2013-04-20 03:26:19 PDT
Comment on attachment 198846 [details]
Patch

Clearing flags on attachment: 198846

Committed r148790: <http://trac.webkit.org/changeset/148790>
Comment 61 WebKit Commit Bot 2013-04-20 03:26:25 PDT
All reviewed patches have been landed.  Closing bug.
Comment 62 Csaba Osztrogonác 2013-04-21 03:33:46 PDT
(In reply to comment #60)
> (From update of attachment 198846 [details])
> Clearing flags on attachment: 198846
> 
> Committed r148790: <http://trac.webkit.org/changeset/148790>

FYI, it broke some tests on 32 bit: http://build.webkit.sed.hu/builders/x86-32%20Linux%20Qt%20Release%20NRWT/builds/32135

Could you check and fix it, please?
Comment 63 Allan Sandfeld Jensen 2013-04-21 05:40:40 PDT
(In reply to comment #62)
> (In reply to comment #60)
> > (From update of attachment 198846 [details])
> > Clearing flags on attachment: 198846
> > 
> > Committed r148790: <http://trac.webkit.org/changeset/148790>
> 
> FYI, it broke some tests on 32 bit: http://build.webkit.sed.hu/builders/x86-32%20Linux%20Qt%20Release%20NRWT/builds/32135
> 
> Could you check and fix it, please?

Of course. For reference it appears to be 5 canvas tests and two sputnik tests:
  fast/canvas/canvas-arc-360-winding.html [ Failure ]
  fast/canvas/canvas-fillPath-alpha-shadow.html [ Failure ]
  fast/canvas/canvas-fillPath-gradient-shadow.html [ Failure ]
  fast/canvas/canvas-fillPath-pattern-shadow.html [ Failure ]
  fast/canvas/canvas-strokePath-alpha-shadow.html [ Failure ]
  sputnik/Conformance/15_Native_Objects/15.8_Math/15.8.2/15.8.2.13_pow/S15.8.2.13_A24.html [ Failure ]
  sputnik/Conformance/15_Native_Objects/15.8_Math/15.8.2/15.8.2.8_exp/S15.8.2.8_A6.html [ Failure ]
Comment 64 Allan Sandfeld Jensen 2013-04-21 06:27:04 PDT
(In reply to comment #63)
> (In reply to comment #62)
> > (In reply to comment #60)
> > > (From update of attachment 198846 [details])
> > > Clearing flags on attachment: 198846
> > > 
> > > Committed r148790: <http://trac.webkit.org/changeset/148790>
> > 
> > FYI, it broke some tests on 32 bit: http://build.webkit.sed.hu/builders/x86-32%20Linux%20Qt%20Release%20NRWT/builds/32135
> > 
> > Could you check and fix it, please?
> 
> Of course. For reference it appears to be 5 canvas tests and two sputnik tests:
>   fast/canvas/canvas-arc-360-winding.html [ Failure ]
>   fast/canvas/canvas-fillPath-alpha-shadow.html [ Failure ]
>   fast/canvas/canvas-fillPath-gradient-shadow.html [ Failure ]
>   fast/canvas/canvas-fillPath-pattern-shadow.html [ Failure ]
>   fast/canvas/canvas-strokePath-alpha-shadow.html [ Failure ]
>   sputnik/Conformance/15_Native_Objects/15.8_Math/15.8.2/15.8.2.13_pow/S15.8.2.13_A24.html [ Failure ]
>   sputnik/Conformance/15_Native_Objects/15.8_Math/15.8.2/15.8.2.8_exp/S15.8.2.8_A6.html [ Failure ]

I have solved these before. When hand-merging the last patch I forgot to include one more change. The bcd2i instruction has to use the fcomip comparison instead of fucomip, since fucomip specifically does not report an invalid comparison when either value is +/- infinity, which means infinity may get wrongly converted to an integer.
Comment 65 Allan Sandfeld Jensen 2013-04-21 06:52:11 PDT
(In reply to comment #64)
> I have solved these before. When hand-merging the last patch I forgot to include one more change. The bcd2i instruction has to use the fcomip comparison instead of fucomip, since fucomip specifically does not report an invalid comparison when either value is +/- infinity, which means infinity may get wrongly converted to an integer.

Never mind. That makes no sense. On Monday I will figure out which change in my debug tree makes the difference.
Comment 66 Jan 2013-04-23 04:36:51 PDT
The fix did land in QtWebKit 2.3.1, right? Unfortunately, I still get illegal instructions, although it seems to be working for most sites now. I can reproduce it by running http://octane-benchmark.googlecode.com/svn/latest/index.html. It crashes at "Splay - Memory & GC". It could take me some time to get a debug build if one is needed to find the culprit for this one.
Comment 67 Allan Sandfeld Jensen 2013-04-23 04:47:47 PDT
(In reply to comment #66)
> The fix did land in QtWebKit 2.3.1, right? Unfortunately, I still get illegal instructions, although it seems to be working for most sites now. I can reproduce it by running http://octane-benchmark.googlecode.com/svn/latest/index.html. It crashes at "Splay - Memory & GC". It could take me some time to get a debug build if one is needed to find the culprit for this one.

This patch is not in QtWebKit 2.3.1. I just disabled LLInt in 2.3.1 if GCC does not find the __SSE2__ define.
Comment 68 Jan 2013-04-23 04:55:23 PDT
(In reply to comment #67)
> This patch is not in QtWebKit 2.3.1. I just disabled LLInt in 2.3.1 if GCC does not find the __SSE2__ define.

So, I'm hitting another bug for which I should file a new bug?
Comment 69 Allan Sandfeld Jensen 2013-04-26 08:02:20 PDT
Reclose
Comment 70 Allan Sandfeld Jensen 2013-06-07 08:26:04 PDT
There is a remaining case where the stack was not reset right. Filip, have you had a chance to take a look at bug 148790 ?
Comment 71 Allan Sandfeld Jensen 2013-06-07 08:26:51 PDT
(In reply to comment #70)
> There is a remaining case where the stack was not reset right. Filip, have you had a chance to take a look at bug 148790 ?

Sorry, bug 114913 of course.
Comment 72 Gauvain Pocentek 2013-07-26 06:42:13 PDT
Hello,

I'm experiencing this problem with QtWebKit 2.3.2. The build is based on the Ubuntu packaging, with these build options:
 ./Tools/Scripts/build-webkit --qt DEFINES+=ENABLE_JIT=0 DEFINES+=ENABLE_YARR_JIT=0 DEFINES+=ENABLE_ASSEMBLER=0 --no-force-sse2

The crash happens on an AMD Geode CPU, with an "Illegal Instruction". The backtrace is similar to the arora trace provided in comment #2.

I've tested with a personal application and with arora.

Maybe the build options are not good, but it looks like the bug is still there in this configuration.

Let me know if you need more information.
Comment 73 Allan Sandfeld Jensen 2013-07-26 06:58:02 PDT
(In reply to comment #72)
> Hello,
> 
> I'm experiencing this problem with QtWebKit 2.3.2. The build is based on the Ubuntu packaging, with these build options:
>  ./Tools/Scripts/build-webkit --qt DEFINES+=ENABLE_JIT=0 DEFINES+=ENABLE_YARR_JIT=0 DEFINES+=ENABLE_ASSEMBLER=0 --no-force-sse2
> 
> The crash happens on an AMD Geode CPU, with an "Illegal Instruction". The backtrace is similar to the arora trace provided in comment #2.
> 
> I've tested with a personal application and with arora.
> 
> Maybe the build options are not good, but it looks like the bug is still there in this configuration.
> 
> Let me know if you need more information.

That sounds like a different bug. This particular codepath in llint does not get used if you disable JIT.
Comment 74 Gauvain Pocentek 2013-07-26 07:09:40 PDT
OK. I'm going to test a build with JIT enabled and see if it helps.

I'll report another bug if needed.

Thanks.