Bug 114852

Summary: Whenever it is cheap and non-invasive, SunSpider tests should validate their results to ensure that the browser runs them correctly
Product: WebKit Reporter: Filip Pizlo <fpizlo>
Component: Tools / Tests Assignee: Filip Pizlo <fpizlo>
Status: RESOLVED FIXED    
Severity: Normal CC: barraclough, commit-queue, ggaren, mark.lam, mhahnenberg, msaboff, oliver, rniwa, sam
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Attachments:
Description         Flags
work in progress    none
the patch           none
the patch           ggaren: review+

Description Filip Pizlo 2013-04-18 22:26:11 PDT
Patch forthcoming.
Comment 1 Filip Pizlo 2013-04-18 22:29:26 PDT
Created attachment 198802 [details]
work in progress

Not yet ready for review.
Comment 2 Filip Pizlo 2013-04-18 22:43:11 PDT
Example output when a browser fails tests (in this case FROM is Chrome, which apparently does math differently than everyone else):

TEST                   COMPARISON               FROM                 TO             DETAILS

===============================================================================

** TOTAL **:           ??                      ERROR        134.8ms +/- 1.2%      invalid runs detected

===============================================================================

  3d:                  ??                      ERROR         19.7ms +/- 4.5%      invalid runs detected
    cube:              1.56x as fast     10.9ms +/- 36.8%     7.0ms +/- 11.8%     significant
    morph:             ??                      ERROR          5.6ms +/- 8.9%      invalid runs detected
    raytrace:          1.32x as fast      9.4ms +/- 8.2%      7.1ms +/- 3.2%      significant

  access:              1.59x as fast     19.1ms +/- 23.4%    12.0ms +/- 4.9%      significant
    binary-trees:      -                  2.3ms +/- 106.7%     1.1ms +/- 20.5% 
    fannkuch:          1.70x as fast      8.5ms +/- 18.7%     5.0ms +/- 0.0%      significant
    nbody:             -                  5.3ms +/- 70.2%     3.0ms +/- 11.2% 
    nsieve:            -                  3.0ms +/- 50.2%     2.9ms +/- 14.0% 

  bitops:              1.96x as fast     14.5ms +/- 24.6%     7.4ms +/- 6.8%      significant
    3bit-bits-in-byte: ??                 0.9ms +/- 25.1%     1.1ms +/- 20.5%     not conclusive: might be *1.22x as slow*
    bits-in-byte:      2.00x as fast      3.6ms +/- 13.9%     1.8ms +/- 16.7%     significant
    bitwise-and:       1.50x as fast      2.4ms +/- 15.4%     1.6ms +/- 23.1%     significant
    nsieve-bits:       2.62x as fast      7.6ms +/- 49.2%     2.9ms +/- 14.0%     significant

  controlflow:         -                  1.7ms +/- 20.3%     1.7ms +/- 20.3% 
    recursive:         -                  1.7ms +/- 20.3%     1.7ms +/- 20.3% 

  crypto:              1.46x as fast     15.0ms +/- 7.8%     10.3ms +/- 4.7%      significant
    aes:               -                  5.7ms +/- 13.3%     5.2ms +/- 5.8%  
    md5:               1.48x as fast      4.3ms +/- 17.6%     2.9ms +/- 7.8%      significant
    sha1:              2.27x as fast      5.0ms +/- 21.3%     2.2ms +/- 13.7%     significant

  date:                -                 22.0ms +/- 24.9%    19.0ms +/- 5.0%  
    format-tofte:      ??                 9.6ms +/- 35.0%    10.8ms +/- 2.8%      not conclusive: might be *1.125x as slow*
    format-xparb:      1.51x as fast     12.4ms +/- 33.5%     8.2ms +/- 10.7%     significant

  math:                ??                      ERROR         10.8ms +/- 5.2%      invalid runs detected
    cordic:            -                  3.2ms +/- 17.6%     2.7ms +/- 17.9% 
    partial-sums:      ??                      ERROR          6.1ms +/- 3.7%      invalid runs detected
    spectral-norm:     -                  3.3ms +/- 63.0%     2.0ms +/- 0.0%  

  regexp:              ??                 6.2ms +/- 4.9%      6.4ms +/- 5.8%      not conclusive: might be *1.032x as slow*
    dna:               ??                 6.2ms +/- 4.9%      6.4ms +/- 5.8%      not conclusive: might be *1.032x as slow*

  string:              1.175x as fast    55.8ms +/- 9.9%     47.5ms +/- 1.1%      significant
    base64:            ??                 3.4ms +/- 14.7%     3.6ms +/- 10.3%     not conclusive: might be *1.059x as slow*
    fasta:             *1.91x as slow*    5.3ms +/- 12.8%    10.1ms +/- 2.2%      significant
    tagcloud:          1.74x as fast     16.9ms +/- 6.1%      9.7ms +/- 3.6%      significant
    unpack-code:       -                 21.8ms +/- 13.1%    19.2ms +/- 3.8%  
    validate-input:    -                  8.4ms +/- 46.8%     4.9ms +/- 4.6%
Comment 3 Filip Pizlo 2013-04-18 22:43:38 PDT
Example output in analyze mode (again this is Chrome):


============================================
RESULTS (means and 95% confidence intervals)
--------------------------------------------
Total:                   ERROR: Some tests failed.
--------------------------------------------

  3d:                    ERROR: Some tests failed.
    cube:               10.9ms +/- 36.8%
    morph:               ERROR: Invalid test run.
    raytrace:            9.4ms +/- 8.2%

  access:               19.1ms +/- 23.4%
    binary-trees:        2.3ms +/- 106.7%
    fannkuch:            8.5ms +/- 18.7%
    nbody:               5.3ms +/- 70.2%
    nsieve:              3.0ms +/- 50.2%

  bitops:               14.5ms +/- 24.6%
    3bit-bits-in-byte:   0.9ms +/- 25.1%
    bits-in-byte:        3.6ms +/- 13.9%
    bitwise-and:         2.4ms +/- 15.4%
    nsieve-bits:         7.6ms +/- 49.2%

  controlflow:           1.7ms +/- 20.3%
    recursive:           1.7ms +/- 20.3%

  crypto:               15.0ms +/- 7.8%
    aes:                 5.7ms +/- 13.3%
    md5:                 4.3ms +/- 17.6%
    sha1:                5.0ms +/- 21.3%

  date:                 22.0ms +/- 24.9%
    format-tofte:        9.6ms +/- 35.0%
    format-xparb:       12.4ms +/- 33.5%

  math:                  ERROR: Some tests failed.
    cordic:              3.2ms +/- 17.6%
    partial-sums:        ERROR: Invalid test run.
    spectral-norm:       3.3ms +/- 63.0%

  regexp:                6.2ms +/- 4.9%
    dna:                 6.2ms +/- 4.9%

  string:               55.8ms +/- 9.9%
    base64:              3.4ms +/- 14.7%
    fasta:               5.3ms +/- 12.8%
    tagcloud:           16.9ms +/- 6.1%
    unpack-code:        21.8ms +/- 13.1%
    validate-input:      8.4ms +/- 46.8%
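
The way a single invalid test turns its category and the total into errors in the output above can be sketched roughly as follows. This is a minimal illustration, not the SunSpider harness code: the `summarize` function and the result shape (a mean time in ms per test, or `null` for an invalid run) are assumptions made for the example.

```javascript
// Hypothetical result shape (an assumption, not the harness's real format):
// { category: { testName: meanMs | null } }, where null marks an invalid run.
function summarize(categories) {
    const summary = {};
    let total = 0;
    let totalValid = true;

    for (const [category, tests] of Object.entries(categories)) {
        let sum = 0;
        let valid = true;

        for (const mean of Object.values(tests)) {
            if (mean === null)
                valid = false; // one invalid test invalidates the category
            else
                sum += mean;
        }

        summary[category] = valid ? sum : "ERROR: Some tests failed.";

        if (valid)
            total += sum;
        else
            totalValid = false; // ...and an invalid category invalidates the total
    }

    summary.total = totalValid ? total : "ERROR: Some tests failed.";
    return summary;
}
```

With this shape, a `null` entry for `partial-sums` would make both `math` and `total` report an error, while unaffected categories such as `regexp` would still report their times.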
Comment 4 Filip Pizlo 2013-04-18 22:45:35 PDT
Created attachment 198803 [details]
the patch
Comment 5 Filip Pizlo 2013-04-18 22:53:51 PDT
Comment on attachment 198803 [details]
the patch

Clearing r? while I investigate further whether the expected results are correct.
Comment 6 Filip Pizlo 2013-04-18 23:51:34 PDT
Actually, I think Safari/Firefox and Chrome were both "right" in their different results - ECMAScript does not specify what pow(), sin(), or cos() return. :-/
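
Since results from pow(), sin(), and cos() can legitimately differ between engines, a validation check on such output would presumably need to compare within a tolerance rather than exactly. The sketch below shows one way that might look. The helper names and the tolerance value are assumptions for illustration and are not taken from the patch.

```javascript
// Hypothetical helper: relative-tolerance comparison for floating-point results.
// The default tolerance is an assumed value, not one taken from the patch.
function approximatelyEqual(actual, expected, relativeTolerance = 1e-13) {
    // NaN never validates; infinities must match exactly.
    if (!Number.isFinite(actual) || !Number.isFinite(expected))
        return actual === expected;

    // Scale by the larger magnitude, with a floor of 1 so that values
    // near zero are effectively compared with an absolute tolerance.
    const scale = Math.max(Math.abs(actual), Math.abs(expected), 1);
    return Math.abs(actual - expected) <= relativeTolerance * scale;
}

// Hypothetical validation hook: throws so that the run is flagged as invalid.
function validate(name, actual, expected) {
    if (!approximatelyEqual(actual, expected))
        throw new Error("bad result in " + name + ": expected " + expected + " but got " + actual);
}
```

A test could then call something like `validate("3d-morph", checksum, expectedChecksum)` after its timed work. The tolerance is chosen loosely enough to absorb last-digit differences between math libraries while still catching genuinely wrong results.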
Comment 7 Filip Pizlo 2013-04-18 23:54:55 PDT
Created attachment 198812 [details]
the patch
Comment 8 Oliver Hunt 2013-04-19 08:09:00 PDT
(In reply to comment #6)
> Actually, I think Safari/Firefox and Chrome were both "right" in their different results - ECMAScript does not specify what pow(), sin(), or cos() return. :-/

O_o

Are Firefox and Chrome consistent with each other?
Comment 9 Filip Pizlo 2013-04-19 12:53:55 PDT
(In reply to comment #8)
> (In reply to comment #6)
> > Actually, I think Safari/Firefox and Chrome were both "right" in their different results - ECMAScript does not specify what pow(), sin(), or cos() return. :-/
> 
> O_o
> 
> Are Firefox and Chrome consistent with each other?

No.

Firefox is consistent with Safari.  Chrome differs.
Comment 10 Geoffrey Garen 2013-04-19 16:39:08 PDT
Comment on attachment 198812 [details]
the patch

View in context: https://bugs.webkit.org/attachment.cgi?id=198812&action=review

r=me

> PerformanceTests/SunSpider/tests/sunspider-1.0/3d-morph.js:56
> +// This has to be an approximate test since ECMAscript doesn't 

Missed the end of the sentence here.
Comment 11 Filip Pizlo 2013-04-19 17:26:23 PDT
(In reply to comment #10)
> (From update of attachment 198812 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=198812&action=review
> 
> r=me
> 
> > PerformanceTests/SunSpider/tests/sunspider-1.0/3d-morph.js:56
> > +// This has to be an approximate test since ECMAscript doesn't 
> 
> Missed the end of the sentence here.

Fixed.
Comment 12 Filip Pizlo 2013-04-19 17:26:31 PDT
Landed in http://trac.webkit.org/changeset/148784