Bug 158906 - REGRESSION(concurrent baseline JIT): Kraken/ai-astar runs 20% slower
Summary: REGRESSION(concurrent baseline JIT): Kraken/ai-astar runs 20% slower
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore
Version: WebKit Nightly Build
Hardware: All
OS: All
Importance: P2 Normal
Assignee: Filip Pizlo
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-06-18 12:28 PDT by Filip Pizlo
Modified: 2016-06-20 18:59 PDT
CC: 11 users

See Also:


Attachments
the patch (7.25 KB, patch)
2016-06-18 13:50 PDT, Filip Pizlo
benjamin: review+

Description Filip Pizlo 2016-06-18 12:28:50 PDT
So far this looks like it's because the baseline compile of search# is delayed by >10ms.  I'm still investigating why.

I do know that it's only the decision to compile baseline code concurrently that causes the slow-down, and not any of the other changes in the patch.
Comment 1 Filip Pizlo 2016-06-18 12:32:03 PDT
Looks like we enqueue the search# plan and then it takes ~10ms for the compiler thread to wake up!
Comment 2 Filip Pizlo 2016-06-18 12:43:18 PDT
This is fascinating!  Here's what's happening: we enqueued a compile of <global>, which is a ginormous piece of global code, during the benchmark's unmeasured initialization phase.  Then, by the time we try to compile search#, the JIT thread is still busy compiling <global>, which takes a long time and generates a disgusting amount of code: 16ms to compile and 8MB of code!!

There are a bunch of solutions we could attempt:

1) If we want to compile something and the concurrent JIT thread is busy, then compile on the main thread instead.

2) Launch multiple JIT threads.

3) Delay the compilation of <global> based on size heuristics the same way that we do for DFG and FTL tier-up.  This might be hard because I think that <global> actually spends a significant amount of time in its silly loop.

4) Don't compile all of <global> with the baseline JIT.  It tiers up because of a tiny loop at the end.

I think I'll attempt (1).  That seems like the easiest way of recovering this regression.  I think it would be great to consider the other solutions, too.
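As a rough illustration of option (1), here is a minimal C++ sketch.  It is not the actual JSC patch, and the names (CompilationPlan, CompilationWorklist, requestCompile) are hypothetical.  The idea is just: if the single concurrent JIT thread already has work in flight, compile on the requesting thread instead of waiting in the queue.

#include <atomic>
#include <memory>
#include <utility>

struct CompilationPlan {
    // Stand-in for running a baseline compile on the current thread.
    void compileNow() { }
};

class CompilationWorklist {
public:
    // True if the worker thread is already compiling something (e.g. <global>).
    bool isBusy() const { return m_busy.load(); }

    // Hand the plan to the concurrent JIT thread (actual queueing elided).
    void enqueuePlan(std::unique_ptr<CompilationPlan>) { m_busy.store(true); }

private:
    std::atomic<bool> m_busy { false };
};

void requestCompile(CompilationWorklist& worklist, std::unique_ptr<CompilationPlan> plan)
{
    if (worklist.isBusy()) {
        // The helper thread is stuck on a big compile, so a small hot function
        // like search# would otherwise wait ~10ms.  Compile it synchronously.
        plan->compileNow();
        return;
    }
    worklist.enqueuePlan(std::move(plan));
}

With this shape, a tiny function like search# never waits behind a 16ms <global> compile; the worst case is that the main thread pays for its own (usually cheap) baseline compile.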
Comment 3 Filip Pizlo 2016-06-18 13:15:53 PDT
Attempt (1) seems to work for Kraken/ai-astar!

Trunk (including concurrent baseline JIT): 111.2 ms
With my fix: 92.6 ms
With concurrent baseline JIT totally disabled: 87.9 ms

I'll do more tests.
Comment 4 Filip Pizlo 2016-06-18 13:22:19 PDT
I believe that this resolves the Kraken issue.  Here are the numbers now.

FROM = concurrent baseline JIT completely disabled.
TO = my change, which uses synchronous compilation if the JIT thread is busy.

TEST                         COMPARISON            FROM                 TO               DETAILS

====================================================================================

** TOTAL **:                 1.013x as fast    826.1ms +/- 0.6%   815.5ms +/- 1.1%     significant

====================================================================================

  ai:                        *1.043x as slow*   85.3ms +/- 2.0%    89.0ms +/- 1.6%     significant
    astar:                   *1.043x as slow*   85.3ms +/- 2.0%    89.0ms +/- 1.6%     significant

  audio:                     1.037x as fast    259.1ms +/- 1.1%   249.8ms +/- 2.9%     significant
    beat-detection:          ??                 58.1ms +/- 3.7%    59.1ms +/- 11.5%     might be *1.017x as slow*
    dft:                     1.100x as fast    103.1ms +/- 2.1%    93.7ms +/- 2.1%     significant
    fft:                     -                  43.9ms +/- 1.8%    42.9ms +/- 2.4% 
    oscillator:              ??                 54.0ms +/- 1.2%    54.1ms +/- 2.1%     might be *1.002x as slow*

  imaging:                   ??                185.1ms +/- 1.0%   185.6ms +/- 2.5%     might be *1.003x as slow*
    gaussian-blur:           -                  64.2ms +/- 2.2%    62.2ms +/- 3.4% 
    darkroom:                ??                 72.9ms +/- 1.0%    73.5ms +/- 0.8%     might be *1.008x as slow*
    desaturate:              ??                 48.0ms +/- 5.1%    49.9ms +/- 6.5%     might be *1.040x as slow*

  json:                      -                  58.7ms +/- 1.3%    57.8ms +/- 1.3% 
    parse-financial:         -                  35.2ms +/- 1.6%    34.8ms +/- 1.6% 
    stringify-tinderbox:     1.022x as fast     23.5ms +/- 1.6%    23.0ms +/- 1.5%     significant

  stanford:                  1.020x as fast    237.9ms +/- 0.8%   233.3ms +/- 1.2%     significant
    crypto-aes:              1.045x as fast     51.5ms +/- 1.9%    49.3ms +/- 1.2%     significant
    crypto-ccm:              -                  48.4ms +/- 1.9%    48.3ms +/- 4.7% 
    crypto-pbkdf2:           1.014x as fast     99.6ms +/- 1.3%    98.2ms +/- 0.6%     significant
    crypto-sha256-iterative: 1.024x as fast     38.4ms +/- 1.3%    37.5ms +/- 1.3%     significant
Comment 5 Filip Pizlo 2016-06-18 13:50:33 PDT
Created attachment 281618 [details]
the patch
Comment 6 Benjamin Poulain 2016-06-18 23:51:22 PDT
Comment on attachment 281618 [details]
the patch

A bit nasty, but let's try it anyway.  Especially since this hurts a lot on ARM and we don't have more cores there.
Comment 7 Filip Pizlo 2016-06-19 09:40:25 PDT
(In reply to comment #6)
> Comment on attachment 281618 [details]
> the patch
> 
> A bit nasty, but let's try it anyway.  Especially since this hurts a lot on
> ARM and we don't have more cores there.

We had a similar thing arise with DFG vs FTL compilation, and we found that even on two cores, it's profitable to have DFG and FTL be in separate threads so that they interleave with each other in case the FTL takes a long time.

I think that a similar thing applies here.  Having additional baseline threads would likely help.  It would be a generalization of what this patch does.  But maybe what this patch does is even better, since most baseline compiles are fast enough that it's OK to just do them on the main thread when the helper is busy.
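For what it's worth, here is a hedged sketch (not WebKit code; CompilerThreadPool and its members are made-up names) of what the "more baseline JIT threads" generalization could look like: a fixed pool of workers drains a shared queue of compilation plans, so one long-running compile no longer blocks every plan behind it.

#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class CompilerThreadPool {
public:
    explicit CompilerThreadPool(unsigned threadCount)
    {
        for (unsigned i = 0; i < threadCount; ++i) {
            m_threads.emplace_back([this] {
                for (;;) {
                    std::function<void()> plan;
                    {
                        std::unique_lock<std::mutex> lock(m_lock);
                        m_condition.wait(lock, [this] { return m_stopping || !m_queue.empty(); });
                        if (m_stopping && m_queue.empty())
                            return;
                        plan = std::move(m_queue.front());
                        m_queue.pop();
                    }
                    plan(); // run one compilation plan
                }
            });
        }
    }

    void enqueue(std::function<void()> plan)
    {
        {
            std::lock_guard<std::mutex> lock(m_lock);
            m_queue.push(std::move(plan));
        }
        m_condition.notify_one();
    }

    ~CompilerThreadPool()
    {
        {
            std::lock_guard<std::mutex> lock(m_lock);
            m_stopping = true;
        }
        m_condition.notify_all();
        for (auto& thread : m_threads)
            thread.join();
    }

private:
    std::mutex m_lock;
    std::condition_variable m_condition;
    std::queue<std::function<void()>> m_queue;
    std::vector<std::thread> m_threads;
    bool m_stopping { false };
};

Even a 2-thread pool would let a small search# compile interleave with a long <global> compile, which is the same interleaving argument as for DFG vs FTL above; the trade-off is extra threads and memory on small ARM devices, which is why falling back to the main thread may be the better deal there.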
Comment 8 Filip Pizlo 2016-06-19 09:45:04 PDT
Landed in r202205.  I think.
Comment 9 Ryosuke Niwa 2016-06-20 18:59:34 PDT
I think we're ~3% regressed on iPhone 6 Plus. Will email you details.