Bug 113635 - fourthTier: FTL JIT should be able to compile the Marsaglia random number generator
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore
Version: 528+ (Nightly build)
Hardware: All
OS: All
Importance: P2 Normal
Assignee: Filip Pizlo
Blocks: 112840
Reported: 2013-03-29 21:45 PDT by Filip Pizlo
Modified: 2013-04-01 10:54 PDT
CC: 7 users


Attachments
work in progress (21.48 KB, patch)
2013-03-29 23:06 PDT, Filip Pizlo
no flags
the patch (35.78 KB, patch)
2013-03-29 23:52 PDT, Filip Pizlo
oliver: review+

Description Filip Pizlo 2013-03-29 21:45:54 PDT
This is a very simple PRNG that we ought to be able to compile:

function marsaglia(m_z, m_w, n) {
    var result;
    for (var i = 0; i < n; ++i) {
        m_z = (36969 * (m_z & 65535) + (m_z >> 16)) | 0;
        m_w = (18000 * (m_w & 65535) + (m_w >> 16)) | 0;
        result = ((m_z << 16) + m_w) | 0;
    }
    return result;
}

var result = 0;
for (var i = 0; i < 100; ++i)
    result += marsaglia(i, i + 1, 1000000);

print(result);
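For readers unfamiliar with the recurrence, here is an annotated restatement of the function above (the arithmetic is unchanged; the comments are mine). Each of `m_z` and `m_w` is one multiply-with-carry stream: the low 16 bits hold the state and the high 16 bits hold the previous step's carry.

```javascript
// Annotated restatement of Marsaglia's multiply-with-carry generator.
function marsaglia(m_z, m_w, n) {
    var result;
    for (var i = 0; i < n; ++i) {
        m_z = (36969 * (m_z & 65535) + (m_z >> 16)) | 0; // state * 36969 + carry
        m_w = (18000 * (m_w & 65535) + (m_w >> 16)) | 0; // second MWC stream
        result = ((m_z << 16) + m_w) | 0;                // combine both streams
    }
    return result;
}
```

Note that `&`, `>>`, `<<`, and `| 0` all coerce to 32-bit integers, which is exactly what makes this loop a good integer-arithmetic stress test for a compiler tier. The output is fully deterministic in the seeds.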
Comment 1 Filip Pizlo 2013-03-29 23:06:35 PDT
Created attachment 195843 [details]
work in progress
Comment 2 Filip Pizlo 2013-03-29 23:52:26 PDT
Created attachment 195844 [details]
the patch
Comment 3 Oliver Hunt 2013-03-30 16:53:04 PDT
Comment on attachment 195844 [details]
the patch

View in context: https://bugs.webkit.org/attachment.cgi?id=195844&action=review

> Source/JavaScriptCore/ChangeLog:14
> +        The Marsaglia function runs ~60% faster with FTL, than DFG. Not a terrible start.

Does the code look sane?
Comment 4 Oliver Hunt 2013-03-30 16:54:15 PDT
Comment on attachment 195844 [details]
the patch

View in context: https://bugs.webkit.org/attachment.cgi?id=195844&action=review

> LayoutTests/fast/js/regress/script-tests/marsaglia.js:15
> +print(result);

Oh, this should be debug(result) I think, as it occurs to me that in-browser print() brings up the print dialog (obviously :D )
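For what it's worth, a hypothetical host-agnostic output helper could sidestep the pitfall described above (this shim is my own sketch, not part of the patch): the jsc shell exposes a global `print()`, WebKit's js-test harness provides `debug()`, and a plain browser page has neither, but does have `console.log` — while its `window.print()` opens the print dialog.

```javascript
// Hypothetical portability shim (not in the patch): pick whichever output
// hook this host actually provides, preferring the harness's debug().
var report =
    (typeof debug === 'function') ? debug :
    (typeof window === 'undefined' && typeof print === 'function') ? print :
    console.log.bind(console);

report(42); // safe in shell, test harness, and browser alike
```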
Comment 5 Mark Hahnenberg 2013-03-30 16:57:32 PDT
Comment on attachment 195844 [details]
the patch

View in context: https://bugs.webkit.org/attachment.cgi?id=195844&action=review

>> Source/JavaScriptCore/ChangeLog:14
>> +        The Marsaglia function runs ~60% faster with FTL, than DFG. Not a terrible start.
> 
> Does the code look sane?

And when you say 60% faster, is that for the full 100 iterations in your original benchmark code or comparing the two once both of them fully tier up?
Comment 6 Filip Pizlo 2013-03-30 16:57:51 PDT
(In reply to comment #4)
> (From update of attachment 195844 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=195844&action=review
> 
> > LayoutTests/fast/js/regress/script-tests/marsaglia.js:15
> > +print(result);
> 
> Oh, this should be debug(result) I think, as it occurs to me that in-browser print() brings up the print dialog (obviously :D )

Ooooops!!  Haha, thanks for the catch.
Comment 7 Filip Pizlo 2013-03-30 16:58:29 PDT
(In reply to comment #3)
> (From update of attachment 195844 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=195844&action=review
> 
> > Source/JavaScriptCore/ChangeLog:14
> > +        The Marsaglia function runs ~60% faster with FTL, than DFG. Not a terrible start.
> 
> Does the code look sane?

The LLVM IR looks sane.

I'm still doing the wiring to allow us to actually disassemble the things that LLVM generates.
Comment 8 Filip Pizlo 2013-03-30 16:58:53 PDT
(In reply to comment #5)
> (From update of attachment 195844 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=195844&action=review
> 
> >> Source/JavaScriptCore/ChangeLog:14
> >> +        The Marsaglia function runs ~60% faster with FTL, than DFG. Not a terrible start.
> > 
> > Does the code look sane?
> 
> And when you say 60% faster, is that for the full 100 iterations in your original benchmark code or comparing the two once both of them fully tier up?

Yup.
Comment 9 Mark Hahnenberg 2013-03-30 17:00:31 PDT
> > And when you say 60% faster, is that for the full 100 iterations in your original benchmark code or comparing the two once both of them fully tier up?
> 
> Yup.

I'll assume you meant "yup" to the first option :-P
Comment 10 Filip Pizlo 2013-03-30 17:02:21 PDT
(In reply to comment #9)
> > > And when you say 60% faster, is that for the full 100 iterations in your original benchmark code or comparing the two once both of them fully tier up?
> > 
> > Yup.
> 
> I'll assume you meant "yup" to the first option :-P

Actually, I don't understand the question.  What is the difference between "the full 100 iterations in your original benchmark" and "comparing the two once both of them fully tier up"?

I'm just running the program, as it exists in the benchmark I'm landing in JSRegress, either with FTL turned on, or with FTL turned off.  Both runs involve tier-ups.  Both runs include compile time.  Both runs include 100 iterations, by virtue of the fact that the program has a loop that calls marsaglia() 100 times.
Comment 11 Filip Pizlo 2013-03-30 17:03:36 PDT
Landed in http://trac.webkit.org/changeset/147283
Comment 12 Mark Hahnenberg 2013-03-30 17:08:56 PDT
> Actually, I don't understand the question.  What is the difference between "the full 100 iterations in your original benchmark" and "comparing the two once both of them fully tier up"?

Sorry, it was kind of a vague question. You could imagine comparing the two with "tier up time" not included in their times. So you'd wait some amount of time until both tiered all the way up, then test how long it takes each to run n iterations. I would imagine this would make the FTL look even better :-) I know our harness isn't really set up for that, but I didn't know if you had cooked up something custom.
Comment 13 Filip Pizlo 2013-03-30 17:15:22 PDT
(In reply to comment #12)
> > Actually, I don't understand the question.  What is the difference between "the full 100 iterations in your original benchmark" and "comparing the two once both of them fully tier up"?
> 
> Sorry, it was kind of a vague question. You could imagine comparing the two with "tier up time" not included in their times. So you'd wait some amount of time until both tiered all the way up, then test how long it takes each to run n iterations. I would imagine this would make the FTL look even better :-) I know our harness isn't really set up for that, but I didn't know if you had cooked up something custom.

Nope, I included the tier-up times.

I actually didn't even run in the harness - since the harness still makes it super awkward to pass environment variables to one configuration (you did some things to fix that, I think, but I don't remember and anyways I didn't feel like using my brain to think or eyes to read).  I just used 'time' on the command-line.

So these measurements include:

- The time it takes for the 'jsc' tool to start.
- The time it takes to run in the interpreter and baseline JIT prior to tier-up.
- The time it takes to compile the benchmark and tier up all the way to either DFG or FTL.
- The time it takes to actually run once tiered up.

One way to gauge the tier-up times is to increase or decrease the running time of the benchmark and see how that affects the speed-up. I just tried increasing it to 10x longer and the speed-up went to 68% instead of 60%. This implies that compilation accounts for only a very small fraction of the 0.22 sec total FTL run-time.
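Those two data points can be turned into a rough estimate of the fixed overhead. The following back-of-envelope model is my own, not from the bug: it assumes the FTL run is a fixed one-time cost C (startup + compilation) plus a cost R that scales linearly with the workload, and that the DFG baseline scales linearly too — at best an approximation.

```javascript
// Estimate what fraction of the 1x FTL run is fixed overhead, given the
// measured speed-ups at the 1x and 10x workloads. Normalize DFG time at
// the 1x workload to 1, so FTL times are C + R and C + 10R respectively.
function compileFraction(speedup1x, speedup10x) {
    var total1x = 1 / speedup1x;         // C + R
    var total10x = 10 / speedup10x;      // C + 10R
    var R = (total10x - total1x) / 9;    // per-1x-workload running cost
    var C = total1x - R;                 // fixed startup + compile cost
    return C / total1x;                  // overhead fraction of the 1x run
}

var frac = compileFraction(1.60, 1.68);
// Under these (strong) linearity assumptions, frac comes out around 5%,
// consistent with compilation being a very small slice of the 0.22 sec run.
```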

Basically what I'm seeing consistently when I mess around with the Battlestar is that it actually takes significantly less time to launch than the nay-sayers claimed.
Comment 14 Geoffrey Garen 2013-04-01 10:54:57 PDT
> Basically what I'm seeing consistently when I mess around with the Battlestar is that it actually takes significantly less time to launch than the nay-sayers claimed.

Launch times are much shorter when you start from low Earth orbit!