Bug 111533

Summary: Replace Mersenne Twister RNG with a simple but fast RNG
Product: WebKit Reporter: Andrew Bortz <andrew>
Component: Web Template FrameworkAssignee: Andrew Bortz <andrew>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, ap, benjamin, cmarcelo, dbates, eric, gyuyoung.kim, laszlo.gombos, mjs, ojan.autocc, oliver, paroga, rwlbuis, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
Patch
none
Patch none

Description Andrew Bortz 2013-03-06 00:11:54 PST
Replace Mersenne Twister RNG with a simple but fast RNG
Comment 1 Andrew Bortz 2013-03-06 00:21:59 PST
Created attachment 191670 [details]
Patch
Comment 2 Eric Seidel (no email) 2013-03-06 00:24:41 PST
Comment on attachment 191670 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=191670&action=review

> Source/WTF/wtf/RandomNumber.cpp:61
> +    g_seed = g_seed + (g_seed * g_seed | 5);
> +    return static_cast<uint32_t>(g_seed >> 32);

It feels odd to write back to the seed?  Is this typical?
Comment 3 Andrew Bortz 2013-03-06 00:29:21 PST
Created attachment 191672 [details]
Patch
Comment 4 Eric Seidel (no email) 2013-03-06 00:31:38 PST
Comment on attachment 191672 [details]
Patch

LGTM.
Comment 5 Eric Seidel (no email) 2013-03-06 00:31:53 PST
Comment on attachment 191672 [details]
Patch

Should probably let the EWS chew on it first.
Comment 6 WebKit Review Bot 2013-03-06 00:34:42 PST
Attachment 191672 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/ThirdParty/ChangeLog', u'Source/ThirdParty/mt19937ar.c', u'Source/WTF/ChangeLog', u'Source/WTF/wtf/Platform.h', u'Source/WTF/wtf/RandomNumber.cpp', u'Source/WTF/wtf/RandomNumber.h', u'Source/WTF/wtf/RandomNumberSeed.h']" exit_code: 1
Source/WTF/wtf/RandomNumber.cpp:44:  g_state is incorrectly named. Don't use underscores in your identifier names.  [readability/naming/underscores] [4]
Total errors found: 1 in 5 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 7 Benjamin Poulain 2013-03-06 01:00:07 PST
Comment on attachment 191672 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=191672&action=review

I don't know anything about random number generators, but IMHO, the patch could get a little bit of polish

> Source/WTF/ChangeLog:21
> +
> +        Reviewed by NOBODY (OOPS!).
> +
> +        * wtf/Platform.h: Mersenne Twister defines are no longer needed
> +        * wtf/RandomNumber.cpp:
> +        (WTF::Internal::initializeRandomNumber): Initialize the seed of the RNG
> +        (WTF::Internal::randomNumber): Instead of using the Mersenne Twister, we
> +        use this really simple, really fast but statistically strong RNG with a
> +        guaranteed cycle of 2^64.
> +        (WTF::randomNumber): We don't need to fall back on a non-Mersenne-based
> +        RNG anymore, so this code is greatly simplified.
> +        * wtf/RandomNumber.h:
> +        * wtf/RandomNumberSeed.h:
> +        (WTF::initializeRandomNumberGenerator): This code is no longer needed.
> +        Additionally, the code had an error, since rand() returns 32-bits, so each
> +        initializationBuffer's upper 16-bits has more bits set than random.
> +

This is all very nice, but you ChangeLog should state the "Why?" of the change (like in this case comparing the two algorithms and tell what properties makes this change a better solution).

That way, people can understand the reason of the change if it causes issues in the future.

> Source/WTF/wtf/RandomNumber.cpp:57
> +// This RNG comes from: Klimov, A. and Shamir, A., "A New Class of 
> +// Invertible Mappings", Cryptographic Hardware and Embedded Systems 2002,
> +// http://dl.acm.org/citation.cfm?id=752741
> +//
> +// Very fast, very simple, and passes Diehard and other good statistical
> +// tests as strongly as cryptographically-secure RNGs (but is not itself
> +// cryptographically-secure).

Please avoid the acronym RNG.
Comment 8 Oliver Hunt 2013-03-06 10:33:58 PST
Comment on attachment 191672 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=191672&action=review

Don't use the g_ prefix on the global (we tend to use s_ in general, but also it's not exported from RandomNumber.cpp so you probably don't need a prefix anyway.

That said i think this whole thing should be in just the cpp file, there's no need to have any initializer see my comments inline.

> Source/WTF/wtf/RandomNumber.cpp:49
> +namespace Internal {
> +
> +static uint64_t g_state;
> +
> +void initializeRandomNumber(uint64_t seed)
> +{
> +    g_state = seed;
>  }

Remove these changes, and remove g_state

> Source/WTF/wtf/RandomNumber.cpp:62
> +uint32_t randomNumber()
> +{
> +    g_state += ((g_state * g_state) | 5);
> +    return static_cast<uint32_t>(g_state >> 32);
> +}

uint32_t randomNumber()
{
    // Don't use rand() as your seed here, it's been demonstrated that you can rollback to find the initial seed
    // I'd suggest a combo of time() + ASLR sourced entropy to try and make things less deterministic, a la:
    static uint64_t state = (reinterpret_cast<uintptr_t>(&state) >> 5) ^ time(0) ^ (reinterpret_cast<uintptr_t>(&state) >> 13);
    state += ((state * state) | 5);
    return static_cast<uint32_t>(state >> 32);
}

> Source/WTF/wtf/RandomNumber.h:35
> +#if !USE(OS_RANDOMNESS)
> +namespace Internal {
> +void initializeRandomNumber(uint64_t);
> +}
> +#endif

remove this, and so reset RendomNumber.h you don't need changes.

> Source/WTF/wtf/RandomNumberSeed.h:65
> -#if USE(MERSENNE_TWISTER_19937)
> -    // use rand() to initialize the Mersenne Twister random number generator.
> -    unsigned long initializationBuffer[4];
> -    initializationBuffer[0] = (rand() << 16) | rand();
> -    initializationBuffer[1] = (rand() << 16) | rand();
> -    initializationBuffer[2] = (rand() << 16) | rand();
> -    initializationBuffer[3] = (rand() << 16) | rand();
> -    init_by_array(initializationBuffer, 4);
> +#if !USE(OS_RANDOMNESS)
> +    uint64_t seed = static_cast<uint64_t>(rand()) << 32 | static_cast<uint64_t>(rand());
> +    Internal::initializeRandomNumber(seed);
>  #endif

You can remove all this explicit initialization
Comment 9 Adam Barth 2013-03-06 10:42:37 PST
>     static uint64_t state = (reinterpret_cast<uintptr_t>(&state) >> 5) ^ time(0) ^ (reinterpret_cast<uintptr_t>(&state) >> 13);

How many bits of entropy does that produce?  What about on systems that don't have ASLR enabled?
Comment 10 Adam Barth 2013-03-06 10:43:11 PST
(Also, what about on 32 bit systems?)
Comment 11 Adam Barth 2013-03-06 10:58:20 PST
>     // Don't use rand() as your seed here, it's been demonstrated that you can rollback to find the initial seed

I'm not sure we need to worry about that too much.  This RNG is explicitly not cryptographically secure.  If we had a cryptographically secure source of entropy, we'd just enable USE(OS_RANDOMNESS) and use cryptographicallyRandomNumber() anyway.

In any case, we might as well XOR in 64 bits from rand().  It's not going to hurt anything, and it will probably help in the case of a 32-bit system without ASLR.

As for the self-initialization, that's a neat trick.  One risk of your suggestion is that it creates a race condition where two threads might both think it's their job to init the RNG.  If we're worried about that, we can make sure to call WTF::randomNumber() once in RandomNumberSeed.h.
Comment 12 Oliver Hunt 2013-03-06 11:04:09 PST
(In reply to comment #9)
> >     static uint64_t state = (reinterpret_cast<uintptr_t>(&state) >> 5) ^ time(0) ^ (reinterpret_cast<uintptr_t>(&state) >> 13);
> 
> How many bits of entropy does that produce?  What about on systems that don't have ASLR enabled?

In the absence of ASLR this degenerates (essentially) to time(0), which is fairly crappy, but arguably still superior to rand() in that your seed isn't always the same.  In the absence of ASLR different builds and distributions will have different values.

ASLR bits are relatively low entropy (low 16-20 bits are fairly constant iirc)
Comment 13 Oliver Hunt 2013-03-06 11:08:57 PST
(In reply to comment #11)
> >     // Don't use rand() as your seed here, it's been demonstrated that you can rollback to find the initial seed
> 
> I'm not sure we need to worry about that too much.  This RNG is explicitly not cryptographically secure.  If we had a cryptographically secure source of entropy, we'd just enable USE(OS_RANDOMNESS) and use cryptographicallyRandomNumber() anyway.
> 
> In any case, we might as well XOR in 64 bits from rand().  It's not going to hurt anything, and it will probably help in the case of a 32-bit system without ASLR.
> 
> As for the self-initialization, that's a neat trick.  One risk of your suggestion is that it creates a race condition where two threads might both think it's their job to init the RNG.  If we're worried about that, we can make sure to call WTF::randomNumber() once in RandomNumberSeed.h.

I believe function scoped globals are guaranteed to be initialized only once, but we're not too worried about that, but your solution would work if not.

rand() produces a constant deterministic value across multiple runs, initializing with the result of rand() is logically equivalent to initializing with a constant.

Deterministic seeds have been attacked before for user tracking (you rollback the RNG until you get a value that is logically feasible that the exposed random number is feasibly a start point and track that.
Comment 14 Eric Seidel (no email) 2013-03-06 11:19:40 PST
BTW, my understanding is that this code is only used on platforms which do not have "OS_RANDOMNESS" defined:
http://trac.webkit.org/browser/trunk/Source/WTF/wtf/Platform.h#L642
http://trac.webkit.org/browser/trunk/Source/WTF/wtf/Platform.h#L701

Which may be only Blackberry and WinCE, possibly EFL?

This patch is for the non-cryptographically random code-path and is mostly about deleting code. :)
Comment 15 Adam Barth 2013-03-06 11:20:29 PST
> I believe function scoped globals are guaranteed to be initialized only once, but we're not too worried about that, but your solution would work if not.

I don't think that's correct:

http://blogs.msdn.com/b/oldnewthing/archive/2004/03/08/85901.aspx

> rand() produces a constant deterministic value across multiple runs, initializing with the result of rand() is logically equivalent to initializing with a constant.

That depends on whether and how the code using WebKit has called srand().  Given that the configurations that use this code were already using rand() for their seed entropy, we should probably continue to use XOR in bits from rand() in the seed.

> Deterministic seeds have been attacked before for user tracking

No one is advocating a deterministic seed.
Comment 16 Andrew Bortz 2013-03-07 20:54:58 PST
Created attachment 192138 [details]
Patch
Comment 17 Andrew Bortz 2013-03-07 20:56:47 PST
Since the thread-safety of a static initializer is unclear, I choose to keep the existing explicit initialization (we already had a place for it to be called anyway).

Also, given that the old code paths used rand() for initialization, I feel okay leaving that in as well. Perhaps the correct solution is to get some entropy from time(), ASLR, and rand(), and mix it all together, but I feel pretty silly doing all that for a random number generator that doesn't need cryptographic strength. If someone wants to fix this in a later patch, go for it, but we're not making things worse by leaving it in.

In case anyone wants to verify that it passes solid statistical tests, see http://www.cacert.at/cgi-bin/rngresults and look for xpxso (x plus x squared or).
Comment 18 WebKit Review Bot 2013-03-07 21:51:54 PST
Comment on attachment 192138 [details]
Patch

Clearing flags on attachment: 192138

Committed r145179: <http://trac.webkit.org/changeset/145179>
Comment 19 WebKit Review Bot 2013-03-07 21:52:01 PST
All reviewed patches have been landed.  Closing bug.
Comment 20 Laszlo Gombos 2013-05-10 10:45:04 PDT
Andrew, I am wondering who is using the !USE(OS_RANDOMNESS) path (and how did you tested this code path in the patch). As far as I can tell USE(OS_RANDOMNESS) guard is enabled for all ports on trunk.

Perhaps we can continue the discussion at bug 108095 .