Summary: | Replace Mersenne Twister RNG with a simple but fast RNG | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Andrew Bortz <andrew> | ||||||||
Component: | Web Template Framework | Assignee: | Andrew Bortz <andrew> | ||||||||
Status: | RESOLVED FIXED | ||||||||||
Severity: | Normal | CC: | abarth, ap, benjamin, cmarcelo, dbates, eric, gyuyoung.kim, laszlo.gombos, mjs, ojan.autocc, oliver, paroga, rwlbuis, webkit.review.bot | ||||||||
Priority: | P2 | ||||||||||
Version: | 528+ (Nightly build) | ||||||||||
Hardware: | Unspecified | ||||||||||
OS: | Unspecified | ||||||||||
Attachments: |
|
Description
Andrew Bortz
2013-03-06 00:11:54 PST
Created attachment 191670 [details]
Patch
Comment on attachment 191670 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=191670&action=review > Source/WTF/wtf/RandomNumber.cpp:61 > + g_seed = g_seed + (g_seed * g_seed | 5); > + return static_cast<uint32_t>(g_seed >> 32); It feels odd to write back to the seed? Is this typical? Created attachment 191672 [details]
Patch
Comment on attachment 191672 [details]
Patch
LGTM.
Comment on attachment 191672 [details]
Patch
Should probably let the EWS chew on it first.
Attachment 191672 [details] did not pass style-queue:
Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/ThirdParty/ChangeLog', u'Source/ThirdParty/mt19937ar.c', u'Source/WTF/ChangeLog', u'Source/WTF/wtf/Platform.h', u'Source/WTF/wtf/RandomNumber.cpp', u'Source/WTF/wtf/RandomNumber.h', u'Source/WTF/wtf/RandomNumberSeed.h']" exit_code: 1
Source/WTF/wtf/RandomNumber.cpp:44: g_state is incorrectly named. Don't use underscores in your identifier names. [readability/naming/underscores] [4]
Total errors found: 1 in 5 files
If any of these errors are false positives, please file a bug against check-webkit-style.
Comment on attachment 191672 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=191672&action=review I don't know anything about random number generators, but IMHO, the patch could get a little bit of polish > Source/WTF/ChangeLog:21 > + > + Reviewed by NOBODY (OOPS!). > + > + * wtf/Platform.h: Mersenne Twister defines are no longer needed > + * wtf/RandomNumber.cpp: > + (WTF::Internal::initializeRandomNumber): Initialize the seed of the RNG > + (WTF::Internal::randomNumber): Instead of using the Mersenne Twister, we > + use this really simple, really fast but statistically strong RNG with a > + guaranteed cycle of 2^64. > + (WTF::randomNumber): We don't need to fall back on a non-Mersenne-based > + RNG anymore, so this code is greatly simplified. > + * wtf/RandomNumber.h: > + * wtf/RandomNumberSeed.h: > + (WTF::initializeRandomNumberGenerator): This code is no longer needed. > + Additionally, the code had an error, since rand() returns 32-bits, so each > + initializationBuffer's upper 16-bits has more bits set than random. > + This is all very nice, but you ChangeLog should state the "Why?" of the change (like in this case comparing the two algorithms and tell what properties makes this change a better solution). That way, people can understand the reason of the change if it causes issues in the future. > Source/WTF/wtf/RandomNumber.cpp:57 > +// This RNG comes from: Klimov, A. and Shamir, A., "A New Class of > +// Invertible Mappings", Cryptographic Hardware and Embedded Systems 2002, > +// http://dl.acm.org/citation.cfm?id=752741 > +// > +// Very fast, very simple, and passes Diehard and other good statistical > +// tests as strongly as cryptographically-secure RNGs (but is not itself > +// cryptographically-secure). Please avoid the acronym RNG. Comment on attachment 191672 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=191672&action=review Don't use the g_ prefix on the global (we tend to use s_ in general, but also it's not exported from RandomNumber.cpp so you probably don't need a prefix anyway. That said i think this whole thing should be in just the cpp file, there's no need to have any initializer see my comments inline. > Source/WTF/wtf/RandomNumber.cpp:49 > +namespace Internal { > + > +static uint64_t g_state; > + > +void initializeRandomNumber(uint64_t seed) > +{ > + g_state = seed; > } Remove these changes, and remove g_state > Source/WTF/wtf/RandomNumber.cpp:62 > +uint32_t randomNumber() > +{ > + g_state += ((g_state * g_state) | 5); > + return static_cast<uint32_t>(g_state >> 32); > +} uint32_t randomNumber() { // Don't use rand() as your seed here, it's been demonstrated that you can rollback to find the initial seed // I'd suggest a combo of time() + ASLR sourced entropy to try and make things less deterministic, a la: static uint64_t state = (reinterpret_cast<uintptr_t>(&state) >> 5) ^ time(0) ^ (reinterpret_cast<uintptr_t>(&state) >> 13); state += ((state * state) | 5); return static_cast<uint32_t>(state >> 32); } > Source/WTF/wtf/RandomNumber.h:35 > +#if !USE(OS_RANDOMNESS) > +namespace Internal { > +void initializeRandomNumber(uint64_t); > +} > +#endif remove this, and so reset RendomNumber.h you don't need changes. > Source/WTF/wtf/RandomNumberSeed.h:65 > -#if USE(MERSENNE_TWISTER_19937) > - // use rand() to initialize the Mersenne Twister random number generator. > - unsigned long initializationBuffer[4]; > - initializationBuffer[0] = (rand() << 16) | rand(); > - initializationBuffer[1] = (rand() << 16) | rand(); > - initializationBuffer[2] = (rand() << 16) | rand(); > - initializationBuffer[3] = (rand() << 16) | rand(); > - init_by_array(initializationBuffer, 4); > +#if !USE(OS_RANDOMNESS) > + uint64_t seed = static_cast<uint64_t>(rand()) << 32 | static_cast<uint64_t>(rand()); > + Internal::initializeRandomNumber(seed); > #endif You can remove all this explicit initialization > static uint64_t state = (reinterpret_cast<uintptr_t>(&state) >> 5) ^ time(0) ^ (reinterpret_cast<uintptr_t>(&state) >> 13);
How many bits of entropy does that produce? What about on systems that don't have ASLR enabled?
(Also, what about on 32 bit systems?) > // Don't use rand() as your seed here, it's been demonstrated that you can rollback to find the initial seed
I'm not sure we need to worry about that too much. This RNG is explicitly not cryptographically secure. If we had a cryptographically secure source of entropy, we'd just enable USE(OS_RANDOMNESS) and use cryptographicallyRandomNumber() anyway.
In any case, we might as well XOR in 64 bits from rand(). It's not going to hurt anything, and it will probably help in the case of a 32-bit system without ASLR.
As for the self-initialization, that's a neat trick. One risk of your suggestion is that it creates a race condition where two threads might both think it's their job to init the RNG. If we're worried about that, we can make sure to call WTF::randomNumber() once in RandomNumberSeed.h.
(In reply to comment #9) > > static uint64_t state = (reinterpret_cast<uintptr_t>(&state) >> 5) ^ time(0) ^ (reinterpret_cast<uintptr_t>(&state) >> 13); > > How many bits of entropy does that produce? What about on systems that don't have ASLR enabled? In the absence of ASLR this degenerates (essentially) to time(0), which is fairly crappy, but arguably still superior to rand() in that your seed isn't always the same. In the absence of ASLR different builds and distributions will have different values. ASLR bits are relatively low entropy (low 16-20 bits are fairly constant iirc) (In reply to comment #11) > > // Don't use rand() as your seed here, it's been demonstrated that you can rollback to find the initial seed > > I'm not sure we need to worry about that too much. This RNG is explicitly not cryptographically secure. If we had a cryptographically secure source of entropy, we'd just enable USE(OS_RANDOMNESS) and use cryptographicallyRandomNumber() anyway. > > In any case, we might as well XOR in 64 bits from rand(). It's not going to hurt anything, and it will probably help in the case of a 32-bit system without ASLR. > > As for the self-initialization, that's a neat trick. One risk of your suggestion is that it creates a race condition where two threads might both think it's their job to init the RNG. If we're worried about that, we can make sure to call WTF::randomNumber() once in RandomNumberSeed.h. I believe function scoped globals are guaranteed to be initialized only once, but we're not too worried about that, but your solution would work if not. rand() produces a constant deterministic value across multiple runs, initializing with the result of rand() is logically equivalent to initializing with a constant. Deterministic seeds have been attacked before for user tracking (you rollback the RNG until you get a value that is logically feasible that the exposed random number is feasibly a start point and track that. BTW, my understanding is that this code is only used on platforms which do not have "OS_RANDOMNESS" defined: http://trac.webkit.org/browser/trunk/Source/WTF/wtf/Platform.h#L642 http://trac.webkit.org/browser/trunk/Source/WTF/wtf/Platform.h#L701 Which may be only Blackberry and WinCE, possibly EFL? This patch is for the non-cryptographically random code-path and is mostly about deleting code. :) > I believe function scoped globals are guaranteed to be initialized only once, but we're not too worried about that, but your solution would work if not. I don't think that's correct: http://blogs.msdn.com/b/oldnewthing/archive/2004/03/08/85901.aspx > rand() produces a constant deterministic value across multiple runs, initializing with the result of rand() is logically equivalent to initializing with a constant. That depends on whether and how the code using WebKit has called srand(). Given that the configurations that use this code were already using rand() for their seed entropy, we should probably continue to use XOR in bits from rand() in the seed. > Deterministic seeds have been attacked before for user tracking No one is advocating a deterministic seed. Created attachment 192138 [details]
Patch
Since the thread-safety of a static initializer is unclear, I choose to keep the existing explicit initialization (we already had a place for it to be called anyway). Also, given that the old code paths used rand() for initialization, I feel okay leaving that in as well. Perhaps the correct solution is to get some entropy from time(), ASLR, and rand(), and mix it all together, but I feel pretty silly doing all that for a random number generator that doesn't need cryptographic strength. If someone wants to fix this in a later patch, go for it, but we're not making things worse by leaving it in. In case anyone wants to verify that it passes solid statistical tests, see http://www.cacert.at/cgi-bin/rngresults and look for xpxso (x plus x squared or). Comment on attachment 192138 [details] Patch Clearing flags on attachment: 192138 Committed r145179: <http://trac.webkit.org/changeset/145179> All reviewed patches have been landed. Closing bug. Andrew, I am wondering who is using the !USE(OS_RANDOMNESS) path (and how did you tested this code path in the patch). As far as I can tell USE(OS_RANDOMNESS) guard is enabled for all ports on trunk. Perhaps we can continue the discussion at bug 108095 . |