RESOLVED FIXED 190814
HashMap should support selecting a random entry
https://bugs.webkit.org/show_bug.cgi?id=190814
Summary HashMap should support selecting a random entry
Geoffrey Garen
Reported 2018-10-22 16:53:41 PDT
HashMap should support selecting a random entry
Attachments
Patch (7.92 KB, patch)
2018-10-22 17:17 PDT, Geoffrey Garen
no flags
Patch (7.92 KB, patch)
2018-10-22 17:41 PDT, Geoffrey Garen
no flags
Archive of layout-test-results from ews101 for mac-sierra (2.50 MB, application/zip)
2018-10-22 18:57 PDT, EWS Watchlist
no flags
Patch (10.35 KB, patch)
2018-10-23 21:38 PDT, Geoffrey Garen
no flags
Patch (3.05 KB, patch)
2018-10-25 22:31 PDT, Geoffrey Garen
no flags
Patch (4.21 KB, patch)
2018-10-27 19:00 PDT, Geoffrey Garen
no flags
Geoffrey Garen
Comment 1 2018-10-22 17:17:46 PDT
Geoffrey Garen
Comment 2 2018-10-22 17:41:52 PDT
EWS Watchlist
Comment 3 2018-10-22 18:57:57 PDT
Comment on attachment 352937 [details] Patch Attachment 352937 [details] did not pass mac-ews (mac): Output: https://webkit-queues.webkit.org/results/9700481 New failing tests: http/wpt/css/css-animations/start-animation-001.html
EWS Watchlist
Comment 4 2018-10-22 18:57:59 PDT
Created attachment 352943 [details] Archive of layout-test-results from ews101 for mac-sierra The attached test failures were seen while running run-webkit-tests on the mac-ews. Bot: ews101 Port: mac-sierra Platform: Mac OS X 10.12.6
Antti Koivisto
Comment 5 2018-10-22 22:53:00 PDT
Comment on attachment 352937 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=352937&action=review r=me > Source/WTF/ChangeLog:8 > + In some cases, remove(begin()) is not quite good enough as a random We also need to audit all sites that are currently using this pattern.
Antti Koivisto
Comment 6 2018-10-22 22:58:25 PDT
Comment on attachment 352937 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=352937&action=review > Source/WTF/wtf/RandomNumber.h:34 > +// Returns a cryptographically secure pseudo-random number in the range [0, 1). > WTF_EXPORT_PRIVATE double randomNumber(); > > +// Returns a cheap pseudo-random number in the range (0, UINT_MAX]. > +WTF_EXPORT_PRIVATE unsigned weakRandomNumber(); It is bit surprising that these similarly named functions return different types.
Antti Koivisto
Comment 7 2018-10-23 00:05:37 PDT
Comment on attachment 352937 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=352937&action=review > Source/WTF/wtf/HashMap.h:272 > +inline auto HashMap<T, U, V, W, X>::random() -> iterator You should also add this to to HashSet/ListHashSet for completeness.
Alexey Proskuryakov
Comment 8 2018-10-23 00:26:53 PDT
Random cache eviction is going to have interesting effects on performance testing, isn’t it?
Geoffrey Garen
Comment 9 2018-10-23 21:33:51 PDT
> We also need to audit all sites that are currently using this pattern. I'll do that in a separate patch, just in case it has surprising performance consequences.
Geoffrey Garen
Comment 10 2018-10-23 21:34:08 PDT
> > +// Returns a cheap pseudo-random number in the range (0, UINT_MAX]. > > +WTF_EXPORT_PRIVATE unsigned weakRandomNumber(); > > It is bit surprising that these similarly named functions return different > types. Renamed to weakRandomUint32().
Geoffrey Garen
Comment 11 2018-10-23 21:34:22 PDT
> You should also add this to to HashSet/ListHashSet for completeness. Haha, there turned out to be soooo many!
Geoffrey Garen
Comment 12 2018-10-23 21:38:41 PDT
Antti Koivisto
Comment 13 2018-10-24 00:42:14 PDT
Comment on attachment 353020 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=353020&action=review > Source/WTF/wtf/RandomNumber.h:31 > +// Returns a cryptographically secure pseudo-random number in the range [0, 1). > WTF_EXPORT_PRIVATE double randomNumber(); Might be nice to rename this to randomDouble() too at some point.
Geoffrey Garen
Comment 14 2018-10-25 10:11:06 PDT
Comment on attachment 353020 [details] Patch cq+
WebKit Commit Bot
Comment 15 2018-10-25 10:41:44 PDT
Comment on attachment 353020 [details] Patch Clearing flags on attachment: 353020 Committed r237419: <https://trac.webkit.org/changeset/237419>
WebKit Commit Bot
Comment 16 2018-10-25 10:41:46 PDT
All reviewed patches have been landed. Closing bug.
Radar WebKit Bug Importer
Comment 17 2018-10-25 10:42:33 PDT
Darin Adler
Comment 18 2018-10-25 11:19:17 PDT
For small tables I am concerned that the first entry is twice as likely as the others. I suggest looping and choosing a new random number rather than mapping end to begin. This has come up in many contexts before and that has almost always been the best strategy. I also suggest looping on other high numbers beyond what divides easily. Those also result in bias toward the beginning of the table. I suppose we should construct a test case that proves the results are evenly distributed. Should not be too hard to write one.
Geoffrey Garen
Comment 19 2018-10-25 22:23:27 PDT
> For small tables I am concerned that the first entry is twice as likely as > the others. I guess there's a potential bias issue in any table where an entry has a run of many empty or deleted buckets before it. This could happen at the beginning of the table or not. Beginning of the table example: 1 2 x x x x x x Bucket #1 maps to 2, and all other buckets map to 1. 1 has 7/8 probability. Not the beginning of the table example: 1 x x x x x x 2 Bucket #0 maps to 1, and all other buckets map to 2. 2 has 7/8 probability. That said, we expect these cases to be rare on average because we expect a hash function to distribute 1 and 2 evenly in the table. > I suggest looping and choosing a new random number rather than > mapping end to begin. This has come up in many contexts before and that has > almost always been the best strategy. A downside to looping is that it makes random() O(TableSize) when the table is sparse. But I guess begin() would have been O(TableSize) too. > I also suggest looping on other high > numbers beyond what divides easily. Those also result in bias toward the > beginning of the table. I didn't get this part. Can you give an example? > I suppose we should construct a test case that proves the results are evenly distributed. Should not be too hard to write one. I'll update Random_IsRandom. I was afraid of making a flaky test, but I think the binomial distribution guarantees-ish that 600/1000 will not be flaky.
Geoffrey Garen
Comment 20 2018-10-25 22:31:17 PDT
Reopening to attach new patch.
Geoffrey Garen
Comment 21 2018-10-25 22:31:18 PDT
Alexey Proskuryakov
Comment 22 2018-10-26 09:19:37 PDT
Do you have any insight with regards to comment 8 (i.e. effects on performance testing)? I guess we can wait and see, but randomness in cache eviction seems unlikely to be good for noise in test results.
Geoffrey Garen
Comment 23 2018-10-26 11:26:21 PDT
> Do you have any insight with regards to comment 8 (i.e. effects on > performance testing)? > > I guess we can wait and see, but randomness in cache eviction seems unlikely > to be good for noise in test results. There are no known effects. Random eviction is a pre-existing behavior.
Geoffrey Garen
Comment 24 2018-10-26 11:33:38 PDT
Comment on attachment 353159 [details] Patch cq+
WebKit Commit Bot
Comment 25 2018-10-26 11:58:59 PDT
Comment on attachment 353159 [details] Patch Clearing flags on attachment: 353159 Committed r237476: <https://trac.webkit.org/changeset/237476>
WebKit Commit Bot
Comment 26 2018-10-26 11:59:01 PDT
All reviewed patches have been landed. Closing bug.
Alexey Proskuryakov
Comment 27 2018-10-26 15:25:38 PDT
> There are no known effects. What are the expected effects? > Random eviction is a pre-existing behavior. Can you elaborate? This bug literally introduces the ability to have random eviction.
Darin Adler
Comment 28 2018-10-26 19:52:24 PDT
(In reply to Geoffrey Garen from comment #19) > > I also suggest looping on other high > > numbers beyond what divides easily. Those also result in bias toward the > > beginning of the table. > > I didn't get this part. Can you give an example? I was confused about size vs. capacity. Confusing that HashTable::capacity is backed by HashTable::m_tableSize and HashTable::size is backed by HashTable::m_keyCount. Since m_tableSize is always a power of two then there is "number that does not divide easily". And the thing you did about looping when we encounter an empty or deleted bucket is essentially what I was looking for here. > > I suppose we should construct a test case that proves the results are evenly distributed. Should not be too hard to write one. > > I'll update Random_IsRandom. I was afraid of making a flaky test, but I > think the binomial distribution guarantees-ish that 600/1000 will not be > flaky. That’s great. The patch you landed was exactly what I was looking for. For testing, I think we should go at least a little bit further. In particular, I’d like to consider adding at least one test that would definitely have failed with the old algorithm; not sure the existing "hash table of size 2" one would have failed. Maybe repeating a similar test for at least one hash table size other than 2 will do the trick. Or some sequence of operations that intentionally makes hash tables with different "densities". And with no deletion, we are definitely not testing correct handling of deleted buckets.
Darin Adler
Comment 29 2018-10-26 19:54:36 PDT
(In reply to Darin Adler from comment #28) > there is "number that does not divide easily" there is *no* "number that does not divide easily [sic]" (meant evenly)
Darin Adler
Comment 30 2018-10-26 19:56:46 PDT
(In reply to Geoffrey Garen from comment #19) > I'll update Random_IsRandom. This test is misnamed. Is should be named something more like Random_IsEvenlyDistributed. It doesn’t check for randomness. I believe this checks for one of the desirable properties of pseudorandom numbers that truly random numbers won’t necessarily have. Although I suppose that perhaps truly random numbers also have it to some high probability.
Darin Adler
Comment 31 2018-10-26 20:14:49 PDT
Comment on attachment 353159 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=353159&action=review > Source/WTF/wtf/HashTable.h:392 > + ValueType* entry = m_table + (weakRandomUint32() & m_tableSizeMask); > + if (isEmptyBucket(*entry) || isDeletedBucket(*entry)) > + continue; > + return makeKnownGoodIterator(entry); I’m in a style nitpick mood so I wanted to share how I would have written this block: auto& bucket = m_table[weakRandomUint32() & m_tableSizeMask]; if (isEmptyOrDeletedBucket(bucket)) continue; return makeKnownGoodIterator(&bucket);
Darin Adler
Comment 32 2018-10-26 20:15:40 PDT
Comment on attachment 353159 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=353159&action=review >> Source/WTF/wtf/HashTable.h:392 >> + return makeKnownGoodIterator(entry); > > I’m in a style nitpick mood so I wanted to share how I would have written this block: > > auto& bucket = m_table[weakRandomUint32() & m_tableSizeMask]; > if (isEmptyOrDeletedBucket(bucket)) > continue; > return makeKnownGoodIterator(&bucket); No, that’s not quite right. I would have written this: auto& bucket = m_table[weakRandomUint32() & m_tableSizeMask]; if (!isEmptyOrDeletedBucket(bucket)) return makeKnownGoodIterator(&bucket);
Geoffrey Garen
Comment 33 2018-10-27 17:29:08 PDT
> > There are no known effects. > > What are the expected effects? That's a broad question. I don't know how to give it any specific answer. > > Random eviction is a pre-existing behavior. > > Can you elaborate? This bug literally introduces the ability to have random > eviction. Our existing idiom for random eviction is remove(begin()). For most tables most of the time, remove(begin()) is mostly random.
Geoffrey Garen
Comment 34 2018-10-27 18:01:12 PDT
> For testing, I think we should go at least a little bit further. In > particular, I’d like to consider adding at least one test that would > definitely have failed with the old algorithm; not sure the existing "hash > table of size 2" one would have failed. Random_IsRandom [sic] did fail with the old implementation of random(). > Maybe repeating a similar test for > at least one hash table size other than 2 will do the trick. I'm worried that larger tables will require lots more iterations to guarantee that flukes don't cause test failures. I consulted an online calculator for the binomial distribution to prove that a coin flip will never-ish land heads more than 600/1000 times. I'm not quite sure how to calculate the same result across a range of numbers when you're flipping a 100-sided or 1000-sided coin. > Or some > sequence of operations that intentionally makes hash tables with different > "densities". And with no deletion, we are definitely not testing correct > handling of deleted buckets. I'll add some tests for remove() at different densities.
Geoffrey Garen
Comment 35 2018-10-27 19:00:18 PDT
Reopening to attach new patch.
Geoffrey Garen
Comment 36 2018-10-27 19:00:19 PDT
EWS Watchlist
Comment 37 2018-10-27 19:03:55 PDT
Attachment 353251 [details] did not pass style-queue: ERROR: Tools/TestWebKitAPI/Tests/WTF/HashMap.cpp:1042: Multi line control clauses should use braces. [whitespace/braces] [4] Total errors found: 1 in 4 files If any of these errors are false positives, please file a bug against check-webkit-style.
WebKit Commit Bot
Comment 38 2018-10-28 11:19:05 PDT
Comment on attachment 353251 [details] Patch Clearing flags on attachment: 353251 Committed r237522: <https://trac.webkit.org/changeset/237522>
WebKit Commit Bot
Comment 39 2018-10-28 11:19:07 PDT
All reviewed patches have been landed. Closing bug.
Alexey Proskuryakov
Comment 40 2018-10-29 09:11:24 PDT
> Our existing idiom for random eviction is remove(begin()). For most tables most of the time, remove(begin()) is mostly random. My question was and remains about performance testing. Are you claiming that it's already "mostly random" during page load performance testing? How so? Anyway, at the end of the day, you are the person with the highest interest in stability of page load speed and memory use testing, so if this patch makes testing more challenging, you'll discover and fix that eventually.
Geoffrey Garen
Comment 41 2018-11-01 13:07:30 PDT
> > Our existing idiom for random eviction is remove(begin()). For most tables most of the time, remove(begin()) is mostly random. > > My question was and remains about performance testing. Are you claiming that > it's already "mostly random" during page load performance testing? Yes. > How so? Hashing algorithms are mostly random. Therefore, the entry with the lowest index at a given table size is mostly random. Therefore, remove(begin()) is mostly random. It would be easier to have this discussion about a specific use case for a specific cache. The general theory of random eviction is that, though individual eviction decisions are random, overall observed behavior is stable, for the same reason that chaotic electrical impulses in the heart can produce a stable sinus rhythm.
Alexey Proskuryakov
Comment 42 2018-11-02 10:11:27 PDT
Sounds like you do not have insight into why the general theory would help when running performance tests. When running tests, we do the same things in the same order, so I don't expect remove(begin()) to be "mostly random". As I said, it is OK to look into this when/if test result stability impacts your productivity. > It would be easier to have this discussion about a specific use case for a specific cache. That would be every case where the new functionality is used. There was no discussion in bug 190957, where it was applied in several places.
Note You need to log in before you can comment on or make changes to this bug.