190814 – HashMap should support selecting a random entry

RESOLVED FIXED 190814

HashMap should support selecting a random entry

https://bugs.webkit.org/show_bug.cgi?id=190814

Summary HashMap should support selecting a random entry

Geoffrey Garen

Reported 2018-10-22 16:53:41 PDT

HashMap should support selecting a random entry

Attachments
Patch (7.92 KB, patch) 2018-10-22 17:17 PDT, Geoffrey Garen	no flags	Details Formatted Diff Diff
Patch (7.92 KB, patch) 2018-10-22 17:41 PDT, Geoffrey Garen	no flags	Details Formatted Diff Diff
Archive of layout-test-results from ews101 for mac-sierra (2.50 MB, application/zip) 2018-10-22 18:57 PDT, EWS Watchlist	no flags	Details
Patch (10.35 KB, patch) 2018-10-23 21:38 PDT, Geoffrey Garen	no flags	Details Formatted Diff Diff
Patch (3.05 KB, patch) 2018-10-25 22:31 PDT, Geoffrey Garen	no flags	Details Formatted Diff Diff
Patch (4.21 KB, patch) 2018-10-27 19:00 PDT, Geoffrey Garen	no flags	Details Formatted Diff Diff
Show Obsolete (4) View All Add attachment proposed patch, testcase, etc.

Geoffrey Garen

Comment 1 2018-10-22 17:17:46 PDT

Created attachment 352930 [details] Patch

Geoffrey Garen

Comment 2 2018-10-22 17:41:52 PDT

Created attachment 352937 [details] Patch

EWS Watchlist

Comment 3 2018-10-22 18:57:57 PDT

Comment on attachment 352937 [details] Patch Attachment 352937 [details] did not pass mac-ews (mac): Output: https://webkit-queues.webkit.org/results/9700481 New failing tests: http/wpt/css/css-animations/start-animation-001.html

EWS Watchlist

Comment 4 2018-10-22 18:57:59 PDT

Created attachment 352943 [details] Archive of layout-test-results from ews101 for mac-sierra The attached test failures were seen while running run-webkit-tests on the mac-ews. Bot: ews101 Port: mac-sierra Platform: Mac OS X 10.12.6

Antti Koivisto

Comment 5 2018-10-22 22:53:00 PDT

Comment on attachment 352937 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=352937&action=review r=me > Source/WTF/ChangeLog:8 > + In some cases, remove(begin()) is not quite good enough as a random We also need to audit all sites that are currently using this pattern.

Antti Koivisto

Comment 6 2018-10-22 22:58:25 PDT

Comment on attachment 352937 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=352937&action=review > Source/WTF/wtf/RandomNumber.h:34 > +// Returns a cryptographically secure pseudo-random number in the range [0, 1). > WTF_EXPORT_PRIVATE double randomNumber(); > > +// Returns a cheap pseudo-random number in the range (0, UINT_MAX]. > +WTF_EXPORT_PRIVATE unsigned weakRandomNumber(); It is bit surprising that these similarly named functions return different types.

Antti Koivisto

Comment 7 2018-10-23 00:05:37 PDT

Comment on attachment 352937 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=352937&action=review > Source/WTF/wtf/HashMap.h:272 > +inline auto HashMap<T, U, V, W, X>::random() -> iterator You should also add this to to HashSet/ListHashSet for completeness.

Alexey Proskuryakov

Comment 8 2018-10-23 00:26:53 PDT

Random cache eviction is going to have interesting effects on performance testing, isn’t it?

Geoffrey Garen

Comment 9 2018-10-23 21:33:51 PDT

> We also need to audit all sites that are currently using this pattern. I'll do that in a separate patch, just in case it has surprising performance consequences.

Geoffrey Garen

Comment 10 2018-10-23 21:34:08 PDT

> > +// Returns a cheap pseudo-random number in the range (0, UINT_MAX]. > > +WTF_EXPORT_PRIVATE unsigned weakRandomNumber(); > > It is bit surprising that these similarly named functions return different > types. Renamed to weakRandomUint32().

Geoffrey Garen

Comment 11 2018-10-23 21:34:22 PDT

> You should also add this to to HashSet/ListHashSet for completeness. Haha, there turned out to be soooo many!

Geoffrey Garen

Comment 12 2018-10-23 21:38:41 PDT

Created attachment 353020 [details] Patch

Antti Koivisto

Comment 13 2018-10-24 00:42:14 PDT

Comment on attachment 353020 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=353020&action=review > Source/WTF/wtf/RandomNumber.h:31 > +// Returns a cryptographically secure pseudo-random number in the range [0, 1). > WTF_EXPORT_PRIVATE double randomNumber(); Might be nice to rename this to randomDouble() too at some point.

Geoffrey Garen

Comment 14 2018-10-25 10:11:06 PDT

Comment on attachment 353020 [details] Patch cq+

WebKit Commit Bot

Comment 15 2018-10-25 10:41:44 PDT

Comment on attachment 353020 [details] Patch Clearing flags on attachment: 353020 Committed r237419: <https://trac.webkit.org/changeset/237419>

WebKit Commit Bot

Comment 16 2018-10-25 10:41:46 PDT

All reviewed patches have been landed. Closing bug.

Radar WebKit Bug Importer

Comment 17 2018-10-25 10:42:33 PDT

<rdar://problem/45558390>

Darin Adler

Comment 18 2018-10-25 11:19:17 PDT

For small tables I am concerned that the first entry is twice as likely as the others. I suggest looping and choosing a new random number rather than mapping end to begin. This has come up in many contexts before and that has almost always been the best strategy. I also suggest looping on other high numbers beyond what divides easily. Those also result in bias toward the beginning of the table. I suppose we should construct a test case that proves the results are evenly distributed. Should not be too hard to write one.

Geoffrey Garen

Comment 19 2018-10-25 22:23:27 PDT

> For small tables I am concerned that the first entry is twice as likely as > the others. I guess there's a potential bias issue in any table where an entry has a run of many empty or deleted buckets before it. This could happen at the beginning of the table or not. Beginning of the table example: 1 2 x x x x x x Bucket #1 maps to 2, and all other buckets map to 1. 1 has 7/8 probability. Not the beginning of the table example: 1 x x x x x x 2 Bucket #0 maps to 1, and all other buckets map to 2. 2 has 7/8 probability. That said, we expect these cases to be rare on average because we expect a hash function to distribute 1 and 2 evenly in the table. > I suggest looping and choosing a new random number rather than > mapping end to begin. This has come up in many contexts before and that has > almost always been the best strategy. A downside to looping is that it makes random() O(TableSize) when the table is sparse. But I guess begin() would have been O(TableSize) too. > I also suggest looping on other high > numbers beyond what divides easily. Those also result in bias toward the > beginning of the table. I didn't get this part. Can you give an example? > I suppose we should construct a test case that proves the results are evenly distributed. Should not be too hard to write one. I'll update Random_IsRandom. I was afraid of making a flaky test, but I think the binomial distribution guarantees-ish that 600/1000 will not be flaky.

Geoffrey Garen

Comment 20 2018-10-25 22:31:17 PDT

Reopening to attach new patch.

Geoffrey Garen

Comment 21 2018-10-25 22:31:18 PDT

Created attachment 353159 [details] Patch

Alexey Proskuryakov

Comment 22 2018-10-26 09:19:37 PDT

Do you have any insight with regards to comment 8 (i.e. effects on performance testing)? I guess we can wait and see, but randomness in cache eviction seems unlikely to be good for noise in test results.

Geoffrey Garen

Comment 23 2018-10-26 11:26:21 PDT

> Do you have any insight with regards to comment 8 (i.e. effects on > performance testing)? > > I guess we can wait and see, but randomness in cache eviction seems unlikely > to be good for noise in test results. There are no known effects. Random eviction is a pre-existing behavior.

Geoffrey Garen

Comment 24 2018-10-26 11:33:38 PDT

Comment on attachment 353159 [details] Patch cq+

WebKit Commit Bot

Comment 25 2018-10-26 11:58:59 PDT

Comment on attachment 353159 [details] Patch Clearing flags on attachment: 353159 Committed r237476: <https://trac.webkit.org/changeset/237476>

WebKit Commit Bot

Comment 26 2018-10-26 11:59:01 PDT

All reviewed patches have been landed. Closing bug.

Alexey Proskuryakov

Comment 27 2018-10-26 15:25:38 PDT

> There are no known effects. What are the expected effects? > Random eviction is a pre-existing behavior. Can you elaborate? This bug literally introduces the ability to have random eviction.

Darin Adler

Comment 28 2018-10-26 19:52:24 PDT

(In reply to Geoffrey Garen from comment #19) > > I also suggest looping on other high > > numbers beyond what divides easily. Those also result in bias toward the > > beginning of the table. > > I didn't get this part. Can you give an example? I was confused about size vs. capacity. Confusing that HashTable::capacity is backed by HashTable::m_tableSize and HashTable::size is backed by HashTable::m_keyCount. Since m_tableSize is always a power of two then there is "number that does not divide easily". And the thing you did about looping when we encounter an empty or deleted bucket is essentially what I was looking for here. > > I suppose we should construct a test case that proves the results are evenly distributed. Should not be too hard to write one. > > I'll update Random_IsRandom. I was afraid of making a flaky test, but I > think the binomial distribution guarantees-ish that 600/1000 will not be > flaky. That’s great. The patch you landed was exactly what I was looking for. For testing, I think we should go at least a little bit further. In particular, I’d like to consider adding at least one test that would definitely have failed with the old algorithm; not sure the existing "hash table of size 2" one would have failed. Maybe repeating a similar test for at least one hash table size other than 2 will do the trick. Or some sequence of operations that intentionally makes hash tables with different "densities". And with no deletion, we are definitely not testing correct handling of deleted buckets.

Darin Adler

Comment 29 2018-10-26 19:54:36 PDT

(In reply to Darin Adler from comment #28) > there is "number that does not divide easily" there is *no* "number that does not divide easily [sic]" (meant evenly)

Darin Adler

Comment 30 2018-10-26 19:56:46 PDT

(In reply to Geoffrey Garen from comment #19) > I'll update Random_IsRandom. This test is misnamed. Is should be named something more like Random_IsEvenlyDistributed. It doesn’t check for randomness. I believe this checks for one of the desirable properties of pseudorandom numbers that truly random numbers won’t necessarily have. Although I suppose that perhaps truly random numbers also have it to some high probability.

Darin Adler

Comment 31 2018-10-26 20:14:49 PDT

Comment on attachment 353159 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=353159&action=review > Source/WTF/wtf/HashTable.h:392 > + ValueType* entry = m_table + (weakRandomUint32() & m_tableSizeMask); > + if (isEmptyBucket(*entry) || isDeletedBucket(*entry)) > + continue; > + return makeKnownGoodIterator(entry); I’m in a style nitpick mood so I wanted to share how I would have written this block: auto& bucket = m_table[weakRandomUint32() & m_tableSizeMask]; if (isEmptyOrDeletedBucket(bucket)) continue; return makeKnownGoodIterator(&bucket);

Darin Adler

Comment 32 2018-10-26 20:15:40 PDT

Comment on attachment 353159 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=353159&action=review >> Source/WTF/wtf/HashTable.h:392 >> + return makeKnownGoodIterator(entry); > > I’m in a style nitpick mood so I wanted to share how I would have written this block: > > auto& bucket = m_table[weakRandomUint32() & m_tableSizeMask]; > if (isEmptyOrDeletedBucket(bucket)) > continue; > return makeKnownGoodIterator(&bucket); No, that’s not quite right. I would have written this: auto& bucket = m_table[weakRandomUint32() & m_tableSizeMask]; if (!isEmptyOrDeletedBucket(bucket)) return makeKnownGoodIterator(&bucket);

Geoffrey Garen

Comment 33 2018-10-27 17:29:08 PDT

> > There are no known effects. > > What are the expected effects? That's a broad question. I don't know how to give it any specific answer. > > Random eviction is a pre-existing behavior. > > Can you elaborate? This bug literally introduces the ability to have random > eviction. Our existing idiom for random eviction is remove(begin()). For most tables most of the time, remove(begin()) is mostly random.

Geoffrey Garen

Comment 34 2018-10-27 18:01:12 PDT

> For testing, I think we should go at least a little bit further. In > particular, I’d like to consider adding at least one test that would > definitely have failed with the old algorithm; not sure the existing "hash > table of size 2" one would have failed. Random_IsRandom [sic] did fail with the old implementation of random(). > Maybe repeating a similar test for > at least one hash table size other than 2 will do the trick. I'm worried that larger tables will require lots more iterations to guarantee that flukes don't cause test failures. I consulted an online calculator for the binomial distribution to prove that a coin flip will never-ish land heads more than 600/1000 times. I'm not quite sure how to calculate the same result across a range of numbers when you're flipping a 100-sided or 1000-sided coin. > Or some > sequence of operations that intentionally makes hash tables with different > "densities". And with no deletion, we are definitely not testing correct > handling of deleted buckets. I'll add some tests for remove() at different densities.

Geoffrey Garen

Comment 35 2018-10-27 19:00:18 PDT

Reopening to attach new patch.

Geoffrey Garen

Comment 36 2018-10-27 19:00:19 PDT

Created attachment 353251 [details] Patch

EWS Watchlist

Comment 37 2018-10-27 19:03:55 PDT

Attachment 353251 [details] did not pass style-queue: ERROR: Tools/TestWebKitAPI/Tests/WTF/HashMap.cpp:1042: Multi line control clauses should use braces. [whitespace/braces] [4] Total errors found: 1 in 4 files If any of these errors are false positives, please file a bug against check-webkit-style.

WebKit Commit Bot

Comment 38 2018-10-28 11:19:05 PDT

Comment on attachment 353251 [details] Patch Clearing flags on attachment: 353251 Committed r237522: <https://trac.webkit.org/changeset/237522>

WebKit Commit Bot

Comment 39 2018-10-28 11:19:07 PDT

All reviewed patches have been landed. Closing bug.

Alexey Proskuryakov

Comment 40 2018-10-29 09:11:24 PDT

> Our existing idiom for random eviction is remove(begin()). For most tables most of the time, remove(begin()) is mostly random. My question was and remains about performance testing. Are you claiming that it's already "mostly random" during page load performance testing? How so? Anyway, at the end of the day, you are the person with the highest interest in stability of page load speed and memory use testing, so if this patch makes testing more challenging, you'll discover and fix that eventually.

Geoffrey Garen

Comment 41 2018-11-01 13:07:30 PDT

> > Our existing idiom for random eviction is remove(begin()). For most tables most of the time, remove(begin()) is mostly random. > > My question was and remains about performance testing. Are you claiming that > it's already "mostly random" during page load performance testing? Yes. > How so? Hashing algorithms are mostly random. Therefore, the entry with the lowest index at a given table size is mostly random. Therefore, remove(begin()) is mostly random. It would be easier to have this discussion about a specific use case for a specific cache. The general theory of random eviction is that, though individual eviction decisions are random, overall observed behavior is stable, for the same reason that chaotic electrical impulses in the heart can produce a stable sinus rhythm.

Alexey Proskuryakov

Comment 42 2018-11-02 10:11:27 PDT

Sounds like you do not have insight into why the general theory would help when running performance tests. When running tests, we do the same things in the same order, so I don't expect remove(begin()) to be "mostly random". As I said, it is OK to look into this when/if test result stability impacts your productivity. > It would be easier to have this discussion about a specific use case for a specific cache. That would be every case where the new functionality is used. There was no discussion in bug 190957, where it was applied in several places.

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version WebKit Nightly Build

Hardware Unspecified

OS Unspecified

Product WebKit

Component New Bugs

Assignee

Geoffrey Garen

Reported

2018-10-22 16:53 PDT

Modified

2018-11-02 10:11 PDT History

CC List

13 users Show

URL

Keywords InRadar

Depends on

Blocks