Bug 207183

Summary: [WTF] Try using 75% load factor for HashTable
Product: WebKit Reporter: Yusuke Suzuki <ysuzuki>
Component: Web Template FrameworkAssignee: Yusuke Suzuki <ysuzuki>
Status: RESOLVED FIXED    
Severity: Normal CC: achristensen, benjamin, cdumez, cmarcelo, darin, dbates, ews-watchlist, fpizlo, mark.lam, tsavell, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
See Also: https://bugs.webkit.org/show_bug.cgi?id=207382
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Data
none
Patch
none
Patch mark.lam: review+

Description Yusuke Suzuki 2020-02-03 22:22:30 PST
We are currently using 50% load-factor, which super aggressively allocates memory, and keeps them alive.
I wonder if we can use 75% load-factor instead.

1. Abseil SwissTable is using 75%[1]. Since rust's HashMap is derived from this, it is also using 75%[2].
2. Oracle JDK's HashMap's load-factor is 75%[3].
3. LLVM ADT DenseMap's load-factor is 75%[4].
4. Mozilla's HashTable's load-factor is 75%[5]

[1]: https://github.com/abseil/abseil-cpp
[2]: https://github.com/rust-lang/rust/blob/master/src/libstd/collections/hash/map.rs
[3]: https://docs.oracle.com/javase/8/docs/api/java/util/HashMap.html
[4]: https://github.com/llvm/llvm-project/blob/master/llvm/include/llvm/ADT/DenseMap.h#L539-L550
[5]: https://github.com/mozilla/gecko-dev/blob/master/mfbt/HashTable.h#L1568-L1572
Comment 1 Yusuke Suzuki 2020-02-03 22:28:13 PST
Created attachment 389630 [details]
Patch
Comment 2 Yusuke Suzuki 2020-02-03 22:37:06 PST
Created attachment 389632 [details]
Patch
Comment 3 Yusuke Suzuki 2020-02-04 03:08:30 PST
We need to keep power-of-two size since this is important for double-hashing.
Double-hashing requires that h2 hash value is relatively prime to table size, and we are achieving this by, (1) choosing odd number for h2 and (2) using power-of-two table-size.
Comment 4 Yusuke Suzuki 2020-02-04 23:08:45 PST
Created attachment 389775 [details]
Patch
Comment 5 Yusuke Suzuki 2020-02-05 00:10:21 PST
Created attachment 389777 [details]
Patch
Comment 6 Yusuke Suzuki 2020-02-05 00:31:55 PST
Created attachment 389779 [details]
Patch
Comment 7 Yusuke Suzuki 2020-02-05 00:33:20 PST
Created attachment 389781 [details]
Patch
Comment 8 Yusuke Suzuki 2020-02-05 00:45:09 PST
Created attachment 389782 [details]
Patch
Comment 9 Yusuke Suzuki 2020-02-05 11:54:21 PST
Created attachment 389839 [details]
Patch
Comment 10 Yusuke Suzuki 2020-02-05 12:21:14 PST
(In reply to Yusuke Suzuki from comment #0)
> We are currently using 50% load-factor, which super aggressively allocates
> memory, and keeps them alive.
> I wonder if we can use 75% load-factor instead.
> 
> 1. Abseil SwissTable is using 75%[1]. Since rust's HashMap is derived from
> this, it is also using 75%[2].
> 2. Oracle JDK's HashMap's load-factor is 75%[3].
> 3. LLVM ADT DenseMap's load-factor is 75%[4].
> 4. Mozilla's HashTable's load-factor is 75%[5]
> 
> [1]: https://github.com/abseil/abseil-cpp
> [2]:
> https://github.com/rust-lang/rust/blob/master/src/libstd/collections/hash/
> map.rs
> [3]: https://docs.oracle.com/javase/8/docs/api/java/util/HashMap.html
> [4]:
> https://github.com/llvm/llvm-project/blob/master/llvm/include/llvm/ADT/
> DenseMap.h#L539-L550
> [5]:
> https://github.com/mozilla/gecko-dev/blob/master/mfbt/HashTable.h#L1568-L1572

Note that v8's hashmap load-factor is 80%.
Comment 11 Yusuke Suzuki 2020-02-05 16:35:10 PST
Created attachment 389898 [details]
Patch
Comment 12 Mark Lam 2020-02-05 17:31:31 PST
Comment on attachment 389898 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=389898&action=review

r=me

> Source/WTF/wtf/HashTable.h:492
> +            return keyAndDeleteCount * maxLoadDenominator >= tableSize * maxLoadNumerator;

As we discussed offline, let's use 64-bit math here to ensure that there's no overflow issue.

> Source/WTF/wtf/HashTable.h:568
> +        constexpr unsigned maxLoadNumerator = 3;
> +        constexpr unsigned maxLoadDenominator = 4;

Can you put the HashTable's constexprs in a HashTableBase class, make them public, and just use those here?

Alternatively, instead of making those fields public, make this capacityForSize() function a private utility method in HashTableCapacityForSize, and make template<unsigned> struct HashTableCapacityForSize, a friend of HashTableBase (since capacityForSize() is only used by HashTableCapacityForSize).

> Source/WTF/wtf/HashTable.h:569
> +        if (size == 0)

WebKit style: use !size instead of size == 0 (though I personally prefer size == 0).

> Source/WTF/wtf/HashTable.h:571
> +        unsigned candidate = roundUpToPowerOfTwo(size);

Let's call this capacity instead of candidate.

> Source/WTF/wtf/HashTable.h:572
> +        if (size * maxLoadDenominator >= candidate * maxLoadNumerator)

Alternatively, can you put the constexpr shouldExpand in HashTableBase and just use HashTableBase::shouldExpand(size, capacity) here.

> Source/WTF/wtf/HashTable.h:1231
> +        // If we are getting halfway between 11/24 and 3/4 (we pick 15/24 == 5/8), we double the size to avoid being too close to

I suggest rephrasing "(we pick 15/24 == 5/8)" as "(i.e. past 14.5/24, which we'll round up to 15/24 i.e. 5/8)"

> Source/WTF/wtf/HashTable.h:1232
> +        // loadMax and bring the ratio close to 11/24. This give us a load in the bounds [9/24, 15/24).

I haven't figured out how you arrived at the 9/24 number, but the reasoning for choosing 5/8 is sound to me.

> Source/WTF/wtf/HashTable.h:1233
> +        bool aboveThreeQuarterLoad = keyCount * 8 >= bestTableSize * 5;

The name aboveThreeQuarterLoad sounds wrong given the math.  Can you use a better name?

> Tools/TestWebKitAPI/Tests/WTF/HashSet.cpp:58
>      // Adding items up to less than half the capacity should not change the capacity.

This comment is stale.  Please update.
Comment 13 Yusuke Suzuki 2020-02-05 18:14:48 PST
Comment on attachment 389898 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=389898&action=review

>> Source/WTF/wtf/HashTable.h:492
>> +            return keyAndDeleteCount * maxLoadDenominator >= tableSize * maxLoadNumerator;
> 
> As we discussed offline, let's use 64-bit math here to ensure that there's no overflow issue.

Fixed.

>> Source/WTF/wtf/HashTable.h:568
>> +        constexpr unsigned maxLoadDenominator = 4;
> 
> Can you put the HashTable's constexprs in a HashTableBase class, make them public, and just use those here?
> 
> Alternatively, instead of making those fields public, make this capacityForSize() function a private utility method in HashTableCapacityForSize, and make template<unsigned> struct HashTableCapacityForSize, a friend of HashTableBase (since capacityForSize() is only used by HashTableCapacityForSize).

I don't think adding HashTableBase and moving them are clean just to share these two variables: this makes HashTable complicated while the benefit is small.
But we cannot use them from HashTableCapacityForSize since we do not have a way to instantiate HashTable's template.
For now, I would like to put `capacityForSize` simply in HashTableCapacityForSize and duplicate maxLoadNumerator and maxLoadDenominator here.
We should refresh HashTableCapacityForSize's implementation (like, merging it into HashTable?), or we should remove HashTableCapacityForSize simply in the future.

>> Source/WTF/wtf/HashTable.h:569
>> +        if (size == 0)
> 
> WebKit style: use !size instead of size == 0 (though I personally prefer size == 0).

Fixed.

>> Source/WTF/wtf/HashTable.h:571
>> +        unsigned candidate = roundUpToPowerOfTwo(size);
> 
> Let's call this capacity instead of candidate.

Fixed.

>> Source/WTF/wtf/HashTable.h:572
>> +        if (size * maxLoadDenominator >= candidate * maxLoadNumerator)
> 
> Alternatively, can you put the constexpr shouldExpand in HashTableBase and just use HashTableBase::shouldExpand(size, capacity) here.

For now, I just put capacityForSize into HashTableCapacityForSize, and having duplicate implementation of shouldExpand here instead of introducing HashTableBase since it is too complicated while the benefit is small.
We should remove HashTableCapacityForSize in the future.

>> Source/WTF/wtf/HashTable.h:1231
>> +        // If we are getting halfway between 11/24 and 3/4 (we pick 15/24 == 5/8), we double the size to avoid being too close to
> 
> I suggest rephrasing "(we pick 15/24 == 5/8)" as "(i.e. past 14.5/24, which we'll round up to 15/24 i.e. 5/8)"

Changed.

>> Source/WTF/wtf/HashTable.h:1232
>> +        // loadMax and bring the ratio close to 11/24. This give us a load in the bounds [9/24, 15/24).
> 
> I haven't figured out how you arrived at the 9/24 number, but the reasoning for choosing 5/8 is sound to me.

9/24 is the ratio just after expansion. We expand when we reach to 3/4. And after expansion, we get 3/8 == 9/24.
This is the same to the original 3/12 calculation (previously we expand when we reach to 1/2. And we expand it to 1/4 == 3/12).

>> Source/WTF/wtf/HashTable.h:1233
>> +        bool aboveThreeQuarterLoad = keyCount * 8 >= bestTableSize * 5;
> 
> The name aboveThreeQuarterLoad sounds wrong given the math.  Can you use a better name?

Fixed.

>> Tools/TestWebKitAPI/Tests/WTF/HashSet.cpp:58
>>      // Adding items up to less than half the capacity should not change the capacity.
> 
> This comment is stale.  Please update.

Fixed.
Comment 14 Yusuke Suzuki 2020-02-05 18:37:43 PST
Committed r255889: <https://trac.webkit.org/changeset/255889>
Comment 15 Radar WebKit Bug Importer 2020-02-05 18:38:12 PST
<rdar://problem/59210837>
Comment 16 Darin Adler 2020-02-05 20:10:10 PST
We should fix that aggregate-sorted-data-no-storage-access test to not depend on hash table ordering. The simplest way to do that would be to change ResourceLoadStatisticsDatabaseStore::dumpResourceLoadStatistics to make a vector of the domains and sort the vector before dumping them. Or we could do more research to find out exactly how hash table order ends up influencing what's inserted into the database, which might be a more complete fix because it isn't great that hash table order influences what gets written into the database either.
Comment 17 Yusuke Suzuki 2020-02-06 12:53:07 PST
(In reply to Darin Adler from comment #16)
> We should fix that aggregate-sorted-data-no-storage-access test to not
> depend on hash table ordering. The simplest way to do that would be to
> change ResourceLoadStatisticsDatabaseStore::dumpResourceLoadStatistics to
> make a vector of the domains and sort the vector before dumping them. Or we
> could do more research to find out exactly how hash table order ends up
> influencing what's inserted into the database, which might be a more
> complete fix because it isn't great that hash table order influences what
> gets written into the database either.

I've just filed it in https://bugs.webkit.org/show_bug.cgi?id=207348 and attached the patch :)
Comment 18 Yusuke Suzuki 2020-02-07 00:56:05 PST
Committed r256011: <https://trac.webkit.org/changeset/256011>
Comment 19 Yusuke Suzuki 2020-02-07 02:56:54 PST
Rolled out. It offered large improvement in Membuster. But it also brings JetStream2/string-unpack-code-SP regression (6-10%).
I'm now looking into this.

Several notes.

1. Accelerating String comparison w/ hash code comparison does not help
2. Regression does not happen when only running this test

My (super) rough guess is that, probe length increases slightly. Comparison itself is cheap, it does not matter. But multiple probes mean that we need to access several slots of hash table. And if table gets large, cache miss rate increase. It’s possible that probe miss cost increases w/ table size. If this is true, we should pick different load factor for larger table size.

AtomStringTable size in JetStream2/string-unpack-code-SP is 30109 - 31703 when running this as a part of whole JetStream2 suite. On the other hand, it becomes 1731 - 3696 when we just run this test.
Comment 20 Truitt Savell 2020-02-07 08:23:50 PST
THe revert in https://trac.webkit.org/changeset/256011/webkit

broke http/tests/resourceLoadStatistics/aggregate-sorted-data-no-storage-access.html

tracking in https://bugs.webkit.org/show_bug.cgi?id=207382
Comment 21 Alex Christensen 2020-02-07 10:40:32 PST
Yusuke, try this in combination with https://bugs.webkit.org/show_bug.cgi?id=207251
Comment 22 Yusuke Suzuki 2020-02-07 19:41:00 PST
Created attachment 390161 [details]
Data

Attached the microbenchmark data. I think we should use 75% load-factor.
I think using this load-factor <= 1024 tables is a good tradeoff.
Comment 23 Yusuke Suzuki 2020-02-07 20:12:59 PST
Created attachment 390164 [details]
Patch
Comment 24 Yusuke Suzuki 2020-02-07 20:18:02 PST
And for larger tables, I would like to make load-factor 75% too.
But to make it possible, we should decrease the collision rate.
I'm currently trying implementing SwissTable / F14 table for larger sizes, and then, we get lower collision rate and we could safely increase load-factor for large tables too. For small tables, I’m now sticking to the current design since SwissTable / F14 table require additional control data.
Comment 25 Yusuke Suzuki 2020-02-07 20:18:23 PST
(In reply to Alex Christensen from comment #21)
> Yusuke, try this in combination with
> https://bugs.webkit.org/show_bug.cgi?id=207251

That looks nice!
Comment 26 Mark Lam 2020-02-07 21:27:26 PST
Comment on attachment 390164 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=390164&action=review

> Source/WTF/wtf/HashTable.h:316
> +        static constexpr unsigned maxSmallTableSize = 1024;

Let's call this maxSmallTableCapacity.

> Source/WTF/wtf/HashTable.h:319
> +        static constexpr bool shouldExpand(uint64_t keyAndDeleteCount, uint64_t tableSize)

Let's call rename these: keyAndDeleteCount => size, tableSize => capacity.

> Source/WTF/wtf/HashTable.h:338
> +        COMPILE_ASSERT(size > 0, HashTableNonZeroMinimumCapacity);
> +        COMPILE_ASSERT(!static_cast<unsigned>(value >> 31), HashTableNoCapacityOverflow);

We can switch to using static_assert now.  COMPILE_ASSERT is just from the old days before we had static_assert.

> Source/WTF/wtf/HashTable.h:527
> +        bool shouldExpand() const { return HashTableSizePolicy::shouldExpand(keyCount() + deletedCount(), tableSize()); }

It's a pity that tableSize() actually means tableCapacity().  Maybe we should rename these in a refactoring patch later.

> Source/WTF/wtf/HashTable.h:1256
> +            double average = loadFactor + (1.0 / minLoad);

Also, let's call this averageLoad instead of just average.
Let's also introduce another value: maxLoad = loadFactor.  This makes it clear that we're using it as the upper bound, and not as some factor.

This equation does not match the comment above.  average here should be (maxLoad + (1.0 / minLoad)) / 2.  Otherwise, you won't get 11/24 for the 3/4 and 1/6.

> Source/WTF/wtf/HashTable.h:1257
> +            double halfway = (average + loadFactor) / 2;

Based on the comment, the "halfway" here should be halfway between the averageLoad and the maxLoad i.e. it's really the 3/4 load.  So ...

Let's rename "halfway" as threeQuarterLoad, and the equation becomes: threeQuarterLoad = (averageLoad + maxLoad) / 2;

> Source/WTF/wtf/HashTable.h:1258
> +            return keyCount >= tableSize * halfway;

And this becomes: keyCount >= tableSize * threeQuarterLoad;

Note: this means that the original algorithm (based on the comment) calls for eager expansion only when the size reaches 3/4 of the way to the loadFactor.  Your algorithm above changed it to eagerly expand if size reaches half of loadFactor.  This might not be what you intended.  Sounds like you might want to retest with the benchmarks ... unless you want to go with the way you want to change to eager expansion at half of loadFactor.  If so, you should fix the comment.

> Source/WTF/wtf/HashTraits.h:56
> +    static constexpr unsigned maxLoadNumerator = 1;
> +    static constexpr unsigned maxLoadDenominator = 2;

This appears to be unused.  Why were they added?  Are they stale now that you have to handle small and large loads?

> Tools/TestWebKitAPI/Tests/WTF/HashSet.cpp:63
> +    // Adding items up to less than 3/4 of the capacity should not change the capacity.
> +    unsigned capacityLimit = initialCapacity * 3 / 4 - 1;

Is this still correct given that we have account for maxSmallTableSize now?

> Tools/TestWebKitAPI/Tests/WTF/HashSet.cpp:531
> +struct HighLoadStringHash : public WTF::HashTraits<String> {
> +    static constexpr unsigned maxLoadNumerator = 3;
> +    static constexpr unsigned maxLoadDenominator = 4;
> +};
> +
> +struct HighLoadStringHash2 : public WTF::HashTraits<String> {
> +    static constexpr unsigned maxLoadNumerator = 3;
> +    static constexpr unsigned maxLoadDenominator = 5;
> +};
> +
> +struct HighLoadUnsignedHash : public WTF::HashTraits<unsigned> {
> +    static constexpr unsigned maxLoadNumerator = 3;
> +    static constexpr unsigned maxLoadDenominator = 4;
> +};
> +
> +struct HighLoadUnsignedHash2 : public WTF::HashTraits<unsigned> {
> +    static constexpr unsigned maxLoadNumerator = 3;
> +    static constexpr unsigned maxLoadDenominator = 5;
> +};

How are these used?  Who uses them or what are they testing?  Also, are these stale now that you have handling of small and large loads?
Comment 27 Yusuke Suzuki 2020-02-07 22:24:59 PST
Created attachment 390168 [details]
Patch
Comment 28 Mark Lam 2020-02-07 22:42:16 PST
Comment on attachment 390168 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=390168&action=review

r=me with suggestions.

> Source/WTF/ChangeLog:13
> +        small tables which capacity is <= 1024 based on collected data by micro-benchmarking.

Please add a comment about why you chose to do this.

> Source/WTF/ChangeLog:22
> +        * wtf/HashTraits.h:

Please remove.

> Source/WTF/wtf/HashTable.h:1260
> +            double maxLoadRate = loadFactor;
> +            double minLoadRate = 1.0 / minLoad;
> +            double averageLoadRate = (maxLoadRate + minLoadRate) / 2;
> +            double halfWayBetweenAverageAndMaxLoadRate = (averageLoadRate + maxLoadRate) / 2;
> +            return keyCount >= tableSize * halfWayBetweenAverageAndMaxLoadRate;

nit: I suggest using "Ratio" instead of "Rate".  "Rate" makes me think of frequency, which doesn't apply here.  Instead, what you're computing here is the ratio of a load threshold relative to the tableSize.  So, I think "Ratio" accurately describes the quantities you're computing here.

> Source/WTF/wtf/HashTraits.h:56
> +
> +    static constexpr unsigned maxLoadNumerator = 1;
> +    static constexpr unsigned maxLoadDenominator = 2;

As you mention offline, this will be removed.
Comment 29 Yusuke Suzuki 2020-02-08 13:50:09 PST
Committed r256093: <https://trac.webkit.org/changeset/256093>
Comment 30 Darin Adler 2020-02-08 16:38:58 PST
Comment on attachment 390168 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=390168&action=review

> Source/WTF/wtf/HashTable.h:1271
> +        if (bestTableSize <= maxSmallTableCapacity) {
> +            constexpr double smallLoadFactor = static_cast<double>(smallMaxLoadNumerator) / smallMaxLoadDenominator;
> +            if (aboveThresholdForEagerExpansion(smallLoadFactor, keyCount, bestTableSize))
> +                bestTableSize *= 2;
> +        } else {
> +            constexpr double largeLoadFactor = static_cast<double>(largeMaxLoadNumerator) / largeMaxLoadDenominator;
> +            if (aboveThresholdForEagerExpansion(largeLoadFactor, keyCount, bestTableSize))
> +                bestTableSize *= 2;
> +        }

I’d find it more elegant to have less code inside the if statements here; the two sides differ only in the "load factor". Why not just compute the load factor in an if statement, and have the rest of the code be outside the if?

> Tools/TestWebKitAPI/Tests/WTF/HashSet.cpp:38
> +#include <wtf/DataLog.h>
>  #include <wtf/HashSet.h>
> +#include <wtf/MonotonicTime.h>
>  #include <wtf/RefPtr.h>
> +#include <wtf/text/StringHash.h>
> +#include <wtf/text/WTFString.h>

Why did all these new includes need to be added? I don’t see any changes that would seem to motivate that.
Comment 31 Yusuke Suzuki 2020-02-10 10:25:53 PST
Comment on attachment 390168 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=390168&action=review

>> Source/WTF/wtf/HashTable.h:1271
>> +        }
> 
> I’d find it more elegant to have less code inside the if statements here; the two sides differ only in the "load factor". Why not just compute the load factor in an if statement, and have the rest of the code be outside the if?

Currently, I'm planning to remove this two different load-factors by introducing swiss-table for larger size. https://bugs.webkit.org/show_bug.cgi?id=207426
So, after that, we can just remove `bestTableSize <= maxSmallTableCapacity` check here.

>> Tools/TestWebKitAPI/Tests/WTF/HashSet.cpp:38
>> +#include <wtf/text/WTFString.h>
> 
> Why did all these new includes need to be added? I don’t see any changes that would seem to motivate that.

This was removed in the landed patch :)
Comment 32 Yusuke Suzuki 2020-02-10 10:35:09 PST
Committed r256194: <https://trac.webkit.org/changeset/256194>
Comment 33 Yusuke Suzuki 2020-02-10 12:51:54 PST
Committed r256203: <https://trac.webkit.org/changeset/256203>