Bug 207426

Summary: [WTF] Use SwissTable algorithm for larger HashTable
Product: WebKit Reporter: Yusuke Suzuki <ysuzuki>
Component: Web Template FrameworkAssignee: Yusuke Suzuki <ysuzuki>
Status: NEW    
Severity: Normal CC: benjamin, cdumez, cmarcelo, darin, dbates, ews-watchlist, ggaren, keith_miller, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch keith_miller: review+

Yusuke Suzuki
Reported 2020-02-07 22:37:14 PST
And let's use 75% load-factor even in larger table.
Attachments
Patch (32.09 KB, patch)
2020-02-08 22:26 PST, Yusuke Suzuki
no flags
Patch (32.04 KB, patch)
2020-02-08 22:40 PST, Yusuke Suzuki
no flags
Patch (32.18 KB, patch)
2020-02-08 23:04 PST, Yusuke Suzuki
no flags
Patch (38.23 KB, patch)
2020-02-09 01:26 PST, Yusuke Suzuki
no flags
Patch (38.94 KB, patch)
2020-02-09 15:04 PST, Yusuke Suzuki
no flags
Patch (40.28 KB, patch)
2020-02-09 17:47 PST, Yusuke Suzuki
no flags
Patch (42.87 KB, patch)
2020-02-09 19:19 PST, Yusuke Suzuki
no flags
Patch (42.08 KB, patch)
2020-02-09 19:23 PST, Yusuke Suzuki
no flags
Patch (37.28 KB, patch)
2020-02-09 22:11 PST, Yusuke Suzuki
no flags
Patch (37.23 KB, patch)
2020-02-10 19:54 PST, Yusuke Suzuki
no flags
Patch (39.27 KB, patch)
2020-02-11 14:44 PST, Yusuke Suzuki
no flags
Patch (39.28 KB, patch)
2020-02-11 18:56 PST, Yusuke Suzuki
no flags
Patch (46.07 KB, patch)
2020-02-14 00:13 PST, Yusuke Suzuki
keith_miller: review+
Yusuke Suzuki
Comment 1 2020-02-08 22:26:24 PST
Yusuke Suzuki
Comment 2 2020-02-08 22:40:25 PST
Yusuke Suzuki
Comment 3 2020-02-08 23:04:31 PST
Yusuke Suzuki
Comment 4 2020-02-09 01:26:45 PST
Yusuke Suzuki
Comment 5 2020-02-09 15:04:33 PST
Yusuke Suzuki
Comment 6 2020-02-09 17:47:28 PST
Yusuke Suzuki
Comment 7 2020-02-09 19:19:31 PST
Yusuke Suzuki
Comment 8 2020-02-09 19:23:25 PST
Yusuke Suzuki
Comment 9 2020-02-09 22:11:27 PST
Yusuke Suzuki
Comment 10 2020-02-10 19:54:39 PST
Yusuke Suzuki
Comment 11 2020-02-10 21:57:16 PST
Some of membuster process's HashTable dump by using MALLOC_BREAKDOWN. This is interesting. Zone HashTable_0x110574000: 3867 nodes (2306608 bytes) COUNT BYTES AVG CLASS_NAME TYPE BINARY ===== ===== === ========== ==== ====== 1 16 16.0 non-object 720 57600 80.0 non-object 1739 250416 144.0 non-object 714 148512 208.0 non-object 276 75072 272.0 non-object 4 1344 336.0 non-object 61 24400 400.0 non-object 133 70224 528.0 non-object 1 592 592.0 non-object 11 7216 656.0 non-object 1 720 720.0 non-object 22 17248 784.0 non-object 66 101376 1536.0 non-object 8 16384 2048.0 non-object 28 71680 2560.0 non-object 3 9216 3072.0 non-object 31 142848 4608.0 non-object 1 5120 5120.0 non-object 3 16896 5632.0 non-object 6 39936 6656.0 non-object 1 7680 7680.0 non-object 1 8192 8192.0 non-object 8 69632 8704.0 non-object 5 53760 10752.0 non-object 4 51200 12800.0 non-object 1 14848 14848.0 non-object 4 67584 16896.0 non-object 2 41984 20992.0 non-object 2 50176 25088.0 non-object 7 458752 65536.0 non-object 2 196608 98304.0 non-object 1 229376 229376.0 non-object While # of HashTable has skew (144.0 is the most), accumulated memory of long-tail has fair-amount of memory. For example, total memory of 144.0 is smaller than 65536.0 sized table. We should optimize large-sized table too.
Yusuke Suzuki
Comment 12 2020-02-11 14:44:07 PST
Yusuke Suzuki
Comment 13 2020-02-11 18:56:54 PST
Yusuke Suzuki
Comment 14 2020-02-13 15:00:37 PST
Test failure comes from the fact that this test is relying on hashtable. In some place, we put properties of JSGlobalObject into hashtable, iterating, and picking the first one to dump. Previously, it was, "undefined": undefined But now, it is, "NaN": NaN that's why we are seeing the failure.
Yusuke Suzuki
Comment 15 2020-02-14 00:13:11 PST
Radar WebKit Bug Importer
Comment 16 2020-02-14 00:55:54 PST
Keith Miller
Comment 17 2020-02-18 11:39:33 PST
Comment on attachment 390734 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=390734&action=review r=me if you fix the style errors and rename h1/h2. Can you make sure we have tests for SwissHashing in testWTF? > Source/WTF/ChangeLog:13 > + to select Group we will search. The key is that we maintains Control bytes as a side data, Nit: maintains => maintain and "a side data" => "side data" > Source/WTF/ChangeLog:16 > + For each bucket, we store either of (1) 7 bit hash code, (2) empty marker, or (3) deleted marker. Nit: either => one > Source/WTF/ChangeLog:18 > + operation to match 16 bucket hash codes in parallel, to filter out search candidate buckets in Nit: operation => operations. no comma after parallel. > Source/WTF/ChangeLog:19 > + one Group. And we can see last empty bucket if any bucket of Group is empty. So, almost all cases, Nit: "So, almost" => "So, in almost" > Source/WTF/wtf/HashTable.h:622 > + ALWAYS_INLINE unsigned h1(unsigned hash) h1 doesn't really convey the meaning of what we are computing here. Can we call this groupHash? > Source/WTF/wtf/HashTable.h:628 > + inline Control h2(unsigned hash) h2 doesn't really convey the meaning of what we are computing here. Can we call this controlHash? > Source/WTF/wtf/HashTable.h:1033 > + inline auto HashTable<Key, Value, Extractor, HashFunctions, Traits, KeyTraits>::lookupForWriting(const T& key, unsigned h) -> LookupType Can we change h to hash? It's poor style in the old code and now that it's a parameter is even worse...
Geoffrey Garen
Comment 18 2020-02-18 20:58:29 PST
Comment on attachment 390734 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=390734&action=review > Source/WTF/ChangeLog:29 > + While JetStream2 is neutral, we get sub-1% improvement in Membuster. How big was the improvement in Membuster?
Yusuke Suzuki
Comment 19 2020-02-19 19:22:14 PST
Comment on attachment 390734 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=390734&action=review >> Source/WTF/ChangeLog:29 >> + While JetStream2 is neutral, we get sub-1% improvement in Membuster. > > How big was the improvement in Membuster? In my local testing, it is ~1%. We already got large improvement when increasing the load-factor from 50% to 75% for small HashMap. This patch allows this load-factor for larger tables. But maybe, I first need to check the result of https://bugs.webkit.org/show_bug.cgi?id=207827 and https://trac.webkit.org/changeset/257025/webkit
Darin Adler
Comment 20 2020-04-13 20:20:16 PDT
Looks OK to me. The key is really the performance testing and making sure we also do enough correctness testing. Like maybe testing with CHECK_HASHTABLE_CONSISTENCY set to 1?
Note You need to log in before you can comment on or make changes to this bug.