WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
NEW
207426
[WTF] Use SwissTable algorithm for larger HashTable
https://bugs.webkit.org/show_bug.cgi?id=207426
Summary
[WTF] Use SwissTable algorithm for larger HashTable
Yusuke Suzuki
Reported
2020-02-07 22:37:14 PST
And let's use 75% load-factor even in larger table.
Attachments
Patch
(32.09 KB, patch)
2020-02-08 22:26 PST
,
Yusuke Suzuki
no flags
Details
Formatted Diff
Diff
Patch
(32.04 KB, patch)
2020-02-08 22:40 PST
,
Yusuke Suzuki
no flags
Details
Formatted Diff
Diff
Patch
(32.18 KB, patch)
2020-02-08 23:04 PST
,
Yusuke Suzuki
no flags
Details
Formatted Diff
Diff
Patch
(38.23 KB, patch)
2020-02-09 01:26 PST
,
Yusuke Suzuki
no flags
Details
Formatted Diff
Diff
Patch
(38.94 KB, patch)
2020-02-09 15:04 PST
,
Yusuke Suzuki
no flags
Details
Formatted Diff
Diff
Patch
(40.28 KB, patch)
2020-02-09 17:47 PST
,
Yusuke Suzuki
no flags
Details
Formatted Diff
Diff
Patch
(42.87 KB, patch)
2020-02-09 19:19 PST
,
Yusuke Suzuki
no flags
Details
Formatted Diff
Diff
Patch
(42.08 KB, patch)
2020-02-09 19:23 PST
,
Yusuke Suzuki
no flags
Details
Formatted Diff
Diff
Patch
(37.28 KB, patch)
2020-02-09 22:11 PST
,
Yusuke Suzuki
no flags
Details
Formatted Diff
Diff
Patch
(37.23 KB, patch)
2020-02-10 19:54 PST
,
Yusuke Suzuki
no flags
Details
Formatted Diff
Diff
Patch
(39.27 KB, patch)
2020-02-11 14:44 PST
,
Yusuke Suzuki
no flags
Details
Formatted Diff
Diff
Patch
(39.28 KB, patch)
2020-02-11 18:56 PST
,
Yusuke Suzuki
no flags
Details
Formatted Diff
Diff
Patch
(46.07 KB, patch)
2020-02-14 00:13 PST
,
Yusuke Suzuki
keith_miller
: review+
Details
Formatted Diff
Diff
Show Obsolete
(12)
View All
Add attachment
proposed patch, testcase, etc.
Yusuke Suzuki
Comment 1
2020-02-08 22:26:24 PST
Created
attachment 390197
[details]
Patch
Yusuke Suzuki
Comment 2
2020-02-08 22:40:25 PST
Created
attachment 390198
[details]
Patch
Yusuke Suzuki
Comment 3
2020-02-08 23:04:31 PST
Created
attachment 390199
[details]
Patch
Yusuke Suzuki
Comment 4
2020-02-09 01:26:45 PST
Created
attachment 390200
[details]
Patch
Yusuke Suzuki
Comment 5
2020-02-09 15:04:33 PST
Created
attachment 390214
[details]
Patch
Yusuke Suzuki
Comment 6
2020-02-09 17:47:28 PST
Created
attachment 390220
[details]
Patch
Yusuke Suzuki
Comment 7
2020-02-09 19:19:31 PST
Created
attachment 390222
[details]
Patch
Yusuke Suzuki
Comment 8
2020-02-09 19:23:25 PST
Created
attachment 390223
[details]
Patch
Yusuke Suzuki
Comment 9
2020-02-09 22:11:27 PST
Created
attachment 390233
[details]
Patch
Yusuke Suzuki
Comment 10
2020-02-10 19:54:39 PST
Created
attachment 390339
[details]
Patch
Yusuke Suzuki
Comment 11
2020-02-10 21:57:16 PST
Some of membuster process's HashTable dump by using MALLOC_BREAKDOWN. This is interesting. Zone HashTable_0x110574000: 3867 nodes (2306608 bytes) COUNT BYTES AVG CLASS_NAME TYPE BINARY ===== ===== === ========== ==== ====== 1 16 16.0 non-object 720 57600 80.0 non-object 1739 250416 144.0 non-object 714 148512 208.0 non-object 276 75072 272.0 non-object 4 1344 336.0 non-object 61 24400 400.0 non-object 133 70224 528.0 non-object 1 592 592.0 non-object 11 7216 656.0 non-object 1 720 720.0 non-object 22 17248 784.0 non-object 66 101376 1536.0 non-object 8 16384 2048.0 non-object 28 71680 2560.0 non-object 3 9216 3072.0 non-object 31 142848 4608.0 non-object 1 5120 5120.0 non-object 3 16896 5632.0 non-object 6 39936 6656.0 non-object 1 7680 7680.0 non-object 1 8192 8192.0 non-object 8 69632 8704.0 non-object 5 53760 10752.0 non-object 4 51200 12800.0 non-object 1 14848 14848.0 non-object 4 67584 16896.0 non-object 2 41984 20992.0 non-object 2 50176 25088.0 non-object 7 458752 65536.0 non-object 2 196608 98304.0 non-object 1 229376 229376.0 non-object While # of HashTable has skew (144.0 is the most), accumulated memory of long-tail has fair-amount of memory. For example, total memory of 144.0 is smaller than 65536.0 sized table. We should optimize large-sized table too.
Yusuke Suzuki
Comment 12
2020-02-11 14:44:07 PST
Created
attachment 390429
[details]
Patch
Yusuke Suzuki
Comment 13
2020-02-11 18:56:54 PST
Created
attachment 390478
[details]
Patch
Yusuke Suzuki
Comment 14
2020-02-13 15:00:37 PST
Test failure comes from the fact that this test is relying on hashtable. In some place, we put properties of JSGlobalObject into hashtable, iterating, and picking the first one to dump. Previously, it was, "undefined": undefined But now, it is, "NaN": NaN that's why we are seeing the failure.
Yusuke Suzuki
Comment 15
2020-02-14 00:13:11 PST
Created
attachment 390734
[details]
Patch
Radar WebKit Bug Importer
Comment 16
2020-02-14 00:55:54 PST
<
rdar://problem/59453007
>
Keith Miller
Comment 17
2020-02-18 11:39:33 PST
Comment on
attachment 390734
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=390734&action=review
r=me if you fix the style errors and rename h1/h2. Can you make sure we have tests for SwissHashing in testWTF?
> Source/WTF/ChangeLog:13 > + to select Group we will search. The key is that we maintains Control bytes as a side data,
Nit: maintains => maintain and "a side data" => "side data"
> Source/WTF/ChangeLog:16 > + For each bucket, we store either of (1) 7 bit hash code, (2) empty marker, or (3) deleted marker.
Nit: either => one
> Source/WTF/ChangeLog:18 > + operation to match 16 bucket hash codes in parallel, to filter out search candidate buckets in
Nit: operation => operations. no comma after parallel.
> Source/WTF/ChangeLog:19 > + one Group. And we can see last empty bucket if any bucket of Group is empty. So, almost all cases,
Nit: "So, almost" => "So, in almost"
> Source/WTF/wtf/HashTable.h:622 > + ALWAYS_INLINE unsigned h1(unsigned hash)
h1 doesn't really convey the meaning of what we are computing here. Can we call this groupHash?
> Source/WTF/wtf/HashTable.h:628 > + inline Control h2(unsigned hash)
h2 doesn't really convey the meaning of what we are computing here. Can we call this controlHash?
> Source/WTF/wtf/HashTable.h:1033 > + inline auto HashTable<Key, Value, Extractor, HashFunctions, Traits, KeyTraits>::lookupForWriting(const T& key, unsigned h) -> LookupType
Can we change h to hash? It's poor style in the old code and now that it's a parameter is even worse...
Geoffrey Garen
Comment 18
2020-02-18 20:58:29 PST
Comment on
attachment 390734
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=390734&action=review
> Source/WTF/ChangeLog:29 > + While JetStream2 is neutral, we get sub-1% improvement in Membuster.
How big was the improvement in Membuster?
Yusuke Suzuki
Comment 19
2020-02-19 19:22:14 PST
Comment on
attachment 390734
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=390734&action=review
>> Source/WTF/ChangeLog:29 >> + While JetStream2 is neutral, we get sub-1% improvement in Membuster. > > How big was the improvement in Membuster?
In my local testing, it is ~1%. We already got large improvement when increasing the load-factor from 50% to 75% for small HashMap. This patch allows this load-factor for larger tables. But maybe, I first need to check the result of
https://bugs.webkit.org/show_bug.cgi?id=207827
and
https://trac.webkit.org/changeset/257025/webkit
Darin Adler
Comment 20
2020-04-13 20:20:16 PDT
Looks OK to me. The key is really the performance testing and making sure we also do enough correctness testing. Like maybe testing with CHECK_HASHTABLE_CONSISTENCY set to 1?
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug