WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
161640
HashMapImpl should take into account m_deleteCount in its load factor and it should be able to rehash the table to be smaller
https://bugs.webkit.org/show_bug.cgi?id=161640
Summary
HashMapImpl should take into account m_deleteCount in its load factor and it ...
Saam Barati
Reported
2016-09-06 12:23:02 PDT
...
Attachments
patch
(8.31 KB, patch)
2016-09-08 16:43 PDT
,
Saam Barati
saam
: review-
Details
Formatted Diff
Diff
patch
(9.48 KB, patch)
2016-09-08 17:06 PDT
,
Saam Barati
ggaren
: review+
Details
Formatted Diff
Diff
patch for landing
(11.57 KB, patch)
2016-09-10 23:57 PDT
,
Saam Barati
no flags
Details
Formatted Diff
Diff
Show Obsolete
(2)
View All
Add attachment
proposed patch, testcase, etc.
Saam Barati
Comment 1
2016-09-08 16:43:09 PDT
Created
attachment 288357
[details]
patch
Saam Barati
Comment 2
2016-09-08 16:44:32 PDT
Comment on
attachment 288357
[details]
patch View in context:
https://bugs.webkit.org/attachment.cgi?id=288357&action=review
> JSTests/microbenchmarks/map-rehash.js:1 > +function test(bias) {
Interestingly, this test runs 100x faster in my map implementation than the old one. Quite weird, I'm not sure why the old implementation chokes on this program. Maybe it has to do with iterators.
Ryosuke Niwa
Comment 3
2016-09-08 16:45:24 PDT
Comment on
attachment 288357
[details]
patch Did you run perf tests with this change?
Mark Lam
Comment 4
2016-09-08 16:46:08 PDT
Comment on
attachment 288357
[details]
patch View in context:
https://bugs.webkit.org/attachment.cgi?id=288357&action=review
> JSTests/ChangeLog:3 > + HashMapImpl should be take into account m_deleteCount into its load factor and it should be able to rehash the table to be smaller
/should be take into account/should take into account/
Mark Lam
Comment 5
2016-09-08 16:52:50 PDT
Comment on
attachment 288357
[details]
patch View in context:
https://bugs.webkit.org/attachment.cgi?id=288357&action=review
> Source/JavaScriptCore/ChangeLog:3 > + HashMapImpl should be take into account m_deleteCount into its load factor and it should be able to rehash the table to be smaller
/should be take into account/should take into account/ here too.
> Source/JavaScriptCore/ChangeLog:8 > + HashMapImpl now takes into account m_deleteCount into its load factor.
"... into ... into ..." sounds weird. How about rephrasing as "now takes m_deleteCount into account in its load factor."
Saam Barati
Comment 6
2016-09-08 16:53:12 PDT
Comment on
attachment 288357
[details]
patch View in context:
https://bugs.webkit.org/attachment.cgi?id=288357&action=review
> Source/JavaScriptCore/runtime/HashMapImpl.h:514 > + }
This has a bug. We should be memsetting the buffer on the else case here.
Saam Barati
Comment 7
2016-09-08 17:06:33 PDT
Created
attachment 288364
[details]
patch
Geoffrey Garen
Comment 8
2016-09-08 23:22:28 PDT
Comment on
attachment 288364
[details]
patch View in context:
https://bugs.webkit.org/attachment.cgi?id=288364&action=review
r=me
> Source/JavaScriptCore/runtime/HashMapImpl.h:420 > + if (!ASSERT_DISABLED) { > + HashMapBucketType* iter = m_head->next(); > + HashMapBucketType* end = m_tail.get(); > + uint32_t size = 0; > + while (iter != end) { > + ++size; > + iter = iter->next(); > + } > + ASSERT(size == result); > + }
This looks like it might be O(N^2) in debug builds -- that's probably too expensive even for an ASSERT. Can you move this into a function named checkConsistency() and call it less often to amortize the cost?
> Source/JavaScriptCore/runtime/HashMapImpl.h:505 > + // If we were to shrink, and we're less than 1/4th full, do indeed shrink.
I don't think this comment adds anything to the math above. It might be nice to comment on where the number 4 came from. The goal here is to avoid thrash -- we grow at 1/2X and shrink at 1/4X so the grow and shrink points are not equal.
> Source/JavaScriptCore/runtime/HashMapImpl.h:509 > + } else if (4 * itemCount <= m_capacity) { > + // If we were to stay the same capacity, and we're less than 1/4th full, then don't grow or shrink. > + // We don't need to change m_capacity.
I don't think this comment adds anything to the math above. Shouldn't this be 2 * itemCount? Our goal is 1/2X load.
> Source/JavaScriptCore/runtime/HashMapImpl.h:511 > + // Ok, we need to grow larger.
I don't think this comment adds anything to the assignment below.
> Source/JavaScriptCore/runtime/HashMapImpl.h:512 > + m_capacity = (Checked<uint32_t>(m_capacity) * 2).unsafeGet();
What's the point of a checked integer that we unsafeGet()? Is there any safety added here?
> Source/JavaScriptCore/runtime/HashMapImpl.h:544 > + m_deleteCount = 0; > + m_size = itemCount;
FWIW, I think it would be a simpler design for m_size to track non-deleted entries only, so you if you the key count you check m_size, and if you want the in-use bucket count you check m_size + m_deleteCount. You could even rename m_size to m_keyCount.
> Source/JavaScriptCore/runtime/HashMapImpl.h:546 > + if (!ASSERT_DISABLED)
No need for this branch.
Saam Barati
Comment 9
2016-09-09 19:45:52 PDT
Comment on
attachment 288364
[details]
patch View in context:
https://bugs.webkit.org/attachment.cgi?id=288364&action=review
>> Source/JavaScriptCore/runtime/HashMapImpl.h:420 >> + } > > This looks like it might be O(N^2) in debug builds -- that's probably too expensive even for an ASSERT. > > Can you move this into a function named checkConsistency() and call it less often to amortize the cost?
Ok. I'll move this to when we rehash.
>> Source/JavaScriptCore/runtime/HashMapImpl.h:505 >> + // If we were to shrink, and we're less than 1/4th full, do indeed shrink. > > I don't think this comment adds anything to the math above. > > It might be nice to comment on where the number 4 came from. > > The goal here is to avoid thrash -- we grow at 1/2X and shrink at 1/4X so the grow and shrink points are not equal.
I chose 1/4th as a number because we know we grow larger when we fill up half way. 1/4th seems like a load ratio we can live with. So this code basically asks what's the best way to get to 1/4th full (shrinking, rehashing at the same size, or growing). I'm not sure how other hash table algorithms decide at what point to shrink/maintain size/grow. I based 1/4th off the fact that if we grow when we're half full, we will now be 1/4th full. So I'm fitting for the best capacity to remain at 1/4th full. Maybe there are better ways to do this? Any ideas?
>> Source/JavaScriptCore/runtime/HashMapImpl.h:511 >> + // Ok, we need to grow larger. > > I don't think this comment adds anything to the assignment below.
I'll remove the comments.
>> Source/JavaScriptCore/runtime/HashMapImpl.h:512 >> + m_capacity = (Checked<uint32_t>(m_capacity) * 2).unsafeGet(); > > What's the point of a checked integer that we unsafeGet()? Is there any safety added here?
The default overflow handler for Checked<T> is CrashOnOverflow, so this will crash if the multiply overflows.
>> Source/JavaScriptCore/runtime/HashMapImpl.h:544 >> + m_size = itemCount; > > FWIW, I think it would be a simpler design for m_size to track non-deleted entries only, so you if you the key count you check m_size, and if you want the in-use bucket count you check m_size + m_deleteCount. You could even rename m_size to m_keyCount.
I agree, I'm switching to this. I even found a bug because this comment caused my brain to churn on my old algorithm. This patch reports m_size+m_deleteCount as the load, however, if m_size represents the total number of filled buckets in the buffer, m_size alone should be the load. The addition of the two caused this patch to over-report the load. The solution is to move to m_keyCount, as you suggest, and then m_keyCount+m_deleteCount is the load. Also, the size() function for HashMapImpl now just returns m_keyCount as the size.
Saam Barati
Comment 10
2016-09-10 23:57:31 PDT
Created
attachment 288527
[details]
patch for landing I'll give Geoff (or anybody else) another chance to comment before landing since I slightly changed the patch to address Geoff's comments and fix a bug. If nobody comments before Monday afternoon I'll land this patch.
Geoffrey Garen
Comment 11
2016-09-12 12:17:19 PDT
Comment on
attachment 288527
[details]
patch for landing View in context:
https://bugs.webkit.org/attachment.cgi?id=288527&action=review
> Source/JavaScriptCore/runtime/HashMapImpl.h:513 > + } else if (3 * m_keyCount <= m_capacity && m_capacity > 64) { > + // We stay at the same size if rehashing would cause us to be no more than > + // 1/3rd full. This comes up for programs like this: > + // Say the hash table grew to a key count of 64, causing it to grow to a capacity of 256. > + // Then, the table added 63 items. The load is now 127. Then, 63 items are deleted. > + // The load is still 127. Then, another item is added. The load is now 128, and we > + // decide that we need to rehash. The key count is 65, almost exactly what it was > + // when we grew to a capacity of 256. We don't really need to grow to a capacity > + // of 512 in this situation. Instead, we choose to rehash at the same size. This > + // will bring the load down to 65. We rehash into the same size when we determine > + // that the new load ratio will be under 1/3rd. (We also pick a minumum capacity > + // at which this rule kicks in because otherwise we will be too sensitive to rehashing > + // at the same capacity).
I'm a little bit suspicious about these numbers because they don't exactly match what WTF::HashTable and bmalloc::Map do. Still, looks OK to me.
WebKit Commit Bot
Comment 12
2016-09-12 17:17:04 PDT
Comment on
attachment 288527
[details]
patch for landing Clearing flags on attachment: 288527 Committed
r205842
: <
http://trac.webkit.org/changeset/205842
>
WebKit Commit Bot
Comment 13
2016-09-12 17:17:09 PDT
All reviewed patches have been landed. Closing bug.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug