Bug 161640 - HashMapImpl should take into account m_deleteCount in its load factor and it should be able to rehash the table to be smaller
Summary: HashMapImpl should take into account m_deleteCount in its load factor and it ...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: WebKit Local Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Saam Barati
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-09-06 12:23 PDT by Saam Barati
Modified: 2016-09-12 17:17 PDT (History)
13 users (show)

See Also:


Attachments
patch (8.31 KB, patch)
2016-09-08 16:43 PDT, Saam Barati
saam: review-
Details | Formatted Diff | Diff
patch (9.48 KB, patch)
2016-09-08 17:06 PDT, Saam Barati
ggaren: review+
Details | Formatted Diff | Diff
patch for landing (11.57 KB, patch)
2016-09-10 23:57 PDT, Saam Barati
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Saam Barati 2016-09-06 12:23:02 PDT
...
Comment 1 Saam Barati 2016-09-08 16:43:09 PDT
Created attachment 288357 [details]
patch
Comment 2 Saam Barati 2016-09-08 16:44:32 PDT
Comment on attachment 288357 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=288357&action=review

> JSTests/microbenchmarks/map-rehash.js:1
> +function test(bias) {

Interestingly, this test runs 100x faster in my map implementation than the old one. Quite weird, I'm not sure why the old implementation chokes on this program. Maybe it has to do with iterators.
Comment 3 Ryosuke Niwa 2016-09-08 16:45:24 PDT
Comment on attachment 288357 [details]
patch

Did you run perf tests with this change?
Comment 4 Mark Lam 2016-09-08 16:46:08 PDT
Comment on attachment 288357 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=288357&action=review

> JSTests/ChangeLog:3
> +        HashMapImpl should be take into account m_deleteCount into its load factor and it should be able to rehash the table to be smaller

/should be take into account/should take into account/
Comment 5 Mark Lam 2016-09-08 16:52:50 PDT
Comment on attachment 288357 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=288357&action=review

> Source/JavaScriptCore/ChangeLog:3
> +        HashMapImpl should be take into account m_deleteCount into its load factor and it should be able to rehash the table to be smaller

/should be take into account/should take into account/ here too.

> Source/JavaScriptCore/ChangeLog:8
> +        HashMapImpl now takes into account m_deleteCount into its load factor.

"... into ... into ..." sounds weird.  How about rephrasing as "now takes m_deleteCount into account in its load factor."
Comment 6 Saam Barati 2016-09-08 16:53:12 PDT
Comment on attachment 288357 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=288357&action=review

> Source/JavaScriptCore/runtime/HashMapImpl.h:514
> +        }

This has a bug. We should be memsetting the buffer on the else case here.
Comment 7 Saam Barati 2016-09-08 17:06:33 PDT
Created attachment 288364 [details]
patch
Comment 8 Geoffrey Garen 2016-09-08 23:22:28 PDT
Comment on attachment 288364 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=288364&action=review

r=me

> Source/JavaScriptCore/runtime/HashMapImpl.h:420
> +        if (!ASSERT_DISABLED) {
> +            HashMapBucketType* iter = m_head->next();
> +            HashMapBucketType* end = m_tail.get();
> +            uint32_t size = 0;
> +            while (iter != end) {
> +                ++size;
> +                iter = iter->next();
> +            }
> +            ASSERT(size == result);
> +        }

This looks like it might be O(N^2) in debug builds -- that's probably too expensive even for an ASSERT.

Can you move this into a function named checkConsistency() and call it less often to amortize the cost?

> Source/JavaScriptCore/runtime/HashMapImpl.h:505
> +            // If we were to shrink, and we're less than 1/4th full, do indeed shrink.

I don't think this comment adds anything to the math above.

It might be nice to comment on where the number 4 came from.

The goal here is to avoid thrash -- we grow at 1/2X and shrink at 1/4X so the grow and shrink points are not equal.

> Source/JavaScriptCore/runtime/HashMapImpl.h:509
> +        } else if (4 * itemCount <= m_capacity) {
> +            // If we were to stay the same capacity, and we're less than 1/4th full, then don't grow or shrink.
> +            // We don't need to change m_capacity.

I don't think this comment adds anything to the math above.

Shouldn't this be 2 * itemCount? Our goal is 1/2X load.

> Source/JavaScriptCore/runtime/HashMapImpl.h:511
> +            // Ok, we need to grow larger.

I don't think this comment adds anything to the assignment below.

> Source/JavaScriptCore/runtime/HashMapImpl.h:512
> +            m_capacity = (Checked<uint32_t>(m_capacity) * 2).unsafeGet();

What's the point of a checked integer that we unsafeGet()? Is there any safety added here?

> Source/JavaScriptCore/runtime/HashMapImpl.h:544
> +        m_deleteCount = 0;
> +        m_size = itemCount;

FWIW, I think it would be a simpler design for m_size to track non-deleted entries only, so you if you the key count you check m_size, and if you want the in-use bucket count you check m_size + m_deleteCount. You could even rename m_size to m_keyCount.

> Source/JavaScriptCore/runtime/HashMapImpl.h:546
> +        if (!ASSERT_DISABLED)

No need for this branch.
Comment 9 Saam Barati 2016-09-09 19:45:52 PDT
Comment on attachment 288364 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=288364&action=review

>> Source/JavaScriptCore/runtime/HashMapImpl.h:420
>> +        }
> 
> This looks like it might be O(N^2) in debug builds -- that's probably too expensive even for an ASSERT.
> 
> Can you move this into a function named checkConsistency() and call it less often to amortize the cost?

Ok. I'll move this to when we rehash.

>> Source/JavaScriptCore/runtime/HashMapImpl.h:505
>> +            // If we were to shrink, and we're less than 1/4th full, do indeed shrink.
> 
> I don't think this comment adds anything to the math above.
> 
> It might be nice to comment on where the number 4 came from.
> 
> The goal here is to avoid thrash -- we grow at 1/2X and shrink at 1/4X so the grow and shrink points are not equal.

I chose 1/4th as a number because we know we grow larger when we fill up half way.
1/4th seems like a load ratio we can live with. So this code basically asks what's the best
way to get to 1/4th full (shrinking, rehashing at the same size, or growing). I'm not sure
how other hash table algorithms decide at what point to shrink/maintain size/grow. I based
1/4th off the fact that if we grow when we're half full, we will now be 1/4th full. So I'm fitting
for the best capacity to remain at 1/4th full.

Maybe there are better ways to do this? Any ideas?

>> Source/JavaScriptCore/runtime/HashMapImpl.h:511
>> +            // Ok, we need to grow larger.
> 
> I don't think this comment adds anything to the assignment below.

I'll remove the comments.

>> Source/JavaScriptCore/runtime/HashMapImpl.h:512
>> +            m_capacity = (Checked<uint32_t>(m_capacity) * 2).unsafeGet();
> 
> What's the point of a checked integer that we unsafeGet()? Is there any safety added here?

The default overflow handler for Checked<T> is CrashOnOverflow, so this will crash if the multiply overflows.

>> Source/JavaScriptCore/runtime/HashMapImpl.h:544
>> +        m_size = itemCount;
> 
> FWIW, I think it would be a simpler design for m_size to track non-deleted entries only, so you if you the key count you check m_size, and if you want the in-use bucket count you check m_size + m_deleteCount. You could even rename m_size to m_keyCount.

I agree, I'm switching to this. I even found a bug because this comment caused my brain to churn on my old algorithm. This patch reports m_size+m_deleteCount as the load, however, if m_size
represents the total number of filled buckets in the buffer, m_size alone should be the load. The addition of the two caused this patch to over-report the load. The solution is to move to
m_keyCount, as you suggest, and then m_keyCount+m_deleteCount is the load. Also, the size() function for HashMapImpl now just returns m_keyCount as the size.
Comment 10 Saam Barati 2016-09-10 23:57:31 PDT
Created attachment 288527 [details]
patch for landing

I'll give Geoff (or anybody else) another chance to comment before landing
since I slightly changed the patch to address Geoff's comments and fix a bug.
If nobody comments before Monday afternoon I'll land this patch.
Comment 11 Geoffrey Garen 2016-09-12 12:17:19 PDT
Comment on attachment 288527 [details]
patch for landing

View in context: https://bugs.webkit.org/attachment.cgi?id=288527&action=review

> Source/JavaScriptCore/runtime/HashMapImpl.h:513
> +        } else if (3 * m_keyCount <= m_capacity && m_capacity > 64) {
> +            // We stay at the same size if rehashing would cause us to be no more than
> +            // 1/3rd full. This comes up for programs like this:
> +            // Say the hash table grew to a key count of 64, causing it to grow to a capacity of 256.
> +            // Then, the table added 63 items. The load is now 127. Then, 63 items are deleted.
> +            // The load is still 127. Then, another item is added. The load is now 128, and we
> +            // decide that we need to rehash. The key count is 65, almost exactly what it was
> +            // when we grew to a capacity of 256. We don't really need to grow to a capacity
> +            // of 512 in this situation. Instead, we choose to rehash at the same size. This
> +            // will bring the load down to 65. We rehash into the same size when we determine
> +            // that the new load ratio will be under 1/3rd. (We also pick a minumum capacity
> +            // at which this rule kicks in because otherwise we will be too sensitive to rehashing
> +            // at the same capacity).

I'm a little bit suspicious about these numbers because they don't exactly match what WTF::HashTable and bmalloc::Map do. Still, looks OK to me.
Comment 12 WebKit Commit Bot 2016-09-12 17:17:04 PDT
Comment on attachment 288527 [details]
patch for landing

Clearing flags on attachment: 288527

Committed r205842: <http://trac.webkit.org/changeset/205842>
Comment 13 WebKit Commit Bot 2016-09-12 17:17:09 PDT
All reviewed patches have been landed.  Closing bug.