Bug 196681

Summary: HashTable::removeIf always shrinks the hash table by half even if there is nothing left
Product: WebKit Reporter: Ryosuke Niwa <rniwa>
Component: Web Template FrameworkAssignee: Ryosuke Niwa <rniwa>
Status: RESOLVED FIXED    
Severity: Normal CC: benjamin, cdumez, cmarcelo, commit-queue, darin, dbates, ews-watchlist, ggaren, simon.fraser, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Fixes the bug
none
Patch for landing
none
Patch darin: review+

Description Ryosuke Niwa 2019-04-06 17:19:11 PDT
HashTable::removeIf always shrinks the table to half of its original size
even if there is nothing left in the table afterwards.
Comment 1 Ryosuke Niwa 2019-04-06 17:22:35 PDT
HashTable::removeIf has the code like this:

if (shouldShrink())
    shrink();

But shrink does:

void shrink() { rehash(m_tableSize / 2, nullptr); }

Let's say we removed 30 out of 32 entires in the hash table. Then m_tableSize might have been 64 but we'd only shrink the table to 16 after the removal instead of the minimal table size of 8.
Comment 2 Ryosuke Niwa 2019-04-06 17:30:00 PDT
Maybe this design was intentional to limit the rate by which we shrink to avoid table size churn??
Comment 3 Geoffrey Garen 2019-04-06 20:48:25 PDT
I don't think it's intentional. It's inconsistent with remove().

That said, it is true that shrinking to exactly your current size is usually a poor choice, since the very next insertion will grow.

One way to fix this is to do the shouldShrink test inside the loop.

Another way to fix this is to change the if (shouldShrink()) into a while (shouldShrink()).

Both of those are a bit wasteful, though.

The optimal solution is probably something like rehash(std::max(KeyTraits::minimumTableSize,  roundUpToPowerOfTwo(m_keyCount + 1)).
Comment 4 Darin Adler 2019-04-10 09:45:29 PDT
I agree with Geoff. The removeIf function should implement the same policy as shouldShrink/shrink, but those function are written with removing a single element from the hash table in mind. It’s the policy that is important. Also nice that before the policy was highly localized in those functions. It would be a shame to "spread out" the policy across more functions in the class, so we should carefully figure out how to factor this "remove multiple items" case.
Comment 5 Ryosuke Niwa 2019-04-12 15:10:16 PDT
Created attachment 367348 [details]
Fixes the bug
Comment 6 EWS Watchlist 2019-04-12 15:13:05 PDT
Attachment 367348 [details] did not pass style-queue:


ERROR: Tools/ChangeLog:12:  Line contains tab character.  [whitespace/tab] [5]
ERROR: Tools/TestWebKitAPI/Tests/WTF/HashSet.cpp:473:  More than one command on the same line  [whitespace/newline] [4]
ERROR: Tools/TestWebKitAPI/Tests/WTF/HashSet.cpp:490:  More than one command on the same line  [whitespace/newline] [4]
ERROR: Tools/TestWebKitAPI/Tests/WTF/HashSet.cpp:499:  More than one command on the same line  [whitespace/newline] [4]
Total errors found: 4 in 4 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 7 Darin Adler 2019-04-12 15:54:45 PDT
Comment on attachment 367348 [details]
Fixes the bug

View in context: https://bugs.webkit.org/attachment.cgi?id=367348&action=review

> Source/WTF/wtf/HashTable.h:467
> +        unsigned computeBestTableSize(unsigned keyCount) { return WTF::roundUpToPowerOfTwo(keyCount) * 2; }

I think this should be should be static (and constexpr) member function, rather than a non-static.

Is it safe to call roundUpToPowerOfTwo on the key count? Is there a guarantee it won’t overflow?

> Source/WTF/wtf/HashTable.h:477
> +        void shrinkToBestSize()
> +        {
> +            unsigned minimumTableSize = KeyTraits::minimumTableSize;
> +            rehash(std::max<unsigned>(minimumTableSize, computeBestTableSize(m_keyCount + 1)), nullptr);
> +        }

Why does this add one to m_keyCount? The copy constructor doesn’t do that.

Unclear why it’s correct to do this computation without the "aboveThreeQuarterLoad" check that is in the copy constructor.

Is it safe to add one to m_keyCount? Is there something that guarantees it won’t overflow?

The <unsigned> is not needed in the call to std::max since both arguments are unsigned. Same in the code this was copied from, in the copy constructor.

Unclear why we need minimumTableSize to be put into a local variable. Same in the code this was copied from, in the copy constructor.

Style thought: I think a multi-line function should be defined outside the class, so it doesn’t take up four lines in the class definition.

> Tools/ChangeLog:12
> +	(WTF_HashSet.RemoveIfShrinkToBestSize):

Looks like there’s a tab here.
Comment 8 Ryosuke Niwa 2019-04-12 18:52:50 PDT
Comment on attachment 367348 [details]
Fixes the bug

View in context: https://bugs.webkit.org/attachment.cgi?id=367348&action=review

>> Source/WTF/wtf/HashTable.h:467
>> +        unsigned computeBestTableSize(unsigned keyCount) { return WTF::roundUpToPowerOfTwo(keyCount) * 2; }
> 
> I think this should be should be static (and constexpr) member function, rather than a non-static.
> 
> Is it safe to call roundUpToPowerOfTwo on the key count? Is there a guarantee it won’t overflow?

Sure. I can make it static.
I don't think this code is guaranteed to be safe but HashTable in general doesn't seem to have that kind of safety check.
For example, expand(~) always multiples the current table size by 2.

>> Source/WTF/wtf/HashTable.h:477
>> +        }
> 
> Why does this add one to m_keyCount? The copy constructor doesn’t do that.
> 
> Unclear why it’s correct to do this computation without the "aboveThreeQuarterLoad" check that is in the copy constructor.
> 
> Is it safe to add one to m_keyCount? Is there something that guarantees it won’t overflow?
> 
> The <unsigned> is not needed in the call to std::max since both arguments are unsigned. Same in the code this was copied from, in the copy constructor.
> 
> Unclear why we need minimumTableSize to be put into a local variable. Same in the code this was copied from, in the copy constructor.
> 
> Style thought: I think a multi-line function should be defined outside the class, so it doesn’t take up four lines in the class definition.

Oh, I din't see that code there. We clearly should do that math. I was adding 1 to do what aboveThreeQuarterLoad is doing.
There is also std::max<unsigned>(bestTableSize, minimumTableSize) already so we can merge that too.

If we don't store minimumTableSize in a local variable, we hit a linking error :(
It's probably some kind of a clang bug.
Comment 9 Ryosuke Niwa 2019-04-12 19:04:39 PDT
Created attachment 367371 [details]
Patch for landing
Comment 10 Ryosuke Niwa 2019-04-12 19:06:16 PDT
Waiting for EWS.
Comment 11 EWS Watchlist 2019-04-12 19:07:42 PDT
Attachment 367371 [details] did not pass style-queue:


ERROR: Tools/ChangeLog:12:  Line contains tab character.  [whitespace/tab] [5]
ERROR: Tools/TestWebKitAPI/Tests/WTF/HashSet.cpp:473:  More than one command on the same line  [whitespace/newline] [4]
ERROR: Tools/TestWebKitAPI/Tests/WTF/HashSet.cpp:490:  More than one command on the same line  [whitespace/newline] [4]
ERROR: Tools/TestWebKitAPI/Tests/WTF/HashSet.cpp:499:  More than one command on the same line  [whitespace/newline] [4]
Total errors found: 4 in 4 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 12 Darin Adler 2019-04-13 08:14:26 PDT
Comment on attachment 367348 [details]
Fixes the bug

View in context: https://bugs.webkit.org/attachment.cgi?id=367348&action=review

>>> Source/WTF/wtf/HashTable.h:467
>>> +        unsigned computeBestTableSize(unsigned keyCount) { return WTF::roundUpToPowerOfTwo(keyCount) * 2; }
>> 
>> I think this should be should be static (and constexpr) member function, rather than a non-static.
>> 
>> Is it safe to call roundUpToPowerOfTwo on the key count? Is there a guarantee it won’t overflow?
> 
> Sure. I can make it static.
> I don't think this code is guaranteed to be safe but HashTable in general doesn't seem to have that kind of safety check.
> For example, expand(~) always multiples the current table size by 2.

We still need to think about what prevents overflow. I understand that this is consistent with the rest of the class. We should think about the rest of the class too. Not to block landing this patch, but to make sure it’s safe to program with HashTable.
Comment 13 Ryosuke Niwa 2019-04-15 14:04:46 PDT
(In reply to Darin Adler from comment #12)
> Comment on attachment 367348 [details]
> Fixes the bug
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=367348&action=review
> 
> >>> Source/WTF/wtf/HashTable.h:467
> >>> +        unsigned computeBestTableSize(unsigned keyCount) { return WTF::roundUpToPowerOfTwo(keyCount) * 2; }
> >> 
> >> I think this should be should be static (and constexpr) member function, rather than a non-static.
> >> 
> >> Is it safe to call roundUpToPowerOfTwo on the key count? Is there a guarantee it won’t overflow?
> > 
> > Sure. I can make it static.
> > I don't think this code is guaranteed to be safe but HashTable in general doesn't seem to have that kind of safety check.
> > For example, expand(~) always multiples the current table size by 2.
> 
> We still need to think about what prevents overflow. I understand that this
> is consistent with the rest of the class. We should think about the rest of
> the class too. Not to block landing this patch, but to make sure it’s safe
> to program with HashTable.

Indeed. Vector::append for example has a CRASH() call after calling expandCapacity. We probably need something similar for HashTable in addition to avoiding the overflow.
Comment 14 Ryosuke Niwa 2019-04-15 14:30:37 PDT
Tracking the overflow issue in https://bugs.webkit.org/show_bug.cgi?id=196935
Comment 15 Ryosuke Niwa 2019-04-15 14:30:40 PDT
Committed r244289: <https://trac.webkit.org/changeset/244289>
Comment 16 Radar WebKit Bug Importer 2019-04-15 14:32:25 PDT
<rdar://problem/49917764>
Comment 17 Darin Adler 2019-04-19 14:04:43 PDT
Comment on attachment 367371 [details]
Patch for landing

View in context: https://bugs.webkit.org/attachment.cgi?id=367371&action=review

> Source/WTF/wtf/HashTable.h:1213
> +        return std::max<unsigned>(bestTableSize, minimumTableSize);

Can we remove <unsigned> here?

> Source/WTF/wtf/HashTable.h:1220
> +        rehash(std::max<unsigned>(minimumTableSize, computeBestTableSize(m_keyCount)), nullptr);

Can we remove <unsigned> here?

> Tools/ChangeLog:12
> +	(WTF_HashSet.RemoveIfShrinkToBestSize):

Surprised Subversion pre-commit hooks allowed us to land a patch with a tab in a ChangeLog
Comment 18 Ryosuke Niwa 2019-04-19 21:35:27 PDT
(In reply to Darin Adler from comment #17)
> Comment on attachment 367371 [details]
> Patch for landing
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=367371&action=review
> 
> > Source/WTF/wtf/HashTable.h:1213
> > +        return std::max<unsigned>(bestTableSize, minimumTableSize);
> 
> Can we remove <unsigned> here?

Oops, I forgot to remove this. Will remove.


> > Source/WTF/wtf/HashTable.h:1220
> > +        rehash(std::max<unsigned>(minimumTableSize, computeBestTableSize(m_keyCount)), nullptr);
> 
> Can we remove <unsigned> here?

Ditto.

> > Tools/ChangeLog:12
> > +	(WTF_HashSet.RemoveIfShrinkToBestSize):
> 
> Surprised Subversion pre-commit hooks allowed us to land a patch with a tab
> in a ChangeLog

Oh, it DID catch it. I had to fix that before actually committing.
Comment 19 Ryosuke Niwa 2019-04-19 23:22:15 PDT
Reopening to attach new patch.
Comment 20 Ryosuke Niwa 2019-04-19 23:22:17 PDT
Created attachment 367881 [details]
Patch
Comment 21 Ryosuke Niwa 2019-04-20 12:16:24 PDT
Committed r244489: <https://trac.webkit.org/changeset/244489>