WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
196681
HashTable::removeIf always shrinks the hash table by half even if there is nothing left
https://bugs.webkit.org/show_bug.cgi?id=196681
Summary
HashTable::removeIf always shrinks the hash table by half even if there is no...
Ryosuke Niwa
Reported
2019-04-06 17:19:11 PDT
HashTable::removeIf always shrinks the table to half of its original size even if there is nothing left in the table afterwards.
Attachments
Fixes the bug
(5.14 KB, patch)
2019-04-12 15:10 PDT
,
Ryosuke Niwa
no flags
Details
Formatted Diff
Diff
Patch for landing
(6.95 KB, patch)
2019-04-12 19:04 PDT
,
Ryosuke Niwa
no flags
Details
Formatted Diff
Diff
Patch
(1.82 KB, patch)
2019-04-19 23:22 PDT
,
Ryosuke Niwa
darin
: review+
Details
Formatted Diff
Diff
Show Obsolete
(1)
View All
Add attachment
proposed patch, testcase, etc.
Ryosuke Niwa
Comment 1
2019-04-06 17:22:35 PDT
HashTable::removeIf has the code like this: if (shouldShrink()) shrink(); But shrink does: void shrink() { rehash(m_tableSize / 2, nullptr); } Let's say we removed 30 out of 32 entires in the hash table. Then m_tableSize might have been 64 but we'd only shrink the table to 16 after the removal instead of the minimal table size of 8.
Ryosuke Niwa
Comment 2
2019-04-06 17:30:00 PDT
Maybe this design was intentional to limit the rate by which we shrink to avoid table size churn??
Geoffrey Garen
Comment 3
2019-04-06 20:48:25 PDT
I don't think it's intentional. It's inconsistent with remove(). That said, it is true that shrinking to exactly your current size is usually a poor choice, since the very next insertion will grow. One way to fix this is to do the shouldShrink test inside the loop. Another way to fix this is to change the if (shouldShrink()) into a while (shouldShrink()). Both of those are a bit wasteful, though. The optimal solution is probably something like rehash(std::max(KeyTraits::minimumTableSize, roundUpToPowerOfTwo(m_keyCount + 1)).
Darin Adler
Comment 4
2019-04-10 09:45:29 PDT
I agree with Geoff. The removeIf function should implement the same policy as shouldShrink/shrink, but those function are written with removing a single element from the hash table in mind. It’s the policy that is important. Also nice that before the policy was highly localized in those functions. It would be a shame to "spread out" the policy across more functions in the class, so we should carefully figure out how to factor this "remove multiple items" case.
Ryosuke Niwa
Comment 5
2019-04-12 15:10:16 PDT
Created
attachment 367348
[details]
Fixes the bug
EWS Watchlist
Comment 6
2019-04-12 15:13:05 PDT
Attachment 367348
[details]
did not pass style-queue: ERROR: Tools/ChangeLog:12: Line contains tab character. [whitespace/tab] [5] ERROR: Tools/TestWebKitAPI/Tests/WTF/HashSet.cpp:473: More than one command on the same line [whitespace/newline] [4] ERROR: Tools/TestWebKitAPI/Tests/WTF/HashSet.cpp:490: More than one command on the same line [whitespace/newline] [4] ERROR: Tools/TestWebKitAPI/Tests/WTF/HashSet.cpp:499: More than one command on the same line [whitespace/newline] [4] Total errors found: 4 in 4 files If any of these errors are false positives, please file a bug against check-webkit-style.
Darin Adler
Comment 7
2019-04-12 15:54:45 PDT
Comment on
attachment 367348
[details]
Fixes the bug View in context:
https://bugs.webkit.org/attachment.cgi?id=367348&action=review
> Source/WTF/wtf/HashTable.h:467 > + unsigned computeBestTableSize(unsigned keyCount) { return WTF::roundUpToPowerOfTwo(keyCount) * 2; }
I think this should be should be static (and constexpr) member function, rather than a non-static. Is it safe to call roundUpToPowerOfTwo on the key count? Is there a guarantee it won’t overflow?
> Source/WTF/wtf/HashTable.h:477 > + void shrinkToBestSize() > + { > + unsigned minimumTableSize = KeyTraits::minimumTableSize; > + rehash(std::max<unsigned>(minimumTableSize, computeBestTableSize(m_keyCount + 1)), nullptr); > + }
Why does this add one to m_keyCount? The copy constructor doesn’t do that. Unclear why it’s correct to do this computation without the "aboveThreeQuarterLoad" check that is in the copy constructor. Is it safe to add one to m_keyCount? Is there something that guarantees it won’t overflow? The <unsigned> is not needed in the call to std::max since both arguments are unsigned. Same in the code this was copied from, in the copy constructor. Unclear why we need minimumTableSize to be put into a local variable. Same in the code this was copied from, in the copy constructor. Style thought: I think a multi-line function should be defined outside the class, so it doesn’t take up four lines in the class definition.
> Tools/ChangeLog:12 > + (WTF_HashSet.RemoveIfShrinkToBestSize):
Looks like there’s a tab here.
Ryosuke Niwa
Comment 8
2019-04-12 18:52:50 PDT
Comment on
attachment 367348
[details]
Fixes the bug View in context:
https://bugs.webkit.org/attachment.cgi?id=367348&action=review
>> Source/WTF/wtf/HashTable.h:467 >> + unsigned computeBestTableSize(unsigned keyCount) { return WTF::roundUpToPowerOfTwo(keyCount) * 2; } > > I think this should be should be static (and constexpr) member function, rather than a non-static. > > Is it safe to call roundUpToPowerOfTwo on the key count? Is there a guarantee it won’t overflow?
Sure. I can make it static. I don't think this code is guaranteed to be safe but HashTable in general doesn't seem to have that kind of safety check. For example, expand(~) always multiples the current table size by 2.
>> Source/WTF/wtf/HashTable.h:477 >> + } > > Why does this add one to m_keyCount? The copy constructor doesn’t do that. > > Unclear why it’s correct to do this computation without the "aboveThreeQuarterLoad" check that is in the copy constructor. > > Is it safe to add one to m_keyCount? Is there something that guarantees it won’t overflow? > > The <unsigned> is not needed in the call to std::max since both arguments are unsigned. Same in the code this was copied from, in the copy constructor. > > Unclear why we need minimumTableSize to be put into a local variable. Same in the code this was copied from, in the copy constructor. > > Style thought: I think a multi-line function should be defined outside the class, so it doesn’t take up four lines in the class definition.
Oh, I din't see that code there. We clearly should do that math. I was adding 1 to do what aboveThreeQuarterLoad is doing. There is also std::max<unsigned>(bestTableSize, minimumTableSize) already so we can merge that too. If we don't store minimumTableSize in a local variable, we hit a linking error :( It's probably some kind of a clang bug.
Ryosuke Niwa
Comment 9
2019-04-12 19:04:39 PDT
Created
attachment 367371
[details]
Patch for landing
Ryosuke Niwa
Comment 10
2019-04-12 19:06:16 PDT
Waiting for EWS.
EWS Watchlist
Comment 11
2019-04-12 19:07:42 PDT
Attachment 367371
[details]
did not pass style-queue: ERROR: Tools/ChangeLog:12: Line contains tab character. [whitespace/tab] [5] ERROR: Tools/TestWebKitAPI/Tests/WTF/HashSet.cpp:473: More than one command on the same line [whitespace/newline] [4] ERROR: Tools/TestWebKitAPI/Tests/WTF/HashSet.cpp:490: More than one command on the same line [whitespace/newline] [4] ERROR: Tools/TestWebKitAPI/Tests/WTF/HashSet.cpp:499: More than one command on the same line [whitespace/newline] [4] Total errors found: 4 in 4 files If any of these errors are false positives, please file a bug against check-webkit-style.
Darin Adler
Comment 12
2019-04-13 08:14:26 PDT
Comment on
attachment 367348
[details]
Fixes the bug View in context:
https://bugs.webkit.org/attachment.cgi?id=367348&action=review
>>> Source/WTF/wtf/HashTable.h:467 >>> + unsigned computeBestTableSize(unsigned keyCount) { return WTF::roundUpToPowerOfTwo(keyCount) * 2; } >> >> I think this should be should be static (and constexpr) member function, rather than a non-static. >> >> Is it safe to call roundUpToPowerOfTwo on the key count? Is there a guarantee it won’t overflow? > > Sure. I can make it static. > I don't think this code is guaranteed to be safe but HashTable in general doesn't seem to have that kind of safety check. > For example, expand(~) always multiples the current table size by 2.
We still need to think about what prevents overflow. I understand that this is consistent with the rest of the class. We should think about the rest of the class too. Not to block landing this patch, but to make sure it’s safe to program with HashTable.
Ryosuke Niwa
Comment 13
2019-04-15 14:04:46 PDT
(In reply to Darin Adler from
comment #12
)
> Comment on
attachment 367348
[details]
> Fixes the bug > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=367348&action=review
> > >>> Source/WTF/wtf/HashTable.h:467 > >>> + unsigned computeBestTableSize(unsigned keyCount) { return WTF::roundUpToPowerOfTwo(keyCount) * 2; } > >> > >> I think this should be should be static (and constexpr) member function, rather than a non-static. > >> > >> Is it safe to call roundUpToPowerOfTwo on the key count? Is there a guarantee it won’t overflow? > > > > Sure. I can make it static. > > I don't think this code is guaranteed to be safe but HashTable in general doesn't seem to have that kind of safety check. > > For example, expand(~) always multiples the current table size by 2. > > We still need to think about what prevents overflow. I understand that this > is consistent with the rest of the class. We should think about the rest of > the class too. Not to block landing this patch, but to make sure it’s safe > to program with HashTable.
Indeed. Vector::append for example has a CRASH() call after calling expandCapacity. We probably need something similar for HashTable in addition to avoiding the overflow.
Ryosuke Niwa
Comment 14
2019-04-15 14:30:37 PDT
Tracking the overflow issue in
https://bugs.webkit.org/show_bug.cgi?id=196935
Ryosuke Niwa
Comment 15
2019-04-15 14:30:40 PDT
Committed
r244289
: <
https://trac.webkit.org/changeset/244289
>
Radar WebKit Bug Importer
Comment 16
2019-04-15 14:32:25 PDT
<
rdar://problem/49917764
>
Darin Adler
Comment 17
2019-04-19 14:04:43 PDT
Comment on
attachment 367371
[details]
Patch for landing View in context:
https://bugs.webkit.org/attachment.cgi?id=367371&action=review
> Source/WTF/wtf/HashTable.h:1213 > + return std::max<unsigned>(bestTableSize, minimumTableSize);
Can we remove <unsigned> here?
> Source/WTF/wtf/HashTable.h:1220 > + rehash(std::max<unsigned>(minimumTableSize, computeBestTableSize(m_keyCount)), nullptr);
Can we remove <unsigned> here?
> Tools/ChangeLog:12 > + (WTF_HashSet.RemoveIfShrinkToBestSize):
Surprised Subversion pre-commit hooks allowed us to land a patch with a tab in a ChangeLog
Ryosuke Niwa
Comment 18
2019-04-19 21:35:27 PDT
(In reply to Darin Adler from
comment #17
)
> Comment on
attachment 367371
[details]
> Patch for landing > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=367371&action=review
> > > Source/WTF/wtf/HashTable.h:1213 > > + return std::max<unsigned>(bestTableSize, minimumTableSize); > > Can we remove <unsigned> here?
Oops, I forgot to remove this. Will remove.
> > Source/WTF/wtf/HashTable.h:1220 > > + rehash(std::max<unsigned>(minimumTableSize, computeBestTableSize(m_keyCount)), nullptr); > > Can we remove <unsigned> here?
Ditto.
> > Tools/ChangeLog:12 > > + (WTF_HashSet.RemoveIfShrinkToBestSize): > > Surprised Subversion pre-commit hooks allowed us to land a patch with a tab > in a ChangeLog
Oh, it DID catch it. I had to fix that before actually committing.
Ryosuke Niwa
Comment 19
2019-04-19 23:22:15 PDT
Reopening to attach new patch.
Ryosuke Niwa
Comment 20
2019-04-19 23:22:17 PDT
Created
attachment 367881
[details]
Patch
Ryosuke Niwa
Comment 21
2019-04-20 12:16:24 PDT
Committed
r244489
: <
https://trac.webkit.org/changeset/244489
>
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug