WebKit Bugzilla
Bug 195152: Add WeakHashSet
Status: RESOLVED FIXED
https://bugs.webkit.org/show_bug.cgi?id=195152
Reported by Ryosuke Niwa, 2019-02-28 00:36:13 PST
Make it possible to store WeakPtr as a HashSet's key. Adding support for HashMap is tricky because we'd have to delete the value as well.
Attachments

WIP (37.65 KB, patch), 2019-02-28 00:36 PST, Ryosuke Niwa, no flags
GTK+ builds fix attempt (38.23 KB, patch), 2019-02-28 14:19 PST, Ryosuke Niwa, no flags
Added leak checks (39.46 KB, patch), 2019-02-28 18:34 PST, Ryosuke Niwa, no flags
Archive of layout-test-results from ews114 for mac-highsierra (1.60 MB, application/zip), 2019-02-28 23:42 PST, EWS Watchlist, no flags
Adds WeakHashSet (32.58 KB, patch), 2019-02-28 23:43 PST, Ryosuke Niwa, koivisto: review+, ews-watchlist: commit-queue-
Archive of layout-test-results from ews112 for mac-highsierra (1.36 MB, application/zip), 2019-03-01 21:13 PST, EWS Watchlist, no flags
Ryosuke Niwa
Comment 1
2019-02-28 00:36:54 PST
Created attachment 363209 [details]
WIP
EWS Watchlist
Comment 2
2019-02-28 00:39:39 PST
Attachment 363209 [details] did not pass style-queue:

ERROR: Source/WTF/wtf/HashTable.h:305: This { should be at the end of the previous line  [whitespace/braces] [4]
ERROR: Source/WTF/wtf/HashTable.h:317: This { should be at the end of the previous line  [whitespace/braces] [4]
ERROR: Source/WTF/wtf/WeakPtr.h:74: Should be indented on a separate line, with the colon or comma first on that line.  [whitespace/indent] [4]
Total errors found: 3 in 13 files

If any of these errors are false positives, please file a bug against check-webkit-style.
Ryosuke Niwa
Comment 3
2019-02-28 00:48:10 PST
A few notes:

size(), isEmpty(), and random() can no longer be done in O(1), since some of the keys may have been "released" without the HashTable ever getting notified. I'm disabling these methods with std::enable_if for now. random() is a particularly interesting one because it would work in most cases except when the hash table is empty, in which case it could loop infinitely, as there is no guarantee that we've visited all N slots at any point since it performs random accesses.

Unlike with other keys, we need to deal with the fact that a key can suddenly become "released". We have to deal with this issue in inlineLookup, and in fact have to shrink the table there, because the canonical way to remove a key, remove(find(X)), would fail since find(X) would need to return HashSet::end().

expand() needs to consider the possibility that enough buckets may become available if we yanked out all the WeakPtrs that have become "released". Note that m_keyCount + m_deletedCount stays valid if we treat the "released" state as equivalent to "deleted", because a WeakPtr can only move from being valid to being "released".
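(For illustration, here is a minimal, self-contained sketch of the std::enable_if technique mentioned above. SetLike and IsWeakKey are made-up names for this sketch, not the actual WTF code.)

    #include <type_traits>

    template<typename Key> struct IsWeakKey : std::false_type { };
    // A real setup would specialize this, e.g. IsWeakKey<WeakPtr<T>> : std::true_type.

    template<typename Key>
    class SetLike {
    public:
        // size() only participates in overload resolution for non-weak keys;
        // with weak keys the stored count can overcount entries whose referent
        // has already been destroyed, so the method is disabled.
        template<typename K = Key, typename = std::enable_if_t<!IsWeakKey<K>::value>>
        unsigned size() const { return m_keyCount; }

    private:
        unsigned m_keyCount { 0 };
    };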
Geoffrey Garen
Comment 4
2019-02-28 11:33:16 PST
Comment on attachment 363209 [details]
WIP

Now that you've enumerated all the edge cases created by weak keys, I don't feel great about them. The idea that lookup must delete is particularly surprising. But even the idea that insertion must delete is somewhat surprising. I'm starting to wonder if a dedicated WeakSet is the right way to go.
Ryosuke Niwa
Comment 5
2019-02-28 13:38:31 PST
(In reply to Geoffrey Garen from comment #4)
> Comment on attachment 363209 [details]
> WIP
>
> The idea that lookup must delete is particularly surprising.
>
> But even the idea that insertion must delete is somewhat surprising.
>
> I'm starting to wonder if a dedicated WeakSet is the right way to go.
Yes, but creating WeakSet doesn't necessarily reduce the complexity of the code change needed for HashTable. Another observation I made is that treating a released weak reference as a deleted value is problematic because we'd then need to remember to invoke WeakPtr::~WeakPtr in the future anyway. Otherwise, we'd leak every WeakReference whose m_ptr has become null.
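(A rough model of the ownership relationship being described here, assuming WTF's RefCounted/RefPtr; this is a simplification for illustration, not the actual WeakPtr.h code.)

    #include <wtf/RefCounted.h>
    #include <wtf/RefPtr.h>

    template<typename T>
    class WeakReference : public RefCounted<WeakReference<T>> {
    public:
        explicit WeakReference(T* ptr) : m_ptr(ptr) { }
        T* get() const { return m_ptr; }
        void clear() { m_ptr = nullptr; } // called when the pointee is destroyed
    private:
        T* m_ptr;
    };

    template<typename T>
    class WeakPtr {
    public:
        T* get() const { return m_ref ? m_ref->get() : nullptr; }
    private:
        // ~WeakPtr must run so this reference is dropped; a released (cleared)
        // WeakReference that is never deref'd is leaked, which is why "released"
        // can't simply be treated like HashTable's "deleted" state.
        RefPtr<WeakReference<T>> m_ref;
    };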
Geoffrey Garen
Comment 6
2019-02-28 14:00:47 PST
> > I'm starting to wonder if a dedicated WeakSet is the right way to go.
>
> Yes, but creating WeakSet doesn't necessarily reduce the complexity of the
> code change needed for HashTable.
Antti's proposal was a WeakSet class with an internal implementation of HashSet<RefPtr<WeakReference<T>>>, which explicitly checks for nulls at the API layer. Is that unworkable in some way? Maybe it's still a problem if its buckets become empty?
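(A rough sketch of that shape, assuming WTF's HashSet/RefPtr and a WeakReference<T> like the one modeled above; the weakReference() accessor is hypothetical and the class is illustrative, not the patch under review.)

    #include <wtf/HashSet.h>
    #include <wtf/RefPtr.h>

    template<typename T>
    class WeakSet {
    public:
        void add(T& object) { m_set.add(&object.weakReference()); }
        bool contains(const T& object) const { return m_set.contains(&object.weakReference()); }

        // Null checks happen at the API layer: a bucket whose referent is gone
        // no longer counts as a member, even though the bucket is still occupied.
        bool computesEmpty() const
        {
            for (auto& reference : m_set) {
                if (reference->get())
                    return false;
            }
            return true;
        }

    private:
        HashSet<RefPtr<WeakReference<T>>> m_set;
    };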
Ryosuke Niwa
Comment 7
2019-02-28 14:17:24 PST
(In reply to Geoffrey Garen from comment #6)
> > > I'm starting to wonder if a dedicated WeakSet is the right way to go.
> >
> > Yes, but creating WeakSet doesn't necessarily reduce the complexity of the
> > code change needed for HashTable.
>
> Antti's proposal was a WeakSet class with an internal implementation of
> HashSet<RefPtr<WeakReference<T>>>, which explicitly checks for nulls at the
> API layer. Is that unworkable in some way? Maybe it's still a problem if its
> buckets become empty?
It's still a problem because if the bucket becomes "released weak", then they're indistinguishable. We'd need to clear them away at some point.

I also came to the realization that removal is actually more problematic than insertion. In the most naive implementation, removal of a "released weak" turns into a no-op, and then we'd never shrink the table because m_keyCount never goes down.

A slightly simpler implementation, which would use ~2x the memory of what I have because we can't re-use released weak buckets until HashTable::expand() is called, might involve something like this:
1. Create a wrapper class like WeakHashSet
2. Update HashTable::expand to do deleteReleasedWeakBuckets (same as current patch)
3. Update HashTable::rehash (same as current patch)
4. WeakHashSet::remove would find the "released weak" and actually delete it. This requires a new method on HashTable or a specialized iterator.

I think this approach would have made sense if we could treat "released weak" like "deleted". As soon as I realized that we can't, because we have to call the destructor on WeakPtr, I concluded that it's no simpler than my current approach: we'd simply be shuffling around where ReleasedWeakValueDeleter is used, and it doesn't really reduce much complexity in HashTable itself.
Ryosuke Niwa
Comment 8
2019-02-28 14:19:23 PST
Created attachment 363259 [details]
GTK+ builds fix attempt
EWS Watchlist
Comment 9
2019-02-28 14:20:52 PST
Attachment 363259 [details] did not pass style-queue:

ERROR: Source/WTF/wtf/HashTable.h:305: This { should be at the end of the previous line  [whitespace/braces] [4]
ERROR: Source/WTF/wtf/HashTable.h:317: This { should be at the end of the previous line  [whitespace/braces] [4]
ERROR: Source/WTF/wtf/WeakPtr.h:74: Should be indented on a separate line, with the colon or comma first on that line.  [whitespace/indent] [4]
Total errors found: 3 in 14 files

If any of these errors are false positives, please file a bug against check-webkit-style.
Geoffrey Garen
Comment 10
2019-02-28 14:53:47 PST
> > Antti's proposal was a WeakSet class with an internal implementation of
> > HashSet<RefPtr<WeakReference<T>>>, which explicitly checks for nulls at the
> > API layer. Is that unworkable in some way? Maybe it's still a problem if its
> > buckets become empty?
>
> It's still a problem because if the bucket becomes "released weak", then
> they're indistinguishable. We'd need to clear them away at some point.
Let me try to clarify the specifics of the problem.

Let's say I have HashSet<RefPtr<WeakReference<T>>>. Object identity guarantees that the WeakReference pointers in my table are unique, and that they hash and test for equality correctly. (Each CanMakeWeakPtr object has exactly one WeakReference*, that WeakReference* uniquely identifies the object, and that WeakReference* lives as long as the table needs it to live.) So, from a correctness perspective, I believe it "just works".

I don't think released weak pointers become "indistinguishable". But maybe I'm misunderstanding the term? I believe that each WeakReference* can still be distinguished from each other WeakReference*, and also distinguished from the empty and deleted buckets. (The empty bucket will be the null WeakReference* and the deleted bucket will be the -1 WeakReference*, which no CanMakeWeakPtr object will ever vend to us.)

Also, I believe that it's impossible to do a hash lookup that hits a released weak pointer. Every lookup uses the CanMakeWeakPtr object's unique WeakReference*. By implementation, it is guaranteed that a WeakReference* is not released if its CanMakeWeakPtr object still exists. And it is a programming error to look up a deleted CanMakeWeakPtr object.

Given those guarantees, I believe the specifics of the problem are:

1. How should functions like size() and isEmpty() work?

2. How should we avoid unbounded growth in the number of released weak pointer buckets?

Is that right?
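(For reference, a minimal sketch of the "null for empty, -1 for deleted" bucket representation described above, written as a standalone traits struct; the name PointerBucketTraits is made up, and the real definitions live in WTF's HashTraits.)

    #include <cstdint>

    template<typename P>
    struct PointerBucketTraits {
        static P* emptyValue() { return nullptr; }                               // empty bucket
        static P* deletedValue() { return reinterpret_cast<P*>(intptr_t(-1)); }  // deleted bucket
        static bool isEmptyValue(P* value) { return !value; }
        static bool isDeletedValue(P* value) { return value == deletedValue(); }
    };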
> I also came to the realization that removal is actually more problematic
> than insertion. In the most naive implementation, removal of a "released
> weak" turns into a no-op, and then we'd never shrink the table because
> m_keyCount never goes down.
Yes. This sounds like (2). Right?
> A slightly simpler implementation, which would use ~2x the memory of what I
> have because we can't re-use released weak buckets until HashTable::expand()
> is called, might involve something like this:
> 1. Create a wrapper class like WeakHashSet
> 2. Update HashTable::expand to do deleteReleasedWeakBuckets (same as current
> patch)
> 3. Update HashTable::rehash (same as current patch)
Yeah, I think something like this would work as a solution to (2).

But why would this use more memory? As long as we take the opportunity to clear released weak buckets before we grow the table, I would expect it to take equal memory.

Another option is just to make this the wrapper class's problem, and make the wrapper class scan for released weak buckets after N insertions or N total operations. Not quite optimal, but equal by the standard of average case amortized performance, and maybe easier to work with.
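(A sketch of that "wrapper class's problem" variant, under the same assumptions as the WeakSet sketch earlier in this bug; the threshold value and the removeNullReferences() helper are made up for illustration.)

    template<typename T>
    class AmortizedWeakSet {
    public:
        void add(T& object)
        {
            maybeCleanUp();
            m_set.add(&object.weakReference());
        }

        void remove(T& object)
        {
            maybeCleanUp();
            m_set.remove(&object.weakReference());
        }

    private:
        void maybeCleanUp()
        {
            if (++m_operationCount < cleanupThreshold)
                return;
            m_operationCount = 0;
            removeNullReferences();
        }

        void removeNullReferences()
        {
            // Drop buckets whose referent has been destroyed. Doing this every
            // N operations keeps the extra cost amortized across operations.
            m_set.removeIf([](const RefPtr<WeakReference<T>>& reference) { return !reference->get(); });
        }

        static constexpr unsigned cleanupThreshold = 64;
        unsigned m_operationCount { 0 };
        HashSet<RefPtr<WeakReference<T>>> m_set;
    };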
> 4. WeakHashSet::remove would find the "released weak" and actually delete it.
> This requires a new method on HashTable or a specialized iterator.
I don't think we need this. As long as you're removing an object that has not yet been deleted, you will either find the object or find nothing. There's no collision with a released weak bucket. Right?
Ryosuke Niwa
Comment 11
2019-02-28 16:54:04 PST
(In reply to Geoffrey Garen from comment #10)
> > > Antti's proposal was a WeakSet class with an internal implementation of
> > > HashSet<RefPtr<WeakReference<T>>>, which explicitly checks for nulls at the
> > > API layer. Is that unworkable in some way? Maybe it's still a problem if its
> > > buckets become empty?
> >
> > It's still a problem because if the bucket becomes "released weak", then
> > they're indistinguishable. We'd need to clear them away at some point.
>
> Let me try to clarify the specifics of the problem.
>
> Let's say I have HashSet<RefPtr<WeakReference<T>>>. Object identity
> guarantees that the WeakReference pointers in my table are unique, and that
> they hash and test for equality correctly. (Each CanMakeWeakPtr object has
> exactly one WeakReference*, that WeakReference* uniquely identifies the
> object, and that WeakReference* lives as long as the table needs it to
> live.) So, from a correctness perspective, I believe it "just works".
Sure, it "works" in the sense that we won't have a crash, etc... but they'd be "leaked" unless we remove them during rehashing.
> I don't think released weak pointers become "indistinguishable". But maybe
> I'm misunderstanding the term? I believe that each WeakReference* can still
> be distinguished from each other WeakReference*, and also distinguished from
> the empty and deleted buckets.
Yes, that's true. What I'm saying is that from a lookup / removal perspective, there is nothing you can do to find those "released weak"s.
> Also, I believe that it's impossible to do a hash lookup that hits a
> released weak pointer. Every lookup uses the CanMakeWeakPtr object's unique
> WeakReference*. By implementation, it is guaranteed that a WeakReference* is
> not released if its CanMakeWeakPtr object still exists. And it is a
> programming error to look up a deleted CanMakeWeakPtr object.
Right. This is precisely the issue I was referring to when I said they're "indistinguishable".
> Given those guarantees, I believe the specifics of the problem are:
>
> 1. How should functions like size() and isEmpty() work?
>
> 2. How should we avoid unbounded growth in the number of released weak
> pointer buckets?
>
> Is that right?
Right.
> > I also came to the realization that removal is actually more problematic
> > than insertion. In the most naive implementation, removal of a "released
> > weak" turns into a no-op, and then we'd never shrink the table because
> > m_keyCount never goes down.
>
> Yes. This sounds like (2). Right?
Yes.
> > A slightly simpler implementation, which would use ~2x the memory of what
> > I have because we can't re-use released weak buckets until
> > HashTable::expand() is called, might involve something like this:
> > 1. Create a wrapper class like WeakHashSet
> > 2. Update HashTable::expand to do deleteReleasedWeakBuckets (same as current
> > patch)
> > 3. Update HashTable::rehash (same as current patch)
>
> Yeah, I think something like this would work as a solution to (2).
>
> But why would this use more memory? As long as we take the opportunity to
> clear released weak buckets before we grow the table, I would expect it to
> take equal memory.
Because we would never realize that the keys had been "released" until we examine them. Unless we detect whenever we hit a "released weak" and increment m_deletedCount or m_keyCount, we wouldn't realize until we do re-hashing.
> Another option is just to make this the wrapper class's problem, and make
> the wrapper class scan for released weak buckets after N insertions or N
> total operations. Not quite optimal, but equal by the standard of average
> case amortized performance, and maybe easier to work with.
Yeah, if we're making this a separate class, we probably want to do that. In fact, even my current implementation has an issue: when the hash set is of size N and N-1 items have been released, we won't ever shrink the hash table until we trigger a re-hash or do enough lookups to realize that has happened.
> > 4. WeakHashSet::remove would find the "released weak" and actually delete
> > it. This requires a new method on HashTable or a specialized iterator.
>
> I don't think we need this. As long as you're removing an object that has
> not yet been deleted, you will either find the object or find nothing.
> There's no collision with a released weak bucket. Right?
Actually, yeah, if we have a wrapper class, then the HashSet itself would return a WeakReference with null m_ptr so we could then delete such an entry.
Ryosuke Niwa
Comment 12
2019-02-28 18:34:04 PST
Created attachment 363287 [details]
Added leak checks

Here's basically the final patch for this approach.
EWS Watchlist
Comment 13
2019-02-28 18:37:07 PST
Attachment 363287 [details] did not pass style-queue:

ERROR: Source/WTF/wtf/HashTable.h:305: This { should be at the end of the previous line  [whitespace/braces] [4]
ERROR: Source/WTF/wtf/HashTable.h:317: This { should be at the end of the previous line  [whitespace/braces] [4]
ERROR: Source/WTF/wtf/WeakPtr.h:76: Should be indented on a separate line, with the colon or comma first on that line.  [whitespace/indent] [4]
Total errors found: 3 in 14 files

If any of these errors are false positives, please file a bug against check-webkit-style.
Geoffrey Garen
Comment 14
2019-02-28 19:07:30 PST
> > But why would this use more memory? As long as we take the opportunity to
> > clear released weak buckets before we grow the table, I would expect it to
> > take equal memory.
>
> Because we would never realize that the keys had been "released" until we
> examine them. Unless we detect whenever we hit a "released weak" and
> increment m_deletedCount or m_keyCount, we wouldn't realize until we do
> re-hashing.
Got it. So, the memory you're talking about is the backing storage for the WeakReference object.
> > Another option is just to make this the wrapper class's problem, and make
> > the wrapper class scan for released weak buckets after N insertions or N
> > total operations. Not quite optimal, but equal by the standard of average
> > case amortized performance, and maybe easier to work with.
>
> Yeah, if we're making this a separate class, we probably want to do that. In
> fact, even my current implementation has an issue: when the hash set is of
> size N and N-1 items have been released, we won't ever shrink the hash table
> until we trigger a re-hash or do enough lookups to realize that has happened.
Right. Adding code to HashTable gives us more opportunities to delete individual WeakReferences -- but only in proportion to the number of client lookup operations, and their luckiness. If we assume that 1/2 of buckets are released WeakReferences on average, and one lookup deletes one released WeakReference on average, then the memory usage after LiveKeyCount lookups is identical between both implementations. (The HashTable implementation incrementally removes one bucket per lookup, up to LiveKeyCount buckets, and the wrapper class implementation incrementally removes zero buckets per lookup and then removes LiveKeyCount buckets at the end.)

If we assume other average case conditions, then one or the other implementation will win. In practice, I suspect that either implementation is probably acceptable.
> > > 4. WeakHashSet::remove would find the "released weak" and actually delete
> > > it. This requires a new method on HashTable or a specialized iterator.
> >
> > I don't think we need this. As long as you're removing an object that has
> > not yet been deleted, you will either find the object or find nothing.
> > There's no collision with a released weak bucket. Right?
>
> Actually, yeah, if we have a wrapper class, then the HashSet itself would
> return a WeakReference with null m_ptr so we could then delete such an entry.
I don't think that's quite right. Rather, I think it's impossible for the wrapper class solution to return a WeakReference with null m_ptr. To look up T* and return WeakReference<T> with null m_ptr, T* must be a deleted pointer. But looking up a deleted pointer in a weak hash table is not a supported operation. (For one thing, it's a logic error because T* may have been reused to allocate a new object, and you might get a hash table hit in error! Also, mechanically, how would you access T*'s WeakReference* to do the lookup? You would crash trying to load the WeakReference* from the deleted T*.)
EWS Watchlist
Comment 15
2019-02-28 23:42:15 PST
Comment on attachment 363287 [details]
Added leak checks

Attachment 363287 [details] did not pass mac-debug-ews (mac): Output: https://webkit-queues.webkit.org/results/11326670

New failing tests:
storage/indexeddb/modern/blocked-open-db-requests-private.html
EWS Watchlist
Comment 16
2019-02-28 23:42:17 PST
Created attachment 363310 [details]
Archive of layout-test-results from ews114 for mac-highsierra

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews114  Port: mac-highsierra  Platform: Mac OS X 10.13.6
Ryosuke Niwa
Comment 17
2019-02-28 23:43:33 PST
Created attachment 363311 [details]
Adds WeakHashSet
Ryosuke Niwa
Comment 18
2019-02-28 23:45:57 PST
(In reply to Geoffrey Garen from comment #14)
> > > Another option is just to make this the wrapper class's problem, and make
> > > the wrapper class scan for released weak buckets after N insertions or N
> > > total operations. Not quite optimal, but equal by the standard of average
> > > case amortized performance, and maybe easier to work with.
> >
> > Yeah, if we're making this a separate class, we probably want to do that. In
> > fact, even my current implementation has an issue: when the hash set is of
> > size N and N-1 items have been released, we won't ever shrink the hash table
> > until we trigger a re-hash or do enough lookups to realize that has happened.
>
> Right. Adding code to HashTable gives us more opportunities to delete
> individual WeakReferences -- but only in proportion to the number of client
> lookup operations, and their luckiness. If we assume that 1/2 of buckets are
> released WeakReferences on average, and one lookup deletes one released
> WeakReference on average, then the memory usage after LiveKeyCount lookups
> is identical between both implementations. (The HashTable implementation
> incrementally removes one bucket per lookup, up to LiveKeyCount buckets, and
> the wrapper class implementation incrementally removes zero buckets per
> lookup and then removes LiveKeyCount buckets at the end.)
>
> If we assume other average case conditions, then one or the other
> implementation will win. In practice, I suspect that either implementation
> is probably acceptable.
Maybe. Now that I have a new implementation, I've started to think that this difference is indeed insignificant. A better strategy might be to make the iterator keep track of how many "released weak"s we've seen, and trigger a cleanup at the end if it has seen enough of them.
> > > > 4. WeakHashSet::remove would find the "released weak" and actually
> > > > delete it. This requires a new method on HashTable or a specialized
> > > > iterator.
> > >
> > > I don't think we need this. As long as you're removing an object that has
> > > not yet been deleted, you will either find the object or find nothing.
> > > There's no collision with a released weak bucket. Right?
> >
> > Actually, yeah, if we have a wrapper class, then the HashSet itself would
> > return a WeakReference with null m_ptr so we could then delete such an entry.
>
> I don't think that's quite right. Rather, I think it's impossible for the
> wrapper class solution to return a WeakReference with null m_ptr. To look up
> T* and return WeakReference<T> with null m_ptr, T* must be a deleted
> pointer. But looking up a deleted pointer in a weak hash table is not a
> supported operation. (For one thing, it's a logic error because T* may have
> been reused to allocate a new object, and you might get a hash table hit in
> error! Also, mechanically, how would you access T*'s WeakReference* to do
> the lookup? You would crash trying to load the WeakReference* from the
> deleted T*.)
Yeah, I see what you're saying now that I've written the code.
Antti Koivisto
Comment 19
2019-03-01 05:29:19 PST
Comment on attachment 363311 [details]
Adds WeakHashSet

View in context: https://bugs.webkit.org/attachment.cgi?id=363311&action=review
> Source/WTF/wtf/HashTable.h:427
> +    static bool isReleasedWeakBucket(const ValueType& value) { return isHashTraitsReleasedWeakValue<KeyTraits>(Extractor::extract(value)); }
I wasn't sure with the old patch if these HashTable changes are really worth doing. Are they a significant improvement over simply clearing out released weak values in WeakHashSet::add when hitting the expansion size?
> Source/WTF/wtf/WeakHashSet.h:64
> +    void skipEmptyBuckets()
> +    {
> +        while (m_position != m_endPosition && !m_position->get().get())
> +            ++m_position;
> +    }
The iterator could also drop released weak values, so iterating a table with lots of released values would be slow only once.
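(A sketch of that idea as a standalone iterator, purely illustrative and not the patch under review: count released buckets while skipping them, so the owning set can decide afterwards whether to compact the table. Bucket here is assumed to have a get() that returns null once the referent is gone.)

    template<typename Bucket>
    class SkippingIterator {
    public:
        SkippingIterator(const Bucket* position, const Bucket* end)
            : m_position(position)
            , m_endPosition(end)
        {
            skipReleasedBuckets();
        }

        const Bucket& operator*() const { return *m_position; }

        SkippingIterator& operator++()
        {
            ++m_position;
            skipReleasedBuckets();
            return *this;
        }

        bool operator!=(const SkippingIterator& other) const { return m_position != other.m_position; }

        // How stale the table looked during this traversal; the owner can use
        // this to trigger a cleanup once iteration is done.
        unsigned releasedBucketsSeen() const { return m_releasedCount; }

    private:
        void skipReleasedBuckets()
        {
            while (m_position != m_endPosition && !m_position->get()) {
                ++m_releasedCount;
                ++m_position;
            }
        }

        const Bucket* m_position { nullptr };
        const Bucket* m_endPosition { nullptr };
        unsigned m_releasedCount { 0 };
    };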
Ryosuke Niwa
Comment 20
2019-03-01 10:51:55 PST
(In reply to Antti Koivisto from comment #19)
> Comment on attachment 363311 [details]
> Adds WeakHashSet
>
> View in context: https://bugs.webkit.org/attachment.cgi?id=363311&action=review
>
> > Source/WTF/wtf/HashTable.h:427
> > +    static bool isReleasedWeakBucket(const ValueType& value) { return isHashTraitsReleasedWeakValue<KeyTraits>(Extractor::extract(value)); }
>
> I wasn't sure with the old patch if these HashTable changes are really worth
> doing. Are they a significant improvement over simply clearing out released
> weak values in WeakHashSet::add when hitting the expansion size?
The challenge here is that shouldExpand() & shouldShrink() are checked after we've inserted / removed an item. This would mean that we need a new variant of shouldExpand() & shouldShrink() that would return true if we were about to add or remove another item. Even then, we'd still incur an extra iteration over the hash table for the initial cleanup. That seemed like a really fragile design, because if we ever make refactoring changes to HashTable, whatever approach / optimization we do in WeakSet would break.
> > Source/WTF/wtf/WeakHashSet.h:64
> > +    void skipEmptyBuckets()
> > +    {
> > +        while (m_position != m_endPosition && !m_position->get().get())
> > +            ++m_position;
> > +    }
>
> The iterator could also drop released weak values, so iterating a table with
> lots of released values would be slow only once.
Yeah, as I said in my earlier comment, that could be a valid optimization strategy. I was thinking that maybe I'll do that in a separate patch.
EWS Watchlist
Comment 21
2019-03-01 21:13:46 PST
Comment on attachment 363311 [details]
Adds WeakHashSet

Attachment 363311 [details] did not pass mac-debug-ews (mac): Output: https://webkit-queues.webkit.org/results/11339043

New failing tests:
storage/indexeddb/modern/createobjectstore-basic-private.html
EWS Watchlist
Comment 22
2019-03-01 21:13:47 PST
Created attachment 363408 [details]
Archive of layout-test-results from ews112 for mac-highsierra

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews112  Port: mac-highsierra  Platform: Mac OS X 10.13.6
Antti Koivisto
Comment 23
2019-03-03 00:33:41 PST
Comment on attachment 363311 [details]
Adds WeakHashSet

Ok, seems good enough to start with. r=me
Ryosuke Niwa
Comment 24
2019-03-04 13:52:34 PST
Committed r242387: <https://trac.webkit.org/changeset/242387>
Radar WebKit Bug Importer
Comment 25
2019-03-04 15:43:14 PST
<rdar://problem/48580628>