Bug 195152 - Add WeakHashSet
Summary: Add WeakHashSet
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Web Template Framework
Version: WebKit Nightly Build
Hardware: Unspecified
OS: Unspecified
Importance: P2 Normal
Assignee: Ryosuke Niwa
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2019-02-28 00:36 PST by Ryosuke Niwa
Modified: 2019-03-04 15:43 PST
CC List: 10 users

See Also:


Attachments
WIP (37.65 KB, patch)
2019-02-28 00:36 PST, Ryosuke Niwa
no flags
GTK+ builds fix attempt (38.23 KB, patch)
2019-02-28 14:19 PST, Ryosuke Niwa
no flags
Added leak checks (39.46 KB, patch)
2019-02-28 18:34 PST, Ryosuke Niwa
no flags
Archive of layout-test-results from ews114 for mac-highsierra (1.60 MB, application/zip)
2019-02-28 23:42 PST, EWS Watchlist
no flags
Adds WeakHashSet (32.58 KB, patch)
2019-02-28 23:43 PST, Ryosuke Niwa
koivisto: review+
ews-watchlist: commit-queue-
Archive of layout-test-results from ews112 for mac-highsierra (1.36 MB, application/zip)
2019-03-01 21:13 PST, EWS Watchlist
no flags

Description Ryosuke Niwa 2019-02-28 00:36:13 PST
Make it possible to store WeakPtr as a HashSet's key.

Adding support for HashMap is tricky because we'd have to delete the value as well.
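
For illustration, a rough usage sketch (illustrative only; see the patch for the exact API):

    class Node : public CanMakeWeakPtr<Node> { };

    WeakHashSet<Node> set;
    {
        Node node;
        set.add(node);
        ASSERT(set.contains(node));
    }
    // node is gone: its entry is now a "released" WeakPtr that lookups
    // skip and that the table reclaims lazily.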
Comment 1 Ryosuke Niwa 2019-02-28 00:36:54 PST
Created attachment 363209 [details]
WIP
Comment 2 EWS Watchlist 2019-02-28 00:39:39 PST
Attachment 363209 [details] did not pass style-queue:


ERROR: Source/WTF/wtf/HashTable.h:305:  This { should be at the end of the previous line  [whitespace/braces] [4]
ERROR: Source/WTF/wtf/HashTable.h:317:  This { should be at the end of the previous line  [whitespace/braces] [4]
ERROR: Source/WTF/wtf/WeakPtr.h:74:  Should be indented on a separate line, with the colon or comma first on that line.  [whitespace/indent] [4]
Total errors found: 3 in 13 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 3 Ryosuke Niwa 2019-02-28 00:48:10 PST
A few notes:

size(), isEmpty(), and random() can no longer be done in O(1), as some of the keys may have been "released" without the HashTable ever being notified. I'm disabling these methods with std::enable_if for now. random() is a particularly interesting one because it would work in most cases except when the hash table is empty, in which case it could loop forever: because it probes slots at random, there is no guarantee that we've visited all N slots at any point.
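
For reference, a minimal sketch of the std::enable_if trick (IsWeakKey and m_keyCount are illustrative names, not the actual patch):

    // Only instantiations whose key type is not weak get an O(1) size().
    template<typename K = KeyType>
    typename std::enable_if<!IsWeakKey<K>::value, unsigned>::type size() const
    {
        return m_keyCount;
    }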

Unlike with other keys, we need to deal with the fact that a key can suddenly become "released". We have to deal with this issue in inlineLookup, and in fact have to shrink the table there, because the canonical way to remove a key, remove(find(X)), would fail: find(X) would need to return HashSet::end().

expand() needs to consider the possibility that enough buckets may become available if we yanked out all the WeakPtrs that have become "released". Note that m_keyCount + m_deletedCount stays valid if we treat the "released" state as equivalent to "deleted", because a WeakPtr can only move from being valid to being "released".
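
Roughly, the expand() change looks like this (a sketch with hypothetical helper names, not the actual patch):

    void expand()
    {
        // Treat released WeakPtrs as deleted, running their destructors.
        // This is safe because "released" is a one-way transition.
        deleteReleasedWeakBuckets();
        if (!shouldExpand())
            return; // reclaiming freed enough buckets; no growth needed
        rehash(tableSize() * 2);
    }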
Comment 4 Geoffrey Garen 2019-02-28 11:33:16 PST
Comment on attachment 363209 [details]
WIP

Now that you've enumerated all the edge cases created by weak keys, I don't feel great about them.

The idea that lookup must delete is particularly surprising. 

But even the idea that insertion must delete is somewhat surprising.

I'm starting to wonder if a dedicated WeakSet is the right way to go.
Comment 5 Ryosuke Niwa 2019-02-28 13:38:31 PST
(In reply to Geoffrey Garen from comment #4)
> Comment on attachment 363209 [details]
> WIP
>
> The idea that lookup must delete is particularly surprising. 
> 
> But even the idea that insertion must delete is somewhat surprising.
> 
> I'm starting to wonder if a dedicated WeakSet is the right way to go.

Yes, but creating a WeakSet doesn't necessarily reduce the complexity of the code change needed for HashTable.

Another observation I made is that treating a released weak reference as a deleted value is problematic because we'd then need to remember to invoke WeakPtr::~WeakPtr in the future anyway. Otherwise, we'd leak every WeakReference whose m_ptr has become null.
Comment 6 Geoffrey Garen 2019-02-28 14:00:47 PST
> > I'm starting to wonder if a dedicated WeakSet is the right way to go.
> 
> Yes, but creating a WeakSet doesn't necessarily reduce the complexity of the
> code change needed for HashTable.

Antti's proposal was a WeakSet class with an internal implementation of HashSet<RefPtr<WeakReference<T>>>, which explicitly checks for nulls at the API layer. Is that unworkable in some way? Maybe it's still a problem if its buckets become empty?
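
For concreteness, a self-contained sketch of that shape, with std::shared_ptr and std::unordered_set standing in for the WTF types (all names illustrative):

    #include <memory>
    #include <unordered_set>

    template<typename T>
    struct WeakCell { T* ptr; }; // the object nulls this in its destructor

    template<typename T>
    class WeakSet {
    public:
        void add(std::shared_ptr<WeakCell<T>> cell) { m_set.insert(std::move(cell)); }
        bool contains(const std::shared_ptr<WeakCell<T>>& cell) const
        {
            auto it = m_set.find(cell);
            return it != m_set.end() && (*it)->ptr; // null check at the API layer
        }
    private:
        std::unordered_set<std::shared_ptr<WeakCell<T>>> m_set;
    };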
Comment 7 Ryosuke Niwa 2019-02-28 14:17:24 PST
(In reply to Geoffrey Garen from comment #6)
> > > I'm starting to wonder if a dedicated WeakSet is the right way to go.
> > 
> > Yes, but creating a WeakSet doesn't necessarily reduce the complexity of the
> > code change needed for HashTable.
> 
> Antti's proposal was a WeakSet class with an internal implementation of
> HashSet<RefPtr<WeakReference<T>>>, which explicitly checks for nulls at the
> API layer. Is that unworkable in some way? Maybe it's still a problem if its
> buckets become empty?

It's still a problem because once a bucket becomes "released weak", it's indistinguishable. We'd need to clear such buckets away at some point.

I also came to the realization that removal is actually more problematic than insertion. In the most naive implementation, removal of a "released weak" turns into a no-op, and then we'd never shrink the table because m_keyCount never goes down.

A slightly simpler implementation, which would use ~2x the memory of what I have because we can't re-use released weak buckets until HashTable::expand() is called, might involve something like this:
1. Create a wrapper class like WeakHashSet
2. Update HashTable::expand to do deleteReleasedWeakBuckets (same as current patch)
3. Update HashTable::rehash (same as current patch)
4. WeakHashSet::remove would find the "released weak" and actually delete it. This requires a new method on HashTable or a specialized iterator.

I think this approach would have made sense if we could treat "released weak" like "deleted". As soon as I realized that we can't, because we have to call the destructor on WeakPtr, I concluded that it's no simpler than my current approach: we'd simply be shuffling around where ReleasedWeakValueDeleter is used, and it doesn't really reduce much complexity in HashTable itself.
Comment 8 Ryosuke Niwa 2019-02-28 14:19:23 PST
Created attachment 363259 [details]
GTK+ builds fix attempt
Comment 9 EWS Watchlist 2019-02-28 14:20:52 PST
Attachment 363259 [details] did not pass style-queue:


ERROR: Source/WTF/wtf/HashTable.h:305:  This { should be at the end of the previous line  [whitespace/braces] [4]
ERROR: Source/WTF/wtf/HashTable.h:317:  This { should be at the end of the previous line  [whitespace/braces] [4]
ERROR: Source/WTF/wtf/WeakPtr.h:74:  Should be indented on a separate line, with the colon or comma first on that line.  [whitespace/indent] [4]
Total errors found: 3 in 14 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 10 Geoffrey Garen 2019-02-28 14:53:47 PST
> > Antti's proposal was a WeakSet class with an internal implementation of
> > HashSet<RefPtr<WeakReference<T>>>, which explicitly checks for nulls at the
> > API layer. Is that unworkable in some way? Maybe it's still a problem if its
> > buckets become empty?
> 
> It's still a problem because once a bucket becomes "released weak", it's
> indistinguishable. We'd need to clear such buckets away at some point.

Let me try to clarify the specifics of the problem.

Let's say I have HashSet<RefPtr<WeakReference<T>>>. Object identity guarantees that the WeakReference pointers in my table are unique, and that they hash and test for equality correctly. (Each CanMakeWeakPtr object has exactly one WeakReference*, that WeakReference* uniquely identifies the object, and that WeakReference* lives as long as the table needs it to live.) So, from a correctness perspective, I believe it "just works".

I don't think released weak pointers become "indistinguishable". But maybe I'm misunderstanding the term? I believe that each WeakReference* can still be distinguished from each other WeakReference*, and also distinguished from the empty and deleted buckets. (The empty bucket will be the null WeakReference* and the deleted bucket will be the -1 WeakReference*, which no CanMakeWeakPtr object will ever vend to us.)
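
In hash-traits terms, that encoding might look roughly like this (a sketch, not the actual WTF traits):

    template<typename T>
    struct WeakReferenceKeyTraits {
        using Ptr = WeakReference<T>*;
        static Ptr emptyValue() { return nullptr; }
        static Ptr deletedValue() { return reinterpret_cast<Ptr>(-1); }
        static bool isDeletedValue(Ptr value) { return value == deletedValue(); }
    };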

Also, I believe that it's impossible to do a hash lookup that hits a released weak pointer. Every lookup uses the CanMakeWeakPtr object's unique WeakReference*. By implementation, it is guaranteed that a WeakReference* is not released if its CanMakeWeakPtr object still exists. And it is a programming error to look up a deleted CanMakeWeakPtr object.

Given those guarantees, I believe the specifics of the problem are:

1. How should functions like size() and isEmpty() work?

2. How should we avoid unbounded growth in the number of weak released pointer buckets?

Is that right?

> I also came to the realization that removal is actually more problematic
> than insertion. In the most naive implementation, removal of a "released
> weak" turns into a no-op, and then we'd never shrink the table because
> m_keyCount never goes down.

Yes. This sounds like (2). Right?

> A slightly simpler implementation, which would use ~2x the memory of what I
> have because we can't re-use released weak buckets until HashTable::expand()
> is called, might involve something like this:
> 1. Create a wrapper class like WeakHashSet
> 2. Update HashTable::expand to do deleteReleasedWeakBuckets (same as current
> patch)
> 3. Update HashTable::rehash (same as current patch)

Yeah, I think something like this would work as a solution to (2).

But why would this use more memory? As long as we take the opportunity to clear released weak buckets before we grow the table, I would expect it to take equal memory.

Another option is just to make this the wrapper class's problem, and make the wrapper class scan for released weak buckets after N insertions or N total operations. Not quite optimal, but equal by the standard of average case amortized performance, and maybe easier to work with.
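
Sketched (with hypothetical names), that wrapper-side amortization could look like:

    template<typename T>
    class WeakHashSet {
    public:
        void add(T& object)
        {
            // Occasional O(n) sweeps amortize to O(1) per mutation.
            if (++m_operationCount > 2 * m_set.size()) {
                removeReleasedReferences(); // erase cells whose m_ptr is null
                m_operationCount = 0;
            }
            m_set.add(makeWeakReference(object));
        }
    private:
        void removeReleasedReferences();
        HashSet<RefPtr<WeakReference<T>>> m_set;
        unsigned m_operationCount { 0 };
    };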

> 4. WeakHashSet::remove would find the "released weak" and actually delete it.
> This requires a new method on HashTable or a specialized iterator.

I don't think we need this. As long as you're removing an object that has not yet been deleted, you will either find the object or find nothing. There's no collision with a released weak bucket. Right?
Comment 11 Ryosuke Niwa 2019-02-28 16:54:04 PST
(In reply to Geoffrey Garen from comment #10)
> > > Antti's proposal was a WeakSet class with an internal implementation of
> > > HashSet<RefPtr<WeakReference<T>>>, which explicitly checks for nulls at the
> > > API layer. Is that unworkable in some way? Maybe it's still a problem if its
> > > buckets become empty?
> > 
> > It's still a problem because once a bucket becomes "released weak", it's
> > indistinguishable. We'd need to clear such buckets away at some point.
> 
> Let me try to clarify the specifics of the problem.
> 
> Let's say I have HashSet<RefPtr<WeakReference<T>>>. Object identity
> guarantees that the WeakReference pointers in my table are unique, and that
> they hash and test for equality correctly. (Each CanMakeWeakPtr object has
> exactly one WeakReference*, that WeakReference* uniquely identifies the
> object, and that WeakReference* lives as long as the table needs it to
> live.) So, from a correctness perspective, I believe it "just works".

Sure, it "works" in the sense that we won't have a crash, etc., but they'd be "leaked" unless we remove them during rehashing.

> I don't think released weak pointers become "indistinguishable". But maybe
> I'm misunderstanding the term? I believe that each WeakReference* can still
> be distinguished from each other WeakReference*, and also distinguished from
> the empty and deleted buckets.

Yes, that's true. What I'm saying is that, from a lookup / removal perspective, there is nothing you can do to find those "released weaks".

> Also, I believe that it's impossible to do a hash lookup that hits a
> released weak pointer. Every lookup uses the CanMakeWeakPtr object's unique
> WeakReference*. By implementation, it is guaranteed that a WeakReference* is
> not released if its CanMakeWeakPtr object still exists. And it is a
> programming error to look up a deleted CanMakeWeakPtr object.

Right. This is precisely the issue I was referring to when I said they're "indistinguishable".

> Given those guarantees, I believe the specifics of the problem are:
> 
> 1. How should functions like size() and isEmpty() work?
> 
> 2. How should we avoid unbounded growth in the number of weak released
> pointer buckets?
> 
> Is that right?

Right.

> > I also came to the realization that removal is actually more problematic
> > than insertion. In the most naive implementation, removal of a "released
> > weak" turns into a no-op, and then we'd never shrink the table because
> > m_keyCount never goes down.
> 
> Yes. This sounds like (2). Right?

Yes.

> > A slightly simpler implementation, which would use ~2x the memory of what I
> > have because we can't re-use released weak buckets until HashTable::expand()
> > is called, might involve something like this:
> > 1. Create a wrapper class like WeakHashSet
> > 2. Update HashTable::expand to do deleteReleasedWeakBuckets (same as current
> > patch)
> > 3. Update HashTable::rehash (same as current patch)
> 
> Yeah, I think something like this would work as a solution to (2).
> 
> But why would this use more memory? As long as we take the opportunity to
> clear released weak buckets before we grow the table, I would expect it to
> take equal memory.

Because we would never realize that the keys had been "released" until we examine them. Unless we detect whenever we hit a "released weak" and increment m_deletedCount or decrement m_keyCount, we wouldn't realize it until we do re-hashing.

> Another option is just to make this the wrapper class's problem, and make
> the wrapper class scan for released weak buckets after N insertions or N
> total operations. Not quite optimal, but equal by the standard of average
> case amortized performance, and maybe easier to work with.

Yeah, if we're making this a separate class, we probably want to do that. In fact, even my current implementation has an issue: when the hash set is of size N and N-1 items have been released, we wouldn't ever shrink the hash table until we trigger a re-hash or perform enough lookups to realize that had happened.

> > 4. WeakHashSet::remove would find the "released weak" and actually delete it.
> > This requires a new method on HashTable or a specialized iterator.
> 
> I don't think we need this. As long as you're removing an object that has
> not yet been deleted, you will either find the object or find nothing.
> There's no collision with a released weak bucket. Right?

Actually, yeah, if we have a wrapper class, then the HashSet itself would return a WeakReference with null m_ptr, so we could then delete such an entry.
Comment 12 Ryosuke Niwa 2019-02-28 18:34:04 PST
Created attachment 363287 [details]
Added leak checks

Here's basically the final patch for this approach.
Comment 13 EWS Watchlist 2019-02-28 18:37:07 PST
Attachment 363287 [details] did not pass style-queue:


ERROR: Source/WTF/wtf/HashTable.h:305:  This { should be at the end of the previous line  [whitespace/braces] [4]
ERROR: Source/WTF/wtf/HashTable.h:317:  This { should be at the end of the previous line  [whitespace/braces] [4]
ERROR: Source/WTF/wtf/WeakPtr.h:76:  Should be indented on a separate line, with the colon or comma first on that line.  [whitespace/indent] [4]
Total errors found: 3 in 14 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 14 Geoffrey Garen 2019-02-28 19:07:30 PST
> > But why would this use more memory? As long as we take the opportunity to
> > clear released weak buckets before we grow the table, I would expect it to
> > take equal memory.
> 
> Because we would never realize that the keys had been "released" until we
> examine them. Unless we detect whenever we hit a "released weak" and increment
> m_deletedCount or decrement m_keyCount, we wouldn't realize it until we do
> re-hashing.

Got it. So, the memory you're talking about is the backing storage for the WeakReference object.

> > Another option is just to make this the wrapper class's problem, and make
> > the wrapper class scan for released weak buckets after N insertions or N
> > total operations. Not quite optimal, but equal by the standard of average
> > case amortized performance, and maybe easier to work with.
> 
> Yeah, if we're making this a separate class, we probably want to do that. In
> fact, even my current implementation has an issue: when the hash set is of
> size N and N-1 items have been released, we wouldn't ever shrink the hash
> table until we trigger a re-hash or perform enough lookups to realize that
> had happened.

Right. Adding code to HashTable gives us more opportunities to delete individual WeakReferences -- but only in proportion to the number of client lookup operations, and their luckiness. If we assume that 1/2 of buckets are released WeakReferences on average, and one lookup deletes one released WeakReference on average, then the memory usage after LiveKeyCount lookups is identical between both implementations. (The HashTable implementation incrementally removes one bucket per lookup up to LiveKeyCount buckets, and the wrapper class implementation incrementally removes zero buckets per lookup and then removes LiveKeyCount buckets at the end.)

If we assume other average case conditions, then one or the other implementation will win. In practice, I suspect that either implementation is probably acceptable.

> > > 4. WeakHashSet::remove would find the "released weak" and actually delete it.
> > > This requires a new method on HashTable or a specialized iterator.
> > 
> > I don't think we need this. As long as you're removing an object that has
> > not yet been deleted, you will either find the object or find nothing.
> > There's no collision with a released weak bucket. Right?
> 
> Actually, yeah, if we have a wrapper class, then the HashSet itself would
> return a WeakReference with null m_ptr so we could then delete such an entry.

I don't think that's quite right. Rather, I think it's impossible for the wrapper class solution to return a WeakReference with null m_ptr. To look up T* and return WeakReference<T> with null m_ptr, T* must be a deleted pointer. But looking up a deleted pointer in a weak hash table is not a supported operation. (For one thing, it's a logic error because T* may have been reused to allocate a new object, and you might get a hash table hit in error! Also, mechanically, how would you access T*'s WeakReference* to do the lookup? You would crash trying to load the WeakReference* from the deleted T*.)
Comment 15 EWS Watchlist 2019-02-28 23:42:15 PST
Comment on attachment 363287 [details]
Added leak checks

Attachment 363287 [details] did not pass mac-debug-ews (mac):
Output: https://webkit-queues.webkit.org/results/11326670

New failing tests:
storage/indexeddb/modern/blocked-open-db-requests-private.html
Comment 16 EWS Watchlist 2019-02-28 23:42:17 PST
Created attachment 363310 [details]
Archive of layout-test-results from ews114 for mac-highsierra

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews114  Port: mac-highsierra  Platform: Mac OS X 10.13.6
Comment 17 Ryosuke Niwa 2019-02-28 23:43:33 PST
Created attachment 363311 [details]
Adds WeakHashSet
Comment 18 Ryosuke Niwa 2019-02-28 23:45:57 PST
(In reply to Geoffrey Garen from comment #14)
>
> > > Another option is just to make this the wrapper class's problem, and make
> > > the wrapper class scan for released weak buckets after N insertions or N
> > > total operations. Not quite optimal, but equal by the standard of average
> > > case amortized performance, and maybe easier to work with.
> > 
> > Yeah, if we're making this a separate class, we probably want to do that. In
> > fact, even my current implementation has an issue: when the hash set is of
> > size N and N-1 items have been released, we wouldn't ever shrink the hash
> > table until we trigger a re-hash or perform enough lookups to realize that
> > had happened.
> 
> Right. Adding code to HashTable gives us more opportunities to delete
> individual WeakReferences -- but only in proportion to the number of client
> lookup operations, and their luckiness. If we assume that 1/2 of buckets are
> released WeakReferences on average, and one lookup deletes one released
> WeakReference on average, then the memory usage after LiveKeyCount lookups
> is identical between both implementations. (The HashTable implementation
> incrementally removes one bucket per lookup up to LiveKeyCount buckets, and
> the wrapper class implementation incrementally removes zero buckets per
> lookup and then removes LiveKeyCount buckets at the end.)
> 
> If we assume other average case conditions, then one or the other
> implementation will win. In practice, I suspect that either implementation
> is probably acceptable.

Maybe. Now that I have a new implementation, I've started to think that this difference is indeed insignificant.

A better strategy might be to make the iterator keep track of how many "released weaks" we've seen, and trigger a cleanup at the end if it has seen enough of them.
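
Sketched against the iterator in the current patch (threshold, counter, and cleanup hook are all hypothetical):

    void skipEmptyBuckets()
    {
        while (m_position != m_endPosition && !m_position->get().get()) {
            ++m_releasedCount; // count the released entries we skip over
            ++m_position;
        }
        // Once a full pass has seen enough released entries, have the
        // set compact itself (outside of iteration).
        if (m_position == m_endPosition && m_releasedCount > cleanupThreshold)
            m_set.scheduleCleanup();
    }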

> > > > 4. WeakHashSet::remove would find the "released weak" and actually delete it.
> > > > This requires a new method on HashTable or a specialized iterator.
> > > 
> > > I don't think we need this. As long as you're removing an object that has
> > > not yet been deleted, you will either find the object or find nothing.
> > > There's no collision with a released weak bucket. Right?
> > 
> > Actually, yeah, if we have a wrapper class, then the HashSet itself would
> > return a WeakReference with null m_ptr so we could then delete such an entry.
> 
> I don't think that's quite right. Rather, I think it's impossible for the
> wrapper class solution to return a WeakReference with null m_ptr. To look up
> T* and return WeakReference<T> with null m_ptr, T* must be a deleted
> pointer. But looking up a deleted pointer in a weak hash table is not a
> supported operation. (For one thing, it's a logic error because T* may have
> been reused to allocate a new object, and you might get a hash table hit in
> error! Also, mechanically, how would you access T*'s WeakReference* to do
> the lookup? You would crash trying to load the WeakReference* from the
> deleted T*.)

Yeah, I see what you're saying now that I've written the code.
Comment 19 Antti Koivisto 2019-03-01 05:29:19 PST
Comment on attachment 363311 [details]
Adds WeakHashSet

View in context: https://bugs.webkit.org/attachment.cgi?id=363311&action=review

> Source/WTF/wtf/HashTable.h:427
> +        static bool isReleasedWeakBucket(const ValueType& value) { return isHashTraitsReleasedWeakValue<KeyTraits>(Extractor::extract(value)); }

I wasn't sure with the old patch if these HashTable changes are really worth doing. Are they a significant improvement over simply clearing out released weak values in WeakHashSet::add when hitting the expansion size?

> Source/WTF/wtf/WeakHashSet.h:64
> +        void skipEmptyBuckets()
> +        {
> +            while (m_position != m_endPosition && !m_position->get().get())
> +                ++m_position;
> +        }

Iterator could also drop released weak values so iterating a table with lots of released values would be slow only once.
Comment 20 Ryosuke Niwa 2019-03-01 10:51:55 PST
(In reply to Antti Koivisto from comment #19)
> Comment on attachment 363311 [details]
> Adds WeakHashSet
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=363311&action=review
> 
> > Source/WTF/wtf/HashTable.h:427
> > +        static bool isReleasedWeakBucket(const ValueType& value) { return isHashTraitsReleasedWeakValue<KeyTraits>(Extractor::extract(value)); }
> 
> I wasn't sure with the old patch if these HashTable changes are really worth
> doing. Are they a significant improvement over simply clearing out released
> weak values in WeakHashSet::add when hitting the expansion size?

The challenge here is that shouldExpand() & shouldShrink() are checked after we've inserted / removed an item. This means we'd need a new variant of shouldExpand() & shouldShrink() that returns true when we're about to add or remove another item. Even then, we'd still incur an extra iteration over the hash table for the initial cleanup.

That seemed like a really fragile design, because if we ever make refactoring changes to HashTable, whatever approach / optimization we do in WeakSet would break.

> > Source/WTF/wtf/WeakHashSet.h:64
> > +        void skipEmptyBuckets()
> > +        {
> > +            while (m_position != m_endPosition && !m_position->get().get())
> > +                ++m_position;
> > +        }
> 
> Iterator could also drop released weak values so iterating a table with lots
> of released values would be slow only once.

Yeah, as I said in my earlier comment, that could be a valid optimization strategy. I was thinking that maybe I'll do that in a separate patch.
Comment 21 EWS Watchlist 2019-03-01 21:13:46 PST
Comment on attachment 363311 [details]
Adds WeakHashSet

Attachment 363311 [details] did not pass mac-debug-ews (mac):
Output: https://webkit-queues.webkit.org/results/11339043

New failing tests:
storage/indexeddb/modern/createobjectstore-basic-private.html
Comment 22 EWS Watchlist 2019-03-01 21:13:47 PST
Created attachment 363408 [details]
Archive of layout-test-results from ews112 for mac-highsierra

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews112  Port: mac-highsierra  Platform: Mac OS X 10.13.6
Comment 23 Antti Koivisto 2019-03-03 00:33:41 PST
Comment on attachment 363311 [details]
Adds WeakHashSet

Ok, seems good enough to start with. r=me
Comment 24 Ryosuke Niwa 2019-03-04 13:52:34 PST
Committed r242387: <https://trac.webkit.org/changeset/242387>
Comment 25 Radar WebKit Bug Importer 2019-03-04 15:43:14 PST
<rdar://problem/48580628>