Bug 195152 - Add WeakHashSet
Summary: Add WeakHashSet
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Web Template Framework
Version: WebKit Nightly Build
Hardware: Unspecified
OS: Unspecified
Importance: P2 Normal
Assignee: Ryosuke Niwa
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2019-02-28 00:36 PST by Ryosuke Niwa
Modified: 2019-03-04 15:43 PST
CC List: 10 users

See Also:


Attachments
WIP (37.65 KB, patch)
2019-02-28 00:36 PST, Ryosuke Niwa
no flags
GTK+ builds fix attempt (38.23 KB, patch)
2019-02-28 14:19 PST, Ryosuke Niwa
no flags
Added leak checks (39.46 KB, patch)
2019-02-28 18:34 PST, Ryosuke Niwa
no flags
Archive of layout-test-results from ews114 for mac-highsierra (1.60 MB, application/zip)
2019-02-28 23:42 PST, EWS Watchlist
no flags
Adds WeakHashSet (32.58 KB, patch)
2019-02-28 23:43 PST, Ryosuke Niwa
koivisto: review+
ews-watchlist: commit-queue-
Archive of layout-test-results from ews112 for mac-highsierra (1.36 MB, application/zip)
2019-03-01 21:13 PST, EWS Watchlist
no flags

Description Ryosuke Niwa 2019-02-28 00:36:13 PST
Make it possible to store WeakPtr as a HashSet's key.

Adding support for HashMap is tricky because we'd have to delete the value as well.
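
For illustration, a rough usage sketch (illustrative only; see the patch for the exact API):

    class Node : public CanMakeWeakPtr<Node> { };

    WeakHashSet<Node> set;
    {
        Node node;
        set.add(node);
        ASSERT(set.contains(node));
    }
    // node is gone: its entry is now a "released" WeakPtr that lookups
    // skip and that the table reclaims lazily.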
Comment 1 Ryosuke Niwa 2019-02-28 00:36:54 PST
Created attachment 363209 [details]
WIP
Comment 2 EWS Watchlist 2019-02-28 00:39:39 PST
Attachment 363209 [details] did not pass style-queue:


ERROR: Source/WTF/wtf/HashTable.h:305:  This { should be at the end of the previous line  [whitespace/braces] [4]
ERROR: Source/WTF/wtf/HashTable.h:317:  This { should be at the end of the previous line  [whitespace/braces] [4]
ERROR: Source/WTF/wtf/WeakPtr.h:74:  Should be indented on a separate line, with the colon or comma first on that line.  [whitespace/indent] [4]
Total errors found: 3 in 13 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 3 Ryosuke Niwa 2019-02-28 00:48:10 PST
A few notes:

size(), isEmpty(), and random() can no longer be done in O(1), as some of the keys may have been "released" without the HashTable ever being notified. I'm disabling these methods with std::enable_if for now. random() is a particularly interesting one because it would work in most cases except when the hash table is empty, in which case it could loop forever: because it probes slots at random, there is no guarantee that we've visited all N slots at any point.
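
For reference, a minimal sketch of the std::enable_if trick (IsWeakKey and m_keyCount are illustrative names, not the actual patch):

    // Only instantiations whose key type is not weak get an O(1) size().
    template<typename K = KeyType>
    typename std::enable_if<!IsWeakKey<K>::value, unsigned>::type size() const
    {
        return m_keyCount;
    }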

Unlike with other keys, we need to deal with the fact that a key can suddenly become "released". We have to deal with this issue in inlineLookup, and in fact have to shrink the table there, because the canonical way to remove a key, remove(find(X)), would fail: find(X) would need to return HashSet::end().

expand() needs to consider the possibility that enough buckets may become available if we yanked out all the WeakPtrs that have become "released". Note that m_keyCount + m_deletedCount stays valid if we treat the "released" state as equivalent to "deleted", because a WeakPtr can only move from being valid to being "released".
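
Roughly, the expand() change looks like this (a sketch with hypothetical helper names, not the actual patch):

    void expand()
    {
        // Treat released WeakPtrs as deleted, running their destructors.
        // This is safe because "released" is a one-way transition.
        deleteReleasedWeakBuckets();
        if (!shouldExpand())
            return; // reclaiming freed enough buckets; no growth needed
        rehash(tableSize() * 2);
    }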
Comment 4 Geoffrey Garen 2019-02-28 11:33:16 PST
Comment on attachment 363209 [details]
WIP

Now that you've enumerated all the edge cases created by weak keys, I don't feel great about them.

The idea that lookup must delete is particularly surprising. 

But even the idea that insertion must delete is somewhat surprising.

I'm starting to wonder if a dedicated WeakSet is the right way to go.
Comment 5 Ryosuke Niwa 2019-02-28 13:38:31 PST
(In reply to Geoffrey Garen from comment #4)
> Comment on attachment 363209 [details]
> WIP
>
> The idea that lookup must delete is particularly surprising. 
> 
> But even the idea that insertion must delete is somewhat surprising.
> 
> I'm starting to wonder if a dedicated WeakSet is the right way to go.

Yes, but creating a WeakSet doesn't necessarily reduce the complexity of the code change needed for HashTable.

Another observation I made is that treating a released weak reference as a deleted value is problematic because we'd then need to remember to invoke WeakPtr::~WeakPtr in the future anyway. Otherwise, we'd leak every WeakReference whose m_ptr has become null.
Comment 6 Geoffrey Garen 2019-02-28 14:00:47 PST
> > I'm starting to wonder if a dedicated WeakSet is the right way to go.
> 
> Yes, but creating a WeakSet doesn't necessarily reduce the complexity of the
> code change needed for HashTable.

Antti's proposal was a WeakSet class with an internal implementation of HashSet<RefPtr<WeakReference<T>>>, which explicitly checks for nulls at the API layer. Is that unworkable in some way? Maybe it's still a problem if its buckets become empty?
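
For concreteness, a self-contained sketch of that shape, with std::shared_ptr and std::unordered_set standing in for the WTF types (all names illustrative):

    #include <memory>
    #include <unordered_set>

    template<typename T>
    struct WeakCell { T* ptr; }; // the object nulls this in its destructor

    template<typename T>
    class WeakSet {
    public:
        void add(std::shared_ptr<WeakCell<T>> cell) { m_set.insert(std::move(cell)); }
        bool contains(const std::shared_ptr<WeakCell<T>>& cell) const
        {
            auto it = m_set.find(cell);
            return it != m_set.end() && (*it)->ptr; // null check at the API layer
        }
    private:
        std::unordered_set<std::shared_ptr<WeakCell<T>>> m_set;
    };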
Comment 7 Ryosuke Niwa 2019-02-28 14:17:24 PST
(In reply to Geoffrey Garen from comment #6)
> > > I'm starting to wonder if a dedicated WeakSet is the right way to go.
> > 
> > Yes, but creating a WeakSet doesn't necessarily reduce the complexity of the
> > code change needed for HashTable.
> 
> Antti's proposal was a WeakSet class with an internal implementation of
> HashSet<RefPtr<WeakReference<T>>>, which explicitly checks for nulls at the
> API layer. Is that unworkable in some way? Maybe it's still a problem if its
> buckets become empty?

It's still a problem because once a bucket becomes "released weak", it's indistinguishable. We'd need to clear such buckets away at some point.

I also came to the realization that removal is actually more problematic than insertion. In the most naive implementation, removal of a "released weak" turns into a no-op, and then we'd never shrink the table because m_keyCount never goes down.

A slightly simpler implementation, which would use ~2x the memory of what I have because we can't re-use released weak buckets until HashTable::expand() is called, might involve something like this:
1. Create a wrapper class like WeakHashSet
2. Update HashTable::expand to do deleteReleasedWeakBuckets (same as current patch)
3. Update HashTable::rehash (same as current patch)
4. WeakHashSet::remove would find the "released weak" and actually delete it. This requires a new method on HashTable or a specialized iterator.

I think this approach would have made sense if we could treat "released weak" like "deleted". As soon as I realized that we can't, because we have to call the destructor on WeakPtr, I concluded that it's no simpler than my current approach: we'd simply be shuffling around where ReleasedWeakValueDeleter is used, and it doesn't really reduce much complexity in HashTable itself.
Comment 8 Ryosuke Niwa 2019-02-28 14:19:23 PST
Created attachment 363259 [details]
GTK+ builds fix attempt
Comment 9 EWS Watchlist 2019-02-28 14:20:52 PST
Attachment 363259 [details] did not pass style-queue:


ERROR: Source/WTF/wtf/HashTable.h:305:  This { should be at the end of the previous line  [whitespace/braces] [4]
ERROR: Source/WTF/wtf/HashTable.h:317:  This { should be at the end of the previous line  [whitespace/braces] [4]
ERROR: Source/WTF/wtf/WeakPtr.h:74:  Should be indented on a separate line, with the colon or comma first on that line.  [whitespace/indent] [4]
Total errors found: 3 in 14 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 10 Geoffrey Garen 2019-02-28 14:53:47 PST
> > Antti's proposal was a WeakSet class with an internal implementation of
> > HashSet<RefPtr<WeakReference<T>>>, which explicitly checks for nulls at the
> > API layer. Is that unworkable in some way? Maybe it's still a problem if its
> > buckets become empty?
> 
> It's still a problem because once a bucket becomes "released weak", it's
> indistinguishable. We'd need to clear such buckets away at some point.

Let me try to clarify the specifics of the problem.

Let's say I have HashSet<RefPtr<WeakReference<T>>>. Object identity guarantees that the WeakReference pointers in my table are unique, and that they hash and test for equality correctly. (Each CanMakeWeakPtr object has exactly one WeakReference*, that WeakReference* uniquely identifies the object, and that WeakReference* lives as long as the table needs it to live.) So, from a correctness perspective, I believe it "just works".

I don't think released weak pointers become "indistinguishable". But maybe I'm misunderstanding the term? I believe that each WeakReference* can still be distinguished from each other WeakReference*, and also distinguished from the empty and deleted buckets. (The empty bucket will be the null WeakReference* and the deleted bucket will be the -1 WeakReference*, which no CanMakeWeakPtr object will ever vend to us.)
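
In hash-traits terms, that encoding might look roughly like this (a sketch, not the actual WTF traits):

    template<typename T>
    struct WeakReferenceKeyTraits {
        using Ptr = WeakReference<T>*;
        static Ptr emptyValue() { return nullptr; }
        static Ptr deletedValue() { return reinterpret_cast<Ptr>(-1); }
        static bool isDeletedValue(Ptr value) { return value == deletedValue(); }
    };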

Also, I believe that it's impossible to do a hash lookup that hits a released weak pointer. Every lookup uses the CanMakeWeakPtr object's unique WeakReference*. By implementation, it is guaranteed that a WeakReference* is not released if its CanMakeWeakPtr object still exists. And it is a programming error to look up a deleted CanMakeWeakPtr object.

Given those guarantees, I believe the specifics of the problem are:

1. How should functions like size() and isEmpty() work?

2. How should we avoid unbounded growth in the number of weak released pointer buckets?

Is that right?

> I also came to the realization that removal is actually more problematic
> than insertion. In the most naive implementation, removal of a "released
> weak" turns into a no-op, and then we'd never shrink the table because
> m_keyCount never goes down.

Yes. This sounds like (2). Right?

> A slightly simpler implementation, which would use ~2x the memory of what I
> have because we can't re-use released weak buckets until HashTable::expand()
> is called, might involve something like this:
> 1. Create a wrapper class like WeakHashSet
> 2. Update HashTable::expand to do deleteReleasedWeakBuckets (same as current
> patch)
> 3. Update HashTable::rehash (same as current patch)

Yeah, I think something like this would work as a solution to (2).

But why would this use more memory? As long as we take the opportunity to clear released weak buckets before we grow the table, I would expect it to take equal memory.

Another option is just to make this the wrapper class's problem, and make the wrapper class scan for released weak buckets after N insertions or N total operations. Not quite optimal, but equal by the standard of average case amortized performance, and maybe easier to work with.
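
Sketched (with hypothetical names), that wrapper-side amortization could look like:

    template<typename T>
    class WeakHashSet {
    public:
        void add(T& object)
        {
            // Occasional O(n) sweeps amortize to O(1) per mutation.
            if (++m_operationCount > 2 * m_set.size()) {
                removeReleasedReferences(); // erase cells whose m_ptr is null
                m_operationCount = 0;
            }
            m_set.add(makeWeakReference(object));
        }
    private:
        void removeReleasedReferences();
        HashSet<RefPtr<WeakReference<T>>> m_set;
        unsigned m_operationCount { 0 };
    };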

> 4. WeakHashSet::remove would find the "released weak" and actually delete it.
> This requires a new method on HashTable or a specialized iterator.

I don't think we need this. As long as you're removing an object that has not yet been deleted, you will either find the object or find nothing. There's no collision with a released weak bucket. Right?
Comment 11 Ryosuke Niwa 2019-02-28 16:54:04 PST
(In reply to Geoffrey Garen from comment #10)
> > > Antti's proposal was a WeakSet class with an internal implementation of
> > > HashSet<RefPtr<WeakReference<T>>>, which explicitly checks for nulls at the
> > > API layer. Is that unworkable in some way? Maybe it's still a problem if its
> > > buckets become empty?
> > 
> > It's still a problem because once a bucket becomes "released weak", it's
> > indistinguishable. We'd need to clear such buckets away at some point.
> 
> Let me try to clarify the specifics of the problem.
> 
> Let's say I have HashSet<RefPtr<WeakReference<T>>>. Object identity
> guarantees that the WeakReference pointers in my table are unique, and that
> they hash and test for equality correctly. (Each CanMakeWeakPtr object has
> exactly one WeakReference*, that WeakReference* uniquely identifies the
> object, and that WeakReference* lives as long as the table needs it to
> live.) So, from a correctness perspective, I believe it "just works".

Sure, it "works" in the sense that we won't have a crash, etc., but they'd be "leaked" unless we remove them during rehashing.

> I don't think released weak pointers become "indistinguishable". But maybe
> I'm misunderstanding the term? I believe that each WeakReference* can still
> be distinguished from each other WeakReference*, and also distinguished from
> the empty and deleted buckets.

Yes, that's true. What I'm saying is that, from a lookup / removal perspective, there is nothing you can do to find those "released weaks".

> Also, I believe that it's impossible to do a hash lookup that hits a
> released weak pointer. Every lookup uses the CanMakeWeakPtr object's unique
> WeakReference*. By implementation, it is guaranteed that a WeakReference* is
> not released if its CanMakeWeakPtr object still exists. And it is a
> programming error to look up a deleted CanMakeWeakPtr object.

Right. This is precisely the issue I was referring to when I said they're "indistinguishable".

> Given those guarantees, I believe the specifics of the problem are:
> 
> 1. How should functions like size() and isEmpty() work?
> 
> 2. How should we avoid unbounded growth in the number of weak released
> pointer buckets?
> 
> Is that right?

Right.

> > I also came to the realization that removal is actually more problematic
> > than insertion. In the most naive implementation, removal of a "released
> > weak" turns into a no-op, and then we'd never shrink the table because
> > m_keyCount never goes down.
> 
> Yes. This sounds like (2). Right?

Yes.

> > A slightly simpler implementation, which would use ~2x the memory of what I
> > have because we can't re-use released weak buckets until HashTable::expand()
> > is called, might involve something like this:
> > 1. Create a wrapper class like WeakHashSet
> > 2. Update HashTable::expand to do deleteReleasedWeakBuckets (same as current
> > patch)
> > 3. Update HashTable::rehash (same as current patch)
> 
> Yeah, I think something like this would work as a solution to (2).
> 
> But why would this use more memory? As long as we take the opportunity to
> clear released weak buckets before we grow the table, I would expect it to
> take equal memory.

Because we would never realize that the keys had been "released" until we examine them. Unless we detect whenever we hit a "released weak" and increment m_deletedCount or decrement m_keyCount, we wouldn't realize it until we do re-hashing.

> Another option is just to make this the wrapper class's problem, and make
> the wrapper class scan for released weak buckets after N insertions or N
> total operations. Not quite optimal, but equal by the standard of average
> case amortized performance, and maybe easier to work with.

Yeah, if we're making this a separate class, we probably want to do that. In fact, even my current implementation has an issue: when the hash set is of size N and N-1 items have been released, we wouldn't ever shrink the hash table until we trigger a re-hash or perform enough lookups to realize that had happened.

> > 4. WeakHashSet::remove would find the "released weak" and actually delete it.
> > This requires a new method on HashTable or a specialized iterator.
> 
> I don't think we need this. As long as you're removing an object that has
> not yet been deleted, you will either find the object or find nothing.
> There's no collision with a released weak bucket. Right?

Actually, yeah, if we have a wrapper class, then the HashSet itself would return a WeakReference with null m_ptr, so we could then delete such an entry.
Comment 12 Ryosuke Niwa 2019-02-28 18:34:04 PST
Created attachment 363287 [details]
Added leak checks

Here's basically the final patch for this approach.
Comment 13 EWS Watchlist 2019-02-28 18:37:07 PST
Attachment 363287 [details] did not pass style-queue:


ERROR: Source/WTF/wtf/HashTable.h:305:  This { should be at the end of the previous line  [whitespace/braces] [4]
ERROR: Source/WTF/wtf/HashTable.h:317:  This { should be at the end of the previous line  [whitespace/braces] [4]
ERROR: Source/WTF/wtf/WeakPtr.h:76:  Should be indented on a separate line, with the colon or comma first on that line.  [whitespace/indent] [4]
Total errors found: 3 in 14 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 14 Geoffrey Garen 2019-02-28 19:07:30 PST
> > But why would this use more memory? As long as we take the opportunity to
> > clear released weak buckets before we grow the table, I would expect it to
> > take equal memory.
> 
> Because we would never realize that the keys had been "released" until we
> examine them. Unless we detect whenever we hit a "released weak" and increment
> m_deletedCount or decrement m_keyCount, we wouldn't realize it until we do
> re-hashing.

Got it. So, the memory you're talking about is the backing storage for the WeakReference object.

> > Another option is just to make this the wrapper class's problem, and make
> > the wrapper class scan for released weak buckets after N insertions or N
> > total operations. Not quite optimal, but equal by the standard of average
> > case amortized performance, and maybe easier to work with.
> 
> Yeah, if we're making this a separate class, we probably want to do that. In
> fact, even my current implementation has an issue: when the hash set is of
> size N and N-1 items have been released, we wouldn't ever shrink the hash
> table until we trigger a re-hash or perform enough lookups to realize that
> had happened.

Right. Adding code to HashTable gives us more opportunities to delete individual WeakReferences -- but only in proportion to the number of client lookup operations, and their luckiness. If we assume that 1/2 of buckets are released WeakReferences on average, and one lookup deletes one released WeakReference on average, then the memory usage after LiveKeyCount lookups is identical between both implementations. (The HashTable implementation incrementally removes one bucket per lookup up to LiveKeyCount buckets, and the wrapper class implementation incrementally removes zero buckets per lookup and then removes LiveKeyCount buckets at the end.)

If we assume other average case conditions, then one or the other implementation will win. In practice, I suspect that either implementation is probably acceptable.

> > > 4. WeakHashSet::remove would find the "released weak" and actually delete it.
> > > This requires a new method on HashTable or a specialized iterator.
> > 
> > I don't think we need this. As long as you're removing an object that has
> > not yet been deleted, you will either find the object or find nothing.
> > There's no collision with a released weak bucket. Right?
> 
> Actually, yeah, if we have a wrapper class, then the HashSet itself would
> return a WeakReference with null m_ptr so we could then delete such an entry.

I don't think that's quite right. Rather, I think it's impossible for the wrapper class solution to return a WeakReference with null m_ptr. To look up T* and return WeakReference<T> with null m_ptr, T* must be a deleted pointer. But looking up a deleted pointer in a weak hash table is not a supported operation. (For one thing, it's a logic error because T* may have been reused to allocate a new object, and you might get a hash table hit in error! Also, mechanically, how would you access T*'s WeakReference* to do the lookup? You would crash trying to load the WeakReference* from the deleted T*.)
Comment 15 EWS Watchlist 2019-02-28 23:42:15 PST
Comment on attachment 363287 [details]
Added leak checks

Attachment 363287 [details] did not pass mac-debug-ews (mac):
Output: https://webkit-queues.webkit.org/results/11326670

New failing tests:
storage/indexeddb/modern/blocked-open-db-requests-private.html
Comment 16 EWS Watchlist 2019-02-28 23:42:17 PST
Created attachment 363310 [details]
Archive of layout-test-results from ews114 for mac-highsierra

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews114  Port: mac-highsierra  Platform: Mac OS X 10.13.6
Comment 17 Ryosuke Niwa 2019-02-28 23:43:33 PST
Created attachment 363311 [details]
Adds WeakHashSet
Comment 18 Ryosuke Niwa 2019-02-28 23:45:57 PST
(In reply to Geoffrey Garen from comment #14)
>
> > > Another option is just to make this the wrapper class's problem, and make
> > > the wrapper class scan for released weak buckets after N insertions or N
> > > total operations. Not quite optimal, but equal by the standard of average
> > > case amortized performance, and maybe easier to work with.
> > 
> > Yeah, if we're making this a separate class, we probably want to do that. In
> > fact, even my current implementation has an issue: when the hash set is of
> > size N and N-1 items have been released, we wouldn't ever shrink the hash
> > table until we trigger a re-hash or perform enough lookups to realize that
> > had happened.
> 
> Right. Adding code to HashTable gives us more opportunities to delete
> individual WeakReferences -- but only in proportion to the number of client
> lookup operations, and their luckiness. If we assume that 1/2 of buckets are
> released WeakReferences on average, and one lookup deletes one released
> WeakReference on average, then the memory usage after LiveKeyCount lookups
> is identical between both implementations. (The HashTable implementation
> incrementally removes one bucket per lookup up to LiveKeyCount buckets, and
> the wrapper class implementation incrementally removes zero buckets per
> lookup and then removes LiveKeyCount buckets at the end.)
> 
> If we assume other average case conditions, then one or the other
> implementation will win. In practice, I suspect that either implementation
> is probably acceptable.

Maybe. Now that I have a new implementation, I've started to think that this difference is indeed insignificant.

A better strategy might be to make the iterator keep track of how many "released weaks" we've seen, and trigger a cleanup at the end if it has seen enough of them.
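
Sketched against the iterator in the current patch (threshold, counter, and cleanup hook are all hypothetical):

    void skipEmptyBuckets()
    {
        while (m_position != m_endPosition && !m_position->get().get()) {
            ++m_releasedCount; // count the released entries we skip over
            ++m_position;
        }
        // Once a full pass has seen enough released entries, have the
        // set compact itself (outside of iteration).
        if (m_position == m_endPosition && m_releasedCount > cleanupThreshold)
            m_set.scheduleCleanup();
    }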

> > > > 4. WeakHashSet::remove would find the "released weak" and actually delete it.
> > > > This requires a new method on HashTable or a specialized iterator.
> > > 
> > > I don't think we need this. As long as you're removing an object that has
> > > not yet been deleted, you will either find the object or find nothing.
> > > There's no collision with a released weak bucket. Right?
> > 
> > Actually, yeah, if we have a wrapper class, then the HashSet itself would
> > return a WeakReference with null m_ptr so we could then delete such an entry.
> 
> I don't think that's quite right. Rather, I think it's impossible for the
> wrapper class solution to return a WeakReference with null m_ptr. To look up
> T* and return WeakReference<T> with null m_ptr, T* must be a deleted
> pointer. But looking up a deleted pointer in a weak hash table is not a
> supported operation. (For one thing, it's a logic error because T* may have
> been reused to allocate a new object, and you might get a hash table hit in
> error! Also, mechanically, how would you access T*'s WeakReference* to do
> the lookup? You would crash trying to load the WeakReference* from the
> deleted T*.)

Yeah, I see what you're saying now that I've written the code.
Comment 19 Antti Koivisto 2019-03-01 05:29:19 PST
Comment on attachment 363311 [details]
Adds WeakHashSet

View in context: https://bugs.webkit.org/attachment.cgi?id=363311&action=review

> Source/WTF/wtf/HashTable.h:427
> +        static bool isReleasedWeakBucket(const ValueType& value) { return isHashTraitsReleasedWeakValue<KeyTraits>(Extractor::extract(value)); }

I wasn't sure with the old patch if these HashTable changes are really worth doing. Are they a significant improvement over simply clearing out released weak values in WeakHashSet::add when hitting the expansion size?

> Source/WTF/wtf/WeakHashSet.h:64
> +        void skipEmptyBuckets()
> +        {
> +            while (m_position != m_endPosition && !m_position->get().get())
> +                ++m_position;
> +        }

Iterator could also drop released weak values so iterating a table with lots of released values would be slow only once.
Comment 20 Ryosuke Niwa 2019-03-01 10:51:55 PST
(In reply to Antti Koivisto from comment #19)
> Comment on attachment 363311 [details]
> Adds WeakHashSet
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=363311&action=review
> 
> > Source/WTF/wtf/HashTable.h:427
> > +        static bool isReleasedWeakBucket(const ValueType& value) { return isHashTraitsReleasedWeakValue<KeyTraits>(Extractor::extract(value)); }
> 
> I wasn't sure with the old patch if these HashTable changes are really worth
> doing. Are they a significant improvement over simply clearing out released
> weak values in WeakHashSet::add when hitting the expansion size?

The challenge here is that shouldExpand() & shouldShrink() are checked after we've inserted / removed an item. This means we'd need a new variant of shouldExpand() & shouldShrink() that returns true when we're about to add or remove another item. Even then, we'd still incur an extra iteration over the hash table for the initial cleanup.

That seemed like a really fragile design, because if we ever make refactoring changes to HashTable, whatever approach / optimization we do in WeakSet would break.

> > Source/WTF/wtf/WeakHashSet.h:64
> > +        void skipEmptyBuckets()
> > +        {
> > +            while (m_position != m_endPosition && !m_position->get().get())
> > +                ++m_position;
> > +        }
> 
> Iterator could also drop released weak values so iterating a table with lots
> of released values would be slow only once.

Yeah, as I said in my earlier comment, that could be a valid optimization strategy. I was thinking that maybe I'll do that in a separate patch.
Comment 21 EWS Watchlist 2019-03-01 21:13:46 PST
Comment on attachment 363311 [details]
Adds WeakHashSet

Attachment 363311 [details] did not pass mac-debug-ews (mac):
Output: https://webkit-queues.webkit.org/results/11339043

New failing tests:
storage/indexeddb/modern/createobjectstore-basic-private.html
Comment 22 EWS Watchlist 2019-03-01 21:13:47 PST
Created attachment 363408 [details]
Archive of layout-test-results from ews112 for mac-highsierra

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews112  Port: mac-highsierra  Platform: Mac OS X 10.13.6
Comment 23 Antti Koivisto 2019-03-03 00:33:41 PST
Comment on attachment 363311 [details]
Adds WeakHashSet

Ok, seems good enough to start with. r=me
Comment 24 Ryosuke Niwa 2019-03-04 13:52:34 PST
Committed r242387: <https://trac.webkit.org/changeset/242387>
Comment 25 Radar WebKit Bug Importer 2019-03-04 15:43:14 PST
<rdar://problem/48580628>