Bug 147947 - WTF::Lock should not suffer from the thundering herd
Summary: WTF::Lock should not suffer from the thundering herd
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Web Template Framework (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Filip Pizlo
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-08-12 12:46 PDT by Filip Pizlo
Modified: 2015-08-13 10:30 PDT (History)
14 users (show)

See Also:


Attachments
it might be right (14.86 KB, patch)
2015-08-12 12:48 PDT, Filip Pizlo
no flags Details | Formatted Diff | Diff
I think it works (23.34 KB, patch)
2015-08-12 13:53 PDT, Filip Pizlo
no flags Details | Formatted Diff | Diff
it works (23.35 KB, patch)
2015-08-12 15:15 PDT, Filip Pizlo
no flags Details | Formatted Diff | Diff
the patch (23.80 KB, patch)
2015-08-12 16:54 PDT, Filip Pizlo
ggaren: review+
Details | Formatted Diff | Diff
patch for landing (23.86 KB, patch)
2015-08-12 18:11 PDT, Filip Pizlo
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Filip Pizlo 2015-08-12 12:46:43 PDT
Patch forthcoming.
Comment 1 Filip Pizlo 2015-08-12 12:48:31 PDT
Created attachment 258838 [details]
it might be right
Comment 2 Filip Pizlo 2015-08-12 13:53:45 PDT
Created attachment 258844 [details]
I think it works
Comment 3 WebKit Commit Bot 2015-08-12 13:55:31 PDT
Attachment 258844 [details] did not pass style-queue:


ERROR: Source/WTF/wtf/ParkingLot.cpp:579:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/WTF/wtf/ParkingLot.cpp:585:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/WTF/wtf/Lock.cpp:98:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
Total errors found: 3 in 9 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 4 Filip Pizlo 2015-08-12 15:15:33 PDT
Created attachment 258849 [details]
it works
Comment 5 WebKit Commit Bot 2015-08-12 15:17:05 PDT
Attachment 258849 [details] did not pass style-queue:


ERROR: Source/WTF/wtf/ParkingLot.cpp:579:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/WTF/wtf/ParkingLot.cpp:585:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/WTF/wtf/Lock.cpp:98:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
Total errors found: 3 in 9 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 6 Filip Pizlo 2015-08-12 16:54:08 PDT
Created attachment 258853 [details]
the patch

Fix a debug-only issue.
Comment 7 WebKit Commit Bot 2015-08-12 16:55:46 PDT
Attachment 258853 [details] did not pass style-queue:


ERROR: Source/WTF/wtf/ParkingLot.cpp:579:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/WTF/wtf/ParkingLot.cpp:585:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/WTF/wtf/Lock.cpp:98:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
Total errors found: 3 in 9 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 8 Mark Lam 2015-08-12 17:31:45 PDT
Comment on attachment 258849 [details]
it works

View in context: https://bugs.webkit.org/attachment.cgi?id=258849&action=review

> Source/WTF/wtf/ParkingLot.cpp:497
> +        finishFunctor(result);

"finishFunctor" is not very descriptive.  Can we call it something like "reportMayHaveMoreThreadsFunctor" instead?

> Source/WTF/wtf/ParkingLot.cpp:-506
> -    ASSERT(!me->shouldPark);
> -    

Can we ASSERT(!me->address)?

> Source/WTF/wtf/ParkingLot.cpp:-528
> -
> -        me->address = nullptr;

Why remove this clearing of me->address?  Oh, I see that we clear it in the unpark functions.

> Source/WTF/wtf/ParkingLot.cpp:598
> +    threadData->parkingCondition.notify_one();

Let me reason thru how things are working.  The way we get to this unparkOne function is if the lock word / byte indicates that it is parked.  To unpark, we do the following in order:
1. in dequeue(), lock the bucket that waiting threads are parked on.  That keeps new threads from parking.
2. in Bucket::genericDequeue(), we take a candidate ThreadData from the bucket.
3. in dequeue(), we call the reportMayHaveMoreThreads functor, which invokes our callback here.
4. the callback in this case is from LockBase::unlockSlow(), where it clears the isHeldBit on the lock word / byte.  

    The lock is now available for contention.
    If the lock is still parked (has waiting threads), the acquiring thread will acquire the lock with isHeldBit | hasParkedBit.
    If the lock has no waiting threads, the acquiring thread will acquire the lock with isHeldBit.

5. in dequeue(), we unlock the bucket.  Any other thread may now park in the bucket.

    If an contending thread comes along, it will just park in the bucket now if the lock hasParkedBit.

6. In unparkOne() here, we notify the thread whose ThreadData we took from the bucket.  This wakes this thread up to contend for the lock.

The only guarantee that I think we need is for step 4 (setting the new lock bits) to take place before we unlock the bucket in step 5.  As long as we have that, we can guarantee that the hasParkedBit in the lock word / byte is consistent with whether we have waiting threads in the bucket or not.

Then, why does this notify_one() need to be outside of the protection of the parkingLock?  Ditto for the notifies in the other 2 unpark functions.  The waking thread can't do anything until its parkingLock is released.  If it doesn't make a difference (as I see it), then I'd prefer to do the notify while holding the parkingLock as that's the normal practice with how condition variables are used.

> Tools/TestWebKitAPI/Tests/WTF/Lock.cpp:84
> +        for (unsigned i = 2; i--;) {
> +            locks[threadGroupIndex].lock();
> +            locks[threadGroupIndex].unlock();
> +        }

From what I can tell, the algorithm for clearing the hasParkedBit is exact and not racy. Above, we waited for all the contending threads to complete.  That means they should have unlocked the locks.  Why do you need to do this lock and unlock pair at all?
Comment 9 Geoffrey Garen 2015-08-12 17:46:36 PDT
Comment on attachment 258853 [details]
the patch

View in context: https://bugs.webkit.org/attachment.cgi?id=258853&action=review

r=me

> Source/WTF/wtf/ParkingLot.cpp:545
> +        false,

This should be a named local variable or an enum, so you can tell at the call site that it means "do not ensure bucket".

> Source/WTF/wtf/ParkingLot.cpp:609
> +        false,

This should be a named local variable or an enum, so you can tell at the call site that it means "do not ensure bucket".

> Source/WTF/wtf/ParkingLot.h:75
> +        std::function<void(bool didUnparkThread, bool mayHaveMoreThreads)> callback);

In what sense is mayHaveMoreThreads a maybe as opposed to a yes/no?

I think you're pointing out that another thread may be trying to park right now. But, since I hold the parking lock and the bucket lock during this callback, the other thread can't park yet, and I know for certain which other threads are parked at this moment.

Do I understand this correctly? If so, I would rename to "hasMoreParkedThreads".
Comment 10 Filip Pizlo 2015-08-12 17:56:01 PDT
(In reply to comment #8)
> Comment on attachment 258849 [details]
> it works
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=258849&action=review
> 
> > Source/WTF/wtf/ParkingLot.cpp:497
> > +        finishFunctor(result);
> 
> "finishFunctor" is not very descriptive.  Can we call it something like
> "reportMayHaveMoreThreadsFunctor" instead?

But that's not an accurate name.  It's also called when the lock is held.  In particular, it's the last thing we do before releasing the lock.  Hence "finish" is probably a decent name - short, and about as precise as you can get.

> 
> > Source/WTF/wtf/ParkingLot.cpp:-506
> > -    ASSERT(!me->shouldPark);
> > -    
> 
> Can we ASSERT(!me->address)?

Fixed in latest patch.

> 
> > Source/WTF/wtf/ParkingLot.cpp:-528
> > -
> > -        me->address = nullptr;
> 
> Why remove this clearing of me->address?  Oh, I see that we clear it in the
> unpark functions.

Yeah.

> 
> > Source/WTF/wtf/ParkingLot.cpp:598
> > +    threadData->parkingCondition.notify_one();
> 
> Let me reason thru how things are working.  The way we get to this unparkOne
> function is if the lock word / byte indicates that it is parked.  To unpark,
> we do the following in order:
> 1. in dequeue(), lock the bucket that waiting threads are parked on.  That
> keeps new threads from parking.
> 2. in Bucket::genericDequeue(), we take a candidate ThreadData from the
> bucket.
> 3. in dequeue(), we call the reportMayHaveMoreThreads functor, which invokes
> our callback here.
> 4. the callback in this case is from LockBase::unlockSlow(), where it clears
> the isHeldBit on the lock word / byte.  
> 
>     The lock is now available for contention.
>     If the lock is still parked (has waiting threads), the acquiring thread
> will acquire the lock with isHeldBit | hasParkedBit.
>     If the lock has no waiting threads, the acquiring thread will acquire
> the lock with isHeldBit.
> 
> 5. in dequeue(), we unlock the bucket.  Any other thread may now park in the
> bucket.
> 
>     If an contending thread comes along, it will just park in the bucket now
> if the lock hasParkedBit.

No, it wouldn't park, because threads only park if isHeldBit is also set:

        ParkingLot::compareAndPark(&m_byte, isHeldBit | hasParkedBit);

Right now, just after we unlocked the lock, the isHeldBit is *not* set.  The lock's value is either "0" or "hasParkedBit".

So, it's true that at this point some thread (either the one we unparked or a different one) could already acquire the lock, and then any other thread acquiring the lock would park.  But, I just wanted to clarify that there has to be more than one contending thread for parking to happen.

> 
> 6. In unparkOne() here, we notify the thread whose ThreadData we took from
> the bucket.  This wakes this thread up to contend for the lock.
> 
> The only guarantee that I think we need is for step 4 (setting the new lock
> bits) to take place before we unlock the bucket in step 5.  As long as we
> have that, we can guarantee that the hasParkedBit in the lock word / byte is
> consistent with whether we have waiting threads in the bucket or not.

I'm not sure what you're suggesting, but I'm pretty sure you're wrong.  Are you saying that you believe that the algorithm is unnecessarily complex?  Are you saying that it could be made faster?  Are you saying that it's wrong?

Step 4 needs to set the hasParkedBit based on whether the queue contains any more elements after we dequeued the first one.  If we moved step 4 to before step 2, you'd need to do a second queue traversal to get this information.  If we moved step 4 to before step 1, you'd need to do a second hashing, second locking of the bucket, and a second queue traversal.  Clearly, step 4 is most efficient right where it is.

> 
> Then, why does this notify_one() need to be outside of the protection of the
> parkingLock?  Ditto for the notifies in the other 2 unpark functions.  The
> waking thread can't do anything until its parkingLock is released.  If it
> doesn't make a difference (as I see it), then I'd prefer to do the notify
> while holding the parkingLock as that's the normal practice with how
> condition variables are used.

To avoid waking up a thread while the lock is still held.  It's recommended practice for C++11 locking code.

> 
> > Tools/TestWebKitAPI/Tests/WTF/Lock.cpp:84
> > +        for (unsigned i = 2; i--;) {
> > +            locks[threadGroupIndex].lock();
> > +            locks[threadGroupIndex].unlock();
> > +        }
> 
> From what I can tell, the algorithm for clearing the hasParkedBit is exact
> and not racy. Above, we waited for all the contending threads to complete. 
> That means they should have unlocked the locks.  Why do you need to do this
> lock and unlock pair at all?

It's not exact.  A queue attached to a Bucket is actually a random merging of queues for all addresses that hash to that Bucket.  Enqueue works normally, but dequeue needs to search forward from the head until it finds the first node that has exactly the address we're interested in.  Therefore, for answering "are there any more threads on this queue after I dequeue this one", we have two choices:

1) Do an exact forward search to find if any of the remaining elements in the Bucket match our address.

2) Just report whether the queue has any more elements.  This may be true even if none of those are for our address.

The algorithm goes with (2).  The effect of this is that in the case of collisions we will set hasParkedBit to true even though we didn't have to.  This is OK so long as the hasParkedBit gets cleared eventually.  It will get cleared on the next uncontended lock()/unlock() because the unlock() will call unparkOne(), but unparkOne() will detect that it had not dequeued any elements.  When that happens, it passes false for mayHaveMoreThreads because of this logic:

            callback(!!threadData, threadData && mayHaveMoreThreads);

Because of this slight case of imprecision, we need exactly one lock()/unlock() cycle to reset the lock.  I decided the use two cycles in the test, because I didn't want the test to be too much of a white box.  If we did change any locking algorithm to require two cycles of locking to reset itself, then we wouldn't want this test to fail.  But, if you think it really matters to be precise, I'd be happy to make it just one lock()/unlock() cycle.
Comment 11 Filip Pizlo 2015-08-12 17:58:08 PDT
(In reply to comment #9)
> Comment on attachment 258853 [details]
> the patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=258853&action=review
> 
> r=me
> 
> > Source/WTF/wtf/ParkingLot.cpp:545
> > +        false,
> 
> This should be a named local variable or an enum, so you can tell at the
> call site that it means "do not ensure bucket".

OK.

> 
> > Source/WTF/wtf/ParkingLot.cpp:609
> > +        false,
> 
> This should be a named local variable or an enum, so you can tell at the
> call site that it means "do not ensure bucket".

OK.

> 
> > Source/WTF/wtf/ParkingLot.h:75
> > +        std::function<void(bool didUnparkThread, bool mayHaveMoreThreads)> callback);
> 
> In what sense is mayHaveMoreThreads a maybe as opposed to a yes/no?
> 
> I think you're pointing out that another thread may be trying to park right
> now. But, since I hold the parking lock and the bucket lock during this
> callback, the other thread can't park yet, and I know for certain which
> other threads are parked at this moment.
> 
> Do I understand this correctly? If so, I would rename to
> "hasMoreParkedThreads".

It's actually conservative.  From my reply to mlam:

A queue attached to a Bucket is actually a random merging of queues for all addresses that hash to that Bucket.  Enqueue works normally, but dequeue needs to search forward from the head until it finds the first node that has exactly the address we're interested in.  Therefore, for answering "are there any more threads on this queue after I dequeue this one", we have two choices:

1) Do an exact forward search to find if any of the remaining elements in the Bucket match our address.

2) Just report whether the queue has any more elements.  This may be true even if none of those are for our address.

The algorithm goes with (2).  The effect of this is that in the case of collisions we will set hasParkedBit to true even though we didn't have to.  This is OK so long as the hasParkedBit gets cleared eventually.  It will get cleared on the next uncontended lock()/unlock() because the unlock() will call unparkOne(), but unparkOne() will detect that it had not dequeued any elements.  When that happens, it passes false for mayHaveMoreThreads because of this logic:

            callback(!!threadData, threadData && mayHaveMoreThreads);

Because of this slight case of imprecision, I prefer to call it mayHaveMoreThreads, because it could be true even if no more threads are waiting.
Comment 12 Filip Pizlo 2015-08-12 18:11:29 PDT
Created attachment 258864 [details]
patch for landing

With feedback addressed.
Comment 13 Filip Pizlo 2015-08-12 18:11:50 PDT
Will wait to see if mlam is OK with the patch...
Comment 14 WebKit Commit Bot 2015-08-12 18:13:19 PDT
Attachment 258864 [details] did not pass style-queue:


ERROR: Source/WTF/wtf/ParkingLot.cpp:584:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/WTF/wtf/ParkingLot.cpp:590:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/WTF/wtf/Lock.cpp:97:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
Total errors found: 3 in 9 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 15 Filip Pizlo 2015-08-12 20:48:23 PDT
Haven't heard from Mark.  Will land.  Happy to roll out if an objection comes up.
Comment 16 Filip Pizlo 2015-08-12 20:55:13 PDT
Landed in http://trac.webkit.org/changeset/188374
Comment 17 Mark Lam 2015-08-12 22:03:51 PDT
(In reply to comment #10)
> (In reply to comment #8)
...
> > > Source/WTF/wtf/ParkingLot.cpp:497
> > > +        finishFunctor(result);
> > 
> > "finishFunctor" is not very descriptive.  Can we call it something like
> > "reportMayHaveMoreThreadsFunctor" instead?
> 
> But that's not an accurate name.  It's also called when the lock is held. 
> In particular, it's the last thing we do before releasing the lock.  Hence
> "finish" is probably a decent name - short, and about as precise as you can
> get.

The reason I think that reportMayHaveMoreThreadsFunctor is a better name is because the sole purpose of this functor is to pass the bool mayHaveMoreThreads to the caller.  It is not clear from the design of dequeue() what that value represents until I saw its use in unparkOne() after it.  Perhaps another way to add more clarity here would be to rename the passed bool "result" to "hasMoreParkedThreads" or something like that.

Regardless, I don't have a strong objection here.

> > > Source/WTF/wtf/ParkingLot.cpp:598
> > > +    threadData->parkingCondition.notify_one();
> > 
...
> Right now, just after we unlocked the lock, the isHeldBit is *not* set.  The
> lock's value is either "0" or "hasParkedBit".

True.  I neglected to consider that case.  I was primarily thinking about what happens if we have contention, and the parking action that can follow.

...
> > The only guarantee that I think we need is for step 4 (setting the new lock
> > bits) to take place before we unlock the bucket in step 5.  As long as we
> > have that, we can guarantee that the hasParkedBit in the lock word / byte is
> > consistent with whether we have waiting threads in the bucket or not.
> 
> I'm not sure what you're suggesting, but I'm pretty sure you're wrong.  Are
> you saying that you believe that the algorithm is unnecessarily complex? 
> Are you saying that it could be made faster?  Are you saying that it's wrong?
...

Not suggesting anything.  I was surprised to see the change that moved the notifying of the parkingCondition out or the guard of the parkingLock, and I was wondering what it could possibly have to do with the determination of the lock bits when we do the unparkOne().  Hence, I tried to methodically reason through the flow of things to see if I can figure why you moved the notify out.

Turns out ... 

> To avoid waking up a thread while the lock is still held.  It's recommended
> practice for C++11 locking code.

... turns out, it has nothing to do with the value of the lock bits.  It's just an optimization to avoid the waking thread doing more work to block on the parkingLock and then unblocking again shortly after.  I hadn't consider this aspect of it, but this is the answer I was looking for here.  Thanks for the explanation.
 
> > 
> > > Tools/TestWebKitAPI/Tests/WTF/Lock.cpp:84
> > > +        for (unsigned i = 2; i--;) {
> > > +            locks[threadGroupIndex].lock();
> > > +            locks[threadGroupIndex].unlock();
> > > +        }
> > 
> > From what I can tell, the algorithm for clearing the hasParkedBit is exact
> > and not racy. Above, we waited for all the contending threads to complete. 
> > That means they should have unlocked the locks.  Why do you need to do this
> > lock and unlock pair at all?
> 
> It's not exact.  A queue attached to a Bucket is actually a random merging
> of queues for all addresses that hash to that Bucket.  Enqueue works
> normally, but dequeue needs to search forward from the head until it finds
> the first node that has exactly the address we're interested in.  Therefore,
> for answering "are there any more threads on this queue after I dequeue this
> one", we have two choices:
> 
> 1) Do an exact forward search to find if any of the remaining elements in
> the Bucket match our address.
> 
> 2) Just report whether the queue has any more elements.  This may be true
> even if none of those are for our address.
> 
> The algorithm goes with (2).  The effect of this is that in the case of
> collisions we will set hasParkedBit to true even though we didn't have to. 
> This is OK so long as the hasParkedBit gets cleared eventually.  It will get
> cleared on the next uncontended lock()/unlock() because the unlock() will
> call unparkOne(), but unparkOne() will detect that it had not dequeued any
> elements.  When that happens, it passes false for mayHaveMoreThreads because
> of this logic:
> 
>             callback(!!threadData, threadData && mayHaveMoreThreads);
> 
> Because of this slight case of imprecision, we need exactly one
> lock()/unlock() cycle to reset the lock.  I decided the use two cycles in
> the test, because I didn't want the test to be too much of a white box.  If
> we did change any locking algorithm to require two cycles of locking to
> reset itself, then we wouldn't want this test to fail.  But, if you think it
> really matters to be precise, I'd be happy to make it just one
> lock()/unlock() cycle.

Nice.  Thanks for explaining.  The part about the bucket being a merged queue (due to the hashing of addresses) escaped my mind.  It all makes sense now.  I would have preferred that this detail about the imprecision of hasParkedBit be documented in the comment for the new unparkOne(), along with the detail of how it is self optimizing on the next uncontended lock + unlock on that address ... basically what you wrote up here.  The fact that both Geoff and I missed it suggests that it is probably not obvious enough to the casual reader of the code.  Perhaps you tried to explain this in the ChangeLog already, but if so, I didn't pick it up from there.

Anyway, I'm glad that it's at least documented here in the comments of this bugzilla for future reference.

(In reply to comment #15)
> Haven't heard from Mark.  Will land.  Happy to roll out if an objection
> comes up.

No objection from me.  I agree that the implementation is sound.
Comment 18 Filip Pizlo 2015-08-12 22:08:45 PDT
(In reply to comment #17)
> (In reply to comment #10)
> > (In reply to comment #8)
> ...
> > > > Source/WTF/wtf/ParkingLot.cpp:497
> > > > +        finishFunctor(result);
> > > 
> > > "finishFunctor" is not very descriptive.  Can we call it something like
> > > "reportMayHaveMoreThreadsFunctor" instead?
> > 
> > But that's not an accurate name.  It's also called when the lock is held. 
> > In particular, it's the last thing we do before releasing the lock.  Hence
> > "finish" is probably a decent name - short, and about as precise as you can
> > get.
> 
> The reason I think that reportMayHaveMoreThreadsFunctor is a better name is
> because the sole purpose of this functor is to pass the bool
> mayHaveMoreThreads to the caller.  

unparkOne() and dequeue() already returned a bool that told you whether there are more threads left, so this functor's sole purpose is definitely not to pass mayHaveMoreThreads.

The purpose of the functor is that it runs while the queue is locked.  The fact that it also happens to tell you some stats about the queue is less interesting.

> It is not clear from the design of
> dequeue() what that value represents until I saw its use in unparkOne()
> after it.  Perhaps another way to add more clarity here would be to rename
> the passed bool "result" to "hasMoreParkedThreads" or something like that.
> 
> Regardless, I don't have a strong objection here.
> 
> > > > Source/WTF/wtf/ParkingLot.cpp:598
> > > > +    threadData->parkingCondition.notify_one();
> > > 
> ...
> > Right now, just after we unlocked the lock, the isHeldBit is *not* set.  The
> > lock's value is either "0" or "hasParkedBit".
> 
> True.  I neglected to consider that case.  I was primarily thinking about
> what happens if we have contention, and the parking action that can follow.
> 
> ...
> > > The only guarantee that I think we need is for step 4 (setting the new lock
> > > bits) to take place before we unlock the bucket in step 5.  As long as we
> > > have that, we can guarantee that the hasParkedBit in the lock word / byte is
> > > consistent with whether we have waiting threads in the bucket or not.
> > 
> > I'm not sure what you're suggesting, but I'm pretty sure you're wrong.  Are
> > you saying that you believe that the algorithm is unnecessarily complex? 
> > Are you saying that it could be made faster?  Are you saying that it's wrong?
> ...
> 
> Not suggesting anything.  I was surprised to see the change that moved the
> notifying of the parkingCondition out or the guard of the parkingLock, and I
> was wondering what it could possibly have to do with the determination of
> the lock bits when we do the unparkOne().  Hence, I tried to methodically
> reason through the flow of things to see if I can figure why you moved the
> notify out.

That was just me realizing that I should apply this style here.  Ggaren had suggested calling notify after unlocking while reviewing a different patch.

> 
> Turns out ... 
> 
> > To avoid waking up a thread while the lock is still held.  It's recommended
> > practice for C++11 locking code.
> 
> ... turns out, it has nothing to do with the value of the lock bits.  It's
> just an optimization to avoid the waking thread doing more work to block on
> the parkingLock and then unblocking again shortly after.  I hadn't consider
> this aspect of it, but this is the answer I was looking for here.  Thanks
> for the explanation.
>  
> > > 
> > > > Tools/TestWebKitAPI/Tests/WTF/Lock.cpp:84
> > > > +        for (unsigned i = 2; i--;) {
> > > > +            locks[threadGroupIndex].lock();
> > > > +            locks[threadGroupIndex].unlock();
> > > > +        }
> > > 
> > > From what I can tell, the algorithm for clearing the hasParkedBit is exact
> > > and not racy. Above, we waited for all the contending threads to complete. 
> > > That means they should have unlocked the locks.  Why do you need to do this
> > > lock and unlock pair at all?
> > 
> > It's not exact.  A queue attached to a Bucket is actually a random merging
> > of queues for all addresses that hash to that Bucket.  Enqueue works
> > normally, but dequeue needs to search forward from the head until it finds
> > the first node that has exactly the address we're interested in.  Therefore,
> > for answering "are there any more threads on this queue after I dequeue this
> > one", we have two choices:
> > 
> > 1) Do an exact forward search to find if any of the remaining elements in
> > the Bucket match our address.
> > 
> > 2) Just report whether the queue has any more elements.  This may be true
> > even if none of those are for our address.
> > 
> > The algorithm goes with (2).  The effect of this is that in the case of
> > collisions we will set hasParkedBit to true even though we didn't have to. 
> > This is OK so long as the hasParkedBit gets cleared eventually.  It will get
> > cleared on the next uncontended lock()/unlock() because the unlock() will
> > call unparkOne(), but unparkOne() will detect that it had not dequeued any
> > elements.  When that happens, it passes false for mayHaveMoreThreads because
> > of this logic:
> > 
> >             callback(!!threadData, threadData && mayHaveMoreThreads);
> > 
> > Because of this slight case of imprecision, we need exactly one
> > lock()/unlock() cycle to reset the lock.  I decided the use two cycles in
> > the test, because I didn't want the test to be too much of a white box.  If
> > we did change any locking algorithm to require two cycles of locking to
> > reset itself, then we wouldn't want this test to fail.  But, if you think it
> > really matters to be precise, I'd be happy to make it just one
> > lock()/unlock() cycle.
> 
> Nice.  Thanks for explaining.  The part about the bucket being a merged
> queue (due to the hashing of addresses) escaped my mind.  It all makes sense
> now.  I would have preferred that this detail about the imprecision of
> hasParkedBit be documented in the comment for the new unparkOne(), along
> with the detail of how it is self optimizing on the next uncontended lock +
> unlock on that address ... basically what you wrote up here.  The fact that
> both Geoff and I missed it suggests that it is probably not obvious enough
> to the casual reader of the code.  Perhaps you tried to explain this in the
> ChangeLog already, but if so, I didn't pick it up from there.

I think I added that documentation to the ParkingLot.h header.

> 
> Anyway, I'm glad that it's at least documented here in the comments of this
> bugzilla for future reference.
> 
> (In reply to comment #15)
> > Haven't heard from Mark.  Will land.  Happy to roll out if an objection
> > comes up.
> 
> No objection from me.  I agree that the implementation is sound.
Comment 19 Geoffrey Garen 2015-08-13 10:30:38 PDT
> A queue attached to a Bucket is actually a random merging of queues for all
> addresses that hash to that Bucket.

Got it. I agree with "mayHave".