RESOLVED FIXED Bug 97539
Broken and incorrect code in FastMalloc.cpp
https://bugs.webkit.org/show_bug.cgi?id=97539
Summary Broken and incorrect code in FastMalloc.cpp
Mark Toller
Reported 2012-09-25 01:54:26 PDT
In the pthread supported #ifdef branch in Source/WTF/wtf/FastMalloc.cpp, the method TCMalloc_PageHeap::signalScavenger() contains the following code: // m_scavengeMutex should be held before accessing m_scavengeThreadActive. ASSERT(pthread_mutex_trylock(m_scavengeMutex)); if (!m_scavengeThreadActive && shouldScavenge()) pthread_cond_signal(&m_scavengeCondition); There are a couple of problems with this: 1) The TCMalloc code can't be compiled in (without changing the build system/editing the file) if debug is enabled, and the ASSERT() macro is compiled out if DEBUG is not enabled. This means that the 'trylock' code is never compiled. 2) The pthread_mutex_trylock(m_scavengeMutex) code won't compile - as it requires a pointer to the mutex (pthread_mutex_trylock(&m_scavengeMutex);) 3) If the (fixed) code was compiled in, it will either: a) cause a deadlock if the mutex is locked successfully, as it is not unlocked afterwards. b) cause the ASSERT to fire if the mutex is not taken (which can happen). The comment indicates that we should take the mutex before checking m_scavengeThreadActive. However, this will cause us to perform a lock/unlock on the mutex for many delete operations. The worst case scenario for *not* locking the mutex is that we either : 1) signal the condvar more often than required when we get several deletes before the m_scavengeThreadActive is set to true or .... 2) We don't signal the condvar as early as possible, and wait till the next delete...
Attachments
Patch (1.93 KB, patch)
2012-09-25 02:41 PDT, Mark Toller
no flags
Patch (2.38 KB, patch)
2012-09-26 06:03 PDT, Mark Toller
no flags
Patch (3.06 KB, patch)
2012-09-27 07:02 PDT, Mark Toller
no flags
Patch (4.08 KB, patch)
2012-09-28 00:59 PDT, Mark Toller
no flags
Mark Toller
Comment 1 2012-09-25 02:41:51 PDT
Alexey Proskuryakov
Comment 2 2012-09-25 11:20:00 PDT
Comment on attachment 165565 [details] Patch I think that this code should be: - ASSERT(pthread_mutex_trylock(m_scavengeMutex)); + ASSERT(!pthread_mutex_trylock(&m_scavengeMutex)); Does such an assert make more sense?
Benjamin Poulain
Comment 3 2012-09-25 11:38:58 PDT
Part of the explanations you put in bugzilla should be in the Changelog. Your ChangeLog is too terse to be useful.
Mark Toller
Comment 4 2012-09-26 00:45:59 PDT
(In reply to comment #2) > (From update of attachment 165565 [details]) > I think that this code should be: > > - ASSERT(pthread_mutex_trylock(m_scavengeMutex)); > + ASSERT(!pthread_mutex_trylock(&m_scavengeMutex)); > > Does such an assert make more sense? No, because if that code executes it will either deadlock if the lock is acquired, or assert if it fails... And it can (and does) fail if the TCMalloc_PageHeap::scavengerThread() has locked the mutex, and not yet released it in the pthread_cond_wait() call. As Benjamin stated, I should put some of this detail in the Changelog as well. You *could* change it to an : ASSERT(pthread_mutex_lock(&m_scavengeMutex)); if (!m_scavengeThreadActive && shouldScavenge()) pthread_cond_signal(&m_scavengeCondition); ASSERT(pthread_mutex_unlock(&m_scavengeMutex)); However, if the mutex lock/unlock is *needed* then this code shouldn't be wrapped in ASSERT macros, because they are compiled out in release builds... And if they're not required for release builds, and provide no additional benefit in debug, then why have the overhead at all?
Mark Toller
Comment 5 2012-09-26 00:48:20 PDT
Sorry, that should have been : ASSERT(!pthread_mutex_lock(&m_scavengeMutex)); if (!m_scavengeThreadActive && shouldScavenge()) pthread_cond_signal(&m_scavengeCondition); ASSERT(!pthread_mutex_unlock(&m_scavengeMutex)); as the pthread_mutex_* calls return 0 on success.
Mark Toller
Comment 6 2012-09-26 06:03:18 PDT
Alexey Proskuryakov
Comment 7 2012-09-26 08:51:28 PDT
Comment on attachment 165777 [details] Patch I think that you still misunderstand the purpose of this assertion. It is there to ensure that caller has taken the lock, so pthread_mutex_lock should fail, and there is no need to unlock.
Alexey Proskuryakov
Comment 8 2012-09-26 08:53:47 PDT
> pthread_mutex_lock should fail, and there is no need to unlock. pthread_mutex_trylock*
Mark Toller
Comment 9 2012-09-26 09:30:00 PDT
(In reply to comment #7) > (From update of attachment 165777 [details]) > I think that you still misunderstand the purpose of this assertion. It is there to ensure that caller has taken the lock, so pthread_mutex_lock should fail, and there is no need to unlock. OK, if that is indeed the case then : 1) The comment "m_scavengeMutex should be held before accessing m_scavengeThreadActive." is misleading, it should be stating "the caller [of signalScavenger()] should ensure that m_scavengeMutex is locked prior to calling this method." 2) The ASSERT *will* currently fail as the only location which locks the mutex (other than this 'trylock') is the scavengerThread itself (TCMalloc_PageHeap::scavengerThread()) - so once the scavengerThread waits on the condvar (and the mutex is then unlocked), the next call to signalScavenger() *will* obtain the lock with the trylock and therefore trigger the ASSERT. I'm not sure if your comment is based on a misunderstanding of the pthread_cond_wait(&m_scavengeCondition, &m_scavengeMutex); code - this atomically waits on the condvar and unlocks the mutex.
Alexey Proskuryakov
Comment 10 2012-09-26 09:41:01 PDT
> 1) The comment "m_scavengeMutex should be held before accessing m_scavengeThreadActive." is misleading, it should be stating "the caller [of signalScavenger()] should ensure that m_scavengeMutex is locked prior to calling this method." Yes, I like your variant more. I'd have said "taken" and not "locked", but either works for me. > I'm not sure if your comment is based on a misunderstanding of the pthread_cond_wait(&m_scavengeCondition, &m_scavengeMutex); code - this atomically waits on the condvar and unlocks the mutex. I wrote this code a few years ago, and that the logic was supposed to match what's in the version that is actually used: ALWAYS_INLINE void TCMalloc_PageHeap::signalScavenger() { ASSERT(pageheap_lock.IsHeld()); ... Of course, being an unused branch, it was not tested, and didn't even compile. It's quite possible that a major cleanup of the algorithm is needed there.
Mark Toller
Comment 11 2012-09-27 07:01:34 PDT
(In reply to comment #10) > > 1) The comment "m_scavengeMutex should be held before accessing m_scavengeThreadActive." is misleading, it should be stating "the caller [of signalScavenger()] should ensure that m_scavengeMutex is locked prior to calling this method." > > Yes, I like your variant more. I'd have said "taken" and not "locked", but either works for me. > > > I'm not sure if your comment is based on a misunderstanding of the pthread_cond_wait(&m_scavengeCondition, &m_scavengeMutex); code - this atomically waits on the condvar and unlocks the mutex. > > I wrote this code a few years ago, and that the logic was supposed to match what's in the version that is actually used: > > ALWAYS_INLINE void TCMalloc_PageHeap::signalScavenger() > { > ASSERT(pageheap_lock.IsHeld()); > ... > > Of course, being an unused branch, it was not tested, and didn't even compile. It's quite possible that a major cleanup of the algorithm is needed there. Ok, I see. Actually, this branch isn't unused as such - it is used by WebkitGtk (GtkLauncher), but as I said earlier, without modifying the build/code, you can't compile using TCMalloc AND debug (so the ASSERT was never compiled in). So, in the GTK/linux variant (no DISPATCH_H and !Windows) a condvar is used to make the scavenger thread sleep. As the condvar requires a mutex, it would seem logical to use the mutex to protect m_scavengeThreadActive - however, the mutex exists only for this convar / thread, and isn't locked (like the pageheap_lock) on entry to 'signalScavenger'. I doubt the overhead of locking/unlocking the mutex on every delete call is desirable? So, how about using the pageheap_lock spinlock to protect m_scavengeThreadActive? The mutex becomes purely a side-effect of using the condvar to make the thread sleep, and brings the code more in line with the other branches...
Mark Toller
Comment 12 2012-09-27 07:02:53 PDT
WebKit Review Bot
Comment 13 2012-09-27 07:05:53 PDT
Attachment 165992 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WTF/ChangeLog', u'Source/WTF/wtf/Fa..." exit_code: 1 Source/WTF/wtf/FastMalloc.cpp:2553: Weird number of spaces at line-start. Are you using a 4-space indent? [whitespace/indent] [3] Source/WTF/wtf/FastMalloc.cpp:2555: Weird number of spaces at line-start. Are you using a 4-space indent? [whitespace/indent] [3] Source/WTF/wtf/FastMalloc.cpp:2558: Weird number of spaces at line-start. Are you using a 4-space indent? [whitespace/indent] [3] Source/WTF/wtf/FastMalloc.cpp:2559: Weird number of spaces at line-start. Are you using a 4-space indent? [whitespace/indent] [3] Source/WTF/wtf/FastMalloc.cpp:2562: Weird number of spaces at line-start. Are you using a 4-space indent? [whitespace/indent] [3] Source/WTF/wtf/FastMalloc.cpp:2566: Weird number of spaces at line-start. Are you using a 4-space indent? [whitespace/indent] [3] Source/WTF/wtf/FastMalloc.cpp:2567: Weird number of spaces at line-start. Are you using a 4-space indent? [whitespace/indent] [3] Source/WTF/wtf/FastMalloc.cpp:2568: Weird number of spaces at line-start. Are you using a 4-space indent? [whitespace/indent] [3] Source/WTF/wtf/FastMalloc.cpp:2569: Weird number of spaces at line-start. Are you using a 4-space indent? [whitespace/indent] [3] Source/WTF/wtf/FastMalloc.cpp:2571: Weird number of spaces at line-start. Are you using a 4-space indent? [whitespace/indent] [3] Source/WTF/wtf/FastMalloc.cpp:2573: Weird number of spaces at line-start. Are you using a 4-space indent? [whitespace/indent] [3] Total errors found: 11 in 2 files If any of these errors are false positives, please file a bug against check-webkit-style.
Alexey Proskuryakov
Comment 14 2012-09-27 09:54:12 PDT
Thank you. I've been out of touch with this code for too long to easily review, CC'ing some potential reviewers. Please upload a new patch with style bot comments addressed, this is quite hard to read in review patch view due to those comments.
Geoffrey Garen
Comment 15 2012-09-27 10:51:40 PDT
Comment on attachment 165992 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=165992&action=review >> Source/WTF/wtf/FastMalloc.cpp:2555 >> + // m_scavengeThreadActive protected by pageheap_lock. > > Weird number of spaces at line-start. Are you using a 4-space indent? [whitespace/indent] [3] m_scavengeThreadActive is an atomic variable. Why do you need to lock across assignment? > Source/WTF/wtf/FastMalloc.cpp:2564 > pthread_mutex_unlock(&m_scavengeMutex); I believe this unlock is wrong, as per the pthread_cond_mutex man page: The pthread_cond_wait() function atomically unlocks the mutex argument and waits on the cond argument. Before returning control to the calling function, pthread_cond_wait() re-acquires the mutex.
Mark Toller
Comment 16 2012-09-28 00:59:24 PDT
(In reply to comment #15) > m_scavengeThreadActive is an atomic variable. Why do you need to lock across assignment? I think you're right. However, we should lock pageheap_lock before calling 'shouldScavenge()', as we only want to scavenge when really required, so it's better to wait for any current memory operation to complete before checking. In this case, we may as well set m_scavengeThreadActive = false while still holding the lock. Do we need to re-lock before setting m_scavengeThreadActive = true ? No. In fact, it's probably better to do so outside of the lock, as this could prevent an unnecessary signal of the condvar (if we waited for a current delete op holding the spinlock to complete, which calls signalScavenger()). > > > Source/WTF/wtf/FastMalloc.cpp:2564 > > pthread_mutex_unlock(&m_scavengeMutex); > > I believe this unlock is wrong, as per the pthread_cond_mutex man page: > > > The pthread_cond_wait() function atomically unlocks the mutex argument > and waits on the cond argument. Before returning control to the calling > function, pthread_cond_wait() re-acquires the mutex. If we don't unlock the mutex (type is PTHREAD_MUTEX_NORMAL) which we now hold after exiting the pthread_cond_wait, then next time around the loop we will deadlock: If the mutex type is PTHREAD_MUTEX_NORMAL, deadlock detection is not provided. Attempting to relock the mutex causes deadlock.
Mark Toller
Comment 17 2012-09-28 00:59:55 PDT
Geoffrey Garen
Comment 18 2012-10-10 18:08:04 PDT
Comment on attachment 166162 [details] Patch r=me
WebKit Review Bot
Comment 19 2012-10-11 08:53:58 PDT
Comment on attachment 166162 [details] Patch Clearing flags on attachment: 166162 Committed r131066: <http://trac.webkit.org/changeset/131066>
WebKit Review Bot
Comment 20 2012-10-11 08:54:02 PDT
All reviewed patches have been landed. Closing bug.
Note You need to log in before you can comment on or make changes to this bug.