Bug 71718 - Apparent threading bug causes intermittent segfaults
Summary: Apparent threading bug causes intermittent segfaults
Status: RESOLVED WONTFIX
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKit Qt (show other bugs)
Version: 528+ (Nightly build)
Hardware: PC Linux
: P2 Critical
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-11-07 12:18 PST by Andrew Wason
Modified: 2011-11-21 14:53 PST (History)
3 users (show)

See Also:


Attachments
sample app to reproduce the crash (3.03 KB, application/octet-stream)
2011-11-07 12:18 PST, Andrew Wason
no flags Details
gdb backtrace (23.97 KB, text/plain)
2011-11-07 12:20 PST, Andrew Wason
no flags Details
valgrind log (30.88 KB, text/plain)
2011-11-08 08:16 PST, Andrew Wason
no flags Details
fix for static local race condition (1.86 KB, patch)
2011-11-09 08:20 PST, Andrew Wason
hausmann: review-
hausmann: commit-queue-
Details | Formatted Diff | Diff
helgrind log (14.18 KB, text/plain)
2011-11-09 08:58 PST, Andrew Wason
no flags Details
fix for static local race condition (2.24 KB, patch)
2011-11-11 10:59 PST, Andrew Wason
ap: review+
ap: commit-queue-
Details | Formatted Diff | Diff
do not change initializeThreading (1.92 KB, patch)
2011-11-11 12:22 PST, Andrew Wason
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Andrew Wason 2011-11-07 12:18:33 PST
Created attachment 113908 [details]
sample app to reproduce the crash

Web pages using WebWorkers occasionally crash. There seems to be some thread synchronization issue.

A reduced testcase is attached. Run the included loop.sh script to run the testcase repeatedly in a loop, usually after 10 or so runs it will crash. You can also build with "make DEFINES=-DSUSPEND" which will install a SIGSEGV handler and suspend the process on SEGV - allowing gdb to be attached to the running process at the point the SEGV occurs.
Comment 1 Andrew Wason 2011-11-07 12:20:11 PST
Created attachment 113909 [details]
gdb backtrace

This is the output of "thread apply all bt" on the sample app after SIGSEGV. The app was compiled with -Dsuspend so it suspended itself upon receiving SIGSEGV.
Comment 2 Andrew Wason 2011-11-08 08:16:55 PST
Created attachment 114075 [details]
valgrind log

I built webkit with ENABLE_JIT=0 so that valgrind can be used, and after repeated runs was able to catch the error in the attached valgrind log.

The problem seems to be that the static Mutex in Source/JavaScriptCore/wtf/qt/ThreadingQt.cpp:threadMapMutex() is sometimes destroyed at process exit while the WorkerThreads are shutting down. The WorkerThreads are using the destroyed threadMapMutex while they are shutting down.
Comment 3 Andrew Wason 2011-11-09 08:20:42 PST
Created attachment 114287 [details]
fix for static local race condition

This fixes the problem shown in the valgrind log. It allocates the threadMapMutex() on the stack so it's destructor doesn't run at process shutdown while outstanding WorkerThreads are still accessing it.

I did the same thing with threadMap() since it has a similar race condition. Also I initialize threadMap in initializeThreading() to avoid another race condition initializing it.

This patch seems to fix the problem evident in the valgrind log, but there are remaining problems that can still cause a crash when using WebWorkers. I will attach a helgrind log with more information shortly.
Comment 4 WebKit Review Bot 2011-11-09 08:27:55 PST
Attachment 114287 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1

Source/JavaScriptCore/wtf/qt/ThreadingQt.cpp:90:  More than one command on the same line  [whitespace/newline] [4]
Source/JavaScriptCore/wtf/qt/ThreadingQt.cpp:96:  More than one command on the same line  [whitespace/newline] [4]
Total errors found: 2 in 2 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 5 Alexey Proskuryakov 2011-11-09 08:57:35 PST
Comment on attachment 114287 [details]
fix for static local race condition

View in context: https://bugs.webkit.org/attachment.cgi?id=114287&action=review

The idea of the fix is correct.

Some time ago, we discussed having so many threading back-ends, and consensus was that ThreadingQt and other library specific implementations should be removed from WebKit. We can require either either POSIX threads or Win32 threads, and not waste time maintaining other back-ends.

>> Source/JavaScriptCore/wtf/qt/ThreadingQt.cpp:90
>> +    static Mutex* mutex = new Mutex();;
> 
> More than one command on the same line  [whitespace/newline] [4]

I suggest just copying this code from ThreadingPhreads:

    DEFINE_STATIC_LOCAL(Mutex, mutex, ());

>> Source/JavaScriptCore/wtf/qt/ThreadingQt.cpp:96
>> +    static HashMap<ThreadIdentifier, QThread*>* map = new HashMap<ThreadIdentifier, QThread*>();;
> 
> More than one command on the same line  [whitespace/newline] [4]

Ditto.
Comment 6 Andrew Wason 2011-11-09 08:58:15 PST
Created attachment 114296 [details]
helgrind log

This is the tail end of a valgrind helgrind log during a crash.

The WebWorker QThreads can outlive the QApplication instance, and even various Qt global static state. In this case the global static Qt internal global_callback_table() is being destroyed while outstanding QThreads are accessing it as part of their shutdown.

Allowing the QThreads to outlive QApplication seems like a problem. Perhaps we should QThread::wait for the threads in QCoreApplication::aboutToQuit?
Comment 7 Simon Hausmann 2011-11-09 11:09:47 PST
Comment on attachment 114287 [details]
fix for static local race condition

Good catch!

However, I agree with Alexey, please use the DEFINE_STATIC_LOCAL pattern.
Comment 8 Andrew Wason 2011-11-09 11:32:58 PST
(In reply to comment #5)
> 
> Some time ago, we discussed having so many threading back-ends, and consensus was that 
> ThreadingQt and other library specific implementations should be removed from WebKit.
> We can require either either POSIX threads or Win32 threads, and not waste time maintaining
> other back-ends.

In that case, perhaps I should do that instead of this patch? Especially since this patch doesn't fix the problem of QThreads outliving QApplication and other global internal Qt state, leading to crashes.

Using pthreads/win32 I think would fix all these issues.
Comment 9 Alexey Proskuryakov 2011-11-09 11:51:59 PST
As far as I'm concerned, dropping ThreadingQt would be preferable. Please discuss that with folks working on Qt port.
Comment 10 Andrew Wason 2011-11-11 10:59:56 PST
Created attachment 114733 [details]
fix for static local race condition 

Fix the problem in the same way ThreadingPthread does. I opened bug 72155 for switching the Qt threading back-end to pthreads/Win32 since I think that patch will take longer to review.
Comment 11 Alexey Proskuryakov 2011-11-11 11:40:41 PST
Comment on attachment 114733 [details]
fix for static local race condition 

View in context: https://bugs.webkit.org/attachment.cgi?id=114733&action=review

r=me on the condition that initializeThreading() change is removed before landing.

> Source/JavaScriptCore/wtf/qt/ThreadingQt.cpp:152
> +        threadMap();

This change is not mentioned in ChangeLog, is unnecessary, and is not something ThreadingPthreads does. All accesses to threadMap are protected by threadMapMutex, so there is no need to create it upfront.
Comment 12 Andrew Wason 2011-11-11 12:22:55 PST
Created attachment 114752 [details]
do not change initializeThreading
Comment 13 Andrew Wason 2011-11-21 07:08:27 PST
Bug 72155 makes this obsolete.