RESOLVED FIXED141990
MachineThreads::Thread clean up has a use after free race condition
https://bugs.webkit.org/show_bug.cgi?id=141990
Summary MachineThreads::Thread clean up has a use after free race condition
Mark Lam
Reported 2015-02-24 15:54:06 PST
Before r179753 (https://bugs.webkit.org/show_bug.cgi?id=141317), there was a race condition that could result in a small window for a use after free scenario. After r179753, there is a memory leak. We will roll out r179753 here and implement a fix for the race condition using a different approach.
Attachments
patch for rolling out r179753 (11.04 KB, patch)
2015-02-24 16:04 PST, Mark Lam
no flags
the patch. (8.49 KB, patch)
2015-02-24 17:48 PST, Mark Lam
msaboff: review+
JSC benchmark results (51.06 KB, text/plain)
2015-02-24 17:49 PST, Mark Lam
no flags
patch for rolling out r180602 and associated revs. (11.06 KB, patch)
2015-02-26 10:54 PST, Mark Lam
no flags
Approach 3: synchronize the construction and destruction of MachineThreads. (6.60 KB, patch)
2015-02-26 14:47 PST, Mark Lam
mark.lam: review-
patch with fix for Filip (9.08 KB, patch)
2015-02-26 15:18 PST, Mark Lam
fpizlo: review+
Mark Lam
Comment 1 2015-02-24 16:04:55 PST
Created attachment 247275 [details] patch for rolling out r179753
Mark Lam
Comment 2 2015-02-24 16:05:48 PST
Radar WebKit Bug Importer
Comment 3 2015-02-24 16:07:20 PST
Mark Lam
Comment 4 2015-02-24 17:10:09 PST
The code for _pthread_tsd_cleanup_key() looks like this: _pthread_tsd_cleanup_key(pthread_t self, pthread_key_t key) { void (*destructor)(void *); if (_pthread_key_get_destructor(key, &destructor)) { void **ptr = &self->tsd[key]; void *value = *ptr; // At this point, this thread has cached the destructor and value (which is a MachineThreads*). // If the VM gets destructed (along with its MachineThreads registry) by another thread, // then this thread will have no way of knowing that the MachineThreads* is now pointing // to freed memory. Calling the destructor below will therefore result in a use after free // scenario when it tries to access the MachineThreads' data members. if (value) { *ptr = NULL; if (destructor) { destructor(value); } } } } The solution is simply to change MachineThreads from a per VM thread registry to a process global singleton thread registry i.e. the MachineThreads registry is now immortal and we cannot have a use after free scenario. The cost of this change is that all VM instances will scan stacks of all threads ever touched by a VM, and not just those that touched this specific VM. However, stacks tend to be shallow. Hence, those additional scans will tend to be cheap. Secondly, it is not common for there to be multiple JSC VMs in use concurrently on multiple threads. Hence, this cost should rarely manifest in real world applications.
Mark Lam
Comment 5 2015-02-24 17:48:30 PST
Created attachment 247286 [details] the patch.
Mark Lam
Comment 6 2015-02-24 17:49:39 PST
Created attachment 247287 [details] JSC benchmark results A benchmark run shows perf to be neutral with this change. Results attached.
Michael Saboff
Comment 7 2015-02-24 18:15:11 PST
Comment on attachment 247286 [details] the patch. r=me. Be sure to run the worker tests.
Mark Lam
Comment 8 2015-02-24 18:33:39 PST
My local run of the layout test and JSC regression tests all passed without regressions, and I had landed the patch just before I saw the EWS bot complaints. I will re-run the layout test with -2 to see if there's an issue. If so, I'll rollout. For the record, landed in r180602: <http://trac.webkit.org/r180602>.
Mark Lam
Comment 9 2015-02-24 18:54:49 PST
I don't see any regressions with the wk2 run of the layout tests either. I'll consider this bug resolved.
Alexey Proskuryakov
Comment 10 2015-02-24 21:58:39 PST
Build fixes: http://trac.webkit.org/changeset/180608 http://trac.webkit.org/changeset/180613 http://trac.webkit.org/changeset/180617 And it looks like this broke ASan tests. This is probably a false positive, but we need to find a way to fix this ASAP. #0 0x10849737b in wrap_memcpy (/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/6.2.0/lib/darwin/libclang_rt.asan_osx_dynamic.dylib+0x3837b) #1 0x112bccb11 in _ZN3JSC14MachineThreads23tryCopyOtherThreadStackEPNS0_6ThreadEPvmPm (/Volumes/Data/slave/yosemite-asan-production-wk2-tests/build/buildToTest/JavaScriptCore.framework/Versions/A/JavaScriptCore+0x9c9b11) #2 0x112bcced1 in _ZN3JSC14MachineThreads24tryCopyOtherThreadStacksERN3WTF6LockerINS1_5MutexEEEPvmPm (/Volumes/Data/slave/yosemite-asan-production-wk2-tests/build/buildToTest/JavaScriptCore.framework/Versions/A/JavaScriptCore+0x9c9ed1) #3 0x112bcd1c5 in _ZN3JSC14MachineThreads23gatherConservativeRootsERNS_17ConservativeRootsERNS_17JITStubRoutineSetERNS_12CodeBlockSetEPvRA37_i (/Volumes/Data/slave/yosemite-asan-production-wk2-tests/build/buildToTest/JavaScriptCore.framework/Versions/A/JavaScriptCore+0x9ca1c5) #4 0x112960236 in _ZN3JSC4Heap9markRootsEd (/Volumes/Data/slave/yosemite-asan-production-wk2-tests/build/buildToTest/JavaScriptCore.framework/Versions/A/JavaScriptCore+0x75d236) #5 0x112962185 in _ZN3JSC4Heap7collectENS_13HeapOperationE (/Volumes/Data/slave/yosemite-asan-production-wk2-tests/build/buildToTest/JavaScriptCore.framework/Versions/A/JavaScriptCore+0x75f185)
Mark Lam
Comment 11 2015-02-24 22:38:39 PST
(In reply to comment #10) > And it looks like this broke ASan tests. This is probably a false positive, > but we need to find a way to fix this ASAP. I’m looking into it.
David Kilzer (:ddkilzer)
Comment 12 2015-02-25 07:29:25 PST
Is ASan just complaining about use of overlapping memcpy? That is safe to ignore on Darwin platforms because the undefined behavior is well-defined to do the same thing as memmove. However, this will affect other platforms where overlapping memcpy behavior is NOT well-defined (perhaps Windows and Linux), so we should still fix this.
David Kilzer (:ddkilzer)
Comment 13 2015-02-25 07:31:51 PST
(In reply to comment #12) > Is ASan just complaining about use of overlapping memcpy? That is safe to > ignore on Darwin platforms because the undefined behavior is well-defined to > do the same thing as memmove. > > However, this will affect other platforms where overlapping memcpy behavior > is NOT well-defined (perhaps Windows and Linux), so we should still fix this. Nope, that's not the issue. Perhaps the ASan test bots are not using the blacklist? Tools/asan/webkit-asan-ignore.txt This particular issue was supposedly fixed here: ASan does not like JSC::MachineThreads::tryCopyOtherThreadStack https://bugs.webkit.org/show_bug.cgi?id=141672
Mark Lam
Comment 14 2015-02-25 08:04:20 PST
Strange, I can't reproduce this ASan issue with a local build at all.
Alexey Proskuryakov
Comment 15 2015-02-25 10:26:42 PST
The blacklist would only help if the violation was detected in tryCopyOtherThreadStack, but it is detected in memcpy. As far as I can tell, there is simply no way to use memcpy on potentially poisoned regions, like the whole stack.
Mark Lam
Comment 16 2015-02-25 16:34:22 PST
Alexey is right that ASan was not happy with JSC memcpy'ing the stack to be GC scanned. Strictly speaking, this is the issue reported in https://webkit.org/b/141672. I've fixed it by introducing our own asanUnsafeMemcpy() to be used in tryCopyOtherThreadStack(), and telling ASan to ignore asanUnsafeMemcpy(). The fix was landed in r180649: <http://trac.webkit.org/r180649> for https://webkit.org/b/141672. There is no further issue. Closing this bug.
Carlos Alberto Lopez Perez
Comment 17 2015-02-26 04:31:05 PST
(In reply to comment #8) > For the record, landed in r180602: <http://trac.webkit.org/r180602>. (In reply to comment #12) > However, this will affect other platforms where overlapping memcpy behavior > is NOT well-defined (perhaps Windows and Linux), so we should still fix this. This seems to have cause a regression on both GTK+ and EFL. Check bug 142041
Mark Lam
Comment 18 2015-02-26 08:53:16 PST
(In reply to comment #17) > (In reply to comment #8) > > For the record, landed in r180602: <http://trac.webkit.org/r180602>. > > (In reply to comment #12) > > However, this will affect other platforms where overlapping memcpy behavior > > is NOT well-defined (perhaps Windows and Linux), so we should still fix this. > > This seems to have cause a regression on both GTK+ and EFL. Check bug 142041 There may be a regression, but it should be about overlapping memory. That is because tryCopyOtherThreadStack() doesn’t memcpy to overlapping memory.
Mark Lam
Comment 19 2015-02-26 09:05:36 PST
(In reply to comment #18) > There may be a regression, but it should be about overlapping memory. That > is because tryCopyOtherThreadStack() doesn’t memcpy to overlapping memory. typo: … it should *not* be about overlapping memory.
Filip Pizlo
Comment 20 2015-02-26 10:05:04 PST
Am I understanding right that after this fix, every GC will scan every thread ever known to any VM? I realize that workers suck and all, but this seems like we're punishing them a bit too much. Moreover, JSC benchmark results aren't going to tell us much here; to my knowledge none of our benchmarks exercise workers. We could run the *real* pdfjs (not the Octane benchmark but the actual app/framework), which uses workers, to see what the effects of this are.
Mark Lam
Comment 21 2015-02-26 10:54:03 PST
Created attachment 247432 [details] patch for rolling out r180602 and associated revs.
Mark Lam
Comment 22 2015-02-26 11:38:46 PST
Re-opening to use a different solution that won’t add any perf impact when worker threads are present.
Mark Lam
Comment 23 2015-02-26 11:45:17 PST
r180602 (and associated revs) rolled out in r180690: <http://trac.webkit.org/r180690>.
Mark Lam
Comment 24 2015-02-26 14:47:34 PST
Created attachment 247448 [details] Approach 3: synchronize the construction and destruction of MachineThreads.
Mark Lam
Comment 25 2015-02-26 14:55:12 PST
Comment on attachment 247448 [details] Approach 3: synchronize the construction and destruction of MachineThreads. Filip pointed out there's an ABA problem. Will fix.
Mark Lam
Comment 26 2015-02-26 15:18:01 PST
Created attachment 247451 [details] patch with fix for Filip
Mark Lam
Comment 27 2015-02-26 17:27:52 PST
EWS bots are happy build for all ports. A local run of the fast/workers tests on a GTK build does not show the regressions from the previous implementation. Landed in r180716: <http://trac.webkit.org/r180716>.
Csaba Osztrogonác
Comment 28 2015-02-27 03:37:16 PST
(In reply to comment #27) > EWS bots are happy build for all ports. A local run of the fast/workers > tests on a GTK build does not show the regressions from the previous > implementation. > > Landed in r180716: <http://trac.webkit.org/r180716>. Thanks for the fix, everything works fine on EFL and GTK too.
Note You need to log in before you can comment on or make changes to this bug.