Summary: | MachineThreads::Thread clean up has a use after free race condition | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Mark Lam <mark.lam> | ||||||||||||||
Component: | JavaScriptCore | Assignee: | Mark Lam <mark.lam> | ||||||||||||||
Status: | RESOLVED FIXED | ||||||||||||||||
Severity: | Normal | CC: | ap, clopez, ddkilzer, fpizlo, ggaren, mhahnenb, mmirman, msaboff, oliver, ossy, webkit-bug-importer | ||||||||||||||
Priority: | P2 | Keywords: | InRadar | ||||||||||||||
Version: | 528+ (Nightly build) | ||||||||||||||||
Hardware: | Unspecified | ||||||||||||||||
OS: | Unspecified | ||||||||||||||||
Bug Depends on: | 142041 | ||||||||||||||||
Bug Blocks: | |||||||||||||||||
Attachments: |
|
Description
Mark Lam
2015-02-24 15:54:06 PST
Created attachment 247275 [details] patch for rolling out r179753 r179753 was rolled out in r180591: <http://trac.webkit.org/r180591>. The code for _pthread_tsd_cleanup_key() looks like this: _pthread_tsd_cleanup_key(pthread_t self, pthread_key_t key) { void (*destructor)(void *); if (_pthread_key_get_destructor(key, &destructor)) { void **ptr = &self->tsd[key]; void *value = *ptr; // At this point, this thread has cached the destructor and value (which is a MachineThreads*). // If the VM gets destructed (along with its MachineThreads registry) by another thread, // then this thread will have no way of knowing that the MachineThreads* is now pointing // to freed memory. Calling the destructor below will therefore result in a use after free // scenario when it tries to access the MachineThreads' data members. if (value) { *ptr = NULL; if (destructor) { destructor(value); } } } } The solution is simply to change MachineThreads from a per VM thread registry to a process global singleton thread registry i.e. the MachineThreads registry is now immortal and we cannot have a use after free scenario. The cost of this change is that all VM instances will scan stacks of all threads ever touched by a VM, and not just those that touched this specific VM. However, stacks tend to be shallow. Hence, those additional scans will tend to be cheap. Secondly, it is not common for there to be multiple JSC VMs in use concurrently on multiple threads. Hence, this cost should rarely manifest in real world applications. Created attachment 247286 [details]
the patch.
Created attachment 247287 [details]
JSC benchmark results
A benchmark run shows perf to be neutral with this change. Results attached.
Comment on attachment 247286 [details]
the patch.
r=me. Be sure to run the worker tests.
My local run of the layout test and JSC regression tests all passed without regressions, and I had landed the patch just before I saw the EWS bot complaints. I will re-run the layout test with -2 to see if there's an issue. If so, I'll rollout. For the record, landed in r180602: <http://trac.webkit.org/r180602>. I don't see any regressions with the wk2 run of the layout tests either. I'll consider this bug resolved. Build fixes: http://trac.webkit.org/changeset/180608 http://trac.webkit.org/changeset/180613 http://trac.webkit.org/changeset/180617 And it looks like this broke ASan tests. This is probably a false positive, but we need to find a way to fix this ASAP. #0 0x10849737b in wrap_memcpy (/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/6.2.0/lib/darwin/libclang_rt.asan_osx_dynamic.dylib+0x3837b) #1 0x112bccb11 in _ZN3JSC14MachineThreads23tryCopyOtherThreadStackEPNS0_6ThreadEPvmPm (/Volumes/Data/slave/yosemite-asan-production-wk2-tests/build/buildToTest/JavaScriptCore.framework/Versions/A/JavaScriptCore+0x9c9b11) #2 0x112bcced1 in _ZN3JSC14MachineThreads24tryCopyOtherThreadStacksERN3WTF6LockerINS1_5MutexEEEPvmPm (/Volumes/Data/slave/yosemite-asan-production-wk2-tests/build/buildToTest/JavaScriptCore.framework/Versions/A/JavaScriptCore+0x9c9ed1) #3 0x112bcd1c5 in _ZN3JSC14MachineThreads23gatherConservativeRootsERNS_17ConservativeRootsERNS_17JITStubRoutineSetERNS_12CodeBlockSetEPvRA37_i (/Volumes/Data/slave/yosemite-asan-production-wk2-tests/build/buildToTest/JavaScriptCore.framework/Versions/A/JavaScriptCore+0x9ca1c5) #4 0x112960236 in _ZN3JSC4Heap9markRootsEd (/Volumes/Data/slave/yosemite-asan-production-wk2-tests/build/buildToTest/JavaScriptCore.framework/Versions/A/JavaScriptCore+0x75d236) #5 0x112962185 in _ZN3JSC4Heap7collectENS_13HeapOperationE (/Volumes/Data/slave/yosemite-asan-production-wk2-tests/build/buildToTest/JavaScriptCore.framework/Versions/A/JavaScriptCore+0x75f185) (In reply to comment #10) > And it looks like this broke ASan tests. This is probably a false positive, > but we need to find a way to fix this ASAP. I’m looking into it. Is ASan just complaining about use of overlapping memcpy? That is safe to ignore on Darwin platforms because the undefined behavior is well-defined to do the same thing as memmove. However, this will affect other platforms where overlapping memcpy behavior is NOT well-defined (perhaps Windows and Linux), so we should still fix this. (In reply to comment #12) > Is ASan just complaining about use of overlapping memcpy? That is safe to > ignore on Darwin platforms because the undefined behavior is well-defined to > do the same thing as memmove. > > However, this will affect other platforms where overlapping memcpy behavior > is NOT well-defined (perhaps Windows and Linux), so we should still fix this. Nope, that's not the issue. Perhaps the ASan test bots are not using the blacklist? Tools/asan/webkit-asan-ignore.txt This particular issue was supposedly fixed here: ASan does not like JSC::MachineThreads::tryCopyOtherThreadStack https://bugs.webkit.org/show_bug.cgi?id=141672 Strange, I can't reproduce this ASan issue with a local build at all. The blacklist would only help if the violation was detected in tryCopyOtherThreadStack, but it is detected in memcpy. As far as I can tell, there is simply no way to use memcpy on potentially poisoned regions, like the whole stack. Alexey is right that ASan was not happy with JSC memcpy'ing the stack to be GC scanned. Strictly speaking, this is the issue reported in https://webkit.org/b/141672. I've fixed it by introducing our own asanUnsafeMemcpy() to be used in tryCopyOtherThreadStack(), and telling ASan to ignore asanUnsafeMemcpy(). The fix was landed in r180649: <http://trac.webkit.org/r180649> for https://webkit.org/b/141672. There is no further issue. Closing this bug. (In reply to comment #8) > For the record, landed in r180602: <http://trac.webkit.org/r180602>. (In reply to comment #12) > However, this will affect other platforms where overlapping memcpy behavior > is NOT well-defined (perhaps Windows and Linux), so we should still fix this. This seems to have cause a regression on both GTK+ and EFL. Check bug 142041 (In reply to comment #17) > (In reply to comment #8) > > For the record, landed in r180602: <http://trac.webkit.org/r180602>. > > (In reply to comment #12) > > However, this will affect other platforms where overlapping memcpy behavior > > is NOT well-defined (perhaps Windows and Linux), so we should still fix this. > > This seems to have cause a regression on both GTK+ and EFL. Check bug 142041 There may be a regression, but it should be about overlapping memory. That is because tryCopyOtherThreadStack() doesn’t memcpy to overlapping memory. (In reply to comment #18) > There may be a regression, but it should be about overlapping memory. That > is because tryCopyOtherThreadStack() doesn’t memcpy to overlapping memory. typo: … it should *not* be about overlapping memory. Am I understanding right that after this fix, every GC will scan every thread ever known to any VM? I realize that workers suck and all, but this seems like we're punishing them a bit too much. Moreover, JSC benchmark results aren't going to tell us much here; to my knowledge none of our benchmarks exercise workers. We could run the *real* pdfjs (not the Octane benchmark but the actual app/framework), which uses workers, to see what the effects of this are. Created attachment 247432 [details] patch for rolling out r180602 and associated revs. Re-opening to use a different solution that won’t add any perf impact when worker threads are present. r180602 (and associated revs) rolled out in r180690: <http://trac.webkit.org/r180690>. Created attachment 247448 [details]
Approach 3: synchronize the construction and destruction of MachineThreads.
Comment on attachment 247448 [details]
Approach 3: synchronize the construction and destruction of MachineThreads.
Filip pointed out there's an ABA problem. Will fix.
Created attachment 247451 [details]
patch with fix for Filip
EWS bots are happy build for all ports. A local run of the fast/workers tests on a GTK build does not show the regressions from the previous implementation. Landed in r180716: <http://trac.webkit.org/r180716>. (In reply to comment #27) > EWS bots are happy build for all ports. A local run of the fast/workers > tests on a GTK build does not show the regressions from the previous > implementation. > > Landed in r180716: <http://trac.webkit.org/r180716>. Thanks for the fix, everything works fine on EFL and GTK too. |