WebKit Bugzilla
New
Browse
Search+
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
141990
MachineThreads::Thread clean up has a use after free race condition
https://bugs.webkit.org/show_bug.cgi?id=141990
Summary
MachineThreads::Thread clean up has a use after free race condition
Mark Lam
Reported
2015-02-24 15:54:06 PST
Before
r179753
(
https://bugs.webkit.org/show_bug.cgi?id=141317
), there was a race condition that could result in a small window for a use after free scenario. After
r179753
, there is a memory leak. We will roll out
r179753
here and implement a fix for the race condition using a different approach.
Attachments
patch for rolling out r179753
(11.04 KB, patch)
2015-02-24 16:04 PST
,
Mark Lam
no flags
Details
Formatted Diff
Diff
the patch.
(8.49 KB, patch)
2015-02-24 17:48 PST
,
Mark Lam
msaboff
: review+
Details
Formatted Diff
Diff
JSC benchmark results
(51.06 KB, text/plain)
2015-02-24 17:49 PST
,
Mark Lam
no flags
Details
patch for rolling out r180602 and associated revs.
(11.06 KB, patch)
2015-02-26 10:54 PST
,
Mark Lam
no flags
Details
Formatted Diff
Diff
Approach 3: synchronize the construction and destruction of MachineThreads.
(6.60 KB, patch)
2015-02-26 14:47 PST
,
Mark Lam
mark.lam
: review-
Details
Formatted Diff
Diff
patch with fix for Filip
(9.08 KB, patch)
2015-02-26 15:18 PST
,
Mark Lam
fpizlo
: review+
Details
Formatted Diff
Diff
Show Obsolete
(5)
View All
Add attachment
proposed patch, testcase, etc.
Mark Lam
Comment 1
2015-02-24 16:04:55 PST
Created
attachment 247275
[details]
patch for rolling out
r179753
Mark Lam
Comment 2
2015-02-24 16:05:48 PST
r179753
was rolled out in
r180591
: <
http://trac.webkit.org/r180591
>.
Radar WebKit Bug Importer
Comment 3
2015-02-24 16:07:20 PST
<
rdar://problem/19945725
>
Mark Lam
Comment 4
2015-02-24 17:10:09 PST
The code for _pthread_tsd_cleanup_key() looks like this: _pthread_tsd_cleanup_key(pthread_t self, pthread_key_t key) { void (*destructor)(void *); if (_pthread_key_get_destructor(key, &destructor)) { void **ptr = &self->tsd[key]; void *value = *ptr; // At this point, this thread has cached the destructor and value (which is a MachineThreads*). // If the VM gets destructed (along with its MachineThreads registry) by another thread, // then this thread will have no way of knowing that the MachineThreads* is now pointing // to freed memory. Calling the destructor below will therefore result in a use after free // scenario when it tries to access the MachineThreads' data members. if (value) { *ptr = NULL; if (destructor) { destructor(value); } } } } The solution is simply to change MachineThreads from a per VM thread registry to a process global singleton thread registry i.e. the MachineThreads registry is now immortal and we cannot have a use after free scenario. The cost of this change is that all VM instances will scan stacks of all threads ever touched by a VM, and not just those that touched this specific VM. However, stacks tend to be shallow. Hence, those additional scans will tend to be cheap. Secondly, it is not common for there to be multiple JSC VMs in use concurrently on multiple threads. Hence, this cost should rarely manifest in real world applications.
Mark Lam
Comment 5
2015-02-24 17:48:30 PST
Created
attachment 247286
[details]
the patch.
Mark Lam
Comment 6
2015-02-24 17:49:39 PST
Created
attachment 247287
[details]
JSC benchmark results A benchmark run shows perf to be neutral with this change. Results attached.
Michael Saboff
Comment 7
2015-02-24 18:15:11 PST
Comment on
attachment 247286
[details]
the patch. r=me. Be sure to run the worker tests.
Mark Lam
Comment 8
2015-02-24 18:33:39 PST
My local run of the layout test and JSC regression tests all passed without regressions, and I had landed the patch just before I saw the EWS bot complaints. I will re-run the layout test with -2 to see if there's an issue. If so, I'll rollout. For the record, landed in
r180602
: <
http://trac.webkit.org/r180602
>.
Mark Lam
Comment 9
2015-02-24 18:54:49 PST
I don't see any regressions with the wk2 run of the layout tests either. I'll consider this bug resolved.
Alexey Proskuryakov
Comment 10
2015-02-24 21:58:39 PST
Build fixes:
http://trac.webkit.org/changeset/180608
http://trac.webkit.org/changeset/180613
http://trac.webkit.org/changeset/180617
And it looks like this broke ASan tests. This is probably a false positive, but we need to find a way to fix this ASAP. #0 0x10849737b in wrap_memcpy (/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/6.2.0/lib/darwin/libclang_rt.asan_osx_dynamic.dylib+0x3837b) #1 0x112bccb11 in _ZN3JSC14MachineThreads23tryCopyOtherThreadStackEPNS0_6ThreadEPvmPm (/Volumes/Data/slave/yosemite-asan-production-wk2-tests/build/buildToTest/JavaScriptCore.framework/Versions/A/JavaScriptCore+0x9c9b11) #2 0x112bcced1 in _ZN3JSC14MachineThreads24tryCopyOtherThreadStacksERN3WTF6LockerINS1_5MutexEEEPvmPm (/Volumes/Data/slave/yosemite-asan-production-wk2-tests/build/buildToTest/JavaScriptCore.framework/Versions/A/JavaScriptCore+0x9c9ed1) #3 0x112bcd1c5 in _ZN3JSC14MachineThreads23gatherConservativeRootsERNS_17ConservativeRootsERNS_17JITStubRoutineSetERNS_12CodeBlockSetEPvRA37_i (/Volumes/Data/slave/yosemite-asan-production-wk2-tests/build/buildToTest/JavaScriptCore.framework/Versions/A/JavaScriptCore+0x9ca1c5) #4 0x112960236 in _ZN3JSC4Heap9markRootsEd (/Volumes/Data/slave/yosemite-asan-production-wk2-tests/build/buildToTest/JavaScriptCore.framework/Versions/A/JavaScriptCore+0x75d236) #5 0x112962185 in _ZN3JSC4Heap7collectENS_13HeapOperationE (/Volumes/Data/slave/yosemite-asan-production-wk2-tests/build/buildToTest/JavaScriptCore.framework/Versions/A/JavaScriptCore+0x75f185)
Mark Lam
Comment 11
2015-02-24 22:38:39 PST
(In reply to
comment #10
)
> And it looks like this broke ASan tests. This is probably a false positive, > but we need to find a way to fix this ASAP.
I’m looking into it.
David Kilzer (:ddkilzer)
Comment 12
2015-02-25 07:29:25 PST
Is ASan just complaining about use of overlapping memcpy? That is safe to ignore on Darwin platforms because the undefined behavior is well-defined to do the same thing as memmove. However, this will affect other platforms where overlapping memcpy behavior is NOT well-defined (perhaps Windows and Linux), so we should still fix this.
David Kilzer (:ddkilzer)
Comment 13
2015-02-25 07:31:51 PST
(In reply to
comment #12
)
> Is ASan just complaining about use of overlapping memcpy? That is safe to > ignore on Darwin platforms because the undefined behavior is well-defined to > do the same thing as memmove. > > However, this will affect other platforms where overlapping memcpy behavior > is NOT well-defined (perhaps Windows and Linux), so we should still fix this.
Nope, that's not the issue. Perhaps the ASan test bots are not using the blacklist? Tools/asan/webkit-asan-ignore.txt This particular issue was supposedly fixed here: ASan does not like JSC::MachineThreads::tryCopyOtherThreadStack
https://bugs.webkit.org/show_bug.cgi?id=141672
Mark Lam
Comment 14
2015-02-25 08:04:20 PST
Strange, I can't reproduce this ASan issue with a local build at all.
Alexey Proskuryakov
Comment 15
2015-02-25 10:26:42 PST
The blacklist would only help if the violation was detected in tryCopyOtherThreadStack, but it is detected in memcpy. As far as I can tell, there is simply no way to use memcpy on potentially poisoned regions, like the whole stack.
Mark Lam
Comment 16
2015-02-25 16:34:22 PST
Alexey is right that ASan was not happy with JSC memcpy'ing the stack to be GC scanned. Strictly speaking, this is the issue reported in
https://webkit.org/b/141672
. I've fixed it by introducing our own asanUnsafeMemcpy() to be used in tryCopyOtherThreadStack(), and telling ASan to ignore asanUnsafeMemcpy(). The fix was landed in
r180649
: <
http://trac.webkit.org/r180649
> for
https://webkit.org/b/141672
. There is no further issue. Closing this bug.
Carlos Alberto Lopez Perez
Comment 17
2015-02-26 04:31:05 PST
(In reply to
comment #8
)
> For the record, landed in
r180602
: <
http://trac.webkit.org/r180602
>.
(In reply to
comment #12
)
> However, this will affect other platforms where overlapping memcpy behavior > is NOT well-defined (perhaps Windows and Linux), so we should still fix this.
This seems to have cause a regression on both GTK+ and EFL. Check
bug 142041
Mark Lam
Comment 18
2015-02-26 08:53:16 PST
(In reply to
comment #17
)
> (In reply to
comment #8
) > > For the record, landed in
r180602
: <
http://trac.webkit.org/r180602
>. > > (In reply to
comment #12
) > > However, this will affect other platforms where overlapping memcpy behavior > > is NOT well-defined (perhaps Windows and Linux), so we should still fix this. > > This seems to have cause a regression on both GTK+ and EFL. Check
bug 142041
There may be a regression, but it should be about overlapping memory. That is because tryCopyOtherThreadStack() doesn’t memcpy to overlapping memory.
Mark Lam
Comment 19
2015-02-26 09:05:36 PST
(In reply to
comment #18
)
> There may be a regression, but it should be about overlapping memory. That > is because tryCopyOtherThreadStack() doesn’t memcpy to overlapping memory.
typo: … it should *not* be about overlapping memory.
Filip Pizlo
Comment 20
2015-02-26 10:05:04 PST
Am I understanding right that after this fix, every GC will scan every thread ever known to any VM? I realize that workers suck and all, but this seems like we're punishing them a bit too much. Moreover, JSC benchmark results aren't going to tell us much here; to my knowledge none of our benchmarks exercise workers. We could run the *real* pdfjs (not the Octane benchmark but the actual app/framework), which uses workers, to see what the effects of this are.
Mark Lam
Comment 21
2015-02-26 10:54:03 PST
Created
attachment 247432
[details]
patch for rolling out
r180602
and associated revs.
Mark Lam
Comment 22
2015-02-26 11:38:46 PST
Re-opening to use a different solution that won’t add any perf impact when worker threads are present.
Mark Lam
Comment 23
2015-02-26 11:45:17 PST
r180602
(and associated revs) rolled out in
r180690
: <
http://trac.webkit.org/r180690
>.
Mark Lam
Comment 24
2015-02-26 14:47:34 PST
Created
attachment 247448
[details]
Approach 3: synchronize the construction and destruction of MachineThreads.
Mark Lam
Comment 25
2015-02-26 14:55:12 PST
Comment on
attachment 247448
[details]
Approach 3: synchronize the construction and destruction of MachineThreads. Filip pointed out there's an ABA problem. Will fix.
Mark Lam
Comment 26
2015-02-26 15:18:01 PST
Created
attachment 247451
[details]
patch with fix for Filip
Mark Lam
Comment 27
2015-02-26 17:27:52 PST
EWS bots are happy build for all ports. A local run of the fast/workers tests on a GTK build does not show the regressions from the previous implementation. Landed in
r180716
: <
http://trac.webkit.org/r180716
>.
Csaba Osztrogonác
Comment 28
2015-02-27 03:37:16 PST
(In reply to
comment #27
)
> EWS bots are happy build for all ports. A local run of the fast/workers > tests on a GTK build does not show the regressions from the previous > implementation. > > Landed in
r180716
: <
http://trac.webkit.org/r180716
>.
Thanks for the fix, everything works fine on EFL and GTK too.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug