RESOLVED FIXED 141290
[ARM][Linux] GC sometimes stuck in an infinite loop if parallel GC is enabled
https://bugs.webkit.org/show_bug.cgi?id=141290
Summary [ARM][Linux] GC sometimes stuck in an infinite loop if parallel GC is enabled
Csaba Osztrogonác
Reported 2015-02-05 09:25:54 PST
GTK bot: https://build.webkit.org/builders/GTK%20Linux%20ARM%20Release?numbuilds=200 EFL bot: http://build.webkit.sed.hu/builders/EFL%20ARMv7%20Linux%20Release%20%28Build%29?numbuilds=200 2-3-4 tests regularly fail with timeout on ARM Thumb2 Linux bots long long time ago. Unfortunately there were a long period when ~200 tests failed on the ARM Thumb2 EFL bot, that's why it wasn't so easy to found what and when happened. But fortunately it is quite easy to reproduce the bug on - jsc-layout-tests.yaml/js/script-tests/basic-set.js - jsc-layout-tests.yaml/js/script-tests/basic-map.js They fail - stuck in an infinite loop - once from 10-15 runs. GDB showed that it always happen somewhere in GC code. I checked the bot history thoroughly too and I found that this kind of timeout problems started when EFL port enabled parallel GC - http://trac.webkit.org/changeset/165527 And I can confirm that the problem goes away with disabled parallel GC. (There is no problem on ARM traditional, because it doesn't support weakCompareAndSwap yet (bug127679), so paralllel GC is still disabled on it.)
Attachments
Patch (1.33 KB, patch)
2015-02-06 02:07 PST, Csaba Osztrogonác
no flags
proposed patch for discussion only (2.77 KB, patch)
2015-02-24 11:51 PST, Tobias Netzel
no flags
Patch (1.35 KB, patch)
2015-03-12 05:01 PDT, Csaba Osztrogonác
no flags
Csaba Osztrogonác
Comment 1 2015-02-05 09:28:03 PST
I think we should disable parallel GC on ARM Linux Thumb2 platforms until we find and fix the real bug.
Csaba Osztrogonác
Comment 2 2015-02-05 09:31:27 PST
Just a note: I disabled parallel GC on EFL ARM Thumb2 bots locally to see more (at least 8-10) test runs before disabling it in trunk. - http://build.webkit.sed.hu/builders/EFL%20ARMv7%20Linux%20Release%20%28Build%29?numbuilds=200 (disabled since #8180) - http://build.webkit.sed.hu/builders/EFL%20ARMv7-Thumb2%20Linux%20Release%20%28Build%29%201404?numbuilds=200 (disabled since #269)
Csaba Osztrogonác
Comment 3 2015-02-06 02:00:59 PST
notes: - EFL enabled parallel GC in http://trac.webkit.org/changeset/165527 - GTK enabled parallel GC in http://trac.webkit.org/changeset/121869
Csaba Osztrogonác
Comment 4 2015-02-06 02:07:27 PST
Created attachment 246156 [details] Patch I propose disabling parallel GC temporarily on Linux/Thumb2 until we can fix the bug in GC.
Csaba Osztrogonác
Comment 5 2015-02-07 23:54:27 PST
Comment on attachment 246156 [details] Patch Clearing flags on attachment: 246156 Committed r179795: <http://trac.webkit.org/changeset/179795>
Geoffrey Garen
Comment 6 2015-02-12 11:21:43 PST
Now that parallel GC is disabled, who is going to investigate the failure, so we can re-enable it?
Csaba Osztrogonác
Comment 7 2015-02-12 11:34:35 PST
(In reply to comment #6) > Now that parallel GC is disabled, who is going to investigate the failure, > so we can re-enable it? The bug is assigned to me, so I'm going to investigate the failure in the near future and will re-enable once this bug is fixed.
Tobias Netzel
Comment 8 2015-02-13 13:04:04 PST
It might be interesting to know that in my PowerPC fork of WebKit I'm experiencing similar (if not identical) problems on multicore systems. Important similarities between my fork and the ARM Linux ports seem to be (I don't know the Linux ports and I might be wrong in my assumptions): - similar level of out-of-order execution (and generally both are RISC architectures) - threading API is pthreads exclusively - built using GCC Maybe that gives a hint.
Tobias Netzel
Comment 9 2015-02-18 13:35:49 PST
Maybe in CodeBlock::visitAggregate() after successfully passing "while (!WTF::weakCompareAndSwap(&m_visitAggregateHasBeenCalled, 0, 1));" there is missing a call to WTF::memoryBarrierAfterLock(). And in all places where m_visitAggregateHasBeenCalled is set to false again a call to WTF::memoryBarrierBeforeUnlock() might be necessary.
Geoffrey Garen
Comment 10 2015-02-18 15:37:44 PST
Does the hanging backtrace indicate a hang while marking a CodeBlock? Can someone post an example hang backtrace of all threads?
Tobias Netzel
Comment 11 2015-02-18 23:10:39 PST
Here is what I've got (It's from a third person so it's just an excerpt): Thread id: bf325f0 User stack: 130 ??? [0xb2fc] 130 _NSApplicationMain + 444 (in AppKit) [0x92b8229c] 130 -[NSApplication run] + 748 (in AppKit) [0x92bb18a0] 130 ??? [0x18c6c] 130 -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] + 116 (in AppKit) [0x92bb7c00] 130 __DPSNextEvent + 600 (in AppKit) [0x92bb8248] 130 _BlockUntilNextEventMatchingListInMode + 88 (in HIToolbox) [0x9020377c] 130 _ReceiveNextEventCommon + 264 (in HIToolbox) [0x902038a4] 130 _RunCurrentEventLoopInMode + 268 (in HIToolbox) [0x90203b18] 130 _CFRunLoopRunSpecific + 2972 (in CoreFoundation) [0x9229081c] 130 __ZN3JSC9HeapTimer12timerDidFireEP16__CFRunLoopTimerPv + 224 (in JavaScriptCore) [0x7fadf0] 130 __ZN3JSC18GCActivityCallback6doWorkEv + 184 (in JavaScriptCore) [0x7bead8] 130 __ZN3JSC4Heap7collectENS_13HeapOperationE + 676 (in JavaScriptCore) [0x7f80c4] 130 __ZN3JSC4Heap9markRootsEd + 828 (in JavaScriptCore) [0x7f6b9c] 130 __ZN3JSC11SlotVisitor15drainFromSharedENS0_15SharedDrainModeE + 536 (in JavaScriptCore) [0xaaadc8] 130 ___semwait_signal + 12 (in libSystem.B.dylib) [0x93f06a8c] Thread id: a2dcaf8 User stack: 130 __pthread_start + 320 (in libSystem.B.dylib) [0x93f41f74] 130 __ZN3WTF25TCMalloc_Central_FreeList14FetchFromSpansEv + 0 (in JavaScriptCore) [0xb360b0] 130 __ZN3WTF17TCMalloc_PageHeap15scavengerThreadEv + 136 (in JavaScriptCore) [0xb36048] 130 ___semwait_signal + 12 (in libSystem.B.dylib) [0x93f06a8c] Thread id: a646d40 User stack: 130 __pthread_start + 320 (in libSystem.B.dylib) [0x93f41f74] 130 __ZN3WTFL19wtfThreadEntryPointEPv + 44 (in JavaScriptCore) [0xb6239c] 130 __ZN7WebCore12IconDatabase22iconDatabaseSyncThreadEv + 564 (in WebCore) [0x2dbd3e4] 130 __ZN7WebCore12IconDatabase18syncThreadMainLoopEv + 524 (in WebCore) [0x2dbd0bc] 130 ___semwait_signal + 12 (in libSystem.B.dylib) [0x93f06a8c] Thread id: a9bdd40 User stack: 130 __pthread_start + 320 (in libSystem.B.dylib) [0x93f41f74] 130 __Z22CFURLCacheWorkerThreadPv + 292 (in CFNetwork) [0x96062394] 130 _CFRunLoopRunSpecific + 1816 (in CoreFoundation) [0x92290398] 130 _mach_msg_trap + 8 (in libSystem.B.dylib) [0x93f00078] Thread id: bf31000 User stack: 130 __pthread_start + 320 (in libSystem.B.dylib) [0x93f41f74] 130 __ZN3WTFL19wtfThreadEntryPointEPv + 44 (in JavaScriptCore) [0xb6239c] 130 __ZN3WTFL16compatEntryPointEPv + 44 (in JavaScriptCore) [0xb61edc] 130 ??? [0x324a4] 130 ??? [0x32518] 130 _CFRunLoopRunSpecific + 1816 (in CoreFoundation) [0x92290398] 130 _mach_msg_trap + 8 (in libSystem.B.dylib) [0x93f00078] Thread id: bf33838 User stack: 130 __pthread_start + 320 (in libSystem.B.dylib) [0x93f41f74] 130 __ZN3WTFL19wtfThreadEntryPointEPv + 44 (in JavaScriptCore) [0xb6239c] 130 __ZN3JSC14BlockAllocator22blockFreeingThreadMainEv + 260 (in JavaScriptCore) [0x753074] 130 ___semwait_signal + 12 (in libSystem.B.dylib) [0x93f06a8c] Thread id: 9bf20e8 User stack: 130 __pthread_start + 320 (in libSystem.B.dylib) [0x93f41f74] 130 __ZN3WTFL19wtfThreadEntryPointEPv + 44 (in JavaScriptCore) [0xb6239c] 130 __ZN3JSC8GCThread12gcThreadMainEv + 140 (in JavaScriptCore) [0x7ea6ec] 130 __ZN3JSC11SlotVisitor15drainFromSharedENS0_15SharedDrainModeE + 432 (in JavaScriptCore) [0xaaad60] 130 __ZN3JSC11SlotVisitor5drainEv + 944 (in JavaScriptCore) [0xaaa520] 77 __ZN3JSC13JSFinalObject13visitChildrenEPNS_6JSCellERNS_11SlotVisitorE + 5232 (in JavaScriptCore) [0x8d7890] 29 __ZN3JSC13JSFinalObject13visitChildrenEPNS_6JSCellERNS_11SlotVisitorE + 5256 (in JavaScriptCore) [0x8d78a8] 20 __ZN3JSC13JSFinalObject13visitChildrenEPNS_6JSCellERNS_11SlotVisitorE + 5240 (in JavaScriptCore) [0x8d7898] 3 __ZN3JSC13JSFinalObject13visitChildrenEPNS_6JSCellERNS_11SlotVisitorE + 5248 (in JavaScriptCore) [0x8d78a0] 1 __ZN3JSC13JSFinalObject13visitChildrenEPNS_6JSCellERNS_11SlotVisitorE + 5244 (in JavaScriptCore) [0x8d789c] Thread id: b0fa248 User stack: 130 __pthread_start + 320 (in libSystem.B.dylib) [0x93f41f74] 130 __ZN3WTFL19wtfThreadEntryPointEPv + 44 (in JavaScriptCore) [0xb6239c] 130 __ZN3JSC8GCThread12gcThreadMainEv + 140 (in JavaScriptCore) [0x7ea6ec] 130 __ZN3JSC11SlotVisitor15drainFromSharedENS0_15SharedDrainModeE + 336 (in JavaScriptCore) [0xaaad00] 130 ___semwait_signal + 12 (in libSystem.B.dylib) [0x93f06a8c] Thread id: bf31750 User stack: 130 __pthread_start + 320 (in libSystem.B.dylib) [0x93f41f74] 130 __ZN3WTFL19wtfThreadEntryPointEPv + 44 (in JavaScriptCore) [0xb6239c] 130 __ZN3JSC8GCThread12gcThreadMainEv + 140 (in JavaScriptCore) [0x7ea6ec] 130 __ZN3JSC11SlotVisitor15drainFromSharedENS0_15SharedDrainModeE + 336 (in JavaScriptCore) [0xaaad00] 130 ___semwait_signal + 12 (in libSystem.B.dylib) [0x93f06a8c] Thread id: 987c000 User stack: 130 __pthread_start + 320 (in libSystem.B.dylib) [0x93f41f74] 130 ___NSThread__main__ + 1008 (in Foundation) [0x951e8d88] 130 +[NSURLConnection(NSURLConnectionReallyInternal) _resourceLoadLoop:] + 284 (in Foundation) [0x9523fd54] 130 _CFRunLoopRunSpecific + 1816 (in CoreFoundation) [0x92290398] 130 _mach_msg_trap + 8 (in libSystem.B.dylib) [0x93f00078] Thread id: 9bf0af8 User stack: 130 __pthread_start + 320 (in libSystem.B.dylib) [0x93f41f74] 130 __ZN3WTFL19wtfThreadEntryPointEPv + 44 (in JavaScriptCore) [0xb6239c] 130 __ZN3WTFL16compatEntryPointEPv + 44 (in JavaScriptCore) [0xb61edc] 130 ??? [0x324a4] 130 ??? [0x32518] 130 _CFRunLoopRunSpecific + 1816 (in CoreFoundation) [0x92290398] 130 _mach_msg_trap + 8 (in libSystem.B.dylib) [0x93f00078] Thread id: 99260e8 User stack: 130 __pthread_start + 320 (in libSystem.B.dylib) [0x93f41f74] 130 __ZN9CAPThread5EntryEPS_ + 108 (in CoreAudio) [0x927a9de8] 130 __ZN10HALRunLoop9OwnThreadEPv + 216 (in CoreAudio) [0x927a9fac] 130 _CFRunLoopRunSpecific + 1816 (in CoreFoundation) [0x92290398] 130 _mach_msg_trap + 8 (in libSystem.B.dylib) [0x93f00078] Thread id: c0ac5f0 User stack: 130 __pthread_start + 320 (in libSystem.B.dylib) [0x93f41f74] 130 __ZN3WTFL19wtfThreadEntryPointEPv + 44 (in JavaScriptCore) [0xb6239c] 130 __ZN7WebCore13StorageThread16threadEntryPointEv + 300 (in WebCore) [0x317b0cc] 130 __ZN3WTF15ThreadCondition9timedWaitERNS_5MutexEd + 92 (in JavaScriptCore) [0xb6245c] 130 ___semwait_signal + 12 (in libSystem.B.dylib) [0x93f06a8c] Thread id: c0ad490 User stack: 130 __pthread_start + 320 (in libSystem.B.dylib) [0x93f41f74] 130 __ZN3WTFL19wtfThreadEntryPointEPv + 44 (in JavaScriptCore) [0xb6239c] 130 __ZN7WebCore13StorageThread16threadEntryPointEv + 300 (in WebCore) [0x317b0cc] 130 __ZN3WTF15ThreadCondition9timedWaitERNS_5MutexEd + 92 (in JavaScriptCore) [0xb6245c] 130 ___semwait_signal + 12 (in libSystem.B.dylib) [0x93f06a8c] Thread id: bf32248 User stack: 130 __pthread_start + 320 (in libSystem.B.dylib) [0x93f41f74] 130 __ZN3WTFL19wtfThreadEntryPointEPv + 44 (in JavaScriptCore) [0xb6239c] 130 __ZN3WTFL16compatEntryPointEPv + 44 (in JavaScriptCore) [0xb61edc] 130 ??? [0x49a1c] 130 ??? [0x49b0c] 130 ??? [0x1d5d34] 130 __ZN3WTF15ThreadCondition9timedWaitERNS_5MutexEd + 92 (in JavaScriptCore) [0xb6245c] 130 ___semwait_signal + 12 (in libSystem.B.dylib) [0x93f06a8c] Thread id: 91b13a8 User stack: 130 __pthread_start + 320 (in libSystem.B.dylib) [0x93f41f74] 130 _select$DARWIN_EXTSN + 12 (in libSystem.B.dylib) [0x93f63c14] Thread id: c26d5f0 User stack: 130 __pthread_start + 320 (in libSystem.B.dylib) [0x93f41f74] 130 __Z11CMMConvTaskPv + 52 (in ColorSync) [0x9347fbb0] 130 __Z20pthreadSemaphoreWaitP18t_pthreadSemaphore + 40 (in ColorSync) [0x9346c00c] 130 ___semwait_signal + 12 (in libSystem.B.dylib) [0x93f06a8c] Thread id: c26e490 User stack: 130 __pthread_start + 320 (in libSystem.B.dylib) [0x93f41f74] 130 __Z11CMMConvTaskPv + 52 (in ColorSync) [0x9347fbb0] 130 __Z20pthreadSemaphoreWaitP18t_pthreadSemaphore + 40 (in ColorSync) [0x9346c00c] 130 ___semwait_signal + 12 (in libSystem.B.dylib) [0x93f06a8c] Thread id: c0ab3a8 User stack: 130 __pthread_start + 320 (in libSystem.B.dylib) [0x93f41f74] 130 __Z11CMMConvTaskPv + 52 (in ColorSync) [0x9347fbb0] 130 __Z20pthreadSemaphoreWaitP18t_pthreadSemaphore + 40 (in ColorSync) [0x9346c00c] 130 ___semwait_signal + 12 (in libSystem.B.dylib) [0x93f06a8c]
Tobias Netzel
Comment 12 2015-02-19 03:42:33 PST
Keep in mind that since the hang backtrace is from my fork it may or may not be related to this bug.
Tobias Netzel
Comment 13 2015-02-19 03:47:45 PST
Also forgot to mention that my fork is currently based on the safari-600-5 branch.
Csaba Osztrogonác
Comment 14 2015-02-19 03:49:38 PST
(In reply to comment #10) > Does the hanging backtrace indicate a hang while marking a CodeBlock? Can > someone post an example hang backtrace of all threads? Sure, I'll post some ARM Linux backtraces later today.
Csaba Osztrogonác
Comment 15 2015-02-19 08:13:53 PST
(In reply to comment #14) > (In reply to comment #10) > > Does the hanging backtrace indicate a hang while marking a CodeBlock? Can > > someone post an example hang backtrace of all threads? > > Sure, I'll post some ARM Linux backtraces later today. $ jsc standalone-pre.js basic-map.js standalone-post.js This test normally ran in 1-2 seconds. After half minutes I attached to the process with gdb and dumped backtrace of all threads. And then continued and stopped. I got the same backtrace always. Thread 3 (Thread 0xb4bbb460 (LWP 14564)): #0 0xb6f41384 in __libc_do_syscall () from /lib/arm-linux-gnueabihf/libpthread.so.0 #1 0xb6f3cff6 in pthread_cond_wait@@GLIBC_2.4 () from /lib/arm-linux-gnueabihf/libpthread.so.0 #2 0xb6374a30 in std::condition_variable::wait(std::unique_lock<std::mutex>&) () at /home/rgabor/new_toolchain/compiler/.build/arm-szeged-linux-gnueabi/build/build-cc-final/arm-szeged-linux-gnueabi/libstdc++-v3/include/arm-szeged-linux-gnueabi/bits/gthr-default.h:864 #3 0xb6c94150 in JSC::BlockAllocator::blockFreeingThreadMain() () at ../../Source/JavaScriptCore/heap/BlockAllocator.cpp:144 #4 0xb6eaafd6 in std::_Function_handler<void (), WTF::createThread(void (*)(void*), void*, char const*)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () at ../../Source/WTF/wtf/Threading.cpp:81 #5 0xb6eab086 in WTF::threadEntryPoint(void*) () at /mnt/store/ARM/toolchain/hardfp/arm-szeged-linux-gnueabi-4.8.3-thumb2/arm-szeged-linux-gnueabi/include/c++/4.8.3/functional:2471 #6 0xb6ec4aa2 in WTF::wtfThreadEntryPoint(void*) () at ../../Source/WTF/wtf/ThreadingPthreads.cpp:170 #7 0xb6f39ed2 in start_thread () from /lib/arm-linux-gnueabihf/libpthread.so.0 #8 0xb6225338 in ?? () from /lib/arm-linux-gnueabihf/libc.so.6 #9 0xb6225338 in ?? () from /lib/arm-linux-gnueabihf/libc.so.6 Backtrace stopped: previous frame identical to this frame (corrupt stack?) Thread 2 (Thread 0xb1fff460 (LWP 14565)): #0 JSC::CodeBlock::stronglyVisitStrongReferences(JSC::SlotVisitor&) () at ../../Source/WTF/wtf/Atomics.h:317 #1 0xb6b51f02 in JSC::CodeBlock::visitAggregate(JSC::SlotVisitor&) () at ../../Source/JavaScriptCore/bytecode/CodeBlock.cpp:2261 #2 0xb6dc96e2 in JSC::EvalExecutable::visitChildren(JSC::JSCell*, JSC::SlotVisitor&) () at ../../Source/JavaScriptCore/runtime/Executable.cpp:437 #3 0xb6ca7ed2 in JSC::SlotVisitor::drain() () at ../../Source/JavaScriptCore/heap/SlotVisitor.cpp:104 #4 0xb6ca80fa in JSC::SlotVisitor::drainFromShared(JSC::SlotVisitor::SharedDrainMode) () at ../../Source/JavaScriptCore/heap/SlotVisitor.cpp:233 #5 0xb6c9a9cc in JSC::GCThread::gcThreadMain() () at ../../Source/JavaScriptCore/heap/GCThread.cpp:102 #6 0xb6eaafd6 in std::_Function_handler<void (), WTF::createThread(void (*)(void*), void*, char const*)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () at ../../Source/WTF/wtf/Threading.cpp:81 #7 0xb6eab086 in WTF::threadEntryPoint(void*) () at /mnt/store/ARM/toolchain/hardfp/arm-szeged-linux-gnueabi-4.8.3-thumb2/arm-szeged-linux-gnueabi/include/c++/4.8.3/functional:2471 #8 0xb6ec4aa2 in WTF::wtfThreadEntryPoint(void*) () at ../../Source/WTF/wtf/ThreadingPthreads.cpp:170 #9 0xb6f39ed2 in start_thread () from /lib/arm-linux-gnueabihf/libpthread.so.0 #10 0xb6225338 in ?? () from /lib/arm-linux-gnueabihf/libc.so.6 #11 0xb6225338 in ?? () from /lib/arm-linux-gnueabihf/libc.so.6 Backtrace stopped: previous frame identical to this frame (corrupt stack?) Thread 1 (Thread 0xb4c57000 (LWP 14563)): #0 0xb6f41384 in __libc_do_syscall () from /lib/arm-linux-gnueabihf/libpthread.so.0 #1 0xb6f3cff6 in pthread_cond_wait@@GLIBC_2.4 () from /lib/arm-linux-gnueabihf/libpthread.so.0 #2 0xb6374a30 in std::condition_variable::wait(std::unique_lock<std::mutex>&) () at /home/rgabor/new_toolchain/compiler/.build/arm-szeged-linux-gnueabi/build/build-cc-final/arm-szeged-linux-gnueabi/libstdc++-v3/include/arm-szeged-linux-gnueabi/bits/gthr-default.h:864 #3 0xb6ca8124 in JSC::SlotVisitor::drainFromShared(JSC::SlotVisitor::SharedDrainMode) () at ../../Source/JavaScriptCore/heap/SlotVisitor.cpp:212 #4 0xb6c9ce32 in JSC::Heap::markRoots(double) () at ../../Source/JavaScriptCore/heap/Heap.cpp:553 #5 0xb6ca06ca in JSC::Heap::collect(JSC::HeapOperation) () at ../../Source/JavaScriptCore/heap/Heap.cpp:1039 #6 0xb6ca0860 in JSC::Heap::collectAllGarbage() () at ../../Source/JavaScriptCore/heap/Heap.cpp:985 #7 0x0000e9a8 in functionGCAndSweep(JSC::ExecState*) () #8 0xb40f9aa8 in ?? () #9 0xb40f9aa8 in ?? () Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Csaba Osztrogonác
Comment 16 2015-02-19 08:15:36 PST
( Of course I tried on ToT (r180345) after reenabled parallel GC. )
Tobias Netzel
Comment 17 2015-02-19 10:14:17 PST
The backtraces seem to show the same situation to me.
Geoffrey Garen
Comment 18 2015-02-19 13:00:36 PST
The backtrace in comment 11 does not show a CodeBlock, but the backtrace in comment 15 does. The backtrace in comment 15 states that CodeBlock::visitAggregate made progress past the weakCompareAndSwap loop, which seems to contradict the missing fence theory. If we can reproduce the backtrace in comment 15, can we add some logging, or just manual stepping, to show exactly how we end up looping? For example, are we repeatedly calling visit on the same CodeBlock? CodeBlock.cpp:2261 is not inside any loop, so it's not clear why we're looping.
Tobias Netzel
Comment 19 2015-02-19 13:28:07 PST
I think it's the loop in SlotVisitor::drainFromShared() where SlotVisitor::drain() is called inside a loop. Regarding the fence the call to WTF::memoryBarrierBeforeUnlock() before setting m_visitAggregateHasBeenCalled to false might be needed to ensure that this doesn't actually happen before all of the critical instructions have been executed. For example in tryHashConsLock() and releaseHashConsLock() it's implemented the same way.
Tobias Netzel
Comment 20 2015-02-24 11:51:08 PST
Created attachment 247250 [details] proposed patch for discussion only I discovered an optimization issue where GCC optimizes away a pointer deref. This patch is not meant to be committed nor reviewed. Since I generally cannot build and test ToT someone else has to pick it up in order to fix the bug in trunk. I'm still waiting for some testers of my fork to confirm it works for them, so I can't yet tell whether it will indeed fix this bug. The memory barriers I added might or might not be needed but as far as I can tell spinlocks cannot be implemented correctly without them on (more aggressive) out-of-order-execution architectures. Please note that SamplingRegion::Locker does also lack the memory barriers. The comment in SamplingRegion::Locker::~Locker() even shows that WTF::weakCompareAndSwap at some point did include memory barriers. So I think that comment has to be corrected and the missing barriers have to added.
Tobias Netzel
Comment 21 2015-02-24 22:54:02 PST
My testers confirmed the patch indeed fixes the hangs.
Csaba Osztrogonác
Comment 22 2015-03-04 03:20:26 PST
Unfortunately this bug is valid on Aarch64 Linux too.
Tobias Netzel
Comment 23 2015-03-04 14:43:16 PST
(In reply to comment #22) > Unfortunately this bug is valid on Aarch64 Linux too. Why not try my proposed (and confirmedly working) patch? What would be easier than trying, refactoring and committing it?
Geoffrey Garen
Comment 24 2015-03-04 14:50:45 PST
I think we'd like to know more clearly why this patch works. There are a lot of threading problems that go away if you just perform more barriers -- even if the barrier is not specifically related to the problem. Do we have a specific explanation of the memory ordering problem we think we're fixing with this patch?
Tobias Netzel
Comment 25 2015-03-04 15:20:18 PST
Well, I already tried to explain the patch in comment #20. I definitely saw the GCC optimization issue in the disassembled code and it was definitely fixed by the cast to volatile. The cast to volatile was inspired by Atomic.h line 317. The memory barriers were inspired by bug114934 and by the fact that the comment in SamplingRegion::Locker::~Locker() expects weakCompareAndSwap to perform a memory barrier. To me it seems it rather needs to be proved that memory barriers aren't needed in the places where I added them (see the changelog entry from bug114934).
Geoffrey Garen
Comment 26 2015-03-04 17:10:40 PST
> I discovered an optimization issue where GCC optimizes away a pointer deref. Which pointer? What did GCC replace the deref with?
Tobias Netzel
Comment 27 2015-03-04 23:10:16 PST
I had to force GCC into dereferencing wordPtr like shown below because GCC would otherwise rely on the pointer's target to never change inside the loop and load it into a register only once before the loop: - oldValue = *wordPtr; + oldValue = *const_cast<volatile WordType*>(wordPtr);
Geoffrey Garen
Comment 28 2015-03-06 16:09:20 PST
There appears to be a bug in the inline asm added for GCC: #elif CPU(ARM64) && COMPILER(GCC) unsigned tmp; unsigned result; asm volatile( "mov %w1, #1\n\t" "ldxr %w2, [%0]\n\t" "cmp %w3, %w2\n\t" "b.ne 0f\n\t" "stxr %w1, %w4, [%0]\n\t" "0:" : "+r"(location), "=&r"(result), "=&r"(tmp) : "r"(expected), "r"(newValue) : "memory"); result = !result; The contents of the location pointed to by location do not look correctly annotated.
Tobias Netzel
Comment 29 2015-03-08 00:10:08 PST
Here the full loop from WTF::Bitmap::concurrentTestAndSet() where GCC assumes that the target of wordPtr wouldn't change inside the loop. WordType* wordPtr = bits.data() + index; WordType oldValue; do { - oldValue = *wordPtr; + oldValue = *const_cast<volatile WordType*>(wordPtr); if (oldValue & mask) return true; } while (!weakCompareAndSwap(wordPtr, oldValue, oldValue | mask)); The loop in WTF::Bitmap::concurrentTestAndClear() is nearly identical. Not forcing the compiler to deref the pointer each loop iteration seems an obvious bug to me.
Geoffrey Garen
Comment 30 2015-03-08 15:49:11 PDT
> Not forcing the compiler to deref the pointer each loop iteration seems an > obvious bug to me. weakCompareAndSwap is (supposed to be) annotated as clobbering memory. That should be enough to force a pointer deref each time through the loop. If you need volatile inside the loop in order to force a pointer deref, you either have a compiler bug or an incorrect implementation of weakCompareAndSwap. As stated above, it looks like you have the latter -- an incorrect implementation of weakCompareAndSwap.
Tobias Netzel
Comment 31 2015-03-09 00:10:27 PDT
> As stated above, it looks like you have the latter -- an incorrect > implementation of weakCompareAndSwap. Now that's potentially interesting for you, because the implementation of weakCompareAndSwap that is in use in my fork is the same as for the various ARM ports. It's the workaround for non-Intel architectures not having a direct 8-bit-CAS operation: weakCompareAndSwap(uint8_t* location, uint8_t expected, uint8_t newValue) in Atomics.h . Maybe location should be declared volatile in that function declaration?
Tobias Netzel
Comment 32 2015-03-09 14:40:20 PDT
Replacing inline bool weakCompareAndSwap(uint8_t* location, uint8_t expected, uint8_t newValue) in Atomics.h with inline bool weakCompareAndSwap(volatile uint8_t* location, uint8_t expected, uint8_t newValue) (added volatile to the declaration of location) is sufficient for GCC to deref the pointer each loop iteration; the disassembled code looks correct.
Mark Lam
Comment 33 2015-03-09 17:45:18 PDT
At least part of the issue (if not all) is caused by https://bugs.webkit.org/show_bug.cgi?id=142513. Once that fix is landed, please check if there's any additional issue that still remains.
Csaba Osztrogonác
Comment 34 2015-03-10 00:55:55 PDT
(In reply to comment #33) > At least part of the issue (if not all) is caused by > https://bugs.webkit.org/show_bug.cgi?id=142513. Once that fix is landed, > please check if there's any additional issue that still remains. Thanks, I'll check it today.
Tobias Netzel
Comment 35 2015-03-10 12:01:49 PDT
The change from r181319 (<http://trac.webkit.org/r181319>) makes GCC deref wordPtr each loop iteration, confirmed by analyzing the disassembly (PowerPC in my case).
Csaba Osztrogonác
Comment 36 2015-03-12 05:01:51 PDT
Created attachment 248509 [details] Patch r181319 ( bug142513 ) fixed the issue, thanks. Let's enable parallel GC again.
Carlos Garcia Campos
Comment 37 2015-03-12 05:39:07 PDT
Comment on attachment 248509 [details] Patch Ok, let's try again.
WebKit Commit Bot
Comment 38 2015-03-12 06:33:44 PDT
The commit-queue encountered the following flaky tests while processing attachment 248509 [details]: The commit-queue is continuing to process your patch.
WebKit Commit Bot
Comment 39 2015-03-12 06:34:15 PDT
The commit-queue encountered the following flaky tests while processing attachment 248509 [details]: transitions/default-timing-function.html bug 138901 (author: simon.fraser@apple.com) The commit-queue is continuing to process your patch.
Csaba Osztrogonác
Comment 40 2015-03-12 07:50:54 PDT
Comment on attachment 248509 [details] Patch Clearing flags on attachment: 248509 Committed r181436: <http://trac.webkit.org/changeset/181436>
Csaba Osztrogonác
Comment 41 2015-03-12 07:51:07 PDT
All reviewed patches have been landed. Closing bug.
Note You need to log in before you can comment on or make changes to this bug.