Bug 141290 - [ARM][Linux] GC sometimes stuck in an infinite loop if parallel GC is enabled
Summary: [ARM][Linux] GC sometimes stuck in an infinite loop if parallel GC is enabled
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P1 Critical
Assignee: Csaba Osztrogonác
URL:
Keywords:
Depends on: 142513
Blocks: 108645 90568 130177
  Show dependency treegraph
 
Reported: 2015-02-05 09:25 PST by Csaba Osztrogonác
Modified: 2015-03-12 07:51 PDT (History)
14 users (show)

See Also:


Attachments
Patch (1.33 KB, patch)
2015-02-06 02:07 PST, Csaba Osztrogonác
no flags Details | Formatted Diff | Diff
proposed patch for discussion only (2.77 KB, patch)
2015-02-24 11:51 PST, Tobias Netzel
no flags Details | Formatted Diff | Diff
Patch (1.35 KB, patch)
2015-03-12 05:01 PDT, Csaba Osztrogonác
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Csaba Osztrogonác 2015-02-05 09:25:54 PST
GTK bot: https://build.webkit.org/builders/GTK%20Linux%20ARM%20Release?numbuilds=200
EFL bot: http://build.webkit.sed.hu/builders/EFL%20ARMv7%20Linux%20Release%20%28Build%29?numbuilds=200

2-3-4 tests regularly fail with timeout on ARM Thumb2 Linux bots long long time ago.
Unfortunately there were a long period when ~200 tests failed on the ARM Thumb2 EFL
bot, that's why it wasn't so easy to found what and when happened.

But fortunately it is quite easy to reproduce the bug on
- jsc-layout-tests.yaml/js/script-tests/basic-set.js 
- jsc-layout-tests.yaml/js/script-tests/basic-map.js

They fail - stuck in an infinite loop - once from 10-15 runs.
GDB showed that it always happen somewhere in GC code.

I checked the bot history thoroughly too and I found that this kind of timeout
problems started when EFL port enabled parallel GC - http://trac.webkit.org/changeset/165527

And I can confirm that the problem goes away with disabled parallel GC.
(There is no problem on ARM traditional, because it doesn't support 
weakCompareAndSwap yet (bug127679), so paralllel GC is still disabled on it.)
Comment 1 Csaba Osztrogonác 2015-02-05 09:28:03 PST
I think we should disable parallel GC on ARM Linux Thumb2 
platforms until we find and fix the real bug.
Comment 2 Csaba Osztrogonác 2015-02-05 09:31:27 PST
Just a note: I disabled parallel GC on EFL ARM Thumb2 bots locally
to see more (at least 8-10) test runs before disabling it in trunk.

- http://build.webkit.sed.hu/builders/EFL%20ARMv7%20Linux%20Release%20%28Build%29?numbuilds=200
(disabled since #8180)
- http://build.webkit.sed.hu/builders/EFL%20ARMv7-Thumb2%20Linux%20Release%20%28Build%29%201404?numbuilds=200
(disabled since #269)
Comment 3 Csaba Osztrogonác 2015-02-06 02:00:59 PST
notes:
- EFL enabled parallel GC in http://trac.webkit.org/changeset/165527
- GTK enabled parallel GC in http://trac.webkit.org/changeset/121869
Comment 4 Csaba Osztrogonác 2015-02-06 02:07:27 PST
Created attachment 246156 [details]
Patch

I propose disabling parallel GC temporarily on Linux/Thumb2 until we can fix the bug in GC.
Comment 5 Csaba Osztrogonác 2015-02-07 23:54:27 PST
Comment on attachment 246156 [details]
Patch

Clearing flags on attachment: 246156

Committed r179795: <http://trac.webkit.org/changeset/179795>
Comment 6 Geoffrey Garen 2015-02-12 11:21:43 PST
Now that parallel GC is disabled, who is going to investigate the failure, so we can re-enable it?
Comment 7 Csaba Osztrogonác 2015-02-12 11:34:35 PST
(In reply to comment #6)
> Now that parallel GC is disabled, who is going to investigate the failure,
> so we can re-enable it?

The bug is assigned to me, so I'm going to investigate the failure 
in the near future and will re-enable once this bug is fixed.
Comment 8 Tobias Netzel 2015-02-13 13:04:04 PST
It might be interesting to know that in my PowerPC fork of WebKit I'm experiencing similar (if not identical) problems on multicore systems.
Important similarities between my fork and the ARM Linux ports seem to be (I don't know the Linux ports and I might be wrong in my assumptions):
- similar level of out-of-order execution (and generally both are RISC architectures)
- threading API is pthreads exclusively
- built using GCC
Maybe that gives a hint.
Comment 9 Tobias Netzel 2015-02-18 13:35:49 PST
Maybe in CodeBlock::visitAggregate() after successfully passing
"while (!WTF::weakCompareAndSwap(&m_visitAggregateHasBeenCalled, 0, 1));"
there is missing a call to WTF::memoryBarrierAfterLock().
And in all places where m_visitAggregateHasBeenCalled is set to false again a call to WTF::memoryBarrierBeforeUnlock() might be necessary.
Comment 10 Geoffrey Garen 2015-02-18 15:37:44 PST
Does the hanging backtrace indicate a hang while marking a CodeBlock? Can someone post an example hang backtrace of all threads?
Comment 11 Tobias Netzel 2015-02-18 23:10:39 PST
Here is what I've got (It's from a third person so it's just an excerpt):

  Thread id:      bf325f0
  User stack:
    130 ??? [0xb2fc]
      130 _NSApplicationMain + 444 (in AppKit) [0x92b8229c]
        130 -[NSApplication run] + 748 (in AppKit) [0x92bb18a0]
          130 ??? [0x18c6c]
            130 -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] + 116 (in AppKit) [0x92bb7c00]
              130 __DPSNextEvent + 600 (in AppKit) [0x92bb8248]
                130 _BlockUntilNextEventMatchingListInMode + 88 (in HIToolbox) [0x9020377c]
                  130 _ReceiveNextEventCommon + 264 (in HIToolbox) [0x902038a4]
                    130 _RunCurrentEventLoopInMode + 268 (in HIToolbox) [0x90203b18]
                      130 _CFRunLoopRunSpecific + 2972 (in CoreFoundation) [0x9229081c]
                        130 __ZN3JSC9HeapTimer12timerDidFireEP16__CFRunLoopTimerPv + 224 (in JavaScriptCore) [0x7fadf0]
                          130 __ZN3JSC18GCActivityCallback6doWorkEv + 184 (in JavaScriptCore) [0x7bead8]
                            130 __ZN3JSC4Heap7collectENS_13HeapOperationE + 676 (in JavaScriptCore) [0x7f80c4]
                              130 __ZN3JSC4Heap9markRootsEd + 828 (in JavaScriptCore) [0x7f6b9c]
                                130 __ZN3JSC11SlotVisitor15drainFromSharedENS0_15SharedDrainModeE + 536 (in JavaScriptCore) [0xaaadc8]
                                  130 ___semwait_signal + 12 (in libSystem.B.dylib) [0x93f06a8c]

  Thread id:      a2dcaf8
  User stack:
    130 __pthread_start + 320 (in libSystem.B.dylib) [0x93f41f74]
      130 __ZN3WTF25TCMalloc_Central_FreeList14FetchFromSpansEv + 0 (in JavaScriptCore) [0xb360b0]
        130 __ZN3WTF17TCMalloc_PageHeap15scavengerThreadEv + 136 (in JavaScriptCore) [0xb36048]
          130 ___semwait_signal + 12 (in libSystem.B.dylib) [0x93f06a8c]

  Thread id:      a646d40
  User stack:
    130 __pthread_start + 320 (in libSystem.B.dylib) [0x93f41f74]
      130 __ZN3WTFL19wtfThreadEntryPointEPv + 44 (in JavaScriptCore) [0xb6239c]
        130 __ZN7WebCore12IconDatabase22iconDatabaseSyncThreadEv + 564 (in WebCore) [0x2dbd3e4]
          130 __ZN7WebCore12IconDatabase18syncThreadMainLoopEv + 524 (in WebCore) [0x2dbd0bc]
            130 ___semwait_signal + 12 (in libSystem.B.dylib) [0x93f06a8c]

  Thread id:      a9bdd40
  User stack:
    130 __pthread_start + 320 (in libSystem.B.dylib) [0x93f41f74]
      130 __Z22CFURLCacheWorkerThreadPv + 292 (in CFNetwork) [0x96062394]
        130 _CFRunLoopRunSpecific + 1816 (in CoreFoundation) [0x92290398]
          130 _mach_msg_trap + 8 (in libSystem.B.dylib) [0x93f00078]

  Thread id:      bf31000
  User stack:
    130 __pthread_start + 320 (in libSystem.B.dylib) [0x93f41f74]
      130 __ZN3WTFL19wtfThreadEntryPointEPv + 44 (in JavaScriptCore) [0xb6239c]
        130 __ZN3WTFL16compatEntryPointEPv + 44 (in JavaScriptCore) [0xb61edc]
          130 ??? [0x324a4]
            130 ??? [0x32518]
              130 _CFRunLoopRunSpecific + 1816 (in CoreFoundation) [0x92290398]
                130 _mach_msg_trap + 8 (in libSystem.B.dylib) [0x93f00078]

  Thread id:      bf33838
  User stack:
    130 __pthread_start + 320 (in libSystem.B.dylib) [0x93f41f74]
      130 __ZN3WTFL19wtfThreadEntryPointEPv + 44 (in JavaScriptCore) [0xb6239c]
        130 __ZN3JSC14BlockAllocator22blockFreeingThreadMainEv + 260 (in JavaScriptCore) [0x753074]
          130 ___semwait_signal + 12 (in libSystem.B.dylib) [0x93f06a8c]

  Thread id:      9bf20e8
  User stack:
    130 __pthread_start + 320 (in libSystem.B.dylib) [0x93f41f74]
      130 __ZN3WTFL19wtfThreadEntryPointEPv + 44 (in JavaScriptCore) [0xb6239c]
        130 __ZN3JSC8GCThread12gcThreadMainEv + 140 (in JavaScriptCore) [0x7ea6ec]
          130 __ZN3JSC11SlotVisitor15drainFromSharedENS0_15SharedDrainModeE + 432 (in JavaScriptCore) [0xaaad60]
            130 __ZN3JSC11SlotVisitor5drainEv + 944 (in JavaScriptCore) [0xaaa520]
              77 __ZN3JSC13JSFinalObject13visitChildrenEPNS_6JSCellERNS_11SlotVisitorE + 5232 (in JavaScriptCore) [0x8d7890]
              29 __ZN3JSC13JSFinalObject13visitChildrenEPNS_6JSCellERNS_11SlotVisitorE + 5256 (in JavaScriptCore) [0x8d78a8]
              20 __ZN3JSC13JSFinalObject13visitChildrenEPNS_6JSCellERNS_11SlotVisitorE + 5240 (in JavaScriptCore) [0x8d7898]
              3 __ZN3JSC13JSFinalObject13visitChildrenEPNS_6JSCellERNS_11SlotVisitorE + 5248 (in JavaScriptCore) [0x8d78a0]
              1 __ZN3JSC13JSFinalObject13visitChildrenEPNS_6JSCellERNS_11SlotVisitorE + 5244 (in JavaScriptCore) [0x8d789c]

  Thread id:      b0fa248
  User stack:
    130 __pthread_start + 320 (in libSystem.B.dylib) [0x93f41f74]
      130 __ZN3WTFL19wtfThreadEntryPointEPv + 44 (in JavaScriptCore) [0xb6239c]
        130 __ZN3JSC8GCThread12gcThreadMainEv + 140 (in JavaScriptCore) [0x7ea6ec]
          130 __ZN3JSC11SlotVisitor15drainFromSharedENS0_15SharedDrainModeE + 336 (in JavaScriptCore) [0xaaad00]
            130 ___semwait_signal + 12 (in libSystem.B.dylib) [0x93f06a8c]

  Thread id:      bf31750
  User stack:
    130 __pthread_start + 320 (in libSystem.B.dylib) [0x93f41f74]
      130 __ZN3WTFL19wtfThreadEntryPointEPv + 44 (in JavaScriptCore) [0xb6239c]
        130 __ZN3JSC8GCThread12gcThreadMainEv + 140 (in JavaScriptCore) [0x7ea6ec]
          130 __ZN3JSC11SlotVisitor15drainFromSharedENS0_15SharedDrainModeE + 336 (in JavaScriptCore) [0xaaad00]
            130 ___semwait_signal + 12 (in libSystem.B.dylib) [0x93f06a8c]

  Thread id:      987c000
  User stack:
    130 __pthread_start + 320 (in libSystem.B.dylib) [0x93f41f74]
      130 ___NSThread__main__ + 1008 (in Foundation) [0x951e8d88]
        130 +[NSURLConnection(NSURLConnectionReallyInternal) _resourceLoadLoop:] + 284 (in Foundation) [0x9523fd54]
          130 _CFRunLoopRunSpecific + 1816 (in CoreFoundation) [0x92290398]
            130 _mach_msg_trap + 8 (in libSystem.B.dylib) [0x93f00078]

  Thread id:      9bf0af8
  User stack:
    130 __pthread_start + 320 (in libSystem.B.dylib) [0x93f41f74]
      130 __ZN3WTFL19wtfThreadEntryPointEPv + 44 (in JavaScriptCore) [0xb6239c]
        130 __ZN3WTFL16compatEntryPointEPv + 44 (in JavaScriptCore) [0xb61edc]
          130 ??? [0x324a4]
            130 ??? [0x32518]
              130 _CFRunLoopRunSpecific + 1816 (in CoreFoundation) [0x92290398]
                130 _mach_msg_trap + 8 (in libSystem.B.dylib) [0x93f00078]

  Thread id:      99260e8
  User stack:
    130 __pthread_start + 320 (in libSystem.B.dylib) [0x93f41f74]
      130 __ZN9CAPThread5EntryEPS_ + 108 (in CoreAudio) [0x927a9de8]
        130 __ZN10HALRunLoop9OwnThreadEPv + 216 (in CoreAudio) [0x927a9fac]
          130 _CFRunLoopRunSpecific + 1816 (in CoreFoundation) [0x92290398]
            130 _mach_msg_trap + 8 (in libSystem.B.dylib) [0x93f00078]

  Thread id:      c0ac5f0
  User stack:
    130 __pthread_start + 320 (in libSystem.B.dylib) [0x93f41f74]
      130 __ZN3WTFL19wtfThreadEntryPointEPv + 44 (in JavaScriptCore) [0xb6239c]
        130 __ZN7WebCore13StorageThread16threadEntryPointEv + 300 (in WebCore) [0x317b0cc]
          130 __ZN3WTF15ThreadCondition9timedWaitERNS_5MutexEd + 92 (in JavaScriptCore) [0xb6245c]
            130 ___semwait_signal + 12 (in libSystem.B.dylib) [0x93f06a8c]

  Thread id:      c0ad490
  User stack:
    130 __pthread_start + 320 (in libSystem.B.dylib) [0x93f41f74]
      130 __ZN3WTFL19wtfThreadEntryPointEPv + 44 (in JavaScriptCore) [0xb6239c]
        130 __ZN7WebCore13StorageThread16threadEntryPointEv + 300 (in WebCore) [0x317b0cc]
          130 __ZN3WTF15ThreadCondition9timedWaitERNS_5MutexEd + 92 (in JavaScriptCore) [0xb6245c]
            130 ___semwait_signal + 12 (in libSystem.B.dylib) [0x93f06a8c]

  Thread id:      bf32248
  User stack:
    130 __pthread_start + 320 (in libSystem.B.dylib) [0x93f41f74]
      130 __ZN3WTFL19wtfThreadEntryPointEPv + 44 (in JavaScriptCore) [0xb6239c]
        130 __ZN3WTFL16compatEntryPointEPv + 44 (in JavaScriptCore) [0xb61edc]
          130 ??? [0x49a1c]
            130 ??? [0x49b0c]
              130 ??? [0x1d5d34]
                130 __ZN3WTF15ThreadCondition9timedWaitERNS_5MutexEd + 92 (in JavaScriptCore) [0xb6245c]
                  130 ___semwait_signal + 12 (in libSystem.B.dylib) [0x93f06a8c]

  Thread id:      91b13a8
  User stack:
    130 __pthread_start + 320 (in libSystem.B.dylib) [0x93f41f74]
      130 _select$DARWIN_EXTSN + 12 (in libSystem.B.dylib) [0x93f63c14]

  Thread id:      c26d5f0
  User stack:
    130 __pthread_start + 320 (in libSystem.B.dylib) [0x93f41f74]
      130 __Z11CMMConvTaskPv + 52 (in ColorSync) [0x9347fbb0]
        130 __Z20pthreadSemaphoreWaitP18t_pthreadSemaphore + 40 (in ColorSync) [0x9346c00c]
          130 ___semwait_signal + 12 (in libSystem.B.dylib) [0x93f06a8c]

  Thread id:      c26e490
  User stack:
    130 __pthread_start + 320 (in libSystem.B.dylib) [0x93f41f74]
      130 __Z11CMMConvTaskPv + 52 (in ColorSync) [0x9347fbb0]
        130 __Z20pthreadSemaphoreWaitP18t_pthreadSemaphore + 40 (in ColorSync) [0x9346c00c]
          130 ___semwait_signal + 12 (in libSystem.B.dylib) [0x93f06a8c]

  Thread id:      c0ab3a8
  User stack:
    130 __pthread_start + 320 (in libSystem.B.dylib) [0x93f41f74]
      130 __Z11CMMConvTaskPv + 52 (in ColorSync) [0x9347fbb0]
        130 __Z20pthreadSemaphoreWaitP18t_pthreadSemaphore + 40 (in ColorSync) [0x9346c00c]
          130 ___semwait_signal + 12 (in libSystem.B.dylib) [0x93f06a8c]
Comment 12 Tobias Netzel 2015-02-19 03:42:33 PST
Keep in mind that since the hang backtrace is from my fork it may or may not be related to this bug.
Comment 13 Tobias Netzel 2015-02-19 03:47:45 PST
Also forgot to mention that my fork is currently based on the safari-600-5 branch.
Comment 14 Csaba Osztrogonác 2015-02-19 03:49:38 PST
(In reply to comment #10)
> Does the hanging backtrace indicate a hang while marking a CodeBlock? Can
> someone post an example hang backtrace of all threads?

Sure, I'll post some ARM Linux backtraces later today.
Comment 15 Csaba Osztrogonác 2015-02-19 08:13:53 PST
(In reply to comment #14)
> (In reply to comment #10)
> > Does the hanging backtrace indicate a hang while marking a CodeBlock? Can
> > someone post an example hang backtrace of all threads?
> 
> Sure, I'll post some ARM Linux backtraces later today.

$ jsc standalone-pre.js basic-map.js standalone-post.js

This test normally ran in 1-2 seconds. After half minutes I attached
to the process with gdb and dumped backtrace of all threads. And then
continued and stopped. I got the same backtrace always.

Thread 3 (Thread 0xb4bbb460 (LWP 14564)):
#0  0xb6f41384 in __libc_do_syscall () from /lib/arm-linux-gnueabihf/libpthread.so.0
#1  0xb6f3cff6 in pthread_cond_wait@@GLIBC_2.4 () from /lib/arm-linux-gnueabihf/libpthread.so.0
#2  0xb6374a30 in std::condition_variable::wait(std::unique_lock<std::mutex>&) ()
    at /home/rgabor/new_toolchain/compiler/.build/arm-szeged-linux-gnueabi/build/build-cc-final/arm-szeged-linux-gnueabi/libstdc++-v3/include/arm-szeged-linux-gnueabi/bits/gthr-default.h:864
#3  0xb6c94150 in JSC::BlockAllocator::blockFreeingThreadMain() () at ../../Source/JavaScriptCore/heap/BlockAllocator.cpp:144
#4  0xb6eaafd6 in std::_Function_handler<void (), WTF::createThread(void (*)(void*), void*, char const*)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () at ../../Source/WTF/wtf/Threading.cpp:81
#5  0xb6eab086 in WTF::threadEntryPoint(void*) () at /mnt/store/ARM/toolchain/hardfp/arm-szeged-linux-gnueabi-4.8.3-thumb2/arm-szeged-linux-gnueabi/include/c++/4.8.3/functional:2471
#6  0xb6ec4aa2 in WTF::wtfThreadEntryPoint(void*) () at ../../Source/WTF/wtf/ThreadingPthreads.cpp:170
#7  0xb6f39ed2 in start_thread () from /lib/arm-linux-gnueabihf/libpthread.so.0
#8  0xb6225338 in ?? () from /lib/arm-linux-gnueabihf/libc.so.6
#9  0xb6225338 in ?? () from /lib/arm-linux-gnueabihf/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 2 (Thread 0xb1fff460 (LWP 14565)):
#0  JSC::CodeBlock::stronglyVisitStrongReferences(JSC::SlotVisitor&) () at ../../Source/WTF/wtf/Atomics.h:317
#1  0xb6b51f02 in JSC::CodeBlock::visitAggregate(JSC::SlotVisitor&) () at ../../Source/JavaScriptCore/bytecode/CodeBlock.cpp:2261
#2  0xb6dc96e2 in JSC::EvalExecutable::visitChildren(JSC::JSCell*, JSC::SlotVisitor&) () at ../../Source/JavaScriptCore/runtime/Executable.cpp:437
#3  0xb6ca7ed2 in JSC::SlotVisitor::drain() () at ../../Source/JavaScriptCore/heap/SlotVisitor.cpp:104
#4  0xb6ca80fa in JSC::SlotVisitor::drainFromShared(JSC::SlotVisitor::SharedDrainMode) () at ../../Source/JavaScriptCore/heap/SlotVisitor.cpp:233
#5  0xb6c9a9cc in JSC::GCThread::gcThreadMain() () at ../../Source/JavaScriptCore/heap/GCThread.cpp:102
#6  0xb6eaafd6 in std::_Function_handler<void (), WTF::createThread(void (*)(void*), void*, char const*)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () at ../../Source/WTF/wtf/Threading.cpp:81
#7  0xb6eab086 in WTF::threadEntryPoint(void*) () at /mnt/store/ARM/toolchain/hardfp/arm-szeged-linux-gnueabi-4.8.3-thumb2/arm-szeged-linux-gnueabi/include/c++/4.8.3/functional:2471
#8  0xb6ec4aa2 in WTF::wtfThreadEntryPoint(void*) () at ../../Source/WTF/wtf/ThreadingPthreads.cpp:170
#9  0xb6f39ed2 in start_thread () from /lib/arm-linux-gnueabihf/libpthread.so.0
#10 0xb6225338 in ?? () from /lib/arm-linux-gnueabihf/libc.so.6
#11 0xb6225338 in ?? () from /lib/arm-linux-gnueabihf/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 1 (Thread 0xb4c57000 (LWP 14563)):
#0  0xb6f41384 in __libc_do_syscall () from /lib/arm-linux-gnueabihf/libpthread.so.0
#1  0xb6f3cff6 in pthread_cond_wait@@GLIBC_2.4 () from /lib/arm-linux-gnueabihf/libpthread.so.0
#2  0xb6374a30 in std::condition_variable::wait(std::unique_lock<std::mutex>&) ()
    at /home/rgabor/new_toolchain/compiler/.build/arm-szeged-linux-gnueabi/build/build-cc-final/arm-szeged-linux-gnueabi/libstdc++-v3/include/arm-szeged-linux-gnueabi/bits/gthr-default.h:864
#3  0xb6ca8124 in JSC::SlotVisitor::drainFromShared(JSC::SlotVisitor::SharedDrainMode) () at ../../Source/JavaScriptCore/heap/SlotVisitor.cpp:212
#4  0xb6c9ce32 in JSC::Heap::markRoots(double) () at ../../Source/JavaScriptCore/heap/Heap.cpp:553
#5  0xb6ca06ca in JSC::Heap::collect(JSC::HeapOperation) () at ../../Source/JavaScriptCore/heap/Heap.cpp:1039
#6  0xb6ca0860 in JSC::Heap::collectAllGarbage() () at ../../Source/JavaScriptCore/heap/Heap.cpp:985
#7  0x0000e9a8 in functionGCAndSweep(JSC::ExecState*) ()
#8  0xb40f9aa8 in ?? ()
#9  0xb40f9aa8 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Comment 16 Csaba Osztrogonác 2015-02-19 08:15:36 PST
( Of course I tried on ToT (r180345) after reenabled parallel GC. )
Comment 17 Tobias Netzel 2015-02-19 10:14:17 PST
The backtraces seem to show the same situation to me.
Comment 18 Geoffrey Garen 2015-02-19 13:00:36 PST
The backtrace in comment 11 does not show a CodeBlock, but the backtrace in comment 15 does.

The backtrace in comment 15 states that CodeBlock::visitAggregate made progress past the weakCompareAndSwap loop, which seems to contradict the missing fence theory.

If we can reproduce the backtrace in comment 15, can we add some logging, or just manual stepping, to show exactly how we end up looping? For example, are we repeatedly calling visit on the same CodeBlock? CodeBlock.cpp:2261 is not inside any loop, so it's not clear why we're looping.
Comment 19 Tobias Netzel 2015-02-19 13:28:07 PST
I think it's the loop in SlotVisitor::drainFromShared() where SlotVisitor::drain() is called inside a loop.

Regarding the fence the call to WTF::memoryBarrierBeforeUnlock() before setting m_visitAggregateHasBeenCalled to false might be needed to ensure that this doesn't actually happen before all of the critical instructions have been executed. For example in tryHashConsLock() and releaseHashConsLock() it's implemented the same way.
Comment 20 Tobias Netzel 2015-02-24 11:51:08 PST
Created attachment 247250 [details]
proposed patch for discussion only

I discovered an optimization issue where GCC optimizes away a pointer deref.
This patch is not meant to be committed nor reviewed.
Since I generally cannot build and test ToT someone else has to pick it up in order to fix the bug in trunk.
I'm still waiting for some testers of my fork to confirm it works for them, so I can't yet tell whether it will indeed fix this bug.
The memory barriers I added might or might not be needed but as far as I can tell spinlocks cannot be implemented correctly without them on (more aggressive) out-of-order-execution architectures.
Please note that SamplingRegion::Locker does also lack the memory barriers. The comment in SamplingRegion::Locker::~Locker() even shows that WTF::weakCompareAndSwap at some point did include memory barriers. So I think that comment has to be corrected and the missing barriers have to added.
Comment 21 Tobias Netzel 2015-02-24 22:54:02 PST
My testers confirmed the patch indeed fixes the hangs.
Comment 22 Csaba Osztrogonác 2015-03-04 03:20:26 PST
Unfortunately this bug is valid on Aarch64 Linux too.
Comment 23 Tobias Netzel 2015-03-04 14:43:16 PST
(In reply to comment #22)
> Unfortunately this bug is valid on Aarch64 Linux too.

Why not try my proposed (and confirmedly working) patch?
What would be easier than trying, refactoring and committing it?
Comment 24 Geoffrey Garen 2015-03-04 14:50:45 PST
I think we'd like to know more clearly why this patch works.

There are a lot of threading problems that go away if you just perform more barriers -- even if the barrier is not specifically related to the problem.

Do we have a specific explanation of the memory ordering problem we think we're fixing with this patch?
Comment 25 Tobias Netzel 2015-03-04 15:20:18 PST
Well, I already tried to explain the patch in comment #20.

I definitely saw the GCC optimization issue in the disassembled code and it was definitely fixed by the cast to volatile. The cast to volatile was inspired by Atomic.h line 317.

The memory barriers were inspired by bug114934 and by the fact that the comment in SamplingRegion::Locker::~Locker() expects weakCompareAndSwap to perform a memory barrier. To me it seems it rather needs to be proved that memory barriers aren't needed in the places where I added them (see the changelog entry from bug114934).
Comment 26 Geoffrey Garen 2015-03-04 17:10:40 PST
> I discovered an optimization issue where GCC optimizes away a pointer deref.

Which pointer?

What did GCC replace the deref with?
Comment 27 Tobias Netzel 2015-03-04 23:10:16 PST
I had to force GCC into dereferencing wordPtr like shown below because GCC would otherwise rely on the pointer's target to never change inside the loop and load it into a register only once before the loop:
- oldValue = *wordPtr;
+ oldValue = *const_cast<volatile WordType*>(wordPtr);
Comment 28 Geoffrey Garen 2015-03-06 16:09:20 PST
There appears to be a bug in the inline asm added for GCC:

#elif CPU(ARM64) && COMPILER(GCC)
    unsigned tmp;
    unsigned result;
    asm volatile(
        "mov %w1, #1\n\t"
        "ldxr %w2, [%0]\n\t"
        "cmp %w3, %w2\n\t"
        "b.ne 0f\n\t"
        "stxr %w1, %w4, [%0]\n\t"
        "0:"
        : "+r"(location), "=&r"(result), "=&r"(tmp)
        : "r"(expected), "r"(newValue)
        : "memory");
    result = !result;

The contents of the location pointed to by location do not look correctly annotated.
Comment 29 Tobias Netzel 2015-03-08 00:10:08 PST
Here the full loop from WTF::Bitmap::concurrentTestAndSet() where GCC assumes that the target of wordPtr wouldn't change inside the loop.

     WordType* wordPtr = bits.data() + index;
     WordType oldValue;
     do {
-        oldValue = *wordPtr;
+        oldValue = *const_cast<volatile WordType*>(wordPtr);
         if (oldValue & mask)
             return true;
     } while (!weakCompareAndSwap(wordPtr, oldValue, oldValue | mask));

The loop in WTF::Bitmap::concurrentTestAndClear() is nearly identical.
Not forcing the compiler to deref the pointer each loop iteration seems an obvious bug to me.
Comment 30 Geoffrey Garen 2015-03-08 15:49:11 PDT
> Not forcing the compiler to deref the pointer each loop iteration seems an
> obvious bug to me.

weakCompareAndSwap is (supposed to be) annotated as clobbering memory. That should be enough to force a pointer deref each time through the loop.

If you need volatile inside the loop in order to force a pointer deref, you either have a compiler bug or an incorrect implementation of weakCompareAndSwap.

As stated above, it looks like you have the latter -- an incorrect implementation of weakCompareAndSwap.
Comment 31 Tobias Netzel 2015-03-09 00:10:27 PDT
> As stated above, it looks like you have the latter -- an incorrect
> implementation of weakCompareAndSwap.

Now that's potentially interesting for you, because the implementation of weakCompareAndSwap that is in use in my fork is the same as for the various ARM ports.
It's the workaround for non-Intel architectures not having a direct 8-bit-CAS operation: weakCompareAndSwap(uint8_t* location, uint8_t expected, uint8_t newValue) in Atomics.h .
Maybe location should be declared volatile in that function declaration?
Comment 32 Tobias Netzel 2015-03-09 14:40:20 PDT
Replacing

  inline bool weakCompareAndSwap(uint8_t* location, uint8_t expected, uint8_t newValue)

in Atomics.h with

  inline bool weakCompareAndSwap(volatile uint8_t* location, uint8_t expected, uint8_t newValue)

(added volatile to the declaration of location) is sufficient for GCC to deref the pointer each loop iteration; the disassembled code looks correct.
Comment 33 Mark Lam 2015-03-09 17:45:18 PDT
At least part of the issue (if not all) is caused by https://bugs.webkit.org/show_bug.cgi?id=142513.  Once that fix is landed, please check if there's any additional issue that still remains.
Comment 34 Csaba Osztrogonác 2015-03-10 00:55:55 PDT
(In reply to comment #33)
> At least part of the issue (if not all) is caused by
> https://bugs.webkit.org/show_bug.cgi?id=142513.  Once that fix is landed,
> please check if there's any additional issue that still remains.

Thanks, I'll check it today.
Comment 35 Tobias Netzel 2015-03-10 12:01:49 PDT
The change from r181319 (<http://trac.webkit.org/r181319>) makes GCC deref wordPtr each loop iteration, confirmed by analyzing the disassembly (PowerPC in my case).
Comment 36 Csaba Osztrogonác 2015-03-12 05:01:51 PDT
Created attachment 248509 [details]
Patch

r181319 ( bug142513 ) fixed the issue, thanks. Let's enable parallel GC again.
Comment 37 Carlos Garcia Campos 2015-03-12 05:39:07 PDT
Comment on attachment 248509 [details]
Patch

Ok, let's try again.
Comment 38 WebKit Commit Bot 2015-03-12 06:33:44 PDT
The commit-queue encountered the following flaky tests while processing attachment 248509 [details]:

The commit-queue is continuing to process your patch.
Comment 39 WebKit Commit Bot 2015-03-12 06:34:15 PDT
The commit-queue encountered the following flaky tests while processing attachment 248509 [details]:

transitions/default-timing-function.html bug 138901 (author: simon.fraser@apple.com)
The commit-queue is continuing to process your patch.
Comment 40 Csaba Osztrogonác 2015-03-12 07:50:54 PDT
Comment on attachment 248509 [details]
Patch

Clearing flags on attachment: 248509

Committed r181436: <http://trac.webkit.org/changeset/181436>
Comment 41 Csaba Osztrogonác 2015-03-12 07:51:07 PDT
All reviewed patches have been landed.  Closing bug.