Bug 173744 - LayoutTest workers/wasm-hashset-many.html is a flaky crash (with no crashlog)
Summary: LayoutTest workers/wasm-hashset-many.html is a flaky crash (with no crashlog)
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Nobody
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2017-06-22 16:33 PDT by Ryan Haddad
Modified: 2017-08-10 09:53 PDT (History)
7 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Comment 1 Saam Barati 2017-06-27 11:41:08 PDT
I can reproduce a timeout 100% of the time by running:
./Tools/Scripts/run-webkit-tests LayoutTests/workers/wasm-hashset-many.html --repeat 150 --fully-parallel --child-processes 6 --no-retry-failure

However, the sample of the process isn't very telling. I'm not sure what's happening:
Call graph:
    986 Thread_18334665   DispatchQueue_1: com.apple.main-thread  (serial)
    + 986 start  (in libdyld.dylib) + 1  [0x7fffaa6f9235]
    +   986 main  (in com.apple.WebKit.WebContent.Development) + 492  [0x10bc29695]  XPCServiceMain.mm:148
    +     986 xpc_main  (in libxpc.dylib) + 494  [0x7fffaa9512e4]
    +       986 _xpc_objc_main  (in libxpc.dylib) + 775  [0x7fffaa9528c7]
    +         986 NSApplicationMain  (in AppKit) + 1237  [0x7fff92a06e0e]
    +           986 -[NSApplication run]  (in AppKit) + 926  [0x7fff92a3c3db]
    +             986 -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:]  (in AppKit) + 2796  [0x7fff931c37ee]
    +               986 _DPSNextEvent  (in AppKit) + 1120  [0x7fff92a47a54]
    +                 986 _BlockUntilNextEventMatchingListInModeWithFilter  (in HIToolbox) + 71  [0x7fff944aeb26]
    +                   986 ReceiveNextEventCommon  (in HIToolbox) + 432  [0x7fff944aecf1]
    +                     986 RunCurrentEventLoopInMode  (in HIToolbox) + 240  [0x7fff944aeebc]
    +                       986 CFRunLoopRunSpecific  (in CoreFoundation) + 420  [0x7fff94f4e114]
    +                         985 __CFRunLoopRun  (in CoreFoundation) + 1361  [0x7fff94f4e8c1]
    +                         ! 985 __CFRunLoopServiceMachPort  (in CoreFoundation) + 212  [0x7fff94f4f434]
    +                         !   985 mach_msg  (in libsystem_kernel.dylib) + 55  [0x7fffaa81f797]
    +                         !     985 mach_msg_trap  (in libsystem_kernel.dylib) + 10  [0x7fffaa82034a]
    +                         1 __CFRunLoopRun  (in CoreFoundation) + 2065  [0x7fff94f4eb81]
    +                           1 __CFRunLoopDoTimers  (in CoreFoundation) + 177  [0x7fff94f573c1]
    +                             1 CFArrayCreateMutable  (in CoreFoundation) + 113  [0x7fff94eeca11]
    +                               1 +[__NSPlaceholderArray mutablePlaceholder]  (in CoreFoundation) + 11  [0x7fff94eeca6b]
    986 Thread_18334777: com.apple.NSEventThread
    + 986 thread_start  (in libsystem_pthread.dylib) + 13  [0x7fffaa91208d]
    +   986 _pthread_start  (in libsystem_pthread.dylib) + 286  [0x7fffaa912887]
    +     986 _pthread_body  (in libsystem_pthread.dylib) + 180  [0x7fffaa91293b]
    +       986 _NSEventThread  (in AppKit) + 205  [0x7fff92b94f02]
    +         986 CFRunLoopRunSpecific  (in CoreFoundation) + 420  [0x7fff94f4e114]
    +           986 __CFRunLoopRun  (in CoreFoundation) + 1361  [0x7fff94f4e8c1]
    +             986 __CFRunLoopServiceMachPort  (in CoreFoundation) + 212  [0x7fff94f4f434]
    +               986 mach_msg  (in libsystem_kernel.dylib) + 55  [0x7fffaa81f797]
    +                 986 mach_msg_trap  (in libsystem_kernel.dylib) + 10  [0x7fffaa82034a]
    986 Thread_18334778
    + 986 thread_start  (in libsystem_pthread.dylib) + 13  [0x7fffaa91208d]
    +   986 _pthread_start  (in libsystem_pthread.dylib) + 286  [0x7fffaa912887]
    +     986 _pthread_body  (in libsystem_pthread.dylib) + 180  [0x7fffaa91293b]
    +       986 void* std::__1::__thread_proxy<std::__1::tuple<void (*)(bmalloc::AsyncTask<bmalloc::Heap, void (bmalloc::Heap::*)()>*), bmalloc::AsyncTask<bmalloc::Heap, void (bmalloc::Heap::*)()>*> >(void*)  (in JavaScriptCore) + 93  [0x1111f90cd]  __functional_base:416
    +         986 bmalloc::AsyncTask<bmalloc::Heap, void (bmalloc::Heap::*)()>::threadEntryPoint(bmalloc::AsyncTask<bmalloc::Heap, void (bmalloc::Heap::*)()>*)  (in JavaScriptCore) + 29  [0x1111f8bfd]  AsyncTask.h:114
    +           504 bmalloc::AsyncTask<bmalloc::Heap, void (bmalloc::Heap::*)()>::threadRunLoop()  (in JavaScriptCore) + 197  [0x1111f8d55]  condition_variable:235
    +           ! 504 std::__1::cv_status std::__1::condition_variable_any::wait_until<std::__1::unique_lock<bmalloc::Mutex>, std::__1::chrono::steady_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >(std::__1::unique_lock<bmalloc::Mutex>&, std::__1::chrono::time_point<std::__1::chrono::steady_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > > const&)  (in JavaScriptCore) + 116  [0x1111f8f14]  condition_variable:224
    +           !   504 std::__1::cv_status std::__1::condition_variable::wait_until<std::__1::chrono::steady_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >(std::__1::unique_lock<std::__1::mutex>&, std::__1::chrono::time_point<std::__1::chrono::steady_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > > const&)  (in JavaScriptCore) + 128  [0x1111f9040]  __mutex_base:388
    +           !     504 std::__1::condition_variable::__do_timed_wait(std::__1::unique_lock<std::__1::mutex>&, std::__1::chrono::time_point<std::__1::chrono::system_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)  (in libc++.1.dylib) + 126  [0x7fffa9291572]
    +           !       504 _pthread_cond_wait  (in libsystem_pthread.dylib) + 712  [0x7fffaa9137fa]
    +           !         504 __psynch_cvwait  (in libsystem_kernel.dylib) + 10  [0x7fffaa827bf2]
    +           481 bmalloc::AsyncTask<bmalloc::Heap, void (bmalloc::Heap::*)()>::threadRunLoop()  (in JavaScriptCore) + 107  [0x1111f8cfb]  condition_variable:211
    +           ! 481 void std::__1::condition_variable_any::wait<std::__1::unique_lock<bmalloc::Mutex> >(std::__1::unique_lock<bmalloc::Mutex>&)  (in JavaScriptCore) + 108  [0x1111f8dfc]  condition_variable:202
    +           !   481 std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&)  (in libc++.1.dylib) + 47  [0x7fffa92914cd]
    +           !     481 _pthread_cond_wait  (in libsystem_pthread.dylib) + 712  [0x7fffaa9137fa]
    +           !       481 __psynch_cvwait  (in libsystem_kernel.dylib) + 10  [0x7fffaa827bf2]
    +           1 bmalloc::AsyncTask<bmalloc::Heap, void (bmalloc::Heap::*)()>::threadRunLoop()  (in JavaScriptCore) + 50  [0x1111f8cc2]  AsyncTask.h:138
    +             1 bmalloc::Heap::concurrentScavenge()  (in JavaScriptCore) + 156  [0x1111f6fec]  Heap.cpp:135
    +               1 bmalloc::Heap::scavenge(std::__1::lock_guard<bmalloc::StaticMutex>&)  (in JavaScriptCore) + 656  [0x1111f73e0]  VMAllocate.h:188
    +                 1 madvise  (in libsystem_kernel.dylib) + 10  [0x7fffaa828e9e]
    986 Thread_18334800: com.apple.NSURLConnectionLoader
    + 986 thread_start  (in libsystem_pthread.dylib) + 13  [0x7fffaa91208d]
    +   986 _pthread_start  (in libsystem_pthread.dylib) + 286  [0x7fffaa912887]
    +     986 _pthread_body  (in libsystem_pthread.dylib) + 180  [0x7fffaa91293b]
    +       986 __NSThread__start__  (in Foundation) + 1243  [0x7fff969718ed]
    +         986 +[NSURLConnection(Loader) _resourceLoadLoop:]  (in CFNetwork) + 313  [0x7fff9408b734]
    +           986 CFRunLoopRunSpecific  (in CoreFoundation) + 420  [0x7fff94f4e114]
    +             986 __CFRunLoopRun  (in CoreFoundation) + 1361  [0x7fff94f4e8c1]
    +               986 __CFRunLoopServiceMachPort  (in CoreFoundation) + 212  [0x7fff94f4f434]
    +                 986 mach_msg  (in libsystem_kernel.dylib) + 55  [0x7fffaa81f797]
    +                   986 mach_msg_trap  (in libsystem_kernel.dylib) + 10  [0x7fffaa82034a]
    986 Thread_18334851: WebCore: Scrolling
    + 986 thread_start  (in libsystem_pthread.dylib) + 13  [0x7fffaa91208d]
    +   986 _pthread_start  (in libsystem_pthread.dylib) + 286  [0x7fffaa912887]
    +     986 _pthread_body  (in libsystem_pthread.dylib) + 180  [0x7fffaa91293b]
    +       986 WTF::wtfThreadEntryPoint(void*)  (in JavaScriptCore) + 74  [0x1111e87fa]  ThreadingPthreads.cpp:209
    +         986 WTF::threadEntryPoint(void*)  (in JavaScriptCore) + 111  [0x1111e83df]  Function.h:56
    +           986 WebCore::ScrollingThread::initializeRunLoop()  (in WebCore) + 253  [0x10df9022d]  ScrollingThreadMac.mm:52
    +             986 CFRunLoopRun  (in CoreFoundation) + 97  [0x7fff94f8db91]
    +               986 CFRunLoopRunSpecific  (in CoreFoundation) + 420  [0x7fff94f4e114]
    +                 986 __CFRunLoopRun  (in CoreFoundation) + 1361  [0x7fff94f4e8c1]
    +                   986 __CFRunLoopServiceMachPort  (in CoreFoundation) + 212  [0x7fff94f4f434]
    +                     986 mach_msg  (in libsystem_kernel.dylib) + 55  [0x7fffaa81f797]
    +                       986 mach_msg_trap  (in libsystem_kernel.dylib) + 10  [0x7fffaa82034a]
    986 Thread_18381792
    + 986 start_wqthread  (in libsystem_pthread.dylib) + 13  [0x7fffaa91207d]
    +   986 _pthread_wqthread  (in libsystem_pthread.dylib) + 1023  [0x7fffaa91248e]
    +     986 __workq_kernreturn  (in libsystem_kernel.dylib) + 10  [0x7fffaa82844e]
    554 Thread_18377504
      554 start_wqthread  (in libsystem_pthread.dylib) + 13  [0x7fffaa91207d]
        553 _pthread_wqthread  (in libsystem_pthread.dylib) + 1426  [0x7fffaa912621]
        ! 553 __workq_kernreturn  (in libsystem_kernel.dylib) + 10  [0x7fffaa82844e]
        1 _pthread_wqthread  (in libsystem_pthread.dylib) + 1481  [0x7fffaa912658]

Total number in stack (recursive counted multiple, when >=5):

Sort by top of stack, same collapsed (when >= 5):
        mach_msg_trap  (in libsystem_kernel.dylib)        3943
        __workq_kernreturn  (in libsystem_kernel.dylib)        1539
        __psynch_cvwait  (in libsystem_kernel.dylib)        985
Comment 2 Saam Barati 2017-06-27 11:44:27 PDT
The expected output:
```
CONSOLE MESSAGE: line 27: Worker finished
CONSOLE MESSAGE: line 27: Worker finished
CONSOLE MESSAGE: line 27: Worker finished
CONSOLE MESSAGE: line 27: Worker finished
CONSOLE MESSAGE: line 27: Worker finished
CONSOLE MESSAGE: line 27: Worker finished
CONSOLE MESSAGE: line 27: Worker finished
CONSOLE MESSAGE: line 27: Worker finished
CONSOLE MESSAGE: line 8: Finished running
PASS successfullyParsed is true

TEST COMPLETE
```

Actual output:
```
AC Plugin 2.0Content-Type: text/plain
CONSOLE MESSAGE: line 27: Worker finished
CONSOLE MESSAGE: line 27: Worker finished
CONSOLE MESSAGE: line 27: Worker finished
CONSOLE MESSAGE: line 27: Worker finished
CONSOLE MESSAGE: line 27: Worker finished
CONSOLE MESSAGE: line 27: Worker finished
CONSOLE MESSAGE: line 27: Worker finished
#PID UNRESPONSIVE - com.apple.WebKit.WebContent.Development (pid 46688)
FAIL: Timed out waiting for notifyDone to be called

#EOF
#EOF
````

So it looks like just a single worker isn't finishing for some reason.
Comment 3 Saam Barati 2017-06-27 14:18:54 PDT
Ignore the above logs. It's not seeing a timeout. This log was from something else. I'm running into another issue where I periodically see "AC Plugin 2.0" in the diff. This is a known problem. The above sample is not related to this bug.
Comment 4 Ryan Haddad 2017-06-27 15:03:10 PDT
Jonathan, do you have any idea why we would report a test crash but not actually get a crash log?
Comment 5 Jonathan Bedard 2017-06-27 15:18:20 PDT
(In reply to Ryan Haddad from comment #4)
> Jonathan, do you have any idea why we would report a test crash but not
> actually get a crash log?

Looking to see if I can find any other information.  Failure to attach a crash-log usually means that a crash has (In reply to Ryan Haddad from comment #4)
> Jonathan, do you have any idea why we would report a test crash but not
> actually get a crash log?

It's possible that this is a test harness failure.

Here is the stdout of that test run: https://build.webkit.org/builders/Apple%20Sierra%20Release%20WK2%20%28Tests%29/builds/2469/steps/layout-test/logs/stdio.

Looking at the logging, we are actually killing the test runner because it doesn't have output when we expect it to.

Saam, is this test expected to take a long time?  Are their possible race conditions?  Or do we expect there to be a lot of output on this test?  I'd bet that this is something time-outy, we're hitting the code-path where we attempt to read from the test runner's stdout but fail.
Comment 6 Saam Barati 2017-06-27 15:29:39 PDT
It is definitely possible there are races here. The test is not super quick, the bot seems to finish most of the time in 10s or so. It seems possible we're hitting a deadlock we don't know about, but it'd be really nice to have the proper logs here so we can tell if that's happening or not. As is, it's probably a good bet to make that we're not deadlocking. But I can't say for certain.
Comment 7 Jonathan Bedard 2017-06-27 15:49:37 PDT
(In reply to Saam Barati from comment #6)
> It is definitely possible there are races here. The test is not super quick,
> the bot seems to finish most of the time in 10s or so. It seems possible
> we're hitting a deadlock we don't know about, but it'd be really nice to
> have the proper logs here so we can tell if that's happening or not. As is,
> it's probably a good bet to make that we're not deadlocking. But I can't say
> for certain.

I'm not sure what else to do about logs.  This test doesn't have much stdout or stderr and something closes the stdout and stderr file handle Python uses to communicate with the test runner.  We aren't picking up a crash because no crash-log is generated, which means that whatever this crash is, the OS is not considering it a crash.
Comment 8 Radar WebKit Bug Importer 2017-08-10 09:53:20 PDT
<rdar://problem/33829549>