Bug 38300

Summary: new-run-webkit-tests is hitting a python bug, and hanging/crashing on Chromium Mac Bots
Product: WebKit
Reporter: Eric Seidel (no email) <eric>
Component: Tools / Tests
Assignee: Dirk Pranke <dpranke>
Status: RESOLVED FIXED
Severity: Normal
CC: abarth, cjerdonek, dpranke, jyasskin, ojan, tony, ukai
Priority: P2
Version: 528+ (Nightly build)
Hardware: PC
OS: OS X 10.5
Bug Depends on: 37987, 38298, 49566
Bug Blocks: 38505
Attachments:
Sample from new-run-webkit-tests when hung in logging code. (Flags: none)
crash report from python assert crash (Flags: none)

Description Eric Seidel (no email) 2010-04-28 17:50:14 PDT
new-run-webkit-tests hanging on chromium bots

Chromium actually runs their own wrapper run_webkit_tests.py, but same deal.

This is a continuation of bug 37987.  We thought it was resolved by http://trac.webkit.org/changeset/58314 but it does not appear to be.

I've been able to reproduce two different hangs locally.  This bug will cover the crazy python logging hang.  Bug 38298 will cover the blocking I/O hang.
Comment 1 Eric Seidel (no email) 2010-04-28 17:52:38 PDT
Created attachment 54650 [details]
Sample from new-run-webkit-tests when hung in logging code.

I looked at this sample with python developer Jeffrey Yasskin.  We were not able to find the cause by inspection.

I updated the stack dumping code (locally) to also print out "logging._lock" which would tell us what thread was holding the lock.
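
For illustration, a minimal sketch of that kind of stack dump, not the actual code in new-run-webkit-tests. The function name format_thread_stacks is hypothetical. It assumes a Python with threading.Thread.ident (2.6 or later) and relies on logging._lock, which is a private, undocumented attribute. On many Python versions the repr of that lock includes its owner, which is what makes printing it useful here.

```python
import logging
import sys
import threading
import traceback


def format_thread_stacks():
    """Return the stack of every live thread plus the repr of logging's lock."""
    # Map thread ids to names so the dump is easier to read.
    names = dict((t.ident, t.name) for t in threading.enumerate())
    lines = []
    for thread_id, frame in sys._current_frames().items():
        lines.append("Thread %s (%s):\n" % (thread_id, names.get(thread_id, "unknown")))
        lines.extend(traceback.format_stack(frame))
    # Assumption: logging._lock is a private module attribute; getattr()
    # keeps the dump working if a given Python version does not have it.
    lines.append("logging._lock: %r\n" % (getattr(logging, "_lock", None),))
    return "".join(lines)


if __name__ == "__main__":
    sys.stderr.write(format_thread_stacks())
```

Calling this from a signal handler or a watchdog thread when the run appears hung would show which frames are blocked inside the logging module.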
Comment 2 Eric Seidel (no email) 2010-04-28 18:23:52 PDT
Each [snip] represents 100 or so repetitions of the same line.  Note that it appears multiple threads are printing the same debug message.

This definitely seems to be a python bug, since it crashed.  Not sure how or why we're tickling it.


100428 18:15:13 dump_render_tree_thread.py:349  DEBUG Thread-3 http/tests/navigation/reload-subframe-iframe.html passed
100428 18:15:13 dump_render_tree_thread.py:349  DEBUG Thread-2 fast/block/float/marquee-shrink-to-avoid-floats.html passed
pthread_cond_wait: Invalid argument

[snip]

pthread_cond_wait: Invalid argument
pthread_cond_wait: Invalipthread_cond_wait: Invalid argument
pthread_cond_wait: Invalid argument

[snip]

pthread_cond_wait: Invalid argument
pthread_cond_wait: Invalid argument
d argument
pthread_cond_wait: Invalid argument
pthread_cond_wait: Invalid argument

[snip]

pthread_cond_wait: Invalid argument
pthread_cond_wait: Invalid argument
100428 18:15:14 dump_render_tree_thread.py:349  DEBUG Thread-2 fast/block/float/multiple-float-positioning.html passed
pthread_cond_wait: Invalid argument
pthread_cond_wait: Invalid argument

[snip]

pthread_cond_wait: Invalid argument
pthread_cond_wait: Invalid argument
Assertion failed: (tstate != NULL), function PyEval_EvalCodeEx, file Python/ceval.c, line 2664.
Comment 3 Eric Seidel (no email) 2010-04-28 18:46:01 PDT
Created attachment 54659 [details]
crash report from python assert crash
Comment 4 Eric Seidel (no email) 2010-04-28 18:47:37 PDT
I wonder if these asserts are during the fork/exec process.
Comment 5 Eric Seidel (no email) 2010-04-28 21:04:54 PDT
*** Bug 38252 has been marked as a duplicate of this bug. ***
Comment 6 Eric Seidel (no email) 2010-05-03 16:53:49 PDT
So far we've only seen this reported for Mac, and only for run_webkit_tests.py (chromium's new-run-webkit-tests wrapper).  I suspect it exists for both the Chromium and WebKit ports, however (they use slightly different python code). I suspect it may exist on non-Mac python versions as well, although it may be specific to Python 2.5.  More investigation is required.

Anyone who has seen this should add their platform information/python version to the bug, so I can get a sense of where we're seeing this and how often.
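
One possible way to collect that information, as a small sketch using only the standard library. The helper name describe_environment is hypothetical; the output can be pasted into a comment on this bug.

```python
import platform
import sys


def describe_environment():
    """Return a one-line description of the OS and the Python in use."""
    return "platform: %s; python: %s (%s)" % (
        platform.platform(),        # OS name, release and architecture
        platform.python_version(),  # interpreter version, e.g. major.minor.patch
        sys.executable,             # path of the interpreter that ran the tests
    )


if __name__ == "__main__":
    print(describe_environment())
```

Running it with the same interpreter that runs the tests matters, since the bots may have more than one Python installed.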
Comment 7 Dirk Pranke 2010-05-04 16:13:56 PDT
From a fairly quick look over the past few days on the Chromium bots, several bots (WebKit Mac and WebKit Mac (dbg)(3) at least) are hanging with several different sets of symptoms. It looks like most of those hangs are probably not related to pretty patch, since it looks like pretty patch may not be available. I will try to dig up some representative stack traces.
Comment 8 Dirk Pranke 2011-02-18 19:24:36 PST
This should be fixed as of r79062.