Bug 30869

Summary: commit-queue keeps crashing
Product: WebKit Reporter: Eric Seidel (no email) <eric>
Component: Tools / TestsAssignee: Nobody <webkit-unassigned>
Status: RESOLVED DUPLICATE    
Severity: Normal CC: abarth, evan, levin, steveblock
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: PC   
OS: OS X 10.5   
Attachments:
Description Flags
Script demonstrating python 2.5.1 (or mac os x?) bug
none
Patch v1 none

Description Eric Seidel (no email) 2009-10-28 09:55:08 PDT
commit-queue keeps crashing

Exception while checking queue and bots: unknown encoding: idna. Sleeping until 2009-10-27 23:07:56 (5 mins).
'import site' failed; use -v for traceback
Traceback (most recent call last):
  File "/Users/eseidel/Projects/WebKit/WebKitTools/Scripts/bugzilla-tool", line 33, in <module>
    import os
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/os.py", line 49, in <module>
    import posixpath as path
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/posixpath.py", line 14, in <module>
    import stat
ImportError: No module named stat

I think that's happening on re-exec.  I think it may only happen when I have the patch from bug 30098 applied locally.  Either way, it's disturbing.  It's happened 3 times at seeming random.  The commit-queue used to be rock-solid!
Comment 1 Eric Seidel (no email) 2009-10-29 11:48:50 PDT
This appears to be a python bug.  I've made a reduced test case.
Comment 2 Eric Seidel (no email) 2009-10-29 11:50:21 PDT
Created attachment 42122 [details]
Script demonstrating python 2.5.1 (or mac os x?) bug
Comment 3 Evan Martin 2009-10-29 12:00:20 PDT
On Linux (the "import site" bits are when I'm repeatedly hitting ctl-C).
Python 2.5.2


% ./t
...............................................................................'import site' failed; use -v for traceback
.........'import site' failed; use -v for traceback
............'import site' failed; use -v for traceback
...........'import site' failed; use -v for traceback
....................'import site' failed; use -v for traceback
..........
Comment 4 Eric Seidel (no email) 2009-10-29 12:02:36 PDT
Bah.  I just don't know how exec works.  I didn't realize it carried file handles through.
Comment 5 Eric Seidel (no email) 2009-10-29 12:04:02 PDT
Created attachment 42123 [details]
Patch v1
Comment 6 Adam Barth 2009-10-29 12:10:07 PDT
Comment on attachment 42123 [details]
Patch v1

File handle leaks = bad.
Comment 7 Eric Seidel (no email) 2009-10-29 12:51:32 PDT
Comment on attachment 42123 [details]
Patch v1

Going to make a better patch.
Comment 8 Eric Seidel (no email) 2009-10-30 11:42:03 PDT
It would appear that even with my fix, Python itself is still leaking 3 file descriptors on every exec():

Python    57958 eseidel   58r     REG       14,2    144580    466329 /System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/HIToolbox.framework/Versions/A/Resources/HITo
olbox.rsrc
Python    57958 eseidel   59r     REG       14,2    490410   9627854 /System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/HIToolbox.framework/Versions/A/Resources/Engl
ish.lproj/Localized.rsrc
Python    57958 eseidel   60r     REG       14,2     86770    481232 /System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/plat-mac/errors.rsrc.df.rsrc

I expect this may be a mac-only problem.
Comment 9 Eric Seidel (no email) 2009-10-30 14:33:15 PDT
I wrote a function:

def check_for_file_leak(message):
    log("before %s" % message)
    lsof_output = SCM.run_command(['lsof', '-p', str(os.getpid())])
    if lsof_output.count('HIToolbox'):
        log(message)
        error(lsof_output)
    log("after %s" % message)

and sprinkled it throughout the code.

It seems these 3 leaking files are opened by mechanize during:

self.browser = Browser()

in Bugzilla.__init__()

I've not tried to trace through the mechanize code to see where the leaks come from.
Comment 10 Adam Barth 2009-10-30 14:38:05 PDT
Why don't we fork and exec and then throw away the old version of ourselves?  If that doesn't work, can we use a trampoline executable?  Tracking down every leak seems fragile.
Comment 11 Eric Seidel (no email) 2009-10-30 14:41:12 PDT
Here is a stand-alone version of the same function:

def check_for_file_leak(message):
    import os
    import sys
    import subprocess
    print >> sys.stderr, "before %s" % message
    process = subprocess.Popen(['lsof', '-p', str(os.getpid())], stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    lsof_output = process.communicate()[0].rstrip()
    if lsof_output.count('HIToolbox'):
        print >> sys.stderr, message
        print >> sys.stderr, lsof_output
        exit(1)
    print >> sys.stderr, "after %s" % message
Comment 12 Eric Seidel (no email) 2009-10-30 14:41:47 PDT
We could definitely use a trampoline executable.  That may be the simplest solution.
Comment 13 Eric Seidel (no email) 2009-10-30 14:43:21 PDT
bugzilla-tool commit-queue
is already a trampoline executable of sorts.  It's just no longer a shell script, instead it's implemented in bugzilla-tool itself.

I think I might just get rid of the "don't need to restart bugzilla-tool to notice committers.py changes" feature for now.
Comment 14 Eric Seidel (no email) 2009-10-30 15:43:24 PDT
Rolled out the change and re-opened bug 30084.

*** This bug has been marked as a duplicate of bug 30084 ***