Bug 107905 - Quick fix for Chromium EWS bots running out of space due to a tmp file leak
Summary: Quick fix for Chromium EWS bots running out of space due to a tmp file leak
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
Importance: P2 Normal
Assignee: Alan Cutter
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-01-24 20:34 PST by Alan Cutter
Modified: 2013-01-25 10:26 PST

See Also:


Attachments
Patch (1.41 KB, patch)
2013-01-24 20:46 PST, Alan Cutter
no flags
Patch (1.55 KB, patch)
2013-01-24 23:44 PST, Alan Cutter
no flags
Patch (1.70 KB, patch)
2013-01-25 00:54 PST, Alan Cutter
no flags

Description Alan Cutter 2013-01-24 20:34:53 PST
The gce-cr-linux-## bots have been hitting disk capacity recently due to directories being left behind by testing.
The directories were of the form "/tmp/.org.chromium.Chromium.XXXXXX" with XXXXXX being random characters.
This patch applies a quick fix: remove all temporary files after each bot cycle so the queues don't get blocked again while the root cause is investigated.

webkit-commit-queue.appspot.com/results/16118062
Comment 1 Alan Cutter 2013-01-24 20:46:44 PST
Created attachment 184650 [details]
Patch
Comment 2 Eric Seidel (no email) 2013-01-24 20:51:10 PST
Comment on attachment 184650 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=184650&action=review

> Tools/EWSTools/start-queue.sh:47
> +    # This clears any temporary file leaks after running tests.
> +    # Not the nicest solution but it will keep the queues running instead of
> +    # filling up all remaining disk space.
> +    find /tmp/* -delete

This seems quite dangerous...  I could imagine that other long-running processes might want their temp files to not die every 2 hours... :)

But we can also try this and iterate.
Comment 3 Eric Seidel (no email) 2013-01-24 20:56:38 PST
Comment on attachment 184650 [details]
Patch

Is there an easy way for you to try this on the GCE queues w/o shipping this to everyone?  It seems like a more dangerous change than I would like.
Comment 4 Eric Seidel (no email) 2013-01-24 20:57:41 PST
Otherwise stated:  I worry that non-GCE clients may be using our scripts and not expect them to eat their entire /tmp folder.

Maybe we should just clear /tmp/.org.chromium.Chromium* to start with, as that fixes this bug and is much safer.
Comment 5 Eric Seidel (no email) 2013-01-24 20:58:12 PST
Comment on attachment 184650 [details]
Patch

Yeah, I've changed my mind.  I think this is too dangerous. :)  Let's start with something more surgical.
Comment 6 Lucas Forschler 2013-01-24 21:57:33 PST
Comment on attachment 184650 [details]
Patch

Mac EWS prunes logs older than 14 days, and rotates out the main log file every 10 iterations.  See the start-queue-mac for what we used.  I'm open to suggestions or alternatives.
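
For readers unfamiliar with age-based pruning, here is a minimal sketch of the idea. It is not copied from start-queue-mac; the function name `prune_old_logs`, the `*.log` pattern, and the directory argument are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper, NOT taken from start-queue-mac: delete log files in a
# directory tree once they are older than a given number of days (default 14).
prune_old_logs() {
    log_dir="$1"
    max_age_days="${2:-14}"

    # Nothing to do if the directory does not exist.
    [ -d "$log_dir" ] || return 0

    # -mtime +N matches files last modified more than N full 24-hour periods ago.
    # The '*.log' pattern is an assumption about how the logs are named.
    find "$log_dir" -type f -name '*.log' -mtime +"$max_age_days" -exec rm -f {} +
}

# Example call (the path is illustrative):
#   prune_old_logs "$HOME/ews-logs" 14
```

Because deletion is limited to one directory and one filename pattern, unrelated files are left alone.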
Comment 7 Alan Cutter 2013-01-24 23:44:55 PST
Created attachment 184681 [details]
Patch
Comment 8 Alan Cutter 2013-01-24 23:57:43 PST
(In reply to comment #7)
> Created an attachment (id=184681) [details]
> Patch

Updated patch to only clear out the ".org.chromium.Chromium.*" files/directories found in /tmp. I agree the first version was too violent a solution.
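
A minimal sketch of this narrower cleanup, for illustration only. It is not the text of the attached patch; the function name `clear_chromium_tmp` and the optional directory argument are assumptions, added so the helper can be tried against a scratch directory instead of the real /tmp.

```shell
#!/bin/sh
# Hypothetical helper, NOT the committed patch: remove only the leaked
# ".org.chromium.Chromium.XXXXXX" entries directly under a temp directory.
clear_chromium_tmp() {
    tmp_dir="${1:-/tmp}"

    # -mindepth 1 / -maxdepth 1 restrict the match to direct children of
    # tmp_dir, and -name limits it to the leaked Chromium entries, so other
    # processes' temp files are left alone. rm -rf is used because the leaked
    # entries are directories that may not be empty.
    find "$tmp_dir" -mindepth 1 -maxdepth 1 -name '.org.chromium.Chromium.*' \
        -exec rm -rf {} +
}

# Example call after a bot cycle (defaults to /tmp):
#   clear_chromium_tmp
```

Unlike `find /tmp/* -delete`, this leaves every entry that does not match the Chromium prefix in place.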
Comment 9 Alan Cutter 2013-01-25 00:01:41 PST
(In reply to comment #3)
> (From update of attachment 184650 [details])
> Is there and easy way for you to try this on the GCE queues w/o shipping this to everyone?  It seems like a more dangerous change than I would like.

It should be noted that this change won't take effect on most, if not all, bots until it's manually deployed. The bots' bash scripts normally live outside the repository and call the Python scripts inside the repo.
Comment 10 Eric Seidel (no email) 2013-01-25 00:08:52 PST
Comment on attachment 184681 [details]
Patch

LGTM.
Comment 11 Eric Seidel (no email) 2013-01-25 00:09:37 PST
I believe this makes it dangerous to run this script on your personal machine, btw.  It will try to eat your Chrome temp files. :)
Comment 12 WebKit Review Bot 2013-01-25 00:10:06 PST
Comment on attachment 184681 [details]
Patch

Rejecting attachment 184681 [details] from commit-queue.

Failed to run "['/mnt/git/webkit-commit-queue/Tools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '--bot-id=gce-cq-04', 'apply-attachment', '--no-update', '--non-interactive', 184681, '--port=chromium-xvfb']" exit_code: 2 cwd: /mnt/git/webkit-commit-queue

Last 500 characters of output:

Failed to run "[u'/mnt/git/webkit-commit-queue/Tools/Scripts/svn-apply', '--force', '--reviewer', 'Eric Seidel']" exit_code: 2 cwd: /mnt/git/webkit-commit-queue

Parsed 2 diffs from patch file(s).
patch: **** Can't create file /tmp/ppThWn7v : No space left on device
patch: **** Can't create file /tmp/ppvL0Bkw : No space left on device

Failed to run "[u'/mnt/git/webkit-commit-queue/Tools/Scripts/svn-apply', '--force', '--reviewer', 'Eric Seidel']" exit_code: 2 cwd: /mnt/git/webkit-commit-queue

Full output: http://queues.webkit.org/results/16116203
Comment 13 Alan Cutter 2013-01-25 00:54:29 PST
Created attachment 184692 [details]
Patch
Comment 14 Alan Cutter 2013-01-25 00:56:00 PST
(In reply to comment #12)
> (From update of attachment 184681 [details])
> Rejecting attachment 184681 [details] from commit-queue.

(In reply to comment #13)
> Created an attachment (id=184692) [details]
> Patch

Trying again. (:
Comment 15 WebKit Review Bot 2013-01-25 01:57:49 PST
Comment on attachment 184692 [details]
Patch

Clearing flags on attachment: 184692

Committed r140801: <http://trac.webkit.org/changeset/140801>
Comment 16 WebKit Review Bot 2013-01-25 01:57:54 PST
All reviewed patches have been landed.  Closing bug.
Comment 17 Tony Chang 2013-01-25 09:57:27 PST
A few comments:
- Deleting .org.chromium.Chromium.* files should be safe if you're running Chrome, because Chrome would use .com.google.Chrome.*. It might stomp on temp files used by a developer build of Chromium.
- This is what the waterfall bots already do.  They run Tools/BuildSlaveSupport/chromium/remove-crash-logs. There might be a code sharing opportunity here.
- An additional option would be for NRWT to set the environment variable TMPDIR for DRT and blow away that whole directory after the tests are done.  I've filed bug 107959 for this (it'll help keep developer machines clean too).
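
The TMPDIR idea in the last bullet could look roughly like the sketch below. This is a shell-level illustration only, not how NRWT does or would implement it; the wrapper name `run_with_private_tmpdir` and the `ews-scratch` prefix are assumptions. It also only helps for code that honors TMPDIR; anything that hard-codes /tmp would still leak.

```shell
#!/bin/sh
# Hypothetical wrapper, NOT NRWT code: run a command with TMPDIR pointing at a
# fresh scratch directory, then delete that directory once the command exits.
run_with_private_tmpdir() {
    # Create a unique scratch directory under the current temp location.
    scratch_dir=$(mktemp -d "${TMPDIR:-/tmp}/ews-scratch.XXXXXX") || return 1

    # Run the command in a subshell so the TMPDIR override does not leak
    # into the calling shell; remember its exit status.
    status=0
    ( TMPDIR="$scratch_dir"; export TMPDIR; "$@" ) || status=$?

    # Remove everything the command left behind, leaked or not.
    rm -rf "$scratch_dir"
    return "$status"
}

# Example call (the command is illustrative):
#   run_with_private_tmpdir some-test-runner --some-flag
```

Listing or measuring the scratch directory just before the `rm -rf` would be one way to monitor for leaks as well as clean them up.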
Comment 18 Eric Seidel (no email) 2013-01-25 10:26:05 PST
(In reply to comment #17)
> A few comments:
> - An additional option would be for NRWT to set the environment variable TMPDIR for DRT and blow away that whole directory after the tests are done.  I've filed bug 107959 for this (it'll help keep developer machines clean too).

Yeah, I briefly considered this, but wasn't sure how well the TMPDIR environment variable would be supported by all the code we care about. :)  I'm sure some parts hard-code /tmp.

If that works though, it sounds like the best solution.  Then not only can we clean up after leaks like this, we can also monitor them!