RESOLVED FIXED 107905
Quick fix for Chromium EWS bots running out of space due to a tmp file leak
https://bugs.webkit.org/show_bug.cgi?id=107905
Summary Quick fix for Chromium EWS bots running out of space due to a tmp file leak
Alan Cutter
Reported 2013-01-24 20:34:53 PST
The gce-cr-linux-## bots have been hitting disk capacity recently because testing leaves directories behind. The directories are of the form "/tmp/.org.chromium.Chromium.XXXXXX", with XXXXXX being random characters. This patch applies a quick fix that removes all temporary files after each bot cycle, so the queues don't get blocked again while the root problem is investigated. webkit-commit-queue.appspot.com/results/16118062
Attachments
Patch (1.41 KB, patch)
2013-01-24 20:46 PST, Alan Cutter
no flags
Patch (1.55 KB, patch)
2013-01-24 23:44 PST, Alan Cutter
no flags
Patch (1.70 KB, patch)
2013-01-25 00:54 PST, Alan Cutter
no flags
Alan Cutter
Comment 1 2013-01-24 20:46:44 PST
Eric Seidel (no email)
Comment 2 2013-01-24 20:51:10 PST
Comment on attachment 184650 [details] Patch
View in context: https://bugs.webkit.org/attachment.cgi?id=184650&action=review
> Tools/EWSTools/start-queue.sh:47
> + # This clears any temporary file leaks after running tests.
> + # Not the nicest solution but it will keep the queues running instead of
> + # filling up all remaining disk space.
> + find /tmp/* -delete
This seems quite dangerous... I could imagine that other long-running processes might want their temp files to not die every 2 hours... :) But we can also try this and iterate.
Eric Seidel (no email)
Comment 3 2013-01-24 20:56:38 PST
Comment on attachment 184650 [details] Patch Is there an easy way for you to try this on the GCE queues w/o shipping this to everyone? It seems like a more dangerous change than I would like.
Eric Seidel (no email)
Comment 4 2013-01-24 20:57:41 PST
Otherwise stated: I worry that non-GCE clients may be using our scripts and not expect them to eat their entire /tmp folder. Maybe we should just clear /tmp/.org.chromium.Chromium* to start with, as that fixes this bug and is much safer.
Eric Seidel (no email)
Comment 5 2013-01-24 20:58:12 PST
Comment on attachment 184650 [details] Patch Yeah, I've changed my mind. I think this is too dangerous. :) Let's start with something more surgical.
Lucas Forschler
Comment 6 2013-01-24 21:57:33 PST
Comment on attachment 184650 [details] Patch Mac EWS prunes logs older than 14 days, and rotates out the main log file every 10 iterations. See the start-queue-mac for what we used. I'm open to suggestions or alternatives.
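The pruning and rotation policy described above could look roughly like the sketch below. This is not the content of start-queue-mac, which is not shown in this bug; the function names, the `*.log` pattern, and the timestamp suffix are all assumptions made for illustration.

```shell
# Hypothetical helpers; names, arguments, and the '*.log' pattern are assumptions.

# prune_old_logs LOG_DIR [DAYS]
# Deletes regular files matching *.log whose modification time is more than
# DAYS whole days in the past (default 14).
prune_old_logs() {
    local log_dir="$1"
    local max_age_days="${2:-14}"
    find "$log_dir" -type f -name '*.log' -mtime +"$max_age_days" -delete
}

# rotate_main_log LOG_FILE ITERATION
# Renames LOG_FILE with a timestamp suffix on every 10th iteration.
rotate_main_log() {
    local log_file="$1"
    local iteration="$2"
    if [ $((iteration % 10)) -eq 0 ] && [ -f "$log_file" ]; then
        mv "$log_file" "$log_file.$(date +%Y%m%d%H%M%S)"
    fi
}
```

A queue loop would call `prune_old_logs` once per cycle and pass its own iteration counter to `rotate_main_log`.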
Alan Cutter
Comment 7 2013-01-24 23:44:55 PST
Alan Cutter
Comment 8 2013-01-24 23:57:43 PST
(In reply to comment #7)
> Created an attachment (id=184681) [details]
> Patch
Updated patch to only clear out the ".org.chromium.Chromium.*" files/directories found in /tmp. I agree the first version was too violent a solution.
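For readers without access to the attachment, a narrower cleanup of this kind might be sketched as follows. This is not the text of the landed patch; the function name and the optional directory argument are hypothetical, added so the sketch can be tried against a scratch directory instead of the real /tmp.

```shell
# Hypothetical helper, not the landed patch.
# clear_chromium_tmp [DIR]
# Removes only entries named .org.chromium.Chromium.* directly inside DIR
# (default /tmp), leaving every other temp file alone.
clear_chromium_tmp() {
    local tmp_dir="${1:-/tmp}"
    # -mindepth 1 / -maxdepth 1 limit matching to direct children of tmp_dir.
    # rm -rf handles both plain files and non-empty directories.
    find "$tmp_dir" -mindepth 1 -maxdepth 1 \
        -name '.org.chromium.Chromium.*' -exec rm -rf {} +
}
```

Calling `clear_chromium_tmp` with no argument targets /tmp. Because the match is on the name prefix only, it would also remove temp files belonging to any Chromium build running as the same user on that machine.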
Alan Cutter
Comment 9 2013-01-25 00:01:41 PST
(In reply to comment #3)
> (From update of attachment 184650 [details])
> Is there an easy way for you to try this on the GCE queues w/o shipping this to everyone? It seems like a more dangerous change than I would like.
It should be noted that this change won't take effect on most, if not all, bots until it's manually applied. The bots' bash scripts normally live outside the repository and call the Python scripts inside the repo.
Eric Seidel (no email)
Comment 10 2013-01-25 00:08:52 PST
Comment on attachment 184681 [details] Patch LGTM.
Eric Seidel (no email)
Comment 11 2013-01-25 00:09:37 PST
I believe this makes it dangerous to run this script on your personal machine, btw. It will try to eat your chrome temp files. :)
WebKit Review Bot
Comment 12 2013-01-25 00:10:06 PST
Comment on attachment 184681 [details] Patch
Rejecting attachment 184681 [details] from commit-queue.

Failed to run "['/mnt/git/webkit-commit-queue/Tools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '--bot-id=gce-cq-04', 'apply-attachment', '--no-update', '--non-interactive', 184681, '--port=chromium-xvfb']" exit_code: 2 cwd: /mnt/git/webkit-commit-queue

Last 500 characters of output:

Failed to run "[u'/mnt/git/webkit-commit-queue/Tools/Scripts/svn-apply', '--force', '--reviewer', 'Eric Seidel']" exit_code: 2 cwd: /mnt/git/webkit-commit-queue

Parsed 2 diffs from patch file(s).
patch: **** Can't create file /tmp/ppThWn7v : No space left on device
patch: **** Can't create file /tmp/ppvL0Bkw : No space left on device

Failed to run "[u'/mnt/git/webkit-commit-queue/Tools/Scripts/svn-apply', '--force', '--reviewer', 'Eric Seidel']" exit_code: 2 cwd: /mnt/git/webkit-commit-queue

Full output: http://queues.webkit.org/results/16116203
Alan Cutter
Comment 13 2013-01-25 00:54:29 PST
Alan Cutter
Comment 14 2013-01-25 00:56:00 PST
(In reply to comment #12)
> (From update of attachment 184681 [details])
> Rejecting attachment 184681 [details] from commit-queue.
>
> Failed to run "['/mnt/git/webkit-commit-queue/Tools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '--bot-id=gce-cq-04', 'apply-attachment', '--no-update', '--non-interactive', 184681, '--port=chromium-xvfb']" exit_code: 2 cwd: /mnt/git/webkit-commit-queue
>
> Last 500 characters of output:
>
> Failed to run "[u'/mnt/git/webkit-commit-queue/Tools/Scripts/svn-apply', '--force', '--reviewer', 'Eric Seidel']" exit_code: 2 cwd: /mnt/git/webkit-commit-queue
>
> Parsed 2 diffs from patch file(s).
> patch: **** Can't create file /tmp/ppThWn7v : No space left on device
> patch: **** Can't create file /tmp/ppvL0Bkw : No space left on device
>
> Failed to run "[u'/mnt/git/webkit-commit-queue/Tools/Scripts/svn-apply', '--force', '--reviewer', 'Eric Seidel']" exit_code: 2 cwd: /mnt/git/webkit-commit-queue
>
> Full output: http://queues.webkit.org/results/16116203

(In reply to comment #13)
> Created an attachment (id=184692) [details]
> Patch
Trying again. (:
WebKit Review Bot
Comment 15 2013-01-25 01:57:49 PST
Comment on attachment 184692 [details] Patch
Clearing flags on attachment: 184692
Committed r140801: <http://trac.webkit.org/changeset/140801>
WebKit Review Bot
Comment 16 2013-01-25 01:57:54 PST
All reviewed patches have been landed. Closing bug.
Tony Chang
Comment 17 2013-01-25 09:57:27 PST
A few comments:
- Deleting .org.chromium.Chromium.* files should be safe if you're running Chrome, because Chrome would use .com.google.Chrome.*. It might stomp on temp files used by a developer build of Chromium.
- This is what the waterfall bots already do. They run Tools/BuildSlaveSupport/chromium/remove-crash-logs. There might be a code sharing opportunity here.
- An additional option would be for NRWT to set the environment variable TMPDIR for DRT and blow away that whole directory after the tests are done. I've filed bug 107959 for this (it'll help keep developer machines clean too).
Eric Seidel (no email)
Comment 18 2013-01-25 10:26:05 PST
(In reply to comment #17)
> A few comments:
> - An additional option would be for NRWT to set the environment variable TMPDIR for DRT and blow away that whole directory after the tests are done. I've filed bug 107959 for this (it'll help keep developer machines clean too).
Yeah, I briefly considered this, but wasn't sure how well the TMPDIR environment variable would be supported by all the code we care about. :) I'm sure some parts hard-code /tmp. If that works though, it sounds like the best solution. Then not only can we clean up after leaks like this, we can also monitor them!
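The TMPDIR idea discussed above can be illustrated with a small wrapper. This is only a sketch of the general technique, not what NRWT does or what bug 107959 implements; the function name and the `ews-run.XXXXXX` template are hypothetical.

```shell
# Hypothetical wrapper; the name and the directory template are assumptions.
# run_with_private_tmpdir COMMAND [ARGS...]
# Runs COMMAND with TMPDIR pointing at a fresh directory, then deletes that
# directory and returns COMMAND's exit status.
run_with_private_tmpdir() {
    local private_tmp
    local status=0
    private_tmp="$(mktemp -d "${TMPDIR:-/tmp}/ews-run.XXXXXX")" || return 1
    # env sets TMPDIR for the child process only; the caller's value is untouched.
    env TMPDIR="$private_tmp" "$@" || status=$?
    # Anything the child leaked under its TMPDIR goes away here.
    rm -rf "$private_tmp"
    return "$status"
}
```

This only catches leaks from programs that honor TMPDIR. As noted in the comment above, code that hard-codes /tmp would still write outside the private directory. Listing the directory before the `rm -rf` would be one place to add the monitoring mentioned above.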