Bug 122112 - Large logs can bring down the webkit master
Summary: Large logs can bring down the webkit master
Status: REOPENED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Nobody
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2013-09-30 10:28 PDT by Lucas Forschler
Modified: 2016-04-23 12:17 PDT (History)
7 users (show)

See Also:


Attachments
Patch (1.20 KB, patch)
2016-02-11 12:53 PST, Dana Burkart
lforschler: review+
Details | Formatted Diff | Diff
Patch take 2 (3.06 KB, patch)
2016-02-11 16:48 PST, Dana Burkart
mcatanzaro: commit-queue-
Details | Formatted Diff | Diff
Patch take 3 (1.82 KB, patch)
2016-02-16 12:51 PST, Dana Burkart
dbates: review+
buildbot: commit-queue-
Details | Formatted Diff | Diff
Archive of layout-test-results from ews116 for mac-yosemite (916.29 KB, application/zip)
2016-02-16 14:25 PST, Build Bot
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Lucas Forschler 2013-09-30 10:28:29 PDT
A recent change introduced a ton of leaks.  Our leaks bot was generating mega huge log files.   The act of transferring these log files caused the buildbot master to consume over 7TB of memory, and then crashed.
We need to find some way to limit log size on the build slaves, or limiting the size of the log files the master will accept.
Comment 1 Alexey Proskuryakov 2013-10-16 11:08:43 PDT
<rdar://problem/15242844>
Comment 2 Lucas Forschler 2013-10-21 15:09:49 PDT
I believe this is the stdio of the leaks process that is consuming so much memory.  buildbot streams the stdio from the slave back to the master.  Evidently (maybe?) buildbot does not write this to disk until it is finished.

We don't need this data, so we can hopefully redirect it to /dev/null.   
We should verify we still get the results in the leaks viewer.
Comment 3 Dana Burkart 2016-02-11 12:53:03 PST
Created attachment 271073 [details]
Patch

This patch instructs buildbot to throw away stderr and stdout, instead of transferring that output to the master.
Comment 4 Dean Johnson 2016-02-11 13:20:03 PST
Comment on attachment 271073 [details]
Patch

LGTM!
Comment 5 Dana Burkart 2016-02-11 13:46:25 PST
Committed r196436
http://trac.webkit.org/changeset/196436
Comment 6 Lucas Forschler 2016-02-11 14:04:01 PST
build master restarted.
Comment 7 Dana Burkart 2016-02-11 16:48:07 PST
Reopening because this didn't fix the problem.
Comment 8 Dana Burkart 2016-02-11 16:48:56 PST
Created attachment 271108 [details]
Patch take 2

I think the problem with the earlier patch is that these values need to be passed in via constructor so that they percolate to the ShellMixin class.
Comment 9 WebKit Commit Bot 2016-02-11 16:51:19 PST
Attachment 271108 [details] did not pass style-queue:


ERROR: Websites/test-results/ChangeLog:1:  ChangeLog entry has no bug number  [changelog/bugnumber] [5]
Total errors found: 1 in 4 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 10 Dana Burkart 2016-02-11 17:10:56 PST
(In reply to comment #9)
> Attachment 271108 [details] did not pass style-queue:
> 
> 
> ERROR: Websites/test-results/ChangeLog:1:  ChangeLog entry has no bug number
> [changelog/bugnumber] [5]
> Total errors found: 1 in 4 files
> 
> 
> If any of these errors are false positives, please file a bug against
> check-webkit-style.

I think this is because the diff came out looking weird, which must confuse check-webkit-style.
Comment 11 Michael Catanzaro 2016-02-15 13:21:00 PST
Comment on attachment 271108 [details]
Patch take 2

Dana, your changelog entries are messed up, please make sure that your patch modifies only the top of the Tools/ChangeLog file, and also fill in the Websites/test-results/ChangeLog file.Dana, your changelog entries are messed up, please make sure that your patch modifies only the top of the Tools/ChangeLog file, and also fill in the Websites/test-results/ChangeLog file.
Comment 12 Dana Burkart 2016-02-15 14:02:00 PST
(In reply to comment #11)
> Comment on attachment 271108 [details]
> Patch take 2
> 
> Dana, your changelog entries are messed up, please make sure that your patch
> modifies only the top of the Tools/ChangeLog file, and also fill in the
> Websites/test-results/ChangeLog file.Dana, your changelog entries are messed
> up, please make sure that your patch modifies only the top of the
> Tools/ChangeLog file, and also fill in the Websites/test-results/ChangeLog
> file.

The first ChangeLog file is not messed up; it happened because the next entry down simply has the same text as the newest entry (as it was the first attempt at this patch, which didn't work). I will remove the changed files in Websites/test-results, as that was a mistaken commit, and post a new patch.
Comment 13 Dana Burkart 2016-02-16 12:51:50 PST
Created attachment 271472 [details]
Patch take 3

Fixes up the patch to remove changes to Websites/test-results directory.
Comment 14 Build Bot 2016-02-16 14:25:18 PST
Comment on attachment 271472 [details]
Patch take 3

Attachment 271472 [details] did not pass mac-debug-ews (mac):
Output: http://webkit-queues.webkit.org/results/841801

New failing tests:
js/regress/Float64Array-to-Int16Array-set.html
js/regress/Float32Array-to-Float64Array-set.html
Comment 15 Build Bot 2016-02-16 14:25:21 PST
Created attachment 271483 [details]
Archive of layout-test-results from ews116 for mac-yosemite

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews116  Port: mac-yosemite  Platform: Mac OS X 10.10.5
Comment 16 Daniel Bates 2016-04-23 12:06:21 PDT
Comment on attachment 271472 [details]
Patch take 3

OK.
Comment 17 Daniel Bates 2016-04-23 12:17:10 PDT
(In reply to comment #16)
> Comment on attachment 271472 [details]
> Patch take 3
> 
> OK.

This is OK as a temporary fix assuming it still happens and Lucas Forschler's remark that the stdout/stderr output is not useful is accurate. Does this issue still happen? I mean this issue was first reported in 2013 and this patch (attachment #271472 [details]) was posted over two months ago (02/16). In the long term we should look to fix the leaks bot so that it does not write 7TB of output. Why is it writing so much data? If this data is not important then can we teach the bot to discard it on its side instead of sending it to the Buildbot master to discard.

If you chose to land this patch (attachment #271472 [details]) then please file a follow up bug to investigate and address the issue about the leaks bot writing large amounts of output to stdout and stderr.