97643 – Flakiness Dashboard server OOMs when the results.json gets too large

NEW 97643

https://bugs.webkit.org/show_bug.cgi?id=97643

Dominik Röttsches (drott)

Reported 2012-09-26 01:18:51 PDT

The flakiness dashboard does not accept results from our WebKit 2 EFL bot: http://build.webkit.org/builders/EFL%20Linux%2064-bit%20Debug%20WK2

At the end of each build, in the uploading step, we see:

00:44:46.921 6045 Uploading JSON files for builder: EFL Linux 64-bit Debug WK2
00:45:37.191 6045 Received HTTP status 500 loading "http://test-results.appspot.com/testfile/upload". Retrying in 10 seconds...

Ojan Vafai

Comment 1 2012-09-26 11:55:35 PDT

I've fixed the glitch. I deleted the "show all runs" data for this bot. Deleting the data for the bot isn't a big deal since we only keep the last 500 runs anyway; it's just a temporary data loss.

This is a long-standing bug: when the accumulated data in the results.json gets too large, the Python server runs out of memory trying to parse it. We delete runs older than 500 and we delete entries that have only passed or been skipped in the past 500 runs, so the results.json is usually self-pruning and we don't hit this. But if too many different tests fail in the past 500 runs, we get stuck here.

There are a couple of proposed solutions, but no one has had the time to implement them:
1. Move over to using AppEngine Backend servers: https://developers.google.com/appengine/docs/python/backends/overview
2. Use a TaskQueue to do the JSON merging: https://developers.google.com/appengine/docs/python/taskqueue/overview
3. Chunk the JSON we store every 100 runs and make the dashboard UI load 100-run chunks at a time. This would solve the memory problem and would have the added benefit that we wouldn't have to delete data older than 500 runs.

Due to http://code.google.com/p/googleappengine/issues/detail?id=7973 we can't get the error logs to show us which builders are having this problem. :(
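For illustration only, the pruning rule and the chunking idea from proposal 3 could look roughly like the sketch below. This is not the dashboard server's actual code. The data layout (a dict mapping each test name to a list of per-run result strings, newest first), the result labels "PASS" and "SKIP", and the function names `prune` and `chunk` are all assumptions made for this example; only the limits of 500 runs and 100 runs per chunk come from the comment above.

```python
MAX_RUNS = 500    # retention limit mentioned in the comment above
CHUNK_SIZE = 100  # chunk size suggested in proposal 3

# Hypothetical result labels; the real results.json encoding may differ.
UNINTERESTING = {"PASS", "SKIP"}


def prune(results, max_runs=MAX_RUNS):
    """Keep only the newest max_runs runs per test, then drop tests
    whose remaining runs all passed or were skipped."""
    pruned = {}
    for test, runs in results.items():
        recent = runs[:max_runs]  # assumes newest run is first
        if any(r not in UNINTERESTING for r in recent):
            pruned[test] = recent
    return pruned


def chunk(results, chunk_size=CHUNK_SIZE):
    """Split the per-test histories into a list of chunks, each covering
    chunk_size consecutive runs (newest chunk first)."""
    total = max((len(runs) for runs in results.values()), default=0)
    chunks = []
    for start in range(0, total, chunk_size):
        part = {}
        for test, runs in results.items():
            window = runs[start:start + chunk_size]
            if window:
                part[test] = window
        chunks.append(part)
    return chunks
```

With chunks stored separately, the server would only need to parse and merge into the newest chunk on each upload, which is why this approach would bound memory use regardless of how much history is kept.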

Ojan Vafai

Comment 2 2012-09-26 13:17:16 PDT

This is also affecting the Content Shell Chromium bots. I haven't deleted their results.json files since there are so many failures it will just start happening again. Peter, also a heads up in case you start seeing this with the Android bots.

jochen

Comment 3 2012-09-27 00:37:48 PDT

(In reply to comment #2)
> This is also affecting the Content Shell Chromium bots. I haven't deleted their results.json files since there are so many failures it will just start happening again.

of course the real fix is to make the bots not fail that much

> Peter, also a heads up in case you start seeing this with the Android bots.

Dominik Röttsches (drott)

Comment 4 2012-10-01 01:43:09 PDT

(In reply to comment #1)
> I've fixed the glitch.
> I deleted the "show all runs" data for this bot.

Thanks a lot!

Ojan Vafai

Comment 5 2012-10-12 11:32:29 PDT

*** Bug 75499 has been marked as a duplicate of this bug. ***

Status NEW

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware Unspecified

OS Unspecified

Product WebKit

Component Tools / Tests

Assignee Nobody

Reported 2012-09-26 01:18 PDT

Modified 2017-07-18 08:27 PDT

CC List 6 users

Duplicates (1) 75499