WebKit Bugzilla
NEW
97643
Flakiness Dashboard server OOMs when the results.json gets too large
https://bugs.webkit.org/show_bug.cgi?id=97643
Dominik Röttsches (drott)
Reported
2012-09-26 01:18:51 PDT
The flakiness dashboard does not accept results from our WebKit 2 EFL bot:
http://build.webkit.org/builders/EFL%20Linux%2064-bit%20Debug%20WK2
At the end of each build, in the uploading step, we see:
00:44:46.921 6045 Uploading JSON files for builder: EFL Linux 64-bit Debug WK2
00:45:37.191 6045 Received HTTP status 500 loading "http://test-results.appspot.com/testfile/upload". Retrying in 10 seconds...
Ojan Vafai
Comment 1
2012-09-26 11:55:35 PDT
I've fixed the glitch by deleting the "show all runs" data for this bot. Deleting the bot's data isn't a big deal since we only keep the last 500 runs anyway; it's just a temporary data loss.

This is a long-standing bug: when the accumulated data in results.json gets too large, the Python server runs out of memory trying to parse it. We delete runs older than 500, and we delete entries that have only passed or been skipped in the past 500 runs, so results.json is usually self-pruning and we don't hit this. But if too many different tests fail in the past 500 runs, we get stuck here.

There are a few proposed solutions, but no one has had the time to implement them:
1. Move over to using AppEngine Backend servers:
https://developers.google.com/appengine/docs/python/backends/overview
2. Use a TaskQueue to do the JSON merging
https://developers.google.com/appengine/docs/python/taskqueue/overview
3. Chunk the JSON we store every 100 runs and make the dashboard UI load 100-run chunks at a time. This would solve the memory problem and would also mean we no longer have to delete data older than 500 runs.

Due to http://code.google.com/p/googleappengine/issues/detail?id=7973 we can't get the error logs to show us which builders are having this problem. :(
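To make the self-pruning behavior and proposal 3 above more concrete, here is a minimal Python sketch. It is not the dashboard server's actual code: it assumes a simplified, hypothetical results.json layout in which each test maps to a list of one-letter results per run (newest first), with "P" assumed to mean pass and "X" skip. The names prune, chunk, MAX_RUNS, CHUNK_SIZE and PASS_OR_SKIP are made up for illustration.

```python
"""Sketch of results.json pruning and chunking (hypothetical, simplified layout).

Assumed layout (NOT the real results.json schema):
    {"tests": {"<test name>": ["P", "F", ...]}}
where each list holds one single-letter result per run, newest first.
"""

MAX_RUNS = 500             # runs older than the newest 500 are dropped
CHUNK_SIZE = 100           # proposal 3: store history in 100-run chunks
PASS_OR_SKIP = {"P", "X"}  # assumed codes: "P" = pass, "X" = skip


def prune(results: dict, max_runs: int = MAX_RUNS) -> dict:
    """Keep the newest max_runs runs per test and drop tests that only passed or were skipped."""
    pruned = {}
    for name, runs in results["tests"].items():
        recent = runs[:max_runs]  # newest first, so this drops the oldest runs
        # Only keep a test if it did something other than pass/skip in the window.
        if any(r not in PASS_OR_SKIP for r in recent):
            pruned[name] = recent
    return {"tests": pruned}


def chunk(results: dict, chunk_size: int = CHUNK_SIZE) -> list:
    """Split per-test histories into chunks of chunk_size runs, newest chunk first."""
    longest = max((len(r) for r in results["tests"].values()), default=0)
    chunks = []
    for start in range(0, longest, chunk_size):
        tests = {}
        for name, runs in results["tests"].items():
            window = runs[start:start + chunk_size]
            if window:  # tests with no runs in this range are omitted from the chunk
                tests[name] = window
        chunks.append({"tests": tests})
    return chunks
```

Note that prune bounds the number of runs kept per test, not the number of tests: if many different tests have failed at least once in the last 500 runs, the pruned dictionary stays large, which is the situation this bug describes. Chunking instead limits how much history any single request has to load and parse, and removes the need for the 500-run cutoff.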
Ojan Vafai
Comment 2
2012-09-26 13:17:16 PDT
This is also affecting the Content Shell Chromium bots. I haven't deleted their results.json files since there are so many failures it will just start happening again. Peter, also a heads up in case you start seeing this with the Android bots.
jochen
Comment 3
2012-09-27 00:37:48 PDT
(In reply to comment #2)
> This is also affecting the Content Shell Chromium bots. I haven't deleted their results.json files since there are so many failures it will just start happening again.
Of course, the real fix is to make the bots not fail that much.
> Peter, also a heads up in case you start seeing this with the Android bots.
Dominik Röttsches (drott)
Comment 4
2012-10-01 01:43:09 PDT
(In reply to comment #1)
> I've fixed the glitch.
> I deleted the "show all runs" data for this bot.
Thanks a lot!
Ojan Vafai
Comment 5
2012-10-12 11:32:29 PDT
*** Bug 75499 has been marked as a duplicate of this bug. ***