Summary: | Flakiness Dashboard server OOMs when the results.json gets too large | | |
---|---|---|---|
Product: | WebKit | Reporter: | Dominik Röttsches (drott) <d-r> |
Component: | Tools / Tests | Assignee: | Nobody <webkit-unassigned> |
Status: | NEW --- | | |
Severity: | Normal | CC: | abarth, dpranke, jochen, jparent, peter, rakuco |
Priority: | P2 | | |
Version: | 528+ (Nightly build) | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Description
Dominik Röttsches (drott)
2012-09-26 01:18:51 PDT
I've fixed the glitch. I deleted the "show all runs" data for this bot. Deleting the data for the bot isn't a big deal since we only keep the last 500 runs anyway; it's just a temporary data loss.

This is a long-standing bug: when the accumulated data in the results.json gets too large, the Python server runs out of memory trying to parse it. We delete runs older than 500 and we delete entries that have only passed or been skipped in the past 500 runs. So the results.json is usually self-pruning and we don't hit this. But if too many different tests fail in the past 500 runs, we get stuck here.

There are a couple of proposed solutions, but no one has had the time to implement them:
1. Move over to using AppEngine Backend servers: https://developers.google.com/appengine/docs/python/backends/overview
2. Use a TaskQueue to do the JSON merging: https://developers.google.com/appengine/docs/python/taskqueue/overview
3. Chunk the JSON we store every 100 runs and make the dashboard UI load 100-run chunks at a time. This would solve the memory problem and would also mean we don't have to delete data older than 500 runs.

Due to http://code.google.com/p/googleappengine/issues/detail?id=7973 we can't get the error logs to show us which builders are having this problem. :(

This is also affecting the Content Shell Chromium bots. I haven't deleted their results.json files since there are so many failures it will just start happening again.

Peter, also a heads up in case you start seeing this with the Android bots.

(In reply to comment #2)
> This is also affecting the Content Shell Chromium bots. I haven't deleted their results.json files since there are so many failures it will just start happening again.

Of course, the real fix is to make the bots not fail that much.

>
> Peter, also a heads up in case you start seeing this with the Android bots.

(In reply to comment #1)
> I've fixed the glitch.
> I deleted the "show all runs" data for this bot.
Thanks a lot!
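
For reference, the pruning rule described in this thread (keep the last 500 runs, drop tests that have only passed or been skipped) and the chunking idea from proposal 3 could look roughly like the sketch below. This is not the dashboard's actual code: the data layout (a dict mapping test name to a list of result strings, newest run first) and the names `prune`, `chunk`, and `BORING` are assumptions made up for illustration, and the real results.json format differs.

```python
MAX_RUNS = 500    # retention limit mentioned in the thread
CHUNK_SIZE = 100  # chunk size from proposal 3

# Hypothetical layout (NOT the real results.json format):
# {test_name: [result, ...]}, newest run first, where each result
# is a string such as "PASS", "SKIP", or "FAIL".
BORING = {"PASS", "SKIP"}


def prune(results, max_runs=MAX_RUNS):
    """Keep only the newest max_runs runs, then drop tests that
    only passed or were skipped within that window."""
    pruned = {}
    for test, runs in results.items():
        kept = runs[:max_runs]
        if any(r not in BORING for r in kept):
            pruned[test] = kept
    return pruned


def chunk(results, chunk_size=CHUNK_SIZE):
    """Split the full history into chunks of chunk_size runs, newest first.

    Each chunk only lists tests with an interesting result inside that
    chunk, so no single chunk has to hold the whole history.
    """
    total = max((len(runs) for runs in results.values()), default=0)
    chunks = []
    for start in range(0, total, chunk_size):
        part = {}
        for test, runs in results.items():
            window = runs[start:start + chunk_size]
            if any(r not in BORING for r in window):
                part[test] = window
        chunks.append(part)
    return chunks
```

With chunking, the server would only ever need to parse one 100-run chunk at a time, which is why that proposal also removes the need for the 500-run cutoff.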