We seem to be getting 500s at the end of many test result runs. Sample output:

2011-09-23 16:44:16,152 21016 json_results_generator.py:312 INFO Uploading JSON files for builder: Webkit Mac10.6 (dbg)
2011-09-23 16:44:47,662 21016 networktransaction.py:60 WARNING Received HTTP status 500 loading "http://test-results.appspot.com/testfile/upload". Retrying in 10 seconds...
2011-09-23 16:45:29,422 21016 networktransaction.py:60 WARNING Received HTTP status 500 loading "http://test-results.appspot.com/testfile/upload". Retrying in 15.0 seconds...
2011-09-23 16:46:16,218 21016 networktransaction.py:60 WARNING Received HTTP status 500 loading "http://test-results.appspot.com/testfile/upload". Retrying in 22.5 seconds...
2011-09-23 16:47:11,776 21016 networktransaction.py:60 WARNING Received HTTP status 500 loading "http://test-results.appspot.com/testfile/upload". Retrying in 33.75 seconds...

You can see the last upload time for the bots at:
http://test-results.appspot.com/testfile?master=ChromiumWebkit&name=results.json&testtype=layout-tests
http://test-results.appspot.com/testfile?master=webkit.org&name=results.json&testtype=layout-tests

On the chromium.org builders, "Webkit Mac10.6 (dbg)" hasn't updated since yesterday, and "Webkit Linux (dbg)(2)" hasn't updated for about a week. On the webkit.org builders, "Chromium Mac Release (Tests)" hasn't successfully uploaded for a month.
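For reference, the retry gaps in the log above (10 s, 15.0 s, 22.5 s, 33.75 s) are consistent with exponential backoff using a 1.5x multiplier. A minimal sketch, assuming that's what networktransaction.py is doing (I haven't checked its actual code):

```python
def retry_delays(initial=10.0, factor=1.5, retries=4):
    """Return the successive retry delays, in seconds, for an
    exponential backoff starting at `initial` and growing by `factor`."""
    delays = []
    delay = initial
    for _ in range(retries):
        delays.append(delay)
        delay *= factor
    return delays

print(retry_delays())  # [10.0, 15.0, 22.5, 33.75] -- matches the log's gaps
```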
According to the dashboard:

URI               Count   % Errors (last 10 hrs)
/testfile/upload  590     30%

So, 30% of the uploads have errored out in the last 10 hours.
Here's the log from the most recent failure:

2011-09-24 09:32:52.366 /testfile/upload 500 29784ms 24282cpu_ms 85api_cpu_ms 0kb Python-urllib/2.6
74.125.248.75 - - [24/Sep/2011:09:32:52 -0700] "POST /testfile/upload HTTP/1.1" 500 0 - "Python-urllib/2.6" "test-results.appspot.com" ms=29784 cpu_ms=24282 api_cpu_ms=85 cpm_usd=0.736147 exit_code=104 throttle_code=1 instance=00c61b117cbffe71c764fc9e1064e00eb235
D 2011-09-24 09:32:22.704 Processing upload request, master: ChromiumWebkit, builder: Webkit Linux (deps)(dbg)(2), test_type: layout-tests.
D 2011-09-24 09:32:22.704 test: [FieldStorage(u'file', u'incremental_results.json'), FieldStorage(u'file', u'full_results.json'), FieldStorage(u'file', u'times_ms.json')], type: <type 'list'>
I 2011-09-24 09:32:22.714 Loading data for key: agx0ZXN0LXJlc3VsdHNyEQsSCURhdGFFbnRyeRjisjgM.
I 2011-09-24 09:32:22.751 Loading data for key: agx0ZXN0LXJlc3VsdHNyEgsSCURhdGFFbnRyeRjuwqUCDA.
I 2011-09-24 09:32:22.758 Loading incremental json...
I 2011-09-24 09:32:23.905 Checking incremental json...
I 2011-09-24 09:32:23.939 Loading existing aggregated json...
I 2011-09-24 09:32:29.428 Checking existing aggregated json...
I 2011-09-24 09:32:29.428 Merging json results...
D 2011-09-24 09:32:29.428 Merging build 8988, incremental json index: 0.
W 2011-09-24 09:32:29.428 Duplicate build 8988 in incremental json
I 2011-09-24 09:32:29.458 Update failed, master: ChromiumWebkit, builder: Webkit Linux (deps)(dbg)(2), test_type: layout-tests, name: results-small.json.
I 2011-09-24 09:32:29.467 Loading data for key: agx0ZXN0LXJlc3VsdHNyEQsSCURhdGFFbnRyeRiN9jkM.
I 2011-09-24 09:32:29.534 Loading data for key: agx0ZXN0LXJlc3VsdHNyEQsSCURhdGFFbnRyeRj6lT8M.
I 2011-09-24 09:32:29.621 Loading data for key: agx0ZXN0LXJlc3VsdHNyEgsSCURhdGFFbnRyeRjpgP8BDA.
I 2011-09-24 09:32:29.662 Loading incremental json...
I 2011-09-24 09:32:30.725 Checking incremental json...
I 2011-09-24 09:32:30.752 Loading existing aggregated json...
I 2011-09-24 09:32:43.304 Checking existing aggregated json...
I 2011-09-24 09:32:43.304 Merging json results...
D 2011-09-24 09:32:43.304 Merging build 8988, incremental json index: 0.
E 2011-09-24 09:32:52.350 <class 'google.appengine.runtime.DeadlineExceededError'>:
Traceback (most recent call last):
  File "/base/data/home/apps/test-results/r95566.353408741660940008/main.py", line 51, in main
    run_wsgi_app(application)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/webapp/util.py", line 98, in run_wsgi_app
    run_bare_wsgi_app(add_wsgi_middleware(application))
  File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/webapp/util.py", line 116, in run_bare_wsgi_app
    result = application(env, _start_response)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/webapp/_webapp25.py", line 703, in __call__
    handler.post(*groups)
  File "/base/data/home/apps/test-results/r95566.353408741660940008/handlers/testfilehandler.py", line 234, in post
    update_succeeded = JsonResults.update(master, builder, test_type, file.value)
  File "/base/data/home/apps/test-results/r95566.353408741660940008/model/jsonresults.py", line 290, in update
    large_file_updated = cls.update_file(master, builder, test_type, incremental, JSON_RESULTS_FILE, JSON_RESULTS_MAX_BUILDS)
  File "/base/data/home/apps/test-results/r95566.353408741660940008/model/jsonresults.py", line 299, in update_file
    new_results = cls.merge(builder, file.data, incremental, num_runs)
  File "/base/data/home/apps/test-results/r95566.353408741660940008/model/jsonresults.py", line 285, in merge
    return cls._generate_file_data(aggregated_json, sort_keys)
  File "/base/data/home/apps/test-results/r95566.353408741660940008/model/jsonresults.py", line 63, in _generate_file_data
    data = simplejson.dumps(json, separators=(',', ':'), sort_keys=sort_keys)
  File "/base/python_runtime/python_lib/versions/1/simplejson/__init__.py", line 272, in dumps
    use_decimal=use_decimal, **kw).encode(obj)
  File "/base/python_runtime/python_lib/versions/1/simplejson/encoder.py", line 216, in encode
    chunks = list(chunks)
  File "/base/python_runtime/python_lib/versions/1/simplejson/encoder.py", line 485, in _iterencode
    for chunk in _iterencode_dict(o, _current_indent_level):
  File "/base/python_runtime/python_lib/versions/1/simplejson/encoder.py", line 459, in _iterencode_dict
    for chunk in chunks:
  File "/base/python_runtime/python_lib/versions/1/simplejson/encoder.py", line 459, in _iterencode_dict
    for chunk in chunks:
  File "/base/python_runtime/python_lib/versions/1/simplejson/encoder.py", line 459, in _iterencode_dict
    for chunk in chunks:
  File "/base/python_runtime/python_lib/versions/1/simplejson/encoder.py", line 459, in _iterencode_dict
    for chunk in chunks:
  File "/base/python_runtime/python_lib/versions/1/simplejson/encoder.py", line 366, in _iterencode_list
    elif isinstance(value, (int, long)):
C 2011-09-24 09:32:52.351 Exceeded soft process size limit with 188.992 MB after servicing 290 requests total
W 2011-09-24 09:32:52.352 A serious problem was encountered with the process that handled this request, causing it to exit. This is likely to cause a new process to be used for the next request to your application. If you see this message frequently, you may be throwing exceptions during the initialization of your application. (Error code 104)
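As an aside, the "Duplicate build 8988 in incremental json" warning suggests the server refuses to merge an incremental payload whose build number already appears in the aggregate. A rough sketch of that guard, with the data shapes assumed for illustration (the server's actual schema in jsonresults.py may differ):

```python
def merge_builds(aggregated, incremental):
    """Append incremental builds to the aggregate, refusing the whole merge
    if any incoming build number is already present (the 'Update failed'
    path in the log). Builds are assumed to be dicts with a 'number' key."""
    existing = set(build['number'] for build in aggregated)
    for build in incremental:
        if build['number'] in existing:
            return None  # duplicate build: reject rather than double-count
    return aggregated + incremental
```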
Would it be possible to move file uploads to use the Task Queue API (http://code.google.com/appengine/docs/python/taskqueue/) so that the processing isn't under such tight deadline constraints? Alternatively, can we delete some of the old data for these bots? I'm not that familiar with how it's stored, so I don't know if it's possible to just drop really old data. Most likely, for data that's more than a month old, just blowing it all away should be OK.

+Marcus, since he might run into this when he starts non-PST gardening on Monday.
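To illustrate the idea with an in-memory queue rather than real App Engine code (handler names and payloads here are made up): the upload handler would only enqueue the expensive merge and return immediately, and a separate task handler would run the merge under the task queue's much longer deadline instead of the ordinary request deadline we're blowing now:

```python
from collections import deque

task_queue = deque()  # stand-in for the App Engine task queue

def handle_upload(payload):
    """Return quickly: just enqueue the expensive merge for later."""
    task_queue.append(('merge', payload))
    return 200  # the bot sees success without waiting for the merge

def run_worker():
    """Drain the queue. On App Engine this would be a task handler,
    which gets its own (longer) deadline per task."""
    results = []
    while task_queue:
        kind, payload = task_queue.popleft()
        if kind == 'merge':
            results.append('merged:' + payload)
    return results
```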
Data older than a month can certainly be dropped (if not data older than a week). I don't know much about the Task Queue API, but I can learn. The main question is who's going to own this issue. It's probably either going to be Ojan or me. It's better for one person to dive into the details here than to duplicate effort.
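If we go the deletion route, the pruning itself is simple. A minimal sketch, assuming results are kept as (build_number, upload_timestamp) pairs purely for illustration (the actual storage layout may well differ):

```python
import time

THIRTY_DAYS = 30 * 24 * 60 * 60  # seconds

def prune_old_builds(builds, now=None):
    """Keep only (build_number, upload_timestamp) pairs from the last
    30 days; everything older gets dropped."""
    now = time.time() if now is None else now
    return [(num, ts) for (num, ts) in builds if now - ts <= THIRTY_DAYS]
```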
After https://bugs.webkit.org/show_bug.cgi?id=69019 lands, we can revert my changes that made the results server keep old full_results.json files. Hopefully that will make the app run well, since it won't need to keep nearly as much data around.