Bug 68748
Summary: | Layout test result uploading failing for many bots | ||
---|---|---|---|
Product: | WebKit | Reporter: | Mihai Parparita <mihaip> |
Component: | Tools / Tests | Assignee: | Nobody <webkit-unassigned> |
Status: | NEW | ||
Severity: | Normal | CC: | abarth, bulach, tony |
Priority: | P2 | ||
Version: | 528+ (Nightly build) | ||
Hardware: | Unspecified | ||
OS: | Unspecified |
Mihai Parparita
We seem to be getting 500s at the end of many test result runs. Sample output:
2011-09-23 16:44:16,152 21016 json_results_generator.py:312 INFO Uploading JSON files for builder: Webkit Mac10.6 (dbg)
2011-09-23 16:44:47,662 21016 networktransaction.py:60 WARNING Received HTTP status 500 loading "http://test-results.appspot.com/testfile/upload". Retrying in 10 seconds...
2011-09-23 16:45:29,422 21016 networktransaction.py:60 WARNING Received HTTP status 500 loading "http://test-results.appspot.com/testfile/upload". Retrying in 15.0 seconds...
2011-09-23 16:46:16,218 21016 networktransaction.py:60 WARNING Received HTTP status 500 loading "http://test-results.appspot.com/testfile/upload". Retrying in 22.5 seconds...
2011-09-23 16:47:11,776 21016 networktransaction.py:60 WARNING Received HTTP status 500 loading "http://test-results.appspot.com/testfile/upload". Retrying in 33.75 seconds...
You can see the last upload time for bots at:
http://test-results.appspot.com/testfile?master=ChromiumWebkit&name=results.json&testtype=layout-tests
http://test-results.appspot.com/testfile?master=webkit.org&name=results.json&testtype=layout-tests
On the chromium.org builders, "Webkit Mac10.6 (dbg)" hasn't updated since yestersday, and "Webkit Linux (dbg)(2)" for about a week. On the webkit.org builders, "Chromium Mac Release (Tests)" hasn't successfully uploaded for a month
Attachments | ||
---|---|---|
Add attachment proposed patch, testcase, etc. |
Adam Barth
According to the dashboard:
URI Count % Errors (last 10 hrs)
/testfile/upload 590 30%
So, 30% of the uploads have errored out in the last 24 hours.
Adam Barth
Here's the log from the most recent failure:
2011-09-24 09:32:52.366 /testfile/upload 500 29784ms 24282cpu_ms 85api_cpu_ms 0kb Python-urllib/2.6
74.125.248.75 - - [24/Sep/2011:09:32:52 -0700] "POST /testfile/upload HTTP/1.1" 500 0 - "Python-urllib/2.6" "test-results.appspot.com" ms=29784 cpu_ms=24282 api_cpu_ms=85 cpm_usd=0.736147 exit_code=104 throttle_code=1 instance=00c61b117cbffe71c764fc9e1064e00eb235
D 2011-09-24 09:32:22.704
Processing upload request, master: ChromiumWebkit, builder: Webkit Linux (deps)(dbg)(2), test_type: layout-tests.
D 2011-09-24 09:32:22.704
test: [FieldStorage(u'file', u'incremental_results.json'), FieldStorage(u'file', u'full_results.json'), FieldStorage(u'file', u'times_ms.json')], type:<type 'list'>
I 2011-09-24 09:32:22.714
Loading data for key: agx0ZXN0LXJlc3VsdHNyEQsSCURhdGFFbnRyeRjisjgM.
I 2011-09-24 09:32:22.751
Loading data for key: agx0ZXN0LXJlc3VsdHNyEgsSCURhdGFFbnRyeRjuwqUCDA.
I 2011-09-24 09:32:22.758
Loading incremental json...
I 2011-09-24 09:32:23.905
Checking incremental json...
I 2011-09-24 09:32:23.939
Loading existing aggregated json...
I 2011-09-24 09:32:29.428
Checking existing aggregated json...
I 2011-09-24 09:32:29.428
Merging json results...
D 2011-09-24 09:32:29.428
Merging build 8988, incremental json index: 0.
W 2011-09-24 09:32:29.428
Duplicate build 8988 in incremental json
I 2011-09-24 09:32:29.458
Update failed, master: ChromiumWebkit, builder: Webkit Linux (deps)(dbg)(2), test_type: layout-tests, name: results-small.json.
I 2011-09-24 09:32:29.467
Loading data for key: agx0ZXN0LXJlc3VsdHNyEQsSCURhdGFFbnRyeRiN9jkM.
I 2011-09-24 09:32:29.534
Loading data for key: agx0ZXN0LXJlc3VsdHNyEQsSCURhdGFFbnRyeRj6lT8M.
I 2011-09-24 09:32:29.621
Loading data for key: agx0ZXN0LXJlc3VsdHNyEgsSCURhdGFFbnRyeRjpgP8BDA.
I 2011-09-24 09:32:29.662
Loading incremental json...
I 2011-09-24 09:32:30.725
Checking incremental json...
I 2011-09-24 09:32:30.752
Loading existing aggregated json...
I 2011-09-24 09:32:43.304
Checking existing aggregated json...
I 2011-09-24 09:32:43.304
Merging json results...
D 2011-09-24 09:32:43.304
Merging build 8988, incremental json index: 0.
E 2011-09-24 09:32:52.350
<class 'google.appengine.runtime.DeadlineExceededError'>:
Traceback (most recent call last):
File "/base/data/home/apps/test-results/r95566.353408741660940008/main.py", line 51, in main
run_wsgi_app(application)
File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/webapp/util.py", line 98, in run_wsgi_app
run_bare_wsgi_app(add_wsgi_middleware(application))
File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/webapp/util.py", line 116, in run_bare_wsgi_app
result = application(env, _start_response)
File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/webapp/_webapp25.py", line 703, in __call__
handler.post(*groups)
File "/base/data/home/apps/test-results/r95566.353408741660940008/handlers/testfilehandler.py", line 234, in post
update_succeeded = JsonResults.update(master, builder, test_type, file.value)
File "/base/data/home/apps/test-results/r95566.353408741660940008/model/jsonresults.py", line 290, in update
large_file_updated = cls.update_file(master, builder, test_type, incremental, JSON_RESULTS_FILE, JSON_RESULTS_MAX_BUILDS)
File "/base/data/home/apps/test-results/r95566.353408741660940008/model/jsonresults.py", line 299, in update_file
new_results = cls.merge(builder, file.data, incremental, num_runs)
File "/base/data/home/apps/test-results/r95566.353408741660940008/model/jsonresults.py", line 285, in merge
return cls._generate_file_data(aggregated_json, sort_keys)
File "/base/data/home/apps/test-results/r95566.353408741660940008/model/jsonresults.py", line 63, in _generate_file_data
data = simplejson.dumps(json, separators=(',', ':'), sort_keys=sort_keys)
File "/base/python_runtime/python_lib/versions/1/simplejson/__init__.py", line 272, in dumps
use_decimal=use_decimal, **kw).encode(obj)
File "/base/python_runtime/python_lib/versions/1/simplejson/encoder.py", line 216, in encode
chunks = list(chunks)
File "/base/python_runtime/python_lib/versions/1/simplejson/encoder.py", line 485, in _iterencode
for chunk in _iterencode_dict(o, _current_indent_level):
File "/base/python_runtime/python_lib/versions/1/simplejson/encoder.py", line 459, in _iterencode_dict
for chunk in chunks:
File "/base/python_runtime/python_lib/versions/1/simplejson/encoder.py", line 459, in _iterencode_dict
for chunk in chunks:
File "/base/python_runtime/python_lib/versions/1/simplejson/encoder.py", line 459, in _iterencode_dict
for chunk in chunks:
File "/base/python_runtime/python_lib/versions/1/simplejson/encoder.py", line 459, in _iterencode_dict
for chunk in chunks:
File "/base/python_runtime/python_lib/versions/1/simplejson/encoder.py", line 366, in _iterencode_list
elif isinstance(value, (int, long)):
C 2011-09-24 09:32:52.351
Exceeded soft process size limit with 188.992 MB after servicing 290 requests total
W 2011-09-24 09:32:52.352
A serious problem was encountered with the process that handled this request, causing it to exit. This is likely to cause a new process to be used for the next request to your application. If you see this message frequently, you may be throwing exceptions during the initialization of your application. (Error code 104)
Mihai Parparita
Would it be possible to move file uploads to use the Task Queue API (http://code.google.com/appengine/docs/python/taskqueue/) so that it's under tight deadline constraints?
Alternatively, can we delete some of the old data for these bots. I'm not that familiar with how it's stored, so I don't know if it's possible to just drop really old data. Most likely for bots that are >1 month old, just blowing it all away should be OK.
+Marcus, since he might run into this when he starts non-PST gardening on Monday.
Adam Barth
Data older than a month can certainly be dropped (if not data older than a week). I don't know much about the Task Queue API, but I can learn.
The main question is who's going to own this issue. It's probably either going to be Ojan or me. It's better for one person to dive into the details here than to duplicate effort.
Adam Barth
After https://bugs.webkit.org/show_bug.cgi?id=69019 lands, we can revert my changes to the results server to keep old full_results.json files. Hopefully that will make app run well because it won't need to keep nearly as much data around.