Bug 68748 - Layout test result uploading failing for many bots
Summary: Layout test result uploading failing for many bots
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-09-23 18:07 PDT by Mihai Parparita
Modified: 2017-07-18 08:30 PDT (History)
3 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Mihai Parparita 2011-09-23 18:07:37 PDT
We seem to be getting 500s at the end of many test result runs. Sample output:

2011-09-23 16:44:16,152 21016 json_results_generator.py:312 INFO Uploading JSON files for builder: Webkit Mac10.6 (dbg)
2011-09-23 16:44:47,662 21016 networktransaction.py:60 WARNING Received HTTP status 500 loading "http://test-results.appspot.com/testfile/upload".  Retrying in 10 seconds...
2011-09-23 16:45:29,422 21016 networktransaction.py:60 WARNING Received HTTP status 500 loading "http://test-results.appspot.com/testfile/upload".  Retrying in 15.0 seconds...
2011-09-23 16:46:16,218 21016 networktransaction.py:60 WARNING Received HTTP status 500 loading "http://test-results.appspot.com/testfile/upload".  Retrying in 22.5 seconds...
2011-09-23 16:47:11,776 21016 networktransaction.py:60 WARNING Received HTTP status 500 loading "http://test-results.appspot.com/testfile/upload".  Retrying in 33.75 seconds...

You can see the last upload time for bots at:
http://test-results.appspot.com/testfile?master=ChromiumWebkit&name=results.json&testtype=layout-tests
http://test-results.appspot.com/testfile?master=webkit.org&name=results.json&testtype=layout-tests

On the chromium.org builders, "Webkit Mac10.6 (dbg)" hasn't updated since yestersday, and "Webkit Linux (dbg)(2)" for about a week. On the webkit.org builders, "Chromium Mac Release (Tests)" hasn't successfully uploaded for a month
Comment 1 Adam Barth 2011-09-24 09:33:30 PDT
According to the dashboard:

URI                      Count        % Errors (last 10 hrs)
/testfile/upload    590            30%

So, 30% of the uploads have errored out in the last 24 hours.
Comment 2 Adam Barth 2011-09-24 09:34:47 PDT
Here's the log from the most recent failure:

2011-09-24 09:32:52.366 /testfile/upload 500 29784ms 24282cpu_ms 85api_cpu_ms 0kb Python-urllib/2.6
74.125.248.75 - - [24/Sep/2011:09:32:52 -0700] "POST /testfile/upload HTTP/1.1" 500 0 - "Python-urllib/2.6" "test-results.appspot.com" ms=29784 cpu_ms=24282 api_cpu_ms=85 cpm_usd=0.736147 exit_code=104 throttle_code=1 instance=00c61b117cbffe71c764fc9e1064e00eb235
D 2011-09-24 09:32:22.704
Processing upload request, master: ChromiumWebkit, builder: Webkit Linux (deps)(dbg)(2), test_type: layout-tests.
D 2011-09-24 09:32:22.704
test: [FieldStorage(u'file', u'incremental_results.json'), FieldStorage(u'file', u'full_results.json'), FieldStorage(u'file', u'times_ms.json')], type:<type 'list'>
I 2011-09-24 09:32:22.714
Loading data for key: agx0ZXN0LXJlc3VsdHNyEQsSCURhdGFFbnRyeRjisjgM.
I 2011-09-24 09:32:22.751
Loading data for key: agx0ZXN0LXJlc3VsdHNyEgsSCURhdGFFbnRyeRjuwqUCDA.
I 2011-09-24 09:32:22.758
Loading incremental json...
I 2011-09-24 09:32:23.905
Checking incremental json...
I 2011-09-24 09:32:23.939
Loading existing aggregated json...
I 2011-09-24 09:32:29.428
Checking existing aggregated json...
I 2011-09-24 09:32:29.428
Merging json results...
D 2011-09-24 09:32:29.428
Merging build 8988, incremental json index: 0.
W 2011-09-24 09:32:29.428
Duplicate build 8988 in incremental json
I 2011-09-24 09:32:29.458
Update failed, master: ChromiumWebkit, builder: Webkit Linux (deps)(dbg)(2), test_type: layout-tests, name: results-small.json.
I 2011-09-24 09:32:29.467
Loading data for key: agx0ZXN0LXJlc3VsdHNyEQsSCURhdGFFbnRyeRiN9jkM.
I 2011-09-24 09:32:29.534
Loading data for key: agx0ZXN0LXJlc3VsdHNyEQsSCURhdGFFbnRyeRj6lT8M.
I 2011-09-24 09:32:29.621
Loading data for key: agx0ZXN0LXJlc3VsdHNyEgsSCURhdGFFbnRyeRjpgP8BDA.
I 2011-09-24 09:32:29.662
Loading incremental json...
I 2011-09-24 09:32:30.725
Checking incremental json...
I 2011-09-24 09:32:30.752
Loading existing aggregated json...
I 2011-09-24 09:32:43.304
Checking existing aggregated json...
I 2011-09-24 09:32:43.304
Merging json results...
D 2011-09-24 09:32:43.304
Merging build 8988, incremental json index: 0.
E 2011-09-24 09:32:52.350
<class 'google.appengine.runtime.DeadlineExceededError'>: 
Traceback (most recent call last):
  File "/base/data/home/apps/test-results/r95566.353408741660940008/main.py", line 51, in main
    run_wsgi_app(application)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/webapp/util.py", line 98, in run_wsgi_app
    run_bare_wsgi_app(add_wsgi_middleware(application))
  File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/webapp/util.py", line 116, in run_bare_wsgi_app
    result = application(env, _start_response)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/webapp/_webapp25.py", line 703, in __call__
    handler.post(*groups)
  File "/base/data/home/apps/test-results/r95566.353408741660940008/handlers/testfilehandler.py", line 234, in post
    update_succeeded = JsonResults.update(master, builder, test_type, file.value)
  File "/base/data/home/apps/test-results/r95566.353408741660940008/model/jsonresults.py", line 290, in update
    large_file_updated = cls.update_file(master, builder, test_type, incremental, JSON_RESULTS_FILE, JSON_RESULTS_MAX_BUILDS)
  File "/base/data/home/apps/test-results/r95566.353408741660940008/model/jsonresults.py", line 299, in update_file
    new_results = cls.merge(builder, file.data, incremental, num_runs)
  File "/base/data/home/apps/test-results/r95566.353408741660940008/model/jsonresults.py", line 285, in merge
    return cls._generate_file_data(aggregated_json, sort_keys)
  File "/base/data/home/apps/test-results/r95566.353408741660940008/model/jsonresults.py", line 63, in _generate_file_data
    data = simplejson.dumps(json, separators=(',', ':'), sort_keys=sort_keys)
  File "/base/python_runtime/python_lib/versions/1/simplejson/__init__.py", line 272, in dumps
    use_decimal=use_decimal, **kw).encode(obj)
  File "/base/python_runtime/python_lib/versions/1/simplejson/encoder.py", line 216, in encode
    chunks = list(chunks)
  File "/base/python_runtime/python_lib/versions/1/simplejson/encoder.py", line 485, in _iterencode
    for chunk in _iterencode_dict(o, _current_indent_level):
  File "/base/python_runtime/python_lib/versions/1/simplejson/encoder.py", line 459, in _iterencode_dict
    for chunk in chunks:
  File "/base/python_runtime/python_lib/versions/1/simplejson/encoder.py", line 459, in _iterencode_dict
    for chunk in chunks:
  File "/base/python_runtime/python_lib/versions/1/simplejson/encoder.py", line 459, in _iterencode_dict
    for chunk in chunks:
  File "/base/python_runtime/python_lib/versions/1/simplejson/encoder.py", line 459, in _iterencode_dict
    for chunk in chunks:
  File "/base/python_runtime/python_lib/versions/1/simplejson/encoder.py", line 366, in _iterencode_list
    elif isinstance(value, (int, long)):
C 2011-09-24 09:32:52.351
Exceeded soft process size limit with 188.992 MB after servicing 290 requests total
W 2011-09-24 09:32:52.352
A serious problem was encountered with the process that handled this request, causing it to exit. This is likely to cause a new process to be used for the next request to your application. If you see this message frequently, you may be throwing exceptions during the initialization of your application. (Error code 104)
Comment 3 Mihai Parparita 2011-09-24 16:05:10 PDT
Would it be possible to move file uploads to use the Task Queue API (http://code.google.com/appengine/docs/python/taskqueue/) so that it's under tight deadline constraints?

Alternatively, can we delete some of the old data for these bots. I'm not that familiar with how it's stored, so I don't know if it's possible to just drop really old data. Most likely for bots that are >1 month old, just blowing it all away should be OK.

+Marcus, since he might run into this when he starts non-PST gardening on Monday.
Comment 4 Adam Barth 2011-09-24 16:09:53 PDT
Data older than a month can certainly be dropped (if not data older than a week).  I don't know much about the Task Queue API, but I can learn.

The main question is who's going to own this issue.  It's probably either going to be Ojan or me.  It's better for one person to dive into the details here than to duplicate effort.
Comment 5 Adam Barth 2011-09-28 12:46:36 PDT
After https://bugs.webkit.org/show_bug.cgi?id=69019 lands, we can revert my changes to the results server to keep old full_results.json files.  Hopefully that will make app run well because it won't need to keep nearly as much data around.