Bug 107612 - Add monitoring of patches and queues to the QueueStatusServer
Summary: Add monitoring of patches and queues to the QueueStatusServer
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Alan Cutter
URL:
Keywords:
Depends on: 107775
Blocks:
  Show dependency treegraph
 
Reported: 2013-01-22 18:50 PST by Alan Cutter
Modified: 2013-01-23 21:33 PST (History)
4 users (show)

See Also:


Attachments
Patch (28.53 KB, patch)
2013-01-22 19:24 PST, Alan Cutter
no flags Details | Formatted Diff | Diff
Patch (28.54 KB, patch)
2013-01-22 21:05 PST, Alan Cutter
no flags Details | Formatted Diff | Diff
Patch (28.52 KB, patch)
2013-01-22 23:15 PST, Alan Cutter
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Alan Cutter 2013-01-22 18:50:32 PST
Currently it's very difficult to see how long the average patch runs for in the EWS queues or how fast a particular bot is performing compared to others. The QueueStatusServer should be recording data that can easily present this type of information.
Comment 1 Alan Cutter 2013-01-22 19:24:56 PST
Created attachment 184107 [details]
Patch
Comment 2 WebKit Review Bot 2013-01-22 19:49:19 PST
Attachment 184107 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Tools/ChangeLog', u'Tools/QueueStatusServe..." exit_code: 1
Tools/QueueStatusServer/model/queuelog.py:32:  No name 'appengine' in module 'google'  [pylint/E0611] [5]
Tools/QueueStatusServer/model/queuelog.py:53:  [QueueLog.get_current] Class 'QueueLog' has no 'get_or_insert' member  [pylint/E1101] [5]
Tools/QueueStatusServer/model/patchlog.py:31:  No name 'appengine' in module 'google'  [pylint/E0611] [5]
Tools/QueueStatusServer/model/patchlog.py:48:  [PatchLog.lookup] Class 'PatchLog' has no 'get_or_insert' member  [pylint/E1101] [5]
Tools/QueueStatusServer/handlers/updateworkitems.py:52:  [UpdateWorkItems._update_work_items_from_request] Instance of 'UpdateWorkItems' has no 'request' member  [pylint/E1101] [5]
Tools/QueueStatusServer/handlers/updateworkitems.py:55:  [UpdateWorkItems._update_work_items_from_request] Instance of 'UpdateWorkItems' has no 'response' member  [pylint/E1101] [5]
Tools/QueueStatusServer/handlers/updateworkitems.py:72:  [UpdateWorkItems.post] Instance of 'UpdateWorkItems' has no 'response' member  [pylint/E1101] [5]
Total errors found: 7 in 15 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 3 Adam Barth 2013-01-22 20:38:52 PST
Comment on attachment 184107 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=184107&action=review

> Tools/QueueStatusServer/handlers/updateworkitems.py:58
> +        work_items.date = datetime.now()

Why not utcnow?
Comment 4 Adam Barth 2013-01-22 20:39:30 PST
First of all, this is a great idea.  It's just going to take me a bit to remember this code well enough to review your patch.  :)
Comment 5 Alan Cutter 2013-01-22 21:05:05 PST
Created attachment 184126 [details]
Patch
Comment 6 Alan Cutter 2013-01-22 21:06:07 PST
(In reply to comment #3)
> (From update of attachment 184107 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=184107&action=review
> 
> > Tools/QueueStatusServer/handlers/updateworkitems.py:58
> > +        work_items.date = datetime.now()
> 
> Why not utcnow?

Ah! That one slipped me. Fixed in second attachment.
Comment 7 WebKit Review Bot 2013-01-22 21:10:51 PST
Attachment 184126 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Tools/ChangeLog', u'Tools/QueueStatusServe..." exit_code: 1
Tools/QueueStatusServer/model/queuelog.py:32:  No name 'appengine' in module 'google'  [pylint/E0611] [5]
Tools/QueueStatusServer/model/queuelog.py:53:  [QueueLog.get_current] Class 'QueueLog' has no 'get_or_insert' member  [pylint/E1101] [5]
Tools/QueueStatusServer/model/patchlog.py:31:  No name 'appengine' in module 'google'  [pylint/E0611] [5]
Tools/QueueStatusServer/model/patchlog.py:48:  [PatchLog.lookup] Class 'PatchLog' has no 'get_or_insert' member  [pylint/E1101] [5]
Tools/QueueStatusServer/handlers/updateworkitems.py:52:  [UpdateWorkItems._update_work_items_from_request] Instance of 'UpdateWorkItems' has no 'request' member  [pylint/E1101] [5]
Tools/QueueStatusServer/handlers/updateworkitems.py:55:  [UpdateWorkItems._update_work_items_from_request] Instance of 'UpdateWorkItems' has no 'response' member  [pylint/E1101] [5]
Tools/QueueStatusServer/handlers/updateworkitems.py:72:  [UpdateWorkItems.post] Instance of 'UpdateWorkItems' has no 'response' member  [pylint/E1101] [5]
Total errors found: 7 in 15 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 8 Adam Barth 2013-01-22 22:00:46 PST
Comment on attachment 184126 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=184126&action=review

This looks good.  I don't remember this code very well, so this review isn't based on any deep knowledge of the stuff you're changing.  ;)

> Tools/QueueStatusServer/loggers/recordpatchevent.py:42
> +        print patches_waiting

Do we want to call print in production?
Comment 9 Eric Seidel (no email) 2013-01-22 22:04:51 PST
Comment on attachment 184126 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=184126&action=review

> Tools/QueueStatusServer/handlers/updateworkitems.py:49
> +            return

return None is clearer IMO.

>> Tools/QueueStatusServer/handlers/updateworkitems.py:55
>> +            self.response.out.write("Failed to parse work items: %s" % items_string)
> 
> [UpdateWorkItems._update_work_items_from_request] Instance of 'UpdateWorkItems' has no 'response' member  [pylint/E1101] [5]

We need to figure out why pylint is so stupid and fix it to not be.

> Tools/QueueStatusServer/loggers/recordbotevent.py:35
> +    def activity(cls, queue_name, bot_id):

I might have given this a more "active" name. :)  like record_activity.

>> Tools/QueueStatusServer/loggers/recordpatchevent.py:42
>> +        print patches_waiting
> 
> Do we want to call print in production?

That's how you print to the page in simple GAE, but that may not be what alan meant to do here.

>> Tools/QueueStatusServer/model/patchlog.py:31
>> +from google.appengine.ext import db
> 
> No name 'appengine' in module 'google'  [pylint/E0611] [5]

Yeah, clearly we need to fix sys.path on the bot or something.  We probably just need to install the gae sdk on the bot, or make it auto-installed.

> Tools/QueueStatusServer/model/queuelog.py:41
> +    max_patches_waiting = db.IntegerProperty(default=0)

I'm surprised 0 isn't the default.  I guess None is?
Comment 10 Eric Seidel (no email) 2013-01-22 22:16:42 PST
I didn't mean to block this commit, btw.
Comment 11 Adam Barth 2013-01-22 22:30:59 PST
We probably should check with Alan about the print statement before landing the patch.
Comment 12 Alan Cutter 2013-01-22 23:15:43 PST
Created attachment 184154 [details]
Patch
Comment 13 WebKit Review Bot 2013-01-22 23:17:38 PST
Attachment 184154 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Tools/ChangeLog', u'Tools/QueueStatusServe..." exit_code: 1
Tools/QueueStatusServer/model/queuelog.py:32:  No name 'appengine' in module 'google'  [pylint/E0611] [5]
Tools/QueueStatusServer/model/queuelog.py:53:  [QueueLog.get_current] Class 'QueueLog' has no 'get_or_insert' member  [pylint/E1101] [5]
Tools/QueueStatusServer/model/patchlog.py:31:  No name 'appengine' in module 'google'  [pylint/E0611] [5]
Tools/QueueStatusServer/model/patchlog.py:48:  [PatchLog.lookup] Class 'PatchLog' has no 'get_or_insert' member  [pylint/E1101] [5]
Tools/QueueStatusServer/handlers/updateworkitems.py:52:  [UpdateWorkItems._update_work_items_from_request] Instance of 'UpdateWorkItems' has no 'request' member  [pylint/E1101] [5]
Tools/QueueStatusServer/handlers/updateworkitems.py:55:  [UpdateWorkItems._update_work_items_from_request] Instance of 'UpdateWorkItems' has no 'response' member  [pylint/E1101] [5]
Tools/QueueStatusServer/handlers/updateworkitems.py:72:  [UpdateWorkItems.post] Instance of 'UpdateWorkItems' has no 'response' member  [pylint/E1101] [5]
Total errors found: 7 in 15 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 14 Alan Cutter 2013-01-22 23:24:39 PST
(In reply to comment #9)
> (From update of attachment 184126 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=184126&action=review
> 
> > Tools/QueueStatusServer/handlers/updateworkitems.py:49
> > +            return
> 
> return None is clearer IMO.
> 

Added None to return.

> > Tools/QueueStatusServer/loggers/recordbotevent.py:35
> > +    def activity(cls, queue_name, bot_id):
> 
> I might have given this a more "active" name. :)  like record_activity.
> 

Changed name to record_activity.

> >> Tools/QueueStatusServer/loggers/recordpatchevent.py:42
> >> +        print patches_waiting
> > 
> > Do we want to call print in production?
> 
> That's how you print to the page in simple GAE, but that may not be what alan meant to do here.
> 

Indeed not, removed print.

> >> Tools/QueueStatusServer/model/patchlog.py:31
> >> +from google.appengine.ext import db
> > 
> > No name 'appengine' in module 'google'  [pylint/E0611] [5]
> 
> Yeah, clearly we need to fix sys.path on the bot or something.  We probably just need to install the gae sdk on the bot, or make it auto-installed.
> 

That is a good idea! We might be able to get some automated tests going for QSS. (:

> > Tools/QueueStatusServer/model/queuelog.py:41
> > +    max_patches_waiting = db.IntegerProperty(default=0)
> 
> I'm surprised 0 isn't the default.  I guess None is?

None is the default for primitive values. The GAE docs aren't very clear on that detail.
Comment 15 WebKit Review Bot 2013-01-22 23:37:38 PST
Comment on attachment 184154 [details]
Patch

Clearing flags on attachment: 184154

Committed r140513: <http://trac.webkit.org/changeset/140513>
Comment 16 WebKit Review Bot 2013-01-22 23:37:42 PST
All reviewed patches have been landed.  Closing bug.
Comment 17 Csaba Osztrogonác 2013-01-23 07:11:09 PST
(In reply to comment #15)
> (From update of attachment 184154 [details])
> Clearing flags on attachment: 184154
> 
> Committed r140513: <http://trac.webkit.org/changeset/140513>

It seems it broke all EWS bots somehow. For example Qt EWS bots 
are stucked in the following exception when they try to fetch http://queues.webkit.org/next-patch/qt-ews:

Traceback (most recent call last):
  File "/storage/WebKit-qt-ews/Tools/Scripts/webkitpy/tool/bot/queueengine.py", line 96, in run
    work_item = self._delegate.next_work_item()
  File "/storage/WebKit-qt-ews/Tools/Scripts/webkitpy/tool/commands/queues.py", line 419, in next_work_item
    return self._next_patch()
  File "/storage/WebKit-qt-ews/Tools/Scripts/webkitpy/tool/commands/queues.py", line 216, in _next_patch
    patch_id = self._tool.status_server.next_work_item(self.name)
  File "/storage/WebKit-qt-ews/Tools/Scripts/webkitpy/common/net/statusserver.py", line 128, in next_work_item
    return self._fetch_url(next_patch_url)
  File "/storage/WebKit-qt-ews/Tools/Scripts/webkitpy/common/net/statusserver.py", line 162, in _fetch_url
    raise e
HTTPError: HTTP Error 500: Internal Server Error


I tried to fetch this URL from the browser and I got this exception:
Traceback (most recent call last):
  File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/webapp/_webapp25.py", line 714, in __call__
    handler.get(*groups)
  File "/base/data/home/apps/webkit-commit-queue/107612.364787856447592722/handlers/nextpatch.py", line 51, in get
    RecordPatchEvent.started(patch_id, queue_name)
  File "/base/data/home/apps/webkit-commit-queue/107612.364787856447592722/loggers/recordpatchevent.py", line 63, in started
    patch_log.calculate_wait_duration()
  File "/base/data/home/apps/webkit-commit-queue/107612.364787856447592722/model/patchlog.py", line 52, in calculate_wait_duration
    self.wait_duration = int(time_delta.total_seconds())
AttributeError: 'datetime.timedelta' object has no attribute 'total_seconds'

Could you guys check it, please?
Comment 18 Csaba Osztrogonác 2013-01-23 09:23:14 PST
ping?
Comment 19 Adam Barth 2013-01-23 09:35:18 PST
> ping?

Alan is probably asleep, but I think I've rolled back the server to the version before this change.
Comment 20 Csaba Osztrogonác 2013-01-23 09:39:55 PST
(In reply to comment #19)
> > ping?
> 
> Alan is probably asleep, but I think I've rolled back the server to the version before this change.

Thanks, EWS bots started to work again.
Comment 21 Csaba Osztrogonác 2013-01-23 09:40:20 PST
Reopen to fix the bug.
Comment 22 Alan Cutter 2013-01-23 19:48:25 PST
(In reply to comment #21)
> Reopen to fix the bug.

Created bug 107775 to resolve problem.