WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
Bug 30098
commit-queue can wrongly reject patches if the buildbots are behind
https://bugs.webkit.org/show_bug.cgi?id=30098
Summary
commit-queue can wrongly reject patches if the buildbots are behind
Eric Seidel (no email)
Reported
2009-10-05 15:01:07 PDT
commit-queue can wrongly reject patches if the buildbots are behind The commit-queue uses the buildbot status to determine if it should be landing patches or not. Right now it checks before trying to build and if the bots are green goes ahead and tries to land. If the build/tests fail then it assumes they were caused by the patch. This can lead to false rejections if the bots are simply behind but they are about to turn red. :( There are many solutions we could use to fix this. Two possibilities: - roll out the patch locally and try to build/test again, if that passes actually reject the patch. - update to exactly the revision that the bots last tested.
Attachments
My proposed fix, currently testing on live commit-bot
(9.03 KB, patch)
2009-10-26 17:34 PDT
,
Eric Seidel (no email)
no flags
Details
Formatted Diff
Diff
Patch
(2.50 KB, patch)
2010-01-04 14:46 PST
,
Adam Barth
no flags
Details
Formatted Diff
Diff
Patch
(2.96 KB, patch)
2010-01-04 15:04 PST
,
Adam Barth
eric
: review+
Details
Formatted Diff
Diff
Show Obsolete
(2)
View All
Add attachment
proposed patch, testcase, etc.
Eric Seidel (no email)
Comment 1
2009-10-26 17:28:04 PDT
OK. I've modified the commit-queue to pause itself when the buildbots get behind. Example: Builders ["Leopard Intel Release (Tests)" : 4 pending, "Leopard Intel Debug (Tests)" : 7 pending] are behind. Waiting. See
http://build.webkit.org/waterfall
. Sleeping until 2009-10-26 17:31:39 (5 mins). I'm going to test the change locally for a bit. I'll post a patch for review tomorrow assuming all goes well.
Eric Seidel (no email)
Comment 2
2009-10-26 17:29:35 PDT
The sad part of this change is that it will make the commit-bot much more sensitive to our slow layout tests. It also will make the commit-bot tend to commit patches late at night (when the tree is otherwise quiet) and will slow down response time to commit-queue+. That said, I still would rather have a commit-queue which never falsely rejects patches than one which is fast.
Eric Seidel (no email)
Comment 3
2009-10-26 17:34:41 PDT
Created
attachment 41919
[details]
My proposed fix, currently testing on live commit-bot
Eric Seidel (no email)
Comment 4
2009-10-26 17:35:36 PDT
http://webkit-commit-queue.appspot.com/
doesn't really like having such long status messages. But you can see what the output of the bot looks like there.
Eric Seidel (no email)
Comment 5
2009-10-27 01:39:06 PDT
Actually checking to make sure that there are no pending builds isn't quite strong enough. If I want to actually make sure that the very latest build was green then I need to make sure that every builder is "idle". Adam Barth points out that I should have a fall-back mode, where if the builders are busy I do an extra build/test step w/o the patch applied first, if that passes then I go ahead and commit. That might just be extra complexity for an incomplete solution, but then again it might be a big help. As-is waiting for the builders to all be idle before letting the commit-queue run is going to greatly slow down the rate at which the commit-queue can land patches as it will be always tied to run as slowly as the slowest bot that it's watching. :(
Eric Seidel (no email)
Comment 6
2009-10-27 01:40:39 PDT
I think if we end up turning this on for the commit-queue, I may end up needing to spend a bunch of effort making the buildbots faster (by speeding up run-webkit-tests). We'll see.
Eric Seidel (no email)
Comment 7
2009-12-22 18:27:05 PST
This patch is long out of date, but I need to update it, because it would be very useful. :)
Eric Seidel (no email)
Comment 8
2010-01-04 13:15:12 PST
Another victim:
https://bugs.webkit.org/show_bug.cgi?id=33090#c3
Adam Barth
Comment 9
2010-01-04 14:37:33 PST
This problem is super easy to solve just by running a complete cycle before landing the patch and then trying to apply the patch with --no-update. That's how the EWS rolls.
Eric Seidel (no email)
Comment 10
2010-01-04 14:43:53 PST
The commit is more likely to fail at the end however, because the ChangeLogs are more likely to be out of date. That approach trades one evil (longer waits before bots are actually green) for another (longer cycles, more likely to reject). But since it's easy to implement and better than what we've got I say go for it. We also need to eventually fix buildbot.py's pending count (right now it's one-off), but that's relatively low priority.
Adam Barth
Comment 11
2010-01-04 14:46:13 PST
Created
attachment 45834
[details]
Patch
WebKit Review Bot
Comment 12
2010-01-04 14:49:28 PST
style-queue ran check-webkit-style on
attachment 45834
[details]
without any errors.
Eric Seidel (no email)
Comment 13
2010-01-04 14:50:34 PST
Comment on
attachment 45834
[details]
Patch Seems: 162 self._update_status("Unabled to successfully build and test", None) should include the revision number that it tried, and should include the results spew, no? Also, shouldn't it prefix with ERROR: so that it shows up purple? Probably should have a comment to explain the --no-update here: 170 self.run_bugzilla_tool(["land-attachment", "--force-clean", "--non-interactive", "--no-update", "--parent-command=commit-queue", "--build-style=both", "--quiet", patch["id"]]) I think this is definitely better than what we have. Sad that we'll end up running the layout tests twice which will slow us down a lot. Hopefully that will be moot once run_webkit_tests.py is with us. I'll say r+, but I think this needs modifications to report the status of the failed test build better. That will help us more easily debug flakey tests (and will make the dashboard correctly show purple).
Adam Barth
Comment 14
2010-01-04 15:02:33 PST
> should include the revision number that it tried, and should include the > results spew, no?
It could, but no one is going to see the message expect maybe you. The EWS also as a similar idle message and I've never been interested in why they didn't build.
> Also, shouldn't it prefix with ERROR: so that it shows up purple?
This doesn't show up purple anywhere because it's not associated with any patch. It's just an idle message like "Builders [%s] are red." I'm tempted to remove the message instead of making it more complicated.
> Probably should have a comment to explain the --no-update here:
Done.
> I'll say r+, but I think this needs modifications to report the status of the > failed test build better. That will help us more easily debug flakey tests > (and will make the dashboard correctly show purple).
Flakey tests will just make the commit-queue wait five minutes before landing a patch occasionally. If we get a flaky test 1% of the time, this will add 3 seconds of average latency, which is lost in the noise of the 5 minute polling frequency.
Adam Barth
Comment 15
2010-01-04 15:04:07 PST
Created
attachment 45835
[details]
Patch
Eric Seidel (no email)
Comment 16
2010-01-04 15:06:29 PST
Comment on
attachment 45835
[details]
Patch LGTM. Slightly confused by the second _builders_are_green() check? (might need a comment?) Won't land-attachment check for us? I still need to fix buildbot to be able to tell you if things are pending. I'll do that in a separate bug.
Adam Barth
Comment 17
2010-01-04 15:13:13 PST
> Slightly confused by the second _builders_are_green() check? (might need a > comment?)
I added that because you mentioned that build-and-test could take a while. If the bots have gone red in the meantime (say on Windows), then we don't want to land in order to avoid making the tree worse.
> Won't land-attachment check for us?
If that's the case, then why was the code there originally to check the builders? I'd be happy to remove it if you think the subprocess should handle checking the builders. I'd be more inclined to have to pass --ignore-builders and have the master process deal with deciding whether its time to run land-attachment.
Adam Barth
Comment 18
2010-01-04 15:14:02 PST
Committed
r52764
: <
http://trac.webkit.org/changeset/52764
>
Eric Seidel (no email)
Comment 19
2010-01-04 15:15:52 PST
Well, back in the day we didn't have any way to distingish between a recoverable/temporary error and a permentant error. I'm not sure that builders being red returns the right exit code. But you're right, we could remove the second check and depend on land-attachment assuming that EnsureBuildersAreGreen returns the correct exit code.
Eric Seidel (no email)
Comment 20
2010-01-20 03:42:06 PST
This change has made the commit-queue nearly unbearably slow. We may need to revert this, or try a different approach.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug