Bug 30098

Summary:

commit-queue can wrongly reject patches if the buildbots are behind

Product:

WebKit

Reporter:

Eric Seidel (no email) <eric>

Component:

Tools / Tests

Assignee:

Nobody <webkit-unassigned>

Status:

RESOLVED FIXED

Severity:

Normal

CC:

abarth, levin, ojan, webkit.review.bot

Priority:

Version:

528+ (Nightly build)

Hardware:

OS:

OS X 10.5

Attachments:

Description	Flags
My proposed fix, currently testing on live commit-bot	none
Patch	none
Patch	eric: review+

Eric Seidel (no email)

Reported 2009-10-05 15:01:07 PDT

commit-queue can wrongly reject patches if the buildbots are behind The commit-queue uses the buildbot status to determine if it should be landing patches or not. Right now it checks before trying to build and if the bots are green goes ahead and tries to land. If the build/tests fail then it assumes they were caused by the patch. This can lead to false rejections if the bots are simply behind but they are about to turn red. :( There are many solutions we could use to fix this. Two possibilities: - roll out the patch locally and try to build/test again, if that passes actually reject the patch. - update to exactly the revision that the bots last tested.

Attachments
My proposed fix, currently testing on live commit-bot (9.03 KB, patch) 2009-10-26 17:34 PDT, Eric Seidel (no email)	no flags	Details Formatted Diff Diff
Patch (2.50 KB, patch) 2010-01-04 14:46 PST, Adam Barth	no flags	Details Formatted Diff Diff
Patch (2.96 KB, patch) 2010-01-04 15:04 PST, Adam Barth	eric: review+	Details Formatted Diff Diff
Show Obsolete (2) View All Add attachment proposed patch, testcase, etc.

Eric Seidel (no email)

Comment 1 2009-10-26 17:28:04 PDT

OK. I've modified the commit-queue to pause itself when the buildbots get behind. Example: Builders ["Leopard Intel Release (Tests)" : 4 pending, "Leopard Intel Debug (Tests)" : 7 pending] are behind. Waiting. See http://build.webkit.org/waterfall. Sleeping until 2009-10-26 17:31:39 (5 mins). I'm going to test the change locally for a bit. I'll post a patch for review tomorrow assuming all goes well.

Eric Seidel (no email)

Comment 2 2009-10-26 17:29:35 PDT

The sad part of this change is that it will make the commit-bot much more sensitive to our slow layout tests. It also will make the commit-bot tend to commit patches late at night (when the tree is otherwise quiet) and will slow down response time to commit-queue+. That said, I still would rather have a commit-queue which never falsely rejects patches than one which is fast.

Eric Seidel (no email)

Comment 3 2009-10-26 17:34:41 PDT

Created attachment 41919 [details] My proposed fix, currently testing on live commit-bot

Eric Seidel (no email)

Comment 4 2009-10-26 17:35:36 PDT

http://webkit-commit-queue.appspot.com/ doesn't really like having such long status messages. But you can see what the output of the bot looks like there.

Eric Seidel (no email)

Comment 5 2009-10-27 01:39:06 PDT

Actually checking to make sure that there are no pending builds isn't quite strong enough. If I want to actually make sure that the very latest build was green then I need to make sure that every builder is "idle". Adam Barth points out that I should have a fall-back mode, where if the builders are busy I do an extra build/test step w/o the patch applied first, if that passes then I go ahead and commit. That might just be extra complexity for an incomplete solution, but then again it might be a big help. As-is waiting for the builders to all be idle before letting the commit-queue run is going to greatly slow down the rate at which the commit-queue can land patches as it will be always tied to run as slowly as the slowest bot that it's watching. :(

Eric Seidel (no email)

Comment 6 2009-10-27 01:40:39 PDT

I think if we end up turning this on for the commit-queue, I may end up needing to spend a bunch of effort making the buildbots faster (by speeding up run-webkit-tests). We'll see.

Eric Seidel (no email)

Comment 7 2009-12-22 18:27:05 PST

This patch is long out of date, but I need to update it, because it would be very useful. :)

Eric Seidel (no email)

Comment 8 2010-01-04 13:15:12 PST

Another victim: https://bugs.webkit.org/show_bug.cgi?id=33090#c3

Adam Barth

Comment 9 2010-01-04 14:37:33 PST

This problem is super easy to solve just by running a complete cycle before landing the patch and then trying to apply the patch with --no-update. That's how the EWS rolls.

Eric Seidel (no email)

Comment 10 2010-01-04 14:43:53 PST

The commit is more likely to fail at the end however, because the ChangeLogs are more likely to be out of date. That approach trades one evil (longer waits before bots are actually green) for another (longer cycles, more likely to reject). But since it's easy to implement and better than what we've got I say go for it. We also need to eventually fix buildbot.py's pending count (right now it's one-off), but that's relatively low priority.

Adam Barth

Comment 11 2010-01-04 14:46:13 PST

Created attachment 45834 [details] Patch

WebKit Review Bot

Comment 12 2010-01-04 14:49:28 PST

style-queue ran check-webkit-style on attachment 45834 [details] without any errors.

Eric Seidel (no email)

Comment 13 2010-01-04 14:50:34 PST

Comment on attachment 45834 [details] Patch Seems: 162 self._update_status("Unabled to successfully build and test", None) should include the revision number that it tried, and should include the results spew, no? Also, shouldn't it prefix with ERROR: so that it shows up purple? Probably should have a comment to explain the --no-update here: 170 self.run_bugzilla_tool(["land-attachment", "--force-clean", "--non-interactive", "--no-update", "--parent-command=commit-queue", "--build-style=both", "--quiet", patch["id"]]) I think this is definitely better than what we have. Sad that we'll end up running the layout tests twice which will slow us down a lot. Hopefully that will be moot once run_webkit_tests.py is with us. I'll say r+, but I think this needs modifications to report the status of the failed test build better. That will help us more easily debug flakey tests (and will make the dashboard correctly show purple).

Adam Barth

Comment 14 2010-01-04 15:02:33 PST

> should include the revision number that it tried, and should include the > results spew, no? It could, but no one is going to see the message expect maybe you. The EWS also as a similar idle message and I've never been interested in why they didn't build. > Also, shouldn't it prefix with ERROR: so that it shows up purple? This doesn't show up purple anywhere because it's not associated with any patch. It's just an idle message like "Builders [%s] are red." I'm tempted to remove the message instead of making it more complicated. > Probably should have a comment to explain the --no-update here: Done. > I'll say r+, but I think this needs modifications to report the status of the > failed test build better. That will help us more easily debug flakey tests > (and will make the dashboard correctly show purple). Flakey tests will just make the commit-queue wait five minutes before landing a patch occasionally. If we get a flaky test 1% of the time, this will add 3 seconds of average latency, which is lost in the noise of the 5 minute polling frequency.

Adam Barth

Comment 15 2010-01-04 15:04:07 PST

Created attachment 45835 [details] Patch

Eric Seidel (no email)

Comment 16 2010-01-04 15:06:29 PST

Comment on attachment 45835 [details] Patch LGTM. Slightly confused by the second _builders_are_green() check? (might need a comment?) Won't land-attachment check for us? I still need to fix buildbot to be able to tell you if things are pending. I'll do that in a separate bug.

Adam Barth

Comment 17 2010-01-04 15:13:13 PST

> Slightly confused by the second _builders_are_green() check? (might need a > comment?) I added that because you mentioned that build-and-test could take a while. If the bots have gone red in the meantime (say on Windows), then we don't want to land in order to avoid making the tree worse. > Won't land-attachment check for us? If that's the case, then why was the code there originally to check the builders? I'd be happy to remove it if you think the subprocess should handle checking the builders. I'd be more inclined to have to pass --ignore-builders and have the master process deal with deciding whether its time to run land-attachment.

Adam Barth

Comment 18 2010-01-04 15:14:02 PST

Committed r52764: <http://trac.webkit.org/changeset/52764>

Eric Seidel (no email)

Comment 19 2010-01-04 15:15:52 PST

Well, back in the day we didn't have any way to distingish between a recoverable/temporary error and a permentant error. I'm not sure that builders being red returns the right exit code. But you're right, we could remove the second check and depend on land-attachment assuming that EnsureBuildersAreGreen returns the correct exit code.

Eric Seidel (no email)

Comment 20 2010-01-20 03:42:06 PST

This change has made the commit-queue nearly unbearably slow. We may need to revert this, or try a different approach.

Note You need to log in before you can comment on or make changes to this bug.