Bug 36908 - Several tests in fast/canvas/webgl/ failed randomly on Leopard Commit Bot
Summary: Several tests in fast/canvas/webgl/ failed randomly on Leopard Commit Bot
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests (show other bugs)
Version: 528+ (Nightly build)
Hardware: PC OS X 10.5
: P2 Normal
Assignee: Zhenyao Mo
URL:
Keywords:
Depends on: 33416
Blocks: 38560
  Show dependency treegraph
 
Reported: 2010-03-31 16:27 PDT by Eric Seidel (no email)
Modified: 2010-06-18 12:02 PDT (History)
6 users (show)

See Also:


Attachments
patch (1.15 KB, patch)
2010-04-01 10:30 PDT, Zhenyao Mo
no flags Details | Formatted Diff | Diff
patch: gather extra information about failure (2.18 KB, patch)
2010-04-02 10:20 PDT, Zhenyao Mo
no flags Details | Formatted Diff | Diff
patch (1.75 KB, patch)
2010-04-15 10:04 PDT, Zhenyao Mo
no flags Details | Formatted Diff | Diff
revised patch : responding to Ken Russell's review (1.83 KB, patch)
2010-04-15 10:11 PDT, Zhenyao Mo
abarth: review+
commit-queue: commit-queue-
Details | Formatted Diff | Diff
revised patch: initialize members in the same order as they are declared (1.82 KB, patch)
2010-04-15 11:53 PDT, Zhenyao Mo
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Eric Seidel (no email) 2010-03-31 16:27:19 PDT
fast/canvas/webgl/index-validation.html failed on Leopard Commit Bot

Twice actually.  Somehow lightning struck twice.

https://bugs.webkit.org/show_bug.cgi?id=36876#c3

/tmp/layout-test-results/fast/canvas/webgl/index-validation-actual.txt

44 
55 PASS gl.getError() is 0
66 PASS gl.drawElements(gl.TRIANGLES, 3, gl.UNSIGNED_SHORT, 0) is undefined.
7  PASS gl.getError() is 0
 7 FAIL gl.getError() should be 0. Was 1286.
88 PASS successfullyParsed is true
99 
1010 TEST COMPLETE

I suspect it may be related to the commit-bot's hardware?
Comment 2 Eric Seidel (no email) 2010-03-31 16:29:01 PDT
I'm happy to send a full hardware report to anyone who might need one.
Comment 3 Eric Seidel (no email) 2010-03-31 16:31:14 PDT
Grepping the commit-queue logs, I see the commit-queue having failed this test 9 times.  We just happen to hit two failures in a row while trying to land Adam's patch https://bugs.webkit.org/show_bug.cgi?id=36876#c3
Comment 4 Zhenyao Mo 2010-03-31 20:32:34 PDT
This happened once on my machine before, but not recently.

I would suggest to use "antialias:false" in this test to resolve this random failure.  However, we need to dig into the reason why this happens and find a fix for it.
Comment 5 Eric Seidel (no email) 2010-04-01 00:21:16 PDT
Failed again:
https://bugs.webkit.org/show_bug.cgi?id=36932#c4
Comment 6 Eric Seidel (no email) 2010-04-01 00:50:05 PDT
Something happened today to make this failure much more common than previously.
Comment 7 Eric Seidel (no email) 2010-04-01 01:09:44 PDT
Looks like the first time the Leopard Commit Bot ever saw this test fail was around 3:30PM this afternoon.
Comment 8 Eric Seidel (no email) 2010-04-01 01:10:23 PDT
It was shortly after http://trac.webkit.org/changeset/56872, which is probably related. :)
Comment 9 Eric Seidel (no email) 2010-04-01 01:11:36 PDT
This test has failed 39 times on the Leopard Commit Bot since it first failed this afternoon!
Comment 10 Adam Barth 2010-04-01 09:14:18 PDT
This failure rate is unacceptable.  Can we rollback that rev and see if it fixes the problem?
Comment 11 Kenneth Russell 2010-04-01 09:35:42 PDT
*Please* don't roll back that revision. It is a large one that we've been trying to land for weeks, and the failure is ultimately caused by driver bugs and not the code.

Can we temporarily disable this test on this bot? We may have another workaround we can land in a couple of hours.
Comment 12 Adam Barth 2010-04-01 09:40:03 PDT
> Can we temporarily disable this test on this bot?

We don't have a way to disable tests on the commit-queue specifically.  We can disable it more generally however.

> We may have another workaround we can land in a couple of hours.

Ok.

By way of context, this test is flaky enough that it often fails twice in a row.  We have something like 13k tests.  If an appreciable fraction of them failed this often, the entire test suite would be useless.
Comment 13 Kenneth Russell 2010-04-01 09:43:07 PDT
(In reply to comment #12)
> > Can we temporarily disable this test on this bot?
> 
> We don't have a way to disable tests on the commit-queue specifically.  We can
> disable it more generally however.

Can we add it to the Skipped list for mac-leopard for the moment?

> > We may have another workaround we can land in a couple of hours.
> 
> Ok.
> 
> By way of context, this test is flaky enough that it often fails twice in a
> row.  We have something like 13k tests.  If an appreciable fraction of them
> failed this often, the entire test suite would be useless.

Understood. I apologize for the flakiness introduced by this checkin.
Comment 14 Zhenyao Mo 2010-04-01 10:30:47 PDT
Created attachment 52310 [details]
patch

This patch should stabilize the test on the bot.  However, we need to dig deeper into the cause of this flaky behavior - very likely hardware/driver bugs - and find a better way to solve it.  Will create another bug to track this down.
Comment 15 Zhenyao Mo 2010-04-01 10:43:46 PDT
Just created a new bug to track this issue, but the patch should solve it for now.

https://bugs.webkit.org/show_bug.cgi?id=36972
Comment 16 Kenneth Russell 2010-04-01 10:44:15 PDT
Comment on attachment 52310 [details]
patch

Looks good to me.
Comment 17 Eric Seidel (no email) 2010-04-01 11:12:24 PDT
Comment on attachment 52310 [details]
patch

So is this work-around we intend to roll out eventually?  Or is this a permanent change to this test? The ChangeLog doesn't really say.  I'm fine with either of course.
Comment 18 Eric Seidel (no email) 2010-04-01 11:13:58 PDT
(In reply to comment #12)
> By way of context, this test is flaky enough that it often fails twice in a
> row.  We have something like 13k tests.  If an appreciable fraction of them
> failed this often, the entire test suite would be useless.

Actually, the statement is stronger.  After Adam's changes to speed up the commit-queue yesterday (which happened right after this regression landed) it now requires a flaky test to fail *three* times in a row for it to cause the commit queue to wrongly reject a patch. :)
Comment 19 Kenneth Russell 2010-04-01 11:15:42 PDT
(In reply to comment #17)
> (From update of attachment 52310 [details])
> So is this work-around we intend to roll out eventually?  Or is this a
> permanent change to this test? The ChangeLog doesn't really say.  I'm fine with
> either of course.

Right now we don't know why this OpenGL error is reported on this test, so the change is basically permanent.
Comment 20 Eric Seidel (no email) 2010-04-01 11:25:14 PDT
Another victim:
https://bugs.webkit.org/show_bug.cgi?id=36918#c3
Comment 21 Eric Seidel (no email) 2010-04-01 11:29:17 PDT
Another victim:
https://bugs.webkit.org/show_bug.cgi?id=36932#c6
Comment 22 WebKit Commit Bot 2010-04-01 11:37:24 PDT
Comment on attachment 52310 [details]
patch

Clearing flags on attachment: 52310

Committed r56925: <http://trac.webkit.org/changeset/56925>
Comment 23 WebKit Commit Bot 2010-04-01 11:37:36 PDT
All reviewed patches have been landed.  Closing bug.
Comment 24 Eric Seidel (no email) 2010-04-01 21:23:28 PDT
This test has still failed at least twice on the commit-queue since this fix went in.  I can get you the failure diff.
Comment 25 Eric Seidel (no email) 2010-04-01 21:25:13 PDT
The failure:
/tmp/layout-test-results/fast/canvas/webgl/index-validation-actual.txt

44 
55 PASS gl.getError() is 0
66 PASS gl.drawElements(gl.TRIANGLES, 3, gl.UNSIGNED_SHORT, 0) is undefined.
7  PASS gl.getError() is 0
 7 FAIL gl.getError() should be 0. Was 1286.
88 PASS successfullyParsed is true
99 
1010 TEST COMPLETE
Comment 26 Zhenyao Mo 2010-04-02 07:56:34 PDT
After the previous patch, the only difference in this test before and after the multisample patch is the stencil buffer is on by default (before by default we only have color and depth buffer).

I'll get another patch ready to turn off the stencil buffer for this test shortly.
Comment 27 Zhenyao Mo 2010-04-02 08:02:32 PDT
Also, I am curious, after the previous patch in which I turned off antialias for this test, does the frequency of this random failure reduced on the bot or still the same?
Comment 28 Eric Seidel (no email) 2010-04-02 10:02:45 PDT
It appears to be greatly reduced.  There have been 9 failure on the bot since last night.  Yesterday afternoon we had 66.  The bot may simply be doing less now... but I think the failure incidence was reduced, but not eliminated by your previous change.
Comment 29 Zhenyao Mo 2010-04-02 10:20:51 PDT
Created attachment 52421 [details]
patch: gather extra information about failure

This patch is an attempt to gather further information about the cause of this failure (sorry, but seems like the bot is the only machine that we get this failure at the moment).

With this patch, the failure behavior won't be better or worse.
Comment 30 Zhenyao Mo 2010-04-02 10:22:42 PDT
Eric, when this patch lands, and the bot failed this test again, could you send the output to me?  Then I will upload another patch to fix the failure problem (hopefully).
Comment 31 Kenneth Russell 2010-04-02 10:24:11 PDT
Comment on attachment 52421 [details]
patch: gather extra information about failure

Looks good to me.
Comment 32 Eric Seidel (no email) 2010-04-02 10:27:37 PDT
Comment on attachment 52421 [details]
patch: gather extra information about failure

OK.
Comment 33 WebKit Commit Bot 2010-04-02 12:22:00 PDT
Comment on attachment 52421 [details]
patch: gather extra information about failure

Clearing flags on attachment: 52421

Committed r57015: <http://trac.webkit.org/changeset/57015>
Comment 34 WebKit Commit Bot 2010-04-02 12:22:07 PDT
All reviewed patches have been landed.  Closing bug.
Comment 35 Zhenyao Mo 2010-04-15 09:59:03 PDT
These tests are still failing on the commit-bot.
Comment 36 Zhenyao Mo 2010-04-15 10:04:07 PDT
Created attachment 53444 [details]
patch
Comment 37 Kenneth Russell 2010-04-15 10:07:29 PDT
The patch still has an OOPS about no new tests. Also, I think it's worth separately mentioning in the ChangeLog that this fixes an uninitialized variable bug and restores a glFinish call that used to be present.
Comment 38 Zhenyao Mo 2010-04-15 10:11:23 PDT
Created attachment 53445 [details]
revised patch : responding to Ken Russell's review
Comment 39 Kenneth Russell 2010-04-15 10:13:02 PDT
Looks good to me.
Comment 40 Adam Barth 2010-04-15 11:06:15 PDT
Comment on attachment 53445 [details]
revised patch : responding to Ken Russell's review

Me too.
Comment 41 WebKit Commit Bot 2010-04-15 11:37:14 PDT
Comment on attachment 53445 [details]
revised patch : responding to Ken Russell's review

Rejecting patch 53445 from commit-queue.

Failed to run "['WebKitTools/Scripts/build-webkit', '--debug']" exit_code: 1
Last 500 characters of output:
n/yacc
    /bin/sh -c /Users/eseidel/Projects/CommitQueue/WebKitBuild/WebCore.build/Debug/WebCore.build/Script-5DF50887116F3077005202AB.sh
** BUILD FAILED **

The following build commands failed:
WebCore:
	Distributed-CompileC /Users/eseidel/Projects/CommitQueue/WebKitBuild/WebCore.build/Debug/WebCore.build/Objects-normal/i386/GraphicsContext3DMac.o /Users/eseidel/Projects/CommitQueue/WebCore/platform/graphics/mac/GraphicsContext3DMac.cpp normal i386 c++ com.apple.compilers.gcc.4_2
(1 failure)


Full output: http://webkit-commit-queue.appspot.com/results/1711103
Comment 42 Zhenyao Mo 2010-04-15 11:53:23 PDT
Created attachment 53458 [details]
revised patch: initialize members in the same order as they are declared
Comment 43 Kenneth Russell 2010-04-15 11:56:16 PDT
LGTM
Comment 44 Adam Barth 2010-04-15 12:01:19 PDT
Comment on attachment 53458 [details]
revised patch: initialize members in the same order as they are declared

Ah, the old "order of initialization" check.  :)
Comment 45 WebKit Commit Bot 2010-04-15 13:08:00 PDT
Comment on attachment 53458 [details]
revised patch: initialize members in the same order as they are declared

Clearing flags on attachment: 53458

Committed r57664: <http://trac.webkit.org/changeset/57664>
Comment 46 WebKit Commit Bot 2010-04-15 13:08:08 PDT
All reviewed patches have been landed.  Closing bug.