Bug 20553

Summary: [Meta] Import Philip Taylor's HTML5 <canvas> test suite and pass it all
Product: WebKit Reporter: Eric Seidel (no email) <eric>
Component: Tools / TestsAssignee: Nobody <webkit-unassigned>
Status: NEW ---    
Severity: Normal CC: abarth, ap, aroben, cshu, dglazkov, eric, excors, gsnedders, jamesr, jchaffraix, kenneth, kent.hansen, laszlo.gombos, mdelaney7, oliver, ossy, peter, qi.2.zhang, sam, simon.fraser, tonikitoo, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Macintosh   
OS: OS X 10.5   
URL: http://philip.html5.org/tests/canvas/suite/tests
Bug Depends on: 38463, 38845, 38934, 39027, 39068, 39161, 39166, 39168, 44628    
Bug Blocks:    
Attachments:
Description Flags
fix patch
none
fix patch 2
oliver: review-
fix patch 3
none
Add expected results
none
Add expected results, updated ChangeLog
none
enable qt tests
none
use failed results zimmermann: review-

Description Eric Seidel (no email) 2008-08-27 17:27:31 PDT
Import Philip Tyler's HTML5 <canavs> test suite and pass it all :)

We really should add:
http://philip.html5.org/tests/canvas/suite/tests
to our layout tests to cover more <canvas> cases.
Comment 1 Chang Shu 2010-05-06 06:33:53 PDT
Here are some statistics:
Total test cases					789	
Test cases failed on both Safari and Firefox		77	
Test cases failed on Safari but passed on Firefox 	39
Comment 2 Chang Shu 2010-05-06 11:30:01 PDT
I have the test cases ready for check-in. However, the patch I am about to submit is 12Mb. Would this be ok?
Comment 3 Oliver Hunt 2010-05-06 11:32:03 PDT
The tests should not be 12mb -- more or less all of phillips tests should be dumpAsText, there is no reason for them to be pixel tests, and so no reason for them to be anything like 12mb in size.
Comment 4 Laszlo Gombos 2010-05-06 11:44:34 PDT
Are there any licensing issues around the test content ?
Comment 5 Chang Shu 2010-05-06 11:45:43 PDT
(In reply to comment #3)
> The tests should not be 12mb -- more or less all of phillips tests should be
> dumpAsText, there is no reason for them to be pixel tests, and so no reason for
> them to be anything like 12mb in size.


I double-checked the test suite and there are about 192 png files in the package. The size of unzipped files is about 18Mb.
Comment 6 Oliver Hunt 2010-05-06 11:56:58 PDT
You should talk to phillip on irc and get him to clarify the license, i would go through the images and just unify all the identical images -- i suspect it generates many identical images.
Comment 7 Philip Taylor 2010-05-06 12:53:42 PDT
The test code and data is currently all MIT-licensed (I think it says that in various places in the source).

Most of the PNGs are demonstrating the approximate expected output, for a human to interpret (they can't be exact matches because of antialiasing, rounding, etc), so I assume they're not useful for automated tests. (Most of the test cases also run scripts that determine pass/fail, which might miss some rendering bugs (since they only test a few output pixels) but are probably much easier to hook into whatever test harness there is.)
Comment 8 Chang Shu 2010-05-07 09:03:25 PDT
Actually, the png files only contribute about 1Mb in the total size. I removed a couple of big files though:
current-work
tests/results.html
tests/spec.html
And now the patch file is about 6.4Mb. The total size on the file system should reach 12Mb. This is because there are 2546 html files under tests folder and each of them is several kb.
Comment 9 Oliver Hunt 2010-05-07 12:03:42 PDT
(In reply to comment #8)
> Actually, the png files only contribute about 1Mb in the total size. I removed
> a couple of big files though:
> current-work
> tests/results.html
> tests/spec.html
> And now the patch file is about 6.4Mb. The total size on the file system should
> reach 12Mb. This is because there are 2546 html files under tests folder and
> each of them is several kb.

Are all the tests set to use text only results?
Comment 10 Chang Shu 2010-05-07 12:12:20 PDT
(In reply to comment #9)
> (In reply to comment #8)
> > Actually, the png files only contribute about 1Mb in the total size. I removed
> > a couple of big files though:
> > current-work
> > tests/results.html
> > tests/spec.html
> > And now the patch file is about 6.4Mb. The total size on the file system should
> > reach 12Mb. This is because there are 2546 html files under tests folder and
> > each of them is several kb.
> 
> Are all the tests set to use text only results?

Right, no image comparison is used to determine passing/failing of the tests. Instead, it may test certain pixels like some of current layout tests do. Correct me if I'm wrong, Philip.
Comment 11 Chang Shu 2010-05-10 13:22:56 PDT
Created attachment 55597 [details]
fix patch
Comment 12 Chang Shu 2010-05-12 07:22:10 PDT
If people feel the complete test suite is too big, an alternative way is to check in the test cases one by one, on necessity base.
Comment 13 Chang Shu 2010-05-14 12:56:37 PDT
Created attachment 56097 [details]
fix patch 2

I removed many html files that are only used for manual testing. This reduces the total size to 2.2Mb.
Hopefully, this is good to go.
Comment 14 Dimitri Glazkov (Google) 2010-05-14 13:01:43 PDT
Don't forget to skip on chromium, too. it's a bit different syntax, but it's fairly intuitive: trac.webkit.org/browser/trunk/LayoutTests/chromium/test_expectations.txt

Also, these tests look to be fairly trivial to be made text-only, i.e. setting layoutTestController.dumpAsText(). Can you do that in tests.js?
Comment 15 Oliver Hunt 2010-05-14 13:03:46 PDT
(In reply to comment #14)
> Don't forget to skip on chromium, too. it's a bit different syntax, but it's fairly intuitive: trac.webkit.org/browser/trunk/LayoutTests/chromium/test_expectations.txt
> 
> Also, these tests look to be fairly trivial to be made text-only, i.e. setting layoutTestController.dumpAsText(). Can you do that in tests.js?

I have explicitly requested that these not be pixel tests earlier. Why are they still doing pixel dumps?
Comment 16 Oliver Hunt 2010-05-14 13:05:38 PDT
Comment on attachment 56097 [details]
fix patch 2

r- due to using pixel tests
Comment 17 Chang Shu 2010-05-14 13:25:19 PDT
Thanks for the comments, guys. Is this the right syntax for chromium skip file?
// New canvas test suite from http://philip.html5.org/tests/canvas/suite/tests/
BUG20553 SKIP : fast/canvas/philip = TEXT
Comment 18 Chang Shu 2010-05-14 13:49:27 PDT
Created attachment 56106 [details]
fix patch 3

Based on the comments:
1. Dump text results only
2. Skip tests on chromium
Comment 19 Chang Shu 2010-05-18 06:59:53 PDT
Can anybody review my patch? :)
Comment 20 Kent Hansen 2010-05-19 07:22:12 PDT
Cool!
What's stopping you from adding expected test results straight away, though?
I'd just run-webkit-tests --new-test-results and modify all the failing ones to say "Passed".
That should be a good starting point for creating the cross-platform & platform-specific skip lists for the individual tests.
Comment 21 Chang Shu 2010-05-19 07:33:20 PDT
(In reply to comment #20)
> Cool!
> What's stopping you from adding expected test results straight away, though?
> I'd just run-webkit-tests --new-test-results and modify all the failing ones to say "Passed".
> That should be a good starting point for creating the cross-platform & platform-specific skip lists for the individual tests.

My plan is to check in the test cases as the first step. The next step would be generate expected results on Mac and update skip list with individual tests (not the whole directory as it is right now). Then move on to other platforms, like Qt, which I can take care of.
Comment 22 Eric Seidel (no email) 2010-05-21 12:46:10 PDT
Comment on attachment 56106 [details]
fix patch 3

I like new tests.  Please add results for the various platforms in follow-up commits.  If you need help from various port maintainers, ask on webkit-dev.
Comment 23 Chang Shu 2010-05-21 13:51:10 PDT
Comment on attachment 56106 [details]
fix patch 3

committed manually, r59954.
clean up flags on attachment 56106 [details].
Will work on expected results.
Comment 24 James Robinson 2010-05-21 14:11:21 PDT
Can you move these from fast/canvas to canvas/?  Also, do you know that the results will (in general) be platform-specific?  Without looking in detail I would expect the dumpAsText results to mostly be the same.  If not, that would probably be a problem,
Comment 25 Chang Shu 2010-05-21 14:18:16 PDT
(In reply to comment #24)
> Can you move these from fast/canvas to canvas/?  Also, do you know that the results will (in general) be platform-specific?  Without looking in detail I would expect the dumpAsText results to mostly be the same.  If not, that would probably be a problem,

that's a good point (on moving tests to canvas/). will do. Yes, I also would expect most expected results are the same. thanks!
Comment 26 Chang Shu 2010-05-24 07:56:22 PDT
I moved the test cases manually to LayoutTests/canvas. Unreviewed. r60072.
I didn't generate a patch for this because it's big with all html contents marked as "-".
Hopefully, it's acceptable. :)
Comment 27 Chang Shu 2010-05-24 08:06:12 PDT
Created attachment 56888 [details]
Add expected results
Comment 28 James Robinson 2010-05-24 14:30:47 PDT
You should not skip just because they fail, only skip tests if they will cause problems running the rest of the suite.   Tests that we currently fail should have the failures reflected in their -expected.txt results and have bugs on file somewhere in bugs.webkit.org so that they can get triaged and fixed as needed.

Several tests have output that say "Cannot automatically verify result" or something similar.  Is there much value to running these anyway?
Comment 29 James Robinson 2010-05-24 14:58:48 PDT
We chatted a bit in IRC.  I think this patch is fine to land if it just removes canvas/philip from platform/mac/Skipped altogether.  The test failures are already encapsulated in the -expected.txt files and since other ports still have this on their own Skipped lists they should not turn red.
Comment 30 James Robinson 2010-05-24 15:45:16 PDT
Actually, it looks like the -expectated.txt files here are not really the output from DRT but something else.  That is definitely not right, the test expectations should always be the actual output currently produced and not something else.  I don't think this patch should land until the expectations match what actually happens on at least one port.
Comment 31 Chang Shu 2010-05-24 19:24:24 PDT
(In reply to comment #30)
> Actually, it looks like the -expectated.txt files here are not really the output from DRT but something else.  That is definitely not right, the test expectations should always be the actual output currently produced and not something else.  I don't think this patch should land until the expectations match what actually happens on at least one port.

All the -expected.txt files for succeeded cases are actual output from DRT. The failed ones on Mac are modified manually to "passed". This is suggested in comment#20 by Kent, which makes a lot of sense to me. I don't think the expected results should match the actual output on any platforms, unless all the test cases pass, which is not happening yet.
Comment 32 Kent Hansen 2010-05-25 02:55:49 PDT
(In reply to comment #31)
> (In reply to comment #30)
> > Actually, it looks like the -expectated.txt files here are not really the output from DRT but something else.  That is definitely not right, the test expectations should always be the actual output currently produced and not something else.  I don't think this patch should land until the expectations match what actually happens on at least one port.
> 
> All the -expected.txt files for succeeded cases are actual output from DRT. The failed ones on Mac are modified manually to "passed". This is suggested in comment#20 by Kent, which makes a lot of sense to me. I don't think the expected results should match the actual output on any platforms, unless all the test cases pass, which is not happening yet.

Right. The skipped tests would be removed from the Skipped-list as each issue is fixed, without having to modify the test data.

The expected results look good, except for a couple of the hand-edited ones that seem to be missing the "Expected output:\n\n" part, making the actual-vs-expected diff larger than necessary when running the tests.

When running the tests on Qt/Linux there are quite a few more failures than on Safari/Mac, so it makes sense to maintain per-platform Skipped lists for now. But once all those are in place it's probably a good idea to go through the failures and try to sort out which ones are really platform-specific.
Comment 33 Chang Shu 2010-05-25 03:02:17 PDT
> 
> The expected results look good, except for a couple of the hand-edited ones that seem to be missing the "Expected output:\n\n" part, making the actual-vs-expected diff larger than necessary when running the tests.
> 
Thanks, I noticed that too. My local workspace has them fixed. I will commit with the right text after r+.
Comment 34 Chang Shu 2010-05-25 06:41:10 PDT
Created attachment 57013 [details]
Add expected results, updated ChangeLog
Comment 35 Kenneth Rohde Christiansen 2010-05-25 06:57:12 PDT
Comment on attachment 57013 [details]
Add expected results, updated ChangeLog

OK looks good to me now. Way better ChangeLog.
Comment 36 Chang Shu 2010-05-25 07:33:43 PDT
Patch 57013 committed manually, r60162. Flags cleared.
Comment 37 WebKit Review Bot 2010-05-25 08:05:12 PDT
http://trac.webkit.org/changeset/60162 might have broken SnowLeopard Intel Release (Tests)
Comment 38 Chang Shu 2010-05-25 08:41:12 PDT
(In reply to comment #37)
> http://trac.webkit.org/changeset/60162 might have broken SnowLeopard Intel Release (Tests)

I added two additional cases that failed on SnowLeopard only to the Skipped.
Code change was committed manually, r60166.
Comment 39 Adam Barth 2010-05-25 09:11:03 PDT
Looks like your change also caused problems on Tiger.
Comment 40 Antonio Gomes 2010-05-25 09:15:10 PDT
(In reply to comment #39)
> Looks like your change also caused problems on Tiger.

see bug 39677 , it skipped failing tests on Tiger.
Comment 41 Adam Roben (:aroben) 2010-05-25 09:16:36 PDT
(In reply to comment #32)
> (In reply to comment #31)
> > (In reply to comment #30)
> > > Actually, it looks like the -expectated.txt files here are not really the output from DRT but something else.  That is definitely not right, the test expectations should always be the actual output currently produced and not something else.  I don't think this patch should land until the expectations match what actually happens on at least one port.
> > 
> > All the -expected.txt files for succeeded cases are actual output from DRT. The failed ones on Mac are modified manually to "passed". This is suggested in comment#20 by Kent, which makes a lot of sense to me. I don't think the expected results should match the actual output on any platforms, unless all the test cases pass, which is not happening yet.
> 
> Right. The skipped tests would be removed from the Skipped-list as each issue is fixed, without having to modify the test data.

This isn't how we've traditionally done things in the WebKit project. the -expected files should reflect what actually happens today when you run the test, not what we wish happened. We should run all the tests on all platforms, rather than skipping them, and update the expected results as we fix bugs. One benefit of doing things this way is that you'll notice when you fix a bug in one of these tests, because the test will show up as not matching the expected results (whereas if it were on the Skipped list, you might not notice you had fixed the test at all!). Conversely, if you make a change that makes one of the already-failing tests even worse, you will notice that, too.

I recommend that we check in results for all the tests that match what happens in WebKit today, and remove them from the Skipped list.
Comment 42 Kenneth Rohde Christiansen 2010-05-25 09:19:37 PDT
So we land expected failures and add bug reports for them? Sounds good with me.
Comment 43 Chang Shu 2010-05-25 10:10:43 PDT
As long as we are on the same page, I don't mind either way. I submitted "Failed" expected results before but at that time, some reviewers told me not to. So I guess it's still confusing to many contributors.
Comment 44 Kent Hansen 2010-05-25 10:26:22 PDT
(In reply to comment #41)
> (In reply to comment #32)
> > Right. The skipped tests would be removed from the Skipped-list as each issue is fixed, without having to modify the test data.
> 
> This isn't how we've traditionally done things in the WebKit project. the -expected files should reflect what actually happens today when you run the test, not what we wish happened. We should run all the tests on all platforms, rather than skipping them, and update the expected results as we fix bugs. One benefit of doing things this way is that you'll notice when you fix a bug in one of these tests, because the test will show up as not matching the expected results (whereas if it were on the Skipped list, you might not notice you had fixed the test at all!). Conversely, if you make a change that makes one of the already-failing tests even worse, you will notice that, too.
> 
> I recommend that we check in results for all the tests that match what happens in WebKit today, and remove them from the Skipped list.

I see. It would be nice if there were a way to express expected failures, rather than checking in incorrect results. (I thought the purpose of adding this test suite was to test conformance, not current behavior, but maybe that's not the case. Certainly an interesting point :) )

In Qt, if a test is flagged as expected to fail, the bot will stay green, but the expected failure is clearly visible in the test output (so it's not so easily forgotten). If a test that's expected to fail suddenly starts to pass, the bot reports an "unexpected pass", and goes red. Such failures are readily distinguishable from "normal" test failures.
Comment 45 Chang Shu 2010-05-25 10:42:13 PDT
One benefit of having all failed test cases in the skipped list is easy to read. Searching failed test cases in bug list doesn't seem to be as clear as by reading skipped file.
Comment 46 Adam Roben (:aroben) 2010-05-25 10:46:25 PDT
(In reply to comment #44)
> I see. It would be nice if there were a way to express expected failures, rather than checking in incorrect results.

new-run-webkit-tests has this ability. Hopefully we'll be able to switch over soon.

> (I thought the purpose of adding this test suite was to test conformance, not current behavior, but maybe that's not the case. Certainly an interesting point :) )

All of the tests in LayoutTests help us catch regressions. Many of them also help us test conformance to some specification(s). But the intent is that run-webkit-tests should report no failures on all platforms at all times. Any failures it reports should be either fixed immediately or put into a bug report. (If run-webkit-tests consistently reported failures on some platforms, it would be very hard to catch newly-introduced regressions.)
Comment 47 Adam Roben (:aroben) 2010-05-25 10:47:43 PDT
(In reply to comment #45)
> One benefit of having all failed test cases in the skipped list is easy to read. Searching failed test cases in bug list doesn't seem to be as clear as by reading skipped file.

That is true. But I think that benefit is outweighed by the disadvantages I listed in comment 41. And it doesn't match our current policy. (The policy could of course be changed, but that would only happen after discussion with the rest of the WebKit community.)
Comment 48 Chang Shu 2010-05-25 10:54:08 PDT
A practical question: do we put failed-expectations in platform specific folder? Let's say, if test case fails on Mac and originally, I put it in common folder. Then I run it on Qt and it passes. Shall I put the passing result in platform/qt or replace it in common folder after moving the "failed" one to platform/mac?
Comment 49 Adam Roben (:aroben) 2010-05-25 11:06:39 PDT
(In reply to comment #48)
> A practical question: do we put failed-expectations in platform specific folder? Let's say, if test case fails on Mac and originally, I put it in common folder. Then I run it on Qt and it passes. Shall I put the passing result in platform/qt or replace it in common folder after moving the "failed" one to platform/mac?

I guess the answer depends on what other platforms do. Do the tests also fail on Windows? GTK+? Chromium?

If the tests pass on more than one platform, then it seems likely that the failures are due to incorrect platform-specific code. In that case, it makes sense to me to put the failure results in the platform-specific directories.

If the tests only pass on one platform, then it seems likely that the failures are due to incorrect cross-platform code. In that case, it makes sense to me to put the failure results in the cross-platform directory.

(This is just a rule-of-thumb.)
Comment 50 Chang Shu 2010-05-25 11:12:01 PDT
Ok, understood. I will update the expected results and clean up skipped file.
Comment 51 Chang Shu 2010-05-28 07:20:20 PDT
Created attachment 57328 [details]
enable qt tests
Comment 52 Chang Shu 2010-05-28 11:44:43 PDT
Comment on attachment 57328 [details]
enable qt tests

committed manually, r60371. remove flags
Comment 53 WebKit Review Bot 2010-05-28 13:58:36 PDT
http://trac.webkit.org/changeset/60371 might have broken GTK Linux 32-bit Debug
The following changes are on the blame list:
http://trac.webkit.org/changeset/60371
http://trac.webkit.org/changeset/60372
http://trac.webkit.org/changeset/60373
Comment 54 Chang Shu 2010-05-28 14:17:48 PDT
Created attachment 57375 [details]
use failed results
Comment 55 Kent Hansen 2010-07-23 05:35:16 PDT
(In reply to comment #54)
> Created an attachment (id=57375) [details]
> use failed results

This needs rebasing now, and the expected results need to be updated (at least there are some expected results that don't coincide with the results I'm getting on my local Mac).
Comment 56 Nikolas Zimmermann 2010-07-30 23:20:22 PDT
Comment on attachment 57375 [details]
use failed results

r- because of Kent Hansens comment.
Comment 57 Sam Sneddon [:gsnedders] 2016-09-15 08:24:06 PDT
These are nowadays maintained as part of https://github.com/w3c/web-platform-tests/tree/master/2dcontext so this is really obsoleted by just running more of web-platform-tests.