Bug 66295

Summary: [Meta] Support w3c reftests
Product: WebKit Reporter: Hayato Ito <hayato>
Component: Tools / TestsAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: arv, dglazkov, dominicc, dpranke, eoconnor, mike, morrita, ojan, peter.linss, rniwa, rolandsteiner, stearns
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 67269, 67479, 71567, 71574, 73613    
Bug Blocks: 73704    

Description Hayato Ito 2011-08-16 05:51:31 PDT
Basic support for running reftests had been added to WebKit as described in http://trac.webkit.org/wiki/Writing%20Reftests.
The next step is to support a reftest which follows CSS test format guidelines defined in CSS working group. Let me call such reftests 'CSSWG reftests'.

It would be great that we can run 'CSSWG reftests' maintained by CSS working group 'as is', without any manual modifications, on WebKit.
This meta bug acts as an umbrella bug for all efforts to run 'CSSWG reftests' on WebKit.

This might include the following efforts, but not limited to:
- How to import reftests from CSS working group.
- Update new-run-webkit-tests so that it can run 'CSSWG reftests'
- Support a manifest file of 'CSSWG reftests'.
- Integrate that to the existing LayoutTest infrastructure nicely if it is possible.

See also:
- http://wiki.csswg.org/test/reftest
- http://wiki.csswg.org/test/css2.1/format
Comment 1 Dirk Pranke 2011-09-01 12:03:39 PDT
If the reftest links are embedded in the file, presumably that means that we can't tell from the filename (and the -expected.html convention) that something is a reftest; this might complicate things a bit. We should try to figure out a way to figure out which files are reftests without having to parse *every* test file (which could be kinda slow). 

Presumably we could only parse the file if we don't find any appropriately named -expected files, for example. Or we could adopt a convention that any files in the "reftests" directory were assumed to be reftests, or something.

Or we could just benchmark it; maybe it doesn't matter too much.
Comment 2 Hayato Ito 2011-09-01 21:47:41 PDT
Hi Dirk, thank you for the comment.

(In reply to comment #1)
> If the reftest links are embedded in the file, presumably that means that we can't tell from the filename (and the -expected.html convention) that something is a reftest; this might complicate things a bit. We should try to figure out a way to figure out which files are reftests without having to parse *every* test file (which could be kinda slow). 

Yeah, my initial thought is putting all CSSWG reftests to a dedicated directory, like a LayoutTests/reftests/, until we have a nice idea to solve issues caused by mixing existing LayoutTests and CSSWG reftests in the same directories. That makes things much simpler.

As the first attempt of integrating CSSWG reftests with new-run-webkit-tests, we are going to add '--run-csswg-runtests' command line option to new-run-webkit-tests. If that is specified, all CSSWG reftests under the dedicated directory are run. Ai-san, the intern, will try to do it from the next week.
Comment 3 Ojan Vafai 2011-11-01 16:35:31 PDT
(In reply to comment #2)
> Hi Dirk, thank you for the comment.
> 
> (In reply to comment #1)
> > If the reftest links are embedded in the file, presumably that means that we can't tell from the filename (and the -expected.html convention) that something is a reftest; this might complicate things a bit. We should try to figure out a way to figure out which files are reftests without having to parse *every* test file (which could be kinda slow). 
> 
> Yeah, my initial thought is putting all CSSWG reftests to a dedicated directory, like a LayoutTests/reftests/, until we have a nice idea to solve issues caused by mixing existing LayoutTests and CSSWG reftests in the same directories. That makes things much simpler.
> 
> As the first attempt of integrating CSSWG reftests with new-run-webkit-tests, we are going to add '--run-csswg-runtests' command line option to new-run-webkit-tests. If that is specified, all CSSWG reftests under the dedicated directory are run. Ai-san, the intern, will try to do it from the next week.

I'm OK with this solution. We also don't need to know if a test is a reftest until after we've run the test. So, if we provide the appropriate DRT hooks, it should work fine, no?
Comment 4 Ryosuke Niwa 2011-11-02 19:01:46 PDT
*** Bug 71431 has been marked as a duplicate of this bug. ***
Comment 5 Ryosuke Niwa 2011-11-02 19:02:52 PDT
This has come up a lot in TPAC so we should prioritize it. I'm more than happy to work on this if nobody else's actively working on it.
Comment 6 Ryosuke Niwa 2011-11-02 19:06:06 PDT
(In reply to comment #1)
> If the reftest links are embedded in the file, presumably that means that we can't tell from the filename (and the -expected.html convention) that something is a reftest; this might complicate things a bit.

Why is this a problem?
Comment 7 Dirk Pranke 2011-11-02 19:34:06 PDT
(In reply to comment #6)
> (In reply to comment #1)
> > If the reftest links are embedded in the file, presumably that means that we can't tell from the filename (and the -expected.html convention) that something is a reftest; this might complicate things a bit.
> 
> Why is this a problem?

The code in single_test_runner runs different code paths based on whether it's a ref test or not, and that branch is done before we run the test. So we'd either have to open the test file ourselves and figure out that it's a reftest (making things somewhat fragile because we'd have to duplicate the logic in DRT and incurring a perf hit), or restructure the code so that we get that info handed back from DRT (as Ojan suggests in comment #3, which *I think* is true, but I'm not 100% positive).

Note that we might want to duplicate the logic anyway in order for the MockDRT to work (I believe the MockDRT currently crashes on reftests).
Comment 8 Ryosuke Niwa 2011-11-02 22:32:44 PDT
(In reply to comment #7)
> The code in single_test_runner runs different code paths based on whether it's a ref test or not, and that branch is done before we run the test. So we'd either have to open the test file ourselves and figure out that it's a reftest (making things somewhat fragile because we'd have to duplicate the logic in DRT and incurring a perf hit), or restructure the code so that we get that info handed back from DRT (as Ojan suggests in comment #3, which *I think* is true, but I'm not 100% positive).

I've looked into this just now but it appears like we'd have to make quite few changes to each port's DRT. It's doable but I'd rather avoid it if possible.

But I'm also strongly against special-casing directories. So I guess it all depends on the perf hit. Also, we might want to try http://lxml.de/html5parser.html since it implements HTML5 parsing algorithm.
Comment 9 Dirk Pranke 2011-11-02 23:16:29 PDT
We could probably get away with only reading the file to look for the links if the don't find a text (or audio) baseline. In addition, it looks like the WG is recommending that the reference files be named as -ref.html or -notref.html, so we could check for the existence of those filenames as well and only parse the file looking for a link as a last resort.
Comment 10 Hayato Ito 2011-11-03 09:28:14 PDT
In TPAC, every guys called these kinds of reftests, 'w3c reftests', not 'CSSWG reftests', So I renamed the subject of this entry. Lets call that 'w3c reftests' from now.
Comment 11 Hayato Ito 2011-11-03 09:36:45 PDT
(In reply to comment #9)
> We could probably get away with only reading the file to look for the links if the don't find a text (or audio) baseline. In addition, it looks like the WG is recommending that the reference files be named as -ref.html or -notref.html, so we could check for the existence of those filenames as well and only parse the file looking for a link as a last resort.

In order to minimize the startup time of collecting tests, we should try to avoid parsing HTML files as few as possible in this phase. Lets discuss using examples. Suppose a directory contains the following files:

- a.html
- a-expected.txt
- a-expected.png
- b.html
- b-expected.html
- c.html
- c-ref.html
- d.html
- d-notref.html

In this case, it is easy to know which files are w3c reftests without parsing, c.html and d.html.

But I am afraid that every w3c reftests won't follow this name convention. And remember that w3c reftests can have multiple reference files per a test. In that case, it might be more difficult to detect w3c reftests and reference files without parsing. Suppose the following more complicated case:

- a.html
  <link rel="match" href="common-ref.html" />
- b.html
  <link rel="match" href="common-ref.html" />
- c.html
  <link rel="match" href="common-ref.html" />
- d.html
  <link rel="match" href="d-1-ref.html" />
  <link rel="match" href="d-2-ref.html" />
- d-1-ref.html
- d-2-ref.html
- common-ref.html

In this case, we somehow should parse the HTML files.
So my suggestion is:

1. At first, use a heuristic rule as soon as possible and eliminate html files which are easily detected.
2. For remaining files, parse the content to know whether this html file is w3 reftest or not.
Comment 12 Hayato Ito 2011-11-03 09:41:16 PDT
This might be a separate topic:

I'd like to start to support the case where a w3c reftest have only one reference file. In TPAC, Alan told me that most w3c reftests have only one reference file. This covers most cases. If we support this case, we can start to write w3c reftests in WebKit.

Later, we should try to support a test with multiple references files, which already exist in w3c repository.
I think it requires non-trivial change to NRWT code base to support this case.
Comment 13 Ryosuke Niwa 2011-11-03 09:54:26 PDT
(In reply to comment #12)
> I'd like to start to support the case where a w3c reftest have only one reference file. In TPAC, Alan told me that most w3c reftests have only one reference file. This covers most cases. If we support this case, we can start to write w3c reftests in WebKit.
> 
> Later, we should try to support a test with multiple references files, which already exist in w3c repository.
> I think it requires non-trivial change to NRWT code base to support this case.

Alternatively, we can give a feedback to W3C that we should have exactly one reference file per test. What's the purpose of having multiple references anyway?
Comment 14 Ryosuke Niwa 2011-11-03 10:01:31 PDT
(In reply to comment #13)
> Alternatively, we can give a feedback to W3C that we should have exactly one reference file per test. What's the purpose of having multiple references anyway?

Answer: "In some cases when creating the reference file, it is necessary to use features that, although different from the tested features, may themselves fail in such a manner as to cause the reference to render identically to a failed test. When this is the case, in order to reduce the possibility of false positive testing outcomes, multiple reference files should be used, each using a different technique to render the reference. One possibility is to create one or more references that must not match the test file, i.e.: a file that renders in the same manner as a failed test."
Comment 15 Ojan Vafai 2011-11-03 10:10:27 PDT
I'd rather we restructure the code so that reftests get run the same way as non-reftests. The only difference is *after* we've run the test we need to evaluate whether it's a reftest.

So basically, we would still I identify reftests via the -expected.html files during the initial crawl of the LayoutTests directory, but then we'd also identify them when we run the test. 

Ideally, I don't think we need to specially parse the files. We can add DRT infrastructure to print out the appropriate text when it encounters a reftest link element. Then run-webkit-tests can read the printed out line to know which file is the reference. This might be too much work though.

If that's too complicated, I'd also be OK with just running a regexp over the file before we run it, but I worry that would increase the time it takes to run the test. The part I'm worried about is the extra disk access, but maybe it's OK since the file will then be in the disk cache for when DRT needs to read it.

In short, I'd like to see someone try the less complicated regexp approach right before we run the test and (locally!) see if it has a significant cost in the time it takes to run the full test suite. If I understand correctly, that wouldn't actually require restructuring the test runner code much since we'd still know before running the test that it's a reftest.
Comment 16 Ryosuke Niwa 2011-11-03 10:16:40 PDT
(In reply to comment #15)
> Ideally, I don't think we need to specially parse the files. We can add DRT infrastructure to print out the appropriate text when it encounters a reftest link element. Then run-webkit-tests can read the printed out line to know which file is the reference. This might be too much work though.

This involves modifying each port's DRT and add a similar mechanism as dumpAsText.

> If that's too complicated, I'd also be OK with just running a regexp over the file before we run it, but I worry that would increase the time it takes to run the test.

Hayato has already added a script to extract reference files using HTMLParser in fixing https://bugs.webkit.org/show_bug.cgi?id=66838.

> In short, I'd like to see someone try the less complicated regexp approach right before we run the test and (locally!) see if it has a significant cost in the time it takes to run the full test suite.

Agreed.
Comment 17 Dirk Pranke 2011-11-03 10:17:55 PDT
(In reply to comment #16)
> (In reply to comment #15)
> > Ideally, I don't think we need to specially parse the files. We can add DRT infrastructure to print out the appropriate text when it encounters a reftest link element. Then run-webkit-tests can read the printed out line to know which file is the reference. This might be too much work though.
> 
> This involves modifying each port's DRT and add a similar mechanism as dumpAsText.
> 
> > If that's too complicated, I'd also be OK with just running a regexp over the file before we run it, but I worry that would increase the time it takes to run the test.
> 
> Hayato has already added a script to extract reference files using HTMLParser in fixing https://bugs.webkit.org/show_bug.cgi?id=66838.
> 
> > In short, I'd like to see someone try the less complicated regexp approach right before we run the test and (locally!) see if it has a significant cost in the time it takes to run the full test suite.
> 
> Agreed.

I can run a couple benchmarks today and post some numbers.
Comment 18 Ojan Vafai 2011-11-03 10:20:57 PDT
(In reply to comment #9)
> We could probably get away with only reading the file to look for the links if the don't find a text (or audio) baseline. In addition, it looks like the WG is recommending that the reference files be named as -ref.html or -notref.html, so we could check for the existence of those filenames as well and only parse the file looking for a link as a last resort.

Makes sense to me. I'm sure this would completely mitigate the potential performance hit.

To your second point, if the W3C versions use -ref.html and -notref.html, perhaps we should rename ours from -expected.html and -expected-mismatch.html. I've found that everyone I explain reftests to is a little confused by these names. -ref and -notref might be more clear since it's clearly about reftesting.
Comment 19 Hayato Ito 2011-11-03 10:43:24 PDT
For better communication, we should have good names for each 'reftests'. Lets call that.
1. 'non-w3c reftests', which is already supported in WebKit.
2. 'w3c reftests'

The naming convention of '-expected.html' and '-expected-mismatch.html' came from the discussion of  bug 36065. That convention should only be used for non-w3c reftests.
At that time of the discussion, w3c reftests format had not appeared.

'-ref.html' and '-notref.html' convention should be preserved for w3c reftests.
Once we can support w3c reftests, there is no reason to write non-w3c reftests for new tests.
'non-w3c reftests' should be deprecated.


(In reply to comment #18)
> (In reply to comment #9)
> > We could probably get away with only reading the file to look for the links if the don't find a text (or audio) baseline. In addition, it looks like the WG is recommending that the reference files be named as -ref.html or -notref.html, so we could check for the existence of those filenames as well and only parse the file looking for a link as a last resort.
> 
> Makes sense to me. I'm sure this would completely mitigate the potential performance hit.
> 
> To your second point, if the W3C versions use -ref.html and -notref.html, perhaps we should rename ours from -expected.html and -expected-mismatch.html. I've found that everyone I explain reftests to is a little confused by these names. -ref and -notref might be more clear since it's clearly about reftesting.
Comment 20 Hayato Ito 2011-11-03 10:48:19 PDT
Okay. Although I prefer pre-parsing of reference links before passing it to DRT, the following is yet another crazy idea.

1. Collect every 'HTML' files without parsing. HTML file might be one of the followings:
  A. HTML file used by *traditional* Layout tests
  B. HTML file used by non-w3c reftests (e.g. a.html)
  C. Expected html file by non-w3c reftests (e.g. a-expected.html)
  D. Expected mismatch html file by non-w3c reftests (e.g. a-expected-mismatch.html)
  E. w3c reftests (c.html)
  F. w3c reference files (e.g. c-ref.html, c-notref.html)
  G. misplaced html files. :)

  In w3c reftests, one html file might be used as both E and F.

In this phase, we don't know how many html files actually represens for 'test'.
Some of them are used only by 'references'.

2. Get rendering results of all HTML files and also get reference links for each tests from DRT.

3. Given rendering results and reference links information, compare and evaluate the rendering results.

The problem is that we cannot store all rendering results in memory. We proceed 2 & 3 in some incremental ways....


(In reply to comment #15)
> I'd rather we restructure the code so that reftests get run the same way as non-reftests. The only difference is *after* we've run the test we need to evaluate whether it's a reftest.
> 
> So basically, we would still I identify reftests via the -expected.html files during the initial crawl of the LayoutTests directory, but then we'd also identify them when we run the test. 
> 
> Ideally, I don't think we need to specially parse the files. We can add DRT infrastructure to print out the appropriate text when it encounters a reftest link element. Then run-webkit-tests can read the printed out line to know which file is the reference. This might be too much work though.
> 
> If that's too complicated, I'd also be OK with just running a regexp over the file before we run it, but I worry that would increase the time it takes to run the test. The part I'm worried about is the extra disk access, but maybe it's OK since the file will then be in the disk cache for when DRT needs to read it.
> 
> In short, I'd like to see someone try the less complicated regexp approach right before we run the test and (locally!) see if it has a significant cost in the time it takes to run the full test suite. If I understand correctly, that wouldn't actually require restructuring the test runner code much since we'd still know before running the test that it's a reftest.
Comment 21 Ojan Vafai 2011-11-03 11:23:41 PDT
Oh, is it not required that the reference links only point to -ref.html files or -notref.html files? If so, we should see about changing that at the W3C. That seems unnecessarily general to me.

If we can assert that only -ref and -notref files are references, then any other .html file is a test.
Comment 22 Ojan Vafai 2011-11-03 11:31:50 PDT
As per http://wiki.csswg.org/test/reftest, "References should be named after the earliest test that uses them in the test-topic series they belong to, and have -ref appended before the file extension. Depending on the test suite, they may be placed in the reference subfolder of the main test directory or directly in the main test directory."

Sounds like that's already assumed. We should try to get the text to be more explicit that this is true for link references as well.

I'm not really sure where this is being discussed...
Comment 23 Ojan Vafai 2011-11-03 11:46:11 PDT
Discussed this a bit on #whatwg, and it looks like, at the very least, mozilla's test suite uses a manifest file for their reftests.

I think we should eventually:
a) Add support for mozilla's manifest format
b) assume that .html files are only references if they're listed in a manifest file or if they have ref (sounds like it could start with ref-/notref- or end with -ref/-notref or be in a references directory).

In the short term, I don't think we need to implement the manifest format. That means we can assume that html files that have -ref/-notref/-exepected/-expected-mismatch in the name are references and all other html files are not.
Comment 24 Alan Stearns 2011-11-03 11:47:58 PDT
According to Peter Linss, the naming conventions are slightly more complicated. His test harness checks for (at least) filename is: *-ref.* ref-*.* (or *-notref.*  notref-*.*) and whether the file is in a "reftest" folder. He also ends up reading through some files and if there is not a rel=help link then he concludes the file is not a test.
Comment 25 Alan Stearns 2011-11-03 11:50:23 PDT
One possibility is taking the w3c test harness code for processing w3c tests. The harness between CSS and WebApps is converging, and currently the CSS side of things processes the files and generates a manifest to use while running the tests. If we share this (python) code with the w3c we don't have to keep up with their process changes.
Comment 26 Ojan Vafai 2011-11-03 11:59:57 PDT
(In reply to comment #25)
> One possibility is taking the w3c test harness code for processing w3c tests. The harness between CSS and WebApps is converging, and currently the CSS side of things processes the files and generates a manifest to use while running the tests. If we share this (python) code with the w3c we don't have to keep up with their process changes.

The downside to this is that it makes it very confusing to maintain the webkit tests if there are too many different ways tests run.
Comment 27 Ryosuke Niwa 2011-11-03 12:11:53 PDT
(In reply to comment #24)
> According to Peter Linss, the naming conventions are slightly more complicated. His test harness checks for (at least) filename is: *-ref.* ref-*.* (or *-notref.*  notref-*.*) and whether the file is in a "reftest" folder. He also ends up reading through some files and if there is not a rel=help link then he concludes the file is not a test.

This sounds like a serious problem to me. Can we wait until W3C converges on one method to write tests. It appears that people are already confused by the current implementation of ref tests and adding multiple ways to write ref tests will worsen the issue.
Comment 28 Peter Linss 2011-11-03 12:45:56 PDT
(In reply to comment #27)
> (In reply to comment #24)
> > According to Peter Linss, the naming conventions are slightly more complicated. His test harness checks for (at least) filename is: *-ref.* ref-*.* (or *-notref.*  notref-*.*) and whether the file is in a "reftest" folder. He also ends up reading through some files and if there is not a rel=help link then he concludes the file is not a test.
> 
> This sounds like a serious problem to me. Can we wait until W3C converges on one method to write tests. It appears that people are already confused by the current implementation of ref tests and adding multiple ways to write ref tests will worsen the issue.

Let me be clear. We have one preferred method to name test files: append -ref or -notref to the filename of the reference file. 

Our system optionally supports ref- or notref- prefixes, or having the reference files in a 'reftest' or 'reference' sub-directory. These are for compatibility with older tests.

We also expect all retests to use the <link rel='match' .../> or <link rel='mismatch' .../> convention for source files. However our test build system (not the harness) will also generate reftest manifest files so it's not necessary for any other tools to parse the source files looking for reference links.

In general we expect testers to run the output of our build process, not the source.
Comment 29 Dirk Pranke 2011-11-03 19:04:31 PDT
(In reply to comment #25)
> One possibility is taking the w3c test harness code for processing w3c tests. The harness between CSS and WebApps is converging, and currently the CSS side of things processes the files and generates a manifest to use while running the tests. If we share this (python) code with the w3c we don't have to keep up with their process changes.

Alan, can you point me at that code? When I've looked in the past, I hadn't found anything (admittedly, I haven't yet looked very hard).
Comment 30 Dirk Pranke 2011-11-03 19:07:27 PDT
(In reply to comment #23)
> I think we should eventually:
> a) Add support for mozilla's manifest format

It sounds like the W3C isn't planning to require it, and I don't know that we need to support the complexity it handles for some other reason. Are you hoping to run other tests from mozilla (which would be perfectly reasonable thing to want)?
Comment 31 Ojan Vafai 2011-11-03 19:09:24 PDT
(In reply to comment #30)
> (In reply to comment #23)
> > I think we should eventually:
> > a) Add support for mozilla's manifest format
> 
> It sounds like the W3C isn't planning to require it, and I don't know that we need to support the complexity it handles for some other reason. Are you hoping to run other tests from mozilla (which would be perfectly reasonable thing to want)?

Yes. It would be great to be able to import more of mozilla's test suites. The more tests we share, the better our interoperability and the better our test coverage. Anyways, whoever wants to import the mozilla tests can take on this task at that time. I was just trying to figure out what the eventual state of things should be.
Comment 32 Peter Linss 2011-11-03 21:21:19 PDT
(In reply to comment #29)
> (In reply to comment #25)
> > One possibility is taking the w3c test harness code for processing w3c tests. The harness between CSS and WebApps is converging, and currently the CSS side of things processes the files and generates a manifest to use while running the tests. If we share this (python) code with the w3c we don't have to keep up with their process changes.
> 
> Alan, can you point me at that code? When I've looked in the past, I hadn't found anything (admittedly, I haven't yet looked very hard).

The code for the harness is at:
http://hg.csswg.org/dev/harness

Note that this is the web application that runs tests, gathers results and prepares reports.

The code that processes tests and produces manifest files (among other things) is in the test suite repository at:
http://hg.csswg.org/test/tools
and mostly in the python library at:
http://hg.csswg.org/dev/w3ctestlib
which is included as a sub repository under test/tools.

Note that the test suite build code is currently focused on the CSS test suites. I'll be re-working it over the next few weeks to be more generic and flexible.
Comment 33 Ryosuke Niwa 2011-11-03 21:26:02 PDT
We had a long discussion about this today. A long story short,
1. W3C's test suite has a build step that generates manifest files for all ref tests
2. Browser vendors need only support built test suite
3. Peter and other folks will make our lives easier by accepting link-element-less reftest+manifest file format to W3C test suites

Basically we don't have to do anything that relies on file names. We just need to support the manifest file (one that's much simpler than what mozilla uses), and everything will just work.

In fact, Hayato and me just hacked out a patch (see https://bugs.webkit.org/show_bug.cgi?id=66837) that almost completes the support for W3C style ref tests.
Comment 34 Ryosuke Niwa 2011-11-03 21:27:09 PDT
(In reply to comment #33)
> In fact, Hayato and me just hacked out
s/and me just hacked out/and I just hacked/
Comment 35 Hayato Ito 2011-11-04 10:06:55 PDT
*** Bug 67479 has been marked as a duplicate of this bug. ***
Comment 36 Ryosuke Niwa 2011-11-06 18:11:51 PST
If you combine patches on https://bugs.webkit.org/show_bug.cgi?id=66837 and https://bugs.webkit.org/show_bug.cgi?id=71567, you can start running W3C ref tests.
Comment 37 Ryosuke Niwa 2011-12-02 14:15:37 PST
I think we're done. We have a support for reftest.list and we support having multiple reference files per test. We should be able to start importing tests now.