66295 – [Meta] Support w3c reftests

Hayato Ito

Reported 2011-08-16 05:51:31 PDT

Basic support for running reftests had been added to WebKit as described in http://trac.webkit.org/wiki/Writing%20Reftests. The next step is to support a reftest which follows CSS test format guidelines defined in CSS working group. Let me call such reftests 'CSSWG reftests'. It would be great that we can run 'CSSWG reftests' maintained by CSS working group 'as is', without any manual modifications, on WebKit. This meta bug acts as an umbrella bug for all efforts to run 'CSSWG reftests' on WebKit. This might include the following efforts, but not limited to: - How to import reftests from CSS working group. - Update new-run-webkit-tests so that it can run 'CSSWG reftests' - Support a manifest file of 'CSSWG reftests'. - Integrate that to the existing LayoutTest infrastructure nicely if it is possible. See also: - http://wiki.csswg.org/test/reftest - http://wiki.csswg.org/test/css2.1/format

Dirk Pranke

Comment 1 2011-09-01 12:03:39 PDT

If the reftest links are embedded in the file, presumably that means that we can't tell from the filename (and the -expected.html convention) that something is a reftest; this might complicate things a bit. We should try to figure out a way to figure out which files are reftests without having to parse *every* test file (which could be kinda slow). Presumably we could only parse the file if we don't find any appropriately named -expected files, for example. Or we could adopt a convention that any files in the "reftests" directory were assumed to be reftests, or something. Or we could just benchmark it; maybe it doesn't matter too much.

Hayato Ito

Comment 2 2011-09-01 21:47:41 PDT

Hi Dirk, thank you for the comment. (In reply to comment #1) > If the reftest links are embedded in the file, presumably that means that we can't tell from the filename (and the -expected.html convention) that something is a reftest; this might complicate things a bit. We should try to figure out a way to figure out which files are reftests without having to parse *every* test file (which could be kinda slow). Yeah, my initial thought is putting all CSSWG reftests to a dedicated directory, like a LayoutTests/reftests/, until we have a nice idea to solve issues caused by mixing existing LayoutTests and CSSWG reftests in the same directories. That makes things much simpler. As the first attempt of integrating CSSWG reftests with new-run-webkit-tests, we are going to add '--run-csswg-runtests' command line option to new-run-webkit-tests. If that is specified, all CSSWG reftests under the dedicated directory are run. Ai-san, the intern, will try to do it from the next week.

Ojan Vafai

Comment 3 2011-11-01 16:35:31 PDT

(In reply to comment #2) > Hi Dirk, thank you for the comment. > > (In reply to comment #1) > > If the reftest links are embedded in the file, presumably that means that we can't tell from the filename (and the -expected.html convention) that something is a reftest; this might complicate things a bit. We should try to figure out a way to figure out which files are reftests without having to parse *every* test file (which could be kinda slow). > > Yeah, my initial thought is putting all CSSWG reftests to a dedicated directory, like a LayoutTests/reftests/, until we have a nice idea to solve issues caused by mixing existing LayoutTests and CSSWG reftests in the same directories. That makes things much simpler. > > As the first attempt of integrating CSSWG reftests with new-run-webkit-tests, we are going to add '--run-csswg-runtests' command line option to new-run-webkit-tests. If that is specified, all CSSWG reftests under the dedicated directory are run. Ai-san, the intern, will try to do it from the next week. I'm OK with this solution. We also don't need to know if a test is a reftest until after we've run the test. So, if we provide the appropriate DRT hooks, it should work fine, no?

Ryosuke Niwa

Comment 4 2011-11-02 19:01:46 PDT

*** Bug 71431 has been marked as a duplicate of this bug. ***

Ryosuke Niwa

Comment 5 2011-11-02 19:02:52 PDT

This has come up a lot in TPAC so we should prioritize it. I'm more than happy to work on this if nobody else's actively working on it.

Ryosuke Niwa

Comment 6 2011-11-02 19:06:06 PDT

(In reply to comment #1) > If the reftest links are embedded in the file, presumably that means that we can't tell from the filename (and the -expected.html convention) that something is a reftest; this might complicate things a bit. Why is this a problem?

Dirk Pranke

Comment 7 2011-11-02 19:34:06 PDT

(In reply to comment #6) > (In reply to comment #1) > > If the reftest links are embedded in the file, presumably that means that we can't tell from the filename (and the -expected.html convention) that something is a reftest; this might complicate things a bit. > > Why is this a problem? The code in single_test_runner runs different code paths based on whether it's a ref test or not, and that branch is done before we run the test. So we'd either have to open the test file ourselves and figure out that it's a reftest (making things somewhat fragile because we'd have to duplicate the logic in DRT and incurring a perf hit), or restructure the code so that we get that info handed back from DRT (as Ojan suggests in comment #3, which *I think* is true, but I'm not 100% positive). Note that we might want to duplicate the logic anyway in order for the MockDRT to work (I believe the MockDRT currently crashes on reftests).

Ryosuke Niwa

Comment 8 2011-11-02 22:32:44 PDT

(In reply to comment #7) > The code in single_test_runner runs different code paths based on whether it's a ref test or not, and that branch is done before we run the test. So we'd either have to open the test file ourselves and figure out that it's a reftest (making things somewhat fragile because we'd have to duplicate the logic in DRT and incurring a perf hit), or restructure the code so that we get that info handed back from DRT (as Ojan suggests in comment #3, which *I think* is true, but I'm not 100% positive). I've looked into this just now but it appears like we'd have to make quite few changes to each port's DRT. It's doable but I'd rather avoid it if possible. But I'm also strongly against special-casing directories. So I guess it all depends on the perf hit. Also, we might want to try http://lxml.de/html5parser.html since it implements HTML5 parsing algorithm.

Dirk Pranke

Comment 9 2011-11-02 23:16:29 PDT

We could probably get away with only reading the file to look for the links if the don't find a text (or audio) baseline. In addition, it looks like the WG is recommending that the reference files be named as -ref.html or -notref.html, so we could check for the existence of those filenames as well and only parse the file looking for a link as a last resort.

Hayato Ito

Comment 10 2011-11-03 09:28:14 PDT

In TPAC, every guys called these kinds of reftests, 'w3c reftests', not 'CSSWG reftests', So I renamed the subject of this entry. Lets call that 'w3c reftests' from now.

Hayato Ito

Comment 11 2011-11-03 09:36:45 PDT

(In reply to comment #9) > We could probably get away with only reading the file to look for the links if the don't find a text (or audio) baseline. In addition, it looks like the WG is recommending that the reference files be named as -ref.html or -notref.html, so we could check for the existence of those filenames as well and only parse the file looking for a link as a last resort. In order to minimize the startup time of collecting tests, we should try to avoid parsing HTML files as few as possible in this phase. Lets discuss using examples. Suppose a directory contains the following files: - a.html - a-expected.txt - a-expected.png - b.html - b-expected.html - c.html - c-ref.html - d.html - d-notref.html In this case, it is easy to know which files are w3c reftests without parsing, c.html and d.html. But I am afraid that every w3c reftests won't follow this name convention. And remember that w3c reftests can have multiple reference files per a test. In that case, it might be more difficult to detect w3c reftests and reference files without parsing. Suppose the following more complicated case: - a.html <link rel="match" href="common-ref.html" /> - b.html <link rel="match" href="common-ref.html" /> - c.html <link rel="match" href="common-ref.html" /> - d.html <link rel="match" href="d-1-ref.html" /> <link rel="match" href="d-2-ref.html" /> - d-1-ref.html - d-2-ref.html - common-ref.html In this case, we somehow should parse the HTML files. So my suggestion is: 1. At first, use a heuristic rule as soon as possible and eliminate html files which are easily detected. 2. For remaining files, parse the content to know whether this html file is w3 reftest or not.

Hayato Ito

Comment 12 2011-11-03 09:41:16 PDT

This might be a separate topic: I'd like to start to support the case where a w3c reftest have only one reference file. In TPAC, Alan told me that most w3c reftests have only one reference file. This covers most cases. If we support this case, we can start to write w3c reftests in WebKit. Later, we should try to support a test with multiple references files, which already exist in w3c repository. I think it requires non-trivial change to NRWT code base to support this case.

Ryosuke Niwa

Comment 13 2011-11-03 09:54:26 PDT

(In reply to comment #12) > I'd like to start to support the case where a w3c reftest have only one reference file. In TPAC, Alan told me that most w3c reftests have only one reference file. This covers most cases. If we support this case, we can start to write w3c reftests in WebKit. > > Later, we should try to support a test with multiple references files, which already exist in w3c repository. > I think it requires non-trivial change to NRWT code base to support this case. Alternatively, we can give a feedback to W3C that we should have exactly one reference file per test. What's the purpose of having multiple references anyway?

Ryosuke Niwa

Comment 14 2011-11-03 10:01:31 PDT

(In reply to comment #13) > Alternatively, we can give a feedback to W3C that we should have exactly one reference file per test. What's the purpose of having multiple references anyway? Answer: "In some cases when creating the reference file, it is necessary to use features that, although different from the tested features, may themselves fail in such a manner as to cause the reference to render identically to a failed test. When this is the case, in order to reduce the possibility of false positive testing outcomes, multiple reference files should be used, each using a different technique to render the reference. One possibility is to create one or more references that must not match the test file, i.e.: a file that renders in the same manner as a failed test."

Ojan Vafai

Comment 15 2011-11-03 10:10:27 PDT

I'd rather we restructure the code so that reftests get run the same way as non-reftests. The only difference is *after* we've run the test we need to evaluate whether it's a reftest. So basically, we would still I identify reftests via the -expected.html files during the initial crawl of the LayoutTests directory, but then we'd also identify them when we run the test. Ideally, I don't think we need to specially parse the files. We can add DRT infrastructure to print out the appropriate text when it encounters a reftest link element. Then run-webkit-tests can read the printed out line to know which file is the reference. This might be too much work though. If that's too complicated, I'd also be OK with just running a regexp over the file before we run it, but I worry that would increase the time it takes to run the test. The part I'm worried about is the extra disk access, but maybe it's OK since the file will then be in the disk cache for when DRT needs to read it. In short, I'd like to see someone try the less complicated regexp approach right before we run the test and (locally!) see if it has a significant cost in the time it takes to run the full test suite. If I understand correctly, that wouldn't actually require restructuring the test runner code much since we'd still know before running the test that it's a reftest.

Ryosuke Niwa

Comment 16 2011-11-03 10:16:40 PDT

(In reply to comment #15) > Ideally, I don't think we need to specially parse the files. We can add DRT infrastructure to print out the appropriate text when it encounters a reftest link element. Then run-webkit-tests can read the printed out line to know which file is the reference. This might be too much work though. This involves modifying each port's DRT and add a similar mechanism as dumpAsText. > If that's too complicated, I'd also be OK with just running a regexp over the file before we run it, but I worry that would increase the time it takes to run the test. Hayato has already added a script to extract reference files using HTMLParser in fixing https://bugs.webkit.org/show_bug.cgi?id=66838. > In short, I'd like to see someone try the less complicated regexp approach right before we run the test and (locally!) see if it has a significant cost in the time it takes to run the full test suite. Agreed.

Dirk Pranke

Comment 17 2011-11-03 10:17:55 PDT

(In reply to comment #16) > (In reply to comment #15) > > Ideally, I don't think we need to specially parse the files. We can add DRT infrastructure to print out the appropriate text when it encounters a reftest link element. Then run-webkit-tests can read the printed out line to know which file is the reference. This might be too much work though. > > This involves modifying each port's DRT and add a similar mechanism as dumpAsText. > > > If that's too complicated, I'd also be OK with just running a regexp over the file before we run it, but I worry that would increase the time it takes to run the test. > > Hayato has already added a script to extract reference files using HTMLParser in fixing https://bugs.webkit.org/show_bug.cgi?id=66838. > > > In short, I'd like to see someone try the less complicated regexp approach right before we run the test and (locally!) see if it has a significant cost in the time it takes to run the full test suite. > > Agreed. I can run a couple benchmarks today and post some numbers.

Ojan Vafai

Comment 18 2011-11-03 10:20:57 PDT

(In reply to comment #9) > We could probably get away with only reading the file to look for the links if the don't find a text (or audio) baseline. In addition, it looks like the WG is recommending that the reference files be named as -ref.html or -notref.html, so we could check for the existence of those filenames as well and only parse the file looking for a link as a last resort. Makes sense to me. I'm sure this would completely mitigate the potential performance hit. To your second point, if the W3C versions use -ref.html and -notref.html, perhaps we should rename ours from -expected.html and -expected-mismatch.html. I've found that everyone I explain reftests to is a little confused by these names. -ref and -notref might be more clear since it's clearly about reftesting.

Hayato Ito

Comment 19 2011-11-03 10:43:24 PDT

For better communication, we should have good names for each 'reftests'. Lets call that. 1. 'non-w3c reftests', which is already supported in WebKit. 2. 'w3c reftests' The naming convention of '-expected.html' and '-expected-mismatch.html' came from the discussion of bug 36065. That convention should only be used for non-w3c reftests. At that time of the discussion, w3c reftests format had not appeared. '-ref.html' and '-notref.html' convention should be preserved for w3c reftests. Once we can support w3c reftests, there is no reason to write non-w3c reftests for new tests. 'non-w3c reftests' should be deprecated. (In reply to comment #18) > (In reply to comment #9) > > We could probably get away with only reading the file to look for the links if the don't find a text (or audio) baseline. In addition, it looks like the WG is recommending that the reference files be named as -ref.html or -notref.html, so we could check for the existence of those filenames as well and only parse the file looking for a link as a last resort. > > Makes sense to me. I'm sure this would completely mitigate the potential performance hit. > > To your second point, if the W3C versions use -ref.html and -notref.html, perhaps we should rename ours from -expected.html and -expected-mismatch.html. I've found that everyone I explain reftests to is a little confused by these names. -ref and -notref might be more clear since it's clearly about reftesting.

Hayato Ito

Comment 20 2011-11-03 10:48:19 PDT

Okay. Although I prefer pre-parsing of reference links before passing it to DRT, the following is yet another crazy idea. 1. Collect every 'HTML' files without parsing. HTML file might be one of the followings: A. HTML file used by *traditional* Layout tests B. HTML file used by non-w3c reftests (e.g. a.html) C. Expected html file by non-w3c reftests (e.g. a-expected.html) D. Expected mismatch html file by non-w3c reftests (e.g. a-expected-mismatch.html) E. w3c reftests (c.html) F. w3c reference files (e.g. c-ref.html, c-notref.html) G. misplaced html files. :) In w3c reftests, one html file might be used as both E and F. In this phase, we don't know how many html files actually represens for 'test'. Some of them are used only by 'references'. 2. Get rendering results of all HTML files and also get reference links for each tests from DRT. 3. Given rendering results and reference links information, compare and evaluate the rendering results. The problem is that we cannot store all rendering results in memory. We proceed 2 & 3 in some incremental ways.... (In reply to comment #15) > I'd rather we restructure the code so that reftests get run the same way as non-reftests. The only difference is *after* we've run the test we need to evaluate whether it's a reftest. > > So basically, we would still I identify reftests via the -expected.html files during the initial crawl of the LayoutTests directory, but then we'd also identify them when we run the test. > > Ideally, I don't think we need to specially parse the files. We can add DRT infrastructure to print out the appropriate text when it encounters a reftest link element. Then run-webkit-tests can read the printed out line to know which file is the reference. This might be too much work though. > > If that's too complicated, I'd also be OK with just running a regexp over the file before we run it, but I worry that would increase the time it takes to run the test. The part I'm worried about is the extra disk access, but maybe it's OK since the file will then be in the disk cache for when DRT needs to read it. > > In short, I'd like to see someone try the less complicated regexp approach right before we run the test and (locally!) see if it has a significant cost in the time it takes to run the full test suite. If I understand correctly, that wouldn't actually require restructuring the test runner code much since we'd still know before running the test that it's a reftest.

Ojan Vafai

Comment 21 2011-11-03 11:23:41 PDT

Oh, is it not required that the reference links only point to -ref.html files or -notref.html files? If so, we should see about changing that at the W3C. That seems unnecessarily general to me. If we can assert that only -ref and -notref files are references, then any other .html file is a test.

Ojan Vafai

Comment 22 2011-11-03 11:31:50 PDT

As per http://wiki.csswg.org/test/reftest, "References should be named after the earliest test that uses them in the test-topic series they belong to, and have -ref appended before the file extension. Depending on the test suite, they may be placed in the reference subfolder of the main test directory or directly in the main test directory." Sounds like that's already assumed. We should try to get the text to be more explicit that this is true for link references as well. I'm not really sure where this is being discussed...

Ojan Vafai

Comment 23 2011-11-03 11:46:11 PDT

Discussed this a bit on #whatwg, and it looks like, at the very least, mozilla's test suite uses a manifest file for their reftests. I think we should eventually: a) Add support for mozilla's manifest format b) assume that .html files are only references if they're listed in a manifest file or if they have ref (sounds like it could start with ref-/notref- or end with -ref/-notref or be in a references directory). In the short term, I don't think we need to implement the manifest format. That means we can assume that html files that have -ref/-notref/-exepected/-expected-mismatch in the name are references and all other html files are not.

Alan Stearns

Comment 24 2011-11-03 11:47:58 PDT

According to Peter Linss, the naming conventions are slightly more complicated. His test harness checks for (at least) filename is: *-ref.* ref-*.* (or *-notref.* notref-*.*) and whether the file is in a "reftest" folder. He also ends up reading through some files and if there is not a rel=help link then he concludes the file is not a test.

Alan Stearns

Comment 25 2011-11-03 11:50:23 PDT

One possibility is taking the w3c test harness code for processing w3c tests. The harness between CSS and WebApps is converging, and currently the CSS side of things processes the files and generates a manifest to use while running the tests. If we share this (python) code with the w3c we don't have to keep up with their process changes.

Ojan Vafai

Comment 26 2011-11-03 11:59:57 PDT

(In reply to comment #25) > One possibility is taking the w3c test harness code for processing w3c tests. The harness between CSS and WebApps is converging, and currently the CSS side of things processes the files and generates a manifest to use while running the tests. If we share this (python) code with the w3c we don't have to keep up with their process changes. The downside to this is that it makes it very confusing to maintain the webkit tests if there are too many different ways tests run.

Ryosuke Niwa

Comment 27 2011-11-03 12:11:53 PDT

(In reply to comment #24) > According to Peter Linss, the naming conventions are slightly more complicated. His test harness checks for (at least) filename is: *-ref.* ref-*.* (or *-notref.* notref-*.*) and whether the file is in a "reftest" folder. He also ends up reading through some files and if there is not a rel=help link then he concludes the file is not a test. This sounds like a serious problem to me. Can we wait until W3C converges on one method to write tests. It appears that people are already confused by the current implementation of ref tests and adding multiple ways to write ref tests will worsen the issue.

Peter Linss

Comment 28 2011-11-03 12:45:56 PDT

(In reply to comment #27) > (In reply to comment #24) > > According to Peter Linss, the naming conventions are slightly more complicated. His test harness checks for (at least) filename is: *-ref.* ref-*.* (or *-notref.* notref-*.*) and whether the file is in a "reftest" folder. He also ends up reading through some files and if there is not a rel=help link then he concludes the file is not a test. > > This sounds like a serious problem to me. Can we wait until W3C converges on one method to write tests. It appears that people are already confused by the current implementation of ref tests and adding multiple ways to write ref tests will worsen the issue. Let me be clear. We have one preferred method to name test files: append -ref or -notref to the filename of the reference file. Our system optionally supports ref- or notref- prefixes, or having the reference files in a 'reftest' or 'reference' sub-directory. These are for compatibility with older tests. We also expect all retests to use the <link rel='match' .../> or <link rel='mismatch' .../> convention for source files. However our test build system (not the harness) will also generate reftest manifest files so it's not necessary for any other tools to parse the source files looking for reference links. In general we expect testers to run the output of our build process, not the source.

Dirk Pranke

Comment 29 2011-11-03 19:04:31 PDT

(In reply to comment #25) > One possibility is taking the w3c test harness code for processing w3c tests. The harness between CSS and WebApps is converging, and currently the CSS side of things processes the files and generates a manifest to use while running the tests. If we share this (python) code with the w3c we don't have to keep up with their process changes. Alan, can you point me at that code? When I've looked in the past, I hadn't found anything (admittedly, I haven't yet looked very hard).

Dirk Pranke

Comment 30 2011-11-03 19:07:27 PDT

(In reply to comment #23) > I think we should eventually: > a) Add support for mozilla's manifest format It sounds like the W3C isn't planning to require it, and I don't know that we need to support the complexity it handles for some other reason. Are you hoping to run other tests from mozilla (which would be perfectly reasonable thing to want)?

Ojan Vafai

Comment 31 2011-11-03 19:09:24 PDT

(In reply to comment #30) > (In reply to comment #23) > > I think we should eventually: > > a) Add support for mozilla's manifest format > > It sounds like the W3C isn't planning to require it, and I don't know that we need to support the complexity it handles for some other reason. Are you hoping to run other tests from mozilla (which would be perfectly reasonable thing to want)? Yes. It would be great to be able to import more of mozilla's test suites. The more tests we share, the better our interoperability and the better our test coverage. Anyways, whoever wants to import the mozilla tests can take on this task at that time. I was just trying to figure out what the eventual state of things should be.

Peter Linss

Comment 32 2011-11-03 21:21:19 PDT

(In reply to comment #29) > (In reply to comment #25) > > One possibility is taking the w3c test harness code for processing w3c tests. The harness between CSS and WebApps is converging, and currently the CSS side of things processes the files and generates a manifest to use while running the tests. If we share this (python) code with the w3c we don't have to keep up with their process changes. > > Alan, can you point me at that code? When I've looked in the past, I hadn't found anything (admittedly, I haven't yet looked very hard). The code for the harness is at: http://hg.csswg.org/dev/harness Note that this is the web application that runs tests, gathers results and prepares reports. The code that processes tests and produces manifest files (among other things) is in the test suite repository at: http://hg.csswg.org/test/tools and mostly in the python library at: http://hg.csswg.org/dev/w3ctestlib which is included as a sub repository under test/tools. Note that the test suite build code is currently focused on the CSS test suites. I'll be re-working it over the next few weeks to be more generic and flexible.

Ryosuke Niwa

Comment 33 2011-11-03 21:26:02 PDT

We had a long discussion about this today. A long story short, 1. W3C's test suite has a build step that generates manifest files for all ref tests 2. Browser vendors need only support built test suite 3. Peter and other folks will make our lives easier by accepting link-element-less reftest+manifest file format to W3C test suites Basically we don't have to do anything that relies on file names. We just need to support the manifest file (one that's much simpler than what mozilla uses), and everything will just work. In fact, Hayato and me just hacked out a patch (see https://bugs.webkit.org/show_bug.cgi?id=66837) that almost completes the support for W3C style ref tests.

Ryosuke Niwa

Comment 34 2011-11-03 21:27:09 PDT

(In reply to comment #33) > In fact, Hayato and me just hacked out s/and me just hacked out/and I just hacked/

Hayato Ito

Comment 35 2011-11-04 10:06:55 PDT

*** Bug 67479 has been marked as a duplicate of this bug. ***

Ryosuke Niwa

Comment 36 2011-11-06 18:11:51 PST

If you combine patches on https://bugs.webkit.org/show_bug.cgi?id=66837 and https://bugs.webkit.org/show_bug.cgi?id=71567, you can start running W3C ref tests.

Ryosuke Niwa

Comment 37 2011-12-02 14:15:37 PST

I think we're done. We have a support for reftest.list and we support having multiple reference files per test. We should be able to start importing tests now.