Bug 116333 - [webkitpy] Explicitly specify the reference file extensions
Summary: [webkitpy] Explicitly specify the reference file extensions
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Zan Dobersek
URL:
Keywords:
Depends on:
Blocks: 116312
  Show dependency treegraph
 
Reported: 2013-05-17 10:59 PDT by Zan Dobersek
Modified: 2013-05-28 06:34 PDT (History)
8 users (show)

See Also:


Attachments
Patch (3.91 KB, patch)
2013-05-17 11:11 PDT, Zan Dobersek
no flags Details | Formatted Diff | Diff
Patch (5.79 KB, patch)
2013-05-17 11:24 PDT, Zan Dobersek
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Zan Dobersek 2013-05-17 10:59:21 PDT
[webkitpy] Explicitly specify the reference file extensions
Comment 1 Zan Dobersek 2013-05-17 11:11:50 PDT
Created attachment 202119 [details]
Patch
Comment 2 Zan Dobersek 2013-05-17 11:24:49 PDT
Created attachment 202125 [details]
Patch
Comment 3 Dirk Pranke 2013-05-17 13:32:31 PDT
Comment on attachment 202125 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=202125&action=review

Why do we need or want to make this distinction? I'm not sure I understand ...

> Tools/Scripts/webkitpy/port/base.py:617
> +        filename_wihout_ext, ext = filesystem.splitext(filename)

typo: "without".
Comment 4 Zan Dobersek 2013-05-17 13:51:59 PDT
(In reply to comment #3)
> (From update of attachment 202125 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=202125&action=review
> 
> Why do we need or want to make this distinction? I'm not sure I understand ...
> 

There's undergoing work in bug #116312 that introduces running MHTML-dumping tests. The dumped output is then compared to the *-expected.mht MHTML archive baselines, much like the webarchive baselines under LayoutTests/webarchive.

But .mht is also an extension used by various test files and is as such required to be listed in the _supported_file_extensions set (that I propose renaming). This same set is also used when searching for possible reference files in Port.reference_files, meaning that given a MHTML-dumping page.html layout test, its baseline, page-expected.mht, is actually recognized as a reference file, leading to considering the page.html test to be a reftest and comparing the pixel output etc. This is of course not welcomed as the .mht baseline should be treated as a text baseline, just like the .webarchive baselines.
Comment 5 Dirk Pranke 2013-05-17 14:18:00 PDT
I see. Thanks for explaining.

It seems like a mistake to me to have .mht be both a valid extension for a test file and a result that *isn't* a valid reference file. That requires the users to be aware that .html is fine for both tests and references, but .mht isn't, and that perhaps a better solution is to either dump the file as -expected.txt, or to pick a different suffix for the thing that is a baseline. 

I imagine this tradeoff was considered before; what do y'all think of this?
Comment 6 Zan Dobersek 2013-05-18 12:25:57 PDT
CC-ing more people to hopefully get their opinion.

I'll have to agree on the premise that a valid test file extension must also be a valid reference file extension, it makes total sense.

Given that, I think it would make most sense to have the dumped MHTML archive compared against a *-expected.txt baseline. Another possibility would be to have the *.html test files and *.mht reference files. These would then have the pixel output compared, but I'm not sure how much coverage this would yield in regards to testing the proper format of the dumped MHTML archive. Another possibility that I would rather avoid is to have the test (and reference) files use .mht extension while the text baselines would use the .mhtml extension, but IMO that just brings additional complexity to the extensions support.

FWIW, The current MHTML tests consist of .mht test files that are loaded and then dump text and pixel output. I think these can be easily converted into reference tests. Though, as mentioned before, I'm not sure of testing MHTML behavior the other way around (loading HTML test files and using MHTML files as the reference output).
Comment 7 Benjamin Poulain 2013-05-22 17:03:59 PDT
Comment on attachment 202125 [details]
Patch

Nothing against this, but why not differentiate based on the -expected?

For MHTML, maybe we should have -expected.mht and -binary-expected.mht simultaneously for all tests don't you think?
Comment 8 Dirk Pranke 2013-05-22 17:23:54 PDT
(In reply to comment #7)
> (From update of attachment 202125 [details])
> Nothing against this, but why not differentiate based on the -expected?
> 
> For MHTML, maybe we should have -expected.mht and -binary-expected.mht simultaneously for all tests don't you think?

MHTML is a text-based format, so the word "binary" confuses me in this context. We need to distinguish between "here is the reference MHTML to compare against" and "here is the dumped MHTML archive from the test" (although I'm not actually sure we need reference MHTML files, we could probably just use regular html files). 

Which of those two types of files is which of yours?
Comment 9 Benjamin Poulain 2013-05-22 18:03:11 PDT
> MHTML is a text-based format, so the word "binary" confuses me in this context. We need to distinguish between "here is the reference MHTML to compare against" and "here is the dumped MHTML archive from the test" (although I'm not actually sure we need reference MHTML files, we could probably just use regular html files). 

I don't know much about MHTML but the API has a "useBinaryEncoding" boolean. Which is why I ask about that.

> Which of those two types of files is which of yours?

We have tests taking MHTML as the input test file (existing tests loading MHTML).
What GTK/EFL/Qt need now is new tests that take HTML and generate MHTML.

So we will have tests with ".mht" as input in some cases, and "-expected.mht" as output in some cases.
Comment 10 Dirk Pranke 2013-05-22 18:22:47 PDT
(In reply to comment #9)
> > MHTML is a text-based format, so the word "binary" confuses me in this context. We need to distinguish between "here is the reference MHTML to compare against" and "here is the dumped MHTML archive from the test" (although I'm not actually sure we need reference MHTML files, we could probably just use regular html files). 
> 
> I don't know much about MHTML but the API has a "useBinaryEncoding" boolean. Which is why I ask about that.
> 

I see. I have no idea what that would do :)

> > Which of those two types of files is which of yours?
> 
> We have tests taking MHTML as the input test file (existing tests loading MHTML).
> What GTK/EFL/Qt need now is new tests that take HTML and generate MHTML.
> 
> So we will have tests with ".mht" as input in some cases, and "-expected.mht" as output in some cases.

Right, so to recap my earlier objection, -expected.mht (a baseline) would conflict as a name with -expected.mht (the reference for a reftest). Given that the regular encoding for MHTML is as text (mime/multipart + base64-encoded subresources), I was suggesting that we just use -expected.txt for the baseline. 

If we were to use -expected.mht for the baseline we'd have to forbid it from also being used as a reftest (which this patch does), but that seems confusing to me, because then we'd have the same extension being used for two different things.

I'm not sure what your opinion is of this dilemma yet.
Comment 11 Benjamin Poulain 2013-05-22 18:43:26 PDT
> Right, so to recap my earlier objection, -expected.mht (a baseline) would conflict as a name with -expected.mht (the reference for a reftest). Given that the regular encoding for MHTML is as text (mime/multipart + base64-encoded subresources), I was suggesting that we just use -expected.txt for the baseline. 
> 
> If we were to use -expected.mht for the baseline we'd have to forbid it from also being used as a reftest (which this patch does), but that seems confusing to me, because then we'd have the same extension being used for two different things.
> 
> I'm not sure what your opinion is of this dilemma yet.

What I don't like about "-expected.txt" for MHTML is it kills readability of the test. If we go down that path, people cannot just open the file in their browser to see the expectations live.

Your objection about ref-tests is valid and I have no solution to propose for that issue. :(
Comment 12 Dirk Pranke 2013-05-22 19:15:20 PDT
(In reply to comment #11)
> 
> What I don't like about "-expected.txt" for MHTML is it kills readability of the test. If we go down that path, people cannot just open the file in their browser to see the expectations live.
>

That is a good point. Hrm.
Comment 13 Santosh Mahto 2013-05-23 09:27:09 PDT
It seems there is a confusion related to  useBinaryEncoding
First:  MHTML is only text base format.
        and MHTML is independent of any type of encoding format.
it just append all the resource in one file  with header

If the resource is not in Text based format( like images) then :--- 
     By default resources data inside MHTML archive  are encoded as          base64 .setting useBinaryEncoding = true just override base64 
encoding of resource data to binary encoding. But the format of MHTML archive is not changed.
So useBinaryEncoding == true in only simple html file(no images) is useless as all content is in text format

e.g if useBinaryEncoding ==  false then Image will be appened as base64 in MHTMl archive
    if useBinaryEncoding ==  true then Image will be appened as Binary in MHTML archive
 
In webarchive format we append data in  <data> </data> section with particular 
encoding but in general webarchive foramt is independent of encoding.

So there is only one MHTML format(text, as webarchive is XML )  and encoding is just used to encode resource that are not in text format(pritnatble) 

so useBinaryEncoding(false) has not any  dependancy on MHTML format.
Comment 14 Benjamin Poulain 2013-05-23 11:25:55 PDT
Thanks for clarifying the binary format.
Comment 15 Benjamin Poulain 2013-05-23 11:32:18 PDT
Back to the expectations:

Unless I missed it, I did not find any ref-test using MHTML.

I would go forward with Zan and Santosh work and use -expected.mht for a dump format for seralized MHTML using the public API.

One can still write a ref-test by comparing a .mht with a -expected.html. <- Dirk, how much cannot be covered like this?

Any opinion before I r-/r+ this?
Comment 16 Zan Dobersek 2013-05-23 12:39:03 PDT
I'm OK with either way, really.

Also, FWIW, bug #116443 has a patch that converts the current MHTML tests into reftests (meaning that the tested MHTML archive output is compared against the output of the originating on equivalent HTML).
Comment 17 Dirk Pranke 2013-05-23 13:29:46 PDT
(In reply to comment #15)
> Back to the expectations:
> 
> Unless I missed it, I did not find any ref-test using MHTML.
> 
> I would go forward with Zan and Santosh work and use -expected.mht for a dump format for seralized MHTML using the public API.
> 
> One can still write a ref-test by comparing a .mht with a -expected.html. <- Dirk, how much cannot be covered like this?
>

I can't think of any reason you'd need a ref test to be mhtml, so in theory this patch's approach is fine, I just find it kind of confusing.
 
> Any opinion before I r-/r+ this?

another alternative we can consider is to not use -expected as the extension for reference tests. The W3C actually uses -ref instead. I know when we initially started supporting ref tests we wanted to use -expected for consistency with the text and pixel baselines, but that seems less compelling to me these days, and frankly following the W3C's conventions seems more valuable in the long run (I'm probably going to add support for it into Blink, for what that's worth).
Comment 18 Santosh Mahto 2013-05-26 20:45:19 PDT

(In reply to comment #17)
> (In reply to comment #15)
> > Back to the expectations:
> > 
> > Unless I missed it, I did not find any ref-test using MHTML.
> > 
> > I would go forward with Zan and Santosh work and use -expected.mht for a dump format for seralized MHTML using the public API.
> > 
> > One can still write a ref-test by comparing a .mht with a -expected.html. <- Dirk, how much cannot be covered like this?
 
 -expected.mht for a dump format for seralized MHTML using the public API  :
   looks ok to me as same  way we do for webarchive

> I can't think of any reason you'd need a ref test to be mhtml, so in theory this patch's approach is fine, I just find it kind of confusing.

> another alternative we can consider is to not use -expected as the extension for reference tests. The W3C actually uses -ref instead. I know when we initially started supporting ref tests we wanted to use -expected for consistency with the text and pixel baselines, but that seems less compelling to me these days, and frankly following the W3C's conventions seems more valuable in the long run (I'm probably going to add support for it into Blink, for what that's worth).

   This looks out of scope of this bug although seems good suggestion. Until
anyone take this  in his/her own hand in seperate bug  i think we should follow the baseline comparsion with *.mht
it makes sense to follow the  WebArchive dump testing
Comment 19 Zan Dobersek 2013-05-28 06:34:02 PDT
Comment on attachment 202125 [details]
Patch

Clearing flags on attachment: 202125

Committed r150800: <http://trac.webkit.org/changeset/150800>
Comment 20 Zan Dobersek 2013-05-28 06:34:09 PDT
All reviewed patches have been landed.  Closing bug.