Bug 116333

Summary: [webkitpy] Explicitly specify the reference file extensions
Product: WebKit Reporter: Zan Dobersek <zan>
Component: New BugsAssignee: Zan Dobersek <zan>
Status: RESOLVED FIXED    
Severity: Normal CC: benjamin, commit-queue, dpranke, glenn, rniwa, santosh.mahto, santosh.ma, vivekg
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on:    
Bug Blocks: 116312    
Attachments:
Description Flags
Patch
none
Patch none

Zan Dobersek
Reported 2013-05-17 10:59:21 PDT
[webkitpy] Explicitly specify the reference file extensions
Attachments
Patch (3.91 KB, patch)
2013-05-17 11:11 PDT, Zan Dobersek
no flags
Patch (5.79 KB, patch)
2013-05-17 11:24 PDT, Zan Dobersek
no flags
Zan Dobersek
Comment 1 2013-05-17 11:11:50 PDT
Zan Dobersek
Comment 2 2013-05-17 11:24:49 PDT
Dirk Pranke
Comment 3 2013-05-17 13:32:31 PDT
Comment on attachment 202125 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=202125&action=review Why do we need or want to make this distinction? I'm not sure I understand ... > Tools/Scripts/webkitpy/port/base.py:617 > + filename_wihout_ext, ext = filesystem.splitext(filename) typo: "without".
Zan Dobersek
Comment 4 2013-05-17 13:51:59 PDT
(In reply to comment #3) > (From update of attachment 202125 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=202125&action=review > > Why do we need or want to make this distinction? I'm not sure I understand ... > There's undergoing work in bug #116312 that introduces running MHTML-dumping tests. The dumped output is then compared to the *-expected.mht MHTML archive baselines, much like the webarchive baselines under LayoutTests/webarchive. But .mht is also an extension used by various test files and is as such required to be listed in the _supported_file_extensions set (that I propose renaming). This same set is also used when searching for possible reference files in Port.reference_files, meaning that given a MHTML-dumping page.html layout test, its baseline, page-expected.mht, is actually recognized as a reference file, leading to considering the page.html test to be a reftest and comparing the pixel output etc. This is of course not welcomed as the .mht baseline should be treated as a text baseline, just like the .webarchive baselines.
Dirk Pranke
Comment 5 2013-05-17 14:18:00 PDT
I see. Thanks for explaining. It seems like a mistake to me to have .mht be both a valid extension for a test file and a result that *isn't* a valid reference file. That requires the users to be aware that .html is fine for both tests and references, but .mht isn't, and that perhaps a better solution is to either dump the file as -expected.txt, or to pick a different suffix for the thing that is a baseline. I imagine this tradeoff was considered before; what do y'all think of this?
Zan Dobersek
Comment 6 2013-05-18 12:25:57 PDT
CC-ing more people to hopefully get their opinion. I'll have to agree on the premise that a valid test file extension must also be a valid reference file extension, it makes total sense. Given that, I think it would make most sense to have the dumped MHTML archive compared against a *-expected.txt baseline. Another possibility would be to have the *.html test files and *.mht reference files. These would then have the pixel output compared, but I'm not sure how much coverage this would yield in regards to testing the proper format of the dumped MHTML archive. Another possibility that I would rather avoid is to have the test (and reference) files use .mht extension while the text baselines would use the .mhtml extension, but IMO that just brings additional complexity to the extensions support. FWIW, The current MHTML tests consist of .mht test files that are loaded and then dump text and pixel output. I think these can be easily converted into reference tests. Though, as mentioned before, I'm not sure of testing MHTML behavior the other way around (loading HTML test files and using MHTML files as the reference output).
Benjamin Poulain
Comment 7 2013-05-22 17:03:59 PDT
Comment on attachment 202125 [details] Patch Nothing against this, but why not differentiate based on the -expected? For MHTML, maybe we should have -expected.mht and -binary-expected.mht simultaneously for all tests don't you think?
Dirk Pranke
Comment 8 2013-05-22 17:23:54 PDT
(In reply to comment #7) > (From update of attachment 202125 [details]) > Nothing against this, but why not differentiate based on the -expected? > > For MHTML, maybe we should have -expected.mht and -binary-expected.mht simultaneously for all tests don't you think? MHTML is a text-based format, so the word "binary" confuses me in this context. We need to distinguish between "here is the reference MHTML to compare against" and "here is the dumped MHTML archive from the test" (although I'm not actually sure we need reference MHTML files, we could probably just use regular html files). Which of those two types of files is which of yours?
Benjamin Poulain
Comment 9 2013-05-22 18:03:11 PDT
> MHTML is a text-based format, so the word "binary" confuses me in this context. We need to distinguish between "here is the reference MHTML to compare against" and "here is the dumped MHTML archive from the test" (although I'm not actually sure we need reference MHTML files, we could probably just use regular html files). I don't know much about MHTML but the API has a "useBinaryEncoding" boolean. Which is why I ask about that. > Which of those two types of files is which of yours? We have tests taking MHTML as the input test file (existing tests loading MHTML). What GTK/EFL/Qt need now is new tests that take HTML and generate MHTML. So we will have tests with ".mht" as input in some cases, and "-expected.mht" as output in some cases.
Dirk Pranke
Comment 10 2013-05-22 18:22:47 PDT
(In reply to comment #9) > > MHTML is a text-based format, so the word "binary" confuses me in this context. We need to distinguish between "here is the reference MHTML to compare against" and "here is the dumped MHTML archive from the test" (although I'm not actually sure we need reference MHTML files, we could probably just use regular html files). > > I don't know much about MHTML but the API has a "useBinaryEncoding" boolean. Which is why I ask about that. > I see. I have no idea what that would do :) > > Which of those two types of files is which of yours? > > We have tests taking MHTML as the input test file (existing tests loading MHTML). > What GTK/EFL/Qt need now is new tests that take HTML and generate MHTML. > > So we will have tests with ".mht" as input in some cases, and "-expected.mht" as output in some cases. Right, so to recap my earlier objection, -expected.mht (a baseline) would conflict as a name with -expected.mht (the reference for a reftest). Given that the regular encoding for MHTML is as text (mime/multipart + base64-encoded subresources), I was suggesting that we just use -expected.txt for the baseline. If we were to use -expected.mht for the baseline we'd have to forbid it from also being used as a reftest (which this patch does), but that seems confusing to me, because then we'd have the same extension being used for two different things. I'm not sure what your opinion is of this dilemma yet.
Benjamin Poulain
Comment 11 2013-05-22 18:43:26 PDT
> Right, so to recap my earlier objection, -expected.mht (a baseline) would conflict as a name with -expected.mht (the reference for a reftest). Given that the regular encoding for MHTML is as text (mime/multipart + base64-encoded subresources), I was suggesting that we just use -expected.txt for the baseline. > > If we were to use -expected.mht for the baseline we'd have to forbid it from also being used as a reftest (which this patch does), but that seems confusing to me, because then we'd have the same extension being used for two different things. > > I'm not sure what your opinion is of this dilemma yet. What I don't like about "-expected.txt" for MHTML is it kills readability of the test. If we go down that path, people cannot just open the file in their browser to see the expectations live. Your objection about ref-tests is valid and I have no solution to propose for that issue. :(
Dirk Pranke
Comment 12 2013-05-22 19:15:20 PDT
(In reply to comment #11) > > What I don't like about "-expected.txt" for MHTML is it kills readability of the test. If we go down that path, people cannot just open the file in their browser to see the expectations live. > That is a good point. Hrm.
Santosh Mahto
Comment 13 2013-05-23 09:27:09 PDT
It seems there is a confusion related to useBinaryEncoding First: MHTML is only text base format. and MHTML is independent of any type of encoding format. it just append all the resource in one file with header If the resource is not in Text based format( like images) then :--- By default resources data inside MHTML archive are encoded as base64 .setting useBinaryEncoding = true just override base64 encoding of resource data to binary encoding. But the format of MHTML archive is not changed. So useBinaryEncoding == true in only simple html file(no images) is useless as all content is in text format e.g if useBinaryEncoding == false then Image will be appened as base64 in MHTMl archive if useBinaryEncoding == true then Image will be appened as Binary in MHTML archive In webarchive format we append data in <data> </data> section with particular encoding but in general webarchive foramt is independent of encoding. So there is only one MHTML format(text, as webarchive is XML ) and encoding is just used to encode resource that are not in text format(pritnatble) so useBinaryEncoding(false) has not any dependancy on MHTML format.
Benjamin Poulain
Comment 14 2013-05-23 11:25:55 PDT
Thanks for clarifying the binary format.
Benjamin Poulain
Comment 15 2013-05-23 11:32:18 PDT
Back to the expectations: Unless I missed it, I did not find any ref-test using MHTML. I would go forward with Zan and Santosh work and use -expected.mht for a dump format for seralized MHTML using the public API. One can still write a ref-test by comparing a .mht with a -expected.html. <- Dirk, how much cannot be covered like this? Any opinion before I r-/r+ this?
Zan Dobersek
Comment 16 2013-05-23 12:39:03 PDT
I'm OK with either way, really. Also, FWIW, bug #116443 has a patch that converts the current MHTML tests into reftests (meaning that the tested MHTML archive output is compared against the output of the originating on equivalent HTML).
Dirk Pranke
Comment 17 2013-05-23 13:29:46 PDT
(In reply to comment #15) > Back to the expectations: > > Unless I missed it, I did not find any ref-test using MHTML. > > I would go forward with Zan and Santosh work and use -expected.mht for a dump format for seralized MHTML using the public API. > > One can still write a ref-test by comparing a .mht with a -expected.html. <- Dirk, how much cannot be covered like this? > I can't think of any reason you'd need a ref test to be mhtml, so in theory this patch's approach is fine, I just find it kind of confusing. > Any opinion before I r-/r+ this? another alternative we can consider is to not use -expected as the extension for reference tests. The W3C actually uses -ref instead. I know when we initially started supporting ref tests we wanted to use -expected for consistency with the text and pixel baselines, but that seems less compelling to me these days, and frankly following the W3C's conventions seems more valuable in the long run (I'm probably going to add support for it into Blink, for what that's worth).
Santosh Mahto
Comment 18 2013-05-26 20:45:19 PDT
(In reply to comment #17) > (In reply to comment #15) > > Back to the expectations: > > > > Unless I missed it, I did not find any ref-test using MHTML. > > > > I would go forward with Zan and Santosh work and use -expected.mht for a dump format for seralized MHTML using the public API. > > > > One can still write a ref-test by comparing a .mht with a -expected.html. <- Dirk, how much cannot be covered like this? -expected.mht for a dump format for seralized MHTML using the public API : looks ok to me as same way we do for webarchive > I can't think of any reason you'd need a ref test to be mhtml, so in theory this patch's approach is fine, I just find it kind of confusing. > another alternative we can consider is to not use -expected as the extension for reference tests. The W3C actually uses -ref instead. I know when we initially started supporting ref tests we wanted to use -expected for consistency with the text and pixel baselines, but that seems less compelling to me these days, and frankly following the W3C's conventions seems more valuable in the long run (I'm probably going to add support for it into Blink, for what that's worth). This looks out of scope of this bug although seems good suggestion. Until anyone take this in his/her own hand in seperate bug i think we should follow the baseline comparsion with *.mht it makes sense to follow the WebArchive dump testing
Zan Dobersek
Comment 19 2013-05-28 06:34:02 PDT
Comment on attachment 202125 [details] Patch Clearing flags on attachment: 202125 Committed r150800: <http://trac.webkit.org/changeset/150800>
Zan Dobersek
Comment 20 2013-05-28 06:34:09 PDT
All reviewed patches have been landed. Closing bug.
Note You need to log in before you can comment on or make changes to this bug.