WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
NEW
232471
REGRESSION(
r284521
): [SOUP] Several tests parsed as HTML because of ".html" extensions on other platforms are now parsed as XHTML because they "sniff" as XHTML (and some are invalid XHTML!)
https://bugs.webkit.org/show_bug.cgi?id=232471
Summary
REGRESSION(r284521): [SOUP] Several tests parsed as HTML because of ".html" e...
Diego Pino
Reported
2021-10-29 00:24:17 PDT
Several tests written in XHTML are not valid XHTML
Attachments
Patch
(20.84 KB, patch)
2021-10-29 00:25 PDT
,
Diego Pino
ap
: review-
Details
Formatted Diff
Diff
View All
Add attachment
proposed patch, testcase, etc.
Diego Pino
Comment 1
2021-10-29 00:25:22 PDT
Created
attachment 442790
[details]
Patch
Darin Adler
Comment 2
2021-10-29 11:48:55 PDT
Comment on
attachment 442790
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=442790&action=review
> LayoutTests/ChangeLog:3 > + Several tests written in XHTML are not valid XHTML
What does "written in XHTML" mean here? As far as I can tell, none of these are XHTML. They have ".html" extensions causing them to be processed as HTML, not XHTML, and the first one last least starts with: <!DOCTYPE html> That means HTML 5, not XHTML.
Alexey Proskuryakov
Comment 3
2021-10-30 11:34:05 PDT
Comment on
attachment 442790
[details]
Patch Marking r-, it doesn’t seem like there is anything that needs to be done here. I don’t think that we want to remove xmlns either, we are not striving to have our tests be absolutely minimal or even pass validation in general.
Diego Pino
Comment 4
2021-10-31 18:38:37 PDT
(In reply to Alexey Proskuryakov from
comment #3
)
> Comment on
attachment 442790
[details]
> Patch > > Marking r-, it doesn’t seem like there is anything that needs to be done > here. > > I don’t think that we want to remove xmlns either, we are not striving to > have our tests be absolutely minimal or even pass validation in general.
There has been a change recently in WebKitGTK/WPE (
https://trac.webkit.org/changeset/284521/webkit
), that makes several tests fail because they're now detected as MIME type XHTML. The tests were marked as failure in
webkit.org/b/232386
. It seems to me other WebKit ports use the file extension to determine whether a file is HTML or XHTML. These tests that are failing all have .html as extension, but they claim to be XHTML (through a DTD header or xmlns). Having these tests fixed as valid XHTML makes several tests pass in WebKitGTK/WPE, and it doesn't hurt any other WebKit ports. An alternatively solution, and possibly better, it's to actually drop the DTD header and xmlns declaration in these failing tests as they do not need to be XHTML documents.
Alexey Proskuryakov
Comment 5
2021-11-01 08:53:20 PDT
While this is not what these particular tests are explicitly intended for, it is important that we do test how .html files of this form are parsed by WebKit. There is a lot of content on the web that might be misinterpreted as XHTML due to having some pieces of XML syntax it it, but is parsed as HTML. I don't quite understand why content sniffing for local .html files is desirable for the Gtk port. Does any other browser engine behave like that, failing to display local .html files if there are vestiges of XHTML in them? My understanding is that this Gtk behavior is an exception, not the norm. If this is truly what Gtk needs, then WebKitTestRunner probably should disable content sniffing for most tests, possibly enabling it for some Gtk-specific tests that explicitly verify that it works, not the other way around.
Michael Catanzaro
Comment 6
2021-11-01 10:07:08 PDT
(In reply to Alexey Proskuryakov from
comment #5
)
> I don't quite understand why content sniffing for local .html files is > desirable for the Gtk port. Does any other browser engine behave like that, > failing to display local .html files if there are vestiges of XHTML in them? > My understanding is that this Gtk behavior is an exception, not the norm.
I agree it's certainly not desirable. Our desired behavior is "do exactly what Safari does." That said, we cannot simply revert
r284521
(the most recent commit that broke these tests) because that commit itself fixes a regression that is more severe. Looking into how Apple ports behave, it looks pretty similar: NetworkDataTaskCocoa.mm decides whether to content sniff or not, and if so it uses CFNetwork to do that. It's possible that CFNetwork is just stricter about detecting XHTML than shared-mime-info is. But also, NetworkDataTaskCocoa.mm disables content sniffing altogether for local files. I don't know why. I guess we should do that, but I'm not sure if making the same change for the soup backend would actually be enough to fix our tests, though, because the code we changed here was NetworkDataTaskSoup::didGetFileInfo which (a) is calling WebCore::ResourceResponse::setMIMEType, something NetworkDataTaskCocoa does not do at all, and (b) is clearly not controlled by whether shouldContentSniff is enabled or not, it seems to be entirely unrelated to the content sniffing setting, even though it is content sniffing. This needs a closer look to figure out what the desired behavior really is.
Michael Catanzaro
Comment 7
2021-11-01 10:13:54 PDT
(In reply to Alexey Proskuryakov from
comment #5
)
> While this is not what these particular tests are explicitly intended for, > it is important that we do test how .html files of this form are parsed by > WebKit. There is a lot of content on the web that might be misinterpreted as > XHTML due to having some pieces of XML syntax it it, but is parsed as HTML.
I am actually OK with Diego's commit fixing these tests, because testing HTML vs. XHTML is not the intended purpose of these tests. They seem to be failure to copy/paste the right boilerplate, which is worth fixing. That said, if we do fix these tests, then we should also add new tests that fail in the same ways that these tests are currently failing now, so that we can continue testing to ensure different WebKit ports implement the same behavior, like Alexey suggests.
Alexey Proskuryakov
Comment 8
2021-11-01 16:12:37 PDT
These tests are intended to be parsed as HTML, and always have been. If anything, removing xmlns would be small cleanup, but again, we usually don't strive to clean up tests too much - the real web is messy.
Radar WebKit Bug Importer
Comment 9
2021-11-05 00:25:18 PDT
<
rdar://problem/85049435
>
Michael Catanzaro
Comment 10
2021-11-09 13:15:01 PST
(Darin, your new title for this issue is actually more like the opposite of what's going on. I've retitled it.)
Michael Catanzaro
Comment 11
2021-11-10 06:14:45 PST
***
Bug 232386
has been marked as a duplicate of this bug. ***
Michael Catanzaro
Comment 12
2023-05-03 07:07:17 PDT
***
Bug 239597
has been marked as a duplicate of this bug. ***
Michael Catanzaro
Comment 13
2023-05-03 07:07:23 PDT
***
Bug 256253
has been marked as a duplicate of this bug. ***
Michael Catanzaro
Comment 14
2023-05-03 07:13:35 PDT
It seems we do give higher weight to the file extension than to the content:
https://gitlab.freedesktop.org/xdg/shared-mime-info/-/issues/98
But this fails because we don't consider them both at the same time. NetworkDataTaskSoup::didGetFileInfo first tries to guess the MIME type for the content, and only if that fails does it guess the MIME type for the filename. Any changes here risk bringing back
bug #230797
so caution is required. What we do here will also affect how we approach
bug #252263
.
Michael Catanzaro
Comment 15
2023-05-03 07:17:00 PDT
(In reply to Michael Catanzaro from
comment #14
)
> What we do here will also affect how we approach
bug #252263
.
Actually, in that bug I proposed using g_content_type_guess() to replace xdg_mime_get_mime_type_from_file_name(). g_content_type_guess() allows us to pass both the file extension and the contents at the same time so that they can both be considered. That *might* solve everything. Maybe.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug