Bug 202321

Summary: [GTK] Web Inspector: page error with shared-mime-info 1.14
Product: WebKit Reporter: Xi Ruoyao <xry111>
Component: Web Inspector Assignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: aperez, bugs-noreply, bugzilla, cgarcia, clopez, clord, inspector-bugzilla-changes, lantw44, mcatanzaro, uncommonnonsense, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: WebKit Local Build   
Hardware: All   
OS: All   
See Also: https://bugs.webkit.org/show_bug.cgi?id=201545
https://bugs.webkit.org/show_bug.cgi?id=160347

Description Xi Ruoyao 2019-09-27 10:30:41 PDT
When I try to open the Web Inspector in webkitgtk-2.26.1-based browsers, at first nothing happens. After retrying, the Web Inspector frame shows up, but with an error message:

error on line 43 at column 8: Opening and ending tag mismatch: link line 0 and head

It's reproducible on an up-to-date Arch Linux system and on a Beyond Linux From Scratch system.
Comment 1 Michael Catanzaro 2019-09-29 09:58:59 PDT
Also broken in Fedora's 2.26.0 and GNOME runtime's 2.26.1.
Comment 2 Michael Catanzaro 2019-10-02 13:56:42 PDT
So apparently this bug might not occur on Debian. I suggest installing Epiphany Technology Preview for testing purposes. It provides a distro-agnostic ground-truth runtime that exhibits this bug and will allow testing regardless of host dependencies.

In the meantime, I'm using Firefox's inspector, which feels a lot more familiar than Chrome's.
Comment 3 Carlos Garcia Campos 2019-10-03 01:13:09 PDT
Does it happen with sandboxing disabled?
Comment 4 Xi Ruoyao 2019-10-03 01:15:31 PDT
(In reply to Michael Catanzaro from comment #2)
> So apparently this bug maybe does not occur on Debian. I suggest installing
> Epiphany Technology Preview for testing purposes. It provides a
> distro-agnostic ground-truth runtime that exhibits this bug and will allow
> testing regardless of host dependencies.
> 
> In the meantime, I'm using Firefox's inspector, which feels a lot more
> familiar than Chrome's.

I "fixed" this by moving "<!DOCTYPE html>" in Source/WebInspectorUI/UserInterface/Main.html from line 26 to line 1.

It seems that if <!DOCTYPE html> is at line 26, WebKit ignores it and tries to parse Main.html as XML (again!).
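Comment 4 suggests that the position of `<!DOCTYPE html>` matters. Below is a minimal sketch for checking where the doctype sits in a page, and for auditing a directory of pages for a late doctype. It assumes, purely for illustration, that the type sniffer only inspects the first 256 bytes of a file; that number, and the helpers `doctype_position()`, `doctype_in_window()` and `late_doctype_pages()`, are hypothetical and do not come from WebKit or shared-mime-info.

```python
import re
from pathlib import Path
from typing import Iterator, Optional, Tuple

# Assumption: a content sniffer only looks at a fixed-size prefix of the
# file. 256 bytes is an illustrative value chosen for this sketch, not a
# constant taken from WebKit, libsoup, GIO or shared-mime-info.
SNIFF_WINDOW = 256

DOCTYPE_RE = re.compile(rb"<!DOCTYPE\s+html", re.IGNORECASE)


def doctype_position(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (1-based line, byte offset) of the first <!DOCTYPE html>, or None."""
    match = DOCTYPE_RE.search(data)
    if match is None:
        return None
    offset = match.start()
    line = data.count(b"\n", 0, offset) + 1
    return line, offset


def doctype_in_window(data: bytes, window: int = SNIFF_WINDOW) -> bool:
    """True if the doctype starts inside the first `window` bytes."""
    pos = doctype_position(data)
    return pos is not None and pos[1] < window


def late_doctype_pages(
    root: Path, window: int = SNIFF_WINDOW
) -> Iterator[Tuple[Path, Optional[Tuple[int, int]]]]:
    """Yield every *.html file under `root` whose doctype is missing or starts past `window`."""
    for path in sorted(root.rglob("*.html")):
        data = path.read_bytes()
        if not doctype_in_window(data, window):
            yield path, doctype_position(data)
```

For example, `for path, pos in late_doctype_pages(Path("Source/WebInspectorUI/UserInterface")): print(path, pos)` would list the pages whose doctype starts past the assumed window, together with its (line, offset).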
Comment 5 Michael Catanzaro 2019-10-03 10:59:59 PDT
(In reply to Carlos Garcia Campos from comment #3)
> Does it happen with sandboxing disabled?

Yes it does. Remember you asked me this elsewhere just yesterday.

There were a bunch of HTML/XHTML/SVG-related changes in shared-mime-info recently. I bet you can't reproduce it because you are using an older shared-mime-info.

HTML vs. XHTML confusion seems to be a recurring problem for WebKitGTK. See also: bug #201545
Comment 6 Carlos Alberto Lopez Perez 2019-10-11 07:55:58 PDT
(In reply to Xi Ruoyao from comment #4)
> (In reply to Michael Catanzaro from comment #2)
> > So apparently this bug maybe does not occur on Debian. I suggest installing
> > Epiphany Technology Preview for testing purposes. It provides a
> > distro-agnostic ground-truth runtime that exhibits this bug and will allow
> > testing regardless of host dependencies.
> > 
> > In the meantime, I'm using Firefox's inspector, which feels a lot more
> > familiar than Chrome's.
> 
> I "fixed" this by moving "<!DOCTYPE html>" in
> Source/WebInspectorUI/UserInterface/Main.html from line 26 to line 1.
> 
> It seems if <!DOCTYPE html> is at line 26, WebKit ignores it and tries to
> parse Main.html as XML (again!).

Then this may be related to this shared-mime-info commit:

https://cgit.freedesktop.org/xdg/shared-mime-info/commit/freedesktop.org.xml.in?id=8ae13a589577e9bda12fb16465a03cd81b1cd349

That should be fixed with https://gitlab.freedesktop.org/xdg/shared-mime-info/merge_requests/27

I'm not sure whether the version of shared-mime-info in Fedora is affected.
Comment 7 Michael Catanzaro 2019-10-11 08:41:10 PDT
(In reply to Carlos Alberto Lopez Perez from comment #6)
> That should be fixed with
> https://gitlab.freedesktop.org/xdg/shared-mime-info/merge_requests/27
> 
> Not sure if the version of shared-mime-info in fedora is affected.

We have shared-mime-info 1.14 in both Fedora 30 and Fedora 31.

Those commits are more likely to have introduced the regression. They certainly cannot be the fix, because they are already included in 1.14.
Comment 8 Carlos Alberto Lopez Perez 2019-10-11 09:07:06 PDT
(In reply to Michael Catanzaro from comment #7)
> (In reply to Carlos Alberto Lopez Perez from comment #6)
> > That should be fixed with
> > https://gitlab.freedesktop.org/xdg/shared-mime-info/merge_requests/27
> > 
> > Not sure if the version of shared-mime-info in fedora is affected.
> 
> We have shared-mime-info 1.14 in both Fedora 30 and Fedora 31.
> 
> Those commits are more likely to have introduced the regression. They
> certainly cannot be the fix, because they are already included in 1.14.

I quickly checked shared-mime-info master, and it seems to me that the original commit from https://gitlab.freedesktop.org/xdg/shared-mime-info/merge_requests/27 was not merged; a reworked version was merged instead.

So, could it be that the issue is still not fixed in shared-mime-info? :\
Comment 9 Michael Catanzaro 2019-10-11 12:46:29 PDT
shared-mime-info 1.14 is one translation commit ahead of master. Basically the same thing. It's definitely not fixed.
Comment 10 Michael Catanzaro 2019-10-14 10:05:10 PDT
This bug also affects WebKitGTK 2.24.
Comment 11 Carlos Alberto Lopez Perez 2019-10-14 12:07:31 PDT
I confirm that this "fixes" the issue on Fedora 31:

$ curl https://people.igalia.com/clopez/wkbug/202321/0001-Revert-Assign-.html-to-XHTML-pages.patch| sudo patch  /usr/share/mime/packages/freedesktop.org.xml
$ sudo /usr/bin/update-mime-database /usr/share/mime

(Basically, this just reverts 8ae13a589577e9bda12fb16465a03cd81b1cd349 in shared-mime-info.)


However, I'm confused, because xdg-mime reports text/html for the file in question even before applying that workaround.

$ wget https://trac.webkit.org/export/251072/webkit/trunk/Source/WebInspectorUI/UserInterface/Main.html
$ xdg-mime query filetype Main.html
text/html


So, I'm not sure what exactly is confusing shared-mime-info, but it seems something is.
Comment 12 Carlos Garcia Campos 2019-10-15 23:55:41 PDT
So, is this a bug in wk or shared-mime-info?
Comment 13 Carlos Alberto Lopez Perez 2019-10-16 01:12:04 PDT
(In reply to Carlos Garcia Campos from comment #12)
> So, is this a bug in wk or shared-mime-info?

Good question. I don't know :\

To claim it is a bug in shared-mime-info, I need a way to reproduce the issue with shared-mime-info alone (and to file a bug there in case it is an issue with it). I tried "xdg-mime query", but it doesn't reproduce the issue.

Maybe with a small C/C++ program that simulates what WebKit does? But I don't know where exactly the WebKit code for this is, so I can't see what it does and how.
Comment 14 Chris Lord 2019-10-16 01:38:36 PDT
(In reply to Carlos Alberto Lopez Perez from comment #13)
> (In reply to Carlos Garcia Campos from comment #12)
> > So, is this a bug in wk or shared-mime-info?
> 
> Good question. I don't know :\
> 
> To claim is a bug in shared mime info I need a way to reproduce the issue
> with shared-mime-info (and to file a bug there in case it is a issue with
> it). I tried with "xdg-mime query" but it doesn't reproduce the issue.
> 
> Maybe with a small C/C++ program that simulates what WebKit does? But I
> don't know where is exactly the WebKit code for this to see what it does and
> how.

IIRC, the relevant code is in libsoup somewhere, but I don't have access to the machine I investigated this on at the moment...
Comment 15 Bastien Nocera 2019-10-16 01:56:21 PDT
(In reply to Carlos Alberto Lopez Perez from comment #13)
> (In reply to Carlos Garcia Campos from comment #12)
> > So, is this a bug in wk or shared-mime-info?
> 
> Good question. I don't know :\
> 
> To claim is a bug in shared mime info I need a way to reproduce the issue
> with shared-mime-info (and to file a bug there in case it is a issue with
> it). I tried with "xdg-mime query" but it doesn't reproduce the issue.

xdg-mime is a pile of shell scripting garbage. And it probably just shows the mime-type guessed from the suffix.

> Maybe with a small C/C++ program that simulates what WebKit does? But I
> don't know where is exactly the WebKit code for this to see what it does and
> how.

There's a test suite in shared-mime-info, details of which are explained in the HACKING file.

I'll repeat it once more, though: WebKitGTK seriously needs to stop using shared-mime-info to detect whether files are one type or the other. Or somebody needs to take all the internal HTML pages and add them to the shared-mime-info test suite so it doesn't regress.
Comment 16 Michael Catanzaro 2019-10-16 03:23:55 PDT
(In reply to Bastien Nocera from comment #15)
> I'll repeat it once more though, WebKitGTK seriously needs to stop using
> shared-mime-info to detect whether files are one type or the other. Or
> somebody needs to take all the internal html pages and add them to the
> shared-mime-info test suite so it doesn't regress.

I'll just add: Bastien has recommended this many times, and so has Alexey, most recently in bug #201545.

It would require removing the SoupContentSniffer feature. Note that the feature might be added by default now (check the SoupSession documentation), so just removing it from the SoupSession initialization isn't enough, because that is probably redundant. It should be explicitly removed.
Comment 17 Michael Catanzaro 2019-10-16 03:28:05 PDT
(In reply to Bastien Nocera from comment #15)
> xdg-mime is a pile of shell scripting garbage. And it probably just shows
> the mime-type guessed from the suffix.

Does it use shared-mime-info?

I see a different result than Carlos:

$ xdg-mime query filetype Main.html 
application/xhtml+xml

And I see it reported as XHTML in nautilus properties dialog, too.
Comment 18 Bastien Nocera 2019-10-16 03:42:38 PDT
(In reply to Michael Catanzaro from comment #17)
> (In reply to Bastien Nocera from comment #15)
> > xdg-mime is a pile of shell scripting garbage. And it probably just shows
> > the mime-type guessed from the suffix.
> 
> Does it use shared-mime-info?
> 
> I see a different result than Carlos:
> 
> $ xdg-mime query filetype Main.html 
> application/xhtml+xml

It uses different "backends" depending on what's installed. It might use some KDE definitions.

> And I see it reported as XHTML in nautilus properties dialog, too.

Nautilus might or might not use the file's data to figure this out, or maybe just the glob.

Use the shared-mime-info test suite if you want to test things uninstalled; use g_content_type_guess() if you want to write your own test case.
Comment 19 Carlos Alberto Lopez Perez 2019-10-16 03:58:00 PDT
(In reply to Michael Catanzaro from comment #16)
> (In reply to Bastien Nocera from comment #15)
> > I'll repeat it once more though, WebKitGTK seriously needs to stop using
> > shared-mime-info to detect whether files are one type or the other. Or
> > somebody needs to take all the internal html pages and add them to the
> > shared-mime-info test suite so it doesn't regress.
> 
> I'll just add: Bastien has recommended this many times, and so has Alexey,
> most recently in bug #201545.
> 
> It would require removing the SoupContentSniffer feature. Note that feature
> might be added by default now (check the SoupSession documentation), so just
> removing it from the SoupSession initialization isn't enough because that is
> probably redundant. It should be explicitly removed.

Long story short:

1. Someone assigns ".html" and ".htm" to the XHTML type in shared-mime-info because:

"So that WebKitGTK+ can detect mime-types more reliably for local
XHTML files which wouldn't be detected as XHTML through magic."

https://cgit.freedesktop.org/xdg/shared-mime-info/commit/freedesktop.org.xml.in?id=8ae13a589577e9bda12fb16465a03cd81b1cd349


2. We decide that shared-mime-info is broken because of that and stop using it?


What's the point of 1. then? What does it "fix" if we stop relying on shared-mime-info? Shouldn't 1. just be reverted in shared-mime-info instead?

I think that 1. was a bad idea from the beginning. If the file ends in ".htm" or ".html" but is actually XHTML, it should just be renamed to ".xhtml" or something, instead of having shared-mime-info guess the type from its contents.
Comment 20 Bastien Nocera 2019-10-16 04:27:46 PDT
(In reply to Carlos Alberto Lopez Perez from comment #19)
> (In reply to Michael Catanzaro from comment #16)
> > (In reply to Bastien Nocera from comment #15)
> > > I'll repeat it once more though, WebKitGTK seriously needs to stop using
> > > shared-mime-info to detect whether files are one type or the other. Or
> > > somebody needs to take all the internal html pages and add them to the
> > > shared-mime-info test suite so it doesn't regress.
> > 
> > I'll just add: Bastien has recommended this many times, and so has Alexey,
> > most recently in bug #201545.
> > 
> > It would require removing the SoupContentSniffer feature. Note that feature
> > might be added by default now (check the SoupSession documentation), so just
> > removing it from the SoupSession initialization isn't enough because that is
> > probably redundant. It should be explicitly removed.
> 
> long history; short:
> 
> 1. Someone assigns ".html" and ".htm" to XML type in shared-mime-info
> because:
> 
> "So that WebKitGTK+ can detect mime-types more reliably for local
> XHTML files which wouldn't be detected as XHTML through magic."
> 
> https://cgit.freedesktop.org/xdg/shared-mime-info/commit/freedesktop.org.xml.
> in?id=8ae13a589577e9bda12fb16465a03cd81b1cd349
> 
> 
> 2. We decide that shared-mime-info is broken because of that and stop using
> it?
> 
> 
> What's the point of 1. then? What does it "fixes" if we stop relying on
> shared-mime-info? Shouldn't 1. just be reverted on shared-mime-info instead?
> 
> I think that 1. was a bad idea from the beginning. If the file ends in
> ".htm" or ".html" but is an XML it should be just renamed to ".xhtml" or
> something instead of having shared-mime-info guessing it by the contents.

shared-mime-info doesn't "detect" anything. It's a database, which applications can use however they see fit. Some of them will pass a filename to an API that consumes the database, others will pass data, and some will pass both.

Furthermore, shared-mime-info is built for end-users, not for programmers, who aren't really supposed to know whether a webpage is HTML or XHTML. The shared-mime-info test suite needs to be fed some of the data that WebKitGTK relies on to function properly.
Comment 21 Carlos Alberto Lopez Perez 2019-10-16 05:08:44 PDT
(In reply to Bastien Nocera from comment #20)
> (In reply to Carlos Alberto Lopez Perez from comment #19)
> > (In reply to Michael Catanzaro from comment #16)
> > > (In reply to Bastien Nocera from comment #15)
> > > > I'll repeat it once more though, WebKitGTK seriously needs to stop using
> > > > shared-mime-info to detect whether files are one type or the other. Or
> > > > somebody needs to take all the internal html pages and add them to the
> > > > shared-mime-info test suite so it doesn't regress.
> > > 
> > > I'll just add: Bastien has recommended this many times, and so has Alexey,
> > > most recently in bug #201545.
> > > 
> > > It would require removing the SoupContentSniffer feature. Note that feature
> > > might be added by default now (check the SoupSession documentation), so just
> > > removing it from the SoupSession initialization isn't enough because that is
> > > probably redundant. It should be explicitly removed.
> > 
> > long history; short:
> > 
> > 1. Someone assigns ".html" and ".htm" to XML type in shared-mime-info
> > because:
> > 
> > "So that WebKitGTK+ can detect mime-types more reliably for local
> > XHTML files which wouldn't be detected as XHTML through magic."
> > 
> > https://cgit.freedesktop.org/xdg/shared-mime-info/commit/freedesktop.org.xml.
> > in?id=8ae13a589577e9bda12fb16465a03cd81b1cd349
> > 
> > 
> > 2. We decide that shared-mime-info is broken because of that and stop using
> > it?
> > 
> > 
> > What's the point of 1. then? What does it "fixes" if we stop relying on
> > shared-mime-info? Shouldn't 1. just be reverted on shared-mime-info instead?
> > 
> > I think that 1. was a bad idea from the beginning. If the file ends in
> > ".htm" or ".html" but is an XML it should be just renamed to ".xhtml" or
> > something instead of having shared-mime-info guessing it by the contents.
> 
> shared-mime-info doesn't "detect" anything. It's a database, which
> applications can use how they see fit. Some of them will use pass a filename
> to an API that consumes the database, others will pass data, some will pass
> both.
> 

Well, the reality is that, since the mentioned commit in shared-mime-info, a file that ends in ".html" gets identified as XML. I'm not sure how a discussion about whether shared-mime-info is a database, an API or a program helps here.


> Furthermore, shared-mime-info is built for end-users, not for programmers,
> which aren't really supposed to know whether a webpage is HTML or XHTML.
> shared-mime-info test suite needs to be fed some of the data that WebKitGTK
> relies on to function properly.

That's news to me. How am I, as an end-user, supposed to use shared-mime-info? On my Debian system, the only program shared-mime-info includes is update-mime-database.
Comment 22 Bastien Nocera 2019-10-16 06:38:15 PDT
(In reply to Carlos Alberto Lopez Perez from comment #21)
> (In reply to Bastien Nocera from comment #20)
<snip>
> Well, the reality is that a file that ends in ".html" gets identified as XML
> since the mentioned commit in shared-mime-info. Not sure how a discussion
> about if shared-mime-info is a database, an API or a program helps here.

That means it can be used in different ways. The caller is the one that chooses whether to use content, or filename patterns, or a combination of both.

> 
> > Furthermore, shared-mime-info is built for end-users, not for programmers,
> > which aren't really supposed to know whether a webpage is HTML or XHTML.
> > shared-mime-info test suite needs to be fed some of the data that WebKitGTK
> > relies on to function properly.
> 
> That's news to me. How I'm supposed as end-user to use shared-mime-info? On
> my debian system the only program shared-mime-info includes is
> update-mime-database.

Its content is geared towards differentiating files for the benefit of end-users. If you need to know whether something is HTML, or XHTML, or which variant thereof, then you should implement it in your application/library. shared-mime-info does not aim to be a replacement for those, as you've seen.

Anyway, I don't think that carrying on the discussion here will have any effect. If there are identification bugs, please file MRs against shared-mime-info to extend the test suite to show those.
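To illustrate Bastien's point that the caller decides whether to trust file name patterns, content, or both, here is a toy resolver. It is not GIO's `g_content_type_guess()` and not the real shared-mime-info data: the glob table only mimics the situation described in this bug (".html" claimed by both XHTML and HTML), while the candidate order, the magic patterns and their offsets are assumptions made for this sketch.

```python
from typing import Dict, List, Optional, Tuple

# Toy MIME database, NOT the real shared-mime-info data. The ".html" entry
# mimics the situation described in this bug (the glob claimed by both
# XHTML and HTML); the candidate order and magic rules are assumptions.
GLOBS: Dict[str, List[str]] = {
    ".html": ["application/xhtml+xml", "text/html"],
    ".xhtml": ["application/xhtml+xml"],
}

# (mime type, byte pattern, last start offset searched): assumed values.
MAGIC: List[Tuple[str, bytes, int]] = [
    ("text/html", b"<!DOCTYPE html", 256),
    ("application/xhtml+xml", b"<?xml", 0),
]


def types_from_name(filename: str) -> List[str]:
    """Every type whose glob matches the file name's extension."""
    for ext, types in GLOBS.items():
        if filename.lower().endswith(ext):
            return list(types)
    return []


def type_from_data(data: bytes) -> Optional[str]:
    """First type whose magic pattern starts at an offset in [0, last]."""
    for mime, pattern, last in MAGIC:
        if data.find(pattern, 0, last + len(pattern)) != -1:
            return mime
    return None


def guess(filename: str, data: bytes, strategy: str = "both") -> Optional[str]:
    """Toy resolver: the caller decides which evidence to trust."""
    if strategy not in ("filename", "content", "both"):
        raise ValueError(f"unknown strategy: {strategy}")
    by_name = types_from_name(filename)
    if strategy == "filename":
        # Ambiguous globs: naively take the first candidate.
        return by_name[0] if by_name else None
    by_data = type_from_data(data)
    if strategy == "content":
        return by_data
    # "both": a unique glob wins; otherwise magic breaks the tie, and we fall
    # back to the first glob candidate when magic matched none of them.
    if len(by_name) == 1:
        return by_name[0]
    if by_data in by_name:
        return by_data
    return by_name[0] if by_name else by_data
```

Under these assumptions, `guess("Main.html", data)` returns `application/xhtml+xml` when the doctype sits behind a long license comment and `text/html` when it is on line 1, mirroring the behaviour reported in comment 4, while a filename-only lookup never looks at the content at all.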
Comment 23 Carlos Alberto Lopez Perez 2019-10-16 07:39:05 PDT
(In reply to Bastien Nocera from comment #22)
> Anyway, I don't think that carrying on the discussion here will have any
> effect. If there are identification bugs, please file MRs against
> shared-mime-info to extend the test suite to show those.

Ok. Filed: https://gitlab.freedesktop.org/xdg/shared-mime-info/issues/120
Comment 24 Carlos Garcia Campos 2019-10-21 01:31:46 PDT
(In reply to Carlos Alberto Lopez Perez from comment #23)
> (In reply to Bastien Nocera from comment #22)
> > Anyway, I don't think that carrying on the discussion here will have any
> > effect. If there are identification bugs, please file MRs against
> > shared-mime-info to extend the test suite to show those.
> 
> Ok. Filed: https://gitlab.freedesktop.org/xdg/shared-mime-info/issues/120

And it's now fixed, right? I just want to clarify how the content type is guessed anyway.

1- For non-local files, the type is only sniffed when the sniff policy allows it (which is the default, though). Sniffing is done by libsoup, using its own implementation, not shared-mime-info.

2- For local files, libsoup gets the content type using G_FILE_ATTRIBUTE_STANDARD_CONTENT_TYPE. When we don't get a content type from libsoup, we try the WebKit MIMETypeRegistry (which tries to get the type from the file extension using xdg_mime_get_mime_type_from_file_name).

So, I don't think WebKit is using shared-mime-info directly; it's libsoup using G_FILE_ATTRIBUTE_STANDARD_CONTENT_TYPE, which ends up using g_content_type_guess() for local files.
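The lookup order described in this comment can be modeled roughly as follows. This is a hypothetical sketch, not WebKit or libsoup code: `Request`, `resolve_content_type()` and the injected callables are invented names, local URIs are assumed to be `file:` URIs, and the real libsoup sniffer inspects response headers and body rather than just the URI. It does make one point concrete: for local files the result comes from GIO's content-type guess, so changes in shared-mime-info's data reach WebKit whether or not the soup sniffer is enabled.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical model of the lookup order described in comment 24. The
# callables stand in for libsoup's sniffer, GIO's
# G_FILE_ATTRIBUTE_STANDARD_CONTENT_TYPE query and WebKit's
# MIMETypeRegistry; none of these names are real WebKit functions.
Lookup = Callable[[str], Optional[str]]


@dataclass
class Request:
    uri: str
    sniffing_allowed: bool = True  # the default sniff policy
    server_content_type: Optional[str] = None


def resolve_content_type(
    req: Request,
    soup_sniff: Lookup,
    gio_content_type: Lookup,
    registry_from_extension: Lookup,
) -> Optional[str]:
    if req.uri.startswith("file:"):
        # Local files: the GIO content type is used unconditionally;
        # the soup sniffer is never consulted.
        content_type = gio_content_type(req.uri)
    elif req.sniffing_allowed:
        # Remote files: libsoup's own sniffer, not shared-mime-info.
        content_type = soup_sniff(req.uri)
    else:
        content_type = req.server_content_type
    if not content_type:
        # Last resort: extension-based lookup in the MIME type registry.
        content_type = registry_from_extension(req.uri)
    return content_type
```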
Comment 25 Michael Catanzaro 2019-10-22 05:16:05 PDT
(In reply to Carlos Garcia Campos from comment #24)
> And it's now fixed, right?

Yes, closing.

> So, I don't think WebKit is using shared-mime-info directly, it's libsoup
> using G_FILE_ATTRIBUTE_STANDARD_CONTENT_TYPE, which ends up using
> g_content_type_guess(), for local files.

But it's an opt-in feature that WebKit explicitly enables when creating its SoupSession in SoupNetworkSession.cpp and NetworkDataTaskSoup.cpp, so WebKit is requesting this behavior.

(In reply to Michael Catanzaro from comment #16)
> It would require removing the SoupContentSniffer feature. Note that feature
> might be added by default now (check the SoupSession documentation), so just
> removing it from the SoupSession initialization isn't enough because that is
> probably redundant. It should be explicitly removed.

Sorry, this was incorrect. The feature is not enabled by default.
Comment 26 Michael Catanzaro 2019-10-22 05:16:42 PDT
(In reply to Michael Catanzaro from comment #25)
> (In reply to Carlos Garcia Campos from comment #24)
> > And it's now fixed, right?
> 
> Yes, closing.

Well, presumably. We still need a new shared-mime-info release.
Comment 27 Radar WebKit Bug Importer 2019-10-22 05:18:18 PDT
<rdar://problem/56497056>
Comment 28 Carlos Garcia Campos 2019-10-22 06:20:44 PDT
(In reply to Michael Catanzaro from comment #25)
> (In reply to Carlos Garcia Campos from comment #24)
> > And it's now fixed, right?
> 
> Yes, closing.
> 
> > So, I don't think WebKit is using shared-mime-info directly, it's libsoup
> > using G_FILE_ATTRIBUTE_STANDARD_CONTENT_TYPE, which ends up using
> > g_content_type_guess(), for local files.
> 
> But it's an opt-in feature that WebKit explicitly enables when creating its
> SoupSession in SoupNetworkSession.cpp and NetworkDataTaskSoup.cpp, so WebKit
> is requesting this behavior.

As I explained before, the soup sniffer is not used for local files, where G_FILE_ATTRIBUTE_STANDARD_CONTENT_TYPE is used unconditionally.

> (In reply to Michael Catanzaro from comment #16)
> > It would require removing the SoupContentSniffer feature. Note that feature
> > might be added by default now (check the SoupSession documentation), so just
> > removing it from the SoupSession initialization isn't enough because that is
> > probably redundant. It should be explicitly removed.
> 
> Sorry, this was incorrect. The feature is not enabled by default.
Comment 29 Edoardo Vacchi 2019-11-23 05:24:11 PST
Sorry if this is a useless comment, but I am still observing this behaviour on Fedora 31 with shared-mime-info 1.15:

Name         : shared-mime-info
Version      : 1.15
Release      : 1.fc31


Name         : webkit2gtk3
Version      : 2.26.2
Release      : 1.fc31

Anything I can do to further debug this?
Comment 30 Michael Catanzaro 2019-11-23 09:07:35 PST
I have the same package versions here and it's definitely fixed.
Comment 31 Edoardo Vacchi 2019-11-23 11:04:50 PST
I thought I'd mention that I am observing this via Epiphany; I assume it is dynamically linked against the system WebKit, right?
Comment 32 Michael Catanzaro 2019-11-23 14:30:30 PST
Of course.
Comment 33 Edoardo Vacchi 2019-11-25 00:21:10 PST
Is there any further action I can take to debug this on my end? Any caches to clear? I already tried running

# update-mime-database /usr/share/mime

I have curl'd '...Source/WebInspectorUI/UserInterface/Main.html'

and indeed:

$ file Scratch/Main.html
Scratch/Main.html: HTML document, ASCII text

so I am kind of puzzled at this point
Comment 34 Xi Ruoyao 2019-11-25 01:26:24 PST
> $ file Scratch/Main.html
> Scratch/Main.html: HTML document, ASCII text

The `file` utility does not use shared-mime-info.
Comment 35 Edoardo Vacchi 2019-11-25 03:55:57 PST
$ xdg-mime query filetype Main.html
text/html