NEW 278992
Entity 'commat' not defined
https://bugs.webkit.org/show_bug.cgi?id=278992
dabl02
Reported 2024-09-01 13:25:47 PDT
Created attachment 472397 [details]
HTML file causing rendering error

I generate HTML documentation with a tool called odig (HTML file not written by hand). This particular file `commat_error.html` causes an error in Epiphany that I don't have in Firefox. The error is:

```
error on line 2 at column 3176: Entity 'commat' not defined
```

The version tested is Epiphany 46.3 installed with flatpak. WebKitGTK version is 2.44.3.

Thank you
Attachments
HTML file causing rendering error (4.46 KB, text/html)
2024-09-01 13:25 PDT, dabl02
no flags
dabl02
Comment 1 2024-09-01 13:40:15 PDT
I tested opening the uploaded file after submitting the bug. When I open the file directly in Epiphany using the attachment link, I have no problem. I had to download the file manually and open it from the file system to see the error.

I hope you can reproduce the error. Maybe it is a manipulation error on my part, but it seems over-complicated for a simple file.
Karl Dubost
Comment 2 2024-09-03 23:01:31 PDT
So this bug is probably talking about this line,

<li>Daniel Bünzli &lt;daniel.buenzl i&commat;erratique.ch&gt;</li>

which is rendered as

Daniel Bünzli <daniel.buenzl i@erratique.ch>

on macOS Safari. This is this character: https://www.compart.com/en/unicode/U+0040

There is no issue on Safari, maybe this is just for Epiphany.
Michael Catanzaro
Comment 3 2024-09-04 09:32:30 PDT
I've tested WebKitGTK 2.44.3 from Fedora and also WebKitGTK 2.59.91 from Epiphany Tech Preview. Works fine for me in both.
dabl02
Comment 4 2024-09-07 02:49:25 PDT
Thanks for your replies.

At first sight, this bug does not seem to be related to the commat entity in particular, but to the way the HTML file is read, parsed, and rendered up to this character.

After your comments, I was able to reproduce this bug again on Epiphany Technology Preview (WebKitGTK 2.45.92) on Debian 12. I previously tested it on Void Linux (distribution package) and on Linux Mint LMDE (flatpak) and encountered the same error.

To further describe the error, it is written in a red rectangle at the top of the page:

    This page contains the following errors:
    error on line 2 at column 3176: Entity 'commat' not defined
    Below is a rendering of the page up to the first error.

Then, as it says, the page is rendered below the rectangle up to the "problematic" character.

I have uploaded this file to the W3 validator website https://validator.w3.org/#validate_by_upload. I don't see any relevant information about the file being invalid (although you may want to have a look to confirm my opinion).

The exact steps I took to reproduce the bug are:
- Go to the attachment link with a browser other than Epiphany (in my case Firefox): https://bug-278992-attachments.webkit.org/attachment.cgi?id=472397
- Right click and select View Page Source.
- CTRL-A and CTRL-C.
- On your system, create a file `commat_bug.html` and paste the clipboard content. You should have two lines of code.
- Then save the file: its md5sum is 5a342cce19b976dd612ee044a23159cc. If you inadvertently added a newline character at the end, the md5sum is a40ea32795a1466d9b32319506d693ed.
- Open this file with Epiphany. In my case, I open the Thunar file browser, right click on the file and select Open With.

Are you able to get the error by following these steps and with the right checksum?

I'm not sure if this is really "a bug" on Epiphany's part, or if the problem is in the file. Having opened it normally in Chromium and Firefox (and, on your side, in Safari etc.) suggests to me that something is not normal and there may be an issue somewhere.
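
For anyone repeating the checksum step above, here is a minimal Python sketch that reports the MD5 of a file's bytes both with and without a single trailing newline, so the result can be compared against the two checksums quoted in the steps. The helper names `md5_hex` and `describe` are made up for this illustration; only `hashlib` from the standard library is used.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the hexadecimal MD5 digest of the given bytes."""
    return hashlib.md5(data).hexdigest()


def describe(data: bytes) -> dict:
    """Hypothetical helper: digests of the data as saved, and with or
    without exactly one trailing newline, to spot an editor-added newline."""
    stripped = data.rstrip(b"\n")
    return {
        "as_saved": md5_hex(data),
        "without_trailing_newline": md5_hex(stripped),
        "with_trailing_newline": md5_hex(stripped + b"\n"),
    }


if __name__ == "__main__":
    # Stand-in content; replace with the bytes of the saved file.
    for label, digest in describe(b"<html></html>\n").items():
        print(f"{label}: {digest}")
```

To check the real file, pass `pathlib.Path("commat_bug.html").read_bytes()` to `describe` and compare the digests with the ones listed in the steps.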
Michael Catanzaro
Comment 5 2024-09-07 05:57:41 PDT
Try it in a new Unix user account with a clean home directory. Does the bug still occur?
dabl02
Comment 6 2024-09-07 06:27:28 PDT
(In reply to Michael Catanzaro from comment #5)
> Try it in a new Unix user account with a clean home directory. Does the bug
> still occur?

Yes, same error. I just tried:
- Run `useradd debug-epiphany`.
- Run `passwd debug-epiphany`.
- Log out and log in as the new user.
- Copy the file to my home directory and open it.

I also previously tested it on three different distributions on three different machines.
Michael Catanzaro
Comment 7 2024-09-07 10:51:34 PDT
(In reply to dabl02 from comment #1)
> I open the file directly in Epiphany using the file attachment link, I had
> no problem.
>
> I had to manually download the file manually and open it from the file
> system to see the error.

OK, problem was I didn't read your comment.

The error doesn't occur on WebKit Bugzilla because an HTTP header sets the content type. When opened locally, there is no HTTP and the content type is guessed from the content of the file. shared-mime-info reasonably guesses XHTML because that's what it is:

<html xmlns="http://www.w3.org/1999/xhtml">

So the error is correct. You've just got an invalid XHTML file.
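
To illustrate why the same markup fails as XHTML but works as HTML, here is a small Python sketch. It is not WebKit's code path, only an analogy using the standard library: an XML parser knows just the five predefined entities (`lt`, `gt`, `amp`, `apos`, `quot`) unless a DTD declares more, while the HTML named character reference table includes `commat`. The fragment and the address `daniel@example.org` are placeholders, not taken from the attachment.

```python
import html
import xml.etree.ElementTree as ET

# Placeholder fragment using the HTML-only named reference &commat;
FRAGMENT = '<li xmlns="http://www.w3.org/1999/xhtml">daniel&commat;example.org</li>'


def parse_as_xml(markup: str) -> str:
    """Return the element text, or the XML parser's error message."""
    try:
        return ET.fromstring(markup).text or ""
    except ET.ParseError as exc:
        return f"XML error: {exc}"


def decode_as_html(text: str) -> str:
    """Resolve HTML named character references such as &commat;."""
    return html.unescape(text)


if __name__ == "__main__":
    print(parse_as_xml(FRAGMENT))                        # an "undefined entity" error
    print(decode_as_html("daniel&commat;example.org"))   # the reference becomes "@"
```

Replacing `&commat;` with the numeric reference `&#64;` (or a literal `@`) makes the fragment parse under both rules, which would be one way for a generator to emit markup that is safe in either mode.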
dabl02
Comment 8 2024-09-10 13:48:09 PDT
(In reply to Michael Catanzaro from comment #7)
> OK, problem was I didn't read your comment. The error doesn't occur on
> WebKit Bugzilla because an HTTP header sets the content type. When opened
> locally, there is no HTTP and the content type is guessed from the content
> of the file. shared-mime-info reasonably guesses XHTML because that's what
> it is:
>
> <html xmlns="http://www.w3.org/1999/xhtml">
>
> So the error is correct. You've just got an invalid XHTML file.

That's it! So the problem is the document type being recognized wrongly.

To find an explanation, I read the HTML5 standard. I found some interesting answers:

- The parser used for the file depends on its MIME type: https://dom.spec.whatwg.org/#html-document

```
A document is said to be an XML document if its type is "xml"; otherwise an HTML document. Whether a document is an HTML document or an XML document affects the behavior of certain APIs.
```

- The xmlns attribute has no effect on guessing the document type (HTML or XML): https://html.spec.whatwg.org/multipage/dom.html#global-attributes

```
In HTML documents, elements in the HTML namespace may have an xmlns attribute specified, if, and only if, it has the exact value "http://www.w3.org/1999/xhtml". This does not apply to XML documents.
```

- The way to retrieve the MIME type of a resource depends on its source (network or file system). There is a MIME type detection algorithm. In the case of a local file, the MIME type is provided by the file system: https://mimesniff.spec.whatwg.org/#interpreting-the-resource-metadata

```
If the resource is retrieved directly from the file system, set supplied-type to the MIME type provided by the file system.
```

- Finally, the association between filename extensions and MIME types exists in UNIX-type systems at `/etc/mime.types` and can be seen as a way for the filesystem to provide the MIME type (https://en.wikipedia.org/wiki/Media_type#mime.types).

I confirmed with Firefox that, if I rename the file `index.html` to `index.xhtml`, the file is parsed as an XML document and I get the **same** error. So the way the MIME type is guessed depends on the web browser.

In the case of a local file, should Epiphany guess the MIME type based on the filename?
Michael Catanzaro
Comment 9 2024-09-10 16:33:02 PDT
> The xmlns attribute has no effect on guessing the document type (HTML or XML):

Huh, I didn't know that. Good find. Then shared-mime-info should probably stop matching on this to detect XHTML.

> If the resource is retrieved directly from the file system, set supplied-type to the MIME type provided by the file system.

I'm not sure what this means, because filesystems obviously do not have any concept of MIME type. That's not how computers work? We take the MIME type guessed by shared-mime-info, which I guess is the closest possible approximation.

> - Finally, the association between filename extensions and MIME types exists in UNIX-type systems at `/etc/mime.types` and can be seen as a way for the filesystem to provide the MIME type (https://en.wikipedia.org/wiki/Media_type#mime.types).

Huh, I've never heard of /etc/mime.types. I doubt shared-mime-info looks at it. On my computer, that file is owned by mailcap, which I've never heard of. :) But I guess what this really indicates is that the spec authors' definition of "filesystem" is pretty flexible and can mean whatever the browser wants it to mean. Anyway, for us, that's shared-mime-info.

I think shared-mime-info decides for itself whether content sniffing magic or file extension has higher priority? But I'm not sure. It's just one big data file, so you could probably experiment easily enough without needing to be a software developer. It seems the file extension ought to take precedence over content sniffing magic.

Guess: it maybe broke in:

https://gitlab.freedesktop.org/xdg/shared-mime-info/-/commit/4961dc3e48d13c0c675ad7c135419b864813ca55

Or possibly:

https://gitlab.freedesktop.org/xdg/shared-mime-info/-/commit/8ae13a589577e9bda12fb16465a03cd81b1cd349

which actually references our bug #160347, which it seems got forgotten long ago....
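
As a rough sketch of the "extension before magic" ordering discussed above, the following Python function consults a filename extension table first and falls back to content sniffing only when no filename or no known extension is available. This is an illustration under assumptions, not shared-mime-info's actual matching algorithm or WebKit's code: the table `EXTENSION_TYPES`, the `sniff_content` rule, and the function names are all hypothetical.

```python
from pathlib import PurePath
from typing import Optional

# Hypothetical extension table; a real database is far larger.
EXTENSION_TYPES = {
    ".html": "text/html",
    ".htm": "text/html",
    ".xhtml": "application/xhtml+xml",
}

XHTML_NS = b'xmlns="http://www.w3.org/1999/xhtml"'


def sniff_content(head: bytes) -> str:
    """Toy content sniffing: treat the XHTML namespace as a sign of XHTML."""
    if XHTML_NS in head:
        return "application/xhtml+xml"
    if b"<html" in head.lower():
        return "text/html"
    return "application/octet-stream"


def guess_mime(filename: Optional[str], head: bytes) -> str:
    """Prefer the filename extension; sniff the content only as a fallback."""
    if filename:
        ext = PurePath(filename).suffix.lower()
        if ext in EXTENSION_TYPES:
            return EXTENSION_TYPES[ext]
    return sniff_content(head)


if __name__ == "__main__":
    head = b'<!DOCTYPE html><html xmlns="http://www.w3.org/1999/xhtml">'
    print(guess_mime("commat_bug.html", head))  # extension wins
    print(guess_mime(None, head))               # no filename: sniffing decides
```

With this ordering, a `.html` file carrying the XHTML namespace is treated as HTML, while the same bytes with no filename available fall back to the sniffed XHTML type.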
Michael Catanzaro
Comment 10 2024-09-10 16:40:25 PDT
Another possibility: maybe WebKit just doesn't pass the filename, so shared-mime-info has to guess from magic alone? Unfortunately, I'm not sure where our code that reads the MIME type for non-HTTP content lives.