6290 – XML documents with "html" in dtd name use XHTML entities when they shouldn't

RESOLVED FIXED 6290

XML documents with "html" in dtd name use XHTML entities when they shouldn't

https://bugs.webkit.org/show_bug.cgi?id=6290

Summary XML documents with "html" in dtd name use XHTML entities when they shouldn't

Ian 'Hixie' Hickson

Reported 2005-12-29 16:56:35 PST

Currently any XML file with a DOCTYPE whose "name" part is "html" triggers "HTML" parsing mode. This is wrong. It should be based on the PUBLIC identifier. TESTCASE: http://www.hixie.ch/tests/adhoc/xml/parsing/010.xml This is a regression caused by bug 4301.

Attachments
fix & test (4.20 KB, patch) 2005-12-29 18:36 PST, Eric Seidel (no email)	mjs: review+	Details Formatted Diff Diff
View All Add attachment proposed patch, testcase, etc.

Eric Seidel (no email)

Comment 1 2005-12-29 17:02:01 PST

This is such a trivial edgecase.

Ian 'Hixie' Hickson

Comment 2 2005-12-29 17:11:02 PST

Not that much of an edgecase, XHTML2 will likely trip this, and even if XHTML2 isn't much in use today, it eventually may be. It would be sad for XHTML2 authors if they were confused by old builds of Safari. The bug is that externalSubsetHandler() is checking |name|, but should be checking |externalId|. Here are the IDs to compare against (taken from Mozilla's list, with the MathML DTDs removed): "-//W3C//DTD XHTML 1.0 Transitional//EN" "-//W3C//DTD XHTML 1.1//EN" "-//W3C//DTD XHTML 1.0 Strict//EN" "-//W3C//DTD XHTML 1.0 Frameset//EN" "-//W3C//DTD XHTML Basic 1.0//EN" "-//WAPFORUM//DTD XHTML Mobile 1.0//EN"

Eric Seidel (no email)

Comment 3 2005-12-29 17:36:23 PST

Calling this a regression is pretty strong language. As far as I can tell this test case only "passed" before, because html entities were not supported in xhtml at all in Safari. Perhaps in the expat days this test case passed for valid reasons. Now html entities are supported in anything that looks like an xhtml document. This one happens to look enough like xhtml that we treat it as one. Eventually, we'll need a HashSet, and a list of allowed doc types.

Ian 'Hixie' Hickson

Comment 4 2005-12-29 17:38:49 PST

Why not just a set of six string comparisons in the place of the one comparison you have now? You'll almost always either match the first one or not enter the function, so it shouldn't be a perf hit. It's a regression because before you used to not have this bug, and now you do. That's what a regression is: the introduction of a new bug.

Eric Seidel (no email)

Comment 5 2005-12-29 18:36:07 PST

Created attachment 5364 [details] fix & test

Ian 'Hixie' Hickson

Comment 6 2005-12-29 19:01:28 PST

You have this line: + || (extId == "-//W3C//DTD XHTML Basic 1.0//EN") ...in there twice. (In the Mozilla codebase they have the MathML 2.0 DOCTYPE where you have the second occurance, but you don't want to put that in because that isn't XHTML and has different entities. It's probably sort-of ok to have the XHTML+MathML DOCTYPEs in there, I guess, though. It'll at least work for some pages, anyway.) Other than that, looks great. Thanks. :-)

Maciej Stachowiak

Comment 7 2005-12-29 20:00:49 PST

Comment on attachment 5364 [details] fix & test r=me

Joost de Valk (AlthA)

Comment 8 2006-01-22 04:56:38 PST

Removing keyword(s) since bug is fixed.

Joost de Valk (AlthA)

Comment 9 2006-01-22 05:01:00 PST

Removing keyword(s) since bug is fixed.

Eric Seidel (no email)

Comment 10 2006-01-31 21:20:40 PST

Removing Regression keyword from bugs already fixed.

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P4

Severity Normal

Classification Unclassified

Version 420+

Hardware PC

OS Linux

Product WebKit

Component WebKit Misc.

Assignee

Eric Seidel (no email)

Reported

2005-12-29 16:56 PST

Modified

2006-01-31 21:20 PST History

CC List

0 users

URL

http://www.hixie.ch/tests/adhoc/xml/parsing/010.xml

Keywords

Depends on

Blocks

4301

Dependencies

tree graph