Bug 12751 - DocumentType nodes aren't part of the Document (affects Acid3)
Summary: DocumentType nodes aren't part of the Document (affects Acid3)
Alias: None
Product: WebKit
Classification: Unclassified
Component: DOM (show other bugs)
Version: 420+
Hardware: Mac OS X 10.4
: P2 Normal
Assignee: Dave Hyatt
URL: http://acid3.acidtests.org/
Keywords: HasReduction
: 16636 (view as bug list)
Depends on:
Blocks: Acid3 17080
  Show dependency treegraph
Reported: 2007-02-12 14:51 PST by Anders Carlsson
Modified: 2019-02-06 09:03 PST (History)
6 users (show)

See Also:

test case (456 bytes, text/html)
2008-01-30 12:55 PST, Robert Blaut
no flags Details
Work in progress (34.33 KB, patch)
2008-02-15 15:15 PST, Dave Hyatt
no flags Details | Formatted Diff | Diff
Patch that gets Acid3 up to 84/100 (36.19 KB, patch)
2008-02-15 15:52 PST, Dave Hyatt
no flags Details | Formatted Diff | Diff
More cleanup. (37.52 KB, patch)
2008-02-18 06:37 PST, Dave Hyatt
no flags Details | Formatted Diff | Diff
Finish view source mode. Fix bugs in the inspector code. (45.43 KB, patch)
2008-02-19 15:44 PST, Dave Hyatt
no flags Details | Formatted Diff | Diff
Patch that passes all of Sam's tests (51.11 KB, patch)
2008-02-19 20:37 PST, Dave Hyatt
no flags Details | Formatted Diff | Diff
Work in progress (51.31 KB, patch)
2008-02-19 22:04 PST, Dave Hyatt
no flags Details | Formatted Diff | Diff
Ready for review. (53.95 KB, patch)
2008-02-20 01:21 PST, Dave Hyatt
no flags Details | Formatted Diff | Diff
doctype tests (22.77 KB, patch)
2008-02-20 13:16 PST, Sam Weinig
no flags Details | Formatted Diff | Diff
Final patch (260.54 KB, patch)
2008-02-20 14:25 PST, Dave Hyatt
sam: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Anders Carlsson 2007-02-12 14:51:14 PST
Firefox treats the document type node as the first child of the document. We don't add document type nodes as children at all.
Comment 1 Alexey Proskuryakov 2007-12-28 07:00:40 PST
*** Bug 16636 has been marked as a duplicate of this bug. ***
Comment 2 Darin Adler 2008-01-01 11:35:54 PST
Committed revision 29067.
Comment 3 Darin Adler 2008-01-01 12:01:20 PST
Marked resolved by accident.
Comment 4 Darin Adler 2008-01-01 12:04:36 PST
Oh, Eric just pointed out that it's not just XML!
Comment 5 Darin Adler 2008-01-01 12:05:22 PST
I guess it probably won't hurt compatibility too much if it makes us match Gecko. I wonder what Trident/IE does with a doctype node.
Comment 6 Alexey Proskuryakov 2008-01-06 02:05:17 PST
See also: bug 15290, bug 15411.
Comment 7 Robert Blaut 2008-01-30 12:55:27 PST
Created attachment 18797 [details]
test case
Comment 8 Eric Seidel (no email) 2008-02-08 15:07:57 PST
Sadly this causes 2 Acid3 failures.
Comment 9 Dave Hyatt 2008-02-15 15:15:35 PST
Created attachment 19146 [details]
Work in progress

Just saving my work.  Not ready for review yet.
Comment 10 Dave Hyatt 2008-02-15 15:52:31 PST
Created attachment 19147 [details]
Patch that gets Acid3 up to 84/100

I still have to make view source mode work with doctypes so that I don't break the Web Inspector.
Comment 11 Darin Adler 2008-02-16 07:59:19 PST
Comment on attachment 19147 [details]
Patch that gets Acid3 up to 84/100

I like it!

 405     virtual void determineParseMode() {};

You made this protected, but it should be private. You only need it protected if derived classes need to *call* it. You can always override a virtual regardless of the visibility. And the derived class only needs to override, not call. It should be private in HTMLDocument too.

Also, no semicolon after the braces.

Do we need to do something to prohibit a second DOCTYPE node in XML, or does the parser already take care of that? Maybe this doesn't even handle he XML case yet?

 858         bool isWhitespace = c == '\r' || c == '\n' || c == '\t' || c == ' ';

Don't we already have a function for exactly that set of whitespace? All our different whitespace checks drive me crazy.

 868                     // Malformed.  Just exit.

Extra space here in this comment after the period.

 902                     if (Unicode::toLower(c) == publicStart[m_doctypeSearchCount]) {

 910                     } else if (Unicode::toLower(c) == systemStart[m_doctypeSearchCount]) {

 1072                 if (Unicode::toLower(*src) == doctypeStart[m_doctypeSearchCount]) {

Unicode::toLower is overkill here. You want the much faster toASCIILower that works only for ASCII characters, because you know the that characters you are looking for are ASCII.

I think we need test cases for every single branch in the parsing logic. I honestly can't read the parsing code well enough to be sure it's right because of so many if statements and branches. Only tests will convince me and I assume that' true for you too.

 102     DoctypeToken()
 103     {}

I'd prefer this either all on one line, or with the braces on separate lines.

 105     void reset() {

Please put the opening brace here on a separate line since this is a function.

 116     String name() const { return String(m_name.data(), m_name.size()); }
 117     String publicID() const { return String(m_publicID.data(), m_publicID.size()); }
 118     String systemID() const { return String(m_systemID.data(), m_systemID.size()); }

Since we only use these when we're *done* with the doctype, we should use String::adopt, which is more efficient, but leaves the vector empty. I think that we could just leave these out entirely and write this:

    document->setDocType(new DocumentType(document, String::adopt(t->m_name), String::adopt(t->m_publicID), String::adopt(t->m_systemID)));

That's all my comments.
Comment 12 Dave Hyatt 2008-02-18 06:37:49 PST
Created attachment 19186 [details]
More cleanup.
Comment 13 Eric Seidel (no email) 2008-02-18 11:13:52 PST
Comment on attachment 19186 [details]
More cleanup.

I know you didn't post this for review, yet, but here are my thoughts anyway.  I read through the DocType node manipulation bits, not through the doctype parsing bits yet.

1.  Obviously this needs test cases (I'm sure you have some).
2.  I'm not sure what other engines do when you add a second doctype node to a document (via js), then remove the first.  Does the second then become the doctype?  In our code, when you add a 3rd after removing the first it would suddenly become the doctype.  What if you have 3 in the doc when you remove the first? which becomes the doctype (if other parsers do that).
Comment 14 Ian 'Hixie' Hickson 2008-02-18 12:28:12 PST
"if the node to append is one of this node's ancestors or this node itself, or if this node is of type
Document and the DOM application attempts to append a second DocumentType or Element node", then you should raise a "HIERARCHY_REQUEST_ERR" exception.
See http://www.w3.org/TR/DOM-Level-3-Core/core.html
Comment 15 Adam Roben (:aroben) 2008-02-19 00:14:13 PST
(In reply to comment #13)
> 1.  Obviously this needs test cases (I'm sure you have some).

We can probably borrow some testcases from html5lib. Anne pointed me at the following files, all of which contain some doctype tokenizer tests:

Comment 16 Dave Hyatt 2008-02-19 15:44:52 PST
Created attachment 19216 [details]
Finish view source mode. Fix bugs in the inspector code.
Comment 17 Dave Hyatt 2008-02-19 18:24:05 PST
Comment on attachment 19216 [details]
Finish view source mode. Fix bugs in the inspector code.

Sam Weinig has written a test harness that tests a bunch of doctypes and makes sure the mode that is picked is correct.  

I will land his harness in the tree once it is ready (along with this patch).

Ready for comments again.
Comment 18 Dave Hyatt 2008-02-19 20:37:32 PST
Created attachment 19218 [details]
Patch that passes all of Sam's tests

This patch passes every single doctype rule contained in the HTML5 specification (Sam's tests cover all of them).
Comment 19 Dave Hyatt 2008-02-19 22:04:40 PST
Created attachment 19219 [details]
Work in progress

There are still many layout tests to fix.  I am having to change layout tests that have bugs in them (e.g., svg/dynamic-updates has a broken test harness that does document.firstChild instead of document.documentElement).
Comment 20 Dave Hyatt 2008-02-20 01:21:06 PST
Created attachment 19221 [details]
Ready for review.

This patch has been through the layout tests.  Many tests are changing results:

(1) DOM tests that failed are now passing and/or changing results slightly.
(2) about:blank is now properly in quirks mode instead of strict mode (affects every single test that uses about:blank)
(3) Positional dumping tests now change when a doctype is placed before the root element
(4) XSL transform results now have the correct mode

One major change I had to make is that the construction of the style selector is now delayed until a Document is attached.  Document construction time was too early, since the compat mode had not yet been initialized (the HTMLDocument is not even constructed yet in the base class constructor).
Comment 21 Adam Roben (:aroben) 2008-02-20 09:54:55 PST
Comment on attachment 19221 [details]
Ready for review.

+    String text = "<";
+    text += String::adopt(doctypeToken->m_source);
+    text += ">";
+    addText(text, "webkit-html-doctype");

This could be quite a bit more efficient like so:

Vector<UChar> buffer(doctypeToken->m_source.size() + 2);
buffer[0] = "<";
buffer.insert(1, doctypeToken->m_source);
buffer[buffer.size() - 1] = ">";
addText(String::adopt(buffer), "webkit-html-doctype");
Comment 22 Sam Weinig 2008-02-20 13:16:40 PST
Created attachment 19235 [details]
doctype tests
Comment 23 mitz 2008-02-20 13:29:05 PST
Comment on attachment 19235 [details]
doctype tests

+FAIL: the Doctype was Standards.  It was expected to be Quirks.  Doctype: <!-- comment --><!DOCTYPE html>

Is this expected?
Comment 24 Dave Hyatt 2008-02-20 13:37:35 PST
Comment on attachment 19235 [details]
doctype tests

Yeah that is a bug.  I will fix the test.
Comment 25 Dave Hyatt 2008-02-20 14:25:34 PST
Created attachment 19241 [details]
Final patch
Comment 26 Sam Weinig 2008-02-20 14:38:49 PST
Comment on attachment 19241 [details]
Final patch

Comment 27 Dave Hyatt 2008-02-20 14:49:35 PST
Fixed in r30431.

Comment 28 Lucas Forschler 2019-02-06 09:03:22 PST
Mass moving XML DOM bugs to the "DOM" Component.