Bug 15290 - REGRESSION (r14184-r14216): Duplicate DOCTYPE definitions when saving files as webarchives
Summary: REGRESSION (r14184-r14216): Duplicate DOCTYPE definitions when saving files a...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKit Misc. (show other bugs)
Version: 523.x (Safari 3)
Hardware: Macintosh OS X 10.4
: P1 Normal
Assignee: Nobody
URL:
Keywords: HasReduction, InRadar, Regression
Depends on:
Blocks:
 
Reported: 2007-09-26 14:27 PDT by Eric Seidel (no email)
Modified: 2008-07-05 10:58 PDT (History)
2 users (show)

See Also:


Attachments
an example busted archive (30.79 KB, application/x-webarchive)
2007-09-26 14:27 PDT, Eric Seidel (no email)
no flags Details
minimal SVG file to reproduce the problem (156 bytes, image/svg+xml)
2007-09-26 14:34 PDT, Eric Seidel (no email)
no flags Details
minimal example busted SVG archive (497 bytes, application/x-webarchive)
2007-09-26 14:34 PDT, Eric Seidel (no email)
no flags Details
minimal example html file (105 bytes, text/html)
2007-09-26 14:41 PDT, Eric Seidel (no email)
no flags Details
minimal example html archive (still loads, just is clearly serialized incorrectly) (457 bytes, application/x-webarchive)
2007-09-26 14:42 PDT, Eric Seidel (no email)
no flags Details
Patch v1 (5.43 KB, patch)
2008-07-03 09:36 PDT, David Kilzer (:ddkilzer)
no flags Details | Formatted Diff | Diff
Patch v2 (6.41 KB, patch)
2008-07-04 07:33 PDT, David Kilzer (:ddkilzer)
mitz: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Eric Seidel (no email) 2007-09-26 14:27:17 PDT
Duplicate DOCTYPE definitions when saving SVGs as WebArchives

Save an SVG as a WebArchive and you end up with a duplicate DOCTYPE definition.  This then breaks parsing when trying to open the archive.
Comment 1 Eric Seidel (no email) 2007-09-26 14:27:45 PDT
Created attachment 16405 [details]
an example busted archive
Comment 2 Eric Seidel (no email) 2007-09-26 14:34:12 PDT
Created attachment 16406 [details]
minimal SVG file to reproduce the problem
Comment 3 Eric Seidel (no email) 2007-09-26 14:34:45 PDT
Created attachment 16407 [details]
minimal example busted SVG archive
Comment 4 Eric Seidel (no email) 2007-09-26 14:36:56 PDT
I believe this is caused by this line in markup.cpp:

        case Node::DOCUMENT_NODE: {
            // Documents do not normally contain a docType as a child node, force it to print here instead.
            const DocumentType* docType = static_cast<const Document*>(node)->doctype();
            if (docType)
                return docType->toString().deprecatedString();
            return "";
        }

Comment 5 Eric Seidel (no email) 2007-09-26 14:39:50 PDT
This actually applies to HTML as well, it just doesn't break the document, since the HTML parser is more forgiving.  This may be a regression from 2.0 (likely, since we have made major improvements to XML serialization since then).
Comment 6 Eric Seidel (no email) 2007-09-26 14:41:14 PDT
Created attachment 16408 [details]
minimal example html file
Comment 7 Eric Seidel (no email) 2007-09-26 14:42:09 PDT
Created attachment 16409 [details]
minimal example html archive (still loads, just is clearly serialized incorrectly)
Comment 8 Eric Seidel (no email) 2007-09-26 14:49:15 PDT
So I tested with Safari 2.0 and this is actually a regression.  However it only affects XHTML (since SVG didn't work in Safari 2) so I'm not sure anyone cares.  I've added the keyword (since it is technically a regression), but someone else can certainly remove it given the only-affecting-xhtml circumstances.
Comment 9 Eric Seidel (no email) 2007-09-26 16:00:15 PDT
Actually, the bug seems to be from the use of:
_stringWithDocumentTypeStringAndMarkupString

which also pre-pends the DOCTYPE to the beginning of the file as well as createMarkup (when printing the DocumentNode).
Comment 10 David Kilzer (:ddkilzer) 2007-09-27 09:57:45 PDT
Unfortunately, it appears that the method that DumpRenderTree uses to create webarchives is not the same method that Safari uses.

(In reply to comment #8)
> So I tested with Safari 2.0 and this is actually a regression.  However it only
> affects XHTML (since SVG didn't work in Safari 2) so I'm not sure anyone cares.

Actually, it affects any HTML or XML document with a pre-existing DOCTYPE.

Comment 11 David Kilzer (:ddkilzer) 2007-09-27 10:11:11 PDT
An autospade run says:

Works: r14184  Fails: r14216

Comment 12 David Kilzer (:ddkilzer) 2007-09-27 10:15:43 PDT
* STEPS TO REPRODUCE
1. Launch Safari/WebKit.
2. Open Attachment 16408 [details].
3. Save as webarchive.
4. Open saved webarchive.
5. View source of page.

* EXPECTED RESULTS
There should be only one "<!DOCTYPE>" tag.

* ACTUAL RESULTS
There are two "<!DOCTYPE>" tags at the top of the file.

Comment 13 David Kilzer (:ddkilzer) 2007-09-27 10:16:13 PDT
<rdar://problem/5509173>
Comment 14 David Kilzer (:ddkilzer) 2007-09-27 10:23:30 PDT
(In reply to comment #11)
> An autospade run says:
> Works: r14184  Fails: r14216

Revision r14204 looks most suspicious.

http://trac.webkit.org/projects/webkit/changeset/14204

Comment 15 Eric Seidel (no email) 2007-09-27 11:02:50 PDT
I don't think it was 14204.  I had a patch locally on my system to remove "realDocType()" as it's the same as the default implementation of docType() (which is no longer overridden by anyone and thus realDocType() doesn't need to exist)
Comment 16 David Kilzer (:ddkilzer) 2008-07-03 09:36:31 PDT
Created attachment 22066 [details]
Patch v1

Proposed fix.  Tested both the SVG and HTML examples.  Layout tests pass as well.
Comment 17 mitz 2008-07-03 09:38:28 PDT
Comment on attachment 22066 [details]
Patch v1

r=me
Comment 18 David Kilzer (:ddkilzer) 2008-07-03 10:35:45 PDT
Comment on attachment 22066 [details]
Patch v1

Clearing review flag for now.  I think I may need to check to see that the "first" node is a DOCTYPE node since this method may be called for any arbitrary node in the DOM.
Comment 19 David Kilzer (:ddkilzer) 2008-07-04 07:33:28 PDT
Created attachment 22087 [details]
Patch v2

Added checks for the node being a DOCUMENT_NODE or DOCUMENT_TYPE_NODE and only prepending the document type if the node type is NOT one of those.
Comment 20 mitz 2008-07-04 16:02:33 PDT
Comment on attachment 22087 [details]
Patch v2

r=me
Comment 21 David Kilzer (:ddkilzer) 2008-07-04 21:52:00 PDT
Committed r35006
Comment 22 David Kilzer (:ddkilzer) 2008-07-05 10:58:00 PDT
Fixed ChangeLog entry in r35014.