Bug 80170

Summary:

Contents of noscript elements turned into strings in WebArchives

Product:

WebKit

Reporter:

Jessie Berlin <jberlin>

Component:

Page Loading

Assignee:

Brady Eidson <beidson>

Status:

RESOLVED FIXED

Severity:

Normal

CC:

adachan, beidson, japhet, jberlin, sfalken, webkit-bug-importer, webkit.review.bot

Priority:

Keywords:

InRadar

Version:

528+ (Nightly build)

Hardware:

Unspecified

OS:

Unspecified

Attachments:

Description	Flags
Reduced case with a noscript element	none
Patch v1	aestes: review+

Jessie Berlin

Reported 2012-03-02 10:27:25 PST

Created attachment 129920 [details] Reduced case with a noscript element When a page with the following markup: <!DOCTYPE html> <html> <head> <title>Page With No Script Element To Be WebArchived</title> </head> <body> <div>If JS is turned off, you should see red text below</div> <div> <noscript> <div height="200" width="200" style="color: red">Some red text</div> </noscript> </div> </body> </html> is saved as a Web Archive, the contents of the noscript element inside the Web Archive are put into a string: <!DOCTYPE html> <html><head> <title>Page With No Script Element</title> </head> <body> <div>If JS is turned off, you should see red text below</div> <div> <noscript> &lt;div height="200" width="200" style="color: red"&gt;Some red text&lt;/div&gt; </noscript> </div> </body></html> I first noticed this on r109561, but the comment from https://bugs.webkit.org/show_bug.cgi?id=13522 indicates that this was happening as far back as August 2010.

Attachments
Reduced case with a noscript element (333 bytes, text/html) 2012-03-02 10:27 PST, Jessie Berlin	no flags	Details
Patch v1 (12.32 KB, patch) 2012-05-10 12:41 PDT, Brady Eidson	aestes: review+	Details Formatted Diff Diff
View All Add attachment proposed patch, testcase, etc.

Jessie Berlin

Comment 1 2012-03-02 10:29:54 PST

<rdar://problem/10972534>

Brady Eidson

Comment 2 2012-03-02 10:30:04 PST

When we create a WebArchive, we serialize the DOM. Since the contents of the <noscript> tag is never parsed into the DOM tree when scripting is enabled, they remain the innertext of the <noscript> tag. I agree this is weird, but is there a real world symptom?

Radar WebKit Bug Importer

Comment 3 2012-03-02 10:33:38 PST

<rdar://problem/10972577>

Jessie Berlin

Comment 4 2012-03-02 10:40:23 PST

(In reply to comment #2) > When we create a WebArchive, we serialize the DOM. > > Since the contents of the <noscript> tag is never parsed into the DOM tree when scripting is enabled, they remain the innertext of the <noscript> tag. > > I agree this is weird, but is there a real world symptom? The real word symptom is odd strings appearing in webarchives that were created with JavaScript enabled, but the user has decided to look at with JavaScript disabled.

Brady Eidson

Comment 5 2012-05-10 12:21:52 PDT

There's a much deeper question about how innerHTML of <noscript> is expected to work in both a scripting and non-scripting environment that I believe Alexey is planning to pursue with the spec. But for webarchives, we can solve this by filtering out the <noscript> elements completely if scripting is enabled. Patch coming soon.

Brady Eidson

Comment 6 2012-05-10 12:41:31 PDT

Created attachment 141229 [details] Patch v1

Andy Estes

Comment 7 2012-05-10 13:29:48 PDT

Comment on attachment 141229 [details] Patch v1 View in context: https://bugs.webkit.org/attachment.cgi?id=141229&action=review > Source/WebCore/editing/MarkupAccumulator.h:73 > + String serializeNodes(Node* targetNode, Node* nodeToSkip, EChildrenOnly, Vector<QualifiedName>* tagNamesToSkip); You should make this new argument optional. > Source/WebCore/page/PageSerializer.cpp:217 > + String text = accumulator.serializeNodes(document->documentElement(), 0, IncludeNode, 0); You wouldn't have to change this line if the new argument to serializeNodes() were optional.

Brady Eidson

Comment 8 2012-05-10 14:08:45 PDT

http://trac.webkit.org/changeset/116685

Note You need to log in before you can comment on or make changes to this bug.