Bug 80170

Summary: Contents of noscript elements turned into strings in WebArchives
Product: WebKit Reporter: Jessie Berlin <jberlin>
Component: Page LoadingAssignee: Brady Eidson <beidson>
Status: RESOLVED FIXED    
Severity: Normal CC: adachan, beidson, japhet, jberlin, sfalken, webkit-bug-importer, webkit.review.bot
Priority: P2 Keywords: InRadar
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Reduced case with a noscript element
none
Patch v1 aestes: review+

Jessie Berlin
Reported 2012-03-02 10:27:25 PST
Created attachment 129920 [details] Reduced case with a noscript element When a page with the following markup: <!DOCTYPE html> <html> <head> <title>Page With No Script Element To Be WebArchived</title> </head> <body> <div>If JS is turned off, you should see red text below</div> <div> <noscript> <div height="200" width="200" style="color: red">Some red text</div> </noscript> </div> </body> </html> is saved as a Web Archive, the contents of the noscript element inside the Web Archive are put into a string: <!DOCTYPE html> <html><head> <title>Page With No Script Element</title> </head> <body> <div>If JS is turned off, you should see red text below</div> <div> <noscript> &amp;lt;div height="200" width="200" style="color: red"&amp;gt;Some red text&amp;lt;/div&amp;gt; </noscript> </div> </body></html> I first noticed this on r109561, but the comment from https://bugs.webkit.org/show_bug.cgi?id=13522 indicates that this was happening as far back as August 2010.
Attachments
Reduced case with a noscript element (333 bytes, text/html)
2012-03-02 10:27 PST, Jessie Berlin
no flags
Patch v1 (12.32 KB, patch)
2012-05-10 12:41 PDT, Brady Eidson
aestes: review+
Jessie Berlin
Comment 1 2012-03-02 10:29:54 PST
Brady Eidson
Comment 2 2012-03-02 10:30:04 PST
When we create a WebArchive, we serialize the DOM. Since the contents of the <noscript> tag is never parsed into the DOM tree when scripting is enabled, they remain the innertext of the <noscript> tag. I agree this is weird, but is there a real world symptom?
Radar WebKit Bug Importer
Comment 3 2012-03-02 10:33:38 PST
Jessie Berlin
Comment 4 2012-03-02 10:40:23 PST
(In reply to comment #2) > When we create a WebArchive, we serialize the DOM. > > Since the contents of the <noscript> tag is never parsed into the DOM tree when scripting is enabled, they remain the innertext of the <noscript> tag. > > I agree this is weird, but is there a real world symptom? The real word symptom is odd strings appearing in webarchives that were created with JavaScript enabled, but the user has decided to look at with JavaScript disabled.
Brady Eidson
Comment 5 2012-05-10 12:21:52 PDT
There's a much deeper question about how innerHTML of <noscript> is expected to work in both a scripting and non-scripting environment that I believe Alexey is planning to pursue with the spec. But for webarchives, we can solve this by filtering out the <noscript> elements completely if scripting is enabled. Patch coming soon.
Brady Eidson
Comment 6 2012-05-10 12:41:31 PDT
Created attachment 141229 [details] Patch v1
Andy Estes
Comment 7 2012-05-10 13:29:48 PDT
Comment on attachment 141229 [details] Patch v1 View in context: https://bugs.webkit.org/attachment.cgi?id=141229&action=review > Source/WebCore/editing/MarkupAccumulator.h:73 > + String serializeNodes(Node* targetNode, Node* nodeToSkip, EChildrenOnly, Vector<QualifiedName>* tagNamesToSkip); You should make this new argument optional. > Source/WebCore/page/PageSerializer.cpp:217 > + String text = accumulator.serializeNodes(document->documentElement(), 0, IncludeNode, 0); You wouldn't have to change this line if the new argument to serializeNodes() were optional.
Brady Eidson
Comment 8 2012-05-10 14:08:45 PDT
Note You need to log in before you can comment on or make changes to this bug.