Created attachment 129920 [details] Reduced case with a noscript element When a page with the following markup: <!DOCTYPE html> <html> <head> <title>Page With No Script Element To Be WebArchived</title> </head> <body> <div>If JS is turned off, you should see red text below</div> <div> <noscript> <div height="200" width="200" style="color: red">Some red text</div> </noscript> </div> </body> </html> is saved as a Web Archive, the contents of the noscript element inside the Web Archive are put into a string: <!DOCTYPE html> <html><head> <title>Page With No Script Element</title> </head> <body> <div>If JS is turned off, you should see red text below</div> <div> <noscript> &lt;div height="200" width="200" style="color: red"&gt;Some red text&lt;/div&gt; </noscript> </div> </body></html> I first noticed this on r109561, but the comment from https://bugs.webkit.org/show_bug.cgi?id=13522 indicates that this was happening as far back as August 2010.
<rdar://problem/10972534>
When we create a WebArchive, we serialize the DOM. Since the contents of the <noscript> tag is never parsed into the DOM tree when scripting is enabled, they remain the innertext of the <noscript> tag. I agree this is weird, but is there a real world symptom?
<rdar://problem/10972577>
(In reply to comment #2) > When we create a WebArchive, we serialize the DOM. > > Since the contents of the <noscript> tag is never parsed into the DOM tree when scripting is enabled, they remain the innertext of the <noscript> tag. > > I agree this is weird, but is there a real world symptom? The real word symptom is odd strings appearing in webarchives that were created with JavaScript enabled, but the user has decided to look at with JavaScript disabled.
There's a much deeper question about how innerHTML of <noscript> is expected to work in both a scripting and non-scripting environment that I believe Alexey is planning to pursue with the spec. But for webarchives, we can solve this by filtering out the <noscript> elements completely if scripting is enabled. Patch coming soon.
Created attachment 141229 [details] Patch v1
Comment on attachment 141229 [details] Patch v1 View in context: https://bugs.webkit.org/attachment.cgi?id=141229&action=review > Source/WebCore/editing/MarkupAccumulator.h:73 > + String serializeNodes(Node* targetNode, Node* nodeToSkip, EChildrenOnly, Vector<QualifiedName>* tagNamesToSkip); You should make this new argument optional. > Source/WebCore/page/PageSerializer.cpp:217 > + String text = accumulator.serializeNodes(document->documentElement(), 0, IncludeNode, 0); You wouldn't have to change this line if the new argument to serializeNodes() were optional.
http://trac.webkit.org/changeset/116685