Bug 45338 - innerHTML escapes <, >, &, and nbsp inside noembed, noframes, and plaintext
Summary: innerHTML escapes <, >, &, and nbsp inside noembed, noframes, and plaintext
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: DOM
Version: 528+ (Nightly build)
Hardware: All All
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords: InRadar
Depends on:
Blocks: 45330
Reported: 2010-09-07 16:59 PDT by Ryosuke Niwa
Modified: 2023-03-29 12:48 PDT

Attachments
demo (981 bytes, text/html)
2010-09-07 17:19 PDT, Ryosuke Niwa
noframes example (196 bytes, text/html)
2010-09-07 17:38 PDT, Alexey Proskuryakov
static html demo (900 bytes, text/html)
2010-09-07 17:42 PDT, Ryosuke Niwa
static html demo with UTF-8 nbsp (1.19 KB, text/html)
2010-09-07 18:42 PDT, Ryosuke Niwa

Description Ryosuke Niwa 2010-09-07 16:59:16 PDT
http://www.whatwg.org/specs/web-apps/current-work/multipage/the-end.html#serializing-html-fragments
specifies that we should append the value of the data IDL attribute literally, without any escaping, for a text node under a style, script, xmp, iframe, noembed, noframes, or plaintext element.

However, WebKit currently escapes <, >, &, and non-breaking space inside noembed, noframes, and plaintext elements.
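
To make the difference concrete, here is a minimal Python sketch of the text-node rule described above. It is an illustration only, not WebKit code: the names RAW_TEXT_PARENTS, escape_text, and serialize_text are hypothetical, and the element list is the one quoted in this description.

```python
# Illustrative sketch only; these names are hypothetical, not WebKit code.
# Elements whose text children are appended literally (list as quoted above).
RAW_TEXT_PARENTS = {
    "style", "script", "xmp", "iframe", "noembed", "noframes", "plaintext",
}


def escape_text(data: str) -> str:
    """Escape a text node's data the way ordinary text content is serialized."""
    # '&' is replaced first so the entities added below are not escaped again.
    return (
        data.replace("&", "&amp;")
        .replace("\u00a0", "&nbsp;")
        .replace("<", "&lt;")
        .replace(">", "&gt;")
    )


def serialize_text(parent_tag: str, data: str) -> str:
    """Serialize one text node, given the tag name of its parent element."""
    if parent_tag.lower() in RAW_TEXT_PARENTS:
        return data  # appended literally, no escaping
    return escape_text(data)


sample = "a < b && c\u00a0> d"
literal = serialize_text("noframes", sample)  # what the spec text asks for
escaped = escape_text(sample)                 # the behavior this bug reports
```

With this sketch, `literal` is identical to `sample`, while `escaped` is `a &lt; b &amp;&amp; c&nbsp;&gt; d`, which is the kind of output reported for noembed, noframes, and plaintext.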
Comment 1 Darin Adler 2010-09-07 17:08:26 PDT
Let's make some test cases and double-check that the other browsers do it that way.
Comment 2 Alexey Proskuryakov 2010-09-07 17:18:04 PDT
I'm not sure about noembed - Firefox 3.6.8 escapes <, >, &, and non-breaking space in it.
Comment 3 Ryosuke Niwa 2010-09-07 17:19:14 PDT
Created attachment 66794 [details]
demo
Comment 4 Alexey Proskuryakov 2010-09-07 17:38:10 PDT
Created attachment 66805 [details]
noframes example

Interestingly, the results of your test don't match what I saw. Firefox serializes noframes differently depending on what element innerHTML is called on - there is no such difference for noembed!
Comment 5 Ryosuke Niwa 2010-09-07 17:42:29 PDT
Created attachment 66806 [details]
static html demo
Comment 6 Ryosuke Niwa 2010-09-07 17:45:36 PDT
(In reply to comment #1)
> Let's make some test cases and double-check that the other browsers do it that way.

I thought I attached the test but apparently not.  Added a test + static html demo for MSIE.

(In reply to comment #2)
> I'm not sure about noembed - Firefox 3.6.8 escapes <, >, &, and non-breaking space in it.

Right.  Firefox escapes text inside noembed and noframes.

(In reply to comment #4)
> Interestingly, the results of your test don't match what I saw. Firefox serializes noframes differently depending on what element innerHTML is called on - there is no such difference for noembed!

The situation is even worse.  Static HTML and dynamic HTML (appending the node manually) give different results.  It seems that both WebKit and MSIE drop the contents of noembed and noframes while parsing the document.
Comment 7 Ryosuke Niwa 2010-09-07 18:28:04 PDT
It seems like Mac's lexer isn't reading 0xA0 properly.  I'm getting 0xFFFD instead and I can't figure out a way to read 0xA0.
Comment 8 Ryosuke Niwa 2010-09-07 18:31:08 PDT
(In reply to comment #7)
> It seems like Mac's lexer isn't reading 0xA0 properly.  I'm getting 0xFFFD instead and I can't figure out a way to read 0xA0.

Ugh... this wasn't an issue specific to Mac platforms.  With TOT, we fail to read nbsp on every platform.
Comment 9 Ryosuke Niwa 2010-09-07 18:42:00 PDT
Created attachment 66824 [details]
static html demo with UTF-8 nbsp

The problem was that the default encoding is set to UTF-8, in which nbsp is encoded as 0xC2 0xA0, but the nbsp in the document was the single byte 0xA0 (ISO/IEC 8859).
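
The mismatch can be reproduced outside the browser. This is a small Python sketch, assuming ISO 8859-1 (Python's "latin-1" codec) as the document's actual encoding; the variable names are illustrative only.

```python
# Sketch of the encoding mismatch; "latin-1" (ISO 8859-1) is an assumption
# about which ISO/IEC 8859 part the original test document used.
NBSP = "\u00a0"

utf8_bytes = NBSP.encode("utf-8")       # two bytes: 0xC2 0xA0
latin1_bytes = NBSP.encode("latin-1")   # one byte: 0xA0

# Decoding the lone 0xA0 byte as UTF-8 fails; with replacement enabled the
# decoder substitutes U+FFFD, matching what comment 7 observed.
misread = latin1_bytes.decode("utf-8", errors="replace")

# Decoding with the encoding the bytes were written in recovers the nbsp.
correct = latin1_bytes.decode("latin-1")
```

Here `misread` is U+FFFD (the replacement character) and `correct` is U+00A0, so the test document has to either declare its encoding or store nbsp as 0xC2 0xA0.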
Comment 10 Ryosuke Niwa 2010-09-07 18:47:35 PDT
Here's a quick summary:

Firefox 3.6.8 escapes text nodes under noembed and noframes the same way we do.
Internet Explorer 8 returns an empty string for noembed and noframes.

Neither Firefox nor Internet Explorer escapes text nodes under plaintext.
Comment 11 Ahmad Saleem 2022-06-01 03:07:38 PDT
I am still able to reproduce this issue in Safari 15.5 on macOS 12.4 using the "demo" test case. The other browsers, Chrome Canary 104 and Firefox 103, behave similarly. Thanks!
Comment 12 Ahmad Saleem 2022-10-06 04:34:55 PDT
Is this bug the reason that we fail the following WPT test?

https://wpt.fyi/results/html/syntax/parsing-html-fragments/tokenizer-modes-001.html?label=experimental&label=master&aligned
Comment 13 Ahmad Saleem 2022-10-06 04:39:49 PDT
(In reply to Ahmad Saleem from comment #12)
> Is this bug the reason that we fail the following WPT test?
> 
> https://wpt.fyi/results/html/syntax/parsing-html-fragments/tokenizer-modes-001.html?label=experimental&label=master&aligned

Something related to this?

https://github.com/WebKit/WebKit/blob/a43d4b3fb6b0e3fe6eebd85112c25653949bfd08/Source/WebCore/html/parser/HTMLTokenizer.cpp#L1398
Comment 14 EWS 2023-03-29 12:47:31 PDT
Committed 262285@main (a641fc693f57): <https://commits.webkit.org/262285@main>

Reviewed commits have been landed. Closing PR #12108 and removing active labels.
Comment 15 Radar WebKit Bug Importer 2023-03-29 12:48:37 PDT
<rdar://problem/107381507>