Bug 16731

Summary: Incorrect node type for whitespace when setting innerHTML in an XHTML document
Product: WebKit Reporter: Max Barel <max>
Component: DOMAssignee: Alexey Proskuryakov <ap>
Status: RESOLVED FIXED    
Severity: Normal CC: ap, cdumez
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Macintosh   
OS: OS X 10.5   
Attachments:
Description Flags
base xhtml file
none
injected xhtml chunk, NS
none
injected xhtml chunk, no NS
none
javascript injection
none
reduced test case
none
proposed fix darin: review+

Description Max Barel 2008-01-04 08:37:20 PST
I'm reporting this as a bug on suggestion by Darin Adler on the dev mailing list. Here is the related thread where it was first mentioned:
http://lists.webkit.org/pipermail/webkit-dev/2008-January/003082.html

Summary:
When injecting HTML elements into the DOM of a XHTML 1.0 strict document served as application/xhtml+xml, the web inspector shows unexpected #cdata-section between injected elements, IF the injection is made by altering the innerHTML attribute. While reducing a test case to post here, it seems that theses sections are inserted for linefeed and/or tab sequences. They should be #text.

Injecting into xml doc using innerHTML is not the right way, so I wouldn't report it if not suggested to.

Worth noting also, when using XMLHttpRequest to get an xhtml chunk, served as application/xhtml+xml and injected from responseXML using DOM methods, injected elements are not valid HTML elements if there is no explicit namespace through xmlns attribute on the container element.

Test case:
XHTML base file (served as application/xhtml+xml):
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
	"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
	<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
	<script type="text/javascript" charset="UTF-8" src="dom_injection.js"></script>
	<title>xhtml DOM injection</title>
</head>

<body onload="inject();">
	<p id='p_static'>This pararagraph is static in xhtml file.
		<a href="#p_static">Link to the anchor of this paragraph.</a>
	</p>
	<div id="target"></div>
</body>
</html>
injected xhtml (no namespace):
	
<p id="p_inj">This paragraph is ajax injected with no NS.
	<a href="#p_inj">Link to the anchor of this paragraph.</a>

</p>

injected xhtml (namespace):
	
<p xmlns="http://www.w3.org/1999/xhtml" id="p_inj_NS">This paragraph is ajax injected with NS.

	<a href="#p_inj_NS">Link to the anchor of this paragraph.</a>

</p>

These two injected files are served either as application/xhtml+xml or text/html depending of the file extension used to get them.

The javacript test file:

function inject() {
	var target = document.getElementById('target');
	//no #cdata, valid elements, innerHTML injection
	// target.innerHTML += '<p id="p_inj_1">This paragraph is injected trough innerHTML with no NS. <a href="#p_inj_1">Link to the anchor of this paragraph.</a></p>';
	//#cdata, valid elements, innerHTML injection
	// target.innerHTML += '\n<p id="p_inj_1">This paragraph is injected trough innerHTML with no NS. <a href="#p_inj_1">Link to the anchor of this paragraph.</a></p>';
	//no #cdata, valid elements, innerHTML injection
	// target.innerHTML += '<p xmlns="http://www.w3.org/1999/xhtml" id="p_inj_NS_1">This paragraph is injected trough innerHTML with NS.\n<a href="#p_inj_NS_1">Link to the anchor of this paragraph.</a></p>';
	//#cdata, valid elements, innerHTML injection
	// target.innerHTML += '\n<p xmlns="http://www.w3.org/1999/xhtml" id="p_inj_NS_1">This paragraph is injected trough innerHTML with NS.\n<a href="#p_inj_NS_1">Link to the anchor of this paragraph.</a></p>';

	var client = new XMLHttpRequest();
	client.onreadystatechange = function () {
		if(this.readyState == 4 && this.status == 200) {
			if (this.responseXML && this.responseXML.documentElement) {
				window.status = 'importNode'
				target.appendChild(document.importNode(this.responseXML.documentElement, true));
			}
			else {
				window.status = 'innerHTML'
				target.innerHTML += this.responseText;
			}
		}
	};
	
	client.open("GET", "dom_injection_NS.xhtml");	//application/xhtml+xml, no #cdata, valid elements, DOM methods injection
	// client.open("GET", "dom_injection.xhtml");		//application/xhtml+xml, no #cdata, invalid elements, DOM methods injection
	// client.open("GET", "dom_injection_NS.html");		//text/html, #cdata, valid elements, innerHTML injection
	// client.open("GET", "dom_injection.html");		//text/html, #cdata, valid elements, innerHTML injection
	client.send();
	
}

Depending of the uncommented line the js file, different case are observed, see comment.
Comment 1 Max Barel 2008-01-04 08:39:03 PST
Created attachment 18266 [details]
base xhtml file
Comment 2 Max Barel 2008-01-04 08:40:09 PST
Created attachment 18267 [details]
injected xhtml chunk, NS
Comment 3 Max Barel 2008-01-04 08:40:43 PST
Created attachment 18268 [details]
injected xhtml chunk, no NS
Comment 4 Max Barel 2008-01-04 08:41:44 PST
Created attachment 18269 [details]
javascript injection
Comment 5 Alexey Proskuryakov 2008-01-05 04:31:06 PST
(In reply to comment #0)

> Worth noting also, when using XMLHttpRequest to get an xhtml chunk, served as
> application/xhtml+xml and injected from responseXML using DOM methods, injected
> elements are not valid HTML elements if there is no explicit namespace through
> xmlns attribute on the container element.

This part is correct: elements being inserted have a null namespace, so they are not treated as XHTML elements. Firefox gives the same result, of course.

It's a bit unfortunate that Web Inspector gives no indication of a problem in this case: e.g., both xhtml:a and null:a elements are displayed as "a". Also, XMLSerializer serializes this incorrectly (Firefox has the same problem). Filed bug 16739 and bug 16740 for these issues; let's keep this bug focused on tracking just #cdata-section issue.
Comment 6 Alexey Proskuryakov 2008-01-05 06:20:12 PST
Confirmed as a difference with Firefox.
Comment 7 Alexey Proskuryakov 2008-01-05 06:20:48 PST
Created attachment 18289 [details]
reduced test case
Comment 8 Alexey Proskuryakov 2008-01-06 10:38:04 PST
Created attachment 18302 [details]
proposed fix

I do not quite understand the theory behind this, but the change looks straightforward.

Note that parsing a whole document is different from parsing a fragment for some reason. In the former case, we get "characters" callbacks, and do not need to handle ignorableWhitespace - yet for fragments, it seems necessary.
Comment 9 Darin Adler 2008-01-06 11:20:14 PST
Comment on attachment 18302 [details]
proposed fix

r=me
Comment 10 Alexey Proskuryakov 2008-01-06 11:36:07 PST
Committed revision 29211.

Comment 11 Lucas Forschler 2019-02-06 09:02:54 PST
Mass moving XML DOM bugs to the "DOM" Component.