Created attachment 70953
Reproduce XML serialization error for HTML br element
My browser is Chromium 6.0.472.62 (Developer Build 59676) on Ubuntu 10.04, WebKit 534.3, V8 184.108.40.206.
I am using XMLSerializer.serializeToString to serialize an HTML element and its content to XML. Unfortunately, if the content includes a BR element, the result is malformed XML (the element is emitted as <br> rather than <br/>). This is a nuisance when sending serialized markup to a server so that other browsers can download it, e.g. as part of a browser-based document editor.
Other browsers, e.g. Firefox, get this right.
Try viewing the attached HTML document in WebKit, Opera, and Firefox to see what I mean.
That's more or less intentional behavior. WebKit chooses the serialization form based on what is being serialized, not on which API happens to be used.
See also: bug 16496.
If you want to get an HTML serialization, then you can use innerHTML, but the XML serialization should be XML for interoperability with other browsers.
I did some more experiments and found that, regardless of the namespace of the parent element, WebKit only serializes BR as well-formed XML if you set its namespace to something other than "http://www.w3.org/1999/xhtml".
So the WebKit implementation seems to examine the namespace of each element and switch serialization on the fly, with the result that you can't get well-formed XML if the content contains any empty elements in the "http://www.w3.org/1999/xhtml" namespace.
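To make the hypothesis above concrete, here is a toy TypeScript model of it. It is a sketch of the reporter's inference (which a later comment could not reproduce), not WebKit's actual serializer: the ElementNode type, XHTML_NS, the partial VOID_ELEMENTS list, OTHER_NS, wrapInDiv and serialize are all hypothetical names, and attributes, text and namespace declarations are left out.

```typescript
// Hypothetical, simplified element model (not the DOM API): no attributes or text.
interface ElementNode {
  localName: string;
  namespaceURI: string | null;
  children: ElementNode[];
}

const XHTML_NS = "http://www.w3.org/1999/xhtml";

// A partial list of HTML void elements, for illustration only.
const VOID_ELEMENTS = new Set(["br", "hr", "img", "input", "link", "meta"]);

// The syntax is chosen per element from its own namespace,
// regardless of the parent's namespace or the API that was called.
function serialize(el: ElementNode): string {
  const tag = el.localName;
  const isXhtml = el.namespaceURI === XHTML_NS;
  if (el.children.length === 0) {
    if (isXhtml && VOID_ELEMENTS.has(tag)) {
      return `<${tag}>`; // HTML syntax: no end tag, so not well-formed XML
    }
    if (isXhtml) {
      return `<${tag}></${tag}>`; // HTML syntax, but still well-formed XML
    }
    return `<${tag}/>`; // XML syntax: self-closing empty element
  }
  return `<${tag}>${el.children.map(serialize).join("")}</${tag}>`;
}

// Usage: the same br, once in the XHTML namespace and once outside it.
const OTHER_NS = "urn:example:other"; // hypothetical non-XHTML namespace
const htmlBr: ElementNode = { localName: "br", namespaceURI: XHTML_NS, children: [] };
const otherBr: ElementNode = { localName: "br", namespaceURI: OTHER_NS, children: [] };
const wrapInDiv = (child: ElementNode): ElementNode =>
  ({ localName: "div", namespaceURI: OTHER_NS, children: [child] });

console.log(serialize(wrapInDiv(htmlBr)));  // prints <div><br></div>
console.log(serialize(wrapInDiv(otherBr))); // prints <div><br/></div>
```

In this model the parent's namespace never matters; only the namespace of the void element itself decides whether the output stays well-formed.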
Created attachment 70980
XHTML test case
> If you want to get an HTML serialization, then you can use innerHTML
This is not accurate in XML documents per HTML5. See <http://www.whatwg.org/specs/web-apps/current-work/multipage/apis-in-html-documents.html#innerhtml>:
On getting, if the node's document is an HTML document, then the attribute must return the result of running the HTML fragment serialization algorithm on the node; otherwise, the node's document is an XML document, and the attribute must return the result of running the XML fragment serialization algorithm on the node instead (this might raise an exception instead of returning a string).
So, innerHTML changes its behavior depending on what's being serialized.
> but the XML serialization should be XML for interoperability with other browsers.
Nobody disputes that this is a strong argument for adopting Firefox's behavior.
> I did some more experiments and found that regardless of the namespace of the parent element,
> BR is only serialized by webkit as well formed XML if you set its namespace to be something other
> than "http://www.w3.org/1999/xhtml".
This is not what I'm seeing on the attached test case.
My browser: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.10 (KHTML, like Gecko) Ubuntu/12.04 Chromium/23.0.1262.0 Chrome/23.0.1262.0 Safari/537.10
Example 1: HTML served as text/html
Example 2: XHTML served as text/html
Example 3: XHTML served as application/xhtml+xml
Only example 3 outputs well-formed XML, which isn't what I'd expect an XMLSerializer to do!
This is also inconsistent with Firefox and Opera:
^ the above screenshots are based on Example 1 above.
If a page is marked up as HTML, or is XHTML but served as text/html, there appears to be no way in WebKit to serialize it as XML, short of creating a new XML DOM, walking the HTML DOM and re-creating its elements in the XML DOM.
> Only example 3 outputs well-formed XML, which isn't what I'd expect an XMLSerializer to do!
As mentioned in comment 1, this is not what WebKit implements: XMLSerializer just serializes the document without magically converting it into a different kind of document.
Just to clarify, my point is that I'd expect something called "XMLSerializer" to generate XML. If it does not do this, then surely it should just be called "Serializer". So isn't being called XMLSerializer and not outputting XML a bug?
Additionally, I can't think of or find a viable alternative or workaround (tree-walking and DOM copying are kind of slow!). For my use case, I need well-formed XML in order to push it through an XSL transformation. Conversely, if XMLSerializer actually output XML, it would be trivial to convert XHTML to HTML using a simple XSL transformation.
> I'd expect something called "XMLSerializer" to generate XML
I see where you are coming from, but I can't really subscribe to this line of thinking. XMLHttpRequest doesn't have much to do with XML either.
XSLT has a lot of rough edges when used on the Web; in fact, you can't even produce <br> without violating the spec (see <https://www.w3.org/Bugs/Public/show_bug.cgi?id=18460>).
As I was hit by the same limitation, I sat down to implement a serializer in plain JS. I hope it can benefit others. You'll find a demo here: http://cburgmer.github.io/xmlserializer.js/.
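For readers who want to try the same kind of workaround, here is a minimal TypeScript sketch of the idea: walk the tree and always emit XML syntax, self-closing every empty element and escaping text and attribute values. This is not the code from xmlserializer.js. The XmlNode, ElemNode and TextNode types and the toXml, escapeText and escapeAttr functions are hypothetical, and the sketch assumes a plain tree instead of real DOM nodes. It also ignores namespaces (beyond a literal xmlns attribute), comments, processing instructions and characters that are invalid in XML.

```typescript
// Hypothetical tree model standing in for DOM nodes (an assumption, not the DOM API).
type XmlNode = TextNode | ElemNode;

interface TextNode {
  kind: "text";
  value: string;
}

interface ElemNode {
  kind: "element";
  name: string;
  attributes: Array<[string, string]>;
  children: XmlNode[];
}

// Escape characters that are significant in XML text content.
function escapeText(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Attribute values are quoted with ", so the quote must be escaped too.
function escapeAttr(s: string): string {
  return escapeText(s).replace(/"/g, "&quot;");
}

// Walk the tree and always emit XML syntax, whatever the element's namespace.
function toXml(node: XmlNode): string {
  if (node.kind === "text") {
    return escapeText(node.value);
  }
  const attrs = node.attributes
    .map(([attrName, value]) => ` ${attrName}="${escapeAttr(value)}"`)
    .join("");
  if (node.children.length === 0) {
    // Empty elements such as br are self-closed, keeping the output well-formed.
    return `<${node.name}${attrs}/>`;
  }
  const inner = node.children.map(toXml).join("");
  return `<${node.name}${attrs}>${inner}</${node.name}>`;
}

// Usage: a paragraph containing text, a br and more text.
const doc: ElemNode = {
  kind: "element",
  name: "p",
  attributes: [["xmlns", "http://www.w3.org/1999/xhtml"]],
  children: [
    { kind: "text", value: "a < b" },
    { kind: "element", name: "br", attributes: [], children: [] },
    { kind: "text", value: "c" },
  ],
};

console.log(toXml(doc)); // prints <p xmlns="http://www.w3.org/1999/xhtml">a &lt; b<br/>c</p>
```

Emitting strings directly avoids building a second XML DOM, which the thread mentions as the slow alternative. A real implementation would still need to read names, namespaces and attributes from actual DOM nodes.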