Bug 47768 - XMLSerializer serializes BR as <BR> not <BR/>
Summary: XMLSerializer serializes BR as <BR> not <BR/>
Status: UNCONFIRMED
Alias: None
Product: WebKit
Classification: Unclassified
Component: XML
Version: 528+ (Nightly build)
Hardware: All (OS: All)
Importance: P2 Major
Assignee: Nobody
Reported: 2010-10-16 07:30 PDT by Dave Raggett
Modified: 2016-07-29 09:40 PDT
CC List: 4 users

Attachments
Reproduce XML serialization error for HTML br element (961 bytes, text/html)
2010-10-16 07:30 PDT, Dave Raggett
XHTML test case (169 bytes, application/xhtml+xml)
2010-10-17 11:49 PDT, Alexey Proskuryakov

Description Dave Raggett 2010-10-16 07:30:48 PDT
Created attachment 70953
Reproduce XML serialization error for HTML br element

My browser is Chromium 6.0.472.62 (Developer Build 59676) on Ubuntu 10.04, with WebKit 534.3 and V8 2.2.24.23.

I am using XMLSerializer.serializeToString to serialize an HTML element and its content to XML. Unfortunately, if the content includes a BR element, the result is malformed XML, because BR is written as an unclosed <BR> rather than <BR/>. This is a nuisance when sending serialized markup to a server for download by other browsers, e.g. as part of a browser-based document editor.

Other browsers, e.g. Firefox, get this right.

Try viewing the attached HTML document in WebKit, Opera and Firefox to see what I mean.
Comment 1 Alexey Proskuryakov 2010-10-16 17:17:06 PDT
That's more or less intentional behavior. WebKit makes the serialization form choice based on what is being serialized, not on what API happened to be used.

See also: bug 16496.
Comment 2 Dave Raggett 2010-10-17 02:41:34 PDT
If you want to get an HTML serialization, then you can use innerHTML, but the XML serialization should be XML for interoperability with other browsers.

I did some more experiments and found that regardless of the namespace of the parent element, BR is only serialized by webkit as well formed XML if you set its namespace to be something other than "http://www.w3.org/1999/xhtml".

Thus the WebKit implementation seems to examine the namespace of each element and switch its serialization on the fly, with the end result that you can't get well-formed XML if there are any empty elements in the "http://www.w3.org/1999/xhtml" namespace.
Comment 3 Alexey Proskuryakov 2010-10-17 11:49:46 PDT
Created attachment 70980
XHTML test case

> If you want to get an HTML serialization, then you can use innerHTML

This is not accurate in XML documents per HTML5. See <http://www.whatwg.org/specs/web-apps/current-work/multipage/apis-in-html-documents.html#innerhtml>:

-----------------------------
On getting, if the node's document is an HTML document, then the attribute must return the result of running the HTML fragment serialization algorithm on the node; otherwise, the node's document is an XML document, and the attribute must return the result of running the XML fragment serialization algorithm on the node instead (this might raise an exception instead of returning a string).
-----------------------------

So, innerHTML changes its behavior depending on what's being serialized.
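To make the quoted rule concrete, here is a minimal TypeScript sketch of "same tree, algorithm chosen by the document". It is not WebKit's code or the spec algorithm in full: the node model, `DocumentKind`, `serializeNode` and `innerHTMLOf` are hypothetical, and namespaces, attributes, and the space browsers may put before `/>` are deliberately omitted.

```typescript
// Hypothetical, minimal tree model; these are not the DOM interfaces.
interface ElementNode {
  kind: "element";
  name: string; // lowercase local name, e.g. "br"
  children: TreeNode[];
}

interface TextNode {
  kind: "text";
  data: string;
}

type TreeNode = ElementNode | TextNode;

// Which algorithm applies depends on the node's document, not on the caller.
type DocumentKind = "html" | "xml";

// HTML void elements: the HTML syntax writes them with no end tag.
const VOID_ELEMENTS = new Set([
  "area", "base", "br", "col", "embed", "hr", "img",
  "input", "link", "meta", "source", "track", "wbr",
]);

function escapeText(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function serializeNode(node: TreeNode, kind: DocumentKind): string {
  if (node.kind === "text") return escapeText(node.data);
  if (kind === "html" && VOID_ELEMENTS.has(node.name)) {
    return `<${node.name}>`; // HTML-style output: <br>
  }
  if (kind === "xml" && node.children.length === 0) {
    return `<${node.name}/>`; // XML-style output: <br/>
  }
  const inner = node.children.map((c) => serializeNode(c, kind)).join("");
  return `<${node.name}>${inner}</${node.name}>`;
}

// Getter logic in the spirit of the quoted text: pick the algorithm by document kind.
function innerHTMLOf(element: ElementNode, kind: DocumentKind): string {
  return element.children.map((c) => serializeNode(c, kind)).join("");
}
```

For a `p` containing the text `a`, a `br`, and the text `b`, `innerHTMLOf(p, "html")` gives `a<br>b` while `innerHTMLOf(p, "xml")` gives `a<br/>b`: the caller's API is the same, only the document kind differs.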

>  but the XML serialization should be XML for interoperability with other browsers.

There is no dispute about that being a strong argument for adopting Firefox behavior.

> I did some more experiments and found that regardless of the namespace of the parent element,
> BR is only serialized by webkit as well formed XML if you set its namespace to be something other
> than "http://www.w3.org/1999/xhtml".

This is not what I'm seeing on the attached test case.
Comment 4 Peter Ryan 2012-09-22 14:27:12 PDT
My browser: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.10 (KHTML, like Gecko) Ubuntu/12.04 Chromium/23.0.1262.0 Chrome/23.0.1262.0 Safari/537.10

Example 1: HTML served as text/html
http://clickindustrial.com/test/public/20120908/test01.html

Example 2: XHTML served as text/html
http://clickindustrial.com/test/public/20120908/test02.html

Example 3: XHTML served as application/xhtml+xml
http://clickindustrial.com/test/public/20120908/test02.xhtml

Only example 3 outputs well-formed XML, which isn't what I'd expect an XMLSerializer to do!

This is also inconsistent with Firefox and Opera:
http://clickindustrial.com/test/public/20120908/test01.png

The screenshots above are based on Example 1.

If a page is marked up as HTML, or is XHTML but served as text/html, there appears to be no way in WebKit to serialize it as XML, short of creating a new XML DOM, walking the HTML DOM and re-creating its elements in the XML DOM.
Comment 5 Alexey Proskuryakov 2012-09-23 10:44:44 PDT
> Only example 3 outputs well-formed XML, which isn't what I'd expect an XMLSerializer to do!

As mentioned in comment 1, this is not what WebKit implements: XMLSerializer just serializes the document without magically converting it to a different kind of document.
Comment 6 Peter Ryan 2012-09-24 07:48:11 PDT
Just to clarify, my point is that I'd expect something called "XMLSerializer" to generate XML. If it does not, then surely it should just be called "Serializer". So isn't an XMLSerializer that does not output XML a bug?

Additionally, I can't think of or find a viable alternative or workaround (tree-walking and DOM copying are rather slow!). For my use case, I need well-formed XML in order to push it through an XSL transformation. Conversely, if XMLSerializer actually output XML, it would be trivial to convert XHTML to HTML using a simple XSL transformation.
Comment 7 Alexey Proskuryakov 2012-09-24 08:42:17 PDT
> I'd expect something called "XMLSerializer" to generate XML

I see where you are coming from, but can't really subscribe to this line of thinking. XMLHttpRequest doesn't have much to do with XML either.

XSLT has a lot of rough edges when used on the Web; in fact, you can't even produce <br> without violating the spec, see <https://www.w3.org/Bugs/Public/show_bug.cgi?id=18460>.
Comment 8 Christoph Burgmer 2013-11-14 02:51:58 PST
As I was hit by the same limitation, I sat down and implemented a serializer in plain JS, which I hope can benefit others. You'll find a demo here: http://cburgmer.github.io/xmlserializer.js/.
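As a rough illustration of what such a serializer has to do (this is not the code behind the demo), the TypeScript sketch below walks an already-parsed tree and emits well-formed XML. It assumes a made-up `SourceNode` shape (the type names and `toXml` are hypothetical), that every element belongs to the XHTML namespace, and that attribute names are already valid XML names. Tag names are lowercased because `tagName` reports them in uppercase (e.g. `BR`) in HTML documents.

```typescript
// Hypothetical input model for an already-parsed HTML tree (not the DOM API).
interface SourceElement {
  kind: "element";
  tag: string; // may be uppercase, e.g. "BR"
  attrs: Record<string, string>;
  children: SourceNode[];
}

interface SourceText {
  kind: "text";
  data: string;
}

type SourceNode = SourceElement | SourceText;

const XHTML_NS = "http://www.w3.org/1999/xhtml";

// Escape character data so "&" and "<" cannot break well-formedness.
function escapeText(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Attribute values are double-quoted here, so '"' must be escaped too.
function escapeAttr(s: string): string {
  return escapeText(s).replace(/"/g, "&quot;");
}

// Recursively serialize; only the root element gets the namespace declaration.
function toXml(node: SourceNode, isRoot = true): string {
  if (node.kind === "text") return escapeText(node.data);
  const name = node.tag.toLowerCase();
  let attrs = isRoot ? ` xmlns="${XHTML_NS}"` : "";
  for (const [key, value] of Object.entries(node.attrs)) {
    attrs += ` ${key}="${escapeAttr(value)}"`;
  }
  if (node.children.length === 0) {
    return `<${name}${attrs}/>`; // empty-element tag, e.g. <br/>
  }
  const inner = node.children.map((c) => toXml(c, false)).join("");
  return `<${name}${attrs}>${inner}</${name}>`;
}
```

Design note: this writes every empty element, not only void ones, as `<x/>`. That is fine for an XML consumer such as an XSLT processor, but output like `<div/>` would be misparsed if it were later served back as text/html.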