Bug 5262 - XMLSerializer drops Namespace information
Summary: XMLSerializer drops Namespace information
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: XML (show other bugs)
Version: 420+
Hardware: Mac OS X 10.4
: P2 Normal
Assignee: Nobody
URL:
Keywords: InRadar, NeedsReduction
: 3811 14322 (view as bug list)
Depends on:
Blocks:
 
Reported: 2005-10-04 14:16 PDT by Richard Vermillion
Modified: 2007-06-23 03:37 PDT (History)
7 users (show)

See Also:


Attachments
patch for fix and test case (10.12 KB, patch)
2007-03-02 14:38 PST, Lamar Goddard
darin: review-
Details | Formatted Diff | Diff
patch for fix and test case (10.31 KB, patch)
2007-03-05 19:02 PST, Lamar Goddard
darin: review-
Details | Formatted Diff | Diff
patch for fix and test case (10.47 KB, patch)
2007-03-07 16:22 PST, Lamar Goddard
ap: review-
Details | Formatted Diff | Diff
fix bit rot (13.41 KB, patch)
2007-03-31 02:53 PDT, Alexey Proskuryakov
no flags Details | Formatted Diff | Diff
patch for fix and test case (14.46 KB, patch)
2007-04-03 16:21 PDT, Lamar Goddard
darin: review-
Details | Formatted Diff | Diff
patch for fix and test case (15.62 KB, patch)
2007-04-04 16:51 PDT, Lamar Goddard
darin: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Richard Vermillion 2005-10-04 14:16:28 PDT
The XMLSerializer object in Safari 2.0.1 does not serialize XML Namespace information.  Specifically, no 
xmlns attributes are serialized and attributes are only serialized with their localName.  This leads to 
invalid HTML when there are two or more attributes with the same localName but different 
namespaceURI's.  The DOM implementation itself is correctly handling the namespace information, but 
it is not being serialized.

The following script will produce the error (it works fine in Firefox):

  var d = document.implementation.createDocument("urn:foo-ns", "foo:root", null);
  if (!d.documentElement) {
    // This shouldn't happen, since DomImplementation.createDocument
    // is supposed to create the root element.  But in Safari, it's required.
    d.appendChild(d.createElementNS("urn:foo-ns", "foo:root"));
  }
  var root = d.documentElement;

  root.setAttributeNS("urn:foo-ns", "foo:type", "test")

  var c = d.createElementNS(null, "child");
  root.appendChild(c);

  c.setAttributeNS("urn:foo-ns", "foo:name", "one");
  c.setAttributeNS("urn:bar-ns", "bar:name", "two");

  window.alert("foo:name is " + c.getAttributeNS("urn:foo-ns", "name") +
               " and should be one");
  window.alert("bar:name is " + c.getAttributeNS("urn:bar-ns", "name") +
               " and should be two");

  var s = new XMLSerializer();

  window.alert(s.serializeToString(d));
Comment 1 Alexey Proskuryakov 2005-10-18 03:55:30 PDT
Same as bug 3811 (which doesn't have simple steps to reproduce)?
Comment 2 Noah Vihinen 2005-12-19 00:59:35 PST
I ran into this bug as well.
Comment 3 Alexey Proskuryakov 2005-12-23 05:38:16 PST
*** Bug 3811 has been marked as a duplicate of this bug. ***
Comment 4 Alexey Proskuryakov 2005-12-23 05:38:39 PST
From bug 3811:
Expected a namespace-savvy serializer to be used. That is, expected the
serializer to preserve (ns-URI,localName) pairs by synthetizing xmlns attibutes
and prefixes if necessary. Expected residual prefixes and xmlns attributes to be
used as hints of preferred prefixes when possible.

Expected DOM Level 1 tampering with the xml:foo attributes to get the namespace
right. (IE does not do namespaces or DOM Level 1 but does have sane behavior
with DOM Level 1 and xml:foo attributes.)
Comment 5 Alexey Proskuryakov 2005-12-24 02:42:56 PST
May be relevant: <http://www.w3.org/TR/2002/WD-DOM-Level-3-Core-20021022/namespaces-
algorithms.html>
Comment 6 Eric Seidel (no email) 2005-12-24 02:53:20 PST
I thought I fixed this a while back, when doing the XSLTProcessor work.  I guess not.  Assigning this to 
"nobody" mjs can pull it back if he really wanted this assigned to him.
Comment 7 Eric Seidel (no email) 2005-12-24 02:55:04 PST
It would be nice of someone could make up a DumpRenderTree compatible test case for this.
Comment 8 Maciej Stachowiak 2007-02-23 02:55:11 PST
<rdar://problem/5018778>
Comment 9 Lamar Goddard 2007-03-02 14:38:08 PST
Created attachment 13454 [details]
patch for fix and test case
Comment 10 Darin Adler 2007-03-04 22:22:50 PST
Comment on attachment 13454 [details]
patch for fix and test case

Thanks for tackling this.

--------

+static bool shouldAddNamespaceElem(const Element* elem, Vector<QualifiedName*>* namespaces)

Why the unused "namespaces" parameter?

--------

+    const String prefix = elem->prefix();

We don't normally put const on local variables like this one. I know that some people prefer to do this in their C++ code and there can be some modest benefits, but I don't see any reason to do this with all the local variables in your patch.

--------

All the places you write:

    !!x && x.length()

you don't need to do it like that. This will do the same thing, more efficiently:

    !x.isEmpty()

--------

This is a particularly inefficient way to check this:

+    if (attr->name().toString() == "xmlns") {

One good way to write this is:

    static const QualifiedName xmlnsAttr(nullAtom, "xmlns", nullAtom);
    if (attr->name() == xmlnsAttr) {

--------

+    if (attr->prefix() == "xmlns") {

Why it OK to check the prefix here and not the namespace?

--------

+static DeprecatedString addNamespace(String prefix, const String ns, Vector<QualifiedName*>* namespaces)

These parameters should be const String&, not String type.

The Vector should be a reference parameter, not a pointer parameter. Vector<QualifiedName*>&.

Please don't use DeprecatedString in the interface of new functions. This should return a String.

--------

+    prefix = !!prefix ? prefix : "";

Are you sure the above code is needed? I think everything would work fine without it. Null strings and empty strings behave the same in most cases.

--------

The * goes next to the type.

+        QualifiedName *item = namespaces->at(i);

--------

+static DeprecatedString startMarkup(const Node*, const Range*, EAnnotateForInterchange, CSSMutableStyleDeclaration*, Vector<QualifiedName*>* = 0);
+static DeprecatedString startMarkup(const Node *node, const Range *range, EAnnotateForInterchange annotate, CSSMutableStyleDeclaration *defaultStyle, Vector<QualifiedName*> *namespaces)

You should not have a separate declaration here. If you want to forward declare, it goes at the top of the file. Otherwise, the definition is just fine. Just put the = 0 after the namespaces vector.

+static DeprecatedString markup(Node*, bool, Vector<Node*>*, Vector<QualifiedName*>* = 0);
+static DeprecatedString markup(Node* startNode, bool onlyIncludeChildren, Vector<Node*>* nodes, Vector<QualifiedName*>* namespaces)

Same thing here.

--------

+    Vector<QualifiedName*> my_namespaces;

We don't use underscore in our variable names, nor do we use "my" in variable names.

All the QualifiedName objects of the namespaces leak here. You need to do something to delete them. A call to deleteAllValues would be one way to do it.

--------

Are you sure the test case exercises all the code paths?

I think you're using the wrong data structure for the namespaces. It seems that it should be a map from a prefix string to a namespace string. It's not a good way to use QualifiedName. I'm thinking the type you want is either this:

    HashMap<AtomicString, AtomicString>

or this:

    HashMap<AtomicStringImpl*, AtomicStringImpl*>

Using a vector instead of a map is only good if there's a fixed limit on the number of namespaces. Since there isn't, you should use a map.
Comment 11 Lamar Goddard 2007-03-05 19:02:22 PST
Created attachment 13484 [details]
patch for fix and test case

Ok, another attempt at a patch.  I now use HashMap<AtomicStringImpl*, AtomicStringImpl*> and have included a slightly more complex test case that should hit all the new code branches.
Comment 12 Darin Adler 2007-03-07 07:01:31 PST
Comment on attachment 13484 [details]
patch for fix and test case

Thanks! This looks really good.

+    static const QualifiedName xmlnsAttr(nullAtom, "xmlns", xmlnsURI);
+    if (attr->name() == xmlnsAttr) {

Is it right that the QualifiedName for the "xmlns" attribute has a namespace? I'd expect nullAtom for the namespace. Is there a test case that verifies that this line of code is working properly?

+static bool shouldAddNamespaceAttr(const Attribute* attr, HashMap<AtomicStringImpl*, AtomicStringImpl*>* namespaces)
+static String addNamespace(const AtomicString& prefix, const AtomicString& ns, HashMap<AtomicStringImpl*, AtomicStringImpl*>* namespaces)
+static String addNamespaceElem(const Element *elem, HashMap<AtomicStringImpl*, AtomicStringImpl*>* namespaces)
+static String addNamespaceAttr(const Attribute *attr, HashMap<AtomicStringImpl*, AtomicStringImpl*>* namespaces)

I still think that you should pass a reference here rather than a pointer (as I suggested last time I reviewed a version of this patch). I see how it's consistent to use a * since the namespaces hash is optional at the outermost level

+    // Set pre to be emptyAtom if prefix is empty so that all empty AtomicStrings will map to the same key in the hash.
+    AtomicString pre = prefix.isEmpty() ? emptyAtom : prefix;

I believe the behavior you're talking about here is already guaranteed by AtomicString; the whole point of the class is that it merges all equal strings into a single key. The only exception to this rule might be due to the difference between "null" and "empty", so the code above might be useful if you're trying to make the nullAtom be equal to the emptyAtom. Or maybe the hash table can't handle nullAtom as a key? Did you test without this? Are you sure you need that line of code?

+    AtomicString ns = elem->namespaceURI();

For lines of code like this one, you can actually make them slightly more efficient and not do as much reference count churning if you make the local variable "const AtomicString&". There are three caveats:

    1) you can't modify the value of the variable in that case
    2) you need to make sure that you do this only for functions that actually return const AtomicString&, like namespaceURI() and getAttribute() and the like
    3) you have to be sure not to use the value if the object you got the string from has been deleted

Anyway, not an important optimization but worth mentioning; I would have used it where possible.

This patch has some tabs in it. We can't check a patch into Subversion in if it has tab characters, so it makes work for whoever is landing your patch if you use them. Please search for tab characters and make sure you're not using any before posting your next patch.

+    if (namespaces)
+        namespaceHash = *namespaces;

Since the markup function does not modify the passed-in namespaces hash, then I think the namespaces parameter should be a const* parameter instead of a plain old * parameter?

But I'm worried now that we're going to be doing a lot of hash table copying. Is there any way to avoid that overhead while still keeping the same semantic. I know, it's my fault for suggesting we use a hash table in the first place!

Everything looks good here.

I'm going to say review- just because of the tab characters, and also because there are enough comments that you might want to make some revisions. However, I don't think any of my comments besides the tab character one are mandatory changes.
Comment 13 Lamar Goddard 2007-03-07 16:22:50 PST
Created attachment 13536 [details]
patch for fix and test case

(In reply to comment #12)
> +    static const QualifiedName xmlnsAttr(nullAtom, "xmlns", xmlnsURI);
> +    if (attr->name() == xmlnsAttr) {
> 
> Is it right that the QualifiedName for the "xmlns" attribute has a namespace?
> I'd expect nullAtom for the namespace. Is there a test case that verifies that
> this line of code is working properly?

Yeah, I tried it with nullAtom, but it doesn't match and the "node" elem in the test case also output the namespace.  I got the proper namespaceURI for xmlns attributes from http://www.w3.org/TR/2006/REC-xml-names-20060816/#xmlReserved

> +static bool shouldAddNamespaceAttr(const Attribute* attr,
> HashMap<AtomicStringImpl*, AtomicStringImpl*>* namespaces)
> +static String addNamespace(const AtomicString& prefix, const AtomicString& ns,
> HashMap<AtomicStringImpl*, AtomicStringImpl*>* namespaces)
> +static String addNamespaceElem(const Element *elem, HashMap<AtomicStringImpl*,
> AtomicStringImpl*>* namespaces)
> +static String addNamespaceAttr(const Attribute *attr,
> HashMap<AtomicStringImpl*, AtomicStringImpl*>* namespaces)
> 
> I still think that you should pass a reference here rather than a pointer (as I
> suggested last time I reviewed a version of this patch). I see how it's
> consistent to use a * since the namespaces hash is optional at the outermost
> level

I've made this change.

> +    // Set pre to be emptyAtom if prefix is empty so that all empty
> AtomicStrings will map to the same key in the hash.
> +    AtomicString pre = prefix.isEmpty() ? emptyAtom : prefix;
> 
> I believe the behavior you're talking about here is already guaranteed by
> AtomicString; the whole point of the class is that it merges all equal strings
> into a single key. The only exception to this rule might be due to the
> difference between "null" and "empty", so the code above might be useful if
> you're trying to make the nullAtom be equal to the emptyAtom. Or maybe the hash
> table can't handle nullAtom as a key? Did you test without this? Are you sure
> you need that line of code?

I tested without it and with using nullAtom instead of emptyAtom and both failed so I think you're right in that the hash can't store nullAtom as a key.

> +    AtomicString ns = elem->namespaceURI();
> 
> For lines of code like this one, you can actually make them slightly more
> efficient and not do as much reference count churning if you make the local
> variable "const AtomicString&". There are three caveats:
> 
>     1) you can't modify the value of the variable in that case
>     2) you need to make sure that you do this only for functions that actually
> return const AtomicString&, like namespaceURI() and getAttribute() and the like
>     3) you have to be sure not to use the value if the object you got the
> string from has been deleted
> 
> Anyway, not an important optimization but worth mentioning; I would have used
> it where possible.

I've added this where I thought it made sense.

> This patch has some tabs in it. We can't check a patch into Subversion in if it
> has tab characters, so it makes work for whoever is landing your patch if you
> use them. Please search for tab characters and make sure you're not using any
> before posting your next patch.

Ok, I've removed all the tabs.

> +    if (namespaces)
> +        namespaceHash = *namespaces;
> 
> Since the markup function does not modify the passed-in namespaces hash, then I
> think the namespaces parameter should be a const* parameter instead of a plain
> old * parameter?

I've made this change.

> But I'm worried now that we're going to be doing a lot of hash table copying.
> Is there any way to avoid that overhead while still keeping the same semantic.
> I know, it's my fault for suggesting we use a hash table in the first place!

I'm not sure how much overhead there is in copying a hash.  Each element should have its own copy of the hash to pass on to its children after adding any namespaces it defines.  In theory we could check startNode in WebCore::markup() for being an element that either has a prefix/namespace or has an attribute with a prefix/namespace not already in the hash before doing the copy.
Comment 14 Darin Adler 2007-03-09 06:45:07 PST
Comment on attachment 13536 [details]
patch for fix and test case

+    // Set pre to be emptyAtom if prefix is empty so that all empty AtomicStrings will map to the same key in the hash.

Comment should really mention null, because there are only two empty atomic strings: null and empty.

The code below:

+    // Set pre to be emptyAtom if prefix is empty so that all empty AtomicStrings will map to the same key in the hash.
+    const AtomicString& pre = prefix.isEmpty() ? emptyAtom : prefix;
+    AtomicString foundNS = namespaces.get(pre.impl());
+    if (foundNS.isEmpty() || (foundNS != ns)) {
+        namespaces.set(pre.impl(), ns.impl());
+        return " xmlns" + (!prefix.isEmpty() ? ":" + prefix : "") + "=\"" + ns + "\"";
+    }
+    return "";

seems more natural to me, written this way:

    if (ns.isEmpty())
        return "";
    // Use emptyAtom's impl() for both null and empty strings, since the HashMap can't handle 0 as a key.
    AtomicStringImpl* pre = prefix.isEmpty() ? emptyAtom.impl() : prefix.impl();
    AtomicStringImpl* foundNS = namespaces.get(pre.impl());
    if (foundNS != ns.impl()) {
        namespaces.set(pre.impl(), ns.impl());
        return " xmlns" + (!prefix.isEmpty() ? ":" + prefix : "") + "=\"" + ns + "\"";
    }
    return "";

Since the map is of AtomicStringImpl*, the code can work with those.

With the special case for ns, you can simplify callers of addNamespace -- they don't need to check for the ns.isEmpty() special case.

But those changes are optional. r=me
Comment 15 Alexey Proskuryakov 2007-03-31 02:53:48 PDT
Created attachment 13902 [details]
fix bit rot

I have tried to apply this patch, and encountered several problems:
1) It has bit-rotted.
2) Namespace attribute serialization should use escapeTextForMarkup() to avoid problems with special characters in namespaces.
3) No ChangeLog for LayoutTests,
4) Incorrect markup is produced in the original test case - one of the elements has 'xmlns="null"'.
5) I am not sure why this only fixes createMarkup(Node* ,...), and not createMarkup(Range*, ...).

I fixed (1), (2) and (3), but (4) has to be fixed prior to landing, too. Attaching what I ended up with.
Comment 16 Alexey Proskuryakov 2007-03-31 02:54:50 PDT
Comment on attachment 13536 [details]
patch for fix and test case

r- per the above comment.
Comment 17 Lamar Goddard 2007-04-03 16:21:20 PDT
Created attachment 13938 [details]
patch for fix and test case

Fixed (4) by patching Document::createElementNS to use nullAtom if the _namespaceURI param == "null" as createElementNS with a null namespace should behave the same way as Document::createElement.  Didn't patch JSImmediate::toString as it seems that other functions/tests assume that a null from js will be converted to the string "null".  This issue may need to be split out into its own bug as "null" and null are being treated the same by Document::createElementNS and that shouldn't be.

I don't think (5) needs addressing (at least by this bug) as XMLSerializer only calls createMarkup(Node* , ...).
Comment 18 Alexey Proskuryakov 2007-04-03 21:50:37 PDT
(In reply to comment #17)
> This issue may need to be
> split out into its own bug as "null" and null are being treated the same by
> Document::createElementNS and that shouldn't be.

I think you are right, this sounds like a separate bug then. It can be fixed in Document.idl:

-        [OldStyleObjC] Element createElementNS(in DOMString namespaceURI,
+        [OldStyleObjC] Element createElementNS(in [ConvertNullToNullString] DOMString namespaceURI,
                                               in DOMString qualifiedName)
Comment 19 Darin Adler 2007-04-04 09:31:14 PDT
Comment on attachment 13938 [details]
patch for fix and test case

Patch basically looks pretty good.

+    // Test _namespaceURI for "null" as null from js comes through as "null"

That's a bug that should be fixed at JavaScript binding call site. We shouldn't convert a null to a string and then check for "null". Instead we should use [ConvertNullToNullString] in the Document.idl file. Look at DOMImplementation.idl for examples.

-                        markups.prepend(startMarkup(parent, range, annotate));
+                        markups.prepend(startMarkup(parent, range, annotate, false, 0));

-                        markups.prepend(startMarkup(parent, range, annotate));
+                        markups.prepend(startMarkup(parent, range, annotate, false, 0));

-                markups.prepend(startMarkup(ancestor, range, annotate, convertBlocksToInlines));
+                markups.prepend(startMarkup(ancestor, range, annotate, convertBlocksToInlines, 0));

-                markups.prepend(startMarkup(ancestor, range, annotate));
+                markups.prepend(startMarkup(ancestor, range, annotate, false, 0));

Why are these helpful changes? Those are the default values of those parameters.
Comment 20 Lamar Goddard 2007-04-04 16:51:43 PDT
Created attachment 13953 [details]
patch for fix and test case

I've added [ConvertNullToNullString] for the namespaceURI parameters in Document's bindings of createAttributeNS, createElementNS and getElementsByTagNameNS.

The dom/xhtml/level3/core/nodeisequal14.xhtml test now fails because an Attr created from createAttribute is supposed to have a null prefix, localName and namespaceURI.  See: http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#ID-1084891198
This was only passing before because the namespaceURI that was being used by createAttributeNS(null, ...) was "null".

The dom/xhtml/level3/core/nodeisequal15.xhtml test now succeds as attributes with no namespace from the page and attributes created explicitly with a namespace of null have the same namespace.
Comment 21 Darin Adler 2007-04-05 08:56:18 PDT
Comment on attachment 13953 [details]
patch for fix and test case

r=me

There are three minor things wrong with this patch that are not serious enough for a review-. One is that the change log for LayoutTests needs to list the files modified, and in a couple cases instead it's listing the .xhtml files of the tests with changed results.

Second, the explanation of nodeisequalnode14.xhtml failure is unnecessarily confusing. It should say that it's supposed to create an Attr with localName of null, which is the salient point I think.

The third problem is that we don't have nearly enough testing of the behavior of createElementNS and getElementsByTagNameNS when passed a namespace of null -- the new layout test is the only one that calls them that way, and it doesn't test the behavior very thoroughly.
Comment 22 Alexey Proskuryakov 2007-04-21 02:59:20 PDT
Committed revision 20997.

See bug 6216 for some related getElementsByTagNameNS issues.
Comment 23 Alexey Proskuryakov 2007-06-23 03:37:22 PDT
*** Bug 14322 has been marked as a duplicate of this bug. ***