Bug 75066 - If predefined XML attribute names had "xml" prefix, we could use == to compare with them
Summary: If predefined XML attribute names had "xml" prefix, we could use == to compar...
Status: RESOLVED WONTFIX
Alias: None
Product: WebKit
Classification: Unclassified
Component: XML (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Darin Adler
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-12-21 22:09 PST by Darin Adler
Modified: 2012-09-11 22:12 PDT (History)
5 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Darin Adler 2011-12-21 22:09:44 PST
There are a few rules about the xml and xmlns namespaces that are not implemented in our XML parser:

1) Both the xml and xmlns prefixes are supposed to be rebound to their respective namespaces.
2) Neither the xml nor xmlns prefix can be rebound to another namespace.
3) Neither the xml nor xmlns namespaces are allowed as a default namespace.
4) Neither the xml nor xmlns namespaces may have another prefix bound to them.

Until this bug is fixed, a proper implementation of xml:lang is impractical.
Comment 1 Eric Seidel (no email) 2011-12-21 22:14:34 PST
I thought that at least 1) was covered by code in the XMLDocumentParser.  Looking I see:
http://trac.webkit.org/browser/trunk/Source/WebCore/xml/parser/XMLDocumentParserLibxml2.cpp#L600
Comment 2 Darin Adler 2011-12-21 22:41:19 PST
(In reply to comment #0)
> There are a few rules about the xml and xmlns namespaces that are not implemented in our XML parser:

I was completely wrong. The real bug is in handling of predefined XML names that we generate from XMLNames.in.
Comment 3 Darin Adler 2011-12-21 22:42:16 PST
The names we generate from xmlattrs.in.
Comment 4 Eric Seidel (no email) 2011-12-21 22:47:37 PST
So make_names.pl needs to respect some sort of attr_prefix directive? or maybe each of the names needs to have a prefix defined?

Then I guess this line of code needs to change?
http://trac.webkit.org/browser/trunk/Source/WebCore/dom/make_names.pl#L631
Comment 5 Darin Adler 2011-12-21 22:57:13 PST
I’ve got a fix for this. It’s pretty simple. We were already deciding what to do for the namespace correctly, but not for the prefix.
Comment 6 Darin Adler 2011-12-21 22:58:01 PST
(In reply to comment #4)
> So make_names.pl needs to respect some sort of attr_prefix directive? or maybe each of the names needs to have a prefix defined?
> 
> Then I guess this line of code needs to change?
> http://trac.webkit.org/browser/trunk/Source/WebCore/dom/make_names.pl#L631

That’s one of the lines of code, but there’s another set of code a few lines lower in the file that does the same thing a different way.
Comment 7 Darin Adler 2011-12-21 23:27:07 PST
Thinking on this more, this is a less general issue than I thought.

In general, matching an expected qualified name should be done with matches, not with ==. That's because it's enough to just have the local name and the namespace URI match. The prefix does not have to match.

There are only two exceptions to this:

1) Names with a null namespace URI. They are guaranteed to also have a null prefix URI, so you can use == instead of matches on them.

2) Namespaces with the xml or xmlns prefix. They are guaranteed to always have the same prefix, so you can use == instead of matches on them.

It’s largely a coincidence that these are the only two types of attributes commonly seen in our attribute handling functions.

Adding a proper prefix to the qualified names for the xml-prefix names such as xml:lang will allow us to use the conventional == technique on those names, but ironically for a different reason than for the non-namespaced attribute names used in SVG and HTML.

On the other hand, adding a proper prefix to the qualified names for other prefixes, such as the XHTML namespace, doesn’t do much good. In fact, the HTML code might even be currently depending on the XHTML names having null prefixes. If so, there is no compelling reason to change this. Those qualified names need to match any prefix, as long as the namespace URI and local name match.

Now that we have established that this is just an optimization, we could forgo the optimization and use matches in the xml:lang handling, which is what code in SVGLangSpace.cpp already does. A bit of shame because we could be using the faster == if we just set the namespace correctly in the globals.
Comment 8 Darin Adler 2011-12-22 10:19:58 PST
Here’s my thinking on this now:

One problem with things as they are is that we use QualifiedName to hold names that should just be names with namespaces. There are prefix fields in these, but code that relies on the prefix field is likely to be mistaken. So using == or even calling prefix on one of these could well be a bug. The predefined names should probably be some kind of object that does not have an unused prefix field.

Our attribute handling code works because most pre-defined attributes have no namespace at all. For qualified names with no namespace, == works, because there is only one valid prefix. The other case with only one valid prefix is attributes in the XML reserved namespaces, the ones with the prefixes xml and xmlns.

Our element handling code works because it uses calls to the matches function rather than calls to ==.

A silly thing about the matches function is that it checks == on the QualifiedName before doing the actual matches algorithm. This matches the relatively uncommon case of calling matches on two QualifiedName objects that are exactly equal, and makes the relatively common case of calling matches on two QualifiedName objects that are not equal (both non-matches and matches) slower. Seems like the wrong tradeoff.

One way to clean all of this up is to add some new classes. We could have one class to hold names with namespaces and not prefixes. Not sure the best name for that, but perhaps NameWithNamespace? Then we could have another class to hold names with one of the reserved namespaces (treating no namespace at all as one of those). NameWithReservedNamespace. Then, a constant like XMLNames::langAttr would be NameWithReservedNamespace and a constant like HTMLNames::aTag would be NameWithNamespace rather than QualifiedName.

We could define == operators that compared the relevant fields, so someone using these things at a call site would get the correct operator.

I’m not sure which of these things are real problems and which problems are worth solving.
Comment 9 Darin Adler 2012-09-11 22:12:31 PDT
I think this is not really all that great an idea.