See the example URL for what I mean. In this instance the HTML author has forgotten the equals sign. I believe it's pretty clear that the intention was id="foo", not id"foo"="". In quirks mode, you can just put the = back in. In strict mode, the attribute should be ignored as invalid and should not appear in the DOM. The same goes for single quotes and for mixed-up quoting (one of each). I'm not sure about attribute names, but HTML 4 says of unquoted attribute values: "The attribute value may only contain letters (a-z and A-Z), digits (0-9), hyphens (ASCII decimal 45), periods (ASCII decimal 46), underscores (ASCII decimal 95), and colons (ASCII decimal 58)." I would contend that the same character set restrictions should be applied to attribute names (unless the allowed characters for names are actually defined somewhere).
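To make the proposal concrete, here is a minimal Python sketch of the check it implies: accept an attribute name only if it sticks to the character set quoted above. The function name is hypothetical, and this illustrates the reporter's suggestion only, not anything a browser or a spec does.

```python
import re

# Letters, digits, hyphens, periods, underscores and colons, as quoted above.
RESTRICTED_CHARS = re.compile(r"[A-Za-z0-9._:-]+")


def uses_restricted_charset(name):
    """Return True if `name` is non-empty and uses only the quoted character set."""
    return RESTRICTED_CHARS.fullmatch(name) is not None
```

Under this rule the name `id` would pass, while `id"foo"` would be rejected because of the quote characters.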
HTML5 says that <body id"foo"> should result in an element <body> with an attribute called |id"foo"| and an empty value. IMHO this bug is INVALID. http://whatwg.org/specs/web-apps/current-work/#tokenisation
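For illustration, here is a rough Python sketch of the attribute-name handling described in that section. It is a simplified approximation, not the spec algorithm: it works on the text between the tag name and the closing ">", ignores "/" handling and character references, and does not report parse errors. The name `parse_attributes` is made up for this example.

```python
import string

WHITESPACE = " \t\n\f\r"
# The tokenizer lowercases ASCII letters in attribute names.
ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def parse_attributes(text):
    """Return a list of (name, value) pairs for the attribute part of a start tag."""
    attrs = []
    i, n = 0, len(text)
    while i < n:
        # Before attribute name: skip whitespace.
        while i < n and text[i] in WHITESPACE:
            i += 1
        if i >= n:
            break
        # Attribute name: runs until whitespace or "=". Quote characters
        # do NOT end the name; they simply become part of it.
        start = i
        i += 1
        while i < n and text[i] not in WHITESPACE and text[i] != "=":
            i += 1
        name = text[start:i].translate(ASCII_LOWER)
        # After attribute name: skip whitespace, then look for "=".
        while i < n and text[i] in WHITESPACE:
            i += 1
        value = ""
        if i < n and text[i] == "=":
            i += 1
            while i < n and text[i] in WHITESPACE:
                i += 1
            if i < n and text[i] in "\"'":
                # Quoted value: read up to the matching quote character.
                quote = text[i]
                end = text.find(quote, i + 1)
                if end == -1:
                    end = n
                value = text[i + 1:end]
                i = end + 1
            else:
                # Unquoted value: read up to the next whitespace.
                start = i
                while i < n and text[i] not in WHITESPACE:
                    i += 1
                value = text[start:i]
        # A duplicate attribute name is dropped; the first one wins.
        if all(existing != name for existing, _ in attrs):
            attrs.append((name, value))
    return attrs


print(parse_attributes(' id"foo"'))   # [('id"foo"', '')]
print(parse_attributes(' id="foo"'))  # [('id', 'foo')]
```

The first call shows the case from this bug: with no "=" to end the name, the quotes are swallowed into it and the value stays empty.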
Ian: I hadn't looked at the HTML5 spec in this regard. But what about when parsing HTML ≤ 4 in quirks mode?
We want the fewest differences possible. The idea of the HTML5 parser spec is that it also applies in quirks mode. (There are a couple of things that still need doing before it's fully done, but attribute parsing isn't one of them.) Unless IE6 does something different, we should do what the spec says. And if IE6 does do something different, then the spec should probably change to match.
Our behavior matches the HTML5 spec.