In xhtml, links must be escaped like this:
<a id="search_link" href="http://www.google.com/search?q=webkit&start=10">Original Search</a>
Safari 3.x correctly handles this when loading an XHTML document, but not when replacing parts of the document using Ajax. What happens is Safari interprets the links as:
Which is wrong - the start parameter will not be sent along.
I have attached an example showing the problem:
1. Open the document
2. Click the link, and you'll go to page #2 of a Google search on Webkit.
3. Hit the replace button which will replace the link in a similar way that an Ajax call will work. Note I took the replace method from the Prototype library.
4. Now click the link again - you'll see page 1 of the results, *not* page 2.
And that's the bug - Safari mucks up & is xhtml links, cause all parameters after the first one to be dropped.
Created attachment 19168 [details]
Shows xhtml link handling bug
Note by the way, the example works correctly in Firefox 2.x and Opera 9.20
Confirmed with r30368 - internally, we are doubly encoding the ampersand as "&#38;".
Any thoughts on fixes? This issue makes using xhtml/ajax with Safari a bit painful since any replaced links are wrong.
I think you would get better results by using DOM manipulation methods instead of createContextualFragment. If this is unacceptably hard to re-architecture the code you have, the next thing I'd try is fixing up URL with DOM methods after parsing with createContextualFragment.
DOM manipulation really isn't the best solution in this case because XHTML is being returned from a server. Thus its a whole lot easier to use createContextualFragment - the alternative would be to use DOM Parser to first parse the content, and then copy it over to the DOM. But my guess is you'd run into the same bug anyway.
I say that because we did try the trick of looping over all A elements that are inserted and reset their href values. It didn't work...the same issue occurs.
This really seems like a bug in webkit since inserting html/xhtml text into the DOM works with every browser *except* Safari with xhtml (and obviously IE with xhtml).
Note that we use xhtml because mapbuzz is a mapping site, and we make heavy use of SVG. In addition, in some places we atom feeds into xhtml documents to speed up some search results. Thus we depend on XML namespaces (or IE's proprietary solution of XML data islands).
Thanks for the comments, and we would of course be happy to test any patches.
Created attachment 19277 [details]
further reduced test case
This looks like a bug in createContextualFragment(), not in URL handling.
DOMParser seems to work, you can try it as a workaround:
var html = '<div foo="&"/>'
var newDoc = (new DOMParser).parseFromString(html, 'application/xml');
Unfortunately DOMParser does *not* work.
To flesh out your example:
var parser = new DOMParser()
var doc = parser.parseFromString(html.stripScripts(), 'application/xhtml+xml');
var node = document.importNode(doc.documentElement, true)
element.innerHTML = ' ' // the space is needed...WebKit bug
All these operations work. But what you end up with is inserted XHTML that has:
* Broken images - the <img> tags are correct but no image shows up
* Broken links - Safari renders the links (they have a blue underline), but if you mouse over them no links show up in the status bar and if you click on them they do not work
After doing the appendChild, alert(element.innerHTML) shows the correct XHTML is inserted. But the inserted XHTML simply doesn't work...
So this solution is a no-go (and seems to reveal a bunch more bugs).
(In reply to comment #9)
> * Broken images - the <img> tags are correct but no image shows up
That's bug 17897 (not a DOMParser issue).
> * Broken links - Safari renders the links (they have a blue underline), but if
> you mouse over them no links show up in the status bar and if you click on them
> they do not work
I do not remember seeing a bug for this, but it sounds related.
Ah, interesting - thanks for the pointer.
Any chance of getting the issue in createContextualFragment() fixed so that the URIs come out right when setting innerHTML in an XHTML document?
Thanks for the help Alexey.
I think I have found the cause of the bug.
We are failing in XMLTokenizer::parseXMLDocumentFragment. The arguments are good but the result of xmlParseBalancedChunkMemory is wrong (it converts "&" to "&")
I had a look at libxml to know why we fail : when you use xmlParseBalancedChunkMemory, the library automatically converts '&' to "&" which explains the bug.
A look at libxml suggests that there is two possibilities to solve the bug: correcting the values when creating the DocumentFragment or using another libxml API. Any suggestion on how to proceed here ?
What is the other libxml2 API? If it's more suitable for the task, we should definitely consider using it.
(In reply to comment #13)
> What is the other libxml2 API? If it's more suitable for the task, we should
> definitely consider using it.
I had no in mind when writing the mail as my libxml2 knowledge was too weak. After looking a bit more into the code and the documentation, I am pretty sure we will have to switch to another API. The candidate I am testing is xmlParseChunk as we can provide an xmlParseCtxPtr (which will give us more control over the parsing).
> I had no in mind when writing the mail as my libxml2 knowledge was too weak.
> After looking a bit more into the code and the documentation, I am pretty sure
> we will have to switch to another API. The candidate I am testing is
> xmlParseChunk as we can provide an xmlParseCtxPtr (which will give us more
> control over the parsing).
I was wrong about xmlParseChunk. We need a function that can parse a fragment or "content" as defined in the libxml documentation (see http://bugs.webkit.org/show_bug.cgi?id=5142 for more explainations).
There is 4 APIs that does fragment parsing: xmlParseBalancedChunkMemory, xmlParseBalancedChunkMemoryRecover, xmlParseContent and xmlParseInNodeContext.
The first two are out because they are more or less the same and we are trying to find a replacement. The last one does not sound correct for what we want.
I have investigated xmlParseContext with no results for the moment. I will continue to digg that API a bit more.
As advised by Alexey, I have contacted the libxml mailing list to have their view about what we wanted to do and the way to do it (see http://mail.gnome.org/archives/xml/2008-April/msg00017.html).
It seems that parsing a content is quite tricky and the only way is to use the internal function xmlParseContent.
As precised in the link, we need to initialize the parser ourselves (we also need to interpret the errors ourselves) which will lead to some non portable code.
However we already use some internal properties and the addition is quite small so it should be worth the effort of merging some of our parsing code.
Created attachment 20493 [details]
First version: switch DocumentFragment parsing to xmlParseContent
Tested and no regression found.
Created attachment 20521 [details]
test contextual parsing
The only problem I can see is that is that createContextualFragment is still not contextual - it doesn't use any context from the node it is invoked on, see this test case. Even in the original test case, the elements would end up in null namespace AFAICT. Daniel says there are many things that are contextual, besides namespaces - are there any others that we should care about?
483 // XML_PARSE_NODICT: default dictionnary option.
An extra "n" in "dictionary" here.
> The only problem I can see is that is that createContextualFragment is still
> not contextual - it doesn't use any context from the node it is invoked on, see
> this test case. Even in the original test case, the elements would end up in
> null namespace AFAICT.
IMHO, I think you are wrong here. Check XMLTokenizer::startElementNs: there is a check for the namespace. If none is provided by libxml then we check the prefix against m_prefixToNamespaceMap (which is initialized from the Element provided as a parent). If no prefix was found, we put the node in the default namespace.
(LayoutTests/fast/dom/set-innnerHTML also check for namespace and I had a hard time with it)
> Daniel says there are many things that are contextual,
> besides namespaces - are there any others that we should care about?
I would say that others are internal to libxml (like tree structure or parser state to detect if the tree is well formed).
I have tried to match libxml behaviour to avoid any regression here (preferring portable API whenever I could).
> 483 // XML_PARSE_NODICT: default dictionnary option.
> An extra "n" in "dictionary" here.
There is also a memory leak as we never free m_context.
Comment on attachment 20493 [details]
First version: switch DocumentFragment parsing to xmlParseContent
> IMHO, I think you are wrong here.
Indeed, I didn't realize that this logic was in WebCore, not libxml.
Committed in r31860.
Thanks everyone for the quick fix - I'll give it a spin in one of this week's nightly builds.
Mass moving XML DOM bugs to the "DOM" Component.