Bug 17403 - WebKit Creates Invalid Xhtml Links with Ajax
Summary: WebKit Creates Invalid Xhtml Links with Ajax
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: DOM (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Major
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-02-16 23:19 PST by Charlie
Modified: 2019-02-06 09:02 PST (History)
4 users (show)

See Also:


Attachments
Shows xhtml link handling bug (918 bytes, application/xhtml+xml)
2008-02-16 23:20 PST, Charlie
no flags Details
further reduced test case (551 bytes, application/xhtml+xml)
2008-02-22 03:59 PST, Alexey Proskuryakov
no flags Details
First version: switch DocumentFragment parsing to xmlParseContent (11.11 KB, patch)
2008-04-12 15:07 PDT, Julien Chaffraix
ap: review+
Details | Formatted Diff | Diff
test contextual parsing (600 bytes, application/xhtml+xml)
2008-04-13 23:56 PDT, Alexey Proskuryakov
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Charlie 2008-02-16 23:19:51 PST
In xhtml, links must be escaped like this:

<a id="search_link" href="http://www.google.com/search?q=webkit&amp;start=10">Original Search</a>

Safari 3.x correctly handles this when loading an XHTML document, but not when replacing parts of the document using Ajax.  What happens is Safari interprets the links as:

http://www.google.com/search?q=webkit&#38;start=10

Which is wrong - the start parameter will not be sent along.

I have attached an example showing the problem:

1.  Open the document
2.  Click the link, and you'll go to page #2 of a Google search on Webkit.
3.  Hit the replace button which will replace the link in a similar way that an Ajax call will work.  Note I took the replace method from the Prototype library.
4.  Now click the link again - you'll see page 1 of the results, *not* page 2.  

And that's the bug - Safari mucks up &amp; is xhtml links, cause all parameters after the first one to be dropped.
Comment 1 Charlie 2008-02-16 23:20:44 PST
Created attachment 19168 [details]
Shows xhtml link handling bug
Comment 2 Charlie 2008-02-16 23:24:42 PST
Note by the way, the example works correctly in Firefox 2.x and Opera 9.20
Comment 3 Alexey Proskuryakov 2008-02-18 03:41:08 PST
Confirmed with r30368 - internally, we are doubly encoding the ampersand as "&amp;#38;".
Comment 4 Charlie 2008-02-21 01:02:58 PST
Any thoughts on fixes?  This issue makes using xhtml/ajax with Safari a bit painful since any replaced links are wrong.
Comment 5 Alexey Proskuryakov 2008-02-21 01:52:45 PST
I think you would get better results by using DOM manipulation methods instead of createContextualFragment. If this is unacceptably hard to re-architecture the code you have, the next thing I'd try is fixing up URL with DOM methods after parsing with createContextualFragment.
Comment 6 Charlie 2008-02-21 19:27:18 PST
DOM manipulation really isn't the best solution in this case because XHTML is being returned from a server.  Thus its a whole lot easier to use createContextualFragment - the alternative would be to use DOM Parser to first parse the content, and then copy it over to the DOM.  But my guess is you'd run into the same bug anyway.

I say that because we did try the trick of looping over all A elements that are inserted and reset their href values.  It didn't work...the same issue occurs.

This really seems like a bug in webkit since inserting html/xhtml text into the DOM works with every browser *except* Safari with xhtml (and obviously IE with xhtml). 

Note that we use xhtml because mapbuzz is a mapping site, and we make heavy use of SVG.  In addition, in some places we atom feeds into xhtml documents to speed up some search results.  Thus we depend on XML namespaces (or IE's proprietary solution of XML data islands).

Thanks for the comments, and we would of course be happy to test any patches.

Comment 7 Alexey Proskuryakov 2008-02-22 03:59:27 PST
Created attachment 19277 [details]
further reduced test case

This looks like a bug in createContextualFragment(), not in URL handling.
Comment 8 Alexey Proskuryakov 2008-02-22 04:02:43 PST
DOMParser seems to work, you can try it as a workaround:

    var html = '<div foo="&amp;"/>'
    var newDoc = (new DOMParser).parseFromString(html, 'application/xml');

Comment 9 Charlie 2008-03-18 14:44:49 PDT
Alexey,

Unfortunately DOMParser does *not* work.

To flesh out your example:

var parser = new DOMParser()
var doc = parser.parseFromString(html.stripScripts(), 'application/xhtml+xml');
var node = document.importNode(doc.documentElement, true)
element.innerHTML = ' ' // the space is needed...WebKit bug
element.appendChild(node)

All these operations work.  But what you end up with is inserted XHTML that has:

* Broken images - the <img> tags are correct but no image shows up
* Broken links - Safari renders the links (they have a blue underline), but if you mouse over them no links show up in the status bar and if you click on them they do not work

After doing the appendChild, alert(element.innerHTML) shows the correct XHTML is inserted.  But the inserted XHTML simply doesn't work...

So this solution is a no-go (and seems to reveal a bunch more bugs).
Comment 10 Alexey Proskuryakov 2008-03-18 23:28:08 PDT
(In reply to comment #9)
> * Broken images - the <img> tags are correct but no image shows up

  That's bug 17897 (not a DOMParser issue).

> * Broken links - Safari renders the links (they have a blue underline), but if
> you mouse over them no links show up in the status bar and if you click on them
> they do not work

  I do not remember seeing a bug for this, but it sounds related.
Comment 11 Charlie 2008-03-18 23:35:35 PDT
Ah, interesting - thanks for the pointer.

Any chance of getting the issue in createContextualFragment() fixed so that the URIs come out right when setting innerHTML in an XHTML document?

Thanks for the help Alexey.
Comment 12 Julien Chaffraix 2008-03-26 19:00:47 PDT
I think I have found the cause of the bug.
We are failing in XMLTokenizer::parseXMLDocumentFragment. The arguments are good but the result of xmlParseBalancedChunkMemory is wrong (it converts "&amp;" to "&#38;")
I had a look at libxml to know why we fail : when you use xmlParseBalancedChunkMemory, the library automatically converts '&' to "&#38" which explains the bug.
A look at libxml suggests that there is two possibilities to solve the bug: correcting the values when creating the DocumentFragment or using another libxml API. Any suggestion on how to proceed here ?
Comment 13 Alexey Proskuryakov 2008-03-28 03:48:46 PDT
What is the other libxml2 API? If it's more suitable for the task, we should definitely consider using it.
Comment 14 Julien Chaffraix 2008-03-28 13:44:16 PDT
(In reply to comment #13)
> What is the other libxml2 API? If it's more suitable for the task, we should
> definitely consider using it.
> 

I had no in mind when writing the mail as my libxml2 knowledge was too weak. After looking a bit more into the code and the documentation, I am pretty sure we will have to switch to another API. The candidate I am testing is xmlParseChunk as we can provide an xmlParseCtxPtr (which will give us more control over the parsing).
Comment 15 Julien Chaffraix 2008-03-29 19:23:36 PDT
> I had no in mind when writing the mail as my libxml2 knowledge was too weak.
> After looking a bit more into the code and the documentation, I am pretty sure
> we will have to switch to another API. The candidate I am testing is
> xmlParseChunk as we can provide an xmlParseCtxPtr (which will give us more
> control over the parsing).
> 

I was wrong about xmlParseChunk. We need a function that can parse a fragment or "content" as defined in the libxml documentation (see http://bugs.webkit.org/show_bug.cgi?id=5142 for more explainations).
There is 4 APIs that does fragment parsing: xmlParseBalancedChunkMemory, xmlParseBalancedChunkMemoryRecover, xmlParseContent and xmlParseInNodeContext.
The first two are out because they are more or less the same and we are trying to find a replacement. The last one does not sound correct for what we want.
I have investigated xmlParseContext with no results for the moment. I will continue to digg that API a bit more.
Comment 16 Julien Chaffraix 2008-04-12 15:02:49 PDT
As advised by Alexey, I have contacted the libxml mailing list to have their view about what we wanted to do and the way to do it (see http://mail.gnome.org/archives/xml/2008-April/msg00017.html).
It seems that parsing a content is quite tricky and the only way is to use the internal function xmlParseContent.
As precised in the link, we need to initialize the parser ourselves (we also need to interpret the errors ourselves) which will lead to some non portable code.
However we already use some internal properties and the addition is quite small so it should be worth the effort of merging some of our parsing code.
Comment 17 Julien Chaffraix 2008-04-12 15:07:03 PDT
Created attachment 20493 [details]
First version: switch DocumentFragment parsing to xmlParseContent

Tested and no regression found.
Comment 18 Alexey Proskuryakov 2008-04-13 23:56:55 PDT
Created attachment 20521 [details]
test contextual parsing

The only problem I can see is that is that createContextualFragment is still not contextual - it doesn't use any context from the node it is invoked on, see this test case. Even in the original test case, the elements would end up in null namespace AFAICT. Daniel says there are many things that are contextual, besides namespaces - are there any others that we should care about?

 483     // XML_PARSE_NODICT: default dictionnary option.

An extra "n" in "dictionary" here.
Comment 19 Julien Chaffraix 2008-04-14 03:50:49 PDT
> The only problem I can see is that is that createContextualFragment is still
> not contextual - it doesn't use any context from the node it is invoked on, see
> this test case. Even in the original test case, the elements would end up in
> null namespace AFAICT. 

IMHO, I think you are wrong here. Check XMLTokenizer::startElementNs: there is a check for the namespace. If none is provided by libxml then we check the prefix against m_prefixToNamespaceMap (which is initialized from the Element provided as a parent). If no prefix was found, we put the node in the default namespace.
(LayoutTests/fast/dom/set-innnerHTML also check for namespace and I had a hard time with it)

> Daniel says there are many things that are contextual,
> besides namespaces - are there any others that we should care about?

I would say that others are internal to libxml (like tree structure or parser state to detect if the tree is well formed).

I have tried to match libxml behaviour to avoid any regression here (preferring portable API whenever I could).

>  483     // XML_PARSE_NODICT: default dictionnary option.
> 
> An extra "n" in "dictionary" here.
> 
Oups...

There is also a memory leak as we never free m_context.
Comment 20 Alexey Proskuryakov 2008-04-14 04:22:01 PDT
Comment on attachment 20493 [details]
First version: switch DocumentFragment parsing to xmlParseContent

> IMHO, I think you are wrong here.

Indeed, I didn't realize that this logic was in WebCore, not libxml.

r=me.
Comment 21 Julien Chaffraix 2008-04-14 04:53:11 PDT
Committed in r31860.
Comment 22 Charlie 2008-04-14 08:13:43 PDT
Thanks everyone for the quick fix - I'll give it a spin in one of this week's nightly builds.
Comment 23 Lucas Forschler 2019-02-06 09:02:36 PST
Mass moving XML DOM bugs to the "DOM" Component.