17403 – WebKit Creates Invalid Xhtml Links with Ajax

RESOLVED FIXED17403

WebKit Creates Invalid Xhtml Links with Ajax

https://bugs.webkit.org/show_bug.cgi?id=17403

Summary WebKit Creates Invalid Xhtml Links with Ajax

Charlie

Reported 2008-02-16 23:19:51 PST

In xhtml, links must be escaped like this: <a id="search_link" href="http://www.google.com/search?q=webkit&start=10">Original Search</a> Safari 3.x correctly handles this when loading an XHTML document, but not when replacing parts of the document using Ajax. What happens is Safari interprets the links as: http://www.google.com/search?q=webkit&start=10 Which is wrong - the start parameter will not be sent along. I have attached an example showing the problem: 1. Open the document 2. Click the link, and you'll go to page #2 of a Google search on Webkit. 3. Hit the replace button which will replace the link in a similar way that an Ajax call will work. Note I took the replace method from the Prototype library. 4. Now click the link again - you'll see page 1 of the results, *not* page 2. And that's the bug - Safari mucks up & is xhtml links, cause all parameters after the first one to be dropped.

Attachments
Shows xhtml link handling bug (918 bytes, application/xhtml+xml) 2008-02-16 23:20 PST, Charlie	no flags	Details
further reduced test case (551 bytes, application/xhtml+xml) 2008-02-22 03:59 PST, Alexey Proskuryakov	no flags	Details
First version: switch DocumentFragment parsing to xmlParseContent (11.11 KB, patch) 2008-04-12 15:07 PDT, Julien Chaffraix	ap: review+	Details Formatted Diff Diff
test contextual parsing (600 bytes, application/xhtml+xml) 2008-04-13 23:56 PDT, Alexey Proskuryakov	no flags	Details
View All Add attachment proposed patch, testcase, etc.

Charlie

Comment 1 2008-02-16 23:20:44 PST

Created attachment 19168 [details] Shows xhtml link handling bug

Charlie

Comment 2 2008-02-16 23:24:42 PST

Note by the way, the example works correctly in Firefox 2.x and Opera 9.20

Alexey Proskuryakov

Comment 3 2008-02-18 03:41:08 PST

Confirmed with r30368 - internally, we are doubly encoding the ampersand as "&#38;".

Charlie

Comment 4 2008-02-21 01:02:58 PST

Any thoughts on fixes? This issue makes using xhtml/ajax with Safari a bit painful since any replaced links are wrong.

Alexey Proskuryakov

Comment 5 2008-02-21 01:52:45 PST

I think you would get better results by using DOM manipulation methods instead of createContextualFragment. If this is unacceptably hard to re-architecture the code you have, the next thing I'd try is fixing up URL with DOM methods after parsing with createContextualFragment.

Charlie

Comment 6 2008-02-21 19:27:18 PST

DOM manipulation really isn't the best solution in this case because XHTML is being returned from a server. Thus its a whole lot easier to use createContextualFragment - the alternative would be to use DOM Parser to first parse the content, and then copy it over to the DOM. But my guess is you'd run into the same bug anyway. I say that because we did try the trick of looping over all A elements that are inserted and reset their href values. It didn't work...the same issue occurs. This really seems like a bug in webkit since inserting html/xhtml text into the DOM works with every browser *except* Safari with xhtml (and obviously IE with xhtml). Note that we use xhtml because mapbuzz is a mapping site, and we make heavy use of SVG. In addition, in some places we atom feeds into xhtml documents to speed up some search results. Thus we depend on XML namespaces (or IE's proprietary solution of XML data islands). Thanks for the comments, and we would of course be happy to test any patches.

Alexey Proskuryakov

Comment 7 2008-02-22 03:59:27 PST

Created attachment 19277 [details] further reduced test case This looks like a bug in createContextualFragment(), not in URL handling.

Alexey Proskuryakov

Comment 8 2008-02-22 04:02:43 PST

DOMParser seems to work, you can try it as a workaround: var html = '<div foo="&"/>' var newDoc = (new DOMParser).parseFromString(html, 'application/xml');

Charlie

Comment 9 2008-03-18 14:44:49 PDT

Alexey, Unfortunately DOMParser does *not* work. To flesh out your example: var parser = new DOMParser() var doc = parser.parseFromString(html.stripScripts(), 'application/xhtml+xml'); var node = document.importNode(doc.documentElement, true) element.innerHTML = ' ' // the space is needed...WebKit bug element.appendChild(node) All these operations work. But what you end up with is inserted XHTML that has: * Broken images - the <img> tags are correct but no image shows up * Broken links - Safari renders the links (they have a blue underline), but if you mouse over them no links show up in the status bar and if you click on them they do not work After doing the appendChild, alert(element.innerHTML) shows the correct XHTML is inserted. But the inserted XHTML simply doesn't work... So this solution is a no-go (and seems to reveal a bunch more bugs).

Alexey Proskuryakov

Comment 10 2008-03-18 23:28:08 PDT

(In reply to comment #9) > * Broken images - the <img> tags are correct but no image shows up That's bug 17897 (not a DOMParser issue). > * Broken links - Safari renders the links (they have a blue underline), but if > you mouse over them no links show up in the status bar and if you click on them > they do not work I do not remember seeing a bug for this, but it sounds related.

Charlie

Comment 11 2008-03-18 23:35:35 PDT

Ah, interesting - thanks for the pointer. Any chance of getting the issue in createContextualFragment() fixed so that the URIs come out right when setting innerHTML in an XHTML document? Thanks for the help Alexey.

Julien Chaffraix

Comment 12 2008-03-26 19:00:47 PDT

I think I have found the cause of the bug. We are failing in XMLTokenizer::parseXMLDocumentFragment. The arguments are good but the result of xmlParseBalancedChunkMemory is wrong (it converts "&" to "&") I had a look at libxml to know why we fail : when you use xmlParseBalancedChunkMemory, the library automatically converts '&' to "&#38" which explains the bug. A look at libxml suggests that there is two possibilities to solve the bug: correcting the values when creating the DocumentFragment or using another libxml API. Any suggestion on how to proceed here ?

Alexey Proskuryakov

Comment 13 2008-03-28 03:48:46 PDT

What is the other libxml2 API? If it's more suitable for the task, we should definitely consider using it.

Julien Chaffraix

Comment 14 2008-03-28 13:44:16 PDT

(In reply to comment #13) > What is the other libxml2 API? If it's more suitable for the task, we should > definitely consider using it. > I had no in mind when writing the mail as my libxml2 knowledge was too weak. After looking a bit more into the code and the documentation, I am pretty sure we will have to switch to another API. The candidate I am testing is xmlParseChunk as we can provide an xmlParseCtxPtr (which will give us more control over the parsing).

Julien Chaffraix

Comment 15 2008-03-29 19:23:36 PDT

> I had no in mind when writing the mail as my libxml2 knowledge was too weak. > After looking a bit more into the code and the documentation, I am pretty sure > we will have to switch to another API. The candidate I am testing is > xmlParseChunk as we can provide an xmlParseCtxPtr (which will give us more > control over the parsing). > I was wrong about xmlParseChunk. We need a function that can parse a fragment or "content" as defined in the libxml documentation (see http://bugs.webkit.org/show_bug.cgi?id=5142 for more explainations). There is 4 APIs that does fragment parsing: xmlParseBalancedChunkMemory, xmlParseBalancedChunkMemoryRecover, xmlParseContent and xmlParseInNodeContext. The first two are out because they are more or less the same and we are trying to find a replacement. The last one does not sound correct for what we want. I have investigated xmlParseContext with no results for the moment. I will continue to digg that API a bit more.

Julien Chaffraix

Comment 16 2008-04-12 15:02:49 PDT

As advised by Alexey, I have contacted the libxml mailing list to have their view about what we wanted to do and the way to do it (see http://mail.gnome.org/archives/xml/2008-April/msg00017.html). It seems that parsing a content is quite tricky and the only way is to use the internal function xmlParseContent. As precised in the link, we need to initialize the parser ourselves (we also need to interpret the errors ourselves) which will lead to some non portable code. However we already use some internal properties and the addition is quite small so it should be worth the effort of merging some of our parsing code.

Julien Chaffraix

Comment 17 2008-04-12 15:07:03 PDT

Created attachment 20493 [details] First version: switch DocumentFragment parsing to xmlParseContent Tested and no regression found.

Alexey Proskuryakov

Comment 18 2008-04-13 23:56:55 PDT

Created attachment 20521 [details] test contextual parsing The only problem I can see is that is that createContextualFragment is still not contextual - it doesn't use any context from the node it is invoked on, see this test case. Even in the original test case, the elements would end up in null namespace AFAICT. Daniel says there are many things that are contextual, besides namespaces - are there any others that we should care about? 483 // XML_PARSE_NODICT: default dictionnary option. An extra "n" in "dictionary" here.

Julien Chaffraix

Comment 19 2008-04-14 03:50:49 PDT

> The only problem I can see is that is that createContextualFragment is still > not contextual - it doesn't use any context from the node it is invoked on, see > this test case. Even in the original test case, the elements would end up in > null namespace AFAICT. IMHO, I think you are wrong here. Check XMLTokenizer::startElementNs: there is a check for the namespace. If none is provided by libxml then we check the prefix against m_prefixToNamespaceMap (which is initialized from the Element provided as a parent). If no prefix was found, we put the node in the default namespace. (LayoutTests/fast/dom/set-innnerHTML also check for namespace and I had a hard time with it) > Daniel says there are many things that are contextual, > besides namespaces - are there any others that we should care about? I would say that others are internal to libxml (like tree structure or parser state to detect if the tree is well formed). I have tried to match libxml behaviour to avoid any regression here (preferring portable API whenever I could). > 483 // XML_PARSE_NODICT: default dictionnary option. > > An extra "n" in "dictionary" here. > Oups... There is also a memory leak as we never free m_context.

Alexey Proskuryakov

Comment 20 2008-04-14 04:22:01 PDT

Comment on attachment 20493 [details] First version: switch DocumentFragment parsing to xmlParseContent > IMHO, I think you are wrong here. Indeed, I didn't realize that this logic was in WebCore, not libxml. r=me.

Julien Chaffraix

Comment 21 2008-04-14 04:53:11 PDT

Committed in r31860.

Charlie

Comment 22 2008-04-14 08:13:43 PDT

Thanks everyone for the quick fix - I'll give it a spin in one of this week's nightly builds.

Lucas Forschler

Comment 23 2019-02-06 09:02:36 PST

Mass moving XML DOM bugs to the "DOM" Component.

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Major

Classification Unclassified

Version 528+ (Nightly build)

Hardware All

OS All

Product WebKit

Component DOM

Assignee

Nobody

Reported

2008-02-16 23:19 PST

Modified

2019-02-06 09:02 PST History

CC List

4 users Show

URL

Keywords

Depends on

Blocks