Bug 6452

Summary: KURL::appendEscapingBadChars() doesn't know about %u-escaping.
Product: WebKit Reporter: Henrik Grubbström <grubba>
Component: DOMAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: ap, cdumez
Priority: P2    
Version: 420+   
Hardware: Mac   
OS: OS X 10.4   
Attachments:
Description Flags
Suggested patch for KWQKURL.mm::appendEscapingBadChars()
darin: review-
Now with ChangeLog entry.
darin: review-
don't escape % characters mjs: review+

Henrik Grubbström
Reported 2006-01-09 08:54:03 PST
Sending a %u-escaped string (eg as generated by escape() for unicode data) to XMLHttpRequest::open() will cause any %u's to be escaped an extra time (ie the webserver will receive %25u2014 instead of %u2014). The likely culprit seems to be WebCore/kwq/KWQKURL.mm::appendEscapingBadChars(), which doesn't know about %u-style escaping. A patch similar to the following ought to fix the problem: --- KWQKURL.mm.orig Mon Jan 9 17:43:02 2006 +++ KWQKURL.mm Mon Jan 9 17:46:04 2006 @@ -981,6 +981,16 @@ *p++ = c; *p++ = *str++; *p++ = *str++; + } else if (c == '%' && strEnd - str >= 5 && + (str[0] == 'u' || str[0] == 'U') && + isHexDigit(str[0]) && isHexDigit(str[1]) && + isHexDigit(str[2]) && isHexDigit(str[3])) { + *p++ = c; + *p++ = *str++; + *p++ = *str++; + *p++ = *str++; + *p++ = *str++; + *p++ = *str++; } else if (c == '?') { *p++ = c; } else {
Attachments
Suggested patch for KWQKURL.mm::appendEscapingBadChars() (631 bytes, patch)
2006-01-10 10:01 PST, Henrik Grubbström
darin: review-
Now with ChangeLog entry. (1.04 KB, patch)
2006-01-10 10:15 PST, Henrik Grubbström
darin: review-
don't escape % characters (2.47 KB, patch)
2006-02-14 12:21 PST, Alexey Proskuryakov
mjs: review+
Alexey Proskuryakov
Comment 1 2006-01-10 05:14:22 PST
Do you have a test case that behaves differently in Safari and Firefox (or Internet Explorer)? The problem is not entirely clear to me from the description, sorry.
Henrik Grubbström
Comment 2 2006-01-10 05:47:26 PST
I've verified that my javascript code works as expected in Firefox 1.5. It's hard to do a portable test case, since the typical use is in AJAX code, but here's an extract: index.js: var isSafari = navigator.appVersion.match("AppleWebKit"); function xmlhttpRequest(filepath, callback) { var xmlhttp; if(window.XMLHttpRequest) { // Mozilla, Safari, ... xmlhttp = new XMLHttpRequest(); } else if (window.ActiveXObject) { // IE xmlhttp = new ActiveXObject("Microsoft.XMLHTTP"); } xmlhttp.open("GET", filepath, true); xmlhttp.onreadystatechange = function() { var value = callback(xmlhttp); return value; }; xmlhttp.send(null) } function getResponseXML(xmlhttp) { var doc = document.importNode(xmlhttp.responseXML.documentElement, true); if(isSafari) { // Safari doesn't initialise all node attributes (name, class etc.) // properly without this workaround. doc.innerHTML = doc.innerHTML; } return doc; } function editGetFields(tag_list, data) { for(var i = 0; i < tag_list.length; i++) { var name = tag_list.item(i).getAttribute("name"); if(name) { data[name] = tag_list.item(i).value; } } } function editTextSave(button, close) { var div_edit = document.getElementById("document-editor"); var document_id = div_edit.getAttribute("name"); var article_id = getArticleId(div_edit); var article_tr = document.getElementById("article-open-" + article_id); var data = new Object; editGetFields(div_edit.getElementsByTagName("input"), data); editGetFields(div_edit.getElementsByTagName("textarea"), data); var callback = function(xmlhttp) { if(xmlhttp.readyState != 4) return; var table_db = getResponseXML(xmlhttp); var tr_db = table_db.getElementsByTagName("tr").item(0); article_tr.parentNode.replaceChild(tr_db, article_tr); if(close) { displayArticleList(); } } xmlhttpRequest("actions/edit-text-save.xml?" + "article_id="+escape(article_id) + "&" + "document_id="+escape(document_id) + "&" + "finished="+escape(data.finished) + "&" + "headline="+escape(data.headline) + "&" + "subheadline="+escape(data.subheadline) + "&" + "keywords="+escape(data.keywords) + "&" + "blurb="+escape(data.blurb) + "&" + "byline="+escape(data.byline) + "&" + "notes="+escape(data.notes) + "&" + "story="+escape(data.story), callback); return false; } If one of data.x contains a unicode character outside ISO-8859-1 (like eg \u2014 (EM_DASH)), escape() above will encode it with %u-style encoding (eg %u2014). This is as expected. However when the resulting request is received by the http server the DWIM in XMLHttpRequest::open() will have quoted it an extra time (to eg %25u2014) when sent by Safari: ceylon.roxen.com - - [09/Jan/2006:17:15:44 +0100] "GET /es/actions/edit-text-sav e.xml?article_id=441&document_id=324&finished=on&headline=Katastrofhj%E4lp%20_%2 0ny%20exportsatsning&subheadline=&keywords=&blurb=&byline=&notes=&story=%E5%E4%F 6%20%25uACA1%25uAC94%25uAC84%0A%0AKatastrofhj%E4lp%20%25u2014%20ny%20exportsatsn ing%20%E5%E4%F6%20%BFSpanska%20fr%E5gor%3F%2075%20%25u2031%20%25u260E%0A%0AStora %20katastrofer%20skapar%20lidande%20och%20n%F6d%20%25u2014%20men%20ocks%E5%20exp [...] Note that there are several cases of double-escaped unicode characters above. But not with Firefox: shipon.roxen.com - - [09/Jan/2006:16:23:33 +0100] "GET /es/actions/edit-text-sav e.xml?article_id=441&document_id=324&finished=on&headline=Katastrofhj%E4lp%20_%2 0ny%20exportsatsning&subheadline=&keywords=&blurb=&byline=&notes=&story=%u6CE8%u 6587%u65B9%u6CD5%u304C%u3072%u3068%u76EE%u3067%u308F%u304B%u308B%u30AC%u30A4%u30 C9%u30C4%u30A2%u30FC%u306F%u3053%u3061%u3089%u3002%0A%u260E%20%E5%E4%F6%2075%20% [...] Above unicode characters have been escaped a single time (as expected).
Alexey Proskuryakov
Comment 3 2006-01-10 09:52:55 PST
Yes, I can see this with a simple test html and tcpflow. Would you like to submit the patch for review (by adding ChangeLog notes, attaching the patch as a file and setting a review flag)?
Henrik Grubbström
Comment 4 2006-01-10 10:01:50 PST
Created attachment 5599 [details] Suggested patch for KWQKURL.mm::appendEscapingBadChars()
Henrik Grubbström
Comment 5 2006-01-10 10:15:39 PST
Created attachment 5600 [details] Now with ChangeLog entry.
Alexey Proskuryakov
Comment 6 2006-01-10 10:33:06 PST
My test case: <http://nypop.com/~ap/webkit/xmlhttp-url-encoding.html> (too lazy to make one that doesn't require tcpflow, and no idea about how to make an automated one).
Darin Adler
Comment 7 2006-01-10 15:08:06 PST
Comment on attachment 5599 [details] Suggested patch for KWQKURL.mm::appendEscapingBadChars() First, thanks for submitting this patch. It seems quite important to fix this bug! But I don't think this fix is correct. I think we should either be escaping all % characters in the string or not escaping any. It doesn't seem right to decide whether to escape the % character based on whether it's followed by u201F (for example) or not. If we shouldn't be escaping "%" characters in this case, then we need to change KURL behavior, perhaps adding a parameter if we don't always want that behavior.
Darin Adler
Comment 8 2006-01-10 19:49:46 PST
Comment on attachment 5600 [details] Now with ChangeLog entry. OK, but my review comment stands (see comment 7 above). I don't think this is the correct fix.
Henrik Grubbström
Comment 9 2006-01-11 01:39:04 PST
I agree that having DWIM's like appendEscapingBadChars() is generally a bad idea to begin with, but now that it's there, it should either be fixed or removed entirely (which might break code that doesn't perform escaping properly, and thus relies on it being there). The alternative of escaping all "%" characters would break compatibility with eg Firefox even more.
Alexey Proskuryakov
Comment 10 2006-01-11 05:05:32 PST
I have updated the test case a bit to show more different cases. In the test, both MSIE and Firefox never escape % characters (MSIE doesn't even escape non-ASCII characters, actually). Looks like the DWIM code first appeared in a massive KURL rewrite between revisions 1905 and 1906; so it's not entirely obvious if it serves any purpose.
Alexey Proskuryakov
Comment 11 2006-02-14 12:21:52 PST
Created attachment 6486 [details] don't escape % characters I don't really see how to check all code paths that end up in appendEscapingBadChars()... There is a regression test included in the patch, and "everything else" still seems to work fine - which is admittedly less than a perfect evidence.
Maciej Stachowiak
Comment 12 2006-02-14 18:05:24 PST
Comment on attachment 6486 [details] don't escape % characters Based on your test case I think never escaping % is the right thing. I think I am the one who did the rewrite and I didn't have a specific reason for escaping %. r=me
Brett Wilson (Google)
Comment 13 2007-09-26 13:25:54 PDT
Note that IE7 changes this behavior. It rejects all invalid escape sequences and will not load the URL. This includes the %uXXXX form.
Lucas Forschler
Comment 14 2019-02-06 09:03:07 PST
Mass moving XML DOM bugs to the "DOM" Component.
Note You need to log in before you can comment on or make changes to this bug.