6452 – KURL::appendEscapingBadChars() doesn't know about %u-escaping.

RESOLVED FIXED 6452

KURL::appendEscapingBadChars() doesn't know about %u-escaping.

https://bugs.webkit.org/show_bug.cgi?id=6452

Summary KURL::appendEscapingBadChars() doesn't know about %u-escaping.

Henrik Grubbström

Reported 2006-01-09 08:54:03 PST

Sending a %u-escaped string (eg as generated by escape() for unicode data) to XMLHttpRequest::open() will cause any %u's to be escaped an extra time (ie the webserver will receive %25u2014 instead of %u2014). The likely culprit seems to be WebCore/kwq/KWQKURL.mm::appendEscapingBadChars(), which doesn't know about %u-style escaping. A patch similar to the following ought to fix the problem: --- KWQKURL.mm.orig Mon Jan 9 17:43:02 2006 +++ KWQKURL.mm Mon Jan 9 17:46:04 2006 @@ -981,6 +981,16 @@ *p++ = c; *p++ = *str++; *p++ = *str++; + } else if (c == '%' && strEnd - str >= 5 && + (str[0] == 'u' || str[0] == 'U') && + isHexDigit(str[0]) && isHexDigit(str[1]) && + isHexDigit(str[2]) && isHexDigit(str[3])) { + *p++ = c; + *p++ = *str++; + *p++ = *str++; + *p++ = *str++; + *p++ = *str++; + *p++ = *str++; } else if (c == '?') { *p++ = c; } else {

Attachments
Suggested patch for KWQKURL.mm::appendEscapingBadChars() (631 bytes, patch) 2006-01-10 10:01 PST, Henrik Grubbström	darin: review-	Details Formatted Diff Diff
Now with ChangeLog entry. (1.04 KB, patch) 2006-01-10 10:15 PST, Henrik Grubbström	darin: review-	Details Formatted Diff Diff
don't escape % characters (2.47 KB, patch) 2006-02-14 12:21 PST, Alexey Proskuryakov	mjs: review+	Details Formatted Diff Diff
Show Obsolete (2) View All Add attachment proposed patch, testcase, etc.

Alexey Proskuryakov

Comment 1 2006-01-10 05:14:22 PST

Do you have a test case that behaves differently in Safari and Firefox (or Internet Explorer)? The problem is not entirely clear to me from the description, sorry.

Henrik Grubbström

Comment 2 2006-01-10 05:47:26 PST

I've verified that my javascript code works as expected in Firefox 1.5. It's hard to do a portable test case, since the typical use is in AJAX code, but here's an extract: index.js: var isSafari = navigator.appVersion.match("AppleWebKit"); function xmlhttpRequest(filepath, callback) { var xmlhttp; if(window.XMLHttpRequest) { // Mozilla, Safari, ... xmlhttp = new XMLHttpRequest(); } else if (window.ActiveXObject) { // IE xmlhttp = new ActiveXObject("Microsoft.XMLHTTP"); } xmlhttp.open("GET", filepath, true); xmlhttp.onreadystatechange = function() { var value = callback(xmlhttp); return value; }; xmlhttp.send(null) } function getResponseXML(xmlhttp) { var doc = document.importNode(xmlhttp.responseXML.documentElement, true); if(isSafari) { // Safari doesn't initialise all node attributes (name, class etc.) // properly without this workaround. doc.innerHTML = doc.innerHTML; } return doc; } function editGetFields(tag_list, data) { for(var i = 0; i < tag_list.length; i++) { var name = tag_list.item(i).getAttribute("name"); if(name) { data[name] = tag_list.item(i).value; } } } function editTextSave(button, close) { var div_edit = document.getElementById("document-editor"); var document_id = div_edit.getAttribute("name"); var article_id = getArticleId(div_edit); var article_tr = document.getElementById("article-open-" + article_id); var data = new Object; editGetFields(div_edit.getElementsByTagName("input"), data); editGetFields(div_edit.getElementsByTagName("textarea"), data); var callback = function(xmlhttp) { if(xmlhttp.readyState != 4) return; var table_db = getResponseXML(xmlhttp); var tr_db = table_db.getElementsByTagName("tr").item(0); article_tr.parentNode.replaceChild(tr_db, article_tr); if(close) { displayArticleList(); } } xmlhttpRequest("actions/edit-text-save.xml?" + "article_id="+escape(article_id) + "&" + "document_id="+escape(document_id) + "&" + "finished="+escape(data.finished) + "&" + "headline="+escape(data.headline) + "&" + "subheadline="+escape(data.subheadline) + "&" + "keywords="+escape(data.keywords) + "&" + "blurb="+escape(data.blurb) + "&" + "byline="+escape(data.byline) + "&" + "notes="+escape(data.notes) + "&" + "story="+escape(data.story), callback); return false; } If one of data.x contains a unicode character outside ISO-8859-1 (like eg \u2014 (EM_DASH)), escape() above will encode it with %u-style encoding (eg %u2014). This is as expected. However when the resulting request is received by the http server the DWIM in XMLHttpRequest::open() will have quoted it an extra time (to eg %25u2014) when sent by Safari: ceylon.roxen.com - - [09/Jan/2006:17:15:44 +0100] "GET /es/actions/edit-text-sav e.xml?article_id=441&document_id=324&finished=on&headline=Katastrofhj%E4lp%20_%2 0ny%20exportsatsning&subheadline=&keywords=&blurb=&byline=&notes=&story=%E5%E4%F 6%20%25uACA1%25uAC94%25uAC84%0A%0AKatastrofhj%E4lp%20%25u2014%20ny%20exportsatsn ing%20%E5%E4%F6%20%BFSpanska%20fr%E5gor%3F%2075%20%25u2031%20%25u260E%0A%0AStora %20katastrofer%20skapar%20lidande%20och%20n%F6d%20%25u2014%20men%20ocks%E5%20exp [...] Note that there are several cases of double-escaped unicode characters above. But not with Firefox: shipon.roxen.com - - [09/Jan/2006:16:23:33 +0100] "GET /es/actions/edit-text-sav e.xml?article_id=441&document_id=324&finished=on&headline=Katastrofhj%E4lp%20_%2 0ny%20exportsatsning&subheadline=&keywords=&blurb=&byline=&notes=&story=%u6CE8%u 6587%u65B9%u6CD5%u304C%u3072%u3068%u76EE%u3067%u308F%u304B%u308B%u30AC%u30A4%u30 C9%u30C4%u30A2%u30FC%u306F%u3053%u3061%u3089%u3002%0A%u260E%20%E5%E4%F6%2075%20% [...] Above unicode characters have been escaped a single time (as expected).

Alexey Proskuryakov

Comment 3 2006-01-10 09:52:55 PST

Yes, I can see this with a simple test html and tcpflow. Would you like to submit the patch for review (by adding ChangeLog notes, attaching the patch as a file and setting a review flag)?

Henrik Grubbström

Comment 4 2006-01-10 10:01:50 PST

Created attachment 5599 [details] Suggested patch for KWQKURL.mm::appendEscapingBadChars()

Henrik Grubbström

Comment 5 2006-01-10 10:15:39 PST

Created attachment 5600 [details] Now with ChangeLog entry.

Alexey Proskuryakov

Comment 6 2006-01-10 10:33:06 PST

My test case: <http://nypop.com/~ap/webkit/xmlhttp-url-encoding.html> (too lazy to make one that doesn't require tcpflow, and no idea about how to make an automated one).

Darin Adler

Comment 7 2006-01-10 15:08:06 PST

Comment on attachment 5599 [details] Suggested patch for KWQKURL.mm::appendEscapingBadChars() First, thanks for submitting this patch. It seems quite important to fix this bug! But I don't think this fix is correct. I think we should either be escaping all % characters in the string or not escaping any. It doesn't seem right to decide whether to escape the % character based on whether it's followed by u201F (for example) or not. If we shouldn't be escaping "%" characters in this case, then we need to change KURL behavior, perhaps adding a parameter if we don't always want that behavior.

Darin Adler

Comment 8 2006-01-10 19:49:46 PST

Comment on attachment 5600 [details] Now with ChangeLog entry. OK, but my review comment stands (see comment 7 above). I don't think this is the correct fix.

Henrik Grubbström

Comment 9 2006-01-11 01:39:04 PST

I agree that having DWIM's like appendEscapingBadChars() is generally a bad idea to begin with, but now that it's there, it should either be fixed or removed entirely (which might break code that doesn't perform escaping properly, and thus relies on it being there). The alternative of escaping all "%" characters would break compatibility with eg Firefox even more.

Alexey Proskuryakov

Comment 10 2006-01-11 05:05:32 PST

I have updated the test case a bit to show more different cases. In the test, both MSIE and Firefox never escape % characters (MSIE doesn't even escape non-ASCII characters, actually). Looks like the DWIM code first appeared in a massive KURL rewrite between revisions 1905 and 1906; so it's not entirely obvious if it serves any purpose.

Alexey Proskuryakov

Comment 11 2006-02-14 12:21:52 PST

Created attachment 6486 [details] don't escape % characters I don't really see how to check all code paths that end up in appendEscapingBadChars()... There is a regression test included in the patch, and "everything else" still seems to work fine - which is admittedly less than a perfect evidence.

Maciej Stachowiak

Comment 12 2006-02-14 18:05:24 PST

Comment on attachment 6486 [details] don't escape % characters Based on your test case I think never escaping % is the right thing. I think I am the one who did the rewrite and I didn't have a specific reason for escaping %. r=me

Brett Wilson (Google)

Comment 13 2007-09-26 13:25:54 PDT

Note that IE7 changes this behavior. It rejects all invalid escape sequences and will not load the URL. This includes the %uXXXX form.

Lucas Forschler

Comment 14 2019-02-06 09:03:07 PST

Mass moving XML DOM bugs to the "DOM" Component.

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version 420+

Hardware Mac

OS OS X 10.4

Product WebKit

Component DOM

Assignee

Nobody

Reported

2006-01-09 08:54 PST

Modified

2019-02-06 09:03 PST History

CC List

2 users Show

URL

Keywords

Depends on

Blocks