Bug 6452 - KURL::appendEscapingBadChars() doesn't know about %u-escaping.
Summary: KURL::appendEscapingBadChars() doesn't know about %u-escaping.
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: DOM (show other bugs)
Version: 420+
Hardware: Macintosh OS X 10.4
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-01-09 08:54 PST by Henrik Grubbström
Modified: 2019-02-06 09:03 PST (History)
2 users (show)

See Also:


Attachments
Suggested patch for KWQKURL.mm::appendEscapingBadChars() (631 bytes, patch)
2006-01-10 10:01 PST, Henrik Grubbström
darin: review-
Details | Formatted Diff | Diff
Now with ChangeLog entry. (1.04 KB, patch)
2006-01-10 10:15 PST, Henrik Grubbström
darin: review-
Details | Formatted Diff | Diff
don't escape % characters (2.47 KB, patch)
2006-02-14 12:21 PST, Alexey Proskuryakov
mjs: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Henrik Grubbström 2006-01-09 08:54:03 PST
Sending a %u-escaped string (eg as generated by escape() for unicode data) to
XMLHttpRequest::open() will cause any %u's to be escaped an extra time (ie the
webserver will receive %25u2014 instead of %u2014). The likely culprit seems to
be WebCore/kwq/KWQKURL.mm::appendEscapingBadChars(), which doesn't know about
%u-style escaping.

A patch similar to the following ought to fix the problem:

--- KWQKURL.mm.orig     Mon Jan  9 17:43:02 2006
+++ KWQKURL.mm  Mon Jan  9 17:46:04 2006
@@ -981,6 +981,16 @@
                 *p++ = c;
                 *p++ = *str++;
                 *p++ = *str++;
+           } else if (c == '%' && strEnd - str >= 5 &&
+                      (str[0] == 'u' || str[0] == 'U') &&
+                      isHexDigit(str[0]) && isHexDigit(str[1]) &&
+                      isHexDigit(str[2]) && isHexDigit(str[3])) {
+                *p++ = c;
+                *p++ = *str++;
+                *p++ = *str++;
+                *p++ = *str++;
+                *p++ = *str++;
+                *p++ = *str++;
             } else if (c == '?') {
                 *p++ = c;
             } else {
Comment 1 Alexey Proskuryakov 2006-01-10 05:14:22 PST
Do you have a test case that behaves differently in Safari and Firefox (or Internet Explorer)? The problem is 
not entirely clear to me from the description, sorry.
Comment 2 Henrik Grubbström 2006-01-10 05:47:26 PST
I've verified that my javascript code works as expected in Firefox 1.5.

It's hard to do a portable test case, since the typical use is in AJAX code, but
here's an extract:

index.js:

var isSafari = navigator.appVersion.match("AppleWebKit");

    function xmlhttpRequest(filepath, callback)
    {
      var xmlhttp;
      if(window.XMLHttpRequest)
      { // Mozilla, Safari, ...
        xmlhttp = new XMLHttpRequest();
      }
      else if (window.ActiveXObject)
      { // IE
        xmlhttp = new ActiveXObject("Microsoft.XMLHTTP");
      }
      
      xmlhttp.open("GET", filepath, true);
      xmlhttp.onreadystatechange = function()
      {
        var value = callback(xmlhttp);
	return value;
      };
      xmlhttp.send(null)
    }

    function getResponseXML(xmlhttp)
    {
      var doc = document.importNode(xmlhttp.responseXML.documentElement, true);
      
      if(isSafari)
      {
        // Safari doesn't initialise all node attributes (name, class etc.)
	// properly without this workaround.
        doc.innerHTML = doc.innerHTML;
      }

      return doc;
    }

    function editGetFields(tag_list, data)
    {
      for(var i = 0; i < tag_list.length; i++)
      {
        var name = tag_list.item(i).getAttribute("name");
        if(name)
	{
	  data[name] = tag_list.item(i).value;
	}
      }
    }
       
    function editTextSave(button, close)
    {
      var div_edit = document.getElementById("document-editor");
      var document_id = div_edit.getAttribute("name");

      var article_id = getArticleId(div_edit);
      var article_tr = document.getElementById("article-open-" + article_id);
    
      var data = new Object;
      editGetFields(div_edit.getElementsByTagName("input"), data);
      editGetFields(div_edit.getElementsByTagName("textarea"), data);

      var callback = function(xmlhttp)
      {
        if(xmlhttp.readyState != 4)
	  return;

	var table_db = getResponseXML(xmlhttp);
	var tr_db = table_db.getElementsByTagName("tr").item(0);
	  
	article_tr.parentNode.replaceChild(tr_db, article_tr);
	  
	if(close)
	{
	  displayArticleList();
	}
      }

      xmlhttpRequest("actions/edit-text-save.xml?" +
                     "article_id="+escape(article_id) + "&" +
                     "document_id="+escape(document_id) + "&" +
		     "finished="+escape(data.finished) + "&" +
		     "headline="+escape(data.headline) + "&" +
		     "subheadline="+escape(data.subheadline) + "&" +
		     "keywords="+escape(data.keywords) + "&" +
		     "blurb="+escape(data.blurb) + "&" +
		     "byline="+escape(data.byline) + "&" +
		     "notes="+escape(data.notes) + "&" +
		     "story="+escape(data.story),
		     callback);
		     
      return false;
    }

If one of data.x contains a unicode character outside ISO-8859-1 (like eg \u2014
(EM_DASH)), escape() above will encode it with %u-style encoding (eg %u2014).
This is as expected. However when the resulting request is received by the http
server the DWIM in XMLHttpRequest::open() will have quoted it an extra time (to
eg %25u2014) when sent by Safari:

ceylon.roxen.com - - [09/Jan/2006:17:15:44 +0100] "GET /es/actions/edit-text-sav
e.xml?article_id=441&document_id=324&finished=on&headline=Katastrofhj%E4lp%20_%2
0ny%20exportsatsning&subheadline=&keywords=&blurb=&byline=&notes=&story=%E5%E4%F
6%20%25uACA1%25uAC94%25uAC84%0A%0AKatastrofhj%E4lp%20%25u2014%20ny%20exportsatsn
ing%20%E5%E4%F6%20%BFSpanska%20fr%E5gor%3F%2075%20%25u2031%20%25u260E%0A%0AStora
%20katastrofer%20skapar%20lidande%20och%20n%F6d%20%25u2014%20men%20ocks%E5%20exp
[...]

Note that there are several cases of double-escaped unicode characters above.

But not with Firefox:

shipon.roxen.com - - [09/Jan/2006:16:23:33 +0100] "GET /es/actions/edit-text-sav
e.xml?article_id=441&document_id=324&finished=on&headline=Katastrofhj%E4lp%20_%2
0ny%20exportsatsning&subheadline=&keywords=&blurb=&byline=&notes=&story=%u6CE8%u
6587%u65B9%u6CD5%u304C%u3072%u3068%u76EE%u3067%u308F%u304B%u308B%u30AC%u30A4%u30
C9%u30C4%u30A2%u30FC%u306F%u3053%u3061%u3089%u3002%0A%u260E%20%E5%E4%F6%2075%20%
[...]

Above unicode characters have been escaped a single time (as expected).
Comment 3 Alexey Proskuryakov 2006-01-10 09:52:55 PST
Yes, I can see this with a simple test html and tcpflow.

Would you like to submit the patch for review (by adding ChangeLog notes, attaching the patch as a file 
and setting a review flag)?
Comment 4 Henrik Grubbström 2006-01-10 10:01:50 PST
Created attachment 5599 [details]
Suggested patch for KWQKURL.mm::appendEscapingBadChars()
Comment 5 Henrik Grubbström 2006-01-10 10:15:39 PST
Created attachment 5600 [details]
Now with ChangeLog entry.
Comment 6 Alexey Proskuryakov 2006-01-10 10:33:06 PST
My test case: <http://nypop.com/~ap/webkit/xmlhttp-url-encoding.html> (too lazy to make one that 
doesn't require tcpflow, and no idea about how to make an automated one).
Comment 7 Darin Adler 2006-01-10 15:08:06 PST
Comment on attachment 5599 [details]
Suggested patch for KWQKURL.mm::appendEscapingBadChars()

First, thanks for submitting this patch. It seems quite important to fix this
bug!

But I don't think this fix is correct.

I think we should either be escaping all % characters in the string or not
escaping any. It doesn't seem right to decide whether to escape the % character
based on whether it's followed by u201F (for example) or not.

If we shouldn't be escaping "%" characters in this case, then we need to change
KURL behavior, perhaps adding a parameter if we don't always want that
behavior.
Comment 8 Darin Adler 2006-01-10 19:49:46 PST
Comment on attachment 5600 [details]
Now with ChangeLog entry.

OK, but my review comment stands (see comment 7 above). I don't think this is
the correct fix.
Comment 9 Henrik Grubbström 2006-01-11 01:39:04 PST
I agree that having DWIM's like appendEscapingBadChars() is generally a bad idea
to begin with, but now that it's there, it should either be fixed or removed
entirely (which might break code that doesn't perform escaping properly, and
thus relies on it being there). The alternative of escaping all "%" characters
would break compatibility with eg Firefox even more.
Comment 10 Alexey Proskuryakov 2006-01-11 05:05:32 PST
I have updated the test case a bit to show more different cases. In the test, both MSIE and Firefox never 
escape % characters (MSIE doesn't even escape non-ASCII characters, actually).

Looks like the DWIM code first appeared in a massive KURL rewrite between revisions 1905 and 1906; so 
it's not entirely obvious if it serves any purpose.
Comment 11 Alexey Proskuryakov 2006-02-14 12:21:52 PST
Created attachment 6486 [details]
don't escape % characters

I don't really see how to check all code paths that end up in appendEscapingBadChars()... There is a regression test included in the patch, and "everything else" still seems to work fine - which is admittedly less than a perfect evidence.
Comment 12 Maciej Stachowiak 2006-02-14 18:05:24 PST
Comment on attachment 6486 [details]
don't escape % characters

Based on your test case I think never escaping % is the right thing. I think I am the one who did the rewrite and I didn't have a specific reason for escaping %.

r=me
Comment 13 Brett Wilson (Google) 2007-09-26 13:25:54 PDT
Note that IE7 changes this behavior. It rejects all invalid escape sequences and will not load the URL. This includes the %uXXXX form.
Comment 14 Lucas Forschler 2019-02-06 09:03:07 PST
Mass moving XML DOM bugs to the "DOM" Component.