Bug 28970 - content-type parameters not taken into account when building form-data
Summary: content-type parameters not taken into account when building form-data
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Web Inspector (Deprecated) (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-09-04 09:02 PDT by Patrick Mueller
Modified: 2009-09-25 11:37 PDT (History)
4 users (show)

See Also:


Attachments
the red circles show the charset parameter on the header, and the result interpretation as not being form data, when it actually is (42.00 KB, image/png)
2009-09-04 09:02 PDT, Patrick Mueller
no flags Details
shows error message when error encountered attempting to decode value (21.59 KB, image/png)
2009-09-24 13:28 PDT, Patrick Mueller
no flags Details
proposed patch - 2009/09/25 (40.65 KB, patch)
2009-09-25 10:29 PDT, Patrick Mueller
timothy: review-
Details | Formatted Diff | Diff
proposed patch - 2009/09/25 - 2 (40.78 KB, patch)
2009-09-25 11:13 PDT, Patrick Mueller
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Patrick Mueller 2009-09-04 09:02:00 PDT
Created attachment 39062 [details]
the red circles show the charset parameter on the header, and the result interpretation as not being form data, when it actually is

In the new support for displaying form data in the resource view in the Resources Panel, the code is currently looking for a Content-Type of "application/x-www-form-urlencoded" to trigger the formatting of the request data into a set of form data elements.  This isn't good enough.  In the wild, it's been spotted that gmail sends requests with "form data" using a Content-Type header of "application/x-www-form-urlencoded;charset=utf-8".

Two obvious things to look at:

- change the code to be more flexible in recognizing the content type by doing a comparison after stripping the header value parameters, in this case, "charset=utf-8".

- do we need to do some additional decoding/encoding based on that charset value?  Seems a bit nightmarish, since it may be dependent on what text encoding the browser is currently using also.  I've also never been clear on how encoding of query string parameters in the URL itself works, since there's no way to denote the charset there.

Another thing to look at is if there are other content types in use.  A search on "application/www-form-urlencoded", for instance, yields the following:

   http://www.mail-archive.com/www-archive@w3.org/msg00187.html

Is this used in the wild?  Doesn't appear to be, but should do some additional research.
Comment 1 Timothy Hatcher 2009-09-04 09:07:54 PDT
The string will/should already be decoded from the UTF-8 form before you get it, so that shouldn't be an issue. Easy to test…
Comment 2 Patrick Mueller 2009-09-04 09:56:30 PDT
I'm less worried about UTF-8 than I am if the charset indicates Shift JIS or the (old?) default windows charset.  

I'll do a little research, but my initial thought is that we don't do anything regarding the actual charset encoding (as you suggest).  Absolutely worse case, the raw url-encoded form is still available.
Comment 3 Patrick Mueller 2009-09-04 10:11:55 PDT
Just happened upon this page from the what-wg irc channel, which appears to contain some sites to test:

   http://philip.html5.org/data/charsets-2.html#charset-shift_jis

Here's one I tried:

   http://hollywood-fm.com/jp/style/request/index.html

In the form, I filled all the fields in with 

    平 素 よ り

(hope that's SFW) and then pressed the gray button below the form.  On the page the form sent me to, I opened up Web Inspector, and the Resource view for the main page "sendmail.php" results in an empty white area.  Opening a web inspector on that, an error was logged:

   URIError: URI error        ResourceView.js:203

which is the line:

         val = decodeURIComponent(val).replace(/\+/g, " ");

At a minimum, we need to provide some try/catch around the the decodeURIComponent() call, apparently.
Comment 4 Timothy Hatcher 2009-09-04 10:19:26 PDT
What is URIError?
Comment 5 Patrick Mueller 2009-09-04 11:32:16 PDT
Try running

   decodeURIComponent("%")

and you'll see :-)
Comment 6 Patrick Mueller 2009-09-21 13:02:55 PDT
I wrapped the decodeURIcomponent() call with a try/catch, and just left the code as is, so that if the query string values can't be decoded, you'll only ever see the encoded values.  This allows the resource data to be displayed, instead of getting the "white div of death".

Here's some other info.

Pasting the following text into the form fields: "フリガナ" yields the following values in the form data section: 

   %83t%83%8A%83K%83i 

From the console, running encodeURIComponent("フリガナ") yields the following:

   %E3%83%95%E3%83%AA%E3%82%AC%E3%83%8A

Just based on the length of the strings, I'm guessing the first is the shift_jis encoding of the original string.  There's nothing in the request headers indicating a character coding being requested, but in the original page (the one we entered the form data on), the Content-Type header returned for the server was:

   text/html; charset=Shift_JIS

I think the thing we'll need to do, to handle this completely, is to have charset translations available, so that web inspector code can perform the translation of the string actually received (first string %83t... above) into unicode for display for the user.  Rather than have to guess at what the character encoding is, we will also need to have this information available.  This is the effective encoding used on the original page that submitted the request.

It appears the encoding may be available for the request via m_responseContentDispositionEncodingFallbackArray field of ResourceRequestBase (in platform/network), but the fact that there are up to three values in there makes me a little worried.  We may still have to guess.

At this point, not clear if it's useful to even proceed beyond adding the try/catch to prevent the "white div of death".  Are folks generally moving to UTF-8 based charsets?  Are things like Shift JIS encoding important to support for debugging purposes like this?  I can't see how a developer could ever do anything useful with these strings, programmatically, as there is no charset conversion facility generally available in JS.  

I'm opting for performing no heroics - try converting using the existing decodeURIComponent() and if that fails only show the encoded form.  Should we inform the user of the inability to convert?  Where - console message?
Comment 7 Timothy Hatcher 2009-09-21 15:05:20 PDT
JUst do the try/catch is fine. We will see if anyone complains/notices.
Comment 8 Patrick Mueller 2009-09-24 13:28:21 PDT
Created attachment 40079 [details]
shows error message when error encountered attempting to decode value

I'm now catching the decodeURIComponent() exception, the question is what to do when we encounter it.

Currently I'm displaying the encoded value where you would be expecting the decoded value - I figure something is better than nothing - but I also display a red error message beside the value: "(unable to decode value)".  

The attached image show an example.  

Good enough?
Comment 9 Timothy Hatcher 2009-09-24 13:46:54 PDT
I like that.
Comment 10 Patrick Mueller 2009-09-25 10:29:46 PDT
Created attachment 40117 [details]
proposed patch - 2009/09/25

Couple of fixes in here:

- check for form-encoded payload is more lenient per HTTP 1.1 Content-Type spec

- previously only decoded + if there was % in a value - should happen regardless

- trap exception when decoding % encodings, display error message in value when exception occurs
Comment 11 Timothy Hatcher 2009-09-25 10:45:47 PDT
Comment on attachment 40117 [details]
proposed patch - 2009/09/25

Nice work! Just fix up these little things.

> +                if (val.indexOf("%") >= 0)
> +                    try {
> +                        val = decodeURIComponent(val);
> +                    } catch(e) {
> +                        errorDecoding = true;
> +                    }

This should have braces around it since it is multiline.


> +                val = val.replace(/\+/g, " ");
> +            valEscaped = val.escapeHTML();

I know "val" isn't new, but it really should have not been abbrivated in the first place per our style guide. Same goes for valEscaped.


> +                valEscaped += " <span class=\"error-message\">" + WebInspector.UIString("(unable to decode value)") + "</span>";

I think the UIString should have ".escapeHTML()" also, sicne the localization could use HTML special characters.
Comment 12 Patrick Mueller 2009-09-25 11:13:09 PDT
Created attachment 40126 [details]
proposed patch - 2009/09/25 - 2

issues resolved; renamed val and valEscaped to value and valueEscaped
Comment 13 WebKit Commit Bot 2009-09-25 11:37:04 PDT
Comment on attachment 40126 [details]
proposed patch - 2009/09/25 - 2

Clearing flags on attachment: 40126

Committed r48763: <http://trac.webkit.org/changeset/48763>
Comment 14 WebKit Commit Bot 2009-09-25 11:37:09 PDT
All reviewed patches have been landed.  Closing bug.