Created attachment 39062 [details] the red circles show the charset parameter on the header, and the result interpretation as not being form data, when it actually is In the new support for displaying form data in the resource view in the Resources Panel, the code is currently looking for a Content-Type of "application/x-www-form-urlencoded" to trigger the formatting of the request data into a set of form data elements. This isn't good enough. In the wild, it's been spotted that gmail sends requests with "form data" using a Content-Type header of "application/x-www-form-urlencoded;charset=utf-8". Two obvious things to look at: - change the code to be more flexible in recognizing the content type by doing a comparison after stripping the header value parameters, in this case, "charset=utf-8". - do we need to do some additional decoding/encoding based on that charset value? Seems a bit nightmarish, since it may be dependent on what text encoding the browser is currently using also. I've also never been clear on how encoding of query string parameters in the URL itself works, since there's no way to denote the charset there. Another thing to look at is if there are other content types in use. A search on "application/www-form-urlencoded", for instance, yields the following: http://www.mail-archive.com/www-archive@w3.org/msg00187.html Is this used in the wild? Doesn't appear to be, but should do some additional research.
The string will/should already be decoded from the UTF-8 form before you get it, so that shouldn't be an issue. Easy to test…
I'm less worried about UTF-8 than I am if the charset indicates Shift JIS or the (old?) default windows charset. I'll do a little research, but my initial thought is that we don't do anything regarding the actual charset encoding (as you suggest). Absolutely worse case, the raw url-encoded form is still available.
Just happened upon this page from the what-wg irc channel, which appears to contain some sites to test: http://philip.html5.org/data/charsets-2.html#charset-shift_jis Here's one I tried: http://hollywood-fm.com/jp/style/request/index.html In the form, I filled all the fields in with 平 素 よ り (hope that's SFW) and then pressed the gray button below the form. On the page the form sent me to, I opened up Web Inspector, and the Resource view for the main page "sendmail.php" results in an empty white area. Opening a web inspector on that, an error was logged: URIError: URI error ResourceView.js:203 which is the line: val = decodeURIComponent(val).replace(/\+/g, " "); At a minimum, we need to provide some try/catch around the the decodeURIComponent() call, apparently.
What is URIError?
Try running decodeURIComponent("%") and you'll see :-)
I wrapped the decodeURIcomponent() call with a try/catch, and just left the code as is, so that if the query string values can't be decoded, you'll only ever see the encoded values. This allows the resource data to be displayed, instead of getting the "white div of death". Here's some other info. Pasting the following text into the form fields: "フリガナ" yields the following values in the form data section: %83t%83%8A%83K%83i From the console, running encodeURIComponent("フリガナ") yields the following: %E3%83%95%E3%83%AA%E3%82%AC%E3%83%8A Just based on the length of the strings, I'm guessing the first is the shift_jis encoding of the original string. There's nothing in the request headers indicating a character coding being requested, but in the original page (the one we entered the form data on), the Content-Type header returned for the server was: text/html; charset=Shift_JIS I think the thing we'll need to do, to handle this completely, is to have charset translations available, so that web inspector code can perform the translation of the string actually received (first string %83t... above) into unicode for display for the user. Rather than have to guess at what the character encoding is, we will also need to have this information available. This is the effective encoding used on the original page that submitted the request. It appears the encoding may be available for the request via m_responseContentDispositionEncodingFallbackArray field of ResourceRequestBase (in platform/network), but the fact that there are up to three values in there makes me a little worried. We may still have to guess. At this point, not clear if it's useful to even proceed beyond adding the try/catch to prevent the "white div of death". Are folks generally moving to UTF-8 based charsets? Are things like Shift JIS encoding important to support for debugging purposes like this? I can't see how a developer could ever do anything useful with these strings, programmatically, as there is no charset conversion facility generally available in JS. I'm opting for performing no heroics - try converting using the existing decodeURIComponent() and if that fails only show the encoded form. Should we inform the user of the inability to convert? Where - console message?
JUst do the try/catch is fine. We will see if anyone complains/notices.
Created attachment 40079 [details] shows error message when error encountered attempting to decode value I'm now catching the decodeURIComponent() exception, the question is what to do when we encounter it. Currently I'm displaying the encoded value where you would be expecting the decoded value - I figure something is better than nothing - but I also display a red error message beside the value: "(unable to decode value)". The attached image show an example. Good enough?
I like that.
Created attachment 40117 [details] proposed patch - 2009/09/25 Couple of fixes in here: - check for form-encoded payload is more lenient per HTTP 1.1 Content-Type spec - previously only decoded + if there was % in a value - should happen regardless - trap exception when decoding % encodings, display error message in value when exception occurs
Comment on attachment 40117 [details] proposed patch - 2009/09/25 Nice work! Just fix up these little things. > + if (val.indexOf("%") >= 0) > + try { > + val = decodeURIComponent(val); > + } catch(e) { > + errorDecoding = true; > + } This should have braces around it since it is multiline. > + val = val.replace(/\+/g, " "); > + valEscaped = val.escapeHTML(); I know "val" isn't new, but it really should have not been abbrivated in the first place per our style guide. Same goes for valEscaped. > + valEscaped += " <span class=\"error-message\">" + WebInspector.UIString("(unable to decode value)") + "</span>"; I think the UIString should have ".escapeHTML()" also, sicne the localization could use HTML special characters.
Created attachment 40126 [details] proposed patch - 2009/09/25 - 2 issues resolved; renamed val and valEscaped to value and valueEscaped
Comment on attachment 40126 [details] proposed patch - 2009/09/25 - 2 Clearing flags on attachment: 40126 Committed r48763: <http://trac.webkit.org/changeset/48763>
All reviewed patches have been landed. Closing bug.