Bug 19079 - Send the submissions character encoding in hidden _charset_ field
Summary: Send the submissions character encoding in hidden _charset_ field
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Forms (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Enhancement
Assignee: Nobody
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2008-05-15 10:19 PDT by Roger Peters
Modified: 2011-11-07 00:10 PST (History)
10 users (show)

See Also:


Attachments
proposed patch (15.16 KB, patch)
2011-11-01 00:33 PDT, Vineet Chaudhary (vineetc)
tkent: review-
Details | Formatted Diff | Diff
Updated Patch (22.71 KB, patch)
2011-11-04 02:42 PDT, Vineet Chaudhary (vineetc)
darin: review-
Details | Formatted Diff | Diff
Updated patch as per review comments (23.75 KB, patch)
2011-11-04 11:51 PDT, Vineet Chaudhary (vineetc)
darin: review+
Details | Formatted Diff | Diff
Updated patch as per review comments (23.57 KB, patch)
2011-11-04 12:09 PDT, Vineet Chaudhary (vineetc)
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Roger Peters 2008-05-15 10:19:20 PDT
Most browsers (at least the newest IE, Opera, and Firefox) will send the character encoding that a form was submitted in in a hidden _charset_ field if it exists. It would be useful if webkit also used this, since it seems to be the easiest way to detect the submission character encoding.

Just for reference, the Mozilla bug report about this is at https://bugzilla.mozilla.org/show_bug.cgi?id=18643
Comment 1 Mark Rowe (bdash) 2008-05-15 20:57:33 PDT
<rdar://problem/5940816>
Comment 2 Alexey Proskuryakov 2011-10-27 09:05:14 PDT
There hasn't been much movement on this bug, but this seems like an important feature to implement.
Comment 3 Vineet Chaudhary (vineetc) 2011-11-01 00:33:38 PDT
Created attachment 113138 [details]
proposed patch

This initial patch will enable to send the character encoding that a form was submitted in a hidden _charset_ field if it exists.
Please let me know your review comments on this.
Comment 4 Kent Tamura 2011-11-01 22:15:20 PDT
Comment on attachment 113138 [details]
proposed patch

View in context: https://bugs.webkit.org/attachment.cgi?id=113138&action=review

> Source/WebCore/loader/FormSubmission.cpp:188
> +            if (equalIgnoringCase(input->name(), "_charset_") && input->isInputTypeHidden())
> +                    input->setValueForUser(domFormData->encoding().name());

* Wrong indentation
* According to the specification, we should set the character encoding only if the hidden input has no value attribute.
* I don't think we should update HTMLInputELement::value.  How does IE work?
* It's dangerous to use setValueForUser(), which dispatches an event and execute JavaScript code, which might update the associated element list.

> LayoutTests/ChangeLog:14
> +        * http/tests/misc/char_encoding_in_hidden_charset_field_01-expected.txt: Added.
> +        * http/tests/misc/char_encoding_in_hidden_charset_field_01.html: Added. For Default Encoding.

We usually use '-' to concatenate words, not '_'.

Please assign meaningful names instead of numbers.  e.g. char_encoding_in_hidden_charset_field_02.html should be char-encoding-in-hidden-charset-field-with-accept-charset.html

We should have tests for Chinese/Japanese encodings such as Big5, Shift_JIS, EUC-JP, ISO-2022-JP.  This feature is important in these countries.
Comment 5 Vineet Chaudhary (vineetc) 2011-11-02 04:41:32 PDT
(In reply to comment #4)
> (From update of attachment 113138 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=113138&action=review
> 
> > Source/WebCore/loader/FormSubmission.cpp:188
> > +            if (equalIgnoringCase(input->name(), "_charset_") && input->isInputTypeHidden())
> > +                    input->setValueForUser(domFormData->encoding().name());
> 

> * I don't think we should update HTMLInputELement::value.  How does IE work?
Even IE also modifies the value of input field.

> * It's dangerous to use setValueForUser(), which dispatches an event and execute JavaScript code, which might update the associated element list.
> * According to the specification, we should set the character encoding only if the hidden input has no value attribute.
I tested with IE, Mozilla and Opera behavior, but none of them respect "value" if inputType is hidden, they all sets respective encoding types here.
Please let me know if we want different behavior from these.
Comment 6 Kent Tamura 2011-11-02 06:23:23 PDT
(In reply to comment #5)
> (In reply to comment #4)
> > (From update of attachment 113138 [details] [details])
> > View in context: https://bugs.webkit.org/attachment.cgi?id=113138&action=review
> > 
> > > Source/WebCore/loader/FormSubmission.cpp:188
> > > +            if (equalIgnoringCase(input->name(), "_charset_") && input->isInputTypeHidden())
> > > +                    input->setValueForUser(domFormData->encoding().name());
> > 
> 
> > * I don't think we should update HTMLInputELement::value.  How does IE work?
> Even IE also modifies the value of input field.

Please investigate the behavior of other browsers correctly.
I have just checked the behavior of IE9, Firefox 7, and Opera 11.51. All of them didn't care about the existence of value attribute, and didn't update HTMLInputElement::value.
We can check this behavior by a simple HTML like the following, and inspect the form.

<form action="..." target="_blank">
<input type=hidden name=_charset_ value=foo>
<input type=submit>
</form>


> > * According to the specification, we should set the character encoding only if the hidden input has no value attribute.
> I tested with IE, Mozilla and Opera behavior, but none of them respect "value" if inputType is hidden, they all sets respective encoding types here.
> Please let me know if we want different behavior from these.

We should update the specification first.
Comment 7 Vineet Chaudhary (vineetc) 2011-11-02 20:37:01 PDT
(In reply to comment #6)
> (In reply to comment #5)
> > (In reply to comment #4)
 
> > > * I don't think we should update HTMLInputELement::value.  How does IE work?
> > Even IE also modifies the value of input field.
> > I tested with IE, Mozilla and Opera behavior, but none of them respect "value" if inputType is hidden, they all sets respective encoding types here.
Please let me know if we want different behavior from these.

> Please investigate the behavior of other browsers correctly.
> I have just checked the behavior of IE9, Firefox 7, and Opera 11.51. All of them didn't care about the existence of value attribute, and didn't update HTMLInputElement::value.

Sorry my mistake :( I meant "IE also modifies the value of input field" as even if you have given value attribute it doesn't respect it and send current encoding type.

> We should update the specification first.

So shall I file a bug on w3c if you haven't already..?
Comment 8 Vineet Chaudhary (vineetc) 2011-11-03 00:08:58 PDT
(In reply to comment #6)
> I have just checked the behavior of IE9, Firefox 7, and Opera 11.51. All of them didn't care about the existence of value attribute, and didn't update HTMLInputElement::value.
> We can check this behavior by a simple HTML like the following, and inspect the form.
> We should update the specification first.

Hello Ian,

Can you please give us feedback on this, because even If value attribute is given  IE9, Firefox 7, and Opera 11.51 none of these respect it. Is it that we should modify spec? or browser behavior?
Comment 9 Vineet Chaudhary (vineetc) 2011-11-04 02:42:08 PDT
Created attachment 113639 [details]
Updated Patch

Updated patch

> * According to the specification, we should set the character encoding only if the hidden input has no value attribute.
As the behavior of IE9, Firefox 7, and Opera 11.51. All of them didn't care about the existence of value attribute I think we can also have same behavior. 
// Let me know if we want different behavior.

> * I don't think we should update HTMLInputELement::value.  How does IE work?
> * It's dangerous to use setValueForUser(), which dispatches an event and execute JavaScript code, which might update the associated element list.

Done : As none of the browser changes the HTMLInputELement::value. Thanks for pointing out!
 
> > LayoutTests/ChangeLog:14
> > +        * http/tests/misc/char_encoding_in_hidden_charset_field_01-expected.txt: Added.
> > +        * http/tests/misc/char_encoding_in_hidden_charset_field_01.html: Added. For Default Encoding.
> 
> We usually use '-' to concatenate words, not '_'.
Done:

> We should have tests for Chinese/Japanese encodings such as Big5, Shift_JIS, EUC-JP, ISO-2022-JP.  This feature is important in these countries.
Done: Added few more test cases for encodings Big5, Shift_JIS, EUC-JP, ISO-2022-JP.
Comment 10 Darin Adler 2011-11-04 09:36:38 PDT
Comment on attachment 113639 [details]
Updated Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=113639&action=review

Seems fine, but implementation is not quite right.

> Source/WebCore/html/InputType.cpp:178
> +    if (equalIgnoringCase(element()->name(), "_charset_") && element()->isInputTypeHidden())

This is the wrong way to code something that’s type specific in InputType objects. Instead, HiddenInputType should override the appendFormData function.

The InputType objects were created to avoid having type-specific code mixed in with type-independent code, making use of isXXX checks like the one here.

> Source/WebCore/html/InputType.cpp:179
> +        encoding.appendData(element()->name(), String(encoding.encoding().name()));

Why is an explicit String() needed here? Did you try compiling without it?
Comment 11 Darin Adler 2011-11-04 09:37:46 PDT
HiddenInputType::appendFormData should check for the special case, and if it’s not the special case call through to InputType::appendFormData.
Comment 12 Vineet Chaudhary (vineetc) 2011-11-04 11:51:53 PDT
Created attachment 113692 [details]
Updated patch as per review comments

Thanks Darin for review.

(In reply to comment #10)
> > Source/WebCore/html/InputType.cpp:178
> > +    if (equalIgnoringCase(element()->name(), "_charset_") && element()->isInputTypeHidden())
> 
> This is the wrong way to code something that’s type specific in InputType objects. Instead, HiddenInputType should override the appendFormData function.
> 
> The InputType objects were created to avoid having type-specific code mixed in with type-independent code, making use of isXXX checks like the one here.

Done.
 
> > Source/WebCore/html/InputType.cpp:179
> > +        encoding.appendData(element()->name(), String(encoding.encoding().name()));
> Why is an explicit String() needed here? Did you try compiling without it?

Yes I tried compiling it without String but it gives compile error as encoding().name() returns char*.

> HiddenInputType::appendFormData should check for the special case, and if it’s not the special case call through to InputType::appendFormData.

Done.
Comment 13 Darin Adler 2011-11-04 11:58:27 PDT
Comment on attachment 113692 [details]
Updated patch as per review comments

View in context: https://bugs.webkit.org/attachment.cgi?id=113692&action=review

> Source/WebCore/html/HiddenInputType.h:38
> +class FormDataList;

This forward declaration is not needed. Not sure why you added it.

> Source/WebCore/html/HiddenInputType.h:55
> +    virtual bool appendFormData(FormDataList& encoding, bool isMultipartForm) const;

Here in the header, the argument name "encoding" is neither helpful nor desirable. Please omit it.
Comment 14 Vineet Chaudhary (vineetc) 2011-11-04 12:09:35 PDT
Created attachment 113696 [details]
Updated patch as per review comments

Sorry for previous patch :(.
Comment 15 WebKit Review Bot 2011-11-04 12:53:37 PDT
Comment on attachment 113696 [details]
Updated patch as per review comments

Clearing flags on attachment: 113696

Committed r99310: <http://trac.webkit.org/changeset/99310>
Comment 16 WebKit Review Bot 2011-11-04 12:53:42 PDT
All reviewed patches have been landed.  Closing bug.
Comment 17 Kent Tamura 2011-11-07 00:10:35 PST
Comment on attachment 113696 [details]
Updated patch as per review comments

View in context: https://bugs.webkit.org/attachment.cgi?id=113696&action=review

> LayoutTests/http/tests/misc/char-encoding-in-hidden-charset-field-with-Big5.html:11
> +<form id="testForm" action="resources/char-encoding-in-hidden-charset-field.php" method="post" accept-charset="Big5">

Specifying accept-charset is not an interesting test.
I wanted to test that _charset_ had an auto-detected encoding. But it's hard because encoding detector is off by default.

I checked the behavior of _charset_ in Chromium browser, and confirmed it had an auto-detected encoding correctly.