Most browsers (at least the newest IE, Opera, and Firefox) will send the character encoding that a form was submitted in in a hidden _charset_ field if it exists. It would be useful if webkit also used this, since it seems to be the easiest way to detect the submission character encoding. Just for reference, the Mozilla bug report about this is at https://bugzilla.mozilla.org/show_bug.cgi?id=18643
<rdar://problem/5940816>
There hasn't been much movement on this bug, but this seems like an important feature to implement.
Created attachment 113138 [details] proposed patch This initial patch will enable to send the character encoding that a form was submitted in a hidden _charset_ field if it exists. Please let me know your review comments on this.
Comment on attachment 113138 [details] proposed patch View in context: https://bugs.webkit.org/attachment.cgi?id=113138&action=review > Source/WebCore/loader/FormSubmission.cpp:188 > + if (equalIgnoringCase(input->name(), "_charset_") && input->isInputTypeHidden()) > + input->setValueForUser(domFormData->encoding().name()); * Wrong indentation * According to the specification, we should set the character encoding only if the hidden input has no value attribute. * I don't think we should update HTMLInputELement::value. How does IE work? * It's dangerous to use setValueForUser(), which dispatches an event and execute JavaScript code, which might update the associated element list. > LayoutTests/ChangeLog:14 > + * http/tests/misc/char_encoding_in_hidden_charset_field_01-expected.txt: Added. > + * http/tests/misc/char_encoding_in_hidden_charset_field_01.html: Added. For Default Encoding. We usually use '-' to concatenate words, not '_'. Please assign meaningful names instead of numbers. e.g. char_encoding_in_hidden_charset_field_02.html should be char-encoding-in-hidden-charset-field-with-accept-charset.html We should have tests for Chinese/Japanese encodings such as Big5, Shift_JIS, EUC-JP, ISO-2022-JP. This feature is important in these countries.
(In reply to comment #4) > (From update of attachment 113138 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=113138&action=review > > > Source/WebCore/loader/FormSubmission.cpp:188 > > + if (equalIgnoringCase(input->name(), "_charset_") && input->isInputTypeHidden()) > > + input->setValueForUser(domFormData->encoding().name()); > > * I don't think we should update HTMLInputELement::value. How does IE work? Even IE also modifies the value of input field. > * It's dangerous to use setValueForUser(), which dispatches an event and execute JavaScript code, which might update the associated element list. > * According to the specification, we should set the character encoding only if the hidden input has no value attribute. I tested with IE, Mozilla and Opera behavior, but none of them respect "value" if inputType is hidden, they all sets respective encoding types here. Please let me know if we want different behavior from these.
(In reply to comment #5) > (In reply to comment #4) > > (From update of attachment 113138 [details] [details]) > > View in context: https://bugs.webkit.org/attachment.cgi?id=113138&action=review > > > > > Source/WebCore/loader/FormSubmission.cpp:188 > > > + if (equalIgnoringCase(input->name(), "_charset_") && input->isInputTypeHidden()) > > > + input->setValueForUser(domFormData->encoding().name()); > > > > > * I don't think we should update HTMLInputELement::value. How does IE work? > Even IE also modifies the value of input field. Please investigate the behavior of other browsers correctly. I have just checked the behavior of IE9, Firefox 7, and Opera 11.51. All of them didn't care about the existence of value attribute, and didn't update HTMLInputElement::value. We can check this behavior by a simple HTML like the following, and inspect the form. <form action="..." target="_blank"> <input type=hidden name=_charset_ value=foo> <input type=submit> </form> > > * According to the specification, we should set the character encoding only if the hidden input has no value attribute. > I tested with IE, Mozilla and Opera behavior, but none of them respect "value" if inputType is hidden, they all sets respective encoding types here. > Please let me know if we want different behavior from these. We should update the specification first.
(In reply to comment #6) > (In reply to comment #5) > > (In reply to comment #4) > > > * I don't think we should update HTMLInputELement::value. How does IE work? > > Even IE also modifies the value of input field. > > I tested with IE, Mozilla and Opera behavior, but none of them respect "value" if inputType is hidden, they all sets respective encoding types here. Please let me know if we want different behavior from these. > Please investigate the behavior of other browsers correctly. > I have just checked the behavior of IE9, Firefox 7, and Opera 11.51. All of them didn't care about the existence of value attribute, and didn't update HTMLInputElement::value. Sorry my mistake :( I meant "IE also modifies the value of input field" as even if you have given value attribute it doesn't respect it and send current encoding type. > We should update the specification first. So shall I file a bug on w3c if you haven't already..?
(In reply to comment #6) > I have just checked the behavior of IE9, Firefox 7, and Opera 11.51. All of them didn't care about the existence of value attribute, and didn't update HTMLInputElement::value. > We can check this behavior by a simple HTML like the following, and inspect the form. > We should update the specification first. Hello Ian, Can you please give us feedback on this, because even If value attribute is given IE9, Firefox 7, and Opera 11.51 none of these respect it. Is it that we should modify spec? or browser behavior?
Created attachment 113639 [details] Updated Patch Updated patch > * According to the specification, we should set the character encoding only if the hidden input has no value attribute. As the behavior of IE9, Firefox 7, and Opera 11.51. All of them didn't care about the existence of value attribute I think we can also have same behavior. // Let me know if we want different behavior. > * I don't think we should update HTMLInputELement::value. How does IE work? > * It's dangerous to use setValueForUser(), which dispatches an event and execute JavaScript code, which might update the associated element list. Done : As none of the browser changes the HTMLInputELement::value. Thanks for pointing out! > > LayoutTests/ChangeLog:14 > > + * http/tests/misc/char_encoding_in_hidden_charset_field_01-expected.txt: Added. > > + * http/tests/misc/char_encoding_in_hidden_charset_field_01.html: Added. For Default Encoding. > > We usually use '-' to concatenate words, not '_'. Done: > We should have tests for Chinese/Japanese encodings such as Big5, Shift_JIS, EUC-JP, ISO-2022-JP. This feature is important in these countries. Done: Added few more test cases for encodings Big5, Shift_JIS, EUC-JP, ISO-2022-JP.
Comment on attachment 113639 [details] Updated Patch View in context: https://bugs.webkit.org/attachment.cgi?id=113639&action=review Seems fine, but implementation is not quite right. > Source/WebCore/html/InputType.cpp:178 > + if (equalIgnoringCase(element()->name(), "_charset_") && element()->isInputTypeHidden()) This is the wrong way to code something that’s type specific in InputType objects. Instead, HiddenInputType should override the appendFormData function. The InputType objects were created to avoid having type-specific code mixed in with type-independent code, making use of isXXX checks like the one here. > Source/WebCore/html/InputType.cpp:179 > + encoding.appendData(element()->name(), String(encoding.encoding().name())); Why is an explicit String() needed here? Did you try compiling without it?
HiddenInputType::appendFormData should check for the special case, and if it’s not the special case call through to InputType::appendFormData.
Created attachment 113692 [details] Updated patch as per review comments Thanks Darin for review. (In reply to comment #10) > > Source/WebCore/html/InputType.cpp:178 > > + if (equalIgnoringCase(element()->name(), "_charset_") && element()->isInputTypeHidden()) > > This is the wrong way to code something that’s type specific in InputType objects. Instead, HiddenInputType should override the appendFormData function. > > The InputType objects were created to avoid having type-specific code mixed in with type-independent code, making use of isXXX checks like the one here. Done. > > Source/WebCore/html/InputType.cpp:179 > > + encoding.appendData(element()->name(), String(encoding.encoding().name())); > Why is an explicit String() needed here? Did you try compiling without it? Yes I tried compiling it without String but it gives compile error as encoding().name() returns char*. > HiddenInputType::appendFormData should check for the special case, and if it’s not the special case call through to InputType::appendFormData. Done.
Comment on attachment 113692 [details] Updated patch as per review comments View in context: https://bugs.webkit.org/attachment.cgi?id=113692&action=review > Source/WebCore/html/HiddenInputType.h:38 > +class FormDataList; This forward declaration is not needed. Not sure why you added it. > Source/WebCore/html/HiddenInputType.h:55 > + virtual bool appendFormData(FormDataList& encoding, bool isMultipartForm) const; Here in the header, the argument name "encoding" is neither helpful nor desirable. Please omit it.
Created attachment 113696 [details] Updated patch as per review comments Sorry for previous patch :(.
Comment on attachment 113696 [details] Updated patch as per review comments Clearing flags on attachment: 113696 Committed r99310: <http://trac.webkit.org/changeset/99310>
All reviewed patches have been landed. Closing bug.
Comment on attachment 113696 [details] Updated patch as per review comments View in context: https://bugs.webkit.org/attachment.cgi?id=113696&action=review > LayoutTests/http/tests/misc/char-encoding-in-hidden-charset-field-with-Big5.html:11 > +<form id="testForm" action="resources/char-encoding-in-hidden-charset-field.php" method="post" accept-charset="Big5"> Specifying accept-charset is not an interesting test. I wanted to test that _charset_ had an auto-detected encoding. But it's hard because encoding detector is off by default. I checked the behavior of _charset_ in Chromium browser, and confirmed it had an auto-detected encoding correctly.