Bug 119320 - Do not normalize into NFC the values of form fields
Summary: Do not normalize into NFC the values of form fields
Status: RESOLVED DUPLICATE of bug 113001
Alias: None
Product: WebKit
Classification: Unclassified
Component: Forms
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords: BlinkMergeCandidate
Depends on:
Blocks:
 
Reported: 2013-07-30 21:53 PDT by Ryosuke Niwa
Modified: 2013-08-12 08:41 PDT

See Also:


Attachments
Patch (42.38 KB, patch)
2013-08-11 06:28 PDT, Daniel Trebbien

Description Ryosuke Niwa 2013-07-30 21:53:22 PDT
Consider merging https://chromium.googlesource.com/chromium/blink/+/c167d111dd8a44da334d74e4b6608811b465945d

This commit renames the TextEncoding::encode(const String&, UnencodableHandling) const
API function to normalizeAndEncode() to make clear that the function
first applies Unicode NFC normalization before encoding the string. All
existing references to encode() except for one were updated to call
normalizeAndEncode() instead.

TextEncoding::encode(const String&, UnencodableHandling) const is added
back as a new API function which only encodes the string, but does not
normalize the string before encoding.

The one call to the old encode() function that was not updated is in
FormDataList::appendString(const String&). This was left as a call to
the new encode() to fix Issue 117128.
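
The split described above can be illustrated outside WebKit. The following Python sketch is not the Blink or WebKit code; the functions normalize_and_encode, encode, and append_string are hypothetical stand-ins that only mirror the names in the commit message, and the UnencodableHandling parameter is left out entirely.

```python
import unicodedata


def normalize_and_encode(text: str, charset: str = "utf-8") -> bytes:
    """Apply Unicode NFC normalization, then encode (the old encode() behavior)."""
    return unicodedata.normalize("NFC", text).encode(charset)


def encode(text: str, charset: str = "utf-8") -> bytes:
    """Encode only; the code points are passed through unchanged."""
    return text.encode(charset)


def append_string(form_data: list, text: str, charset: str = "utf-8") -> None:
    """Hypothetical stand-in for FormDataList::appendString: no normalization."""
    form_data.append(encode(text, charset))


# "e" followed by U+0301 COMBINING ACUTE ACCENT, the decomposed form of U+00E9.
decomposed = "e\u0301"

form_data = []
append_string(form_data, decomposed)

print(normalize_and_encode(decomposed))  # bytes of the composed U+00E9
print(form_data[0])                      # bytes of "e" followed by U+0301
```

With this split, every caller that still wants normalization says so by name, and the form submission path sends exactly the code points the page supplied.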

Chrome and Safari are unlike other browsers in that they apply Unicode
NFC normalization to form values when submitting a form; in particular,
the following browsers were tested and found not to normalize the form
values:
- Firefox 22.0
- Firefox ESR 17.0.7
- Opera 12.16
- IE 6
- IE 7
- IE 8
- IE 9
- IE 10
- Amaya 11.4.7

NFC normalization actually changes the meaning of text in certain scripts.
Notably, there are certain Biblical Hebrew words for which normalization
causes the word to be erroneously encoded. One example is given on page 9
of the SBL Hebrew Font User Manual version 1.5:
http://www.sbl-site.org/Fonts/SBLHebrewManualv1.5.pdf
This example is added as the new form-data-encoding-3.html layout test.
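
A rough illustration of the kind of change involved, using Python's unicodedata module: NFC puts combining marks into canonical order by combining class, so two Hebrew points attached to the same letter can be swapped. The sequence below (lamed, patah, hiriq) is an assumed example chosen for illustration; it is not claimed to be the exact sequence from the SBL manual or from the layout test.

```python
import unicodedata

LAMED = "\u05DC"  # HEBREW LETTER LAMED
PATAH = "\u05B7"  # HEBREW POINT PATAH
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ

# Order as entered by the author: patah first, then hiriq.
typed = LAMED + PATAH + HIRIQ
normalized = unicodedata.normalize("NFC", typed)

# Canonical ordering sorts marks by combining class, lowest first.
for ch in (PATAH, HIRIQ):
    print(unicodedata.name(ch), unicodedata.combining(ch))

print([unicodedata.name(ch) for ch in typed])
print([unicodedata.name(ch) for ch in normalized])
print("changed by NFC:", normalized != typed)
```

Because hiriq has a lower combining class than patah, the normalized string carries the two points in the opposite order from the one that was entered.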
Comment 1 Alexey Proskuryakov 2013-07-31 09:46:15 PDT
> NFC normalization actually changes the meaning of text in certain scripts.

This is a mistake and should never be relied upon.

*** This bug has been marked as a duplicate of bug 113001 ***
Comment 2 Daniel Trebbien 2013-08-11 06:28:17 PDT
Created attachment 208491
Patch

I think that this bug should be re-opened now that Safari is the only major browser which normalizes form values.

Bug 8769 cites the charmod-norm spec. Note that not normalizing form submission values is "perfectly acceptable" according to C309 (http://www.w3.org/TR/charmod-norm/#C309) because the browser can be viewed as the "producer", and the server to which the form data is being submitted can be viewed as the "remote component ... to which normalization is delegated". The server should be able to decide whether normalization is performed or not, and to which values.
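
As a sketch of what delegating normalization to the server could look like, the Python below normalizes only the fields the server chooses and leaves the rest byte-for-byte as submitted. The field names and the NORMALIZED_FIELDS policy are hypothetical, and the form body is assumed to be UTF-8 application/x-www-form-urlencoded data.

```python
import unicodedata
from urllib.parse import parse_qs, urlencode

# Hypothetical server policy: only these fields are NFC-normalized.
NORMALIZED_FIELDS = {"username"}


def decode_form(body: str) -> dict:
    """Parse a urlencoded form body, normalizing only the selected fields."""
    fields = {}
    for name, values in parse_qs(body, keep_blank_values=True).items():
        if name in NORMALIZED_FIELDS:
            values = [unicodedata.normalize("NFC", v) for v in values]
        fields[name] = values
    return fields


# Simulate a browser that submits the values without normalizing them.
body = urlencode({"username": "e\u0301", "verse": "\u05DC\u05B7\u05B4"})
fields = decode_form(body)

print(fields["username"] == ["\u00e9"])            # normalized by the server
print(fields["verse"] == ["\u05DC\u05B7\u05B4"])   # left as submitted
```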
Comment 3 Alexey Proskuryakov 2013-08-11 09:27:49 PDT
WebKit has always been the only browser engine to do this; it's not a new development.
Comment 4 Daniel Trebbien 2013-08-11 09:52:44 PDT
(In reply to comment #3)
> WebKit has always been the only browser engine to do this; it's not a new development.

Well, I mean that Chrome/Blink will soon not have this bug. In fact, the latest Chromium nightlies and Chrome Canary builds already do not have it.

What is the reason for keeping this behavior? It's not mandated by a spec, and it shouldn't be required for compatibility with Windows because Internet Explorer 6+ does not normalize.

I just checked IE 11 Preview and form values are not normalized.
Comment 5 Alexey Proskuryakov 2013-08-12 08:41:26 PDT
This is already discussed in the original bug. In any case, please keep the discussion there so that it stays in one place.