RESOLVED INVALID Bug 113001
Some Hebrew diacritics get messed up on form submission
https://bugs.webkit.org/show_bug.cgi?id=113001
Konstantin
Reported 2013-03-21 21:40:48 PDT
Created attachment 194439: Source of PHP script to reproduce the problem

When I submit any form with a text field that contains the Hebrew diacritics U+05BC ("dagesh") and U+05B6 ("segol"), in this order, they are submitted to the server in the *opposite* order: U+05B6, U+05BC. While the Hebrew word looks the "same" visually, this "fixed" order is invalid (or at least non-standard), and regardless, the browser obviously shouldn't change data entered into the form on its own, under any circumstances.

To demonstrate this issue, I wrote a simple PHP script (attached, and available online at http://zapad.org/~ignatiev/temp/w4.php) that lets the user fill in a text field and, after submitting the form, compare the input with what was actually submitted (via a simple JavaScript hash implementation). You can play with it and see that it works fine for almost any text in any language you can enter. If, however, you press the "initialize" button, the script initializes the text field to the string '\u05d1\u05bc\u05b5' (bet-dagesh-tsere), and upon form submission the comparison test FAILS: the value submitted is '\u05d1\u05b5\u05bc' (bet-tsere-dagesh).

This problem is reproducible in every WebKit-based browser I tried (Chrome on Windows/Mac, Safari on Mac/Windows/iPhone, the Debian 6 "Web browser", and the latest nightly build compiled from source on Linux/GTK), while it works fine in IE, Firefox, and (Presto-based) Opera.
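The reporter's script detects the change by comparing a hash of the typed text with a hash of what the server received. A minimal sketch of a server-side check, assuming Python's standard `unicodedata` module, can go one step further and tell apart "changed only by Unicode normalization" from genuine corruption. `classify_submission` is a hypothetical helper for illustration, not part of the attached script.

```python
import unicodedata


def classify_submission(expected: str, received: str) -> str:
    """Hypothetical helper: compare what the user typed with what the server got."""
    if expected == received:
        return "identical"
    # Canonically equivalent strings have identical NFC forms, so a mismatch
    # that disappears after normalizing both sides was caused by normalization.
    if unicodedata.normalize("NFC", expected) == unicodedata.normalize("NFC", received):
        return "canonically equivalent"
    return "different"


# The reporter's case: bet-dagesh-tsere typed, bet-tsere-dagesh received.
print(classify_submission("\u05d1\u05bc\u05b5", "\u05d1\u05b5\u05bc"))
```

For the reporter's string this reports "canonically equivalent": the submitted value differs byte-for-byte but not in meaning under Unicode canonical equivalence.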
Attachments
Source of PHP script to reproduce the problem (1.64 KB, text/html)
2013-03-21 21:40 PDT, Konstantin
Alexey Proskuryakov
Comment 1 2013-03-26 11:58:08 PDT
> this "fixed" order is invalid (or at least non-standard)

In fact, '\u05d1\u05bc\u05b5' is not properly normalized: both the NFC and NFD forms of this string are '\u05d1\u05b5\u05bc'. Please see <http://unicode.org/reports/tr15/> for a discussion of Unicode normalization forms.

Overall, this is expected behavior. The reason we normalize to NFC when sending form text is compatibility: since Windows uses NFC everywhere, there can be subtle errors when text sent from WebKit is processed by systems that don't handle decomposed text well. I can see how in this specific case WebKit becomes an outlier, but this is the cost of being like other browsers in more common cases.
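The reordering follows from canonical ordering in Unicode normalization: consecutive combining marks are sorted by their canonical combining class (ccc), and dagesh (ccc 21) sorts after tsere (ccc 15). A minimal sketch using Python's standard `unicodedata` module reproduces this offline; it illustrates the normalization step only and is not WebKit's implementation. `unicodedata.is_normalized` assumes Python 3.8 or later.

```python
import unicodedata

# String the reporter's script puts into the text field:
# bet (U+05D1), dagesh (U+05BC), tsere (U+05B5).
original = "\u05d1\u05bc\u05b5"

# Show each code point with its canonical combining class (0 = base letter).
for ch in original:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} ccc={unicodedata.combining(ch)}")

nfc = unicodedata.normalize("NFC", original)
nfd = unicodedata.normalize("NFD", original)

# Canonical ordering moves tsere (ccc 15) before dagesh (ccc 21). NFC does not
# recompose bet+dagesh into U+FB31, because that character is a composition
# exclusion, so NFC and NFD agree here.
print(nfc == nfd == "\u05d1\u05b5\u05bc")  # True
print(unicodedata.is_normalized("NFC", original))  # False
```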
Alexey Proskuryakov
Comment 2 2013-07-31 09:46:15 PDT
*** Bug 119320 has been marked as a duplicate of this bug. ***