Created attachment 41225
zipped attachments

1. Safari 4.0.3 (531.9.1) XMLHttpRequest sends incorrect POST data or uses a wrong Unicode map.

2. Run test.php5 to generate the binary files and store them on disk.

3. Results
a) bin_explorer_8.0.7600.16385.dat, bin_firefox_3.5.3.dat, bin_konqueror_4.3.2.dat and bin_opera_10.00.1750.dat are byte-for-byte identical.
b) bin_safari_4.0.3.531.9.1.dat and bin_chrome_3.0.195.27.dat contain different, incorrect bytes. Please read about how UTF-8 encoding maps characters to their binary representation.

4. Summary
Server-side languages receive incorrect data from Safari and Chrome. Only the characters listed in the JavaScript map array (test.php5) are encoded incorrectly. Explorer, Opera, Konqueror and Firefox send correct data.

--- code ---
<?php
if($_SERVER['QUERY_STRING']){
  file_put_contents('bin.dat',file_get_contents('php://input'));
}
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<meta http-equiv="content-type" content="text/html;charset=utf-8" />
<script type="text/javascript">//<![CDATA[
map=[832,833,835,836,884,894,903,2392,2393,2394,2395,2396,2397,2398,2399,2524,2525,2527,2611,2614,2649,2650,2651,2654,2908,2909,3907,3917,3922,3927,3932,3945,3955,3957,3958,3960,3969,3987,3997,4002,4007,4012,4025,8049,8051,8053,8055,8057,8059,8061,8123,8126,8137,8139,8147,8155,8163,8171,8174,8175,8185,8187,8189,8192,8193,8486,8490,8491,9001,9002,10972,63744,63745,63746,63747,63748,63749,63750,63751,63752,63753,63754,63755,63756,63757,63758,63759,63760,63761,63762,63763,63764,63765,63766,63767,63768,63769,63770,63771,63772,63773,63774,63775,63776,63777,63778,63779,63780,63781,63782,63783,63784,63785,63786,63787,63788,63789,63790,63791,63792,63793,63794,63795,63796,63797,63798,63799,63800,63801,63802,63803,63804,63805,63806,63807,63808,63809,63810,63811,63812,63813,63814,63815,63816,63817,63818,63819,63820,63821,63822,63823,63824,63825,63826,63827,63828,63829,63830,63831,63832,63833,63834,63835,63836,63837,63838,63839,63840,63841,63842,63843,63844,63845,63846,63847,63848,63849,63850,63851,63852,63853,63854,63855,63856,63857,63858,63859,63860,63861,63862,63863,63864,63865,63866,63867,63868,63869,63870,63871,63872,63873,63874,63875,63876,63877,63878,63879,63880,63881,63882,63883,63884,63885,63886,63887,63888,63889,63890,63891,63892,63893,63894,63895,63896,63897,63898,63899,63900,63901,63902,63903,63904,63905,63906,63907,63908,63909,63910,63911,63912,63913,63914,63915,63916,63917,63918,63919,63920,63921,63922,63923,63924,63925,63926,63927,63928,63929,63930,63931,63932,63933,63934,63935,63936,63937,63938,63939,63940,63941,63942,63943,63944,63945,63946,63947,63948,63949,63950,63951,63952,63953,63954,63955,63956,63957,63958,63959,63960,63961,63962,63963,63964,63965,63966,63967,63968,63969,63970,63971,63972,63973,63974,63975,63976,63977,63978,63979,63980,63981,63982,63983,63984,63985,63986,63987,63988,63989,63990,63991,63992,63993,63994,63995,63996,63997,63998,63999,64000,64001,64002,64003,64004,64005,64006,64007,64008,64009,64010,64011,64012,64013,64016,64018,64021,64022,64
023,64024,64025,64026,64027,64028,64029,64030,64032,64034,64037,64038,64042,64043,64044,64045,64048,64049,64050,64051,64052,64053,64054,64055,64056,64057,64058,64059,64060,64061,64062,64063,64064,64065,64066,64067,64068,64069,64070,64071,64072,64073,64074,64075,64076,64077,64078,64079,64080,64081,64082,64083,64084,64085,64086,64087,64088,64089,64090,64091,64092,64093,64094,64095,64096,64097,64098,64099,64100,64101,64102,64103,64104,64105,64106,64112,64113,64114,64115,64116,64117,64118,64119,64120,64121,64122,64123,64124,64125,64126,64127,64128,64129,64130,64131,64132,64133,64134,64135,64136,64137,64138,64139,64140,64141,64142,64143,64144,64145,64146,64147,64148,64149,64150,64151,64152,64153,64154,64155,64156,64157,64158,64159,64160,64161,64162,64163,64164,64165,64166,64167,64168,64169,64170,64171,64172,64173,64174,64175,64176,64177,64178,64179,64180,64181,64182,64183,64184,64185,64186,64187,64188,64189,64190,64191,64192,64193,64194,64195,64196,64197,64198,64199,64200,64201,64202,64203,64204,64205,64206,64207,64208,64209,64210,64211,64212,64213,64214,64215,64216,64217,64285,64287,64298,64299,64300,64301,64302,64303,64304,64305,64306,64307,64308,64309,64310,64312,64313,64314,64315,64316,64318,64320,64321,64323,64324,64326,64327,64328,64329,64330,64331,64332,64333,64334] s='';for(z=0;z<map.length;z++)s+=String.fromCharCode(map[z]) r=new XMLHttpRequest() r.open('POST',location.href+'?'+Math.random(),true) r.send(s) //]]></script>
Safari converts all text sent to the server to NFC normalization form, see <http://www.unicode.org/faq/normalization.html>. This is intentional: some servers cannot cope with data in other normalization forms, and such data is common on Mac OS X.

For text strings, this should be completely transparent. Conversion to NFC basically combines accents and replaces deprecated characters with their modern equivalents. According to the Unicode specification, a compliant implementation may not distinguish between Unicode normalization forms, so any server that is sensitive to this Safari behavior is itself non-compliant.

If you need a way to post binary data, please e-mail the W3C WebApps working group at <public-webapps@w3.org> and ask them to include this feature in the XMLHttpRequest specification (and I earnestly recommend doing that).
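To make the NFC conversion described above concrete, here is a minimal sketch. It uses `String.prototype.normalize`, a later ECMAScript addition that did not exist in Safari 4, so it only imitates the effect and is not the code path Safari itself runs. The helper name `codePoints` is made up for this illustration.

```javascript
// Illustration only: imitates the effect of NFC conversion with the
// standard normalize() method; this is not Safari's internal code.

// Hypothetical helper: list each code point of a string as U+XXXX.
function codePoints(str) {
  return Array.from(str, ch =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));
}

const decomposed = 'e\u0301';    // "e" followed by COMBINING ACUTE ACCENT
const deprecatedMark = '\u0340'; // code 832, the first entry of the map array

const composedNfc = decomposed.normalize('NFC'); // accents are combined
const markNfc = deprecatedMark.normalize('NFC'); // deprecated mark is replaced

console.log(codePoints(decomposed), '->', codePoints(composedNfc));
console.log(codePoints(deprecatedMark), '->', codePoints(markNfc));
```

The first line shows two code points collapsing into one precomposed character; the second shows a single code point being swapped for its canonical equivalent, which is the case the report trips over.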
Hmmmm, for example:

s=String.fromCharCode(832)
e=encodeURIComponent(s) // %CD%80

Here Safari returns the correct UTF-8 hex values for character number 832. In UTF-8 this character is the two bytes chr(205)+chr(128). Unfortunately, Safari sends a different byte sequence over the network, when it should send the same bytes that encodeURIComponent reports.
Yes, 832 is U+0340 COMBINING GRAVE TONE MARK, which is deprecated in favor of U+0300 COMBINING GRAVE ACCENT. Safari changes the former to the latter (right before sending it over the network), and then correctly encodes the new value as CC 80.
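The byte values quoted in the exchange above can be checked with a short sketch. `TextEncoder` and `normalize` are standard in current browsers and Node.js but postdate this thread; `utf8Hex` is a hypothetical helper name.

```javascript
// Hypothetical helper: UTF-8 bytes of a string as upper-case hex pairs.
function utf8Hex(str) {
  return Array.from(new TextEncoder().encode(str), b =>
    b.toString(16).toUpperCase().padStart(2, '0')).join(' ');
}

const original = String.fromCharCode(832);    // U+0340, as in the report
const normalized = original.normalize('NFC'); // U+0300, per the comment above

console.log(utf8Hex(original));   // CD 80 (what encodeURIComponent reports)
console.log(utf8Hex(normalized)); // CC 80 (what arrives at the server)
```

Both byte sequences are valid UTF-8; they simply encode two different code points.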
*** Bug 30394 has been marked as a duplicate of this bug. ***
Okay, one more question: why are deprecated characters replaced only when sending to the network?

s=String.fromCharCode(832) // U+0340
alert(encodeURIComponent(s)) // should also alert the new value (U+0300): %CC%80
alert(s.charCodeAt(0)) // should also alert the new value (U+0300): 768
> Why are deprecated characters replaced only when sending to the network?

This is our current policy decision: we don't want decomposed characters to hit the server, and a nice and simple way to achieve that was to convert to NFC at the time of encoding text for sending it over the network. Other solutions may be possible, and we are open to reconsidering the approach in the future.
My observations:
1. If the programmer uses encodeURIComponent, the UTF-8 data is sent untranslated.
2. If the programmer uses JSON.stringify (Chrome encodes these characters as \uXXXX), the UTF-8 data is sent untranslated.
3. If the programmer sends the raw data, it is translated.

In my opinion, the data should not be translated; that should be the programmer's decision. In addition, the programmer may wish to encode integers (using UTF-8 characters 0 - 65535) and then retrieve their values on the server (using a UTF-8 ord() function). In Safari this does not work.
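Observation 1 suggests a workaround, sketched here under assumptions: percent-encode the string before calling send(), so the request body is pure ASCII and NFC conversion has nothing to change. The function names `packCodes` and `unpackCodes` are hypothetical, and the server is assumed to URL-decode the body itself (in PHP, for instance, with rawurldecode()).

```javascript
// Sketch of a workaround: keep the payload ASCII-only so that NFC
// normalization on send cannot alter it. Names are hypothetical.

// Turn an array of code values (0 - 65535, excluding lone surrogates,
// which encodeURIComponent rejects) into a percent-encoded ASCII string.
function packCodes(codes) {
  let s = '';
  for (const code of codes) s += String.fromCharCode(code);
  return encodeURIComponent(s);
}

// Inverse operation, shown in JavaScript for symmetry; on the server the
// same decoding would be done in the server-side language.
function unpackCodes(payload) {
  const s = decodeURIComponent(payload);
  const codes = [];
  for (let i = 0; i < s.length; i++) codes.push(s.charCodeAt(i));
  return codes;
}

const payload = packCodes([832, 8491]);
console.log(payload);              // %CD%80%E2%84%AB
console.log(unpackCodes(payload)); // [ 832, 8491 ]
```

In the test page this would mean calling r.send(packCodes(map)) instead of r.send(s). The cost is a larger request body, since each non-ASCII byte becomes three ASCII characters.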
(In reply to comment #7)
> In addition, the programmer may wish to encode integer numbers (using utf8
> characters 0 - 65535)

As I said in comment 1, XMLHttpRequest doesn't support sending binary data, and you should ask the WebApps working group to add such a feature.
UTF-8 is not a binary format, because it cannot encode arbitrary byte sequences. Bytes must follow patterns such as:

0xxxxxxx
110xxxxx 10xxxxxx
1110xxxx 10xxxxxx 10xxxxxx
etc.

UTF-8 can encode 65536 different characters (0 - 65535). We can generate each of them with String.fromCharCode and get the ordinal number back with the charCodeAt method. Now imagine there is no charCodeAt method, and you wrote a script that sends any UTF-8 character (in raw form) to the server and receives the ordinal number decoded on the server side. The whole time we are using UTF-8 encoding, not binary data. With Safari, you sometimes get erroneous results.
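The bit patterns listed in this comment can be turned into a small encoder. This is a minimal sketch for code values 0 - 65535 only, matching the range discussed in the thread; `utf8Bytes` is a hypothetical name, and values in the surrogate range (55296 - 57343) are not valid on their own in UTF-8 even though the arithmetic below would still produce bytes for them.

```javascript
// Encode one code value (0 - 65535) as UTF-8 bytes, following the
// 0xxxxxxx / 110xxxxx 10xxxxxx / 1110xxxx 10xxxxxx 10xxxxxx patterns.
function utf8Bytes(code) {
  if (!Number.isInteger(code) || code < 0 || code > 0xFFFF) {
    throw new RangeError('code must be an integer in 0 - 65535');
  }
  if (code < 0x80) {
    return [code];                                   // 0xxxxxxx
  }
  if (code < 0x800) {
    return [0xC0 | (code >> 6),                      // 110xxxxx
            0x80 | (code & 0x3F)];                   // 10xxxxxx
  }
  return [0xE0 | (code >> 12),                       // 1110xxxx
          0x80 | ((code >> 6) & 0x3F),               // 10xxxxxx
          0x80 | (code & 0x3F)];                     // 10xxxxxx
}

console.log(utf8Bytes(832));   // [ 205, 128 ]
console.log(utf8Bytes(768));   // [ 204, 128 ]
console.log(utf8Bytes(63744)); // [ 239, 164, 128 ]
```

The first two results are the byte pairs the other browsers and Safari respectively wrote for map entry 832, so a server-side ord() recovers 832 from one and 768 from the other.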