Bug 30387 - Safari 4.0.3 (531.9.1) XMLHttpRequest sends incorrect POST data or has a wrong Unicode map
Status: RESOLVED WONTFIX
Alias: None
Product: WebKit
Classification: Unclassified
Component: XML
Version: 528+ (Nightly build)
Hardware: All All
Importance: P2 Critical
Assignee: Nobody
URL:
Keywords:
Duplicates: 30394
Depends on:
Blocks:
 
Reported: 2009-10-15 07:53 PDT by bugzilla33
Modified: 2009-10-20 23:16 PDT

Attachments
zipped attachements (10.20 KB, application/octet-stream)
2009-10-15 07:53 PDT, bugzilla33

Description bugzilla33 2009-10-15 07:53:53 PDT
Created attachment 41225
zipped attachements

1. Safari 4.0.3 (531.9.1) XMLHttpRequest sends incorrect POST data or has a wrong Unicode map.

2. Run test.php5 to generate the binary files and store them on disk.

3. Results

   a) bin_explorer_8.0.7600.16385.dat
      bin_firefox_3.5.3.dat
      bin_konqueror_4.3.2.dat
      bin_opera_10.00.1750.dat

      are byte-for-byte identical.

   b) bin_safari_4.0.3.531.9.1.dat
      bin_chrome_3.0.195.27.dat

      differ from the files above (wrong bytes)!
      Please read about how UTF-8 encodes characters as bytes.

4. Summary

   Server-side languages receive incorrect data from Safari and Chrome.
   Only the characters in the JavaScript map array (test.php5) are encoded incorrectly.

   Explorer, Opera, Konqueror and Firefox send correct data.


--- code ---


<?php

 if($_SERVER['QUERY_STRING']){
  file_put_contents('bin.dat',file_get_contents('php://input'));
 }

?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<meta http-equiv="content-type" content="text/html;charset=utf-8" />
<script type="text/javascript">//<![CDATA[

 map=[832,833,835,836,884,894,903,2392,2393,2394,2395,2396,2397,2398,2399,2524,2525,2527,2611,2614,2649,2650,2651,2654,2908,2909,3907,3917,3922,3927,3932,3945,3955,3957,3958,3960,3969,3987,3997,4002,4007,4012,4025,8049,8051,8053,8055,8057,8059,8061,8123,8126,8137,8139,8147,8155,8163,8171,8174,8175,8185,8187,8189,8192,8193,8486,8490,8491,9001,9002,10972,63744,63745,63746,63747,63748,63749,63750,63751,63752,63753,63754,63755,63756,63757,63758,63759,63760,63761,63762,63763,63764,63765,63766,63767,63768,63769,63770,63771,63772,63773,63774,63775,63776,63777,63778,63779,63780,63781,63782,63783,63784,63785,63786,63787,63788,63789,63790,63791,63792,63793,63794,63795,63796,63797,63798,63799,63800,63801,63802,63803,63804,63805,63806,63807,63808,63809,63810,63811,63812,63813,63814,63815,63816,63817,63818,63819,63820,63821,63822,63823,63824,63825,63826,63827,63828,63829,63830,63831,63832,63833,63834,63835,63836,63837,63838,63839,63840,63841,63842,63843,63844,63845,63846,63847,63848,63849,63850,63851,63852,63853,63854,63855,63856,63857,63858,63859,63860,63861,63862,63863,63864,63865,63866,63867,63868,63869,63870,63871,63872,63873,63874,63875,63876,63877,63878,63879,63880,63881,63882,63883,63884,63885,63886,63887,63888,63889,63890,63891,63892,63893,63894,63895,63896,63897,63898,63899,63900,63901,63902,63903,63904,63905,63906,63907,63908,63909,63910,63911,63912,63913,63914,63915,63916,63917,63918,63919,63920,63921,63922,63923,63924,63925,63926,63927,63928,63929,63930,63931,63932,63933,63934,63935,63936,63937,63938,63939,63940,63941,63942,63943,63944,63945,63946,63947,63948,63949,63950,63951,63952,63953,63954,63955,63956,63957,63958,63959,63960,63961,63962,63963,63964,63965,63966,63967,63968,63969,63970,63971,63972,63973,63974,63975,63976,63977,63978,63979,63980,63981,63982,63983,63984,63985,63986,63987,63988,63989,63990,63991,63992,63993,63994,63995,63996,63997,63998,63999,64000,64001,64002,64003,64004,64005,64006,64007,64008,64009,64010,64011,64012,64013,64016,64018,64021,64022,64023,64024,64025,64026,64027,64028,64029,64030,64032,64034,64037,64038,64042,64043,64044,64045,64048,64049,64050,64051,64052,64053,64054,64055,64056,64057,64058,64059,64060,64061,64062,64063,64064,64065,64066,64067,64068,64069,64070,64071,64072,64073,64074,64075,64076,64077,64078,64079,64080,64081,64082,64083,64084,64085,64086,64087,64088,64089,64090,64091,64092,64093,64094,64095,64096,64097,64098,64099,64100,64101,64102,64103,64104,64105,64106,64112,64113,64114,64115,64116,64117,64118,64119,64120,64121,64122,64123,64124,64125,64126,64127,64128,64129,64130,64131,64132,64133,64134,64135,64136,64137,64138,64139,64140,64141,64142,64143,64144,64145,64146,64147,64148,64149,64150,64151,64152,64153,64154,64155,64156,64157,64158,64159,64160,64161,64162,64163,64164,64165,64166,64167,64168,64169,64170,64171,64172,64173,64174,64175,64176,64177,64178,64179,64180,64181,64182,64183,64184,64185,64186,64187,64188,64189,64190,64191,64192,64193,64194,64195,64196,64197,64198,64199,64200,64201,64202,64203,64204,64205,64206,64207,64208,64209,64210,64211,64212,64213,64214,64215,64216,64217,64285,64287,64298,64299,64300,64301,64302,64303,64304,64305,64306,64307,64308,64309,64310,64312,64313,64314,64315,64316,64318,64320,64321,64323,64324,64326,64327,64328,64329,64330,64331,64332,64333,64334]

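 // Build a string with one character per code in map.
 var s='';for(var z=0;z<map.length;z++)s+=String.fromCharCode(map[z]);

 // POST the raw string back to this script; the random query string makes the PHP part store the body.
 var r=new XMLHttpRequest();
 r.open('POST',location.href+'?'+Math.random(),true);
 r.send(s);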

//]]></script>
Comment 1 Alexey Proskuryakov 2009-10-15 14:35:23 PDT
Safari converts all text sent to server to NFC normalization form, see <http://www.unicode.org/faq/normalization.html>. This is intentional - some servers cannot cope with data in other normalization forms, and such data is common on Mac OS X.

For text strings, this should be completely transparent - conversion to NFC basically combines accents and replaces deprecated characters with their modern equivalents. According to the Unicode specification, a compliant implementation can make no difference between Unicode normalization forms, so any server that is sensitive to this Safari behavior is itself non-compliant.

If you need a way to post binary data, please e-mail the W3C WebApps working group at <public-webapps@w3.org> and ask them to include this feature in the XMLHttpRequest specification (I earnestly recommend doing so).
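
As a rough illustration of what "combines accents" means, the sketch below uses String.prototype.normalize. That method is a later JavaScript API that Safari 4 did not have, so here it only stands in for the conversion Safari performs internally; the variable and helper names are made up for this example.

```javascript
// Decomposed form: "e" followed by U+0301 COMBINING ACUTE ACCENT.
var decomposed = 'e\u0301';

// NFC combines the pair into the single precomposed character U+00E9.
var composed = decomposed.normalize('NFC');

// Hypothetical helper: list the UTF-16 code units of a string as hex.
function codeUnits(str) {
  var out = [];
  for (var i = 0; i < str.length; i++) {
    out.push(str.charCodeAt(i).toString(16));
  }
  return out;
}

console.log(codeUnits(decomposed).join(' '), '->', codeUnits(composed).join(' '));
```

Both strings render as the same letter; only the underlying code units differ.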
Comment 2 bugzilla33 2009-10-15 15:39:41 PDT
Hmmmm, for example:

s=String.fromCharCode(832)
e=encodeURIComponent(s) // %CD%80: Safari returns the correct hex values for UTF-8 character number 832.

This character should be sent as the two bytes chr(205)+chr(128) (UTF-8 encoding).
Unfortunately, Safari sends a different byte combination, although the bytes should match the hex values from encodeURIComponent.
Comment 3 Alexey Proskuryakov 2009-10-15 16:01:50 PDT
Yes, 832 is U+0340 COMBINING GRAVE TONE MARK, which is deprecated in favor of U+0300 COMBINING GRAVE ACCENT. Safari changes the former to the latter (right before sending it over the network), and then correctly encodes the new value as CC80.
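
The two byte sequences discussed here can be reproduced with a small sketch. It assumes a modern engine that provides TextEncoder and String.prototype.normalize (neither existed in Safari 4), and utf8Hex is a hypothetical helper written for this example.

```javascript
// Hypothetical helper: UTF-8 bytes of a string as uppercase hex, e.g. "CD 80".
function utf8Hex(str) {
  var bytes = new TextEncoder().encode(str);
  return Array.from(bytes, function (b) {
    return b.toString(16).toUpperCase().padStart(2, '0');
  }).join(' ');
}

var original = String.fromCharCode(832);     // U+0340 COMBINING GRAVE TONE MARK
var normalized = original.normalize('NFC');  // U+0300 COMBINING GRAVE ACCENT

var before = utf8Hex(original);   // the bytes the reporter expected
var after = utf8Hex(normalized);  // the bytes after NFC conversion

console.log(before, '->', after);
```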
Comment 4 Alexey Proskuryakov 2009-10-15 19:14:14 PDT
*** Bug 30394 has been marked as a duplicate of this bug. ***
Comment 5 bugzilla33 2009-10-15 23:30:51 PDT
Okay, one more question: 

Why are deprecated characters replaced only when sending to the network?

s=String.fromCharCode(832) // U+0340

alert(encodeURIComponent(s)) // should also alert the new value (U+0300): %CC%80

alert(s.charCodeAt(0)) // should also alert the new value (U+0300): 768
Comment 6 Alexey Proskuryakov 2009-10-16 14:28:58 PDT
> Why are deprecated characters replaced only when sending to the network?

This is our current policy decision - we don't want decomposed characters to hit the server, and a nice and simple way to achieve that was to convert to NFC at the time of encoding text for sending it over the network.

Other solutions may be possible, and we are open to re-considering the approach in the future.
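
A minimal sketch of the approach described above, under the assumption that normalization is applied to a copy of the text at encoding time. encodeForNetwork is a hypothetical name, not a WebKit function, and TextEncoder and normalize are modern APIs used only for illustration.

```javascript
// Hypothetical helper: normalize a copy of the text only when it is
// encoded for the wire; the caller's string is never modified.
function encodeForNetwork(text) {
  return new TextEncoder().encode(text.normalize('NFC'));
}

var s = String.fromCharCode(832);   // U+0340, as in the test case
var wire = encodeForNetwork(s);     // bytes that would go over the network

// In-page APIs still see the original character.
var inPageCode = s.charCodeAt(0);
var inPageEscaped = encodeURIComponent(s);
```

Under this assumption, charCodeAt and encodeURIComponent keep reporting U+0340 in the page, while the outgoing bytes correspond to U+0300.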
Comment 7 bugzilla33 2009-10-20 07:26:54 PDT
My observations: 
1. If the programmer uses encodeURIComponent, the UTF-8 data is sent untranslated.
2. If the programmer uses JSON.stringify (Chrome encodes the characters as \uXXXX), the UTF-8 data is sent untranslated.
3. If the programmer sends the raw data, it is translated.

In my opinion, the data should not be translated. This should be a matter for the programmer. 

In addition, the programmer may wish to encode integers (as UTF-8 characters 0 - 65535) and then retrieve their values on the server (using a UTF-8 ord() function). With Safari, this does not succeed...
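
Observations 1 and 2 are consistent with the escaped payload being plain ASCII, which NFC conversion leaves unchanged. A rough sketch of that idea follows; escapeNonAscii is a hypothetical helper for this example, and normalize is a modern API standing in for Safari's conversion.

```javascript
// Hypothetical helper: replace every non-ASCII UTF-16 code unit with \uXXXX
// so that the payload contains ASCII only.
function escapeNonAscii(str) {
  var out = '';
  for (var i = 0; i < str.length; i++) {
    var code = str.charCodeAt(i);
    out += code < 128
      ? str.charAt(i)
      : '\\u' + ('0000' + code.toString(16)).slice(-4);
  }
  return out;
}

var payload = escapeNonAscii(JSON.stringify([String.fromCharCode(832)]));

// An ASCII-only payload is identical before and after NFC conversion.
var survivesNfc = payload.normalize('NFC') === payload;

// The receiver recovers the original code by parsing the JSON text.
var recovered = JSON.parse(payload)[0].charCodeAt(0);
```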
Comment 8 Alexey Proskuryakov 2009-10-20 12:44:26 PDT
(In reply to comment #7)
> In addition, the programmer may wish to encode integer numbers (using utf8
> characters 0 - 65535)

As I said in comment 1, XMLHttpRequest doesn't support sending binary data, and you should ask the WebApps working group to add such a feature.
Comment 9 bugzilla33 2009-10-20 23:16:24 PDT
UTF-8 is not a binary format, because it cannot encode an arbitrary byte sequence.
The bytes must follow these patterns:
0xxxxxxx
110xxxxx 10xxxxxx
1110xxxx 10xxxxxx 10xxxxxx
etc.

UTF-8 can encode 65536 different characters (0 - 65535).
We can generate each of them with String.fromCharCode and recover its ordinal number with the charCodeAt method.

Now imagine there is no charCodeAt method, and you wrote a script that sends any UTF-8 character (raw format) to the server and receives back the ordinal number decoded on the server side.
We use UTF-8 encoding the whole time, not binary data.
With Safari, you sometimes get erroneous results.
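
The byte patterns listed above can be written out as a short sketch. utf8Bytes is a hypothetical helper that covers only codes 0 - 65535 (one to three bytes) and does not treat the surrogate range 0xD800 - 0xDFFF specially.

```javascript
// Hypothetical helper: encode one code in the range 0 - 65535 as UTF-8 bytes.
function utf8Bytes(code) {
  if (code < 0x80) {
    // 0xxxxxxx
    return [code];
  }
  if (code < 0x800) {
    // 110xxxxx 10xxxxxx
    return [0xC0 | (code >> 6), 0x80 | (code & 0x3F)];
  }
  // 1110xxxx 10xxxxxx 10xxxxxx
  return [
    0xE0 | (code >> 12),
    0x80 | ((code >> 6) & 0x3F),
    0x80 | (code & 0x3F)
  ];
}

var bytes832 = utf8Bytes(832);  // U+0340, the character from the test case
var bytes768 = utf8Bytes(768);  // U+0300, its NFC replacement
```

The two results differ only in the first byte (0xCD versus 0xCC), which matches the difference between the expected and the received data.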