RESOLVED FIXED 4931
Unicode format characters (Cf) other than soft hyphen should be removed from JavaScript source
https://bugs.webkit.org/show_bug.cgi?id=4931
Summary Unicode format characters (Cf) other than soft hyphen should be removed from ...
Alexey Proskuryakov
Reported 2005-09-11 02:16:08 PDT
Section 7.1 of ECMA-262 says: "The format control characters can occur anywhere in the source text of an ECMAScript program. These characters are removed from the source text before applying the lexical grammar. <...> these characters are removed before processing string and regular expression literals" Besides Safari, this doesn't appear to work in Firefox, Opera or MacIE.
Attachments
test case for formatting characters in literals (759 bytes, text/html)
2005-09-11 02:16 PDT, Alexey Proskuryakov
no flags
test case for formatting characters outside literals (685 bytes, text/html)
2005-09-11 02:17 PDT, Alexey Proskuryakov
no flags
Darin's patch extracted from 4885 (4.61 KB, patch)
2005-10-03 09:35 PDT, Kimmo Kinnunen
no flags
first cut, not tested yet (2.97 KB, patch)
2008-04-21 10:34 PDT, Darin Adler
no flags
patch, tested now -- needs regression test cases and ChangeLog (2.98 KB, patch)
2008-04-24 12:30 PDT, Darin Adler
no flags
patch (43.94 KB, patch)
2008-05-05 10:08 PDT, Darin Adler
no flags
only remove the BOM (7.54 KB, patch)
2008-05-14 04:21 PDT, Alexey Proskuryakov
darin: review+
Alexey Proskuryakov
Comment 1 2005-09-11 02:16:58 PDT
Created attachment 3859 [details] test case for formatting characters in literals
Alexey Proskuryakov
Comment 2 2005-09-11 02:17:18 PDT
Created attachment 3860 [details] test case for formatting characters outside literals
Kimmo Kinnunen
Comment 3 2005-10-03 09:35:59 PDT
Created attachment 4179 [details] Darin's patch extracted from 4885 This patch strips Cf chars from the input stream. It was extracted from a patch done by Daring, from bug 4885. Note that it modifies a testcase.
Maciej Stachowiak
Comment 4 2005-10-04 20:28:27 PDT
Does this work in any browser? It may be a compatibility problem if we diverge on this. I can imagine people wanting to use soft hyphens in JS strings.
Darin Adler
Comment 5 2005-10-05 10:04:17 PDT
I don't think it's correct in the other browsers either. I agree that it could cause a problem. Wonder what we should do.
Darin Adler
Comment 6 2005-10-06 07:18:57 PDT
Comment on attachment 4179 [details] Darin's patch extracted from 4885 I think the patch is fine, but I am worried about Maciej's comment now. How can we decide whether to do this or not?
Alexey Proskuryakov
Comment 7 2005-10-06 11:47:12 PDT
I have looked in Mozilla's bugzilla, and also googled for relevant discussions, but found nothing. Perhaps, I didn't look close enough, as it's hard to expect that such a feature has never been tested in Firefox (see http://www.mozilla.org/js/language/js20/formal/stages.html). Besides the soft hyphen, I would expect various joiners/non-joiners and bidi overrides to be used in string literals sometimes, but I do not have any evidence.
Darin Adler
Comment 8 2005-10-06 11:52:40 PDT
Alexey, I'm not sure what relevance the JavaScript 2.0 specification has to this discussion. We've tested Firefox, and it does *not* remove soft hyphens. I'd love to fix this if we can be sure it won't cause compatibility problems.
Alexey Proskuryakov
Comment 9 2005-10-06 12:11:58 PDT
(In reply to comment #8) > Alexey, I'm not sure what relevance the JavaScript 2.0 specification has to this discussion. Only to underline that I'm surprised to have found nothing in Mozilla bugzilla. > We've tested Firefox, and it does *not* remove soft hyphens. Right, that's what I wrote in the description. Later, I have tested WinIE, and it also doesn't remove Cf characters, just the other browsers. > I'd love to fix this if we can be sure it won't cause compatibility problems. Well, I was only splitting bug 4885, and don't really have an opinion about this part (unlike the other ones :) )
Darin Adler
Comment 10 2005-10-11 08:41:27 PDT
Comment on attachment 4179 [details] Darin's patch extracted from 4885 I think we should probably land this even thought we are uneasy about the soft hyphen thing. If Maciej disagrees, he can review- it.
Timothy Hatcher
Comment 11 2005-10-24 14:27:54 PDT
John Sullivan landed this. sullivan 05/10/24 14:22:21 Modified: . ChangeLog kjs lexer.cpp tests/mozilla/ecma/Array 15.4.5.1-1.js
Alexey Proskuryakov
Comment 12 2006-09-23 11:11:40 PDT
This fix has been rolled out in bug 10183. Not re-opening this bug, since the problem was not with the implementation, but rather with site compatibility.
Darin Adler
Comment 13 2006-09-25 08:55:46 PDT
We won't fix this because of bad effects on site compatibility.
Darin Adler
Comment 14 2008-01-01 21:57:50 PST
*** Bug 16694 has been marked as a duplicate of this bug. ***
Alexey Proskuryakov
Comment 15 2008-04-21 09:11:15 PDT
In fact, it looks like Firefox only preserves U+00AD (soft hyphen), and other Cf characters are removed as required by the spec. We probably want to match this, if only to fix ecma_3/Unicode/uc-001.js, which verifies that \u200E is removed.
Darin Adler
Comment 16 2008-04-21 10:16:08 PDT
OK. Will be very easy to fix.
Darin Adler
Comment 17 2008-04-21 10:34:36 PDT
Created attachment 20731 [details] first cut, not tested yet
Darin Adler
Comment 18 2008-04-24 12:30:19 PDT
Created attachment 20796 [details] patch, tested now -- needs regression test cases and ChangeLog Here's a new patch, ready to go, except needs change log and some regression tests.
Darin Adler
Comment 19 2008-05-05 10:08:56 PDT
Alexey Proskuryakov
Comment 20 2008-05-05 10:26:19 PDT
Comment on attachment 20970 [details] patch r=me. Looks like we are testing for the soft hyphen not being removed in a very indirect way though - perhaps it would help to have a dedicated test case fir this. Or maybe the existing test and the long comment that you added are enough.
Alexey Proskuryakov
Comment 21 2008-05-14 02:54:26 PDT
Now that we are not removing BOM characters at decoding time, this is causing compatibility issues, see <rdar://problem/5934376>. So, I decided to make an additional test case, and land this. But turns out that Firefox is changing (almost) as we speak - see <https://bugzilla.mozilla.org/show_bug.cgi?id=274152> and <https://bugzilla.mozilla.org/show_bug.cgi?id=368516>. I'm going to investigate this further.
Alexey Proskuryakov
Comment 22 2008-05-14 04:21:44 PDT
Created attachment 21123 [details] only remove the BOM
Darin Adler
Comment 23 2008-05-14 09:14:47 PDT
Comment on attachment 21123 [details] only remove the BOM You need to use &lt; in the description string instead of the less-than symbol. r=me
Alexey Proskuryakov
Comment 24 2008-05-14 09:35:56 PDT
Mark Rowe (bdash)
Comment 25 2008-05-14 22:29:10 PDT
*** Bug 19070 has been marked as a duplicate of this bug. ***
Note You need to log in before you can comment on or make changes to this bug.