Section 7.1 of ECMA-262 says: "The format control characters can occur anywhere in the source text of an ECMAScript program. These characters are removed from the source text before applying the lexical grammar. <...> these characters are removed before processing string and regular expression literals" Besides Safari, this doesn't appear to work in Firefox, Opera or MacIE.
Created attachment 3859 [details] test case for formatting characters in literals
Created attachment 3860 [details] test case for formatting characters outside literals
Created attachment 4179 [details] Darin's patch extracted from 4885 This patch strips Cf chars from the input stream. It was extracted from a patch done by Daring, from bug 4885. Note that it modifies a testcase.
Does this work in any browser? It may be a compatibility problem if we diverge on this. I can imagine people wanting to use soft hyphens in JS strings.
I don't think it's correct in the other browsers either. I agree that it could cause a problem. Wonder what we should do.
Comment on attachment 4179 [details] Darin's patch extracted from 4885 I think the patch is fine, but I am worried about Maciej's comment now. How can we decide whether to do this or not?
I have looked in Mozilla's bugzilla, and also googled for relevant discussions, but found nothing. Perhaps, I didn't look close enough, as it's hard to expect that such a feature has never been tested in Firefox (see http://www.mozilla.org/js/language/js20/formal/stages.html). Besides the soft hyphen, I would expect various joiners/non-joiners and bidi overrides to be used in string literals sometimes, but I do not have any evidence.
Alexey, I'm not sure what relevance the JavaScript 2.0 specification has to this discussion. We've tested Firefox, and it does *not* remove soft hyphens. I'd love to fix this if we can be sure it won't cause compatibility problems.
(In reply to comment #8) > Alexey, I'm not sure what relevance the JavaScript 2.0 specification has to this discussion. Only to underline that I'm surprised to have found nothing in Mozilla bugzilla. > We've tested Firefox, and it does *not* remove soft hyphens. Right, that's what I wrote in the description. Later, I have tested WinIE, and it also doesn't remove Cf characters, just the other browsers. > I'd love to fix this if we can be sure it won't cause compatibility problems. Well, I was only splitting bug 4885, and don't really have an opinion about this part (unlike the other ones :) )
Comment on attachment 4179 [details] Darin's patch extracted from 4885 I think we should probably land this even thought we are uneasy about the soft hyphen thing. If Maciej disagrees, he can review- it.
John Sullivan landed this. sullivan 05/10/24 14:22:21 Modified: . ChangeLog kjs lexer.cpp tests/mozilla/ecma/Array 15.4.5.1-1.js
This fix has been rolled out in bug 10183. Not re-opening this bug, since the problem was not with the implementation, but rather with site compatibility.
We won't fix this because of bad effects on site compatibility.
*** Bug 16694 has been marked as a duplicate of this bug. ***
In fact, it looks like Firefox only preserves U+00AD (soft hyphen), and other Cf characters are removed as required by the spec. We probably want to match this, if only to fix ecma_3/Unicode/uc-001.js, which verifies that \u200E is removed.
OK. Will be very easy to fix.
Created attachment 20731 [details] first cut, not tested yet
Created attachment 20796 [details] patch, tested now -- needs regression test cases and ChangeLog Here's a new patch, ready to go, except needs change log and some regression tests.
Created attachment 20970 [details] patch
Comment on attachment 20970 [details] patch r=me. Looks like we are testing for the soft hyphen not being removed in a very indirect way though - perhaps it would help to have a dedicated test case fir this. Or maybe the existing test and the long comment that you added are enough.
Now that we are not removing BOM characters at decoding time, this is causing compatibility issues, see <rdar://problem/5934376>. So, I decided to make an additional test case, and land this. But turns out that Firefox is changing (almost) as we speak - see <https://bugzilla.mozilla.org/show_bug.cgi?id=274152> and <https://bugzilla.mozilla.org/show_bug.cgi?id=368516>. I'm going to investigate this further.
Created attachment 21123 [details] only remove the BOM
Comment on attachment 21123 [details] only remove the BOM You need to use < in the description string instead of the less-than symbol. r=me
Committed <http://trac.webkit.org/changeset/33443>.
*** Bug 19070 has been marked as a duplicate of this bug. ***