Section 7.1 of ECMA-262 says:
"The format control characters can occur anywhere in the source text of an ECMAScript program. These
characters are removed from the source text before applying the lexical grammar. <...> these characters
are removed before processing string and regular expression literals"
Besides Safari, this doesn't appear to work in Firefox, Opera or MacIE.
Created an attachment (id=3859) [details]
test case for formatting characters in literals
Created an attachment (id=3860) [details]
test case for formatting characters outside literals
Created an attachment (id=4179) [details]
Darin's patch extracted from 4885
This patch strips Cf chars from the input stream. It was extracted from a patch
done by Darin, from bug 4885.
Note that it modifies a testcase.
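For readers who want to see what "stripping Cf chars from the input stream" amounts to, here is a minimal sketch in Python. It is not the patch itself; the function name `strip_format_chars` and the sample string are made up for illustration. It only assumes that "Cf" means the Unicode general category of format characters, as in the spec text quoted above.

```python
import unicodedata


def strip_format_chars(source: str) -> str:
    """Drop every code point whose Unicode general category is Cf.

    Hypothetical helper: illustrates the behavior described in section 7.1,
    not the actual lexer change in the patch.
    """
    return "".join(ch for ch in source if unicodedata.category(ch) != "Cf")


# U+00AD (soft hyphen) and U+200E (left-to-right mark) are both Cf,
# so both disappear, including the one inside the string literal.
sample = 'var s = "co\u00adop";\u200e'
stripped = strip_format_chars(sample)
```

Note that the soft hyphen inside the string literal is removed as well, which is exactly the behavior the compatibility concern in the following comments is about.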
Does this work in any browser? It may be a compatibility problem if we diverge on this. I can imagine
people wanting to use soft hyphens in JS strings.
I don't think it's correct in the other browsers either. I agree that it could cause a problem. Wonder what
we should do.
(From update of attachment 4179 [details])
I think the patch is fine, but I am worried about Maciej's comment now. How can
we decide whether to do this or not?
I have looked in Mozilla's bugzilla, and also googled for relevant discussions, but found nothing. Perhaps
I didn't look closely enough, as it's hard to believe that such a feature has never been tested in Firefox (see
Besides the soft hyphen, I would expect various joiners/non-joiners and bidi overrides to be used in string
literals sometimes, but I do not have any evidence.
We've tested Firefox, and it does *not* remove soft hyphens.
I'd love to fix this if we can be sure it won't cause compatibility problems.
(In reply to comment #8)
Only to underline that I'm surprised to have found nothing in Mozilla bugzilla.
> We've tested Firefox, and it does *not* remove soft hyphens.
Right, that's what I wrote in the description. Later, I have tested WinIE, and it also doesn't remove Cf
characters, just like the other browsers.
> I'd love to fix this if we can be sure it won't cause compatibility problems.
Well, I was only splitting bug 4885, and don't really have an opinion about this part (unlike the other
ones :) )
(From update of attachment 4179 [details])
I think we should probably land this even though we are uneasy about the soft
If Maciej disagrees, he can review- it.
John Sullivan landed this.
sullivan 05/10/24 14:22:21
Modified: . ChangeLog
This fix has been rolled out in bug 10183. Not re-opening this bug, since the problem was not with the implementation, but rather with site compatibility.
We won't fix this because of bad effects on site compatibility.
*** Bug 16694 has been marked as a duplicate of this bug. ***
In fact, it looks like Firefox only preserves U+00AD (soft hyphen), and other Cf characters are removed as required by the spec. We probably want to match this, if only to fix ecma_3/Unicode/uc-001.js, which verifies that \u200E is removed.
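The behavior described in the comment above (keep U+00AD, remove every other Cf character) can be sketched in Python as follows. This is only an illustration of the rule as stated here, not Firefox's or WebKit's code; the names `SOFT_HYPHEN` and `strip_format_chars_keep_shy` are hypothetical.

```python
import unicodedata

SOFT_HYPHEN = "\u00ad"  # the one Cf character the comment says is preserved


def strip_format_chars_keep_shy(source: str) -> str:
    """Remove Cf characters except the soft hyphen (hypothetical helper)."""
    return "".join(
        ch
        for ch in source
        if ch == SOFT_HYPHEN or unicodedata.category(ch) != "Cf"
    )


# U+200E (left-to-right mark) is removed, U+00AD survives.
result = strip_format_chars_keep_shy("a\u00adb\u200ec")
```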
OK. Will be very easy to fix.
Created an attachment (id=20731) [details]
first cut, not tested yet
Created an attachment (id=20796) [details]
patch, tested now -- needs regression test cases and ChangeLog
Here's a new patch, ready to go, except needs change log and some regression tests.
Created an attachment (id=20970) [details]
(From update of attachment 20970 [details])
r=me. Looks like we are testing for the soft hyphen not being removed in a very indirect way though - perhaps it would help to have a dedicated test case for this. Or maybe the existing test and the long comment that you added are enough.
Now that we are not removing BOM characters at decoding time, this is causing compatibility issues, see <rdar://problem/5934376>. So, I decided to make an additional test case, and land this.
But it turns out that Firefox is changing (almost) as we speak - see <https://bugzilla.mozilla.org/show_bug.cgi?id=274152> and <https://bugzilla.mozilla.org/show_bug.cgi?id=368516>.
I'm going to investigate this further.
Created an attachment (id=21123) [details]
only remove the BOM
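As a rough picture of the narrower approach named in this attachment, the sketch below removes only U+FEFF and leaves all other Cf characters alone. It is written in Python for illustration; `strip_bom` is a hypothetical name, and removing every occurrence (rather than only a leading one) is an assumption, since the thread does not say which the patch does.

```python
BOM = "\ufeff"  # U+FEFF, byte order mark / zero width no-break space


def strip_bom(source: str) -> str:
    """Remove every U+FEFF from the text (assumed scope, see note above)."""
    return source.replace(BOM, "")


# The soft hyphen and the left-to-right mark are left untouched.
cleaned = strip_bom('\ufeffvar x = "a\u00adb\u200e";')
```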
(From update of attachment 21123 [details])
You need to use &lt; in the description string instead of the less-than symbol.
*** Bug 19070 has been marked as a duplicate of this bug. ***