|Product:||WebKit||Reporter:||Alexey Proskuryakov <ap>|
|Severity:||Normal||CC:||darin, eric, erik, mjs|
|OS:||OS X 10.4|
|Bug Depends on:|
Description Alexey Proskuryakov 2005-09-11 02:16:08 PDT
Section 7.1 of ECMA-262 says: "The format control characters can occur anywhere in the source text of an ECMAScript program. These characters are removed from the source text before applying the lexical grammar. <...> these characters are removed before processing string and regular expression literals" Besides Safari, this doesn't appear to work in Firefox, Opera or MacIE.
Comment 1 Alexey Proskuryakov 2005-09-11 02:16:58 PDT
Created attachment 3859 [details] test case for formatting characters in literals
Comment 2 Alexey Proskuryakov 2005-09-11 02:17:18 PDT
Created attachment 3860 [details] test case for formatting characters outside literals
Comment 3 Kimmo Kinnunen 2005-10-03 09:35:59 PDT
Created attachment 4179 [details] Darin's patch extracted from 4885 This patch strips Cf chars from the input stream. It was extracted from a patch done by Daring, from bug 4885. Note that it modifies a testcase.
Comment 4 Maciej Stachowiak 2005-10-04 20:28:27 PDT
Does this work in any browser? It may be a compatibility problem if we diverge on this. I can imagine people wanting to use soft hyphens in JS strings.
Comment 5 Darin Adler 2005-10-05 10:04:17 PDT
I don't think it's correct in the other browsers either. I agree that it could cause a problem. Wonder what we should do.
Comment 6 Darin Adler 2005-10-06 07:18:57 PDT
Comment on attachment 4179 [details] Darin's patch extracted from 4885 I think the patch is fine, but I am worried about Maciej's comment now. How can we decide whether to do this or not?
Comment 7 Alexey Proskuryakov 2005-10-06 11:47:12 PDT
I have looked in Mozilla's bugzilla, and also googled for relevant discussions, but found nothing. Perhaps, I didn't look close enough, as it's hard to expect that such a feature has never been tested in Firefox (see http://www.mozilla.org/js/language/js20/formal/stages.html). Besides the soft hyphen, I would expect various joiners/non-joiners and bidi overrides to be used in string literals sometimes, but I do not have any evidence.
Comment 8 Darin Adler 2005-10-06 11:52:40 PDT
Comment 9 Alexey Proskuryakov 2005-10-06 12:11:58 PDT
Comment 10 Darin Adler 2005-10-11 08:41:27 PDT
Comment on attachment 4179 [details] Darin's patch extracted from 4885 I think we should probably land this even thought we are uneasy about the soft hyphen thing. If Maciej disagrees, he can review- it.
Comment 11 Timothy Hatcher 2005-10-24 14:27:54 PDT
John Sullivan landed this. sullivan 05/10/24 14:22:21 Modified: . ChangeLog kjs lexer.cpp tests/mozilla/ecma/Array 126.96.36.199-1.js
Comment 12 Alexey Proskuryakov 2006-09-23 11:11:40 PDT
This fix has been rolled out in bug 10183. Not re-opening this bug, since the problem was not with the implementation, but rather with site compatibility.
Comment 13 Darin Adler 2006-09-25 08:55:46 PDT
We won't fix this because of bad effects on site compatibility.
Comment 14 Darin Adler 2008-01-01 21:57:50 PST
*** Bug 16694 has been marked as a duplicate of this bug. ***
Comment 15 Alexey Proskuryakov 2008-04-21 09:11:15 PDT
In fact, it looks like Firefox only preserves U+00AD (soft hyphen), and other Cf characters are removed as required by the spec. We probably want to match this, if only to fix ecma_3/Unicode/uc-001.js, which verifies that \u200E is removed.
Comment 16 Darin Adler 2008-04-21 10:16:08 PDT
OK. Will be very easy to fix.
Comment 17 Darin Adler 2008-04-21 10:34:36 PDT
Created attachment 20731 [details] first cut, not tested yet
Comment 18 Darin Adler 2008-04-24 12:30:19 PDT
Created attachment 20796 [details] patch, tested now -- needs regression test cases and ChangeLog Here's a new patch, ready to go, except needs change log and some regression tests.
Comment 20 Alexey Proskuryakov 2008-05-05 10:26:19 PDT
Comment on attachment 20970 [details] patch r=me. Looks like we are testing for the soft hyphen not being removed in a very indirect way though - perhaps it would help to have a dedicated test case fir this. Or maybe the existing test and the long comment that you added are enough.
Comment 21 Alexey Proskuryakov 2008-05-14 02:54:26 PDT
Now that we are not removing BOM characters at decoding time, this is causing compatibility issues, see <rdar://problem/5934376>. So, I decided to make an additional test case, and land this. But turns out that Firefox is changing (almost) as we speak - see <https://bugzilla.mozilla.org/show_bug.cgi?id=274152> and <https://bugzilla.mozilla.org/show_bug.cgi?id=368516>. I'm going to investigate this further.
Comment 22 Alexey Proskuryakov 2008-05-14 04:21:44 PDT
Created attachment 21123 [details] only remove the BOM
Comment 23 Darin Adler 2008-05-14 09:14:47 PDT
Comment on attachment 21123 [details] only remove the BOM You need to use < in the description string instead of the less-than symbol. r=me
Comment 24 Alexey Proskuryakov 2008-05-14 09:35:56 PDT