WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
4931
Unicode format characters (Cf) other than soft hyphen should be removed from JavaScript source
https://bugs.webkit.org/show_bug.cgi?id=4931
Summary
Unicode format characters (Cf) other than soft hyphen should be removed from ...
Alexey Proskuryakov
Reported
2005-09-11 02:16:08 PDT
Section 7.1 of ECMA-262 says: "The format control characters can occur anywhere in the source text of an ECMAScript program. These characters are removed from the source text before applying the lexical grammar. <...> these characters are removed before processing string and regular expression literals" Besides Safari, this doesn't appear to work in Firefox, Opera or MacIE.
Attachments
test case for formatting characters in literals
(759 bytes, text/html)
2005-09-11 02:16 PDT
,
Alexey Proskuryakov
no flags
Details
test case for formatting characters outside literals
(685 bytes, text/html)
2005-09-11 02:17 PDT
,
Alexey Proskuryakov
no flags
Details
Darin's patch extracted from 4885
(4.61 KB, patch)
2005-10-03 09:35 PDT
,
Kimmo Kinnunen
no flags
Details
Formatted Diff
Diff
first cut, not tested yet
(2.97 KB, patch)
2008-04-21 10:34 PDT
,
Darin Adler
no flags
Details
Formatted Diff
Diff
patch, tested now -- needs regression test cases and ChangeLog
(2.98 KB, patch)
2008-04-24 12:30 PDT
,
Darin Adler
no flags
Details
Formatted Diff
Diff
patch
(43.94 KB, patch)
2008-05-05 10:08 PDT
,
Darin Adler
no flags
Details
Formatted Diff
Diff
only remove the BOM
(7.54 KB, patch)
2008-05-14 04:21 PDT
,
Alexey Proskuryakov
darin
: review+
Details
Formatted Diff
Diff
Show Obsolete
(4)
View All
Add attachment
proposed patch, testcase, etc.
Alexey Proskuryakov
Comment 1
2005-09-11 02:16:58 PDT
Created
attachment 3859
[details]
test case for formatting characters in literals
Alexey Proskuryakov
Comment 2
2005-09-11 02:17:18 PDT
Created
attachment 3860
[details]
test case for formatting characters outside literals
Kimmo Kinnunen
Comment 3
2005-10-03 09:35:59 PDT
Created
attachment 4179
[details]
Darin's patch extracted from 4885 This patch strips Cf chars from the input stream. It was extracted from a patch done by Daring, from
bug 4885
. Note that it modifies a testcase.
Maciej Stachowiak
Comment 4
2005-10-04 20:28:27 PDT
Does this work in any browser? It may be a compatibility problem if we diverge on this. I can imagine people wanting to use soft hyphens in JS strings.
Darin Adler
Comment 5
2005-10-05 10:04:17 PDT
I don't think it's correct in the other browsers either. I agree that it could cause a problem. Wonder what we should do.
Darin Adler
Comment 6
2005-10-06 07:18:57 PDT
Comment on
attachment 4179
[details]
Darin's patch extracted from 4885 I think the patch is fine, but I am worried about Maciej's comment now. How can we decide whether to do this or not?
Alexey Proskuryakov
Comment 7
2005-10-06 11:47:12 PDT
I have looked in Mozilla's bugzilla, and also googled for relevant discussions, but found nothing. Perhaps, I didn't look close enough, as it's hard to expect that such a feature has never been tested in Firefox (see
http://www.mozilla.org/js/language/js20/formal/stages.html
). Besides the soft hyphen, I would expect various joiners/non-joiners and bidi overrides to be used in string literals sometimes, but I do not have any evidence.
Darin Adler
Comment 8
2005-10-06 11:52:40 PDT
Alexey, I'm not sure what relevance the JavaScript 2.0 specification has to this discussion. We've tested Firefox, and it does *not* remove soft hyphens. I'd love to fix this if we can be sure it won't cause compatibility problems.
Alexey Proskuryakov
Comment 9
2005-10-06 12:11:58 PDT
(In reply to
comment #8
)
> Alexey, I'm not sure what relevance the JavaScript 2.0 specification has to this discussion.
Only to underline that I'm surprised to have found nothing in Mozilla bugzilla.
> We've tested Firefox, and it does *not* remove soft hyphens.
Right, that's what I wrote in the description. Later, I have tested WinIE, and it also doesn't remove Cf characters, just the other browsers.
> I'd love to fix this if we can be sure it won't cause compatibility problems.
Well, I was only splitting
bug 4885
, and don't really have an opinion about this part (unlike the other ones :) )
Darin Adler
Comment 10
2005-10-11 08:41:27 PDT
Comment on
attachment 4179
[details]
Darin's patch extracted from 4885 I think we should probably land this even thought we are uneasy about the soft hyphen thing. If Maciej disagrees, he can review- it.
Timothy Hatcher
Comment 11
2005-10-24 14:27:54 PDT
John Sullivan landed this. sullivan 05/10/24 14:22:21 Modified: . ChangeLog kjs lexer.cpp tests/mozilla/ecma/Array 15.4.5.1-1.js
Alexey Proskuryakov
Comment 12
2006-09-23 11:11:40 PDT
This fix has been rolled out in
bug 10183
. Not re-opening this bug, since the problem was not with the implementation, but rather with site compatibility.
Darin Adler
Comment 13
2006-09-25 08:55:46 PDT
We won't fix this because of bad effects on site compatibility.
Darin Adler
Comment 14
2008-01-01 21:57:50 PST
***
Bug 16694
has been marked as a duplicate of this bug. ***
Alexey Proskuryakov
Comment 15
2008-04-21 09:11:15 PDT
In fact, it looks like Firefox only preserves U+00AD (soft hyphen), and other Cf characters are removed as required by the spec. We probably want to match this, if only to fix ecma_3/Unicode/uc-001.js, which verifies that \u200E is removed.
Darin Adler
Comment 16
2008-04-21 10:16:08 PDT
OK. Will be very easy to fix.
Darin Adler
Comment 17
2008-04-21 10:34:36 PDT
Created
attachment 20731
[details]
first cut, not tested yet
Darin Adler
Comment 18
2008-04-24 12:30:19 PDT
Created
attachment 20796
[details]
patch, tested now -- needs regression test cases and ChangeLog Here's a new patch, ready to go, except needs change log and some regression tests.
Darin Adler
Comment 19
2008-05-05 10:08:56 PDT
Created
attachment 20970
[details]
patch
Alexey Proskuryakov
Comment 20
2008-05-05 10:26:19 PDT
Comment on
attachment 20970
[details]
patch r=me. Looks like we are testing for the soft hyphen not being removed in a very indirect way though - perhaps it would help to have a dedicated test case fir this. Or maybe the existing test and the long comment that you added are enough.
Alexey Proskuryakov
Comment 21
2008-05-14 02:54:26 PDT
Now that we are not removing BOM characters at decoding time, this is causing compatibility issues, see <
rdar://problem/5934376
>. So, I decided to make an additional test case, and land this. But turns out that Firefox is changing (almost) as we speak - see <
https://bugzilla.mozilla.org/show_bug.cgi?id=274152
> and <
https://bugzilla.mozilla.org/show_bug.cgi?id=368516
>. I'm going to investigate this further.
Alexey Proskuryakov
Comment 22
2008-05-14 04:21:44 PDT
Created
attachment 21123
[details]
only remove the BOM
Darin Adler
Comment 23
2008-05-14 09:14:47 PDT
Comment on
attachment 21123
[details]
only remove the BOM You need to use < in the description string instead of the less-than symbol. r=me
Alexey Proskuryakov
Comment 24
2008-05-14 09:35:56 PDT
Committed <
http://trac.webkit.org/changeset/33443
>.
Mark Rowe (bdash)
Comment 25
2008-05-14 22:29:10 PDT
***
Bug 19070
has been marked as a duplicate of this bug. ***
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug