Bug 10436 - Safari May Not Interpret Regular Expression in Compliance with W3C Standard
Summary: Safari May Not Interpret Regular Expression in Compliance with W3C Standard
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore
Version: 420+
Hardware: Mac OS X 10.4
Importance: P2 Major
Assignee: Nobody
URL: javascript:alert('\u00e9'.match(new R...
Keywords:
Depends on:
Blocks:
 
Reported: 2006-08-16 09:48 PDT by Eduardo Foresti
Modified: 2006-12-06 09:36 PST

Attachments
Test File for Proving Hypothesis RE: Bug 10436 (1.87 KB, text/html)
2006-08-16 13:31 PDT, Eduardo Foresti

Description Eduardo Foresti 2006-08-16 09:48:58 PDT
We are troubleshooting why some websites no longer work when rendered in Safari 2.0.4. The failure occurs when client-entered data is validated using regular expressions. For example, we validate a user's UserID and Password with a regex; because the regex fails, the user cannot enter the site. This began only with 10.4.7 and Safari 2.0.4. Note that the same sites continue to work on 10.4.7 in IE and Firefox.

We have found nowhere else to turn but to you.

We seem to have localized the issue to Safari not interpreting regular expressions consistently.

For example, we looked at the following tests:
> Does the regex \u00e9 match the literal character é? (Validates that the regular expression engine understands Unicode escape sequences for extended characters.) - NO, but it does on IE and Firefox

> Does the regex \u0041 match the literal character A? (Validates that the regular expression engine understands Unicode escape sequences for ASCII characters.) - NO, but it does on IE and Firefox

> Does the regex é match the literal character é? (Validates that the regular expression engine understands literal characters outside the ASCII range – this is against ECMAScript spec.) - Sometimes, but always on IE and Firefox

> Write a Unicode escape sequence to the screen on the client side. (Validates that string parsing and display in the JS engine work.) - Works on all 3

> Is escape sequence \u00e9 equivalent to literal character é? (Validates that the string functionality in the JS engine works with extended characters.) - Yes on all 3

> Is escape sequence \u0041 equivalent to literal character A? (Validates that the string functionality in the JS engine works with ASCII characters.) - Yes on all 3

> Does the regex A match the literal character A? (Validates that the regular expression engine understands literal characters in the ASCII range – this is ECMAScript spec.) - Yes on all 3
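
The regex and string checks listed above can be sketched in JavaScript roughly as follows. This is only a minimal illustration and not the attached test file; the helper `regexMatches` and the `results` object are hypothetical names introduced here. In an engine that follows the ECMAScript specification, every entry is expected to be true.

```javascript
// Hypothetical helper (not from the attached test page):
// true when the regex built from `pattern` matches somewhere in `input`.
function regexMatches(pattern, input) {
  return new RegExp(pattern).test(input);
}

var results = {
  // "\\u00e9" in a string literal hands the six characters \u00e9 to RegExp,
  // so the regex engine itself has to decode the escape sequence.
  regexEscapeExtended: regexMatches("\\u00e9", "é"),
  regexEscapeAscii: regexMatches("\\u0041", "A"),

  // Literal characters used directly as the pattern.
  regexLiteralExtended: regexMatches("é", "é"),
  regexLiteralAscii: regexMatches("A", "A"),

  // Pure string comparisons, no regex involved: here the escape is
  // decoded by the string-literal parser instead.
  stringEscapeExtended: "\u00e9" === "é",
  stringEscapeAscii: "\u0041" === "A"
};

// Report each check by name.
for (var name in results) {
  console.log(name + ": " + (results[name] ? "yes" : "NO"));
}
```

Separating the regex checks from the plain string comparisons mirrors the distinction drawn in the list: a failure only in the first two entries would point at escape handling inside the regular expression engine rather than at string parsing.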

Please help. It's hard for me to believe that the regular expression / JavaScript interpreter(s) in Safari aren't working as they have in the past, but all roads point that way.

Thank you for your review.
Comment 1 jonathanjohnsson 2006-08-16 13:04:09 PDT
If you need to confirm that the issues indeed were introduced in Safari 2.0.4, you could try an earlier version from Multi-Safari at http://www.michelf.com/projects/multi-safari/ (be sure to read the "known issues" part of that page).

It would help if you could create a simple test case and attach it to this bug report, demonstrating what you say in the description, so that it is easy to verify what different browsers do and what has changed between versions.
Comment 2 Eduardo Foresti 2006-08-16 13:25:31 PDT
(In reply to comment #1)
> If you need to confirm that the issues indeed were introduced in Safari 2.0.4,
> you could try an earlier version from Multi-Safari at
> http://www.michelf.com/projects/multi-safari/ (be sure to read the "known
> issues" part of that page).
> It would help if you could create a simple test case and attach to this bug
> report, demonstrating what you say in the description, so it can be easily
> verifiable what different browsers do, and what has changed in between
> versions.


Thank you for your response.

We were able to confirm the behavior across other browsers and that this did work in prior versions. Thanks for the tip about Multi-Safari; it will be very helpful in the future.

Regarding the test case, attached is a standard HTML page with some very simple client-side code that can be used to test Unicode character handling both inside and outside regular expressions in JavaScript. This page is slightly modified from the original source at http://www.regular-expressions.info/javascriptexample.html: it is simplified (no ads or additional text) and adds some tests that the original page does not have. Tests include:

Does a Unicode escape sequence get interpreted properly when written to the screen?
If you compare a Unicode escape sequence to its literal character interpretation, are they equivalent?
Given a regular expression, does a given input match?

Save the attached file with a .html extension and open it in the browser to run the tests.

Given:
Unicode sequence “\u00e9” is the equivalent of “é”
Unicode sequence “\u0041” is the equivalent of “A”

I tested in IE 6 on Windows, IE on Mac, and Safari 2.0.4 on Mac. The IE and Safari browsers were on the same physical Mac. As you can see from the test results (in the original post), the string functionality works in all of the JavaScript implementations, but the regular expression engines did not always work.

I did not test every single FI regular expression; however, the ones I did test further supported the theory. Anything with a \u sequence didn’t work; very simple expressions allowing no extended characters worked.

Conclusion - The JavaScript regular expression implementation in this version of Safari is wrong.  
Comment 3 Eduardo Foresti 2006-08-16 13:31:43 PDT
Created attachment 10076
Test File for Proving Hypothesis RE: Bug 10436

Note: NOT A PATCH. The attached file was derived from a file provided by the firm indicated in the file. I do not represent that this is my work, nor am I implying or suggesting that its use is for anything other than the testing of this issue.
Comment 4 Alexey Proskuryakov 2006-08-16 14:10:10 PDT
Test case: javascript:alert('\u00e9'.match(new RegExp('\\u00e9')))
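
For readers who want to run the test case outside the address bar, the one-liner can be unpacked roughly as below. This is a sketch under the assumption that it runs in any ECMAScript environment; the variable names are illustrative and not part of the original test case.

```javascript
// The pattern is built from a string, so '\\u00e9' reaches RegExp as the
// six characters \u00e9 and must be decoded by the regex engine.
var pattern = new RegExp('\\u00e9');

// Here the escape is decoded by the string-literal parser instead,
// giving the single character é.
var subject = '\u00e9';

// match() returns an array describing the match, or null if there is none.
var result = subject.match(pattern);

console.log(result === null ? 'no match' : 'matched at index ' + result.index);
```

A `null` result here corresponds to the failure described in this bug; a conforming engine is expected to report a match.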

Downgrading severity (blocker is for bugs that block WebKit development). Once confirmed as a regression, this should be P1 priority, of course.
Comment 5 Eduardo Foresti 2006-10-11 21:02:32 PDT
(In reply to comment #4)

Any thoughts on this? It does not seem to have changed. We took the patch for bug 6257 and it FULLY addressed this issue, but it appears that 6257 is being held up for release because it has become the child of a master bug (7383). I still have thousands of customers that I'm pushing to Firefox and IE. Any idea how to find out whether 6257 or 7383 will be made available to the public? I don't have anything to tell my customers. Can you help?

Comment 6 Alexey Proskuryakov 2006-10-15 01:35:28 PDT
I have tried to confirm this as a regression using Multi-Safari, but I'm getting the same results from "Safari 2.0". If you could point out what exactly broke in 10.4.7, this would raise the priority of this issue to P1 (the highest), which would likely lead to a quick fix.

My understanding is that the proposed fix in bug 6257 was not good enough to be accepted, so we still need someone to write a better one.
Comment 7 Alexey Proskuryakov 2006-12-06 09:36:50 PST
This was fixed recently.