Bug 8264 - toLowerCase and toUpperCase don't honor special mappings
Summary: toLowerCase and toUpperCase don't honor special mappings
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 420+
Hardware: Mac OS X 10.4
: P2 Normal
Assignee: Alexey Proskuryakov
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-04-08 07:57 PDT by Alexey Proskuryakov
Modified: 2006-04-08 10:23 PDT (History)
1 user (show)

See Also:


Attachments
test case (702 bytes, text/html)
2006-04-08 07:58 PDT, Alexey Proskuryakov
no flags Details
proposed fix (19.89 KB, patch)
2006-04-08 08:03 PDT, Alexey Proskuryakov
darin: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Alexey Proskuryakov 2006-04-08 07:57:38 PDT
ECMA-262 says the following about String.prototype.toLowerCase() and toUpperCase():
------------------------------------------------
NOTE 1 
The result should be derived according to the case mappings in the Unicode character database (this 
explicitly includes not only the UnicodeData.txt file, but also the SpecialCasings.txt file that 
accompanies it in Unicode 2.1.8 and later). 
------------------------------------------------

None of the major browsers appears to implement this, but it still sounds like a bug that needs to be fixed to me. This requires some refactoring of the new KXMLCore:Unicode abstraction.
Comment 1 Alexey Proskuryakov 2006-04-08 07:58:01 PDT
Created attachment 7576 [details]
test case
Comment 2 Alexey Proskuryakov 2006-04-08 08:03:49 PDT
Created attachment 7577 [details]
proposed fix

Removed per-character capitalization routines altogether.

Added KXMLCore::Unicode headers to the project.

Looks like at least the ICU implementation should be in a .cpp file, not in the header - but I left it in the header for now, because the existing inline functions are even bigger, and maybe there was some reason to put them there.

Changes two Mozilla tests, which were assuming the old behavior.

Three more ICU headers will need to be svn copied to JavaScriptCore, they didn't show up in the patch.
Comment 3 Darin Adler 2006-04-08 09:53:43 PDT
Comment on attachment 7577 [details]
proposed fix

I like the general idea here, but it seems a shame to do a malloc in every case where the result is greater than the source.

It's also a shame that we have to to do the copyForWriting even when the string is already entirely in the correct case.

I'd have used uint16_t*& for destIfNeeded -- it's not a pointer to a pointer, it's an in/out parameter that's a pointer, and we normally use references for those.

But despite those issues, this code looks great and fixes a real problem. I think we can land it as-is.

r=me
Comment 4 Alexey Proskuryakov 2006-04-08 10:23:58 PDT
(In reply to comment #3)
> It's also a shame that we have to to do the copyForWriting even when the string
> is already entirely in the correct case.

  I think that this used to be the case even before - UCharReference::operator=() does an implicit copyForWriting().

> I'd have used uint16_t*& for destIfNeeded -- it's not a pointer to a pointer,
> it's an in/out parameter that's a pointer, and we normally use references for
> those.

  Fixed. Landed, r13740.