ECMA-262 says the following about String.prototype.toLowerCase() and toUpperCase(): ------------------------------------------------ NOTE 1 The result should be derived according to the case mappings in the Unicode character database (this explicitly includes not only the UnicodeData.txt file, but also the SpecialCasings.txt file that accompanies it in Unicode 2.1.8 and later). ------------------------------------------------ None of the major browsers appears to implement this, but it still sounds like a bug that needs to be fixed to me. This requires some refactoring of the new KXMLCore:Unicode abstraction.
Created attachment 7576 [details] test case
Created attachment 7577 [details] proposed fix Removed per-character capitalization routines altogether. Added KXMLCore::Unicode headers to the project. Looks like at least the ICU implementation should be in a .cpp file, not in the header - but I left it in the header for now, because the existing inline functions are even bigger, and maybe there was some reason to put them there. Changes two Mozilla tests, which were assuming the old behavior. Three more ICU headers will need to be svn copied to JavaScriptCore, they didn't show up in the patch.
Comment on attachment 7577 [details] proposed fix I like the general idea here, but it seems a shame to do a malloc in every case where the result is greater than the source. It's also a shame that we have to to do the copyForWriting even when the string is already entirely in the correct case. I'd have used uint16_t*& for destIfNeeded -- it's not a pointer to a pointer, it's an in/out parameter that's a pointer, and we normally use references for those. But despite those issues, this code looks great and fixes a real problem. I think we can land it as-is. r=me
(In reply to comment #3) > It's also a shame that we have to to do the copyForWriting even when the string > is already entirely in the correct case. I think that this used to be the case even before - UCharReference::operator=() does an implicit copyForWriting(). > I'd have used uint16_t*& for destIfNeeded -- it's not a pointer to a pointer, > it's an in/out parameter that's a pointer, and we normally use references for > those. Fixed. Landed, r13740.