A unicode abstraction layer is required to break the libicu dependency. This is a heavy penalty for some platforms.
Created attachment 7161 [details] My proposed solution
Comment on attachment 7161 [details] My proposed solution Looks like a fine approach to me.
Something that doesn't look right to me is that this artificially limits Unicode support to UCS-2: inline unsigned short toLower(unsigned short c) { return static_cast<unsigned short>(u_tolower(uc)); } Admittedly, exisiting code also uses unsigned shorts, see bug 4920.
Any suggestions on how to change it? I didn't see any use of it that this wouldn't serve so far, and it's not like we can't change this after... it's internal to KJS anyway.
I think that a real API for tolower/toupper can only work with complete strings, not individual characters (there are some evil examples in bug 6310). Could the following work? 1) KJS::Unicode::isFormatChar() and KJS::Unicode::isSeparatorSpace() take uint32_t parameters (complete Unicode code point in UTF-32 encoding). This won't change their existing uses, and will still be easily implementable with Qt, because format chars and spaces are all in the first Unicode plane AFAIK. 2) KJS::Unicode::toLower() and KJS::Unicode::toUpper work on unsigned short (UTF-16) buffers, with implementation details hidden behind the API. They should be able to change the length of the string. 3) UChar::toUpper() and UChar::toLower() are removed altogether (it doesn't look like they are actually used for anything but strings). I think that this abstraction may eventially become useful in WebCore, too.
It's nice that the API layer came out so thin. I don't think George needs to perfect the abstraction in all ways before landing. But there are the following issues, I believe: * The patch shows an include of one of the platform-specific headers. I think it would be better to include a single header, which uses PLATFORM macros to pick the right per-platform implementation. +#include "unicode/qt4/kjs_unicode.h" * This method is not implemented in the ICU code path: inline CharCategory category(unsigned short c) * I think it would be better to put this in kxmlcore instead of kjs, so it can be used in the future by WebCore and/or khtml. I think enhancing the API to go beyond KJS's current uses can be done in a separate pass, but these issues seem relatively important. r- for now because I think the included patch would break the JavaScriptCore build and we need to straighten out the header issue.
Created attachment 7478 [details] Updated patch
Comment on attachment 7478 [details] Updated patch + } while (KJS::Unicode::isFormatChar(next3)); You should remove the KJS:: prefixes in all these places, because the code is already in the KJS prefix. + return static_cast<unsigned short>(u_tolower(c)); + return static_cast<unsigned short>(u_toupper(c)); No need for the type casts here.
Ok I've corrected it locally. Any other issues?
Looks pretty good. A few minor comments: - this should be in namespace KXMLCore, not namespace KJS - ideally the headers should follow the kxmlcore naming convention, so Unicode.h, UnicodeICU.h, UnicodeQt4.h I'll let you fix this up as part of landing, or else it can be fixed later, so r+.
Done