RESOLVED FIXED7852
New unicode abstraction layer
https://bugs.webkit.org/show_bug.cgi?id=7852
Summary New unicode abstraction layer
George Staikos
Reported 2006-03-18 16:19:13 PST
A unicode abstraction layer is required to break the libicu dependency. This is a heavy penalty for some platforms.
Attachments
My proposed solution (5.39 KB, application/octet-stream)
2006-03-18 16:21 PST, George Staikos
mjs: review-
Updated patch (17.03 KB, patch)
2006-04-02 22:30 PDT, George Staikos
mjs: review+
George Staikos
Comment 1 2006-03-18 16:21:31 PST
Created attachment 7161 [details] My proposed solution
Darin Adler
Comment 2 2006-03-18 19:34:42 PST
Comment on attachment 7161 [details] My proposed solution Looks like a fine approach to me.
Alexey Proskuryakov
Comment 3 2006-03-18 22:49:13 PST
Something that doesn't look right to me is that this artificially limits Unicode support to UCS-2: inline unsigned short toLower(unsigned short c) { return static_cast<unsigned short>(u_tolower(uc)); } Admittedly, exisiting code also uses unsigned shorts, see bug 4920.
George Staikos
Comment 4 2006-03-19 05:21:29 PST
Any suggestions on how to change it? I didn't see any use of it that this wouldn't serve so far, and it's not like we can't change this after... it's internal to KJS anyway.
Alexey Proskuryakov
Comment 5 2006-03-19 06:20:56 PST
I think that a real API for tolower/toupper can only work with complete strings, not individual characters (there are some evil examples in bug 6310). Could the following work? 1) KJS::Unicode::isFormatChar() and KJS::Unicode::isSeparatorSpace() take uint32_t parameters (complete Unicode code point in UTF-32 encoding). This won't change their existing uses, and will still be easily implementable with Qt, because format chars and spaces are all in the first Unicode plane AFAIK. 2) KJS::Unicode::toLower() and KJS::Unicode::toUpper work on unsigned short (UTF-16) buffers, with implementation details hidden behind the API. They should be able to change the length of the string. 3) UChar::toUpper() and UChar::toLower() are removed altogether (it doesn't look like they are actually used for anything but strings). I think that this abstraction may eventially become useful in WebCore, too.
Maciej Stachowiak
Comment 6 2006-03-21 01:20:54 PST
It's nice that the API layer came out so thin. I don't think George needs to perfect the abstraction in all ways before landing. But there are the following issues, I believe: * The patch shows an include of one of the platform-specific headers. I think it would be better to include a single header, which uses PLATFORM macros to pick the right per-platform implementation. +#include "unicode/qt4/kjs_unicode.h" * This method is not implemented in the ICU code path: inline CharCategory category(unsigned short c) * I think it would be better to put this in kxmlcore instead of kjs, so it can be used in the future by WebCore and/or khtml. I think enhancing the API to go beyond KJS's current uses can be done in a separate pass, but these issues seem relatively important. r- for now because I think the included patch would break the JavaScriptCore build and we need to straighten out the header issue.
George Staikos
Comment 7 2006-04-02 22:30:42 PDT
Created attachment 7478 [details] Updated patch
Darin Adler
Comment 8 2006-04-03 08:53:39 PDT
Comment on attachment 7478 [details] Updated patch + } while (KJS::Unicode::isFormatChar(next3)); You should remove the KJS:: prefixes in all these places, because the code is already in the KJS prefix. + return static_cast<unsigned short>(u_tolower(c)); + return static_cast<unsigned short>(u_toupper(c)); No need for the type casts here.
George Staikos
Comment 9 2006-04-03 09:19:13 PDT
Ok I've corrected it locally. Any other issues?
Maciej Stachowiak
Comment 10 2006-04-03 15:54:05 PDT
Looks pretty good. A few minor comments: - this should be in namespace KXMLCore, not namespace KJS - ideally the headers should follow the kxmlcore naming convention, so Unicode.h, UnicodeICU.h, UnicodeQt4.h I'll let you fix this up as part of landing, or else it can be fixed later, so r+.
George Staikos
Comment 11 2006-04-03 16:46:43 PDT
Done
Note You need to log in before you can comment on or make changes to this bug.