UNCONFIRMED 67022
Investigate how to pass 8-bit characters to ICU
https://bugs.webkit.org/show_bug.cgi?id=67022
Xianzhu Wang
Reported 2011-08-26 00:16:19 PDT
ICU accounts for a large share of the final uses of const UChar* on certain platforms (see https://docs.google.com/a/google.com/spreadsheet/ccc?key=0AsX8ECAuTqEzdEVFYnZaUlZ3QzJCbnBKamwxdGMxVkE&hl=en_US&ndplr=1#gid=10 for details). To support 8/16-bit string buffers efficiently, we should make sure that, in most cases, we can pass strings to ICU without converting between the 8-bit and 16-bit formats.

WebKit currently passes const UChar* parameters to the following ICU components:

* format: accepts only 16-bit strings. Only used from NumberInputType -> LocalizedNumberICU to parse and format numbers, which does not seem to be performance critical.
* ubrk: since 3.4, ICU has provided ubrk_setUText, which accepts an abstract UText parameter to support alternative string formats. We can supply our own UText callbacks to support the new 8/16-bit string buffers. We should check that this version is available on each platform. We also need to measure the performance of UText, in particular the cost of the function callbacks.
* ucnv: because we store only ASCII characters in 8-bit strings, we can simply take a different branch for 8-bit and 16-bit strings. For 8-bit strings, we simply call the ASCII conversion functions, which accept const char* parameters.
* ucol: we can pass a UCharIterator to ICU to support 8/16-bit strings. We also need to measure the performance here, again because of the cost of the function callbacks.
* unorm: when calling unorm_*(), UNORM_NFC is the only mode used by WebKit. In this mode, unorm_*() has no effect on ASCII strings, so we can just test for the 8/16-bit format and call unorm_*() only for 16-bit strings.
* usearch: accepts only 16-bit strings. The only place in WebKit that calls usearch_*() is WebCore/editing/TextIterator.cpp.
* uset: uset_openPattern() accepts only 16-bit strings, but this is not a problem because it’s only called twice in each WebKit process.
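
The ucnv and unorm items above both come down to branching on the buffer width before touching ICU. A minimal C++ sketch of that branch for NFC normalization might look like the following. DualWidthString, widenAscii, normalizeNFC, and the Normalize16 callback are hypothetical names chosen for illustration, not WebKit or ICU API; the callback stands in for whatever 16-bit-only ICU wrapper (for example one built on unorm_*()) the caller already has. The sketch assumes, as the report states, that 8-bit buffers hold only ASCII.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical dual-width string view (not WebKit's actual string class).
// Exactly one of chars8 / chars16 is valid, selected by is8Bit.
struct DualWidthString {
    const unsigned char* chars8;   // valid when is8Bit is true
    const char16_t* chars16;       // valid when is8Bit is false
    std::size_t length;
    bool is8Bit;
};

// Stand-in for a 16-bit-only ICU entry point wrapped by the caller.
using Normalize16 =
    std::function<std::u16string(const char16_t*, std::size_t)>;

// Widen an 8-bit buffer to UTF-16 by zero extension.
// Assumption from the report: 8-bit buffers contain only ASCII.
std::u16string widenAscii(const DualWidthString& s) {
    std::u16string out;
    out.reserve(s.length);
    for (std::size_t i = 0; i < s.length; ++i) {
        assert(s.chars8[i] < 0x80);  // ASCII-only invariant
        out.push_back(static_cast<char16_t>(s.chars8[i]));
    }
    return out;
}

// NFC leaves ASCII text unchanged, so the 8-bit branch never calls ICU.
std::u16string normalizeNFC(const DualWidthString& s,
                            const Normalize16& normalize16) {
    if (s.is8Bit)
        return widenAscii(s);                    // no ICU call needed
    return normalize16(s.chars16, s.length);     // 16-bit path goes to ICU
}
```

In this sketch the 8-bit path still widens the result so both branches return the same type; a real implementation could instead return the original 8-bit buffer untouched, which is where the saving would come from.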