Bug 67022
| Summary: | Investigate how to pass 8-bit characters to ICU | ||
|---|---|---|---|
| Product: | WebKit | Reporter: | Xianzhu Wang <wangxianzhu> |
| Component: | Web Template Framework | Assignee: | Nobody <webkit-unassigned> |
| Status: | UNCONFIRMED | ||
| Severity: | Normal | CC: | ap, jshin, sullivan |
| Priority: | P2 | ||
| Version: | 528+ (Nightly build) | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Bug Depends on: | |||
| Bug Blocks: | 66161 | ||
Xianzhu Wang
ICU accounts for a large share of the places where const UChar* is ultimately consumed on certain platforms (see https://docs.google.com/a/google.com/spreadsheet/ccc?key=0AsX8ECAuTqEzdEVFYnZaUlZ3QzJCbnBKamwxdGMxVkE&hl=en_US&ndplr=1#gid=10 for details). To support 8/16-bit string buffers efficiently, we should make sure that in most cases we can pass strings to ICU without converting between the 8-bit and 16-bit formats.
For now WebKit passes const UChar* parameters to the following components of ICU:
* format: accepts only 16-bit strings. Used only from NumberInputType -> LocalizedNumberICU to parse and format numbers, which does not seem performance-critical.
* ubrk: since version 3.4, ICU provides ubrk_setUText, which accepts an abstract UText parameter to support alternative string formats. We can provide our own UText callbacks to support the new 8/16-bit string buffers. We should check that this version is available on each platform. We also need to measure the performance of UText, in particular the cost of the function callbacks.
* ucnv: because we store only ASCII characters in 8-bit strings, we can simply take a different branch for 8-bit and 16-bit strings. For 8-bit strings, we simply call the ASCII conversion functions, which accept const char* parameters.
* ucol: we can pass a UCharIterator to ICU to support 8/16-bit strings. We also need to measure the performance, in particular the cost of the function callbacks.
* unorm: when calling unorm_*(), UNORM_NFC is the only mode WebKit uses. In this mode, unorm_*() has no effect on ASCII strings, so we can just check the 8/16-bit format and call unorm_*() only for 16-bit strings.
* usearch: accepts only 16-bit strings. The only place in WebKit calling usearch_*() is WebCore/editing/TextIterator.cpp.
* uset: uset_openPattern() accepts only 16-bit strings, but this is not a problem because it’s only called twice in each WebKit process.
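To make the two strategies above concrete (branching on the string width, and handing the consumer a callback-based iterator), here is a rough, ICU-free sketch. Everything in it is hypothetical: StringBufferView stands in for whatever the new 8/16-bit buffer type ends up being, and CodeUnitIterator is only loosely in the spirit of ICU's UCharIterator. It is not the real ICU struct or API, and it assumes 8-bit buffers hold only ASCII, as stated above.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical view over a string buffer that is either 8-bit or 16-bit.
// Assumption: 8-bit buffers contain ASCII only.
struct StringBufferView {
    const void* data;
    size_t length;
    bool is8Bit;
};

// Hypothetical callback iterator (NOT ICU's UCharIterator): the consumer
// only ever calls next(), so it never needs to know the buffer width.
struct CodeUnitIterator {
    const void* context;
    size_t index;
    size_t length;
    int32_t (*next)(CodeUnitIterator*); // returns -1 at end of input
};

static int32_t next8(CodeUnitIterator* it)
{
    if (it->index >= it->length)
        return -1;
    return static_cast<const unsigned char*>(it->context)[it->index++];
}

static int32_t next16(CodeUnitIterator* it)
{
    if (it->index >= it->length)
        return -1;
    return static_cast<const char16_t*>(it->context)[it->index++];
}

// The width check happens once, when the iterator is created.
CodeUnitIterator makeIterator(const StringBufferView& s)
{
    return { s.data, 0, s.length, s.is8Bit ? next8 : next16 };
}

// Stand-in for a callback-driven consumer (ucol-style): compares two
// strings code unit by code unit, paying one indirect call per unit.
int compareCodeUnits(CodeUnitIterator a, CodeUnitIterator b)
{
    for (;;) {
        int32_t ca = a.next(&a);
        int32_t cb = b.next(&b);
        if (ca != cb)
            return ca < cb ? -1 : 1;
        if (ca < 0)
            return 0; // both iterators reached the end together
    }
}

// Branch strategy (unorm-style): NFC leaves ASCII unchanged, so only
// 16-bit buffers need to be passed to the normalizer at all.
bool needsNFCNormalization(const StringBufferView& s)
{
    return !s.is8Bit;
}
```

With this shape, an 8-bit "abc" and a 16-bit "abc" compare equal without either being converted. The per-code-unit indirect call in compareCodeUnits is the callback cost that would need to be measured for the ubrk and ucol cases.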