Bug 67022 - Investigate how to pass 8-bit characters to ICU
Summary: Investigate how to pass 8-bit characters to ICU
Status: UNCONFIRMED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Web Template Framework
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks: 66161
Reported: 2011-08-26 00:16 PDT by Xianzhu Wang
Modified: 2011-08-26 10:30 PDT

See Also:


Description Xianzhu Wang 2011-08-26 00:16:19 PDT
ICU accounts for a large share of the final uses of const UChar* on certain platforms (see https://docs.google.com/a/google.com/spreadsheet/ccc?key=0AsX8ECAuTqEzdEVFYnZaUlZ3QzJCbnBKamwxdGMxVkE&hl=en_US&ndplr=1#gid=10 for details). To support 8/16-bit string buffers efficiently, we should make sure that in most cases we can pass strings to ICU without converting between the 8-bit and 16-bit formats.

For now WebKit passes const UChar* parameters to the following components of ICU:

* format: accepts only 16-bit strings. Only used from NumberInputType -> LocalizedNumberICU to parse and format numbers, which does not seem performance critical.

* ubrk: since version 3.4, ICU provides ubrk_setUText, which accepts an abstract UText parameter to support alternative string formats. We can provide our own UText callbacks to support the new 8/16-bit string buffers. We should check whether this version is available on each platform. We also need to measure the performance of UText, in particular the cost of the function callbacks.

* ucnv: because we store only ASCII characters in 8-bit strings, we can simply take different branches for 8-bit and 16-bit strings. For 8-bit strings, we simply call the ASCII conversion functions, which accept const char* parameters.

* ucol: we can pass UCharIterator to ICU to support 8/16-bit strings. We also need to measure the performance cost of the function callbacks here.

* unorm: When calling unorm_*(), UNORM_NFC is the only mode used by WebKit. In this mode, unorm_*() has no effect on ASCII strings, so we can just test 8/16-bit format and call unorm_*() for only 16-bit strings.

* usearch: accepts only 16-bit strings. The only place in WebKit calling usearch_*() is WebCore/editing/TextIterator.cpp.

* uset: uset_openPattern() accepts only 16-bit strings, but this is not a problem because it’s only called twice in each WebKit process.
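
The unorm item above (skip normalization for 8-bit strings, call unorm_*() only for 16-bit ones) could look roughly like the following. This is only a sketch: StringBuffer, normalize16 and normalizeNFC are hypothetical names, not WebKit or ICU APIs, and normalize16 is a placeholder standing in for the real ICU call so that the example builds without ICU.

```cpp
#include <string>

// Hypothetical stand-in for a string buffer that is either
// 8-bit (ASCII only, as assumed in this bug) or 16-bit.
struct StringBuffer {
    bool is8Bit;
    std::string chars8;      // valid when is8Bit is true
    std::u16string chars16;  // valid when is8Bit is false
};

// Counts how often the 16-bit path runs; for illustration only.
static int normalizeCalls = 0;

// Placeholder for the real NFC normalization (e.g. an ICU unorm_*() call).
// It returns its input unchanged; only the dispatch logic matters here.
static std::u16string normalize16(const std::u16string& s)
{
    ++normalizeCalls;
    return s;
}

// NFC has no effect on ASCII, so 8-bit buffers are returned as-is
// and only 16-bit buffers reach the normalizer.
StringBuffer normalizeNFC(const StringBuffer& in)
{
    if (in.is8Bit)
        return in;
    return StringBuffer{false, {}, normalize16(in.chars16)};
}
```

With this shape, an 8-bit buffer such as StringBuffer{true, "abc", {}} passes through normalizeNFC without touching the normalizer, and no 8-to-16-bit conversion is needed. The same is8Bit branch would apply to the ucnv case, with the 8-bit branch calling the const char* conversion functions instead.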