<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://bugs.webkit.org/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4.1"
          urlbase="https://bugs.webkit.org/"
          
          maintainer="admin@webkit.org"
>

    <bug>
          <bug_id>67022</bug_id>
          
          <creation_ts>2011-08-26 00:16:19 -0700</creation_ts>
          <short_desc>Investigate how to pass 8-bit characters to ICU</short_desc>
          <delta_ts>2011-08-26 10:30:17 -0700</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WebKit</product>
          <component>Web Template Framework</component>
          <version>528+ (Nightly build)</version>
          <rep_platform>Unspecified</rep_platform>
          <op_sys>Unspecified</op_sys>
          <bug_status>UNCONFIRMED</bug_status>
          <resolution></resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>Normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          <blocked>66161</blocked>
          <everconfirmed>0</everconfirmed>
          <reporter name="Xianzhu Wang">wangxianzhu</reporter>
          <assigned_to name="Nobody">webkit-unassigned</assigned_to>
          <cc>ap</cc>
    
    <cc>jshin</cc>
    
    <cc>sullivan</cc>
          

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>457433</commentid>
    <comment_count>0</comment_count>
    <who name="Xianzhu Wang">wangxianzhu</who>
    <bug_when>2011-08-26 00:16:19 -0700</bug_when>
    <thetext>On certain platforms, ICU accounts for a large share of the places where const UChar* is ultimately consumed (see https://docs.google.com/a/google.com/spreadsheet/ccc?key=0AsX8ECAuTqEzdEVFYnZaUlZ3QzJCbnBKamwxdGMxVkE&amp;hl=en_US&amp;ndplr=1#gid=10 for details). To support 8/16-bit string buffers efficiently, we should make sure that, in most cases, strings can be passed to ICU without converting between the 8-bit and 16-bit formats.

Currently WebKit passes const UChar* parameters to the following ICU components:

* format: accepts only 16-bit strings. It is used only from NumberInputType -&gt; LocalizedNumberICU to parse and format numbers, which does not seem to be performance-critical.

* ubrk: ICU 3.4 added ubrk_setUText, which accepts an abstract UText parameter to support alternative string formats. We can provide our own UText callbacks to support the new 8/16-bit string buffers. We should check that this ICU version is available on every platform. We also need to measure the performance of UText, in particular the cost of the function callbacks.

* ucnv: because we store only ASCII characters in 8-bit strings, we can simply take a different branch for 8-bit and 16-bit strings. For 8-bit strings, we simply call the ASCII conversion functions, which accept const char* parameters.

* ucol: we can pass a UCharIterator to ICU to support 8/16-bit strings. Here too we need to measure performance, in particular the cost of the function callbacks.

* unorm: UNORM_NFC is the only mode WebKit uses when calling unorm_*(). In this mode, unorm_*() has no effect on ASCII strings, so we can just check whether the string is 8-bit or 16-bit and call unorm_*() only for 16-bit strings.

* usearch: accepts only 16-bit strings. The only place in WebKit that calls usearch_*() is WebCore/editing/TextIterator.cpp.

* uset: uset_openPattern() accepts only 16-bit strings, but this is not a problem because it is called only twice in each WebKit process.</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>