<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://bugs.webkit.org/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4.1"
          urlbase="https://bugs.webkit.org/"
          
          maintainer="admin@webkit.org"
>

    <bug>
          <bug_id>53044</bug_id>
          
          <creation_ts>2011-01-24 13:40:46 -0800</creation_ts>
          <short_desc>CJK word segmentation does not work</short_desc>
          <delta_ts>2011-01-25 15:46:04 -0800</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WebKit</product>
          <component>Text</component>
          <version>528+ (Nightly build)</version>
          <rep_platform>All</rep_platform>
          <op_sys>All</op_sys>
          <bug_status>NEW</bug_status>
          <resolution></resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>Normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Xiaomei Ji">xji</reporter>
          <assigned_to name="Nobody">webkit-unassigned</assigned_to>
          <cc>ap</cc>
    
    <cc>darin</cc>
          

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>339239</commentid>
    <comment_count>0</comment_count>
      <attachid>79969</attachid>
    <who name="Xiaomei Ji">xji</who>
    <bug_when>2011-01-24 13:40:46 -0800</bug_when>
    <thetext>Created attachment 79969
test case

Open the attached test case.

Word segmentation does not work for CJK languages on most ports.
For a cursor at each character boundary, the expected word is the one listed in the div&apos;s &quot;title&quot; attribute.
Most ports instead treat every single character as its own word.
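
For reference, the comparison the test case performs can be sketched without the DOM roughly as below. This is only an illustrative sketch under stated assumptions: checkSegmentation, wordAt and perCharacter are hypothetical names that do not appear in the attachment, which expands a collapsed Range to the enclosing word at each offset instead. perCharacter stands in for the broken behavior, and the sample text is the first four characters of the second div in the test case.

```javascript
// Hypothetical helper (not in the attachment): compares the word reported at
// each character offset with the expected word for that offset.
function checkSegmentation(text, expectedByOffset, wordAt) {
    const failures = [];
    expectedByOffset.forEach((expected, offset) => {
        const actual = wordAt(text, offset);
        if (actual !== expected)
            failures.push({ offset, actual, expected });
    });
    return failures;
}

// Stand-in for the broken behavior: every character is its own word.
const perCharacter = (text, offset) => text.charAt(offset);

// Sample text and per-offset expected words, taken from the test case.
const text = "物价预期";
const expected = "物价 物价 预期 预期".split(" ");
const failures = checkSegmentation(text, expected, perCharacter);
```

With a per-character segmenter every offset ends up in failures, which matches what the test case reports on the affected ports.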

For ports that use ICU, the ICU ticket for upstreaming Chrome&apos;s CJK segmentation patch is http://bugs.icu-project.org/trac/ticket/2229
Once the patch is upstreamed and Apple picks it up in the next version of Mac OS X, the Mac and Win ports will work correctly.

But some WebKit ports do not use ICU (e.g. Qt, GTK); on those, the port itself has to take care of word segmentation.</thetext>
  </long_desc>
      
          <attachment
              isobsolete="0"
              ispatch="0"
              isprivate="0"
          >
            <attachid>79969</attachid>
            <date>2011-01-24 13:40:46 -0800</date>
            <delta_ts>2011-01-24 13:40:46 -0800</delta_ts>
            <desc>test case</desc>
            <filename>cjk-segmentation.html</filename>
            <type>text/html</type>
            <size>2048</size>
            <attacher name="Xiaomei Ji">xji</attacher>
            
              <data encoding="base64">PGh0bWw+CjxoZWFkPgo8TUVUQSBIVFRQLUVRVUlWPSJDT05URU5ULVRZUEUiIENPTlRFTlQ9InRl
eHQvaHRtbDsgY2hhcnNldD1VVEYtOCI+Cjx0aXRsZT5UZXN0IGZvciBDSksgc2VnbWVudGF0aW9u
LjwvdGl0bGU+CjxzY3JpcHQ+CmZ1bmN0aW9uIGxvZyhzdHIpCnsKICAgIHZhciBsaSA9IGRvY3Vt
ZW50LmNyZWF0ZUVsZW1lbnQoImxpIik7CiAgICBsaS5hcHBlbmRDaGlsZChkb2N1bWVudC5jcmVh
dGVUZXh0Tm9kZShzdHIpKTsKICAgIHZhciBjb25zb2xlID0gZG9jdW1lbnQuZ2V0RWxlbWVudEJ5
SWQoImNvbnNvbGUiKTsKICAgIGNvbnNvbGUuYXBwZW5kQ2hpbGQobGkpOwp9CgpmdW5jdGlvbiBh
c3NlcnRFcXVhbCh0ZXN0X25hbWUsIGFjdHVhbCwgZXhwZWN0ZWQpCnsKICAgIGlmIChhY3R1YWwg
IT0gZXhwZWN0ZWQpIHsKICAgICAgICBsb2coIj09PT09PT09PT09PT09PT09PT09PT09PT09PT09
PT09PT0iKTsKICAgICAgICBsb2coIkZBSUxFRCB0ZXN0ICIgKyB0ZXN0X25hbWUpOwogICAgICAg
IGxvZygiYWN0dWFsOiAiICsgYWN0dWFsKTsKICAgICAgICBsb2coImV4cGVjdGVkOiAiICsgZXhw
ZWN0ZWQpOwogICAgfQp9CgpmdW5jdGlvbiB0ZXN0KCkKewogICAgaWYgKHdpbmRvdy5sYXlvdXRU
ZXN0Q29udHJvbGxlcikKICAgICAgICBsYXlvdXRUZXN0Q29udHJvbGxlci5kdW1wQXNUZXh0KCk7
CgogICAgdmFyIHJhbmdlID0gZG9jdW1lbnQuY3JlYXRlUmFuZ2UoKTsKICAgIHZhciBkaXZzID0g
ZG9jdW1lbnQuZ2V0RWxlbWVudHNCeUNsYXNzTmFtZSgiemgtQ04tZGl2Iik7CiAgICBmb3IgKHZh
ciBqID0gMDsgaiA8IGRpdnMubGVuZ3RoOyArK2opIHsKICAgICAgICB2YXIgZGl2ID0gZGl2c1tq
XTsKICAgICAgICB2YXIgbGVuZ3RoID0gZGl2LmlubmVyVGV4dC5sZW5ndGg7CiAgICAgICAgdmFy
IHRpdGxlID0gZGl2LnRpdGxlLnNwbGl0KCcgJyk7CiAgICAgICAgZm9yICh2YXIgaSA9IDA7IGkg
PCBsZW5ndGg7ICsraSkgewogICAgICAgICAgICByYW5nZS5zZXRTdGFydChkaXYuZmlyc3RDaGls
ZCwgaSk7CiAgICAgICAgICAgIHJhbmdlLnNldEVuZChkaXYuZmlyc3RDaGlsZCwgaSk7CiAgICAg
ICAgICAgIHJhbmdlLmV4cGFuZCgnd29yZCcpOwogICAgICAgICAgICB2YXIgYWN0dWFsID0gcmFu
Z2UudG9TdHJpbmcoKTsKICAgICAgICAgICAgYXNzZXJ0RXF1YWwoIiAiLCBhY3R1YWwsIHRpdGxl
W2ldKTsKICAgICAgICB9CiAgICAgICAgZGl2LnN0eWxlLmRpc3BsYXkgPSAibm9uZSI7CiAgICB9
Cn0KPC9zY3JpcHQ+Cjxib2R5IG9ubG9hZD0idGVzdCgpIj4KPHA+VGVzdCBDaGluZXNlIFNlZ21l
bnRhdGlvbi4KPGRpdiBjbGFzcz0iemgtQ04tZGl2IiB0aXRsZT0i5Zu95Yqh6ZmiIOWKoSDlm73l
iqHpmaIg5YWz5LqOIOWFs+S6jiDjgIog5Zyf5ZywIOWcn+WcsCDmiL/lsYsg5oi/5bGLIOeuoeeQ
hiDnrqHnkIYg5p2h5L6LIOadoeS+iyDjgIsiPuWbveWKoemZouWFs+S6juOAiuWcn+WcsOaIv+Wx
i+euoeeQhuadoeS+i+OAizwvZGl2Pgo8ZGl2IGNsYXNzPSJ6aC1DTi1kaXYiIHRpdGxlPSLniank
u7cg54mp5Lu3IOmihOacnyDpooTmnJ8g6LCD5o6nIOiwg+aOpyDnm67moIcg55uu5qCHIOWfuuac
rCDln7rmnKwg5a6e546wIOWunueOsCI+54mp5Lu36aKE5pyf6LCD5o6n55uu5qCH5Z+65pys5a6e
546wPC9kaXY+CjxkaXYgY2xhc3M9InpoLUNOLWRpdiIgdGl0bGU9IuS/hOe9l+aWryDnvZfmlq8g
5L+E572X5pavIOaAu+e7nyDmgLvnu58g77yaIOacuuWcuiDmnLrlnLog54iG54K4IOeIhueCuCDm
mK8g5oGQ5oCWIOaBkOaAliDooq3lh7sg6KKt5Ye7Ij7kv4TnvZfmlq/mgLvnu5/vvJrmnLrlnLrn
iIbngrjmmK/mgZDmgJbooq3lh7s8L2Rpdj4KPGRpdiBjbGFzcz0iemgtQ04tZGl2IiB0aXRsZT0i
5pil6L+QIOaYpei/kCA1IOWkqSDvvIwg5YyX5LqsIOWMl+S6rCDov5DpgIEg6L+Q6YCBIOaXheWu
oiDml4XlrqIgMTQ2IDE0NiAxNDYg5LiHIj7mmKXov5A15aSp77yM5YyX5Lqs6L+Q6YCB5peF5a6i
MTQ25LiHPC9kaXY+Cjx1bCBpZD0iY29uc29sZSI+PC91bD4KCjwvYm9keT4KPC9odG1sPgo=
</data>

          </attachment>
      

    </bug>

</bugzilla>