Bug 53044 - CJK word segmentation does not work
Summary: CJK word segmentation does not work
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: Text
Version: 528+ (Nightly build)
Hardware: All (OS: All)
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
Reported: 2011-01-24 13:40 PST by Xiaomei Ji
Modified: 2011-01-25 15:46 PST
CC List: 2 users

See Also:


Attachments
test case (2.00 KB, text/html)
2011-01-24 13:40 PST, Xiaomei Ji

Description Xiaomei Ji 2011-01-24 13:40:46 PST
Created attachment 79969
test case

Open the attached test case.

Word segmentation does not work for CJK languages in most ports.
For each character boundary where the cursor is placed, the correct segmentation is the one given in the "title" attribute.
However, most ports treat every single character as a separate word.

For ports that use ICU, the ICU ticket for upstreaming Chrome's CJK segmentation patch is http://bugs.icu-project.org/trac/ticket/2229
Once the patch is upstreamed and Apple picks it up in the next version of Mac OS X, the Mac and Windows ports will work correctly.

However, some WebKit ports do not use ICU (e.g., Qt, GTK); in those ports, the port itself must handle word segmentation.
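For ports without ICU, one common way to segment CJK text is dictionary-based greedy longest matching. The Python sketch below is illustrative only: the sample dictionary, the function names, and the use of code-point offsets (WebKit itself works with UTF-16 offsets) are assumptions. It is not the algorithm in Chrome's ICU patch or in any WebKit port. It contrasts the single-character behavior reported in this bug with a dictionary-based segmentation, and shows how the word under the cursor is chosen from a list of boundaries.

```python
import bisect


def boundaries_per_character(text):
    # Reported (incorrect) behavior: every character boundary is a word boundary.
    return list(range(len(text) + 1))


def boundaries_longest_match(text, dictionary, max_len):
    # Greedy forward maximum matching (an assumed, illustrative technique):
    # at each position take the longest dictionary word of up to max_len
    # characters; fall back to a single character when nothing matches.
    bounds = [0]
    i = 0
    while i < len(text):
        step = 1
        for n in range(min(max_len, len(text) - i), 1, -1):
            if text[i:i + n] in dictionary:
                step = n
                break
        i += step
        bounds.append(i)
    return bounds


def word_at(text, bounds, cursor):
    # Return the word whose span [start, end) contains the cursor offset.
    if not text:
        return ""
    k = bisect.bisect_right(bounds, cursor) - 1
    k = min(k, len(bounds) - 2)  # a cursor at the end of text selects the last word
    return text[bounds[k]:bounds[k + 1]]
```

With the hypothetical dictionary {"我们", "是", "中国", "中国人"} and a maximum word length of 3, the greedy sketch splits "我们是中国人" into 我们 / 是 / 中国人, so a cursor at offset 4 selects "中国人". Per-character boundaries select only "国", which matches the single-character behavior described above. Greedy matching can still mis-segment ambiguous strings, so a real segmenter would need a fuller dictionary and some form of disambiguation.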