Bug 53044

Summary: CJK word segmentation does not work
Product: WebKit
Reporter: Xiaomei Ji <xji>
Component: Text
Assignee: Nobody <webkit-unassigned>
Status: NEW
Severity: Normal
CC: ap, darin
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Attachments:
test case (flags: none)

Description Xiaomei Ji 2011-01-24 13:40:46 PST
Created attachment 79969
test case

Open the attached test case.

Word segmentation does not work for CJK languages in most ports.
The correct segmentation, with the cursor at each character boundary, is the one given in the "title" attribute.
In most ports, however, the segmentation result treats every single character as a word.

For ports that use ICU, the ICU bug for upstreaming Chrome's CJK segmentation patch is http://bugs.icu-project.org/trac/ticket/2229
Once the patch is upstreamed and Apple picks it up in the next version of Mac OS X, the Mac and Win ports will work correctly.

However, there are WebKit ports that do not use ICU (e.g. Qt, GTK); in those, the port itself has to take care of word segmentation.
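
To illustrate the difference between the current per-character result and a dictionary-based result, here is a minimal sketch of greedy longest-match segmentation. It is not the algorithm used by ICU or by Chrome's patch, and it is not taken from the attached test case; the function names (segment, word_boundaries), the sample string, and the three-word toy dictionary are all assumptions made up for this example.

```python
def segment(text, dictionary, max_word_len=4):
    """Greedy forward longest-match segmentation (illustrative only).

    Characters that start no dictionary word fall back to one-character
    words, which is the per-character behaviour described in this report.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for length in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                i += length
                break
    return words


def word_boundaries(words):
    """Return the character offsets at which words start or end."""
    offsets = [0]
    for word in words:
        offsets.append(offsets[-1] + len(word))
    return offsets


# Hypothetical toy dictionary; a real port would need a full word list.
TOY_DICTIONARY = {"中文", "分词", "测试"}

with_dictionary = segment("中文分词测试", TOY_DICTIONARY)
without_dictionary = segment("中文分词测试", set())

print(with_dictionary, word_boundaries(with_dictionary))
print(without_dictionary, word_boundaries(without_dictionary))
```

With the toy dictionary, the sample string splits into three two-character words; with an empty dictionary, every character becomes its own word, matching the incorrect result reported above.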