Bug 15630

Summary: After U+3001, U+3002 (ideographic comma/full stop), lines cannot be broken
Product: WebKit
Reporter: Jungshik Shin <jshin>
Component: Layout and Rendering
Assignee: Nobody <webkit-unassigned>
Status: RESOLVED DUPLICATE
Severity: Normal
CC: ap, artension, mitz
Priority: P2
Version: 523.x (Safari 3)
Hardware: PC
OS: Windows XP
URL: http://usstock.jrj.com.cn/xhmt
Attachments:
layout test case (flags: none)

Jungshik Shin
Reported 2007-10-22 15:16:11 PDT
Due to the problem described in the summary line, the layout at http://usstock.jrj.com.cn/xhmt is broken. WebKit uses ICU's line-breaking iterator, which in turn is based on UAX #14 (Unicode Line Breaking Algorithm). It has the following rule: CL x (AL|NU), where CL includes U+3001 and U+3002 (Ideographic Comma and Full Stop). Under this rule, a line cannot be broken when U+3001 or U+3002 is followed by a Latin letter or a number. As a result, the box at the url given above with the title
Attachments
layout test case (710 bytes, text/html), 2007-10-22 15:19 PDT, Jungshik Shin, no flags
Jungshik Shin
Comment 1 2007-10-22 15:19:09 PDT
Created attachment 16810: layout test case. The two columns should be rendered identically.
Jungshik Shin
Comment 2 2007-10-22 15:24:09 PDT
Hmm my comment #0 got trimmed.... As a result, the box at the url given above with the title
Jungshik Shin
Comment 3 2007-10-22 17:25:23 PDT
Try once more (this time with FF ;-)). The textbox whose title is '美通社简介' is a lot wider than its specified width breaking the layout of the page. A fix is very simple. We have to tailor UAX #14's line breaking property so that U+3001 and U+3002 followed by a Latin letter/number (or more broadly, any character belonging to AL/NU classes) are regarded as a line breaking opportunity. A way to do that is to move those characters from CL class to NS (non-starter) class in ICU's source/data/brkiter/line.txt. For WinSafari, it'd be a simple change, but for Safari on Mac, this may be more involved because it may mean changing the build of ICU shipped with OS X.
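The effect of the proposed tailoring can be sketched with a toy model. This is only an illustration under stated assumptions: it implements a tiny subset of UAX #14-style pair rules, it is not ICU's rule set or the contents of line.txt, and the names `line_class`, `can_break_between`, and `break_opportunities` are hypothetical helpers invented for this example.

```python
# Toy illustration only: a tiny subset of UAX #14-style pair rules,
# not ICU's real rule set or data.

IDEOGRAPHIC_PUNCT = {"\u3001", "\u3002"}  # ideographic comma, full stop


def line_class(ch, tailored=False):
    """Return a simplified line-break class for one character."""
    if ch in IDEOGRAPHIC_PUNCT:
        # The proposed tailoring moves these two characters from CL to NS.
        return "NS" if tailored else "CL"
    if ch.isascii() and ch.isalpha():
        return "AL"
    if ch.isascii() and ch.isdigit():
        return "NU"
    return "ID"  # assumption: treat everything else as an ideograph


def can_break_between(before, after, tailored=False):
    """True if this toy model allows a line break between two characters."""
    a = line_class(before, tailored)
    b = line_class(after, tailored)
    if b in ("CL", "NS"):
        return False  # no break before closing punctuation or a non-starter
    if a == "CL" and b in ("AL", "NU"):
        return False  # the rule quoted in the report: CL x (AL|NU)
    if a in ("AL", "NU") and b in ("AL", "NU"):
        return False  # keep runs of letters and digits together
    return True  # otherwise a break is allowed


def break_opportunities(text, tailored=False):
    """Return the indices i where a break is allowed before text[i]."""
    return [
        i
        for i in range(1, len(text))
        if can_break_between(text[i - 1], text[i], tailored)
    ]


if __name__ == "__main__":
    sample = "美通社、PR"
    print(break_opportunities(sample))                 # untailored classes
    print(break_opportunities(sample, tailored=True))  # U+3001/U+3002 as NS
```

In this sketch, the untailored run reports no break opportunity between U+3001 and the following "P", while the tailored run adds one at that position. A break before U+3001 itself stays forbidden in both cases.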
David Kilzer (:ddkilzer)
Comment 4 2007-10-22 22:56:34 PDT
(In reply to comment #3)
> Try once more (this time with FF ;-)).

It sounds like you're hitting Bug 14562 (or something similar) when entering text in a text area (which is truncated when sent to the server). Could you please file a new bug on this, stating the version of Safari/WebKit you're using and steps to reproduce? Thanks!
Alexey Proskuryakov
Comment 5 2007-10-25 09:47:05 PDT
I can only reproduce this problem on Windows - Mac (Tiger) works as expected for me.
Alexey Proskuryakov
Comment 6 2007-10-26 09:18:33 PDT
Do you know if this has been reported to the Unicode consortium? This rule is new to Unicode 5.0, and doesn't look quite right, as you point out.
Jungshik Shin
Comment 7 2007-10-26 14:44:48 PDT
Yes, I've been in contact with the author of UAX #14 (indirectly). I talked to the author of ICU break iterator and he agreed with me (actually, we sat together and he suggested changing the class of those two to NS).
Alexey Proskuryakov
Comment 8 2008-02-18 02:51:21 PST
Bug 17411 has a patch for this. I'm still unsure whether the Unicode consortium is aware of this issue. ICU is one thing, but the proposed update to UAX #14 at <http://www.unicode.org/reports/tr14/tr14-21.html> seems to be unchanged.
Satoshi Nakagawa
Comment 9 2008-02-23 14:33:52 PST
(In reply to comment #8) I agree. They may not be aware of this issue. I wrote a report about this problem and sent it to the Unicode mailing list. http://limechat.net/report/unicode-line-break-problem.html
Alexey Proskuryakov
Comment 10 2008-02-23 22:08:00 PST
Marking as a duplicate, as bug 17411 has an approved fix for this.
*** This bug has been marked as a duplicate of 17411 ***
Jungshik Shin
Comment 11 2008-02-25 14:28:08 PST
(In reply to comment #8)
> Bug 17411 has a patch for this.
>
> I'm still unsure whether the Unicode consortium is aware of this issue. ICU is
> one thing, but the proposed update to UAX #14 at
> <http://www.unicode.org/reports/tr14/tr14-21.html> seems to be unchanged.

In the meantime, they nuked LB30 instead of changing the class for U+3001/3002. A long-term solution is being worked on, according to my source. Anyway, since ICU on Mac OS X will always lag behind, I agree that we should fix the WebKit code (as in bug 17411).