|Summary:||After U+3001, U+3002 (ideographic comma/full stop), lines cannot be broken|
|Product:||WebKit||Reporter:||Jungshik Shin <jshin>|
|Component:||Layout and Rendering||Assignee:||Nobody <webkit-unassigned>|
|Severity:||Normal||CC:||ap, artension, mitz|
|Version:||523.x (Safari 3)|
Description Jungshik Shin 2007-10-22 15:16:11 PDT
Due to the problem described in the summary line, the layout at http://usstock.jrj.com.cn/xhmt is broken. WebKit uses ICU's line breaking iterator, which in turn is based on UAX #14 (Unicode Line Breaking Algorithm). UAX #14 has the following rule: CL × (AL|NU), where CL includes U+3001 and U+3002 (Ideographic Comma and Full Stop). With this rule, a line cannot be broken when U+3001 or U+3002 is followed by a Latin letter or a number. As a result, the box at the URL given above with the title
Comment 1 Jungshik Shin 2007-10-22 15:19:09 PDT
Created attachment 16810: layout test case. The two columns should be rendered identically.
Comment 2 Jungshik Shin 2007-10-22 15:24:09 PDT
Hmm my comment #0 got trimmed.... As a result, the box at the url given above with the title
Comment 3 Jungshik Shin 2007-10-22 17:25:23 PDT
Try once more (this time with FF ;-)). The text box whose title is '美通社简介' is a lot wider than its specified width, which breaks the layout of the page. The fix is simple: we have to tailor UAX #14's line breaking properties so that U+3001 or U+3002 followed by a Latin letter or number (or, more broadly, any character belonging to the AL/NU classes) is treated as a line breaking opportunity. One way to do that is to move those two characters from the CL class to the NS (non-starter) class in ICU's source/data/brkiter/line.txt. For WinSafari, it'd be a simple change, but for Safari on Mac it may be more involved, because it may mean changing the build of ICU shipped with OS X.
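To make the effect of the proposed reclassification concrete, here is a minimal Python sketch of the pair rules involved. It is not ICU's implementation and does not reproduce the syntax of line.txt. The names `line_break_class` and `can_break_between` are hypothetical, only a handful of classes are modeled, and every character that is not U+3001/U+3002, a digit, or an ASCII letter is lumped into ID as a simplification.

```python
# Toy model of a few UAX #14 pair rules; all names here are illustrative.
CL, NS, AL, NU, ID = "CL", "NS", "AL", "NU", "ID"


def line_break_class(ch, tailored=False):
    """Return a simplified line break class for a single character."""
    if ch in "\u3001\u3002":
        # Default data puts these in CL; the proposed tailoring uses NS.
        return NS if tailored else CL
    if ch.isdigit():
        return NU
    if ch.isascii() and ch.isalpha():
        return AL
    return ID  # simplification: everything else is treated as ideographic


def can_break_between(before, after, tailored=False):
    """True if a line break is allowed between two adjacent characters."""
    prev_cls = line_break_class(before, tailored)
    next_cls = line_break_class(after, tailored)
    # No break before closing punctuation or a non-starter.
    if next_cls in (CL, NS):
        return False
    # The rule discussed in this bug: CL x (AL | NU).
    if prev_cls == CL and next_cls in (AL, NU):
        return False
    # Letters and digits stay together.
    if prev_cls in (AL, NU) and next_cls in (AL, NU):
        return False
    # All other rules are omitted; fall through to "break allowed".
    return True


print(can_break_between("\u3002", "P"))                 # default classes
print(can_break_between("\u3002", "P", tailored=True))  # U+3002 treated as NS
```

In this sketch the only behavioral difference between the two classifications is the break opportunity after U+3001/U+3002 when a letter or digit follows; a break before either character is still prohibited in both cases.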
Comment 4 David Kilzer (:ddkilzer) 2007-10-22 22:56:34 PDT
(In reply to comment #3)
> Try once more (this time with FF ;-)).

It sounds like you're hitting Bug 14562 (or something similar) when entering text in a text area (which is truncated when sent to the server). Could you please file a new bug on this, stating the version of Safari/WebKit you're using, and steps to reproduce. Thanks!
Comment 5 Alexey Proskuryakov 2007-10-25 09:47:05 PDT
I can only reproduce this problem on Windows - Mac (Tiger) works as expected for me.
Comment 6 Alexey Proskuryakov 2007-10-26 09:18:33 PDT
Do you know if this has been reported to the Unicode consortium? This rule is new to Unicode 5.0, and doesn't look quite right, as you point out.
Comment 7 Jungshik Shin 2007-10-26 14:44:48 PDT
Yes, I've been in contact with the author of UAX #14 (indirectly). I talked to the author of ICU break iterator and he agreed with me (actually, we sat together and he suggested changing the class of those two to NS).
Comment 8 Alexey Proskuryakov 2008-02-18 02:51:21 PST
Bug 17411 has a patch for this. I'm still unsure whether the Unicode consortium is aware of this issue. ICU is one thing, but the proposed update to UAX #14 at <http://www.unicode.org/reports/tr14/tr14-21.html> seems to be unchanged.
Comment 9 Satoshi Nakagawa 2008-02-23 14:33:52 PST
(In reply to comment #8) I agree. They may not be aware of this issue. I wrote a report about this problem and sent it to the Unicode mailing list: http://limechat.net/report/unicode-line-break-problem.html
Comment 10 Alexey Proskuryakov 2008-02-23 22:08:00 PST
Comment 11 Jungshik Shin 2008-02-25 14:28:08 PST
(In reply to comment #8)
> Bug 17411 has a patch for this.
>
> I'm still unsure whether the Unicode consortium is aware of this issue. ICU is
> one thing, but the proposed update to UAX #14 at
> <http://www.unicode.org/reports/tr14/tr14-21.html> seems to be unchanged.

In the meantime, they removed LB30 instead of changing the class for U+3001/U+3002. According to my source, a long-term solution is being worked on. Anyway, since ICU on Mac OS X will always lag behind, I agree that we should fix the WebKit code (as in bug 17411).
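For illustration only, the general idea of working around the rule in application code rather than in ICU's data could look like the Python sketch below. This is not the patch from bug 17411, whose contents are not shown in this thread. The function `add_tailored_breaks` and its `base_breaks` argument (break offsets as reported by some underlying break iterator) are hypothetical.

```python
IDEOGRAPHIC_PUNCT = {"\u3001", "\u3002"}  # ideographic comma and full stop


def add_tailored_breaks(text, base_breaks):
    """Add a break offset after U+3001/U+3002 when an ASCII letter or digit follows.

    base_breaks holds the offsets reported by the underlying (untailored)
    break iterator; the result is the sorted union with the extra offsets.
    """
    extra = {
        i + 1
        for i, ch in enumerate(text[:-1])
        if ch in IDEOGRAPHIC_PUNCT
        and text[i + 1].isascii()
        and text[i + 1].isalnum()
    }
    return sorted(set(base_breaks) | extra)


# Assumed input: offsets 1, 2 and 7 are what an untailored iterator might report.
print(add_tailored_breaks("美通社。PR Newswire", [1, 2, 7]))
```

The advantage of an override at this layer is that it does not depend on which ICU build the platform ships, which is the concern raised above for Mac OS X.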