Bug 15630 - After U+3001, U+3002 (ideographic comma/full stop), lines cannot be broken
Status: RESOLVED DUPLICATE of bug 17411
Alias: None
Product: WebKit
Classification: Unclassified
Component: Layout and Rendering
Version: 523.x (Safari 3)
Hardware: PC Windows XP
Importance: P2 Normal
Assignee: Nobody
URL: http://usstock.jrj.com.cn/xhmt
Keywords:
Depends on:
Blocks:
 
Reported: 2007-10-22 15:16 PDT by Jungshik Shin
Modified: 2008-02-25 14:28 PST

Attachments
layout test case (710 bytes, text/html)
2007-10-22 15:19 PDT, Jungshik Shin

Description Jungshik Shin 2007-10-22 15:16:11 PDT
Due to the problem described in the summary line, the layout at http://usstock.jrj.com.cn/xhmt is broken.  

WebKit uses ICU's line-breaking iterator, which in turn is based on UAX #14 (Unicode Line Breaking Algorithm). It has the following rule:

CL x (AL|NU)

where CL includes U+3001 and U+3002 (Ideographic Comma and Full Stop).  With the above rule, lines cannot be broken when U+3001 and U+3002 are followed by a Latin letter or a number.  As a result, the box at the url given above with the title
Comment 1 Jungshik Shin 2007-10-22 15:19:09 PDT
Created attachment 16810
layout test case 

The two columns should be rendered identically.
Comment 2 Jungshik Shin 2007-10-22 15:24:09 PDT
Hmm my comment #0 got trimmed....

As a result, the box at the url given above with the title 
Comment 3 Jungshik Shin 2007-10-22 17:25:23 PDT
Trying once more (this time with Firefox ;-)).

The text box whose title is '美通社简介' is a lot wider than its specified width, which breaks the layout of the page.

The fix is simple: tailor UAX #14's line-breaking classes so that there is a break opportunity between U+3001 or U+3002 and a following Latin letter or number (more broadly, any character in the AL or NU class). One way to do that is to move those two characters from the CL class to the NS (non-starter) class in ICU's source/data/brkiter/line.txt.

For WinSafari, it'd be a simple change, but for Safari on Mac, this may be more involved because it may mean changing the build of ICU shipped with OS X. 
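
To make the effect of the proposed tailoring concrete, here is a minimal Python sketch. It is not ICU's rule set: the four classes, the `lb_class` mapping, and the function names are simplifying assumptions made only for illustration. It models the quoted rule CL x (AL|NU) and shows that reclassifying U+3001/U+3002 as NS yields a break opportunity before a following Latin letter.

```python
# Simplified model of pair-based line breaking. Real UAX #14 has many
# more classes and rules; everything here is an illustrative assumption.
IDEOGRAPHIC_PUNCT = {"\u3001", "\u3002"}  # ideographic comma, full stop


def lb_class(ch, tailored):
    """Return a simplified line-break class for a single character."""
    if ch in IDEOGRAPHIC_PUNCT:
        # The proposed tailoring moves these two characters from CL to NS.
        return "NS" if tailored else "CL"
    if ch.isdigit():
        return "NU"
    if ch.isascii() and ch.isalpha():
        return "AL"
    return "ID"  # assumption: treat everything else as ideographic


def can_break_between(before, after, tailored=False):
    a, b = lb_class(before, tailored), lb_class(after, tailored)
    # Never break before closing punctuation or a non-starter.
    if b in ("CL", "NS"):
        return False
    # The rule quoted in the report: CL x (AL | NU).
    if a == "CL" and b in ("AL", "NU"):
        return False
    # Keep runs of letters and digits together.
    if a in ("AL", "NU") and b in ("AL", "NU"):
        return False
    return True


def break_opportunities(text, tailored=False):
    """Return the offsets at which a line break would be allowed."""
    return [
        i
        for i in range(1, len(text))
        if can_break_between(text[i - 1], text[i], tailored)
    ]


sample = "美通社、PR"
print(break_opportunities(sample))                 # untailored classes
print(break_opportunities(sample, tailored=True))  # U+3001/U+3002 as NS
```

With the untailored classes there is no break offset between "、" and "P"; with the tailored classes offset 4 appears, which is the opportunity the page layout needs.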

Comment 4 David Kilzer (:ddkilzer) 2007-10-22 22:56:34 PDT
(In reply to comment #3)
> Try once more (this time with FF ;-)). 

It sounds like you're hitting Bug 14562 (or something similar) when entering text in a text area (which is truncated when sent to the server).

Could you please file a new bug on this, stating the version of Safari/WebKit you're using and the steps to reproduce?  Thanks!

Comment 5 Alexey Proskuryakov 2007-10-25 09:47:05 PDT
I can only reproduce this problem on Windows; Mac (Tiger) works as expected for me.
Comment 6 Alexey Proskuryakov 2007-10-26 09:18:33 PDT
Do you know if this has been reported to the Unicode consortium? This rule is new to Unicode 5.0, and doesn't look quite right, as you point out.
Comment 7 Jungshik Shin 2007-10-26 14:44:48 PDT
Yes, I've been in contact with the author of UAX #14 (indirectly). I talked to the author of ICU break iterator and he agreed with me (actually, we sat together and he suggested changing the class of those two to NS).  
Comment 8 Alexey Proskuryakov 2008-02-18 02:51:21 PST
Bug 17411 has a patch for this.

I'm still unsure whether the Unicode consortium is aware of this issue. ICU is one thing, but the proposed update to UAX #14 at <http://www.unicode.org/reports/tr14/tr14-21.html> seems to be unchanged.
Comment 9 Satoshi Nakagawa 2008-02-23 14:33:52 PST
(In reply to comment #8)
I agree.  They may not be aware of this issue.
I wrote a report about this problem and sent it to the Unicode mailing list.

http://limechat.net/report/unicode-line-break-problem.html
Comment 10 Alexey Proskuryakov 2008-02-23 22:08:00 PST
Marking as a duplicate, as bug 17411 has an approved fix for this.

*** This bug has been marked as a duplicate of 17411 ***
Comment 11 Jungshik Shin 2008-02-25 14:28:08 PST
(In reply to comment #8)
> Bug 17411 has a patch for this.
> 
> I'm still unsure whether the Unicode consortium is aware of this issue. ICU is
> one thing, but the proposed update to UAX #14 at
> <http://www.unicode.org/reports/tr14/tr14-21.html> seems to be unchanged.

In the meantime, they nuked LB30 instead of changing the class for U+3001/3002.  A long-term solution is being worked on according to my source. 

Anyway, since ICU on Mac OS X will always lag behind, I agree that we should fix the WebKit code (as in bug 17411).
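
For readers wondering what an application-side fix could look like when the platform's ICU cannot be changed, here is a rough Python sketch. It is not the patch from bug 17411, which is not reproduced in this report; the function name `add_missing_breaks`, its `iterator_breaks` parameter, and the ASCII-only approximation of the AL/NU classes are all assumptions made for illustration. The idea is to take the offsets reported by an underlying break iterator and add an opportunity wherever U+3001 or U+3002 is directly followed by a Latin letter or digit.

```python
IDEOGRAPHIC_PUNCT = {"\u3001", "\u3002"}  # ideographic comma, full stop


def add_missing_breaks(text, iterator_breaks):
    """Return break offsets, adding those after U+3001/U+3002.

    iterator_breaks holds offsets from some underlying break iterator.
    It is a hypothetical input here; ICU itself is not called.
    """
    breaks = set(iterator_breaks)
    for i in range(1, len(text)):
        follows_punct = text[i - 1] in IDEOGRAPHIC_PUNCT
        # Assumption: ASCII letters and digits stand in for AL and NU.
        latin_or_digit = text[i].isascii() and text[i].isalnum()
        if follows_punct and latin_or_digit:
            breaks.add(i)
    return sorted(breaks)


# The iterator reported only offset 1; offsets 2 and 5 are added here.
print(add_missing_breaks("社、PR。2008", [1]))
```

A post-processing step like this keeps the change inside the application, so it does not depend on which ICU version the operating system ships.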