Due to the problem described in the summary line, the layout at http://usstock.jrj.com.cn/xhmt is broken.
WebKit uses ICU's line break iterator, which in turn is based on UAX #14 (Unicode Line Breaking Algorithm). UAX #14 has the following rule:
CL x (AL|NU)
where CL includes U+3001 and U+3002 (Ideographic Comma and Full Stop). With the above rule, lines cannot be broken when U+3001 and U+3002 are followed by a Latin letter or a number. As a result, the box at the url given above with the title
Created attachment 16810
layout test case
The two columns should be rendered identically.
Hmm, my comment #0 got trimmed...
As a result, the box at the url given above with the title
Try once more (this time with FF ;-)).
The text box whose title is '美通社简介' ("PR Newswire: Introduction") is much wider than its specified width, which breaks the layout of the page.
The fix is simple. We have to tailor UAX #14's line breaking properties so that U+3001 and U+3002 followed by a Latin letter or number (or, more broadly, any character in the AL/NU classes) are treated as a line breaking opportunity. One way to do that is to move those two characters from the CL class to the NS (nonstarter) class in ICU's source/data/brkiter/line.txt. NS still forbids a break before the character (× NS), so the comma and full stop never start a line, but no rule forbids a break after it.
For WinSafari, it'd be a simple change, but for Safari on Mac, this may be more involved because it may mean changing the build of ICU shipped with OS X.
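To make the effect of the reclassification concrete, here is a toy Python model of a handful of UAX #14 pair rules (Unicode 5.0 numbering). It is a sketch under loose assumptions, not ICU's implementation: the class lookup covers only the characters in the demo string, rules with lookbehind (e.g. over spaces) are ignored, and the names `base_class`, `tailored_class`, `can_break_between`, and `break_positions` are hypothetical.

```python
"""Toy model of a few UAX #14 pair rules. NOT ICU and NOT a full
implementation: it only shows why CL x (AL | NU) blocks a break after
U+3001/U+3002 and how reclassifying them as NS changes that."""


def base_class(ch: str) -> str:
    """Hypothetical, very rough line-break class lookup for the demo characters."""
    if ch in "\u3001\u3002":           # IDEOGRAPHIC COMMA / FULL STOP: CL, per the report
        return "CL"
    if "\u4e00" <= ch <= "\u9fff":     # CJK Unified Ideographs
        return "ID"
    if ch.isascii() and ch.isdigit():
        return "NU"
    if ch.isascii() and ch.isalpha():
        return "AL"
    raise ValueError(f"toy lookup does not cover {ch!r}")


def tailored_class(ch: str) -> str:
    """The tailoring proposed above: U+3001/U+3002 move from CL to NS."""
    if ch in "\u3001\u3002":
        return "NS"
    return base_class(ch)


def can_break_between(a: str, b: str) -> bool:
    """Decide whether a break is allowed between adjacent classes a and b."""
    if b == "CL":                                   # LB13: x CL
        return False
    if b == "NS":                                   # LB21: x NS
        return False
    if a in ("AL", "NU") and b in ("AL", "NU"):     # LB23, LB25 (NU x NU), LB28
        return False
    if a == "CL" and b in ("AL", "NU"):             # LB30 (Unicode 5.0): CL x (AL | NU)
        return False
    return True                                     # LB31: break everywhere else


def break_positions(text: str, class_of) -> list:
    """Indices i where a line may break before text[i]."""
    classes = [class_of(ch) for ch in text]
    return [i for i in range(1, len(text))
            if can_break_between(classes[i - 1], classes[i])]


if __name__ == "__main__":
    sample = "简介。ABC、123"
    print(break_positions(sample, base_class))      # [1]
    print(break_positions(sample, tailored_class))  # [1, 3, 7]
```

With the Unicode 5.0 classes, the only break opportunity is between the two ideographs, so "。ABC、123" must stay on one line and the box grows. With U+3001/U+3002 in NS, breaks become possible after "。" (before "A") and after "、" (before "1"), while the punctuation itself still cannot start a line.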
(In reply to comment #3)
> Try once more (this time with FF ;-)).
It sounds like you're hitting Bug 14562 (or something similar) when entering text in a text area (which is truncated when sent to the server).
Could you please file a new bug on this, stating the version of Safari/WebKit you're using, and steps to reproduce. Thanks!
I can only reproduce this problem on Windows - Mac (Tiger) works as expected for me.
Do you know if this has been reported to the Unicode consortium? This rule is new to Unicode 5.0, and doesn't look quite right, as you point out.
Yes, I've been in contact with the author of UAX #14 (indirectly). I talked to the author of ICU's break iterator and he agreed with me (actually, we sat together and he suggested changing the class of those two characters to NS).
Bug 17411 has a patch for this.
I'm still unsure whether the Unicode consortium is aware of this issue. ICU is one thing, but the proposed update to UAX #14 at <http://www.unicode.org/reports/tr14/tr14-21.html> seems to be unchanged.
(In reply to comment #8)
I agree. They may not be aware of this issue.
I wrote a report about this problem and sent it to the Unicode mailing list.
Marking as a duplicate, as bug 17411 has an approved fix for this.
*** This bug has been marked as a duplicate of 17411 ***
(In reply to comment #8)
> Bug 17411 has a patch for this.
> I'm still unsure whether the Unicode consortium is aware of this issue. ICU is
> one thing, but the proposed update to UAX #14 at
> <http://www.unicode.org/reports/tr14/tr14-21.html> seems to be unchanged.
In the meantime, they nuked LB30 instead of changing the class for U+3001/3002. A long-term solution is being worked on according to my source.
Anyway, since the ICU shipped with Mac OS X will always lag behind, I agree that we should fix the WebKit code (as in bug 17411).