Bug 17411

Summary: Ideographic comma or full stop, and subsequent ASCII token should be breakable into two lines
Product: WebKit Reporter: Satoshi Nakagawa <artension>
Component: TextAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: ap, artension, eric, jshin, mitz
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Mac   
OS: OS X 10.5   
URL: http://limechat.net/report/webkit-line-break-problem.html
Bug Depends on:    
Bug Blocks: 18721    
Attachments:
Description Flags
Fix and a test case
none
Revised patch darin: review+

Description Satoshi Nakagawa 2008-02-17 08:46:15 PST
With the current implementation of WebKit, '、' (U+3001) and '。' (U+3002) are not treated as line-breakable characters.

But in Japanese writing, we use them like comma and period in English. I think you can break a line just after comma or period. It's the same situation for '、' and '。' in Japanese.

I confirmed IE6/7 and Firefox 2/3 implemented it correctly.
Comment 1 Satoshi Nakagawa 2008-02-17 08:54:53 PST
Created attachment 19178 [details]
Fix and a test case
Comment 2 Satoshi Nakagawa 2008-02-17 09:39:08 PST
Comment on attachment 19178 [details]
Fix and a test case

Index: WebCore/ChangeLog
===================================================================
--- WebCore/ChangeLog	(revision 30355)
+++ WebCore/ChangeLog	(working copy)
@@ -1,3 +1,17 @@
+2008-02-17  Satoshi Nakagawa  <artension@gmail.com>
+
+        Reviewed by NOBODY (OOPS!).
+
+        Bug 17411: Ideographic comma and full stop should be treated as line-breakable characters
+        http://bugs.webkit.org/show_bug.cgi?id=17411
+
+        Test: fast/text/line-breaks-after-ideographic-comma-or-full-stop.html
+
+        * ChangeLog:
+        * platform/text/CharacterNames.h:
+        * rendering/break_lines.cpp:
+        (WebCore::shouldBreakAfter):
+
 2008-02-17  Bin Chen  <binary.chen@gmail.com>
 
         Reviewed by Alp Toker.
Index: WebCore/platform/text/CharacterNames.h
===================================================================
--- WebCore/platform/text/CharacterNames.h	(revision 30353)
+++ WebCore/platform/text/CharacterNames.h	(working copy)
@@ -39,6 +39,8 @@ namespace WebCore {
     const UChar bullet = 0x2022;
     const UChar horizontalEllipsis = 0x2026;
     const UChar ideographicSpace = 0x3000;
+    const UChar ideographicComma = 0x3001;
+    const UChar ideographicFullStop = 0x3002;
     const UChar leftToRightMark = 0x200E;
     const UChar leftToRightEmbed = 0x202A;
     const UChar leftToRightOverride = 0x202D;
Index: WebCore/rendering/break_lines.cpp
===================================================================
--- WebCore/rendering/break_lines.cpp	(revision 30353)
+++ WebCore/rendering/break_lines.cpp	(working copy)
@@ -51,6 +51,8 @@ static inline bool shouldBreakAfter(UCha
         case '-':
         case '?':
         case softHyphen:
+        case ideographicComma:
+        case ideographicFullStop:
             return true;
         default:
             return false;
Index: LayoutTests/ChangeLog
===================================================================
--- LayoutTests/ChangeLog	(revision 30355)
+++ LayoutTests/ChangeLog	(working copy)
@@ -1,3 +1,11 @@
+2008-02-17  Satoshi Nakagawa  <artension@gmail.com>
+
+        Reviewed by NOBODY (OOPS!).
+
+        - test for http://bugs.webkit.org/show_bug.cgi?id=17411
+
+        * fast/text/line-breaks-after-ideographic-comma-or-full-stop.html: Added.
+
 2008-02-17  Dan Bernstein  <mitz@apple.com>
 
         Reviewed by Darin Adler.
Index: LayoutTests/fast/text/line-breaks-after-ideographic-comma-or-full-stop.html
===================================================================
--- LayoutTests/fast/text/line-breaks-after-ideographic-comma-or-full-stop.html	(revision 0)
+++ LayoutTests/fast/text/line-breaks-after-ideographic-comma-or-full-stop.html	(revision 0)
@@ -0,0 +1,36 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" 
+"http://www.w3.org/TR/html4/loose.dtd">
+<html>
+    <head>
+        <title>Line breaks after ideographic comma or full stop</title>
+    </head>
+    <body>
+        These are good:
+        <div style="font-family:'Lucida Grande'; font-size:16pt; text-decoration:underline; width:5em;">
+            <p style="border:solid green 1px;">
+                &#x3042;&#x3044;&#x3046;&#x3001;<br/>abc
+            </p>
+            <p style="border:solid green 1px;">
+                &#x3042;&#x3044;&#x3046;&#x3002;<br/>abc
+            </p>
+        </div>
+        The following two should look like &ldquo;good&rdquo;:
+        <div style="font-family:'Lucida Grande'; font-size:16pt; text-decoration:underline; width:5em;">
+            <p style="border:solid blue 1px;">
+                &#x3042;&#x3044;&#x3046;&#x3001;abc
+            </p>
+            <p style="border:solid blue 1px;">
+                &#x3042;&#x3044;&#x3046;&#x3002;abc
+            </p>
+        </div>
+        These are bad:
+        <div style="font-family:'Lucida Grande'; font-size:16pt; text-decoration:underline; width:5em;">
+            <p style="border:solid red 1px;">
+                &#x3042;&#x3044;<br/>&#x3046;&#x3001;abc
+            </p>
+            <p style="border:solid red 1px;">
+                &#x3042;&#x3044;<br/>&#x3046;&#x3002;abc
+            </p>
+        </div>
+    </body>
+</html>
Index: LayoutTests/platform/mac/fast/text/line-breaks-after-ideographic-comma-or-full-stop-expected.txt
===================================================================
--- LayoutTests/platform/mac/fast/text/line-breaks-after-ideographic-comma-or-full-stop-expected.txt	(revision 0)
+++ LayoutTests/platform/mac/fast/text/line-breaks-after-ideographic-comma-or-full-stop-expected.txt	(revision 0)
@@ -0,0 +1,49 @@
+layer at (0,0) size 800x600
+  RenderView at (0,0) size 800x600
+layer at (0,0) size 800x551
+  RenderBlock {HTML} at (0,0) size 800x551
+    RenderBody {BODY} at (8,8) size 784x522
+      RenderBlock (anonymous) at (0,0) size 784x18
+        RenderText {#text} at (0,0) size 101x18
+          text run at (0,0) width 101: "These are good:"
+      RenderBlock {DIV} at (0,39) size 100x121
+        RenderBlock {P} at (0,0) size 100x50 [border: (1px solid #008000)]
+          RenderText {#text} at (1,1) size 84x24
+            text run at (1,1) width 84: "\x{3042}\x{3044}\x{3046}\x{3001}"
+          RenderBR {BR} at (85,21) size 0x0
+          RenderText {#text} at (1,25) size 36x24
+            text run at (1,25) width 36: "abc"
+        RenderBlock {P} at (0,71) size 100x50 [border: (1px solid #008000)]
+          RenderText {#text} at (1,1) size 84x24
+            text run at (1,1) width 84: "\x{3042}\x{3044}\x{3046}\x{3002}"
+          RenderBR {BR} at (85,21) size 0x0
+          RenderText {#text} at (1,25) size 36x24
+            text run at (1,25) width 36: "abc"
+      RenderBlock (anonymous) at (0,181) size 784x18
+        RenderText {#text} at (0,0) size 277x18
+          text run at (0,0) width 277: "The following two should look like \x{201C}good\x{201D}:"
+      RenderBlock {DIV} at (0,220) size 100x121
+        RenderBlock {P} at (0,0) size 100x50 [border: (1px solid #0000FF)]
+          RenderText {#text} at (1,1) size 84x48
+            text run at (1,1) width 84: "\x{3042}\x{3044}\x{3046}\x{3001}"
+            text run at (1,25) width 36: "abc"
+        RenderBlock {P} at (0,71) size 100x50 [border: (1px solid #0000FF)]
+          RenderText {#text} at (1,1) size 84x48
+            text run at (1,1) width 84: "\x{3042}\x{3044}\x{3046}\x{3002}"
+            text run at (1,25) width 36: "abc"
+      RenderBlock (anonymous) at (0,362) size 784x18
+        RenderText {#text} at (0,0) size 92x18
+          text run at (0,0) width 92: "These are bad:"
+      RenderBlock {DIV} at (0,401) size 100x121
+        RenderBlock {P} at (0,0) size 100x50 [border: (1px solid #FF0000)]
+          RenderText {#text} at (1,1) size 42x24
+            text run at (1,1) width 42: "\x{3042}\x{3044}"
+          RenderBR {BR} at (43,21) size 0x0
+          RenderText {#text} at (1,25) size 78x24
+            text run at (1,25) width 78: "\x{3046}\x{3001}abc"
+        RenderBlock {P} at (0,71) size 100x50 [border: (1px solid #FF0000)]
+          RenderText {#text} at (1,1) size 42x24
+            text run at (1,1) width 42: "\x{3042}\x{3044}"
+          RenderBR {BR} at (43,21) size 0x0
+          RenderText {#text} at (1,25) size 78x24
+            text run at (1,25) width 78: "\x{3046}\x{3002}abc"
Comment 3 Satoshi Nakagawa 2008-02-17 09:43:46 PST
Created attachment 19179 [details]
Revised patch

Updated the test case to make it regardless of the font size.
Comment 4 Alexey Proskuryakov 2008-02-18 02:38:21 PST
Duplicate of bug 15630 (which includes some discussion of standard-related issues).
Comment 5 Darin Adler 2008-02-20 09:27:52 PST
Comment on attachment 19179 [details]
Revised patch

Should this be fixed in ICU rather than in WebKit? We're using the ICU ubrk.h header for this, using UBRK_LINE. Why doesn't it say these characters should be breaks?

The switch statement in break_lines.cpp is *only* to work around bugs in ICU's ubrk.h or to implement rules that don't make sense at the ICU level but are needed for a web browser.

I'll tentatively agree that we should land this patch, but I'm not sure why ICU doesn't already handle this properly.

r=me
Comment 6 Satoshi Nakagawa 2008-02-23 14:35:59 PST
I updated the report to make it more easier to understand.

http://limechat.net/report/webkit-line-break-problem.html

I realized the original cause of this problem is the Unicode line breaking algorithm standard.

http://limechat.net/report/unicode-line-break-problem.html

I have sent this report to the Unicode ML.
Comment 7 Satoshi Nakagawa 2008-02-23 14:58:57 PST
(In reply to comment #5)
As I wrote in the report above, it seems a problem of the Unicode line breaking standards (http://www.unicode.org/reports/tr14/). ICU just implemented it strictly.

So the switch statement in break_lines.cpp is to work around bugs in the Unicode standards.

If the Unicode line breaking algorithm would be fixed, ICU's line break iterator would work correctly. Then WebKit will work well without any patches.

But if so, we need to wait the fix propagation from the Unicode standards, ICU, WebKit to Safari. It will take too much time.

I suggest that the patch should be applied to fix this problem quickly in WebKit.
Comment 8 Satoshi Nakagawa 2008-02-23 15:21:59 PST
Updated summary.
Comment 9 Alexey Proskuryakov 2008-02-23 21:49:26 PST
(In reply to comment #6)
> http://limechat.net/report/unicode-line-break-problem.html
> 
> I have sent this report to the Unicode ML.

Thank you! Is there an archive of the list available somewhere? I do not see the posting in <http://www.unicode.org/mail-arch/unicode-ml/y2008-m02/index.html>, so I assume that you sent it to some other list.

(In reply to comment #7)
> I suggest that the patch should be applied to fix this problem quickly in
> WebKit.

I agree, we shouldn't wait for ICU here. I think we can wait a few days for feedback from the Unicode mailing list though.
Comment 10 Alexey Proskuryakov 2008-02-23 22:08:00 PST
*** Bug 15630 has been marked as a duplicate of this bug. ***
Comment 11 Satoshi Nakagawa 2008-02-23 23:06:13 PST
(In reply to comment #9)
I sent a mail to the list, but it needs to be checked by a moderator. Because the report is my first post to the list. So please wait it.

> I agree, we shouldn't wait for ICU here. I think we can wait a few days for
> feedback from the Unicode mailing list though.

I'm glad to hear that.
Thanks a lot!
Comment 12 Satoshi Nakagawa 2008-02-24 18:52:25 PST
(In reply to comment #9)
My post to the list seemed approved by a moderator, and discussion has been started.

http://www.unicode.org/mail-arch/unicode-ml/y2008-m02/0138.html

I'll start intensive checking.
Comment 13 Alexey Proskuryakov 2008-02-26 02:16:46 PST
Landed in <http://trac.webkit.org/projects/webkit/changeset/30592>.

I changed the test to be text-only to not depend on available fonts and font fallback mechanism (the included results didn't match mine). Also added a few comments.
Comment 14 Satoshi Nakagawa 2008-02-26 13:25:03 PST
(In reply to comment #13)
Thanks a lot!
I have checked it.
It seems good.
Comment 15 Eric Seidel (no email) 2008-04-24 13:24:19 PDT
This does not work for me on Safari 3.1.1. on windows.  It seems this bug isn't really fixed (for windows)?

The latest nightly also fails this test on windows.
Comment 16 Alexey Proskuryakov 2008-04-24 13:59:24 PDT
The Windows results are different visually, which suggests that it may be an entirely different issue. As this report was against the Mac version, and it is fixed with a test case landed, I'd say that it is fully resolved.

Could you please file a new bug?