Bug 106303 - text-align:justify separate U+3033 from U+3035
Summary: text-align:justify separate U+3033 from U+3035
Status: UNCONFIRMED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Layout and Rendering
Version: 528+ (Nightly build)
Hardware: Mac OS X 10.7
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on: 89235
Blocks:
Reported: 2013-01-07 23:23 PST by Yuki Sekiguchi
Modified: 2023-02-03 05:56 PST
CC List: 10 users

See Also:


Attachments
inseparable.html. Reproduced content for justification. (229 bytes, text/html)
2013-01-07 23:23 PST, Yuki Sekiguchi
no flags
inseparable-line-break.html. Reproduced content for line breaking. (171 bytes, text/html)
2013-01-07 23:24 PST, Yuki Sekiguchi
no flags
Patch (20.55 KB, patch)
2013-01-07 23:44 PST, Yuki Sekiguchi
no flags

Description Yuki Sekiguchi 2013-01-07 23:23:23 PST
Created attachment 181652 [details]
inseparable.html. Reproduced content for justification.

In the attached inseparable.html, U+3033 should not be separated from U+3035, but it is separated.

This bug reproduces only on Mac, because other platforms do not expand the space between ideographs when justifying.

Requirements for Japanese Text Layout say not to separate the characters.
http://www.w3.org/TR/jlreq/#character_sequences_which_do_not_allow_space_insertion_as_part_of_line_adjustment_processing
>  Combinations of character classes which allow spaces to be inserted for line alignment, are described as a complete table in Appendix E Opportunities for Inter-character Space Expansion during Line Adjustment, following 3.9 About Character Classes.

In 3.9 About Character Classes, U+3033 and U+3035 are Inseparable characters (cl-08).
In the 4th note in Appendix E.2 Notes:
http://www.w3.org/TR/jlreq/#opportunities_for_intercharacter_space_expansion_during_line_adjustment
> A third order opportunity exists for inter-character space expansion, to take up to a maximum of a quarter em space, with respect to the corresponding character size, between two consecutive inseparable characters (cl-08) which are of different kinds.

Therefore, we should not separate U+3033 from U+3035.

A line break also occurs between U+3033 and U+3035.
Please see inseparable-line-break.html.

Requirements for Japanese Text Layout say not to break the line between these characters.
http://www.w3.org/TR/jlreq/#possibilities_for_linebreaking_between_characters
In the 5th note in C.2 Notes:
> There is no line break opportunity between following couple of consecutive inseparable characters (cl-08) as follows:
> VERTICAL KANA REPEAT MARK UPPER HALF "〳", VERTICAL KANA REPEAT MARK LOWER HALF "〵"
> VERTICAL KANA REPEAT WITH VOICED SOUND MARK UPPER HALF "〴", VERTICAL KANA REPEAT MARK LOWER HALF "〵"
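
The two pairs named in the notes quoted above can be sketched as a small predicate that gates both justification gaps and line-break opportunities. This is only an illustration in Python: the names `is_inseparable_pair` and `allowed_gaps` are hypothetical and are not WebKit APIs, and every other adjacent pair is assumed to be an opportunity, which ignores the rest of the JLREQ character classes.

```python
from typing import List

# Hypothetical sketch; names are illustrative and not WebKit APIs.
UPPER_HALVES = {"\u3033", "\u3034"}  # U+3033 and U+3034 (upper halves)
LOWER_HALF = "\u3035"                # U+3035 (lower half)


def is_inseparable_pair(first: str, second: str) -> bool:
    """True for the two pairs listed in the JLREQ note quoted above."""
    return first in UPPER_HALVES and second == LOWER_HALF


def allowed_gaps(text: str) -> List[int]:
    """Return each index i where a gap may fall between text[i-1] and text[i].

    Simplifying assumption: every adjacent pair other than the two
    inseparable pairs is treated as an opportunity.
    """
    return [
        i
        for i in range(1, len(text))
        if not is_inseparable_pair(text[i - 1], text[i])
    ]
```

With this sketch, a run containing U+3033 followed by U+3035 yields no gap between those two characters, so neither justification space nor a line break would be placed there.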
Comment 1 Yuki Sekiguchi 2013-01-07 23:24:51 PST
Created attachment 181653 [details]
inseparable-line-break.html. Reproduced content for line breaking.
Comment 2 Yuki Sekiguchi 2013-01-07 23:44:29 PST
Created attachment 181657 [details]
Patch
Comment 3 Glenn Adams 2013-01-08 08:40:09 PST
(1) line break opportunities need to be determined by ICU, not by a hardcoded bypass around ICU such as Font::isUnbreakableCharactersPair;

(2) the JLREQ document [1] is not a W3C recommendation; it is a collection of input requirements being considered for preparing normative recommendations, such as CSS3 Text, the current draft of which defines the recommended behavior in [2][3];

(3) the current Unicode Line Break class database [4] marks U+3033 and U+3035 as ID (Ideograph) class, and not IN (Inseparable); in general, ICU and CSS3 Text make normative reference to this database for determining line break classes;

(4) there is already a pending patch in process [5] which will be adding line-break property support according to [3][6][7], so any change for JLREQ related line breaking should be handled as part of [5];

[1] http://www.w3.org/TR/2012/NOTE-jlreq-20120403/
[2] http://dev.w3.org/csswg/css3-text/#line-break-details
[3] http://dev.w3.org/csswg/css3-text/#line-break
[4] http://www.unicode.org/Public/UNIDATA/LineBreak.txt
[5] http://bugs.webkit.org/show_bug.cgi?id=89235 
[6] http://trac.webkit.org/wiki/LineBreaking
[7] http://trac.webkit.org/wiki/LineBreakingCSS3Mapping
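
For readers who want to check a class assignment like the one in item (3), a minimal Python sketch of reading data in the LineBreak.txt layout (`code;class` or `start..end;class`, with `#` comments) follows. The function name `parse_line_break_classes` is hypothetical, and the sample lines are an illustrative excerpt that mirrors the claim in item (3); they are not copied from any particular Unicode release.

```python
import re

# Illustrative excerpt in the LineBreak.txt layout. The classes mirror
# item (3) above and are NOT copied from a specific Unicode release.
SAMPLE = """\
# code point(s);class # comment
3033;ID # VERTICAL KANA REPEAT MARK UPPER HALF
3035;ID # VERTICAL KANA REPEAT MARK LOWER HALF
"""

# One code point or a "start..end" range, then ";", then the class name.
LINE_RE = re.compile(r"^([0-9A-Fa-f]+)(?:\.\.([0-9A-Fa-f]+))?\s*;\s*(\w+)")


def parse_line_break_classes(data: str) -> dict:
    """Map each listed code point to its line break class."""
    classes = {}
    for raw in data.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue
        start = int(match.group(1), 16)
        end = int(match.group(2), 16) if match.group(2) else start
        for code_point in range(start, end + 1):
            classes[code_point] = match.group(3)
    return classes


classes = parse_line_break_classes(SAMPLE)
```

Running the same function over the real file from [4] would show which class a given release assigns to U+3033 and U+3035.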
Comment 4 Glenn Adams 2013-01-08 08:41:04 PST
Marking as dependent on bug 89235 to resolve line break semantics for Japanese.
Comment 5 Yuki Sekiguchi 2013-01-08 20:14:06 PST
Thank you, Glenn.
Your advice is very helpful to me.

I will ask the CSS and Unicode folks to follow the JLREQ behavior.

Therefore, I am removing the review flag for now.
Comment 6 Koji Ishii 2013-03-02 03:03:26 PST
Unicode 6.3 will change the line break property of U+3035 to CM. The change will propagate once ICU incorporates the new data from CLDR.

Please be prepared: ANY * CM will not break, and we should not justify between them either.
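
As a rough Python illustration of the class-driven rule above, the sketch below suppresses a gap before any character whose class is CM, with no pair table involved. It is a toy under stated assumptions: `line_break_class` and `break_opportunities` are hypothetical names, U+3035 is assumed to be CM as anticipated in this comment, every other character is lumped into ID, and all remaining line breaking rules are ignored.

```python
from typing import List


def line_break_class(ch: str) -> str:
    """Toy class lookup.

    Assumption: U+3035 is CM, as anticipated in this comment; every other
    character is lumped into ID for the purposes of this example.
    """
    return "CM" if ch == "\u3035" else "ID"


def break_opportunities(text: str) -> List[int]:
    """Return each index i where a break may fall before text[i].

    Only the "no break before CM" rule is modelled here.
    """
    return [
        i
        for i in range(1, len(text))
        if line_break_class(text[i]) != "CM"
    ]
```

Unlike a hardcoded pair check, this also suppresses the gap when U+3035 follows any other character, which is the broader effect the comment warns about.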
Comment 7 Ahmad Saleem 2023-02-03 05:56:25 PST
inseparable.html. Reproduced content for justification. <- WebKit Trunk, Chrome Canary 112 and Firefox Nightly 111 match each other.

inseparable-line-break.html. Reproduced content for line breaking. <- WebKit Trunk and Chrome Canary 112 match each other, but Firefox Nightly 111 differs.

I am not sure what the desired behavior is in the last test, so I will tag others to comment on whether this is something that needs to be fixed in WebKit. Thanks!