Created attachment 181652 [details] inseparable.html. Reproduced content for justification.
In the attached inseparable.html, U+3033 should not be separated from U+3035 during justification, but it is. The bug reproduces only on Mac, because other platforms don't expand between ideographs.
Requirements for Japanese Text Layout (JLREQ) says not to separate these characters.
http://www.w3.org/TR/jlreq/#character_sequences_which_do_not_allow_space_insertion_as_part_of_line_adjustment_processing
> Combinations of character classes which allow spaces to be inserted for line alignment, are described as a complete table in Appendix E Opportunities for Inter-character Space Expansion during Line Adjustment, following 3.9 About Character Classes.
In 3.9 About Character Classes, U+3033 and U+3035 are Inseparable characters (cl-08).
In the 4th note in Appendix E.2 Notes:
http://www.w3.org/TR/jlreq/#opportunities_for_intercharacter_space_expansion_during_line_adjustment
> A third order opportunity exists for inter-character space expansion, to take up to a maximum of a quarter em space, with respect to the corresponding character size, between two consecutive inseparable characters (cl-08) which are of different kinds.
Therefore, we should not separate U+3033 from U+3035.
A line break also occurs between U+3033 and U+3035; see inseparable-line-break.html. JLREQ says not to break the line between these characters.
http://www.w3.org/TR/jlreq/#possibilities_for_linebreaking_between_characters
In the 5th note in C.2 Notes:
> There is no line break opportunity between following couple of consecutive inseparable characters (cl-08) as follows:
> VERTICAL KANA REPEAT MARK UPPER HALF "〳", VERTICAL KANA REPEAT MARK LOWER HALF "〵"
> VERTICAL KANA REPEAT WITH VOICED SOUND MARK UPPER HALF "〴", VERTICAL KANA REPEAT MARK LOWER HALF "〵"
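For illustration only, the pair rule quoted from JLREQ C.2 can be sketched as below. This is a minimal Python sketch, not the attached patch and not WebKit code; the names UNBREAKABLE_PAIRS, is_unbreakable_pair and break_opportunities are hypothetical, and the sketch considers this one rule and nothing else.

```python
from typing import List

# Pairs that the JLREQ C.2 note quoted above lists as having no line break
# opportunity between them: (upper half, lower half) of the vertical kana
# repeat marks. Hypothetical table, for illustration only.
UNBREAKABLE_PAIRS = {
    ("\u3033", "\u3035"),  # 〳 followed by 〵
    ("\u3034", "\u3035"),  # 〴 followed by 〵
}


def is_unbreakable_pair(first: str, second: str) -> bool:
    """Return True if no break (or expansion) is allowed between the two characters."""
    return (first, second) in UNBREAKABLE_PAIRS


def break_opportunities(text: str) -> List[int]:
    """Return every index i where a break between text[i-1] and text[i] is
    allowed, considering only the pair rule above (all other rules ignored)."""
    return [
        i
        for i in range(1, len(text))
        if not is_unbreakable_pair(text[i - 1], text[i])
    ]
```

With the input "あ〳〵い", this sketch allows a break before 〳 and before い, but not between 〳 and 〵.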
Created attachment 181653 [details] inseparable-line-break.html. Reproduced content for line breaking.
Created attachment 181657 [details] Patch
(1) line break opportunities need to be determined by ICU and not use a hardcoded escape around ICU such as Font::isUnbreakableCharactersPair;
(2) the JLREQ document [1] is not a W3C recommendation; it is a collection of input requirements being considered for preparing normative recommendations, such as CSS3 Text, the current draft of which defines the recommended behavior in [2][3];
(3) the current Unicode Line Break class database [4] marks U+3033 and U+3035 as ID (Ideograph) class, and not IN (Inseparable); in general, ICU and CSS3 Text make normative reference to this database for determining line break classes;
(4) there is already a pending patch in process [5] which will be adding line-break property support according to [3][6][7], so any change for JLREQ related line breaking should be handled as part of [5];
[1] http://www.w3.org/TR/2012/NOTE-jlreq-20120403/
[2] http://dev.w3.org/csswg/css3-text/#line-break-details
[3] http://dev.w3.org/csswg/css3-text/#line-break
[4] http://www.unicode.org/Public/UNIDATA/LineBreak.txt
[5] http://bugs.webkit.org/show_bug.cgi?id=89235
[6] http://trac.webkit.org/wiki/LineBreaking
[7] http://trac.webkit.org/wiki/LineBreakingCSS3Mapping
Marking as dependent on bug 89235 to resolve line break semantics for Japanese.
Thank you, Glenn. Your advice is very helpful to me. I will ask the CSS and Unicode people to adopt the JLREQ behavior. In the meantime, I am removing the review flag.
Unicode 6.3 will change the line break property of U+3035 to CM. The change will propagate when ICU incorporates the new data from CLDR. Please be prepared: ANY * CM will not break, and justification should not insert space between them either.
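As a rough illustration of what "ANY * CM will not break" implies, here is a simplified Python sketch. It is not ICU and not the full Unicode line breaking algorithm: the class table is hand-written from the values mentioned in this bug (U+3033 as ID, U+3035 as CM after Unicode 6.3), treating every other character as ID is an assumption made only to keep the example small, and all names are hypothetical.

```python
from typing import List

# Hand-written line break classes, taken from the comments in this bug.
# A real implementation would read them from the Unicode LineBreak.txt data.
LINE_BREAK_CLASS = {
    "\u3033": "ID",  # VERTICAL KANA REPEAT MARK UPPER HALF
    "\u3035": "CM",  # VERTICAL KANA REPEAT MARK LOWER HALF (as of Unicode 6.3)
}


def line_break_class(ch: str) -> str:
    """Look up the class; ASSUMPTION: anything not in the table is ID."""
    return LINE_BREAK_CLASS.get(ch, "ID")


def can_break_before(text: str, i: int) -> bool:
    """Simplified 'ANY * CM' rule: never break directly before a CM character."""
    return line_break_class(text[i]) != "CM"


def allowed_breaks(text: str) -> List[int]:
    """Indices i where a break between text[i-1] and text[i] is allowed
    under the simplified rule above."""
    return [i for i in range(1, len(text)) if can_break_before(text, i)]
```

Under these assumptions, "〳〵" has no break opportunity between its two characters, which matches the behavior JLREQ asks for without a hardcoded pair check.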
inseparable.html (reproduced content for justification): WebKit Trunk, Chrome Canary 112 and Firefox Nightly 111 all match each other.
inseparable-line-break.html (reproduced content for line breaking): WebKit Trunk and Chrome Canary 112 match each other, but Firefox Nightly 111 differs.
I am not sure what the desired behavior is in the last test, so I will tag others to comment on whether this is something that needs to be fixed in WebKit. Thanks!