UNCONFIRMED 37543
Need more sophisticated line breaking rule for CJK about quotation marks
https://bugs.webkit.org/show_bug.cgi?id=37543
Xianzhu Wang
Reported 2010-04-13 19:23:14 PDT
For now, all quotation marks in CJK text are treated as prohibiting line breaks both before and after them. This is not unacceptable, but sophisticated CJK text layout software should also consider whether a quotation mark is opening or closing and apply different line breaking rules accordingly.

For example, suppose the following Chinese text

一二三四五六七八九“甲乙丙丁”

is displayed in a container that is 10 Chinese characters wide. Current WebKit displays the text as:

一二三四五六七八
九“甲乙丙丁”

while all word-processing software and other browsers (IE, Firefox) display it as:

一二三四五六七八九
“甲乙丙丁”

which makes better use of the container space and looks better to Chinese readers.

Firefox implemented an algorithm (https://wiki.mozilla.org/Gecko:Line_Breaking http://mxr.mozilla.org/mozilla1.9.2/source/intl/lwbrk/src/nsJISx4501LineBreaker.cpp) conforming to the JIS X 4051 standard (Formatting rules for Japanese documents), which also applies to Chinese and Korean documents.

I'd like to add JIS X 4051 support to WebKit. What is the rule about importing source code under other licenses?
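The difference between the two layouts above can be reproduced with a small sketch. This is only an illustration, not WebKit or Gecko code: the function names are hypothetical, every character is assumed to be one unit wide, and only the four curly quotation marks are considered.

```python
# Hypothetical sketch: greedy line layout under two quotation-mark rules.
OPENING = {"\u201C", "\u2018"}  # left double / left single quotation mark
CLOSING = {"\u201D", "\u2019"}  # right double / right single quotation mark
QUOTES = OPENING | CLOSING


def can_break_current(prev, nxt):
    """Behavior described in the report: no break on either side of a quote."""
    return prev not in QUOTES and nxt not in QUOTES


def can_break_tailored(prev, nxt):
    """Opening quotes forbid a break after; closing quotes forbid a break before."""
    return prev not in OPENING and nxt not in CLOSING


def layout(text, width, can_break):
    """Fill each line with up to `width` characters, ending at an allowed break."""
    lines = []
    start = 0
    while len(text) - start > width:
        end = start + width
        # Back up to the last position where a break is allowed.
        while end > start and not can_break(text[end - 1], text[end]):
            end -= 1
        if end == start:
            # No break opportunity on this line: force a break at the width.
            end = start + width
        lines.append(text[start:end])
        start = end
    lines.append(text[start:])
    return lines


text = "一二三四五六七八九\u201C甲乙丙丁\u201D"
print(layout(text, 10, can_break_current))   # first line holds 8 characters
print(layout(text, 10, can_break_tailored))  # first line holds 9 characters
```

With the first rule the break backs up past 九, because a break is forbidden on both sides of the opening quotation mark; with the second rule the break directly before the opening quotation mark is allowed.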
Alexey Proskuryakov
Comment 1 2010-04-14 14:36:39 PDT
Licensing requirements are listed on the patch submission page. Generally, we can only take code that is BSD- or LGPL-licensed. We normally follow the Unicode line breaking algorithm <http://unicode.org/reports/tr14/>. Deviations from it are acceptable, but they need to be explained in detail. One question to answer is: why doesn't the Unicode algorithm implement this?
Xianzhu Wang
Comment 2 2010-04-14 23:18:34 PDT
Though UAX14 says that by default quotation marks "act like they are both opening and closing" and thus prohibit line breaks both before and after them, there is also a note: "If language information is available, it can be used to determine which character is used as the opening quote and which as the closing quote. ... the quotation marks could be tailored to either OP or CL depending on their actual usage."

Mozilla bug https://bugzilla.mozilla.org/show_bug.cgi?id=450088 contains detailed discussions about the quotation mark line breaking issue. Mozilla's rules for quotation marks are as follows (not including the quotation marks that already have the OP or CL line breaking property):

1. The following left quotation marks are treated as opening punctuation (prohibiting a break after them) in all language contexts:
   201B;QU # SINGLE HIGH-REVERSED-9 QUOTATION MARK
   201F;QU # DOUBLE HIGH-REVERSED-9 QUOTATION MARK

2. The following left quotation marks are treated as opening punctuation (prohibiting a break after them) in CJK contexts (where the next character is a CJK character):
   00AB;QU # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
   2018;QU # LEFT SINGLE QUOTATION MARK
   201C;QU # LEFT DOUBLE QUOTATION MARK

3. The following right quotation marks are treated as closing punctuation (prohibiting a break before them) in all language contexts:
   2019;QU # RIGHT SINGLE QUOTATION MARK
   201D;QU # RIGHT DOUBLE QUOTATION MARK

I think rule 1 above should be merged into rule 2, because 201B and 201F have the same semantics as 2018 and 201C respectively.

I think the solution might be either pushing ICU to add this functionality or implementing an extra layer over ICU in WebKit code.
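The three rules listed above could be sketched as a lookup that tailors a QU character to OP or CL. This is a rough illustration and not Mozilla's or ICU's actual code: the names are hypothetical, and `is_cjk` is an assumed, deliberately simplified range check standing in for whatever test a real implementation would use.

```python
# Hypothetical sketch of the quotation-mark tailoring rules described above.
ALWAYS_OPENING = {0x201B, 0x201F}      # rule 1: OP in all language contexts
CJK_OPENING = {0x00AB, 0x2018, 0x201C}  # rule 2: OP only before a CJK character
ALWAYS_CLOSING = {0x2019, 0x201D}      # rule 3: CL in all language contexts


def is_cjk(ch):
    """Assumed approximation: kana, CJK unified ideographs, Hangul syllables."""
    cp = ord(ch)
    return (0x3040 <= cp <= 0x30FF      # Hiragana and Katakana
            or 0x4E00 <= cp <= 0x9FFF   # CJK Unified Ideographs
            or 0xAC00 <= cp <= 0xD7AF)  # Hangul Syllables


def tailored_class(quote, next_char=None):
    """Return "OP", "CL", or the default "QU" for a quotation mark."""
    cp = ord(quote)
    if cp in ALWAYS_OPENING:
        return "OP"
    if cp in ALWAYS_CLOSING:
        return "CL"
    if cp in CJK_OPENING and next_char is not None and is_cjk(next_char):
        return "OP"
    return "QU"  # untailored: no break before or after


print(tailored_class("\u201C", "甲"))  # left double quote before a CJK character
print(tailored_class("\u201C", "a"))   # left double quote before a Latin letter
print(tailored_class("\u201D", "a"))   # right double quote
```

Merging rule 1 into rule 2, as suggested above, would amount to moving 201B and 201F from `ALWAYS_OPENING` into `CJK_OPENING`.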