[Harfbuzz] Take into account brackets or quotation marks when collecting runs
https://bugs.webkit.org/show_bug.cgi?id=177003
Carlos Garcia Campos
Reported 2017-09-15 08:33:13 PDT
In determining the boundaries of a run of text in a given script, programs must resolve any of the special Script property values, such as Common, based on the context of the surrounding characters. A simple heuristic uses the script of the preceding character, which works well in many cases. However, this may not always produce optimal results. For example, in the text “... gamma (γ) is ...”, this heuristic would cause matching parentheses to be in different scripts.

Generally, paired punctuation, such as brackets or quotation marks, belongs to the enclosing or outer level of the text and should therefore match the script of the enclosing text. In addition, opening and closing elements of a pair resolve to the same Script property values, where possible. The use of quotation marks is language dependent; therefore it is not possible to tell from the character code alone whether a particular quotation mark is used as an opening or closing punctuation. For more information, see Section 6.2, General Punctuation, of [Unicode].

http://www.unicode.org/reports/tr24/#Common
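To make the difference between the two heuristics concrete, here is a minimal Python sketch of script itemization with and without a bracket stack. It is not WebKit or HarfBuzz code: `toy_script`, `itemize`, and the `pair_aware` flag are hypothetical names, and the script lookup uses rough code point ranges as a stand-in for the real Unicode Script property. Quotation marks are left out because, as noted above, their opening or closing role cannot be determined from the character alone.

```python
# Sketch only: toy_script() approximates the Unicode Script property with
# coarse ranges. A real implementation would query proper Unicode data.
def toy_script(ch):
    cp = ord(ch)
    if ch.isalpha() and cp < 0x250:
        return "Latin"
    if 0x370 <= cp <= 0x3FF:
        return "Greek"
    if 0x600 <= cp <= 0x6FF:
        return "Arabic"
    return "Common"

# Closing bracket -> matching opening bracket (assumed subset of pairs).
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())

def itemize(text, pair_aware=True):
    """Split text into (substring, script) runs."""
    runs = []           # each entry is [start, end, script]
    stack = []          # (opening bracket, script it resolved to)
    current = "Common"  # resolved script of the preceding character
    for i, ch in enumerate(text):
        script = toy_script(ch)
        if script == "Common":
            # Baseline heuristic: inherit the script of the preceding character.
            script = current
            if pair_aware and ch in OPENERS:
                stack.append((ch, current))
            elif pair_aware and ch in PAIRS:
                # Drop unmatched openers, then reuse the matching opener's script.
                while stack and stack[-1][0] != PAIRS[ch]:
                    stack.pop()
                if stack:
                    opener_script = stack.pop()[1]
                    if opener_script != "Common":
                        script = opener_script
        if runs and runs[-1][2] in (script, "Common"):
            # Extend the current run; a leading Common run adopts the
            # first real script that follows it.
            runs[-1][1] = i + 1
            runs[-1][2] = script
        else:
            runs.append([i, i + 1, script])
        current = script
    return [(text[start:end], script) for start, end, script in runs]

print(itemize("gamma (γ) is", pair_aware=False))
# [('gamma (', 'Latin'), ('γ) ', 'Greek'), ('is', 'Latin')]
print(itemize("gamma (γ) is"))
# [('gamma (', 'Latin'), ('γ', 'Greek'), (') is', 'Latin')]
```

With the stack, the closing parenthesis resolves to the script recorded for its opener, so both parentheses end up in the enclosing Latin text and any Common characters after the closing one inherit Latin as well.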
Attachments
Test file for brackets handling (338 bytes, text/html)
2017-10-26 06:33 PDT, Khaled Hosny
Khaled Hosny
Comment 1 2017-10-26 06:33:58 PDT
Created attachment 325003
Test file for brackets handling

(copying from bug 178625 comment 11)

In the attached HTML file the period should be rendered the same in both lines (you need the font from http://www.amirifont.org/). Currently the second line differs: the closing bracket takes the script of the Latin text before it, so the period that follows is shaped with the Latin script instead of Arabic.

Note that neither Firefox nor Chrome seems to handle this, so it does not appear to be a priority (LibreOffice does, but I wrote that code and it isn’t a web browser).
Myles C. Maxfield
Comment 2 2017-10-26 12:06:04 PDT
Is this bug about our bidi algorithm implementation or is it about something inside ComplexTextControllerHarfBuzz?
Carlos Garcia Campos
Comment 3 2017-10-27 00:09:08 PDT
(In reply to Myles C. Maxfield from comment #2)
> Is this bug about our bidi algorithm implementation or is it about something
> inside ComplexTextControllerHarfBuzz?

I'm not sure yet. Unless ComplexTextController already takes this into account when breaking runs, it's ComplexTextControllerHarfBuzz specific, but I haven't looked at it in detail yet.
Myles C. Maxfield
Comment 4 2017-10-27 14:00:40 PDT
(In reply to Carlos Garcia Campos from comment #3)
> (In reply to Myles C. Maxfield from comment #2)
> > Is this bug about our bidi algorithm implementation or is it about something
> > inside ComplexTextControllerHarfBuzz?
>
> I'm not sure yet. Unless ComplexTextController already takes this into
> account when breaking runs, it's ComplexTextControllerHarfBuzz specific, but
> I haven't looked at it in detail yet.

Recent versions of Unicode (for some definition of "recent") have changed how the bracket matching algorithm works in the UBA, which is something that we want to support in the near future. It may be worth sitting on this bug until we fix that, and seeing if that work fixes this problem.

Or you could update our UBA for me <3<3<3
Khaled Hosny
Comment 5 2017-10-27 14:42:50 PDT
Updating UBA wouldn’t make much of a difference here, since the issue is about script itemization which ideally should be independent of bidi itemization.

One option for updating UBA implementation is to switch to ICU’s, like Gecko did recently.
Myles C. Maxfield
Comment 6 2017-10-27 14:48:13 PDT
(In reply to Khaled Hosny from comment #5)
> Updating UBA wouldn’t make much of a difference here, since the issue is
> about script itemization which ideally should be independent of bidi
> itemization.
>
> One option for updating UBA implementation is to switch to ICU’s, like Gecko
> did recently.

Historically, we've found that ICU's UBA is too slow and would be a regression. However, it's probably worth revisiting this, as the perf numbers were gathered many years ago.
Myles C. Maxfield
Comment 7 2021-03-31 15:56:07 PDT
(In reply to Myles C. Maxfield from comment #6)
> Historically, we've found that ICU's UBA is too slow and would be a
> regression. However, it's probably worth revisiting this, as the perf
> numbers were gathered many years ago.

I started investigating this here: https://bugs.webkit.org/show_bug.cgi?id=178960