WebKit, along with all browsers and operating systems, implements the Unicode Bidirectional Algorithm (UBA), which is the specification for determining the visual character ordering of text. A correct implementation is essential for the correct display of RTL (Hebrew, Arabic, Farsi, etc.) text.
In Unicode 6.3, released last fall, the UBA (http://www.unicode.org/reports/tr9/) contains the most significant new features since its inception ten years ago:
- "Paired bracket" matching and resolution of both brackets to the same bidi level. This means that text like "max(x, y)" will no longer need the aid of invisible formatting characters to stop displaying as "(max(x, y" in an RTL context.
- New "isolate" directional formatting characters (LRI, RLI, FSI, PDI) that make it significantly easier to embed opposite-direction or unknown-direction text without having to determine its direction and without unduly affecting the order of the surrounding text.
The specification changes are highlighted in yellow in http://www.unicode.org/reports/tr9/tr9-28.html.
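To make the isolate characters concrete, here is a minimal Python sketch of how an application might wrap embedded text in them. The helper names `isolate` and `bidi_classes` are made up for illustration; the code points are the ones assigned in Unicode 6.3, and the sketch assumes a Python whose `unicodedata` tables are at Unicode 6.3 or later. It only inserts the characters and reports their bidi classes; it does not run the UBA itself.

```python
import unicodedata

# Isolate formatting characters added in Unicode 6.3.
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolate(text, direction=None):
    """Wrap text in an isolate (hypothetical helper).

    direction is "ltr", "rtl", or None. None means the direction is
    unknown, so FSI is used and the first strong character decides.
    """
    opener = {"ltr": LRI, "rtl": RLI, None: FSI}[direction]
    return opener + text + PDI


def bidi_classes(text):
    """Return the Bidi_Class of each character, as unicodedata reports it."""
    return [unicodedata.bidirectional(ch) for ch in text]
```

For example, `isolate(user_name)` could be used when splicing a user-supplied name of unknown direction into a sentence, so that the name neither reorders nor is reordered by the surrounding text on an implementing system.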
Version 52 of ICU includes the new features in its UBA implementations (C and Java).
This means that the next version of Android (L), which will contain that version of ICU, will support the new UBA features.
Mozilla is in the process of replacing their current UBA implementation, which is based on a predecessor of the ICU implementation, with the new ICU implementation, thus getting support for the new features in the short term (https://bugzilla.mozilla.org/show_bug.cgi?id=924851).
Windows (8 and up) and IE (10 and up) already implement the paired bracket feature in their UBA implementations.
WebKit needs to implement the new features, either by switching to the ICU implementation of the UBA, or by modifying the current implementation (based on that in KHTML, as far as I understand).
Why this is very important to implement ASAP:
- RTL text generated by users on implementing systems is quite likely to contain paired brackets that only work as intended on implementing systems. This means that text is being generated right now on Microsoft systems that looks broken on WebKit. When other systems, like Android and Mozilla, implement the new features, much more such text will start being generated. The longer WebKit waits before implementing, the higher the percentage of RTL documents created on implementing systems that will look broken in WebKit.
- The directional isolates feature is similar to the unicode-bidi:isolate and unicode-bidi:plaintext CSS features already implemented in WebKit. However, the existing WebKit implementation of these CSS features, which has to work around the existing WebKit UBA implementation, is very cumbersome and buggy; there are still serious open bugs on it now (at least bug 109624 and bug 124146, maybe more given recent history). It would be a cinch to implement 100% correctly by simulating the new formatting characters as described in http://dev.w3.org/csswg/css-writing-modes-3/#unicode-bidi (just as unicode-bidi:embed is simulated by LRE|RLE...PDF and unicode-bidi:override by LRO|RLO...PDF). The WebKit bugs in the implementation of unicode-bidi:isolate are blocking it from becoming the default style for all elements with the dir attribute, as required in HTML5 (http://www.w3.org/TR/html5/rendering.html#bidi-rendering).
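The simulation mentioned in the last point can be sketched roughly as below. This is an illustration in Python, not WebKit code: the function `simulate` and the `CONTROLS` table are hypothetical, and the mapping is my reading of the CSS Writing Modes text linked above, covering only the four values discussed here.

```python
# Directional formatting characters used by the simulation.
LRE, RLE, PDF = "\u202A", "\u202B", "\u202C"  # embeddings and their terminator
LRO, RLO = "\u202D", "\u202E"                 # overrides (also closed by PDF)
LRI, RLI = "\u2066", "\u2067"                 # isolates
FSI, PDI = "\u2068", "\u2069"                 # first-strong isolate, isolate terminator

# (unicode-bidi value, direction) -> (opening character, closing character)
CONTROLS = {
    ("embed", "ltr"): (LRE, PDF),
    ("embed", "rtl"): (RLE, PDF),
    ("bidi-override", "ltr"): (LRO, PDF),
    ("bidi-override", "rtl"): (RLO, PDF),
    ("isolate", "ltr"): (LRI, PDI),
    ("isolate", "rtl"): (RLI, PDI),
}


def simulate(text, unicode_bidi, direction):
    """Wrap an inline element's text in the equivalent control characters.

    unicode_bidi is the CSS value as a string; direction is "ltr" or "rtl".
    """
    if unicode_bidi == "normal":
        return text  # no additional embedding level is opened
    if unicode_bidi == "plaintext":
        return FSI + text + PDI  # direction comes from the content itself
    try:
        opener, closer = CONTROLS[(unicode_bidi, direction)]
    except KeyError:
        raise ValueError(f"unsupported combination: {unicode_bidi!r}, {direction!r}")
    return opener + text + closer
```

With a UBA implementation that understands the isolate characters, the layout code would only need to feed the wrapped text to the algorithm, instead of special-casing isolated runs.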
I started working on this in https://bugs.webkit.org/show_bug.cgi?id=178960
This would fix https://bugs.webkit.org/show_bug.cgi?id=204817
> New "isolate" directional formatting characters (LRI, RLI, FSI, PDI) that make it significantly easier to embed opposite-direction or unknown-direction text without having to determine its direction and without unduly affecting the order of the surrounding text.
Blink- and Gecko-based browsers support these characters, which are important for proper support of bidirectional text on the Web. However, WebKit browsers such as Safari still don't support this badly needed behaviour.
See tests at https://www.w3.org/International/i18n-tests/results/bidi-algorithm#rli_etc
Any chance we can bring WebKit up to speed for the (literally) millions of RTL script users out there? It should be high up in the list of priorities, in my opinion.
Fwiw, we are tracking this issue at https://www.w3.org/TR/adlm-gap/#issue10_bidi_text
Kinda surprising that Apple, traditionally known for its font handling and text rendering, as well as being the nr1 emoji engine, doesn't have an up-to-date version of the bidi algorithm.
Apparently even Legacy Edge was handling this better?