Bug 149170 - Honor isolate bidi unicode codepoints
Summary: Honor isolate bidi unicode codepoints
Status: RESOLVED DUPLICATE of bug 178960
Alias: None
Product: WebKit
Classification: Unclassified
Component: Text (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Myles C. Maxfield
URL:
Keywords: InRadar
Depends on: 158749
Blocks:
  Show dependency treegraph
 
Reported: 2015-09-15 10:51 PDT by Myles C. Maxfield
Modified: 2018-08-27 11:47 PDT (History)
12 users (show)

See Also:


Attachments
WIP (3.99 KB, patch)
2016-06-14 00:59 PDT, Myles C. Maxfield
no flags Details | Formatted Diff | Diff
Example (129 bytes, text/html)
2016-06-14 01:33 PDT, Myles C. Maxfield
no flags Details
Example 2 (187 bytes, text/html)
2016-06-14 01:38 PDT, Myles C. Maxfield
no flags Details
Patch (10.58 KB, patch)
2016-06-14 02:35 PDT, Myles C. Maxfield
simon.fraser: review+
Details | Formatted Diff | Diff
Patch for committing (11.02 KB, patch)
2016-06-14 20:34 PDT, Myles C. Maxfield
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Myles C. Maxfield 2015-09-15 10:51:12 PDT
Currently, our isolation logic only works on a RenderElement level. It should work inside a single RenderText too.
Comment 1 Radar WebKit Bug Importer 2015-09-15 10:51:51 PDT
<rdar://problem/22704517>
Comment 2 Myles C. Maxfield 2016-06-13 16:31:09 PDT
InlineBidiResolver::appendRun() will need to be fixed up, and potentially much more.
Comment 3 Myles C. Maxfield 2016-06-14 00:39:08 PDT
Currently, BidiResolverBase::createBidiRunsForLine() has no concept of U_RIGHT_TO_LEFT_ISOLATE. Because it doesn't appear as a case in any switch statements, the default is getting hit, which means we seem to ignore it completely (for the purposes of bidi).
Comment 4 Myles C. Maxfield 2016-06-14 00:45:15 PDT
These isolate characters don't force the complex text code path.
Comment 5 Myles C. Maxfield 2016-06-14 00:59:55 PDT
Created attachment 281243 [details]
WIP
Comment 6 Myles C. Maxfield 2016-06-14 01:33:00 PDT
Created attachment 281245 [details]
Example
Comment 7 Myles C. Maxfield 2016-06-14 01:38:16 PDT
Created attachment 281247 [details]
Example 2
Comment 8 Myles C. Maxfield 2016-06-14 02:19:10 PDT
Example 2 is explained by us not understanding how to deal with isolate characters inside BidiResolverBase::createBidiRunsForLine(). In particular, we have this general structure:

UCharDirection last = ...
while(true) {
    UCharDirection direction = u_charDirection(c);
    switch (direction) {
    case U_LEFT_TO_RIGHT:
        switch (last) {
        default:
            break;
        }
    case U_RIGHT_TO_LEFT:
        ...
    case U_WHITE_SPACE_NEUTRAL:
    default:
        break;
    }

    last = direction;
}

So, if the pattern of directions is:
U_RIGHT_TO_LEFT U_WHITE_SPACE_NEUTRAL U_FIRST_STRONG_ISOLATE U_LEFT_TO_RIGHT

When we hit the U_FIRST_STRONG_ISOLATE, we hit the outer default:. Then we set last = U_FIRST_STRONG_ISOLATE, and the next character is U_LEFT_TO_RIGHT, so we end up hitting the inner default:. So then we keep going, and never emit the run for the U_RIGHT_TO_LEFT parts.

Compare this to what you would get if you removed the ISOLATE characters. In that case, it would work correctly because the default:s wouldn't get hit.
Comment 9 Myles C. Maxfield 2016-06-14 02:35:42 PDT
Created attachment 281249 [details]
Patch
Comment 10 Myles C. Maxfield 2016-06-14 02:37:45 PDT
Comment on attachment 281249 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=281249&action=review

> Source/WTF/ChangeLog:4
> +        https://bugs.webkit.org/show_bug.cgi?id=149170

<rdar://problem/26527378>
Comment 11 Simon Fraser (smfr) 2016-06-14 11:18:35 PDT
Comment on attachment 281249 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=281249&action=review

> Source/WTF/ChangeLog:3
> +        Honor unicode bidi unicode codepoints

One too many "unicode"s here?

> Source/WebCore/ChangeLog:12
> +        Because BidiResolver doesn't have any concept of isolate Unicode code points,
> +        its output is unexpected when they are present in content. Instead of this,
> +        we should consider the code points the same as whitespace for the purposes
> +        of the bidi algorithm. This is a stop-gap measure until we can support the
> +        codepoints fully in our Bidi algorithm.

This reads awkwardly.

"BidiResolver doesn't have any concept of isolate Unicode code points, so produces unexpected output when they are present". Fix by considering such code points as whitespace in the bidi algorithm. This is..."

You should reference a follow-up bug to fix this the correct way.

> LayoutTests/fast/text/isolate-ignore-expected.html:4
> +This test makes sure that isolate codepoints are ignored (for the time being).

"for the time being" is never a good thing to put into code or a test. It quickly becomes anachronistic.
Comment 12 Myles C. Maxfield 2016-06-14 11:58:15 PDT
Committed r202057: <http://trac.webkit.org/changeset/202057>
Comment 13 Ryan Haddad 2016-06-14 12:42:51 PDT
This change broke the Windows build:
<https://build.webkit.org/builders/Apple%20Win%20Release%20%28Build%29/builds/78422>
Comment 14 Ryan Haddad 2016-06-14 17:29:59 PDT
(In reply to comment #13)
> This change broke the Windows build:
> <https://build.webkit.org/builders/Apple%20Win%20Release%20%28Build%29/
> builds/78422>

The change was rolled out in <http://trac.webkit.org/changeset/202059>
Comment 15 Darin Adler 2016-06-14 18:03:06 PDT
Can we add a regression test?
Comment 16 Myles C. Maxfield 2016-06-14 19:41:13 PDT
This patch broke the Windows build because we're building against ICU 6.1 on Windows, which doesn't know what any of the U_*_ISOLATE constants are.
Comment 17 Myles C. Maxfield 2016-06-14 19:42:02 PDT
It may be difficult to update https://developer.apple.com/opensource/internet/WebKitAuxiliaryLibrary.zip.
Comment 18 Myles C. Maxfield 2016-06-14 20:23:35 PDT
(In reply to comment #16)
> This patch broke the Windows build because we're building against ICU 6.1 on
> Windows, which doesn't know what any of the U_*_ISOLATE constants are.

Even though the header file on Windows doesn't include the U_*_ISOLATE symbols, the library on Windows does correctly return the correct constants from u_charDirection().
Comment 19 Myles C. Maxfield 2016-06-14 20:26:35 PDT
(In reply to comment #15)
> Can we add a regression test?

I thought the test I added (fast/text/isolate-ignore.html) is a regression test. Are you referring to something else?
Comment 20 Myles C. Maxfield 2016-06-14 20:34:28 PDT
Created attachment 281320 [details]
Patch for committing
Comment 21 WebKit Commit Bot 2016-06-14 20:36:50 PDT
Attachment 281320 [details] did not pass style-queue:


ERROR: Source/WebCore/platform/text/BidiResolver.h:635:  U_FIRST_STRONG_ISOLATE is incorrectly named. Don't use underscores in your identifier names.  [readability/naming/underscores] [4]
ERROR: Source/WebCore/platform/text/BidiResolver.h:636:  U_LEFT_TO_RIGHT_ISOLATE is incorrectly named. Don't use underscores in your identifier names.  [readability/naming/underscores] [4]
ERROR: Source/WebCore/platform/text/BidiResolver.h:637:  U_RIGHT_TO_LEFT_ISOLATE is incorrectly named. Don't use underscores in your identifier names.  [readability/naming/underscores] [4]
ERROR: Source/WebCore/platform/text/BidiResolver.h:638:  U_POP_DIRECTIONAL_ISOLATE is incorrectly named. Don't use underscores in your identifier names.  [readability/naming/underscores] [4]
Total errors found: 4 in 8 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 22 WebKit Commit Bot 2016-06-14 21:42:24 PDT
Comment on attachment 281320 [details]
Patch for committing

Clearing flags on attachment: 281320

Committed r202083: <http://trac.webkit.org/changeset/202083>
Comment 23 Darin Adler 2016-06-15 09:23:28 PDT
(In reply to comment #19)
> I thought the test I added (fast/text/isolate-ignore.html) is a regression
> test.

Sorry, somehow I missed it!
Comment 24 mitz 2017-10-17 17:04:29 PDT
Can you retitle this bug so that it’s clear that it’s about isolate codepoints? Embed and override codepoints had been working since KHTML.
Comment 25 Myles C. Maxfield 2018-08-27 11:47:04 PDT

*** This bug has been marked as a duplicate of bug 178960 ***