When two regional indicator symbols combine to form a single flag, the insertion point or a selection boundary should not be allowed between them, and delete commands should delete the pair as a unit.
<rdar://problem/9866836>
Why?
Because they were most likely entered as a single symbol, and because the behavior when the insertion point is between the two symbols is unexpected and most likely undesirable.
Does the Unicode spec agree?
That’s a good question. I am not sure where and how the Unicode standard address this sort of question. For example, is it the Unicode standard that dictates WebKit’s behavior for decomposed sequences such as e + ´ vs. Hebrew letters with vowel marks such as א + ָ Can you help me look?
I found this note in UAX #29 <http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries>: “Font-based information may be required to determine the appropriate unit to use for UI purposes, such as identification of boundaries for first-letter paragraph styling. For example, such a unit could be a ligature formed of two grapheme clusters, such as لا (Arabic lam + alef).” I think it may apply here.
In more technical terms, my question was whether these symbols form extended grapheme clusters. In WebCore, these are returned by characterBreakIterator, which just calls ICU. We sometimes allow cursor positioning inside extended grapheme clusters. We should probably never disallow cursor positioning at a place where a character break can occur (e.g. for truncation).
(In reply to comment #7) > In more technical terms, my question was whether these symbols form extended grapheme clusters. I believe that the answer is “not by default, but possibly by tailoring”. > In WebCore, these are returned by characterBreakIterator, which just calls ICU. > > We sometimes allow cursor positioning inside extended grapheme clusters. We should probably never disallow cursor positioning at a place where a character break can occur (e.g. for truncation). It now occurs to me that WebKit should also prohibit line breaks between two regional indicator symbols that combine, even in line wrapping modes that allow breaking between “characters”.
For ICU-based ports, this is probably going to be resolved externally to WebKit via <http://unicode.org/cldr/trac/ticket/4076>.
Created attachment 103927 [details] part 1 - backspace
Some EWS bots are expected to fail due to respective platforms not using ICU. I'll add the test to skipped lists when landing.
Comment on attachment 103927 [details] part 1 - backspace View in context: https://bugs.webkit.org/attachment.cgi?id=103927&action=review > Source/WebCore/ChangeLog:14 > + (WebCore::RenderText::previousOffsetForBackwardDeletion): No comment? > Source/WebCore/rendering/RenderText.cpp:1504 > + // Pairs of regional indicators are combined into flag emoji. s/are/may be/. Most pairs don’t make up a flag. > Source/WebCore/rendering/RenderText.cpp:1536 > + // We assume that regional indicators came in pairs. Otherwise we'd need to perform unlimited lookahead > + // to determine whether to delete an odd one at the end. > + if (isRegionalIndicator(character)) { > + if (sawRegionalIndicator) > + break; > + sawRegionalIndicator = true; > + continue; > + } How will this treat a sequence of regional indicator, mark, regional indicator?
> No comment? Yup. Any suggestions that would be helpful for people reading the ChangeLog? > > Source/WebCore/rendering/RenderText.cpp:1504 > > + // Pairs of regional indicators are combined into flag emoji. they all do when deleting, and all that can practically occur do in rendering. > How will this treat a sequence of regional indicator, mark, regional indicator? It will delete mark, regional indicator (i.e. correctly, I think). Should I add a test for this case?
(In reply to comment #13) > > No comment? > > Yup. Any suggestions that would be helpful for people reading the ChangeLog? A description of the new behavior. > > > > Source/WebCore/rendering/RenderText.cpp:1504 > > > + // Pairs of regional indicators are combined into flag emoji. > > they all do when deleting, and all that can practically occur do in rendering. I think this comment is saying something that’s not true. I’m not sure how it helps. I’m fine with removing it or with making a more truthful statement. > > > How will this treat a sequence of regional indicator, mark, regional indicator? > > It will delete mark, regional indicator (i.e. correctly, I think). Should I add a test for this case? Interesting. So the fact that the boolean is never reset to false really never matters? Please add tests.
>A description of the new behavior. I don't understand what you are saying. Do you want something like: > + (WebCore::RenderText::previousOffsetForBackwardDeletion): Fix this bug. ? > I think this comment is saying something that’s not true. I’m not sure how it helps. I’m fine with removing it or with making a more truthful statement. Regional indicators and their extremely weird implementation is certainly not something many people in the world know about. This comment answers both questions and provides keywords for further searching. Unless you can suggest a better comment, I'm going to keep it as is. > Interesting. So the fact that the boolean is never reset to false really never matters? We certainly never want to reset it to false - that would mean that we delete a terribly long sequence of characters.
Comment on attachment 103927 [details] part 1 - backspace I realized that I didn't include my test in the patch. Also, Dan convinced me to fix another edge case that I didn't care about at first.
Created attachment 103951 [details] part 1 - backspace
Attachment 103951 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'LayoutTests/ChangeLog', u'LayoutTests/edit..." exit_code: 1 LayoutTests/ChangeLog:1: ChangeLog entry has no bug number [changelog/bugnumber] [5] Total errors found: 1 in 5 files If any of these errors are false positives, please file a bug against check-webkit-style.
Created attachment 103955 [details] part 1 - backspace Patch preparation fail. Still need EWS results.
Created attachment 103956 [details] part 1 - backspace More fail...
Comment on attachment 103956 [details] part 1 - backspace Attachment 103956 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/9380913 New failing tests: editing/deleting/regional-indicators.html
Comment on attachment 103956 [details] part 1 - backspace Hmm, I'm surprised that Qt, Gtk and EFL all pass, but OK. Not sure what could be wrong with Chromium, need to see failing results.
Comment on attachment 103956 [details] part 1 - backspace Rejecting attachment 103956 [details] from commit-queue. Failed to run "['./Tools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '--bot-id=ec2-cq-02', '--port..." exit_code: 2 Last 500 characters of output: deleting/regional-indicators.html = TEXT Regressions: Unexpected image mismatch : (5) fast/text/atsui-multiple-renderers.html = IMAGE fast/text/international/danda-space.html = IMAGE fast/text/international/thai-baht-space.html = IMAGE fast/text/international/thai-line-breaks.html = IMAGE platform/chromium-linux/fast/text/international/complex-joining-using-gpos.html = IMAGE Regressions: Unexpected image and text mismatch : (1) svg/custom/svg-fonts-word-spacing.html = IMAGE+TEXT Full output: http://queues.webkit.org/results/9380937
There is something wrong going on with the chromium bot. Committed manually as <http://trac.webkit.org/r93068>, and reverted an accidental change in <http://trac.webkit.org/changeset/93069>. Keeping the bug open for other fixes.
> There is something wrong going on with the chromium bot. Nothing is wrong with the bot. I've filed https://bugs.webkit.org/show_bug.cgi?id=66268 about this test failing on Chromium.
Created attachment 104185 [details] part 2 - cursor movement iterator (WIP)
In the attached patch for cursor movement iterator, I have changed ICU rules in the same way that is being considered for mainstream ICU in <http://unicode.org/cldr/trac/ticket/4076>. While this change works with default ICU character break rules, it does not work with our customized rules for cursor movement, and I cannot figure out why. CC'ing Hironori Bono, who wrote our custom rules. The problem is that for a string of several national flag characters, we get a break opportunity inside a flag. For example, with asterisk denoting a single regional indicator symbol (AKA half of an emoji flag), I'm getting these breaks: |****|*|*| While for a sequence of two flags, breaks are correct: |**|**|
Greetings Alexey, Thank you for noticing it. Even though I have not investigated this issue so deeply, it seems to be a side-effect of the '!!chain;' rule. I will update the rule so it works with regional indicator symbols. Regards, Hironori Bono E-mail: hbono@chromium.org
This turned out to be my mistake - I copied the rules from ICU patch incorrectly. I'll post an updated patch shortly, sorry for the noise!
Created attachment 104367 [details] part 2 - cursor movement iterator
Comment on attachment 104367 [details] part 2 - cursor movement iterator Will need failing results on non-ICU platforms...
Comment on attachment 104367 [details] part 2 - cursor movement iterator Clearing flags on attachment: 104367 Committed r93344: <http://trac.webkit.org/changeset/93344>
All reviewed patches have been landed. Closing bug.
Landed failing results for Qt in r93351, filed bug 66500.
Bug 66501/r93353 for Gtk.