RESOLVED FIXED 65395
Regional indicator symbols that are combined should behave as a single character when editing
https://bugs.webkit.org/show_bug.cgi?id=65395
Summary Regional indicator symbols that are combined should behave as a single charac...
mitz
Reported 2011-07-29 14:39:43 PDT
When two regional indicator symbols combine to form a single flag, the insertion point or a selection boundary should not be allowed between them, and delete commands should delete the pair as a unit.
Attachments
part 1 - backspace (2.40 KB, patch)
2011-08-15 10:48 PDT, Alexey Proskuryakov
no flags
part 1 - backspace (5.26 KB, patch)
2011-08-15 14:23 PDT, Alexey Proskuryakov
mitz: review+
part 1 - backspace (2.92 KB, patch)
2011-08-15 14:37 PDT, Alexey Proskuryakov
no flags
part 1 - backspace (5.36 KB, patch)
2011-08-15 14:38 PDT, Alexey Proskuryakov
webkit.review.bot: commit-queue-
part 2 - cursor movement iterator (WIP) (3.57 KB, patch)
2011-08-17 10:03 PDT, Alexey Proskuryakov
no flags
part 2 - cursor movement iterator (5.77 KB, patch)
2011-08-18 11:10 PDT, Alexey Proskuryakov
no flags
Radar WebKit Bug Importer
Comment 1 2011-07-29 14:39:56 PDT
Alexey Proskuryakov
Comment 2 2011-07-29 20:57:38 PDT
Why?
mitz
Comment 3 2011-07-29 21:01:12 PDT
Because they were most likely entered as a single symbol, and because the behavior when the insertion point is between the two symbols is unexpected and most likely undesirable.
Alexey Proskuryakov
Comment 4 2011-07-29 21:38:19 PDT
Does the Unicode spec agree?
mitz
Comment 5 2011-07-30 11:17:32 PDT
That’s a good question. I am not sure where and how the Unicode standard address this sort of question. For example, is it the Unicode standard that dictates WebKit’s behavior for decomposed sequences such as e + ´ vs. Hebrew letters with vowel marks such as א + ָ Can you help me look?
mitz
Comment 6 2011-07-30 11:50:25 PDT
I found this note in UAX #29 <http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries>: “Font-based information may be required to determine the appropriate unit to use for UI purposes, such as identification of boundaries for first-letter paragraph styling. For example, such a unit could be a ligature formed of two grapheme clusters, such as لا (Arabic lam + alef).” I think it may apply here.
Alexey Proskuryakov
Comment 7 2011-07-30 20:50:00 PDT
In more technical terms, my question was whether these symbols form extended grapheme clusters. In WebCore, these are returned by characterBreakIterator, which just calls ICU. We sometimes allow cursor positioning inside extended grapheme clusters. We should probably never disallow cursor positioning at a place where a character break can occur (e.g. for truncation).
mitz
Comment 8 2011-07-30 20:57:24 PDT
(In reply to comment #7) > In more technical terms, my question was whether these symbols form extended grapheme clusters. I believe that the answer is “not by default, but possibly by tailoring”. > In WebCore, these are returned by characterBreakIterator, which just calls ICU. > > We sometimes allow cursor positioning inside extended grapheme clusters. We should probably never disallow cursor positioning at a place where a character break can occur (e.g. for truncation). It now occurs to me that WebKit should also prohibit line breaks between two regional indicator symbols that combine, even in line wrapping modes that allow breaking between “characters”.
mitz
Comment 9 2011-08-10 08:38:10 PDT
For ICU-based ports, this is probably going to be resolved externally to WebKit via <http://unicode.org/cldr/trac/ticket/4076>.
Alexey Proskuryakov
Comment 10 2011-08-15 10:48:41 PDT
Created attachment 103927 [details] part 1 - backspace
Alexey Proskuryakov
Comment 11 2011-08-15 10:49:45 PDT
Some EWS bots are expected to fail due to respective platforms not using ICU. I'll add the test to skipped lists when landing.
mitz
Comment 12 2011-08-15 11:44:54 PDT
Comment on attachment 103927 [details] part 1 - backspace View in context: https://bugs.webkit.org/attachment.cgi?id=103927&action=review > Source/WebCore/ChangeLog:14 > + (WebCore::RenderText::previousOffsetForBackwardDeletion): No comment? > Source/WebCore/rendering/RenderText.cpp:1504 > + // Pairs of regional indicators are combined into flag emoji. s/are/may be/. Most pairs don’t make up a flag. > Source/WebCore/rendering/RenderText.cpp:1536 > + // We assume that regional indicators came in pairs. Otherwise we'd need to perform unlimited lookahead > + // to determine whether to delete an odd one at the end. > + if (isRegionalIndicator(character)) { > + if (sawRegionalIndicator) > + break; > + sawRegionalIndicator = true; > + continue; > + } How will this treat a sequence of regional indicator, mark, regional indicator?
Alexey Proskuryakov
Comment 13 2011-08-15 11:56:56 PDT
> No comment? Yup. Any suggestions that would be helpful for people reading the ChangeLog? > > Source/WebCore/rendering/RenderText.cpp:1504 > > + // Pairs of regional indicators are combined into flag emoji. they all do when deleting, and all that can practically occur do in rendering. > How will this treat a sequence of regional indicator, mark, regional indicator? It will delete mark, regional indicator (i.e. correctly, I think). Should I add a test for this case?
mitz
Comment 14 2011-08-15 12:02:24 PDT
(In reply to comment #13) > > No comment? > > Yup. Any suggestions that would be helpful for people reading the ChangeLog? A description of the new behavior. > > > > Source/WebCore/rendering/RenderText.cpp:1504 > > > + // Pairs of regional indicators are combined into flag emoji. > > they all do when deleting, and all that can practically occur do in rendering. I think this comment is saying something that’s not true. I’m not sure how it helps. I’m fine with removing it or with making a more truthful statement. > > > How will this treat a sequence of regional indicator, mark, regional indicator? > > It will delete mark, regional indicator (i.e. correctly, I think). Should I add a test for this case? Interesting. So the fact that the boolean is never reset to false really never matters? Please add tests.
Alexey Proskuryakov
Comment 15 2011-08-15 12:11:41 PDT
>A description of the new behavior. I don't understand what you are saying. Do you want something like: > + (WebCore::RenderText::previousOffsetForBackwardDeletion): Fix this bug. ? > I think this comment is saying something that’s not true. I’m not sure how it helps. I’m fine with removing it or with making a more truthful statement. Regional indicators and their extremely weird implementation is certainly not something many people in the world know about. This comment answers both questions and provides keywords for further searching. Unless you can suggest a better comment, I'm going to keep it as is. > Interesting. So the fact that the boolean is never reset to false really never matters? We certainly never want to reset it to false - that would mean that we delete a terribly long sequence of characters.
Alexey Proskuryakov
Comment 16 2011-08-15 13:00:10 PDT
Comment on attachment 103927 [details] part 1 - backspace I realized that I didn't include my test in the patch. Also, Dan convinced me to fix another edge case that I didn't care about at first.
Alexey Proskuryakov
Comment 17 2011-08-15 14:23:26 PDT
Created attachment 103951 [details] part 1 - backspace
WebKit Review Bot
Comment 18 2011-08-15 14:26:51 PDT
Attachment 103951 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'LayoutTests/ChangeLog', u'LayoutTests/edit..." exit_code: 1 LayoutTests/ChangeLog:1: ChangeLog entry has no bug number [changelog/bugnumber] [5] Total errors found: 1 in 5 files If any of these errors are false positives, please file a bug against check-webkit-style.
Alexey Proskuryakov
Comment 19 2011-08-15 14:37:26 PDT
Created attachment 103955 [details] part 1 - backspace Patch preparation fail. Still need EWS results.
Alexey Proskuryakov
Comment 20 2011-08-15 14:38:20 PDT
Created attachment 103956 [details] part 1 - backspace More fail...
WebKit Review Bot
Comment 21 2011-08-15 15:08:07 PDT
Comment on attachment 103956 [details] part 1 - backspace Attachment 103956 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/9380913 New failing tests: editing/deleting/regional-indicators.html
Alexey Proskuryakov
Comment 22 2011-08-15 15:39:50 PDT
Comment on attachment 103956 [details] part 1 - backspace Hmm, I'm surprised that Qt, Gtk and EFL all pass, but OK. Not sure what could be wrong with Chromium, need to see failing results.
WebKit Review Bot
Comment 23 2011-08-15 16:18:39 PDT
Comment on attachment 103956 [details] part 1 - backspace Rejecting attachment 103956 [details] from commit-queue. Failed to run "['./Tools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '--bot-id=ec2-cq-02', '--port..." exit_code: 2 Last 500 characters of output: deleting/regional-indicators.html = TEXT Regressions: Unexpected image mismatch : (5) fast/text/atsui-multiple-renderers.html = IMAGE fast/text/international/danda-space.html = IMAGE fast/text/international/thai-baht-space.html = IMAGE fast/text/international/thai-line-breaks.html = IMAGE platform/chromium-linux/fast/text/international/complex-joining-using-gpos.html = IMAGE Regressions: Unexpected image and text mismatch : (1) svg/custom/svg-fonts-word-spacing.html = IMAGE+TEXT Full output: http://queues.webkit.org/results/9380937
Alexey Proskuryakov
Comment 24 2011-08-15 16:38:15 PDT
There is something wrong going on with the chromium bot. Committed manually as <http://trac.webkit.org/r93068>, and reverted an accidental change in <http://trac.webkit.org/changeset/93069>. Keeping the bug open for other fixes.
Adam Barth
Comment 25 2011-08-15 17:31:33 PDT
> There is something wrong going on with the chromium bot. Nothing is wrong with the bot. I've filed https://bugs.webkit.org/show_bug.cgi?id=66268 about this test failing on Chromium.
Alexey Proskuryakov
Comment 26 2011-08-17 10:03:32 PDT
Created attachment 104185 [details] part 2 - cursor movement iterator (WIP)
Alexey Proskuryakov
Comment 27 2011-08-17 10:11:35 PDT
In the attached patch for cursor movement iterator, I have changed ICU rules in the same way that is being considered for mainstream ICU in <http://unicode.org/cldr/trac/ticket/4076>. While this change works with default ICU character break rules, it does not work with our customized rules for cursor movement, and I cannot figure out why. CC'ing Hironori Bono, who wrote our custom rules. The problem is that for a string of several national flag characters, we get a break opportunity inside a flag. For example, with asterisk denoting a single regional indicator symbol (AKA half of an emoji flag), I'm getting these breaks: |****|*|*| While for a sequence of two flags, breaks are correct: |**|**|
Hironori Bono
Comment 28 2011-08-18 04:50:02 PDT
Greetings Alexey, Thank you for noticing it. Even though I have not investigated this issue so deeply, it seems to be a side-effect of the '!!chain;' rule. I will update the rule so it works with regional indicator symbols. Regards, Hironori Bono E-mail: hbono@chromium.org
Alexey Proskuryakov
Comment 29 2011-08-18 08:56:44 PDT
This turned out to be my mistake - I copied the rules from ICU patch incorrectly. I'll post an updated patch shortly, sorry for the noise!
Alexey Proskuryakov
Comment 30 2011-08-18 11:10:47 PDT
Created attachment 104367 [details] part 2 - cursor movement iterator
Alexey Proskuryakov
Comment 31 2011-08-18 11:19:16 PDT
Comment on attachment 104367 [details] part 2 - cursor movement iterator Will need failing results on non-ICU platforms...
WebKit Review Bot
Comment 32 2011-08-18 13:33:11 PDT
Comment on attachment 104367 [details] part 2 - cursor movement iterator Clearing flags on attachment: 104367 Committed r93344: <http://trac.webkit.org/changeset/93344>
WebKit Review Bot
Comment 33 2011-08-18 13:33:18 PDT
All reviewed patches have been landed. Closing bug.
Alexey Proskuryakov
Comment 34 2011-08-18 14:52:19 PDT
Landed failing results for Qt in r93351, filed bug 66500.
Alexey Proskuryakov
Comment 35 2011-08-18 14:56:54 PDT
Bug 66501/r93353 for Gtk.
Note You need to log in before you can comment on or make changes to this bug.