Bug 15790

Summary: Ligatures aren't accounted for when manipulating VisiblePositions
Product: WebKit Reporter: Alexey Proskuryakov <ap>
Component: HTML EditingAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: darin, hbono, jshin, mitz, oliver
Priority: P2    
Version: 523.x (Safari 3)   
Hardware: Mac   
OS: OS X 10.4   
Attachments:
Description Flags
test case
none
proposed fix
none
another proposed fix ap: review+

Description Alexey Proskuryakov 2007-11-02 05:39:51 PDT
E.g., pressing arrow keys in Devanagari text can make the current selection not match what the user sees.

VisiblePosition calls (indirectly) into TextRenderer to find previous/next position, but the latter simply asks ICU about character breaks, and has no idea about font features.
Comment 1 Alexey Proskuryakov 2007-11-02 05:40:28 PDT
Created attachment 16991 [details]
test case
Comment 2 Alexey Proskuryakov 2007-11-06 01:24:58 PST
Interestingly, TextEdit can set the insertion point inside some ligatures (such as Roman "fi" and "ffi"), but not in this one (Devanagari "k, d, Shift+,"). Its behavior when undoing deletions that affect ligatures is somewhat buggy as of Mac OS X 10.4.10 - insertion point disappears.
Comment 3 Alexey Proskuryakov 2007-11-08 11:00:41 PST
Created attachment 17124 [details]
proposed fix

This fixes the test case... But I'm not sure about many things here, e.g.:

- Is this likely to negatively affect performance?
- This changes behavior WRT zero-width characters, which will now be ignored when considering candidate positions.
- Is caretRect() the right way to compare coordinates?
Comment 4 mitz 2007-11-08 11:08:04 PST
(In reply to comment #3)

> - This changes behavior WRT zero-width characters, which will now be ignored
> when considering candidate positions.

Does that mean users won't be able to move the caret from before a zero-width character (-sequence) to after that character (after the first character in the sequence)?
Comment 5 Alexey Proskuryakov 2007-11-08 11:53:42 PST
(In reply to comment #4)
> Does that mean users won't be able to move the caret from before a zero-width
> character (-sequence) to after that character (after the first character in the
> sequence)?

Yes, I think the caret will be moved to after the next non zero-width character.
Comment 6 mitz 2007-11-08 11:57:14 PST
(In reply to comment #5)
> Yes, I think the caret will be moved to after the next non zero-width
> character.

That will make life even harder for people trying to manipulate bidi embedding characters and LRM/RLMs.
Comment 7 Alexey Proskuryakov 2007-11-08 12:28:34 PST
While this sounds quite probable, I'm afraid it's hard for me to come up with steps to reproduce the problems you mention. May I ask you to try this patch and provide some steps to reproduce them?

Is there any distinction at any level between zero length characters and those that are merged together when rendering (as this Devanagari ligature or 1/2 in some OpenType fonts)?
Comment 8 mitz 2007-11-08 15:50:17 PST
(In reply to comment #7)
> While this sounds quite probable, I'm afraid it's hard for me to come up with
> steps to reproduce the problems you mention.

No wonder -- things are already in pretty bad shape since many of the zero-width characters don't make it into the text boxes. Here is an example that's affected by your patch:

data:text/html,%3Cdiv%20contenteditable%3Ea&#x200e;b%3C/div%3E

Given broken things already are, the extra "damage" done by the patch is probably insignificant.
Comment 9 Darin Adler 2007-11-13 07:48:50 PST
Comment on attachment 17124 [details]
proposed fix

It's a little ugly to be using caretRect for this purpose, but I think it's a clever solution.

 497                 return offset() == textRenderer->nextOffset(textRenderer->previousOffset(offset()));
 498             else
 499                 return thisRect != textRenderer->caretRect(textRenderer->previousOffset(offset()), UPSTREAM);

We normally don't do else after return.

r=me
Comment 10 Alexey Proskuryakov 2007-11-13 23:37:28 PST
This patch breaks backspace and delete: since those work with selections, they now delete the whole ligature instead of just one character.

The solution is to have these code paths re-implemented to work with ranges (we need that for bug 4709, too).
Comment 11 Mark Rowe (bdash) 2007-12-22 01:27:27 PST
Alexey, if this breaks backspace and delete and therefore can't be landed, should the review flag be cleared to remove it from the commit queue?
Comment 12 Alexey Proskuryakov 2007-12-22 09:15:51 PST
Comment on attachment 17124 [details]
proposed fix

OK. Will land when the issues are resolved.
Comment 13 Alexey Proskuryakov 2009-03-05 07:46:10 PST
I no longer think that asking rendering layer for boundaries is the right approach. We should just tweak pure Unicode cursor movement iterator introduced in bug 24342 to better match platform conventions. One reason is that it should be possible to put insertion point inside a "fi" ligature, for example. Also, that's what other text engines do.
Comment 14 Hironori Bono 2009-03-10 22:29:05 PDT
Created attachment 28465 [details]
another proposed fix

Sorry for my fix for bug 24342 which caused a conflict with this bug. I added some rules which prevent a cursor from moving after Indic virama signs.
Nevertheless, I'm a little wondering if this change covers the all possible combinations for Indic ligatures. I read Unicode spec and investigated Firefox, IE, TextEdit, and NeoOffice. But their cursor behaviors are not completely the same. For example, IE prevents a cursor from moving in the middle of a Devanagari ligature "&#x0915;&#x094D;&#x0915;", written in Table 9-3 of Unicode specification 5.0, but Firefox, TextEdit, and NeoOffice premit.
Even though this change emulates IE for such cases, please feel free to blame me if this is not your expected behavior.
Comment 15 Alexey Proskuryakov 2009-03-11 00:18:22 PDT
Comment on attachment 28465 [details]
another proposed fix

r=me

In general, we need to match TextEdit behavior, at least on PLATFORM(MAC). But this can be further tweaked in a follow-up patch.

There is a tab character in ChangeLog that will need to be fixed before landing.
Comment 16 Alexey Proskuryakov 2009-03-11 09:06:31 PDT
Committed <http://trac.webkit.org/projects/webkit/changeset/41588>. I also added the new test to Skipped list on Tiger, since we don't use custom rules with ICU 3.2 now.
Comment 17 Hironori Bono 2009-03-11 17:00:29 PDT
Sorry for my stupid change which caused an assertion on Tiger.
I would like to test this change on Tiger (on ICU 3.2) and re-send this fix. Is it possible to revert my change 41588?
Comment 18 Alexey Proskuryakov 2009-03-12 03:09:04 PDT
I don't think it's very important to get this working on Tiger, although that would be nice.

Bringing the behavior closer to TextEdit one on Mac would be more important in my opinion.
Comment 19 Hironori Bono 2009-03-13 01:02:20 PDT
Thank you so much for your encouraging comment.

(In reply to comment #18)
> I don't think it's very important to get this working on Tiger, although that
> would be nice.
> Bringing the behavior closer to TextEdit one on Mac would be more important in
> my opinion.

I'm updating the rules to improve the compatibility with TextEdit, and I'm going to send its review request as soon as possible.