Summary: | Ligatures aren't accounted for when manipulating VisiblePositions | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Alexey Proskuryakov <ap> | ||||||||
Component: | HTML Editing | Assignee: | Nobody <webkit-unassigned> | ||||||||
Status: | RESOLVED FIXED | ||||||||||
Severity: | Normal | CC: | darin, hbono, jshin, mitz, oliver | ||||||||
Priority: | P2 | ||||||||||
Version: | 523.x (Safari 3) | ||||||||||
Hardware: | Mac | ||||||||||
OS: | OS X 10.4 | ||||||||||
Attachments: |
|
Description
Alexey Proskuryakov
2007-11-02 05:39:51 PDT
Created attachment 16991 [details]
test case
Interestingly, TextEdit can set the insertion point inside some ligatures (such as Roman "fi" and "ffi"), but not in this one (Devanagari "k, d, Shift+,"). Its behavior when undoing deletions that affect ligatures is somewhat buggy as of Mac OS X 10.4.10 - insertion point disappears. Created attachment 17124 [details]
proposed fix
This fixes the test case... But I'm not sure about many things here, e.g.:
- Is this likely to negatively affect performance?
- This changes behavior WRT zero-width characters, which will now be ignored when considering candidate positions.
- Is caretRect() the right way to compare coordinates?
(In reply to comment #3) > - This changes behavior WRT zero-width characters, which will now be ignored > when considering candidate positions. Does that mean users won't be able to move the caret from before a zero-width character (-sequence) to after that character (after the first character in the sequence)? (In reply to comment #4) > Does that mean users won't be able to move the caret from before a zero-width > character (-sequence) to after that character (after the first character in the > sequence)? Yes, I think the caret will be moved to after the next non zero-width character. (In reply to comment #5) > Yes, I think the caret will be moved to after the next non zero-width > character. That will make life even harder for people trying to manipulate bidi embedding characters and LRM/RLMs. While this sounds quite probable, I'm afraid it's hard for me to come up with steps to reproduce the problems you mention. May I ask you to try this patch and provide some steps to reproduce them? Is there any distinction at any level between zero length characters and those that are merged together when rendering (as this Devanagari ligature or 1/2 in some OpenType fonts)? (In reply to comment #7) > While this sounds quite probable, I'm afraid it's hard for me to come up with > steps to reproduce the problems you mention. No wonder -- things are already in pretty bad shape since many of the zero-width characters don't make it into the text boxes. Here is an example that's affected by your patch: data:text/html,%3Cdiv%20contenteditable%3Ea‎b%3C/div%3E Given broken things already are, the extra "damage" done by the patch is probably insignificant. Comment on attachment 17124 [details]
proposed fix
It's a little ugly to be using caretRect for this purpose, but I think it's a clever solution.
497 return offset() == textRenderer->nextOffset(textRenderer->previousOffset(offset()));
498 else
499 return thisRect != textRenderer->caretRect(textRenderer->previousOffset(offset()), UPSTREAM);
We normally don't do else after return.
r=me
This patch breaks backspace and delete: since those work with selections, they now delete the whole ligature instead of just one character. The solution is to have these code paths re-implemented to work with ranges (we need that for bug 4709, too). Alexey, if this breaks backspace and delete and therefore can't be landed, should the review flag be cleared to remove it from the commit queue? Comment on attachment 17124 [details]
proposed fix
OK. Will land when the issues are resolved.
I no longer think that asking rendering layer for boundaries is the right approach. We should just tweak pure Unicode cursor movement iterator introduced in bug 24342 to better match platform conventions. One reason is that it should be possible to put insertion point inside a "fi" ligature, for example. Also, that's what other text engines do. Created attachment 28465 [details] another proposed fix Sorry for my fix for bug 24342 which caused a conflict with this bug. I added some rules which prevent a cursor from moving after Indic virama signs. Nevertheless, I'm a little wondering if this change covers the all possible combinations for Indic ligatures. I read Unicode spec and investigated Firefox, IE, TextEdit, and NeoOffice. But their cursor behaviors are not completely the same. For example, IE prevents a cursor from moving in the middle of a Devanagari ligature "क्क", written in Table 9-3 of Unicode specification 5.0, but Firefox, TextEdit, and NeoOffice premit. Even though this change emulates IE for such cases, please feel free to blame me if this is not your expected behavior. Comment on attachment 28465 [details]
another proposed fix
r=me
In general, we need to match TextEdit behavior, at least on PLATFORM(MAC). But this can be further tweaked in a follow-up patch.
There is a tab character in ChangeLog that will need to be fixed before landing.
Committed <http://trac.webkit.org/projects/webkit/changeset/41588>. I also added the new test to Skipped list on Tiger, since we don't use custom rules with ICU 3.2 now. Sorry for my stupid change which caused an assertion on Tiger. I would like to test this change on Tiger (on ICU 3.2) and re-send this fix. Is it possible to revert my change 41588? I don't think it's very important to get this working on Tiger, although that would be nice. Bringing the behavior closer to TextEdit one on Mac would be more important in my opinion. Thank you so much for your encouraging comment. (In reply to comment #18) > I don't think it's very important to get this working on Tiger, although that > would be nice. > Bringing the behavior closer to TextEdit one on Mac would be more important in > my opinion. I'm updating the rules to improve the compatibility with TextEdit, and I'm going to send its review request as soon as possible. |