Bug 21820

Summary: Unable to enter the Tamil UNICODE Characters via Thamizha Phonetic IME
Product: WebKit Reporter: Hironori Bono <hbono>
Component: HTML EditingAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: ap, jshin
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: PC   
OS: Windows XP   
Attachments:
Description Flags
proposed fix
none
my second proposed patch including a layout test
none
the third proposed fix
none
the forth proposed patch (containing descriptions)
none
the fourth proposed fix
none
Fifth proposed fix
none
Sixth proposed patch
ap: review-
seventh proposed (and temporal) fix without layout test results
none
eighth proposed fix
none
nineth proposed fix
ap: review-
tenth proposed fix ap: review+

Hironori Bono
Reported 2008-10-22 19:40:52 PDT
Copied from "http://code.google.com/p/chromium/issues/detail?id=1097" * Steps to Reproduce 1. Download the Thamizha tamil phonetic IME From http://www.thamizha.com/downloads/ekalappai20b_anjal.zip 2. Extract the archive and install. 3. Try to enter Tamil UNICODE Characters to any checkboxes see the characters got deleted automatically once you try to enter some tamil characters. This is not the case with Firefox or Internet explorer. * Expected Results I need to enter Tamil UNICODE characters to wikipedia project, blogging sites directly from the IME. * Actual Results Characters got deleted automatically one I try to enter a character. * Additional Builds and Platforms: Other browsers tested: Add OK or FAIL after other browsers where you have tested this issue: Safari 3: FAIL Firefox 3: OK IE 7: OK - Does not occur on The English version of Windows XP * Additional Information: Copy paste from notepad to browser with Tamil UNICODE characters works but it is a real waste of time. Some similar issue was there with Yahoo! messenger 7 and it has been rectified with latest version 9 (I think worked with version 8 as well). This issue is caused by the Editor::deleteWithDirection() function which deletes a ligature without decomposing it. This issue also happens on languages which uses ligatures, e.g. Arabic, Lao, Thai, etc. The easiest solution is changing the above function to insert a decomposed characters after deleting a ligature.
Attachments
proposed fix (1.45 KB, patch)
2008-10-22 19:46 PDT, Hironori Bono
no flags
my second proposed patch including a layout test (5.87 KB, patch)
2008-10-24 02:39 PDT, Hironori Bono
no flags
the third proposed fix (6.18 KB, patch)
2008-10-27 14:23 PDT, Hironori Bono
no flags
the forth proposed patch (containing descriptions) (6.85 KB, patch)
2008-11-05 20:29 PST, Hironori Bono
no flags
the fourth proposed fix (6.86 KB, patch)
2008-11-05 20:38 PST, Hironori Bono
no flags
Fifth proposed fix (4.64 KB, patch)
2008-11-21 01:50 PST, Hironori Bono
no flags
Sixth proposed patch (7.40 KB, patch)
2008-11-27 20:22 PST, Hironori Bono
ap: review-
seventh proposed (and temporal) fix without layout test results (9.35 KB, patch)
2008-12-02 02:23 PST, Hironori Bono
no flags
eighth proposed fix (17.64 KB, patch)
2008-12-08 23:19 PST, Hironori Bono
no flags
nineth proposed fix (18.05 KB, patch)
2008-12-09 00:52 PST, Hironori Bono
ap: review-
tenth proposed fix (15.17 KB, patch)
2008-12-10 01:43 PST, Hironori Bono
ap: review+
Hironori Bono
Comment 1 2008-10-22 19:46:12 PDT
Created attachment 24586 [details] proposed fix This is the simplest solution for this issue. I'm wondering if this is the best one, though.
Alexey Proskuryakov
Comment 2 2008-10-23 05:31:31 PDT
I'm leaving it to others to decide whether this is a good fix. But this absolutely needs an automated test case. See e.g. editing/deleting/forward-delete.html and platform/mac/editing/input/devanagari-ligature.html for some examples of how to make such.
Hironori Bono
Comment 3 2008-10-24 02:39:23 PDT
Created attachment 24637 [details] my second proposed patch including a layout test Thank you for your review. I have added a layout test "LayoutTests/editing/deleting/delete-ligature.html" which verifies if this issue is fixed. While writing this test, I need to change another file "WebCore/editing/EditorCommand.cpp" to let a JavaScript call the executeDeleteBackward() function through document.execCommand("DeleteBackward").
Jungshik Shin
Comment 4 2008-10-24 10:19:56 PDT
Comment on attachment 24637 [details] my second proposed patch including a layout test > Index: WebCore/ChangeLog > =================================================================== > --- WebCore/ChangeLog (revision 37842) > +++ WebCore/ChangeLog (working copy) > @@ -1,3 +1,14 @@ > +2008-10-24 Hironori Bono <hbono@chromium.org> > + > + Reviewed by NOBODY (OOPS!). > + > + Test: editing/deleting/delete-ligature.html > + > + * editing/Editor.cpp: > + (WebCore::Editor::deleteWithDirection): > + * editing/EditorCommand.cpp: > + (WebCore::CommandEntry::): > + > 2008-10-24 David Kilzer <ddkilzer@apple.com> > > Rolled out r37840 and r37841. > Index: WebCore/editing/Editor.cpp > =================================================================== > --- WebCore/editing/Editor.cpp (revision 37801) > +++ WebCore/editing/Editor.cpp (working copy) > @@ -259,8 +259,14 @@ bool Editor::deleteWithDirection(Selecti > break; > case SelectionController::BACKWARD: > case SelectionController::LEFT: > - if (m_frame->document()) > + if (m_frame->document()) { > + String stringToDelete = plainText(range.get()); > TypingCommand::deleteKeyPressed(m_frame->document(), false, granularity); > + if (stringToDelete.length() > 1) { > + stringToDelete.truncate(stringToDelete.length() - 1); > + TypingCommand::insertText(m_frame->document(), stringToDelete, false, false); What if 'granularity' is not a "character"? I think the above change needs to be done only when 'granularity' is 'character'? I also wonder if this meets everybody's expectation of 'delete backward'. For instance, without your change, Latin base letter + combining accent will be deleted as a unit. With this change, only the combining accent will be deleted, right? Given that widely used Latin letters with accents are represented in a composed form, it's not very critical. However, some Latin letters with accents (used in African languages, for instance) do not have a composed form representation. So, at least, you may want to add a TODO comment about that. Perhaps, in the long run, a new granularity of 'Codepoint' needs to be introduced. IIRC, 'character' granularity is actually 'grapheme (cluster)' granularity at the moment. > + } > + } > break; > } > revealSelectionAfterEditingOperation(); > Index: WebCore/editing/EditorCommand.cpp > =================================================================== > --- WebCore/editing/EditorCommand.cpp (revision 37801) > +++ WebCore/editing/EditorCommand.cpp (working copy) > @@ -1164,7 +1164,7 @@ static const CommandMap& createCommandMa > { "CreateLink", { executeCreateLink, supported, enabledInRichlyEditableText, stateNone, valueNull, notTextInsertion, doNotAllowExecutionWhenDisabled } }, > { "Cut", { executeCut, supported, enabledCut, stateNone, valueNull, notTextInsertion, allowExecutionWhenDisabled } }, > { "Delete", { executeDelete, supported, enabledDelete, stateNone, valueNull, notTextInsertion, doNotAllowExecutionWhenDisabled } }, > - { "DeleteBackward", { executeDeleteBackward, supportedFromMenuOrKeyBinding, enabledInEditableText, stateNone, valueNull, notTextInsertion, doNotAllowExecutionWhenDisabled } }, > + { "DeleteBackward", { executeDeleteBackward, supported, enabledInEditableText, stateNone, valueNull, notTextInsertion, doNotAllowExecutionWhenDisabled } }, > { "DeleteBackwardByDecomposingPreviousCharacter", { executeDeleteBackwardByDecomposingPreviousCharacter, supportedFromMenuOrKeyBinding, enabledInEditableText, stateNone, valueNull, notTextInsertion, doNotAllowExecutionWhenDisabled } }, > { "DeleteForward", { executeDeleteForward, supportedFromMenuOrKeyBinding, enabledInEditableText, stateNone, valueNull, notTextInsertion, doNotAllowExecutionWhenDisabled } }, > { "DeleteToBeginningOfLine", { executeDeleteToBeginningOfLine, supportedFromMenuOrKeyBinding, enabledInEditableText, stateNone, valueNull, notTextInsertion, doNotAllowExecutionWhenDisabled } }, > Index: LayoutTests/ChangeLog > =================================================================== > --- LayoutTests/ChangeLog (revision 37842) > +++ LayoutTests/ChangeLog (working copy) > @@ -1,3 +1,9 @@ > +2008-10-24 Hironori Bono <hbono@chromium.org> > + > + Reviewed by NOBODY (OOPS!). > + > + * editing/deleting/delete-ligature.html: Added. > + > 2008-10-24 David Kilzer <ddkilzer@apple.com> > > Rolled out r37840. > Index: LayoutTests/editing/deleting/delete-ligature.html > =================================================================== > --- LayoutTests/editing/deleting/delete-ligature.html (revision 0) > +++ LayoutTests/editing/deleting/delete-ligature.html (revision 0) > @@ -0,0 +1,44 @@ > +<html xmlns="http://www.w3.org/1999/xhtml"> > + <head> > + <script src="../editing.js" language="javascript" type="text/javascript" ></script> > + <script language="javascript" type="text/javascript"> > + function execBackwardDeleteCommand() { > + document.execCommand("DeleteBackward"); > + } > + function backwardDeleteCommand() { > + if (commandDelay > 0) { > + window.setTimeout(execBackwardDeleteCommand, commandCount * commandDelay); > + commandCount++; > + } else { > + execBackwardDeleteCommand(); > + } > + } > + function log(str) { > + var li = document.createElement("li"); > + li.appendChild(document.createTextNode(str)); > + var console = document.getElementById("console"); > + console.appendChild(li); > + } > + function editingTest() { > + var textarea = document.getElementById("test"); > + textarea.setSelectionRange(2, 2); > + backwardDeleteCommand(); > + if (textarea.value == String.fromCharCode(0x0E27)) { > + log("Succeeded."); > + } else { > + log("Failed. Actual: \"" + textarea.value + "\", Expected: \"" + String.fromCharCode(0x0E27) + "\""); > + } > + } > + </script> > + <title>Editing Test (Deleting a ligature)</title> > +</head> > + <body> > + <p>This test tests if a ligature "&#x0E27;&#x0E31;" is decomposed while deleting it with a back-space key.</p> > + <p>If this test succeeds, you can see "&#x0E27;" (U+0E27) and a string "succeeded" below.</p> > + <textarea id="test" rows="1" cols="40">&#x0E27;&#x0E31;</textarea> > + <ul id="console"></ul> > + <script language="javascript" type="text/javascript"> > + runEditingTest(); > + </script> > + </body> > +</html> > > Property changes on: LayoutTests/editing/deleting/delete-ligature.html > ___________________________________________________________________ > Added: svn:executable > + * >
Hironori Bono
Comment 5 2008-10-27 14:23:40 PDT
Created attachment 24694 [details] the third proposed fix (In reply to comment #4) > What if 'granularity' is not a "character"? I think the above change needs to > be done only when 'granularity' is 'character'? Thank you for noticing this. You are right. I have updated my patch to check if the 'granularity' value is 'CharacterGranularity'. > I also wonder if this meets everybody's expectation of 'delete backward'. > For instance, without your change, Latin base letter + combining accent will be > deleted as a unit. With this change, only the combining accent will be deleted, right? Yes. When a ligature consists of a Latin character and Unicode combining characters (U+0300,...,U+036F), my change deletes only the last combining character. > Given that widely used Latin letters with accents are represented in a composed > form, it's not very critical. However, some Latin letters with accents (used in African > languages, for instance) do not have a composed form representation. So, at > least, you may want to add a TODO comment about that. I have added a 'FIXME' comment which notes this problem. > Perhaps, in the long run, a new granularity of 'Codepoint' needs to be > introduced. IIRC, 'character' granularity is actually 'grapheme (cluster)' > granularity at the moment.
Hironori Bono
Comment 6 2008-11-03 22:50:29 PST
Excuse me. Is it possible to give me what I should do next? Many people have been waiting for us to fix this issue and I would like to fix it as soon as possible.
Jungshik Shin
Comment 7 2008-11-03 23:17:34 PST
Comment on attachment 24694 [details] the third proposed fix > Index: WebCore/ChangeLog > =================================================================== > --- WebCore/ChangeLog (revision 37899) > +++ WebCore/ChangeLog (working copy) > @@ -1,3 +1,14 @@ > +2008-10-27 Hironori Bono <hbono@chromium.org> > + > + Reviewed by NOBODY (OOPS!). I guess you need to mention this bug here and add a brief description of the problem and your fix. Sorry if it's me who held you up. It looks good to me. I'm asking for review. By the way, are you going to implement DeleteBackwardByDecomposingPreviousCharacter as well?
Justin Garcia
Comment 8 2008-11-05 16:38:27 PST
Comment on attachment 24694 [details] the third proposed fix +2008-10-27 Hironori Bono <hbono@chromium.org> + + Reviewed by NOBODY (OOPS!). + + Test: editing/deleting/delete-ligature.html + + * editing/Editor.cpp: + (WebCore::Editor::deleteWithDirection): + * editing/EditorCommand.cpp: + (WebCore::CommandEntry::): Please describe the reasoning behind your changes here. case SelectionController::LEFT: - if (m_frame->document()) + if (m_frame->document()) { + String stringToDelete = plainText(range.get()); We found recently that deleting a text consumes most of the CPU on the iPhone and I'm worried about what performance impact this may have. Could you investigate? + <p>This test tests if a ligature "&#x0E27;&#x0E31;" is decomposed while deleting it with a back-space key.</p> Shouldn't you be implementing deleteBackwardByDecomposingPreviousCharacter then? Justin
Hironori Bono
Comment 9 2008-11-05 20:29:34 PST
Created attachment 24933 [details] the forth proposed patch (containing descriptions)
Hironori Bono
Comment 10 2008-11-05 20:38:06 PST
Created attachment 24934 [details] the fourth proposed fix
Hironori Bono
Comment 11 2008-11-05 20:54:50 PST
Thank you for your review and comments. (In reply to comment #8) > (From update of attachment 24694 [details] [edit]) > +2008-10-27 Hironori Bono <hbono@chromium.org> > + > + Reviewed by NOBODY (OOPS!). > + > + Test: editing/deleting/delete-ligature.html > + > + * editing/Editor.cpp: > + (WebCore::Editor::deleteWithDirection): > + * editing/EditorCommand.cpp: > + (WebCore::CommandEntry::): > > Please describe the reasoning behind your changes here. Sorry for the lack of descriptions. I have added them to "WebCore/ChangeLog" and "LayoutTests/ChangeLog". > case SelectionController::LEFT: > - if (m_frame->document()) > + if (m_frame->document()) { > + String stringToDelete = plainText(range.get()); > > We found recently that deleting a text consumes most of the CPU on the iPhone > and I'm worried about what performance impact this may have. Could you > investigate? Even though I cannot use iPhones to investigate processing power while deleting a text, it does not consume so much processing power on Chrome. > + <p>This test tests if a ligature "&#x0E27;&#x0E31;" is decomposed > while deleting it with a back-space key.</p> > > Shouldn't you be implementing deleteBackwardByDecomposingPreviousCharacter > then? Correctly, this fix changes the behavior of the "DeleteBackward" command to delete only the last character of a ligature consisting of multiple Unicode characters. That is, this fix does not decompose many Latin ligatures each of which consists of one Unicode character, e.g. this fix does not decompose '&#224;' (U+00E0) to 'a' + '`' but just deletes '&#224;' itself. Since I'm not sure this behaviors should be denoted as "decompose", I can move this fix to "deleteBackwardByDecomposingPreviousCharacter" function if you prefer it. > Justin
Alexey Proskuryakov
Comment 12 2008-11-17 13:15:58 PST
Comment on attachment 24934 [details] the fourth proposed fix I suppose that this was meant for review, marking as such. I don't think that this patch should enable DeleteBackward - that's not as easy as it looks, because we'd need to check whether its behavior in edge cases matches other commands, and how it differs from BackwardDelete. Please change the test to use eventSender (see e.g. editing/deleting/delete-by-word-001.html). Sorry for misguiding you by giving an example that uses commands. I don't have an opinion on the substance of the fix.
Hironori Bono
Comment 13 2008-11-21 01:50:31 PST
Created attachment 25344 [details] Fifth proposed fix Thank you for your comments and sorry for my late response. I have updated my fix to use the eventSender object and to make it work on the trunk (r38647).
Hironori Bono
Comment 14 2008-11-27 20:22:26 PST
Created attachment 25564 [details] Sixth proposed patch Sorry. My previous patch does not work at all since r38755. I have fixed a problem of my patch and its layout test.
Alexey Proskuryakov
Comment 15 2008-12-01 04:51:00 PST
Comment on attachment 25564 [details] Sixth proposed patch Please don't forget to set review? flag. I tested this change, and it doesn't work sufficiently well yet. Here are the issues I encountered in my testing: 1) Undo is broken. Steps to reproduce: - open the test in Safari, - press Backspace, - press Cmd+Z for Undo. Results: instead of the combining character being brought back, the first character is deleted, too. It is quite clear why this happens: the delete operation can not be simply replaced with a delete/re-add pair. 2) Deleting still doesn't work correctly after moving the caret. Steps to reproduce: - open the test in Safari, - press Left arrow, - press Right arrow, - press Backspace. Results: the whole ligature is deleted. 3) The automated test doesn't pass on Mac OS X. First, I'm getting EDITING DELEGATE lines that are not in this patch (I'm surprised that you didn't have them on Windows). Second, font metrics are different, so box sizes don't match. For this second problem, you could consider making the test text-only. When problems (1) and (2) are fixed, we'll need automated tests for them, as well.
Hironori Bono
Comment 16 2008-12-02 02:23:04 PST
Created attachment 25669 [details] seventh proposed (and temporal) fix without layout test results Thank you for your review and comments. Sorry for sending corrupted patch. I misunderstood the recent changes to the Editor class and wrote a completely corrupted fix. Even though I have another fix (attached to this comment) which fixes the reported problems, I would not like to send its review request because I'm wondering if it works on MacOS X. I'm now getting a MacBook for my webkit development and moving my development environment to it. Once I move my development environment of WebKit to MacOS X and fixes problems on it, I will send a review request.
Hironori Bono
Comment 17 2008-12-08 23:19:32 PST
Created attachment 25877 [details] eighth proposed fix Sorry for my late response. I got a macbook, set up my development environment for WebKit on it, and made my fix and tests work on it. Is it possible to review the latest fix?
Hironori Bono
Comment 18 2008-12-09 00:52:40 PST
Created attachment 25878 [details] nineth proposed fix Sorry. I forgot changing the "LayoutTests/ChangeLog". Is it possible to review this fix instead of the previous one?
Alexey Proskuryakov
Comment 19 2008-12-09 03:02:17 PST
Comment on attachment 25878 [details] nineth proposed fix + Added test for verifying if a backspace key deletes only the last character of a ligature which consists of + multiple Unicode characters. There is a tab in ChangeLog here, please replace it with spaces. These tests are not text-only, but the patch only includes text results. Please use layoutTestController to make the tests text-only: if (window.layoutTestController) layoutTestController.dumpAsText(). This patch works much better, but it breaks at least one existing layout test for me: editing/deleting/delete-br-011.html. Does this happen on your machine?
Hironori Bono
Comment 20 2008-12-10 01:43:36 PST
Created attachment 25917 [details] tenth proposed fix Thank you for your review and sorry for my stupid mistakes. I have added dumpAsText() calls in my tests and updated my patch to fix the layout-test regression in "delete-br-011.html". As far as I tested in my macbook, I cannot see any more regressions in my macbook. (WebKit r39162 has several layout-test regressions and I cannot confirm my fix does not yield any more regressions, though.)
Alexey Proskuryakov
Comment 21 2008-12-10 04:50:43 PST
Comment on attachment 25917 [details] tenth proposed fix r=me. As already mentioned by Jungshik Shin in comment 4, there may be follow-up work to do for rare characters that do not have precomposed form, or for text that is encoded in decomposed form - but I don't expect this to come up in practice often, and user expectations in these cases aren't obvious.
Alexey Proskuryakov
Comment 22 2008-12-10 04:51:56 PST
Alexey Proskuryakov
Comment 23 2008-12-10 05:16:12 PST
I just realized that the comments shouldn't talk about ligatures - ligatures are formed at rendering time from non-combining Unicode characters, like English "ffi", or many Indic ones. This patch fixed Backspace for sequences with combining characters, as ligatures worked correctly already. This will need to be corrected to avoid future confusion.
Note You need to log in before you can comment on or make changes to this bug.