Bug 21820 - Unable to enter the Tamil UNICODE Characters via Thamizha Phonetic IME
Summary: Unable to enter the Tamil UNICODE Characters via Thamizha Phonetic IME
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: HTML Editing (show other bugs)
Version: 528+ (Nightly build)
Hardware: PC Windows XP
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-10-22 19:40 PDT by Hironori Bono
Modified: 2008-12-10 05:16 PST (History)
2 users (show)

See Also:


Attachments
proposed fix (1.45 KB, patch)
2008-10-22 19:46 PDT, Hironori Bono
no flags Details | Formatted Diff | Diff
my second proposed patch including a layout test (5.87 KB, patch)
2008-10-24 02:39 PDT, Hironori Bono
no flags Details | Formatted Diff | Diff
the third proposed fix (6.18 KB, patch)
2008-10-27 14:23 PDT, Hironori Bono
no flags Details | Formatted Diff | Diff
the forth proposed patch (containing descriptions) (6.85 KB, patch)
2008-11-05 20:29 PST, Hironori Bono
no flags Details | Formatted Diff | Diff
the fourth proposed fix (6.86 KB, patch)
2008-11-05 20:38 PST, Hironori Bono
no flags Details | Formatted Diff | Diff
Fifth proposed fix (4.64 KB, patch)
2008-11-21 01:50 PST, Hironori Bono
no flags Details | Formatted Diff | Diff
Sixth proposed patch (7.40 KB, patch)
2008-11-27 20:22 PST, Hironori Bono
ap: review-
Details | Formatted Diff | Diff
seventh proposed (and temporal) fix without layout test results (9.35 KB, patch)
2008-12-02 02:23 PST, Hironori Bono
no flags Details | Formatted Diff | Diff
eighth proposed fix (17.64 KB, patch)
2008-12-08 23:19 PST, Hironori Bono
no flags Details | Formatted Diff | Diff
nineth proposed fix (18.05 KB, patch)
2008-12-09 00:52 PST, Hironori Bono
ap: review-
Details | Formatted Diff | Diff
tenth proposed fix (15.17 KB, patch)
2008-12-10 01:43 PST, Hironori Bono
ap: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Hironori Bono 2008-10-22 19:40:52 PDT
Copied from "http://code.google.com/p/chromium/issues/detail?id=1097"

* Steps to Reproduce
1. Download the Thamizha tamil phonetic IME From
http://www.thamizha.com/downloads/ekalappai20b_anjal.zip
2. Extract the archive and install.
3. Try to enter Tamil UNICODE Characters to any checkboxes see the characters got deleted automatically once you try to enter some tamil characters. This is not the case with Firefox or Internet explorer. 

* Expected Results
I need to enter Tamil UNICODE characters to wikipedia project, blogging sites directly from the IME.

* Actual Results
Characters got deleted automatically one I try to enter a character. 

* Additional Builds and Platforms:
Other browsers tested:
Add OK or FAIL after other browsers where you have tested this issue:
   Safari 3: FAIL
  Firefox 3: OK
       IE 7: OK

- Does not occur on
The English version of Windows XP

* Additional Information:
Copy paste from notepad to browser with Tamil UNICODE characters works but
it is a real waste of time. Some similar issue was there with Yahoo!
messenger 7 and it has been rectified with latest version 9 (I think worked
with version 8 as well).

This issue is caused by the Editor::deleteWithDirection() function which deletes a ligature without decomposing it.
This issue also happens on languages which uses ligatures, e.g. Arabic, Lao, Thai, etc.
The easiest solution is changing the above function to insert a decomposed characters after deleting a ligature.
Comment 1 Hironori Bono 2008-10-22 19:46:12 PDT
Created attachment 24586 [details]
proposed fix

This is the simplest solution for this issue. I'm wondering if this is the best one, though.
Comment 2 Alexey Proskuryakov 2008-10-23 05:31:31 PDT
I'm leaving it to others to decide whether this is a good fix. But this absolutely needs an automated test case. See e.g. editing/deleting/forward-delete.html and platform/mac/editing/input/devanagari-ligature.html for some examples of how to make such.
Comment 3 Hironori Bono 2008-10-24 02:39:23 PDT
Created attachment 24637 [details]
my second proposed patch including a layout test

Thank you for your review.
I have added a layout test "LayoutTests/editing/deleting/delete-ligature.html" which verifies if this issue is fixed.
While writing this test, I need to change another file "WebCore/editing/EditorCommand.cpp" to let a JavaScript call the executeDeleteBackward() function through document.execCommand("DeleteBackward").
Comment 4 Jungshik Shin 2008-10-24 10:19:56 PDT
Comment on attachment 24637 [details]
my second proposed patch including a layout test

> Index: WebCore/ChangeLog
> ===================================================================
> --- WebCore/ChangeLog	(revision 37842)
> +++ WebCore/ChangeLog	(working copy)
> @@ -1,3 +1,14 @@
> +2008-10-24  Hironori Bono  <hbono@chromium.org>
> +
> +        Reviewed by NOBODY (OOPS!).
> +
> +        Test: editing/deleting/delete-ligature.html
> +
> +        * editing/Editor.cpp:
> +        (WebCore::Editor::deleteWithDirection):
> +        * editing/EditorCommand.cpp:
> +        (WebCore::CommandEntry::):
> +
>  2008-10-24  David Kilzer  <ddkilzer@apple.com>
>  
>          Rolled out r37840 and r37841.
> Index: WebCore/editing/Editor.cpp
> ===================================================================
> --- WebCore/editing/Editor.cpp	(revision 37801)
> +++ WebCore/editing/Editor.cpp	(working copy)
> @@ -259,8 +259,14 @@ bool Editor::deleteWithDirection(Selecti
>                  break;
>              case SelectionController::BACKWARD:
>              case SelectionController::LEFT:
> -                if (m_frame->document())
> +                if (m_frame->document()) {
> +                    String stringToDelete = plainText(range.get());
>                      TypingCommand::deleteKeyPressed(m_frame->document(), false, granularity);
> +                    if (stringToDelete.length() > 1) {
> +                        stringToDelete.truncate(stringToDelete.length() - 1);
> +                        TypingCommand::insertText(m_frame->document(), stringToDelete, false, false);

What if 'granularity' is not a "character"?  I think the above change needs to be done only when 'granularity' is 'character'? 

I also wonder if this meets everybody's expectation of 'delete backward'. For instance, without your change, Latin base letter + combining accent will be deleted
as a unit. With this change, only the combining accent will be deleted, right? Given that widely used Latin letters with accents are represented in a composed form, it's 
not very critical. However, some Latin letters with accents (used in African languages, for instance) do not have a composed form representation. So, at least, you may want to
add a TODO comment about that. 

Perhaps, in the long run, a new granularity of 'Codepoint' needs to be introduced. IIRC, 'character' granularity is actually 'grapheme (cluster)' granularity at the moment. 
 



> +                    }
> +                }
>                  break;
>          }
>          revealSelectionAfterEditingOperation();
> Index: WebCore/editing/EditorCommand.cpp
> ===================================================================
> --- WebCore/editing/EditorCommand.cpp	(revision 37801)
> +++ WebCore/editing/EditorCommand.cpp	(working copy)
> @@ -1164,7 +1164,7 @@ static const CommandMap& createCommandMa
>          { "CreateLink", { executeCreateLink, supported, enabledInRichlyEditableText, stateNone, valueNull, notTextInsertion, doNotAllowExecutionWhenDisabled } },
>          { "Cut", { executeCut, supported, enabledCut, stateNone, valueNull, notTextInsertion, allowExecutionWhenDisabled } },
>          { "Delete", { executeDelete, supported, enabledDelete, stateNone, valueNull, notTextInsertion, doNotAllowExecutionWhenDisabled } },
> -        { "DeleteBackward", { executeDeleteBackward, supportedFromMenuOrKeyBinding, enabledInEditableText, stateNone, valueNull, notTextInsertion, doNotAllowExecutionWhenDisabled } },
> +        { "DeleteBackward", { executeDeleteBackward, supported, enabledInEditableText, stateNone, valueNull, notTextInsertion, doNotAllowExecutionWhenDisabled } },
>          { "DeleteBackwardByDecomposingPreviousCharacter", { executeDeleteBackwardByDecomposingPreviousCharacter, supportedFromMenuOrKeyBinding, enabledInEditableText, stateNone, valueNull, notTextInsertion, doNotAllowExecutionWhenDisabled } },
>          { "DeleteForward", { executeDeleteForward, supportedFromMenuOrKeyBinding, enabledInEditableText, stateNone, valueNull, notTextInsertion, doNotAllowExecutionWhenDisabled } },
>          { "DeleteToBeginningOfLine", { executeDeleteToBeginningOfLine, supportedFromMenuOrKeyBinding, enabledInEditableText, stateNone, valueNull, notTextInsertion, doNotAllowExecutionWhenDisabled } },
> Index: LayoutTests/ChangeLog
> ===================================================================
> --- LayoutTests/ChangeLog	(revision 37842)
> +++ LayoutTests/ChangeLog	(working copy)
> @@ -1,3 +1,9 @@
> +2008-10-24  Hironori Bono  <hbono@chromium.org>
> +
> +        Reviewed by NOBODY (OOPS!).
> +
> +        * editing/deleting/delete-ligature.html: Added.
> +
>  2008-10-24  David Kilzer  <ddkilzer@apple.com>
>  
>          Rolled out r37840.
> Index: LayoutTests/editing/deleting/delete-ligature.html
> ===================================================================
> --- LayoutTests/editing/deleting/delete-ligature.html	(revision 0)
> +++ LayoutTests/editing/deleting/delete-ligature.html	(revision 0)
> @@ -0,0 +1,44 @@
> +<html xmlns="http://www.w3.org/1999/xhtml">
> +    <head>
> +        <script src="../editing.js" language="javascript" type="text/javascript" ></script>
> +        <script language="javascript" type="text/javascript">
> +        function execBackwardDeleteCommand() {
> +            document.execCommand("DeleteBackward");
> +        }
> +        function backwardDeleteCommand() {
> +            if (commandDelay > 0) {
> +                window.setTimeout(execBackwardDeleteCommand, commandCount * commandDelay);
> +                commandCount++;
> +            } else {
> +                execBackwardDeleteCommand();
> +            }
> +        }
> +        function log(str) {
> +            var li = document.createElement("li");
> +            li.appendChild(document.createTextNode(str));
> +            var console = document.getElementById("console");
> +            console.appendChild(li);
> +        }
> +        function editingTest() {
> +            var textarea = document.getElementById("test");
> +            textarea.setSelectionRange(2, 2);
> +            backwardDeleteCommand();
> +            if (textarea.value == String.fromCharCode(0x0E27)) {
> +                log("Succeeded.");
> +            } else {
> +                log("Failed. Actual: \"" + textarea.value + "\", Expected: \"" + String.fromCharCode(0x0E27) + "\"");
> +            }
> +        }
> +        </script>
> +    <title>Editing Test (Deleting a ligature)</title> 
> +</head> 
> +    <body>
> +        <p>This test tests if a ligature "&#x0E27;&#x0E31;" is decomposed while deleting it with a back-space key.</p>
> +        <p>If this test succeeds, you can see "&#x0E27;" (U+0E27) and a string "succeeded" below.</p>
> +        <textarea id="test" rows="1" cols="40">&#x0E27;&#x0E31;</textarea>
> +        <ul id="console"></ul>
> +        <script language="javascript" type="text/javascript">
> +        runEditingTest();
> +        </script>
> +    </body>
> +</html>
> 
> Property changes on: LayoutTests/editing/deleting/delete-ligature.html
> ___________________________________________________________________
> Added: svn:executable
>    + *
>
Comment 5 Hironori Bono 2008-10-27 14:23:40 PDT
Created attachment 24694 [details]
the third proposed fix

(In reply to comment #4)

> What if 'granularity' is not a "character"?  I think the above change needs to
> be done only when 'granularity' is 'character'? 

Thank you for noticing this. You are right.
I have updated my patch to check if the 'granularity' value is 'CharacterGranularity'.

> I also wonder if this meets everybody's expectation of 'delete backward'.
> For instance, without your change, Latin base letter + combining accent will be
> deleted as a unit. With this change, only the combining accent will be deleted, right?

Yes. When a ligature consists of a Latin character and Unicode combining characters (U+0300,...,U+036F), my change deletes only the last combining character.

> Given that widely used Latin letters with accents are represented in a composed
> form, it's not very critical. However, some Latin letters with accents (used in African
> languages, for instance) do not have a composed form representation. So, at
> least, you may want to add a TODO comment about that. 

I have added a 'FIXME' comment which notes this problem.
 
> Perhaps, in the long run, a new granularity of 'Codepoint' needs to be
> introduced. IIRC, 'character' granularity is actually 'grapheme (cluster)'
> granularity at the moment.
Comment 6 Hironori Bono 2008-11-03 22:50:29 PST
Excuse me.
Is it possible to give me what I should do next? Many people have been waiting for us to fix this issue and I would like to fix it as soon as possible.
Comment 7 Jungshik Shin 2008-11-03 23:17:34 PST
Comment on attachment 24694 [details]
the third proposed fix

> Index: WebCore/ChangeLog
> ===================================================================
> --- WebCore/ChangeLog	(revision 37899)
> +++ WebCore/ChangeLog	(working copy)
> @@ -1,3 +1,14 @@
> +2008-10-27  Hironori Bono  <hbono@chromium.org>
> +
> +        Reviewed by NOBODY (OOPS!).

I guess you need to mention this bug here and add a brief description of the problem and your fix.

Sorry if it's me who held you up. It looks good to me. I'm asking for review.

By the way, are you going to implement DeleteBackwardByDecomposingPreviousCharacter as well?
Comment 8 Justin Garcia 2008-11-05 16:38:27 PST
Comment on attachment 24694 [details]
the third proposed fix

+2008-10-27  Hironori Bono  <hbono@chromium.org>
+
+        Reviewed by NOBODY (OOPS!).
+
+        Test: editing/deleting/delete-ligature.html
+
+        * editing/Editor.cpp:
+        (WebCore::Editor::deleteWithDirection):
+        * editing/EditorCommand.cpp:
+        (WebCore::CommandEntry::):

Please describe the reasoning behind your changes here.

             case SelectionController::LEFT:
-                if (m_frame->document())
+                if (m_frame->document()) {
+                    String stringToDelete = plainText(range.get());

We found recently that deleting a text consumes most of the CPU on the iPhone and I'm worried about what performance impact this may have.  Could you investigate?

+        <p>This test tests if a ligature "&#x0E27;&#x0E31;" is decomposed while deleting it with a back-space key.</p>

Shouldn't you be implementing deleteBackwardByDecomposingPreviousCharacter then?

Justin
Comment 9 Hironori Bono 2008-11-05 20:29:34 PST
Created attachment 24933 [details]
the forth proposed patch (containing descriptions)
Comment 10 Hironori Bono 2008-11-05 20:38:06 PST
Created attachment 24934 [details]
the fourth proposed fix
Comment 11 Hironori Bono 2008-11-05 20:54:50 PST
Thank you for your review and comments.

(In reply to comment #8)
> (From update of attachment 24694 [details] [edit])
> +2008-10-27  Hironori Bono  <hbono@chromium.org>
> +
> +        Reviewed by NOBODY (OOPS!).
> +
> +        Test: editing/deleting/delete-ligature.html
> +
> +        * editing/Editor.cpp:
> +        (WebCore::Editor::deleteWithDirection):
> +        * editing/EditorCommand.cpp:
> +        (WebCore::CommandEntry::):
> 
> Please describe the reasoning behind your changes here.

Sorry for the lack of descriptions.
I have added them to "WebCore/ChangeLog" and "LayoutTests/ChangeLog".

>              case SelectionController::LEFT:
> -                if (m_frame->document())
> +                if (m_frame->document()) {
> +                    String stringToDelete = plainText(range.get());
> 
> We found recently that deleting a text consumes most of the CPU on the iPhone
> and I'm worried about what performance impact this may have.  Could you
> investigate?

Even though I cannot use iPhones to investigate processing power while deleting a text, it does not consume so much processing power on Chrome.

> +        <p>This test tests if a ligature "&#x0E27;&#x0E31;" is decomposed
> while deleting it with a back-space key.</p>
>
> Shouldn't you be implementing deleteBackwardByDecomposingPreviousCharacter
> then?

Correctly, this fix changes the behavior of the "DeleteBackward" command to delete only the last character of a ligature consisting of multiple Unicode characters. That is, this fix does not decompose many Latin ligatures each of which consists of one Unicode character, e.g. this fix does not decompose '&#224;' (U+00E0) to 'a' + '`' but just deletes '&#224;' itself. Since I'm not sure this behaviors should be denoted as "decompose", I can move this fix to "deleteBackwardByDecomposingPreviousCharacter" function if you prefer it.

> Justin
Comment 12 Alexey Proskuryakov 2008-11-17 13:15:58 PST
Comment on attachment 24934 [details]
the fourth proposed fix

I suppose that this was meant for review, marking as such.

I don't think that this patch should enable DeleteBackward - that's not as easy as it looks, because we'd need to check whether its behavior in edge cases matches other commands, and how it differs from BackwardDelete. Please change the test to use eventSender (see e.g. editing/deleting/delete-by-word-001.html). Sorry for misguiding you by giving an example that uses commands.

I don't have an opinion on the substance of the fix.
Comment 13 Hironori Bono 2008-11-21 01:50:31 PST
Created attachment 25344 [details]
Fifth proposed fix

Thank you for your comments and sorry for my late response.
I have updated my fix to use the eventSender object and to make it work on the trunk (r38647).
Comment 14 Hironori Bono 2008-11-27 20:22:26 PST
Created attachment 25564 [details]
Sixth proposed patch

Sorry. My previous patch does not work at all since r38755.
I have fixed a problem of my patch and its layout test.
Comment 15 Alexey Proskuryakov 2008-12-01 04:51:00 PST
Comment on attachment 25564 [details]
Sixth proposed patch

Please don't forget to set review? flag.

I tested this change, and it doesn't work sufficiently well yet. Here are the issues I encountered in my testing:

1) Undo is broken. Steps to reproduce:
 - open the test in Safari,
 - press Backspace,
 - press Cmd+Z for Undo.
Results: instead of the combining character being brought back, the first character is deleted, too.
It is quite clear why this happens: the delete operation can not be simply replaced with a delete/re-add pair.

2) Deleting still doesn't work correctly after moving the caret. Steps to reproduce:
 - open the test in Safari,
 - press Left arrow,
 - press Right arrow,
 - press Backspace.
Results: the whole ligature is deleted.

3) The automated test doesn't pass on Mac OS X. First, I'm getting EDITING DELEGATE lines that are not in this patch (I'm surprised that you didn't have them on Windows). Second, font metrics are different, so box sizes don't match. For this second problem, you could consider making the test text-only.

When problems (1) and (2) are fixed, we'll need automated tests for them, as well.
Comment 16 Hironori Bono 2008-12-02 02:23:04 PST
Created attachment 25669 [details]
seventh proposed (and temporal) fix without layout test results

Thank you for your review and comments.

Sorry for sending corrupted patch. I misunderstood the recent changes to the Editor class and wrote a completely corrupted fix.
Even though I have another fix (attached to this comment) which fixes the reported problems, I would not like to send its review request because I'm wondering if it works on MacOS X. I'm now getting a MacBook for my webkit development and moving my development environment to it. Once I move my development environment of WebKit to MacOS X and fixes problems on it, I will send a review request.
Comment 17 Hironori Bono 2008-12-08 23:19:32 PST
Created attachment 25877 [details]
eighth proposed fix

Sorry for my late response.
I got a macbook, set up my development environment for WebKit on it, and made my fix and tests work on it.
Is it possible to review the latest fix?
Comment 18 Hironori Bono 2008-12-09 00:52:40 PST
Created attachment 25878 [details]
nineth proposed fix

Sorry. I forgot changing the "LayoutTests/ChangeLog".
Is it possible to review this fix instead of the previous one?
Comment 19 Alexey Proskuryakov 2008-12-09 03:02:17 PST
Comment on attachment 25878 [details]
nineth proposed fix

+        Added test for verifying if a backspace key deletes only the last character of a ligature which consists of
+	multiple Unicode characters.

There is a tab in ChangeLog here, please replace it with spaces.

These tests are not text-only, but the patch only includes text results. Please use layoutTestController to make the tests text-only:
if (window.layoutTestController)
    layoutTestController.dumpAsText().

This patch works much better, but it breaks at least one existing layout test for me: editing/deleting/delete-br-011.html. Does this happen on your machine?
Comment 20 Hironori Bono 2008-12-10 01:43:36 PST
Created attachment 25917 [details]
tenth proposed fix

Thank you for your review and sorry for my stupid mistakes.
I have added dumpAsText() calls in my tests and updated my patch to fix the layout-test regression in "delete-br-011.html".
As far as I tested in my macbook, I cannot see any more regressions in my macbook. (WebKit r39162 has several layout-test regressions and I cannot confirm my fix does not yield any more regressions, though.)
Comment 21 Alexey Proskuryakov 2008-12-10 04:50:43 PST
Comment on attachment 25917 [details]
tenth proposed fix

r=me. As already mentioned by Jungshik Shin in comment 4, there may be follow-up work to do for rare characters that do not have precomposed form, or for text that is encoded in decomposed form - but I don't expect this to come up in practice often, and user expectations in these cases aren't obvious.
Comment 22 Alexey Proskuryakov 2008-12-10 04:51:56 PST
Committed as <http://trac.webkit.org/changeset/39169>.
Comment 23 Alexey Proskuryakov 2008-12-10 05:16:12 PST
I just realized that the comments shouldn't talk about ligatures - ligatures are formed at rendering time from non-combining Unicode characters, like English "ffi", or many Indic ones. This patch fixed Backspace for sequences with combining characters, as ligatures worked correctly already.

This will need to be corrected to avoid future confusion.