Bug 25298

Summary: Ctrl + Right/Left arrow move forward/backward through document instead of right/left in RTL text
Product: WebKit Reporter: Xiaomei Ji <xji>
Component: HTML EditingAssignee: Nobody <webkit-unassigned>
Status: NEW ---    
Severity: Normal CC: amir.aharoni, dglazkov, eric, jdiggs, mario, mitz, noamraph, nomaed, playmobil, progame+wk, schaub1st, uriber, webkit.review.bot, xji
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: PC   
OS: Windows XP   
Bug Depends on: 57233, 57336, 57543, 57806, 58294, 58504, 59652, 60910, 61324, 61344, 61346, 61348, 61978, 62247, 62814, 65277, 66436, 71163, 71600, 72048, 78856, 81136, 81408, 81581, 85017    
Bug Blocks:    
Attachments:
Description Flags
test case
none
Proposed fix for review
none
unfinished patch
none
path /w layout test
none
path /w layout test
mitz: review-
patch w/ layout test
none
patch w/ layout test
mitz: review-
patch w/ layout test
none
patch w/ layout test
none
just test case none

Description Xiaomei Ji 2009-04-20 12:07:22 PDT
This bug is one of the cases not addressed by the patch of
https://bugs.webkit.org/show_bug.cgi?id=3729 

Pressing ctrl + right/left arrow should move caret visually (not logically) across word boundary.
Comment 1 Xiaomei Ji 2009-04-20 12:07:46 PDT
Created attachment 29619 [details]
test case
Comment 2 Xiaomei Ji 2009-04-20 12:09:18 PDT
Chromium bug:
http://code.google.com/p/chromium/issues/detail?id=10741
Comment 3 Arvind 2009-05-14 02:44:02 PDT
Hi,
I am analysing this bug and found a probable solution.
Kindly let me know the way I am planning to fix this bug is correct.
In editorcommand.cpp file functions executeMoveWordLeft() or executeMoveWordRight() are used to create the fucntionality.Here I check for the direction of the text and if the direction is RTL I reverse the direction that is sent to frame->selection()->modify() function.
Comment 4 Xiaomei Ji 2009-05-14 10:05:34 PDT
(In reply to comment #3)
> Hi,
> I am analysing this bug and found a probable solution.
> Kindly let me know the way I am planning to fix this bug is correct.
> In editorcommand.cpp file functions executeMoveWordLeft() or
> executeMoveWordRight() are used to create the fucntionality.Here I check for
> the direction of the text and if the direction is RTL I reverse the direction
> that is sent to frame->selection()->modify() function.
> 

I doubt it works.
The problem with current implementation is that ctrl+right/left arrow follows logical order. The correct behavior should be following visual order.

Seems that your fix assumes that visual order is the reverse of logical order in RTL, which might not be true.

Considering the following test cases:
The logical order of a string is "abc def HIJ", in which "HIJ" are Hebrew characters.
The visual order would be "JIH abc def", when cursor is within "abc", press ctrl+right arrow twice should move cursor to the end of "abc", then to the end of 'def'.
Comment 5 Arvind 2009-05-17 22:25:49 PDT
(In reply to comment #4)
> (In reply to comment #3)
> > Hi,
> > I am analysing this bug and found a probable solution.
> > Kindly let me know the way I am planning to fix this bug is correct.
> > In editorcommand.cpp file functions executeMoveWordLeft() or
> > executeMoveWordRight() are used to create the fucntionality.Here I check for
> > the direction of the text and if the direction is RTL I reverse the direction
> > that is sent to frame->selection()->modify() function.
> > 
> 
> I doubt it works.
> The problem with current implementation is that ctrl+right/left arrow follows
> logical order. The correct behavior should be following visual order.
> 
> Seems that your fix assumes that visual order is the reverse of logical order
> in RTL, which might not be true.
> 
> Considering the following test cases:
> The logical order of a string is "abc def HIJ", in which "HIJ" are Hebrew
> characters.
> The visual order would be "JIH abc def", when cursor is within "abc", press
> ctrl+right arrow twice should move cursor to the end of "abc", then to the end
> of 'def'.
> 

Hi  Xiaomei Ji,
Thanks for your valuable suggestion.I checked the fix against the test case 
> Considering the following test cases:
> The logical order of a string is "abc def HIJ", in which "HIJ" are Hebrew
> characters.
> The visual order would be "JIH abc def", when cursor is within "abc", press
> ctrl+right arrow twice should move cursor to the end of "abc", then to the end
> of 'def'.

It seems to work fine. I have attached the fix that I had proposed. Kindly review it and let me know your comments about it if you still feel there is something wrong in the way I am trying to fix it. 

Comment 6 Arvind 2009-05-17 22:28:57 PDT
Created attachment 30436 [details]
Proposed fix for review

Modified code is present under 
static bool executeMoveWordLeft(Frame* frame, Event*, EditorCommandSource, const String&)
static bool executeMoveWordLeftAndModifySelection(Frame* frame, Event*, EditorCommandSource, const String&)
static bool executeMoveWordRight(Frame* frame, Event*, EditorCommandSource, const String&)
static bool executeMoveWordRightAndModifySelection(Frame* frame, Event*, EditorCommandSource, const String&)
Comment 7 Xiaomei Ji 2009-05-18 12:13:46 PDT
(In reply to comment #6)
> Created an attachment (id=30436) [review]
> Proposed fix for review
> 
> Modified code is present under 
> static bool executeMoveWordLeft(Frame* frame, Event*, EditorCommandSource,
> const String&)
> static bool executeMoveWordLeftAndModifySelection(Frame* frame, Event*,
> EditorCommandSource, const String&)
> static bool executeMoveWordRight(Frame* frame, Event*, EditorCommandSource,
> const String&)
> static bool executeMoveWordRightAndModifySelection(Frame* frame, Event*,
> EditorCommandSource, const String&)
> 

Hi Arvind,

Thanks for working on this. 

1. I thought you were using whether the renderer()->style()->direction() == RTL to toggle to move left/right, then, I think my previous comments apply.

2. By using renderer->containsReversedText(), seems it works fine for the test case. But it breaks the LTR text containing mixed text. 
I only tried the changed executeMoveWordLeft().
Given an example: 
<div contenteditable> 
abc &#1488;&#1489;&#1490; xyz &#1491;&#1492;&#1493; def
</div> 
ctrl+left-arrow works fine originally, but not with the fix. The fix moves the cursor to right visually.

3. I think you probably do not need to change the 2 modify selection functions:
static bool executeMoveWordLeftAndModifySelection(Frame* frame, Event*,
EditorCommandSource, const String&)
static bool executeMoveWordRightAndModifySelection(Frame* frame, Event*,
EditorCommandSource, const String&)

They are corresponding to shift-ctrl-arrow, which follows logical order. and should work fine after the fix of issue 24303
https://bugs.webkit.org/show_bug.cgi?id=24303

4. I guess  you are just uploading the patch for opinions, not for real review.
You probably know that for real review, you need to follow the guideline in
http://webkit.org/coding/contributing.html


Thanks,
Xiaomei

Comment 8 Arvind 2009-05-18 23:51:35 PDT
(In reply to comment #7)
> (In reply to comment #6)
> > Created an attachment (id=30436) [review] [review]
> > Proposed fix for review
> > 
> > Modified code is present under 
> > static bool executeMoveWordLeft(Frame* frame, Event*, EditorCommandSource,
> > const String&)
> > static bool executeMoveWordLeftAndModifySelection(Frame* frame, Event*,
> > EditorCommandSource, const String&)
> > static bool executeMoveWordRight(Frame* frame, Event*, EditorCommandSource,
> > const String&)
> > static bool executeMoveWordRightAndModifySelection(Frame* frame, Event*,
> > EditorCommandSource, const String&)
> > 
> Hi Arvind,
> Thanks for working on this. 
> 1. I thought you were using whether the renderer()->style()->direction() == RTL
> to toggle to move left/right, then, I think my previous comments apply.
> 2. By using renderer->containsReversedText(), seems it works fine for the test
> case. But it breaks the LTR text containing mixed text. 
> I only tried the changed executeMoveWordLeft().
> Given an example: 
> <div contenteditable> 
> abc &#1488;&#1489;&#1490; xyz &#1491;&#1492;&#1493; def
> </div> 
> ctrl+left-arrow works fine originally, but not with the fix. The fix moves the
> cursor to right visually.
> 3. I think you probably do not need to change the 2 modify selection functions:
> static bool executeMoveWordLeftAndModifySelection(Frame* frame, Event*,
> EditorCommandSource, const String&)
> static bool executeMoveWordRightAndModifySelection(Frame* frame, Event*,
> EditorCommandSource, const String&)
> They are corresponding to shift-ctrl-arrow, which follows logical order. and
> should work fine after the fix of issue 24303
> https://bugs.webkit.org/show_bug.cgi?id=24303
> 4. I guess  you are just uploading the patch for opinions, not for real review.
> You probably know that for real review, you need to follow the guideline in
> http://webkit.org/coding/contributing.html
> Thanks,
> Xiaomei

Hi Xiaomei
Thanks for your review.I will try to fix other such scenarios as well.

Comment 9 Xiaomei Ji 2010-05-27 17:00:20 PDT
Created attachment 57283 [details]
unfinished patch

Hi Mitz,

I am trying to fix the issue by creating the string in visual order sequence, then finding the word boundary within the string, and convert the boundary offset within the string into DOM position finally.

I only changed the ctrl-left-arrow part.
It works correctly for bidi text in RTL div, LTR div, RTL div with span, and LTR div with span.

But it does not work correctly for bidi text in RTL div with LTR span or LTR div with RTL span.

Given the following example:
<div contenteditable dir=rtl>abc ששש def <span dir=ltr>שנב  abc סטז</span>  uvw זזז xyz</div> 

ctrl-left-arrow does not work correctly when caret is around the Hebrew characters inside the <span>.

The position information in some of the positions inside <span> looks like a bug to me, which caused the ctrl-left-arrow does not move correctly.

Below, I am using "(node, offset)a" to represent the node and offset information of position when caret is before character "a". 

Assume "שנב abc  סטז" is node A and "abc ששש def " is node B.

(A, 3)ב(A, 2)נ(A, 1)ש(B, 0xc) (A, 4)a(A, 6)b(A, 7)c(A, 8) (A, 0xc)ז(A, 0xB)ט(A, 0xa)ס(A, 9).

The position information (B, 0xc) looks like a bug. position (A, 4) or (A, 6) looks like a bug too (why (A, 5) is missing?).

The above node has 4 runs, their (start, length) are (0, 3), (3, 1), (5, 4), and (9, 3).
The information (5, 4) looks like a bug. Why it starts at 5, which is not continuous with previous run?

Based on the above information, when caret is after the rightmost Hebrew character 'ס', leftWordPosition() will find the word break is at string offset 4 when the string is "abc סטז", then it adds offset 4 to the 3rd run's start offset 5 to get the offset within node A, which is 9. And the position (A, 9) is where the caret is at originally.


Given an example of LTR div with RTL span, such as 
<div contenteditable>abc ששש def <span dir=rtl>שנב  abc סטז</span>  uvw ששש xyz</div>

I am not able to mouse click to put the caret before 'a' inside the <span>. left-arrow and right-arrow do not work correctly inside <span> either, and it is not able to place caret before 'a' using arrow key.
Maybe related to position information too.


I feel the position computation is wrong given the above example, it might be computed wrongly in bidi resolver or somewhere else.
Or I am in the wrong direction of fixing the issue.

Would like to get your opinion.
Comment 10 Xiaomei Ji 2010-06-01 18:05:52 PDT
> 
> Given the following example:
> <div contenteditable dir=rtl>abc ששש def <span dir=ltr>שנב  abc סטז</span>  uvw זזז xyz</div> 
> 
> ctrl-left-arrow does not work correctly when caret is around the Hebrew characters inside the <span>.
> 
> The position information in some of the positions inside <span> looks like a bug to me, which caused the ctrl-left-arrow does not move correctly.
> 
> Below, I am using "(node, offset)a" to represent the node and offset information of position when caret is before character "a". 
> 
> Assume "שנב abc  סטז" is node A and "abc ששש def " is node B.
> 
> (A, 3)ב(A, 2)נ(A, 1)ש(B, 0xc) (A, 4)a(A, 6)b(A, 7)c(A, 8) (A, 0xc)ז(A, 0xB)ט(A, 0xa)ס(A, 9).
> 
> The position information (B, 0xc) looks like a bug. position (A, 4) or (A, 6) looks like a bug too (why (A, 5) is missing?).

(A, 5) is the 2nd space before "abc" within span.

> 
> The above node has 4 runs, their (start, length) are (0, 3), (3, 1), (5, 4), and (9, 3).
> The information (5, 4) looks like a bug. Why it starts at 5, which is not continuous with previous run?

It is because of the 2 contiguous spaces before "abc" within span.

The patch does not work in all cases. The way to compute character offset inside InlineBox to VisiblePosition is wrong. It's based on the assumption that   offset of VisiblePosition for all positions in an InlineBox are continuous, which is not true for the above example.
Comment 11 Xiaomei Ji 2010-06-04 18:12:46 PDT
Created attachment 57932 [details]
path /w layout test

The patch does not work correctly when there are more than 1 contiguous spaces in the inline element. See commented out test cases in move-left-right.html
Comment 12 Eric Seidel (no email) 2010-06-04 18:26:11 PDT
Attachment 57932 [details] did not build on mac:
Build output: http://webkit-commit-queue.appspot.com/results/3143032
Comment 13 WebKit Review Bot 2010-06-05 02:09:32 PDT
Attachment 57932 [details] did not build on win:
Build output: http://webkit-commit-queue.appspot.com/results/3127056
Comment 14 Xiaomei Ji 2010-06-07 16:36:01 PDT
Created attachment 58093 [details]
path /w layout test

fix compilation error.
large patch size due to test result file.
Comment 15 WebKit Review Bot 2010-06-07 20:24:19 PDT
Attachment 58093 [details] did not build on win:
Build output: http://webkit-commit-queue.appspot.com/results/3208016
Comment 16 mitz 2010-06-16 10:00:54 PDT
Comment on attachment 58093 [details]
path /w layout test

Thanks for your patch! I have some comments on this version:

Does it really make sense to use a word break iterator on a string in visual order? I would think that the word break iterator is meant to be use on a logically-ordered string. For example, for languages where word breaks are based on a dictionary, like Thai, I don’t think this will work.

> +    Node* de = d->documentElement();

Please use a longer name for this variable, like “documentElement”.

> +    while (box) {
> +        node = box->renderer()->node();
> +        // Get strings in visual order.
> +        if (box->direction() == LTR) {
> +            range->setStart(node, box->caretMinOffset(), ec);
> +            startNode ? range->setEnd(node, offset, ec) : range->setEnd(node, box->caretMaxOffset(), ec);

Normally the ternary operator is only used in WebKit when the result is used (for example, as a parameter to a function or assigned to a variable). In other cases, an if statement is used.

> +            SimplifiedBackwardsTextIterator it(range.get());

I am confused by the way you set up the range and use of a backwards text iterator. I am really not sure how adjusting only one end of the range makes sense; the code seems to make some assumptions about how the text iterator works and what it returns first (you never advance the iterator) which it should probably not be making.

> +            string.prepend(it.characters(), it.length());
> +        } else {
> +            startNode ? range->setStart(node, offset, ec) : range->setStart(node, box->caretMinOffset(), ec);
> +            range->setEnd(node, box->caretMaxOffset(), ec);
> +            SimplifiedBackwardsTextIterator it(range.get());
> +            for (int i = 0; i < it.length(); ++i)
> +                string.prepend(it.characters()[i]);

Like I said, I doubt that reversing the string will always give the correct result. Moreover, you cannot reverse a string like this, because it will break surrogate pairs (characters that are encoded by two UTF-16 characters).

> +        prev = previousWordPositionBoundary(string.data(), string.size(), string.size(), DontHaveMoreContext, needMoreContext);

Is it really true that you couldn’t provide more context to the break iterator if needed?

> +VisiblePosition rightWordPosition(const VisiblePosition& c)

A lot of the code in this function is identical or similar to leftWordPosition(). The common code should not be repeated. It should be factored out (probably into file-static functions).

I think it’s great that you’ve added so many tests! Do you know if the results in these tests match the behavior of the native text system on Mac OS X (that is, does it behave like TextEdit?)? What about other platforms’ native text systems?

It’s especially good that you have these tests and the expected results, because I think you need to take a significantly different approach to this bug, and the tests will allow you to verify that the new approach produces the same (or better) results. It seems like the behavior you were going for here is to return the (visually) nearest position to the left (respectively, to the right) that is a word boundary. A naive algorithm could just iterate over word boundaries (logically) forward and backwards along the line, then pick the one nearest to the starting position to the left (resp. to the right). Does that sounds like a correct description of the desired behavior? If so, perhaps the right thing to do would be to start with an implementation of the naive algorithm, then see how it can be made reasonably efficient.
Comment 17 Xiaomei Ji 2010-06-16 12:48:28 PDT
Thanks for the review!
Please see my replies inline.

(In reply to comment #16)
> (From update of attachment 58093 [details])
> Thanks for your patch! I have some comments on this version:
> 
> Does it really make sense to use a word break iterator on a string in visual order? I would think that the word break iterator is meant to be use on a logically-ordered string. For example, for languages where word breaks are based on a dictionary, like Thai, I don’t think this will work.

I also had a bit concern on whether it is correct to use work break iterator on a sting in visual order.

Yes, you are right that work break iterator is meant to be used on a logically-ordered string. 
But is it true that visual order is different from logical order only for RTL languages and those languages use space as word delimiter? If so, the approach is probably ok.

For dictionary based word break language, such as Chinese, since the visual order is the same as the logical order, the approach should work. 
Is Thai different from other dictionary-based-word-break languages?

 
> 
> > +    Node* de = d->documentElement();
> 
> Please use a longer name for this variable, like “documentElement”.
> 
> > +    while (box) {
> > +        node = box->renderer()->node();
> > +        // Get strings in visual order.
> > +        if (box->direction() == LTR) {
> > +            range->setStart(node, box->caretMinOffset(), ec);
> > +            startNode ? range->setEnd(node, offset, ec) : range->setEnd(node, box->caretMaxOffset(), ec);
> 
> Normally the ternary operator is only used in WebKit when the result is used (for example, as a parameter to a function or assigned to a variable). In other cases, an if statement is used.
> 
> > +            SimplifiedBackwardsTextIterator it(range.get());
> 
> I am confused by the way you set up the range and use of a backwards text iterator. I am really not sure how adjusting only one end of the range makes sense; the code seems to make some assumptions about how the text iterator works and what it returns first (you never advance the iterator) which it should probably not be making.
> 
> > +            string.prepend(it.characters(), it.length());
> > +        } else {
> > +            startNode ? range->setStart(node, offset, ec) : range->setStart(node, box->caretMinOffset(), ec);
> > +            range->setEnd(node, box->caretMaxOffset(), ec);
> > +            SimplifiedBackwardsTextIterator it(range.get());
> > +            for (int i = 0; i < it.length(); ++i)
> > +                string.prepend(it.characters()[i]);
> 
> Like I said, I doubt that reversing the string will always give the correct result. Moreover, you cannot reverse a string like this, because it will break surrogate pairs (characters that are encoded by two UTF-16 characters).
> 
> > +        prev = previousWordPositionBoundary(string.data(), string.size(), string.size(), DontHaveMoreContext, needMoreContext);
> 
> Is it really true that you couldn’t provide more context to the break iterator if needed?

I was just making it simple.

> 
> > +VisiblePosition rightWordPosition(const VisiblePosition& c)
> 
> A lot of the code in this function is identical or similar to leftWordPosition(). The common code should not be repeated. It should be factored out (probably into file-static functions).
> 
> I think it’s great that you’ve added so many tests! Do you know if the results in these tests match the behavior of the native text system on Mac OS X (that is, does it behave like TextEdit?)? What about other platforms’ native text systems?

It does not (completely) match native behavior in Windows,  OS X, or Linux.

In Widows, arrow key follows logical order, same for ctrl-arrow key.

In Linux, from my test, looks like ctrl-arrow follows logical order too, although arrow key follows visual order. There is a bug filed on it: https://bugzilla.gnome.org/show_bug.cgi?id=136059

In OS X, if the writing direction is RTL, both arrow and alt-arrow follows visual order for pure RTL text, but alt-arrow does not work correctly for pure English.

> 
> It’s especially good that you have these tests and the expected results, because I think you need to take a significantly different approach to this bug, and the tests will allow you to verify that the new approach produces the same (or better) results. It seems like the behavior you were going for here is to return the (visually) nearest position to the left (respectively, to the right) that is a word boundary. A naive algorithm could just iterate over word boundaries (logically) forward and backwards along the line, then pick the one nearest to the starting position to the left (resp. to the right). Does that sounds like a correct description of the desired behavior? If so, perhaps the right thing to do would be to start with an implementation of the naive algorithm, then see how it can be made reasonably efficient.

Good suggestion.

But since caret positions are not always continuous, especially at the boundary,
then, the problem is how to "pick the one nearest to the starting position".
For example:
<div contenteditable dir=rtl>abc def שנב סטז uvw xyz</div> 

the position will be:
.....(8) (7)a(1)b(2)c(3) (4)d(5)e(6)f(0)

the word boundaries are: 0, 4, 3, 7, 8....

when the caret is between 'a' and 'b', whose position is 1, how to make ctrl-left-arrow pick position 7 instead of position 4?

An alternative is to use VisiblePosition::left() to advance caret and check it against the word break positions.

Another one is to use startOfWord/endOfWord to find the boundary, then, only do further work when the caret is already at the boundary.

Or for ctrl-left-arrow,
1. move caret left by VisiblePosition::left()
2. return startOfWord() if the current box is LTR,
or return endOfWord() if the current box is RTL.
3. plus some tweaks if necessary.
Same applies for ctrl-right-arrow.
Comment 18 Xiaomei Ji 2010-06-18 11:36:02 PDT
> .... It seems like the behavior you were going for here is to return the (visually) nearest position to the left (respectively, to the right) that is a word boundary. A naive algorithm could just iterate over word boundaries (logically) forward and backwards along the line, then pick the one nearest to the starting position to the left (resp. to the right). Does that sounds like a correct description of the desired behavior? ....

Looks like when collecting the word boundaries in one line, we need to start from the begin of the line going (logically) forward and start from the end of the line going (logically) backward, in order to get all word boundaries. (I will call it method 1 for later reference).

Start from the caret position going (logically) forward till the end of line and (logically) backward till the begin of line wont cover all word boundaries.
Given example <div contenteditable dir=rtl>abc ששש def <span dir=ltr>שנב  abc סטז</span>  uvw זזז xyz</div>, when caret is visually at the left of word "ששש", logically forward and backward from the caret position wont cover word boundary between  "סטז" and "uvw".

The problem of collecting word boundaries using method 1 is that the word separator is being treated as a word. Given an example "ab cd", the word boundaries include position both before and after ' '(space). Then, when caret is at the right of ' '(space), ctrl-left-arrow will move the caret at the left of the space, which is not only incorrect (ctrl-left-arrow should move caret at the (visual) begin of (visually) left word), but also very weird for non-RTL involved strings.

That brings the underlying question: since there is no clear simple relationship between logical and visual ordering, should we co-relate them for this case?

BTW, startOfWord() and endOfWord() are *not* visual operation either.
Comment 19 Yair Yogev 2010-09-04 12:03:28 PDT
I want to point out that while the behavior you are trying to accomplish sounds optimal, once implemented WebKit will be the only engine to behave that way.
For that reason, since implementing it seems so complex, maybe as a temp solution you should consider fixing the current broken behavior to line up with the other browsers? It's not optimal but at least it will not feel completely broken for the RTL users (unlike the current situation)
Comment 20 Xiaomei Ji 2010-10-25 16:13:37 PDT
(In reply to comment #19)
> I want to point out that while the behavior you are trying to accomplish sounds optimal, once implemented WebKit will be the only engine to behave that way.
> For that reason, since implementing it seems so complex, maybe as a temp solution you should consider fixing the current broken behavior to line up with the other browsers? It's not optimal but at least it will not feel completely broken for the RTL users (unlike the current situation.

That's the behavior of Firefox (at least for most cases. Maybe a bug or two).
Could you please give an concrete example to show what Firefox compromise?
Comment 21 Yair Yogev 2010-10-25 16:53:34 PDT
My mistake, i guess i was comparing to an old behavior of FF i remembered.

What i meant was that "reversing" the current behavior for RTL (with a similar logic to your fix for Shift+Arrow i guess) would be a big improvement for the users (i think that's the IE behavior currently?)... just in case the desired behavior you are working on runs into setbacks due to its complexity.
Comment 22 Noam Yorav-Raphael 2010-11-16 01:38:27 PST
I encountered this bug, and found this bug report. I understand that it turns out to be really complex to do word movement correctly.

Can I suggest a solution which will make all this much simpler and make me happier as a user? Just don't use visual cursor movement.

I hate visual cursor movement. Why not use the obvious and simple solution - the right arrow key moves logically forward in LTR paragraphs and logically backward in RTL paragraphs, and vice versa for the left arrow?

That's what Windows does, and OpenOffice does, and the new version of Google Docs does; I feel comfortable editing Hebrew texts using those tools. When using tools which try to offer visual movement (Webkit, Firefox, GTK apps), I always get into trouble. If I wrote something like "123 Street" in Hebrew, and want to add a "4" after the number, I never know how to do this: if I move the cursor to the left of the number, is it actually before the number or after it? Not to mention bugs which always arise, when the cursor keeps going in loops around a number over and over, and I have to use the mouse to get out of it (and sometimes even the mouse won't help.) In GTK there's even a "secondary cursor" which is supposed to help in such situations, but I was never able to deduce what exactly it does.

To summarize, visual movement is a problem that's VERY hard to solve correctly (maybe that's the reason that programmers are attracted to it), and there's NO REASON to solve, since if the algorithm is so complex, how am I, as a user, supposed to predict what it's going to do? Just give me predictable, simple logical movement, and I would be happy.

I know that it must be hard to throw away all the work spent on the intricacies of visual movement, but that's what should be done, in my opinion.

Thanks for reading this,
Noam
Comment 23 Ram Rachum 2010-11-19 05:43:38 PST
(Another Hebrew speaker here who has been frustrated by this bug for too long.)

I think Noam's comment is dead-on: Logical movement sounds like the right solution to me.
Comment 24 Jeremy Moskovich 2010-11-19 06:01:37 PST
This bug is not the right forum to discuss whether WebKit should use logical or visual caret movements.

Please refrain from posting "me too" comments and non-technical opinions.  This only serves to hide the technical content in this bug and makes it harder for developers to work on fixing this issue.

And for what it's worth, I agree with you, this bug is annoying and I'd also like to see it fixed ASAP.
Comment 25 Noam Yorav-Raphael 2010-11-21 06:00:16 PST
What is the right forum to discuss that? Because it will fix this bug (and also improve WebKit behavior in other editing situations.)
Comment 26 Xiaomei Ji 2011-01-10 17:31:29 PST
Created attachment 78476 [details]
patch w/ layout test

I tried different approaches, but seems this brutal force way works the best: it is the easiest, cleanest, and closest to complete correctness.

The patch computes word boundary box by box in order to save the correct word break position either before a space or after a space (if space is used as word breaker), so <div>who<span>ever</span><div> might be considered as 2 words "who" and "ever".

The patch works for most cases, but does not work well in complex structure such as block with inline element that having different directionality. It does not work well in 2 aspects:
1. word break might stop at both positions before and after a space in some case.
For example: <div dir=rtl class="test_move_by_word" contenteditable>abc ששש def <span dir=ltr>שנב  abc סטז</span>  uvw זזז xyz</div>
press ctrl-right-arrow when cursor is before "שנב abc" moves cursor after "שנב", press it again moves cursor after '  '(space).

2. word break might stop at the wrong position around space in some case. 
For example: <div dir=rtl class="test_move_by_word" contenteditable>abc ששש def <span dir=ltr>שנב  abc סטז</span>  uvw זזז xyz</div>
press ctrl-right-arrow when cursor is before "def שנב" moves cursor after '  '(space).
Comment 27 WebKit Review Bot 2011-01-10 17:45:31 PST
Attachment 78476 [details] did not build on chromium:
Build output: http://queues.webkit.org/results/7473072
Comment 28 Early Warning System Bot 2011-01-10 17:52:32 PST
Attachment 78476 [details] did not build on qt:
Build output: http://queues.webkit.org/results/7409111
Comment 29 Xiaomei Ji 2011-01-11 10:42:01 PST
Created attachment 78549 [details]
patch w/ layout test

fix compilation error in chromium and qt (efl error seems not related).


I tried different approaches, but seems this brutal force way works the best: it is the easiest, cleanest, and closest to complete correctness.

The patch computes word boundary box by box in order to save the correct word break position either before a space or after a space (if space is used as word breaker), so <div>who<span>ever</span><div> might be considered as 2 words "who" and "ever".

The patch works for most cases, but does not work well in complex structure such as block with inline element that having different directionality. It does not work well in 2 aspects:
1. word break might stop at both positions before and after a space in some case.
For example: <div dir=rtl class="test_move_by_word" contenteditable>abc ששש def <span dir=ltr>שנב  abc סטז</span>  uvw זזז xyz</div>
press ctrl-right-arrow when cursor is before "שנב abc" moves cursor after "שנב", press it again moves cursor after '  '(space).

2. word break might stop at the wrong position around space in some case. 
For example: <div dir=rtl class="test_move_by_word" contenteditable>abc ששש def <span dir=ltr>שנב  abc סטז</span>  uvw זזז xyz</div>
press ctrl-right-arrow when cursor is before "def שנב" moves cursor after '  '(space).
Comment 30 mitz 2011-02-13 21:41:30 PST
Comment on attachment 78549 [details]
patch w/ layout test

View in context: https://bugs.webkit.org/attachment.cgi?id=78549&action=review

> Source/WebCore/ChangeLog:22
> +        * editing/SelectionController.cpp:
> +        (WebCore::SelectionController::modifyExtendingRight):
> +        (WebCore::SelectionController::modifyMovingRight):
> +        (WebCore::SelectionController::modifyExtendingLeft):
> +        (WebCore::SelectionController::modifyMovingLeft):
> +        * editing/SelectionController.h:
> +        * editing/htmlediting.cpp:
> +        (WebCore::directionOfEnclosingBlock):
> +        * editing/htmlediting.h:
> +        * editing/visible_units.cpp:
> +        (WebCore::visiblePositionComparison):
> +        (WebCore::wordBoundary):
> +        (WebCore::leftWordPosition):
> +        (WebCore::rightWordPosition):
> +        * editing/visible_units.h:

The change log should describe the changes, not just list the changed functions.

> Source/WebCore/editing/SelectionController.cpp:-354
> -TextDirection SelectionController::directionOfEnclosingBlock()
> -{
> -    Node* enclosingBlockNode = enclosingBlock(m_selection.extent().node());
> -    if (!enclosingBlockNode)
> -        return LTR;
> -    RenderObject* renderer = enclosingBlockNode->renderer();
> -    if (renderer)
> -        return renderer->style()->direction();
> -    return LTR;
> -}
> -

Seems useful to leave this function here (perhaps changed to call the new one in htmlediting) rather than change all call sites.

> Source/WebCore/editing/htmlediting.cpp:335
> +TextDirection directionOfEnclosingBlock(Node* n)

It’s better to use a word like “node” rather than a single letter for a variable name.

> Source/WebCore/editing/htmlediting.h:100
> +// TextDirection on Node's enclosing block
> +TextDirection directionOfEnclosingBlock(Node*);

The function’s name makes the comment redundant. Can it take a const Node*?

> Source/WebCore/editing/visible_units.cpp:1221
> +enum WORDDIRECTION {
> +    LEFTWORD,
> +    RIGHTWORD
> +};

Enums and their values should be in “CamelCase”, not uppercase. How does this enum differ from TextDirection?

> Source/WebCore/editing/visible_units.cpp:1223
> +static VisiblePosition wordBoundary(enum WORDDIRECTION direction, const VisiblePosition& c)

No need to qualify WORDDIRECTION with the “enum” keyword. “c” is a poor name for a VisiblePosition. How about “visiblePosition” or “position”?

> Source/WebCore/editing/visible_units.cpp:1233
> +        VisiblePosition wordBreak = ((direction == RIGHTWORD && box->direction() == LTR) || (direction == LEFTWORD && box->direction() == RTL) ? nextWordPosition(c) : previousWordPosition(c));

For example, here, if you used TextDirection instead of WORDDIRECTION you could have just written this expression as
    (direction == boxDirection) ? nextWordPosition(c) : previousWordPosition(c);

> Source/WebCore/editing/visible_units.cpp:1236
> +        if (offset != boxOfWordBreak->caretMaxOffset() && offset != boxOfWordBreak->caretMinOffset() && box == boxOfWordBreak)

You’d want to move the box == boxOfWorkdBreak condition to the beginning, to avoid unnecessary calls to caret{Max,Min}Offset(). If you do this, then you’ll be getting box->caret{Max,Min}Offset(). Since you’ve already gotten them in line 1232, I suggest using local variables for them.

> Source/WebCore/editing/visible_units.cpp:1242
> +    Vector<VisiblePosition> wordBoundaries;

You should give this Vector some inline capacity to avoid heap allocation in most cases.

> Source/WebCore/editing/visible_units.cpp:1256
> +        Node* node = renderer->node();
> +        if (!node)
> +            break;

Can we do better than bail out in this case? How does this affect lines starting or ending with generated content?

> Source/WebCore/editing/visible_units.cpp:1264
> +        if ((direction == RIGHTWORD && directionOfBlock != box->direction())
> +            || (direction == LEFTWORD && directionOfBlock == box->direction()))
> +            wordBoundaries.append(previousPosition); 

This appends previousPosition to wordBoundaries but we don’t know yet that previousPosition is a word boundary. Why is this okay?

> Source/WebCore/editing/visible_units.cpp:1267
> +        while (1) {

WebKit style is to use while (true)

> Source/WebCore/editing/visible_units.cpp:1305
> +    std::sort(wordBoundaries.begin(), wordBoundaries.end(), visiblePositionComparison);
> +
> +    VisiblePosition previousPosition = c;
> +    while (true) {
> +        VisiblePosition position = (direction == RIGHTWORD ? previousPosition.right(true) : previousPosition.left(true));
> +        if (std::binary_search(wordBoundaries.begin(), wordBoundaries.end(), position, visiblePositionComparison))
> +            return position;
> +        if (position == previousPosition)
> +            return position;
> +        previousPosition = position;
> +    }

This doesn’t look very efficient (and I am not convinced it’s correct). I would think that you could collect all word boundaries as (InlineBox*, offset) pairs (already ordered by box from left to right), find the nearest box to c in the direction you’re going, then find the word boundary offset nearest to c in that box (depending on the direction of that box). You’ll need to discard boundaries of the wrong kind (i.e. if going to the right, you want to keep boundaries where the word is to the left and the space is to the right).

> Source/WebCore/editing/visible_units.cpp:1309
> +VisiblePosition leftWordPosition(const VisiblePosition& c)

Please consider renaming “c”.

> Source/WebCore/editing/visible_units.cpp:1314
> +VisiblePosition rightWordPosition(const VisiblePosition& c)

Ditto.

> Source/WebCore/editing/visible_units.h:45
> +VisiblePosition rightWordPosition(const VisiblePosition &);
> +VisiblePosition leftWordPosition(const VisiblePosition &);

There shouldn’t be spaces before the ampersands here.
Comment 31 Xiaomei Ji 2011-02-16 09:05:04 PST
Thanks for the review and comments!


(In reply to comment #30)
> (From update of attachment 78549 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=78549&action=review
> 
> > Source/WebCore/editing/htmlediting.h:100
> > +// TextDirection on Node's enclosing block
> > +TextDirection directionOfEnclosingBlock(Node*);
> 
> The function’s name makes the comment redundant. Can it take a const Node*?

All underlying functions (including Position constructor) take a non-const Node*.

> 
> > Source/WebCore/editing/visible_units.cpp:1221
> > +enum WORDDIRECTION {
> > +    LEFTWORD,
> > +    RIGHTWORD
> > +};
> 
> Enums and their values should be in “CamelCase”, not uppercase. How does this enum differ from TextDirection?

This is only used to differentiate whether we are moving cursor left or right. It is introduced so that the leftWordPosition() and rightWordPosition() could share a common function. 

> 
> > Source/WebCore/editing/visible_units.cpp:1233
> > +        VisiblePosition wordBreak = ((direction == RIGHTWORD && box->direction() == LTR) || (direction == LEFTWORD && box->direction() == RTL) ? nextWordPosition(c) : previousWordPosition(c));
> 
> For example, here, if you used TextDirection instead of WORDDIRECTION you could have just written this expression as
>     (direction == boxDirection) ? nextWordPosition(c) : previousWordPosition(c);


It could. The problem is the meaning of the enum.

> 
> > Source/WebCore/editing/visible_units.cpp:1236
> > +        if (offset != boxOfWordBreak->caretMaxOffset() && offset != boxOfWordBreak->caretMinOffset() && box == boxOfWordBreak)
> 
> You’d want to move the box == boxOfWorkdBreak condition to the beginning, to avoid unnecessary calls to caret{Max,Min}Offset(). If you do this, then you’ll be getting box->caret{Max,Min}Offset(). Since you’ve already gotten them in line 1232, I suggest using local variables for them.

They are accessing different boxes, local variables wont help here.

> 
> > Source/WebCore/editing/visible_units.cpp:1256
> > +        Node* node = renderer->node();
> > +        if (!node)
> > +            break;
> 
> Can we do better than bail out in this case? How does this affect lines starting or ending with generated content?

I guess this should be a "continue" not a "break".


> 
> > Source/WebCore/editing/visible_units.cpp:1264
> > +        if ((direction == RIGHTWORD && directionOfBlock != box->direction())
> > +            || (direction == LEFTWORD && directionOfBlock == box->direction()))
> > +            wordBoundaries.append(previousPosition); 
> 
> This appends previousPosition to wordBoundaries but we don’t know yet that previousPosition is a word boundary. Why is this okay?


You are right. The code assumes box boundary as word breaker, so <div>who<span>ever</span><div> might be considered as 2 words "who" and "ever".
I will see whether I can get rid of it.

> 
> > Source/WebCore/editing/visible_units.cpp:1305
> > +    std::sort(wordBoundaries.begin(), wordBoundaries.end(), visiblePositionComparison);
> > +
> > +    VisiblePosition previousPosition = c;
> > +    while (true) {
> > +        VisiblePosition position = (direction == RIGHTWORD ? previousPosition.right(true) : previousPosition.left(true));
> > +        if (std::binary_search(wordBoundaries.begin(), wordBoundaries.end(), position, visiblePositionComparison))
> > +            return position;
> > +        if (position == previousPosition)
> > +            return position;
> > +        previousPosition = position;
> > +    }
> 
> This doesn’t look very efficient (and I am not convinced it’s correct). I would think that you could collect all word boundaries as (InlineBox*, offset) pairs (already ordered by box from left to right), find the nearest box to c in the direction you’re going, then find the word boundary offset nearest to c in that box (depending on the direction of that box). You’ll need to discard boundaries of the wrong kind (i.e. if going to the right, you want to keep boundaries where the word is to the left and the space is to the right).
> 

Good suggestion. I will try.
Comment 32 Xiaomei Ji 2011-02-16 09:07:01 PST
> > > Source/WebCore/editing/visible_units.cpp:1221
> > > +enum WORDDIRECTION {
> > > +    LEFTWORD,
> > > +    RIGHTWORD
> > > +};
> > 
> > Enums and their values should be in “CamelCase”, not uppercase. How does this enum differ from TextDirection?
> 
> This is only used to differentiate whether we are moving cursor left or right. It is introduced so that the leftWordPosition() and rightWordPosition() could share a common function. 
> 
> > 
> > > Source/WebCore/editing/visible_units.cpp:1233
> > > +        VisiblePosition wordBreak = ((direction == RIGHTWORD && box->direction() == LTR) || (direction == LEFTWORD && box->direction() == RTL) ? nextWordPosition(c) : previousWordPosition(c));
> > 
> > For example, here, if you used TextDirection instead of WORDDIRECTION you could have just written this expression as
> >     (direction == boxDirection) ? nextWordPosition(c) : previousWordPosition(c);
> 
> 
> It could. The problem is the meaning of the enum.
> 

Think it over again, I feel it is reasonable that they share the same enum.
Comment 33 mitz 2011-02-21 22:32:18 PST
(In reply to comment #32)
> > > > Source/WebCore/editing/visible_units.cpp:1221
> > > > +enum WORDDIRECTION {
> > > > +    LEFTWORD,
> > > > +    RIGHTWORD
> > > > +};
> > > 
> > > Enums and their values should be in “CamelCase”, not uppercase. How does this enum differ from TextDirection?
> > 
> > This is only used to differentiate whether we are moving cursor left or right. It is introduced so that the leftWordPosition() and rightWordPosition() could share a common function. 
> > 
> > > 
> > > > Source/WebCore/editing/visible_units.cpp:1233
> > > > +        VisiblePosition wordBreak = ((direction == RIGHTWORD && box->direction() == LTR) || (direction == LEFTWORD && box->direction() == RTL) ? nextWordPosition(c) : previousWordPosition(c));
> > > 
> > > For example, here, if you used TextDirection instead of WORDDIRECTION you could have just written this expression as
> > >     (direction == boxDirection) ? nextWordPosition(c) : previousWordPosition(c);
> > 
> > 
> > It could. The problem is the meaning of the enum.
> > 
> 
> Think it over again, I feel it is reasonable that they share the same enum.

Actually, I see your point about TextDirection being inappropriate. This should use the pre-existing SelectionDirection values DirectionLeft and DirectionRight.
Comment 34 Xiaomei Ji 2011-03-03 19:04:36 PST
Created attachment 84680 [details]
patch w/ layout test
Comment 35 WebKit Review Bot 2011-03-03 19:11:11 PST
Attachment 84680 [details] did not build on chromium:
Build output: http://queues.webkit.org/results/8084494
Comment 36 Early Warning System Bot 2011-03-03 23:42:32 PST
Attachment 84680 [details] did not build on qt:
Build output: http://queues.webkit.org/results/8088294
Comment 37 Xiaomei Ji 2011-03-04 09:51:40 PST
Created attachment 84764 [details]
patch w/ layout test

fix the compilation error in cr-linux and qt.
Comment 38 Xiaomei Ji 2011-03-15 09:26:10 PDT
Dan,

Could you review and see whether this is in the right direction? Thanks.
Comment 39 Eric Seidel (no email) 2011-03-28 00:45:59 PDT
I think we should consider landing the test cases first w/o any of the code changes to document our current (broken) behavior.

I also worry that we may be trying to fix too much in one go.  Some users commented above that there may be simpler solutions which would be "good enough" for a first pass, and we could continue to iterate from there.

End result: I think we should do everything possible to make this change in several smaller pieces.  Landing the layout tests w/o any code changes is one easy piece to break off.
Comment 40 Eric Seidel (no email) 2011-03-28 01:00:18 PDT
It's not clear to me if this is fixing both control-arrow movements as well as normal arrow movements? (Or if both are even "broken" as some of the posters seem to indicate.)
Comment 41 Xiaomei Ji 2011-03-28 01:03:44 PDT
(In reply to comment #40)
> It's not clear to me if this is fixing both control-arrow movements as well as normal arrow movements? (Or if both are even "broken" as some of the posters seem to indicate.)

This is fixing control-arrow only.
arrow works as expected (arrow key moves cursor by character in visual order, except few edge case.). It was fixed by mitz a while ago in bug 3729.
Comment 42 Xiaomei Ji 2011-03-28 04:28:57 PDT
(In reply to comment #39)
> I think we should consider landing the test cases first w/o any of the code changes to document our current (broken) behavior.


The problem is currently, the behavior is ctrl-arrow moves caret by word in logical order.
So, if we land the test case first, only the pure English in LTR block will behaves as expected (in which case, the visual order is the same as logical order). All other cases (RTL text in LTR block, LTR or RTL tet in RTL block, bidi text in either LTR block or RTL block) will be wrong. And we will need to replace all those results when landing the correct behavior.

> 
> I also worry that we may be trying to fix too much in one go.  Some users commented above that there may be simpler solutions which would be "good enough" for a first pass, and we could continue to iterate from there.

Well, I am not quite sure how to split the current patch in to different small ones.
One tweak I can do is that: for the pure RTL text in LTR block or pure LTR text in RTL block, since the logical order is exactly the opposite of visual order, having a short cut to get the visual order by getting reverse order of logical orders. 
I need to check whether that taking care of the spaces correctly (as to where should stop -- right of space or left of space when space is used as word breaker).

And, this approach wont work for bidi text at all, and it is totally different approach from current patch.


> 
> End result: I think we should do everything possible to make this change in several smaller pieces.  Landing the layout tests w/o any code changes is one easy piece to break off.
Comment 43 Eric Seidel (no email) 2011-03-28 04:30:36 PDT
(In reply to comment #42)
> (In reply to comment #39)
> > I think we should consider landing the test cases first w/o any of the code changes to document our current (broken) behavior.
> 
> 
> The problem is currently, the behavior is ctrl-arrow moves caret by word in logical order.
> So, if we land the test case first, only the pure English in LTR block will behaves as expected (in which case, the visual order is the same as logical order). All other cases (RTL text in LTR block, LTR or RTL tet in RTL block, bidi text in either LTR block or RTL block) will be wrong. And we will need to replace all those results when landing the correct behavior.

That's fine.  We commonly land test suites with failures and then fix the failures later.
Comment 44 Xiaomei Ji 2011-03-28 04:47:25 PDT
Hi Dan,

I talked with Uri about the patch.
Basically, high-level wise, the patch follows what Firefox does as following (extracted from Uri's email).
One of the differences is that Firefox uses space as word breaker while WebKit uses word-breaker which is a logical word breaker.

As to the bidi-level, Uri mentioned that it might be a good start to only consider bidi level  0, 1, and 2. 
I am currently ignoring bidi-level (like assuming bidi level are 0 or 1), which from Uri might be ok as the first step.

Other questions are:
1. The Mac's  native behavior is control-arrow moves caret in logical order (although it is really broken if you try a bidi text that logically begins with English word, then, Hebrew word, then English word. ctrl-right-arrow will be in dead-loop when there are 4 Hebrew words). If you want to keep Mac platform's behavior the same as native's, I can make this patch only work for other platforms.

2. Currently, Firefox in Mac and Windows behaves differently (by default, and is controlled by a config):
In Windows, ctrl-arrow always moves caret to the left boundary of a word in LTR block, and ctrl-arrow always moves caret to the right boundary of a word in RTL block.
In Mac, ctrl-left-arrow moves caret to the left boundary of a word, while  ctrl-right-arrow moves caret to the right boundary of a word.
If we are going to provide a cross-platform behavior as the first step, which behavior we should follow?


FYI: following is the email from Uri about how Firefox implemented it:

"The Mozilla source code for detecting where to stop when handling word-by-word movement is (mostly) at http://mxr.mozilla.org/mozilla-central/source/layout/generic/nsFrame.cpp#5461.

.....

Background: It's important to understand Mozilla's rendering code's concept of "frame". For our purposes the most important type of frame is TextFrame. The key is that a TextFrame is always unidirectional. So a bidi HTML inline element is broken into multiple frames (non-bidi elements could also span a fe frames - as a frame can never span multiple lines).

The loop you see at lines 5492-5523 loops over frames (as long as there's need to continue), and for each (unidirectional!) frame, calls PeekOffsetWord(), which looks for an appropriate stopping position (roughly, a word boundary) within that frame. The outer loop uses GetFrameFromDirection(), which iterates over frames visually. For each frame encountered, IsMovingInFrameDirection() decides (roughly) weather the curent frame has to be searched forward from its beginning or backwards from its end - based on the direction of movement and the direction (LTR/RTL) of the frame and the containing block element. The output of this is then passed to PeekOffsetWord() which searches the frame accordingly.

The net effect of all of this is that the text is searched, letter-by-letter, in visual order, for an appropriate place to break."
Comment 45 Xiaomei Ji 2011-03-28 05:57:18 PDT
Created attachment 87131 [details]
just  test case
Comment 46 Eric Seidel (no email) 2011-03-28 06:14:10 PDT
Comment on attachment 87131 [details]
just  test case

View in context: https://bugs.webkit.org/attachment.cgi?id=87131&action=review

This seems reasonable to me.  Landing this first will make it very easy to see when we change behavior.

How do the expected results from this test match up with Firefox's behavior?  IE's behavior?

> LayoutTests/editing/selection/move-left-right-expected.txt:302
> +WARNING: Moved in the wrong direction in step 1: from (791, 574) to (701, 574).

This should probably say FAIL, no?  I geuss you didn't write this test, you just added to it.
Comment 47 Yair Yogev 2011-03-28 06:19:56 PDT
(In reply to comment #39)

>Some users commented above that there may be simpler solutions which would be "good enough" for a first pass, and we could continue to iterate from there.

That "good enough" (and easier) approach is to line up with IE instead of FF, but unfortunately that's not going to speedup the process of lining up with FF.
Comment 48 Eric Seidel (no email) 2011-03-28 06:38:33 PDT
Xiaomei commented to me in person that one of the ways that FF gets to their superior behavior is by "cheating" and defining word boundaries as spaces.  Which is not true for languages like Japanese (which don't have spaces) and thus require use of a heuristic (like the one provided by ICU).

At least I think I'm recalling the conversation with Xiamoei correctly.  She should correct me if I'm wrong! :)

It might be useful to consider "cheating" like FF for the complex cases if that will make our behavior much better.  I don't know.
Comment 49 Xiaomei Ji 2011-03-28 07:34:36 PDT
(In reply to comment #48)
> Xiaomei commented to me in person that one of the ways that FF gets to their superior behavior is by "cheating" and defining word boundaries as spaces.  Which is not true for languages like Japanese (which don't have spaces) and thus require use of a heuristic (like the one provided by ICU).
> 
> At least I think I'm recalling the conversation with Xiamoei correctly.  She should correct me if I'm wrong! :)
> 
> It might be useful to consider "cheating" like FF for the complex cases if that will make our behavior much better.  I don't know.

Firefox implements control-arrow moves caret by word visually.
It treats space as word breaker, which is not completely right for CJK languages, and it simplifies the process a bit. But simply using space as word boundary is not the reason that they achieve the superior behavior. They did a good job, I have to say. 

We use word breaker which break words logically, which made it a bit complicated than what Firefox does, but I do not think we should move backward.
Comment 50 Xiaomei Ji 2011-04-15 16:53:41 PDT
Comment on attachment 87131 [details]
just  test case

clear flag since it is obsolete. We are implementing this incrementally.
Comment 51 Joanmarie Diggs (irc: joanie) 2011-05-02 15:49:19 PDT
Mario:

I took a quick look at the test case, and the fact that in webkitgtk you move in one direction by char and the opposite direction by word in rtl languages is really disturbing.

How this shall impact what we get from object:text-caret-moved and object:text-selection-changed signals is something I just promised some people in #webkit that you and I would look into. ;-) I'll buy the beers next week. ;-) Thanks!!
Comment 52 Mario Sanchez Prada 2011-05-04 03:05:29 PDT
(In reply to comment #51)
> Mario:
> 
> I took a quick look at the test case, and the fact that in webkitgtk you move 
> in one direction by char and the opposite direction by word in rtl languages is 
> really disturbing.

Joanie, I took a look to the test case with GtkLauncher and I'm afraid I don't get the problem properly. Could you post here some "how to reproduce the problem for dummies" steps so I can see the problem myself? :-)

Thank you!

> How this shall impact what we get from object:text-caret-moved and object:text-
> selection-changed signals is something I just promised some people in #webkit 
> that you and I would look into. ;-) I'll buy the beers next week. ;-) Thanks!!

Other option is to tackle this next week, but if you could provide the steps I could probably spend some time not taking a look to it, hopefully saving time from the following week for other stuff.
Comment 53 Joanmarie Diggs (irc: joanie) 2011-05-04 05:28:15 PDT
(In reply to comment #52)

> Joanie, I took a look to the test case with GtkLauncher and I'm afraid I don't get the problem properly. Could you post here some "how to reproduce the problem for dummies" steps so I can see the problem myself? :-)

GtkLauncher has keyboard nav? Maybe I need my own dummies guide. ;-)

Anyhoo:

1. In Epiphany, view this: https://bugs.webkit.org/attachment.cgi?id=29619

2. Press Tab to give focus to one of the lines.

3. Press Left Arrow (result: moves to the left)

4. Press Ctrl Left Arrow (result: moves to the right)

5. Press Right Arrow (result: moves to the right)

6. Press Ctrl Right Arrow (result: moves to the left)

I, of course, defer to readers of RTL languages to determine what direction right and left should move with respect to RTL text. However, the fact that Left and Ctrl+Left move in opposite directions, and that Right and Ctrl+Right move in opposite directions surely cannot be correct.

So those working on the bug are straightening that out, and will do so for WebKitGtk too (as I understand it). This will include *nix-unique caret-positioning (not yet addressed).

All we have to do (I believe) is be sure that the change doesn't have unexpected side effects w.r.t. a11y within WebKitGtk. Secondly, I am assuming that we're already doing the correct thing with respect to text and selection events for rtl languages, but it's probably worth verifying that.

Hope this helps!
Comment 54 Amir E. Aharoni 2011-05-04 05:42:53 PDT
(In reply to comment #53)
> I, of course, defer to readers of RTL languages to determine what direction right and left should move with respect to RTL text.

I write Hebrew and the answer is very easy:

Right and Ctrl-Right must go to the right.

Left and Ctrl-Left must go to the left.
Comment 55 Mario Sanchez Prada 2011-05-04 06:53:32 PDT
(In reply to comment #53)
> (In reply to comment #52)
> 
> > Joanie, I took a look to the test case with GtkLauncher and I'm afraid I don't get the problem properly. Could you post here some "how to reproduce the problem for dummies" steps so I can see the problem myself? :-)
> 
> GtkLauncher has keyboard nav? Maybe I need my own dummies guide. ;-)

Now it hasn't keyboard navigation, so you're not crazy. I just added some sloppy code to enable it there. Perhaps I should think of adding it in a more persistent way to the app :-)

> Anyhoo:
> 
> 1. In Epiphany, view this: https://bugs.webkit.org/attachment.cgi?id=29619
> 
> 2. Press Tab to give focus to one of the lines.
> 
> 3. Press Left Arrow (result: moves to the left)
> 
> 4. Press Ctrl Left Arrow (result: moves to the right)
> 
> 5. Press Right Arrow (result: moves to the right)
> 
> 6. Press Ctrl Right Arrow (result: moves to the left)
> 
> I, of course, defer to readers of RTL languages to determine what direction 
> right and left should move with respect to RTL text. However, the fact that 
> Left and Ctrl+Left move in opposite directions, and that Right and Ctrl+Right 
> move in opposite directions surely cannot be correct.

Agreed, even if I'm not an expert on this, it certainly looks wrong.

> So those working on the bug are straightening that out, and will do so for
> WebKitGtk too (as I understand it). This will include *nix-unique caret-
> positioning (not yet addressed).

Yep, I think I understood the same as per reading all the comments so far in this bug.

> All we have to do (I believe) is be sure that the change doesn't have 
> unexpected side effects w.r.t. a11y within WebKitGtk. Secondly, I am assuming 
> that we're already doing the correct thing with respect to text and selection 
> events for rtl languages, but it's probably worth verifying that.

Which kind of side effects are you thinking of? Asking this because for me this bidi-text stuff is something relatively new and I'm not sure atm what kind of problems we could expect in this regard, apart from the obvious ones (moving in the opposite direction)

> Hope this helps!

My main doubt about your previous comment was that you mentioned that sometimes it moved by character and sometimes by word, which is something I could not reproduce yet.

Thanks for the detailed response!
Comment 56 David Wajc 2012-01-07 13:35:18 PST
At the risk of sounding simple-minded: It seems that ctrl-shit-left and ctrl-shift-right work correctly both in ltr and rtl, thanks to Xiaomei Ji's fixes of this bug: https://bugs.webkit.org/show_bug.cgi?id=24303. Why not use the exact same code for ctrl-left and ctrl-right, just without highlighting and selecting the text? As a native Hebrew speaker, I'm confident in saying that this is the exact behavior expected of ctrl-arrow.
Comment 57 Xiaomei Ji 2012-01-12 14:18:16 PST
(In reply to comment #56)
> At the risk of sounding simple-minded: It seems that ctrl-shit-left and ctrl-shift-right work correctly both in ltr and rtl, thanks to Xiaomei Ji's fixes of this bug: https://bugs.webkit.org/show_bug.cgi?id=24303. Why not use the exact same code for ctrl-left and ctrl-right, just without highlighting and selecting the text? As a native Hebrew speaker, I'm confident in saying that this is the exact behavior expected of ctrl-arrow.

ctrl-shift-arrow is logical operation.
ctrl-arrow currently is logical operation, and we are changing it to visual operation, which is this bug is about.
Comment 58 David Wajc 2012-01-16 10:34:26 PST
Xiaomei Ji,

While ctrl-shift-arrow might be supposed to be a logical operation, it currently performs as a visual operation. That is, ctrl-shift-right jumps (and marks) a whole word to the right and ctrl-shift-left jumps (and marks) a whole word to the left, irrespective of language choice (LTR or RTL). This is exactly what ctrl-right and ctrl-left should do (just without marking the text). As you fixed shift-arrow, causing this behavior of ctrl-shift-arrow, I imagine you are most familiar with the relevant code, so as to give ctrl-arrow this required behavior. Please help!
Comment 59 Xiaomei Ji 2012-01-17 13:31:48 PST
(In reply to comment #58)
> Xiaomei Ji,
> 
> While ctrl-shift-arrow might be supposed to be a logical operation, it currently performs as a visual operation. That is, ctrl-shift-right jumps (and marks) a whole word to the right and ctrl-shift-left jumps (and marks) a whole word to the left, irrespective of language choice (LTR or RTL). This is exactly what ctrl-right and ctrl-left should do (just without marking the text). As you fixed shift-arrow, causing this behavior of ctrl-shift-arrow, I imagine you are most familiar with the relevant code, so as to give ctrl-arrow this required behavior. Please help!

if you type in a bi-directional text, you will see that ctrl-shift-arrow is selecting word in logical order.
For example, logical text "abc def HIJ OPQ uvw xyz", visual display is "abc def JIH  QPO uvw xyz|" (capital letter represents Hebrew character).
if you continuously press ctrl-shift-left-arrow, the selected words will be
"xyz", "uvw xyz", "JIH uvw xyz", "JIH QPO uvw xyz"....
this is a selection in logical order.

And yes you are right that we are fixing the ctrl-arrow to be in visual order.
It is currently enabled in Chromium dev.
You can refer http://code.google.com/p/chromium/issues/detail?id=10741#c46 for detail.