Bug 78856 - visual word movement: Using ICU break iterator to simplify implementation
Summary: visual word movement: Using ICU break iterator to simplify implementation
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: HTML Editing (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks: 25298
  Show dependency treegraph
 
Reported: 2012-02-16 16:29 PST by Xiaomei Ji
Modified: 2012-03-16 01:27 PDT (History)
5 users (show)

See Also:


Attachments
patch w/ layout test (65.52 KB, patch)
2012-02-16 17:25 PST, Xiaomei Ji
no flags Details | Formatted Diff | Diff
patch w/ layout test (72.45 KB, patch)
2012-02-21 12:14 PST, Xiaomei Ji
no flags Details | Formatted Diff | Diff
WIP (fix compilation error) (72.45 KB, patch)
2012-02-21 15:11 PST, Xiaomei Ji
no flags Details | Formatted Diff | Diff
wip (72.45 KB, patch)
2012-02-24 11:09 PST, Xiaomei Ji
no flags Details | Formatted Diff | Diff
WIP (67.48 KB, patch)
2012-03-12 17:52 PDT, Xiaomei Ji
no flags Details | Formatted Diff | Diff
patch w/ layout test (82.48 KB, patch)
2012-03-14 16:43 PDT, Xiaomei Ji
rniwa: review-
Details | Formatted Diff | Diff
patch w/ layout test (78.35 KB, patch)
2012-03-15 12:22 PDT, Xiaomei Ji
no flags Details | Formatted Diff | Diff
patch w/ layout test (78.38 KB, patch)
2012-03-15 14:27 PDT, Xiaomei Ji
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Xiaomei Ji 2012-02-16 16:29:22 PST
Using ICU break iterator to simplify implementation of visual word movement.
Comment 1 Xiaomei Ji 2012-02-16 17:25:54 PST
Created attachment 127480 [details]
patch w/ layout test

1st version brutal force computation. it reduces the complexity of old implementation in the cost of performance.
Comment 2 Xiaomei Ji 2012-02-16 17:30:26 PST
(In reply to comment #1)
> Created an attachment (id=127480) [details]
> patch w/ layout test
> 
> 1st version brutal force computation. it reduces the complexity of old implementation in the cost of performance.

I need to check how to get a pretty diff. The old code are almost completely replaced.
Comment 3 Alexey Proskuryakov 2012-02-16 23:44:20 PST
> it reduces the complexity of old implementation in the cost of performance.

This patch breaks regression tests. Is your plan to also make sure that nothing breaks, in addition to not regressing performance?
Comment 4 Ryosuke Niwa 2012-02-17 00:22:07 PST
Comment on attachment 127480 [details]
patch w/ layout test

Ugh... the diff in visible_units.cpp is hard to read. Could you insert the new code elsewhere in the file so that it won't get mangled with the deleted code?
Comment 5 Ryosuke Niwa 2012-02-17 00:23:26 PST
(In reply to comment #3)
> This patch breaks regression tests. Is your plan to also make sure that nothing breaks, in addition to not regressing performance?

My expectation is that there will be no correctness regressions. I'm not too concerned about the performance here as this code is only used for user-initiated visual word movement unlike startOfWord and endOfWord which are used in spell checking, etc...
Comment 6 Xiaomei Ji 2012-02-21 12:14:07 PST
Created attachment 128018 [details]
patch w/ layout test

Shuffled the code to the beginning of the file for readability only.

In reply to comment #3)
> > it reduces the complexity of old implementation in the cost of performance.
> 
> This patch breaks regression tests. Is your plan to also make sure that nothing breaks, in addition to not regressing performance?

You are talking about the regression in move-by-word-visually-single-space-inline-element.html, right?
There are 5 regressions and 5 progressions in this file.

There are 2 cases of the regression, both because I used logical word break to imply visual word break, which might not be correct. But except using space as word segmentor (which is totally wrong for languages do not use space as word segmentor, such as CJK languages), I could not think of any solution yet.

The first case is the regression of test 11, 13, and 17.
Using test 11 as an example, the logical text is: <div dir=rtl>ABC DEF <span dir=ltr>abc def</span>OPQ RST</div>.
Logically, position after "def|" is not a word breaker.
The text is visually displayed as "TSR QPOabc def FED CBA". When cursor is at "TSR QPOabc def| FED CBA", since we use logical word break position to imply visual word break position, that position is not considered as word breaker logically, so it wont be considered as word breaker visually.

The second case is the regression of test 14 and 18.
Using test 14 as an example, the logical text is <div dir=ltr>ABC DEF <span>abc def</span>OPQ RST</div>.
The text visually displayed as "FED CBA abc defTSR QPO".
Position "FED CBA abc def|TSR QPO" is at (InlineBox: "OPQ RST", offset:7). It is corresponding to logical end of the text, which is considered as a word break logically, so it is considered as a word break visually.

Except those 5 test cases, all other changes (either in the test file itself or in the expected result) are progression.

More cases (that I have not added yet) that the new code yield wrong result is when arrow-left-right yields wrong result.
For example, for case "abc \u202bABC def\u202c xyz", ICU word break iterator treat "def\u202c|" as a word break position. But arrow key is not able to reach that position.
Another example, <div contenteditable id="multiple_space_wrong_result">abc ששש def <span dir=rtl>שנב  opq סטז</span>  uvw ששש xyz</div>, in which arrow-left falls in a dead-loop.
Comment 7 Xiaomei Ji 2012-02-21 15:11:14 PST
Created attachment 128053 [details]
WIP (fix compilation error)
Comment 8 WebKit Commit Bot 2012-02-23 19:52:07 PST
Attachment 128053 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'LayoutTests/ChangeLog', u'LayoutTests/edit..." exit_code: 1
Source/WebCore/editing/visible_units.cpp:306:  This { should be at the end of the previous line  [whitespace/braces] [4]
Total errors found: 1 in 13 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 9 Xiaomei Ji 2012-02-24 11:09:57 PST
Created attachment 128769 [details]
wip

fix style error.
Comment 10 Xiaomei Ji 2012-03-12 17:52:47 PDT
Created attachment 131476 [details]
WIP

Get rid of regression by using UPSTREAM in getInlineBoxAndOffset()
Comment 11 Ryosuke Niwa 2012-03-13 14:00:20 PDT
Comment on attachment 131476 [details]
WIP

View in context: https://bugs.webkit.org/attachment.cgi?id=131476&action=review

> Source/WebCore/editing/visible_units.cpp:166
> +static void searchPreviousBoxInSameLine(const InlineTextBox* textBox, InlineTextBox*& previousBox)

I think it's better for this function to just return the box.
Also we don't normally call these functions "search" blah. It's probably better to call it something along the like of logicallyPreviousBoxInSameLine.

> Source/WebCore/editing/visible_units.cpp:173
> +            break;

If you do that, you can just return 0.

> Source/WebCore/editing/visible_units.cpp:175
> +            previousBox = toInlineTextBox(leafBoxesInLogicalOrder[i]);

And will be returning toInlineTextBox(leafBoxesInLogicalOrder[i]) here.

> Source/WebCore/editing/visible_units.cpp:192
> +            while (j < leafBoxesInLogicalOrder.size()) {
> +                if (leafBoxesInLogicalOrder[j]->isInlineTextBox()) {
> +                    nextBox = toInlineTextBox(leafBoxesInLogicalOrder[j]);
> +                    break;
> +                }
> +            }

It seems like searchPreviousBoxInSameLine should also have this loop.
It'll be great if we could add a test case for searchPreviousBoxInSameLine.

> Source/WebCore/editing/visible_units.cpp:200
> +static void searchPreviousBoxInPreviousRootInlineBox(const VisiblePosition& visiblePosition, const InlineTextBox* textBox,
> +    InlineTextBox*& previousBox, bool& previousBoxInDifferentLine)
> +{

It'll be great if you can combine this function with searchPreviousBoxInSameLine.

> Source/WebCore/editing/visible_units.cpp:261
> +    TextBreakIterator*& iter, int& previousBoxLen, bool& previousBoxInDifferentLine)

Please spell out length in previousBoxLen. It's probably still better for this function to return TextBreakIterator*.

> Source/WebCore/editing/visible_units.cpp:265
> +    // TODO: how should we handle non-text box?

s/TODO/FIXME/. Again, it's probably better to say "handle the case when we don't have an inline box".

> Source/WebCore/editing/visible_units.cpp:311
> +enum ECursorMovementDirection {

I think the new style is not to prefix enum types with E.

> Source/WebCore/editing/visible_units.cpp:312
> +    MOVE_LEFT, MOVE_RIGHT,

We should use MoveLeft and MoveRight for the consistency.

> Source/WebCore/editing/visible_units.cpp:322
> +    textBreakFollowing(iter, position);
> +    return isWordTextBreak(iter);

Maybe we need a comment explaining that ICU's isWordTextBreak returns true after moving across a word and false after moving across a punctuation/whitespace.

> Source/WebCore/editing/visible_units.cpp:331
> +static VisiblePosition wordPosition(const VisiblePosition& visiblePosition, ECursorMovementDirection direction)

We should probably call this function visualWordPosition.

> Source/WebCore/editing/visible_units.cpp:342
> +        VisiblePosition adjacentByCharacter = direction == MOVE_RIGHT ? current.right(true) : current.left(true); 

We should probably call this adjacentCharacter.

> Source/WebCore/editing/visible_units.cpp:347
> +        int offset;

Maybe we want to clarify that this offset is in box because there are two offsets here: one in box and one in iterator.

> Source/WebCore/editing/visible_units.cpp:350
> +        // TODO: how should we handle non text box?

We don't use TODO. Use FIXME instead. I think it's probably better to say "Handle the case when we don't have an inline box" instead of asking "how".

> Source/WebCore/editing/visible_units.cpp:355
> +        int previousBoxLen = 0;

Please spell out length.

> Source/WebCore/editing/visible_units.cpp:364
> +            if (previouslyVisitedBox != box) {

Maybe you want to a define a boolean like movingIntoNewBox = previouslyVisitedBox != box to clarify the semantics.

> Source/WebCore/editing/visible_units.cpp:368
> +            previouslyVisitedBox = box;

This statement can be moved into the if statement so that you can just have one else if instead of nested else and if.

> Source/WebCore/editing/visible_units.cpp:372
> +            isStartOfWord(iter, offset - textBox->start() + previousBoxLen,

Please define descriptive local variables for offset - textBox->start() + previousBoxLen.

> Source/WebCore/editing/visible_units.cpp:373
> +                offset == (int)textBox->start() && previousBoxInDifferentLine) :

Please use static_cast<int>(~).

> Source/WebCore/editing/visible_units.cpp:392
> +    // FIXME: How should we handle a non-editable position?
> +    if (leftWordBreak.isNull() && isEditablePosition(visiblePosition.deepEquivalent())) {
> +        TextDirection blockDirection = directionOfEnclosingBlock(visiblePosition.deepEquivalent());
> +        leftWordBreak = blockDirection == LTR ? startOfEditableContent(visiblePosition) : endOfEditableContent(visiblePosition);
> +    }

I think it's okay to ignore editing boundaries in wordPosition function and then just call honorEditingBoundaryAtOrBefore here.
Comment 12 Ryosuke Niwa 2012-03-13 14:07:32 PDT
Comment on attachment 131476 [details]
WIP

View in context: https://bugs.webkit.org/attachment.cgi?id=131476&action=review

> Source/WebCore/editing/visible_units.cpp:182
> +    textBox->root()->collectLeafBoxesInLogicalOrder(leafBoxesInLogicalOrder);

I guess we can cache these vector for the current root inline box to avoid calling collectLeafBoxesInLogicalOrder on the same root inline box repeatedly when a single word is spread across multiple inline boxes but that can probably be done in a separate patch since these functions are only used when user presses ctrl+alt+left/right, and not used by any internal functions.
Comment 13 Ryosuke Niwa 2012-03-13 14:16:32 PDT
Comment on attachment 131476 [details]
WIP

View in context: https://bugs.webkit.org/attachment.cgi?id=131476&action=review

> LayoutTests/editing/selection/move-by-word-visually-inline-block-positioned-element-expected.txt:5
> +"begin start"[0, 6], "abc def"[0, 4], "end ing"[0, 4], "this is float"[0, 5, 8], "this is fixed"[0, 5, 8], "this is relative"[0, 5, 8], "this is absolute"[0, 5, 8, 16]

It's great see a bug being fixed :D

> LayoutTests/editing/selection/move-by-word-visually-inline-block-positioned-element.html:21
> +title="[d_1, 0][d_1, 6][d_2, 0][d_2, 4][d_3, 0][d_3, 4][d_4, 0][d_4, 5][d_4, 8][d_5, 0][d_5, 5][d_5, 8][d_6, 0][d_6, 5][d_6, 8][d_7, 0][d_7, 5][d_7, 8][d_7, 16]|
> +[d_7, 16][d_7, 8][d_7, 5][d_7, 0][d_6, 8][d_6, 5][d_6, 0][d_5, 8][d_5, 5][d_5, 0][d_4, 8][d_4, 5][d_4, 0][d_3, 4][d_3, 0][d_2, 4][d_2, 0][d_1, 6][d_1, 0]" 

Maybe we want to change the format to make it more semantically clear. Of course, in a separate patch :)
It's really hard to understand what's going on here.

> LayoutTests/editing/selection/move-by-word-visually-multi-line-expected.txt:40
> +"abc def ghi jkl mn "[0, 3, 8, 11, 16], "opq rst uvw xyz"[0, 3, 8, 11, 15]

It's great to see this test case passing but can we print PASS in these passing cases? (in a separate patch, of course).

> LayoutTests/editing/selection/move-by-word-visually-multi-line.html:57
> +<div contenteditable dir=rtl id="ml_8" class="test_move_by_word fix_width" title="[ml_8, 15, 5][ml_8, 11, 5][ml_8, 8, 5][ml_8, 3, 5][ml_8, 0, 5][ml_8, 16][ml_8, 11][ml_8, 8][ml_8, 3][ml_8, 0]|[ml_8, 0][ml_8, 3][ml_8, 8][ml_8, 11][ml_8, 16][ml_8, 0, 5][ml_8, 3, 5][ml_8, 8, 5][ml_8, 11, 5][ml_8, 15, 5]">abc def ghi jkl mn <div><br/></div><div><br/></div><div><br/></div>opq rst uvw xyz</div>

Please explain the change in the change log as done in person :)
I understand that this is an improvement after talking with you, but it'll be nice to document it here so that other contributors like ap and mitz can understand it as well without having to talk with you in person.
Comment 14 Xiaomei Ji 2012-03-14 16:37:55 PDT
Comment on attachment 131476 [details]
WIP

View in context: https://bugs.webkit.org/attachment.cgi?id=131476&action=review

>> Source/WebCore/editing/visible_units.cpp:166
>> +static void searchPreviousBoxInSameLine(const InlineTextBox* textBox, InlineTextBox*& previousBox)
> 
> I think it's better for this function to just return the box.
> Also we don't normally call these functions "search" blah. It's probably better to call it something along the like of logicallyPreviousBoxInSameLine.

done.

>> Source/WebCore/editing/visible_units.cpp:173
>> +            break;
> 
> If you do that, you can just return 0.

it actually should return previousBox. But not applicable anymore since the function changed.

>> Source/WebCore/editing/visible_units.cpp:175
>> +            previousBox = toInlineTextBox(leafBoxesInLogicalOrder[i]);
> 
> And will be returning toInlineTextBox(leafBoxesInLogicalOrder[i]) here.

it actually should not change. But not applicable anymore.

>> Source/WebCore/editing/visible_units.cpp:182
>> +    textBox->root()->collectLeafBoxesInLogicalOrder(leafBoxesInLogicalOrder);
> 
> I guess we can cache these vector for the current root inline box to avoid calling collectLeafBoxesInLogicalOrder on the same root inline box repeatedly when a single word is spread across multiple inline boxes but that can probably be done in a separate patch since these functions are only used when user presses ctrl+alt+left/right, and not used by any internal functions.

will do in a separate patch.

>> Source/WebCore/editing/visible_units.cpp:192
>> +            }
> 
> It seems like searchPreviousBoxInSameLine should also have this loop.
> It'll be great if we could add a test case for searchPreviousBoxInSameLine.

the original logic in searchPreviousBoxInSameLine is correct since previousBox is updated on every iteration. So, the above loop is not needed.
But I've changed these functions, and the new logic is always looking for current textbox first, then, iterate to find the previous or next Inline *text* box.

added test case to test non InlineTextBox.

>> Source/WebCore/editing/visible_units.cpp:261
>> +    TextBreakIterator*& iter, int& previousBoxLen, bool& previousBoxInDifferentLine)
> 
> Please spell out length in previousBoxLen. It's probably still better for this function to return TextBreakIterator*.

done.

>> Source/WebCore/editing/visible_units.cpp:265
>> +    // TODO: how should we handle non-text box?
> 
> s/TODO/FIXME/. Again, it's probably better to say "handle the case when we don't have an inline box".

done.

>> Source/WebCore/editing/visible_units.cpp:311
>> +enum ECursorMovementDirection {
> 
> I think the new style is not to prefix enum types with E.

done.

>> Source/WebCore/editing/visible_units.cpp:312
>> +    MOVE_LEFT, MOVE_RIGHT,
> 
> We should use MoveLeft and MoveRight for the consistency.

done.

>> Source/WebCore/editing/visible_units.cpp:322
>> +    return isWordTextBreak(iter);
> 
> Maybe we need a comment explaining that ICU's isWordTextBreak returns true after moving across a word and false after moving across a punctuation/whitespace.

done.

>> Source/WebCore/editing/visible_units.cpp:331
>> +static VisiblePosition wordPosition(const VisiblePosition& visiblePosition, ECursorMovementDirection direction)
> 
> We should probably call this function visualWordPosition.

done.

>> Source/WebCore/editing/visible_units.cpp:342
>> +        VisiblePosition adjacentByCharacter = direction == MOVE_RIGHT ? current.right(true) : current.left(true); 
> 
> We should probably call this adjacentCharacter.

done.

>> Source/WebCore/editing/visible_units.cpp:347
>> +        int offset;
> 
> Maybe we want to clarify that this offset is in box because there are two offsets here: one in box and one in iterator.

renamed to offsetInBox.

>> Source/WebCore/editing/visible_units.cpp:350
>> +        // TODO: how should we handle non text box?
> 
> We don't use TODO. Use FIXME instead. I think it's probably better to say "Handle the case when we don't have an inline box" instead of asking "how".

done.

>> Source/WebCore/editing/visible_units.cpp:355
>> +        int previousBoxLen = 0;
> 
> Please spell out length.

done.

>> Source/WebCore/editing/visible_units.cpp:364
>> +            if (previouslyVisitedBox != box) {
> 
> Maybe you want to a define a boolean like movingIntoNewBox = previouslyVisitedBox != box to clarify the semantics.

done.

>> Source/WebCore/editing/visible_units.cpp:368
>> +            previouslyVisitedBox = box;
> 
> This statement can be moved into the if statement so that you can just have one else if instead of nested else and if.

done.

>> Source/WebCore/editing/visible_units.cpp:372
>> +            isStartOfWord(iter, offset - textBox->start() + previousBoxLen,
> 
> Please define descriptive local variables for offset - textBox->start() + previousBoxLen.

done.

>> Source/WebCore/editing/visible_units.cpp:373
>> +                offset == (int)textBox->start() && previousBoxInDifferentLine) :
> 
> Please use static_cast<int>(~).

done.

>> Source/WebCore/editing/visible_units.cpp:392
>> +    }
> 
> I think it's okay to ignore editing boundaries in wordPosition function and then just call honorEditingBoundaryAtOrBefore here.

added honorEditingBoundaryAtOrBefore to reinforce.

>> LayoutTests/editing/selection/move-by-word-visually-inline-block-positioned-element.html:21
>> +[d_7, 16][d_7, 8][d_7, 5][d_7, 0][d_6, 8][d_6, 5][d_6, 0][d_5, 8][d_5, 5][d_5, 0][d_4, 8][d_4, 5][d_4, 0][d_3, 4][d_3, 0][d_2, 4][d_2, 0][d_1, 6][d_1, 0]" 
> 
> Maybe we want to change the format to make it more semantically clear. Of course, in a separate patch :)
> It's really hard to understand what's going on here.

I need to think a way to make the format more readable (although there is a comment about the format in each HTML file). I can prepend "elment_id", "position_offset", and "child_node_index" in each field, but that will make the long string even longer. Fortunately, the format in expected result is readable.

>> LayoutTests/editing/selection/move-by-word-visually-multi-line-expected.txt:40
>> +"abc def ghi jkl mn "[0, 3, 8, 11, 16], "opq rst uvw xyz"[0, 3, 8, 11, 15]
> 
> It's great to see this test case passing but can we print PASS in these passing cases? (in a separate patch, of course).

An alternative would be to print nothing for those PASS (And print a PASS at the end if all tests pass). Will do next.

>> LayoutTests/editing/selection/move-by-word-visually-multi-line.html:57
>> +<div contenteditable dir=rtl id="ml_8" class="test_move_by_word fix_width" title="[ml_8, 15, 5][ml_8, 11, 5][ml_8, 8, 5][ml_8, 3, 5][ml_8, 0, 5][ml_8, 16][ml_8, 11][ml_8, 8][ml_8, 3][ml_8, 0]|[ml_8, 0][ml_8, 3][ml_8, 8][ml_8, 11][ml_8, 16][ml_8, 0, 5][ml_8, 3, 5][ml_8, 8, 5][ml_8, 11, 5][ml_8, 15, 5]">abc def ghi jkl mn <div><br/></div><div><br/></div><div><br/></div>opq rst uvw xyz</div>
> 
> Please explain the change in the change log as done in person :)
> I understand that this is an improvement after talking with you, but it'll be nice to document it here so that other contributors like ap and mitz can understand it as well without having to talk with you in person.

done.
Comment 15 Xiaomei Ji 2012-03-14 16:43:27 PDT
Created attachment 131956 [details]
patch w/ layout test

I did not consolidate previousRootInlineBox with the functionality in previousLinePosition. There are a couple of differences there (for example, the RenderPosition and ShadowNode), and I am hesitate to change previousLinePosition() in this patch to make it more inline with previousRootInlineBox. And consolidate to a single function with parameters to control different requirement looks not nice.
Comment 16 Ryosuke Niwa 2012-03-15 09:48:16 PDT
Comment on attachment 131956 [details]
patch w/ layout test

View in context: https://bugs.webkit.org/attachment.cgi?id=131956&action=review

r-. Let's do one more iteration.

> Source/WebCore/editing/visible_units.cpp:52
> +// The first leaf before node that has the same editability as node.

I think this comment is more confusing than useful.

> Source/WebCore/editing/visible_units.cpp:56
> +    Node* n = node->previousLeafNode();

Can't we just modify node here? We try not to introduce new one-letter variable in WebKit.

> Source/WebCore/editing/visible_units.cpp:67
> +    for (Node* p = n; p; p = p->parentNode()) {

Ditto about p & n. Just rename n to node and re-use that.

> Source/WebCore/editing/visible_units.cpp:79
> +    Node* n = child ? child->nextLeafNode() : node->lastDescendant()->nextLeafNode();

Ditto about one-letter variable.

> Source/WebCore/editing/visible_units.cpp:94
> +    Node* n = node->nextLeafNode();

Same comment.

> Source/WebCore/editing/visible_units.cpp:110
> +    while (previousNode && enclosingBlockNode == enclosingNodeWithNonInlineRenderer(previousNode))

Please put a blank line before while for the better readability.

> Source/WebCore/editing/visible_units.cpp:160
> +static const InlineTextBox* previousBoxInOneLine(const RootInlineBox* root, const InlineTextBox* box, Vector<InlineBox*>& leafBoxesInLogicalOrder)

I think "One" is redundant here because we don't say "lines".

> Source/WebCore/editing/visible_units.cpp:166
> +    leafBoxesInLogicalOrder.clear();
> +    root->collectLeafBoxesInLogicalOrder(leafBoxesInLogicalOrder);

Wait... I don't think this is right. The only time we need to clear the cache is when we move onto a new root inline box.

> Source/WebCore/editing/visible_units.cpp:169
> +    // Otherwise, root is box's RootInlineBox, and previousBox is the previous logical box in the same line.

I think this second line is more confusing than helpful, and sort of repeats what the code says.

> Source/WebCore/editing/visible_units.cpp:172
> +        for (size_t i = 0; i < leafBoxesInLogicalOrder.size(); ++i) {

Shouldn't we be decrementing i here?

> Source/WebCore/editing/visible_units.cpp:180
> +    for (int j = previousBoxIndex; j >= 0; --j) {

We can use i here again.

> Source/WebCore/editing/visible_units.cpp:320
> +static bool logicallyStartOfWord(TextBreakIterator* iter, int position, bool hardLineBreak)

Nit: logicalStartOfWord.
By the way, I like isLogicalStartOfWord better.

> Source/WebCore/editing/visible_units.cpp:331
> +static bool logicallyEndOfWord(TextBreakIterator* iter, int position, bool hardLineBreak)

Ditto about the name.

> Source/WebCore/editing/visible_units.cpp:339
> +enum CursorMovementDirection {
> +    MoveLeft, MoveRight,
> +};

I think we normally put them in one line, or put each value in the enum in a separate line.
i.e.
enum A { V1, V2 };
or
enum A {
    V1,
    V2,
};

> Source/WebCore/editing/visible_units.cpp:352
> +        VisiblePosition adjacentCharacter = direction == MoveRight ? current.right(true) : current.left(true); 

Ugh... sorry, on my second thought, adjacentCharacter is confusing since it sounds as if it's a character.
Maybe adjacentCharacterPosition or adjacentPosition?

> Source/WebCore/editing/visible_units.cpp:369
> +        bool previousBoxInDifferentRenderer = false;

Maybe it's better to name this variable previousBoxIsInDifferentBlock or previousBoxIsInDifferentParagraph to clarify the semantics?

> Source/WebCore/editing/visible_units.cpp:411
> +    // FIXME: How should we handle a non-editable position?
> +    if (leftWordBreak.isNull() && isEditablePosition(visiblePosition.deepEquivalent())) {
> +        TextDirection blockDirection = directionOfEnclosingBlock(visiblePosition.deepEquivalent());
> +        leftWordBreak = blockDirection == LTR ? startOfEditableContent(visiblePosition) : endOfEditableContent(visiblePosition);
> +    }

Why don't we just call honor* here?
Comment 17 Xiaomei Ji 2012-03-15 12:21:26 PDT
Comment on attachment 131956 [details]
patch w/ layout test

View in context: https://bugs.webkit.org/attachment.cgi?id=131956&action=review

>> Source/WebCore/editing/visible_units.cpp:52
>> +// The first leaf before node that has the same editability as node.
> 
> I think this comment is more confusing than useful.

Ah,  the first 4 functions are moved over by me without any functionality change. I updated the styles accordingly.
Removed the comment.

>> Source/WebCore/editing/visible_units.cpp:56
>> +    Node* n = node->previousLeafNode();
> 
> Can't we just modify node here? We try not to introduce new one-letter variable in WebKit.

done.

>> Source/WebCore/editing/visible_units.cpp:67
>> +    for (Node* p = n; p; p = p->parentNode()) {
> 
> Ditto about p & n. Just rename n to node and re-use that.

done.

>> Source/WebCore/editing/visible_units.cpp:94
>> +    Node* n = node->nextLeafNode();
> 
> Same comment.

done.

>> Source/WebCore/editing/visible_units.cpp:110
>> +    while (previousNode && enclosingBlockNode == enclosingNodeWithNonInlineRenderer(previousNode))
> 
> Please put a blank line before while for the better readability.

done.

>> Source/WebCore/editing/visible_units.cpp:160
>> +static const InlineTextBox* previousBoxInOneLine(const RootInlineBox* root, const InlineTextBox* box, Vector<InlineBox*>& leafBoxesInLogicalOrder)
> 
> I think "One" is redundant here because we don't say "lines".

done.

>> Source/WebCore/editing/visible_units.cpp:166
>> +    root->collectLeafBoxesInLogicalOrder(leafBoxesInLogicalOrder);
> 
> Wait... I don't think this is right. The only time we need to clear the cache is when we move onto a new root inline box.

the function only called when we move onto a  new root inline box. And it called continuously when moving to different root inline box so need to clear the vector.

>> Source/WebCore/editing/visible_units.cpp:169
>> +    // Otherwise, root is box's RootInlineBox, and previousBox is the previous logical box in the same line.
> 
> I think this second line is more confusing than helpful, and sort of repeats what the code says.

removed.

>> Source/WebCore/editing/visible_units.cpp:172
>> +        for (size_t i = 0; i < leafBoxesInLogicalOrder.size(); ++i) {
> 
> Shouldn't we be decrementing i here?

This is for looking for the current box's index, and it is the logical index we are looking for, so, it does not matter whether it is start from and incrementing or start from size-1 and decrementing.

>> Source/WebCore/editing/visible_units.cpp:180
>> +    for (int j = previousBoxIndex; j >= 0; --j) {
> 
> We can use i here again.

this one can not. Since prevBoxIndex could be -1 and can not be assigned to size_t. Also because of the name scope.

>> Source/WebCore/editing/visible_units.cpp:320
>> +static bool logicallyStartOfWord(TextBreakIterator* iter, int position, bool hardLineBreak)
> 
> Nit: logicalStartOfWord.
> By the way, I like isLogicalStartOfWord better.

done.

>> Source/WebCore/editing/visible_units.cpp:331
>> +static bool logicallyEndOfWord(TextBreakIterator* iter, int position, bool hardLineBreak)
> 
> Ditto about the name.

done.

>> Source/WebCore/editing/visible_units.cpp:339
>> +};
> 
> I think we normally put them in one line, or put each value in the enum in a separate line.
> i.e.
> enum A { V1, V2 };
> or
> enum A {
>     V1,
>     V2,
> };

done.

>> Source/WebCore/editing/visible_units.cpp:352
>> +        VisiblePosition adjacentCharacter = direction == MoveRight ? current.right(true) : current.left(true); 
> 
> Ugh... sorry, on my second thought, adjacentCharacter is confusing since it sounds as if it's a character.
> Maybe adjacentCharacterPosition or adjacentPosition?

done.

>> Source/WebCore/editing/visible_units.cpp:369
>> +        bool previousBoxInDifferentRenderer = false;
> 
> Maybe it's better to name this variable previousBoxIsInDifferentBlock or previousBoxIsInDifferentParagraph to clarify the semantics?

done.

>> Source/WebCore/editing/visible_units.cpp:411
>> +    }
> 
> Why don't we just call honor* here?

honor* is called above which enforces the returned visible position is in the same editing boundary.
The 'if' block is used to return the edge position of editable content.
For example, "abc def", since the word position we are looking for is *start* of word, it covers "|abc def" and "abc |def". But we also want to reach the edges of the text, the 'if' here and in rightWordPosition covers the "|abc def" and "abc def|".
Comment 18 Xiaomei Ji 2012-03-15 12:22:57 PDT
Created attachment 132101 [details]
patch w/ layout test

I removed gtk/Skipped and qt/Skipped so that the patch can be successfully patched and go through EWS.
Comment 19 Ryosuke Niwa 2012-03-15 12:26:12 PDT
> >> Source/WebCore/editing/visible_units.cpp:180
> >> +    for (int j = previousBoxIndex; j >= 0; --j) {
> > 
> > We can use i here again.
> 
> this one can not. Since prevBoxIndex could be -1 and can not be assigned to size_t. Also because of the name scope.

What I meant is that you can call it i instead of j. i.e.

for (int i = previousBoxIndex; i >= 0; --i) {
Comment 20 Ryosuke Niwa 2012-03-15 12:46:18 PDT
Comment on attachment 132101 [details]
patch w/ layout test

View in context: https://bugs.webkit.org/attachment.cgi?id=132101&action=review

> Source/WebCore/editing/visible_units.cpp:177
> +    int previousBoxIndex = leafBoxesInLogicalOrder.size() - 1;
> +    if (box) {
> +        for (size_t i = 0; i < leafBoxesInLogicalOrder.size(); ++i) {
> +            if (box == leafBoxesInLogicalOrder[i]) {
> +                previousBoxIndex = i - 1;
> +                break;
> +            }
> +        }
> +    }    

Maybe you can extract a function here (e.g. boxIndexInVector) and use it in nextBoxInLine as well? You can subtract/add 1 outside of the function, right?

> Source/WebCore/editing/visible_units.cpp:214
> +        if (leafBoxesInLogicalOrder.size())
> +            startBox = leafBoxesInLogicalOrder[0];
> +        else
> +            break;

I think it's better to do:
if (!leafBoxesInLogicalOrder.size())
    break;
startBox = leafBoxesInLogicalOrder[0];

> Source/WebCore/editing/visible_units.cpp:274
> +        if (leafBoxesInLogicalOrder.size())
> +            startBox = leafBoxesInLogicalOrder[0];
> +        else
> +            break;

Ditto.
Comment 21 Ryosuke Niwa 2012-03-15 12:48:21 PDT
(In reply to comment #17)
> (From update of attachment 131956 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=131956&action=review
> >> Source/WebCore/editing/visible_units.cpp:166
> >> +    root->collectLeafBoxesInLogicalOrder(leafBoxesInLogicalOrder);
> > 
> > Wait... I don't think this is right. The only time we need to clear the cache is when we move onto a new root inline box.
> 
> the function only called when we move onto a  new root inline box. And it called continuously when moving to different root inline box so need to clear the vector.

But visualWordPosition moves one character at the time, right? So logicallyPreviousBox/logicallyNextBox can be called for the box in the same root inline box.
Comment 22 Ryosuke Niwa 2012-03-15 12:50:37 PDT
(In reply to comment #21)
> (In reply to comment #17)
> > (From update of attachment 131956 [details] [details])
> > View in context: https://bugs.webkit.org/attachment.cgi?id=131956&action=review
> > >> Source/WebCore/editing/visible_units.cpp:166
> > >> +    root->collectLeafBoxesInLogicalOrder(leafBoxesInLogicalOrder);
> > > 
> > > Wait... I don't think this is right. The only time we need to clear the cache is when we move onto a new root inline box.
> > 
> > the function only called when we move onto a  new root inline box. And it called continuously when moving to different root inline box so need to clear the vector.
> 
> But visualWordPosition moves one character at the time, right? So logicallyPreviousBox/logicallyNextBox can be called for the box in the same root inline box.

The easiest way to do this is to wrap the vector in some class, and then add collectLeafBoxesInLogicalOrder(RootInlneBox*) m_rootInlineBox as members. collectLeafBoxesInLogicalOrder would then check if the new root inline box is same as m_rootInlineBox or not, and if not, clears the vector and does root->collectLeafBoxesInLogicalOrder(this);
Comment 23 Xiaomei Ji 2012-03-15 14:07:47 PDT
(In reply to comment #22)
> (In reply to comment #21)
> > (In reply to comment #17)
> > > (From update of attachment 131956 [details] [details] [details])
> > > View in context: https://bugs.webkit.org/attachment.cgi?id=131956&action=review
> > > >> Source/WebCore/editing/visible_units.cpp:166
> > > >> +    root->collectLeafBoxesInLogicalOrder(leafBoxesInLogicalOrder);
> > > > 
> > > > Wait... I don't think this is right. The only time we need to clear the cache is when we move onto a new root inline box.
> > > 
> > > the function only called when we move onto a  new root inline box. And it called continuously when moving to different root inline box so need to clear the vector.
> > 
> > But visualWordPosition moves one character at the time, right? So logicallyPreviousBox/logicallyNextBox can be called for the box in the same root inline box.
> 
> The easiest way to do this is to wrap the vector in some class, and then add collectLeafBoxesInLogicalOrder(RootInlneBox*) m_rootInlineBox as members. collectLeafBoxesInLogicalOrder would then check if the new root inline box is same as m_rootInlineBox or not, and if not, clears the vector and does root->collectLeafBoxesInLogicalOrder(this);

This function will only be called when position is at box boundary.
So, the cache is only useful when a word is across multiple inline boxes.
I thought we will do it in a separate patch (see your previous comment:
"I guess we can cache these vector for the current root inline box to avoid calling collectLeafBoxesInLogicalOrder on the same root inline box repeatedly when a single word is spread across multiple inline boxes but that can probably be done in a separate patch since these functions are only used when user presses ctrl+alt+left/right, and not used by any internal functions.")
Comment 24 Xiaomei Ji 2012-03-15 14:12:14 PDT
Comment on attachment 132101 [details]
patch w/ layout test

View in context: https://bugs.webkit.org/attachment.cgi?id=132101&action=review

>> Source/WebCore/editing/visible_units.cpp:177
>> +    }    
> 
> Maybe you can extract a function here (e.g. boxIndexInVector) and use it in nextBoxInLine as well? You can subtract/add 1 outside of the function, right?

done.

>> Source/WebCore/editing/visible_units.cpp:214
>> +            break;
> 
> I think it's better to do:
> if (!leafBoxesInLogicalOrder.size())
>     break;
> startBox = leafBoxesInLogicalOrder[0];

done.

>> Source/WebCore/editing/visible_units.cpp:274
>> +            break;
> 
> Ditto.

done.
Comment 25 Xiaomei Ji 2012-03-15 14:27:57 PDT
Created attachment 132121 [details]
patch w/ layout test
Comment 26 Ryosuke Niwa 2012-03-15 15:00:38 PDT
(In reply to comment #23)
> (In reply to comment #22)
> > The easiest way to do this is to wrap the vector in some class, and then add collectLeafBoxesInLogicalOrder(RootInlneBox*) m_rootInlineBox as members. collectLeafBoxesInLogicalOrder would then check if the new root inline box is same as m_rootInlineBox or not, and if not, clears the vector and does root->collectLeafBoxesInLogicalOrder(this);
> 
> This function will only be called when position is at box boundary.
> So, the cache is only useful when a word is across multiple inline boxes.
> I thought we will do it in a separate patch (see your previous comment:
> "I guess we can cache these vector for the current root inline box to avoid calling collectLeafBoxesInLogicalOrder on the same root inline box repeatedly when a single word is spread across multiple inline boxes but that can probably be done in a separate patch since these functions are only used when user presses ctrl+alt+left/right, and not used by any internal functions.")

Okay, that's fine. I just thought you've decidee to do it in this patch because you addee the vector.
Comment 27 Ryosuke Niwa 2012-03-15 15:15:59 PDT
Comment on attachment 132121 [details]
patch w/ layout test

I'm very happy with this rewrite. The code is much easier to read and it's much more robust.
It even fixes bugs!
Comment 28 WebKit Review Bot 2012-03-16 01:27:44 PDT
Comment on attachment 132121 [details]
patch w/ layout test

Clearing flags on attachment: 132121

Committed r110965: <http://trac.webkit.org/changeset/110965>
Comment 29 WebKit Review Bot 2012-03-16 01:27:50 PDT
All reviewed patches have been landed.  Closing bug.