Bug 57806

Summary: continue experiment with moving caret by word in visual order
Product: WebKit Reporter: Xiaomei Ji <xji>
Component: HTML EditingAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: eric, mitz, rniwa, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: PC   
OS: OS X 10.5   
Bug Depends on:    
Bug Blocks: 25298    
Attachments:
Description Flags
patch w/ layout test
none
patch w/ layout test
rniwa: review-
patch w/ layout test
rniwa: review-
patch w/ layout test
none
patch w/ layout test
none
patch w/ layout test rniwa: review+

Description Xiaomei Ji 2011-04-04 16:35:52 PDT
This is the 2nd patch, which adds implementation when caret is inside box (not at boundaries).
If the word break is inside the same box and not at the boundaries either, the word break will be returned.
If need to search the adjacent boxes for word breaks, then, only the cases implemented in bug 57336 work.
Comment 1 Xiaomei Ji 2011-04-04 16:42:41 PDT
Created attachment 88157 [details]
patch w/ layout test
Comment 2 WebKit Review Bot 2011-04-04 16:44:01 PDT
Attachment 88157 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style']" exit_code: 1

Last 3072 characters of output:
/visible_units.cpp:1357:  Tab found; better to use spaces  [whitespace/tab] [1]
Source/WebCore/editing/visible_units.cpp:1358:  Tab found; better to use spaces  [whitespace/tab] [1]
Source/WebCore/editing/visible_units.cpp:1359:  Tab found; better to use spaces  [whitespace/tab] [1]
Source/WebCore/editing/visible_units.cpp:1360:  Tab found; better to use spaces  [whitespace/tab] [1]
Source/WebCore/editing/visible_units.cpp:1361:  Tab found; better to use spaces  [whitespace/tab] [1]
Source/WebCore/editing/visible_units.cpp:1362:  Tab found; better to use spaces  [whitespace/tab] [1]
Source/WebCore/editing/visible_units.cpp:1363:  Tab found; better to use spaces  [whitespace/tab] [1]
Source/WebCore/editing/visible_units.cpp:1364:  Tab found; better to use spaces  [whitespace/tab] [1]
Source/WebCore/editing/visible_units.cpp:1365:  Tab found; better to use spaces  [whitespace/tab] [1]
Source/WebCore/editing/visible_units.cpp:1366:  Tab found; better to use spaces  [whitespace/tab] [1]
Source/WebCore/editing/visible_units.cpp:1367:  Tab found; better to use spaces  [whitespace/tab] [1]
Source/WebCore/editing/visible_units.cpp:1368:  Tab found; better to use spaces  [whitespace/tab] [1]
Source/WebCore/editing/visible_units.cpp:1369:  Tab found; better to use spaces  [whitespace/tab] [1]
Source/WebCore/editing/visible_units.cpp:1370:  Tab found; better to use spaces  [whitespace/tab] [1]
Source/WebCore/editing/visible_units.cpp:1371:  Tab found; better to use spaces  [whitespace/tab] [1]
Source/WebCore/editing/visible_units.cpp:1372:  Tab found; better to use spaces  [whitespace/tab] [1]
Source/WebCore/editing/visible_units.cpp:1373:  Tab found; better to use spaces  [whitespace/tab] [1]
Source/WebCore/editing/visible_units.cpp:1374:  Tab found; better to use spaces  [whitespace/tab] [1]
Source/WebCore/editing/visible_units.cpp:1376:  Tab found; better to use spaces  [whitespace/tab] [1]
Source/WebCore/editing/visible_units.cpp:1377:  Tab found; better to use spaces  [whitespace/tab] [1]
Source/WebCore/editing/visible_units.cpp:1420:  Tab found; better to use spaces  [whitespace/tab] [1]
Source/WebCore/editing/visible_units.cpp:1429:  Tab found; better to use spaces  [whitespace/tab] [1]
Source/WebCore/editing/visible_units.cpp:1430:  Tab found; better to use spaces  [whitespace/tab] [1]
Source/WebCore/editing/visible_units.cpp:1431:  Tab found; better to use spaces  [whitespace/tab] [1]
Source/WebCore/editing/visible_units.cpp:1432:  Tab found; better to use spaces  [whitespace/tab] [1]
Source/WebCore/editing/visible_units.cpp:1433:  Tab found; better to use spaces  [whitespace/tab] [1]
Source/WebCore/editing/visible_units.cpp:1434:  Tab found; better to use spaces  [whitespace/tab] [1]
Source/WebCore/editing/visible_units.cpp:1435:  Tab found; better to use spaces  [whitespace/tab] [1]
Source/WebCore/editing/visible_units.cpp:1450:  Tab found; better to use spaces  [whitespace/tab] [1]
Source/WebCore/editing/visible_units.cpp:1459:  Tab found; better to use spaces  [whitespace/tab] [1]
Total errors found: 39 in 4 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 3 Xiaomei Ji 2011-04-04 16:57:14 PDT
Created attachment 88162 [details]
patch w/ layout test

fix style errors.
Comment 4 Ryosuke Niwa 2011-04-05 10:42:19 PDT
Comment on attachment 88162 [details]
patch w/ layout test

View in context: https://bugs.webkit.org/attachment.cgi?id=88162&action=review

> Source/WebCore/editing/visible_units.cpp:1185
> +        return previousLeaf ? Position(previousLeaf->renderer()->node(), previousLeaf->caretMaxOffset(), Position::PositionIsOffsetInAnchor) :
> +        Position(node, 0, Position::PositionIsOffsetInAnchor);

I'd use the regular if statement instead of a tertiary operator here since the statement is really long.

> Source/WebCore/editing/visible_units.cpp:1206
> +    const InlineBox* box, const VisiblePosition& previousWordBreak, int& offsetOfWordBreak, bool& last)

what does last mean here?  last in the box?

Also, this function has very few blank lines.  It's hard to grasp the logic when there is one giant block of code.  Please insert blank lines as needed to separate blocks of logic.  I'm likely to have to come back to this function again later because I'm yet to fully understand how it works.  But extracting functions and refactoring a bit should help.

> Source/WebCore/editing/visible_units.cpp:1233
> +    int previousOffset = offsetOfWordBreak;
> +    InlineBox* boxOfWordBreak;
> +    wordBreak.getInlineBoxAndOffset(boxOfWordBreak, offsetOfWordBreak);
> +    // "previousOffset < offsetOfWordBreak" is to check the offset values are in ascending order.
> +    // For example, RTL box "FED CBA" (in visual display) in LTR block, when cursor is at the left of "CBA",
> +    // nextWordPosition returns position right of "CBA", which needs to be skipped to keep the the offsets ordered from right to left.
> +    if (boxOfWordBreak == box && (!hasSeenWordBreakInThisBox || previousOffset < offsetOfWordBreak))
> +        return wordBreak;

I'd extract this condition as a function instead of adding comment here like this.

> Source/WebCore/editing/visible_units.cpp:1237
> +    bool hasRTLPreviousBox = previousLeaf && !previousLeaf->isLeftToRightDirection();

I don't think we need to have a variable for this. It's pretty self-evident what the code is doing.

> Source/WebCore/editing/visible_units.cpp:1240
> +    VisiblePosition boundaryPosition;

What does boundaryPosition mean? I don't think it's descriptive.

> Source/WebCore/editing/visible_units.cpp:1244
> +        boundaryPosition = leftmostPositionInRTLBoxInLTRBlock(box, previousLeaf, nextLeaf);
> +    else if (box->direction() == LTR && !hasLTRNextBox)
> +        boundaryPosition = rightmostPositionInLTRBoxInRTLBlock(box, previousLeaf, nextLeaf);

These two functions don't really need to take previous/next leaf, right?  We should let those functions call nextLeafChild and prevLeafChild themselves so that we don't have to worry about the possibility of someone accidentally calling them with wrong boxes (i.e. box->nextLeafChild() != nextLeaf, etc...)

> Source/WebCore/editing/visible_units.cpp:1245
> +    if (!boundaryPosition.isNull()) {

Perhaps, you need to explain when this case is reached.

> Source/WebCore/editing/visible_units.cpp:1281
> +static VisiblePosition collectOneWordBreakInBox(const InlineBox* box, TextDirection blockDirection, const VisiblePosition& previousWordBreak, int& offsetOfWordBreak, bool& last)
> +{
> +    last = false;
> +    VisiblePosition wordBreak;
> +    if (box->direction() == blockDirection) {
> +        wordBreak = previousWordBreakInBoxInsideBlockWithSameDirectionality(box, previousWordBreak, offsetOfWordBreak);
> +        if (wordBreak.isNull())
> +            last = true;
> +    } else
> +        wordBreak = nextWordBreakInBoxInsideBlockWithDifferentDirectionality(box, previousWordBreak, offsetOfWordBreak, last);
> +    return wordBreak;
> +}
> +    
> +static void collectWordBreaksInBox(const InlineBox* box, TextDirection blockDirection, Vector<std::pair<VisiblePosition, int>, 50>& wordBoundaries)
> +{
> +    wordBoundaries.clear();
> +    int offset;
> +    bool last = false;
> +    VisiblePosition wordBreak = collectOneWordBreakInBox(box, blockDirection, VisiblePosition(), offset, last);
> +    while (!last) {
> +        wordBoundaries.append(std::make_pair(wordBreak, offset));
> +        wordBreak = collectOneWordBreakInBox(box, blockDirection, wordBreak, offset, last);
> +    }
> +    if (!wordBreak.isNull())
> +        wordBoundaries.append(std::make_pair(wordBreak, offset));
> +}

I would merge these two functions since collecting "one word break" doesn't make much sense to me.

> Source/WebCore/editing/visible_units.cpp:1299
> +static int greatestLessOffset(int offset, bool boxAndBlockInSameDirection, const Vector<std::pair<VisiblePosition, int>, 50>& wordBoundaries)

We should give this function a better name.  How about greatestValueUnder? (credit: leviw).

> Source/WebCore/editing/visible_units.cpp:1303
> +        return index;

return invalidOffset instead.

> Source/WebCore/editing/visible_units.cpp:1309
> +                index = i;
> +                break;

You should just "return index" here instead of assigning i to index.

> Source/WebCore/editing/visible_units.cpp:1312
> +    } else {

return invalidOffset here instead so that you don't need the else clause.

> Source/WebCore/editing/visible_units.cpp:1316
> +                index = i;
> +                break;

Early return here agian.

> Source/WebCore/editing/visible_units.cpp:1320
> +    return index;

Return invalidOffset here.

> Source/WebCore/editing/visible_units.cpp:1323
> +static int leastGreaterOffset(int offset, bool boxAndBlockInSameDirection, const Vector<std::pair<VisiblePosition, int>, 50>& wordBoundaries)

smallestOffsetAbove?

> Source/WebCore/editing/visible_units.cpp:1327
> +        return index;

return invalidOffset

> Source/WebCore/editing/visible_units.cpp:1333
> +                index = i;
> +                break;

Same comment about early return.

> Source/WebCore/editing/visible_units.cpp:1335
> +        }

return invalidOffset here as well

> Source/WebCore/editing/visible_units.cpp:1336
> +    } else {

so that you don't need else here.

> Source/WebCore/editing/visible_units.cpp:1340
> +                index = i;
> +                break;

early return

> Source/WebCore/editing/visible_units.cpp:1344
> +    return index;

return invalidOffset

> Source/WebCore/editing/visible_units.cpp:1375
> +static VisiblePosition wordBreakCandidate(TextDirection boxDirection, TextDirection blockDirection, WebCore::SelectionDirection searchDirection, const VisiblePosition& visiblePosition)
> +{
> +    VisiblePosition wordBreak;
> +    if (boxDirection == blockDirection) {
> +        if ((boxDirection == LTR && searchDirection == WebCore::DirectionRight)
> +            || (boxDirection == RTL && searchDirection == WebCore::DirectionLeft)) {
> +            // The word boundary at the right of space if space is used to segment words.
> +            VisiblePosition next = nextWordPosition(visiblePosition);
> +            VisiblePosition nextNext = nextWordPosition(next);
> +            if (next == nextNext)
> +                return visiblePosition;
> +            wordBreak = previousWordPosition(nextNext);
> +        } else
> +            wordBreak = previousWordPosition(visiblePosition);
> +    } else {
> +        if ((boxDirection == LTR && searchDirection == WebCore::DirectionLeft)
> +            || (boxDirection == RTL && searchDirection == WebCore::DirectionRight)) {
> +            // The word boundary at the left of space if space is used to segment words.
> +            VisiblePosition previous = previousWordPosition(visiblePosition);
> +            VisiblePosition previousPrevious = previousWordPosition(previous);
> +            if (previous == previousPrevious)
> +                wordBreak = previous;
> +            else
> +                wordBreak = nextWordPosition(previousPrevious);
> +        } else
> +            wordBreak = nextWordPosition(visiblePosition);
> +    }
> +    return wordBreak;
> +}

I'd break this function into two pieces as we did last time.  In fact, I'd embed the following code in rightWordPosition:
if (boxDirection == blockDirection) {
    if (boxDirection == LTR)
        wordBreakCandidate = positionAfterNextWord(visiblePosition)
    else
        wordBreakCandidate = previousWordPosition(visiblePosition);
} else {
    if (boxDirection == RTL)
        wordBreakCandidate = positionBeforePreviousWord(visiblePosition)
    else
        wordBreakCandidate = nextWordPosition(visiblePosition);
}

where

static VisiblePosition positionAfterNextWord(VisiblePosition position) {
    VisiblePosition beforeNextWord = nextWordPosition(position);
    VisiblePosition beforeWordAfterNextWord = nextWordPosition(beforeNextWord);
    if (beforeNextWord == beforeWordAfterNextWord)
        return VisiblePosition();
    return previousWordPosition(beforeWordAfterNextWord);
}

static VisiblePosition positionBeforePreviousWord(VisiblePosition position) {
    VisiblePosition afterNextWord = previousWordPosition(position);
    VisiblePosition afterWordAfterNextWord = previousWordPosition(afterNextWord);
    if (afterNextWord == afterWordAfterNextWord)
        return VisiblePosition();
    return nextWordPosition(previousPrevious);
}

for leftWordBoundary, I'd add do:

if (boxDirection == blockDirection) {
    if (boxDirection == RTL)
        wordBreakCandidate = positionAfterNextWord(visiblePosition);
    else
        wordBreakCandidate = previousWordPosition(visiblePosition);
} else {
    if (boxDirection == LTR)
        wordBreakCandidate = positionBeforePreviousWord(visiblePosition);
    else
        wordBreakCandidate = nextWordPosition(visiblePosition);
}

> Source/WebCore/editing/visible_units.cpp:1424
> +    InlineBox* boxOfWordBreak;
> +    int offsetOfWordBreak;
> +    wordBreak.getInlineBoxAndOffset(boxOfWordBreak, offsetOfWordBreak);
> +    if (box == boxOfWordBreak && offsetOfWordBreak != boxOfWordBreak->caretMaxOffset() && offsetOfWordBreak != boxOfWordBreak->caretMinOffset())

I would extract this as a function that takes a VisiblePosition and a box.  Maybe positionIsInsideBox?
Comment 5 Xiaomei Ji 2011-04-05 12:39:14 PDT
Comment on attachment 88162 [details]
patch w/ layout test

View in context: https://bugs.webkit.org/attachment.cgi?id=88162&action=review

>> Source/WebCore/editing/visible_units.cpp:1206
>> +    const InlineBox* box, const VisiblePosition& previousWordBreak, int& offsetOfWordBreak, bool& last)
> 
> what does last mean here?  last in the box?
> 
> Also, this function has very few blank lines.  It's hard to grasp the logic when there is one giant block of code.  Please insert blank lines as needed to separate blocks of logic.  I'm likely to have to come back to this function again later because I'm yet to fully understand how it works.  But extracting functions and refactoring a bit should help.

changed to isLastWordBreakInBox.

>> Source/WebCore/editing/visible_units.cpp:1233
>> +        return wordBreak;
> 
> I'd extract this condition as a function instead of adding comment here like this.

I could extract. but I'd leave the comments there. Without them, it is very hard to understand why the code is doing that.

>> Source/WebCore/editing/visible_units.cpp:1240
>> +    VisiblePosition boundaryPosition;
> 
> What does boundaryPosition mean? I don't think it's descriptive.

it is the leftmost or rightmost position in box. suggestion for a better name?

> Source/WebCore/editing/visible_units.cpp:1282
> +    

it is more about granularity. I feel split them is more clear.

>> Source/WebCore/editing/visible_units.cpp:1312
>> +    } else {
> 
> return invalidOffset here instead so that you don't need the else clause.

I like the "if"/else branch, and I feel it is clearer. But no strong objection.
Comment 6 Xiaomei Ji 2011-04-05 15:19:40 PDT
Comment on attachment 88162 [details]
patch w/ layout test

View in context: https://bugs.webkit.org/attachment.cgi?id=88162&action=review

>> Source/WebCore/editing/visible_units.cpp:1282
>> +    
> 
> it is more about granularity. I feel split them is more clear.

I renamed collectOneWordBreakInBox as wordBreakInBox

>> Source/WebCore/editing/visible_units.cpp:1375
>> +}
> 
> I'd break this function into two pieces as we did last time.  In fact, I'd embed the following code in rightWordPosition:
> if (boxDirection == blockDirection) {
>     if (boxDirection == LTR)
>         wordBreakCandidate = positionAfterNextWord(visiblePosition)
>     else
>         wordBreakCandidate = previousWordPosition(visiblePosition);
> } else {
>     if (boxDirection == RTL)
>         wordBreakCandidate = positionBeforePreviousWord(visiblePosition)
>     else
>         wordBreakCandidate = nextWordPosition(visiblePosition);
> }
> 
> where
> 
> static VisiblePosition positionAfterNextWord(VisiblePosition position) {
>     VisiblePosition beforeNextWord = nextWordPosition(position);
>     VisiblePosition beforeWordAfterNextWord = nextWordPosition(beforeNextWord);
>     if (beforeNextWord == beforeWordAfterNextWord)
>         return VisiblePosition();
>     return previousWordPosition(beforeWordAfterNextWord);
> }
> 
> static VisiblePosition positionBeforePreviousWord(VisiblePosition position) {
>     VisiblePosition afterNextWord = previousWordPosition(position);
>     VisiblePosition afterWordAfterNextWord = previousWordPosition(afterNextWord);
>     if (afterNextWord == afterWordAfterNextWord)
>         return VisiblePosition();
>     return nextWordPosition(previousPrevious);
> }
> 
> for leftWordBoundary, I'd add do:
> 
> if (boxDirection == blockDirection) {
>     if (boxDirection == RTL)
>         wordBreakCandidate = positionAfterNextWord(visiblePosition);
>     else
>         wordBreakCandidate = previousWordPosition(visiblePosition);
> } else {
>     if (boxDirection == LTR)
>         wordBreakCandidate = positionBeforePreviousWord(visiblePosition);
>     else
>         wordBreakCandidate = nextWordPosition(visiblePosition);
> }

I kept the name "next"/"nextNext" and "previous"/previousPrevious since it is hard for me to understand what beforeNextWord, beforeWordAfterNextWord, afterNextWord, afterWordAfterNextWord mean.
Comment 7 Xiaomei Ji 2011-04-05 15:29:01 PDT
Created attachment 88319 [details]
patch w/ layout test
Comment 8 Ryosuke Niwa 2011-04-06 06:58:13 PDT
Comment on attachment 88319 [details]
patch w/ layout test

View in context: https://bugs.webkit.org/attachment.cgi?id=88319&action=review

Okay, I think leftmostPositionInRTLBoxInLTRBlock, rightmostPositionInLTRBoxInRTLBlock, & lastWordBreakInBox look pretty sane to me. They should be good to go after this iteration. The rest seems to require a couple more iterations.  Thanks for patiently improving the patch :)

> Source/WebCore/editing/visible_units.cpp:1179
> +    VisiblePosition leftmost;

This variable doesn't seem to be ever used.  Please remove.

> Source/WebCore/editing/visible_units.cpp:1183
> +    if (previousLeaf && !previousLeaf->isLeftToRightDirection())

Let's put a blank line right above this line to improve the readability.

> Source/WebCore/editing/visible_units.cpp:1196
> +    VisiblePosition rightmost;

Ditto; can't find any reference.

> Source/WebCore/editing/visible_units.cpp:1200
> +    if (nextLeaf && nextLeaf->isLeftToRightDirection())    

Ditto about inserting a blank line here.

> Source/WebCore/editing/visible_units.cpp:1220
> +    if (boundaryPosition.isNotNull()) { // boundaryPosition is not null only in the above 2 cases.

Please do an early return instead.

> Source/WebCore/editing/visible_units.cpp:1232
> +static bool positionIsVisuallyOrderedInBox(const VisiblePosition& wordBreak, const InlineBox* box, int& offsetOfWordBreak)

I don't think positionIsVisuallyOrderedInBox is a descriptive name here.  It explains what the function is checking but it doesn't explain anything in the context of nextWordBreakInBoxInsideBlockWithDifferentDirectionality.  Please rename.

> Source/WebCore/editing/visible_units.cpp:1237
> +    // "previousOffset < offsetOfWordBreak" is to guarantee that the word breaks collected are visually ordered.

I don't think this comment is helpful.  It doesn't explain why "previousOffset < offsetOfWordBreak" guarantees that the words breaks are visually ordered.

> Source/WebCore/editing/visible_units.cpp:1262
> +    isLastWordBreakInBox = false;
> +    if (wordBreak == previousWordBreak) {
> +        isLastWordBreakInBox = true;

It seems redundant to set isLastWordBreakInBox to false if we're immediately setting it back to true.  I'd move isLastWordBreakInBox = false below the if clause.

> Source/WebCore/editing/visible_units.cpp:1304
> +static VisiblePosition wordBreakInBox(const InlineBox* box, TextDirection blockDirection, const VisiblePosition& previousWordBreak, int& offsetOfWordBreak, bool& isLastWordBreakInBox)
> +{
> +    isLastWordBreakInBox = false;
> +    VisiblePosition wordBreak;
> +    if (box->direction() == blockDirection) {
> +        wordBreak = previousWordBreakInBoxInsideBlockWithSameDirectionality(box, previousWordBreak, offsetOfWordBreak);
> +        if (wordBreak.isNull())
> +            isLastWordBreakInBox = true;
> +    } else
> +        wordBreak = nextWordBreakInBoxInsideBlockWithDifferentDirectionality(box, previousWordBreak, offsetOfWordBreak, isLastWordBreakInBox);
> +    return wordBreak;
> +}
> +    
> +static void collectWordBreaksInBox(const InlineBox* box, TextDirection blockDirection, Vector<std::pair<VisiblePosition, int>, 50>& wordBoundaries)
> +{
> +    wordBoundaries.clear();
> +    int offset = invalidOffset;
> +    bool isLastWordBreakInBox = false;
> +    VisiblePosition wordBreak = wordBreakInBox(box, blockDirection, VisiblePosition(), offset, isLastWordBreakInBox);
> +    while (!isLastWordBreakInBox) {
> +        wordBoundaries.append(std::make_pair(wordBreak, offset));
> +        wordBreak = wordBreakInBox(box, blockDirection, wordBreak, offset, isLastWordBreakInBox);
> +    }
> +    if (!wordBreak.isNull())
> +        wordBoundaries.append(std::make_pair(wordBreak, offset));
> +}

As I said in the previous comment, I don't see why these two functions can't be merged.  In fact, I just did it for you (please verify):

static void collectWordBreaksInBox(const InlineBox* box, TextDirection blockDirection, Vector<std::pair<VisiblePosition, int>, 50>& wordBoundaries)
{
    wordBoundaries.clear();

    VisiblePosition wordBreak;
    int offsetOfWordBreak = invalidOffset;
    while (1) {
        if (box->direction() == blockDirection) {
            wordBreak = previousWordBreakInBoxInsideBlockWithSameDirectionality(box, wordBreak, offsetOfWordBreak);
            if (wordBreak.isNull())
                break;
            wordBoundaries.append(std::make_pair(wordBreak, offset));
        } else {
            bool isLastWordBreakInBox = false;
            wordBreak = nextWordBreakInBoxInsideBlockWithDifferentDirectionality(box, wordBreak, offsetOfWordBreak, isLastWordBreakInBox);
            if (wordBreak.isNotNull())
                wordBoundaries.append(std::make_pair(wordBreak, offset));
            if (isLastWordBreakInBox)
                break;
        }
    }
}

Once I did that, it was obvious to me that we should have two different functions for same directionality and different directionality as in:
static void collectWordBreaksInBoxInsideBlockWithSameDirectionality(const InlineBox* box, Vector<std::pair<VisiblePosition, int>, 50>& wordBoundaries)
{
    wordBoundaries.clear();

    VisiblePosition wordBreak;
    int offsetOfWordBreak = invalidOffset;
    while (1) {
        wordBreak = previousWordBreakInBoxInsideBlockWithSameDirectionality(box, wordBreak, offsetOfWordBreak);
        if (wordBreak.isNull())
            break;
        wordBoundaries.append(std::make_pair(wordBreak, offset));
    }
}

static void collectWordBreaksInBoxInsideBlockWithDifferntDirectionality(const InlineBox* box, Vector<std::pair<VisiblePosition, int>, 50>& wordBoundaries)
{
    wordBoundaries.clear();

    VisiblePosition wordBreak;
    int offsetOfWordBreak = invalidOffset;
    while (1) {
        bool isLastWordBreakInBox = false;
        wordBreak = nextWordBreakInBoxInsideBlockWithDifferentDirectionality(box, wordBreak, offsetOfWordBreak, isLastWordBreakInBox);
        if (wordBreak.isNotNull())
            wordBoundaries.append(std::make_pair(wordBreak, offset));
        if (isLastWordBreakInBox)
            break;
    }
}

> Source/WebCore/editing/visible_units.cpp:1322
> +static int greatestValueUnder(int offset, bool boxAndBlockInSameDirection, const Vector<std::pair<VisiblePosition, int>, 50>& wordBoundaries)

You should rename wordBoundaries to something like orderedWordBoundaries to signify the fact they're ordered.  Or otherwise things doesn't make sense.
Also I would have much preferred if you defined a struct instead of using a pair here so that I know what a pair represents.

Nit: s/boxAndBlockInSameDirection/boxAndBlockAreInSameDirection/.

> Source/WebCore/editing/visible_units.cpp:1394
> +    if (box == boxOfWordBreak && offsetOfWordBreak != box->caretMaxOffset() && offsetOfWordBreak != box->caretMinOffset())
> +        return true;        
> +    return false;

Nit: we should do:
return box == boxOfWordBreak && offsetOfWordBreak != box->caretMaxOffset() && offsetOfWordBreak != box->caretMinOffset();
instead.

> Source/WebCore/editing/visible_units.cpp:1397
> +static VisiblePosition positionAfterNextWord(const VisiblePosition& position)

Oops, this should probably named as positionAfterWord instead.  Since if we're in "he^llo world", then this function will return "hello^ world".

> Source/WebCore/editing/visible_units.cpp:1400
> +    VisiblePosition next = nextWordPosition(position);
> +    VisiblePosition nextNext = nextWordPosition(next);

next and nextNext don't make any sense to me.  Next what?  It's much clearer to call next as beforeNextWord because it signifies the fact nextWordPosition returns the position between the next word and the whitespace immediately before the next word.  It then becomes clear from your other comment that we must move forward again to obtain beforeWordAfterNextWord or afterNextWord & backward in order obtain the position after the current word.

> Source/WebCore/editing/visible_units.cpp:1406
> +static VisiblePosition positionBeforePreviousWord(const VisiblePosition& position)

Ditto. This should be named as positionBeforeWord

> Source/WebCore/editing/visible_units.cpp:1409
> +    VisiblePosition previous = previousWordPosition(position);
> +    VisiblePosition previousPrevious = previousWordPosition(previous);

Ditto about local variable names here.

> Source/WebCore/editing/visible_units.cpp:1446
> +    // Otherwise, first collect all word breaks in terms of (VisiblePosition, offset) inside the box.

Please get rid of this comment.  It repeats the function name collectWordBreaksInBox.

> Source/WebCore/editing/visible_units.cpp:1447
> +    Vector<std::pair<VisiblePosition, int>, 50> wordBoundaries;

Why 50?  That sounds like an arbitrary limit to me.  I'd rather not to specify any number here.

> Source/WebCore/editing/visible_units.cpp:1449
> +    // Then, search for word boundaries inside the box.

Please remove this comment.  It repeats what code says.

> Source/WebCore/editing/visible_units.cpp:1455
> +    // If not found, search for word boundaries in visually adjacent boxes.

Please remove this comment.  It repeats the code.

> Source/WebCore/editing/visible_units.cpp:1456
> +    return leftWordBoundary(box->prevLeafChild(), invalidOffset, blockDirection);

Mn... this is a tail recursion. I wonder if we can make it iterative in a some followup patch.
Comment 9 Xiaomei Ji 2011-04-06 10:00:25 PDT
Comment on attachment 88319 [details]
patch w/ layout test

View in context: https://bugs.webkit.org/attachment.cgi?id=88319&action=review

>> Source/WebCore/editing/visible_units.cpp:1232
>> +static bool positionIsVisuallyOrderedInBox(const VisiblePosition& wordBreak, const InlineBox* box, int& offsetOfWordBreak)
> 
> I don't think positionIsVisuallyOrderedInBox is a descriptive name here.  It explains what the function is checking but it doesn't explain anything in the context of nextWordBreakInBoxInsideBlockWithDifferentDirectionality.  Please rename.

The function is named as what it does, does not matter what the context is. As to it is only needed when block and box has different directionality, I think that should be explained in the caller, as in the current code.
Or suggestion on a better name?

>> Source/WebCore/editing/visible_units.cpp:1400
>> +    VisiblePosition nextNext = nextWordPosition(next);
> 
> next and nextNext don't make any sense to me.  Next what?  It's much clearer to call next as beforeNextWord because it signifies the fact nextWordPosition returns the position between the next word and the whitespace immediately before the next word.  It then becomes clear from your other comment that we must move forward again to obtain beforeWordAfterNextWord or afterNextWord & backward in order obtain the position after the current word.

nextWordPosition does not return the position between the next word and the white space immediately before the next word. It returns the position after the current word, before the white space immediately before the next word.

This function returns the position between the next word and the white space immediately before the next word.

Then, the function name should be "positionBeforeNextWord"? 
and suggestion on "next"/"nextNext"? Themselves are nothing special. If follows your naming convention, they probably should be named as "positionAfterWord" and "positionAfterNextWord".

>> Source/WebCore/editing/visible_units.cpp:1406
>> +static VisiblePosition positionBeforePreviousWord(const VisiblePosition& position)
> 
> Ditto. This should be named as positionBeforeWord

the function returns position after previous word, by after, I mean logically. should it be named as "positionAfterPreviousWord"?
how about "previous" and "previousPrevious"? any suggestion? "positionBeforeWord" and "positionBeforePreviousWord"?

> Source/WebCore/editing/visible_units.cpp:1448
> +    collectWordBreaksInBox(box, blockDirection, wordBoundaries);

Mitz suggested inline capacity to avoid heap allocation in most cases.
Comment 10 Ryosuke Niwa 2011-04-06 10:11:40 PDT
Comment on attachment 88319 [details]
patch w/ layout test

View in context: https://bugs.webkit.org/attachment.cgi?id=88319&action=review

>>> Source/WebCore/editing/visible_units.cpp:1232
>>> +static bool positionIsVisuallyOrderedInBox(const VisiblePosition& wordBreak, const InlineBox* box, int& offsetOfWordBreak)
>> 
>> I don't think positionIsVisuallyOrderedInBox is a descriptive name here.  It explains what the function is checking but it doesn't explain anything in the context of nextWordBreakInBoxInsideBlockWithDifferentDirectionality.  Please rename.
> 
> The function is named as what it does, does not matter what the context is. As to it is only needed when block and box has different directionality, I think that should be explained in the caller, as in the current code.
> Or suggestion on a better name?

I disagree. We should name a function to explain for what purpose the function is used especially because this function is only used in one place.

>>> Source/WebCore/editing/visible_units.cpp:1400
>>> +    VisiblePosition nextNext = nextWordPosition(next);
>> 
>> next and nextNext don't make any sense to me.  Next what?  It's much clearer to call next as beforeNextWord because it signifies the fact nextWordPosition returns the position between the next word and the whitespace immediately before the next word.  It then becomes clear from your other comment that we must move forward again to obtain beforeWordAfterNextWord or afterNextWord & backward in order obtain the position after the current word.
> 
> nextWordPosition does not return the position between the next word and the white space immediately before the next word. It returns the position after the current word, before the white space immediately before the next word.
> 
> This function returns the position between the next word and the white space immediately before the next word.
> 
> Then, the function name should be "positionBeforeNextWord"? 
> and suggestion on "next"/"nextNext"? Themselves are nothing special. If follows your naming convention, they probably should be named as "positionAfterWord" and "positionAfterNextWord".

Ah, I see. Thanks for the clarification.  Yes, the function name should be positionBeforeNextWord and variables should be named along the lines of positionAfterCurrentWord and positionAfterNextWord respectively.

>>> Source/WebCore/editing/visible_units.cpp:1406
>>> +static VisiblePosition positionBeforePreviousWord(const VisiblePosition& position)
>> 
>> Ditto. This should be named as positionBeforeWord
> 
> the function returns position after previous word, by after, I mean logically. should it be named as "positionAfterPreviousWord"?
> how about "previous" and "previousPrevious"? any suggestion? "positionBeforeWord" and "positionBeforePreviousWord"?

I'd name them positionAfterPreviousWord, positionBeforeCurrentWord, and positionBeforePreviousWord respectively.

>> Source/WebCore/editing/visible_units.cpp:1448
>> +    collectWordBreaksInBox(box, blockDirection, wordBoundaries);
> 
> Mitz suggested inline capacity to avoid heap allocation in most cases.

I feel like that's a pre-mature optimization.  If anything, we should make a benchmark and figure out if that'll affect the performance at all.
Comment 11 Xiaomei Ji 2011-04-06 10:38:18 PDT
(In reply to comment #10)
> (From update of attachment 88319 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=88319&action=review
> 
> >>> Source/WebCore/editing/visible_units.cpp:1232
> >>> +static bool positionIsVisuallyOrderedInBox(const VisiblePosition& wordBreak, const InlineBox* box, int& offsetOfWordBreak)
> >> 
> >> I don't think positionIsVisuallyOrderedInBox is a descriptive name here.  It explains what the function is checking but it doesn't explain anything in the context of nextWordBreakInBoxInsideBlockWithDifferentDirectionality.  Please rename.
> > 
> > The function is named as what it does, does not matter what the context is. As to it is only needed when block and box has different directionality, I think that should be explained in the caller, as in the current code.
> > Or suggestion on a better name?
> 
> I disagree. We should name a function to explain for what purpose the function is used especially because this function is only used in one place.

suggestion for a better name?
positionIsVisuallyOrderedInBoxInBlockWithDifferentDirectionality?
Comment 12 Ryosuke Niwa 2011-04-06 14:39:56 PDT
Comment on attachment 88319 [details]
patch w/ layout test

View in context: https://bugs.webkit.org/attachment.cgi?id=88319&action=review

> Source/WebCore/editing/visible_units.cpp:1272
> +    // Given RTL box "ABC DEF" either follows a LTR box or is the first visual box in an LTR block as an example,
> +    // the visual display of the RTL box is: "(0)J(10)I(9)H(8) (7)F(6)E(5)D(4) (3)C(2)B(1)A(11)",
> +    // where the number in parenthesis represents offset in visiblePosition. 
> +    // Start at offset 0, the first word break is at offset 3, the 2nd word break is at offset 7, and the 3rd word break should be at offset 0.
> +    // But nextWordPosition() of offset 7 is offset 11, which should be ignored, 
> +    // and the position at offset 0 should be manually added as the last word break within the box.
> +    if (positionIsVisuallyOrderedInBox(wordBreak, box, offsetOfWordBreak))

It's not great that we need this giant chunk of comment to explain what's happening here.  positionIsVisuallyOrderedInBox should be named such that you can reduce the amount of comment here.
Comment 13 Xiaomei Ji 2011-04-06 16:08:20 PDT
Comment on attachment 88319 [details]
patch w/ layout test

View in context: https://bugs.webkit.org/attachment.cgi?id=88319&action=review

> Source/WebCore/editing/visible_units.cpp:1273
> +        return wordBreak;

I am out of idea on how to rename   positionIsVisuallyOrderedInBox so that the comments could be removed. Any suggestion?
Comment 14 Xiaomei Ji 2011-04-06 16:14:01 PDT
Created attachment 88540 [details]
patch w/ layout test
Comment 15 Ryosuke Niwa 2011-04-06 16:18:33 PDT
(In reply to comment #13)
> (From update of attachment 88319 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=88319&action=review
> 
> > Source/WebCore/editing/visible_units.cpp:1273
> > +        return wordBreak;
> 
> I am out of idea on how to rename   positionIsVisuallyOrderedInBox so that the comments could be removed. Any suggestion?

Comment doesn't need to be entirely removed.  I'm simply saying that the name should complement the comment.  Well, I can't think of a better name for now so it's fine if you left the name as is.
Comment 16 Ryosuke Niwa 2011-04-07 09:29:41 PDT
Comment on attachment 88540 [details]
patch w/ layout test

View in context: https://bugs.webkit.org/attachment.cgi?id=88540&action=review

> Source/WebCore/editing/visible_units.cpp:1188
> +        return Position(node, 0, Position::PositionIsOffsetInAnchor);

What if there was node extended for multiple lines?   Would this be still correct?

> Source/WebCore/editing/visible_units.cpp:1230
> +static VisiblePosition lastWordBreakInBox(const InlineBox* box, int& offsetOfWordBreak)
> +{
> +    // Add the leftmost word break for RTL box or rightmost word break for LTR box.
> +    InlineBox* previousLeaf = box->prevLeafChild();
> +    InlineBox* nextLeaf = box->nextLeafChild();
> +    VisiblePosition boundaryPosition;
> +    if (box->direction() == RTL && (!previousLeaf || previousLeaf->isLeftToRightDirection()))
> +        boundaryPosition = leftmostPositionInRTLBoxInLTRBlock(box);
> +    else if (box->direction() == LTR && (!nextLeaf || !nextLeaf->isLeftToRightDirection()))
> +        boundaryPosition = rightmostPositionInLTRBoxInRTLBlock(box);
> +    if (boundaryPosition.isNull())
> +        return VisiblePosition();            
> +    VisiblePosition wordBreak = nextWordPosition(boundaryPosition);
> +    if (wordBreak != boundaryPosition)
> +        wordBreak = previousWordPosition(wordBreak);
> +    InlineBox* boxOfWordBreak;
> +    wordBreak.getInlineBoxAndOffset(boxOfWordBreak, offsetOfWordBreak);
> +    if (boxOfWordBreak == box)
> +        return wordBreak;
> +    return VisiblePosition();    
> +}

Please insert blank lines as appropriate.

> Source/WebCore/editing/visible_units.cpp:1281
> +    WordBoundaryEntry(): visiblePosition(VisiblePosition()), offset(invalidOffset) {}
> +    WordBoundaryEntry(const VisiblePosition& position, int offset): visiblePosition(position), offset(offset) {}

Please follow WebKit style for constructors here.
Comment 17 Xiaomei Ji 2011-04-07 10:06:24 PDT
Created attachment 88654 [details]
patch w/ layout test

updated per comments.
Comment 18 Ryosuke Niwa 2011-04-07 10:16:29 PDT
Comment on attachment 88654 [details]
patch w/ layout test

View in context: https://bugs.webkit.org/attachment.cgi?id=88654&action=review

> Source/WebCore/editing/visible_units.cpp:1195
> +        InlineBox* lastRTLLeaf;
> +        do {
> +            lastRTLLeaf = nextLeaf;
> +            nextLeaf = nextLeaf->nextLeafChild();
> +        } while (nextLeaf && !nextLeaf->isLeftToRightDirection());
> +        return Position(lastRTLLeaf->renderer()->node(), lastRTLLeaf->caretMinOffset(), Position::PositionIsOffsetInAnchor);

I see you fixed this code per my comment. Could you add a test case as well?

> Source/WebCore/editing/visible_units.cpp:1300
> +        : visiblePosition(VisiblePosition())

I don't think you need to explicitly call VisiblePosition() like this.

> Source/WebCore/editing/visible_units.cpp:1307
> +    : visiblePosition(position)
> +    , offset(offset) 

Please indent these.

> Source/WebCore/editing/visible_units.cpp:1312
> +    int offset; // offset returned from visiblePosition.getInlineBoxAndOffset() and is used in comparison to find the visually closest word break.

Please rename offset to something more explicit and remove the comment.
Comment 19 Xiaomei Ji 2011-04-07 17:53:16 PDT
Created attachment 88744 [details]
patch w/ layout test
Comment 20 Ryosuke Niwa 2011-04-08 23:26:54 PDT
Comment on attachment 88744 [details]
patch w/ layout test

View in context: https://bugs.webkit.org/attachment.cgi?id=88744&action=review

r=me provided you address the following comments.  I'm excited for this patch!

> Source/WebCore/editing/visible_units.cpp:1292
> +    isLastWordBreakInBox = false;
> +
> +    // Given RTL box "ABC DEF" either follows a LTR box or is the first visual box in an LTR block as an example,
> +    // the visual display of the RTL box is: "(0)J(10)I(9)H(8) (7)F(6)E(5)D(4) (3)C(2)B(1)A(11)",
> +    // where the number in parenthesis represents offset in visiblePosition. 
> +    // Start at offset 0, the first word break is at offset 3, the 2nd word break is at offset 7, and the 3rd word break should be at offset 0.
> +    // But nextWordPosition() of offset 7 is offset 11, which should be ignored, 
> +    // and the position at offset 0 should be manually added as the last word break within the box.
> +    if (positionIsVisuallyOrderedInBoxInBlockWithDifferentDirectionality(wordBreak, box, offsetOfWordBreak))
> +        return wordBreak;

Can we set isLastWordBreakInBox false right above the return statement? i.e.
if (positionIsVisuallyOrderedInBoxInBlockWithDifferentDirectionality(wordBreak, box, offsetOfWordBreak)) {
    isLastWordBreakInBox = false;
    return wordBreak
}

> Source/WebCore/editing/visible_units.cpp:1326
> +        struct WordBoundaryEntry wordBoundaryEntry(wordBreak, offsetOfWordBreak);

You shouldn't put "struct" in front of a struct name. Please remove.

> Source/WebCore/editing/visible_units.cpp:1341
> +            struct WordBoundaryEntry wordBoundaryEntry(wordBreak, offsetOfWordBreak);

Ditto.

> Source/WebCore/editing/visible_units.cpp:1494
> +    int index = box->isLeftToRightDirection() ? greatestValueUnder(offset, blockDirection == LTR, orderedWordBoundaries) :
> +                                                smallestOffsetAbove(offset, blockDirection == RTL, orderedWordBoundaries);

I don't think we normally align function calls like that.  Please do:
int index = box->isLeftToRightDirection() ? greatestValueUnder(offset, blockDirection == LTR, orderedWordBoundaries) :
    smallestOffsetAbove(offset, blockDirection == RTL, orderedWordBoundaries);

> Source/WebCore/editing/visible_units.cpp:1534
> +    int index = box->isLeftToRightDirection() ? smallestOffsetAbove(offset, blockDirection == LTR, orderedWordBoundaries) :
> +                                                greatestValueUnder(offset, blockDirection == RTL, orderedWordBoundaries);

Ditto.

> LayoutTests/ChangeLog:7
> +

Please explain changes happened in the expected result and what test case you're adding.

> LayoutTests/editing/selection/move-by-word-visually-expected.txt:43
> -"ZQB abc RIG"[8, 8, 0, 0]    FAIL expected: [8, 4, 0, 0]
> +"ZQB abc RIG"[8, 0, 0, 0]    FAIL expected: [8, 4, 0, 0]
>  

I don't see new test result added here.  Did you forget to rebaseline?
Comment 21 Xiaomei Ji 2011-04-11 11:50:06 PDT
Comment on attachment 88744 [details]
patch w/ layout test

View in context: https://bugs.webkit.org/attachment.cgi?id=88744&action=review

>> LayoutTests/editing/selection/move-by-word-visually-expected.txt:43
>>  
> 
> I don't see new test result added here.  Did you forget to rebaseline?

The new test tests word break stops at a  position given a specific position. It only printed out information when the test failed.
Comment 22 Ryosuke Niwa 2011-04-11 11:51:47 PDT
(In reply to comment #21)
> (From update of attachment 88744 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=88744&action=review
> The new test tests word break stops at a  position given a specific position. It only printed out information when the test failed.

That seems like a bad test. We should print something when the test passes so that not having that output causes the test to fail as well.
Comment 23 Xiaomei Ji 2011-04-11 13:02:37 PDT
Committed r83483: <http://trac.webkit.org/changeset/83483>
Comment 24 Xiaomei Ji 2011-04-11 13:03:53 PDT
(In reply to comment #22)
> (In reply to comment #21)
> > (From update of attachment 88744 [details] [details])
> > View in context: https://bugs.webkit.org/attachment.cgi?id=88744&action=review
> > The new test tests word break stops at a  position given a specific position. It only printed out information when the test failed.
> 
> That seems like a bad test. We should print something when the test passes so that not having that output causes the test to fail as well.

Changed the format to be simliar to existing format. Printed the result, and if failed, printed "FAIL" + expectation.