Bug 58294

Summary: continue (3rd) experiment with moving caret by word in visual order
Product: WebKit Reporter: Xiaomei Ji <xji>
Component: HTML EditingAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: mitz, rniwa
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: PC   
OS: Windows 7   
Bug Depends on: 58790    
Bug Blocks: 25298    
Attachments:
Description Flags
work in progress
none
patch w/ layout test
rniwa: review-
patch w/ layout test
rniwa: review-
patch w/ layout test
rniwa: review+
patch w/ layout test
rniwa: review+
difference from last patch (attachment 89674 which was committed in r83996) just for reference. none

Description Xiaomei Ji 2011-04-11 18:02:23 PDT
The last patch that finish all common cases.
Comment 1 Xiaomei Ji 2011-04-11 18:07:23 PDT
Created attachment 89134 [details]
work in progress

need more test cases.
Comment 2 Ryosuke Niwa 2011-04-11 18:18:10 PDT
Comment on attachment 89134 [details]
work in progress

View in context: https://bugs.webkit.org/attachment.cgi?id=89134&action=review

> Source/WebCore/editing/visible_units.cpp:1381
> +static VisiblePosition orderedLastWordBounaryInBox(const InlineBox* box, int offset, TextDirection blockDirection)

What does "ordered last word boundary" mean?
Comment 3 Xiaomei Ji 2011-04-12 10:56:35 PDT
Comment on attachment 89134 [details]
work in progress

View in context: https://bugs.webkit.org/attachment.cgi?id=89134&action=review

>> Source/WebCore/editing/visible_units.cpp:1381
>> +static VisiblePosition orderedLastWordBounaryInBox(const InlineBox* box, int offset, TextDirection blockDirection)
> 
> What does "ordered last word boundary" mean?

Needs comments here.

Word boundaries are visually ordered. It is ordered from right to left for LTR block, and ordered from left to right for RTL block. (as comments in nextWordBreakInBoxInsideBlockWithDifferentDirectionality and previousWordBreakInBoxInsideBlockWithSameDirectionality). 
this returns the last word boundary in this ordered list.
Comment 4 Ryosuke Niwa 2011-04-12 11:26:51 PDT
Comment on attachment 89134 [details]
work in progress

View in context: https://bugs.webkit.org/attachment.cgi?id=89134&action=review

>>> Source/WebCore/editing/visible_units.cpp:1381
>>> +static VisiblePosition orderedLastWordBounaryInBox(const InlineBox* box, int offset, TextDirection blockDirection)
>> 
>> What does "ordered last word boundary" mean?
> 
> Needs comments here.
> 
> Word boundaries are visually ordered. It is ordered from right to left for LTR block, and ordered from left to right for RTL block. (as comments in nextWordBreakInBoxInsideBlockWithDifferentDirectionality and previousWordBreakInBoxInsideBlockWithSameDirectionality). 
> this returns the last word boundary in this ordered list.

Oh, in that case, I don't think you need include the word "ordered" at all. Just say lastVisualWordBoundaryInBox or visuallyLastWordBoundaryInBox.
Comment 5 Xiaomei Ji 2011-04-13 18:42:41 PDT
Created attachment 89513 [details]
patch w/ layout test
Comment 6 Ryosuke Niwa 2011-04-14 10:40:36 PDT
Comment on attachment 89513 [details]
patch w/ layout test

View in context: https://bugs.webkit.org/attachment.cgi?id=89513&action=review

yay! we're almost there.

> Source/WebCore/ChangeLog:14
> +           and the direction of the movement, looking for visually adjacent word breaks.

Nit: s/looking for/look for/

> Source/WebCore/ChangeLog:15
> +        2.1 If the offset is the lestmost offset of the box,

Do you mean the minimum offset of the box?

> Source/WebCore/ChangeLog:21
> +            based on the directionality of the box. If the word break position is also inside
> +            the box, just return it. Otherwise, collect the word breaks inside the box and

I don't think this part makes sense. Why would you collect word breaks inside the box when there are no word break inside the box?  I don't think you meant this.

> Source/WebCore/ChangeLog:36
> +        2. Only boundaries of the right kind (i.e. for LTR block, the boundaries where the word
> +           is to the right and space is to the left, based on Firefox's behavior) are collected.

I don't really understand the sentence inside the paragraph.

> Source/WebCore/ChangeLog:39
> +           for LTR box, word boundaries are collected from visually right to visually left;

I think it's better to say "word boundaries are collected from right to left visually in a LTR box"

> Source/WebCore/ChangeLog:45
> +        3. Same reason as above, to avoid manually check for space character 
> +           (which I do not think we should), some of the codes are inefficient. 

What is same reason as above?

> Source/WebCore/ChangeLog:49
> +           For example, given a LTR box in LTR block, if the movement is right, 
> +           nextWordPosition() returns the boundary of the wrong kind, 
> +           nextWordPosition()'s nextWordPosition()'s previousWordPosition() returns the right 
> +           kind. 

I don't think calling something "wrong" and "right" is descriptive.

> Source/WebCore/ChangeLog:51
> +        4. When collect word breaks inside a box, it first computes a start position, then

Nit: when s/collect word breaks/collecting word breaks/

> Source/WebCore/ChangeLog:52
> +           collect the right kind of word breaks until reach the end (or beyond) the box.

Nit: s/until reach the end/until it reaches the end of/

> Source/WebCore/ChangeLog:55
> +           have not consider the box's bidi level, which I think they probably should.

Nit: s/have not consider/does not consider/

I don't think "which I think they probably should" adds any value here.  If anything, you should file a bug instead.

> Source/WebCore/editing/visible_units.cpp:-1283
> -    
> -    if (wordBreak == previousWordBreak) {
> -        isLastWordBreakInBox = true;
> -        return VisiblePosition();
> -    }
> -
> -

This seems to imply that isLastWordBreakInBox is never true in collectWordBreaksInBoxInsideBlockWithDifferntDirectionality.  We should get rid of the argument altogether. r- because of this.

> Source/WebCore/editing/visible_units.cpp:1367
> +static VisiblePosition nextWordBoundaryInBox(const InlineBox* box, int offset)
> +{

If you make the following two changes, then this function becomes identical to previousWordBoundaryInBox except that this one calls nextWordBreakInBoxInsideBlockWithDifferentDirectionality whereas previousWordBoundaryInBox calls previousWordBreakInBoxInsideBlockWithSameDirectionality. Maybe we should combine these two functions? But naming them properly will be tricky.  Maybe findWordBoundaryInBox?

> Source/WebCore/editing/visible_units.cpp:1374
> +        if (wordBreak.isNotNull() && (offset == invalidOffset || offsetOfWordBreak != offset))
> +            return wordBreak;

It seems to be that we should break when wordBreak is null.

> Source/WebCore/editing/visible_units.cpp:1376
> +        if (isLastWordBreakInBox)
> +            break;

Ditto; isLastWordBreakInBox is never true.

> Source/WebCore/editing/visible_units.cpp:1381
> +static VisiblePosition visuallyLastWordBoundaryInBox(const InlineBox* box, int offset, TextDirection blockDirection)

I don't think visually last is a good name.  It gave me a wrong impression that it's at either edge of a box (i.e. rightmost word boundary in LTR box and leftmost word boundary in RTL box).  I think visuallyPreviousWordBondaryInBox will be more appropriate.

> Source/WebCore/editing/visible_units.cpp:1455
> +            if (!box->isLeftToRightDirection())

I'd prefer to have if (box->isLeftToRightDirection()) here and exchange the order of previousWordBoundaryInBox and nextWordBoundaryInBox.
Comment 7 Xiaomei Ji 2011-04-14 11:47:31 PDT
Comment on attachment 89513 [details]
patch w/ layout test

View in context: https://bugs.webkit.org/attachment.cgi?id=89513&action=review

> Source/WebCore/editing/visible_units.cpp:1277
> +  

it is set as "true" at line 1289. And we need this flag to avoid search word breaks in the box in dead-loop. Please note: last == true does not necessarily mean the last word break returned is null visible position.

> Source/WebCore/editing/visible_units.cpp:1368
> +    int offsetOfWordBreak = 0;

Haha, I split them to make them clear and more consistent with our current function/naming convention. We mostly split such into left/right, previous/next, BoxBlockSameDirectionality/BoxBlockDifferentDirectionality in the current code style.

> Source/WebCore/editing/visible_units.cpp:1377
> +    }        

it will be true when the returned word break is the last in box.

>> Source/WebCore/editing/visible_units.cpp:1381
>> +static VisiblePosition visuallyLastWordBoundaryInBox(const InlineBox* box, int offset, TextDirection blockDirection)
> 
> I don't think visually last is a good name.  It gave me a wrong impression that it's at either edge of a box (i.e. rightmost word boundary in LTR box and leftmost word boundary in RTL box).  I think visuallyPreviousWordBondaryInBox will be more appropriate.

it does return visually last word boundary in the box when offset is invalid or the word break's offset is different from the pass-in offset.
only when the visually last is the same as the pass-in offset, it needs to return the previous one.
Maybe: visuallyLastWordBoundaryInBoxDifferentFromOffset?
Comment 8 Ryosuke Niwa 2011-04-14 12:20:46 PDT
Comment on attachment 89513 [details]
patch w/ layout test

View in context: https://bugs.webkit.org/attachment.cgi?id=89513&action=review

>> Source/WebCore/editing/visible_units.cpp:1277
>> +  
> 
> it is set as "true" at line 1289. And we need this flag to avoid search word breaks in the box in dead-loop. Please note: last == true does not necessarily mean the last word break returned is null visible position.

Ah, ok.

> Source/WebCore/editing/visible_units.cpp:1386
> +        if (offset == invalidOffset || orderedWordBoundaries[orderedWordBoundaries.size() - 1].offsetInInlineBox != offset)

It seems like it's redundant to check against invalidOffset since offsetInInlineBox should never be invalidOffset.

> Source/WebCore/editing/visible_units.cpp:1442
> +        } else 
> +            wordBreak = visuallyLastWordBoundaryInBox(adjacentBox, adjacentBox == box ? offset : invalidOffset, blockDirection);            

If visuallyLastWordBoundaryInBox really returns the visually last word boundary in box, then why is this correct?  Why should we be returning the visually last word boundary when the block's direction is RTL?
Comment 9 Ryosuke Niwa 2011-04-14 12:20:47 PDT
Comment on attachment 89513 [details]
patch w/ layout test

View in context: https://bugs.webkit.org/attachment.cgi?id=89513&action=review

>> Source/WebCore/editing/visible_units.cpp:1277
>> +  
> 
> it is set as "true" at line 1289. And we need this flag to avoid search word breaks in the box in dead-loop. Please note: last == true does not necessarily mean the last word break returned is null visible position.

Ah, ok.

> Source/WebCore/editing/visible_units.cpp:1386
> +        if (offset == invalidOffset || orderedWordBoundaries[orderedWordBoundaries.size() - 1].offsetInInlineBox != offset)

It seems like it's redundant to check against invalidOffset since offsetInInlineBox should never be invalidOffset.

> Source/WebCore/editing/visible_units.cpp:1442
> +        } else 
> +            wordBreak = visuallyLastWordBoundaryInBox(adjacentBox, adjacentBox == box ? offset : invalidOffset, blockDirection);            

If visuallyLastWordBoundaryInBox really returns the visually last word boundary in box, then why is this correct?  Why should we be returning the visually last word boundary when the block's direction is RTL?
Comment 10 Xiaomei Ji 2011-04-14 12:35:17 PDT
Comment on attachment 89513 [details]
patch w/ layout test

View in context: https://bugs.webkit.org/attachment.cgi?id=89513&action=review

>>> Source/WebCore/editing/visible_units.cpp:1386
>>> +        if (offset == invalidOffset || orderedWordBoundaries[orderedWordBoundaries.size() - 1].offsetInInlineBox != offset)
>> 
>> It seems like it's redundant to check against invalidOffset since offsetInInlineBox should never be invalidOffset.
> 
> It seems like it's redundant to check against invalidOffset since offsetInInlineBox should never be invalidOffset.

the check is done only when offset is valid. it returns the last word boundary when offset is invalid (which means the one we returned wont be same as the input caret position) or when the offset is valid but it is not equal to the last word position.

> Source/WebCore/editing/visible_units.cpp:1443
>          if (wordBreak.isNotNull())

When block's direction is RTL, the word boundary are collected from left to right visually. The function looks for left word boundary of the current word, it actually looks for the right-most word boundary of a previous box (or the right-most word boundary of the current box assuming that it is not the same as the word itself).  visually last word boundary in RTL block is the right-most word boundary we are looking for.
Comment 11 Xiaomei Ji 2011-04-14 14:13:53 PDT
Created attachment 89642 [details]
patch w/ layout test
Comment 12 Ryosuke Niwa 2011-04-14 15:15:08 PDT
(In reply to comment #10)
> When block's direction is RTL, the word boundary are collected from left to right visually. The function looks for left word boundary of the current word, it actually looks for the right-most word boundary of a previous box (or the right-most word boundary of the current box assuming that it is not the same as the word itself).  visually last word boundary in RTL block is the right-most word boundary we are looking for.

Ah, I finally understand. leftWordBoundary and rightWordBoundary are called only when we're at the boundary of a box.  We should rename these two functions to signify that fact or add a comment.
Comment 13 Ryosuke Niwa 2011-04-14 15:16:53 PDT
Comment on attachment 89642 [details]
patch w/ layout test

View in context: https://bugs.webkit.org/attachment.cgi?id=89642&action=review

> Source/WebCore/ChangeLog:19
> +        2.3 When offset is inside the box (not at boundaries), first find the previousWordPosition or nextWordPosition

I think you need a line break before or after previousWordPosition.

> Source/WebCore/ChangeLog:21
> +            also inside (not at the bounadries) the same box, return it. 

Since you've already qualified the previous "inside the box", I don't think you need to repeat yourself here; i.e. remove "(not at the boundaries)"

> Source/WebCore/ChangeLog:23
> +            Otherwise (the nextWordPosition/previousWordPosition is not
> +in the same box or is at the box boundary),  collect all the

What happened here??

> Source/WebCore/ChangeLog:33
> +           need to save InlineBox in word boundaries. Instead, the word boundaries are saved
> +           as (VisiblePosition, offset) pair so there is no need to recompute VisiblePosition

I feel like this is too wordy.  Maybe you can rephrase it as "as a pair (VisiblePosition, offset) to avoid recomputing VisiblePosition"

> Source/WebCore/ChangeLog:36
> +           after we calcuated which "offset" is the closest one to the input offset.
> +           (I think recompute VisiblePosition from offset might be the same complexity as
> +            getInlineBoxAndOffset()).

These comments seem to only add noise.

> Source/WebCore/ChangeLog:39
> +        2. Only boundaries of the right kind are collected. By right kind, it means the 
> +           left boundary of a word in LTR block and right boundary of a word in RTL block. 

I think you can put the explanation of right kind in parenthesis following the pattern as in:
We only collect boundaries of the right kind (i.e. left boundary of a word in LTR block and right boundary of a word in RTL block).

> Source/WebCore/ChangeLog:43
> +           So when box directionality is the same as block directionality,
> +           for LTR box, word boundaries are collected from right to left visually;
> +           for RTL box, offsets collected from left to right visually.

As I wrote in my previous review, this sentence doesn't make sense.  You need to rephrase "for LTR box" and "for RTL box".  See my suggestion in the previous review.

> Source/WebCore/ChangeLog:53
> +        3. Because for LTR block, only left boundary of a word is collected (and for RTL block,
> +           only right boundary of a word is collected), to avoid manually check for space character 
> +           (which I do not think we should), some of the codes are inefficient. 
> +           For example, given a LTR box in LTR block, if the movement is left to right, 
> +           nextWordPosition() returns the right boundary of a word, which is not what we want,
> +           nextWordPosition()'s nextWordPosition()'s previousWordPosition() returns
> +           the left boundary of a word, and it is what we want. 

This whole paragraph is very confusing. I'd revise it to something along the like of:

To find the right kinds of word boundaries, we must move back and forth between words
in some situations. For example, if we're moving to the right in a LTR box in LTR block,
we cannot simply return nextWordPosition() because it would return the right boundary
of a word. Instead, we return nextWordPosition()'s nextWordPosition()'s previousWordPosition().

> Source/WebCore/ChangeLog:58
> +           position bases on the directionality of the box and block. Those computation

Nit: s/bases/based/

Nit: These computations do not or This computation does not.

> Source/WebCore/editing/visible_units.cpp:1371
> +    bool isLastWordBreakInBox;
> +    while (true) {

I'd initialize isLastWordBreakInBox to false and do while(!isLastWordBreakInBox).

> Source/WebCore/editing/visible_units.cpp:1376
> +        if (isLastWordBreakInBox)
> +            break;

So that I can get rid of this if statement.

> Source/WebCore/editing/visible_units.cpp:1385
> +    if (orderedWordBoundaries.size()) {

I would do early return instead.
Comment 14 Xiaomei Ji 2011-04-14 15:49:23 PDT
Created attachment 89674 [details]
patch w/ layout test

updated per comments. Thanks for the review!
Comment 15 Ryosuke Niwa 2011-04-14 18:05:14 PDT
Comment on attachment 89674 [details]
patch w/ layout test

View in context: https://bugs.webkit.org/attachment.cgi?id=89674&action=review

(In reply to comment #12)
> (In reply to comment #10)
> > When block's direction is RTL, the word boundary are collected from left to right visually. The function looks for left word boundary of the current word, it actually looks for the right-most word boundary of a previous box (or the right-most word boundary of the current box assuming that it is not the same as the word itself).  visually last word boundary in RTL block is the right-most word boundary we are looking for.
> 
> Ah, I finally understand. leftWordBoundary and rightWordBoundary are called only when we're at the boundary of a box.  We should rename these two functions to signify that fact or add a comment.

Please address this point.

> Source/WebCore/ChangeLog:49
> +           based on the directionality of the box and block. These computations do not consider the 

Nit: should probably say "These computations do not currently consider" to signify the fact they should in the future.

> Source/WebCore/editing/visible_units.cpp:1375
> +    } while (!isLastWordBreakInBox);       

Why do we need to use do-while?  isLastWordBreakInBox is always false in the first iteration.
Comment 16 Xiaomei Ji 2011-04-15 11:31:52 PDT
Committed r83996: <http://trac.webkit.org/changeset/83996>
Comment 17 Ryosuke Niwa 2011-04-18 09:49:31 PDT
The patch was rolled out.
Comment 18 Ryosuke Niwa 2011-04-18 09:58:36 PDT
The test was failing on chromium:
http://test-results.appspot.com/dashboards/flakiness_dashboard.html#showExpectations=true&tests=editing%2Fselection%2Fmove-by-word-visually.html&group=%40ToT%20-%20chromium.org

I can't really make sense of these outputs those.  They even seem like V8 or DRT bug...
Comment 19 Xiaomei Ji 2011-04-20 14:54:56 PDT
Created attachment 90420 [details]
patch w/ layout test

firstRectForCharacterRange in chromium's DRT has bug (https://bugs.webkit.org/show_bug.cgi?id=59021).
removed caretCoordinates() from test since it is not really needed anymore.
Comment 20 Ryosuke Niwa 2011-04-20 15:00:08 PDT
Comment on attachment 90420 [details]
patch w/ layout test

r=me assuming that removing caretCoordinates was all you did.  As Xiaomei clarified, the values computed by caretCoordinates aren't used in the test anymore so it's okay to remove it.
Comment 21 Xiaomei Ji 2011-04-20 15:01:44 PDT
Created attachment 90421 [details]
difference from last patch (attachment 89674 [details] which was committed in r83996) just for reference.
Comment 22 Xiaomei Ji 2011-04-20 15:20:52 PDT
Committed r84425: <http://trac.webkit.org/changeset/84425>