Bug 61324 - --webkit-visual-word does not work well in words separated by multiple spaces
Summary: --webkit-visual-word does not work well in words separated by multiple spaces
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: HTML Editing (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks: 25298
  Show dependency treegraph
 
Reported: 2011-05-23 17:00 PDT by Xiaomei Ji
Modified: 2011-06-08 11:03 PDT (History)
3 users (show)

See Also:


Attachments
patch w/ layout test (20.09 KB, patch)
2011-05-25 15:14 PDT, Xiaomei Ji
no flags Details | Formatted Diff | Diff
patch w/ layout test (20.33 KB, patch)
2011-05-26 18:32 PDT, Xiaomei Ji
no flags Details | Formatted Diff | Diff
patch w/ layout test (21.82 KB, patch)
2011-06-02 17:09 PDT, Xiaomei Ji
rniwa: review-
Details | Formatted Diff | Diff
patch w/ layout test (22.87 KB, patch)
2011-06-07 15:35 PDT, Xiaomei Ji
no flags Details | Formatted Diff | Diff
patch w/ layout test (23.20 KB, patch)
2011-06-07 17:03 PDT, Xiaomei Ji
rniwa: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Xiaomei Ji 2011-05-23 17:00:21 PDT
Thanks Yair who reported this.

For example, following logical text in LTR context:
"    abc    def    hij".
The word breaks are:
"    |abc |   def |   hij".

When cursor is at position "abc  |", press ctrl-right-arrow should move cursor to "|def", instead, it moves cursor to "def |    hij".

The automatic test has a flaw and misses this case. The automatic test uses sel.modify("move", "right", "character") followed by sel.modify("move", "right", "word") to test caret move by word in every character position. But sel.modify("move", "right", "character") actually does not stop at every character position, it bypasses collapsed spaces. For example, when cursor is at "abc |", sel.modify("move", "right", "character") moves cursor to "d|ef".

The cause of this failure is in positionBeforeNextWord(), where it checks whether the position itself is a word break, and if it is, this position is used as positionAfterCurrentWord, and position before next word is calculated. But position "abc  |" is not considered as position after current word since its previous word break's next word break is at position "abc |", not itself "abc  |".


I could not think of a better way to solve this except remove those positionBeforeNextWord/positionAfterPreviousWord short-cut.
Comment 1 Xiaomei Ji 2011-05-23 18:27:13 PDT
I have not figured out how to write an auto test for it.
Given example: "    abc |   def",  I am not able to set position at offset 9.
sel.setPosition(node, 9) will set the position to 8.

And I have to test white-space collapse. The code works fine if the style is preserve white spaces.
Comment 2 Ryosuke Niwa 2011-05-23 22:03:15 PDT
(In reply to comment #1)
> I have not figured out how to write an auto test for it.
> Given example: "    abc |   def",  I am not able to set position at offset 9.
> sel.setPosition(node, 9) will set the position to 8.

What if we added a span at that point?
Comment 3 Xiaomei Ji 2011-05-23 23:12:28 PDT
(In reply to comment #2)
> (In reply to comment #1)
> > I have not figured out how to write an auto test for it.
> > Given example: "    abc |   def",  I am not able to set position at offset 9.
> > sel.setPosition(node, 9) will set the position to 8.
> 
> What if we added a span at that point?

I doubt it helps. add a span makes what inside a span a separate node. Inside this node, we still would not have a way to set position in those collapsed spaces for testing.
Comment 4 Xiaomei Ji 2011-05-25 15:14:41 PDT
Created attachment 94867 [details]
patch w/ layout test

> And I have to test white-space collapse. The code works fine if the style is preserve white spaces.
Actually, the code did not work completely correct when preserve white spaces. So, I can use them for testing.
Comment 5 Ryosuke Niwa 2011-05-25 18:57:59 PDT
Comment on attachment 94867 [details]
patch w/ layout test

View in context: https://bugs.webkit.org/attachment.cgi?id=94867&action=review

> Source/WebCore/ChangeLog:10
> +        the one at left or right of current position. It will have a performance hit. 

By will have a performance hit, do you mean that we'll have a performance regression?

> Source/WebCore/editing/visible_units.cpp:1534
>      if (box->direction() == blockDirection) {
> -        if (blockDirection == RTL)
> -            wordBreak = positionBeforeNextWord(visiblePosition);
> -        else
> +        if (blockDirection == LTR)
>              wordBreak = previousWordPosition(visiblePosition);
>      } else {
> -        if (blockDirection == RTL)
> -            wordBreak = positionAfterPreviousWord(visiblePosition);
> -        else
> +        if (blockDirection == LTR)
>              wordBreak = nextWordPosition(visiblePosition);
>      }

I would put if (box->direction() == blockDirection) .. else .. in if (blockDirection == LTR) instead.
Comment 6 Xiaomei Ji 2011-05-25 19:13:04 PDT
(In reply to comment #5)
> (From update of attachment 94867 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=94867&action=review
> 
> > Source/WebCore/ChangeLog:10
> > +        the one at left or right of current position. It will have a performance hit. 
> 
> By will have a performance hit, do you mean that we'll have a performance regression?

Yes. I think it will have performance regression.
I think single space is more common, in which, previously, if a position is inside a box,
for half of the cases (where we have to look for the right kind of word break, in terms of before or after space,  that can not directly get from previousWordBreak or nextWordBreak), we could do nextWordPosition and previousWordPosition to find out its left or right word break.
But now, we need to find out all the word breaks inside the box and then find the left or right word break by comparing offsets.

I have not figured out a better way to solve this.

> 
> > Source/WebCore/editing/visible_units.cpp:1534
> >      if (box->direction() == blockDirection) {
> > -        if (blockDirection == RTL)
> > -            wordBreak = positionBeforeNextWord(visiblePosition);
> > -        else
> > +        if (blockDirection == LTR)
> >              wordBreak = previousWordPosition(visiblePosition);
> >      } else {
> > -        if (blockDirection == RTL)
> > -            wordBreak = positionAfterPreviousWord(visiblePosition);
> > -        else
> > +        if (blockDirection == LTR)
> >              wordBreak = nextWordPosition(visiblePosition);
> >      }
> 
> I would put if (box->direction() == blockDirection) .. else .. in if (blockDirection == LTR) instead.

Ah, you right.
Comment 7 Xiaomei Ji 2011-05-26 18:32:36 PDT
Created attachment 95090 [details]
patch w/ layout test
Comment 8 Xiaomei Ji 2011-06-02 17:09:24 PDT
Created attachment 95834 [details]
patch w/ layout test
Comment 9 Ryosuke Niwa 2011-06-06 18:10:12 PDT
Comment on attachment 95834 [details]
patch w/ layout test

View in context: https://bugs.webkit.org/attachment.cgi?id=95834&action=review

> Source/WebCore/ChangeLog:12
> +        Remove positionBeforeNextWord and positionAfterPreviousWord short-cuts, which are not 
> +        correct for words separated by multiple spaces and words not separated by space.
> +
> +        For those cases, we will have to collect all the word breaks inside the box and look for
> +        the one at left or right of current position.

Why?  Why was old code wrong?  And how does removing those calls fix the bug?  I'd to see more explanation.
Comment 10 Xiaomei Ji 2011-06-06 18:41:20 PDT
Comment on attachment 95834 [details]
patch w/ layout test

View in context: https://bugs.webkit.org/attachment.cgi?id=95834&action=review

> Source/WebCore/ChangeLog:13
> +

The old code was trying to find the right word boundary (before the space or after the space) by using previousWordPosition and nextWordPosition. It was introduced as short-cut to improve performance. But they are not correct functionality wise.

Using positionBeforeNextWord() as example, 

first, it checks whether the current position is one kind (either before the space or after the space, but not the right kind we are looking for) of word break position by checking the current position's previousWordPosition's nextWordPosition is the same position as current position, which is wrong for words separated by multiple spaces. For example, given words A and B separated by 3 continuous spaces "A   B", position "A|", "A |", and "A  |" should all be considered as word break positions, but not the right ones we are looking for. But the old code only takes care of case "A|", for which, positonBeforeNextWord() returns correct result "A   |B". It returns "A   B |C" for cases "A |" and "A  |", which are wrong.

second, The code was assuming calling nextWordPosition by going forward and calling previousWordPosition by going backward returns different positions, which is not true for words not separated by spaces. Given 3 Chinese words "ABC", when cursor is at "A|BC", current position is after current word, and positionBeforeNextWord() still returns current position after calling current position's nextWordPosition's previousWordPosition. It should returns position "AB|C" (ignore the naming accuracy now).
Comment 11 Ryosuke Niwa 2011-06-06 18:58:36 PDT
(In reply to comment #10)
> (From update of attachment 95834 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=95834&action=review
> 
> > Source/WebCore/ChangeLog:13
> > +
> 
> The old code was trying to find the right word boundary (before the space or after the space) by using previousWordPosition and nextWordPosition. It was introduced as short-cut to improve performance. But they are not correct functionality wise.
> 
> Using positionBeforeNextWord() as example, 
> 
> first, it checks whether the current position is one kind (either before the space or after the space, but not the right kind we are looking for) of word break position by checking the current position's previousWordPosition's nextWordPosition is the same position as current position, which is wrong for words separated by multiple spaces. For example, given words A and B separated by 3 continuous spaces "A   B", position "A|", "A |", and "A  |" should all be considered as word break positions, but not the right ones we are looking for. But the old code only takes care of case "A|", for which, positonBeforeNextWord() returns correct result "A   |B". It returns "A   B |C" for cases "A |" and "A  |", which are wrong.

I see.  In the case of "A |", do we hit positionAfterCurrentWord = nextWordPosition(position); and then positionAfterNextWord = nextWordPosition(positionAfterCurrentWord) ?

> second, The code was assuming calling nextWordPosition by going forward and calling previousWordPosition by going backward returns different positions, which is not true for words not separated by spaces. Given 3 Chinese words "ABC", when cursor is at "A|BC", current position is after current word, and positionBeforeNextWord() still returns current position after calling current position's nextWordPosition's previousWordPosition. It should returns position "AB|C" (ignore the naming accuracy now).

Ah, I see.

It'll be nice to summarize this explanation and put it in ChangeLog.
Comment 12 Xiaomei Ji 2011-06-06 22:41:43 PDT
(In reply to comment #11)
> (In reply to comment #10)
> > (From update of attachment 95834 [details] [details])
> > View in context: https://bugs.webkit.org/attachment.cgi?id=95834&action=review
> > 
> > > Source/WebCore/ChangeLog:13
> > > +
> > 
> > The old code was trying to find the right word boundary (before the space or after the space) by using previousWordPosition and nextWordPosition. It was introduced as short-cut to improve performance. But they are not correct functionality wise.
> > 
> > Using positionBeforeNextWord() as example, 
> > 
> > first, it checks whether the current position is one kind (either before the space or after the space, but not the right kind we are looking for) of word break position by checking the current position's previousWordPosition's nextWordPosition is the same position as current position, which is wrong for words separated by multiple spaces. For example, given words A and B separated by 3 continuous spaces "A   B", position "A|", "A |", and "A  |" should all be considered as word break positions, but not the right ones we are looking for. But the old code only takes care of case "A|", for which, positonBeforeNextWord() returns correct result "A   |B". It returns "A   B |C" for cases "A |" and "A  |", which are wrong.
> 
> I see.  In the case of "A |", do we hit positionAfterCurrentWord = nextWordPosition(position); and then positionAfterNextWord = nextWordPosition(positionAfterCurrentWord) ?

Yes. That is right.

> 
> > second, The code was assuming calling nextWordPosition by going forward and calling previousWordPosition by going backward returns different positions, which is not true for words not separated by spaces. Given 3 Chinese words "ABC", when cursor is at "A|BC", current position is after current word, and positionBeforeNextWord() still returns current position after calling current position's nextWordPosition's previousWordPosition. It should returns position "AB|C" (ignore the naming accuracy now).
> 
> Ah, I see.
> 
> It'll be nice to summarize this explanation and put it in ChangeLog.

Sure. Will do.
Comment 13 Xiaomei Ji 2011-06-07 15:35:26 PDT
Created attachment 96316 [details]
patch w/ layout test
Comment 14 Ryosuke Niwa 2011-06-07 16:17:40 PDT
Comment on attachment 96316 [details]
patch w/ layout test

View in context: https://bugs.webkit.org/attachment.cgi?id=96316&action=review

> Source/WebCore/ChangeLog:13
> +        Using positionBeforeNextWord() as example, 

I don't think this is a complete sentence.  Maybe "Consider positionBeforeNextWord() for example."?

> Source/WebCore/ChangeLog:17
> +        wrong for words separated by multiple spaces. For example, given words A and B separated by 

You should probably state why it's wrong.

> Source/WebCore/ChangeLog:20
> +        position after current word. But positionBeforeNextWord and positionAfterPreviousWord only 
> +        take care of case "A|".

Instead of what positionBeforeNextWord can do, you should state what it gets wrong.
Comment 15 Xiaomei Ji 2011-06-07 17:03:14 PDT
Created attachment 96330 [details]
patch w/ layout test

(In reply to comment #14)
> (From update of attachment 96316 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=96316&action=review
> 
> > Source/WebCore/ChangeLog:17
> > +        wrong for words separated by multiple spaces. For example, given words A and B separated by 
> 
> You should probably state why it's wrong.

I used an example to explain since I feel it is not easy to explain without an example.

> 
> > Source/WebCore/ChangeLog:20
> > +        position after current word. But positionBeforeNextWord and positionAfterPreviousWord only 
> > +        take care of case "A|".
> 
> Instead of what positionBeforeNextWord can do, you should state what it gets wrong.

How about?

 For example, given words A and B separated by
        3 continuous spaces "A   B", position "A|", "A |", and "A  |" should all be considered as
        position after current word. But for position "A |", its previousWordPosition's
        nextWordPosition is position "A|", which is different from its current position, so the
        current position is not considered as a position after current word, consequently,
        instead of returning the right position as "A   |B", positionBeforeNextWord returns the
        position before next next word, as "A   B |C". Similar happens for position "A  |".
Comment 16 Ryosuke Niwa 2011-06-07 18:09:59 PDT
(In reply to comment #15)
> (In reply to comment #14)
> > (From update of attachment 96316 [details] [details])
> > View in context: https://bugs.webkit.org/attachment.cgi?id=96316&action=review
> > 
> > > Source/WebCore/ChangeLog:17
> > > +        wrong for words separated by multiple spaces. For example, given words A and B separated by 
> > 
> > You should probably state why it's wrong.
> 
> I used an example to explain since I feel it is not easy to explain without an example.
> 
> > 
> > > Source/WebCore/ChangeLog:20
> > > +        position after current word. But positionBeforeNextWord and positionAfterPreviousWord only 
> > > +        take care of case "A|".
> > 
> > Instead of what positionBeforeNextWord can do, you should state what it gets wrong.
> 
> How about?
> 
>  For example, given words A and B separated by
>         3 continuous spaces "A   B", position "A|", "A |", and "A  |" should all be considered as
>         position after current word. But for position "A |", its previousWordPosition's
>         nextWordPosition is position "A|", which is different from its current position, so the
>         current position is not considered as a position after current word, consequently,
>         instead of returning the right position as "A   |B", positionBeforeNextWord returns the
>         position before next next word, as "A   B |C". Similar happens for position "A  |".

Sounds good to me.
Comment 17 Xiaomei Ji 2011-06-08 11:03:22 PDT
Committed r88359: <http://trac.webkit.org/changeset/88359>