61324 – --webkit-visual-word does not work well in words separated by multiple spaces

RESOLVED FIXED 61324

--webkit-visual-word does not work well in words separated by multiple spaces

https://bugs.webkit.org/show_bug.cgi?id=61324

Summary --webkit-visual-word does not work well in words separated by multiple spaces

Xiaomei Ji

Reported 2011-05-23 17:00:21 PDT

Thanks Yair who reported this. For example, following logical text in LTR context: " abc def hij". The word breaks are: " |abc | def | hij". When cursor is at position "abc |", press ctrl-right-arrow should move cursor to "|def", instead, it moves cursor to "def | hij". The automatic test has a flaw and misses this case. The automatic test uses sel.modify("move", "right", "character") followed by sel.modify("move", "right", "word") to test caret move by word in every character position. But sel.modify("move", "right", "character") actually does not stop at every character position, it bypasses collapsed spaces. For example, when cursor is at "abc |", sel.modify("move", "right", "character") moves cursor to "d|ef". The cause of this failure is in positionBeforeNextWord(), where it checks whether the position itself is a word break, and if it is, this position is used as positionAfterCurrentWord, and position before next word is calculated. But position "abc |" is not considered as position after current word since its previous word break's next word break is at position "abc |", not itself "abc |". I could not think of a better way to solve this except remove those positionBeforeNextWord/positionAfterPreviousWord short-cut.

Attachments
patch w/ layout test (20.09 KB, patch) 2011-05-25 15:14 PDT, Xiaomei Ji	no flags	Details Formatted Diff Diff
patch w/ layout test (20.33 KB, patch) 2011-05-26 18:32 PDT, Xiaomei Ji	no flags	Details Formatted Diff Diff
patch w/ layout test (21.82 KB, patch) 2011-06-02 17:09 PDT, Xiaomei Ji	rniwa: review-	Details Formatted Diff Diff
patch w/ layout test (22.87 KB, patch) 2011-06-07 15:35 PDT, Xiaomei Ji	no flags	Details Formatted Diff Diff
patch w/ layout test (23.20 KB, patch) 2011-06-07 17:03 PDT, Xiaomei Ji	rniwa: review+	Details Formatted Diff Diff
Show Obsolete (4) View All Add attachment proposed patch, testcase, etc.

Xiaomei Ji

Comment 1 2011-05-23 18:27:13 PDT

I have not figured out how to write an auto test for it. Given example: " abc | def", I am not able to set position at offset 9. sel.setPosition(node, 9) will set the position to 8. And I have to test white-space collapse. The code works fine if the style is preserve white spaces.

Ryosuke Niwa

Comment 2 2011-05-23 22:03:15 PDT

(In reply to comment #1) > I have not figured out how to write an auto test for it. > Given example: " abc | def", I am not able to set position at offset 9. > sel.setPosition(node, 9) will set the position to 8. What if we added a span at that point?

Xiaomei Ji

Comment 3 2011-05-23 23:12:28 PDT

(In reply to comment #2) > (In reply to comment #1) > > I have not figured out how to write an auto test for it. > > Given example: " abc | def", I am not able to set position at offset 9. > > sel.setPosition(node, 9) will set the position to 8. > > What if we added a span at that point? I doubt it helps. add a span makes what inside a span a separate node. Inside this node, we still would not have a way to set position in those collapsed spaces for testing.

Xiaomei Ji

Comment 4 2011-05-25 15:14:41 PDT

Created attachment 94867 [details] patch w/ layout test > And I have to test white-space collapse. The code works fine if the style is preserve white spaces. Actually, the code did not work completely correct when preserve white spaces. So, I can use them for testing.

Ryosuke Niwa

Comment 5 2011-05-25 18:57:59 PDT

Comment on attachment 94867 [details] patch w/ layout test View in context: https://bugs.webkit.org/attachment.cgi?id=94867&action=review > Source/WebCore/ChangeLog:10 > + the one at left or right of current position. It will have a performance hit. By will have a performance hit, do you mean that we'll have a performance regression? > Source/WebCore/editing/visible_units.cpp:1534 > if (box->direction() == blockDirection) { > - if (blockDirection == RTL) > - wordBreak = positionBeforeNextWord(visiblePosition); > - else > + if (blockDirection == LTR) > wordBreak = previousWordPosition(visiblePosition); > } else { > - if (blockDirection == RTL) > - wordBreak = positionAfterPreviousWord(visiblePosition); > - else > + if (blockDirection == LTR) > wordBreak = nextWordPosition(visiblePosition); > } I would put if (box->direction() == blockDirection) .. else .. in if (blockDirection == LTR) instead.

Xiaomei Ji

Comment 6 2011-05-25 19:13:04 PDT

(In reply to comment #5) > (From update of attachment 94867 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=94867&action=review > > > Source/WebCore/ChangeLog:10 > > + the one at left or right of current position. It will have a performance hit. > > By will have a performance hit, do you mean that we'll have a performance regression? Yes. I think it will have performance regression. I think single space is more common, in which, previously, if a position is inside a box, for half of the cases (where we have to look for the right kind of word break, in terms of before or after space, that can not directly get from previousWordBreak or nextWordBreak), we could do nextWordPosition and previousWordPosition to find out its left or right word break. But now, we need to find out all the word breaks inside the box and then find the left or right word break by comparing offsets. I have not figured out a better way to solve this. > > > Source/WebCore/editing/visible_units.cpp:1534 > > if (box->direction() == blockDirection) { > > - if (blockDirection == RTL) > > - wordBreak = positionBeforeNextWord(visiblePosition); > > - else > > + if (blockDirection == LTR) > > wordBreak = previousWordPosition(visiblePosition); > > } else { > > - if (blockDirection == RTL) > > - wordBreak = positionAfterPreviousWord(visiblePosition); > > - else > > + if (blockDirection == LTR) > > wordBreak = nextWordPosition(visiblePosition); > > } > > I would put if (box->direction() == blockDirection) .. else .. in if (blockDirection == LTR) instead. Ah, you right.

Xiaomei Ji

Comment 7 2011-05-26 18:32:36 PDT

Created attachment 95090 [details] patch w/ layout test

Xiaomei Ji

Comment 8 2011-06-02 17:09:24 PDT

Created attachment 95834 [details] patch w/ layout test

Ryosuke Niwa

Comment 9 2011-06-06 18:10:12 PDT

Comment on attachment 95834 [details] patch w/ layout test View in context: https://bugs.webkit.org/attachment.cgi?id=95834&action=review > Source/WebCore/ChangeLog:12 > + Remove positionBeforeNextWord and positionAfterPreviousWord short-cuts, which are not > + correct for words separated by multiple spaces and words not separated by space. > + > + For those cases, we will have to collect all the word breaks inside the box and look for > + the one at left or right of current position. Why? Why was old code wrong? And how does removing those calls fix the bug? I'd to see more explanation.

Xiaomei Ji

Comment 10 2011-06-06 18:41:20 PDT

Comment on attachment 95834 [details] patch w/ layout test View in context: https://bugs.webkit.org/attachment.cgi?id=95834&action=review > Source/WebCore/ChangeLog:13 > + The old code was trying to find the right word boundary (before the space or after the space) by using previousWordPosition and nextWordPosition. It was introduced as short-cut to improve performance. But they are not correct functionality wise. Using positionBeforeNextWord() as example, first, it checks whether the current position is one kind (either before the space or after the space, but not the right kind we are looking for) of word break position by checking the current position's previousWordPosition's nextWordPosition is the same position as current position, which is wrong for words separated by multiple spaces. For example, given words A and B separated by 3 continuous spaces "A B", position "A|", "A |", and "A |" should all be considered as word break positions, but not the right ones we are looking for. But the old code only takes care of case "A|", for which, positonBeforeNextWord() returns correct result "A |B". It returns "A B |C" for cases "A |" and "A |", which are wrong. second, The code was assuming calling nextWordPosition by going forward and calling previousWordPosition by going backward returns different positions, which is not true for words not separated by spaces. Given 3 Chinese words "ABC", when cursor is at "A|BC", current position is after current word, and positionBeforeNextWord() still returns current position after calling current position's nextWordPosition's previousWordPosition. It should returns position "AB|C" (ignore the naming accuracy now).

Ryosuke Niwa

Comment 11 2011-06-06 18:58:36 PDT

(In reply to comment #10) > (From update of attachment 95834 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=95834&action=review > > > Source/WebCore/ChangeLog:13 > > + > > The old code was trying to find the right word boundary (before the space or after the space) by using previousWordPosition and nextWordPosition. It was introduced as short-cut to improve performance. But they are not correct functionality wise. > > Using positionBeforeNextWord() as example, > > first, it checks whether the current position is one kind (either before the space or after the space, but not the right kind we are looking for) of word break position by checking the current position's previousWordPosition's nextWordPosition is the same position as current position, which is wrong for words separated by multiple spaces. For example, given words A and B separated by 3 continuous spaces "A B", position "A|", "A |", and "A |" should all be considered as word break positions, but not the right ones we are looking for. But the old code only takes care of case "A|", for which, positonBeforeNextWord() returns correct result "A |B". It returns "A B |C" for cases "A |" and "A |", which are wrong. I see. In the case of "A |", do we hit positionAfterCurrentWord = nextWordPosition(position); and then positionAfterNextWord = nextWordPosition(positionAfterCurrentWord) ? > second, The code was assuming calling nextWordPosition by going forward and calling previousWordPosition by going backward returns different positions, which is not true for words not separated by spaces. Given 3 Chinese words "ABC", when cursor is at "A|BC", current position is after current word, and positionBeforeNextWord() still returns current position after calling current position's nextWordPosition's previousWordPosition. It should returns position "AB|C" (ignore the naming accuracy now). Ah, I see. It'll be nice to summarize this explanation and put it in ChangeLog.

Xiaomei Ji

Comment 12 2011-06-06 22:41:43 PDT

(In reply to comment #11) > (In reply to comment #10) > > (From update of attachment 95834 [details] [details]) > > View in context: https://bugs.webkit.org/attachment.cgi?id=95834&action=review > > > > > Source/WebCore/ChangeLog:13 > > > + > > > > The old code was trying to find the right word boundary (before the space or after the space) by using previousWordPosition and nextWordPosition. It was introduced as short-cut to improve performance. But they are not correct functionality wise. > > > > Using positionBeforeNextWord() as example, > > > > first, it checks whether the current position is one kind (either before the space or after the space, but not the right kind we are looking for) of word break position by checking the current position's previousWordPosition's nextWordPosition is the same position as current position, which is wrong for words separated by multiple spaces. For example, given words A and B separated by 3 continuous spaces "A B", position "A|", "A |", and "A |" should all be considered as word break positions, but not the right ones we are looking for. But the old code only takes care of case "A|", for which, positonBeforeNextWord() returns correct result "A |B". It returns "A B |C" for cases "A |" and "A |", which are wrong. > > I see. In the case of "A |", do we hit positionAfterCurrentWord = nextWordPosition(position); and then positionAfterNextWord = nextWordPosition(positionAfterCurrentWord) ? Yes. That is right. > > > second, The code was assuming calling nextWordPosition by going forward and calling previousWordPosition by going backward returns different positions, which is not true for words not separated by spaces. Given 3 Chinese words "ABC", when cursor is at "A|BC", current position is after current word, and positionBeforeNextWord() still returns current position after calling current position's nextWordPosition's previousWordPosition. It should returns position "AB|C" (ignore the naming accuracy now). > > Ah, I see. > > It'll be nice to summarize this explanation and put it in ChangeLog. Sure. Will do.

Xiaomei Ji

Comment 13 2011-06-07 15:35:26 PDT

Created attachment 96316 [details] patch w/ layout test

Ryosuke Niwa

Comment 14 2011-06-07 16:17:40 PDT

Comment on attachment 96316 [details] patch w/ layout test View in context: https://bugs.webkit.org/attachment.cgi?id=96316&action=review > Source/WebCore/ChangeLog:13 > + Using positionBeforeNextWord() as example, I don't think this is a complete sentence. Maybe "Consider positionBeforeNextWord() for example."? > Source/WebCore/ChangeLog:17 > + wrong for words separated by multiple spaces. For example, given words A and B separated by You should probably state why it's wrong. > Source/WebCore/ChangeLog:20 > + position after current word. But positionBeforeNextWord and positionAfterPreviousWord only > + take care of case "A|". Instead of what positionBeforeNextWord can do, you should state what it gets wrong.

Xiaomei Ji

Comment 15 2011-06-07 17:03:14 PDT

Created attachment 96330 [details] patch w/ layout test (In reply to comment #14) > (From update of attachment 96316 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=96316&action=review > > > Source/WebCore/ChangeLog:17 > > + wrong for words separated by multiple spaces. For example, given words A and B separated by > > You should probably state why it's wrong. I used an example to explain since I feel it is not easy to explain without an example. > > > Source/WebCore/ChangeLog:20 > > + position after current word. But positionBeforeNextWord and positionAfterPreviousWord only > > + take care of case "A|". > > Instead of what positionBeforeNextWord can do, you should state what it gets wrong. How about? For example, given words A and B separated by 3 continuous spaces "A B", position "A|", "A |", and "A |" should all be considered as position after current word. But for position "A |", its previousWordPosition's nextWordPosition is position "A|", which is different from its current position, so the current position is not considered as a position after current word, consequently, instead of returning the right position as "A |B", positionBeforeNextWord returns the position before next next word, as "A B |C". Similar happens for position "A |".

Ryosuke Niwa

Comment 16 2011-06-07 18:09:59 PDT

(In reply to comment #15) > (In reply to comment #14) > > (From update of attachment 96316 [details] [details]) > > View in context: https://bugs.webkit.org/attachment.cgi?id=96316&action=review > > > > > Source/WebCore/ChangeLog:17 > > > + wrong for words separated by multiple spaces. For example, given words A and B separated by > > > > You should probably state why it's wrong. > > I used an example to explain since I feel it is not easy to explain without an example. > > > > > > Source/WebCore/ChangeLog:20 > > > + position after current word. But positionBeforeNextWord and positionAfterPreviousWord only > > > + take care of case "A|". > > > > Instead of what positionBeforeNextWord can do, you should state what it gets wrong. > > How about? > > For example, given words A and B separated by > 3 continuous spaces "A B", position "A|", "A |", and "A |" should all be considered as > position after current word. But for position "A |", its previousWordPosition's > nextWordPosition is position "A|", which is different from its current position, so the > current position is not considered as a position after current word, consequently, > instead of returning the right position as "A |B", positionBeforeNextWord returns the > position before next next word, as "A B |C". Similar happens for position "A |". Sounds good to me.

Xiaomei Ji

Comment 17 2011-06-08 11:03:22 PDT

Committed r88359: <http://trac.webkit.org/changeset/88359>

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware Unspecified

OS Unspecified

Product WebKit

Component HTML Editing

Assignee

Nobody

Reported

2011-05-23 17:00 PDT

Modified

2011-06-08 11:03 PDT History

CC List

3 users Show

URL

Keywords

Depends on

Blocks

25298

Dependencies

tree graph