Thanks Yair who reported this. For example, following logical text in LTR context: " abc def hij". The word breaks are: " |abc | def | hij". When cursor is at position "abc |", press ctrl-right-arrow should move cursor to "|def", instead, it moves cursor to "def | hij". The automatic test has a flaw and misses this case. The automatic test uses sel.modify("move", "right", "character") followed by sel.modify("move", "right", "word") to test caret move by word in every character position. But sel.modify("move", "right", "character") actually does not stop at every character position, it bypasses collapsed spaces. For example, when cursor is at "abc |", sel.modify("move", "right", "character") moves cursor to "d|ef". The cause of this failure is in positionBeforeNextWord(), where it checks whether the position itself is a word break, and if it is, this position is used as positionAfterCurrentWord, and position before next word is calculated. But position "abc |" is not considered as position after current word since its previous word break's next word break is at position "abc |", not itself "abc |". I could not think of a better way to solve this except remove those positionBeforeNextWord/positionAfterPreviousWord short-cut.
I have not figured out how to write an auto test for it. Given example: " abc | def", I am not able to set position at offset 9. sel.setPosition(node, 9) will set the position to 8. And I have to test white-space collapse. The code works fine if the style is preserve white spaces.
(In reply to comment #1) > I have not figured out how to write an auto test for it. > Given example: " abc | def", I am not able to set position at offset 9. > sel.setPosition(node, 9) will set the position to 8. What if we added a span at that point?
(In reply to comment #2) > (In reply to comment #1) > > I have not figured out how to write an auto test for it. > > Given example: " abc | def", I am not able to set position at offset 9. > > sel.setPosition(node, 9) will set the position to 8. > > What if we added a span at that point? I doubt it helps. add a span makes what inside a span a separate node. Inside this node, we still would not have a way to set position in those collapsed spaces for testing.
Created attachment 94867 [details] patch w/ layout test > And I have to test white-space collapse. The code works fine if the style is preserve white spaces. Actually, the code did not work completely correct when preserve white spaces. So, I can use them for testing.
Comment on attachment 94867 [details] patch w/ layout test View in context: https://bugs.webkit.org/attachment.cgi?id=94867&action=review > Source/WebCore/ChangeLog:10 > + the one at left or right of current position. It will have a performance hit. By will have a performance hit, do you mean that we'll have a performance regression? > Source/WebCore/editing/visible_units.cpp:1534 > if (box->direction() == blockDirection) { > - if (blockDirection == RTL) > - wordBreak = positionBeforeNextWord(visiblePosition); > - else > + if (blockDirection == LTR) > wordBreak = previousWordPosition(visiblePosition); > } else { > - if (blockDirection == RTL) > - wordBreak = positionAfterPreviousWord(visiblePosition); > - else > + if (blockDirection == LTR) > wordBreak = nextWordPosition(visiblePosition); > } I would put if (box->direction() == blockDirection) .. else .. in if (blockDirection == LTR) instead.
(In reply to comment #5) > (From update of attachment 94867 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=94867&action=review > > > Source/WebCore/ChangeLog:10 > > + the one at left or right of current position. It will have a performance hit. > > By will have a performance hit, do you mean that we'll have a performance regression? Yes. I think it will have performance regression. I think single space is more common, in which, previously, if a position is inside a box, for half of the cases (where we have to look for the right kind of word break, in terms of before or after space, that can not directly get from previousWordBreak or nextWordBreak), we could do nextWordPosition and previousWordPosition to find out its left or right word break. But now, we need to find out all the word breaks inside the box and then find the left or right word break by comparing offsets. I have not figured out a better way to solve this. > > > Source/WebCore/editing/visible_units.cpp:1534 > > if (box->direction() == blockDirection) { > > - if (blockDirection == RTL) > > - wordBreak = positionBeforeNextWord(visiblePosition); > > - else > > + if (blockDirection == LTR) > > wordBreak = previousWordPosition(visiblePosition); > > } else { > > - if (blockDirection == RTL) > > - wordBreak = positionAfterPreviousWord(visiblePosition); > > - else > > + if (blockDirection == LTR) > > wordBreak = nextWordPosition(visiblePosition); > > } > > I would put if (box->direction() == blockDirection) .. else .. in if (blockDirection == LTR) instead. Ah, you right.
Created attachment 95090 [details] patch w/ layout test
Created attachment 95834 [details] patch w/ layout test
Comment on attachment 95834 [details] patch w/ layout test View in context: https://bugs.webkit.org/attachment.cgi?id=95834&action=review > Source/WebCore/ChangeLog:12 > + Remove positionBeforeNextWord and positionAfterPreviousWord short-cuts, which are not > + correct for words separated by multiple spaces and words not separated by space. > + > + For those cases, we will have to collect all the word breaks inside the box and look for > + the one at left or right of current position. Why? Why was old code wrong? And how does removing those calls fix the bug? I'd to see more explanation.
Comment on attachment 95834 [details] patch w/ layout test View in context: https://bugs.webkit.org/attachment.cgi?id=95834&action=review > Source/WebCore/ChangeLog:13 > + The old code was trying to find the right word boundary (before the space or after the space) by using previousWordPosition and nextWordPosition. It was introduced as short-cut to improve performance. But they are not correct functionality wise. Using positionBeforeNextWord() as example, first, it checks whether the current position is one kind (either before the space or after the space, but not the right kind we are looking for) of word break position by checking the current position's previousWordPosition's nextWordPosition is the same position as current position, which is wrong for words separated by multiple spaces. For example, given words A and B separated by 3 continuous spaces "A B", position "A|", "A |", and "A |" should all be considered as word break positions, but not the right ones we are looking for. But the old code only takes care of case "A|", for which, positonBeforeNextWord() returns correct result "A |B". It returns "A B |C" for cases "A |" and "A |", which are wrong. second, The code was assuming calling nextWordPosition by going forward and calling previousWordPosition by going backward returns different positions, which is not true for words not separated by spaces. Given 3 Chinese words "ABC", when cursor is at "A|BC", current position is after current word, and positionBeforeNextWord() still returns current position after calling current position's nextWordPosition's previousWordPosition. It should returns position "AB|C" (ignore the naming accuracy now).
(In reply to comment #10) > (From update of attachment 95834 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=95834&action=review > > > Source/WebCore/ChangeLog:13 > > + > > The old code was trying to find the right word boundary (before the space or after the space) by using previousWordPosition and nextWordPosition. It was introduced as short-cut to improve performance. But they are not correct functionality wise. > > Using positionBeforeNextWord() as example, > > first, it checks whether the current position is one kind (either before the space or after the space, but not the right kind we are looking for) of word break position by checking the current position's previousWordPosition's nextWordPosition is the same position as current position, which is wrong for words separated by multiple spaces. For example, given words A and B separated by 3 continuous spaces "A B", position "A|", "A |", and "A |" should all be considered as word break positions, but not the right ones we are looking for. But the old code only takes care of case "A|", for which, positonBeforeNextWord() returns correct result "A |B". It returns "A B |C" for cases "A |" and "A |", which are wrong. I see. In the case of "A |", do we hit positionAfterCurrentWord = nextWordPosition(position); and then positionAfterNextWord = nextWordPosition(positionAfterCurrentWord) ? > second, The code was assuming calling nextWordPosition by going forward and calling previousWordPosition by going backward returns different positions, which is not true for words not separated by spaces. Given 3 Chinese words "ABC", when cursor is at "A|BC", current position is after current word, and positionBeforeNextWord() still returns current position after calling current position's nextWordPosition's previousWordPosition. It should returns position "AB|C" (ignore the naming accuracy now). Ah, I see. It'll be nice to summarize this explanation and put it in ChangeLog.
(In reply to comment #11) > (In reply to comment #10) > > (From update of attachment 95834 [details] [details]) > > View in context: https://bugs.webkit.org/attachment.cgi?id=95834&action=review > > > > > Source/WebCore/ChangeLog:13 > > > + > > > > The old code was trying to find the right word boundary (before the space or after the space) by using previousWordPosition and nextWordPosition. It was introduced as short-cut to improve performance. But they are not correct functionality wise. > > > > Using positionBeforeNextWord() as example, > > > > first, it checks whether the current position is one kind (either before the space or after the space, but not the right kind we are looking for) of word break position by checking the current position's previousWordPosition's nextWordPosition is the same position as current position, which is wrong for words separated by multiple spaces. For example, given words A and B separated by 3 continuous spaces "A B", position "A|", "A |", and "A |" should all be considered as word break positions, but not the right ones we are looking for. But the old code only takes care of case "A|", for which, positonBeforeNextWord() returns correct result "A |B". It returns "A B |C" for cases "A |" and "A |", which are wrong. > > I see. In the case of "A |", do we hit positionAfterCurrentWord = nextWordPosition(position); and then positionAfterNextWord = nextWordPosition(positionAfterCurrentWord) ? Yes. That is right. > > > second, The code was assuming calling nextWordPosition by going forward and calling previousWordPosition by going backward returns different positions, which is not true for words not separated by spaces. Given 3 Chinese words "ABC", when cursor is at "A|BC", current position is after current word, and positionBeforeNextWord() still returns current position after calling current position's nextWordPosition's previousWordPosition. It should returns position "AB|C" (ignore the naming accuracy now). > > Ah, I see. > > It'll be nice to summarize this explanation and put it in ChangeLog. Sure. Will do.
Created attachment 96316 [details] patch w/ layout test
Comment on attachment 96316 [details] patch w/ layout test View in context: https://bugs.webkit.org/attachment.cgi?id=96316&action=review > Source/WebCore/ChangeLog:13 > + Using positionBeforeNextWord() as example, I don't think this is a complete sentence. Maybe "Consider positionBeforeNextWord() for example."? > Source/WebCore/ChangeLog:17 > + wrong for words separated by multiple spaces. For example, given words A and B separated by You should probably state why it's wrong. > Source/WebCore/ChangeLog:20 > + position after current word. But positionBeforeNextWord and positionAfterPreviousWord only > + take care of case "A|". Instead of what positionBeforeNextWord can do, you should state what it gets wrong.
Created attachment 96330 [details] patch w/ layout test (In reply to comment #14) > (From update of attachment 96316 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=96316&action=review > > > Source/WebCore/ChangeLog:17 > > + wrong for words separated by multiple spaces. For example, given words A and B separated by > > You should probably state why it's wrong. I used an example to explain since I feel it is not easy to explain without an example. > > > Source/WebCore/ChangeLog:20 > > + position after current word. But positionBeforeNextWord and positionAfterPreviousWord only > > + take care of case "A|". > > Instead of what positionBeforeNextWord can do, you should state what it gets wrong. How about? For example, given words A and B separated by 3 continuous spaces "A B", position "A|", "A |", and "A |" should all be considered as position after current word. But for position "A |", its previousWordPosition's nextWordPosition is position "A|", which is different from its current position, so the current position is not considered as a position after current word, consequently, instead of returning the right position as "A |B", positionBeforeNextWord returns the position before next next word, as "A B |C". Similar happens for position "A |".
(In reply to comment #15) > (In reply to comment #14) > > (From update of attachment 96316 [details] [details]) > > View in context: https://bugs.webkit.org/attachment.cgi?id=96316&action=review > > > > > Source/WebCore/ChangeLog:17 > > > + wrong for words separated by multiple spaces. For example, given words A and B separated by > > > > You should probably state why it's wrong. > > I used an example to explain since I feel it is not easy to explain without an example. > > > > > > Source/WebCore/ChangeLog:20 > > > + position after current word. But positionBeforeNextWord and positionAfterPreviousWord only > > > + take care of case "A|". > > > > Instead of what positionBeforeNextWord can do, you should state what it gets wrong. > > How about? > > For example, given words A and B separated by > 3 continuous spaces "A B", position "A|", "A |", and "A |" should all be considered as > position after current word. But for position "A |", its previousWordPosition's > nextWordPosition is position "A|", which is different from its current position, so the > current position is not considered as a position after current word, consequently, > instead of returning the right position as "A |B", positionBeforeNextWord returns the > position before next next word, as "A B |C". Similar happens for position "A |". Sounds good to me.
Committed r88359: <http://trac.webkit.org/changeset/88359>