Bug 61324

Summary: --webkit-visual-word does not work well in words separated by multiple spaces
Product: WebKit Reporter: Xiaomei Ji <xji>
Component: HTML EditingAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: playmobil, progame+wk, rniwa
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on:    
Bug Blocks: 25298    
Attachments:
Description Flags
patch w/ layout test
none
patch w/ layout test
none
patch w/ layout test
rniwa: review-
patch w/ layout test
none
patch w/ layout test rniwa: review+

Xiaomei Ji
Reported 2011-05-23 17:00:21 PDT
Thanks Yair who reported this. For example, following logical text in LTR context: " abc def hij". The word breaks are: " |abc | def | hij". When cursor is at position "abc |", press ctrl-right-arrow should move cursor to "|def", instead, it moves cursor to "def | hij". The automatic test has a flaw and misses this case. The automatic test uses sel.modify("move", "right", "character") followed by sel.modify("move", "right", "word") to test caret move by word in every character position. But sel.modify("move", "right", "character") actually does not stop at every character position, it bypasses collapsed spaces. For example, when cursor is at "abc |", sel.modify("move", "right", "character") moves cursor to "d|ef". The cause of this failure is in positionBeforeNextWord(), where it checks whether the position itself is a word break, and if it is, this position is used as positionAfterCurrentWord, and position before next word is calculated. But position "abc |" is not considered as position after current word since its previous word break's next word break is at position "abc |", not itself "abc |". I could not think of a better way to solve this except remove those positionBeforeNextWord/positionAfterPreviousWord short-cut.
Attachments
patch w/ layout test (20.09 KB, patch)
2011-05-25 15:14 PDT, Xiaomei Ji
no flags
patch w/ layout test (20.33 KB, patch)
2011-05-26 18:32 PDT, Xiaomei Ji
no flags
patch w/ layout test (21.82 KB, patch)
2011-06-02 17:09 PDT, Xiaomei Ji
rniwa: review-
patch w/ layout test (22.87 KB, patch)
2011-06-07 15:35 PDT, Xiaomei Ji
no flags
patch w/ layout test (23.20 KB, patch)
2011-06-07 17:03 PDT, Xiaomei Ji
rniwa: review+
Xiaomei Ji
Comment 1 2011-05-23 18:27:13 PDT
I have not figured out how to write an auto test for it. Given example: " abc | def", I am not able to set position at offset 9. sel.setPosition(node, 9) will set the position to 8. And I have to test white-space collapse. The code works fine if the style is preserve white spaces.
Ryosuke Niwa
Comment 2 2011-05-23 22:03:15 PDT
(In reply to comment #1) > I have not figured out how to write an auto test for it. > Given example: " abc | def", I am not able to set position at offset 9. > sel.setPosition(node, 9) will set the position to 8. What if we added a span at that point?
Xiaomei Ji
Comment 3 2011-05-23 23:12:28 PDT
(In reply to comment #2) > (In reply to comment #1) > > I have not figured out how to write an auto test for it. > > Given example: " abc | def", I am not able to set position at offset 9. > > sel.setPosition(node, 9) will set the position to 8. > > What if we added a span at that point? I doubt it helps. add a span makes what inside a span a separate node. Inside this node, we still would not have a way to set position in those collapsed spaces for testing.
Xiaomei Ji
Comment 4 2011-05-25 15:14:41 PDT
Created attachment 94867 [details] patch w/ layout test > And I have to test white-space collapse. The code works fine if the style is preserve white spaces. Actually, the code did not work completely correct when preserve white spaces. So, I can use them for testing.
Ryosuke Niwa
Comment 5 2011-05-25 18:57:59 PDT
Comment on attachment 94867 [details] patch w/ layout test View in context: https://bugs.webkit.org/attachment.cgi?id=94867&action=review > Source/WebCore/ChangeLog:10 > + the one at left or right of current position. It will have a performance hit. By will have a performance hit, do you mean that we'll have a performance regression? > Source/WebCore/editing/visible_units.cpp:1534 > if (box->direction() == blockDirection) { > - if (blockDirection == RTL) > - wordBreak = positionBeforeNextWord(visiblePosition); > - else > + if (blockDirection == LTR) > wordBreak = previousWordPosition(visiblePosition); > } else { > - if (blockDirection == RTL) > - wordBreak = positionAfterPreviousWord(visiblePosition); > - else > + if (blockDirection == LTR) > wordBreak = nextWordPosition(visiblePosition); > } I would put if (box->direction() == blockDirection) .. else .. in if (blockDirection == LTR) instead.
Xiaomei Ji
Comment 6 2011-05-25 19:13:04 PDT
(In reply to comment #5) > (From update of attachment 94867 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=94867&action=review > > > Source/WebCore/ChangeLog:10 > > + the one at left or right of current position. It will have a performance hit. > > By will have a performance hit, do you mean that we'll have a performance regression? Yes. I think it will have performance regression. I think single space is more common, in which, previously, if a position is inside a box, for half of the cases (where we have to look for the right kind of word break, in terms of before or after space, that can not directly get from previousWordBreak or nextWordBreak), we could do nextWordPosition and previousWordPosition to find out its left or right word break. But now, we need to find out all the word breaks inside the box and then find the left or right word break by comparing offsets. I have not figured out a better way to solve this. > > > Source/WebCore/editing/visible_units.cpp:1534 > > if (box->direction() == blockDirection) { > > - if (blockDirection == RTL) > > - wordBreak = positionBeforeNextWord(visiblePosition); > > - else > > + if (blockDirection == LTR) > > wordBreak = previousWordPosition(visiblePosition); > > } else { > > - if (blockDirection == RTL) > > - wordBreak = positionAfterPreviousWord(visiblePosition); > > - else > > + if (blockDirection == LTR) > > wordBreak = nextWordPosition(visiblePosition); > > } > > I would put if (box->direction() == blockDirection) .. else .. in if (blockDirection == LTR) instead. Ah, you right.
Xiaomei Ji
Comment 7 2011-05-26 18:32:36 PDT
Created attachment 95090 [details] patch w/ layout test
Xiaomei Ji
Comment 8 2011-06-02 17:09:24 PDT
Created attachment 95834 [details] patch w/ layout test
Ryosuke Niwa
Comment 9 2011-06-06 18:10:12 PDT
Comment on attachment 95834 [details] patch w/ layout test View in context: https://bugs.webkit.org/attachment.cgi?id=95834&action=review > Source/WebCore/ChangeLog:12 > + Remove positionBeforeNextWord and positionAfterPreviousWord short-cuts, which are not > + correct for words separated by multiple spaces and words not separated by space. > + > + For those cases, we will have to collect all the word breaks inside the box and look for > + the one at left or right of current position. Why? Why was old code wrong? And how does removing those calls fix the bug? I'd to see more explanation.
Xiaomei Ji
Comment 10 2011-06-06 18:41:20 PDT
Comment on attachment 95834 [details] patch w/ layout test View in context: https://bugs.webkit.org/attachment.cgi?id=95834&action=review > Source/WebCore/ChangeLog:13 > + The old code was trying to find the right word boundary (before the space or after the space) by using previousWordPosition and nextWordPosition. It was introduced as short-cut to improve performance. But they are not correct functionality wise. Using positionBeforeNextWord() as example, first, it checks whether the current position is one kind (either before the space or after the space, but not the right kind we are looking for) of word break position by checking the current position's previousWordPosition's nextWordPosition is the same position as current position, which is wrong for words separated by multiple spaces. For example, given words A and B separated by 3 continuous spaces "A B", position "A|", "A |", and "A |" should all be considered as word break positions, but not the right ones we are looking for. But the old code only takes care of case "A|", for which, positonBeforeNextWord() returns correct result "A |B". It returns "A B |C" for cases "A |" and "A |", which are wrong. second, The code was assuming calling nextWordPosition by going forward and calling previousWordPosition by going backward returns different positions, which is not true for words not separated by spaces. Given 3 Chinese words "ABC", when cursor is at "A|BC", current position is after current word, and positionBeforeNextWord() still returns current position after calling current position's nextWordPosition's previousWordPosition. It should returns position "AB|C" (ignore the naming accuracy now).
Ryosuke Niwa
Comment 11 2011-06-06 18:58:36 PDT
(In reply to comment #10) > (From update of attachment 95834 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=95834&action=review > > > Source/WebCore/ChangeLog:13 > > + > > The old code was trying to find the right word boundary (before the space or after the space) by using previousWordPosition and nextWordPosition. It was introduced as short-cut to improve performance. But they are not correct functionality wise. > > Using positionBeforeNextWord() as example, > > first, it checks whether the current position is one kind (either before the space or after the space, but not the right kind we are looking for) of word break position by checking the current position's previousWordPosition's nextWordPosition is the same position as current position, which is wrong for words separated by multiple spaces. For example, given words A and B separated by 3 continuous spaces "A B", position "A|", "A |", and "A |" should all be considered as word break positions, but not the right ones we are looking for. But the old code only takes care of case "A|", for which, positonBeforeNextWord() returns correct result "A |B". It returns "A B |C" for cases "A |" and "A |", which are wrong. I see. In the case of "A |", do we hit positionAfterCurrentWord = nextWordPosition(position); and then positionAfterNextWord = nextWordPosition(positionAfterCurrentWord) ? > second, The code was assuming calling nextWordPosition by going forward and calling previousWordPosition by going backward returns different positions, which is not true for words not separated by spaces. Given 3 Chinese words "ABC", when cursor is at "A|BC", current position is after current word, and positionBeforeNextWord() still returns current position after calling current position's nextWordPosition's previousWordPosition. It should returns position "AB|C" (ignore the naming accuracy now). Ah, I see. It'll be nice to summarize this explanation and put it in ChangeLog.
Xiaomei Ji
Comment 12 2011-06-06 22:41:43 PDT
(In reply to comment #11) > (In reply to comment #10) > > (From update of attachment 95834 [details] [details]) > > View in context: https://bugs.webkit.org/attachment.cgi?id=95834&action=review > > > > > Source/WebCore/ChangeLog:13 > > > + > > > > The old code was trying to find the right word boundary (before the space or after the space) by using previousWordPosition and nextWordPosition. It was introduced as short-cut to improve performance. But they are not correct functionality wise. > > > > Using positionBeforeNextWord() as example, > > > > first, it checks whether the current position is one kind (either before the space or after the space, but not the right kind we are looking for) of word break position by checking the current position's previousWordPosition's nextWordPosition is the same position as current position, which is wrong for words separated by multiple spaces. For example, given words A and B separated by 3 continuous spaces "A B", position "A|", "A |", and "A |" should all be considered as word break positions, but not the right ones we are looking for. But the old code only takes care of case "A|", for which, positonBeforeNextWord() returns correct result "A |B". It returns "A B |C" for cases "A |" and "A |", which are wrong. > > I see. In the case of "A |", do we hit positionAfterCurrentWord = nextWordPosition(position); and then positionAfterNextWord = nextWordPosition(positionAfterCurrentWord) ? Yes. That is right. > > > second, The code was assuming calling nextWordPosition by going forward and calling previousWordPosition by going backward returns different positions, which is not true for words not separated by spaces. Given 3 Chinese words "ABC", when cursor is at "A|BC", current position is after current word, and positionBeforeNextWord() still returns current position after calling current position's nextWordPosition's previousWordPosition. It should returns position "AB|C" (ignore the naming accuracy now). > > Ah, I see. > > It'll be nice to summarize this explanation and put it in ChangeLog. Sure. Will do.
Xiaomei Ji
Comment 13 2011-06-07 15:35:26 PDT
Created attachment 96316 [details] patch w/ layout test
Ryosuke Niwa
Comment 14 2011-06-07 16:17:40 PDT
Comment on attachment 96316 [details] patch w/ layout test View in context: https://bugs.webkit.org/attachment.cgi?id=96316&action=review > Source/WebCore/ChangeLog:13 > + Using positionBeforeNextWord() as example, I don't think this is a complete sentence. Maybe "Consider positionBeforeNextWord() for example."? > Source/WebCore/ChangeLog:17 > + wrong for words separated by multiple spaces. For example, given words A and B separated by You should probably state why it's wrong. > Source/WebCore/ChangeLog:20 > + position after current word. But positionBeforeNextWord and positionAfterPreviousWord only > + take care of case "A|". Instead of what positionBeforeNextWord can do, you should state what it gets wrong.
Xiaomei Ji
Comment 15 2011-06-07 17:03:14 PDT
Created attachment 96330 [details] patch w/ layout test (In reply to comment #14) > (From update of attachment 96316 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=96316&action=review > > > Source/WebCore/ChangeLog:17 > > + wrong for words separated by multiple spaces. For example, given words A and B separated by > > You should probably state why it's wrong. I used an example to explain since I feel it is not easy to explain without an example. > > > Source/WebCore/ChangeLog:20 > > + position after current word. But positionBeforeNextWord and positionAfterPreviousWord only > > + take care of case "A|". > > Instead of what positionBeforeNextWord can do, you should state what it gets wrong. How about? For example, given words A and B separated by 3 continuous spaces "A B", position "A|", "A |", and "A |" should all be considered as position after current word. But for position "A |", its previousWordPosition's nextWordPosition is position "A|", which is different from its current position, so the current position is not considered as a position after current word, consequently, instead of returning the right position as "A |B", positionBeforeNextWord returns the position before next next word, as "A B |C". Similar happens for position "A |".
Ryosuke Niwa
Comment 16 2011-06-07 18:09:59 PDT
(In reply to comment #15) > (In reply to comment #14) > > (From update of attachment 96316 [details] [details]) > > View in context: https://bugs.webkit.org/attachment.cgi?id=96316&action=review > > > > > Source/WebCore/ChangeLog:17 > > > + wrong for words separated by multiple spaces. For example, given words A and B separated by > > > > You should probably state why it's wrong. > > I used an example to explain since I feel it is not easy to explain without an example. > > > > > > Source/WebCore/ChangeLog:20 > > > + position after current word. But positionBeforeNextWord and positionAfterPreviousWord only > > > + take care of case "A|". > > > > Instead of what positionBeforeNextWord can do, you should state what it gets wrong. > > How about? > > For example, given words A and B separated by > 3 continuous spaces "A B", position "A|", "A |", and "A |" should all be considered as > position after current word. But for position "A |", its previousWordPosition's > nextWordPosition is position "A|", which is different from its current position, so the > current position is not considered as a position after current word, consequently, > instead of returning the right position as "A |B", positionBeforeNextWord returns the > position before next next word, as "A B |C". Similar happens for position "A |". Sounds good to me.
Xiaomei Ji
Comment 17 2011-06-08 11:03:22 PDT
Note You need to log in before you can comment on or make changes to this bug.