one way to go might be: provide a flexible internal API which can return logical word break before or after space depending on passing-in parameter. expose the public API by passing the correct parameter so that the logical word break matches native platform's. --webkit-visual-word should use this internal API, so, some of the cases where word break in a box is collected will not be needed, instead, we could just use logical word break by passing in correct parameter depends on box and block's directionality.
related issue: https://bugs.webkit.org/show_bug.cgi?id=64392
Created attachment 103122 [details] WIP patch, would like some feedback Detailed description of changes: - Extracted functionality of findNextWordFromIndex into findNextWordFromIndexConsideringWordBreakPositioningRelativeToWord (this name is WIP) so that it can be parametrized with whether the word break should be logically before or after the word while maintaining existing functionality. - Added four new search functions to be passed into nextBoundary/previousBoundary to choose where the word break should be. This is what I'd like most feedback on as doing it this way results in some duplicate code. An alternative to adding four new search functions introducing some duplicate code is to add a parameter to each search function; while there is no duplicate code, this would require changing more code (which includes adding parameters that will be unused in search functions endWordBoundary, endSentenceBoundary, and nextSentencePositionBoundary). - In right/leftWordPositionAcrossBoundary, we never need to collect word breaks within a single box; this is an improvement because previously when moving right in an LTR block or left in an RTL block, word breaks would always be collected; this is significant for cases where runs of text of one directionality are very long.
How does this improve the performance? Can you give us some metrics?
We are using logical word break to get the visual word break. due to the way logical word break works (in terms of whether the word break is before or after the space), even when the start position is within the box, we are not able to correctly get the visual word break (in terms of before or after space) by directly using logical word break in 50% of the cases. Using moving left as example, Given a LTR box "abc def|" in LTR block, we can get visually left word break "abc |def" by directly using previousWordPosition(). Given a RTL box "FED CBA|" in LTR block, we can get the visually left word break "FED |CBA" by directly using nextWordPosition(). But given LTR box "abc def|" in RTL block, we are not able to get the visual word break "abc| def" (which is a position after previous word) by directly using previousWordPosition(). (please refer to http://trac.webkit.org/changeset/88359 for more details). Instead, we are collecting all word breaks in the box and compare offset to get the correct visual word break. Similar for a RTL box "FED CBA|" in RTL block. If we provide flexible logical word breaker (in terms of whether it breaks before or after space), we could always use that directly when start position is inside box, instead of collecting word breaks of the box and comparing the offset to get the correct word break for 50% cases. Also, we could use this flexible logical word breaker in some cases when start position is at the boundary.
The question is whether this is really impacting the performance or not. You should use Shirk on Mac or some other means to measure the performance impact of this patch.
Just running visual word movement tests with --repeat & measure timing is good. Make 10+ measurements and compute stdev.
Also, this bug title is too general. With the current title, any patch that tries to improve the performance of --webkit-visual-word should be posted here. That's not ideal. If you have a specific fix that improves the performance measurably, then please file a separate bug as a blocker of this bug.
I still don't understand the benefit of this change. Is there a mesurable difference after this change? Otherwise, it seems to bloats the code pretty badly.
Hi Ryosuke. There is a strong performance improvement with this change. If you consider visual word movement code as doing either: 1) getting the next/previous word break within a box 2) collecting *all* the word breaks in a box using (1) and choosing the correct next word break This patch essentially makes the need for (2) much less frequent. Right now, if we have: |abc |def ... |xyz| (LTR context) for every word movement we'd collect every word break in the box. Now with this patch, we just get the correct next/previous word break in time proportional to the length of the next word (as opposed to the length of the entire box). I'll submit profiling results in a few.
When I profiled the release build of DumpRenderTree with --repeat-each 5 on LayoutTests/editing/selection/move-by-word-visually*, I got the results (only showing relevant part): Before change: 3.8% rightWordPosition 1.9% collectWordBreaksInBox 1.7% rightWordBoundary 0.1% nextBoundary ... After change: 3.4% rightWordPosition 1.7% rightWordBoundary 1.2% collectWordBreaksInBox 0.3% nextBoundary ... Results for leftWordPosition are similar. This makes sense as, before the change, calls to collectWordBreaksInBox occurred when going right in LTR blocks and left in RTL blocks (roughly accounts for half of test cases). Now word breaks don't need to be collected for those cases; but they still need to be collected when the correct word break is at the logical end of line or when moving from one box to another.
(In reply to comment #8) > I still don't understand the benefit of this change. Is there a mesurable difference after this change? Otherwise, it seems to bloats the code pretty badly. This patch provides 4 word search functions depends on the word break positions relative to word (before word or after word). The pro is cleaner code logic, and the con is code duplication (we could make existing function previousWordPositionBoundary call previousWordPositionBoundaryLogicallyAfterWord to reduce the duplication, if we want to fix the logical word break behavior across platform). An alternative is: add one more parameter in nextBoundary(), previousBoundary(), previousWordPositionBounary(), nextWordPositionBoundary(), findNextWordFromIndex(), and other 6 search functions in visible_unit.cpp (including startSentenceBoundary and endSentenceBounary, in which the newly added parameter is just a placeholder). Inside function findNextWordFromIndex(), depends on the passing in parameter, return logical word break before or after word. the word breaker search functions just pass in the IN parameter to findNextWordFromIndex(). the caller of nextBoundary()/previousBoundary() pass in the correct (before word or after word) parameter based on their usage: for visual word break, the value of the parameter depends on the platform, box directionality, and block directionality. For logical word break (previousWordPosition and nextWordPosition), the value of the parameter depends on the platform (editing behavior. related to https://bugs.webkit.org/show_bug.cgi?id=64392). The pro of the alternative: reduce code duplication, and it could be used to solve issue 64392 as well. The con of the alternative: adding placeholder parameters to the already heavily parameter-loaded search functions such as startSentenceBoundary(). We are debating on the approaches. Van prefer the approach in the patch, and I prefer the alternative. But we believe neither is ideal. Would appreciate your inputs! One more note, the uploaded patch provides findNextWordFromIndexConsideringWordBreakPositioningRelativeToWord() as a cross-platform function, the existing findNextWordFromIndex() is platform-dependent.
(In reply to comment #10) > When I profiled the release build of DumpRenderTree with --repeat-each 5 on LayoutTests/editing/selection/move-by-word-visually*, I got the results (only showing relevant part): > > Before change: > 3.8% rightWordPosition > 1.9% collectWordBreaksInBox > 1.7% rightWordBoundary > 0.1% nextBoundary > ... > > After change: > 3.4% rightWordPosition > 1.7% rightWordBoundary > 1.2% collectWordBreaksInBox > 0.3% nextBoundary > ... > > Results for leftWordPosition are similar. This makes sense as, before the change, calls to collectWordBreaksInBox occurred when going right in LTR blocks and left in RTL blocks (roughly accounts for half of test cases). Now word breaks don't need to be collected for those cases; but they still need to be collected when the correct word break is at the logical end of line or when moving from one box to another. rightWordBoundary (and leftWordBoundary) could be improved using the same technic. currently, word breaks are collected in the box for 50% of the cases. Then, nextBoundary() and previousBoundary() will take even higher percentage of time, and we can continue work on reducing constructing of VisiblePosition.
Created attachment 103716 [details] Proposed fix
Attachment 103716 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'LayoutTests/ChangeLog', u'LayoutTests/edit..." exit_code: 1 Source/WebCore/platform/text/TextBoundaries.h:33: Code inside a namespace should not be indented. [whitespace/indent] [4] Total errors found: 1 in 5 files If any of these errors are false positives, please file a bug against check-webkit-style.
Created attachment 103717 [details] Proposed fix Fixed style
Comment on attachment 103717 [details] Proposed fix Attachment 103717 [details] did not pass qt-ews (qt): Output: http://queues.webkit.org/results/9358439
The measured performance improvements don't seem particularly large. Also, our problem is with calling these functions many times for each operation, not necessarily with them being slow. I think that prior to adding new code, we should better define the concepts it works with. Different ports return drastically different results for word boundaries, and considering how basic these operations are, that causes really ugly bugs (see e.g. bug 65898 or bug 65999). While we cannot even agree upon what word boundaries are, adding "word boundaries considering word break positioning" seems counter-productive.
I agree with Alexey's sentiment here. We should try investigating different approaches to see which optimization pays off the most. When I did optimization in editing, the improvement I saw were in the order of 1-2% in release and 30-40% in debug. Anything below 1% seems ineffective at best.
(In reply to comment #17) > The measured performance improvements don't seem particularly large. Also, our problem is with calling these functions many times for each operation, not necessarily with them being slow. Yes. you are right. The problem is calling these functions many times. And the improvement is to decrease the number of calls. When we compute visual word breaks, we are calling logical word break many times because logical word break does not return the correct (in terms of before or after space) word break position visual word breaker is looking for. For example, given a string "abc def hij", nextWordPosition() returns the following position "abc| def| hij", and previousWordPosition() returns the following position "|abc |def |hij". when start from beginning and move left to right, if the visual word break I am looking for is "abc |def |hij", then, I am not able to use nextWordPosition(). Instead, I collected the word breaks inside this box starting from end of the string and using previousWordPosition(), the collected word breaks are "abc def |hij", then, "abc |def hij", then, "|abc def hij". And by comparing the word break offset and current position offset, return "abc |def hij" as the visual word break. The proposed improvement is to provide an *internal* API which can return flexible logical word break in terms of space. Given the above example, if there is an *internal* API that returns "abc |def |hij" as logical next word breaker, then, I can use that to get the visual word breaker in one call. > > I think that prior to adding new code, we should better define the concepts it works with. Different ports return drastically different results for word boundaries, and considering how basic these operations are, that causes really ugly bugs (see e.g. bug 65898 or bug 65999). > While we cannot even agree upon what word boundaries are, adding "word boundaries considering word break positioning" seems counter-productive.
(In reply to comment #19) > The proposed improvement is to provide an *internal* API which can return flexible logical word break in terms of space. Given the above example, if there is an *internal* API that returns "abc |def |hij" as logical next word breaker, then, I can use that to get the visual word breaker in one call. Regardless of how logical and rationale the improvement is, I don't think 0.4% performance improvement justifies the maintenance burden introduced by doubling the number of functions. For this to make sense, the performance improvement needs to be much more substantial.
(In reply to comment #18) > I agree with Alexey's sentiment here. We should try investigating different approaches to see which optimization pays off the most. When I did optimization in editing, the improvement I saw were in the order of 1-2% in release and 30-40% in debug. Anything below 1% seems ineffective at best. when you say 1-2% and 30-40%, are you talking about relative or absolute numbers? For example, rightWordPosition from 3.8% to 3.4%, is the improvement 0.4% or ~10% (0.4/3.8)? collectWordbreakInBox has 30%+ gain (0.7/1.9). And I believe it and rightWordBoundary should improve further if we decrease the collectWordBreakInBox calls in rightWordBoundary. I think this is the 1st step of improvement (reduce the number of calls to nextBoundary()), after which, the cost of nextBoundary() will be more evident, and that is the next step for improvement.
I should have clarified when I posted the profiling results the percentages are relative to running DRT; so originally 3.8% of the time was spent in rightWordPosition and after the change, 3.4%. This can be approximately seen as a ~10% speedup as Xiaomei pointed out.
I don't think that we should focus much on performance of ad hoc tests. Move by word is not a very common editing operation.
(In reply to comment #23) > I don't think that we should focus much on performance of ad hoc tests. Move by word is not a very common editing operation. hm... shift-arrow (alt-arrow in Mac) should move caret by word visually is the top #1 issue reported by chrome native users.
Clarification on the patch: The overall performance speedup is 0.8% (going left and right) when running in DRT; about 7.5% is spent in visual word movement code. From within visual word movement code, however, this is about a 10% speedup; this might be improved to 15-20% if we more aggressively utilize this API within rightWordBoundary and leftWordBoundary. A secondary (probably more significant) purpose of this patch is that it will make future work much simpler to implement, most directly the Mac/Linux version (https://bugs.webkit.org/show_bug.cgi?id=65773). I've implemented a working Mac/Linux version but did not submit it for review just because it is not very understandable. I think with this change it would be much easier to implement and maintain.
Created attachment 103963 [details] Proposed fix
Attachment 103963 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'LayoutTests/ChangeLog', u'LayoutTests/edit..." exit_code: 1 Source/WebCore/platform/text/TextBoundaries.h:33: Code inside a namespace should not be indented. [whitespace/indent] [4] Source/WebCore/platform/text/TextBoundaries.h:45: The parameter name "wordBreakPositioning" adds no information, so it should be removed. [readability/parameter_name] [5] Source/WebCore/platform/text/brew/TextBoundariesBrew.cpp:39: Weird number of spaces at line-start. Are you using a 4-space indent? [whitespace/indent] [3] Source/WebCore/platform/text/brew/TextBoundariesBrew.cpp:40: Weird number of spaces at line-start. Are you using a 4-space indent? [whitespace/indent] [3] Total errors found: 4 in 7 files If any of these errors are false positives, please file a bug against check-webkit-style.
Created attachment 103969 [details] Proposed fix
Comment on attachment 103969 [details] Proposed fix View in context: https://bugs.webkit.org/attachment.cgi?id=103969&action=review > Source/WebCore/ChangeLog:21 > + (WebCore::previousWordPositionBoundaryLogicallyBeforeWord): > + (WebCore::previousWordPositionBoundaryLogicallyAfterWord): > + (WebCore::nextWordPositionBoundaryLogicallyBeforeWord): > + (WebCore::nextWordPositionBoundaryLogicallyAfterWord): > + (WebCore::leftWordPositionAcrossBoundary): > + (WebCore::rightWordPositionAcrossBoundary): I cannot really figure out from these function names what they are doing. Does "logically" have something to do with RTL languages? What is a "position across boundary"?
(In reply to comment #29) > (From update of attachment 103969 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=103969&action=review > > > Source/WebCore/ChangeLog:21 > > + (WebCore::previousWordPositionBoundaryLogicallyBeforeWord): > > + (WebCore::previousWordPositionBoundaryLogicallyAfterWord): > > + (WebCore::nextWordPositionBoundaryLogicallyBeforeWord): > > + (WebCore::nextWordPositionBoundaryLogicallyAfterWord): > > + (WebCore::leftWordPositionAcrossBoundary): > > + (WebCore::rightWordPositionAcrossBoundary): > > I cannot really figure out from these function names what they are doing. Does "logically" have something to do with RTL languages? What is a "position across boundary"? I'm aiming to specify where word breaks should be. So "logically after word" means those boundaries that follow words: In LTR text: abc| def| RTL: |FED |CBA The usage of these functions is that one is passed into nextBoundary/previousBoundary to specify where the word break should be; it would mean the difference between: (calling nextBoundary) |abc def -> abc| def and |abc def -> abc |def Currently when calling nextBoundary, there is only one clear function to pass in at the word-movement level and that will always return the word break logically after the word; same goes for previousBoundary except it returns the word break logically before the word. An alternative is to specify visually (previousWordPositionBoundaryOnLeftOfWord, etc.).
(In reply to comment #29) > (From update of attachment 103969 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=103969&action=review > > > Source/WebCore/ChangeLog:21 > > + (WebCore::previousWordPositionBoundaryLogicallyBeforeWord): > > + (WebCore::previousWordPositionBoundaryLogicallyAfterWord): > > + (WebCore::nextWordPositionBoundaryLogicallyBeforeWord): > > + (WebCore::nextWordPositionBoundaryLogicallyAfterWord): > > + (WebCore::leftWordPositionAcrossBoundary): > > + (WebCore::rightWordPositionAcrossBoundary): > > I cannot really figure out from these function names what they are doing. Does "logically" have something to do with RTL languages? What is a "position across boundary"? "position across boundary" is a position does not honor editable boundary. Maybe I should rename it to "right/leftWordPositionAcrossEditableBoundary".
Both names mean that the result will always be across boundary, which is apparently untrue.
I talked with Van on IRC, and the purpose of this patch is apparently not the performance but rather to implement Windows and Linux behaviors. So let's focus on that and not on the superfluous performance improvement.
Comment on attachment 103969 [details] Proposed fix View in context: https://bugs.webkit.org/attachment.cgi?id=103969&action=review Why is there no change in platform/text/mac/TextBoundaries.mm? > Source/WebCore/ChangeLog:19 > + (WebCore::previousWordPositionBoundaryLogicallyBeforeWord): > + (WebCore::previousWordPositionBoundaryLogicallyAfterWord): > + (WebCore::nextWordPositionBoundaryLogicallyBeforeWord): > + (WebCore::nextWordPositionBoundaryLogicallyAfterWord): Before and after are logical concepts so "logically" is redundant. > Source/WebCore/platform/text/TextBoundaries.cpp:112 > +#if !PLATFORM(BREWMP) && !PLATFORM(MAC) && !PLATFORM(QT) We shouldn't be duplicating code in Brew and Qt and leaving Mac behind. r- because of this. > Source/WebCore/platform/text/qt/TextBoundariesQt.cpp:82 > +int findNextWordFromIndex(UChar const* buffer, int len, int position, bool forward) > +{ > + return findNextWordFromIndexConsideringWordBreakPositioningRelativeToWord > + (buffer, len, position, forward, forward ? LogicallyAfterWord : LogicallyBeforeWord); > +} > + What's the point of duplicating the function here?
(In reply to comment #34) > (From update of attachment 103969 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=103969&action=review > > Why is there no change in platform/text/mac/TextBoundaries.mm? > > > Source/WebCore/ChangeLog:19 > > + (WebCore::previousWordPositionBoundaryLogicallyBeforeWord): > > + (WebCore::previousWordPositionBoundaryLogicallyAfterWord): > > + (WebCore::nextWordPositionBoundaryLogicallyBeforeWord): > > + (WebCore::nextWordPositionBoundaryLogicallyAfterWord): As far as visual word movement goes it seemed like the implementation in TextBoundaries.cpp works on Mac; but to make the change uniform across platforms I'll add the duplication. > > Before and after are logical concepts so "logically" is redundant. > > > Source/WebCore/platform/text/TextBoundaries.cpp:112 > > +#if !PLATFORM(BREWMP) && !PLATFORM(MAC) && !PLATFORM(QT) > > We shouldn't be duplicating code in Brew and Qt and leaving Mac behind. r- because of this. > > > Source/WebCore/platform/text/qt/TextBoundariesQt.cpp:82 > > +int findNextWordFromIndex(UChar const* buffer, int len, int position, bool forward) > > +{ > > + return findNextWordFromIndexConsideringWordBreakPositioningRelativeToWord > > + (buffer, len, position, forward, forward ? LogicallyAfterWord : LogicallyBeforeWord); > > +} > > + > > What's the point of duplicating the function here? Good catch; there only needs to be one copy of findNextWordFromIndex that delegates to the platform's findNextWordFromIndexConsideringWordBreakPositioningRelativeToWord. Thanks for the review.
Created attachment 104254 [details] Proposed fix Modified patch following review and comments. Would like feedback on my changes in TextBoundaries.mm (not sure if my decision to use TextBreakIterator is correct).
I don't think I'm qualified to review this patch. ap, darin, or eseidel should do.
I think Dan also understand the issues here, perhaps better than I do.
I have commented about function names above, and those comments still stand. This patch adds a good deal of confusion to an already confused area of code.
(In reply to comment #39) > I have commented about function names above, and those comments still stand. This patch adds a good deal of confusion to an already confused area of code. Hi Alexey, I understand that the naming of right/leftWordPositionAcrossBoundary should change but thought that the renaming and several other stylistic to-dos would best be addressed in a separate cleanup patch. I also understand that this isn't a clean patch. I've considered an alternative to implement this based on the observation: You can chain calls to nextBoundary and previousBoundary to get the correct position. For example in the common case, you can get the next word break before word in LTR text by calling previousBoundary(nextBoundary(nextBoundary(currentVisiblePosition, nextWordPositionBoundary), nextWordPositionBoundary), previousWordPositionBoundary). While this avoids platform-level changes there are many cases to consider especially in bidi text and it is slower. I'm open to ideas on how best to implement this functionality as it's blocking https://bugs.webkit.org/show_bug.cgi?id=65773. Thanks
Created attachment 104432 [details] Proposed (alternative) fix
(In reply to comment #41) > Created an attachment (id=104432) [details] > Proposed (alternative) fix This seems the right fix for the code removed in http://trac.webkit.org/changeset/88359.
Comment on attachment 104432 [details] Proposed (alternative) fix View in context: https://bugs.webkit.org/attachment.cgi?id=104432&action=review > Source/WebCore/editing/visible_units.cpp:1591 > +static VisiblePosition nextWordBoundaryBeforeWord(VisiblePosition startingPosition, InlineBox* startingBox, int startingOffset) Why do we ned startingBox and startingOffset?
Comment on attachment 104432 [details] Proposed (alternative) fix View in context: https://bugs.webkit.org/attachment.cgi?id=104432&action=review > Source/WebCore/editing/visible_units.cpp:1601 > + if (currentOffset > startingOffset) should we checking (currentBox == startingBox &&....)? > Source/WebCore/editing/visible_units.cpp:1624 > + return prevNext; ditto
(In reply to comment #43) > (From update of attachment 104432 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=104432&action=review > > > Source/WebCore/editing/visible_units.cpp:1591 > > +static VisiblePosition nextWordBoundaryBeforeWord(VisiblePosition startingPosition, InlineBox* startingBox, int startingOffset) > > Why do we ned startingBox and startingOffset? We could obtain these from the VisiblePosition passed in but since these two functions are called from right/leftWordPositionIgnoringEditingBoundary in which we've already called getInlineBoxAndOffset, I chose to just pass in the box and offset instead of recalculating. The actual box and offset are used to check if the result actually makes sense; since these two functions rely on next/previousBoundary which are logical functions, if the result is outside the box there's no guarantee that there's visual continuity.
(In reply to comment #44) > (From update of attachment 104432 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=104432&action=review > > > Source/WebCore/editing/visible_units.cpp:1601 > > + if (currentOffset > startingOffset) > > should we checking (currentBox == startingBox &&....)? > > > Source/WebCore/editing/visible_units.cpp:1624 > > + return prevNext; > > ditto If we exit early because of this check then (In reply to comment #44) > (From update of attachment 104432 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=104432&action=review > > > Source/WebCore/editing/visible_units.cpp:1601 > > + if (currentOffset > startingOffset) > > should we checking (currentBox == startingBox &&....)? > > > Source/WebCore/editing/visible_units.cpp:1624 > > + return prevNext; > > ditto Right. I didn't catch this as the return values of these two functions are checked using positionIsInBoxButNotOnBoundary right after being called. But we should never return a position outside of startingBox to be safe. I'll resubmit with the additional check.
Created attachment 104591 [details] Revised (alternative) fix Corrected previous patch following Xiaomei's comment.
Comment on attachment 104591 [details] Revised (alternative) fix View in context: https://bugs.webkit.org/attachment.cgi?id=104591&action=review > Source/WebCore/editing/visible_units.cpp:1591 > +static VisiblePosition nextWordBoundaryBeforeWord(VisiblePosition startingPosition, InlineBox* startingBox, int startingOffset) This function's interface is quite confusing. Why does this function take BOTH startingPosition and startingBox & startingOffset. I'd rather make an extra call to getInlineBoxAndOffset than requiring callers to pass in the correct arguments. It appears that this function wants to take RenderedPosition as the argument after all. > Source/WebCore/editing/visible_units.cpp:1593 > + VisiblePosition next = nextBoundary(startingPosition, nextWordPositionBoundary); next is a very vague name. In this function, we're concerned about word boundary and on which side of the word boundary we're at. So your variable name should reflect that. nextPrev, nextNextPrev do not convey such information and should therefore be be avoided. > LayoutTests/ChangeLog:8 > + My change fixed this failure. You should explain what has been fixed and why it has been fixed.
Created attachment 105225 [details] Revised (alternative) fix Thanks for the review. Here's my updated patch.
Ping reviewers. Would really appreciate a review for my updated patch as I want to work on 65773.
I'm having fairly hard time trying to understand what the new functions do. Let's look at nextWordBoundaryBeforeWord() for a specific example. What will it return in each of the following cases (vertical line denoting starting position)? te|st test te|st test test |test test test test| test test |test test test | test test +1|0 test +++++++|++++++ test U|.S.A. test "te|st" test
Also, <p>te|st</p><p>test</p> (two words separated by a paragraph break).
(In reply to comment #51) > I'm having fairly hard time trying to understand what the new functions do. > > Let's look at nextWordBoundaryBeforeWord() for a specific example. What will it return in each of the following cases (vertical line denoting starting position)? > > te|st > test te|st test > test |test test > test test| test > test |test test > test | test test > +1|0 > test +++++++|++++++ test > U|.S.A. > test "te|st" test Hi Alexey, nextWordBoundaryBeforeWord is based off the observation that in most cases, the next word boundary from a position somePosition that lies before a word is previousBoundary(nextBoundary(nextBoundary(somePosition, _), _), _); let the underscores be the appropriate search functions passed in (next/previousWordPositionBoundary). Example: Given text "a|bc def ghi" where | indicates the starting position, call nextBoundary: a|bc def ghi -> abc| def ghi call nextBoundary: abc| def ghi -> abc def| ghi call previousBoundary: abc def| ghi -> abc |def ghi This doesn't work when 1) the current position is at the end of a word or 2) when nextBoundary returns the last word boundary in the block. In the latter case, VisiblePosition() is returned to indicate that the desired word break is not in the box of the starting position. As for the cases: te|st -> VisiblePosition() test te|st test -> test test |test test |test test -> test test |test test |test test -> test test |test +1|0 -> VisiblePosition() test +++++++|++++++ test -> test ++++++++|+++++ test U|.S.A. -> VisiblePosition() test "te|st" test -> test "test" |test These are the results in Safari on Mac; results may differ slightly on other platforms as the functions eventually delegate to findNextWordFromIndex in TextBoundaries.
(In reply to comment #52) > Also, <p>te|st</p><p>test</p> (two words separated by a paragraph break). In this case nextWordBoundaryBeforeWord returns VisiblePosition(). Because the two "test" strings are in different blocks (I think this is the correct term) the behavior of the function will work like: call nextBoundary: <p>te|st</p><p>test</p> -> <p>test|</p><p>test</p> call nextBoundary: <p>test|</p><p>test</p> -> <p>test|</p><p>test</p> (same position returned) At this point the code returns VisiblePosition() and the correct word break (at the end of the first line) is computed in another part of visual word movement code.
Some comments: 1. It's important to know not just what function currently does, but what it's supposed to do. There are two ways to describe that in function name - either by telling what exactly it does, or when exactly it's supposed to be called. 2. nextWordBoundaryBeforeWord(VisiblePosition) looks as if it returned next word boundary before word that contained the VisiblePosition, which makes no sense. 3. You changed one of the test cases, so I'm unsure what happens for "test | test test". 4. You are saying that there are platform differences allowed in the results. What differences are OK? Mac version of findNextWordFromIndex() isn't based on ICU, and I don't know how different it can be. In particular, how will we avoid issues like bug 65898 going forward? 5. If implementing Windows style alt-arrow movement is the goal, perhaps the code should make this explicit. Also, being unable to move across paragraphs is incorrect for this use case.
(In reply to comment #55) > Some comments: > 1. It's important to know not just what function currently does, but what it's supposed to do. There are two ways to describe that in function name - either by telling what exactly it does, or when exactly it's supposed to be called. > 2. nextWordBoundaryBeforeWord(VisiblePosition) looks as if it returned next word boundary before word that contained the VisiblePosition, which makes no sense. > 3. You changed one of the test cases, so I'm unsure what happens for "test | test test". > 4. You are saying that there are platform differences allowed in the results. What differences are OK? Mac version of findNextWordFromIndex() isn't based on ICU, and I don't know how different it can be. In particular, how will we avoid issues like bug 65898 going forward? > 5. If implementing Windows style alt-arrow movement is the goal, perhaps the code should make this explicit. Also, being unable to move across paragraphs is incorrect for this use case. 1) I should make it explicit that these functions work within a single box. 2) How's wordBoundaryBeforeNextWord? I think this is more clear. 3) Ah, I overlooked this test case. Here it is: test | test test -> test |test test 4) As far as I know every platform's findNextWordFromIndex returns the word boundary after the next word; the differences that will affect this function are which special non-alphanumeric characters get treated as parts of words. As for bug 65898, isn't this a problem that should be fixed within findWordBoundary and findNextWordFromIndex? Right now "foo bar _baz_" breaks the logical word movement as well (alt left arrow after "_baz_"). 5) The ultimate goal is to implement the Mac style alt-arrow movement (as the Windows version is already implemented); to do this we (Xiaomei and I) planned to return the correct word break based on box directionality, block directionality, movement direction, and editing behavior. Having these two functions would simplify the code a lot. I should point out that these two functions would mainly be used as helper functions within visual word movement code; in the end alt-arrow will move across paragraphs. Thanks for your feedback!
> 3) Ah, I overlooked this test case. Here it is: test | test test -> test |test test This result seems substantially different from others.
(In reply to comment #57) > > 3) Ah, I overlooked this test case. Here it is: test | test test -> test |test test > > This result seems substantially different from others. If you consider all word breaks before words, nextWordBoundaryBeforeWord will return the logically next one after the passed-in position. The behavior of this function on this case aligns with Windows alt-arrow behavior.
(In reply to comment #58) > If you consider all word breaks before words, nextWordBoundaryBeforeWord I think you mean to say “word boundaries” not “word breaks”. And no, I don’t think that the term “word boundary” specifically means the place before a word. We’d need a more-specific term for that. A word boundary would mean both the place before a word and the place after a word.
I think that we need better abstraction for code to be used in cursor movement iterator. For instance, there is no guarantee that Option-left arrow wants the same breaks as Option-delete. Current code is guilty of having lots of specialized functions with generic names in visible_units.cpp and then mistakenly reusing them when subtly different behavior is needed. Adding more of that is a step in a wrong direction.
rniwa told me to add some comments here. Basically I am in the need for a method similar to VisualSelection.expandUsingGranularity(WordGranularity);, but that does not include spaces and never considers spaces to be a word. This is needed for injecting compositions into our virtual keyboard when clicking on a word in an editable field.
Comment on attachment 105225 [details] Revised (alternative) fix This patch is obsolete after xji's rewrite of visual word movement patch.