Bug 114871

Summary: [GTK] Reimplement atk_text_get_text_*_offset for WORD boundaries
Product: WebKit Reporter: Mario Sanchez Prada <mario>
Component: WebKitGTKAssignee: Mario Sanchez Prada <mario>
Status: RESOLVED FIXED    
Severity: Normal CC: aboxhall, apinheiro, cfleizach, commit-queue, dmazzoni, eflews.bot, gyuyoung.kim, jdiggs, mrobinson
Priority: P2 Keywords: Gtk
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 118179, 118908    
Bug Blocks: 114867, 114872, 114873    
Attachments:
Description Flags
Patch proposal
none
Patch proposal
eflews.bot: commit-queue-
Patch proposal
none
Patch proposal plus new Unit tests
none
Patch proposal plus new Unit tests
none
Patch proposal plus new Layout test
none
Patch proposal plus new Unit test mrobinson: review+

Description Mario Sanchez Prada 2013-04-19 06:49:51 PDT
We need to get rid of Pango/Gail dependency, and this is one of the functionalities we would need to fix.
Comment 1 Mario Sanchez Prada 2013-05-31 02:34:57 PDT
Created attachment 203426 [details]
Patch proposal

Attaching a patch proposal to fix this. I've checked this idea with Accerciser and Orca and seems to work fine. It also does not break any (layout or unit) test and I even added some more cases to those to check edge cases we were not checking previously (asking for "text before/after offset" for over the first/last position).

Thus, kindly asking for review. If this patch gets in I'll work on the LINE boundaries bug next :)
Comment 2 Mario Sanchez Prada 2013-05-31 03:28:41 PDT
Comment on attachment 203426 [details]
Patch proposal

View in context: https://bugs.webkit.org/attachment.cgi?id=203426&action=review

> Source/WebCore/accessibility/atk/WebKitAccessibleInterfaceText.cpp:686
> +    Node* node = originalPosition.deepEquivalent().containerNode();
> +    RefPtr<Range> selectedRange = Range::create(node->document(), startPosition.deepEquivalent(), endPosition.deepEquivalent());

Informal auto-review: these two lines should NOT be here. Please ignore them when doing the actual review
Comment 3 Mario Sanchez Prada 2013-05-31 06:30:16 PDT
Created attachment 203438 [details]
Patch proposal

New version of the patch fixing the previous issue and adding some more bits to refactor code related to translate offsets between the WebCore to Atk worlds.

This refactor will also be used by further patches for LINE and SENTENCE boundaries, although they are a win already right now (and increases readability)
Comment 4 EFL EWS Bot 2013-05-31 06:34:41 PDT
Comment on attachment 203438 [details]
Patch proposal

Attachment 203438 [details] did not pass efl-ews (efl):
Output: http://webkit-queues.appspot.com/results/662392
Comment 5 EFL EWS Bot 2013-05-31 06:34:56 PDT
Comment on attachment 203438 [details]
Patch proposal

Attachment 203438 [details] did not pass efl-wk2-ews (efl-wk2):
Output: http://webkit-queues.appspot.com/results/734182
Comment 6 Mario Sanchez Prada 2013-05-31 06:51:58 PDT
Created attachment 203441 [details]
Patch proposal

Make EWS bots happy
Comment 7 Martin Robinson 2013-06-01 01:06:00 PDT
Comment on attachment 203441 [details]
Patch proposal

Looks like this is a couple patches appended?
Comment 8 Mario Sanchez Prada 2013-06-01 02:41:53 PDT
(In reply to comment #7)
> (From update of attachment 203441 [details])
> Looks like this is a couple patches appended?

Not really. Although I agree we could split it into two separate things: maybe one for the refactoring and other for the implementation of WORD (based on that), or something like that?
Comment 9 Martin Robinson 2013-06-20 13:18:01 PDT
Comment on attachment 203441 [details]
Patch proposal

View in context: https://bugs.webkit.org/attachment.cgi?id=203441&action=review

Sorry for the late review. I've a couple questions about the patch. Nice testing!

> Source/WebCore/ChangeLog:50
> +

Double changelog here.

> Source/WebCore/accessibility/atk/WebKitAccessibleInterfaceText.cpp:473
> +static gint offsetAdjustmentForObject(const AccessibilityObject* object)

Maybe call this listItemMarkerOffsetForObject?

> Source/WebCore/accessibility/atk/WebKitAccessibleInterfaceText.cpp:489
> +static gint webCoreOffsetToAtkOffset(const AccessibilityObject* object, gint offset)

You shouldn't use glib types for private functions. So gint -> int and for the rest of this patch.

> Source/WebCore/accessibility/atk/WebKitAccessibleInterfaceText.cpp:544
> +    Node* referenceNode = rangeStartNode->isInShadowTree() ? rangeStartNode->highestAncestor() : node;

Does the shadow DOM still exist?

> Source/WebCore/accessibility/atk/WebKitAccessibleInterfaceText.cpp:644
> +    // word following.

I'm curious why you don't use things like startOfWord and endOfWord to do these tasks.

> Source/WebCore/accessibility/atk/WebKitAccessibleInterfaceText.cpp:691
> +        if (isStartOfWord(originalPosition) && isWhitespace(originalPosition.characterBefore()))

In what situations can something be a start of a word and not prefixed by whitespace. Does this account for if this position is the very first position in the document?

> Source/WebCore/accessibility/atk/WebKitAccessibleInterfaceText.cpp:700
> +        endPosition = nextWordPosition(startPosition);

Why not endOfWord?

> Source/WebCore/accessibility/atk/WebKitAccessibleInterfaceText.cpp:806
> +        return webkitAccessibleTextGetWord(text, offset, boundaryType, textPosition, startOffset, endOffset);

webkitAccessibleTextGetWord -> webkitAccessibleTextGetAtWordBoundary?

> Source/WebCore/accessibility/atk/WebKitAccessibleInterfaceText.cpp:813
> +    // has been properly implemented without using Pango/Cairo.
>      GailOffsetType offsetType;
>      switch (textPosition) {
>      case GetTextPositionBefore:

I guess this isn't the last one?
Comment 10 Mario Sanchez Prada 2013-06-26 06:07:14 PDT
Comment on attachment 203441 [details]
Patch proposal

View in context: https://bugs.webkit.org/attachment.cgi?id=203441&action=review

Thanks for the review Martin, and no problem for the delay. I've been busy with other matters as well and wouldn't have had time before now anyway, so "perfect timing" in the end :).

Now see my comments below...

>> Source/WebCore/ChangeLog:50
>> +
> 
> Double changelog here.

Oops! Will fix that.

>> Source/WebCore/accessibility/atk/WebKitAccessibleInterfaceText.cpp:473
>> +static gint offsetAdjustmentForObject(const AccessibilityObject* object)
> 
> Maybe call this listItemMarkerOffsetForObject?

Well, the idea is that it might server for more adjustments in the future, not just the list item. That said, reality is that at this moment it is just about list items, so I'm fine with changing the name now.

>> Source/WebCore/accessibility/atk/WebKitAccessibleInterfaceText.cpp:489
>> +static gint webCoreOffsetToAtkOffset(const AccessibilityObject* object, gint offset)
> 
> You shouldn't use glib types for private functions. So gint -> int and for the rest of this patch.

Argh! Will fix it. Thx

>> Source/WebCore/accessibility/atk/WebKitAccessibleInterfaceText.cpp:544
>> +    Node* referenceNode = rangeStartNode->isInShadowTree() ? rangeStartNode->highestAncestor() : node;
> 
> Does the shadow DOM still exist?

At the time of writing this patch it did, but I will double check that.

>> Source/WebCore/accessibility/atk/WebKitAccessibleInterfaceText.cpp:644
>> +    // word following.
> 
> I'm curious why you don't use things like startOfWord and endOfWord to do these tasks.

:-) Believe me I tried, but there's a explanation for that change:

The reason is that semantics for startOfWord() and endOfWord() are slightly different than what we need to do here, since we need to take into account different kind of boundaries (WORD_START, WORD_END), deal with special characters (e.g. the period or the '+') in special ways to provide what ATs expect to receive when you ask things like "What are the word boundaries for the current offset?". Also, startOfWord() and endOfWord() consider a "whitespace between words" as yet another "word", which is clearly not what we want, and more things like that.

For instance, if you have the following sentence:

  "This is a stupid sentence. WebKitGTK+ rocks. My taylor is rich"

If you just use startOfWord() and endOfWord(), you could easily conclude that things like " ", ". " or "+ " are valid words for the WORD_START boundary, unless you do a lot of specific, case-by-case, tests.

However, if you use this "trick" going forward and backwards (which I got the idea for from FrameSelection.cpp, as per Enrica's comment while at the meeting), it's more easy to get the expected results without having to deal with all those special cases yourself, since are already considered in lower layers.

>> Source/WebCore/accessibility/atk/WebKitAccessibleInterfaceText.cpp:691
>> +        if (isStartOfWord(originalPosition) && isWhitespace(originalPosition.characterBefore()))
> 
> In what situations can something be a start of a word and not prefixed by whitespace. Does this account for if this position is the very first position in the document?

According to the semantics of startOfWort(), which considers spaces between words as another "word", you can get into this situation fairly easily.

For instance, if you have the "foo bar baz" string and you are in the 4th position (right at the end of "foo" and before the space), you would exactly be in that situation. Furthermore, if you call endOfWord() from that position you will be given the 5th position, right before "baz", since that " " in the middle is a "word" to the eyes of those functions.

>> Source/WebCore/accessibility/atk/WebKitAccessibleInterfaceText.cpp:700
>> +        endPosition = nextWordPosition(startPosition);
> 
> Why not endOfWord?

Because that would return you the beginning of the word coming right after the end of the previous word, which is basically where the space between the two words ends (so you would have the boundaries for the whitespace between them).

If you have doubts and want to check your self what happens, think that nextWordPosition() works pretty similar to Ctrl+RightArrow when you have your cursor located in an editable text field. For instance, if you have the sentence "foo bar baz" and you put the cursor at the end of "foo", what happens if you press Ctrl+RightArrow? You move to the end of "bar", which is exactly what you want here. That's why :)

>> Source/WebCore/accessibility/atk/WebKitAccessibleInterfaceText.cpp:806
>> +        return webkitAccessibleTextGetWord(text, offset, boundaryType, textPosition, startOffset, endOffset);
> 
> webkitAccessibleTextGetWord -> webkitAccessibleTextGetAtWordBoundary?

Ok

>> Source/WebCore/accessibility/atk/WebKitAccessibleInterfaceText.cpp:813
>>      case GetTextPositionBefore:
> 
> I guess this isn't the last one?

Nope. Unfortunately we still need one for LINE and other for SENTENCE boundaries. Hopefully those should be easier once we have the WORD boundary fixed, I believe.
Comment 11 Mario Sanchez Prada 2013-06-26 08:05:00 PDT
(In reply to comment #10)
> [...]
> >> Source/WebCore/accessibility/atk/WebKitAccessibleInterfaceText.cpp:806
> >> +        return webkitAccessibleTextGetWord(text, offset, boundaryType, textPosition, startOffset, endOffset);
> > 
> > webkitAccessibleTextGetWord -> webkitAccessibleTextGetAtWordBoundary?
> 
> Ok

Well, actually I would name it webkitAccessibleTextGetWordForBoundary() instead, since this static function is used internally for the AT, AFTER and BEFORE cases (see the textPosition param), not only the AT one.

Also, you are not getting anything "at the word boundary", you are getting the word at/after/before the word at the specified offset :)
Comment 12 Mario Sanchez Prada 2013-06-28 05:20:58 PDT
(In reply to comment #8)
> (In reply to comment #7)
> > (From update of attachment 203441 [details] [details])
> > Looks like this is a couple patches appended?
> 
> Not really. Although I agree we could split it into two separate things: maybe
> one for the refactoring and other for the implementation of WORD (based on
> that), or something like that?

I did this splitting in the end. Find the patch for the refactoring waiting for review here: https://bugs.webkit.org/show_bug.cgi?id=118179
Comment 13 Mario Sanchez Prada 2013-07-12 05:30:31 PDT
(In reply to comment #10)
> [...]
> >> Source/WebCore/accessibility/atk/WebKitAccessibleInterfaceText.cpp:544
> >> +    Node* referenceNode = rangeStartNode->isInShadowTree() ? rangeStartNode->highestAncestor() : node;
> > 
> > Does the shadow DOM still exist?
> 
> At the time of writing this patch it did, but I will double check that.

Sorry for the confusion with this. You were right, we do not have Shadow DOM in WebKitGTK+ and so this is not a valid approach anymore.

Fortunately, I've been working in a new patch for this bug which uses a different approach (based on VisiblePositions) so we don't need this Shadow DOM thing anymore.
Comment 14 Mario Sanchez Prada 2013-07-12 05:35:23 PDT
Created attachment 206528 [details]
Patch proposal plus new Unit tests

After working for a while on the previous patch + the suggestions from Martin, I realized there were some issues with that patch that needed more changes, so here you have the new patch, which I believe works at least as well (or as bad) as the previous -pango based- implementation. I'd even say it works better, at least in the tests I did, but I'm prudent and won't say that :)

Additionally, I found some issues while developing it (some of them already present in the previous implementation) that I've fixed along the way with this patch, so that's why I'm also providing a new unit test for checking things related to embedded objects, and extending a bit previous ones.

I'd therefore kindly ask for a review over this one and will move on to the LINE boundary in the meantime... getting closer to the no-pango thing!
Comment 15 EFL EWS Bot 2013-07-12 05:40:12 PDT
Comment on attachment 206528 [details]
Patch proposal plus new Unit tests

Attachment 206528 [details] did not pass efl-ews (efl):
Output: http://webkit-queues.appspot.com/results/899774
Comment 16 EFL EWS Bot 2013-07-12 05:40:39 PDT
Comment on attachment 206528 [details]
Patch proposal plus new Unit tests

Attachment 206528 [details] did not pass efl-wk2-ews (efl-wk2):
Output: http://webkit-queues.appspot.com/results/913794
Comment 17 Mario Sanchez Prada 2013-07-15 03:10:25 PDT
Created attachment 206652 [details]
Patch proposal plus new Unit tests

New patch
Comment 18 Mario Sanchez Prada 2013-07-16 10:10:06 PDT
Created attachment 206793 [details]
Patch proposal plus new Layout test

... and yet another patch fixing a minor issue I found while working on the LINE boundary patch which, btw, I'm attaching to bugzilla right now too :)

I know GTK reviewers are quite busy these days, but if we could get this ones reviewed soon that would be awesome. Thanks!
Comment 19 Mario Sanchez Prada 2013-07-19 10:07:09 PDT
Created attachment 207115 [details]
Patch proposal plus new Unit test

New patch after taking some stuff from the previous one out to a different bug (see bug 118908)
Comment 20 Mario Sanchez Prada 2013-07-29 10:34:37 PDT
Martin, would you mind reviewing this one now that the blocking bug 118908 has been fixed? Thanks!
Comment 21 Martin Robinson 2013-08-02 03:16:50 PDT
Comment on attachment 207115 [details]
Patch proposal plus new Unit test

View in context: https://bugs.webkit.org/attachment.cgi?id=207115&action=review

> Source/WebCore/accessibility/atk/WebKitAccessibleInterfaceText.cpp:645
> +static gchar* webkitAccessibleTextGetChar(AtkText* text, gint offset, GetTextRelativePosition textPosition, gint* startOffset, gint* endOffset)

gchar -> char, gint -> int
Comment 22 Mario Sanchez Prada 2013-08-02 09:00:04 PDT
Thanks Martin for reviewing this patch (the biggest one, btw!)

(In reply to comment #21)
> (From update of attachment 207115 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=207115&action=review
> 
> > Source/WebCore/accessibility/atk/WebKitAccessibleInterfaceText.cpp:645
> > +static gchar* webkitAccessibleTextGetChar(AtkText* text, gint offset, GetTextRelativePosition textPosition, gint* startOffset, gint* endOffset)
> 
> gchar -> char, gint -> int

I'll do that before landing
Comment 23 Mario Sanchez Prada 2013-08-02 09:01:05 PDT
Committed r153652: <http://trac.webkit.org/changeset/153652>