25415 – [GTK][ATK] Please implement support for get_text_at_offset

Joanmarie Diggs

Reported 2009-04-26 18:48:13 PDT

The accessible text interface includes methods to get a specified amount of text, given a caret offset and a boundary type. For more information please see http://library.gnome.org/devel/pygtk/stable/class-atktext.html. Usage/need example: (Once caret-moved events have been implemented) A user presses Down Arrow and WebKit emits a caret-moved event with, say, detail1 == 30. As a result, an AT such as Orca should present the line of text at offset 30 which it could obtain by: acc.queryText().getTextAtOffset(30, TEXT_BOUNDARY_LINE_START) The expected result would be something like: ('This is a test', 25, 39) Currently we get something like: ('', -1074032920, 152169976) As a result, it is difficult to present the new position to the user. Thanks!

Xan Lopez

Comment 1 2009-05-11 13:27:06 PDT

OK, so I have a patch for atk_text_get_text_{at,before,after}_offset, for now I support WORD_START, WORD_END and CHAR boundaries. One question though: what are you supposed to do in 'degenerate' cases? Example: get the previous word when you are at the first word, or the next word when you are at the last one, etc. Does not seem to be documented.

Xan Lopez

Comment 2 2009-05-11 13:32:14 PDT

Oh, and remember when I said this seemed easy to implement? Lies.

Joanmarie Diggs

Comment 3 2009-05-11 13:52:43 PDT

(In reply to comment #1) > OK, so I have a patch for atk_text_get_text_{at,before,after}_offset, for now I > support WORD_START, WORD_END and CHAR boundaries. Excellent! > One question though: what are > you supposed to do in 'degenerate' cases? Example: get the previous word when > you are at the first word, or the next word when you are at the last one, etc. I'd go with ('', 0, 0) > Does not seem to be documented. Documentation is for the weak. ;-) The only reason I know/am suggesting the above is because that's what I've grown accustomed to seeing in the cases you describe. If I didn't have an answer, however, what I would do is see what other apps do -- starting with either gtk-demo and/or Gedit (which as a general rule seems to use standard widgets). (In reply to comment #2) > Oh, and remember when I said this seemed easy to implement? Lies. Heh. :-) (And sorry! And thank you for doing it!!)

Xan Lopez

Comment 4 2009-05-12 08:57:54 PDT

Created attachment 30234 [details] gettextat.patch OK, all the stuff is implemented now. I don't have tests for the LINE_{START,END} boundaries because I don't seem to be able to convince WebKit to load strings with line breaks. Either me being stupid or a bug, I'll figure it out, but in any case this can be already reviewed IMHO.

Joanmarie Diggs

Comment 5 2009-05-12 19:18:24 PDT

Xan this is awesome! Thanks!! > OK, all the stuff is implemented now. I don't have tests for the > LINE_{START,END} boundaries because I don't seem to be able to convince WebKit > to load strings with line breaks. When I read this, I initially took it to mean the functionality for ATs was implemented (i.e. ATs could get the text at/before/after the LINE_{START,END} boundary) and that there were WebKit tests that were not yet in place. In testing it, I'm seeing ('', 0, 0) when I try to get the text of a line. No worries as long as you know that line support doesn't seem to be implemented. I also noticed that your implementation doesn't include the character(s) serving as the boundary (i.e. space and/or punctuation mark), whereas other apps and toolkits do include it. For instance, given the sentence "This is another, silly test." and using Accerciser's IPython console: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Gedit and OOo Writer: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In [1]: text = acc.queryText() In [2]: text.getTextAtOffset(8, TEXT_BOUNDARY_WORD_START) Out[2]: ('another, ', 8, 17) In [3]: text.getTextAtOffset(8, TEXT_BOUNDARY_WORD_END) Out[3]: (' another', 7, 15) In [4]: text.getTextAtOffset(23, TEXT_BOUNDARY_WORD_START) Out[4]: ('test.', 23, 28) In [5]: text.getTextAtOffset(23, TEXT_BOUNDARY_WORD_END) Out[5]: (' test', 22, 27) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Gecko: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In [1]:text = acc.queryText() In [2]:text.getTextAtOffset(8, TEXT_BOUNDARY_WORD_START) Out[2]: ('another, ', 8, 17) In [3]:text.getTextAtOffset(8, TEXT_BOUNDARY_WORD_END) Out[3]: (' another,', 7, 16) In [4]:text.getTextAtOffset(23, TEXT_BOUNDARY_WORD_START) Out[4]: ('test.', 23, 28) In [5]:text.getTextAtOffset(23, TEXT_BOUNDARY_WORD_END) Out[5]: (' test.', 22, 28) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WebKit: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In [1]: text = acc.queryText() In [2]: text.getTextAtOffset(8, TEXT_BOUNDARY_WORD_START) Out[2]: ('another', 8, 15) In [3]: text.getTextAtOffset(8, TEXT_BOUNDARY_WORD_END) Out[3]: ('another', 8, 15) In [4]: text.getTextAtOffset(23, TEXT_BOUNDARY_WORD_START) Out[4]: ('test', 23, 27) In [5]: text.getTextAtOffset(23, TEXT_BOUNDARY_WORD_END) Out[5]: ('test', 23, 27) I'm not convinced the inclusion of the space in each word is critical (although given that Gedit, OOo, and Gecko all do it, perhaps it is??). That said, the inclusion of any punctuation that defines the boundary is important. If a user presses Control+Right and moves to the word 'another,' Orca needs to know that there's a comma attached to the word so that we display it in braille and -- when appropriate -- present it in speech.

Xan Lopez

Comment 6 2009-05-14 08:14:45 PDT

(In reply to comment #5) > Xan this is awesome! Thanks!! > > > OK, all the stuff is implemented now. I don't have tests for the > > LINE_{START,END} boundaries because I don't seem to be able to convince WebKit > > to load strings with line breaks. > > When I read this, I initially took it to mean the functionality for ATs was > implemented (i.e. ATs could get the text at/before/after the LINE_{START,END} > boundary) and that there were WebKit tests that were not yet in place. In > testing it, I'm seeing ('', 0, 0) when I try to get the text of a line. No > worries as long as you know that line support doesn't seem to be implemented. Mmm, I thought I had, but it seems I was wrong. Looking at what Gecko does it seems to me that for LINE_{START,END} we want the *visual* lines, not the logical ones, right? That is, what is returned in a getText(0, -1) might be "One two three four five six seven eight nine ten.", but if that's split in 5 lines of two items in the browser getTextAtOffset(0, TEXT_BOUNDARY_LINE_START) would be "one two". Am I right? > > I also noticed that your implementation doesn't include the character(s) > serving as the boundary (i.e. space and/or punctuation mark), whereas other > apps and toolkits do include it. For instance, given the sentence "This is > another, silly test." and using Accerciser's IPython console: > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > Gedit and OOo Writer: > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > In [1]: text = acc.queryText() > In [2]: text.getTextAtOffset(8, TEXT_BOUNDARY_WORD_START) > Out[2]: ('another, ', 8, 17) > In [3]: text.getTextAtOffset(8, TEXT_BOUNDARY_WORD_END) > Out[3]: (' another', 7, 15) > In [4]: text.getTextAtOffset(23, TEXT_BOUNDARY_WORD_START) > Out[4]: ('test.', 23, 28) > In [5]: text.getTextAtOffset(23, TEXT_BOUNDARY_WORD_END) > Out[5]: (' test', 22, 27) > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > Gecko: > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > In [1]:text = acc.queryText() > In [2]:text.getTextAtOffset(8, TEXT_BOUNDARY_WORD_START) > Out[2]: ('another, ', 8, 17) > In [3]:text.getTextAtOffset(8, TEXT_BOUNDARY_WORD_END) > Out[3]: (' another,', 7, 16) > In [4]:text.getTextAtOffset(23, TEXT_BOUNDARY_WORD_START) > Out[4]: ('test.', 23, 28) > In [5]:text.getTextAtOffset(23, TEXT_BOUNDARY_WORD_END) > Out[5]: (' test.', 22, 28) > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > WebKit: > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > In [1]: text = acc.queryText() > In [2]: text.getTextAtOffset(8, TEXT_BOUNDARY_WORD_START) > Out[2]: ('another', 8, 15) > In [3]: text.getTextAtOffset(8, TEXT_BOUNDARY_WORD_END) > Out[3]: ('another', 8, 15) > In [4]: text.getTextAtOffset(23, TEXT_BOUNDARY_WORD_START) > Out[4]: ('test', 23, 27) > In [5]: text.getTextAtOffset(23, TEXT_BOUNDARY_WORD_END) > Out[5]: ('test', 23, 27) > > I'm not convinced the inclusion of the space in each word is critical (although > given that Gedit, OOo, and Gecko all do it, perhaps it is??). That said, the > inclusion of any punctuation that defines the boundary is important. If a user > presses Control+Right and moves to the word 'another,' Orca needs to know that > there's a comma attached to the word so that we display it in braille and -- > when appropriate -- present it in speech. OK, I'll look into this.

Joanmarie Diggs

Comment 7 2009-05-14 08:50:43 PDT

(In reply to comment #6) > Mmm, I thought I had, but it seems I was wrong. Looking at what Gecko does it > seems to me that for LINE_{START,END} we want the *visual* lines, not the > logical ones, right? That is, what is returned in a getText(0, -1) might be > "One two three four five six seven eight nine ten.", but if that's split in 5 > lines of two items in the browser getTextAtOffset(0, TEXT_BOUNDARY_LINE_START) > would be "one two". Am I right? You are indeed. And this should occur regardless of why the line is split (e.g. forced to split by <br /> or just happens to split due to text wrapping). > OK, I'll look into this. Awesome. Thanks!

Xan Lopez

Comment 8 2009-05-14 09:39:18 PDT

OK, one thing. Say I have this string: "This is." (3 spaces between 'This' and 'is'). If I ask for getTextAtOffset(7, TEXT_BOUNDARY_WORD_END) should I really get: " is" ? I think this might complicate the implementation enough that I might want to rewrite it from scratch, not sure.

Joanmarie Diggs

Comment 9 2009-05-14 10:22:23 PDT

(In reply to comment #8) > OK, one thing. Say I have this string: "This is." (3 spaces between 'This' > and 'is'). If I ask for getTextAtOffset(7, TEXT_BOUNDARY_WORD_END) should I > really get: " is" ? I'd not thought of that case before. But I just looked at the AT-SPI IDL and it says: "TEXT_BOUNDARY_WORD_END: Boundary condition is the end of a word; i.e. range is from the end of one word to the end of another." So, yeah, I suppose you should get " is" in that case. > I think this might complicate the implementation enough > that I might want to rewrite it from scratch, not sure. Sorry!

Xan Lopez

Comment 10 2009-05-15 01:22:55 PDT

Created attachment 30370 [details] gettextv2.patch OK, I *think* this now works as it should, but please review it carefully :) I have removed the LINE_{START,END} implementation, since it was completely wrong. I'll try to figure out how to get the visual line break information out of WebKit now to do this.

Joanmarie Diggs

Comment 11 2009-05-15 20:19:22 PDT

Created attachment 30408 [details] test case Hi Xan. I did some thorough testing of your patch. Observations: 1. Given the variants present in implementations (e.g. Gedit/Gtk+ text objects, OOo Writer, and Gecko), it looks like what you've done is most consistent with Gedit/Gtk+ text objects. Yea! Thanks!! :-) 2. Given the text "This is another, silly test." (See attached test case.) I compared what we get for getText{At,Before,After}Offset from Gedit and what we now get from WebKit: * TEXT_BOUNDARY_CHAR is the same in both. * TEXT_BOUNDARY_WORD_{START,END} is essentially the same, however: - When we do a getTextAtOffset(i, TEXT_BOUNDARY_WORD_END) where i is an offset of a character in the first word, Gedit (and OOo Writer, and Firefox) return the first word. I think the idea is that there *is* a word *at* the specified offset. From the test case I described above: i Gedit WORD_END WebKit WORD_END 0 ('This', 0, 4) ('', 0, 0) 1 ('This', 0, 4) ('', 0, 0) 2 ('This', 0, 4) ('', 0, 0) 3 ('This', 0, 4) ('', 0, 0) A similar issue can be seen with getTextBeforeOffset. Both Gedit and WebKit return ('', 0, 0) as expected when i is the offset of a character in the first word, but WebKit continues to do so until i seems to be the offset of the end of the 2nd word: i Gedit WORD_END WebKit WORD_END 3 ('', 0, 0) ('', 0, 0) 4 ('This', 0, 4) ('', 0, 0) 5 ('This', 0, 4) ('', 0, 0) 6 ('This', 0, 4) ('', 0, 0) - Similarly for getTextAtOffset(i, TEXT_BOUNDARY_WORD_START) where i is an offset of a character in the final word: i Gedit WORD_START WebKit WORD_START 23 ('test.', 23, 28) ('', 0, 0) 24 ('test.', 23, 28) ('', 0, 0) 25 ('test.', 23, 28) ('', 0, 0) 26 ('test.', 23, 28) ('', 0, 0) 27 ('test.', 23, 28) ('', 0, 0) - There's also a difference with TEXT_BOUNDARY_WORD_END when the offset is for a space: i Gedit WORD_END WebKit WORD_END 7 (' another', 7, 15) (' is', 4, 7) 15 (', silly', 15, 22) (' another', 7, 15) 22 (' test', 22, 27) (', silly', 15, 22) Firefox does the same thing as Gedit. OOo Writer returns ('', 0, 0), but don't think that makes as much sense. - Calling getTextAfterOffset with either word boundary type will eventually crash the GtkLauncher with the following error: GLib-ERROR **: /build/buildd/glib2.0-2.20.1/glib/gmem.c:136: failed to allocate 4294967269 bytes aborting... Aborted See test case below. * TEXT_BOUNDARY_LINE_{START,END} have not been implemented, as you indicated in your comments. * TEXT_BOUNDARY_SENTENCE_END when used with an offset in the first/only sentence results in ('', 0, 0) instead of that sentence. * TEXT_BOUNDARY_SENTENCE_START when used with an offset in the first/only sentence results in GtkLauncher crashing with the above Glib error. See test case below. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Tests to reproduce the aforementioned crashes: 1. Open the attached test case in GtkLauncher. 2. Using Accerciser, select the text object in the tree on the left. 3. In the IPython console: Test 1: acc.queryText().getTextAtOffset(0, TEXT_BOUNDARY_SENTENCE_START) Test 2: acc.queryText().getTextAfterOffset(26, TEXT_BOUNDARY_WORD_START) Hope this helps!

Xan Lopez

Comment 12 2009-05-19 03:36:53 PDT

Created attachment 30465 [details] gettextv3.patch OK, I think I've fixed all issues raised in the last comment (most of them were really special cases of the same bug: "Not taking into account being in the first or last word/sentence"). I've also fixed the crashers, which were due to not doing error checking properly in some cases. LINE_{START,END} is still missing from the patch.

Jan Alonzo

Comment 13 2009-05-20 07:22:35 PDT

(In reply to comment #12) > Created an attachment (id=30465) [review] > gettextv3.patch > > OK, I think I've fixed all issues raised in the last comment (most of them were > really special cases of the same bug: "Not taking into account being in the > first or last word/sentence"). I've also fixed the crashers, which were due to > not doing error checking properly in some cases. Hi Xan. Is it possible to split this patch into bite-sized chunks so it's easier to review? Thanks.

Xan Lopez

Comment 14 2009-05-20 07:32:13 PDT

(In reply to comment #13) > (In reply to comment #12) > > Created an attachment (id=30465) [review] [review] > > gettextv3.patch > > > > OK, I think I've fixed all issues raised in the last comment (most of them were > > really special cases of the same bug: "Not taking into account being in the > > first or last word/sentence"). I've also fixed the crashers, which were due to > > not doing error checking properly in some cases. > > Hi Xan. Is it possible to split this patch into bite-sized chunks so it's > easier to review? Thanks. > Well, I'm not sure. I mean, the bulk of it is the getTextHelper function, and that can't be really splitted too much (I could do a first version only supporting the CHAR boundary and then add everything else perhaps, but that's a bit artificial), since the point of it is that it manages to implement 4 boundaries with the same code. Everything else are either little helper functions for getTextHelper, the test file, or the actual hooking of the code into the ATK API, which is trivial. What are you having problems with?

Joanmarie Diggs

Comment 15 2009-05-20 18:16:34 PDT

(In reply to comment #12) > Created an attachment (id=30465) [review] > gettextv3.patch > > OK, I think I've fixed all issues raised in the last comment (most of them were > really special cases of the same bug: "Not taking into account being in the > first or last word/sentence"). I've also fixed the crashers, which were due to > not doing error checking properly in some cases. In terms of testing, so far so good. Awesome!! > LINE_{START,END} is still missing from the patch. Understood. Having that is quite important, but I don't see it as a reason to hold off committing what you've done so far.

Joanmarie Diggs

Comment 16 2009-05-21 18:42:15 PDT

Xan, given your work on this bug and other recent fixes, I'm thinking we're approaching the point where it would be worth getting a skeleton/debugging/testing script for WebKit in place in Orca. (Up to now, I've done all testing in Accerciser.) Unfortunately, the mere act of using the GtkLauncher or Epiphany while Orca is running crashes GtkLauncher/Epiphany 100% of the time with this error: ~~~ ERROR:WebCore/accessibility/gtk/AccessibilityObjectWrapperAtk.cpp:628:gchar* getTextHelper(GetTextFunctionType, AtkText*, gint, AtkTextBoundary, gint*, gint*): code should not be reached Aborted ~~~~ Looking at what code in Orca was triggering the crash, it turned out to be attempting to get the text of a line. I realize that you're still working on getting the visible line. The problem is that while Orca speaks the character, word, etc. based on how the user is navigating, it ALWAYS brailles the line. Hence the constant crashes. I'm guessing that the problem is the assertion at the bottom of this block: + if (boundaryType == ATK_TEXT_BOUNDARY_WORD_START) + predicate = isWordStart; + else if (boundaryType == ATK_TEXT_BOUNDARY_WORD_END) + predicate = isWordEnd; + else if (boundaryType == ATK_TEXT_BOUNDARY_SENTENCE_START) + predicate = isSentenceStart; + else if (boundaryType == ATK_TEXT_BOUNDARY_SENTENCE_END) + predicate = isSentenceEnd; + else + g_assert_not_reached(); If so, would it be possible to return ('', 0, 0) instead? If you'd like, you could also display a grumpy error message. ;-) ;-) That way, while you're working on the line implementation, I could start implementing support in Orca for things which are not lines. If this would be more hassle than it's worth, no worries, I'll wait. Thanks either way!

Jan Alonzo

Comment 17 2009-05-23 00:00:53 PDT

Comment on attachment 30465 [details] gettextv3.patch > -static gchar* webkit_accessible_text_get_text_after_offset(AtkText* text, gint offset, AtkTextBoundary boundary_type, gint* start_offset, gint* end_offset) > +typedef enum { > + AfterOffset, > + AtOffset, > + BeforeOffset > +} GetTextFunctionType; enum GetTextFunctionType would do. > +typedef int (*advanceFunc) (int); > + > +static int increaseInt(int i) { > + return i + 1; > +} > + > +static int decreaseInt(int i) { > + return i - 1; > +} > + > +typedef enum { > + DirectionForward, > + DirectionBackwards > +} Direction; See enum comment above. > + > +static bool findCharacterAttribute(isCharacterAttribute predicateFunction, PangoLogAttr* attributes, Direction direction, int startOffset, int attrsLength, int* resultOffset) > +{ attrsLength should be unsigned. can starOffset be negative here? > + int i; You can move this in for loop below. > + advanceFunc advanceFunc = direction == DirectionForward ? increaseInt : decreaseInt; > + > + *resultOffset = -1; > + > + for (i = startOffset; i >= 0 && i < attrsLength; i = advanceFunc(i)) { int i = startOffset; > + if (predicateFunction(attributes+i)) { there should be spaces in attributes+1 > +static bool findCharacterAttributeSkip(isCharacterAttribute predicateFunction, unsigned skip, PangoLogAttr* attributes, Direction direction, int startOffset, int attrsLength, int* resultOffset) > +{ > + int tmpOffset; > + bool retValue; > + > + retValue = findCharacterAttribute(predicateFunction, attributes, direction, startOffset, attrsLength, &tmpOffset); Better to assign value in the declaration. > +static gchar* getTextHelper(GetTextFunctionType getTextFunctionType, AtkText* textObject, gint offset, AtkTextBoundary boundaryType, gint* startOffset, gint* endOffset) > +{ > + AccessibilityObject* coreObject = core(textObject); > + String text; > + > + *startOffset = *endOffset = -1; > + > + if (coreObject->isTextControl()) > + text = coreObject->text(); > + else > + text = coreObject->textUnderElement(); > + > + char* cText = g_strdup(text.utf8().data()); > + glong textLength = g_utf8_strlen(cText, -1); -1 means cText is nul-terminated. Is this going to be the case? Can't we just use strlen here? > + if (boundaryType == ATK_TEXT_BOUNDARY_CHAR) { > + int effectiveOffset; > + > + switch(getTextFunctionType) { space between switch and (. > + case AfterOffset: > + effectiveOffset = offset + 1; > + break; > + case BeforeOffset: > + effectiveOffset = offset - 1; > + break; > + case AtOffset: > + effectiveOffset = offset; > + break; > + default: > + g_assert_not_reached(); > + } Is g_asset_not_reached the correct behaviour here? Can't we set effectiveOffset to NULL in this case? > + *startOffset = effectiveOffset; > + *endOffset = effectiveOffset + 1; what should be the behaviour if effectiveOffset is NULL? should we set start/end offsets to 0? > + } else { > + PangoLogAttr* attrs = g_new(PangoLogAttr, textLength+1); spaces in textLength+1 > + PangoLanguage* language = pango_language_get_default(); > + pango_get_log_attrs(cText, -1, -1, language, attrs, textLength+1); ditto. > + > + isCharacterAttribute predicate; > + > + if (boundaryType == ATK_TEXT_BOUNDARY_WORD_START) > + predicate = isWordStart; > + else if (boundaryType == ATK_TEXT_BOUNDARY_WORD_END) > + predicate = isWordEnd; > + else if (boundaryType == ATK_TEXT_BOUNDARY_SENTENCE_START) > + predicate = isSentenceStart; > + else if (boundaryType == ATK_TEXT_BOUNDARY_SENTENCE_END) > + predicate = isSentenceEnd; > + else > + g_assert_not_reached(); what should be the default behaviour here? > + > + switch(boundaryType) { space between switch and (. > + case ATK_TEXT_BOUNDARY_WORD_START: > + case ATK_TEXT_BOUNDARY_SENTENCE_START: > + if (getTextFunctionType == AfterOffset) { > + // Take the item after the current one in any case > + findCharacterAttribute(predicate, attrs, DirectionForward, offset+1, textLength + 1, startOffset); > + findCharacterAttributeSkip(predicate, 1, attrs, DirectionForward, offset+1, textLength + 1, endOffset); Spaces in offset+1. Also, what's the difference between the two? > + } else if (getTextFunctionType == AtOffset) { > + // Take the item at point if the offset is in an item or > + // the item before otherwise > + findCharacterAttribute(predicate, attrs, DirectionBackwards, offset, textLength + 1, startOffset); > + if (!findCharacterAttribute(predicate, attrs, DirectionForward, offset+1, textLength + 1, endOffset)) { > + findCharacterAttribute(oppositePredicate(predicate), attrs, DirectionForward, offset+1, textLength + 1, endOffset); spaces in offset+1. > + findCharacterAttribute(predicate, attrs, DirectionForward, offset+1, textLength + 1, endOffset); ditto. > + } else { > + // Take the item before the point in any case > + if (!findCharacterAttributeSkip(predicate, 1, attrs, DirectionBackwards, offset, textLength + 1, startOffset)) { > + int tmpOffset; > + // No match before offset, take the first opposite match at or before the offset > + findCharacterAttribute(predicate, attrs, DirectionBackwards, offset, textLength + 1, &tmpOffset); > + findCharacterAttribute(oppositePredicate(predicate), attrs, DirectionBackwards, tmpOffset-1, textLength + 1, startOffset); spaces in tmpOffset-1. > + default: > + g_assert_not_reached(); is asserting the right behaviour here? > + char* start = g_utf8_offset_to_pointer(cText, (glong)*startOffset); > + char* end = g_utf8_offset_to_pointer(cText, (glong)*endOffset); > + char* resultText = g_strndup(start, end - start); > + g_free(cText); > + return resultText; should we g_strdup resultText? > +} > + > +static gchar* webkit_accessible_text_get_text_after_offset(AtkText* text, gint offset, AtkTextBoundary boundaryType, gint* startOffset, gint* endOffset) > +{ > + return getTextHelper(AfterOffset, text, offset, boundaryType, startOffset, endOffset); > +} > + > +static gchar* webkit_accessible_text_get_text_at_offset(AtkText* text, gint offset, AtkTextBoundary boundaryType, gint* startOffset, gint* endOffset) > +{ > + return getTextHelper(AtOffset, text, offset, boundaryType, startOffset, endOffset); > } > > -static gchar* webkit_accessible_text_get_text_before_offset(AtkText* text, gint offset, AtkTextBoundary boundary_type, gint* start_offset, gint* end_offset) > +static gchar* webkit_accessible_text_get_text_before_offset(AtkText* text, gint offset, AtkTextBoundary boundaryType, gint* startOffset, gint* endOffset) > +{ > + return getTextHelper(BeforeOffset, text, offset, boundaryType, startOffset, endOffset); > +} > + > +static gunichar webkit_accessible_text_get_character_at_offset(AtkText* text, gint offset) I think we should start using WebKit-style for the function names here. r- for the style issues and the crash Joanmarie reported in comment #16. Would also be nice to work out the default behaviour instead of just asserting.

Jan Alonzo

Comment 18 2009-05-23 00:05:47 PDT

(In reply to comment #16) > Xan, given your work on this bug and other recent fixes, I'm thinking we're > approaching the point where it would be worth getting a > skeleton/debugging/testing script for WebKit in place in Orca. Hi Joanmarie. Apart from the unit tests this patch introduces, I'm also working to get our accessibility layout tests working in WebKit so we can identify AX issues in our layout (and hopefully we'd be able to detect possible crashes like this).

Xan Lopez

Comment 19 2009-05-23 00:40:16 PDT

(In reply to comment #17) > (From update of attachment 30465 [details] [review]) > > -static gchar* webkit_accessible_text_get_text_after_offset(AtkText* text, gint offset, AtkTextBoundary boundary_type, gint* start_offset, gint* end_offset) > > +typedef enum { > > + AfterOffset, > > + AtOffset, > > + BeforeOffset > > +} GetTextFunctionType; > > enum GetTextFunctionType would do. Ah, right, C habits :) > > > + > > +static bool findCharacterAttribute(isCharacterAttribute predicateFunction, PangoLogAttr* attributes, Direction direction, int startOffset, int attrsLength, int* resultOffset) > > +{ > > attrsLength should be unsigned. can starOffset be negative here? Right, I think both can be unsigned. > > > + int i; > > You can move this in for loop below. Right. > > > + advanceFunc advanceFunc = direction == DirectionForward ? increaseInt : decreaseInt; > > + > > + *resultOffset = -1; > > + > > + for (i = startOffset; i >= 0 && i < attrsLength; i = advanceFunc(i)) { > > int i = startOffset; Right. > > > + if (predicateFunction(attributes+i)) { > > there should be spaces in attributes+1 Right. > > > +static bool findCharacterAttributeSkip(isCharacterAttribute predicateFunction, unsigned skip, PangoLogAttr* attributes, Direction direction, int startOffset, int attrsLength, int* resultOffset) > > +{ > > + int tmpOffset; > > + bool retValue; > > + > > + retValue = findCharacterAttribute(predicateFunction, attributes, direction, startOffset, attrsLength, &tmpOffset); > > Better to assign value in the declaration. Right. > > > +static gchar* getTextHelper(GetTextFunctionType getTextFunctionType, AtkText* textObject, gint offset, AtkTextBoundary boundaryType, gint* startOffset, gint* endOffset) > > +{ > > + AccessibilityObject* coreObject = core(textObject); > > + String text; > > + > > + *startOffset = *endOffset = -1; > > + > > + if (coreObject->isTextControl()) > > + text = coreObject->text(); > > + else > > + text = coreObject->textUnderElement(); > > + > > + char* cText = g_strdup(text.utf8().data()); > > + glong textLength = g_utf8_strlen(cText, -1); > > -1 means cText is nul-terminated. Is this going to be the case? Can't we just > use strlen here? Yes, it's NULL terminated. strlen() is wrong, that gives the number of bytes, I need the number of characters (and anyway strlen would crash just the same if it were not NULL terminated? Or you meant that as a different issue). > > > + if (boundaryType == ATK_TEXT_BOUNDARY_CHAR) { > > + int effectiveOffset; > > + > > + switch(getTextFunctionType) { > > space between switch and (. Right. > > > + case AfterOffset: > > + effectiveOffset = offset + 1; > > + break; > > + case BeforeOffset: > > + effectiveOffset = offset - 1; > > + break; > > + case AtOffset: > > + effectiveOffset = offset; > > + break; > > + default: > > + g_assert_not_reached(); > > + } > > Is g_asset_not_reached the correct behaviour here? Can't we set effectiveOffset > to NULL in this case? Since there are only three functions calling this, it's in theory impossible that the default would be reached here, so I think an assert is OK (IMHO asserts should be used for 'this is in theory impossible, or should never happen'). > > > + *startOffset = effectiveOffset; > > + *endOffset = effectiveOffset + 1; > > what should be the behaviour if effectiveOffset is NULL? should we set > start/end offsets to 0? What you mean with NULL exactly? effectiveOffset is a number, not a pointer. > > > + } else { > > + PangoLogAttr* attrs = g_new(PangoLogAttr, textLength+1); > > spaces in textLength+1 Right. > > > + PangoLanguage* language = pango_language_get_default(); > > + pango_get_log_attrs(cText, -1, -1, language, attrs, textLength+1); > > ditto. Right. > > > + > > + isCharacterAttribute predicate; > > + > > + if (boundaryType == ATK_TEXT_BOUNDARY_WORD_START) > > + predicate = isWordStart; > > + else if (boundaryType == ATK_TEXT_BOUNDARY_WORD_END) > > + predicate = isWordEnd; > > + else if (boundaryType == ATK_TEXT_BOUNDARY_SENTENCE_START) > > + predicate = isSentenceStart; > > + else if (boundaryType == ATK_TEXT_BOUNDARY_SENTENCE_END) > > + predicate = isSentenceEnd; > > + else > > + g_assert_not_reached(); > > what should be the default behaviour here? OK, this is wrong now, because we miss the LINE boundary implementations, and this will crash if someone tries to use them. I think I should just set *startOffset and *endOffset to -1 beforehand and here do a goto to the end of the function (as a temporary solution until everything is implemented). When LINE boundary is done I think an assert would be again correct, since that last option should never be reached. > > > + > > + switch(boundaryType) { > > space between switch and (. Right. > > > + case ATK_TEXT_BOUNDARY_WORD_START: > > + case ATK_TEXT_BOUNDARY_SENTENCE_START: > > + if (getTextFunctionType == AfterOffset) { > > + // Take the item after the current one in any case > > + findCharacterAttribute(predicate, attrs, DirectionForward, offset+1, textLength + 1, startOffset); > > + findCharacterAttributeSkip(predicate, 1, attrs, DirectionForward, offset+1, textLength + 1, endOffset); > > Spaces in offset+1. Also, what's the difference between the two? Sorry, between what two? > > > + } else if (getTextFunctionType == AtOffset) { > > + // Take the item at point if the offset is in an item or > > + // the item before otherwise > > + findCharacterAttribute(predicate, attrs, DirectionBackwards, offset, textLength + 1, startOffset); > > + if (!findCharacterAttribute(predicate, attrs, DirectionForward, offset+1, textLength + 1, endOffset)) { > > + findCharacterAttribute(oppositePredicate(predicate), attrs, DirectionForward, offset+1, textLength + 1, endOffset); > > spaces in offset+1. Right. > > > + findCharacterAttribute(predicate, attrs, DirectionForward, offset+1, textLength + 1, endOffset); > > ditto. > > > + } else { > > + // Take the item before the point in any case > > + if (!findCharacterAttributeSkip(predicate, 1, attrs, DirectionBackwards, offset, textLength + 1, startOffset)) { > > + int tmpOffset; > > + // No match before offset, take the first opposite match at or before the offset > > + findCharacterAttribute(predicate, attrs, DirectionBackwards, offset, textLength + 1, &tmpOffset); > > + findCharacterAttribute(oppositePredicate(predicate), attrs, DirectionBackwards, tmpOffset-1, textLength + 1, startOffset); > > spaces in tmpOffset-1. > > > + default: > > + g_assert_not_reached(); > > is asserting the right behaviour here? Since we will exit early for unknown boundaries, I think it is. > > > + char* start = g_utf8_offset_to_pointer(cText, (glong)*startOffset); > > + char* end = g_utf8_offset_to_pointer(cText, (glong)*endOffset); > > + char* resultText = g_strndup(start, end - start); > > + g_free(cText); > > + return resultText; > > should we g_strdup resultText? I am already doing that, see two lines up. > > > +} > > + > > +static gchar* webkit_accessible_text_get_text_after_offset(AtkText* text, gint offset, AtkTextBoundary boundaryType, gint* startOffset, gint* endOffset) > > +{ > > + return getTextHelper(AfterOffset, text, offset, boundaryType, startOffset, endOffset); > > +} > > + > > +static gchar* webkit_accessible_text_get_text_at_offset(AtkText* text, gint offset, AtkTextBoundary boundaryType, gint* startOffset, gint* endOffset) > > +{ > > + return getTextHelper(AtOffset, text, offset, boundaryType, startOffset, endOffset); > > } > > > > -static gchar* webkit_accessible_text_get_text_before_offset(AtkText* text, gint offset, AtkTextBoundary boundary_type, gint* start_offset, gint* end_offset) > > +static gchar* webkit_accessible_text_get_text_before_offset(AtkText* text, gint offset, AtkTextBoundary boundaryType, gint* startOffset, gint* endOffset) > > +{ > > + return getTextHelper(BeforeOffset, text, offset, boundaryType, startOffset, endOffset); > > +} > > + > > +static gunichar webkit_accessible_text_get_character_at_offset(AtkText* text, gint offset) > > I think we should start using WebKit-style for the function names here. Next patch perhaps, to do it all at once? > > r- for the style issues and the crash Joanmarie reported in comment #16. Would > also be nice to work out the default behaviour instead of just asserting. > OK, thanks!

Xan Lopez

Comment 20 2009-05-23 03:39:59 PDT

Created attachment 30614 [details] gettextv4.patch This fixes all issues raised in the comment, plus the crash with LINE boundary and tests to check that using LINE boundary does not crash.

Gustavo Noronha (kov)

Comment 21 2009-05-28 10:40:38 PDT

Comment on attachment 30614 [details] gettextv4.patch > +typedef int (*advanceFunc) (int); > + > +static int increaseInt(int i) { > + return i + 1; > +} > + > +static int decreaseInt(int i) { > + return i - 1; > +} [...] > +static bool findCharacterAttribute(isCharacterAttribute predicateFunction, PangoLogAttr* attributes, Direction direction, int startOffset, int attrsLength, int* resultOffset) > +{ > + advanceFunc advanceFunc = direction == DirectionForward ? increaseInt : decreaseInt; > + > + *resultOffset = -1; > + > + for (int i = startOffset; i >= 0 && i < attrsLength; i = advanceFunc(i)) { The patch looks good to me, except for this nitpick. I think using the advanceFunc strategy here is a bit overblown. You should be able to use a simple int variable here. int advanceBy = direction == DirectionForward ? 1 : -1; for (int i = startOffset; i >= 0 && i < attrsLength; i = i + advanceBy) { If there's a reason that escaped my eye, please not in a comment =).

Jan Alonzo

Comment 22 2009-05-29 16:00:46 PDT

Patch landed in r44246.

Xan Lopez

Comment 23 2009-05-29 19:15:55 PDT

The line boundaries are still not implemented, this bug is not fixed yet.

Jan Alonzo

Comment 24 2009-06-05 19:58:18 PDT

Comment on attachment 30614 [details] gettextv4.patch clearing review flag so it won't appear in the commit queue.

Xan Lopez

Comment 25 2009-06-08 07:08:54 PDT

Created attachment 31049 [details] gailtextutil.patch This uses GailTextUtil to implement the get_text functions, instead of my own lesser implementation. It should add support for all boundary types, but that's untested for now (I'll move into this now). It still passes all the tests we have though, fixing two bugs/oversights on top of that.

Xan Lopez

Comment 26 2009-06-08 08:31:24 PDT

Created attachment 31051 [details] uselayout.patch Pass a properly initialized PangoLayout to GailTextUtil, needed when using LINE boundaries.

Xan Lopez

Comment 27 2009-06-08 08:31:52 PDT

Created attachment 31052 [details] unifytextaccess.patch Reduce some duplicated code.

Jan Alonzo

Comment 28 2009-06-10 04:05:28 PDT

Comment on attachment 31049 [details] gailtextutil.patch > -static bool findCharacterAttributeSkip(isCharacterAttribute predicateFunction, unsigned skip, PangoLogAttr* attributes, Direction direction, int startOffset, int attrsLength, int* resultOffset) > +static GailTextUtil* getGailTextUtilForAtk(AtkText* textObject) > { > - int tmpOffset; > + gpointer data = g_object_get_data(G_OBJECT(textObject), "webkit-accessible-gail-text-util"); > static gchar* webkit_accessible_text_get_text_after_offset(AtkText* text, gint offset, AtkTextBoundary boundaryType, gint* startOffset, gint* endOffset) > { > - return getTextHelper(AfterOffset, text, offset, boundaryType, startOffset, endOffset); > + return gail_text_util_get_text(getGailTextUtilForAtk(text), NULL, GAIL_AFTER_OFFSET, boundaryType, offset, startOffset, endOffset); > } > > static gchar* webkit_accessible_text_get_text_at_offset(AtkText* text, gint offset, AtkTextBoundary boundaryType, gint* startOffset, gint* endOffset) > { > - return getTextHelper(AtOffset, text, offset, boundaryType, startOffset, endOffset); > + return gail_text_util_get_text(getGailTextUtilForAtk(text), NULL, GAIL_AT_OFFSET, boundaryType, offset, startOffset, endOffset); > } > > static gchar* webkit_accessible_text_get_text_before_offset(AtkText* text, gint offset, AtkTextBoundary boundaryType, gint* startOffset, gint* endOffset) > { > - return getTextHelper(BeforeOffset, text, offset, boundaryType, startOffset, endOffset); > + return gail_text_util_get_text(getGailTextUtilForAtk(text), NULL, GAIL_BEFORE_OFFSET, boundaryType, offset, startOffset, endOffset); > } Would be nice if we can refactor these in the future. Looks fine. Are we OK with depending on libgail? If so, r=me.

Jan Alonzo

Comment 29 2009-06-10 04:36:46 PDT

Comment on attachment 31051 [details] uselayout.patch > From f00e7bd148421009d2178db814c4ed4cb20c1074 Mon Sep 17 00:00:00 2001 > From: Xan Lopez <xlopez@igalia.com> > Date: Mon, 8 Jun 2009 18:20:51 +0300 > Subject: [PATCH] 2009-06-08 Xan Lopez <xlopez@igalia.com> > > Reviewed by NOBODY (OOPS!). > > https://bugs.webkit.org/show_bug.cgi?id=25415 > [GTK][ATK] Please implement support for get_text_at_offset > > Pass a PangoLayout to the GailTextUtil function calls. > > It's needed for LINE boundary calls to work correctly. > > * accessibility/gtk/AccessibilityObjectWrapperAtk.cpp: > (updateLayout): > (getPangoLayoutForAtk): > (webkit_accessible_text_get_text_after_offset): > (webkit_accessible_text_get_text_at_offset): > --- > WebCore/ChangeLog | 17 ++++++++ > .../gtk/AccessibilityObjectWrapperAtk.cpp | 41 ++++++++++++++++++- > 2 files changed, 55 insertions(+), 3 deletions(-) > > diff --git a/WebCore/ChangeLog b/WebCore/ChangeLog > index 0334758..694f595 100644 > --- a/WebCore/ChangeLog > +++ b/WebCore/ChangeLog > @@ -5,6 +5,23 @@ > https://bugs.webkit.org/show_bug.cgi?id=25415 > [GTK][ATK] Please implement support for get_text_at_offset > > + Pass a PangoLayout to the GailTextUtil function calls. > + > + It's needed for LINE boundary calls to work correctly. > + > + * accessibility/gtk/AccessibilityObjectWrapperAtk.cpp: > + (updateLayout): > + (getPangoLayoutForAtk): > + (webkit_accessible_text_get_text_after_offset): > + (webkit_accessible_text_get_text_at_offset): > + > +2009-06-08 Xan Lopez <xlopez@igalia.com> > + > + Reviewed by NOBODY (OOPS!). > + > + https://bugs.webkit.org/show_bug.cgi?id=25415 > + [GTK][ATK] Please implement support for get_text_at_offset > + > Use GailUtilText instead of my crappy partial reimplementation of > it. This should add support for LINE boundaries too, although it's > mostly untested for now. > diff --git a/WebCore/accessibility/gtk/AccessibilityObjectWrapperAtk.cpp b/WebCore/accessibility/gtk/AccessibilityObjectWrapperAtk.cpp > index e70aaa0..bddac93 100644 > --- a/WebCore/accessibility/gtk/AccessibilityObjectWrapperAtk.cpp > +++ b/WebCore/accessibility/gtk/AccessibilityObjectWrapperAtk.cpp > @@ -41,6 +41,7 @@ > #include "Editor.h" > #include "Frame.h" > #include "FrameView.h" > +#include "HostWindow.h" > #include "HTMLNames.h" > #include "IntRect.h" > #include "NotImplemented.h" > @@ -503,19 +504,53 @@ static GailTextUtil* getGailTextUtilForAtk(AtkText* textObject) > return gailTextUtil; > } > > +// We can use the same callback for both 'style-set' and > +// 'direction-changed', since we don't care about neither of their > +// second parameters. > +static void updateLayout(GtkWidget* widget, gpointer dummy, gpointer userData) > +{ > + gpointer data = g_object_get_data(G_OBJECT(userData), "webkit-accessible-pango-layout"); > + if (!data) > + return; > + > + pango_layout_context_changed(static_cast<PangoLayout*>(data)); > +} > + > +static PangoLayout* getPangoLayoutForAtk(AtkText* textObject) > +{ > + gpointer data = g_object_get_data(G_OBJECT(textObject), "webkit-accessible-pango-layout"); > + if (data) > + return static_cast<PangoLayout*>(data); > + > + String text; > + AccessibilityObject* coreObject = core(textObject); > + > + if (coreObject->isTextControl()) > + text = coreObject->text(); > + else > + text = coreObject->textUnderElement(); > + > + PlatformWidget webView = coreObject->document()->view()->hostWindow()->platformWindow(); Might be good to null-check hostWindow() and maybe platformWindow() too. Looks fine. r=me.

Jan Alonzo

Comment 30 2009-06-10 04:39:20 PDT

Comment on attachment 31052 [details] unifytextaccess.patch r=me

Xan Lopez

Comment 31 2009-06-10 04:48:48 PDT

(In reply to comment #28) > Would be nice if we can refactor these in the future. Mmm, refactor what exactly? > > Looks fine. Are we OK with depending on libgail? If so, r=me. > Ideally a11y in general should be optional I think (uh oh, here we go again), but other than that I believe we are OK depending on it yeah (and Holger and Gustavo agree on this per conversations on IRC).

Jan Alonzo

Comment 32 2009-06-10 05:08:31 PDT

(In reply to comment #31) > (In reply to comment #28) > > Would be nice if we can refactor these in the future. > > Mmm, refactor what exactly? AFAICS, the only difference between these three are whether it's an at/before/after offset. I think we can compress it into one function. > > Looks fine. Are we OK with depending on libgail? If so, r=me. > > > > Ideally a11y in general should be optional I think (uh oh, here we go again), > but other than that I believe we are OK depending on it yeah (and Holger and > Gustavo agree on this per conversations on IRC). Cool

Xan Lopez

Comment 33 2009-06-10 05:11:48 PDT

(In reply to comment #32) > (In reply to comment #31) > > (In reply to comment #28) > > > Would be nice if we can refactor these in the future. > > > > Mmm, refactor what exactly? > > AFAICS, the only difference between these three are whether it's an > at/before/after offset. I think we can compress it into one function. > Well, each one is the implementation of a different AtkText method with identical parameter list, so if we use the same function for all three there would be no way of knowing if we have to do AT/BEFORE/AFTER. So this can't be refactored.

Xan Lopez

Comment 34 2009-06-10 06:11:33 PDT

Patches committed (with the NULL check suggested) as r4455{7,8,9}. Closing the bug, thanks for the review!

Joanmarie Diggs

Comment 35 2009-06-10 17:23:12 PDT

It looks like returning the actual/displayed line isn't working. Instead, it seems like the full text object is being returned. Test I tried: On Orca wiki (live.gnome.org/Orca), under the first heading, there is a short paragraph. I sized my browser window so that the text on screen wrapped across 6 lines. >> acc.queryText().getText(0, -1) 'Before you are tempted to post a note to every mailing list on the planet, like every other user has done, please read this. Speech and audio integration is not working well on Jaunty. You need to install with the blindness profile:' >> acc.queryText().characterCount 232 >> acc.queryText().getTextAtOffset(100, TEXT_BOUNDARY_LINE_START) ('Before you are tempted to post a note to every mailing list on the planet, like every other user has done, please read this. Speech and audio integration is not working well on Jaunty. You need to install with the blindness profile:', 0, 232) Reopening. Sorry.

Jan Alonzo

Comment 36 2009-06-12 15:11:26 PDT

Comment on attachment 31049 [details] gailtextutil.patch Clearing review flag as the patch has landed

Jan Alonzo

Comment 37 2009-06-12 15:11:47 PDT

Comment on attachment 31051 [details] uselayout.patch Clearing review flag as the patch has landed

Jan Alonzo

Comment 38 2009-06-12 15:12:24 PDT

Comment on attachment 31052 [details] unifytextaccess.patch Clearing review flag as the patch has landed

Xan Lopez

Comment 39 2009-07-08 07:21:11 PDT

Created attachment 32451 [details] lineboundary.patch OK, this patch seems to at least get the basics right from my testing. Unfortunately it copies a bit of code from FontGtk.cpp, we need to share it.

Joanmarie Diggs

Comment 40 2009-07-08 14:07:53 PDT

Created attachment 32471 [details] Screenshot illustrating a problem This is definitely a great start! I suspect this is going to take some time to fully sort through. Therefore once things are reasonably close to being correct, I propose we close this bug and open new, specific bugs for problems getting the text of a line. In the meantime.... 1. I found a case where things start out working as expected but then fail. I'm not sure why/what's special at the failure point. Rather than attempt to explain it, I've attached a screenshot so you can see the URL, window size and text wrapping, hierarchy, what I typed, where things worked and failed, etc., etc. 2. After taking the above screenshot, I tried resizing the Epiphany window and getting the lines again. I got the same results as before (i.e. as if the lines hadn't changed).

Joanmarie Diggs

Comment 41 2009-07-08 14:42:49 PDT

Another (hopefully reproducible) example of the line contents getting cut off unexpectedly: In the table of attachments for this bug, highlight the accessible text object which corresponds to the attachment size and do: acc.queryText().getTextAtOffset(0, TEXT_BOUNDARY_LINE_START) Examples: * ('(17.85 KB, p', 0, 12) * ('(5.28 KB, p', 0, 11) * ('(263.30 KB, i', 0, 13)

Gustavo Noronha (kov)

Comment 42 2009-07-12 10:10:32 PDT

Comment on attachment 32451 [details] lineboundary.patch > +static void utf16_to_utf8(const UChar* aText, gint aLength, char* &text, gint &length) > +{ > + gboolean need_copy = FALSE; I believe we want UTF16ToUTF8, and needCopy, here, though the first one looks blergh. Maybe Utf16ToUtf8? Though the style doesn't like that for acronyms... > + if (!aText[i] || IS_LOW_SURROGATE(aText[i])) { > + need_copy = TRUE; > + break; > + } > + else if (IS_HIGH_SURROGATE(aText[i])) { The else if line should be after the if's } here. > + gint new_length = 0; newLength It would be good to have a comment on top of convertUniCharToUTF8 explaining that it is not really general, as the name seems to imply, and that it is concerned with how the layouting is done. I know there's a comment that may lead people to notice this in the middle of the function, but then again, it would be best to make this obvious. The rest looks fine to me, so r=me with the style changes. It's a pitty you had to do all that messing with the text to get it right, but I figure it would be much harder to change the way webcore works.

Xan Lopez

Comment 43 2009-07-12 10:29:56 PDT

Comment on attachment 32451 [details] lineboundary.patch Committed as r45760 with the suggested fixes, removing from queue.

Jan Alonzo

Comment 44 2009-07-15 02:15:40 PDT

(In reply to comment #43) > (From update of attachment 32451 [details]) > Committed as r45760 with the suggested fixes, removing from queue. Are there more upcoming patches for this bug? Can we close it? Thanks.

Joanmarie Diggs

Comment 45 2009-07-15 05:21:34 PDT

(In reply to comment #44) > (In reply to comment #43) > > (From update of attachment 32451 [details] [details]) > > Committed as r45760 with the suggested fixes, removing from queue. > > Are there more upcoming patches for this bug? Can we close it? Thanks. I'm okay with closing it, but then we need to open a new bug for the issues raised in comment 40 and comment 41.

Jan Alonzo

Comment 46 2009-07-15 05:27:20 PDT

(In reply to comment #45) > (In reply to comment #44) > > (In reply to comment #43) > > > (From update of attachment 32451 [details] [details] [details]) > > > Committed as r45760 with the suggested fixes, removing from queue. > > > > Are there more upcoming patches for this bug? Can we close it? Thanks. > > I'm okay with closing it, but then we need to open a new bug for the issues > raised in comment 40 and comment 41. Thanks Joannie for the status. It's fine. I'll leave the bug open.

Jan Alonzo

Comment 47 2009-07-15 05:28:08 PDT

Comment on attachment 32451 [details] lineboundary.patch Clearing review. Patch landed.

Xan Lopez

Comment 48 2009-07-22 05:44:07 PDT

Created attachment 33258 [details] leninbytes.patch Fix a problem in the g_substr function wrt length in characters != length in bytes for utf8 strings.

Xan Lopez

Comment 49 2009-07-22 05:44:39 PDT

Created attachment 33259 [details] textencoding.patch Use TextEncoding facilities for UTF16->UTF8 conversion.

Jan Alonzo

Comment 50 2009-07-24 00:49:53 PDT

Comment on attachment 33259 [details] textencoding.patch > +static gchar* utf8_substr(const gchar* string, gint start, gint end) Style - utf8Substr or similar. > { > gchar* startPtr = g_utf8_offset_to_pointer(string, start); > gsize lenInBytes = g_utf8_offset_to_pointer(string, end) - startPtr + 1; > @@ -581,27 +530,23 @@ static gchar* g_substr(const gchar* string, gint start, gint end) > // internals of WebCore's text presentation. > static gchar* convertUniCharToUTF8(const UChar* characters, gint length, int from, int to) > { > - gchar* utf8 = 0; > - gint newLength = 0; > - UTF16ToUTF8(characters, length, utf8, newLength); > - if (!utf8) > - return NULL; > - > - gchar *pos = g_substr(utf8, from, to); > - g_free(utf8); > - gint len = strlen(pos); > + CString stringUTF8 = UTF8Encoding().encode(characters, length, QuestionMarksForUnencodables); > + gchar* utf8 = utf8_substr(stringUTF8.data(), from, to); > + if (!g_utf8_validate(utf8, -1, NULL)) > + return 0; Please change utf8 to something like utf8String. Also maybe free utf8 as well? It may not validate but that doesn't mean utf8 is NULL either. > + gint len = strlen(utf8); Use gsize here too. r=me.

Jan Alonzo

Comment 51 2009-07-24 01:07:09 PDT

Comment on attachment 33258 [details] leninbytes.patch > static gchar* g_substr(const gchar* string, gint start, gint end) > { > - gsize len = end - start + 1; > - gchar* output = static_cast<gchar*>(g_malloc0(len + 1)); > - return g_utf8_strncpy(output, string +start, len); > + gchar* startPtr = g_utf8_offset_to_pointer(string, start); According to the doc, g_utf8_offset_to_pointer doesn't check if offset is within the boundaries of string. Can we add a check here? > + gsize lenInBytes = g_utf8_offset_to_pointer(string, end) - startPtr + 1; Ditto for 'end' and check if startPtr is 0 first. Looks fine otherwise. r=me.

Xan Lopez

Comment 52 2009-07-24 01:14:14 PDT

Assign bug to myself.

Xan Lopez

Comment 53 2009-07-24 01:44:14 PDT

Comment on attachment 33258 [details] leninbytes.patch Landed as r46342, clearing flags.

Xan Lopez

Comment 54 2009-07-24 01:44:35 PDT

Comment on attachment 33259 [details] textencoding.patch Landed as r46343, clearing flags.

Joanmarie Diggs

Comment 55 2009-07-27 13:04:35 PDT

Xan, I just tried the latest WebKit with the latest Epiphany, and lines are looking much, much better. Thanks! So that leaves two issues: 1. Updating what is returned if the user resizes the window (not a huge big deal, but needs doing), and: 2. When I attempted to verify the issue in comment #41 by performing the exact steps I outlined there, as soon as I pressed Return, Epiphany crashed with the following spewed out without my asking: *** glibc detected *** epiphany: munmap_chunk(): invalid pointer: 0x097beb0b *** ======= Backtrace: ========= /lib/tls/i686/cmov/libc.so.6[0x2403e44] /usr/lib/libglib-2.0.so.0(g_free+0x36)[0x612f16] /usr/lib/libwebkit-1.0.so.2[0x17a2cf7] /usr/lib/libwebkit-1.0.so.2[0x17a3b64] /usr/lib/libatk-1.0.so.0(atk_text_get_text_at_offset+0xb3)[0x1992f3] /usr/lib/libspi.so.0[0x1b46d6a] /usr/lib/libspi.so.0(_ORBIT_skel_small_Accessibility_Text_getTextAtOffset+0x3c)[0x1b3b29c] /usr/lib/libORBit-2.so.0[0x81d537] /usr/lib/libORBit-2.so.0(ORBit_OAObject_invoke+0x35)[0x823b45] /usr/lib/libORBit-2.so.0(ORBit_small_invoke_adaptor+0x113)[0x80fe63] /usr/lib/libORBit-2.so.0[0x821649] /usr/lib/libORBit-2.so.0[0x821d22] /usr/lib/libORBit-2.so.0[0x821ed9] /usr/lib/libORBit-2.so.0(ORBit_handle_request+0x52)[0x823f92] /usr/lib/libORBit-2.so.0(giop_connection_handle_input+0x385)[0x80c155] /usr/lib/libORBit-2.so.0[0x82b743] /usr/lib/libORBit-2.so.0[0x82e016] /usr/lib/libglib-2.0.so.0(g_main_context_dispatch+0x1f8)[0x60abc8] /usr/lib/libglib-2.0.so.0[0x60e470] /usr/lib/libglib-2.0.so.0(g_main_loop_run+0x1bf)[0x60e8df] /usr/lib/libgtk-x11-2.0.so.0(gtk_main+0xb9)[0x1ef1a19] epiphany(main+0x69f)[0x806f15f] /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe5)[0x23ab7c5] epiphany[0x806e6e1] ======= Memory map: ======== 00110000-00145000 r-xp 00000000 08:01 7192 /usr/lib/libxslt.so.1.1.24 00145000-00146000 r--p 00034000 08:01 7192 /usr/lib/libxslt.so.1.1.24 00146000-00147000 rw-p 00035000 08:01 7192 /usr/lib/libxslt.so.1.1.24 00147000-0017a000 r-xp 00000000 08:01 29469 /usr/lib/libgconf-2.so.4.1.5 0017a000-0017b000 r--p 00033000 08:01 29469 /usr/lib/libgconf-2.so.4.1.5 0017b000-0017d000 rw-p 00034000 08:01 29469 /usr/lib/libgconf-2.so.4.1.5 0017d000-00185000 r-xp 00000000 08:01 7091 /usr/lib/libstartup-notification-1.so.0.0.0 00185000-00186000 r--p 00008000 08:01 7091 /usr/lib/libstartup-notification-1.so.0.0.0 00186000-00187000 rw-p 00009000 08:01 7091 /usr/lib/libstartup-notification-1.so.0.0.0 00187000-001a3000 r-xp 00000000 08:01 6270 /usr/lib/libatk-1.0.so.0.2609.1 001a3000-001a4000 r--p 0001c000 08:01 6270 /usr/lib/libatk-1.0.so.0.2609.1 001a4000-001a5000 rw-p 0001d000 08:01 6270 /usr/lib/libatk-1.0.so.0.2609.1 001a5000-001ce000 r-xp 00000000 08:01 37655 /usr/lib/libpangoft2-1.0.so.0.2400.4 001ce000-001cf000 r--p 00028000 08:01 37655 /usr/lib/libpangoft2-1.0.so.0.2400.4 001cf000-001d0000 rw-p 00029000 08:01 37655 /usr/lib/libpangoft2-1.0.so.0.2400.4 001d0000-001d2000 r-xp 00000000 08:01 6211 /usr/lib/libXinerama.so.1.0.0 001d2000-001d3000 rw-p 00001000 08:01 6211 /usr/lib/libXinerama.so.1.0.0 001d3000-001e2000 r-xp 00000000 08:01 6276 /usr/lib/libavahi-client.so.3.2.5 001e2000-001e3000 r--p 0000f000 08:01 6276 /usr/lib/libavahi-client.so.3.2.5 001e3000-001e4000 rw-p 00010000 08:01 6276 /usr/lib/libavahi-client.so.3.2.5 001e4000-0031d000 r-xp 00000000 08:01 77747 /usr/lib/libxml2.so.2.7.3 0031d000-00321000 r--p 00139000 08:01 77747 /usr/lib/libxml2.so.2.7.3 00321000-00322000 rw-p 0013d000 08:01 77747 /usr/lib/libxml2.so.2.7.3 00322000-00323000 rw-p 00000000 00:00 0 00323000-0032e000 r-xp 00000000 08:01 37654 /usr/lib/libpangocairo-1.0.so.0.2400.4 0032e000-0032f000 r--p 0000a000 08:01 37654 /usr/lib/libpangocairo-1.0.so.0.2400.4 0032f000-00330000 rw-p 0000b000 08:01 37654 /usr/lib/libpangocairo-1.0.so.0.2400.4 00330000-00334000 r-xp 00000000 08:01 57734 /usr/lib/libsoup-gnome-2.4.so.1.2.0 00334000-00335000 r--p 00003000 08:01 57734 /usr/lib/libsoup-gnome-2.4.so.1.2.0 00335000-00336000 rw-p 00004000 08:01 57734 /usr/lib/libsoup-gnome-2.4.so.1.2.0 00336000-00354000 r-xp 00000000 08:01 6385 /usr/lib/libdbus-glib-1.so.2.1.0 00354000-00355000 r--p 0001e000 08:01 6385 /usr/lib/libdbus-glib-1.so.2.1.0 00355000-00356000 rw-p 0001f000 08:01 6385 /usr/lib/libdbus-glib-1.so.2.1.0

Xan Lopez

Comment 56 2009-07-27 13:26:05 PDT

Created attachment 33565 [details] dontcachelayout.patch I believe this should fix the issue with the function not reacting to layout changes. About the crasher, if you can provide a trace with full debug symbols that would help (you need to compile WebKit with --enable-debug), otherwise I'll get to it tomorrow. Thanks!

Joanmarie Diggs

Comment 57 2009-07-27 13:26:44 PDT

Another instance of a segfault with get_text_at_offset: 1. View Google in Epiphany. Don't type anything in the search entry. 2. In Accerciser, locate the search entry in the tree of accessibles on the left. With it highlighted, type the following in the iPython console: acc.queryText().getTextAtOffset(0, TEXT_BOUNDARY_LINE_START) Program received signal SIGSEGV, Segmentation fault. 0x0174fb5c in getPangoLayoutForAtk(_AtkText*) () from /usr/lib/libwebkit-1.0.so.2 Current language: auto; currently asm (gdb) thread apply all bt Thread 1 (Thread 0xb725a740 (LWP 18561)): #0 0x0174fb5c in getPangoLayoutForAtk(_AtkText*) () from /usr/lib/libwebkit-1.0.so.2 #1 0x01750b64 in webkit_accessible_text_get_text_at_offset(_AtkText*, int, AtkTextBoundary, int*, int*) () from /usr/lib/libwebkit-1.0.so.2 #2 0x002ff2f3 in atk_text_get_text_at_offset () from /usr/lib/libatk-1.0.so.0 #3 0x06c94d6a in impl_getTextAtOffset (servant=0xa087564, offset=0, type=Accessibility_TEXT_BOUNDARY_LINE_START, startOffset=0xbf903f90, endOffset=0xbf903f70, ev=0xbf9041dc) at text.c:128 #4 0x06c8929c in _ORBIT_skel_small_Accessibility_Text_getTextAtOffset (_o_servant=0xa087564, _o_retval=0xbf904030, _o_args=0xbf904010, _o_ctx=0xbf9040c8, _o_ev=0xbf9041dc, _impl_getTextAtOffset=0x6c94d20 <impl_getTextAtOffset>) at Accessibility-common.c:744 #5 0x007c0537 in ?? () from /usr/lib/libORBit-2.so.0 #6 0x0a087564 in ?? () #7 0xbf904030 in ?? () #8 0xbf904010 in ?? () #9 0xbf9040c8 in ?? () #10 0xbf9041dc in ?? () #11 0x06c94d20 in ?? () at text.c:190 from /usr/lib/libspi.so.0 #12 0xbf903f48 in ?? () #13 0x007c6b45 in ORBit_OAObject_invoke () from /usr/lib/libORBit-2.so.0 Backtrace stopped: frame did not save the PC

Joanmarie Diggs

Comment 58 2009-07-27 15:10:04 PDT

(In reply to comment #56) > About the crasher, if you can provide a trace with full debug symbols > that would help (you need to compile WebKit with --enable-debug) Program received signal SIGABRT, Aborted. 0x00c5c422 in __kernel_vsyscall () (gdb) thread apply all bt Thread 1 (Thread 0xb71fb740 (LWP 8533)): #0 0x00c5c422 in __kernel_vsyscall () #1 0x058ce4c0 in *__GI_raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64 #2 0x058d1b75 in *__GI_abort () at abort.c:88 #3 0x0590783d in __libc_message (do_abort=2, fmt=0x59e5768 "*** glibc detected *** %s: %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:192 #4 0x05911e44 in malloc_printerr (action=2, str=0x59e5794 "munmap_chunk(): invalid pointer", ptr=0x942e31b) at malloc.c:5994 #5 0x00901f16 in IA__g_free (mem=0x942e31b) at /build/buildd/glib2.0-2.21.3/glib/gmem.c:190 #6 0x018b5f12 in convertUniCharToUTF8 (characters=0x925b120, length=62, from=15, to=25) at WebCore/accessibility/gtk/AccessibilityObjectWrapperAtk.cpp:559 #7 0x018b6102 in getPangoLayoutForAtk (textObject=0x93d4568) at WebCore/accessibility/gtk/AccessibilityObjectWrapperAtk.cpp:593 #8 0x018b622f in webkit_accessible_text_get_text_at_offset (text=0x93d4568, offset=0, boundaryType=ATK_TEXT_BOUNDARY_LINE_START, startOffset=0xbfd0aa4c, endOffset=0xbfd0aa48) at WebCore/accessibility/gtk/AccessibilityObjectWrapperAtk.cpp:611 #9 0x0077c2f3 in atk_text_get_text_at_offset (text=0x93d4568, offset=0, boundary_type=ATK_TEXT_BOUNDARY_LINE_START, start_offset=0xbfd0aa4c, end_offset=0xbfd0aa48) at atktext.c:419 #10 0x0304fd6a in impl_getTextAtOffset (servant=0x9461644, offset=0, type=Accessibility_TEXT_BOUNDARY_LINE_START, startOffset=0xbfd0ab20, endOffset=0xbfd0ab00, ev=0xbfd0ad6c) at text.c:128 #11 0x0304429c in _ORBIT_skel_small_Accessibility_Text_getTextAtOffset (_o_servant=0x9461644, _o_retval=0xbfd0abc0, _o_args=0xbfd0aba0, _o_ctx=0xbfd0ac58, _o_ev=0xbfd0ad6c, _impl_getTextAtOffset=0x304fd20 <impl_getTextAtOffset>) at Accessibility-common.c:744 #12 0x00a7d537 in ?? () from /usr/lib/libORBit-2.so.0 #13 0x09461644 in ?? () #14 0xbfd0abc0 in ?? () #15 0xbfd0aba0 in ?? () #16 0xbfd0ac58 in ?? () #17 0xbfd0ad6c in ?? () #18 0x0304fd20 in ?? () at text.c:190 from /usr/lib/libspi.so.0 #19 0xbfd0aad8 in ?? () #20 0x00a83b45 in ORBit_OAObject_invoke () from /usr/lib/libORBit-2.so.0 Backtrace stopped: frame did not save the PC

Joanmarie Diggs

Comment 59 2009-07-27 17:24:03 PDT

(In reply to comment #56) > Created an attachment (id=33565) [details] > dontcachelayout.patch > > I believe this should fix the issue with the function not reacting to layout > changes. Seems to. Thanks!! I did find another case/example where but is getting cut off prematurely, however. (Sorry.) At the bottom of http://live.gnome.org/Orca is a line stating who did the last edit. acc.name 'Orca (last edited 2009-07-14 15:13:57 by ' (Note that my window is sized so that the whole thing is on a single line.) acc.queryText().getTextAtOffset(0, TEXT_BOUNDARY_LINE_START) ('Orca (', 0, 6)

Gustavo Noronha (kov)

Comment 60 2009-07-28 03:28:23 PDT

Comment on attachment 33565 [details] dontcachelayout.patch There is also an updateLayout callback that currently updates the patch. rs=me on removing the callback and the signal connections, too.

Xan Lopez

Comment 61 2009-07-28 03:54:14 PDT

Comment on attachment 33565 [details] dontcachelayout.patch Landed as r46466 addressing Gustavo's comments.

Jan Alonzo

Comment 62 2009-07-30 23:58:02 PDT

Comment on attachment 33565 [details] dontcachelayout.patch Clearing review flag as this patch was already committed.

Joanmarie Diggs

Comment 63 2009-11-04 21:17:31 PST

Created attachment 42544 [details] Fix for the issue in comment 59 This seems to fix the issue I pointed out in comment 59, namely: > At the bottom of http://live.gnome.org/Orca is a line stating who did the last > edit. > > acc.name > 'Orca (last edited 2009-07-14 15:13:57 by ' > > (Note that my window is sized so that the whole thing is on a single line.) > > acc.queryText().getTextAtOffset(0, TEXT_BOUNDARY_LINE_START) > ('Orca (', 0, 6) Assuming this solution (or one I generate post-review) is acceptable, I believe that only leaves the invalid pointer issue from comment #55.

Joanmarie Diggs

Comment 64 2009-11-05 00:04:55 PST

(In reply to comment #63) > Assuming this solution (or one I generate post-review) is acceptable, I believe > that only leaves the invalid pointer issue from comment #55. Sorry for the spam. Turns out the segfault reported in comment #57 is still present too....

Joanmarie Diggs

Comment 65 2009-11-08 02:36:48 PST

Created attachment 42712 [details] Fix for the issue in comment 41, 55 It turns out that the segfault would/will occur pretty frequently: <body> hello world </body> is sufficient. What seems to be going on is the modification (corruption?) of utf8String when there is at least one newline char in the source. When we attempt to free the string, GtkLauncher reliably segfaults. The attach patch solves the problem and doesn't seem to have any side effects w.r.t. get_text_at_offset. Is there a reason the line in question needs to remain? Please review. Thanks!

Xan Lopez

Comment 66 2009-11-08 03:37:52 PST

Comment on attachment 42712 [details] Fix for the issue in comment 41, 55 Well, this is the line that makes us advance to the next paragraph boundary. If you remove it I think this function won't work for text chunks with more than one of those...? If removing that makes the crash go away I think it's very likely the problem is some mix-up between lengths in characters and lengths in bytes. What are the indexs the pango function return specified in? Characters? If that is so then advancing the string like we do it there is wrong, and we should use the glib functions for doing so I think.

Joanmarie Diggs

Comment 67 2009-11-08 15:07:10 PST

Created attachment 42720 [details] What about....? (for the comment 41, 55 crasher) (In reply to comment #66) > (From update of attachment 42712 [details]) > Well, this is the line that makes us advance to the next paragraph boundary. Gotcha. Thanks! > If that is so then advancing the string like we do it there is wrong, and we > should use the glib functions for doing so I think. Which ones in particular? I experimented a bit with g_utf8_offset_to_pointer and g_utf8_pointer_to_offset combined with some other changes and nothing I did seemed to make a difference. :-( Then I looked at the g_free() doc: g_free () void g_free (gpointer mem); Frees the memory pointed to by mem. If mem is NULL it simply returns. mem : the memory to free I apologize for being new to C++, and for asking what is probably a silly question, but based on the above.... Is the thing which should be freed the memory at the original location and not the new location resulting from doing: utf8String += start; The attached solves the crasher by resetting utf8String to what it was before. It strikes me as inelegant, assuming it is even (more or less) the right thing to do. So I'm not flagging this patch for review, but asking for enlightenment -- or, better still, for the RightAnswer(tm). :-) Thanks!

Xan Lopez

Comment 68 2009-11-08 15:50:26 PST

(In reply to comment #67) > Then I looked at the g_free() doc: > > g_free () > void g_free (gpointer mem); > Frees the memory pointed to by mem. If mem is NULL it simply returns. > mem : the memory to free > > I apologize for being new to C++, and for asking what is probably a silly > question, but based on the above.... Is the thing which should be freed the > memory at the original location and not the new location resulting from doing: > > utf8String += start; You are totally right, good catch. What you pass to free must be a pointer returned by malloc/calloc/realloc, and you modify it with pointer arithmetic you will very likely confuse the underlying memory management system. > > The attached solves the crasher by resetting utf8String to what it was before. > It strikes me as inelegant, assuming it is even (more or less) the right thing > to do. So I'm not flagging this patch for review, but asking for enlightenment > -- or, better still, for the RightAnswer(tm). :-) Well, this is wrong too, since it will be allocating a new string (and was fixing the problem by means of not actually freeing the original string). What you need to do is to save the original pointer, and just use a temporary one to advance the string. So something like, utf8string = ptr = utf8Substr...; Then use ptr in the loop, and call free on utf8String in the end. > > Thanks!

Joanmarie Diggs

Comment 69 2009-11-08 16:27:27 PST

Created attachment 42722 [details] Use temporary pointer to advance the string (comment 41, 55 crasher) Xan, thank you so much for your help!

Xan Lopez

Comment 70 2009-11-09 00:06:10 PST

Comment on attachment 42722 [details] Use temporary pointer to advance the string (comment 41, 55 crasher) Looks good to me.

WebKit Commit Bot

Comment 71 2009-11-09 00:18:13 PST

Comment on attachment 42722 [details] Use temporary pointer to advance the string (comment 41, 55 crasher) Clearing flags on attachment: 42722 Committed r50641: <http://trac.webkit.org/changeset/50641>

WebKit Commit Bot

Comment 72 2009-11-09 00:18:20 PST

All reviewed patches have been landed. Closing bug.

Joanmarie Diggs

Comment 73 2009-11-09 06:05:47 PST

Reopening. Xan, do you have an opinion on my proposed fix for the issue in comment 59? The one other issue besides the issue in comment 59 is the one in comment 57. I'll look at that one today.

Joanmarie Diggs

Comment 74 2009-11-10 02:39:07 PST

Created attachment 42855 [details] Fix for the issue in comment 57 It turns out that the segfault occurs in all entries and whether there is text or not. (At the time I made the comment in question, entries didn't implement the accessible text interface.) This patch seems to solve the problem and cause get_text_at_offset to do the right thing in single-line and multi-line entries. Note that this patch assumes the patch for comment 59. Not so much in terms of function as in terms of whether this patch will apply cleanly. :-) Also note that I think this is the last of the get_text_at_offset-specific issues. And there was much rejoicing. Yea.

Xan Lopez

Comment 75 2009-11-11 04:52:54 PST

Comment on attachment 42544 [details] Fix for the issue in comment 59 Can't you simply check if 'box' is not null to add the \n? I'm not 100% sure nextOnLineExists does exactly what we want, since I'm not very familiar with these APIs. It might we the right thing, but then we should be probably using netxOnLine() instead of nextTextboxy()? If you feel like checking that in detail feel free to do so, otherwise just check the value of box. r- for now.

Xan Lopez

Comment 76 2009-11-11 04:56:36 PST

Comment on attachment 42855 [details] Fix for the issue in comment 57 Marking r- for now since the previous patch needs some changes. Also, doesn't this has the same problem in the textControl branch with one extra \n? (Hopefully I'll have my computer back tomorrow and I'll be able to review this in detail!)

Joanmarie Diggs

Comment 77 2009-11-11 11:18:45 PST

(In reply to comment #75) > (From update of attachment 42544 [details]) > Can't you simply check if 'box' is not null to add the \n? I'm not 100% sure > nextOnLineExists does exactly what we want, since I'm not very familiar with > these APIs. It might we the right thing, but then we should be probably using > netxOnLine() instead of nextTextboxy()? If you feel like checking that in > detail feel free to do so, otherwise just check the value of box. r- for now. Thanks for the review! Checking to see if 'box' is not null to add the \n is the first thing I tried actually. :-) Test Case: ======================== <html> <head> <title>test</title> </head> <body> hello world This is a silly test. <br /> Here is another line. </body> </html> ======================== Which, of course, results in the following being displayed: --------------------------------- hello world This is a silly test. Here is another line. --------------------------------- My patch: - g_string_append(str, "\n"); + if (!box->nextOnLineExists()) + g_string_append(str, "\n"); In [1]: acc.queryText().getTextAtOffset(0, TEXT_BOUNDARY_LINE_START) Out[1]: ('hello world This is a silly test. ', 0, 34) In [2]: acc.queryText().getTextAtOffset(0, TEXT_BOUNDARY_LINE_START) Out[2]: ('Here is another line.', 0, 21) ~~~~~~~~~~~~~~~~~~~~~~~~~ - g_string_append(str, "\n"); box = box->nextTextBox(); + if (box) + g_string_append(str, "\n"); In [1]: acc.queryText().getTextAtOffset(0, TEXT_BOUNDARY_LINE_START) Out[1]: ('hello world T', 0, 13) In [2]: acc.queryText().getTextAtOffset(0, TEXT_BOUNDARY_LINE_START) Out[2]: ('Here is another line.', 0, 21) ~~~~~~~~~~~~~~~~~~~~~~~~~ - box = box->nextTextBox(); + box = static_cast<InlineTextBox*>(box->nextOnLine()); In [1]: acc.queryText().getTextAtOffset(0, TEXT_BOUNDARY_LINE_START) Out[1]: ('hello world T', 0, 13) In [2]: acc.queryText().getTextAtOffset(0, TEXT_BOUNDARY_LINE_START) Out[2]: ('Here is another line.', 0, 21) ~~~~~~~~~~~~~~~~~~~~~~~~~ - g_string_append(str, "\n"); - box = box->nextTextBox(); + box = static_cast<InlineTextBox*>(box->nextOnLine()); In [1]: acc.queryText().getTextAtOffset(0, TEXT_BOUNDARY_LINE_START) Out[1]: ('hello world This is a silly test. ', 0, 34) In [2]: acc.queryText().getTextAtOffset(0, TEXT_BOUNDARY_LINE_START) Out[2]: ('Here is another line.', 0, 21) So this looks like a promising alternative as long as we can be certain that there aren't occasions where nextOnLine would fail, but nextTextBox would succeed. Note that I'm not suggesting that those occasions do or will exist; merely that I'm not yet familiar enough with WebKit to rule out definitively that they will not. :-) Also note that my proposed patch was based on the assumption that nextTextBox was being used for a specific reason/test case, and that appending a newline char was also done for a specific reason/test case. If my assumption was wrong, well.... :-) :-) Thoughts?

Joanmarie Diggs

Comment 78 2009-11-21 23:05:45 PST

Xan and I discussed this further via IRC, and he asked me to dig into this a bit more so that we are sure we understand what's taking place. So dug. :-) Same test case as the one outlined in comment #77 with one line altered to have an image. [...] Here is another <img src="missing.gif" alt="junk-filled" /> line. [...] Accessible Hierarchy: -> Document Frame -> Panel -> hello world This is a silly test. (Text) -> Here is another (Text) -> (Image) -> line. (Text) ------------------------------------------------------------- 1st text object, Window sized so that text wraps after 'This' ------------------------------------------------------------- accObject->stringValue(): <<hello world This is a silly test. >> renderText->textLength(): 45 box: 1 text: <<hello world >> box->start(): 1 box->end(): 12 nextTextBox is NULL: 0 nextOnLineExists: 1 box: 2 text: <<This>> box->start(): 15 box->end(): 18 nextTextBox is NULL: 0 nextOnLineExists: 0 box: 3 text: <<is >> box->start(): 20 box->end(): 22 nextTextBox is NULL: 0 nextOnLineExists: 1 box: 4 text: <<a >> box->start(): 25 box->end(): 26 nextTextBox is NULL: 0 nextOnLineExists: 1 box: 5 text: <<silly test. >> box->start(): 32 box->end(): 43 nextTextBox is NULL: 1 nextOnLineExists: 0 ----------------- Observations: 1. Newline chars in the source document result in separate boxes. 2. Newline chars in the source document impact the start and end offsets of boxes. 3. We'll also have a separate box when the displayed text in an object wraps onto subsequent lines. 4. For text which is displayed on a single line, nextTextBox will be the same as nextOnLine, thus it really doesn't matter which one we use. 5. For text within an object which wraps onto subsequent lines, there will be a nextTextBox for (at least part of) the text displayed on the next line. There will not be a nextOnLine, however. Thoughts: 1. The current code seems to be making the assumption that getting the nextTextBox() will return a box which contains all of the text on the next line. Operating under this assumption, it makes sense to append a newline char at the end each time in the while loop. The problem with this assumption is that it is incorrect in instances where one or more newline chars appear in the source document. 2. If instead of using nextTextBox we switch to nextOnLine as an alternative, we'll be stopping before we have all of the text associated with this AtkObject. 3. In order to get all of the text that is associated with this AtkObject *and* accurately reflect where the line breaks are in the displayed text, we should continue to use nextTextBox as the current code does, but only insert a newline char if there is not any more text on the current line (a question which !box->nextOnLineExists() seems to answer). ---------------------------------------------------------------- 2nd text object, Window sized so that text wraps after the image ---------------------------------------------------------------- accObject->stringValue(): <<Here is another >> renderText->textLength(): 17 box: 1 text: <<Here is another >> box->start(): 1 box->end(): 16 nextTextBox is NULL: 1 nextOnLineExists: 1 ----------------- Observations: 1. The current code automatically appends a newline char at the end of each box. Therefore, it will insert one after "Here is another " and before the image which is on the same line. We probably don't want that. 2. It appears that here too we can use nextOnLineExists() as a means to check that there is something else on this line (the image at the end of the line). Xan, thoughts? Or more details that you'd like? Thanks!

Xan Lopez

Comment 79 2009-11-23 02:31:33 PST

(In reply to comment #78) (The analysis for the first text object seems right to me, so skipping that). > ---------------------------------------------------------------- > 2nd text object, Window sized so that text wraps after the image > ---------------------------------------------------------------- > > accObject->stringValue(): <<Here is another >> > renderText->textLength(): 17 > box: 1 > text: <<Here is another >> > box->start(): 1 > box->end(): 16 > nextTextBox is NULL: 1 > nextOnLineExists: 1 > ----------------- Hrm, this seems pretty bad? If nextTextBox is NULL we won't get the last line of the text at all, so whether or not nextOnLineExists() says the right thing is a bit pointless? What would be the ideal behavior here? What does Firefox do? > Observations: > > 1. The current code automatically appends a newline char at the end of each > box. Therefore, it will insert one after "Here is another " and before the > image which is on the same line. We probably don't want that. > > 2. It appears that here too we can use nextOnLineExists() as a means to check > that there is something else on this line (the image at the end of the line). > > Xan, thoughts? Or more details that you'd like? I think the analysis for the first object makes sense, and I'd be OK with having it go provided we have a good explanation of the code in a comment and a few tests to track regressions. Still, I'd like to know what should we be doing in cases like the text object #2. > > Thanks!

Joanmarie Diggs

Comment 80 2009-11-23 07:53:25 PST

(In reply to comment #79) > (In reply to comment #78) > Hrm, this seems pretty bad? If nextTextBox is NULL we won't get the last line > of the text at all, If you look at the first example (box 5), nextTextBox is NULL as well. But there's still an additional line of text. As I understand it, the function in question is looking at a single AtkObject and creating a pango layout based on that. The text that appears in subsequent objects is, as far as this method is concerned, irrelevant. (Right??) > so whether or not nextOnLineExists() says the right thing > is a bit pointless? Is it? Don't we want to know if we have a line break at the end of the text that precedes the image? And even if it does happen to be pointless, it's nice to know that in the case of an image we don't have do to anything special to avoid breakage. :-) > What would be the ideal behavior here? What does Firefox do? Firefox puts all of the text in the parent object rather than in separate objects of ATK_ROLE_TEXT like WebKit does. The image would be represented by an embedded object character. Thus, given the test case, the accessible hierarchy would be, in its entirety: -> Document Frame -> Image The document frame would implement AtkText. If we want to do what Firefox does (I think we want to do some of it, but not all), we have big changes to make which go well beyond the scope of this bug. > I think the analysis for the first object makes sense, and I'd be OK with > having it go provided we have a good explanation of the code in a comment and a > few tests to track regressions. Okie dokie. Thanks. I'll start looking at creating unit tests. > Still, I'd like to know what should we be doing > in cases like the text object #2. In the short term (i.e. GNOME a11y should already be in place, Yelp needs to work, etc., etc.), I'd suggest that we fix the bug related to character extents (i.e. bug 25677). See, right now, WebKit is rendering text in lots and lots of separate text objects. As a result ATs like Orca must construct the full contents of a line based on the bits and pieces. I'd rather we didn't, but it's not out of the realm of what's reasonable to ask an AT to do. Given this.... The work done in this bug is essential because we need to be able to get the text on the line w.r.t. any given object. The fix for bug 25677 is what will enable us to then go to the next object and say "are you on this line as well?" if the answer is yes, then we use the functionality in this bug to get the text contents for that object. (In the case of images, we'd actually be comparing character extents to component extents, but you get the idea.) Ultimately, I'd like to get text objects as flattened as possible. At that point, then we do have to consider the very issue you've raised, decide if we want to use embedded object characters (in the case of images, we might have to. blek.), etc. Make sense?

Joanmarie Diggs

Comment 81 2009-11-23 13:02:26 PST

Created attachment 43725 [details] Fix for the issue in comment 59 - with updated unit test and comment I modified the existing unit test for get_text_at_offset to include some newline chars. Without the proposed fix, the modified test fails: ~~~~~~~~~~~~~~~~~~~~~~ /webkit/atk/get_text_at_offset: ** ERROR:../../WebKit/gtk/tests/testatk.c:47:test_get_text_function: assertion failed (text == text_result): ("This is a test. T" == "This is a test. This is the second sentence. And this the third.") Aborted ~~~~~~~~~~~~~~~~~~~~~~ With the proposed fix, it once again passes. I've also added a comment regarding what is taking place and referencing this discussion for more detail. Xan, is there anything else you'd like me to do w.r.t. this particular issue?

Xan Lopez

Comment 82 2009-11-24 07:49:46 PST

Comment on attachment 43725 [details] Fix for the issue in comment 59 - with updated unit test and comment I think I prefer to add new functions to that test instead of modifying the existing string to serve as a testcase for lots of different issues, otherwise things can be a bit confusing.

Joanmarie Diggs

Comment 83 2009-11-24 10:03:32 PST

Created attachment 43778 [details] Fix for the issue in comment 59 - with new unit test and comment (In reply to comment #82) > (From update of attachment 43725 [details]) > I think I prefer to add new functions to that test instead of modifying the > existing string to serve as a testcase for lots of different issues, otherwise > things can be a bit confusing. Makes sense. How 'bout this?

Xan Lopez

Comment 84 2009-11-24 10:36:00 PST

Comment on attachment 43778 [details] Fix for the issue in comment 59 - with new unit test and comment Woot.

WebKit Commit Bot

Comment 85 2009-11-24 10:47:22 PST

Comment on attachment 43778 [details] Fix for the issue in comment 59 - with new unit test and comment Clearing flags on attachment: 43778 Committed r51342: <http://trac.webkit.org/changeset/51342>

WebKit Commit Bot

Comment 86 2009-11-24 10:47:32 PST

All reviewed patches have been landed. Closing bug.

Joanmarie Diggs

Comment 87 2009-11-24 10:51:03 PST

Just.... one.... last.... issue. :-) And then we shall never, ever speak of get_text_at_offset again. :-)

Xan Lopez

Comment 88 2009-11-24 10:53:00 PST

(In reply to comment #87) > Just.... one.... last.... issue. :-) > > And then we shall never, ever speak of get_text_at_offset again. :-) That's what you always say.

Joanmarie Diggs

Comment 89 2009-11-26 21:30:23 PST

Created attachment 43935 [details] Comment 57 fix with unit tests This is essentially the same fix as I originally proposed for the issue. The only different in the fix is adding a check to be sure we have wrapped text before eliminating the final character on the line. This patch also adds two more unit tests.

Adam Barth

Comment 90 2009-11-30 12:46:23 PST

Attachment 43935 [details] did not pass style-queue: Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1 Done processing WebCore/accessibility/gtk/AccessibilityObjectWrapperAtk.cpp WebKit/gtk/tests/testatk.c:303: Declaration has space between * and variable name in WebKitWebView* webView [whitespace/declaration] [3] WebKit/gtk/tests/testatk.c:304: Declaration has space between * and variable name in AtkObject* obj [whitespace/declaration] [3] WebKit/gtk/tests/testatk.c:305: Declaration has space between * and variable name in GMainLoop* loop [whitespace/declaration] [3] WebKit/gtk/tests/testatk.c:306: Declaration has space between * and variable name in AtkText* text_obj [whitespace/declaration] [3] WebKit/gtk/tests/testatk.c:336: Declaration has space between * and variable name in WebKitWebView* webView [whitespace/declaration] [3] WebKit/gtk/tests/testatk.c:337: Declaration has space between * and variable name in AtkObject* obj [whitespace/declaration] [3] WebKit/gtk/tests/testatk.c:338: Declaration has space between * and variable name in GMainLoop* loop [whitespace/declaration] [3] WebKit/gtk/tests/testatk.c:339: Declaration has space between * and variable name in AtkText* text_obj [whitespace/declaration] [3] Done processing WebKit/gtk/tests/testatk.c Total errors found: 8

Joanmarie Diggs

Comment 91 2009-11-30 15:49:37 PST

Created attachment 44046 [details] Comment 57 fix with unit tests Correct the issue pointed out by the style bot.

WebKit Review Bot

Comment 92 2009-11-30 15:52:48 PST

style-queue ran check-webkit-style on attachment 44046 [details] without any errors.

Xan Lopez

Comment 93 2009-12-07 05:22:33 PST

Comment on attachment 44046 [details] Comment 57 fix with unit tests >@@ -197,10 +201,10 @@ static void run_get_text_tests(AtkText* text_obj) > > static void test_webkit_atk_get_text_at_offset_forms(void) > { >- WebKitWebView* webView; >- AtkObject* obj; >- GMainLoop* loop; >- AtkText* text_obj; >+ WebKitWebView *webView; >+ AtkObject *obj; >+ GMainLoop *loop; >+ AtkText *text_obj; Everything else looks fine, but doesn't this go against the style rules? How can the bot say it's ok? What am I missing?

Joanmarie Diggs

Comment 94 2009-12-07 05:24:09 PST

(In reply to comment #93) > (From update of attachment 44046 [details]) > >@@ -197,10 +201,10 @@ static void run_get_text_tests(AtkText* text_obj) > > > > static void test_webkit_atk_get_text_at_offset_forms(void) > > { > >- WebKitWebView* webView; > >- AtkObject* obj; > >- GMainLoop* loop; > >- AtkText* text_obj; > >+ WebKitWebView *webView; > >+ AtkObject *obj; > >+ GMainLoop *loop; > >+ AtkText *text_obj; > > Everything else looks fine, but doesn't this go against the style rules? How > can the bot say it's ok? What am I missing? C versus C++ ?

Xan Lopez

Comment 95 2009-12-07 06:12:38 PST

(In reply to comment #94) > > Everything else looks fine, but doesn't this go against the style rules? How > > can the bot say it's ok? What am I missing? > > C versus C++ ? I don't think it's that, the style rule applies to all C/C++ code. Anyway, this goes against the rules, so can you please update the patch to not do that and we'll get it committed? Thanks!

Joanmarie Diggs

Comment 96 2009-12-07 06:31:56 PST

(In reply to comment #95) > (In reply to comment #94) > > > Everything else looks fine, but doesn't this go against the style rules? How > > > can the bot say it's ok? What am I missing? > > > > C versus C++ ? > > I don't think it's that, the style rule applies to all C/C++ code. Anyway, this > goes against the rules, so can you please update the patch to not do that and > we'll get it committed? Thanks! Sure. But then perhaps the style rules should be changed? If you take the rules literally (perhaps the bot author did), they are different: 1. Constructors for C++ classes should initialize all of their members using C++ initializer syntax. Each member (and superclass) should be indented on a separate line, with the colon or comma preceding the member on that line. 2. Pointer types in non-C++ code — Pointer types should be written with a space between the type and the * (so the * is adjacent to the following identifier if any). 3. Pointer and reference types in C++ code — Both pointer types and reference types should be written with no space between the type name and the * or &.

Joanmarie Diggs

Comment 97 2009-12-07 06:33:40 PST

By the way, here is the original one which doesn't include the bot-inspired changes: (In reply to comment #89) > Created an attachment (id=43935) [details] > Comment 57 fix with unit tests > > This is essentially the same fix as I originally proposed for the issue. The > only different in the fix is adding a check to be sure we have wrapped text > before eliminating the final character on the line. > > This patch also adds two more unit tests.

Xan Lopez

Comment 98 2009-12-07 06:53:06 PST

(In reply to comment #96) > (In reply to comment #95) > > (In reply to comment #94) > > > > Everything else looks fine, but doesn't this go against the style rules? How > > > > can the bot say it's ok? What am I missing? > > > > > > C versus C++ ? > > > > I don't think it's that, the style rule applies to all C/C++ code. Anyway, this > > goes against the rules, so can you please update the patch to not do that and > > we'll get it committed? Thanks! > > Sure. But then perhaps the style rules should be changed? If you take the rules > literally (perhaps the bot author did), they are different: > > 1. Constructors for C++ classes should initialize all of their members using > C++ initializer syntax. Each member (and superclass) should be indented on a > separate line, with the colon or comma preceding the member on that line. > > 2. Pointer types in non-C++ code — Pointer types should be written with a space > between the type and the * (so the * is adjacent to the following identifier if > any). > > 3. Pointer and reference types in C++ code — Both pointer types and reference > types should be written with no space between the type name and the * or &. Holy crap. Neither I or the other WebKitGTK+ reviewers were aware of this change in the style guide (and I must say, I don't really get what it tries to achieve). Maybe it's meant to be applied to Obj-C or something? In any case applying that patch would only move some lines of the file towards the "new" style, leaving everything else as is. I don't that makes much sense, so I'll r+ the old patch.

Xan Lopez

Comment 99 2009-12-07 06:56:05 PST

Comment on attachment 43935 [details] Comment 57 fix with unit tests r=me

WebKit Commit Bot

Comment 100 2009-12-07 07:38:57 PST

Comment on attachment 43935 [details] Comment 57 fix with unit tests Clearing flags on attachment: 43935 Committed r51768: <http://trac.webkit.org/changeset/51768>

WebKit Commit Bot

Comment 101 2009-12-07 07:39:11 PST

All reviewed patches have been landed. Closing bug.

Attachments
gettextat.patch (25.59 KB, patch) 2009-05-12 08:57 PDT, Xan Lopez	no flags	Details Formatted Diff Diff
gettextv2.patch (22.44 KB, patch) 2009-05-15 01:22 PDT, Xan Lopez	no flags	Details Formatted Diff Diff
test case (59 bytes, text/html) 2009-05-15 20:19 PDT, Joanmarie Diggs	no flags	Details
gettextv3.patch (24.63 KB, patch) 2009-05-19 03:36 PDT, Xan Lopez	jmalonzo: review-	Details Formatted Diff Diff
gettextv4.patch (25.04 KB, patch) 2009-05-23 03:39 PDT, Xan Lopez	no flags	Details Formatted Diff Diff
gailtextutil.patch (17.85 KB, patch) 2009-06-08 07:08 PDT, Xan Lopez	no flags	Details Formatted Diff Diff
uselayout.patch (5.28 KB, patch) 2009-06-08 08:31 PDT, Xan Lopez	no flags	Details Formatted Diff Diff
unifytextaccess.patch (3.78 KB, patch) 2009-06-08 08:31 PDT, Xan Lopez	no flags	Details Formatted Diff Diff
lineboundary.patch (6.57 KB, patch) 2009-07-08 07:21 PDT, Xan Lopez	no flags	Details Formatted Diff Diff
Screenshot illustrating a problem (263.30 KB, image/png) 2009-07-08 14:07 PDT, Joanmarie Diggs	no flags	Details
leninbytes.patch (2.51 KB, patch) 2009-07-22 05:44 PDT, Xan Lopez	no flags	Details Formatted Diff Diff
textencoding.patch (5.43 KB, patch) 2009-07-22 05:44 PDT, Xan Lopez	no flags	Details Formatted Diff Diff
dontcachelayout.patch (2.10 KB, patch) 2009-07-27 13:26 PDT, Xan Lopez	no flags	Details Formatted Diff Diff
Fix for the issue in comment 59 (2.24 KB, patch) 2009-11-04 21:17 PST, Joanmarie Diggs	xan.lopez: review-	Details Formatted Diff Diff
Fix for the issue in comment 41, 55 (1.94 KB, patch) 2009-11-08 02:36 PST, Joanmarie Diggs	xan.lopez: review-	Details Formatted Diff Diff
What about....? (for the comment 41, 55 crasher) (1.95 KB, patch) 2009-11-08 15:07 PST, Joanmarie Diggs	no flags	Details Formatted Diff Diff
Use temporary pointer to advance the string (comment 41, 55 crasher) (2.51 KB, patch) 2009-11-08 16:27 PST, Joanmarie Diggs	no flags	Details Formatted Diff Diff
Fix for the issue in comment 57 (3.27 KB, patch) 2009-11-10 02:39 PST, Joanmarie Diggs	xan.lopez: review-	Details Formatted Diff Diff
Fix for the issue in comment 59 - with updated unit test and comment (3.75 KB, patch) 2009-11-23 13:02 PST, Joanmarie Diggs	no flags	Details Formatted Diff Diff
Fix for the issue in comment 59 - with new unit test and comment (5.43 KB, patch) 2009-11-24 10:03 PST, Joanmarie Diggs	no flags	Details Formatted Diff Diff
Comment 57 fix with unit tests (8.10 KB, patch) 2009-11-26 21:30 PST, Joanmarie Diggs	no flags	Details Formatted Diff Diff
Comment 57 fix with unit tests (9.32 KB, patch) 2009-11-30 15:49 PST, Joanmarie Diggs	no flags	Details Formatted Diff Diff
Show Obsolete (17) View All Add attachment proposed patch, testcase, etc.