Bug 47365 - getTextAtOffset returns incorrect results if a link includes text and an image
Summary: getTextAtOffset returns incorrect results if a link includes text and an image
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Accessibility (show other bugs)
Version: 528+ (Nightly build)
Hardware: PC Linux
: P2 Normal
Assignee: Nobody
URL:
Keywords: Gtk
Depends on:
Blocks: 25531
  Show dependency treegraph
 
Reported: 2010-10-07 11:07 PDT by Joanmarie Diggs (irc: joanie)
Modified: 2010-10-27 14:03 PDT (History)
5 users (show)

See Also:


Attachments
test case (304 bytes, text/html)
2010-10-07 11:07 PDT, Joanmarie Diggs (irc: joanie)
no flags Details
Patch proposal (WIP) (7.17 KB, patch)
2010-10-08 05:45 PDT, Mario Sanchez Prada
no flags Details | Formatted Diff | Diff
Patch proposal + unit test (12.19 KB, patch)
2010-10-20 13:37 PDT, Mario Sanchez Prada
mrobinson: review-
Details | Formatted Diff | Diff
Patch proposal + unit test (12.33 KB, patch)
2010-10-21 02:43 PDT, Mario Sanchez Prada
no flags Details | Formatted Diff | Diff
Patch proposal + unit test (12.35 KB, patch)
2010-10-26 02:07 PDT, Mario Sanchez Prada
mrobinson: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Joanmarie Diggs (irc: joanie) 2010-10-07 11:07:38 PDT
Created attachment 70120 [details]
test case

Steps to reproduce:

1. View the attached test case in Epiphany.

2. Using Accerciser, obtain the text of each line using pyatspi.TEXT_BOUNDARY_LINE_START.

Expected results: The full line of text for each line would be returned.

Actual results: The full line of text is not returned when there is an image inside of the link.

~~~~~~

Copy and paste from Accerciser's iPython console, where tn is the AtkText interface for the nth AtkObject of ATK_ROLE_PARAGRAPH:

In [1]: t1 = acc.queryText()
In [2]: t2 = acc.queryText()
In [3]: t3 = acc.queryText()
In [4]: t4 = acc.queryText()
In [5]: t1.getTextAtOffset(0, TEXT_BOUNDARY_LINE_START)
Out[5]: ('One two three', 0, 13)
In [6]: t2.getTextAtOffset(0, TEXT_BOUNDARY_LINE_START)
Out[6]: ('One two thr', 0, 11)
In [7]: t3.getTextAtOffset(0, TEXT_BOUNDARY_LINE_START)
Out[7]: ('One two thr', 0, 11)
In [8]: t4.getTextAtOffset(0, TEXT_BOUNDARY_LINE_START)
Out[8]: ('One two ', 0, 8)
Comment 1 Mario Sanchez Prada 2010-10-08 05:45:05 PDT
Created attachment 70231 [details]
Patch proposal (WIP)

Ataching a WorkInProgress patch that would fix this issue, just for sharing, as currently my devel environment is kind of broken so unit tests won't run...

As soon as I get it everything properly working back again I'll write an unit test for this stuff and upload a final patch to be reviewed.
Comment 2 Mario Sanchez Prada 2010-10-20 13:37:03 PDT
Created attachment 71330 [details]
Patch proposal + unit test

(In reply to comment #1)
> [...]
> As soon as I get it everything properly working back again I'll write an unit test for this stuff and upload a final patch to be reviewed.

Back to normality, so attaching a patch proposal with the unit tests I couldn't implement before.

Hance, asking for review.
Comment 3 Martin Robinson 2010-10-21 00:44:50 PDT
Comment on attachment 71330 [details]
Patch proposal + unit test

View in context: https://bugs.webkit.org/attachment.cgi?id=71330&action=review

Looks good, but I have a few concerns.

> WebCore/accessibility/gtk/AccessibilityObjectWrapperAtk.cpp:866
> +    GString* str = g_string_new(0);

Please don't use an abbreviation here.

> WebCore/accessibility/gtk/AccessibilityObjectWrapperAtk.cpp:873
> +    for (RenderObject* obj = renderer->firstChild(); obj; obj = obj->nextSibling()) {
> +        if (obj->isBR()) {

Again, we've been trying to avoid abbreviations in new code.

> WebCore/accessibility/gtk/AccessibilityObjectWrapperAtk.cpp:936
> +        g_string_append(str, textForRenderer(accObject->renderer()));

Isn't this a memory leak? textForRenderer returns a newly allocated string.

> WebKit/gtk/tests/testatk.c:975
> +    webkit_web_view_load_string(webView, linksWithInlineImages, NULL, NULL, NULL);

We should follow WebKit style in the tests as much as possible, thus NULL => 0.

> WebKit/gtk/tests/testatk.c:980
> +    g_idle_add((GSourceFunc)bail_out, loop);
> +    g_main_loop_run(loop);
> +

Didn't we discover that this approach was unnecessary in a previous patch or does something about this require it?
Comment 4 Mario Sanchez Prada 2010-10-21 02:43:32 PDT
Created attachment 71412 [details]
Patch proposal + unit test

(In reply to comment #3)
> (From update of attachment 71330 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=71330&action=review
> 
> Looks good, but I have a few concerns.

Thanks. Attaching a new patch trying to address all those issues.

> > WebCore/accessibility/gtk/AccessibilityObjectWrapperAtk.cpp:866
> > +    GString* str = g_string_new(0);
> 
> Please don't use an abbreviation here.

Sorry. Fixed using 'resultText'

> > WebCore/accessibility/gtk/AccessibilityObjectWrapperAtk.cpp:873
> > +    for (RenderObject* obj = renderer->firstChild(); obj; obj = obj->nextSibling()) {
> > +        if (obj->isBR()) {
> 
> Again, we've been trying to avoid abbreviations in new code.

Fixed. I used 'object' instead :P

> > WebCore/accessibility/gtk/AccessibilityObjectWrapperAtk.cpp:936
> > +        g_string_append(str, textForRenderer(accObject->renderer()));
> 
> Isn't this a memory leak? textForRenderer returns a newly allocated string.

Good catch. It actually IS a memory leak. Fixed by using GOwnPtr in there.

> > WebKit/gtk/tests/testatk.c:975
> > +    webkit_web_view_load_string(webView, linksWithInlineImages, NULL, NULL, NULL);
> 
> We should follow WebKit style in the tests as much as possible, thus NULL => 0.

Fixed.

> > WebKit/gtk/tests/testatk.c:980
> > +    g_idle_add((GSourceFunc)bail_out, loop);
> > +    g_main_loop_run(loop);
> > +
> 
> Didn't we discover that this approach was unnecessary in a previous patch or does something about this require it?

Actually, what happened back then (in testWebkitAtkTextChangedNotifications()) was that I needed to connect signals from accessible object before entering the main loop, but I couldn't do it that way since the accessible object wouldn't be created if the main loop was not started, so it was kind of an egg-chicken problem.

After some digging and numerous experiments manually stopping and restarting the main loop, we (a lot of help from your side :-)) finally realized we could just remove this approach for that test and just manually spin the original main loop to make sure the accessible objects were created before connecting to their signals, with no need to bother about playing with stopping/restarting  the main loop, resulting in the following code:

   [...]
    webkit_web_view_load_string(webView, formWithTextInputs, 0, 0, 0);

    // Manually spin the main context to get the accessible objects
    while (g_main_context_pending(0))
        g_main_context_iteration(0, TRUE);

    AtkObject* obj = gtk_widget_get_accessible(GTK_WIDGET(webView));
    g_assert(obj);

    AtkObject* form = atk_object_ref_accessible_child(obj, 0);
    g_assert(ATK_IS_OBJECT(form));

    AtkObject* textEntry = atk_object_ref_accessible_child(form, 0);
    g_assert(ATK_IS_EDITABLE_TEXT(textEntry));
    g_assert(atk_object_get_role(ATK_OBJECT(textEntry)) == ATK_ROLE_ENTRY);

    g_signal_connect(textEntry, "text-changed::insert",
                     G_CALLBACK(textChangedCb),
                     (gpointer)"insert");
    g_signal_connect(textEntry, "text-changed::delete",
                     G_CALLBACK(textChangedCb),
                     (gpointer)"delete");
   [...]


However, this case is different as we don't know to connect to any signals, so the original chicken-egg problem (which led me to experiment with stopping-restarting the mainloop) is not present in this case. However, we'd still need to manually spin the original main loop to get valid accessible objects, in case we want to get rid of that g_main_loop_run() / bail_out() stuff, which would result in the following patch on top of the current proposal:

   --- a/WebKit/gtk/tests/testatk.c
   +++ b/WebKit/gtk/tests/testatk.c
   @@ -972,11 +972,11 @@ static void testWebkitAtkLinksWithInlineImages(void)
        g_object_ref_sink(webView);
        GtkAllocation alloc = { 0, 0, 800, 600 };
        gtk_widget_size_allocate(GTK_WIDGET(webView), &alloc);
        webkit_web_view_load_string(webView, linksWithInlineImages, 0, 0, 0);
   -    GMainLoop* loop = g_main_loop_new(NULL, TRUE);
 
   -    g_idle_add((GSourceFunc)bail_out, loop);
   -    g_main_loop_run(loop);
   +
   +   // Manually spin the main context to get the accessible objects
   +    while (g_main_context_pending(0))
   +        g_main_context_iteration(0, TRUE);
 
     AtkObject* obj = gtk_widget_get_accessible(GTK_WIDGET(webView));
     g_assert(obj);


... and I'm not sure whether that's more clear than just using the other approach, which is used elsewhere in this file, instead of putting that weird while loop in this test as well (currently that other approach is only being used in testWebkitAtkTextChangedNotifications()).

Hence, I'm not making that change yet in this new patch, just in case you agree with me it's ok to leave it as it is, because of the rationale above :-)

Thanks!
Comment 5 Martin Robinson 2010-10-25 11:17:18 PDT
Comment on attachment 71412 [details]
Patch proposal + unit test

View in context: https://bugs.webkit.org/attachment.cgi?id=71412&action=review

Looks good, but I'm waiting for the reuslts of your experiments to give the final r+. We should switch to iterating the main main loop, if possible.

> WebCore/accessibility/gtk/AccessibilityObjectWrapperAtk.cpp:937
> +        GOwnPtr<gchar> rendererText(textForRenderer(accObject->renderer()));
> +        g_string_append(str, rendererText.get());

Thanks for fixing this. In the future it might make sense for these methods to just return CStrings. Perhaps a cleanup in some other patch.

> WebKit/gtk/tests/testatk.c:979
> +    g_idle_add((GSourceFunc)bail_out, loop);
> +    g_main_loop_run(loop);

I think we should also try to iterate the main main loop here as well. I think you're exploring that presently.
Comment 6 Mario Sanchez Prada 2010-10-26 02:07:03 PDT
Created attachment 71852 [details]
Patch proposal + unit test

Attaching a new patch:

(In reply to comment #5)
> [...]
> > WebKit/gtk/tests/testatk.c:979
> > +    g_idle_add((GSourceFunc)bail_out, loop);
> > +    g_main_loop_run(loop);
> 
> I think we should also try to iterate the main main loop here as well. I think you're exploring that presently.

Fixed (by manually spinning the loop, which is still needed  to get the accessible objects)

Asking for review again
Comment 7 Martin Robinson 2010-10-27 00:57:47 PDT
Comment on attachment 71852 [details]
Patch proposal + unit test

View in context: https://bugs.webkit.org/attachment.cgi?id=71852&action=review

Great! Thanks. Perhaps make this small change before landing:

> WebKit/gtk/tests/testatk.c:985
> +    AtkObject* obj = gtk_widget_get_accessible(GTK_WIDGET(webView));
> +    g_assert(obj);
> +
> +    // First paragraph (link at the beginning)
> +    AtkObject* paragraph = atk_object_ref_accessible_child(obj, 0);

Please rename obj to something like "object" to avoid the abbreviation.
Comment 8 Mario Sanchez Prada 2010-10-27 04:24:19 PDT
Landed, http://trac.webkit.org/changeset/70634
Comment 9 WebKit Review Bot 2010-10-27 14:03:00 PDT
http://trac.webkit.org/changeset/70634 might have broken GTK Linux 32-bit Release