Bug 20403 - [Gtk] Segfault after a table with an iframe is attempted to be added twice to DOM model with javascript.
: [Gtk] Segfault after a table with an iframe is attempted to be added twice to...
Status: RESOLVED FIXED
: WebKit
New Bugs
: 528+ (Nightly build)
: PC Linux
: P1 Major
Assigned To:
:
: Gtk, NeedsReduction
: 18237 20935
:
  Show dependency treegraph
 
Reported: 2008-08-15 12:15 PST by
Modified: 2008-10-18 00:06 PST (History)


Attachments
reduced (from 1.2mb to 0.5mb!!) stand-alone example (284.31 KB, application/x-gzip)
2008-08-16 06:34 PST, Luke Kenneth Casson Leighton
no flags Details
Potential fix (2.15 KB, patch)
2008-09-13 07:37 PST, Alp Toker
no flags Review Patch | Details | Formatted Diff | Diff
move g_object_unref() and m_frame = 0 into frameLoaderDestroyed() (723 bytes, patch)
2008-09-19 09:39 PST, Luke Kenneth Casson Leighton
darin: review-
Review Patch | Details | Formatted Diff | Diff
added changelog and removed notImplemented from detachFromParent4 (1.38 KB, patch)
2008-10-17 19:47 PST, Jan Alonzo
mrowe: review+
Review Patch | Details | Formatted Diff | Diff


Note

You need to log in before you can comment on or make changes to this bug.


Description From 2008-08-15 12:15:20 PST
whilst i realise that this isn't the bug reporting location for pywebkitgtk, the bug, which is a segfault entirely without warning, is much more likely to be in libwebkit than it is in pywebkitgtk, so i am reporting it here as well as at http://code.google.com/p/pywebkitgtk/issues/detail?id=11

you will need to attempt to log in to the above site, which is a pure AJAX web site with 1.2mb of javascript.  i was stunned, amazed and delighted that the demo browser in pywebkitgtk worked on my _smaller_ AJAX-only web site, http://lkcl.net - and quickly moved on to the bigger site, only to be amused at the segfault.

if you use the two test accounts - test and test2 - with passwords of ffffff and tttttt (it could be the other way round) please bear in mind that the site is running LIVE.  also bear in mind that people cannot log in twice with the same account.

i downloaded the source from svn as of yesterday.  modify pywebkitgtk demobrowser.py in the demo/ directory to instead of starting off the page at gnome somewhere to put http://partyliveonline.com in, instead.
------- Comment #1 From 2008-08-16 04:06:34 PST -------
tested against Programs/GtkBrowser and it exhibits exactly the same segfault.
so it's nothing to do with pywebkitgtk at all.
------- Comment #2 From 2008-08-16 05:16:47 PST -------
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x2aed784af040 (LWP 1524)]
0x00002aed72eb0520 in WebKit::getViewFromFrame ()
   from /var/src/WebKit/.libs/libwebkit-1.0.so.1
(gdb) where
#0  0x00002aed72eb0520 in WebKit::getViewFromFrame ()
   from /var/src/WebKit/.libs/libwebkit-1.0.so.1
#1  0x00002aed72ead89a in WebKit::FrameLoaderClient::postProgressEstimateChangedNotification () from /var/src/WebKit/.libs/libwebkit-1.0.so.1
#2  0x00002aed730b44c8 in WebCore::ProgressTracker::incrementProgress ()
   from /var/src/WebKit/.libs/libwebkit-1.0.so.1
#3  0x00002aed7309200d in WebCore::FrameLoader::didReceiveData ()
   from /var/src/WebKit/.libs/libwebkit-1.0.so.1
#4  0x00002aed730b557e in WebCore::ResourceLoader::didReceiveData ()
   from /var/src/WebKit/.libs/libwebkit-1.0.so.1
#5  0x00002aed730afc96 in WebCore::MainResourceLoader::didReceiveData ()
   from /var/src/WebKit/.libs/libwebkit-1.0.so.1
#6  0x00002aed73219187 in WebCore::writeCallback ()
   from /var/src/WebKit/.libs/libwebkit-1.0.so.1
#7  0x00002aed74b42192 in ?? () from /usr/lib/libcurl-gnutls.so.4
#8  0x00002aed74b5c598 in ?? () from /usr/lib/libcurl-gnutls.so.4
#9  0x00002aed74b5c872 in ?? () from /usr/lib/libcurl-gnutls.so.4
#10 0x00002aed74b553fc in ?? () from /usr/lib/libcurl-gnutls.so.4
#11 0x00002aed74b5a78c in ?? () from /usr/lib/libcurl-gnutls.so.4
#12 0x00002aed74b5b05b in curl_multi_perform ()
   from /usr/lib/libcurl-gnutls.so.4
#13 0x00002aed7321a840 in WebCore::ResourceHandleManager::downloadTimerCallback
    () from /var/src/WebKit/.libs/libwebkit-1.0.so.1
#14 0x00002aed73130d73 in WebCore::TimerBase::fireTimers ()
   from /var/src/WebKit/.libs/libwebkit-1.0.so.1
#15 0x00002aed73130e2b in WebCore::TimerBase::sharedTimerFired ()
   from /var/src/WebKit/.libs/libwebkit-1.0.so.1
#16 0x00002aed73337242 in WebCore::timeout_cb ()
   from /var/src/WebKit/.libs/libwebkit-1.0.so.1
#17 0x000000311fe36f92 in g_main_context_dispatch ()
   from /usr/lib/libglib-2.0.so.0
#18 0x000000311fe3a236 in ?? () from /usr/lib/libglib-2.0.so.0
#19 0x000000311fe3a4f7 in g_main_loop_run () from /usr/lib/libglib-2.0.so.0
#20 0x00002aed73b43587 in gtk_main () from /usr/lib/libgtk-x11-2.0.so.0
#21 0x0000000000401eab in main ()
(gdb) 

(gdb) disas 'WebKit::getViewFromFrame'
Dump of assembler code for function _ZN6WebKit16getViewFromFrameEP15_WebKitWebFrame:
0x00002b08c740c520 <_ZN6WebKit16getViewFromFrameEP15_WebKitWebFrame+0>: mov    0x18(%rdi),%rax
0x00002b08c740c524 <_ZN6WebKit16getViewFromFrameEP15_WebKitWebFrame+4>: mov    0x10(%rax),%rax
0x00002b08c740c528 <_ZN6WebKit16getViewFromFrameEP15_WebKitWebFrame+8>: retq   
End of assembler dump.
(gdb) print $rip
$1 = (void (*)()) 0x2b08c740c520 <WebKit::getViewFromFrame(_WebKitWebFrame*)>
(gdb) 
------- Comment #3 From 2008-08-16 05:19:46 PST -------
void FrameLoaderClient::postProgressEstimateChangedNotification()
{
    WebKitWebView* webView = getViewFromFrame(m_frame);

WebKitWebView* getViewFromFrame(WebKitWebFrame* frame)
{
    WebKitWebFramePrivate* priv = frame->priv;
    return priv->webView;


The disassembly indicates that inside FrameLoaderClient::postProgressEstimateChangedNotification, m_frame is 0.
------- Comment #4 From 2008-08-16 06:34:22 PST -------
Created an attachment (id=22831) [details]
reduced (from 1.2mb to 0.5mb!!) stand-alone example
------- Comment #5 From 2008-08-16 06:35:09 PST -------
must be run like this:
 /var/src/WebKit/Programs/GtkLauncher /var/src/output/index2.html 
if you run it as:
 cd /var/src
 /var/src/WebKit/Programs/GtkLauncher output/index2.html 
you don't get a crash (don't know why)

don't look at me and say "that ain't reduced!!!!" - this is the app
that generated it:

from ui import RootPanel, VerticalPanel
from Timer import Timer
from ui import Frame

class index2:

    def onModuleLoad(self):

        self.panel=VerticalPanel()
        RootPanel().add(self.panel)
        Timer(1000, self)

    def onTimer(self, t):

        ad_frame = Frame("./side_ad.html")
        self.panel.add(ad_frame)
        RootPanel().add(self.panel)

i can't help it if those few lines of code result in 500k of javascript!!!
------- Comment #6 From 2008-08-16 06:49:02 PST -------
as you can see from the example python code (which is compiled to javascript
with the pyjamas compiler), what i'm accidentally doing is adding a widget which contains an iframe, and then i'm adding the *same* widget (by mistake) to the root DOM model, twice. it's a mistake on my part, to add the widget twice to the DOM tree but, it causes webkit to segfault.
------- Comment #7 From 2008-08-16 06:51:40 PST -------
the use of the Timer, btw, is essential to repro the segfault.
------- Comment #8 From 2008-09-02 09:04:31 PST -------
another one.  same bug being encountered, different way.  arg.

(gdb) disas 'WebKit::getViewFromFrame'
Dump of assembler code for function _ZN6WebKit16getViewFromFrameEP15_WebKitWebFrame:
0x00002b4b126616f0 <_ZN6WebKit16getViewFromFrameEP15_WebKitWebFrame+0>: mov    0x18(%rdi),%rax
0x00002b4b126616f4 <_ZN6WebKit16getViewFromFrameEP15_WebKitWebFrame+4>: mov    0x10(%rax),%rax
0x00002b4b126616f8 <_ZN6WebKit16getViewFromFrameEP15_WebKitWebFrame+8>: retq   
End of assembler dump.
(gdb) 
------- Comment #9 From 2008-09-02 09:19:42 PST -------
euuw. yuk.  but it works.

void FrameLoaderClient::postProgressEstimateChangedNotification()
{
    if (!m_frame) /* bug#20403 - hack! shouldn't be here!  we've been deleted! */
        return;
------- Comment #10 From 2008-09-02 09:22:03 PST -------
might be related to #18237?
------- Comment #11 From 2008-09-03 02:51:29 PST -------
i think i know what's going on, here.

i believe that the issue could be related to a Document (the one in the frame) having been forcibly deleted.  i noticed that the JS bindings have a concept of "mark()ing".

when a frame is removed from the window, it contains a Document which gets forcibly deleted.  of course, that doesn't result in any of the bindings getting deleted.

i am _guessing_ that this is why the concept of "mark()" was added to JS bindings.

the alternative is to have a "delete" callback into the binding.  when an object has to be forcibly deleted, you must notify each of the bindings that the object is now "gone".  it is up to the bindings to then do a "delete" of whatever kind it needs.
------- Comment #12 From 2008-09-13 07:37:28 PST -------
Created an attachment (id=23392) [details]
Potential fix

Can you try with this patch and let me know if it helps? I hacked it up a while ago but it fell by the wayside before getting landed.
------- Comment #13 From 2008-09-18 10:52:03 PST -------
alp - sorry.  what it does - if you try it on http://partyliveonline.com - is stop the loading process at about 24%.  i hit "refresh" several times, waited for about 20 seconds, and no progress was shown.
i unpatched and recompiled, immediately it worked.  proceeded to 40%, 58% and loaded the page.  takes a lot longer than e.g. on firefox to load that 1.2mb of javascript, but it gets there...
... but not with this patch.
------- Comment #14 From 2008-09-18 10:58:46 PST -------
remember - the order of events is as follows:

* view page
* initial frame gets loaded
* timer goes off (creating new thread context)
* timer-inited-thread removes old frame, resulting in destruction of document, including a "Frame"
* destruction of document results in webkit objects being deleted.  regardless of current refcount, refcounts are forced to zero.
* WebKitWebFrame contains an object which has been forcibly deleted
* main thread still has "page loading" signals outstanding
* main thread tries to notify user of "page loading" on a WebKitWebFrame where the Frame has been forcibly destroyed, outside of the control of the main thread (by timer-initiated-thread).
* main thread tries to de-reference NULL pointer.  segfault.

the fix _really is_ to check that m_frame != NULL.  much better would be to copy the style of the JS Bindings, which already have this concept of "mark()ing".

the issue's been solved for javascript bindings, and the code required to fix this issue is there, tried, tested and available for easy adoption and use, to solve the problem.
------- Comment #15 From 2008-09-19 09:39:27 PST -------
Created an attachment (id=23568) [details]
move g_object_unref() and m_frame = 0 into frameLoaderDestroyed()

no more crashes.
the clue was the two occurrences where m_frame is set to 0.
that, and looking at the qt and wx frame loaders, they do
exactly what this patch does - get rid of the m_frame in
frameLoaderDestroyed() and do nothing - at all - in any of
the detachedFromParentN() functions.
------- Comment #16 From 2008-09-21 00:51:21 PST -------
(In reply to comment #14)
> the fix _really is_ to check that m_frame != NULL.  much better would be to
> copy the style of the JS Bindings, which already have this concept of
> "mark()ing".

The JavaScript bindings have a concept of marking because JavaScriptCore uses a mark/sweep garbage collector.  The collector needs to know which objects are still reachable, so it asks the root set of objects to mark themselves.  Each object marks itself, then asks each object it has a reference to to mark itself, which results in recursively marking the entire live JS object graph. WebCore and WebKit are not garbage collected in this respect, they instead use the simpler approach of reference counting.  Introducing a "mark" method for refcounted objects doesn't make a lot of sense.
------- Comment #17 From 2008-09-21 12:22:19 PST -------
(From update of attachment 23568 [details])
Yes, this is exactly the right fix!

+    notImplemented();

There's no reason to add that call. It's fine for detachedFromParent4() to be empty and no need to call notImplemented().

The change looks fine, but there's no ChangeLog nor test case, so I'm going to say review-. Please submit a new patch with ChangeLog. If this was a platform where layout tests were working well, I'd insist on a test case too, but I don't think GTK is working well with regression tests yet, so I guess we're OK without one.
------- Comment #18 From 2008-09-21 13:08:42 PST -------
darin, hi, sorry about no changelog.  will write an entry.
notImplemented() was copied verbatim from the other ... uhhh... oh :)

void FrameLoaderClientQt::detachedFromParent4()
{
}   

oh - this is why:

void WebFrame::detachedFromParent4()
{
    notImplemented();
}   

and:

void FrameLoaderClientWx::detachedFromParent4()
{   
    notImplemented();
}

so... qt doesn't call notImplemented(), but wx and win do.
------- Comment #19 From 2008-09-25 11:57:45 PST -------
darin: many apologies - my time is now limited.  someone else will need to finalise the work on this.
------- Comment #20 From 2008-10-17 19:47:09 PST -------
Created an attachment (id=24476) [details]
added changelog and removed notImplemented from detachFromParent4

Added ChangeLog and removed notImplemented from detachFromParent4()
------- Comment #21 From 2008-10-17 19:51:52 PST -------
(From update of attachment 24476 [details])
It would be great to get a test case landed if it's possible, in order to ensure that this bug is not re-introduced somewhere down the line (or in other ports).

Other than that, r=me.
------- Comment #22 From 2008-10-18 00:06:45 PST -------
Landed in r37676. Tests are in https://bugs.webkit.org/show_bug.cgi?id=21726