Bug 20403 - [Gtk] Segfault after a table with an iframe is attempted to be added twice to DOM model with javascript.
Summary: [Gtk] Segfault after a table with an iframe is attempted to be added twice to...
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: 528+ (Nightly build)
Hardware: PC Linux
: P1 Major
Assignee: Nobody
Keywords: Gtk, NeedsReduction
Depends on: 18237 20935
  Show dependency treegraph
Reported: 2008-08-15 12:15 PDT by Luke Kenneth Casson Leighton
Modified: 2008-10-18 00:06 PDT (History)
2 users (show)

See Also:

reduced (from 1.2mb to 0.5mb!!) stand-alone example (284.31 KB, application/x-gzip)
2008-08-16 06:34 PDT, Luke Kenneth Casson Leighton
no flags Details
Potential fix (2.15 KB, patch)
2008-09-13 07:37 PDT, Alp Toker
no flags Details | Formatted Diff | Diff
move g_object_unref() and m_frame = 0 into frameLoaderDestroyed() (723 bytes, patch)
2008-09-19 09:39 PDT, Luke Kenneth Casson Leighton
darin: review-
Details | Formatted Diff | Diff
added changelog and removed notImplemented from detachFromParent4 (1.38 KB, patch)
2008-10-17 19:47 PDT, Jan Alonzo
mrowe: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Luke Kenneth Casson Leighton 2008-08-15 12:15:20 PDT
whilst i realise that this isn't the bug reporting location for pywebkitgtk, the bug, which is a segfault entirely without warning, is much more likely to be in libwebkit than it is in pywebkitgtk, so i am reporting it here as well as at http://code.google.com/p/pywebkitgtk/issues/detail?id=11

you will need to attempt to log in to the above site, which is a pure AJAX web site with 1.2mb of javascript.  i was stunned, amazed and delighted that the demo browser in pywebkitgtk worked on my _smaller_ AJAX-only web site, http://lkcl.net - and quickly moved on to the bigger site, only to be amused at the segfault.

if you use the two test accounts - test and test2 - with passwords of ffffff and tttttt (it could be the other way round) please bear in mind that the site is running LIVE.  also bear in mind that people cannot log in twice with the same account.

i downloaded the source from svn as of yesterday.  modify pywebkitgtk demobrowser.py in the demo/ directory to instead of starting off the page at gnome somewhere to put http://partyliveonline.com in, instead.
Comment 1 Luke Kenneth Casson Leighton 2008-08-16 04:06:34 PDT
tested against Programs/GtkBrowser and it exhibits exactly the same segfault.
so it's nothing to do with pywebkitgtk at all.
Comment 2 Luke Kenneth Casson Leighton 2008-08-16 05:16:47 PDT
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x2aed784af040 (LWP 1524)]
0x00002aed72eb0520 in WebKit::getViewFromFrame ()
   from /var/src/WebKit/.libs/libwebkit-1.0.so.1
(gdb) where
#0  0x00002aed72eb0520 in WebKit::getViewFromFrame ()
   from /var/src/WebKit/.libs/libwebkit-1.0.so.1
#1  0x00002aed72ead89a in WebKit::FrameLoaderClient::postProgressEstimateChangedNotification () from /var/src/WebKit/.libs/libwebkit-1.0.so.1
#2  0x00002aed730b44c8 in WebCore::ProgressTracker::incrementProgress ()
   from /var/src/WebKit/.libs/libwebkit-1.0.so.1
#3  0x00002aed7309200d in WebCore::FrameLoader::didReceiveData ()
   from /var/src/WebKit/.libs/libwebkit-1.0.so.1
#4  0x00002aed730b557e in WebCore::ResourceLoader::didReceiveData ()
   from /var/src/WebKit/.libs/libwebkit-1.0.so.1
#5  0x00002aed730afc96 in WebCore::MainResourceLoader::didReceiveData ()
   from /var/src/WebKit/.libs/libwebkit-1.0.so.1
#6  0x00002aed73219187 in WebCore::writeCallback ()
   from /var/src/WebKit/.libs/libwebkit-1.0.so.1
#7  0x00002aed74b42192 in ?? () from /usr/lib/libcurl-gnutls.so.4
#8  0x00002aed74b5c598 in ?? () from /usr/lib/libcurl-gnutls.so.4
#9  0x00002aed74b5c872 in ?? () from /usr/lib/libcurl-gnutls.so.4
#10 0x00002aed74b553fc in ?? () from /usr/lib/libcurl-gnutls.so.4
#11 0x00002aed74b5a78c in ?? () from /usr/lib/libcurl-gnutls.so.4
#12 0x00002aed74b5b05b in curl_multi_perform ()
   from /usr/lib/libcurl-gnutls.so.4
#13 0x00002aed7321a840 in WebCore::ResourceHandleManager::downloadTimerCallback
    () from /var/src/WebKit/.libs/libwebkit-1.0.so.1
#14 0x00002aed73130d73 in WebCore::TimerBase::fireTimers ()
   from /var/src/WebKit/.libs/libwebkit-1.0.so.1
#15 0x00002aed73130e2b in WebCore::TimerBase::sharedTimerFired ()
   from /var/src/WebKit/.libs/libwebkit-1.0.so.1
#16 0x00002aed73337242 in WebCore::timeout_cb ()
   from /var/src/WebKit/.libs/libwebkit-1.0.so.1
#17 0x000000311fe36f92 in g_main_context_dispatch ()
   from /usr/lib/libglib-2.0.so.0
#18 0x000000311fe3a236 in ?? () from /usr/lib/libglib-2.0.so.0
#19 0x000000311fe3a4f7 in g_main_loop_run () from /usr/lib/libglib-2.0.so.0
#20 0x00002aed73b43587 in gtk_main () from /usr/lib/libgtk-x11-2.0.so.0
#21 0x0000000000401eab in main ()

(gdb) disas 'WebKit::getViewFromFrame'
Dump of assembler code for function _ZN6WebKit16getViewFromFrameEP15_WebKitWebFrame:
0x00002b08c740c520 <_ZN6WebKit16getViewFromFrameEP15_WebKitWebFrame+0>: mov    0x18(%rdi),%rax
0x00002b08c740c524 <_ZN6WebKit16getViewFromFrameEP15_WebKitWebFrame+4>: mov    0x10(%rax),%rax
0x00002b08c740c528 <_ZN6WebKit16getViewFromFrameEP15_WebKitWebFrame+8>: retq   
End of assembler dump.
(gdb) print $rip
$1 = (void (*)()) 0x2b08c740c520 <WebKit::getViewFromFrame(_WebKitWebFrame*)>
Comment 3 Mark Rowe (bdash) 2008-08-16 05:19:46 PDT
void FrameLoaderClient::postProgressEstimateChangedNotification()
    WebKitWebView* webView = getViewFromFrame(m_frame);

WebKitWebView* getViewFromFrame(WebKitWebFrame* frame)
    WebKitWebFramePrivate* priv = frame->priv;
    return priv->webView;

The disassembly indicates that inside FrameLoaderClient::postProgressEstimateChangedNotification, m_frame is 0.
Comment 4 Luke Kenneth Casson Leighton 2008-08-16 06:34:22 PDT
Created attachment 22831 [details]
reduced (from 1.2mb to 0.5mb!!) stand-alone example
Comment 5 Luke Kenneth Casson Leighton 2008-08-16 06:35:09 PDT
must be run like this:
 /var/src/WebKit/Programs/GtkLauncher /var/src/output/index2.html 
if you run it as:
 cd /var/src
 /var/src/WebKit/Programs/GtkLauncher output/index2.html 
you don't get a crash (don't know why)

don't look at me and say "that ain't reduced!!!!" - this is the app
that generated it:

from ui import RootPanel, VerticalPanel
from Timer import Timer
from ui import Frame

class index2:

    def onModuleLoad(self):

        Timer(1000, self)

    def onTimer(self, t):

        ad_frame = Frame("./side_ad.html")

i can't help it if those few lines of code result in 500k of javascript!!!
Comment 6 Luke Kenneth Casson Leighton 2008-08-16 06:49:02 PDT
as you can see from the example python code (which is compiled to javascript
with the pyjamas compiler), what i'm accidentally doing is adding a widget which contains an iframe, and then i'm adding the *same* widget (by mistake) to the root DOM model, twice. it's a mistake on my part, to add the widget twice to the DOM tree but, it causes webkit to segfault.
Comment 7 Luke Kenneth Casson Leighton 2008-08-16 06:51:40 PDT
the use of the Timer, btw, is essential to repro the segfault.
Comment 8 Luke Kenneth Casson Leighton 2008-09-02 09:04:31 PDT
another one.  same bug being encountered, different way.  arg.

(gdb) disas 'WebKit::getViewFromFrame'
Dump of assembler code for function _ZN6WebKit16getViewFromFrameEP15_WebKitWebFrame:
0x00002b4b126616f0 <_ZN6WebKit16getViewFromFrameEP15_WebKitWebFrame+0>: mov    0x18(%rdi),%rax
0x00002b4b126616f4 <_ZN6WebKit16getViewFromFrameEP15_WebKitWebFrame+4>: mov    0x10(%rax),%rax
0x00002b4b126616f8 <_ZN6WebKit16getViewFromFrameEP15_WebKitWebFrame+8>: retq   
End of assembler dump.
Comment 9 Luke Kenneth Casson Leighton 2008-09-02 09:19:42 PDT
euuw. yuk.  but it works.

void FrameLoaderClient::postProgressEstimateChangedNotification()
    if (!m_frame) /* bug#20403 - hack! shouldn't be here!  we've been deleted! */
Comment 10 Luke Kenneth Casson Leighton 2008-09-02 09:22:03 PDT
might be related to #18237?
Comment 11 Luke Kenneth Casson Leighton 2008-09-03 02:51:29 PDT
i think i know what's going on, here.

i believe that the issue could be related to a Document (the one in the frame) having been forcibly deleted.  i noticed that the JS bindings have a concept of "mark()ing".

when a frame is removed from the window, it contains a Document which gets forcibly deleted.  of course, that doesn't result in any of the bindings getting deleted.

i am _guessing_ that this is why the concept of "mark()" was added to JS bindings.

the alternative is to have a "delete" callback into the binding.  when an object has to be forcibly deleted, you must notify each of the bindings that the object is now "gone".  it is up to the bindings to then do a "delete" of whatever kind it needs.

Comment 12 Alp Toker 2008-09-13 07:37:28 PDT
Created attachment 23392 [details]
Potential fix

Can you try with this patch and let me know if it helps? I hacked it up a while ago but it fell by the wayside before getting landed.
Comment 13 Luke Kenneth Casson Leighton 2008-09-18 10:52:03 PDT
alp - sorry.  what it does - if you try it on http://partyliveonline.com - is stop the loading process at about 24%.  i hit "refresh" several times, waited for about 20 seconds, and no progress was shown.
i unpatched and recompiled, immediately it worked.  proceeded to 40%, 58% and loaded the page.  takes a lot longer than e.g. on firefox to load that 1.2mb of javascript, but it gets there...
... but not with this patch.
Comment 14 Luke Kenneth Casson Leighton 2008-09-18 10:58:46 PDT
remember - the order of events is as follows:

* view page
* initial frame gets loaded
* timer goes off (creating new thread context)
* timer-inited-thread removes old frame, resulting in destruction of document, including a "Frame"
* destruction of document results in webkit objects being deleted.  regardless of current refcount, refcounts are forced to zero.
* WebKitWebFrame contains an object which has been forcibly deleted
* main thread still has "page loading" signals outstanding
* main thread tries to notify user of "page loading" on a WebKitWebFrame where the Frame has been forcibly destroyed, outside of the control of the main thread (by timer-initiated-thread).
* main thread tries to de-reference NULL pointer.  segfault.

the fix _really is_ to check that m_frame != NULL.  much better would be to copy the style of the JS Bindings, which already have this concept of "mark()ing".

the issue's been solved for javascript bindings, and the code required to fix this issue is there, tried, tested and available for easy adoption and use, to solve the problem.
Comment 15 Luke Kenneth Casson Leighton 2008-09-19 09:39:27 PDT
Created attachment 23568 [details]
move g_object_unref() and m_frame = 0 into frameLoaderDestroyed()

no more crashes.
the clue was the two occurrences where m_frame is set to 0.
that, and looking at the qt and wx frame loaders, they do
exactly what this patch does - get rid of the m_frame in
frameLoaderDestroyed() and do nothing - at all - in any of
the detachedFromParentN() functions.
Comment 16 Mark Rowe (bdash) 2008-09-21 00:51:21 PDT
(In reply to comment #14)
> the fix _really is_ to check that m_frame != NULL.  much better would be to
> copy the style of the JS Bindings, which already have this concept of
> "mark()ing".

The JavaScript bindings have a concept of marking because JavaScriptCore uses a mark/sweep garbage collector.  The collector needs to know which objects are still reachable, so it asks the root set of objects to mark themselves.  Each object marks itself, then asks each object it has a reference to to mark itself, which results in recursively marking the entire live JS object graph. WebCore and WebKit are not garbage collected in this respect, they instead use the simpler approach of reference counting.  Introducing a "mark" method for refcounted objects doesn't make a lot of sense.
Comment 17 Darin Adler 2008-09-21 12:22:19 PDT
Comment on attachment 23568 [details]
move g_object_unref() and m_frame = 0 into frameLoaderDestroyed()

Yes, this is exactly the right fix!

+    notImplemented();

There's no reason to add that call. It's fine for detachedFromParent4() to be empty and no need to call notImplemented().

The change looks fine, but there's no ChangeLog nor test case, so I'm going to say review-. Please submit a new patch with ChangeLog. If this was a platform where layout tests were working well, I'd insist on a test case too, but I don't think GTK is working well with regression tests yet, so I guess we're OK without one.
Comment 18 Luke Kenneth Casson Leighton 2008-09-21 13:08:42 PDT
darin, hi, sorry about no changelog.  will write an entry.
notImplemented() was copied verbatim from the other ... uhhh... oh :)

void FrameLoaderClientQt::detachedFromParent4()

oh - this is why:

void WebFrame::detachedFromParent4()


void FrameLoaderClientWx::detachedFromParent4()

so... qt doesn't call notImplemented(), but wx and win do.
Comment 19 Luke Kenneth Casson Leighton 2008-09-25 11:57:45 PDT
darin: many apologies - my time is now limited.  someone else will need to finalise the work on this.
Comment 20 Jan Alonzo 2008-10-17 19:47:09 PDT
Created attachment 24476 [details]
added changelog and removed notImplemented from detachFromParent4

Added ChangeLog and removed notImplemented from detachFromParent4()
Comment 21 Mark Rowe (bdash) 2008-10-17 19:51:52 PDT
Comment on attachment 24476 [details]
added changelog and removed notImplemented from detachFromParent4

It would be great to get a test case landed if it's possible, in order to ensure that this bug is not re-introduced somewhere down the line (or in other ports).

Other than that, r=me.
Comment 22 Jan Alonzo 2008-10-18 00:06:45 PDT
Landed in r37676. Tests are in https://bugs.webkit.org/show_bug.cgi?id=21726