Bug 34490 - REGRESSION(r52116): WebCore::ImageEventSender::dispatchPendingEvents() crashes in certain conditions
: REGRESSION(r52116): WebCore::ImageEventSender::dispatchPendingEvents() crashe...
Status: RESOLVED FIXED
: WebKit
WebCore Misc.
: 528+ (Nightly build)
: PC All
: P2 Normal
Assigned To:
:
: InRadar
:
:
  Show dependency treegraph
 
Reported: 2010-02-02 10:38 PST by
Modified: 2010-02-10 13:19 PST (History)


Attachments
test case by adding delay to ImageLoader::m_timer in ImageEventSender::dispatchEventSoon() (664 bytes, text/html)
2010-02-03 09:46 PST, Victor Wang
no flags Details
Proposed patch (2.06 KB, patch)
2010-02-03 10:32 PST, Victor Wang
ap: review-
Review Patch | Details | Formatted Diff | Diff
test html (1.02 KB, text/html)
2010-02-08 16:01 PST, Victor Wang
no flags Details
work in progress (11.44 KB, patch)
2010-02-09 00:38 PST, Alexey Proskuryakov
no flags Review Patch | Details | Formatted Diff | Diff
proposed fix (24.19 KB, patch)
2010-02-09 17:55 PST, Alexey Proskuryakov
ggaren: review+
Review Patch | Details | Formatted Diff | Diff


Note

You need to log in before you can comment on or make changes to this bug.


Description From 2010-02-02 10:38:18 PST
This is a big crasher in recent Chromium builds and happens on all platforms (Win, Linux and Mac), no repro steps yet.

Here is the stack Trace:
0x020941e8     [chrome.dll     - imageloader.cpp:292]    WebCore::ImageEventSender::dispatchPendingEvents()
0x01f907e0     [chrome.dll     - document.cpp:1823]    WebCore::Document::implicitClose()
0x01f9ece9     [chrome.dll     - frameloader.cpp:1182]    WebCore::FrameLoader::checkCallImplicitClose()
0x01f9ebfc     [chrome.dll     - frameloader.cpp:1130]    WebCore::FrameLoader::checkCompleted()
0x01f9eb41     [chrome.dll     - frameloader.cpp:1069]    WebCore::FrameLoader::finishedParsing()
0x01f945e2     [chrome.dll     - document.cpp:4184]    WebCore::Document::finishedParsing()
0x02158016     [chrome.dll     - htmlparser.cpp:1662]    WebCore::HTMLParser::finished()
0x020b95b8     [chrome.dll     - htmltokenizer.cpp:1878]    WebCore::HTMLTokenizer::end()
0x020b982f     [chrome.dll     - htmltokenizer.cpp:1918]    WebCore::HTMLTokenizer::finish()
0x01f9eacf     [chrome.dll     - frameloader.cpp:972]    WebCore::FrameLoader::endIfNotLoadingMainResource()
0x01f9ea8f     [chrome.dll     - frameloader.cpp:957]    WebCore::FrameLoader::end()
0x01fa1a30     [chrome.dll     - frameloader.cpp:2752]    WebCore::FrameLoader::finishedLoading()
------- Comment #1 From 2010-02-02 10:46:36 PST -------
Looking into the dump, think the crash is caused by the "loader" becomes invalid when ImageEventSender dispatches pending events. 

The crash started a few weeks ago and seems related to webkit patch: http://trac.webkit.org/changeset/52116, which fixes bug 31660: Image load event fires before the document fragment is attached. To fix the issue, the patch (52116) removes ImageLoader::dispatchPendingLoadEvents call from HTMLTokenizer::write() so the load events will be dispatched later.

From the dump, the crash happens in ImageEventSender::dispatchPendingEvents() which is called from HTMLTokenizer::finish(), and it seems that ImageLoader could be destroyed before that and the "loader" in m_dispatchSoonList becomes invalid.

Looking at ImageLoader destructor code, the event is removed from loadEventSender only if m_fireLoad is false. I think we could remove the m_fireLoad check in ImageLoader destructor and this should fix the crashes we are having with Chromium. This should not be Chromium specific issue and think the crash may happen on other browsers too. 

Alexey and Dave,
you touched this code before, any comments on the issue and the change I proposed to remove the check in ImageLoader destructor?
------- Comment #2 From 2010-02-02 11:15:14 PST -------
Such a fix would definitely need a regression test.
------- Comment #3 From 2010-02-03 09:46:18 PST -------
Created an attachment (id=48047) [details]
test case by adding delay to ImageLoader::m_timer in ImageEventSender::dispatchEventSoon()

I was able to catch the crash by adding delay to the timer in ImageEventSender (ImageLoader.cpp) and keep refreshing the page attached. This confirms the theory in comment#1 is correct.

The crash happens if ImageLoader (owned by HTMLImageElement) is destroyed after it has been added to ImageEventSender::m_dispatchSoonList but before the pending load events are dispatched (by timer, HTMLParser::finished etc).

The crash could not be easily repro because of all the timing issues: it depends on the time the ImageLoader timer fires, the time HTMLImageElement is destroyed and collected by GC, the image loading time and the time a new empty image src is sets to existing loader etc.

Removing the image loader from the pending event queue whenever the loader is destroyed fixes the crash.
------- Comment #4 From 2010-02-03 10:04:01 PST -------
We can simulate slow loading for test purposes - see e.g. /Users/ap/Safari/OpenSource/LayoutTests/http/tests/misc/resources/slow-png-load.pl.
------- Comment #5 From 2010-02-03 10:25:11 PST -------
Hi Alexey,

Thanks for your suggestions. While I was repro this, looks like the hardest thing in making a reliable test is the timer in ImageEventSender. It depends on how fast the timer fires after the pending event has been added to the list (see code in ImageEventSender::dispatchEventSoon). There are also other things need to happen at right time such as setting the src in HTMLImageElement to empty (to make ImageLoader::m_firedLoad becomes true) before ImageLoader is destroyed etc.
------- Comment #6 From 2010-02-03 10:32:23 PST -------
Created an attachment (id=48051) [details]
Proposed patch

Not sure about a reliable way to test this. This is a big crasher in chromium recent builds. Prefer to commit this first. Filled a bug to figure out reliable test case later: https://bugs.webkit.org/show_bug.cgi?id=34535
------- Comment #7 From 2010-02-03 10:54:42 PST -------
> setting the src in HTMLImageElement to
> empty (to make ImageLoader::m_firedLoad becomes true)

Shouldn't that just cancel pending events?

Without a real test case, it's still hard for me to tell if the fix is correct, or if it introduces behavior regressions.
------- Comment #8 From 2010-02-03 11:25:03 PST -------
Cancelling the pending event removes the loader from m_dispatchSoonList and m_dispatingList if the loader has been destroyed.

The code before patch "http://trac.webkit.org/changeset/49394" actually always removes the loader from the list. David adds this check 4 months ago while he refactored ImageLoader and ImageEventSender in his patch.

David,
Do you see any issue if we always remove the loader from pending event list if the loader is destroyed?

Thanks,
Victor
------- Comment #9 From 2010-02-03 19:20:04 PST -------
I don't think this is specific to Chromium. Victor, have you been able to get a stack trace of this crash in WebKit/mac?
------- Comment #10 From 2010-02-04 17:48:36 PST -------
I have been trying hard to make a test, here is how the crash may happen if every step below happens at the right timing:

1. Create a image element, set src to a valid url, add it to document so image starts loading.

2. When image loading finished, ImageLoader::notifyFinished() is called, and it does the following two things:
  2a. adds the image loader to ImageEventSender::m_dispatchSoonListlist.
  2b. starts a one shot 0 interval timer.

3. Set image element src attribute to empty string in JS so ImageLoader::m_firedLoad = true in ImageLoader::setLoadingImage().

4. Delete image element and force GC to collect it.
   In ImageLoader::~ImageLoader(), because m_firedLoad is true, this image loader will be destroyed but not removed from m_dispatchSoonList.

5. ImageEventSender::dispatchPendingEvents() (may be invoked by ImageEventSender::timerFired() or FrameLoader::finishedParsing()) calls
loader->dispatchPendingLoadEvent(), where "loader" points to an object that has already be destroyed by step 4.

6. Since the loader has be destroyed, loader->dispatchPendingLoadEvent() may crash.

Here are some timing issues:
ImageLoader::m_firedLoad (step 3) needs to be true before calling ImageLoader destructor and ImageLoader destructor needs to called after step 2a but before step 2b triggers dispatchPendingEvents (step 5). To make sure this happen in sequence, I uses JS setTimeout with a time that guarantees the load will be done, however, this will not make sure step 3 and 4 happens before step 5 and in most case, it won't because step 2b starts a 0 interval timer, which is likely happens earlier.

Whether step 6 crashes or not also depends what the garbage data left in that address. If it happens that m_firedLoad != 0 or m_image == 0, then it just returns and does nothing.

The patch I proposed removes the deleted ImageLoader object from m_dispatchSoonListlist, and this should avoid possible crash in step 6.

I am not sure how to write a test for it, do you have any suggestions?
------- Comment #11 From 2010-02-05 00:04:16 PST -------
> Whether step 6 crashes or not also depends what the garbage data left in that
> address. If it happens that m_firedLoad != 0 or m_image == 0, then it just
> returns and does nothing.

While making a test, you can run debug WebKit with MallocScribble environment variable set on Mac (see man malloc), making it more likely to crash. Once you get timing right with MallocScribble, you can further tweak the test to overwrite freed memory even without it.

> I am not sure how to write a test for it, do you have any suggestions?

I'll try to look into this, but probably won't be able to this week.
------- Comment #12 From 2010-02-05 08:50:27 PST -------
Sounds good. Let me know what you find out. Thanks!
------- Comment #13 From 2010-02-05 09:44:26 PST -------
Even though the problem is fairly clear by inspection of the code, writing a reliable test here is very hard --  we have to set the order of the message queue just right (see comment 10).

I spent a bit of time with Victor yesterday, trying to buliding "by science", but as you know, timers code has all kinds of fun things that affect the order of firing, which makes the task excruciating.

Which is why I am asking you guys: what other tips/tricks/tools can we employ here? We replaced the image with the data URL, but that only ensures the first half of the necessary order.

This looks like a frequent renderer crasher in Chromium dev channel, and I have no doubt Safari will have it too.
------- Comment #14 From 2010-02-08 14:06:14 PST -------
Since this is a crash, any chance we could commit this and use bug 34535 to write the layout test?
------- Comment #15 From 2010-02-08 15:59:31 PST -------
I added a delay to the timer in ImageLoader and was able to catch DumpRenderTree (win) crash once with debugger attached. Here is the stack:

WebKit.dll!WebCore::Node::attached()  Line 277 + 0x11 bytes    C++
WebKit.dll!WebCore::ImageLoader::dispatchPendingLoadEvent()  Line 228 + 0x12 bytes    C++
WebKit.dll!WebCore::ImageEventSender::dispatchPendingEvents()  Line 294    C++
WebKit.dll!WebCore::ImageEventSender::timerFired(WebCore::Timer<WebCore::ImageEventSender> * __formal=0x035be680)  Line 301    C++
WebKit.dll!WebCore::Timer<WebCore::ImageEventSender>::fired()  Line 98 + 0x23 bytes    C++
WebKit.dll!WebCore::ThreadTimers::sharedTimerFiredInternal()  Line 112 + 0xf bytes    C++
WebKit.dll!WebCore::ThreadTimers::sharedTimerFired()  Line 91    C++
WebKit.dll!WebCore::TimerWindowWndProc(HWND__ * hWnd=0x000c11c4, unsigned int message=49906, unsigned int wParam=0, long lParam=0)  Line 102 + 0x8 bytes    C++
user32.dll!_InternalCallWinProc@20()  + 0x23 bytes    
user32.dll!_UserCallWinProcCheckWow@32()  + 0xb7 bytes    
user32.dll!_DispatchMessageWorker@8()  + 0xee bytes    
user32.dll!_DispatchMessageW@4()  + 0xf bytes    
DumpRenderTree_debug.exe!runTest(const std::basic_string<char,std::char_traits<char>,std::allocator<char> > & testPathOrURL="d:\workspace\WebKit1\LayoutTests\fast\images\image-loader-crashes.html")  Line 948 + 0xf bytes    C++
DumpRenderTree_debug.exe!main(int argc=2, char * * argv=0x00589f40)  Line 1289 + 0x28 bytes    C++
DumpRenderTree_debug.exe!__tmainCRTStartup()  Line 597 + 0x17 bytes    C
kernel32.dll!@BaseThreadInitThunk@12()  + 0xe bytes    
ntdll.dll!___RtlUserThreadStart@8()  + 0x23 bytes    
ntdll.dll!__RtlUserThreadStart@8()  + 0x1b bytes    

Here is what I did:
1. add delay to the timer in ImageEventSender::dispatchEventSoon(): m_timer.startOneShot(0) -> m_timer.startOneShot(10)
2. create a test html (attached)
3. start DRT from debugger. I use DRT so I have more control on GC.
4. crashes may happen at the above stack location if all the timing follows the steps in comment #10.
------- Comment #16 From 2010-02-08 16:01:10 PST -------
Created an attachment (id=48373) [details]
test html
------- Comment #17 From 2010-02-09 00:38:30 PST -------
Created an attachment (id=48395) [details]
work in progress

With this patch applied, lots of existing tests crash on the new assertion. Actually, maybe even too many for it to be good.
------- Comment #18 From 2010-02-09 08:51:20 PST -------
The check is wrong, because ImageLoaders are created as members of other objects.
------- Comment #19 From 2010-02-09 09:34:05 PST -------
Is Victor's patch incorrect?
------- Comment #20 From 2010-02-09 13:56:12 PST -------
I have a test case that crashes debugs builds reliably - even without any new assertions. It's even easier to crash with MallocScribble enabled.

Victor's patch seems correct in that it doesn't regress behavior, but it introduces unnecessary iteration even when we could know that there are no pending listeners. I'll try to come up with a patch that doesn't affect performance in this way.
------- Comment #21 From 2010-02-09 14:26:34 PST -------
(From update of attachment 48051 [details])
r- to get this out of review queue.
------- Comment #22 From 2010-02-09 17:51:03 PST -------
<rdar://problem/7631464>
------- Comment #23 From 2010-02-09 17:55:05 PST -------
Created an attachment (id=48455) [details]
proposed fix
------- Comment #24 From 2010-02-09 18:01:30 PST -------
Attachment 48455 [details] did not pass style-queue:

Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1
JavaScriptCore/wtf/Vector.h:998:  One line control clauses should not use braces.  [whitespace/braces] [4]
JavaScriptCore/wtf/ValueCheck.h:26:  #ifndef header guard has wrong style, please use: ValueCheck_h  [build/header_guard] [5]
JavaScriptCore/wtf/ValueCheck.h:50:  More than one command on the same line in if  [whitespace/parens] [4]
WebCore/ForwardingHeaders/wtf/ValueCheck.h:1:  #ifndef header guard has wrong style, please use: ValueCheck_h  [build/header_guard] [5]
Total errors found: 4


If any of these errors are false positives, please file a bug against check-webkit-style.
------- Comment #25 From 2010-02-10 12:17:21 PST -------
(From update of attachment 48455 [details])
 116         // Callers of this method do not need events dispatched.
 117         if (!m_firedBeforeLoad) {

It's better if a function comments about what it will do, and not about what its callers will do. Callers are unpredictable.

I'd move this comment to the header declaration of setImage: // Cancels any pending beforeLoad and load events.

r=me
------- Comment #26 From 2010-02-10 13:19:29 PST -------
Committed <http://trac.webkit.org/changeset/54618>. Oh, and addressed style bot complaints in r54619.