Bug 34490 - REGRESSION(r52116): WebCore::ImageEventSender::dispatchPendingEvents() crashes in certain conditions
Summary: REGRESSION(r52116): WebCore::ImageEventSender::dispatchPendingEvents() crashe...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: 528+ (Nightly build)
Hardware: PC All
: P2 Normal
Assignee: Alexey Proskuryakov
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2010-02-02 10:38 PST by Victor Wang
Modified: 2010-02-10 13:19 PST (History)
10 users (show)

See Also:


Attachments
test case by adding delay to ImageLoader::m_timer in ImageEventSender::dispatchEventSoon() (664 bytes, text/html)
2010-02-03 09:46 PST, Victor Wang
no flags Details
Proposed patch (2.06 KB, patch)
2010-02-03 10:32 PST, Victor Wang
ap: review-
Details | Formatted Diff | Diff
test html (1.02 KB, text/html)
2010-02-08 16:01 PST, Victor Wang
no flags Details
work in progress (11.44 KB, patch)
2010-02-09 00:38 PST, Alexey Proskuryakov
no flags Details | Formatted Diff | Diff
proposed fix (24.19 KB, patch)
2010-02-09 17:55 PST, Alexey Proskuryakov
ggaren: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Victor Wang 2010-02-02 10:38:18 PST
This is a big crasher in recent Chromium builds and happens on all platforms (Win, Linux and Mac), no repro steps yet.

Here is the stack Trace:
0x020941e8	 [chrome.dll	 - imageloader.cpp:292]	WebCore::ImageEventSender::dispatchPendingEvents()
0x01f907e0	 [chrome.dll	 - document.cpp:1823]	WebCore::Document::implicitClose()
0x01f9ece9	 [chrome.dll	 - frameloader.cpp:1182]	WebCore::FrameLoader::checkCallImplicitClose()
0x01f9ebfc	 [chrome.dll	 - frameloader.cpp:1130]	WebCore::FrameLoader::checkCompleted()
0x01f9eb41	 [chrome.dll	 - frameloader.cpp:1069]	WebCore::FrameLoader::finishedParsing()
0x01f945e2	 [chrome.dll	 - document.cpp:4184]	WebCore::Document::finishedParsing()
0x02158016	 [chrome.dll	 - htmlparser.cpp:1662]	WebCore::HTMLParser::finished()
0x020b95b8	 [chrome.dll	 - htmltokenizer.cpp:1878]	WebCore::HTMLTokenizer::end()
0x020b982f	 [chrome.dll	 - htmltokenizer.cpp:1918]	WebCore::HTMLTokenizer::finish()
0x01f9eacf	 [chrome.dll	 - frameloader.cpp:972]	WebCore::FrameLoader::endIfNotLoadingMainResource()
0x01f9ea8f	 [chrome.dll	 - frameloader.cpp:957]	WebCore::FrameLoader::end()
0x01fa1a30	 [chrome.dll	 - frameloader.cpp:2752]	WebCore::FrameLoader::finishedLoading()
Comment 1 Victor Wang 2010-02-02 10:46:36 PST
Looking into the dump, think the crash is caused by the "loader" becomes invalid when ImageEventSender dispatches pending events. 

The crash started a few weeks ago and seems related to webkit patch: http://trac.webkit.org/changeset/52116, which fixes bug 31660: Image load event fires before the document fragment is attached. To fix the issue, the patch (52116) removes ImageLoader::dispatchPendingLoadEvents call from HTMLTokenizer::write() so the load events will be dispatched later.

From the dump, the crash happens in ImageEventSender::dispatchPendingEvents() which is called from HTMLTokenizer::finish(), and it seems that ImageLoader could be destroyed before that and the "loader" in m_dispatchSoonList becomes invalid.

Looking at ImageLoader destructor code, the event is removed from loadEventSender only if m_fireLoad is false. I think we could remove the m_fireLoad check in ImageLoader destructor and this should fix the crashes we are having with Chromium. This should not be Chromium specific issue and think the crash may happen on other browsers too. 

Alexey and Dave,
you touched this code before, any comments on the issue and the change I proposed to remove the check in ImageLoader destructor?
Comment 2 Alexey Proskuryakov 2010-02-02 11:15:14 PST
Such a fix would definitely need a regression test.
Comment 3 Victor Wang 2010-02-03 09:46:18 PST
Created attachment 48047 [details]
test case by adding delay to ImageLoader::m_timer in ImageEventSender::dispatchEventSoon()

I was able to catch the crash by adding delay to the timer in ImageEventSender (ImageLoader.cpp) and keep refreshing the page attached. This confirms the theory in comment#1 is correct.

The crash happens if ImageLoader (owned by HTMLImageElement) is destroyed after it has been added to ImageEventSender::m_dispatchSoonList but before the pending load events are dispatched (by timer, HTMLParser::finished etc).

The crash could not be easily repro because of all the timing issues: it depends on the time the ImageLoader timer fires, the time HTMLImageElement is destroyed and collected by GC, the image loading time and the time a new empty image src is sets to existing loader etc.

Removing the image loader from the pending event queue whenever the loader is destroyed fixes the crash.
Comment 4 Alexey Proskuryakov 2010-02-03 10:04:01 PST
We can simulate slow loading for test purposes - see e.g. /Users/ap/Safari/OpenSource/LayoutTests/http/tests/misc/resources/slow-png-load.pl.
Comment 5 Victor Wang 2010-02-03 10:25:11 PST
Hi Alexey,

Thanks for your suggestions. While I was repro this, looks like the hardest thing in making a reliable test is the timer in ImageEventSender. It depends on how fast the timer fires after the pending event has been added to the list (see code in ImageEventSender::dispatchEventSoon). There are also other things need to happen at right time such as setting the src in HTMLImageElement to empty (to make ImageLoader::m_firedLoad becomes true) before ImageLoader is destroyed etc.
Comment 6 Victor Wang 2010-02-03 10:32:23 PST
Created attachment 48051 [details]
Proposed patch

Not sure about a reliable way to test this. This is a big crasher in chromium recent builds. Prefer to commit this first. Filled a bug to figure out reliable test case later: https://bugs.webkit.org/show_bug.cgi?id=34535
Comment 7 Alexey Proskuryakov 2010-02-03 10:54:42 PST
> setting the src in HTMLImageElement to
> empty (to make ImageLoader::m_firedLoad becomes true)

Shouldn't that just cancel pending events?

Without a real test case, it's still hard for me to tell if the fix is correct, or if it introduces behavior regressions.
Comment 8 Victor Wang 2010-02-03 11:25:03 PST
Cancelling the pending event removes the loader from m_dispatchSoonList and m_dispatingList if the loader has been destroyed.

The code before patch "http://trac.webkit.org/changeset/49394" actually always removes the loader from the list. David adds this check 4 months ago while he refactored ImageLoader and ImageEventSender in his patch.

David,
Do you see any issue if we always remove the loader from pending event list if the loader is destroyed?

Thanks,
Victor
Comment 9 Dimitri Glazkov (Google) 2010-02-03 19:20:04 PST
I don't think this is specific to Chromium. Victor, have you been able to get a stack trace of this crash in WebKit/mac?
Comment 10 Victor Wang 2010-02-04 17:48:36 PST
I have been trying hard to make a test, here is how the crash may happen if every step below happens at the right timing:

1. Create a image element, set src to a valid url, add it to document so image starts loading.

2. When image loading finished, ImageLoader::notifyFinished() is called, and it does the following two things:
  2a. adds the image loader to ImageEventSender::m_dispatchSoonListlist.
  2b. starts a one shot 0 interval timer.

3. Set image element src attribute to empty string in JS so ImageLoader::m_firedLoad = true in ImageLoader::setLoadingImage().

4. Delete image element and force GC to collect it.
   In ImageLoader::~ImageLoader(), because m_firedLoad is true, this image loader will be destroyed but not removed from m_dispatchSoonList.

5. ImageEventSender::dispatchPendingEvents() (may be invoked by ImageEventSender::timerFired() or FrameLoader::finishedParsing()) calls
loader->dispatchPendingLoadEvent(), where "loader" points to an object that has already be destroyed by step 4.

6. Since the loader has be destroyed, loader->dispatchPendingLoadEvent() may crash.

Here are some timing issues:
ImageLoader::m_firedLoad (step 3) needs to be true before calling ImageLoader destructor and ImageLoader destructor needs to called after step 2a but before step 2b triggers dispatchPendingEvents (step 5). To make sure this happen in sequence, I uses JS setTimeout with a time that guarantees the load will be done, however, this will not make sure step 3 and 4 happens before step 5 and in most case, it won't because step 2b starts a 0 interval timer, which is likely happens earlier.

Whether step 6 crashes or not also depends what the garbage data left in that address. If it happens that m_firedLoad != 0 or m_image == 0, then it just returns and does nothing.

The patch I proposed removes the deleted ImageLoader object from m_dispatchSoonListlist, and this should avoid possible crash in step 6.

I am not sure how to write a test for it, do you have any suggestions?
Comment 11 Alexey Proskuryakov 2010-02-05 00:04:16 PST
> Whether step 6 crashes or not also depends what the garbage data left in that
> address. If it happens that m_firedLoad != 0 or m_image == 0, then it just
> returns and does nothing.

While making a test, you can run debug WebKit with MallocScribble environment variable set on Mac (see man malloc), making it more likely to crash. Once you get timing right with MallocScribble, you can further tweak the test to overwrite freed memory even without it.

> I am not sure how to write a test for it, do you have any suggestions?

I'll try to look into this, but probably won't be able to this week.
Comment 12 Victor Wang 2010-02-05 08:50:27 PST
Sounds good. Let me know what you find out. Thanks!
Comment 13 Dimitri Glazkov (Google) 2010-02-05 09:44:26 PST
Even though the problem is fairly clear by inspection of the code, writing a reliable test here is very hard --  we have to set the order of the message queue just right (see comment 10).

I spent a bit of time with Victor yesterday, trying to buliding "by science", but as you know, timers code has all kinds of fun things that affect the order of firing, which makes the task excruciating.

Which is why I am asking you guys: what other tips/tricks/tools can we employ here? We replaced the image with the data URL, but that only ensures the first half of the necessary order.

This looks like a frequent renderer crasher in Chromium dev channel, and I have no doubt Safari will have it too.
Comment 14 Dimitri Glazkov (Google) 2010-02-08 14:06:14 PST
Since this is a crash, any chance we could commit this and use bug 34535 to write the layout test?
Comment 15 Victor Wang 2010-02-08 15:59:31 PST
I added a delay to the timer in ImageLoader and was able to catch DumpRenderTree (win) crash once with debugger attached. Here is the stack:

WebKit.dll!WebCore::Node::attached()  Line 277 + 0x11 bytes	C++
WebKit.dll!WebCore::ImageLoader::dispatchPendingLoadEvent()  Line 228 + 0x12 bytes	C++
WebKit.dll!WebCore::ImageEventSender::dispatchPendingEvents()  Line 294	C++
WebKit.dll!WebCore::ImageEventSender::timerFired(WebCore::Timer<WebCore::ImageEventSender> * __formal=0x035be680)  Line 301	C++
WebKit.dll!WebCore::Timer<WebCore::ImageEventSender>::fired()  Line 98 + 0x23 bytes	C++
WebKit.dll!WebCore::ThreadTimers::sharedTimerFiredInternal()  Line 112 + 0xf bytes	C++
WebKit.dll!WebCore::ThreadTimers::sharedTimerFired()  Line 91	C++
WebKit.dll!WebCore::TimerWindowWndProc(HWND__ * hWnd=0x000c11c4, unsigned int message=49906, unsigned int wParam=0, long lParam=0)  Line 102 + 0x8 bytes	C++
user32.dll!_InternalCallWinProc@20()  + 0x23 bytes	
user32.dll!_UserCallWinProcCheckWow@32()  + 0xb7 bytes	
user32.dll!_DispatchMessageWorker@8()  + 0xee bytes	
user32.dll!_DispatchMessageW@4()  + 0xf bytes	
DumpRenderTree_debug.exe!runTest(const std::basic_string<char,std::char_traits<char>,std::allocator<char> > & testPathOrURL="d:\workspace\WebKit1\LayoutTests\fast\images\image-loader-crashes.html")  Line 948 + 0xf bytes	C++
DumpRenderTree_debug.exe!main(int argc=2, char * * argv=0x00589f40)  Line 1289 + 0x28 bytes	C++
DumpRenderTree_debug.exe!__tmainCRTStartup()  Line 597 + 0x17 bytes	C
kernel32.dll!@BaseThreadInitThunk@12()  + 0xe bytes	
ntdll.dll!___RtlUserThreadStart@8()  + 0x23 bytes	
ntdll.dll!__RtlUserThreadStart@8()  + 0x1b bytes	

Here is what I did:
1. add delay to the timer in ImageEventSender::dispatchEventSoon(): m_timer.startOneShot(0) -> m_timer.startOneShot(10)
2. create a test html (attached)
3. start DRT from debugger. I use DRT so I have more control on GC.
4. crashes may happen at the above stack location if all the timing follows the steps in comment #10.
Comment 16 Victor Wang 2010-02-08 16:01:10 PST
Created attachment 48373 [details]
test html
Comment 17 Alexey Proskuryakov 2010-02-09 00:38:30 PST
Created attachment 48395 [details]
work in progress

With this patch applied, lots of existing tests crash on the new assertion. Actually, maybe even too many for it to be good.
Comment 18 Alexey Proskuryakov 2010-02-09 08:51:20 PST
The check is wrong, because ImageLoaders are created as members of other objects.
Comment 19 Dimitri Glazkov (Google) 2010-02-09 09:34:05 PST
Is Victor's patch incorrect?
Comment 20 Alexey Proskuryakov 2010-02-09 13:56:12 PST
I have a test case that crashes debugs builds reliably - even without any new assertions. It's even easier to crash with MallocScribble enabled.

Victor's patch seems correct in that it doesn't regress behavior, but it introduces unnecessary iteration even when we could know that there are no pending listeners. I'll try to come up with a patch that doesn't affect performance in this way.
Comment 21 Alexey Proskuryakov 2010-02-09 14:26:34 PST
Comment on attachment 48051 [details]
Proposed patch

r- to get this out of review queue.
Comment 22 Alexey Proskuryakov 2010-02-09 17:51:03 PST
<rdar://problem/7631464>
Comment 23 Alexey Proskuryakov 2010-02-09 17:55:05 PST
Created attachment 48455 [details]
proposed fix
Comment 24 WebKit Review Bot 2010-02-09 18:01:30 PST
Attachment 48455 [details] did not pass style-queue:

Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1
JavaScriptCore/wtf/Vector.h:998:  One line control clauses should not use braces.  [whitespace/braces] [4]
JavaScriptCore/wtf/ValueCheck.h:26:  #ifndef header guard has wrong style, please use: ValueCheck_h  [build/header_guard] [5]
JavaScriptCore/wtf/ValueCheck.h:50:  More than one command on the same line in if  [whitespace/parens] [4]
WebCore/ForwardingHeaders/wtf/ValueCheck.h:1:  #ifndef header guard has wrong style, please use: ValueCheck_h  [build/header_guard] [5]
Total errors found: 4


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 25 Geoffrey Garen 2010-02-10 12:17:21 PST
Comment on attachment 48455 [details]
proposed fix

 116         // Callers of this method do not need events dispatched.
 117         if (!m_firedBeforeLoad) {

It's better if a function comments about what it will do, and not about what its callers will do. Callers are unpredictable.

I'd move this comment to the header declaration of setImage: // Cancels any pending beforeLoad and load events.

r=me
Comment 26 Alexey Proskuryakov 2010-02-10 13:19:29 PST
Committed <http://trac.webkit.org/changeset/54618>. Oh, and addressed style bot complaints in r54619.