Bug 53045

Summary: REGRESSION (r74807): memory corruption after CachedResourceLoader refactoring
Product: WebKit Reporter: Mihai Parparita <mihaip>
Component: Page LoadingAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: ap, darin, koivisto, mitz, psolanki, simon.fraser, thakis, tonyg
Priority: P2 Keywords: InRadar
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Bug Depends on: 53059, 54424, 55596    
Bug Blocks: 55693    
Attachments:
Description Flags
Patch
none
Patch
none
Patch none

Mihai Parparita
Reported 2011-01-24 14:10:05 PST
The Chromium reliability bot started to see crashes in malloc code shortly after http://trac.webkit.org/changeset/74807/ was checked in. By bisecting the WebKit changes in between known stable and known crashing revisions, it was determined that that was the cause (see http://crbug.com/68516 for more details). Antti, I'll take a look at change to see if anything stands out, but let me know if you can think of any helpful tips.
Attachments
Patch (1.84 KB, patch)
2011-03-09 17:06 PST, Mihai Parparita
no flags
Patch (1.95 KB, patch)
2011-03-09 18:32 PST, Mihai Parparita
no flags
Patch (1.94 KB, patch)
2011-03-09 18:43 PST, Mihai Parparita
no flags
Antti Koivisto
Comment 1 2011-01-24 16:53:36 PST
Does your reliability bot run debug or release builds?
Mihai Parparita
Comment 2 2011-01-24 17:01:58 PST
(In reply to comment #1) > Does your reliability bot run debug or release builds? Just release builds. I'll see if it's possible to run a debug build.
Mihai Parparita
Comment 3 2011-01-25 09:33:31 PST
I enabled some of the debug-only checks in release builds (http://trac.webkit.org/changeset/76575 and http://trac.webkit.org/changeset/76576), but unfortunately they weren't triggered by recent reliability runs.
Antti Koivisto
Comment 4 2011-02-15 12:20:43 PST
https://bugs.webkit.org/show_bug.cgi?id=54486 might be related. I added a patch there that might be helpful for catching this.
Mihai Parparita
Comment 5 2011-02-23 17:57:49 PST
Latest update from the Chromium side (http://code.google.com/p/chromium/issues/detail?id=68516#c44): I managed to catch (by running chrome with patched tcmalloc on bots) something that looks like a double free: We are in CSSSelectorList::deleteSelector (called from ~CSSSelectorList, called from ~CSSStyleRule) and CSSSelector::m_selectorArray point so something that looks like it was already deleted (it's contents are zapped with 0xDEADDEAD which I use as a magic value in the tcmalloc's free). Unfortunately I can't say where exactly it was deleted, because my naive checks do not track this information. I will try to address this tomorrow. Maybe somebody with WebKit knowledge can deduce something just from looking at CSSSelectorList code.
Alexey Proskuryakov
Comment 6 2011-02-23 18:22:15 PST
Antti Koivisto
Comment 7 2011-02-24 03:06:10 PST
Maybe someone has deleted CSSSelector that is actually allocated from m_selectorArray? CSSSelector should probably have private destructor.
Mihai Parparita
Comment 8 2011-03-02 11:00:58 PST
(In reply to comment #7) > Maybe someone has deleted CSSSelector that is actually allocated from m_selectorArray? CSSSelector should probably have private destructor. The CSSSelector destructor ends up being invoked by the OwnPtr member of RareData. Any ideas for how to make this work without making OwnPtr a friend? It seems like adding a m_deleted flag/CRASH check in CSSSelector would help with catching this.
Mihai Parparita
Comment 9 2011-03-09 15:31:38 PST
We were able to run the reliability bot with a debugging allocator, and got this crash. chrome_25a0000!WTF::HashTable<WTF::String,WTF::String,WTF::IdentityExtractor<WTF::String>,WTF::StringHash,WTF::HashTraits<WTF::String>,WTF::HashTraits<WTF::String> >::add<WTF::String,WTF::String,WTF::IdentityHashTranslator<WTF::String,WTF::String,WTF::StringHash> >+0x39 [c:\b\build\slave\win\build\src\third_party\webkit\source\javascriptcore\wtf\hashtable.h @ 644] chrome_25a0000!WebCore::CachedResourceLoader::revalidateResource+0xbc [c:\b\build\slave\win\build\src\third_party\webkit\source\webcore\loader\cache\cachedresourceloader.cpp @ 355] chrome_25a0000!WebCore::CachedResourceLoader::requestResource+0x1b5 [c:\b\build\slave\win\build\src\third_party\webkit\source\webcore\loader\cache\cachedresourceloader.cpp @ 318] chrome_25a0000!WebCore::CachedResourceLoader::requestImage+0xfa [c:\b\build\slave\win\build\src\third_party\webkit\source\webcore\loader\cache\cachedresourceloader.cpp @ 144] chrome_25a0000!WebCore::ImageLoader::updateFromElement+0x1b0 [c:\b\build\slave\win\build\src\third_party\webkit\source\webcore\loader\imageloader.cpp @ 172] chrome_25a0000!WebCore::HTMLImageElement::parseMappedAttribute+0x5a [c:\b\build\slave\win\build\src\third_party\webkit\source\webcore\html\htmlimageelement.cpp @ 167] chrome_25a0000!WebCore::StyledElement::attributeChanged+0xf7 [c:\b\build\slave\win\build\src\third_party\webkit\source\webcore\dom\styledelement.cpp @ 189] chrome_25a0000!WebCore::NamedNodeMap::addAttribute+0x77 [c:\b\build\slave\win\build\src\third_party\webkit\source\webcore\dom\namednodemap.cpp @ 263] Looking at the implementation of CachedResourceLoader::revalidateResource (http://trac.webkit.org/browser/trunk/Source/WebCore/loader/cache/CachedResourceLoader.cpp#L335) we crash when calling m_validatedURLs.add(url). We got url earlier, from "const String& url = resource->url();". However, resource could have been deleted by the "memoryCache()->remove(resource);" call, thus the url String reference may not be good anymore. I think making an explicit copy of the url String should fix things. The fact that all this code originated with r74807 makes me pretty sure that this is the cause.
Mihai Parparita
Comment 10 2011-03-09 15:44:16 PST
I now see that that the stack that I got in comment 9 is the same as the one in bug 54486, so that may be a dupe of this.
Antti Koivisto
Comment 11 2011-03-09 15:56:37 PST
(In reply to comment #9) 5) we crash when calling m_validatedURLs.add(url). We got url earlier, from "const String& url = resource->url();". However, resource could have been deleted by the "memoryCache()->remove(resource);" call, thus the url String reference may not be good anymore. I think making an explicit copy of the url String should fix things. The fact that all this code originated with r74807 makes me pretty sure that this is the cause. Great, that must be it! Do you want to make the patch?
Antti Koivisto
Comment 12 2011-03-09 15:57:49 PST
*** Bug 54486 has been marked as a duplicate of this bug. ***
Mihai Parparita
Comment 13 2011-03-09 15:58:22 PST
(In reply to comment #11) > Great, that must be it! Do you want to make the patch? Yeah, I'll take a shot (and try to reproduce locally too).
Mihai Parparita
Comment 14 2011-03-09 17:06:01 PST
Mihai Parparita
Comment 15 2011-03-09 17:08:10 PST
(In reply to comment #13) > Yeah, I'll take a shot (and try to reproduce locally too). I didn't have much luck reproducing this locally. MemoryCache::remove calls MemoryCache::evict, which will delete the resource if CachedResource::canDelete returns true, but I couldn't get that configuration to happen. Since this is a harmless change otherwise (except for the extra copy), I think it's worth trying to land this and see if the crashes on our reliability bot go away.
Mihai Parparita
Comment 16 2011-03-09 18:32:46 PST
Tony Gentilcore
Comment 17 2011-03-09 18:42:56 PST
Comment on attachment 85273 [details] Patch Up to you, but the style seems to be to drop the const.
Mihai Parparita
Comment 18 2011-03-09 18:43:06 PST
Mihai Parparita
Comment 19 2011-03-09 18:53:08 PST
Comment on attachment 85275 [details] Patch Accidentally clobbered Tony's r+ with the new patch, will land it by hand.
Mihai Parparita
Comment 20 2011-03-09 18:55:13 PST
Mihai Parparita
Comment 21 2011-03-10 09:12:37 PST
(In reply to comment #20) > Committed r80686: <http://trac.webkit.org/changeset/80686> I'm pretty sure that this does fix the problem. Reliability bot runs after that got picked up by a DEPS roll (starting with http://build.chromium.org/p/chromium/builders/Win%20Reliability/builds/2556/steps/reliability%3A%20complete%20result%20of%20previous%20build/logs/stdio, and thus far 6 other runs) no longer show tcmalloc crashes.
Note You need to log in before you can comment on or make changes to this bug.