Bug 53045 - REGRESSION (r74807): memory corruption after CachedResourceLoader refactoring
Summary: REGRESSION (r74807): memory corruption after CachedResourceLoader refactoring
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Page Loading (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Nobody
URL:
Keywords: InRadar
: 54486 (view as bug list)
Depends on: 53059 54424 55596
Blocks: 55693
  Show dependency treegraph
 
Reported: 2011-01-24 14:10 PST by Mihai Parparita
Modified: 2011-03-10 09:12 PST (History)
8 users (show)

See Also:


Attachments
Patch (1.84 KB, patch)
2011-03-09 17:06 PST, Mihai Parparita
no flags Details | Formatted Diff | Diff
Patch (1.95 KB, patch)
2011-03-09 18:32 PST, Mihai Parparita
no flags Details | Formatted Diff | Diff
Patch (1.94 KB, patch)
2011-03-09 18:43 PST, Mihai Parparita
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Mihai Parparita 2011-01-24 14:10:05 PST
The Chromium reliability bot started to see crashes in malloc code shortly after http://trac.webkit.org/changeset/74807/ was checked in. By bisecting the WebKit changes in between known stable and known crashing revisions, it was determined that that was the cause (see http://crbug.com/68516 for more details).

Antti, I'll take a look at change to see if anything stands out, but let me know if you can think of any helpful tips.
Comment 1 Antti Koivisto 2011-01-24 16:53:36 PST
Does your reliability bot run debug or release builds?
Comment 2 Mihai Parparita 2011-01-24 17:01:58 PST
(In reply to comment #1)
> Does your reliability bot run debug or release builds?

Just release builds. I'll see if it's possible to run a debug build.
Comment 3 Mihai Parparita 2011-01-25 09:33:31 PST
I enabled some of the debug-only checks in release builds (http://trac.webkit.org/changeset/76575 and http://trac.webkit.org/changeset/76576), but unfortunately they weren't triggered by recent reliability runs.
Comment 4 Antti Koivisto 2011-02-15 12:20:43 PST
https://bugs.webkit.org/show_bug.cgi?id=54486 might be related. I added a patch there that might be helpful for catching this.
Comment 5 Mihai Parparita 2011-02-23 17:57:49 PST
Latest update from the Chromium side (http://code.google.com/p/chromium/issues/detail?id=68516#c44):

I managed to catch (by running chrome with patched tcmalloc on bots) something that looks like a double free:

We are in CSSSelectorList::deleteSelector (called from ~CSSSelectorList, called from ~CSSStyleRule) and CSSSelector::m_selectorArray point so something that looks like it was already deleted (it's contents are zapped with 0xDEADDEAD which I use as a magic value in the tcmalloc's free). Unfortunately I can't say where exactly it was deleted, because my naive checks do not track this information. I will try to address this tomorrow.

Maybe somebody with WebKit knowledge can deduce something just from looking at CSSSelectorList code.
Comment 6 Alexey Proskuryakov 2011-02-23 18:22:15 PST
<rdar://problem/9045923>
Comment 7 Antti Koivisto 2011-02-24 03:06:10 PST
Maybe someone has deleted CSSSelector that is actually allocated from m_selectorArray? CSSSelector should probably have private destructor.
Comment 8 Mihai Parparita 2011-03-02 11:00:58 PST
(In reply to comment #7)
> Maybe someone has deleted CSSSelector that is actually allocated from m_selectorArray? CSSSelector should probably have private destructor.

The CSSSelector destructor ends up being invoked by the OwnPtr member of RareData. Any ideas for how to make this work without making OwnPtr a friend? 

It seems like adding a m_deleted flag/CRASH check in CSSSelector would help with catching this.
Comment 9 Mihai Parparita 2011-03-09 15:31:38 PST
We were able to run the reliability bot with a debugging allocator, and got this crash.

chrome_25a0000!WTF::HashTable<WTF::String,WTF::String,WTF::IdentityExtractor<WTF::String>,WTF::StringHash,WTF::HashTraits<WTF::String>,WTF::HashTraits<WTF::String> >::add<WTF::String,WTF::String,WTF::IdentityHashTranslator<WTF::String,WTF::String,WTF::StringHash> >+0x39 [c:\b\build\slave\win\build\src\third_party\webkit\source\javascriptcore\wtf\hashtable.h @ 644]
chrome_25a0000!WebCore::CachedResourceLoader::revalidateResource+0xbc [c:\b\build\slave\win\build\src\third_party\webkit\source\webcore\loader\cache\cachedresourceloader.cpp @ 355]
chrome_25a0000!WebCore::CachedResourceLoader::requestResource+0x1b5 [c:\b\build\slave\win\build\src\third_party\webkit\source\webcore\loader\cache\cachedresourceloader.cpp @ 318]
chrome_25a0000!WebCore::CachedResourceLoader::requestImage+0xfa [c:\b\build\slave\win\build\src\third_party\webkit\source\webcore\loader\cache\cachedresourceloader.cpp @ 144]
chrome_25a0000!WebCore::ImageLoader::updateFromElement+0x1b0 [c:\b\build\slave\win\build\src\third_party\webkit\source\webcore\loader\imageloader.cpp @ 172]
chrome_25a0000!WebCore::HTMLImageElement::parseMappedAttribute+0x5a [c:\b\build\slave\win\build\src\third_party\webkit\source\webcore\html\htmlimageelement.cpp @ 167]
chrome_25a0000!WebCore::StyledElement::attributeChanged+0xf7 [c:\b\build\slave\win\build\src\third_party\webkit\source\webcore\dom\styledelement.cpp @ 189]
chrome_25a0000!WebCore::NamedNodeMap::addAttribute+0x77 [c:\b\build\slave\win\build\src\third_party\webkit\source\webcore\dom\namednodemap.cpp @ 263]

Looking at the implementation of CachedResourceLoader::revalidateResource (http://trac.webkit.org/browser/trunk/Source/WebCore/loader/cache/CachedResourceLoader.cpp#L335) we crash when calling m_validatedURLs.add(url). We got url earlier, from "const String& url = resource->url();". However, resource could have been deleted by the "memoryCache()->remove(resource);" call, thus the url String reference may not be good anymore. I think making an explicit copy of the url String should fix things. The fact that all this code originated with r74807 makes me pretty sure that this is the cause.
Comment 10 Mihai Parparita 2011-03-09 15:44:16 PST
I now see that that the stack that I got in comment 9 is the same as the one in bug 54486, so that may be a dupe of this.
Comment 11 Antti Koivisto 2011-03-09 15:56:37 PST
(In reply to comment #9)
5) we crash when calling m_validatedURLs.add(url). We got url earlier, from "const String& url = resource->url();". However, resource could have been deleted by the "memoryCache()->remove(resource);" call, thus the url String reference may not be good anymore. I think making an explicit copy of the url String should fix things. The fact that all this code originated with r74807 makes me pretty sure that this is the cause.

Great, that must be it! Do you want to make the patch?
Comment 12 Antti Koivisto 2011-03-09 15:57:49 PST
*** Bug 54486 has been marked as a duplicate of this bug. ***
Comment 13 Mihai Parparita 2011-03-09 15:58:22 PST
(In reply to comment #11)
> Great, that must be it! Do you want to make the patch?

Yeah, I'll take a shot (and try to reproduce locally too).
Comment 14 Mihai Parparita 2011-03-09 17:06:01 PST
Created attachment 85261 [details]
Patch
Comment 15 Mihai Parparita 2011-03-09 17:08:10 PST
(In reply to comment #13)
> Yeah, I'll take a shot (and try to reproduce locally too).

I didn't have much luck reproducing this locally. MemoryCache::remove calls MemoryCache::evict, which will delete the resource if CachedResource::canDelete returns true, but I couldn't get that configuration to happen.

Since this is a harmless change otherwise (except for the extra copy), I think it's worth trying to land this and see if the crashes on our reliability bot go away.
Comment 16 Mihai Parparita 2011-03-09 18:32:46 PST
Created attachment 85273 [details]
Patch
Comment 17 Tony Gentilcore 2011-03-09 18:42:56 PST
Comment on attachment 85273 [details]
Patch

Up to you, but the style seems to be to drop the const.
Comment 18 Mihai Parparita 2011-03-09 18:43:06 PST
Created attachment 85275 [details]
Patch
Comment 19 Mihai Parparita 2011-03-09 18:53:08 PST
Comment on attachment 85275 [details]
Patch

Accidentally clobbered Tony's r+ with the new patch, will land it by hand.
Comment 20 Mihai Parparita 2011-03-09 18:55:13 PST
Committed r80686: <http://trac.webkit.org/changeset/80686>
Comment 21 Mihai Parparita 2011-03-10 09:12:37 PST
(In reply to comment #20)
> Committed r80686: <http://trac.webkit.org/changeset/80686>

I'm pretty sure that this does fix the problem. Reliability bot runs after that got picked up by a DEPS roll (starting with http://build.chromium.org/p/chromium/builders/Win%20Reliability/builds/2556/steps/reliability%3A%20complete%20result%20of%20previous%20build/logs/stdio, and thus far 6 other runs) no longer show tcmalloc crashes.