Bug 112873 - Remove ValueCheck system
Summary: Remove ValueCheck system
Status: RESOLVED LATER
Alias: None
Product: WebKit
Classification: Unclassified
Component: Web Template Framework (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Eric Seidel (no email)
URL:
Keywords:
Depends on:
Blocks: 112831
  Show dependency treegraph
 
Reported: 2013-03-20 21:33 PDT by Eric Seidel (no email)
Modified: 2013-11-05 08:52 PST (History)
21 users (show)

See Also:


Attachments
Patch (11.54 KB, patch)
2013-03-20 21:36 PDT, Eric Seidel (no email)
no flags Details | Formatted Diff | Diff
Patch (14.92 KB, patch)
2013-03-20 23:47 PDT, Eric Seidel (no email)
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Eric Seidel (no email) 2013-03-20 21:33:30 PDT
Remove ValueCheck system
Comment 1 Eric Seidel (no email) 2013-03-20 21:36:39 PDT
Created attachment 194181 [details]
Patch
Comment 2 Eric Seidel (no email) 2013-03-20 21:39:19 PDT
This came from bug http://trac.webkit.org/changeset/54618.  It was a good dream, but it doesn't seem to have yielded much, and I think should now just be removed.

For bug 112831, we could easily add a ValueCheck<StringImpl*> override which avoids this fastMallocSize() check, but given that fastMalloc isn't even used everywhere, it seems this code isn't buying us much anymore.
Comment 3 Eric Seidel (no email) 2013-03-20 21:40:18 PDT
I've CC'd the original author (ggaren) and reviewer (ap) and our Security peeps in case they have opinions on the existence of ValueCheck.
Comment 4 Early Warning System Bot 2013-03-20 21:47:52 PDT
Comment on attachment 194181 [details]
Patch

Attachment 194181 [details] did not pass qt-ews (qt):
Output: http://webkit-commit-queue.appspot.com/results/17226348
Comment 5 Early Warning System Bot 2013-03-20 21:51:05 PDT
Comment on attachment 194181 [details]
Patch

Attachment 194181 [details] did not pass qt-wk2-ews (qt):
Output: http://webkit-commit-queue.appspot.com/results/17241078
Comment 6 WebKit Review Bot 2013-03-20 22:01:13 PDT
Comment on attachment 194181 [details]
Patch

Attachment 194181 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://webkit-commit-queue.appspot.com/results/17223427
Comment 7 EFL EWS Bot 2013-03-20 22:18:13 PDT
Comment on attachment 194181 [details]
Patch

Attachment 194181 [details] did not pass efl-ews (efl):
Output: http://webkit-commit-queue.appspot.com/results/17290035
Comment 8 WebKit Review Bot 2013-03-20 22:22:49 PDT
Comment on attachment 194181 [details]
Patch

Attachment 194181 [details] did not pass cr-linux-debug-ews (chromium-xvfb):
Output: http://webkit-commit-queue.appspot.com/results/17112592
Comment 9 Build Bot 2013-03-20 22:23:31 PDT
Comment on attachment 194181 [details]
Patch

Attachment 194181 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-commit-queue.appspot.com/results/17213788
Comment 10 Alexey Proskuryakov 2013-03-20 23:04:02 PDT
> it doesn't seem to have yielded much

Huh?
Comment 11 Alexey Proskuryakov 2013-03-20 23:21:45 PDT
Comment on attachment 194181 [details]
Patch

The work in bug 112831 is not nearly useful enough to validate removing security checks that proved helpful in the past. Is there a stronger reason to do this?

I'm not sure what you mean when saying that fastMalloc isn't used everywhere. Are you talking about ports using a malloc that doesn't have introspection capabilities? Or about some parts of WebCore?

I find it unimpressive that the patch got immediately cq+'ed without any discussion outside business hours.
Comment 12 Eric Seidel (no email) 2013-03-20 23:47:32 PDT
Created attachment 194193 [details]
Patch
Comment 13 Eric Seidel (no email) 2013-03-20 23:50:06 PDT
I've written a better ChangeLog and updated the patch so that it should compile on all ports.  I'm in no rush to take this patch anywhere. :)  I've CC'd all the folks who have touched this code, and they should feel encouraged to read the patch, consider and comment.  Thanks.
Comment 14 Maciej Stachowiak 2013-03-21 00:03:33 PDT
What's the purpose of ValueCheck and what is the reason to remove it? Might be worth a discussion on webkit-dev if we don't have agreement on its value.
Comment 15 Alexey Proskuryakov 2013-03-21 00:25:18 PDT
This is a technique to catch some hard to reproduce crashes with strong security implications, added in bug 34150, and later extended in bug 34490. Quoting from the former,

------------------------
Having e.g. HashMap<AtomicStringImpl*, ValueType>, it is extremely difficult to catch stale AtomicStringImpl pointers in map keys. In most cases, they can just sit there making the map larger, and only cause issues if another AtomicStringImpl gets allocated at exactly the same address.

Even running with CHECK_HASHTABLE_CONSISTENCY won't catch this, since pointer's hash is just its value, so the table remains perfectly consistent. And we (I?) don't usually enable CHECK_HASHTABLE_CONSISTENCY, even when hunting down mystery crashers.
------------------------

This was prompted by an investigation of a very tricky bug we had at the time (I can't easily find it) - we even knew a specific page where this frequently happened, but could not reproduce without these assertions. That also proved to be a security bug, later exploited.

These assertions don't catch bugs very often AFAIK, but I have a recollection of them catching something similarly bad once or twice later.
Comment 16 Adam Barth 2013-03-21 00:33:58 PDT
These ASSERTs are just plain wrong for ports that set ENABLE_GLOBAL_FASTMALLOC_NEW to 0 and USE_SYSTEM_MALLOC to 0.  There's no reason to believe that a given pointer was allocated with fastMalloc.  It depends on whether the class in question contains WTF_MAKE_FAST_ALLOCATED.

Even in ports that define ENABLE_GLOBAL_FASTMALLOC_NEW to 1, the ASSERTs aren't always correct.  For example, they're wrong for ImageLoader, which is why it needs a template specialization of ValueCheck.

Finally, there's no requirement that any given object is allocated with malloc at all.  As shown in bug 112831, even objects that are normally heap allocated can be allocated in other ways, such as statically.
Comment 17 Eric Seidel (no email) 2013-03-21 00:41:50 PDT
(In reply to comment #14)
> What's the purpose of ValueCheck and what is the reason to remove it? Might be worth a discussion on webkit-dev if we don't have agreement on its value.

I recommend reading the ChangeLog in attachment 194913 [details].  I've also cc'd the author/reviewer (as noted in comment #3), who can offer validation/rejection of my understanding as noted in the ChangeLog.
Comment 18 Alexey Proskuryakov 2013-03-21 01:12:24 PDT
Adam, there is nothing in these checks that is FastMalloc specific. We even have an implementation that uses system malloc_size today. But even if the checks were FastMalloc specific, that wouldn't be an acceptable reason to remove them.

These checks do not deal with arbitrary void* pointers, they are always typed. So it's not meaningful to say that their validity depends on whether a given class uses WTF_MAKE_FAST_ALLOCATED.

You say that there is no requirement on how objects are allocated, and that is true. However, writing code that is not allowed to have any prerequisites and may not assert those is not practical for a project of this size. We should be making the code more rigid and predictable, not less.

The example with ImageLoader shows that this does not even necessarily add limitations on what can be done.
Comment 19 Maciej Stachowiak 2013-03-21 01:26:20 PDT
(In reply to comment #18)
> Adam, there is nothing in these checks that is FastMalloc specific. We even have an implementation that uses system malloc_size today. 

For reference:

size_t fastMallocSize(const void* p)
{
#if ENABLE(WTF_MALLOC_VALIDATION)
    return Internal::fastMallocValidationHeader(const_cast<void*>(p))->m_size;
#elif OS(DARWIN)
    return malloc_size(p);
#elif OS(WINDOWS)
    return _msize(const_cast<void*>(p));
#else
    UNUSED_PARAM(p);
    return 1;
#endif
}
Comment 20 Maciej Stachowiak 2013-03-21 01:28:37 PDT
Comment on attachment 194193 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=194193&action=review

> Source/WTF/ChangeLog:71
> +        * GNUmakefile.list.am:
> +        * WTF.gypi:
> +        * WTF.pro:
> +        * WTF.vcproj/WTF.vcproj:
> +        * WTF.vcxproj/WTF.vcxproj:
> +        * WTF.xcodeproj/project.pbxproj:
> +        * wtf/CMakeLists.txt:
> +        * wtf/HashTable.h:
> +        (WTF::::checkTableConsistencyExceptSize):
> +        * wtf/Vector.h:
> +        (Vector):
> +        (WTF):

Is there a reason this patch doesn't remove ValueCheck.h?
Comment 21 Eric Seidel (no email) 2013-03-21 01:31:01 PDT
(In reply to comment #20)
> (From update of attachment 194193 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=194193&action=review
>
> Is there a reason this patch doesn't remove ValueCheck.h?

Just an oversight on my part.
Comment 22 Eric Seidel (no email) 2013-03-21 09:56:42 PDT
(In reply to comment #19)
> (In reply to comment #18)
> > Adam, there is nothing in these checks that is FastMalloc specific. We even have an implementation that uses system malloc_size today. 
> 
> For reference:
> 
> size_t fastMallocSize(const void* p)
> {
> #if ENABLE(WTF_MALLOC_VALIDATION)
>     return Internal::fastMallocValidationHeader(const_cast<void*>(p))->m_size;
> #elif OS(DARWIN)
>     return malloc_size(p);
> #elif OS(WINDOWS)
>     return _msize(const_cast<void*>(p));
> #else
>     UNUSED_PARAM(p);
>     return 1;
> #endif
> }

It appears the windows branch doesn't actually do what you want:
http://msdn.microsoft.com/en-us/library/z2s077bc(v=vs.80).aspx

Will return -1 on error, which means the ASSERT(fastMallocSize(p) in ValueCheck<T> will not fire.

The only port which seems to enable WTF_MALLOC_VALIDATION is Blackberry:
https://code.google.com/p/chromium/codesearch#search/&q=WTF_MALLOC_VALIDATION&sq=package:chromium&type=cs

So it appears that the only builds with a working version of this ASSERT are DARWIN builds, and possibly Blackberry. :)
Comment 23 Adam Barth 2013-03-21 13:20:30 PDT
The code is also plain wrong for the ENABLE_GLOBAL_FASTMALLOC_NEW to 0 and USE_SYSTEM_MALLOC to 0 configuration.  In that configuration, we use both the system heap and the FastMalloc heap.  This code has no way to know which pointer the heap was allocated from (assuming it was even allocated from a heap at all).
Comment 24 Alexey Proskuryakov 2013-03-21 13:27:42 PDT
These are all good opportunities to make the checks more universal when someone needs to perform this sort of investigation in such configurations.
Comment 25 Eric Seidel (no email) 2013-03-21 14:01:54 PDT
Another solution here would be to just #ifdef all this code as OS(DARWIN) since that's the only platform it works on. :)

But I think the better solution is to just remove it.

I'm not super passionate about this bug, nor am I in any hurry to land it.  I worked around the issue blocking bug 112831.  But I do believe removing this code is the right path forward for the project.
Comment 26 Maciej Stachowiak 2013-03-21 16:05:04 PDT
(In reply to comment #25)
> Another solution here would be to just #ifdef all this code as OS(DARWIN) since that's the only platform it works on. :)
> 
> But I think the better solution is to just remove it.
> 
> I'm not super passionate about this bug, nor am I in any hurry to land it.  I worked around the issue blocking bug 112831.  But I do believe removing this code is the right path forward for the project.

It seems like there is at least one bug that actually happened which would have been caught by the checkConsistency ASSERTs, and it seems like fixing the crash only became tractable and testable using ValueCheck.

https://bugs.webkit.org/show_bug.cgi?id=34490

That seems like  better track record than most ASSERTs in the project. Removing it would materially reduce our test coverage, it seems to me.

It seems to me that making it apply correctly to other malloc setups (the cases Adam cited are both eminently fixable) would be better than removing it. 

What's your reason for thinking it should be removed despite that?
Comment 27 Eric Seidel (no email) 2013-03-21 20:18:57 PDT
(In reply to comment #26)
> (In reply to comment #25)
> What's your reason for thinking it should be removed despite that?

I guess my understanding of ASSERTs is that they're supposed to document/enforce conditions that we never believe should occur.

ValueCheck, as written, enforces a condition which by chance (and only on DARWIN) happens to occur much of the time, but by no means always.

One could re-write ValueCheck to only ASSERT(fastMallocSize(p)) when p is of a class type with MAKE_FAST_ALLOCATED() defined.  But since MAKE_FAST_ALLOCATED currently only adds new/delete overrides, it's not immediately clear to me how we'd do that.

Certainly extending the current opt-out solution to all classes for which fastMallocSize(p) is not valid seems much more trouble than it's worth.  Leaving an ASSERT in the code base which is not true (and in fact we know to be false for many of our classes) seems like a bad idea. :)  It's just an accident waiting for the next contributor to run into.
Comment 28 Maciej Stachowiak 2013-03-21 20:51:17 PDT
(In reply to comment #27)
> (In reply to comment #26)
> > (In reply to comment #25)
> > What's your reason for thinking it should be removed despite that?
> 
> I guess my understanding of ASSERTs is that they're supposed to document/enforce conditions that we never believe should occur.
> 
> ValueCheck, as written, enforces a condition which by chance (and only on DARWIN) happens to occur much of the time, but by no means always.
> 
> One could re-write ValueCheck to only ASSERT(fastMallocSize(p)) when p is of a class type with MAKE_FAST_ALLOCATED() defined.  But since MAKE_FAST_ALLOCATED currently only adds new/delete overrides, it's not immediately clear to me how we'd do that.
> 
> Certainly extending the current opt-out solution to all classes for which fastMallocSize(p) is not valid seems much more trouble than it's worth.  Leaving an ASSERT in the code base which is not true (and in fact we know to be false for many of our classes) seems like a bad idea. :)  It's just an accident waiting for the next contributor to run into.

Here's the cases I can think of where it could be wrong:
(a) Raw pointers to statically allocated values
(b) Raw pointers to values allocated on the stack
(c) Raw pointers to values that are data members of other objects
(d) Raw pointers to values allocated in space that is obtained directly from the VM, rather than from malloc, such as JavaScript GC memory.

Such objects would cause assertion failures when used in a HashSet or as HashMap keys. Other than (b), I'd expect all cases to occur with some regularity Since there are so few ValueCheck specializations, I would guess we only do not see more problems from this because many types are not used as HashTable keys at all. Or maybe we generally do not often use raw pointers as keys.

In any case, it seems like the actually difficulties in practice have been low, even though in theory they could be high. Or at least, I don't recall a lot of other instances of ValueCheck false positives.

I wonder if there is another way to do this that requires more explicit choice. For example, whenever a raw pointer type is used as a HashTable key, we could require a ValueCheck specialization specifically for that type. Then you'd never get it by accident and could think about the allocation policy for your object.

Another thought: perhaps we could enhance FastMalloc to be able to directly answer the question: "is this a pointer to freed memory" rather than "is this a known valid malloc-allocated pointer". That would bias things the other way for objects that are not malloc-allocated at all.

I'm curious what Alexey's further thoughts are on this topic.
Comment 29 Maciej Stachowiak 2013-03-21 22:01:34 PDT
So I looked closer into how fastMallocSize is implemented:

In the WTF_MALLOC_VALIDATION case, fastMallocSize() doesn't actually check if the pointer is known to fastMalloc - it will return a random value for a pointer allocated otherwise than by FastMalloc, it seems to me. 

In the non-WTF_MALLOC_VALIDATION case, it looks like fastMallocSize() will always return zero, because malloc_size will call through to FastMallocZone::size(), which always returns 0.

It does not look like WTF_MALLOC_VALIDATION is enabled by default. So I'm not sure how this code ever works. Clearly the assert isn't firing all the time, so am I misreading something here?
Comment 30 Alexey Proskuryakov 2013-03-21 23:12:43 PDT
> I'm curious what Alexey's further thoughts are on this topic.

Checking how checkConsistency is used in WebCore now, I'm seeing these cases:

HashMap<AtomicStringImpl*, OwnPtr<RadioButtonGroup> >
HashMap<AtomicStringImpl*, RefPtr<KeyframeAnimation> >
HashMap<AtomicStringImpl*, Element*>
Vector<T*>, where T is template parameter for EventSender (HTMLLinkElement, HTMLStyleElement, ImageLoader)
HashMap<AtomicStringImpl*, AtomicStringImpl*>
HashMap<String, HostInformation*, StringHash>
HashMap<AtomicStringImpl*, RefPtr<StyleRuleKeyframes> >

All these cases worked reliably for years until yesterday, because StringImpl was always heap allocated, and so were elements. Objects that can be allocated in other ways just need different ValueCheck specializations, as demonstrated with ImageLoader.

This is just how template specializations work. As another example, HashTraits is not a flawed idea because the default version does not work with every conceivable type. We add a new specialization when we want to put new types into HashTable. The situation is no different with ValueCheck.

From the list above, it's clear that changes made in bug 112831 disable almost all checks that we have in WebCore, except for EventSender ones.

I think that the responsibility to update template specializations to keep working is on people who make changes. If StringImpl now gets allocated in new ways, ValueCheck should be changed accordingly.

It would be really good to have this addressed soon, before rolling out the changes in bug 112831 becomes difficult due to source conflicts.

> Clearly the assert isn't firing all the time, so am I misreading something here?

The only case where these checks are made is in debug builds, which use system malloc, and thus malloc_size and _msize work as expected.

I agree with you that fastMallocSize behavior in release builds seems broken whether WTF_MALLOC_VALIDATION is enabled or not. The code has been deeply refactored since I last looked at it, but these issues would only surface if anyone enables assertions in builds that use FastMalloc, so in practice, this is orthogonal to ValueCheck behavior.

As Eric mentioned, the Windows case also looks suspicious. I think that this concern was possibly based on a misreading of MSDN though - the only place where it mentions -1 is when the argument is null, not when it's not a pointer to an allocated block. I think that I actually confirmed correctness by manual testing back in the day.
Comment 31 Alexey Proskuryakov 2013-03-21 23:18:19 PDT
One thing that could be easy to overlook is that ValueCheck checks do not automatically happen in all WTF containers, they are always manual. They are also not under CHECK_HASHTABLE_CONSISTENCY ifdef.

In retrospect, it was probably a poor decision to make the names so similar to CHECK_HASHTABLE_CONSISTENCY ones.
Comment 32 Maciej Stachowiak 2013-03-21 23:29:09 PDT
(In reply to comment #31)
> One thing that could be easy to overlook is that ValueCheck checks do not automatically happen in all WTF containers, they are always manual. They are also not under CHECK_HASHTABLE_CONSISTENCY ifdef.
> 
> In retrospect, it was probably a poor decision to make the names so similar to CHECK_HASHTABLE_CONSISTENCY ones.

I see - if it's opt-in per hashtable, then only a subset of types need a specialization. It sounds like it will be possible to fix the StringImpl/AtomicStringImpl cases once the new statically allocated StringImpl's return true from isStatic(). Not sure why that wasn't done in the first patch, but it sounds like it will be fixed.
Comment 33 Eric Seidel (no email) 2013-03-21 23:37:26 PDT
I disagree with the utility of this code, but I'm not particularly interested in pushing this change.  Someone else can fight this fight a later time.
Comment 34 Eric Seidel (no email) 2013-03-21 23:38:15 PDT
"take up this cause" might have been more appropriate than "fight this fight" as a phrasing. :)
Comment 35 Alexey Proskuryakov 2013-03-22 00:01:25 PDT
I filed bug 113011 to restore the functionality.
Comment 36 Csaba Osztrogonác 2013-11-05 08:52:53 PST
Comment on attachment 194193 [details]
Patch

Cleared review? from attachment 194193 [details] so that this bug does not appear in http://webkit.org/pending-review.  If you would like this patch reviewed, please attach it to a new bug (or re-open this bug before marking it for review again).