RESOLVED FIXED 229461
[WTF] Fix static analyzer warnings about nullptr derefs in StringImpl::copyCharacters() and tryMakeStringFromAdapters()
https://bugs.webkit.org/show_bug.cgi?id=229461
Summary [WTF] Fix static analyzer warnings about nullptr derefs in StringImpl::copyCh...
David Kilzer (:ddkilzer)
Reported 2021-08-24 11:55:20 PDT
Ignore static analyzer warnings about nullptr derefs in WTF::StringImpl::copyCharacters() and WTF::StringTypeAdapter::writeTo(). A nullptr deref here will simply crash, and we don't know of any such issues in practice. Adding a Debug ASSERT() causes the clang static analyzer to assume the `destination` argument is never nullptr. I considered adding something like a CLANG_STATIC_ANLYZER_ASSERT() that's only active on static analyzer runs (not even in Debug builds), but wasn't sure if that's something folks are agreeable to, or whether it's too similar in name to `static_assert()`.
Attachments
Patch v1 (2.53 KB, patch)
2021-08-24 12:08 PDT, David Kilzer (:ddkilzer)
ews-feeder: commit-queue-
Patch v2 (3.69 KB, patch)
2021-08-24 18:06 PDT, David Kilzer (:ddkilzer)
no flags
Patch v3 (20.25 KB, patch)
2021-08-25 17:13 PDT, David Kilzer (:ddkilzer)
ews-feeder: commit-queue-
Patch v4 (20.25 KB, patch)
2021-08-25 17:23 PDT, David Kilzer (:ddkilzer)
ews-feeder: commit-queue-
Patch v5 (20.80 KB, patch)
2021-08-25 17:58 PDT, David Kilzer (:ddkilzer)
no flags
Patch v6 (for test failures, not review) (15.88 KB, patch)
2021-08-26 12:21 PDT, David Kilzer (:ddkilzer)
ews-feeder: commit-queue-
Patch v7 (for testing only) (15.81 KB, patch)
2021-08-26 14:45 PDT, David Kilzer (:ddkilzer)
ews-feeder: commit-queue-
Patch v8 (for testing only) (17.04 KB, patch)
2021-08-26 17:03 PDT, David Kilzer (:ddkilzer)
no flags
Patch v9 (3.14 KB, patch)
2021-08-26 21:00 PDT, David Kilzer (:ddkilzer)
darin: review+
Patch for landing (3.19 KB, patch)
2021-08-28 08:46 PDT, David Kilzer (:ddkilzer)
no flags
Radar WebKit Bug Importer
Comment 1 2021-08-24 12:05:08 PDT
David Kilzer (:ddkilzer)
Comment 2 2021-08-24 12:08:59 PDT
Created attachment 436316 [details] Patch v1
David Kilzer (:ddkilzer)
Comment 3 2021-08-24 12:10:42 PDT
(In reply to David Kilzer (:ddkilzer) from comment #0) > I considered adding something like a CLANG_STATIC_ANLYZER_ASSERT() that's > only active on static analyzer runs (not even in Debug builds), but wasn't > sure if that's something folks are agreeable to, or whether it's too similar > in name to `static_assert()`. Could also do something like this: #if ENABLE(CLANG_STATIC_ANALYZER) ASSERT(destination); #endif My main concern here is slowing down Debug builds unnecessarily to quiet a clang static analyzer warning.
David Kilzer (:ddkilzer)
Comment 4 2021-08-24 18:06:53 PDT
Created attachment 436359 [details] Patch v2
Alexey Proskuryakov
Comment 5 2021-08-24 18:26:25 PDT
> A nullptr deref here will simply crash, and we don't know of any such issues in practice. A nullptr deref means that a compiler is free to optimize away any code that leads to the crash, as it assumes that no one writes programs that have undefined behavior. Perhaps we don't know of crashes because they were optimized away!
Alexey Proskuryakov
Comment 6 2021-08-24 18:28:16 PDT
There must be annotations that can be added to the arguments to move warnings where they belong?
David Kilzer (:ddkilzer)
Comment 7 2021-08-25 07:46:12 PDT
(In reply to Alexey Proskuryakov from comment #6) > There must be annotations that can be added to the arguments to move > warnings where they belong? Yes, I found __attribute((nonnull(...))), which also has the nice side-effect of complaining about use of nullptr at compile-time (if known). <https://clang-analyzer.llvm.org/annotations.html#generic> I suspect we could simply use the `nonnull` attribute on each argument (which is mostly used for Objective-C API at Apple), but I'm not sure if gcc supports it (though gcc does support the longer attribute). Will give it a try.
Darin Adler
Comment 8 2021-08-25 10:23:42 PDT
(In reply to Alexey Proskuryakov from comment #6) > There must be annotations that can be added to the arguments to move > warnings where they belong? This is what I was hoping for.
David Kilzer (:ddkilzer)
Comment 9 2021-08-25 16:25:52 PDT
(In reply to Darin Adler from comment #8) > (In reply to Alexey Proskuryakov from comment #6) > > There must be annotations that can be added to the arguments to move > > warnings where they belong? > > This is what I was hoping for. I have a fix that does not solve this using ASSERT() statements, and I'll post it as "Patch v3", but after implementing this fix, I wonder if I could just create template specializations using the std::is_mull_pointer type trait instead: <https://en.cppreference.com/w/cpp/types/is_null_pointer> Because as far as I can tell, nullptr is used "on purpose" currently, and having a specialization for it would actually be faster when it's used.
Darin Adler
Comment 10 2021-08-25 16:32:02 PDT
(In reply to David Kilzer (:ddkilzer) from comment #9) > Because as far as I can tell, nullptr is used "on purpose" currently, and > having a specialization for it would actually be faster when it's used. I don’t think that’s right. This helps only if the caller is literally using nullptr, not a pointer value that happens to be null. We definitely sometimes pass null and zero length to functions, but I think it’s too many function calls away for is_null_pointer compile time checking to help.
David Kilzer (:ddkilzer)
Comment 11 2021-08-25 16:42:00 PDT
(In reply to Darin Adler from comment #10) > (In reply to David Kilzer (:ddkilzer) from comment #9) > > Because as far as I can tell, nullptr is used "on purpose" currently, and > > having a specialization for it would actually be faster when it's used. > > I don’t think that’s right. This helps only if the caller is literally using > nullptr, not a pointer value that happens to be null. We definitely > sometimes pass null and zero length to functions, but I think it’s too many > function calls away for is_null_pointer compile time checking to help. Thanks--that's what I was thinking when I read the description, but I wasn't sure. I think we are passing nullptr literally in one place, though I haven't checked if std::is_null_pointer would catch this: Source/WTF/wtf/text/StringBuilder.cpp: allocateBuffer<LChar>(static_cast<LChar*>(nullptr), newCapacity); But other times, we're setting a variable to nullptr: Source/WTF/wtf/text/StringImpl.cpp: if (!length) { data = nullptr; return *empty(); } if (!length) { data = nullptr; return Ref<StringImpl>(*empty()); } Source/WTF/wtf/text/StringImpl.h: if (!length) { output = nullptr; return empty(); }
David Kilzer (:ddkilzer)
Comment 12 2021-08-25 17:13:30 PDT
Created attachment 436442 [details] Patch v3
David Kilzer (:ddkilzer)
Comment 13 2021-08-25 17:23:53 PDT
Created attachment 436444 [details] Patch v4
David Kilzer (:ddkilzer)
Comment 14 2021-08-25 17:24:07 PDT
(In reply to David Kilzer (:ddkilzer) from comment #12) > Created attachment 436442 [details] > Patch v3 The NONNULL_ARGS() arguments for WTF::StringImpl::copyCharacters() were off-by-one since this is a static class method.
David Kilzer (:ddkilzer)
Comment 15 2021-08-25 17:32:10 PDT
Comment on attachment 436444 [details] Patch v4 Well this is awkward. Looks like non-Apple compilers think StringImpl::copyCharacters() is static, but Apple's clang compiler thinks it's not static.
David Kilzer (:ddkilzer)
Comment 16 2021-08-25 17:58:15 PDT
Created attachment 436450 [details] Patch v5
Darin Adler
Comment 17 2021-08-25 18:09:52 PDT
Comment on attachment 436450 [details] Patch v5 View in context: https://bugs.webkit.org/attachment.cgi?id=436450&action=review > Source/WTF/wtf/text/StringImpl.h:383 > + WTF_EXPORT_PRIVATE static const char s_emptyCharacters[2]; > + template<typename CharacterType> ALWAYS_INLINE static CharacterType* emptyCharacters() > + { > + static_assert(sizeof(CharacterType) <= sizeof(s_emptyCharacters)); > + return reinterpret_cast<CharacterType*>(const_cast<char(*)[2]>(&s_emptyCharacters)); > + } Given that this is about static analysis, can we do this in a way that has no additional runtime cost in the final binary? A null pointer will be more efficient than fetching a pointer to a global. > Source/WTF/wtf/text/StringImpl.h:1133 > +// The clang compiler treats WTF::StringImpl::copyCharacters() as an instance method > +// when applying __attribute((nonnull(...))), which means the argument counts are > +// off-by-one since it assumes argument 1 is the `this` pointer. What does that even mean? Sounds like a bug. It’s not an instance method. Is this working around a clang bug? If so, can we file a bug report?
David Kilzer (:ddkilzer)
Comment 18 2021-08-25 21:21:03 PDT
Comment on attachment 436450 [details] Patch v5 View in context: https://bugs.webkit.org/attachment.cgi?id=436450&action=review >> Source/WTF/wtf/text/StringImpl.h:383 >> + } > > Given that this is about static analysis, can we do this in a way that has no additional runtime cost in the final binary? A null pointer will be more efficient than fetching a pointer to a global. Returning nullptr here would re-introduce the concern that Alexey had about undefined behavior from Patch v2, so I'm not sure that's an option, or maybe I'm still not clear on why Patch v2 wasn't acceptable. What the clang static analyzer doesn't understand is that whenever nullptr is passed as the `destination` parameter of StringImpl::copyCharacters(), `numCharacters` is always zero, so there is no undefined behavior. (This is what the ASSERT() statement checked.) And if that ever wasn't true, the method would crash, as would the other two methods that had `ASSERT(destination);` added. So I'm really not sure now what undefined behavior was happening that Patch v2 was missing. >> Source/WTF/wtf/text/StringImpl.h:1133 >> +// off-by-one since it assumes argument 1 is the `this` pointer. > > What does that even mean? Sounds like a bug. It’s not an instance method. Is this working around a clang bug? If so, can we file a bug report? Yes, it seems to be a bug in clang. Yes, I will file a bug for it tomorrow. The reason I didn't file a bug for it today was (a) I ran out of time today, and (b) I wanted to get this patch posted to get feedback on it (which was the right decision in hindsight because there were new concerns).
David Kilzer (:ddkilzer)
Comment 19 2021-08-26 07:25:13 PDT
Have an idea for a hybrid approach that would keep the annotations, but get rid of the s_emptyCharacters global variable.
Alexey Proskuryakov
Comment 20 2021-08-26 10:32:46 PDT
> What the clang static analyzer doesn't understand is that whenever nullptr is passed as the `destination` parameter of StringImpl::copyCharacters(), `numCharacters` is always zero, so there is no undefined behavior. (This is what the ASSERT() statement checked.) What are the annotations on C library functions that behave in the same way?
David Kilzer (:ddkilzer)
Comment 21 2021-08-26 12:21:49 PDT
Created attachment 436545 [details] Patch v6 (for test failures, not review) Add some RELEASE_ASSERT() statements to find out if we really do have undefined behavior in layout/api tests.
David Kilzer (:ddkilzer)
Comment 22 2021-08-26 12:25:31 PDT
(In reply to Alexey Proskuryakov from comment #20) > > What the clang static analyzer doesn't understand is that whenever nullptr is passed as the `destination` parameter of StringImpl::copyCharacters(), `numCharacters` is always zero, so there is no undefined behavior. (This is what the ASSERT() statement checked.) > > What are the annotations on C library functions that behave in the same way? I'm not sure what you're asking here. Are you asking if __attribute__((nonnull(...))) (which is what the NONNULL_ARGS() macro expands to) works on both C and C++ functions? If so, the answer is yes, it works on both C and C++ functions. I don't know of a way to annotate (or otherwise inform the clang static analyzer) that when `destination` is nullptr that `numCharacters` is zero, other than via ASSERT().
Darin Adler
Comment 23 2021-08-26 13:55:06 PDT
(In reply to David Kilzer (:ddkilzer) from comment #22) > I'm not sure what you're asking here. I think the question is: I believe memcpy and memmove functions have the same constraint. If you pass them nullptr and 0, all is well. If you pass them nullptr and non-zero then there is a problem. How does the static analyzer handle those?
David Kilzer (:ddkilzer)
Comment 24 2021-08-26 14:41:50 PDT
Comment on attachment 436545 [details] Patch v6 (for test failures, not review) View in context: https://bugs.webkit.org/attachment.cgi?id=436545&action=review > Source/WTF/wtf/text/StringConcatenate.h:436 > template<typename ResultType, typename Adapter> > -inline void stringTypeAdapterAccumulator(ResultType* result, Adapter adapter) > +inline /*NONNULL_ARGS(1)*/ void stringTypeAdapterAccumulator(ResultType* result, Adapter adapter) > { > + RELEASE_ASSERT(result); > adapter.writeTo(result); > } > > template<typename ResultType, typename Adapter, typename... Adapters> > -inline void stringTypeAdapterAccumulator(ResultType* result, Adapter adapter, Adapters ...adapters) > +inline /*NONNULL_ARGS(1)*/ void stringTypeAdapterAccumulator(ResultType* result, Adapter adapter, Adapters ...adapters) > { > + RELEASE_ASSERT(result); > adapter.writeTo(result); > stringTypeAdapterAccumulator(result + adapter.length(), adapters...); > } Adding the RELEASE_ASSERT() here was too aggressive. Some of the writeTo() methods may do okay with nullptr values.
David Kilzer (:ddkilzer)
Comment 25 2021-08-26 14:45:44 PDT
Created attachment 436570 [details] Patch v7 (for testing only) This is the same as Patch v6, but removes the RELEASE_ASSERT() statements from the WTF::stringTypeAdapterAccumulator<>() template functions.
Alexey Proskuryakov
Comment 26 2021-08-26 15:01:46 PDT
> I think the question is: > > I believe memcpy and memmove functions have the same constraint. If you pass > them nullptr and 0, all is well. If you pass them nullptr and non-zero then > there is a problem. How does the static analyzer handle those? This is exactly right. Expanding just a little bit more, 1. How does static analyzer handle code that uses memcpy and memmove. 2. How does static analyzer handle the code of memcpy and memmove themselves, when compiling the standard library.
David Kilzer (:ddkilzer)
Comment 27 2021-08-26 15:10:38 PDT
(In reply to Alexey Proskuryakov from comment #26) > > I think the question is: > > > > I believe memcpy and memmove functions have the same constraint. If you pass > > them nullptr and 0, all is well. If you pass them nullptr and non-zero then > > there is a problem. How does the static analyzer handle those? > > This is exactly right. Expanding just a little bit more, > 1. How does static analyzer handle code that uses memcpy and memmove. Will write some tests and get back here. > 2. How does static analyzer handle the code of memcpy and memmove > themselves, when compiling the standard library. Can you expand on why this is important for fixing these warnings in WebKit? I don't see the (direct) connection here. Are you wanting to know if the static analyzer flags any warnings in the C standard library sources? I need to double-check, but I'm pretty sure memcpy and memmove are implemented in assembly, so the static analyzer wouldn't have any source code to analyze. Also, on Darwin platforms, memcpy is implemented in terms of memmove, so although the manual page for memcpy (`man memcpy`) says overlapping src/dst buffers is undefined behavior, it's actually the well-defined behavior of memmove on Darwin platforms.
Alexey Proskuryakov
Comment 28 2021-08-26 15:49:08 PDT
You said that you did not know how to annotate the "nonnull unless another argument is 0" case. Since it's the same behavior as in standard functions AFAIK, I suggested that you could take a look as standard library code to see how it was done there. Now that I took a look myself, I don't think that it there is an answer. I don't see any annotations in clang, whereas glibc annotates arguments as nonnull regardless (https://code.woboq.org/userspace/glibc/string/string.h.html).
David Kilzer (:ddkilzer)
Comment 29 2021-08-26 17:03:18 PDT
Created attachment 436590 [details] Patch v8 (for testing only) WTF::StringTypeAdapter<WTF::StringImpl*, void>::writeTo<>() calls WTF::StringView::getCharactersWithUpconvert() which calls WTF::StringImpl::copyCharacters(), so these writeTo() methods don't need to assert since they will handle nullptr values.
David Kilzer (:ddkilzer)
Comment 30 2021-08-26 17:41:57 PDT
(In reply to Alexey Proskuryakov from comment #28) > You said that you did not know how to annotate the "nonnull unless another > argument is 0" case. Since it's the same behavior as in standard functions > AFAIK, I suggested that you could take a look as standard library code to > see how it was done there. > > Now that I took a look myself, I don't think that it there is an answer. I > don't see any annotations in clang, whereas glibc annotates arguments as > nonnull regardless > (https://code.woboq.org/userspace/glibc/string/string.h.html). Ah, that was a good idea. Sorry I'm not thinking along the same lines as you. So I have an idea of what the clang static analyzer isn't growing now. Here's an example. In StringBuilder.cpp, we have WTF::StringBuilder::reserveCapacity(). Look at the `if (!m_length)` case: void StringBuilder::reserveCapacity(unsigned newCapacity) { if (hasOverflowed()) return; if (m_buffer) { if (newCapacity > m_buffer->length()) reallocateBuffer(newCapacity); } else { if (newCapacity > m_length) { if (!m_length) allocateBuffer<LChar>(StringImpl::emptyCharacters<LChar>(), newCapacity); else if (m_string.is8Bit()) allocateBuffer<LChar>(m_string.characters8(), newCapacity); else allocateBuffer<UChar>(m_string.characters16(), newCapacity); } } ASSERT(hasOverflowed() || !newCapacity || m_buffer->length() >= newCapacity); } That calls WTF:: StringBuilder::allocateBuffer(): template<typename AllocationCharacterType, typename CurrentCharacterType> void StringBuilder::allocateBuffer(const CurrentCharacterType* currentCharacters, unsigned requiredCapacity) { AllocationCharacterType* bufferCharacters; auto buffer = StringImpl::tryCreateUninitialized(requiredCapacity, bufferCharacters); if (UNLIKELY(!buffer)) { didOverflow(); return; } ASSERT(!hasOverflowed()); StringImpl::copyCharacters(bufferCharacters, currentCharacters, m_length); m_buffer = WTFMove(buffer); m_string = { }; } Which then calls WTF:: StringImpl::copyCharacters() with a proposed ASSERT() added below: template<typename SourceCharacterType, typename DestinationCharacterType> inline void StringImpl::copyCharacters(DestinationCharacterType* destination, const SourceCharacterType* source, unsigned numCharacters) { ASSERT(destination || !numCharacters); // Proposed fix. static_assert(std::is_same_v<SourceCharacterType, LChar> || std::is_same_v<SourceCharacterType, UChar>); static_assert(std::is_same_v<DestinationCharacterType, LChar> || std::is_same_v<DestinationCharacterType, UChar>); if constexpr (std::is_same_v<SourceCharacterType, DestinationCharacterType>) { if (numCharacters == 1) { *destination = *source; return; } memcpy(destination, source, numCharacters * sizeof(DestinationCharacterType)); } else { // FIXME: We should ensure that UChar -> LChar copying happens when UChar only contains Latin-1. // https://bugs.webkit.org/show_bug.cgi?id=205355 for (unsigned i = 0; i < numCharacters; ++i) destination[i] = source[i]; } } So what the clang static analyzer is failing to do is to remember the constraint on `(!m_length)` in WTF::StringBuilder::reserveCapacity() applies to the `numCharacters` argument because WTF:: StringBuilder::allocateBuffer() passed `m_length` as the `numCharacters` parameter. Then by adding the `ASSERT(destination || !numCharacters);` statement above, this tells the clang static analyzer that `destination` must be non-null or `numCharacters` must be zero, and no other states are allowed. So I will file a clang static analyzer radar about the `m_length` constraint issue, but I propose that the `ASSERT(destination || !numCharacters);` is the correct workaround in the meantime, and actually does a nice job of documenting this implicit assumption. Note that the clang static analyzer will ignore calls to memcpy() and memmove() when either src or dst or both nullptr and length is zero: memcpy(nullptr, nullptr, 0); memmove(nullptr, nullptr, 0); It only flags an issue if either src or dst are nullptr and length is non-zero: memcpy(a, nullptr, 1); memmove(a, nullptr, 1); memcpy(nullptr, b, 1); memmove(nullptr, b, 1); memcpy(nullptr, nullptr, 1); memmove(nullptr, nullptr, 1); To analyze a source file, run this on macOS: $ xcrun clang -x c++ -Xclang -analyzer-config -Xclang mode=deep --analyze foo.cpp Going back to WebKit code, the clang static analyzer doesn't know about the `(destination || !numCharacters)` constraint (a bug), it tests all combinations of `destination` (nullptr or not) and `numCharacters` (zero and non-zero), and it thinks there is a scenario where `destination` is nullptr and `numCharacters` is non-zero. (Having said that, I need to rework the patch again starting with the `(destination || !numCharacters)` constraint in WTF::StringImpl::copyCharacters() and then see what falls out of that.
David Kilzer (:ddkilzer)
Comment 31 2021-08-26 17:42:42 PDT
> So I have an idea of what the clang static analyzer isn't growing now. * isn't _DOING_ now.
David Kilzer (:ddkilzer)
Comment 32 2021-08-26 21:00:48 PDT
Created attachment 436608 [details] Patch v9
David Kilzer (:ddkilzer)
Comment 33 2021-08-27 08:14:04 PDT
Comment on attachment 436608 [details] Patch v9 View in context: https://bugs.webkit.org/attachment.cgi?id=436608&action=review > Source/WTF/wtf/text/StringConcatenate.h:446 > - stringTypeAdapterAccumulator(buffer, adapter, adapters...); > + if (buffer) > + stringTypeAdapterAccumulator(buffer, adapter, adapters...); Could also replace this with an `ASSERT(destination)` in WTF::stringTypeAdapterAccumulator<>() methods if there are performance concerns. > Source/WTF/wtf/text/StringConcatenate.h:457 > - stringTypeAdapterAccumulator(buffer, adapter, adapters...); > + if (buffer) > + stringTypeAdapterAccumulator(buffer, adapter, adapters...); Ditto.
Alexey Proskuryakov
Comment 34 2021-08-27 17:00:53 PDT
I don't expect to have any further comments on this patch.
David Kilzer (:ddkilzer)
Comment 35 2021-08-28 08:46:30 PDT
Created attachment 436714 [details] Patch for landing
David Kilzer (:ddkilzer)
Comment 36 2021-08-28 08:49:40 PDT
(In reply to David Kilzer (:ddkilzer) from comment #30) > So I will file a clang static analyzer radar about the `m_length` constraint > issue, but I propose that the `ASSERT(destination || !numCharacters);` is > the correct workaround in the meantime, and actually does a nice job of > documenting this implicit assumption. I filed <rdar://problem/82475719> to track the clang static analyzer bug.
EWS
Comment 37 2021-08-28 09:48:23 PDT
Committed r281735 (241077@main): <https://commits.webkit.org/241077@main> All reviewed patches have been landed. Closing bug and clearing flags on attachment 436714 [details].
David Kilzer (:ddkilzer)
Comment 38 2021-08-28 10:39:03 PDT
(In reply to David Kilzer (:ddkilzer) from comment #18) > Comment on attachment 436450 [details] > Patch v5 > > View in context: > https://bugs.webkit.org/attachment.cgi?id=436450&action=review > > >> Source/WTF/wtf/text/StringImpl.h:1133 > >> +// off-by-one since it assumes argument 1 is the `this` pointer. > > > > What does that even mean? Sounds like a bug. It’s not an instance method. Is this working around a clang bug? If so, can we file a bug report? > > Yes, it seems to be a bug in clang. Yes, I will file a bug for it tomorrow. I filed <rdar://problem/82477229> to track this issue in clang. (Note that I didn't have to use a workaround for this bug since I didn't use the attribute in the final patch.)
Note You need to log in before you can comment on or make changes to this bug.