Summary: | Asan false positive: stack use after scope under WebCore::ApplyPropertyBorderImageModifier in WebCore::Length::Length(WebCore::Length&&) | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Michael Catanzaro <mcatanzaro> | ||||||||||
Component: | WebCore Misc. | Assignee: | Michael Catanzaro <mcatanzaro> | ||||||||||
Status: | RESOLVED FIXED | ||||||||||||
Severity: | Normal | CC: | aboya, ap, bfulgham, darin, ddkilzer, koivisto, mcatanzaro, oliver, product-security, simon.fraser, webkit-bug-importer, zalan, zan | ||||||||||
Priority: | P2 | Keywords: | InRadar | ||||||||||
Version: | WebKit Nightly Build | ||||||||||||
Hardware: | Unspecified | ||||||||||||
OS: | Unspecified | ||||||||||||
Attachments: |
|
Description
Michael Catanzaro
2018-06-24 11:57:25 PDT
BTW, a workaround is to just remove the Length(Length&&) constructor. Since it's implemented with a copy anyway, it's not any more efficient than the copy constructor, so that might even be a reasonable solution. Could you add the "about:overview" page contents that generated this crash? I'd like to confirm we don't have a similar issue in iOS/macOS. (In reply to Brent Fulgham from comment #3) > Could you add the "about:overview" page contents that generated this crash? > I'd like to confirm we don't have a similar issue in iOS/macOS. I'm afraid it depends on local files and Epiphany-specific JavaScript, but no worries: I can reproduce the same crash on both apple.com and google.com (simpler). Just loading either page with asan enabled should suffice. There's no platform-specific code anywhere in the trace (which ends at the llint boundary) so it's very likely a cross-platform problem. (In reply to Michael Catanzaro from comment #2) > BTW, a workaround is to just remove the Length(Length&&) constructor. Since > it's implemented with a copy anyway, it's not any more efficient than the > copy constructor, so that might even be a reasonable solution. Still, I wish I understood what's going wrong. (In reply to Michael Catanzaro from comment #2) > BTW, a workaround is to just remove the Length(Length&&) constructor. Since > it's implemented with a copy anyway, it's not any more efficient than the > copy constructor, so that might even be a reasonable solution. Actually that's a lie, that just shifts the error to the copy constructor. I must have messed up somehow when I tried that yesterday. (In reply to Michael Catanzaro from comment #0) > Note that the memcpy in the Length::Length(Length&& other) constructor (and > the other constructors, and the assignment operators) is at best highly > questionable, since Length is not a trivially copyable type. I hope that's > not causing problems here. The error still occurs when replacing the memcpy with manual assignments, so that's not it. I found in the GCC 7 release notes, which added -fsanitize-address-use-after-scope: """ Compared to the LLVM compiler, where the option already exists, the implementation in the GCC compiler has some improvements and advantages: [...] C++ temporaries are sanitized. """ So if Clang's asan is unable to detect this issue, that's probably why. I'm really having trouble with this one. Here is some debug: Length::Length(Length&& other) { WTFLogAlways("%s: other=%p", __PRETTY_FUNCTION__, &other); WTFLogAlways("other.m_hasQuirk=%d", other.m_hasQuirk); WTFLogAlways("other.m_type=%d", other.m_type); WTFLogAlways("other.m_isFloat=%d", other.m_isFloat); WTFLogAlways("this=%p", this); <-- Last statement printed WTFLogAlways("m_hasQuirk=%d", m_hasQuirk); <-- Crash occurs here WTFLogAlways("m_type=%d", m_type); WTFLogAlways("m_isFloat=%d", m_isFloat); memcpy(this, &other, sizeof(Length)); other.m_type = Auto; } It seems that using the values of other is perfectly fine, but not the data members of this, even though they should have just been initialized by the constructor, which is crazy. *** Bug 187122 has been marked as a duplicate of this bug. *** (In reply to Michael Catanzaro from comment #9) > Length::Length(Length&& other) > { > WTFLogAlways("%s: other=%p", __PRETTY_FUNCTION__, &other); > > WTFLogAlways("other.m_hasQuirk=%d", other.m_hasQuirk); > WTFLogAlways("other.m_type=%d", other.m_type); > WTFLogAlways("other.m_isFloat=%d", other.m_isFloat); > > WTFLogAlways("this=%p", this); <-- Last statement printed > WTFLogAlways("m_hasQuirk=%d", m_hasQuirk); <-- Crash occurs here > WTFLogAlways("m_type=%d", m_type); > WTFLogAlways("m_isFloat=%d", m_isFloat); > > memcpy(this, &other, sizeof(Length)); > other.m_type = Auto; > } > > It seems that using the values of other is perfectly fine, but not the data > members of this, even though they should have just been initialized by the > constructor, which is crazy. It’s not correct to say that these data members "should have just been initialized by the constructor". The function here *is* the constructor and at the point of the log statement m_hasQuirk has *not* been initialized yet. It’s correct for ASan to object to *reading* this->m_hasQuirk before any code has *written* to it. I don’t know how this logging relates to the bug, though. Just a guess, but I think the issue might possibly be that ASan doesn’t understand the union in Length. This Length move constructor needs to copy m_intValue, but m_floatValue is not initialized. The code, however, tries to move the entire object. ASan gets upset that we are moving some uninitialized bugs from one object to another, and doesn’t know that those uninitialized bytes won’t be looked at because of m_isFloat. Here’s a way to test this theory. We can write the following version of the move constructor: if (other.m_isFloat) m_floatValue = other.m_floatValue; else m_intValue = other.m_intValue; m_hasQuirk = other.m_hasQuirk; m_type = other.m_type; m_isFloat = other.m_isFloat; other.m_type = Auto; If this version makes the ASan complaints go away, then it’s a false positive and we can think about how to make ASan happy instead of worrying that it’s detecting a bug. (In reply to Darin Adler from comment #11) > It’s not correct to say that these data members "should have just been > initialized by the constructor". The function here *is* the constructor and > at the point of the log statement m_hasQuirk has *not* been initialized yet. But we are past the initializer list at this point, right? So everything should be initialized to default values already? I think it will still crash if I replace the read with a write to any member of this. I tested this yesterday; I'll do so again now to be completely sure. (In reply to Darin Adler from comment #12) > Here’s a way to test this theory. We can write the following version of the > move constructor: > > if (other.m_isFloat) > m_floatValue = other.m_floatValue; > else > m_intValue = other.m_intValue; > m_hasQuirk = other.m_hasQuirk; > m_type = other.m_type; > m_isFloat = other.m_isFloat; > other.m_type = Auto; I tried almost exactly this yesterday (the only difference being I also tested m_type and copied m_calculationValueHandle instead if it was Calculated, but this problem does not involved calculated Lengths, so that should not matter). Unfortunately, that did not placate asan. It crashed as soon as I tried assigning to the members of this. That doesn't mean you are wrong to suspect the union; it could still be involved somehow, perhaps. I'll test this now, just to be certain that I didn't screw up or misremember my testing from yesterday: Length::Length(Length&& other) { if (other.isCalculated()) m_calculationValueHandle = other.m_calculationValueHandle; else if (other.m_isFloat) m_floatValue = other.m_floatValue; else m_intValue = other.m_intValue; // <-- pretty sure it will crash here m_hasQuirk = other.m_hasQuirk; m_type = other.m_type; m_isFloat = other.m_isFloat; other.m_type = Auto; } (In reply to Darin Adler from comment #11) > It’s not correct to say that these data members "should have just been > initialized by the constructor". The function here *is* the constructor and > at the point of the log statement m_hasQuirk has *not* been initialized yet. > > It’s correct for ASan to object to *reading* this->m_hasQuirk before any > code has *written* to it. > > I don’t know how this logging relates to the bug, though. Alicia (CCed) has convinced me that you're right about this, because C++ does not initialize primitive types to default values. Exactly. Default initialization <https://en.cppreference.com/w/cpp/language/default_initialization> is done before this point, but a careful reading will show that such initialization doesn’t initialize all data members. The terminology used is "initialized to indeterminate values", in other words, not initialized. (In reply to Darin Adler from comment #12) > uninitialized bugs Heh, "uninitialized bugs", "uninitialized bytes", whichever you prefer. (In reply to Michael Catanzaro from comment #14) > I'll test this now, just to be certain that I didn't screw up or misremember > my testing from yesterday: > > Length::Length(Length&& other) > { > if (other.isCalculated()) > m_calculationValueHandle = other.m_calculationValueHandle; > else if (other.m_isFloat) > m_floatValue = other.m_floatValue; > else > m_intValue = other.m_intValue; // <-- pretty sure it will crash here > > m_hasQuirk = other.m_hasQuirk; > m_type = other.m_type; > m_isFloat = other.m_isFloat; > other.m_type = Auto; > } The result is as expected, the stack-use-after-free crash moves to the line where m_intValue is initialized. Super weird. Created attachment 343849 [details]
move-code-outside.patch
This bug is super weird. These kind of errors usually derive from taking the address of a defunt temporary but I see no obvious instance of that.
Look at this patch, it makes the error go away just by moving the conflictive code to a separate function, but does exactly the same: same allocations, same copies, same method calls... I'm starting to wonder if the compiler may be calculating the wrong frame size for applyInitialValue() or something of the likes.
gcc (GCC) 8.1.1 20180502 (Red Hat 8.1.1-1)
If the error is happening when reading from an address, we should figure out exactly which address it is. I would expect it to be a stack location from inside the call of applyInitialValue. Is the stack value consistent with this assumption? We have the stack frame addresses of the various functions in the backtrace to check against. Can we make this failure happen inside a debugger? I think it’s really relevant to poke around and see exactly what memory we are talking about. Some more ideas for debugging: We need to verify that it’s the *read* of other.m_intValue that is the issue. If optimization is turned off then I think we should rearrange this move constructor so that other.m_hasQuirk, other.m_type, and other.m_isFloat are all read first to verify that it’s specifically other.m_intValue that it’s complaining about, not the entire "other" object. Then drill into why other.m_intValue is supposed to be initialized and not invalidated. I'm going to fire up gdb in a second. First, a couple more notes. Alicia discovered what we believe to be a subtle bug in the RectEdges constructor: --- a/Source/WebCore/platform/RectEdges.h +++ b/Source/WebCore/platform/RectEdges.h @@ -36,7 +36,7 @@ public: template<typename U> RectEdges(U&& top, U&& right, U&& bottom, U&& left) - : m_sides({ { std::forward<T>(top), std::forward<T>(right), std::forward<T>(bottom), std::forward<T>(left) } }) + : m_sides({ { std::forward<U>(top), std::forward<U>(right), std::forward<U>(bottom), std::forward<U>(left) } }) { } T& at(PhysicalBoxSide side) { return m_sides[static_cast<size_t>(side)]; } (The template type is wrong.) And then I found some misplaced WTFMoves (moving from copies rather than rvalue references). diff --git a/Source/WebCore/rendering/style/NinePieceImage.h b/Source/WebCore/rendering/style/NinePieceIm age.h index d5465e94e92..ecef25095f8 100644 --- a/Source/WebCore/rendering/style/NinePieceImage.h +++ b/Source/WebCore/rendering/style/NinePieceImage.h @@ -111,16 +111,16 @@ public: void setImage(RefPtr<StyleImage>&& image) { m_data.access().image = WTFMove(image); } const LengthBox& imageSlices() const { return m_data->imageSlices; } - void setImageSlices(LengthBox slices) { m_data.access().imageSlices = WTFMove(slices); } + void setImageSlices(LengthBox slices) { m_data.access().imageSlices = slices; } bool fill() const { return m_data->fill; } void setFill(bool fill) { m_data.access().fill = fill; } const LengthBox& borderSlices() const { return m_data->borderSlices; } - void setBorderSlices(LengthBox slices) { m_data.access().borderSlices = WTFMove(slices); } + void setBorderSlices(LengthBox slices) { m_data.access().borderSlices = slices; } const LengthBox& outset() const { return m_data->outset; } - void setOutset(LengthBox outset) { m_data.access().outset = WTFMove(outset); } + void setOutset(LengthBox outset) { m_data.access().outset = outset; } ENinePieceImageRule horizontalRule() const { return static_cast<ENinePieceImageRule>(m_data->horizontalRule); } void setHorizontalRule(ENinePieceImageRule rule) { m_data.access().horizontalRule = rule; } ASan-generated code in ApplyPropertyBorderImageModifier<>::applyInitialValue() allocates the shadow memory necessary to accompany the stack in this method. In this method, that's 52 bytes. This is how this memory looks at the start: (gdb) x /13x $r12 + 0x7fff8000 0x10006c7aec02: 0x00000000 0x00000000 0x00000000 0x00000000 0x10006c7aec12: 0x00000000 0x00000000 0x00000000 0x00000000 0x10006c7aec22: 0x00000000 0x00000000 0x00000000 0xf8f8f8f8 0x10006c7aec32: 0x00000000 There's already the 0xf8f8f8f8 value there. I don't know how ASan exactly allocates this, but I'm pretty sure it expects that the memory is zeroed out. Here's how the shadow memory is initialized shortly after allocation: 0x00007f4a55fae9bd <+112>: movl $0xf1f1f1f1,0x7fff8000(%r12) 0x00007f4a55fae9c9 <+124>: movl $0xf2f2f200,0x7fff8004(%r12) 0x00007f4a55fae9d5 <+136>: movl $0xf2f2f2f2,0x7fff8008(%r12) 0x00007f4a55fae9e1 <+148>: movl $0xf2f2f200,0x7fff800c(%r12) 0x00007f4a55fae9ed <+160>: movl $0xf2f2f2f2,0x7fff8010(%r12) 0x00007f4a55fae9f9 <+172>: movl $0xf2f2f200,0x7fff8014(%r12) 0x00007f4a55faea05 <+184>: movl $0xf2f2f2f2,0x7fff8018(%r12) 0x00007f4a55faea11 <+196>: movl $0xf2f2f200,0x7fff801c(%r12) 0x00007f4a55faea1d <+208>: movl $0xf2f2f2f2,0x7fff8020(%r12) 0x00007f4a55faea29 <+220>: movl $0xf2f2f200,0x7fff8024(%r12) 0x00007f4a55faea35 <+232>: movl $0xf2f2f2f2,0x7fff8028(%r12) 0x00007f4a55faea41 <+244>: movl $0xf3f3f3f3,0x7fff8030(%r12) That leaves the shadow memory in this state: (gdb) x /13x $r12 + 0x7fff8000 0x10006c7aec02: 0xf1f1f1f1 0xf2f2f200 0xf2f2f2f2 0xf2f2f200 0x10006c7aec12: 0xf2f2f2f2 0xf2f2f200 0xf2f2f2f2 0xf2f2f200 0x10006c7aec22: 0xf2f2f2f2 0xf2f2f200 0xf2f2f2f2 0xf8f8f8f8 0x10006c7aec32: 0xf3f3f3f3 This corresponds to the frame layout dumped by ASan upon crash: This frame has 6 object(s): [32, 40) 'image' -- NinePieceImage object at 0x10006c7aec02 + 7 [96, 104) '<unknown>' -- first Length() object at 0x10006c7aec02 + 15 [160, 168) '<unknown>' -- second Length() object at 0x10006c7aec12 + 7 [224, 232) '<unknown>' -- third Length() object at + 0x10006c7aec12 + 15 [288, 296) '<unknown>' -- fourth Length() object at 0x10006c7aec22 + 7 [352, 384) '<unknown>' -- LengthBox() object that is being passed all the Length() objects, at 0x10006c7aec22 + 12 What's missing in the shadow memory initialization is a movl instruction that would null out the four bytes located at 0x7fff802c(%r12). Because of that, the 0xf8f8f8f8 value remains in that place. In the end, the 0xf8f8f8f8 value is in shadow memory in a location that is associated with the LengthBox object or, more precisely, the std::array<Length, 4> member object into which the temporary Length objects are being moved. As the first move constructor is called and data store is attempted, ASan finds the 0xf8 value in the corresponding shadow memory and things go sideways. I assume ASan expects the allocated shadow memory to be nulled-out, given the initialization instructions that are generated. That's not happening, apparently. For that reason I would mark this down as an issue with ASan, and it should be reported to the GCC project. (In reply to Zan Dobersek from comment #24) > I assume ASan expects the allocated shadow memory to be nulled-out, given > the initialization instructions that are generated. That's not happening, > apparently. For that reason I would mark this down as an issue with ASan, > and it should be reported to the GCC project. OK, thanks very much for this detailed analysis, Zan. I spent some time yesterday trying to put together an isolated reproducer that would be suitable for a GCC bug report; however, I was not successful. I will keep trying. Created attachment 343916 [details]
Patch
(In reply to Michael Catanzaro from comment #26) > Created attachment 343916 [details] > Patch Zan, Darin, are you OK with landing this workaround (provided that we additionally report this issue to either the asan or the GCC developers)? I've failed to create a reduction that doesn't depend on the WebKit codebase, but we can at least submit the complete preprocessed source of StyleBuilder.cpp (the source file that is including StyleBuilderCustom.h here). (In reply to Michael Catanzaro from comment #23) > template<typename U> > RectEdges(U&& top, U&& right, U&& bottom, U&& left) > - : m_sides({ { std::forward<T>(top), std::forward<T>(right), > std::forward<T>(bottom), std::forward<T>(left) } }) > + : m_sides({ { std::forward<U>(top), std::forward<U>(right), > std::forward<U>(bottom), std::forward<U>(left) } }) Yes, I think that’s a real bug that needs to be fixed! > And then I found some misplaced WTFMoves (moving from copies rather than > rvalue references). Those don’t seem like bugs. It makes sense to move from a copy, to avoid making yet another copy when doing the assignment. Comment on attachment 343916 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=343916&action=review > Source/WebCore/css/StyleBuilderCustom.h:579 > + // FIXME: This should not be constructed when unneeded, but see https://bugs.webkit.org/show_bug.cgi?id=186980. I’d rather have the comment do something other than pointing to the bug report. I would write this comment instead. // FIXME: Put this into a local variable to work around a bug in the GCC <xxx> version of Address Sanitizer. // Might be slightly less efficient when the type is not BorderImage since this is unused in that case. You can reference the bug number if you like, but I’d like the comment in the code to be more specific. (In reply to Michael Catanzaro from comment #27) > I've failed to create a reduction that doesn't depend on the WebKit > codebase, but we can at least submit the complete preprocessed source of > StyleBuilder.cpp (the source file that is including StyleBuilderCustom.h > here). This is just intellectual curiosity, nothing required of course: Rather than creating a reduction by writing fresh code, did you try starting with the complete preprocessed StyleBuilder.cpp and then delete as much as you could? Might be able to make a smaller test case that way. (In reply to Darin Adler from comment #30) > (In reply to Michael Catanzaro from comment #27) > > I've failed to create a reduction that doesn't depend on the WebKit > > codebase, but we can at least submit the complete preprocessed source of > > StyleBuilder.cpp (the source file that is including StyleBuilderCustom.h > > here). > > This is just intellectual curiosity, nothing required of course: Rather than > creating a reduction by writing fresh code, did you try starting with the > complete preprocessed StyleBuilder.cpp and then delete as much as you could? > Might be able to make a smaller test case that way. There is a tool called “delta” that does this for you. You just write a shell script that returns a particular status when the test input file reproduces the issue, and the delta tool then reduces the input file. http://delta.tigris.org/ (In reply to Darin Adler from comment #28) > > And then I found some misplaced WTFMoves (moving from copies rather than > > rvalue references). > > Those don’t seem like bugs. It makes sense to move from a copy, to avoid > making yet another copy when doing the assignment. Yes, I was wrong. (In reply to Darin Adler from comment #30) > This is just intellectual curiosity, nothing required of course: Rather than > creating a reduction by writing fresh code, did you try starting with the > complete preprocessed StyleBuilder.cpp and then delete as much as you could? > Might be able to make a smaller test case that way. Good idea. (In reply to David Kilzer (:ddkilzer) from comment #31) > There is a tool called “delta” that does this for you. You just write a > shell script that returns a particular status when the test input file > reproduces the issue, and the delta tool then reduces the input file. > > http://delta.tigris.org/ Interesting! (In reply to Darin Adler from comment #28) > Yes, I think that’s a real bug that needs to be fixed! Bug #187182 Created attachment 344030 [details]
Complete preprocessed source
Created attachment 344031 [details]
Corresponding assembly
Committed r233405: <https://trac.webkit.org/changeset/233405> GCC bug report: https://bugzilla.redhat.com/show_bug.cgi?id=1607459 I never got a reduction, so it might not be appreciated, but that will have to suffice. |