Bug 190113

Summary: ASAN failure in ~GCReachableRef()
Product: WebKit Reporter: Ryosuke Niwa <rniwa>
Component: DOMAssignee: Ryosuke Niwa <rniwa>
Status: RESOLVED FIXED    
Severity: Normal CC: ap, cdumez, darin, dbates, ddkilzer, esprehn+autocc, ews-watchlist, ggaren, kangil.han, ryanhaddad
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on:    
Bug Blocks: 190115    
Attachments:
Description Flags
Fixes the bug darin: review+

Ryosuke Niwa
Reported 2018-09-29 15:01:12 PDT
~GCReachableRef can cause an ASAN failure if it's called after it had been WTFMove'd.
Attachments
Fixes the bug (3.26 KB, patch)
2018-09-29 15:14 PDT, Ryosuke Niwa
darin: review+
Ryosuke Niwa
Comment 1 2018-09-29 15:01:48 PDT
Ryosuke Niwa
Comment 2 2018-09-29 15:10:06 PDT
Hm... it looks like custom-elements-reaction-queue-retains-js-wrapper.html would always timeout in ASAN builds :( Probably too much malloc involved with the test. Is there a way to skip a test only in ASAN builds?
Ryosuke Niwa
Comment 3 2018-09-29 15:14:46 PDT
Created attachment 351200 [details] Fixes the bug
Alexey Proskuryakov
Comment 4 2018-09-29 16:13:24 PDT
ASan shouldn’t be orders of magnitude slower than release. I would expect some cause specific to this test. Does it ever finish with --no-timeout?
Geoffrey Garen
Comment 5 2018-09-29 16:22:02 PDT
Comment on attachment 351200 [details] Fixes the bug View in context: https://bugs.webkit.org/attachment.cgi?id=351200&action=review > Source/WebCore/ChangeLog:10 > + The bug was caused by ~GCReachableRef accessing Ref after it had been poisoned for ASAN > + in Ref::leakRef via Ref(Ref&& other). Fixed the bug by using RefPtr instead since that's > + the simplest solution here although we could unpoison Ref temporarily as done in ~Ref. In ~GCReachableRef, who calls Ref::leakRef?
Ryosuke Niwa
Comment 6 2018-09-29 17:09:29 PDT
(In reply to Geoffrey Garen from comment #5) > Comment on attachment 351200 [details] > Fixes the bug > > View in context: > https://bugs.webkit.org/attachment.cgi?id=351200&action=review > > > Source/WebCore/ChangeLog:10 > > + The bug was caused by ~GCReachableRef accessing Ref after it had been poisoned for ASAN > > + in Ref::leakRef via Ref(Ref&& other). Fixed the bug by using RefPtr instead since that's > > + the simplest solution here although we could unpoison Ref temporarily as done in ~Ref. > > In ~GCReachableRef, who calls Ref::leakRef? GCReachableRef(GCReachableRef&&), which WTFMove's Ref, which invokes leakRef on Ref in the original / source GCReachableRef.
Ryosuke Niwa
Comment 7 2018-09-29 17:48:15 PDT
Hm... it looks like the timeout I'm experiencing is nothing to do with this code / patch or custom elements. WebKitTestRunner's WebContent process can end up stalling and sit there without doing anything, and the set of tests in which this problem reproduces shifts.
Ryosuke Niwa
Comment 8 2018-09-29 17:49:28 PDT
Maybe yet another fallout from https://trac.webkit.org/changeset/236512?
Ryosuke Niwa
Comment 9 2018-09-29 18:02:36 PDT
Man... our testing infrastructure isn't in a good shape lately. iOS EWS is still suffering from the same data migration issue: RuntimeError raised: Timed out while waiting for data migration Traceback (most recent call last): File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/layout_tests/run_webkit_tests.py", line 85, in main run_details = run(port, options, args, stderr) File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/layout_tests/run_webkit_tests.py", line 448, in run run_details = manager.run(args) File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/layout_tests/controllers/manager.py", line 246, in run if not self._set_up_run(tests_to_run): File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/layout_tests/controllers/manager.py", line 181, in _set_up_run self._port.setup_test_run(self._options.device_class) File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/port/ios.py", line 160, in setup_test_run self._create_devices(device_class) File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/port/ios_simulator.py", line 103, in _create_devices SimulatedDeviceManager.initialize_devices([request] * self.child_processes(), self.host) File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/xcode/simulated_device.py", line 370, in initialize_devices SimulatedDeviceManager.wait_until_data_migration_is_done(host, max(0, deadline - time.time())) File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/xcode/simulated_device.py", line 430, in wait_until_data_migration_is_done raise RuntimeError('Timed out while waiting for data migration') RuntimeError: Timed out while waiting for data migration Stopping HTTP server ... Stopping WebSocket server ... Stopping Web Platform Test server ...
Ryosuke Niwa
Comment 10 2018-09-29 18:59:34 PDT
Hm... I can still reproduce the stall at r236511 but whatever causing it is to do with WebKitTestRunner or run-webkit-tests. Weirdly, specifying --time-out-ms=3000 would cause the tests to timeout less frequently despite of the fact none of the tests are timing out. Anyhow, I don't think this is anything to do with this bug or the patch.
Geoffrey Garen
Comment 11 2018-09-30 20:21:42 PDT
> > In ~GCReachableRef, who calls Ref::leakRef? > > GCReachableRef(GCReachableRef&&), which WTFMove's Ref, which invokes leakRef > on Ref in the original / source GCReachableRef. In ~GCReachableRef, who calls GCReachableRef(GCReachableRef&&)? :P
Darin Adler
Comment 12 2018-09-30 20:52:51 PDT
Comment on attachment 351200 [details] Fixes the bug View in context: https://bugs.webkit.org/attachment.cgi?id=351200&action=review > Source/WebCore/dom/GCReachableRef.h:66 > + : m_ptr(WTFMove(other.m_ptr)) This line of code is wrong and uses of this function template won’t compile. Since "other" is a Ref, it does not have an "m_ptr" member to move from. The reason we don’t see a compiler error is that there are no uses of this function template. Maybe we should delete it for now? Or we could add unit tests for it if we want to keep it. > Source/WebCore/dom/GCReachableRef.h:75 > GCReachableRef(GCReachableRef&& other) > - : m_ref(WTFMove(other.m_ref)) > + : m_ptr(WTFMove(other.m_ptr)) > { > } This is the same as the default implementation; we could get the same effect with "= default" here or maybe by omitting this entirely and letting it get generated. On reflection, though, I realize that one thing this lacks is an assertion that the "other" is not already null. Perhaps we should leave this explicitly written out, but also explicitly forbid moving out of the same object twice just as the move constructor in Ref does, by asserting the pointer is non-null in the function body. > Source/WebCore/dom/GCReachableRef.h:77 > template<typename X, typename Y> GCReachableRef(const GCReachableRef<X, Y>& other) = delete; It doesn't makes sense that we would have to delete this function template. Yes, we need to delete the automatically generated copy constructor, since the code it generates would be incorrect, but this copy-constructor-like function template should *not* be automatically generated so it should not need to be deleted. The same logic applies in the Ref class. I have no idea why the copy-constructor-like and assignment-operator-like template functions need to be deleted in that class. I believe those lines of code in Ref.h have no effect and are unnecessary. When GCReachableRef was using a Ref for a member, though, we did *not* need to delete its copy constructor, the Ref member already prevented it from being generated. But now that we are using RefPtr instead, we need to prevent the automatic generation of the copy constructor and copy assignment operator either directly with delete or indirectly by defining the move constructor and move assignment operator. So this line should go. And we have additional work to do to make sure we don’t automatically generate an incorrect copy constructor and assignment operator. > Source/WebCore/dom/GCReachableRef.h:80 > + T* operator->() const { ASSERT(m_ptr); return m_ptr.get(); } > + T* ptr() const RETURNS_NONNULL { ASSERT(m_ptr); return m_ptr.get(); } Would be nice to not have to write out the assertion. Maybe just call &get() in these functions? > Source/WebCore/dom/GCReachableRef.h:83 > + T& get() const { ASSERT(m_ptr); return *m_ptr; } > + operator T&() const { ASSERT(m_ptr); return *m_ptr; } > + bool operator!() const { ASSERT(m_ptr); return !*m_ptr; } RefPtr::operator* already asserts, so there is no need for additional assertion here
Darin Adler
Comment 13 2018-09-30 21:02:50 PDT
(In reply to Geoffrey Garen from comment #11) > > > In ~GCReachableRef, who calls Ref::leakRef? > > > > GCReachableRef(GCReachableRef&&), which WTFMove's Ref, which invokes leakRef > > on Ref in the original / source GCReachableRef. > > In ~GCReachableRef, who calls GCReachableRef(GCReachableRef&&)? :P The language Ryosuke was using was imprecise. The error in GCReachableRef before this patch is that ~GCReachableRef was calling GCReachableRef::isNull, which in turn was implemented with a call to Ref::isHashTableEmptyValue. It‘s not OK to call Ref::isHashTableEmptyValue on a Ref once Ref::leakRef has been called on it. Despite the fact that a moved-from or leakRef'd-from Ref uses the same value, nullptr, that hash table empty values do, the use of poisoning means we can't blur the distinction between these two. One way to fix this would have been to write a new GCReachableRef::isNull that uses some different function in Ref that knows how to answer the question in a way that works even on poisoned memory.
Ryosuke Niwa
Comment 14 2018-09-30 22:56:19 PDT
Comment on attachment 351200 [details] Fixes the bug View in context: https://bugs.webkit.org/attachment.cgi?id=351200&action=review >> Source/WebCore/dom/GCReachableRef.h:66 >> + : m_ptr(WTFMove(other.m_ptr)) > > This line of code is wrong and uses of this function template won’t compile. Since "other" is a Ref, it does not have an "m_ptr" member to move from. The reason we don’t see a compiler error is that there are no uses of this function template. Maybe we should delete it for now? Or we could add unit tests for it if we want to keep it. Sure, will remove. >> Source/WebCore/dom/GCReachableRef.h:75 >> } > > This is the same as the default implementation; we could get the same effect with "= default" here or maybe by omitting this entirely and letting it get generated. > > On reflection, though, I realize that one thing this lacks is an assertion that the "other" is not already null. Perhaps we should leave this explicitly written out, but also explicitly forbid moving out of the same object twice just as the move constructor in Ref does, by asserting the pointer is non-null in the function body. Oh weird. I thought I had added ASSERT here but clearly I forgot. Will add the assertion. >> Source/WebCore/dom/GCReachableRef.h:77 >> template<typename X, typename Y> GCReachableRef(const GCReachableRef<X, Y>& other) = delete; > > It doesn't makes sense that we would have to delete this function template. Yes, we need to delete the automatically generated copy constructor, since the code it generates would be incorrect, but this copy-constructor-like function template should *not* be automatically generated so it should not need to be deleted. > > The same logic applies in the Ref class. I have no idea why the copy-constructor-like and assignment-operator-like template functions need to be deleted in that class. I believe those lines of code in Ref.h have no effect and are unnecessary. > > When GCReachableRef was using a Ref for a member, though, we did *not* need to delete its copy constructor, the Ref member already prevented it from being generated. But now that we are using RefPtr instead, we need to prevent the automatic generation of the copy constructor and copy assignment operator either directly with delete or indirectly by defining the move constructor and move assignment operator. > > So this line should go. And we have additional work to do to make sure we don’t automatically generate an incorrect copy constructor and assignment operator. Hm... I don't quite follow what you're saying here. Isn't this the line to prevent generating an incorrect copy constructor?? >> Source/WebCore/dom/GCReachableRef.h:80 >> + T* ptr() const RETURNS_NONNULL { ASSERT(m_ptr); return m_ptr.get(); } > > Would be nice to not have to write out the assertion. Maybe just call &get() in these functions? Sure, will do. >> Source/WebCore/dom/GCReachableRef.h:83 >> + bool operator!() const { ASSERT(m_ptr); return !*m_ptr; } > > RefPtr::operator* already asserts, so there is no need for additional assertion here Sure, will remove.
Alexey Proskuryakov
Comment 15 2018-10-01 00:28:17 PDT
(In reply to Ryosuke Niwa from comment #10) > Hm... I can still reproduce the stall at r236511 but whatever causing it is to > do with WebKitTestRunner or run-webkit-tests. Weirdly, specifying --time-out-ms=3000 > would cause the tests to timeout less frequently despite of the fact none of the > tests are timing out. Could you please file a new bug with steps to reproduce? I don't understand what you are describing in these comments. > Man... our testing infrastructure isn't in a good shape lately. iOS EWS is still suffering from the same data migration issue: > > RuntimeError raised: Timed out while waiting for data migration As you know, we just upgraded EWS to new major version of iOS SDK. It will take some tweaking to work around changed performance characteristics of the simulator. Either way, EWS eventually passed.
Darin Adler
Comment 16 2018-10-01 09:58:43 PDT
Comment on attachment 351200 [details] Fixes the bug View in context: https://bugs.webkit.org/attachment.cgi?id=351200&action=review >>> Source/WebCore/dom/GCReachableRef.h:77 >>> template<typename X, typename Y> GCReachableRef(const GCReachableRef<X, Y>& other) = delete; >> >> It doesn't makes sense that we would have to delete this function template. Yes, we need to delete the automatically generated copy constructor, since the code it generates would be incorrect, but this copy-constructor-like function template should *not* be automatically generated so it should not need to be deleted. >> >> The same logic applies in the Ref class. I have no idea why the copy-constructor-like and assignment-operator-like template functions need to be deleted in that class. I believe those lines of code in Ref.h have no effect and are unnecessary. >> >> When GCReachableRef was using a Ref for a member, though, we did *not* need to delete its copy constructor, the Ref member already prevented it from being generated. But now that we are using RefPtr instead, we need to prevent the automatic generation of the copy constructor and copy assignment operator either directly with delete or indirectly by defining the move constructor and move assignment operator. >> >> So this line should go. And we have additional work to do to make sure we don’t automatically generate an incorrect copy constructor and assignment operator. > > Hm... I don't quite follow what you're saying here. > Isn't this the line to prevent generating an incorrect copy constructor?? No. A template function like this, that is similar to the copy constructor and seems to cover a cast like it, will not prevent the language/compiler from generating the implicitly-declared and implicitly-defined copy constructor.
Ryosuke Niwa
Comment 17 2018-10-01 14:37:39 PDT
(In reply to Darin Adler from comment #16) > Comment on attachment 351200 [details] > Fixes the bug > > View in context: > https://bugs.webkit.org/attachment.cgi?id=351200&action=review > > >>> Source/WebCore/dom/GCReachableRef.h:77 > >>> template<typename X, typename Y> GCReachableRef(const GCReachableRef<X, Y>& other) = delete; > >> > >> It doesn't makes sense that we would have to delete this function template. Yes, we need to delete the automatically generated copy constructor, since the code it generates would be incorrect, but this copy-constructor-like function template should *not* be automatically generated so it should not need to be deleted. > >> > >> The same logic applies in the Ref class. I have no idea why the copy-constructor-like and assignment-operator-like template functions need to be deleted in that class. I believe those lines of code in Ref.h have no effect and are unnecessary. > >> > >> When GCReachableRef was using a Ref for a member, though, we did *not* need to delete its copy constructor, the Ref member already prevented it from being generated. But now that we are using RefPtr instead, we need to prevent the automatic generation of the copy constructor and copy assignment operator either directly with delete or indirectly by defining the move constructor and move assignment operator. > >> > >> So this line should go. And we have additional work to do to make sure we don’t automatically generate an incorrect copy constructor and assignment operator. > > > > Hm... I don't quite follow what you're saying here. > > Isn't this the line to prevent generating an incorrect copy constructor?? > > No. A template function like this, that is similar to the copy constructor > and seems to cover a cast like it, will not prevent the language/compiler > from generating the implicitly-declared and implicitly-defined copy > constructor. Oh, I see. Thanks for the clarification.
Ryosuke Niwa
Comment 18 2018-10-01 14:50:19 PDT
Comment on attachment 351200 [details] Fixes the bug I just noticed that GCReachableRef already uses WTF_MAKE_NONCOPYABLE so we just need to delete: template<typename X, typename Y> GCReachableRef(const GCReachableRef<X, Y>& other) = delete; We don't need to add new code to prevent copy constructor and operator=.
Ryosuke Niwa
Comment 19 2018-10-01 14:53:38 PDT
Ryosuke Niwa
Comment 20 2018-10-01 19:59:53 PDT
(In reply to Alexey Proskuryakov from comment #15) > (In reply to Ryosuke Niwa from comment #10) > > Hm... I can still reproduce the stall at r236511 but whatever causing it is to > > do with WebKitTestRunner or run-webkit-tests. Weirdly, specifying --time-out-ms=3000 > > would cause the tests to timeout less frequently despite of the fact none of the > > tests are timing out. > > Could you please file a new bug with steps to reproduce? I don't understand > what you are describing in these comments. This is very strange but I can't reproduce these timeouts with trunk WebKit anymore. Maybe it was a temporary issue which existed at the revision I was testing. > > Man... our testing infrastructure isn't in a good shape lately. iOS EWS is still suffering from the same data migration issue: > > > > RuntimeError raised: Timed out while waiting for data migration > > As you know, we just upgraded EWS to new major version of iOS SDK. It will > take some tweaking to work around changed performance characteristics of the > simulator. Either way, EWS eventually passed. Yeah, I understand. I wasn't trying to blame anyone.
Note You need to log in before you can comment on or make changes to this bug.