Bug 190113 - ASAN failure in ~GCReachableRef()
Summary: ASAN failure in ~GCReachableRef()
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: DOM (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Ryosuke Niwa
URL:
Keywords: InRadar
Depends on:
Blocks: 190115
  Show dependency treegraph
 
Reported: 2018-09-29 15:01 PDT by Ryosuke Niwa
Modified: 2018-10-01 19:59 PDT (History)
10 users (show)

See Also:


Attachments
Fixes the bug (3.26 KB, patch)
2018-09-29 15:14 PDT, Ryosuke Niwa
darin: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Ryosuke Niwa 2018-09-29 15:01:12 PDT
~GCReachableRef can cause an ASAN failure if it's called after it had been WTFMove'd.
Comment 1 Ryosuke Niwa 2018-09-29 15:01:48 PDT
<rdar://problem/44866778>
Comment 2 Ryosuke Niwa 2018-09-29 15:10:06 PDT
Hm... it looks like custom-elements-reaction-queue-retains-js-wrapper.html would always timeout in ASAN builds :( Probably too much malloc involved with the test.

Is there a way to skip a test only in ASAN builds?
Comment 3 Ryosuke Niwa 2018-09-29 15:14:46 PDT
Created attachment 351200 [details]
Fixes the bug
Comment 4 Alexey Proskuryakov 2018-09-29 16:13:24 PDT
ASan shouldn’t be orders of magnitude slower than release. I would expect some cause specific to this test. 

Does it ever finish with --no-timeout?
Comment 5 Geoffrey Garen 2018-09-29 16:22:02 PDT
Comment on attachment 351200 [details]
Fixes the bug

View in context: https://bugs.webkit.org/attachment.cgi?id=351200&action=review

> Source/WebCore/ChangeLog:10
> +        The bug was caused by ~GCReachableRef accessing Ref after it had been poisoned for ASAN
> +        in Ref::leakRef via Ref(Ref&& other). Fixed the bug by using RefPtr instead since that's
> +        the simplest solution here although we could unpoison Ref temporarily as done in ~Ref.

In ~GCReachableRef, who calls Ref::leakRef?
Comment 6 Ryosuke Niwa 2018-09-29 17:09:29 PDT
(In reply to Geoffrey Garen from comment #5)
> Comment on attachment 351200 [details]
> Fixes the bug
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=351200&action=review
> 
> > Source/WebCore/ChangeLog:10
> > +        The bug was caused by ~GCReachableRef accessing Ref after it had been poisoned for ASAN
> > +        in Ref::leakRef via Ref(Ref&& other). Fixed the bug by using RefPtr instead since that's
> > +        the simplest solution here although we could unpoison Ref temporarily as done in ~Ref.
> 
> In ~GCReachableRef, who calls Ref::leakRef?

GCReachableRef(GCReachableRef&&), which WTFMove's Ref, which invokes leakRef on Ref in the original / source GCReachableRef.
Comment 7 Ryosuke Niwa 2018-09-29 17:48:15 PDT
Hm... it looks like the timeout I'm experiencing is nothing to do with this code / patch or custom elements. WebKitTestRunner's WebContent process can end up stalling and sit there without doing anything, and the set of tests in which this problem reproduces shifts.
Comment 8 Ryosuke Niwa 2018-09-29 17:49:28 PDT
Maybe yet another fallout from https://trac.webkit.org/changeset/236512?
Comment 9 Ryosuke Niwa 2018-09-29 18:02:36 PDT
Man... our testing infrastructure isn't in a good shape lately. iOS EWS is still suffering from the same data migration issue:

RuntimeError raised: Timed out while waiting for data migration
Traceback (most recent call last):
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/layout_tests/run_webkit_tests.py", line 85, in main
    run_details = run(port, options, args, stderr)
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/layout_tests/run_webkit_tests.py", line 448, in run
    run_details = manager.run(args)
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/layout_tests/controllers/manager.py", line 246, in run
    if not self._set_up_run(tests_to_run):
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/layout_tests/controllers/manager.py", line 181, in _set_up_run
    self._port.setup_test_run(self._options.device_class)
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/port/ios.py", line 160, in setup_test_run
    self._create_devices(device_class)
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/port/ios_simulator.py", line 103, in _create_devices
    SimulatedDeviceManager.initialize_devices([request] * self.child_processes(), self.host)
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/xcode/simulated_device.py", line 370, in initialize_devices
    SimulatedDeviceManager.wait_until_data_migration_is_done(host, max(0, deadline - time.time()))
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/xcode/simulated_device.py", line 430, in wait_until_data_migration_is_done
    raise RuntimeError('Timed out while waiting for data migration')
RuntimeError: Timed out while waiting for data migration
Stopping HTTP server ...
Stopping WebSocket server ...
Stopping Web Platform Test server ...
Comment 10 Ryosuke Niwa 2018-09-29 18:59:34 PDT
Hm... I can still reproduce the stall at r236511 but whatever causing it is to do with WebKitTestRunner or run-webkit-tests. Weirdly, specifying --time-out-ms=3000 would cause the tests to timeout less frequently despite of the fact none of the tests are timing out.

Anyhow, I don't think this is anything to do with this bug or the patch.
Comment 11 Geoffrey Garen 2018-09-30 20:21:42 PDT
> > In ~GCReachableRef, who calls Ref::leakRef?
> 
> GCReachableRef(GCReachableRef&&), which WTFMove's Ref, which invokes leakRef
> on Ref in the original / source GCReachableRef.

In ~GCReachableRef, who calls GCReachableRef(GCReachableRef&&)? :P
Comment 12 Darin Adler 2018-09-30 20:52:51 PDT
Comment on attachment 351200 [details]
Fixes the bug

View in context: https://bugs.webkit.org/attachment.cgi?id=351200&action=review

> Source/WebCore/dom/GCReachableRef.h:66
> +        : m_ptr(WTFMove(other.m_ptr))

This line of code is wrong and uses of this function template won’t compile. Since "other" is a Ref, it does not have an "m_ptr" member to move from. The reason we don’t see a compiler error is that there are no uses of this function template. Maybe we should delete it for now? Or we could add unit tests for it if we want to keep it.

> Source/WebCore/dom/GCReachableRef.h:75
>      GCReachableRef(GCReachableRef&& other)
> -        : m_ref(WTFMove(other.m_ref))
> +        : m_ptr(WTFMove(other.m_ptr))
>      {
>      }

This is the same as the default implementation; we could get the same effect with "= default" here or maybe by omitting this entirely and letting it get generated.

On reflection, though, I realize that one thing this lacks is an assertion that the "other" is not already null. Perhaps we should leave this explicitly written out, but also explicitly forbid moving out of the same object twice just as the move constructor in Ref does, by asserting the pointer is non-null in the function body.

> Source/WebCore/dom/GCReachableRef.h:77
>      template<typename X, typename Y> GCReachableRef(const GCReachableRef<X, Y>& other) = delete;

It doesn't makes sense that we would have to delete this function template. Yes, we need to delete the automatically generated copy constructor, since the code it generates would be incorrect, but this copy-constructor-like function template should *not* be automatically generated so it should not need to be deleted.

The same logic applies in the Ref class. I have no idea why the copy-constructor-like and assignment-operator-like template functions need to be deleted in that class. I believe those lines of code in Ref.h have no effect and are unnecessary.

When GCReachableRef was using a Ref for a member, though, we did *not* need to delete its copy constructor, the Ref member already prevented it from being generated. But now that we are using RefPtr instead, we need to prevent the automatic generation of the copy constructor and copy assignment operator either directly with delete or indirectly by defining the move constructor and move assignment operator.

So this line should go. And we have additional work to do to make sure we don’t automatically generate an incorrect copy constructor and assignment operator.

> Source/WebCore/dom/GCReachableRef.h:80
> +    T* operator->() const { ASSERT(m_ptr); return m_ptr.get(); }
> +    T* ptr() const RETURNS_NONNULL { ASSERT(m_ptr); return m_ptr.get(); }

Would be nice to not have to write out the assertion. Maybe just call &get() in these functions?

> Source/WebCore/dom/GCReachableRef.h:83
> +    T& get() const { ASSERT(m_ptr); return *m_ptr; }
> +    operator T&() const { ASSERT(m_ptr); return *m_ptr; }
> +    bool operator!() const { ASSERT(m_ptr); return !*m_ptr; }

RefPtr::operator* already asserts, so there is no need for additional assertion here
Comment 13 Darin Adler 2018-09-30 21:02:50 PDT
(In reply to Geoffrey Garen from comment #11)
> > > In ~GCReachableRef, who calls Ref::leakRef?
> > 
> > GCReachableRef(GCReachableRef&&), which WTFMove's Ref, which invokes leakRef
> > on Ref in the original / source GCReachableRef.
> 
> In ~GCReachableRef, who calls GCReachableRef(GCReachableRef&&)? :P

The language Ryosuke was using was imprecise.

The error in GCReachableRef before this patch is that ~GCReachableRef was calling GCReachableRef::isNull, which in turn was implemented with a call to Ref::isHashTableEmptyValue. It‘s not OK to call Ref::isHashTableEmptyValue on a Ref once Ref::leakRef has been called on it.

Despite the fact that a moved-from or leakRef'd-from Ref uses the same value, nullptr, that hash table empty values do, the use of poisoning means we can't blur the distinction between these two.

One way to fix this would have been to write a new GCReachableRef::isNull that uses some different function in Ref that knows how to answer the question in a way that works even on poisoned memory.
Comment 14 Ryosuke Niwa 2018-09-30 22:56:19 PDT
Comment on attachment 351200 [details]
Fixes the bug

View in context: https://bugs.webkit.org/attachment.cgi?id=351200&action=review

>> Source/WebCore/dom/GCReachableRef.h:66
>> +        : m_ptr(WTFMove(other.m_ptr))
> 
> This line of code is wrong and uses of this function template won’t compile. Since "other" is a Ref, it does not have an "m_ptr" member to move from. The reason we don’t see a compiler error is that there are no uses of this function template. Maybe we should delete it for now? Or we could add unit tests for it if we want to keep it.

Sure, will remove.

>> Source/WebCore/dom/GCReachableRef.h:75
>>      }
> 
> This is the same as the default implementation; we could get the same effect with "= default" here or maybe by omitting this entirely and letting it get generated.
> 
> On reflection, though, I realize that one thing this lacks is an assertion that the "other" is not already null. Perhaps we should leave this explicitly written out, but also explicitly forbid moving out of the same object twice just as the move constructor in Ref does, by asserting the pointer is non-null in the function body.

Oh weird. I thought I had added ASSERT here but clearly I forgot. Will add the assertion.

>> Source/WebCore/dom/GCReachableRef.h:77
>>      template<typename X, typename Y> GCReachableRef(const GCReachableRef<X, Y>& other) = delete;
> 
> It doesn't makes sense that we would have to delete this function template. Yes, we need to delete the automatically generated copy constructor, since the code it generates would be incorrect, but this copy-constructor-like function template should *not* be automatically generated so it should not need to be deleted.
> 
> The same logic applies in the Ref class. I have no idea why the copy-constructor-like and assignment-operator-like template functions need to be deleted in that class. I believe those lines of code in Ref.h have no effect and are unnecessary.
> 
> When GCReachableRef was using a Ref for a member, though, we did *not* need to delete its copy constructor, the Ref member already prevented it from being generated. But now that we are using RefPtr instead, we need to prevent the automatic generation of the copy constructor and copy assignment operator either directly with delete or indirectly by defining the move constructor and move assignment operator.
> 
> So this line should go. And we have additional work to do to make sure we don’t automatically generate an incorrect copy constructor and assignment operator.

Hm... I don't quite follow what you're saying here.
Isn't this the line to prevent generating an incorrect copy constructor??

>> Source/WebCore/dom/GCReachableRef.h:80
>> +    T* ptr() const RETURNS_NONNULL { ASSERT(m_ptr); return m_ptr.get(); }
> 
> Would be nice to not have to write out the assertion. Maybe just call &get() in these functions?

Sure, will do.

>> Source/WebCore/dom/GCReachableRef.h:83
>> +    bool operator!() const { ASSERT(m_ptr); return !*m_ptr; }
> 
> RefPtr::operator* already asserts, so there is no need for additional assertion here

Sure, will remove.
Comment 15 Alexey Proskuryakov 2018-10-01 00:28:17 PDT
(In reply to Ryosuke Niwa from comment #10)
> Hm... I can still reproduce the stall at r236511 but whatever causing it is to
> do with WebKitTestRunner or run-webkit-tests. Weirdly, specifying --time-out-ms=3000
> would cause the tests to timeout less frequently despite of the fact none of the
> tests are timing out.

Could you please file a new bug with steps to reproduce? I don't understand what you are describing in these comments.

> Man... our testing infrastructure isn't in a good shape lately. iOS EWS is still suffering from the same data migration issue:
> 
> RuntimeError raised: Timed out while waiting for data migration

As you know, we just upgraded EWS to new major version of iOS SDK. It will take some tweaking to work around changed performance characteristics of the simulator. Either way, EWS eventually passed.
Comment 16 Darin Adler 2018-10-01 09:58:43 PDT
Comment on attachment 351200 [details]
Fixes the bug

View in context: https://bugs.webkit.org/attachment.cgi?id=351200&action=review

>>> Source/WebCore/dom/GCReachableRef.h:77
>>>      template<typename X, typename Y> GCReachableRef(const GCReachableRef<X, Y>& other) = delete;
>> 
>> It doesn't makes sense that we would have to delete this function template. Yes, we need to delete the automatically generated copy constructor, since the code it generates would be incorrect, but this copy-constructor-like function template should *not* be automatically generated so it should not need to be deleted.
>> 
>> The same logic applies in the Ref class. I have no idea why the copy-constructor-like and assignment-operator-like template functions need to be deleted in that class. I believe those lines of code in Ref.h have no effect and are unnecessary.
>> 
>> When GCReachableRef was using a Ref for a member, though, we did *not* need to delete its copy constructor, the Ref member already prevented it from being generated. But now that we are using RefPtr instead, we need to prevent the automatic generation of the copy constructor and copy assignment operator either directly with delete or indirectly by defining the move constructor and move assignment operator.
>> 
>> So this line should go. And we have additional work to do to make sure we don’t automatically generate an incorrect copy constructor and assignment operator.
> 
> Hm... I don't quite follow what you're saying here.
> Isn't this the line to prevent generating an incorrect copy constructor??

No. A template function like this, that is similar to the copy constructor and seems to cover a cast like it, will not prevent the language/compiler from generating the implicitly-declared and implicitly-defined copy constructor.
Comment 17 Ryosuke Niwa 2018-10-01 14:37:39 PDT
(In reply to Darin Adler from comment #16)
> Comment on attachment 351200 [details]
> Fixes the bug
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=351200&action=review
> 
> >>> Source/WebCore/dom/GCReachableRef.h:77
> >>>      template<typename X, typename Y> GCReachableRef(const GCReachableRef<X, Y>& other) = delete;
> >> 
> >> It doesn't makes sense that we would have to delete this function template. Yes, we need to delete the automatically generated copy constructor, since the code it generates would be incorrect, but this copy-constructor-like function template should *not* be automatically generated so it should not need to be deleted.
> >> 
> >> The same logic applies in the Ref class. I have no idea why the copy-constructor-like and assignment-operator-like template functions need to be deleted in that class. I believe those lines of code in Ref.h have no effect and are unnecessary.
> >> 
> >> When GCReachableRef was using a Ref for a member, though, we did *not* need to delete its copy constructor, the Ref member already prevented it from being generated. But now that we are using RefPtr instead, we need to prevent the automatic generation of the copy constructor and copy assignment operator either directly with delete or indirectly by defining the move constructor and move assignment operator.
> >> 
> >> So this line should go. And we have additional work to do to make sure we don’t automatically generate an incorrect copy constructor and assignment operator.
> > 
> > Hm... I don't quite follow what you're saying here.
> > Isn't this the line to prevent generating an incorrect copy constructor??
> 
> No. A template function like this, that is similar to the copy constructor
> and seems to cover a cast like it, will not prevent the language/compiler
> from generating the implicitly-declared and implicitly-defined copy
> constructor.

Oh, I see. Thanks for the clarification.
Comment 18 Ryosuke Niwa 2018-10-01 14:50:19 PDT
Comment on attachment 351200 [details]
Fixes the bug

I just noticed that GCReachableRef already uses WTF_MAKE_NONCOPYABLE so we just need to delete:
template<typename X, typename Y> GCReachableRef(const GCReachableRef<X, Y>& other) = delete;
We don't need to add new code to prevent copy constructor and operator=.
Comment 19 Ryosuke Niwa 2018-10-01 14:53:38 PDT
Committed r236693: <https://trac.webkit.org/changeset/236693>
Comment 20 Ryosuke Niwa 2018-10-01 19:59:53 PDT
(In reply to Alexey Proskuryakov from comment #15)
> (In reply to Ryosuke Niwa from comment #10)
> > Hm... I can still reproduce the stall at r236511 but whatever causing it is to
> > do with WebKitTestRunner or run-webkit-tests. Weirdly, specifying --time-out-ms=3000
> > would cause the tests to timeout less frequently despite of the fact none of the
> > tests are timing out.
> 
> Could you please file a new bug with steps to reproduce? I don't understand
> what you are describing in these comments.

This is very strange but I can't reproduce these timeouts with trunk WebKit anymore. Maybe it was a temporary issue which existed at the revision I was testing.

> > Man... our testing infrastructure isn't in a good shape lately. iOS EWS is still suffering from the same data migration issue:
> > 
> > RuntimeError raised: Timed out while waiting for data migration
> 
> As you know, we just upgraded EWS to new major version of iOS SDK. It will
> take some tweaking to work around changed performance characteristics of the
> simulator. Either way, EWS eventually passed.

Yeah, I understand. I wasn't trying to blame anyone.