Bug 188426

Summary: JSRunLoopTimer may run part of a member function after it's destroyed
Product: WebKit Reporter: Saam Barati <saam>
Component: JavaScriptCoreAssignee: Saam Barati <saam>
Status: RESOLVED FIXED    
Severity: Normal CC: achristensen, ap, benjamin, commit-queue, ews-watchlist, fpizlo, ggaren, gskachkov, jfbastien, keith_miller, mark.lam, msaboff, rmorisset, rniwa, ryanhaddad, ticaiolima, tsavell, webkit-bug-importer, ysuzuki
Priority: P2    
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 188582, 188832, 188880    
Bug Blocks:    
Attachments:
Description Flags
patch
ews-watchlist: commit-queue-
Archive of layout-test-results from ews103 for mac-sierra
none
Archive of layout-test-results from ews107 for mac-sierra-wk2
none
Archive of layout-test-results from ews117 for mac-sierra
none
patch
ews-watchlist: commit-queue-
Archive of layout-test-results from ews101 for mac-sierra
none
patch
none
patch
none
patch
none
patch
saam: review-
WIP
ews-watchlist: commit-queue-
Archive of layout-test-results from ews101 for mac-sierra
none
Archive of layout-test-results from ews107 for mac-sierra-wk2
none
Archive of layout-test-results from ews126 for ios-simulator-wk2
none
Archive of layout-test-results from ews125 for ios-simulator-wk2
none
WIP
ews-watchlist: commit-queue-
Archive of layout-test-results from ews103 for mac-sierra
none
WIP
none
WIP
none
WIP
none
WIP
none
patch
none
patch
mark.lam: review+, ews-watchlist: commit-queue-
Archive of layout-test-results from ews117 for mac-sierra
none
Archive of layout-test-results from ews206 for win-future
none
patch for landing
none
patch for landing
none
attaching logs of stack traces
none
patch for landing none

Description Saam Barati 2018-08-08 18:33:18 PDT
Let's consider this function:

```
void JSRunLoopTimer::timerDidFire()
{
    JSLock* apiLock = m_apiLock.get();
    if (!apiLock) {
        // Likely a buggy usage: the timer fired while JSRunLoopTimer was being destroyed.
        return;
    }
    // HERE
    std::lock_guard<JSLock> lock(*apiLock);
    RefPtr<VM> vm = apiLock->vm();
    if (!vm) {
        // The VM has been destroyed, so we should just give up.
        return;
    }

    doWork();
}
```

Look at the comment "HERE" above. Let's say the thread gets context switched out for X seconds at that point. And let's say that during that X seconds, the VM is destroyed. Transitively, the VM owns this JSRunLoopTimer (e.g, GCActivityCallback). If that happens, we're now accessing fields on a dead object.

(An easy way to reproduce this bug is to just drop a sleep at the "HERE" comment).
Comment 1 Saam Barati 2018-08-09 11:18:51 PDT
After thinking about this, it seems like the only reasonable way to ensure the lifetime is for this class to ref() itself when it sets a timer, and deref when the callback is called. But this get's slightly tricky with how we use these RunLoopTimer objects.
Comment 2 Saam Barati 2018-08-09 18:28:18 PDT
Created attachment 346884 [details]
patch
Comment 3 EWS Watchlist 2018-08-09 19:18:14 PDT
Comment on attachment 346884 [details]
patch

Attachment 346884 [details] did not pass mac-ews (mac):
Output: https://webkit-queues.webkit.org/results/8814816

New failing tests:
imported/w3c/web-platform-tests/css/cssom/cssstyledeclaration-mutationrecord-001.html
Comment 4 EWS Watchlist 2018-08-09 19:18:16 PDT
Created attachment 346885 [details]
Archive of layout-test-results from ews103 for mac-sierra

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews103  Port: mac-sierra  Platform: Mac OS X 10.12.6
Comment 5 EWS Watchlist 2018-08-09 19:46:46 PDT
Comment on attachment 346884 [details]
patch

Attachment 346884 [details] did not pass mac-wk2-ews (mac-wk2):
Output: https://webkit-queues.webkit.org/results/8815017

New failing tests:
editing/selection/select-bidi-run.html
Comment 6 EWS Watchlist 2018-08-09 19:46:48 PDT
Created attachment 346887 [details]
Archive of layout-test-results from ews107 for mac-sierra-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews107  Port: mac-sierra-wk2  Platform: Mac OS X 10.12.6
Comment 7 EWS Watchlist 2018-08-09 20:07:13 PDT
Comment on attachment 346884 [details]
patch

Attachment 346884 [details] did not pass mac-debug-ews (mac):
Output: https://webkit-queues.webkit.org/results/8815036

New failing tests:
fast/events/fire-scroll-event-element.html
Comment 8 EWS Watchlist 2018-08-09 20:07:15 PDT
Created attachment 346888 [details]
Archive of layout-test-results from ews117 for mac-sierra

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews117  Port: mac-sierra  Platform: Mac OS X 10.12.6
Comment 9 Saam Barati 2018-08-09 20:58:02 PDT
I need to handle the VM dying before the timer.
Comment 10 Saam Barati 2018-08-09 21:49:56 PDT
Created attachment 346890 [details]
patch
Comment 11 EWS Watchlist 2018-08-09 22:58:14 PDT
Comment on attachment 346890 [details]
patch

Attachment 346890 [details] did not pass mac-ews (mac):
Output: https://webkit-queues.webkit.org/results/8816225

New failing tests:
workers/bomb.html
Comment 12 EWS Watchlist 2018-08-09 22:58:16 PDT
Created attachment 346891 [details]
Archive of layout-test-results from ews101 for mac-sierra

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews101  Port: mac-sierra  Platform: Mac OS X 10.12.6
Comment 13 Saam Barati 2018-08-09 23:07:56 PDT
Created attachment 346892 [details]
patch
Comment 14 Saam Barati 2018-08-09 23:10:05 PDT
Created attachment 346893 [details]
patch
Comment 15 Saam Barati 2018-08-09 23:14:23 PDT
Created attachment 346894 [details]
patch
Comment 16 Mark Lam 2018-08-10 12:44:12 PDT
Comment on attachment 346894 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=346894&action=review

Some comments for now.  Per our offline discussion, I'll wait for the next patch with bug fixes before continuing the review.

> Source/JavaScriptCore/heap/GCActivityCallback.cpp:60
>      doCollection();

I think you should update the year on the copyright header.

> Source/JavaScriptCore/heap/GCActivityCallback.cpp:69
>      if (newDelay * timerSlop > m_delay)
>          return;
>      Seconds delta = m_delay - newDelay;
>      m_delay = newDelay;
> -    CFRunLoopTimerSetNextFireDate(m_timer.get(), CFRunLoopTimerGetNextFireDate(m_timer.get()) - delta.seconds());
> +    setTimeUntilFire(timeUntilFire() - delta);

This logic is now broken because you no longer reset m_delay to s_decade in cancelTimer().  You still need cancelTimer() just for the purpose of resetting m_delay.

> Source/JavaScriptCore/runtime/JSRunLoopTimer.cpp:211
> +    if (m_isTimerFiringForBalancedDeref) {

Use UNLIKELY() for this condition?
Comment 17 Saam Barati 2018-08-10 14:40:54 PDT
Created attachment 346929 [details]
patch
Comment 18 Saam Barati 2018-08-12 14:40:59 PDT
Comment on attachment 346929 [details]
patch

I have an easier way to do this.
Comment 19 Saam Barati 2018-08-12 22:56:17 PDT
Created attachment 346996 [details]
WIP
Comment 20 EWS Watchlist 2018-08-12 22:58:43 PDT
Attachment 346996 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/runtime/JSRunLoopTimer.cpp:89:  This { should be at the end of the previous line  [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/runtime/JSRunLoopTimer.cpp:198:  This { should be at the end of the previous line  [whitespace/braces] [4]
Total errors found: 2 in 19 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 21 EWS Watchlist 2018-08-13 00:14:04 PDT
Comment on attachment 346996 [details]
WIP

Attachment 346996 [details] did not pass mac-ews (mac):
Output: https://webkit-queues.webkit.org/results/8841409

New failing tests:
http/wpt/resource-timing/rt-initiatorType.worker.html
Comment 22 EWS Watchlist 2018-08-13 00:14:06 PDT
Created attachment 346997 [details]
Archive of layout-test-results from ews101 for mac-sierra

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews101  Port: mac-sierra  Platform: Mac OS X 10.12.6
Comment 23 EWS Watchlist 2018-08-13 00:49:39 PDT
Comment on attachment 346996 [details]
WIP

Attachment 346996 [details] did not pass mac-wk2-ews (mac-wk2):
Output: https://webkit-queues.webkit.org/results/8841494

New failing tests:
http/wpt/resource-timing/rt-initiatorType.worker.html
Comment 24 EWS Watchlist 2018-08-13 00:49:41 PDT
Created attachment 347000 [details]
Archive of layout-test-results from ews107 for mac-sierra-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews107  Port: mac-sierra-wk2  Platform: Mac OS X 10.12.6
Comment 25 EWS Watchlist 2018-08-13 01:06:46 PDT
Comment on attachment 346996 [details]
WIP

Attachment 346996 [details] did not pass ios-sim-ews (ios-simulator-wk2):
Output: https://webkit-queues.webkit.org/results/8841492

New failing tests:
http/wpt/resource-timing/rt-initiatorType.worker.html
Comment 26 EWS Watchlist 2018-08-13 01:06:48 PDT
Created attachment 347001 [details]
Archive of layout-test-results from ews126 for ios-simulator-wk2

The attached test failures were seen while running run-webkit-tests on the ios-sim-ews.
Bot: ews126  Port: ios-simulator-wk2  Platform: Mac OS X 10.13.4
Comment 27 EWS Watchlist 2018-08-13 03:11:28 PDT
Comment on attachment 346996 [details]
WIP

Attachment 346996 [details] did not pass ios-sim-ews (ios-simulator-wk2):
Output: https://webkit-queues.webkit.org/results/8842144

New failing tests:
http/wpt/resource-timing/rt-initiatorType.worker.html
Comment 28 EWS Watchlist 2018-08-13 03:11:30 PDT
Created attachment 347005 [details]
Archive of layout-test-results from ews125 for ios-simulator-wk2

The attached test failures were seen while running run-webkit-tests on the ios-sim-ews.
Bot: ews125  Port: ios-simulator-wk2  Platform: Mac OS X 10.13.4
Comment 29 Saam Barati 2018-08-13 11:46:59 PDT
Created attachment 347021 [details]
WIP
Comment 30 EWS Watchlist 2018-08-13 11:49:05 PDT
Attachment 347021 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/runtime/JSRunLoopTimer.cpp:94:  This { should be at the end of the previous line  [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/runtime/JSRunLoopTimer.cpp:203:  This { should be at the end of the previous line  [whitespace/braces] [4]
Total errors found: 2 in 19 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 31 EWS Watchlist 2018-08-13 13:06:13 PDT
Comment on attachment 347021 [details]
WIP

Attachment 347021 [details] did not pass mac-ews (mac):
Output: https://webkit-queues.webkit.org/results/8846401

New failing tests:
http/wpt/resource-timing/rt-initiatorType.worker.html
Comment 32 EWS Watchlist 2018-08-13 13:06:15 PDT
Created attachment 347029 [details]
Archive of layout-test-results from ews103 for mac-sierra

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews103  Port: mac-sierra  Platform: Mac OS X 10.12.6
Comment 33 Saam Barati 2018-08-13 13:07:55 PDT
Created attachment 347030 [details]
WIP
Comment 34 EWS Watchlist 2018-08-13 13:09:49 PDT
Attachment 347030 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/runtime/JSRunLoopTimer.cpp:136:  This { should be at the end of the previous line  [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/runtime/JSRunLoopTimer.cpp:241:  This { should be at the end of the previous line  [whitespace/braces] [4]
Total errors found: 2 in 19 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 35 Saam Barati 2018-08-13 13:16:35 PDT
Created attachment 347031 [details]
WIP
Comment 36 EWS Watchlist 2018-08-13 13:17:50 PDT
Attachment 347031 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/runtime/JSRunLoopTimer.cpp:136:  This { should be at the end of the previous line  [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/runtime/JSRunLoopTimer.cpp:241:  This { should be at the end of the previous line  [whitespace/braces] [4]
Total errors found: 2 in 19 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 37 Saam Barati 2018-08-13 13:22:36 PDT
Created attachment 347032 [details]
WIP
Comment 38 EWS Watchlist 2018-08-13 13:25:23 PDT
Attachment 347032 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/runtime/JSRunLoopTimer.cpp:135:  This { should be at the end of the previous line  [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/runtime/JSRunLoopTimer.cpp:240:  This { should be at the end of the previous line  [whitespace/braces] [4]
Total errors found: 2 in 19 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 39 Saam Barati 2018-08-13 17:27:42 PDT
Created attachment 347056 [details]
WIP
Comment 40 Saam Barati 2018-08-13 23:01:32 PDT
Created attachment 347065 [details]
patch
Comment 41 EWS Watchlist 2018-08-13 23:03:58 PDT
Attachment 347065 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/runtime/JSRunLoopTimer.cpp:135:  This { should be at the end of the previous line  [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/runtime/JSRunLoopTimer.cpp:233:  This { should be at the end of the previous line  [whitespace/braces] [4]
Total errors found: 2 in 21 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 42 Geoffrey Garen 2018-08-14 10:17:03 PDT
Comment on attachment 347065 [details]
patch

Can't a JSRunLoopTimer just ref() itself inside JSRunLoopTimer::scheduleTimer() and deref() itself inside JSRunLoopTimer::cancelTimer()?
Comment 43 Keith Miller 2018-08-14 10:35:14 PDT
(In reply to Geoffrey Garen from comment #42)
> Comment on attachment 347065 [details]
> patch
> 
> Can't a JSRunLoopTimer just ref() itself inside
> JSRunLoopTimer::scheduleTimer() and deref() itself inside
> JSRunLoopTimer::cancelTimer()?

No, if another thread calls cancelTimer() after CFRunLoop has called our callback but before the JSRunLoopTimers has managed to ref itself. You'd now be running a function with a freed this.
Comment 44 Geoffrey Garen 2018-08-14 11:06:56 PDT
(In reply to Keith Miller from comment #43)
> (In reply to Geoffrey Garen from comment #42)
> > Comment on attachment 347065 [details]
> > patch
> > 
> > Can't a JSRunLoopTimer just ref() itself inside
> > JSRunLoopTimer::scheduleTimer() and deref() itself inside
> > JSRunLoopTimer::cancelTimer()?
> 
> No, if another thread calls cancelTimer() after CFRunLoop has called our
> callback but before the JSRunLoopTimers has managed to ref itself. You'd now
> be running a function with a freed this.

If you ref() inside JSRunLoopTimer::scheduleTimer(), you already hold a ref when CFRunLoop calls your callback.
Comment 45 Mark Lam 2018-08-14 11:11:58 PDT
(In reply to Geoffrey Garen from comment #44)
> (In reply to Keith Miller from comment #43)
> > (In reply to Geoffrey Garen from comment #42)
> > > Comment on attachment 347065 [details]
> > > patch
> > > 
> > > Can't a JSRunLoopTimer just ref() itself inside
> > > JSRunLoopTimer::scheduleTimer() and deref() itself inside
> > > JSRunLoopTimer::cancelTimer()?
> > 
> > No, if another thread calls cancelTimer() after CFRunLoop has called our
> > callback but before the JSRunLoopTimers has managed to ref itself. You'd now
> > be running a function with a freed this.
> 
> If you ref() inside JSRunLoopTimer::scheduleTimer(), you already hold a ref
> when CFRunLoop calls your callback.

I believe the issue is that the run loop's timer callback may have started but got pre-empted before it can ref the Timer object.  Meanwhile on the main thread, the VM shutdowns and frees the Timer object.  Eventually, the timer thread gets to run and ref's the now freed Timer object and reads its fields ... which is what we're trying to prevent.
Comment 46 Mark Lam 2018-08-14 11:14:03 PDT
Comment on attachment 347065 [details]
patch

The mac-debug EWS build failures looks legit.
Comment 47 Mark Lam 2018-08-14 11:15:51 PDT
(In reply to Mark Lam from comment #45)
> (In reply to Geoffrey Garen from comment #44)
> > (In reply to Keith Miller from comment #43)
> > > (In reply to Geoffrey Garen from comment #42)
> > > > Comment on attachment 347065 [details]
> > > > patch
> > > > 
> > > > Can't a JSRunLoopTimer just ref() itself inside
> > > > JSRunLoopTimer::scheduleTimer() and deref() itself inside
> > > > JSRunLoopTimer::cancelTimer()?
> > > 
> > > No, if another thread calls cancelTimer() after CFRunLoop has called our
> > > callback but before the JSRunLoopTimers has managed to ref itself. You'd now
> > > be running a function with a freed this.
> > 
> > If you ref() inside JSRunLoopTimer::scheduleTimer(), you already hold a ref
> > when CFRunLoop calls your callback.
> 
> I believe the issue is that the run loop's timer callback may have started
> but got pre-empted before it can ref the Timer object.  Meanwhile on the
> main thread, the VM shutdowns and frees the Timer object.  Eventually, the
> timer thread gets to run and ref's the now freed Timer object and reads its
> fields ... which is what we're trying to prevent.

Nevermind.  You said "inside JSRunLoopTimer::scheduleTimer()" as in asymmetric ref/deref.  I believe Saam identified some complications with cancelTimer() when doing that, but we'll take a second look.
Comment 48 Saam Barati 2018-08-14 11:30:48 PDT
(In reply to Geoffrey Garen from comment #44)
> (In reply to Keith Miller from comment #43)
> > (In reply to Geoffrey Garen from comment #42)
> > > Comment on attachment 347065 [details]
> > > patch
> > > 
> > > Can't a JSRunLoopTimer just ref() itself inside
> > > JSRunLoopTimer::scheduleTimer() and deref() itself inside
> > > JSRunLoopTimer::cancelTimer()?
> > 
> > No, if another thread calls cancelTimer() after CFRunLoop has called our
> > callback but before the JSRunLoopTimers has managed to ref itself. You'd now
> > be running a function with a freed this.
> 
> If you ref() inside JSRunLoopTimer::scheduleTimer(), you already hold a ref
> when CFRunLoop calls your callback.

Yes, this would work. But the tricky thing is getting cancel correct. The goal being: we want to be able to deref() in cancel so we don't leak.

If you want to see what I came up with to solve this, the patch for that is here:
https://bugs.webkit.org/attachment.cgi?id=346929&action=review

I think that approach turned out to be a lot more complicated than what I have up for review now.

The main issue I ran into with the prior approach is this:
1. I schedule a timer. I ref() b/c of this.
2. I cancel a timer. I'd like to deref() here. However, how do I know I've done that in a way that's safe. E.g, have I cancelled before the timer fired? This turns out to be trickier than I expected.
Comment 49 Saam Barati 2018-08-14 12:25:37 PDT
Created attachment 347102 [details]
patch

Fix debug build.
Comment 50 EWS Watchlist 2018-08-14 12:28:52 PDT
Attachment 347102 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/runtime/JSRunLoopTimer.cpp:135:  This { should be at the end of the previous line  [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/runtime/JSRunLoopTimer.cpp:233:  This { should be at the end of the previous line  [whitespace/braces] [4]
Total errors found: 2 in 21 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 51 Geoffrey Garen 2018-08-14 13:03:57 PDT
> 2. I cancel a timer. I'd like to deref() here. However, how do I know I've
> done that in a way that's safe. E.g, have I cancelled before the timer
> fired?

Under what conditions do you cancel a timer, and then it fires afterward anyway?
Comment 52 EWS Watchlist 2018-08-14 13:32:21 PDT
Comment on attachment 347102 [details]
patch

Attachment 347102 [details] did not pass mac-debug-ews (mac):
Output: https://webkit-queues.webkit.org/results/8858423

Number of test failures exceeded the failure limit.
Comment 53 EWS Watchlist 2018-08-14 13:32:23 PDT
Created attachment 347105 [details]
Archive of layout-test-results from ews117 for mac-sierra

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews117  Port: mac-sierra  Platform: Mac OS X 10.12.6
Comment 54 Saam Barati 2018-08-14 13:37:23 PDT
(In reply to Geoffrey Garen from comment #51)
> > 2. I cancel a timer. I'd like to deref() here. However, how do I know I've
> > done that in a way that's safe. E.g, have I cancelled before the timer
> > fired?
> 
> Under what conditions do you cancel a timer, and then it fires afterward
> anyway?

When it's imminently about to fire. For example, let's say we have threads T1 and T2. T1 is running the runloop we're firing on, and T2 is the thread that will cancel the timer.

T1. Timer fires
T1. CF issues a call to our callback.
T1. Before an any callback code is run, T1 gets context switched
T2. Cancels

In such a scenario, T2 and T1 can't *both* deref.
Comment 55 Geoffrey Garen 2018-08-14 13:47:39 PDT
(In reply to Saam Barati from comment #54)
> (In reply to Geoffrey Garen from comment #51)
> > > 2. I cancel a timer. I'd like to deref() here. However, how do I know I've
> > > done that in a way that's safe. E.g, have I cancelled before the timer
> > > fired?
> > 
> > Under what conditions do you cancel a timer, and then it fires afterward
> > anyway?
> 
> When it's imminently about to fire. For example, let's say we have threads
> T1 and T2. T1 is running the runloop we're firing on, and T2 is the thread
> that will cancel the timer.
> 
> T1. Timer fires
> T1. CF issues a call to our callback.
> T1. Before an any callback code is run, T1 gets context switched
> T2. Cancels
> 
> In such a scenario, T2 and T1 can't *both* deref.

Is it a requirement that you should be able to call JSRunLoopTimer::cancelTimer() from any thread at any time? Who does this?
Comment 56 Saam Barati 2018-08-14 13:55:11 PDT
(In reply to Geoffrey Garen from comment #55)
> (In reply to Saam Barati from comment #54)
> > (In reply to Geoffrey Garen from comment #51)
> > > > 2. I cancel a timer. I'd like to deref() here. However, how do I know I've
> > > > done that in a way that's safe. E.g, have I cancelled before the timer
> > > > fired?
> > > 
> > > Under what conditions do you cancel a timer, and then it fires afterward
> > > anyway?
> > 
> > When it's imminently about to fire. For example, let's say we have threads
> > T1 and T2. T1 is running the runloop we're firing on, and T2 is the thread
> > that will cancel the timer.
> > 
> > T1. Timer fires
> > T1. CF issues a call to our callback.
> > T1. Before an any callback code is run, T1 gets context switched
> > T2. Cancels
> > 
> > In such a scenario, T2 and T1 can't *both* deref.
> 
> Is it a requirement that you should be able to call
> JSRunLoopTimer::cancelTimer() from any thread at any time? Who does this?

API users could cause this to happen in arbitrary ways:
- We grab the runloop at VM construction.
- We then can run code on that VM from any other thread. That code can do many things that will cancelTimer()
Comment 57 Saam Barati 2018-08-14 14:40:04 PDT
Looks like HashMap<Ref<P>, SomeVaueWhereEmptyValueIsNotZero> is broken...
Comment 58 EWS Watchlist 2018-08-15 17:42:26 PDT
Comment on attachment 347102 [details]
patch

Attachment 347102 [details] did not pass win-ews (win):
Output: https://webkit-queues.webkit.org/results/8873312

New failing tests:
http/tests/security/contentSecurityPolicy/userAgentShadowDOM/allow-video.html
Comment 59 EWS Watchlist 2018-08-15 17:42:38 PDT
Created attachment 347232 [details]
Archive of layout-test-results from ews206 for win-future

The attached test failures were seen while running run-webkit-tests on the win-ews.
Bot: ews206  Port: win-future  Platform: CYGWIN_NT-6.1-2.9.0-0.318-5-3-x86_64-64bit
Comment 60 Mark Lam 2018-08-16 13:59:21 PDT
Comment on attachment 347102 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=347102&action=review

r=me.  I agree that this patch is much easier to reason about in terms of synchronization because it's now reduced to sync'ing on the Manager's lock, and the Manager is immortal.

> Source/JavaScriptCore/runtime/JSRunLoopTimer.cpp:194
> +    EpochTime fireEpochTime = epochTime(delay);

At first, I was concerned that delay may be negative, and that we'll have some underflow issue with fireEpochTime.  However, I see that EpochTime is Seconds, and therefore uses a double as storage.  I still think that it's strange to have a negative epoch time but I suppose the math can't be messed up because we're doing doubles math.

> Source/JavaScriptCore/runtime/JSRunLoopTimer.cpp:208
> +        if (entry.first.ptr() == &timer) {
> +            entry.second = fireEpochTime;
> +            found = true;
> +        }
> +        scheduleTime = std::min(scheduleTime, entry.second);

Shouldn't we break out of the loop if we've found the timer?  Each timer may only be enqueued in data.timers once, right?

Shouldn't the setting of "scheduleTime = std::min(scheduleTime, entry.second);" only be done if the timer is found?  Oh, I see.   You want to find the smallest time of all the JSRunLoopTimers.  Not sure if it's worth it but if you just cache this smallest time value in a PerVMData field, then you can bail out as soon as you've found the matching timer. ... On second thought, probably not worth it.  This is just an optimization which we can apply later if needed.  For now, symmetry with the cancelTimer() case is probably more important to ensure correctness.

> Source/JavaScriptCore/runtime/JSRunLoopTimer.cpp:325
> +        Manager::shared().scheduleTimer(*this, intervalInSeconds);

nit: pass the locker to document the contract that we must first lock the JSRunLoopTimer?

> Source/JavaScriptCore/runtime/JSRunLoopTimer.cpp:337
> +    Manager::shared().cancelTimer(*this);

nit: pass the locker to document the contract that we must first lock the JSRunLoopTimer?

> Source/JavaScriptCore/runtime/JSRunLoopTimer.h:64
> +        void willDestroyVM(VM&);

nit: name this unregisterVM for symmetry?
Comment 61 Saam Barati 2018-08-20 17:30:51 PDT
Comment on attachment 347102 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=347102&action=review

>> Source/JavaScriptCore/runtime/JSRunLoopTimer.cpp:325
>> +        Manager::shared().scheduleTimer(*this, intervalInSeconds);
> 
> nit: pass the locker to document the contract that we must first lock the JSRunLoopTimer?

That feels weird since we'd be passing in the locker for a JSRunLoopTimer and not the Manager. Other things could use Manager and not lock around calls to schedule/cancel

>> Source/JavaScriptCore/runtime/JSRunLoopTimer.cpp:337
>> +    Manager::shared().cancelTimer(*this);
> 
> nit: pass the locker to document the contract that we must first lock the JSRunLoopTimer?

ditto

>> Source/JavaScriptCore/runtime/JSRunLoopTimer.h:64
>> +        void willDestroyVM(VM&);
> 
> nit: name this unregisterVM for symmetry?

sounds good.
Comment 62 Mark Lam 2018-08-20 17:33:13 PDT
(In reply to Saam Barati from comment #61)
> Comment on attachment 347102 [details]
> patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=347102&action=review
> 
> >> Source/JavaScriptCore/runtime/JSRunLoopTimer.cpp:325
> >> +        Manager::shared().scheduleTimer(*this, intervalInSeconds);
> > 
> > nit: pass the locker to document the contract that we must first lock the JSRunLoopTimer?
> 
> That feels weird since we'd be passing in the locker for a JSRunLoopTimer
> and not the Manager. Other things could use Manager and not lock around
> calls to schedule/cancel

OK.
Comment 63 Saam Barati 2018-08-20 17:35:18 PDT
Created attachment 347584 [details]
patch for landing
Comment 64 EWS Watchlist 2018-08-20 18:33:39 PDT
Attachment 347584 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/runtime/JSRunLoopTimer.cpp:135:  This { should be at the end of the previous line  [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/runtime/JSRunLoopTimer.cpp:233:  This { should be at the end of the previous line  [whitespace/braces] [4]
Total errors found: 2 in 21 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 65 Saam Barati 2018-08-20 18:40:21 PDT
Created attachment 347591 [details]
patch for landing
Comment 66 EWS Watchlist 2018-08-20 18:43:35 PDT
Attachment 347591 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/runtime/JSRunLoopTimer.cpp:135:  This { should be at the end of the previous line  [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/runtime/JSRunLoopTimer.cpp:235:  This { should be at the end of the previous line  [whitespace/braces] [4]
Total errors found: 2 in 21 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 67 WebKit Commit Bot 2018-08-21 00:27:34 PDT
Comment on attachment 347591 [details]
patch for landing

Clearing flags on attachment: 347591

Committed r235107: <https://trac.webkit.org/changeset/235107>
Comment 68 WebKit Commit Bot 2018-08-21 00:27:36 PDT
All reviewed patches have been landed.  Closing bug.
Comment 69 Truitt Savell 2018-08-21 09:13:43 PDT
It looks like https://trac.webkit.org/changeset/235107/webkit

Has caused this crash to occur in test:
https://webkit-test-results.webkit.org/dashboards/flakiness_dashboard.html#showAllRuns=true&tests=http%2Ftests%2Fwebsocket%2Ftests%2Fhybi%2Fset-protocol.html

Crash:
https://build.webkit.org/results/Apple%20iOS%2011%20Simulator%20Debug%20WK2%20(Tests)/r235107%20(5999)/http/tests/websocket/tests/hybi/set-protocol-crash-log.txt

Excerpt:
stderr:
ASSERTION FAILED: mapIterator->value.contains(url)
/Volumes/Data/slave/ios-simulator-11-debug/build/Source/WebKit/NetworkProcess/FileAPI/NetworkBlobRegistry.cpp(125) : void WebKit::NetworkBlobRegistry::unregisterBlobURL(WebKit::NetworkConnectionToWebProcess *, const WebCore::URL &)
1   0x11ccd3be9 WTFCrash
2   0x10e72271b WTFCrashWithInfo(int, char const*, char const*, int)
3   0x10ec24200 WebKit::NetworkBlobRegistry::unregisterBlobURL(WebKit::NetworkConnectionToWebProcess*, WebCore::URL const&)
4   0x10eb49aed WebKit::NetworkConnectionToWebProcess::unregisterBlobURL(WebCore::URL const&)

Looks like one of the assertions added in this patch is causing the crash.
Comment 70 Saam Barati 2018-08-21 13:04:10 PDT
(In reply to Truitt Savell from comment #69)
> It looks like https://trac.webkit.org/changeset/235107/webkit
> 
> Has caused this crash to occur in test:
> https://webkit-test-results.webkit.org/dashboards/flakiness_dashboard.
> html#showAllRuns=true&tests=http%2Ftests%2Fwebsocket%2Ftests%2Fhybi%2Fset-
> protocol.html
> 
> Crash:
> https://build.webkit.org/results/
> Apple%20iOS%2011%20Simulator%20Debug%20WK2%20(Tests)/r235107%20(5999)/http/
> tests/websocket/tests/hybi/set-protocol-crash-log.txt
> 
> Excerpt:
> stderr:
> ASSERTION FAILED: mapIterator->value.contains(url)
> /Volumes/Data/slave/ios-simulator-11-debug/build/Source/WebKit/
> NetworkProcess/FileAPI/NetworkBlobRegistry.cpp(125) : void
> WebKit::NetworkBlobRegistry::unregisterBlobURL(WebKit::
> NetworkConnectionToWebProcess *, const WebCore::URL &)
> 1   0x11ccd3be9 WTFCrash
> 2   0x10e72271b WTFCrashWithInfo(int, char const*, char const*, int)
> 3   0x10ec24200
> WebKit::NetworkBlobRegistry::unregisterBlobURL(WebKit::
> NetworkConnectionToWebProcess*, WebCore::URL const&)
> 4   0x10eb49aed
> WebKit::NetworkConnectionToWebProcess::unregisterBlobURL(WebCore::URL const&)
> 
> Looks like one of the assertions added in this patch is causing the crash.

This is really weird. Do we even make a VM in the network process?
Comment 71 Saam Barati 2018-08-21 14:55:11 PDT
I'm looking into this now.
Comment 72 Saam Barati 2018-08-21 15:15:21 PDT
I can't reproduce this.
Comment 73 Ryan Haddad 2018-08-21 15:33:32 PDT
I can reproduce the crash with a ToT build with the following:
run-webkit-tests http/tests/websocket/tests/hybi --debug --exit-after-n-crashes-or-timeouts 1 --no-retry -fg --iter 100

I have not yet tried to repro with a build before https://trac.webkit.org/changeset/235107
Comment 74 Saam Barati 2018-08-21 15:46:25 PDT
(In reply to Ryan Haddad from comment #73)
> I can reproduce the crash with a ToT build with the following:
> run-webkit-tests http/tests/websocket/tests/hybi --debug
> --exit-after-n-crashes-or-timeouts 1 --no-retry -fg --iter 100
> 
> I have not yet tried to repro with a build before
> https://trac.webkit.org/changeset/235107

When I do this, every single test fails. Is that your experience as well?
Comment 75 Saam Barati 2018-08-21 16:03:09 PDT
(In reply to Saam Barati from comment #74)
> (In reply to Ryan Haddad from comment #73)
> > I can reproduce the crash with a ToT build with the following:
> > run-webkit-tests http/tests/websocket/tests/hybi --debug
> > --exit-after-n-crashes-or-timeouts 1 --no-retry -fg --iter 100
> > 
> > I have not yet tried to repro with a build before
> > https://trac.webkit.org/changeset/235107
> 
> When I do this, every single test fails. Is that your experience as well?

This is because I had a different apache installed...
Comment 76 Saam Barati 2018-08-21 16:22:32 PDT
(In reply to Ryan Haddad from comment #73)
> I can reproduce the crash with a ToT build with the following:
> run-webkit-tests http/tests/websocket/tests/hybi --debug
> --exit-after-n-crashes-or-timeouts 1 --no-retry -fg --iter 100
> 
> I have not yet tried to repro with a build before
> https://trac.webkit.org/changeset/235107

Did you really run this on the parent directory and not the individual test?
Comment 77 Saam Barati 2018-08-21 16:30:35 PDT
I still can't reproduce this. I need help from y'all on how you're doing it. What OS were you on Ryan?

It would be really helpful if someone can confirm that it can repro at my revision and can't repro on the revision before.
Comment 78 Saam Barati 2018-08-21 16:34:42 PDT
(In reply to Saam Barati from comment #77)
> I still can't reproduce this. I need help from y'all on how you're doing it.
> What OS were you on Ryan?
> 
> It would be really helpful if someone can confirm that it can repro at my
> revision and can't repro on the revision before.

I spoke too soon. I just got it to reproduce.
Comment 79 Saam Barati 2018-08-21 17:52:56 PDT
This appears to be revealing a bug in the blob code.

I've made it so that no JSC timers run, and this crash still happens.
Comment 80 Saam Barati 2018-08-21 18:14:19 PDT
Created attachment 347747 [details]
attaching logs of stack traces

This code seems super innocuous. We're calling a GC from some JS code injected into the test runner.
Comment 81 Saam Barati 2018-08-21 18:20:58 PDT
This patch is also a performance regression :(
Comment 82 WebKit Commit Bot 2018-08-21 18:21:56 PDT
Re-opened since this is blocked by bug 188832
Comment 83 Saam Barati 2018-08-21 22:06:01 PDT
I can’t reproduce the performance regression on my laptop 🤔

I’ll need to see if I can reproduce on other HW
Comment 84 Saam Barati 2018-08-22 11:58:20 PDT
Created attachment 347825 [details]
patch for landing
Comment 85 Saam Barati 2018-08-22 11:59:25 PDT
Comment on attachment 347591 [details]
patch for landing

View in context: https://bugs.webkit.org/attachment.cgi?id=347591&action=review

> Source/JavaScriptCore/heap/GCActivityCallback.cpp:72
> +        setTimeUntilFire(delta);

The perf bug is here. This should be:
setTimeUntilFire(newDelay)
Comment 86 EWS Watchlist 2018-08-22 12:00:25 PDT
Attachment 347825 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/runtime/JSRunLoopTimer.cpp:135:  This { should be at the end of the previous line  [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/runtime/JSRunLoopTimer.cpp:235:  This { should be at the end of the previous line  [whitespace/braces] [4]
Total errors found: 2 in 21 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 87 WebKit Commit Bot 2018-08-23 16:57:10 PDT
Comment on attachment 347825 [details]
patch for landing

Clearing flags on attachment: 347825

Committed r235261: <https://trac.webkit.org/changeset/235261>
Comment 88 WebKit Commit Bot 2018-08-23 16:57:13 PDT
All reviewed patches have been landed.  Closing bug.