Bug 186189

Summary: Crash in WebAnimation::runPendingPlayTask
Product: WebKit Reporter: Carlos Garcia Campos <cgarcia>
Component: AnimationsAssignee: Zan Dobersek <zan>
Status: RESOLVED FIXED    
Severity: Normal CC: dino, ews-watchlist, fred.wang, graouts, jonlee, mcatanzaro, rniwa, webkit-bug-importer, zan
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
See Also: https://bugs.webkit.org/show_bug.cgi?id=186536
Bug Depends on:    
Bug Blocks: 186753    
Attachments:
Description Flags
Patch
none
Archive of layout-test-results from ews102 for mac-sierra
none
Archive of layout-test-results from ews105 for mac-sierra-wk2
none
Archive of layout-test-results from ews126 for ios-simulator-wk2
none
Archive of layout-test-results from ews117 for mac-sierra
none
Archive of layout-test-results from ews115 for mac-sierra
none
WIP patch
none
Patch none

Description Carlos Garcia Campos 2018-06-01 02:52:04 PDT
This is another crash due to using a std::optional value while it's nullopt.

Thread 1 "WebKitWebProces" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51	../sysdeps/unix/sysv/linux/raise.c: No existe el fichero o el directorio.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007fdeab736231 in __GI_abort () at abort.c:79
#2  0x00007fdeb87e2d8a in WebCore::WebAnimation::runPendingPlayTask() () from /home/cgarcia/gnome/lib/libwebkit2gtk-4.0.so.37
#3  0x00007fdeb5cd8665 in WTF::dispatchFunctionsFromMainThread() () from /home/cgarcia/gnome/lib/libjavascriptcoregtk-4.0.so.18
#4  0x00007fdeb5d1c063 in WTF::RunLoop::TimerBase::TimerBase(WTF::RunLoop&)::{lambda(void*)#1}::_FUN(void*) () from /home/cgarcia/gnome/lib/libjavascriptcoregtk-4.0.so.18
#5  0x00007fdeae120495 in g_main_dispatch (context=0x560a765b65c0) at gmain.c:3177
#6  g_main_context_dispatch (context=context@entry=0x560a765b65c0) at gmain.c:3830
#7  0x00007fdeae120838 in g_main_context_iterate (context=0x560a765b65c0, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3903
#8  0x00007fdeae120b42 in g_main_loop_run (loop=0x560a766399e0) at gmain.c:4099
#9  0x00007fdeb5d1c438 in WTF::RunLoop::run() () from /home/cgarcia/gnome/lib/libjavascriptcoregtk-4.0.so.18
#10 0x00007fdeb8207970 in WebProcessMainUnix () from /home/cgarcia/gnome/lib/libwebkit2gtk-4.0.so.37
#11 0x00007fdeab721a87 in __libc_start_main (main=0x560a74c0ac30 <main>, argc=3, argv=0x7ffc918c5a48, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
    stack_end=0x7ffc918c5a38) at ../csu/libc-start.c:310
#12 0x0000560a74c0acba in _start ()
Comment 1 Carlos Garcia Campos 2018-06-01 02:54:10 PDT
Created attachment 341746 [details]
Patch
Comment 2 EWS Watchlist 2018-06-01 03:59:41 PDT
Comment on attachment 341746 [details]
Patch

Attachment 341746 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.webkit.org/results/7922746

New failing tests:
fast/animation/css-animation-resuming-when-visible-with-style-change.html
Comment 3 EWS Watchlist 2018-06-01 03:59:42 PDT
Created attachment 341747 [details]
Archive of layout-test-results from ews102 for mac-sierra

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews102  Port: mac-sierra  Platform: Mac OS X 10.12.6
Comment 4 EWS Watchlist 2018-06-01 04:07:47 PDT
Comment on attachment 341746 [details]
Patch

Attachment 341746 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.webkit.org/results/7922782

New failing tests:
fast/animation/css-animation-resuming-when-visible-with-style-change.html
Comment 5 EWS Watchlist 2018-06-01 04:07:49 PDT
Created attachment 341748 [details]
Archive of layout-test-results from ews105 for mac-sierra-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews105  Port: mac-sierra-wk2  Platform: Mac OS X 10.12.6
Comment 6 EWS Watchlist 2018-06-01 04:26:38 PDT
Comment on attachment 341746 [details]
Patch

Attachment 341746 [details] did not pass ios-sim-ews (ios-simulator-wk2):
Output: http://webkit-queues.webkit.org/results/7922662

New failing tests:
fast/animation/css-animation-resuming-when-visible-with-style-change.html
Comment 7 EWS Watchlist 2018-06-01 04:26:40 PDT
Created attachment 341749 [details]
Archive of layout-test-results from ews126 for ios-simulator-wk2

The attached test failures were seen while running run-webkit-tests on the ios-sim-ews.
Bot: ews126  Port: ios-simulator-wk2  Platform: Mac OS X 10.13.4
Comment 8 EWS Watchlist 2018-06-01 04:57:16 PDT
Comment on attachment 341746 [details]
Patch

Attachment 341746 [details] did not pass mac-debug-ews (mac):
Output: http://webkit-queues.webkit.org/results/7922917

New failing tests:
fast/animation/css-animation-resuming-when-visible-with-style-change.html
Comment 9 EWS Watchlist 2018-06-01 04:57:18 PDT
Created attachment 341750 [details]
Archive of layout-test-results from ews117 for mac-sierra

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews117  Port: mac-sierra  Platform: Mac OS X 10.12.6
Comment 10 Antoine Quint 2018-06-01 05:24:39 PDT
The change makes sense, I wonder why fast/animation/css-animation-resuming-when-visible-with-style-change.html is timing out on Mac now. It's been consistently timing out on GTK but has been stable on other ports.
Comment 11 EWS Watchlist 2018-06-01 06:37:35 PDT
Comment on attachment 341746 [details]
Patch

Attachment 341746 [details] did not pass mac-debug-ews (mac):
Output: http://webkit-queues.webkit.org/results/7924047

New failing tests:
fast/animation/css-animation-resuming-when-visible-with-style-change.html
Comment 12 EWS Watchlist 2018-06-01 06:37:36 PDT
Created attachment 341756 [details]
Archive of layout-test-results from ews115 for mac-sierra

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews115  Port: mac-sierra  Platform: Mac OS X 10.12.6
Comment 13 Carlos Garcia Campos 2018-06-05 01:01:45 PDT
(In reply to Antoine Quint from comment #10)
> The change makes sense, I wonder why
> fast/animation/css-animation-resuming-when-visible-with-style-change.html is
> timing out on Mac now. It's been consistently timing out on GTK but has been
> stable on other ports.

The problem, at least in GTK+, is that window.getComputedStyle(testDiv).transform doesn't change after resuming animations, even though I can see the animation happening. So, shouldBecomeEqual("internals.numberOfActiveAnimations()", "1", checkTransformAndFinishTest); never happens and the test doesn't finish.
Comment 14 Carlos Garcia Campos 2018-06-11 02:42:06 PDT
I don't know how to fix this, I'm not familiar with the web animations code. Any idea?
Comment 15 Michael Catanzaro 2018-06-12 12:34:01 PDT
Comment on attachment 341746 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=341746&action=review

I assume that this fixes several animation tests? I would mention that in the changelog.

> Source/WebCore/animation/WebAnimation.cpp:831
>      // 3. If animation's start time is unresolved, perform the following steps:
>      if (!m_startTime) {

Maybe you could avoid the layout test timeout by doing the check here instead?

if (!m_startTime && readyTime)

That way, the behavior should only be changed in the case where the code is going to imminently dereference readyTime when it is nullopt: m_readyPromise will still be resolved, and updateFinishedState will be called. I'm not sure if that's the most-correct solution, but the current code is definitely wrong, and it seems likely to avoid breaking the test?
Comment 16 Michael Catanzaro 2018-06-15 07:40:46 PDT
Comment on attachment 341746 [details]
Patch

Try my suggested change and see if it fixes the failing test.
Comment 17 Carlos Garcia Campos 2018-06-15 23:08:31 PDT
(In reply to Michael Catanzaro from comment #16)
> Comment on attachment 341746 [details]
> Patch
> 
> Try my suggested change and see if it fixes the failing test.

Why? The important thing here is not making a test pass but fixing the bug. Why are you so sure the test is correct? I can't because I'm not familiar with web notifications code nor the tests.
Comment 18 Michael Catanzaro 2018-06-16 09:24:17 PDT
(In reply to Carlos Garcia Campos from comment #17)
> Why? 

So we can solve this bug and move on.

> The important thing here is not making a test pass but fixing the bug.

Your patch breaks the test. Neither of us is going to debug it, and obviously Antoine is not going to give r+. So if we want to land this, we should try a smaller change.

> Why are you so sure the test is correct?

I'm not.

> I can't because I'm not familiar with web notifications code nor the tests.

I assume you meant the animations code. My suggested change was very simple, only changing just enough to avoid the immediate crash. It should also fix this bug. If it works and avoids breaking the test, then we can move on.

Alternatively, if you could mention what test or website you are using to reproduce this crash, then someone else could take a look.
Comment 19 Carlos Garcia Campos 2018-06-16 23:27:59 PDT
(In reply to Michael Catanzaro from comment #18)
> (In reply to Carlos Garcia Campos from comment #17)
> > Why? 
> 
> So we can solve this bug and move on.
> 
> > The important thing here is not making a test pass but fixing the bug.
> 
> Your patch breaks the test. Neither of us is going to debug it, and
> obviously Antoine is not going to give r+. So if we want to land this, we
> should try a smaller change.

I debugged it, and explained my conclusions in comment #13. I don't want to make the test pass, I want to fix the bug. I'm not sure the test is correct.

> > Why are you so sure the test is correct?
> 
> I'm not.
> 
> > I can't because I'm not familiar with web notifications code nor the tests.
> 
> I assume you meant the animations code. My suggested change was very simple,
> only changing just enough to avoid the immediate crash. It should also fix
> this bug. If it works and avoids breaking the test, then we can move on.
> 
> Alternatively, if you could mention what test or website you are using to
> reproduce this crash, then someone else could take a look.

It happens a lot with github and w3c pages.
Comment 20 Michael Catanzaro 2018-06-17 08:19:36 PDT
Comment on attachment 341746 [details]
Patch

I'll remove my r-, but I was using it because this seemed to be stalled. I would at least try my suggestion, since it might unblock this....
Comment 21 Carlos Garcia Campos 2018-06-21 06:25:30 PDT
Antoine, what's the runtime feature we would need to change to disable the new web animations? WebKitGTK+ (and probably WPE too) is unusable due to the web animations crashes, I think we should disable them until the feature is more mature.
Comment 22 Antoine Quint 2018-06-21 06:33:49 PDT
There are two flags defined in Source/WebKit/Shared/WebPreferences.yaml:

    - WebAnimationsEnabled
    - WebAnimationsCSSIntegrationEnabled

The former (WebAnimationsEnabled) governs the general availability of the Web Animations API. The latter (WebAnimationsCSSIntegrationEnabled) governs whether CSS Animations and CSS Transitions should use the Web Animations code or the legacy CSSAnimationController. You probably want to turn WebAnimationsCSSIntegrationEnabled off only.
Comment 23 Michael Catanzaro 2018-06-21 08:59:59 PDT
(In reply to Carlos Garcia Campos from comment #21)
> Antoine, what's the runtime feature we would need to change to disable the
> new web animations? WebKitGTK+ (and probably WPE too) is unusable due to the
> web animations crashes, I think we should disable them until the feature is
> more mature.

Are you seeing other animations crashes besides this crash? Are there bug reports?

(In reply to Antoine Quint from comment #22)
> You probably want to turn WebAnimationsCSSIntegrationEnabled off only.

It's already set to DEFAULT_EXPERIMENTAL_FEATURES_ENABLED, which is optimal IMO. It's even disabled in Epiphany Technology Preview, which explains why I'm not seeing this crash. So Carlos, you might want to ensure your build script uses -DENABLE_EXPERIMENTAL_FEATURES=OFF (the default).

If you really want it disabled in developer builds as well, you could add condition: PLATFORM(COCOA) to the setting in WebPreferences.yaml... but IMO that would not be good.

Anyway, I have had this std::optional madness near the top of my TODO list for several days. I agree we need to fix this ASAP, or we'll have to revert back to C++ 14.
Comment 24 Zan Dobersek 2018-06-25 10:26:07 PDT
(In reply to Carlos Garcia Campos from comment #13)
> (In reply to Antoine Quint from comment #10)
> > The change makes sense, I wonder why
> > fast/animation/css-animation-resuming-when-visible-with-style-change.html is
> > timing out on Mac now. It's been consistently timing out on GTK but has been
> > stable on other ports.
> 
> The problem, at least in GTK+, is that
> window.getComputedStyle(testDiv).transform doesn't change after resuming
> animations, even though I can see the animation happening. So,
> shouldBecomeEqual("internals.numberOfActiveAnimations()", "1",
> checkTransformAndFinishTest); never happens and the test doesn't finish.

The test doesn't time out in WPE, but I don't know the reason for that.

Crashes occur because of value retrieval on uninitialized std::optional<> objects that represent readyTime values. The Cocoa ports also operate on such uninitialized objects, but avoid crashes because the WTF's std::optional implementation intentionally doesn't trigger them. Instead, a zero value is used for computation, and that works as far as tests are concerned.

This access into uninitialized std::optional<> objects occurs in both runPendingPlayTask() and runPendingPauseTask() methods in WebAnimation. Causes are different.

With runPendingPlayTask(), in the tests where crashes are observed, issue occurs because the animations are suspended via DocumentTimeline::suspendAnimations() before the async pending play task is executed. Once suspended, the current time for that timeline is unresolved (i.e. std::nullopt). So the runPendingPlayTask() is called while the animations are suspended, nullopt readyTime value is retrieved, and things explode.

With runPendingPauseTask(), the crashes that are observed in tests are due to the timeline becoming inactive because the DocumentTimeline is detached from the owning Document instance. In WebAnimation::updatePendingTasks(), any pause task is posted to the Document. But the Document might soon after start being destroyed, and the DocumentTimeline gets detached in Document::prepareForDestruction(). This doesn't prevent task dispatch, and WebAnimation::runPendingPauseTask() can end up being dispatched with the DocumentTimeline in an inactive state. DocumentTimeline::currentTime() then ends up returning std::nullopt, and a crash occurs.

In both cases, WebAnimation ends up using DocumentTimeline as if it was guaranteed to be active, when in reality it isn't. It's not clear to me whether the current implementation correctly follows the specification when determining the 'ready time' value in both runPendingPlayTasks() and runPendingPauseTasks().

As a temporary solution, I would like the GTK and WPE ports to sync with the behavior of Cocoa ports by adjusting the readyTime value usage in runPendingPlayTask() and runPendingPauseTask() methods to fall back to a zero-Seconds value in case of an uninitialized std::optional<> instance. This would primarily avoid the crashes. Timeouts and spec compliance would be addressed later.
Comment 25 Michael Catanzaro 2018-06-25 11:06:50 PDT
(In reply to Zan Dobersek from comment #24)
> As a temporary solution, I would like the GTK and WPE ports to sync with the
> behavior of Cocoa ports by adjusting the readyTime value usage in
> runPendingPlayTask() and runPendingPauseTask() methods to fall back to a
> zero-Seconds value in case of an uninitialized std::optional<> instance.
> This would primarily avoid the crashes. Timeouts and spec compliance would
> be addressed later.

Yes
Comment 26 Carlos Garcia Campos 2018-06-26 00:03:24 PDT
(In reply to Michael Catanzaro from comment #25)
> (In reply to Zan Dobersek from comment #24)
> > As a temporary solution, I would like the GTK and WPE ports to sync with the
> > behavior of Cocoa ports by adjusting the readyTime value usage in
> > runPendingPlayTask() and runPendingPauseTask() methods to fall back to a
> > zero-Seconds value in case of an uninitialized std::optional<> instance.
> > This would primarily avoid the crashes. Timeouts and spec compliance would
> > be addressed later.
> 
> Yes

I've tried to avoid that because I thought that it might hide the actual problem and the bug will never be fixed if it works fine in mac (for whatever reason). But I'm tired of the crashes, so I guess we have no choice.
Comment 27 Zan Dobersek 2018-06-26 00:04:40 PDT
Created attachment 343591 [details]
WIP patch

EWS run of the proposed changes
Comment 28 EWS Watchlist 2018-06-26 00:06:38 PDT
Attachment 343591 [details] did not pass style-queue:


ERROR: Source/WebCore/ChangeLog:8:  You should remove the 'No new tests' and either add and list tests, or explain why no new tests were possible.  [changelog/nonewtests] [5]
Total errors found: 1 in 2 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 29 Carlos Garcia Campos 2018-06-26 00:14:41 PDT
Comment on attachment 343591 [details]
WIP patch

View in context: https://bugs.webkit.org/attachment.cgi?id=343591&action=review

> Source/WebCore/animation/WebAnimation.cpp:839
> +        auto newStartTime = readyTime.value_or(Seconds());

You can use 0_s instead of Seconds(). I've tested it on mac and it fixes the test.

> Source/WebCore/animation/WebAnimation.cpp:968
> +        setHoldTime((readyTime.value_or(Seconds()) - animationStartTime.value()) * m_playbackRate);

Ditto.
Comment 30 Zan Dobersek 2018-06-26 01:33:19 PDT
Created attachment 343596 [details]
Patch
Comment 31 Antoine Quint 2018-06-26 02:01:21 PDT
Since the timeline needs to be active, we ought to check at the call site that we have a valid timeline, and add an ASSERT() in the run calls to check that is indeed the case.

So in WebAnimation::updatePendingTasks() we should not only check for m_timeline but also m_timeline->currentTime() and in WebAnimation::runPendingPlayTask() add ASSERT(readyTime).

Would that cover everything or am I missing something?
Comment 32 Carlos Garcia Campos 2018-06-26 02:04:33 PDT
(In reply to Antoine Quint from comment #31)
> Since the timeline needs to be active, we ought to check at the call site
> that we have a valid timeline, and add an ASSERT() in the run calls to check
> that is indeed the case.
> 
> So in WebAnimation::updatePendingTasks() we should not only check for
> m_timeline but also m_timeline->currentTime() and in
> WebAnimation::runPendingPlayTask() add ASSERT(readyTime).
> 
> Would that cover everything or am I missing something?

Isn't that what my initial patch does?
Comment 33 Antoine Quint 2018-06-26 02:14:47 PDT
(In reply to Carlos Garcia Campos from comment #32)
> (In reply to Antoine Quint from comment #31)
> > Since the timeline needs to be active, we ought to check at the call site
> > that we have a valid timeline, and add an ASSERT() in the run calls to check
> > that is indeed the case.
> > 
> > So in WebAnimation::updatePendingTasks() we should not only check for
> > m_timeline but also m_timeline->currentTime() and in
> > WebAnimation::runPendingPlayTask() add ASSERT(readyTime).
> > 
> > Would that cover everything or am I missing something?
> 
> Isn't that what my initial patch does?

Precisely, I forgot, even though I even commented that this looked right. Sorry!

I'm actually working on a bigger patch that will change when the pending pause and play tasks will be performed and they will now be called from DocumentTimeline::updateAnimations() with the guarantee that the timeline is active.

I'm not sure what is the best course of action in the meantime.
Comment 34 Zan Dobersek 2018-06-26 02:37:34 PDT
(In reply to Antoine Quint from comment #33)
> (In reply to Carlos Garcia Campos from comment #32)
> > (In reply to Antoine Quint from comment #31)
> > > Since the timeline needs to be active, we ought to check at the call site
> > > that we have a valid timeline, and add an ASSERT() in the run calls to check
> > > that is indeed the case.
> > > 
> > > So in WebAnimation::updatePendingTasks() we should not only check for
> > > m_timeline but also m_timeline->currentTime() and in
> > > WebAnimation::runPendingPlayTask() add ASSERT(readyTime).
> > > 
> > > Would that cover everything or am I missing something?
> > 
> > Isn't that what my initial patch does?
> 
> Precisely, I forgot, even though I even commented that this looked right.
> Sorry!
> 

That approach causes fast/animation/css-animation-resuming-when-visible-with-style-change.html to timeout because of the pending play task being dispatched while the animations are suspended, effectively not executing the runPendingPlayTask() code.

> I'm actually working on a bigger patch that will change when the pending
> pause and play tasks will be performed and they will now be called from
> DocumentTimeline::updateAnimations() with the guarantee that the timeline is
> active.
> 
> I'm not sure what is the best course of action in the meantime.

Our issue isn't with the Web Animation implementation still being under development, it's just that because we've already switched over to C++17, we are suffering collateral effects like these std::optional crashes. Again, Cocoa ports are seeing the same behavior, the only difference being that the std::optional implementation in WTF (currently used by Cocoa ports due to still building in C++14 mode) doesn't trigger crashes upon invalid access.

The current patch adjusts these misuses while not addressing the root causes because we're not terribly familiar with the Web Animation spec, and it probably wouldn't help if we start posting patches that interfere with your work.

But we'd still like to see crashes gone. Besides these two, there's also bug #187036.
Comment 35 Carlos Garcia Campos 2018-06-26 03:53:32 PDT
Comment on attachment 343596 [details]
Patch

Yes, let's fix the crashes for now with the FIXME. Thanks!
Comment 36 Zan Dobersek 2018-06-26 05:30:21 PDT
Comment on attachment 343596 [details]
Patch

Clearing flags on attachment: 343596

Committed r233196: <https://trac.webkit.org/changeset/233196>
Comment 37 Zan Dobersek 2018-06-26 05:30:26 PDT
All reviewed patches have been landed.  Closing bug.
Comment 38 Radar WebKit Bug Importer 2018-06-26 05:32:57 PDT
<rdar://problem/41467282>