Bug 23865 - Safari can be frozen by rapidly adding timers
Summary: Safari can be frozen by rapidly adding timers
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: 528+ (Nightly build)
Hardware: Mac OS X 10.5
: P1 Normal
Assignee: Dmitry Titov
URL:
Keywords:
: 28455 (view as bug list)
Depends on:
Blocks:
 
Reported: 2009-02-10 03:42 PST by Alexey Proskuryakov
Modified: 2009-09-04 21:02 PDT (History)
5 users (show)

See Also:


Attachments
Proposed patch (7.13 KB, patch)
2009-02-11 19:30 PST, Dmitry Titov
no flags Details | Formatted Diff | Diff
Path to fix UI freeze on long timer queue. (7.59 KB, patch)
2009-08-20 23:39 PDT, Dmitry Titov
abarth: review-
Details | Formatted Diff | Diff
Test file - load it into browser and see a UI freeze for 10 seconds. (1.81 KB, text/html)
2009-09-02 17:17 PDT, Dmitry Titov
no flags Details
Updated patch. (10.19 KB, patch)
2009-09-02 17:19 PDT, Dmitry Titov
levin: review+
levin: commit-queue-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Alexey Proskuryakov 2009-02-10 03:42:10 PST
Follow-up from bug 23705.

The below code freezes Safari completely for me:

---------------------
function test()
{
  setTimeout(test, 0);
  setTimeout(test, 0);
}
setTimeout(test, 0);
---------------------
Comment 1 Darin Adler 2009-02-10 08:52:51 PST
Has this been true in the past, or is this something that recently changed?
Comment 2 Alexey Proskuryakov 2009-02-10 09:26:39 PST
This is not a regression from 3.2.1 - I do not know about earlier versions.
Comment 3 Dmitry Titov 2009-02-10 11:08:10 PST
From looking at the code, it can be happening for a long time. Timers are implemented using CFRunLoopAddTimer. According to Apple docs, the run loop fetches the timers first and input second. So if there is always a timer ready to be scheduled/fired, no input goes through.

This script also freezes FF2 and IE7.
Comment 4 Darin Adler 2009-02-10 12:38:48 PST
(In reply to comment #3)
> From looking at the code, it can be happening for a long time. Timers are
> implemented using CFRunLoopAddTimer. According to Apple docs, the run loop
> fetches the timers first and input second. So if there is always a timer ready
> to be scheduled/fired, no input goes through.

If that analysis is correct, then we do not need two timers in the test() function to make this happen, as we don't need exponential growth to have the problem. I suspect that's not the entire story.
Comment 5 Dmitry Titov 2009-02-10 20:28:16 PST
Sorry, I should have done more experimenting before floating a theory above.

The code Alexey used to file this bug can generate a lot of timers. Even though each of the invocations only creates 2 more and exits, the overall time in ThreadTimers::sharedTimerFiredInternal() grows exponentially. I've added printing of timer count and overall time spent there, here is the result:
** 2 timers, took 0.4 ms
** 10 timers, took 0.6 ms
** 14 timers, took 0.8 ms
** 16 timers, took 1.2 ms
** 26 timers, took 1.6 ms
** 31 timers, took 1.9 ms
** 20 timers, took 1.5 ms
** 22 timers, took 1.4 ms
** 21 timers, took 1.4 ms
** 45 timers, took 2.9 ms
** 81 timers, took 5.5 ms
** 145 timers, took 10.0 ms
** 285 timers, took 19.8 ms
** 546 timers, took 38.2 ms
** 1086 timers, took 78.6 ms
** 2159 timers, took 152.8 ms
** 4295 timers, took 301.6 ms
** 8611 timers, took 625.8 ms
** 17236 timers, took 1254.7 ms
** 34440 timers, took 2546.1 ms
** 69056 timers, took 5250.1 ms
** 138270 timers, took 10695.5 ms
** 276547 timers, took 22173.0 ms

Each time the shared timer is fired, we go through collection of all timers and find out that they all are to be fired - so we loop through all and it takes ever-increasing time. As you can see, very quickly the time spent without returning to the run loop grows into seconds - so the beach ball appears and input events becomes very slow. It seems the input events such as mouse moves/clicks are accumulated and then dispatched in small portions between timer firing. So the user input would need several 'turns' with the timer to move the pointer/highlight the target/process the click. With growing interval between 'turns' it becomes nearly impossible.

Note that timeout mechanism that measures if a piece of script runs too long and aborts it if so (preventing "while(true) {}" from freezing UI) would not help here because each particular JS callback is short. It seems that if we need to fix this, we need to measure how long we are executing timers and break out of the loop with posting another timer when we run them for too long.

I'll have a patch soon.
Comment 6 Dmitry Titov 2009-02-11 19:30:13 PST
Created attachment 27585 [details]
Proposed patch

This patch fires timers one by one, measuring elapsed time. If too much time elapses, it quits firing loop and reschedules. This allows thread run loop to run. Using the JS in description, it now does not freeze UI.

However, if the script continues to run, the timer heap grows. If run for long time, one of 2 will happen:

- user closes the tab and we freeze UI for a while removing millions of timers from timer heap
- OOM and crash

Do we need a variant of "Slow Script" dialog in case the page tries to create too many outstanding timers?
Comment 7 Dmitry Titov 2009-02-11 19:33:17 PST
btw, I think that even separate from how the eventual OOM is handled, the proposed patch makes sense because it will help reduce number of cases when UI freezes, simplifies code and hopefully doesn't make something worse.
Comment 8 Dmitry Titov 2009-04-13 10:52:03 PDT
Comment on attachment 27585 [details]
Proposed patch

Removing r? since this needs a 'slow script' kind of detection too.
Comment 9 Dmitry Titov 2009-08-20 23:39:50 PDT
Created attachment 38364 [details]
Path to fix UI freeze on long timer queue.

Fix UI freezing by a long timer queue.

The patch addresses the case when UI gets  frozen because there are too many
timers in the queue and the CPU can not process all of them quickly enough and
starves the user input.
This may be more pronounced on platforms like cellphones. The patch also
simplifies the code a little - instead of taking all the ready timers out of
the heap and then processing the copy, it takes them one by one.

The recursive multiplication of timers that eventually causes out-of-memory
condition should be addressed, if feasible, in a separate patch - since it
likely involves some sort of limit on resources. If we need to split this bug,
I can do that.
Comment 10 Dmitry Titov 2009-08-21 00:16:08 PDT
*** Bug 28455 has been marked as a duplicate of this bug. ***
Comment 11 Adam Barth 2009-09-01 16:25:44 PDT
Comment on attachment 38364 [details]
Path to fix UI freeze on long timer queue.

+ static const int maxDurationOfFiringTimers = 0.050;

This code makes no sense.  This will just get rounded to zero.

Also, can we write a test for this?  Maybe have DRT queue up an event while firing a lot of timers?
Comment 12 Dmitry Titov 2009-09-02 17:17:12 PDT
Created attachment 38950 [details]
Test file - load it into browser and see a UI freeze for 10 seconds.

This is a manual test file reproducing the problem. The same file will be included into the patch, for WebCore/manula-test. It doesn't seem possible to create a DRT-based regular test because eventSender object that DRT uses to inject input does direct dispatching of the messages which makes it impossible to reproduce the freeze.

Load the file into browser - it will freeze it for about 10 seconds by creating appropriate amount of timers.
Comment 13 Dmitry Titov 2009-09-02 17:19:33 PDT
Created attachment 38952 [details]
Updated patch.

Adam, thanks for taking a look!

I think I've replaced a literal constant with a named one at the last moment... 
Updated patch - this time with manual test file (can't create a regular test for this).
Comment 14 David Levin 2009-09-04 18:06:18 PDT
Comment on attachment 38952 [details]
Updated patch.

This is basically equivalent to what was there before with the addition of a maximum amount of time to process the timers.

The removal of m_timersReadyToFire may cause a newly added timer to callback nearly immediately, but this should only happen for at most a millisecond (depending on the resolution of currentTime is that low) until currentTime() advances, so that seems fine.

A few nits below to fix on landing.

> diff --git a/WebCore/ChangeLog b/WebCore/ChangeLog
> +2009-09-02  Dmitry Titov  <dimich@chromium.org>
> +
> +        Reviewed by NOBODY (OOPS!).
> +

> +        https://bugs.webkit.org/show_bug.cgi?id=23865
> +
> +        Prevent UI freeze in case when too many timers are in the timer queue.
> +        The fix measures the elapsed time while executing timers. If we have too many

> +        timers and it take significant time to fire, quit the loop and reschedule.
"timers and it take*s*"

> +        This lets run loop to process user input (close the window for example).

This lets the run loop process user input (close the window for example).

> +        (WebCore::TimerBase::setNextFireTime):
> +        Since timers are now fired one by one, there is no need to keep track of updated timers.
> +        * manual-tests/input-starved-by-timers.html: manual test that attempts to freeze browser by
*M*anual test that...


> diff --git a/WebCore/manual-tests/input-starved-by-timers.html b/WebCore/manual-tests/input-starved-by-timers.html
> +var targetLatency = 10000; // multiply timers until it takes this much to fire all their callbacks.
*M*ultiply


> +    // Create more timers, capture the current time so when callbacks are fired,
Create more timers. Capture the ...


> +function runTest() {
> +  log("Freezing UI...");
> +  setTimeout(function() { timerCallback(new Date().getTime()); }, 0);
> +  setTimeout("multiplyFactor = 0; log('Finishing. Started to drain timers.');", 10000);

It would be nice to stick with the 4 space indent that you did above.

> diff --git a/WebCore/platform/ThreadTimers.cpp b/WebCore/platform/ThreadTimers.cpp
>  namespace WebCore {
>  
> +// Fire timers until this time, then quit to let the run loop to process user input events.

Consider:
  Fire timers for this length of time, and then quit to let the run loop process user input events.

> +// 100ms is about perceptable delay in UI, have a half of ot as a threshold.

Consider:
  100ms is about a perceptible delay in UI, so use have a half of that as a threshold.

> +// This is to prevent UI freeze when there are too much timers or machine performance is low.

s/much/many/

> +    while (!m_timerHeap.isEmpty() && m_timerHeap.first()->m_nextFireTime <= fireTime) {
...
> +        // Catch the case where the timer asked timers to fire in a nested event loop, or we are over time limit.
> +        if (!m_firingTimers || timeToQuit < currentTime())

Consider making this part of the "while" condition (of course that means the loop would never execute if timeToQuit > currentTime() at the beginning of the loop. 

On further consideration, I think this should remain here because that way we ensure that at least one timer got fired. (I had a hard time seeing how that could happen but maybe there is a slow/loaded machine...)
Comment 15 Dmitry Titov 2009-09-04 20:59:16 PDT
(In reply to comment #14)
> 
> A few nits below to fix on landing.
> 
> > diff --git a/WebCore/ChangeLog b/WebCore/ChangeLog
> > +2009-09-02  Dmitry Titov  <dimich@chromium.org>
> > +
> > +        Reviewed by NOBODY (OOPS!).
> > +
> 
> > +        https://bugs.webkit.org/show_bug.cgi?id=23865
> > +
> > +        Prevent UI freeze in case when too many timers are in the timer queue.
> > +        The fix measures the elapsed time while executing timers. If we have too many
> 
> > +        timers and it take significant time to fire, quit the loop and reschedule.
> "timers and it take*s*"
> 
> > +        This lets run loop to process user input (close the window for example).
> 
> This lets the run loop process user input (close the window for example).
> 
> > +        (WebCore::TimerBase::setNextFireTime):
> > +        Since timers are now fired one by one, there is no need to keep track of updated timers.
> > +        * manual-tests/input-starved-by-timers.html: manual test that attempts to freeze browser by
> *M*anual test that...
> 
> 
> > diff --git a/WebCore/manual-tests/input-starved-by-timers.html b/WebCore/manual-tests/input-starved-by-timers.html
> > +var targetLatency = 10000; // multiply timers until it takes this much to fire all their callbacks.
> *M*ultiply
> 
> 
> > +    // Create more timers, capture the current time so when callbacks are fired,
> Create more timers. Capture the ...
> 
> 
> > +function runTest() {
> > +  log("Freezing UI...");
> > +  setTimeout(function() { timerCallback(new Date().getTime()); }, 0);
> > +  setTimeout("multiplyFactor = 0; log('Finishing. Started to drain timers.');", 10000);
> 
> It would be nice to stick with the 4 space indent that you did above.
> 
> > diff --git a/WebCore/platform/ThreadTimers.cpp b/WebCore/platform/ThreadTimers.cpp
> >  namespace WebCore {
> >  
> > +// Fire timers until this time, then quit to let the run loop to process user input events.
> 
> Consider:
>   Fire timers for this length of time, and then quit to let the run loop
> process user input events.
> 
> > +// 100ms is about perceptable delay in UI, have a half of ot as a threshold.
> 
> Consider:
>   100ms is about a perceptible delay in UI, so use have a half of that as a
> threshold.
> 
> > +// This is to prevent UI freeze when there are too much timers or machine performance is low.
> 
> s/much/many/
> 
> > +    while (!m_timerHeap.isEmpty() && m_timerHeap.first()->m_nextFireTime <= fireTime) {
> ...
> > +        // Catch the case where the timer asked timers to fire in a nested event loop, or we are over time limit.
> > +        if (!m_firingTimers || timeToQuit < currentTime())
> 
> Consider making this part of the "while" condition (of course that means the
> loop would never execute if timeToQuit > currentTime() at the beginning of the
> loop. 
> 
> On further consideration, I think this should remain here because that way we
> ensure that at least one timer got fired. (I had a hard time seeing how that
> could happen but maybe there is a slow/loaded machine...)
Comment 16 Dmitry Titov 2009-09-04 21:02:00 PDT
Ugh, previous entry was submitted by mistake. 

Fixed up and landed: http://trac.webkit.org/changeset/48086

I agree it's better to keep the exit check at the end of the loop, if only for better readability of that loop.