Bug 198298 - REGRESSION (r245756) [Mac] 2 TestWebKitAPI.DownloadProgress* and TestWebKitAPI._WKDownload.DownloadMonitorCancel are flaky timeouts
Summary: REGRESSION (r245756) [Mac] 2 TestWebKitAPI.DownloadProgress* and TestWebKitAP...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKit API (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Nobody
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2019-05-28 09:52 PDT by Shawn Roberts
Modified: 2019-05-30 12:53 PDT (History)
12 users (show)

See Also:


Attachments
Patch (12.12 KB, patch)
2019-05-29 18:35 PDT, David Quesada
ap: review+
Details | Formatted Diff | Diff
Patch for landing (12.12 KB, patch)
2019-05-30 10:49 PDT, David Quesada
david_quesada: commit-queue-
Details | Formatted Diff | Diff
Patch for landing, v2 (12.73 KB, patch)
2019-05-30 11:35 PDT, David Quesada
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Shawn Roberts 2019-05-28 09:52:42 PDT
It appears that after changes in r245756 the following API tests have started to become flaky timeouts.

This API will timeout almost every run on the WK2 and WK1 testers:

TestWebKitAPI.DownloadProgress.LoseProgressWhenDownloadIsCanceled 

https://build.webkit.org/builders/Apple%20Mojave%20Release%20WK1%20%28Tests%29/builds/5157/steps/run-api-tests/logs/stdio

These API tests will timeout flakily on WK1 testers:

TestWebKitAPI.DownloadProgress.CancelDownloadWhenProgressIsCanceled
TestWebKitAPI._WKDownload.DownloadMonitorCancel

https://build.webkit.org/builders/Apple%20Mojave%20Release%20WK1%20%28Tests%29/builds/5158/steps/run-api-tests/logs/stdio

I have tried running individual API tests and cannot get a timeout. I have also tried running the entire API test suite a few times and cannot get any timeouts.
Comment 1 Radar WebKit Bug Importer 2019-05-28 09:53:30 PDT
<rdar://problem/51182393>
Comment 2 Aakash Jain 2019-05-29 06:06:56 PDT
(In reply to Shawn Roberts from comment #0)
> It appears that after changes in r245756 the following API tests have started to become flaky timeouts.
The timing of when we started seeing these failures matches very well with this commit. Also r245756 seems to modify seems area of the code (DownloadCocoa.mm) as the tests. Maybe we can revert this commit?
 
> I have tried running individual API tests and cannot get a timeout. I have also tried running the entire API test suite a few times and cannot get any timeouts.
Maybe worth trying on one of the mac EWS bots (ews150, ews153, ews155, ews119).
Comment 3 Ryan Haddad 2019-05-29 13:55:36 PDT
TestWebKitAPI.DownloadProgress.CancelDownloadWhenProgressIsCanceled is asserting on Mojave debug:

    TestWebKitAPI.DownloadProgress.CancelDownloadWhenProgressIsCanceled
        _RegisterApplication(), FAILED TO establish the default connection to the WindowServer, _CGSDefaultConnection() is NULL.
        ASSERTION FAILED: m_downloads.contains(download.downloadID())
        SHOULD NEVER BE REACHED
        /Volumes/Data/slave/mojave-debug/build/Source/WebKit/NetworkProcess/Downloads/DownloadManager.cpp(174) : void WebKit::DownloadManager::downloadFinished(WebKit::Download &)
        /Volumes/Data/slave/mojave-debug/build/WebKitBuild/Debug/DerivedSources/WebKit2/NetworkProcessProxyMessageReceiver.cpp(314) : void WebKit::NetworkProcessProxy::didReceiveNetworkProcessProxyMessage(IPC::Connection &, IPC::Decoder &)

https://build.webkit.org/builders/Apple%20Mojave%20Debug%20WK2%20(Tests)/builds/2782/steps/run-api-tests/logs/stdio

I can reproduce locally with a debug build and the following command:
run-api-tests DownloadProgress.LoseProgressWhenDownloadIsCanceled --iterations 25
Comment 4 Ryan Haddad 2019-05-29 14:04:39 PDT
I cannot reproduce the assertion failure with a build of r245753, but I can with a build of r245756.
Comment 5 David Quesada 2019-05-29 18:35:07 PDT
Created attachment 370906 [details]
Patch
Comment 6 Alexey Proskuryakov 2019-05-30 09:31:06 PDT
Comment on attachment 370906 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=370906&action=review

> Source/WebKit/NetworkProcess/cocoa/NetworkSessionCocoa.mm:629
> +                auto resumeDataReference = [resumeData isKindOfClass:[NSData class]] ? IPC::DataReference { static_cast<const uint8_t*>(resumeData.bytes), resumeData.length } : IPC::DataReference { };

Silently failing to use resumeData seems quite unfortunate. What are we losing today, and what will make resuming fail in the future? We should at least assert that resumeData is one of the types known today.
Comment 7 David Quesada 2019-05-30 10:37:14 PDT
(In reply to Alexey Proskuryakov from comment #6)
> Comment on attachment 370906 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=370906&action=review
> 
> > Source/WebKit/NetworkProcess/cocoa/NetworkSessionCocoa.mm:629
> > +                auto resumeDataReference = [resumeData isKindOfClass:[NSData class]] ? IPC::DataReference { static_cast<const uint8_t*>(resumeData.bytes), resumeData.length } : IPC::DataReference { };
> 
> Silently failing to use resumeData seems quite unfortunate.

I'm not sure I follow what you mean here. Are you referring to the case where resumeData is not an NSData?

> What are we losing today, and what will make resuming fail in the future?

I'm not aware of any cases where the resumeData will be non-nil, non-NSData (if that's what you mean by what we're "losing"). Though the type check a bit paranoid, I don't necessarily object to it since we're working with an untyped dictionary.

> We should at least assert that resumeData is one of the types known today.

Sure. I could add an ASSERT(!resumeData || [resumeData isKindOfClass:[NSData class]]) in this method, if that's what you mean.
Comment 8 David Quesada 2019-05-30 10:49:34 PDT
Created attachment 370955 [details]
Patch for landing
Comment 9 Alexey Proskuryakov 2019-05-30 10:51:26 PDT
Yes. Please consider adding release log too, so that we have a chance of knowing what's causing a problem with resuming in release builds too.
Comment 10 David Quesada 2019-05-30 11:35:54 PDT
Created attachment 370960 [details]
Patch for landing, v2

Added a release log message if the resume data is an unexpected class.
Comment 11 WebKit Commit Bot 2019-05-30 12:53:03 PDT
Comment on attachment 370960 [details]
Patch for landing, v2

Clearing flags on attachment: 370960

Committed r245901: <https://trac.webkit.org/changeset/245901>
Comment 12 WebKit Commit Bot 2019-05-30 12:53:05 PDT
All reviewed patches have been landed.  Closing bug.