Bug 198298

Summary: REGRESSION (r245756) [Mac] 2 TestWebKitAPI.DownloadProgress* and TestWebKitAPI._WKDownload.DownloadMonitorCancel are flaky timeouts
Product: WebKit Reporter: Shawn Roberts <sroberts>
Component: WebKit APIAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: aakash_jain, achristensen, ap, beidson, cdumez, commit-queue, david_quesada, jlewis3, rniwa, ryanhaddad, tsavell, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
See Also: https://bugs.webkit.org/show_bug.cgi?id=198288
https://bugs.webkit.org/show_bug.cgi?id=197927
Attachments:
Description Flags
Patch
ap: review+
Patch for landing
david_quesada: commit-queue-
Patch for landing, v2 none

Description Shawn Roberts 2019-05-28 09:52:42 PDT
It appears that after changes in r245756 the following API tests have started to become flaky timeouts.

This API will timeout almost every run on the WK2 and WK1 testers:

TestWebKitAPI.DownloadProgress.LoseProgressWhenDownloadIsCanceled 

https://build.webkit.org/builders/Apple%20Mojave%20Release%20WK1%20%28Tests%29/builds/5157/steps/run-api-tests/logs/stdio

These API tests will timeout flakily on WK1 testers:

TestWebKitAPI.DownloadProgress.CancelDownloadWhenProgressIsCanceled
TestWebKitAPI._WKDownload.DownloadMonitorCancel

https://build.webkit.org/builders/Apple%20Mojave%20Release%20WK1%20%28Tests%29/builds/5158/steps/run-api-tests/logs/stdio

I have tried running individual API tests and cannot get a timeout. I have also tried running the entire API test suite a few times and cannot get any timeouts.
Comment 1 Radar WebKit Bug Importer 2019-05-28 09:53:30 PDT
<rdar://problem/51182393>
Comment 2 Aakash Jain 2019-05-29 06:06:56 PDT
(In reply to Shawn Roberts from comment #0)
> It appears that after changes in r245756 the following API tests have started to become flaky timeouts.
The timing of when we started seeing these failures matches very well with this commit. Also r245756 seems to modify seems area of the code (DownloadCocoa.mm) as the tests. Maybe we can revert this commit?
 
> I have tried running individual API tests and cannot get a timeout. I have also tried running the entire API test suite a few times and cannot get any timeouts.
Maybe worth trying on one of the mac EWS bots (ews150, ews153, ews155, ews119).
Comment 3 Ryan Haddad 2019-05-29 13:55:36 PDT
TestWebKitAPI.DownloadProgress.CancelDownloadWhenProgressIsCanceled is asserting on Mojave debug:

    TestWebKitAPI.DownloadProgress.CancelDownloadWhenProgressIsCanceled
        _RegisterApplication(), FAILED TO establish the default connection to the WindowServer, _CGSDefaultConnection() is NULL.
        ASSERTION FAILED: m_downloads.contains(download.downloadID())
        SHOULD NEVER BE REACHED
        /Volumes/Data/slave/mojave-debug/build/Source/WebKit/NetworkProcess/Downloads/DownloadManager.cpp(174) : void WebKit::DownloadManager::downloadFinished(WebKit::Download &)
        /Volumes/Data/slave/mojave-debug/build/WebKitBuild/Debug/DerivedSources/WebKit2/NetworkProcessProxyMessageReceiver.cpp(314) : void WebKit::NetworkProcessProxy::didReceiveNetworkProcessProxyMessage(IPC::Connection &, IPC::Decoder &)

https://build.webkit.org/builders/Apple%20Mojave%20Debug%20WK2%20(Tests)/builds/2782/steps/run-api-tests/logs/stdio

I can reproduce locally with a debug build and the following command:
run-api-tests DownloadProgress.LoseProgressWhenDownloadIsCanceled --iterations 25
Comment 4 Ryan Haddad 2019-05-29 14:04:39 PDT
I cannot reproduce the assertion failure with a build of r245753, but I can with a build of r245756.
Comment 5 David Quesada 2019-05-29 18:35:07 PDT
Created attachment 370906 [details]
Patch
Comment 6 Alexey Proskuryakov 2019-05-30 09:31:06 PDT
Comment on attachment 370906 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=370906&action=review

> Source/WebKit/NetworkProcess/cocoa/NetworkSessionCocoa.mm:629
> +                auto resumeDataReference = [resumeData isKindOfClass:[NSData class]] ? IPC::DataReference { static_cast<const uint8_t*>(resumeData.bytes), resumeData.length } : IPC::DataReference { };

Silently failing to use resumeData seems quite unfortunate. What are we losing today, and what will make resuming fail in the future? We should at least assert that resumeData is one of the types known today.
Comment 7 David Quesada 2019-05-30 10:37:14 PDT
(In reply to Alexey Proskuryakov from comment #6)
> Comment on attachment 370906 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=370906&action=review
> 
> > Source/WebKit/NetworkProcess/cocoa/NetworkSessionCocoa.mm:629
> > +                auto resumeDataReference = [resumeData isKindOfClass:[NSData class]] ? IPC::DataReference { static_cast<const uint8_t*>(resumeData.bytes), resumeData.length } : IPC::DataReference { };
> 
> Silently failing to use resumeData seems quite unfortunate.

I'm not sure I follow what you mean here. Are you referring to the case where resumeData is not an NSData?

> What are we losing today, and what will make resuming fail in the future?

I'm not aware of any cases where the resumeData will be non-nil, non-NSData (if that's what you mean by what we're "losing"). Though the type check a bit paranoid, I don't necessarily object to it since we're working with an untyped dictionary.

> We should at least assert that resumeData is one of the types known today.

Sure. I could add an ASSERT(!resumeData || [resumeData isKindOfClass:[NSData class]]) in this method, if that's what you mean.
Comment 8 David Quesada 2019-05-30 10:49:34 PDT
Created attachment 370955 [details]
Patch for landing
Comment 9 Alexey Proskuryakov 2019-05-30 10:51:26 PDT
Yes. Please consider adding release log too, so that we have a chance of knowing what's causing a problem with resuming in release builds too.
Comment 10 David Quesada 2019-05-30 11:35:54 PDT
Created attachment 370960 [details]
Patch for landing, v2

Added a release log message if the resume data is an unexpected class.
Comment 11 WebKit Commit Bot 2019-05-30 12:53:03 PDT
Comment on attachment 370960 [details]
Patch for landing, v2

Clearing flags on attachment: 370960

Committed r245901: <https://trac.webkit.org/changeset/245901>
Comment 12 WebKit Commit Bot 2019-05-30 12:53:05 PDT
All reviewed patches have been landed.  Closing bug.