WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
Bug 201550
REGRESSION: http/tests/resourceLoadStatistics/do-not-capture-statistics-for-simple-top-navigations.html is frequently timing out on iOS EWS bots
https://bugs.webkit.org/show_bug.cgi?id=201550
Summary
REGRESSION: http/tests/resourceLoadStatistics/do-not-capture-statistics-for-s...
Ryan Haddad
Reported
2019-09-06 11:00:31 PDT
http/tests/resourceLoadStatistics/do-not-capture-statistics-for-simple-top-navigations.html is frequently failing on iOS EWS bots and resulting in false positives
https://ews-build.webkit.org/#/builders/18
Attachments
Patch
(11.56 KB, patch)
2019-09-06 15:53 PDT
,
Chris Dumez
no flags
Details
Formatted Diff
Diff
Patch
(12.84 KB, patch)
2019-09-09 19:01 PDT
,
Chris Dumez
no flags
Details
Formatted Diff
Diff
Show Obsolete
(1)
View All
Add attachment
proposed patch, testcase, etc.
Ryan Haddad
Comment 1
2019-09-06 11:01:57 PDT
The test was a flaky timeout on trunk bots until 9/4/19, but it hasn't appeared again since then:
https://webkit-test-results.webkit.org/dashboards/flakiness_dashboard.html#tests=http%2Ftests%2FresourceLoadStatistics%2Fdo-not-capture-statistics-for-simple-top-navigations.html
Ryan Haddad
Comment 2
2019-09-06 11:03:47 PDT
Looking at the last 1000 EWS runs, the first timeout I see was 14 days ago:
https://ews-build.webkit.org/#/builders/18/builds/2379
...then the next one was 7 days ago:
https://ews-build.webkit.org/#/builders/18/builds/2803
Since then, it has been timing out very frequently.
Ryan Haddad
Comment 3
2019-09-06 11:06:28 PDT
The patch that presented the failure 7 days ago was
Bug 201344
[PSON] We no longer need to throw away suspended pages in a process before using it for a navigation That did introduce a few flaky adClickAttribution test timeouts, documented in:
https://bugs.webkit.org/show_bug.cgi?id=201440
That issue was fixed yesterday with
https://trac.webkit.org/changeset/249557
Ryan Haddad
Comment 4
2019-09-06 11:13:32 PDT
Chris did mention that his change did change the timing of things, so it could possibly make things more / less flaky. I'm still not sure why we don't see this on trunk bots. I'm not sure if this could have something to do with test order. We use '--skip-failing-tests' on EWS and also retry failures, so that could explain the different results.
Ryan Haddad
Comment 5
2019-09-06 11:46:25 PDT
I'm trying to reproduce this locally.
Tim Horton
Comment 6
2019-09-06 12:09:27 PDT
It reproduced locally for me; it didn't seem like the process was hung, just ... not doing anything.
Chris Dumez
Comment 7
2019-09-06 13:29:18 PDT
(In reply to Tim Horton from
comment #6
)
> It reproduced locally for me; it didn't seem like the process was hung, just > ... not doing anything.
It reproduces after
https://trac.webkit.org/changeset/249557
?
Tim Horton
Comment 8
2019-09-06 13:43:32 PDT
Oh, hmm! I definitely didn't have that.
Ryan Haddad
Comment 9
2019-09-06 14:27:31 PDT
I can reproduce it with
r249575
, so
r249557
did not resolve the issue with this test.
Chris Dumez
Comment 10
2019-09-06 14:28:40 PDT
(In reply to Ryan Haddad from
comment #9
)
> I can reproduce it with
r249575
, so
r249557
did not resolve the issue with > this test.
Ok, and do we think this regressed after
r249350
?
Ryan Haddad
Comment 11
2019-09-06 14:37:02 PDT
(In reply to Chris Dumez from
comment #10
)
> (In reply to Ryan Haddad from
comment #9
) > > I can reproduce it with
r249575
, so
r249557
did not resolve the issue with > > this test. > > Ok, and do we think this regressed after
r249350
?
I confirmed that I cannot reproduce it with
r249349
, but I can with
r249350
.
Chris Dumez
Comment 12
2019-09-06 14:47:48 PDT
(In reply to Ryan Haddad from
comment #11
)
> (In reply to Chris Dumez from
comment #10
) > > (In reply to Ryan Haddad from
comment #9
) > > > I can reproduce it with
r249575
, so
r249557
did not resolve the issue with > > > this test. > > > > Ok, and do we think this regressed after
r249350
? > I confirmed that I cannot reproduce it with
r249349
, but I can with
r249350
.
Ok then, I will investigate soon.
Chris Dumez
Comment 13
2019-09-06 15:00:30 PDT
(In reply to Chris Dumez from
comment #12
)
> (In reply to Ryan Haddad from
comment #11
) > > (In reply to Chris Dumez from
comment #10
) > > > (In reply to Ryan Haddad from
comment #9
) > > > > I can reproduce it with
r249575
, so
r249557
did not resolve the issue with > > > > this test. > > > > > > Ok, and do we think this regressed after
r249350
? > > I confirmed that I cannot reproduce it with
r249349
, but I can with
r249350
. > > Ok then, I will investigate soon.
Looking at this now, it does not seem to drop any IPC, nor does it seem to leverage the code paths impacted by
r249350
.
Chris Dumez
Comment 14
2019-09-06 15:04:15 PDT
The test seems to rely on TestRunner::installStatisticsDidScanDataRecordsCallback() / TestRunner::statisticsDidScanDataRecordsCallback(). Digging into it now.
Ryan Haddad
Comment 15
2019-09-06 15:20:16 PDT
Chris was able to reproduce the timeout by just running the test by itself. I was running the whole resourceLoadStatistics directory. I tried running the one test with
r249349
and still got a timeout, so perhaps something else is going on.
Chris Dumez
Comment 16
2019-09-06 15:28:07 PDT
The test calls TestRunner::statisticsNotifyObserver() but this ends up being a no-op because there are no statistics in the ResourceLoadObserver to send to the NetworkProcess. As a result, the test ends up waiting for a notification that will never arrive. This has nothing to do with my change. I guess that when you run it with other tests, there may be leftover statistics in the ResourceLaodObserver to send and this is why the test usually passes when not run by itself.
Chris Dumez
Comment 17
2019-09-06 15:37:27 PDT
(In reply to Chris Dumez from
comment #16
)
> The test calls TestRunner::statisticsNotifyObserver() but this ends up being > a no-op because there are no statistics in the ResourceLoadObserver to send > to the NetworkProcess. As a result, the test ends up waiting for a > notification that will never arrive. > > This has nothing to do with my change. I guess that when you run it with > other tests, there may be leftover statistics in the ResourceLaodObserver to > send and this is why the test usually passes when not run by itself.
Potential regression from
https://trac.webkit.org/r249056
. Before
r249056
, the call to statisticsNotifyObserver() would not have been a no-op I think, since we would have actually notified the WebResourceLoadStatistics store (providing it an empty HashMap).
Chris Dumez
Comment 18
2019-09-06 15:53:58 PDT
Created
attachment 378247
[details]
Patch
Chris Dumez
Comment 19
2019-09-06 15:54:36 PDT
Not sure if this is the best way to address the issue. John may have a better idea as he knows this code / test better.
Ryan Haddad
Comment 20
2019-09-06 16:36:13 PDT
Marked the test as flaky in
https://trac.webkit.org/changeset/249595/webkit
to speed up EWS.
Radar WebKit Bug Importer
Comment 21
2019-09-06 16:36:51 PDT
<
rdar://problem/55133261
>
Alex Christensen
Comment 22
2019-09-07 09:18:14 PDT
Comment on
attachment 378247
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=378247&action=review
> LayoutTests/http/tests/resourceLoadStatistics/do-not-capture-statistics-for-simple-top-navigations.html:37 > + timerHandle = setTimeout(finishTest, 100);
Is 100ms enough? What does this need to wait for?
Chris Dumez
Comment 23
2019-09-07 10:57:26 PDT
(In reply to Alex Christensen from
comment #22
)
> Comment on
attachment 378247
[details]
> Patch > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=378247&action=review
> > > LayoutTests/http/tests/resourceLoadStatistics/do-not-capture-statistics-for-simple-top-navigations.html:37 > > + timerHandle = setTimeout(finishTest, 100); > > Is 100ms enough? What does this need to wait for?
It does not need to wait for anything. There is supposed to be no statistics, this is just making sure that no new statistics come in. We are testing that something does not happen (I.e something is not captured). So no flakiness issue with a fairly short timer.
John Wilander
Comment 24
2019-09-09 15:59:55 PDT
Comment on
attachment 378247
[details]
Patch LGTM.
Chris Dumez
Comment 25
2019-09-09 19:01:02 PDT
Created
attachment 378430
[details]
Patch
WebKit Commit Bot
Comment 26
2019-09-09 21:06:48 PDT
Comment on
attachment 378430
[details]
Patch Clearing flags on attachment: 378430 Committed
r249702
: <
https://trac.webkit.org/changeset/249702
>
WebKit Commit Bot
Comment 27
2019-09-09 21:06:50 PDT
All reviewed patches have been landed. Closing bug.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug