Bug 201822 - REGRESSION (STP): Erratic, then persistent page load failures (deleting disk cache blobs fixes it)
Summary: REGRESSION (STP): Erratic, then persistent page load failures (deleting disk ...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Page Loading
Version: Safari Technology Preview
Hardware: Mac macOS 10.14
Importance: P2 Normal
Assignee: Alex Christensen
URL:
Keywords: InRadar
Depends on: 203902
Blocks:
Reported: 2019-09-16 09:36 PDT by Jack Wellborn
Modified: 2020-04-02 06:48 PDT
CC List: 18 users

See Also:


Attachments
Apple.com fails load (246.48 KB, image/png)
2019-09-25 06:32 PDT, Jack Wellborn
Apple.com in private browsing loads fine (570.18 KB, image/png)
2019-09-25 06:33 PDT, Jack Wellborn
Patch (14.74 KB, patch)
2019-11-05 14:46 PST, Alex Christensen
Patch (4.36 KB, patch)
2019-11-05 15:58 PST, John Wilander
Patch for landing (4.28 KB, patch)
2019-11-05 16:54 PST, John Wilander
Revenge of the loading bug? (414.42 KB, image/png)
2020-03-09 13:25 PDT, Jack Wellborn

Description Jack Wellborn 2019-09-16 09:36:42 PDT
I typically try to get nice repro steps, but this one's slippery, so bear with me. The last two releases of Safari Technology Preview (Release 90 and Release 91) have not reliably loaded pages. These include major sites like Github, JIRA, and even Apple.com. As I wrote above, I can't pin down the specific scenario or cause. Here is the observed behavior:

1. Everything loads fine.
2. After some time (sometimes an hour, sometimes a day), certain sites stop loading.
3. Restarting STP temporarily resolves the issues.
4. Go to 1.

Github does start complaining about "Failed integrity metadata check", but I suspect this is a symptom rather than the root cause. It's clearly not an actual failure as the issue goes away along with the other page load failures after restarting STP.

Again, I am sorry for filing such a vague bug. I am filing it because it has survived two releases and seems hard to catch in testing, since the behavior only occurs after some time.

Thanks,
Jack Wellborn.
Comment 1 Alexey Proskuryakov 2019-09-17 13:22:51 PDT
Thank you for the report, this is very helpful!

> Github does start complaining about "Failed integrity metadata check"

Assuming that the error comes from WebKit, that would be about failing subresource integrity checks, possibly implicating a problem with the WebKit cache. Does clearing the cache via the Safari Developer menu happen to alleviate the symptoms? Of course, it's also possible that Github reports a different error with the same words.

You are correct that this would be difficult to act upon with little detail. Could you please collect a sysdiagnose <https://download.developer.apple.com/OS_X/OS_X_Logs/sysdiagnose_Logging_Instructions.pdf>, and report the bug again via feedbackassistant.apple.com? That would give us the best chance of addressing it.

Please post Feedback Assistant ID here once you have it.
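For context on comment 1: subresource integrity (SRI) compares a hash of the bytes actually received against the `integrity` attribute written into the page, so a stale, truncated, or mismatched response fails with "Failed integrity metadata check" even when the server itself is fine. The Python sketch below is a simplified model of that comparison, not WebKit's implementation; `sri_digest` and `passes_integrity_check` are hypothetical helper names.

```python
from __future__ import annotations

import base64
import hashlib

SUPPORTED_ALGORITHMS = ("sha256", "sha384", "sha512")


def sri_digest(body: bytes, algorithm: str = "sha384") -> str:
    """Return SRI metadata such as 'sha384-<base64 digest>' for a response body."""
    digest = hashlib.new(algorithm, body).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def passes_integrity_check(body: bytes, integrity: str) -> bool:
    """Simplified SRI check for an `integrity` attribute value.

    Tokens with unknown algorithms are ignored; if nothing usable remains,
    the check passes. Unlike the spec, this sketch does not restrict the
    comparison to the strongest algorithm listed.
    """
    tokens = []
    for token in integrity.split():
        token = token.split("?", 1)[0]  # drop any '?option' suffix
        if token.partition("-")[0] in SUPPORTED_ALGORITHMS:
            tokens.append(token)
    if not tokens:
        return True
    return any(sri_digest(body, t.partition("-")[0]) == t for t in tokens)


if __name__ == "__main__":
    expected = b"body { color: black; }\n"  # the bytes the page author hashed
    metadata = sri_digest(expected)
    print(passes_integrity_check(expected, metadata))                    # True
    print(passes_integrity_check(b"<html>wrong body</html>", metadata))  # False
```

Because the digest covers every byte, the check cannot tell "server sent bad data" from "browser reused the wrong data", which is why the thread quickly turns to caches and network-process state.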
Comment 2 Jack Wellborn 2019-09-17 13:50:38 PDT
Working on getting you the sysdiagnoses, but in the meantime it's worth noting the bug is manifesting at the time of writing on developer.apple.com.


"Did not parse stylesheet at 'https://developer.apple.com/assets/styles/global.dist.css' because non CSS MIME types are not allowed in strict mode."

My guess was some sort of cache corruption bug, but clearing the cache does not seem to help (though restarting does.)

I will follow up with the ID when I have it.
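The stylesheet error above is the standards-mode ("strict mode") MIME check: a linked stylesheet is only applied if its response is labeled `text/css`. If the response used for global.dist.css carried some other type (for example, an HTML error page), it would be rejected with exactly this message. A minimal sketch of the rule, assuming the simplified quirks-mode exception described in the HTML standard; the function names are illustrative, not WebKit's:

```python
from __future__ import annotations


def mime_essence(content_type: str | None) -> str:
    """Lower-cased 'type/subtype' with parameters such as charset stripped."""
    if not content_type:
        return ""
    return content_type.split(";", 1)[0].strip().lower()


def stylesheet_allowed(content_type: str | None, quirks_mode: bool, same_origin: bool) -> bool:
    """Simplified model of the stylesheet MIME check.

    Standards-mode ("strict") documents require text/css; a quirks-mode
    document may still apply a same-origin stylesheet with another type.
    """
    if quirks_mode and same_origin:
        return True
    return mime_essence(content_type) == "text/css"


if __name__ == "__main__":
    print(stylesheet_allowed("text/css; charset=utf-8", quirks_mode=False, same_origin=False))  # True
    print(stylesheet_allowed("text/html", quirks_mode=False, same_origin=True))                # False
```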
Comment 3 Jack Wellborn 2019-09-18 12:42:39 PDT
The ID is FB7292695. I have updated to STP 92 and will let you know if the issue emerges again.
Comment 4 Jack Wellborn 2019-09-18 13:36:50 PDT
That was quick. The issue has manifested again.
Comment 5 Alexey Proskuryakov 2019-09-18 13:58:29 PDT
Thank you!

rdar://problem/55490308 for Apple engineers looking at this.
Comment 6 Nigel Jones 2019-09-23 03:29:41 PDT
I came to bugzilla to search for exactly this problem.

I am experiencing exactly the same (MBP 2016, 16GB, Catalina current beta)

I've had it with any number of sites, including Facebook, Apple, and BBC News. It seems to take around half a day to manifest for me. I also saw the same integrity check error from GitHub (which I use a lot).

I've disabled all plugins, reset all experiments, and hit the same issue.

Currently back on regular Safari.

I don't have access to radar. Since it's a tracked issue I'll try again with tech preview and collect a sysdiagnose when it occurs.
Comment 7 Chris Dumez 2019-09-23 08:49:13 PDT
(In reply to Jack Wellborn from comment #2)
> Working on getting you the sysdiagnoses, but in the meantime it's worth
> noting the bug is manifesting at this time of writing on developer.apple.com.
> 
> 
> "Did not parse stylesheet at
> 'https://developer.apple.com/assets/styles/global.dist.css' because non CSS
> MIME types are not allowed in strict mode."
> 
> My guess was some sort of cache corruption bug, but clearing the cache does
> not seem to help (though restarting does.)
> 
> I will follow up with the ID when I have it.

These CSS errors and integrity checks issues seem to indicate we got bad data either from the network or from a cache. Given that this seems to affect STP, I would suspect more a WebKit cache (memory cache or disk cache). Since it impacts apple.com, I do not think this is related to service workers.
Comment 8 Chris Dumez 2019-09-23 08:50:20 PDT
(In reply to Chris Dumez from comment #7)
> These CSS errors and integrity checks issues seem to indicate we got bad
> data either from the network or from a cache. Given that this seems to
> affect STP, I would suspect more a WebKit cache (memory cache or disk
> cache). Since it impacts apple.com, I do not think this is related to
> service workers.

@Alex Christensen, did Safari Technology Preview switch to a non-default persistent data store recently? If so, this could be related.
Comment 9 Chris Dumez 2019-09-23 08:51:35 PDT
(In reply to Jack Wellborn from comment #2)
> My guess was some sort of cache corruption bug, but clearing the cache does
> not seem to help (though restarting does.)

How did you clear the cache exactly?
Comment 10 Jack Wellborn 2019-09-23 08:55:01 PDT
(In reply to Chris Dumez from comment #9)
> (In reply to Jack Wellborn from comment #2)
> > My guess was some sort of cache corruption bug, but clearing the cache does
> > not seem to help (though restarting does.)
> 
> How did you clear the cache exactly?

Develop menu 👉 Empty Caches. I also tried...

1. Close the tab of the failing page
2. Develop menu 👉 Empty Caches
3. Try to load failing page again in new tab.
Comment 11 Chris Dumez 2019-09-23 08:58:41 PDT
@Jack Wellborn: When did the issue reproduce with regards to you taking the sysdiagnose? Within the last 5 minutes? 15 minutes? more? If you have an exact time, that would be ideal.
Comment 12 Jack Wellborn 2019-09-23 09:08:52 PDT
(In reply to Chris Dumez from comment #11)
> @Jack Wellborn: When did the issue reproduce with regards to you taking the
> sysdiagnose? Within the last 5 minutes? 15 minutes? more? If you have an
> exact time, that would be ideal.

The time from STP launch to when the issue manifests varies. It happened within an hour of relaunching after updating to Release 92, but other times it doesn't occur until the next day.

Other random information about my use, which I don't suspect will help, but who knows.

1. I am a web developer so I am regularly loading assets from localhost (though I don't think I have ever had an issue with localhost).
2. I am using Tunnelblick for my work VPN, but not all traffic is redirected through it.
3. I use BlueJeans, which does the dance where it tries to open the application from a landing page.
4. I am also using 1Password extension, which has also been problematic for a few releases. I suspect that issue is unrelated.

Pages where I have seen the issue (from what I can remember):

GitHub.com
Atlassian Service Desk
Apple.com
Internal Jenkins pages
Comment 13 Chris Dumez 2019-09-23 09:12:49 PDT
(In reply to Jack Wellborn from comment #12)
> The time from STP launch to when the issue manifests varies. It happened
> within an hour after relaunching after updated to Release 92, but doesn't
> occur until the next day.

To be clear, I am not asking how long it takes to reproduce. I am asking when the issue happened relative to you taking the sysdiagnose, so I can look at the relevant logs in the sysdiagnose. The sysdiagnose includes logging for a long period of time, and it would help me a lot if you could tell me when the issue reproduced so I can narrow down my search in the logs. Also note that we get the most useful logging when the issue reproduces just before you take the sysdiagnose.
Comment 14 Jack Wellborn 2019-09-23 09:16:43 PDT
(In reply to Chris Dumez from comment #13)
> To be clear, I am not asking how long it takes to reproduce. I am asking
> when the issue happen relative to you taking the sysdiagnose so I can look
> at the relevant logs in the sysdiagnose. The sysdiagnose includes logging
> for a long period of time and it would help me a lot if you could tell me
> when the issue reproduced so I can narrow down my search in the logs. Also
> note that if the issue reproduces just before taking the sysdiagnose, this
> is when we get the most useful logging.

Ah… I did misunderstand. Because the issue persists until the next relaunch, I was able to kick off sysdiagnose within a few minutes after a failed load.
Comment 15 Chris Dumez 2019-09-23 09:22:53 PDT
(In reply to Jack Wellborn from comment #14)
> Ah… I did misunderstand. Because the issue persists until the next relaunch,
> I was able to kick off sysdiagnose within a few minutes after a failed load.

Perfect, I will dig into the logs soon. Thanks.
Comment 16 Nigel Jones 2019-09-24 02:35:21 PDT
Extra info - I've hit the same problem after this morning's Catalina update.

Today's hang happened on dev.azure.com viewing a pipeline, yet the same page loads immediately in chrome.

I've grabbed a sysdiagnose if it helps, though unclear where to post to.
Comment 17 Nigel Jones 2019-09-24 02:40:51 PDT
A cache clear didn't help. Restarted STP and the azure page loaded immediately.
Comment 18 Nigel Jones 2019-09-24 05:20:34 PDT
When the azure issue occurred, the only page error I could see was:

Refused to execute a script because its hash, its nonce, or 'unsafe-inline' does not appear in the script-src directive of the Content Security Policy.

After restarting the browser and reloading the page, I still got the same error followed by some other script errors, but the page loaded ok

Note that in this incarnation I do have ad-blocking enabled, but the errors occurred before without any plugins.

Exact page was : https://dev.azure.com/odpi/Egeria/_build?definitionId=8 (open source, trial azure builds...)

I'm guessing you have enough info for now, so I'll only post significantly different info. I may go back to Safari or Chrome in the interim, as I'm seeing issues within tens of minutes of starting the browser. It's actually pretty quick, though I note other pages still work OK at first; then additional sites start stalling.
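The Content Security Policy error in comment 18 names the three ways an inline script can be allowed by `script-src`: a matching hash, a matching nonce, or `'unsafe-inline'`. A hash source is the base64-encoded digest of the script's exact text, so even a one-byte difference between the delivered script and the policy causes a refusal. This Python sketch models that check under simplifying assumptions (a single inline script, no `'strict-dynamic'`, UTF-8 text); the helper names are hypothetical.

```python
from __future__ import annotations

import base64
import hashlib

HASH_ALGORITHMS = ("sha256", "sha384", "sha512")


def csp_hash_source(script_text: str, algorithm: str = "sha256") -> str:
    """Quoted CSP hash source for an inline script, e.g. "'sha256-...'"."""
    digest = hashlib.new(algorithm, script_text.encode("utf-8")).digest()
    return f"'{algorithm}-{base64.b64encode(digest).decode('ascii')}'"


def inline_script_allowed(script_text: str, script_src: list[str], nonce: str | None = None) -> bool:
    """Simplified script-src check for one inline <script> element."""
    if nonce is not None and f"'nonce-{nonce}'" in script_src:
        return True
    if any(csp_hash_source(script_text, alg) in script_src for alg in HASH_ALGORITHMS):
        return True
    # CSP Level 2 and later ignore 'unsafe-inline' once a hash or nonce source is present.
    prefixes = ("'nonce-",) + tuple(f"'{alg}-" for alg in HASH_ALGORITHMS)
    has_hash_or_nonce = any(source.startswith(prefixes) for source in script_src)
    return "'unsafe-inline'" in script_src and not has_hash_or_nonce


if __name__ == "__main__":
    policy = ["'self'", csp_hash_source("console.log(1);")]
    print(inline_script_allowed("console.log(1);", policy))  # True
    print(inline_script_allowed("console.log(2);", policy))  # False
```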
Comment 19 Alex Christensen 2019-09-24 07:17:48 PDT
(In reply to Chris Dumez from comment #8)
> @Alex Christensen, did Safari Technology Preview switch to a non-default
> persistent data store recently? If so, this could be related.
It did switch.  It's possible that it's related but it could also be something else.
Comment 20 Chris Dumez 2019-09-24 08:44:06 PDT
(In reply to Alex Christensen from comment #19)
> (In reply to Chris Dumez from comment #8)
> > @Alex Christensen, did Safari Technology Preview switch to a non-default
> > persistent data store recently? If so, this could be related.
> It did switch.  It's possible that it's related but it could also be
> something else.

Agreed, I am just trying to think of things that are potentially related and that we changed in the network process. Given that it seems to impact multiple tabs, I think this is either a bug in the UIProcess or the Network process, but the network process is way more likely here. Also, given that restarting Safari fixes it, I think this rules out the HTTP disk cache. So I think something other than the HTTP disk cache is getting into a bad state in the network process (I was thinking maybe the session was inadvertently destroyed?).
Comment 21 Chris Dumez 2019-09-24 08:45:23 PDT
(In reply to Chris Dumez from comment #20)
> Agreed, I am just trying to think of things that are potentially related and
> that we changed in the network process. Given that it seems to impact of
> tabs, I think this is either a bug in the UIProcess or the Network process
> but the network process is way more likely here. Also, given that restarting
> Safari fixes it, I think this rules out the HTTP disk cache. So I think
> something other than the HTTP disk cache is getting into a bad state in the
> network process (I was thinking maybe the session was inadvertently
> destroyed?).

Looking at the sysdiagnose, I see a lot of cancelled loads (unclear why, they could simply be due to Safari top-hit preloading failing) and I also see some low-level SSL errors. Nothing else stood out.
Comment 22 Alex Christensen 2019-09-24 11:35:49 PDT
Jack, if you open a private browsing window, is everything working properly?
Comment 23 Chris Dumez 2019-09-24 11:41:35 PDT
(In reply to Alex Christensen from comment #22)
> Jack, if you open a private browsing window, is everything working properly?

Oh, great idea. This would be a very interesting data point indeed.
Comment 24 Geoffrey Garen 2019-09-24 11:47:08 PDT
@Jack, @Nigel, I'm also curious to know: When this problem happens, does it always manifest as something not loading at all, or do you ever see something load with obviously wrong or corrupted content?
Comment 25 Jack Wellborn 2019-09-24 12:06:31 PDT
(In reply to Geoffrey Garen from comment #24)
> @Jack, @Nigel, I'm also curious to know: When this problem happens, does it
> always manifest as something not loading at all, or do you ever see
> something load with obviously wrong or corrupted content?

I am not sure about corrupted content, but will keep an eye out for it.
Comment 26 Jack Wellborn 2019-09-24 12:07:29 PDT
(In reply to Alex Christensen from comment #22)
> Jack, if you open a private browsing window, is everything working properly?

This is a great idea and I will give it a shot next time it occurs.
Comment 27 Nigel Jones 2019-09-24 12:49:02 PDT
* I've not noticed any corrupted page elements at all.
* The problem always seems to be that some elements (often most of them) never load... just waiting...
* I've not tried a private window - but will do next time.
* My attempt at clearing the cache was Develop -> Empty Caches. Next time I'll try using a new tab.
Comment 28 Nigel Jones 2019-09-25 02:15:03 PDT
My next stall happened with https://github.com

I am logged in (I use github a lot)

The page load progress indicator sits at around 15%, and the page never finishes loading. The title is correct, with the favicon, but there is no content. Sometimes (and always if I hit X) I can see the page is partly rendered: much of the content is there, but the activity stream section just says 'loading'.

The debugger shows the following errors:


[Error] Refused to execute a script because its hash, its nonce, or 'unsafe-inline' does not appear in the script-src directive of the Content Security Policy. (github.com, line 0, x2)
[Error] Unhandled Promise Rejection: TypeError: l.LegacyGlobal.should_do_lastpass_here is not a function. (In 'l.LegacyGlobal.should_do_lastpass_here(document)', 'l.LegacyGlobal.should_do_lastpass_here' is undefined)
	l (content-script.js:75:837159)
	s (content-script.js:75:835984)
	promiseReactionJob
[Error] Cannot load https://github.githubassets.com/assets/github-d2e94ad8e01fa6521ed5c16b187aa5da.css.map due to access control checks.
[Error] Cannot load https://github.githubassets.com/assets/frameworks-849637ecbd4bd65815cc113d80fee2d4.css.map due to access control checks.
[Error] Cannot load https://github.githubassets.com/assets/frameworks-c254eb02.js.map due to access control checks.

I then opened an incognito window & logged in, then reloaded the page, and instead got a fully loaded page, including activity stream, and just these errors:
[Error] Refused to execute a script because its hash, its nonce, or 'unsafe-inline' does not appear in the script-src directive of the Content Security Policy. (github.com, line 0, x2)
[Error] Unhandled Promise Rejection: TypeError: l.LegacyGlobal.should_do_lastpass_here is not a function. (In 'l.LegacyGlobal.should_do_lastpass_here(document)', 'l.LegacyGlobal.should_do_lastpass_here' is undefined)
	l (content-script.js:75:837159)
	s (content-script.js:75:835984)
	promiseReactionJob

So the lastpass one is the same in both cases -- it's the access control checks that are more interesting

I then used Develop -> Empty Caches and opened a new tab, where I tried to go to github.com; however, I still get the issue: the page just doesn't load.



Of course I realise this could be a different issue - it's difficult for me to know sorry.
Comment 29 Nigel Jones 2019-09-25 02:15:31 PDT
to add - again no problem in chrome
Comment 30 Nigel Jones 2019-09-25 02:18:06 PDT
restarted & went page to page. The page didn't load properly with broken icons. Hit shift/reload and page loaded correctly

After reload the only thing in the console was
Unhandled Promise Rejection: TypeError: l.LegacyGlobal.should_do_lastpass_here is not a function. (In 'l.LegacyGlobal.should_do_lastpass_here(document)', 'l.LegacyGlobal.should_do_lastpass_here' is undefined)

as expected
Comment 31 Jack Wellborn 2019-09-25 06:32:11 PDT
Created attachment 379542 [details]
Apple.com fails load
Comment 32 Jack Wellborn 2019-09-25 06:33:07 PDT
Created attachment 379543 [details]
Apple.com in private browsing loads fine
Comment 33 Jack Wellborn 2019-09-25 06:34:40 PDT
(In reply to Jack Wellborn from comment #26)
> (In reply to Alex Christensen from comment #22)
> > Jack, if you open a private browsing window, is everything working properly?
> 
> This is a great idea and I will give it a shot next time it occurs.

I just attached two screenshots that show that sites that stop loading work fine in private browsing. 🤔 Hope this helps.
Comment 34 Nigel Jones 2019-09-25 06:35:47 PDT
@jack - so we both see the same thing then: private is OK. Hopefully that's an aid to the devs.
Comment 35 Alex Christensen 2019-09-25 09:31:20 PDT
I think I know what's going on and how to fix it.  After a network process crashes, we restore it in WebProcess::ensureNetworkProcessConnection, which only restores the default session.  In NetworkResourceLoader::startNetworkLoad we try to restore any private browsing sessions with WebsiteDataStoreParameters::privateSessionParameters, but since STP is now using a non-default persistent data store, it's never restored after the network process crashes, so loading always fails.  To fix it we just need to restore all sessions from the UI process after the network process crashes.  Simple to fix, simple to test.
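The theory in comment 35 can be illustrated with a toy model: if a relaunched network process is only told about the default session, loads in any other session fail until something re-registers it. Comment 40 later reports that this theory did not hold for this bug, so treat the sketch below purely as an illustration of the proposed failure mode. `ToyNetworkProcess`, `ToyUIProcess`, and the session names are invented for the example and do not correspond to WebKit classes.

```python
from __future__ import annotations

from dataclasses import dataclass, field

DEFAULT_SESSION = "default"


@dataclass
class ToyNetworkProcess:
    """Hypothetical stand-in for the network process: it only knows sessions it was told about."""
    sessions: set[str] = field(default_factory=set)

    def crash_and_relaunch(self) -> None:
        self.sessions.clear()  # a fresh process starts with no sessions

    def load(self, session_id: str) -> str:
        return "ok" if session_id in self.sessions else "fail"


@dataclass
class ToyUIProcess:
    """Hypothetical stand-in for the UI process, which owns every session's parameters."""
    known_sessions: set[str]

    def restore_default_only(self, network: ToyNetworkProcess) -> None:
        network.sessions.add(DEFAULT_SESSION)  # the behavior suspected in comment 35

    def restore_all(self, network: ToyNetworkProcess) -> None:
        network.sessions.update(self.known_sessions)  # the fix proposed in comment 35


if __name__ == "__main__":
    ui = ToyUIProcess({DEFAULT_SESSION, "stp-persistent"})
    net = ToyNetworkProcess(set(ui.known_sessions))
    net.crash_and_relaunch()
    ui.restore_default_only(net)
    print(net.load("stp-persistent"))  # fail
    ui.restore_all(net)
    print(net.load("stp-persistent"))  # ok
```

The model also shows why a private window was a useful probe: it uses a different session, so it keeps working even when one session is missing or broken.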
Comment 36 Alexey Proskuryakov 2019-09-25 09:47:00 PDT
There are no Networking process crash logs in the sysdiagnose that we have.

To people hitting this: could you please check for com.apple.WebKit.Networking process crashing (using Console.app or directly in ~/Library/Logs/DiagnosticReports)? It's unacceptable if the Networking process crashes every day. While we can mostly recover from such a crash, the recovery can never be perfect.

Please file new bugs for those crashes. Or feel free to send crash logs to me to group them into actionable buckets.
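To check for the crashes Alexey mentions without browsing Console.app, a short script can list matching files in ~/Library/Logs/DiagnosticReports. This is a sketch that assumes crash report file names begin with the process name (as comment 37's description of "logs named com.apple.WebKit.WebContent" suggests) and end in `.crash` or `.ips`; `find_crash_reports` is a hypothetical helper.

```python
from __future__ import annotations

from pathlib import Path

# Assumption: report names start with the process name and end in ".crash"
# (newer macOS releases use ".ips"); adjust if your files look different.
REPORT_SUFFIXES = (".crash", ".ips")


def find_crash_reports(directory: Path, process_prefix: str = "com.apple.WebKit.Networking") -> list[Path]:
    """Return crash reports in `directory` whose file name starts with `process_prefix`."""
    if not directory.is_dir():
        return []
    return sorted(
        path
        for path in directory.iterdir()
        if path.is_file() and path.name.startswith(process_prefix) and path.suffix in REPORT_SUFFIXES
    )


if __name__ == "__main__":
    reports_dir = Path.home() / "Library" / "Logs" / "DiagnosticReports"
    for report in find_crash_reports(reports_dir):
        print(report.name)
```

Passing `process_prefix="com.apple.WebKit.WebContent"` lists the WebContent reports instead.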
Comment 37 Jack Wellborn 2019-09-25 10:12:13 PDT
(In reply to Alexey Proskuryakov from comment #36)
> There are no Networking process crash logs in the sysdiagnose that we have.
> 
> To people hitting this: could you please check for
> com.apple.WebKit.Networking process crashing (using Console.app or directly
> in ~/Library/Logs/DiagnosticReports)? It's unacceptable if the Networking
> process crashes every day. While we can mostly recover from such a crash,
> the recovery can never be perfect.
> 
> Please file new bugs for those crashes. Or feel free to send crash logs to
> me to group them into actionable buckets.

I don't see any logs named com.apple.WebKit.Networking, but I do see several named com.apple.WebKit.WebContent. Could those be related?
Comment 38 Alexey Proskuryakov 2019-09-25 10:17:19 PDT
WebContent wouldn't have such symptoms, so they are not the cause here.

We generally don't need bugs filed for random WebContent crashes. But if there are steps to reproduce or other additional information, then bugs are certainly welcome.
Comment 39 Geoffrey Garen 2019-09-25 10:40:15 PDT
(In reply to Alex Christensen from comment #35)
> I think I know what's going on and how to fix it.  After a network process
> crashes, we restore it in WebProcess::ensureNetworkProcessConnection, which
> only restores the default session.  In
> NetworkResourceLoader::startNetworkLoad we try to restore any private
> browsing sessions with WebsiteDataStoreParameters::privateSessionParameters,
> but since STP is now using a non-default persistent data store, it's never
> restored after the network process crashes, so loading always fails.  To fix
> it we just need to restore all sessions from the UI process after the
> network process crashes.  Simple to fix, simple to test.

Are there any conditions that are not crashes that might tear down a non-default session?
Comment 40 Alex Christensen 2019-09-25 10:50:21 PDT
(In reply to Geoffrey Garen from comment #39)
> (In reply to Alex Christensen from comment #35)
> Are there any conditions that are not crashes that might tear down a
> non-default session?

Deallocation of the non-default session in the UIProcess, which never happens in Safari.

My theory was incorrect, based on the lack of NetworkProcess crashes from the bug reporters and on the fact that the NetworkProcess recovers non-default persistent sessions. I did some cleanup and added a unit test to verify this in bug 202211.
Comment 41 Nigel Jones 2019-09-25 11:26:35 PDT
Quick user question - are there any flags/features that might help prove this (ie changing to older data store etc). Just wondering?
Comment 42 Nigel Jones 2019-09-27 03:01:02 PDT
New observation.

Github got into the failed load state. I used chrome for what I was doing, then came back to safari after 10 mins or so.

I seemed to be able to load the page, or at least some of it, but started seeing some corruption. Sadly I didn't capture a screenshot, but after posting an update I had a lot of control chars in the page (I think because it didn't load completely).

I then noticed another tab where I had probably tried to load GitHub and which was still hanging. In this tab the favicon was Facebook's, but the title was 'Inbox(36)', which, along with the displayed content, was actually Gmail.

As before other tabs continue ok, and incognito mode also works with the same sites

Time between failures seems quite variable from 10s of minutes to a few hours. github does seem particularly bad, but it may just be because I use it a lot (working on open source) and am constantly moving between pages - ie issues, pull requests etc.
Comment 43 Nigel Jones 2019-10-01 04:39:04 PDT
For confirmation I decided to run with Safari (standard, production, all experiments enabled, Catalina latest betas) and saw no issues at all with pages failing to load.

So I certainly agree with it being a regression...
Comment 44 Jon 2019-10-02 08:25:06 PDT
I'm also seeing this issue, with the same symptoms with the same sites, especially GitHub. Is there any additional info needed to help diagnose the issue?
Comment 45 Nigel Jones 2019-10-04 02:43:26 PDT
Seeing a new STP release, I installed 93. After using it for around a day, I appear to have hit a similar issue on Facebook, where the page just stops loading partway. Obviously there is a lot of state there, but exiting and restarting STP addressed the issue.
Comment 46 Nigel Jones 2019-10-04 11:43:43 PDT
I hit the issue earlier today with facebook, and again just now with azure pipelines ( https://dev.azure.com/odpi/egeria ). In both cases private window & chrome were both fine.

Reverting back to Safari official version - where so far I've not seen an issue
Comment 47 Nigel Jones 2019-10-14 07:11:27 PDT
Can confirm this still occurs with release 93 on catalina beta 19B68f

Occurred after 1/2 day. This time on Azure developer pipelines dev.azure.com (I've seen it here before). Still not seen on 'safari'
Comment 48 Nigel Jones 2019-10-17 12:27:32 PDT
Confirm the issue occurred with STP 94 (this time dev.azure.com)

Back to safari again

Is there any update?
Comment 49 Chris Dumez 2019-10-17 12:31:24 PDT
(In reply to Nigel Jones from comment #48)
> Confirm the issue occurred with STP 94 (this time dev.azure.com)
> 
> Back to safari again
> 
> Is there any update?

Right now, our assumption is that this is due to an incompatibility between recent Safari Technology Preview builds and macOS High Sierra. I am assuming you are not using macOS Catalina yet?
Comment 50 Jon 2019-10-17 12:33:26 PDT
I've been on Catalina and the betas since July, I've seen it many times since then.
Comment 51 Chris Dumez 2019-10-17 12:34:58 PDT
(In reply to Jon from comment #50)
> I've been on Catalina and the betas since July, I've seen it many times
> since then.

Thanks for this information. This is helpful.
Comment 52 Nigel Jones 2019-10-17 13:04:00 PDT
I'm on 10.15.1 Beta (19B68f)
Comment 53 Jack Wellborn 2019-10-21 12:56:12 PDT
Still seeing Github failures on STP Release 94 on 10.14.6. If reproducibility is still an issue, would a screen share with one of us help?
Comment 54 Geoffrey Garen 2019-10-21 13:46:40 PDT
Alex Christensen is working on adding diagnostics to the build, which will produce valuable debugging data when you hit this condition. Alex, once it's ready, can you offer instructions on how to gather that data and upload it to Bugzilla?
Comment 55 Nigel Jones 2019-10-30 10:36:14 PDT
Do we have any update on capturing debug info?

I just tried 95 STP on Catalina and
 - some pages loaded just with source (no rendering)
 - facebook hung and didn't completely load
 - gmail hung and didn't completely load
 
The behaviour wasn't quite the same, but initial reaction was that it is much worse.
Back to regular safari again ...
Comment 56 Alex Christensen 2019-10-30 10:58:47 PDT
In the inspector console there should be some logs about subresource integrity. If you find them in STP 95+, could you send them to me or include them here?
Comment 57 Jack Wellborn 2019-10-30 14:31:26 PDT
(In reply to Alex Christensen from comment #56)
> In the inspector console there should be some logs about sub resource
> integrity.  If you find them in STP 95+, could you send them to me or
> include them here?

Seeing Apple.com fail to load with the error:

> Refused to execute https://www.apple.com/metrics/target/scripts/1.0/at.js as script because "X-Content-Type: nosniff" was given and its Content-Type is not a script MIME type.
Comment 58 Jack Wellborn 2019-10-30 14:34:13 PDT
(In reply to Alex Christensen from comment #56)
> In the inspector console there should be some logs about sub resource
> integrity.  If you find them in STP 95+, could you send them to me or
> include them here?

Here's the usual GitHub error (unfortunately, no new info even in STP 95):

> Cannot load stylesheet https://github.githubassets.com/assets/frameworks-481a47a96965f6706fb41bae0d14b09a.css. Failed integrity metadata check.
Comment 59 Nigel Jones 2019-10-30 15:12:21 PDT
Gmail fails to load 

 [Error] Failed to load resource: the server responded with a status of 404 () (get, line 0)

The main frame appears, but the mail list is blank.
No problem in Safari or Chrome.
Comment 60 Nigel Jones 2019-10-30 15:16:32 PDT
The Azure Pipelines build status page at https://dev.azure.com/ODPi/Egeria/_build/results?buildId=1385 fails with:

Refused to execute a script because its hash, its nonce, or 'unsafe-inline' does not appear in the script-src directive of the Content Security Policy.
Comment 61 Nigel Jones 2019-10-30 15:20:29 PDT
Stack Exchange fails to load (stuck on loading):

https://stackexchange.com

[Error] TypeError: Argument 1 ('target') to MutationObserver.observe must be an instance of Node
	observe (content-script.js:75:765087)
	(anonymous function) (content-script.js:75:765087)
	m (content-script.js:75:764843)
	(anonymous function) (content-script.js:75:764877)
[Error] TypeError: Argument 1 ('target') to MutationObserver.observe must be an instance of Node
	observe (content-script.js:75:765087)
	(anonymous function) (content-script.js:75:765087)
	m (content-script.js:75:764843)
	(anonymous function) (content-script.js:75:764877)
[Error] Did not parse stylesheet at 'https://cdn.sstatic.net/shared/chrome/chrome.css?v=c5438e118167' because non CSS MIME types are not allowed in strict mode.

No problem in Safari or Chrome.

Odd, but this STP seems unusable.
Comment 62 Jack Wellborn 2019-10-31 05:57:20 PDT
Given the issue seems to affect only some of us, maybe we can compare setups on the off chance that a common denominator can be found.

Here's what I have.

ISP: Spectrum
Hardware: 2016 MacBook Pro 15"
OS: 10.14.6
Networking Related Software:
 * TunnelBlick VPN using OpenVPN Level3
 * Periodic use of HMA VPN (I sometimes have to debug geo-specific behavior)
 * We also have GlobalProtect VPN, but it's never used since my company hasn't fully set it up.
 * Periodic use of Charles Proxy (This would seem to be a good candidate, but I think the issue has happened on days I don't use Charles)
 * Jamf (My company uses it. Not sure what it's doing on the network level, but I am sure it's doing something with networking.)
* Firewall is enabled
* Nessus agent security scanner 

Other info:
* I do JavaScript development, so I am regularly using localhost as well as Web Inspector.
Comment 63 Nigel Jones 2019-11-04 01:31:57 PST
I thought I'd try STP again this morning. Within 10 minutes I had GitHub failing to load (I'd been using it with Safari for a few hours earlier this morning):

[Error] Refused to execute a script because its hash, its nonce, or 'unsafe-inline' does not appear in the script-src directive of the Content Security Policy. (github.com, line 0, x2)
[Error] Cannot load https://github.githubassets.com/assets/frameworks-481a47a96965f6706fb41bae0d14b09a.css.map due to access control checks.
[Error] Cannot load https://github.githubassets.com/assets/github-4aa6c31d1652b09080e404b2bf72f75c.css.map due to access control checks.
[Error] Cannot load https://github.githubassets.com/assets/dashboard-bootstrap-a8792b6a.js.map due to access control checks.
[Error] Cannot load https://github.githubassets.com/assets/frameworks-cd24d104.js.map due to access control checks.
[Error] Cannot load https://github.githubassets.com/assets/github-bootstrap-513237e9.js.map due to access control checks.

As to configuration:
 - Jamf is installed, but it's mostly used to deploy things
 - no additional security/firewall software beyond the Apple firewall
 - I do use the AdGuard and LastPass plugins, though I had these issues previously without them installed (and they work with Safari)
 - TunnelBlick & Express VPN are installed, though I'm not connected through them.
 - We use Cisco AnyConnect VPN for intranet access; this is currently active, but split-tunneled, so only corporate traffic goes through the VPN and internet traffic does not.
 - currently on macOS 10.15.1
 - I use Homebrew (and have lots installed from it; to be honest, if it wasn't for Homebrew I'd probably be using another OS. It is FANTASTIC!)
 - the only kernel extensions loaded are:
  180    0 0xffffff7f852e5000 0x35000    0x35000    com.cisco.kext.acsock (4.7.1) 64621D59-8B2B-3F1C-9536-5AD140842F7E <6 5 1>
  181    0 0xffffff7f8531a000 0x16000    0x16000    com.box.filesystems.osxfuse (303.10.2) 737D416E-72BC-33A0-9DC8-8F66F3567164 <8 6 5 3 1>
 - other software (unlikely to matter?) that I regularly use includes Boom3D (audio), Box, LastPass, JetBrains Toolkit & IDEA, OneDrive, iTerm2, Slack, Spotify, Todoist, WhatsApp, OneNote, and Visual Studio Code.
 - I currently have all experimental features (except ad click attribution debug) enabled, though I hit issues without these in the past. My Safari has all of its experimental features enabled too, albeit there are fewer.

I can't see any other debug info, and certainly no integrity check reports.

Hope that helps. Back to Safari...
Comment 64 Nigel Jones 2019-11-04 01:33:00 PST
Note: those GitHub errors were from a hanging page load of https://github.com/settings/profile.
Comment 65 Nigel Jones 2019-11-04 01:37:41 PST
Back in Safari, the only issue on that same page (which loads to completion) is:

Refused to execute a script because its hash, its nonce, or 'unsafe-inline' does not appear in the script-src directive of the Content Security Policy.

Loading the same profile page in Google Chrome Beta gives me:
Refused to load the image 'https://gr6o6jgyyjlimtnvfr3yjziopvz2o4i5nnyim7xjecm3tscfgyrq.litepages.googlezip.net/i?u=https%3A%2F%2Fgithub.githubassets.com%2Fimages%2Fsearch-key-slash.svg' because it violates the following Content Security Policy directive: "img-src 'self' data: github.githubassets.com identicons.github.com collector.githubapp.com github-cloud.s3.amazonaws.com *.githubusercontent.com".

(x12) Refused to load the font '<URL>' because it violates the following Content Security Policy directive: "font-src github.githubassets.com".

(x2) profile:1 Refused to load the stylesheet 'https://fonts.googleapis.com/css?family=Muli:400,700&display=swap' because it violates the following Content Security Policy directive: "style-src 'unsafe-inline' github.githubassets.com". Note that 'style-src-elem' was not explicitly set, so 'style-src' is used as a fallback.

(In case that helps to figure out the issue.) Again, in Chrome the page appears to render fine.
Comment 66 Jack Wellborn 2019-11-04 07:24:10 PST
(In reply to Nigel Jones from comment #63)

I also have Slack and Box, and use JetBrains IntelliJ. I have LastPass, but do not use the extension. (I do use the 1Password extension, however.) Question: does your company install any security/antivirus software through Jamf? We use something called Nessus agent, and I am wondering if the problem happens when something scanning the filesystem touches the Safari cache.
Comment 67 Nigel Jones 2019-11-04 07:31:29 PST
Jack,
 Not currently. There was a period where Carbon Black Defense was in use, but it's been removed for now due to much broader compatibility issues with Catalina. I have no idea if it will come back.

So no, nothing. In fact, these AV tools typically load additional kernel extensions, and as can be seen above I have nothing beyond Box (FUSE) and my corporate VPN.
Comment 68 Nigel Jones 2019-11-04 07:40:17 PST
The only thing I can think of that regularly touches files is Time Machine: I back up to a remote drive (on my ASUS router), and the backup of course runs hourly when I'm at home. That being said, I've had STP issues when out of the house (like this morning).

Other software: I use Docker from time to time.

Are there any STP or Safari options that change behaviour in this area to aid in empirical experimentation?
Comment 69 Jack Wellborn 2019-11-05 07:20:21 PST
Small breakthrough: I can confirm that deleting the files in the following directory resolves the issue without restarting STP (though I suspect only temporarily):

>~/Library/Containers/com.apple.SafariTechnologyPreview/Data/Library/Caches/com.apple.SafariTechnologyPreview/WebKitCache/Version 14/Blobs
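For anyone who wants to script the workaround above, here is a minimal C++17 sketch. It is a hypothetical helper, not part of WebKit or STP. It assumes the path reported in this comment (the `Version 14` component appears to track the disk cache version, so the path may differ on other builds) and, like the manual workaround, deletes only the contents of `Blobs`, leaving the directory itself in place.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <filesystem>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: the STP disk cache "Blobs" directory reported in comment #69.
// The "Version 14" component is specific to the builds discussed in this bug.
fs::path stpBlobsDirectory()
{
    const char* home = std::getenv("HOME");
    const fs::path base = home ? fs::path(home) : fs::path();
    return base / "Library/Containers/com.apple.SafariTechnologyPreview/Data/Library/Caches"
                / "com.apple.SafariTechnologyPreview/WebKitCache/Version 14/Blobs";
}

// Deletes everything inside `dir` but keeps `dir` itself.
// Returns the number of filesystem objects removed (nested files and folders included).
// A missing directory is treated as already empty; entries that fail to delete are not counted.
std::uintmax_t clearDirectoryContents(const fs::path& dir)
{
    std::error_code ec;
    if (!fs::is_directory(dir, ec))
        return 0;

    // Collect the entries first so the directory is never modified while it is being iterated.
    std::vector<fs::path> entries;
    for (const auto& entry : fs::directory_iterator(dir, ec))
        entries.push_back(entry.path());

    std::uintmax_t removed = 0;
    for (const auto& path : entries) {
        const std::uintmax_t count = fs::remove_all(path, ec);
        if (!ec)
            removed += count;
    }
    return removed;
}
```

Usage would be `clearDirectoryContents(stpBlobsDirectory());`. Like the manual workaround, this only clears the symptom; it does nothing about whatever corrupts the cache in the first place.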
Comment 70 Alex Christensen 2019-11-05 07:22:27 PST
Thanks for the additional information.  It helps confirm the theory that this is caused by cache corruption.  That narrows the search for the problem quite a bit.
Comment 71 Nigel Jones 2019-11-05 09:14:57 PST
On the subject of the cache, I noticed something odd when I was about to reply.

I am using Safari right now, but have been switching to/from STP as we get to the bottom of the bug...

So this could be a Safari bug, but since history is shared (Safari -> STP, but perhaps not the other way), this may be relevant:

https://pasteboard.co/IFj9ZAo.png

I note that the page titles are totally out of sync with the URLs in most cases.

Just something odd I noticed....
Comment 72 Nigel Jones 2019-11-05 09:26:28 PST
I can also confirm the same effect of deleting those files. I had a hanging page (within a minute of starting STP!). I left it a few minutes: no go. It was OK in another browser. After deleting the cache files and reloading, all fine.
Comment 73 Alex Christensen 2019-11-05 14:46:09 PST
Created attachment 382853 [details]
Patch
Comment 74 John Wilander 2019-11-05 15:19:52 PST
Why not just skip the NSURLSession switching instead of rolling out the whole patch? That's a one-liner.
Comment 75 John Wilander 2019-11-05 15:24:38 PST
My suggested patch:

diff --git a/Source/WebKit/NetworkProcess/cocoa/NetworkDataTaskCocoa.mm b/Source/WebKit/NetworkProcess/cocoa/NetworkDataTaskCocoa.mm
index 39db8b43ced..3d2527d3182 100644
--- a/Source/WebKit/NetworkProcess/cocoa/NetworkDataTaskCocoa.mm
+++ b/Source/WebKit/NetworkProcess/cocoa/NetworkDataTaskCocoa.mm
@@ -213,7 +213,8 @@ NetworkDataTaskCocoa::NetworkDataTaskCocoa(NetworkSession& session, NetworkDataT
     if (auto* networkStorageSession = session.networkStorageSession()) {
         if (!shouldBlockCookies)
             shouldBlockCookies = networkStorageSession->shouldBlockCookies(request, frameID, pageID);
-        needsIsolatedSession = networkStorageSession->shouldBlockThirdPartyCookiesButKeepFirstPartyCookiesFor(firstParty);
+        // rdar://problem/56921584
+        // needsIsolatedSession = networkStorageSession->shouldBlockThirdPartyCookiesButKeepFirstPartyCookiesFor(firstParty);
     }
 #endif
     restrictRequestReferrerToOriginIfNeeded(request);
Comment 76 John Wilander 2019-11-05 15:40:16 PST
Comment on attachment 382853 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=382853&action=review

> LayoutTests/http/tests/resourceLoadStatistics/telemetry-generation-expected.txt:15
> +Some tests failed.

This test was not touched by the original patch and is not failing locally for me if I just skip setting the needsIsolatedSession boolean in NetworkDataTaskCocoa.
Comment 77 Alexey Proskuryakov 2019-11-05 15:54:58 PST
Comment on attachment 382853 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=382853&action=review

> Source/WebKit/ChangeLog:12
> +        This effectively reverts r248640 and increments the disk cache version number so that corrupted caches are discarded.

This fix comes without a test. Can this be tested?
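The ChangeLog line quoted above appears to rely on the disk cache storing its data under a per-version directory (the path in comment #69 ends in `WebKitCache/Version 14/Blobs`). The following minimal C++17 sketch shows why bumping that number discards possibly corrupted data: any `Version N` directory that does not match the current version is deleted, and the current one is created if needed. This is a hypothetical illustration of the general technique, not WebKit's NetworkCache code; the function names and the cleanup-on-startup policy are assumptions.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical layout, modeled on the path in comment #69: <cacheRoot>/Version N/...
fs::path versionDirectory(const fs::path& cacheRoot, unsigned version)
{
    return cacheRoot / ("Version " + std::to_string(version));
}

// Removes every "Version N" directory under cacheRoot whose N is not currentVersion,
// then makes sure the directory for currentVersion exists. Other entries are left alone.
// Returns the number of stale version directories that were removed.
unsigned discardStaleCacheVersions(const fs::path& cacheRoot, unsigned currentVersion)
{
    const std::string prefix = "Version ";
    const std::string currentName = prefix + std::to_string(currentVersion);

    std::error_code ec;
    std::vector<fs::path> stale;
    for (const auto& entry : fs::directory_iterator(cacheRoot, ec)) {
        const std::string name = entry.path().filename().string();
        const bool isVersionDir = entry.is_directory(ec) && name.rfind(prefix, 0) == 0;
        if (isVersionDir && name != currentName)
            stale.push_back(entry.path());
    }

    unsigned removed = 0;
    for (const auto& path : stale) {
        fs::remove_all(path, ec);
        if (!ec)
            ++removed;
    }

    // After a version bump the new directory starts out empty and the old ones are gone,
    // so nothing written by an older build (corrupted or not) can be read back.
    fs::create_directories(versionDirectory(cacheRoot, currentVersion), ec);
    return removed;
}
```

Deleting the stale directories, rather than merely ignoring them, also reclaims the disk space the old cache was using.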
Comment 78 John Wilander 2019-11-05 15:58:29 PST
Created attachment 382857 [details]
Patch
Comment 79 Antti Koivisto 2019-11-05 16:18:34 PST
Comment on attachment 382857 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=382857&action=review

> Source/WebKit/NetworkProcess/cocoa/NetworkDataTaskCocoa.mm:217
> +        // needsIsolatedSession = networkStorageSession->shouldBlockThirdPartyCookiesButKeepFirstPartyCookiesFor(firstParty);

We generally don't land commented-out code.
Comment 80 John Wilander 2019-11-05 16:25:44 PST
(In reply to Antti Koivisto from comment #79)
> Comment on attachment 382857 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=382857&action=review
> 
> > Source/WebKit/NetworkProcess/cocoa/NetworkDataTaskCocoa.mm:217
> > +        // needsIsolatedSession = networkStorageSession->shouldBlockThirdPartyCookiesButKeepFirstPartyCookiesFor(firstParty);
> 
> We generally don't land commented-out code.

Will fix.
Comment 81 Chris Dumez 2019-11-05 16:50:28 PST
Comment on attachment 382857 [details]
Patch

Please address Antti's comment before landing.
Comment 82 John Wilander 2019-11-05 16:54:00 PST
Created attachment 382863 [details]
Patch for landing
Comment 83 John Wilander 2019-11-05 16:54:46 PST
Thanks, Antti and Chris!
Comment 84 WebKit Commit Bot 2019-11-05 17:39:06 PST
Comment on attachment 382863 [details]
Patch for landing

Clearing flags on attachment: 382863

Committed r252116: <https://trac.webkit.org/changeset/252116>
Comment 85 WebKit Commit Bot 2019-11-05 17:39:09 PST
All reviewed patches have been landed.  Closing bug.
Comment 86 John Wilander 2019-11-05 20:44:08 PST
Everyone, thanks for filing and investigating!
Comment 87 Nigel Jones 2019-11-05 23:29:49 PST
Thanks. I look forward to the next STP!

Meanwhile, I tried the latest nightly from https://webkit.org/build-archives/, i.e. https://s3-us-west-2.amazonaws.com/minified-archives.webkit.org/mac-catalina-x86_64-release/252124.zip

However, I am unclear how to run it: even clicking on run-webkit-archive in the Finder results in a security warning on Catalina saying it isn't signed (Control-click doesn't help either, as there are so many executables here).

"JavaScriptCore.framework" cannot be opened because the developer cannot be verified.

At the command line I see:
➜  252124 ./run-webkit-archive
Setting DYLD FRAMEWORK and LIBRARY paths to /Users/jonesn/Downloads/252124/Release
dyld: Symbol not found: __ZN3JSC14JSGlobalObject17defineOwnPropertyEPNS_8JSObjectEPS0_NS_12PropertyNameERKNS_18PropertyDescriptorEb
  Referenced from: /Users/jonesn/Downloads/252124/Release/WebCore.framework/Versions/A/WebCore
  Expected in: /System/Library/Frameworks/JavaScriptCore.framework/Versions/A/JavaScriptCore
 in /Users/jonesn/Downloads/252124/Release/WebCore.framework/Versions/A/WebCore

Other than the more intrusive option of switching off protection via safe mode, I guess there's no obvious way to install it now?

Apologies, this is slightly off the core topic. And on the fix: yay!
Comment 88 Jack Wellborn 2019-11-06 07:14:00 PST
Thanks! I look forward to the next STP!
Comment 89 Alexey Proskuryakov 2019-11-06 17:29:31 PST
> This fix comes without a test. Can this be tested?

WebKit policy is to land automated tests with all fixes. This way, we prevent reintroducing the same failure, which would otherwise be common for a project of this size and complexity. It is especially important in this case, which was one of the worst livability bugs in recent memory, and took extreme effort to diagnose.

I see that Chris Dumez already refactored the code to make reintroducing the same error much less likely. I don't think that's a substitute for regression testing.
Comment 90 Alexey Proskuryakov 2019-11-06 17:31:58 PST
> However I am unclear how to run this -- even clicking on run-webkit-archive
> in finder results in a security warning on catalina saying it isn't signed
> (ctrl-click doesn't help either as there are so many executables here)

I think that there may be a way to make this work in the future. Can you file a bug?

> dyld: Symbol not found: __ZN3JSC14JSGlobalObject17defineOwnPropertyEPNS_8JSObjectEPS0_NS_12PropertyNameERKNS_18PropertyDescriptorEb
>   Referenced from: /Users/jonesn/Downloads/252124/Release/WebCore.framework/Versions/A/WebCore
>   Expected in: /System/Library/Frameworks/JavaScriptCore.framework/Versions/A/JavaScriptCore
>  in /Users/jonesn/Downloads/252124/Release/WebCore.framework/Versions/A/WebCore

It tries to use WebCore.framework from the archive, but JavaScriptCore.framework from the system. That seems like a separate bug :(
Comment 91 Chris Dumez 2019-11-06 17:56:59 PST
(In reply to Alexey Proskuryakov from comment #89)
> > This fix comes without a test. Can this be tested?
> 
> WebKit policy is to land automated tests with all fixes.

True for fixes, not for rollouts. This is a partial rollout of a patch that caused a regression; in this case, we do not require landing a test at the same time.

That’s not to say that we should not try to write a test now that we have fixed it. I defer to John & Alex though since it is their feature.

> This way, we prevent reintroducing the same failure, which would otherwise be common for a project of this size and complexity. It is especially important in this case, which was one of the worst livability bugs in recent memory, and took extreme effort to diagnose.
> 
> I see that Chris Dumez already refactored the code to make reintroducing the same error much less likely. I don't think that's a substitute for regression testing.

I did something at least.
Comment 92 Chris Dumez 2019-11-06 17:59:02 PST
(In reply to Chris Dumez from comment #91)
> (In reply to Alexey Proskuryakov from comment #89)
> > > This fix comes without a test. Can this be tested?
> > 
> > WebKit policy is to land automated tests with all fixes.
> 
> True for fixes, not for roll outs. This is a partial rollout of a patch that
> caused a regression, we do not require to land a test at the same time in
> this case.
> 
> That’s not to say that we should not try to write a test now that we have
> fixed it. I defer to John & Alex though since it is their feature.

Well, it looks like Alex has a test at:

https://bugs.webkit.org/show_bug.cgi?id=203934

Comment 93 Nigel Jones 2019-11-07 12:47:31 PST
Did the bug that caused this get introduced into the code stream that is now out as
Safari Version 13.0.3 (15608.3.10.1.4)?

I ask because for the FIRST TIME I've just had a very similar symptom over there: a page hanging while loading (twitter.com), despite working fine in incognito mode and in Chrome.

Maybe a coincidence...
Comment 94 Chris Dumez 2019-11-07 12:51:31 PST
(In reply to Nigel Jones from comment #93)
> Did the bug that caused this get introduced into the code stream that as now
> out as
> Safari Version 13.0.3 (15608.3.10.1.4)
> 
> I ask because for the FIRST TIME I've just had a very similar symptom over
> there .... a page hanging loading (twitter.com), despite working fine in
> incognito mode & in chrome.
> 
> Maybe a co-incidence...

No, this particular bug did not ship to customers, except with STP. I do believe we have a known service-worker-related bug in shipping Safari, though, and this may be what you're experiencing, given that most reports of that bug were with Twitter.
Comment 95 Nigel Jones 2019-11-07 12:52:33 PST
Ah, OK, thanks. Just checking.
Comment 96 Jack Wellborn 2020-03-09 13:25:32 PDT
Created attachment 393066 [details]
Revenge of the loading bug?

I am worried this might have regressed as I am seeing the same behavior on Twitter for the past few weeks.

1. Page loads fine.
2. Page fails to load sporadically.
3. Restarting STP (or using private browsing) resolves the issue.
Comment 97 Alex Christensen 2020-03-09 13:26:46 PDT
That's a different bug.
Comment 98 John Wilander 2020-03-09 13:27:58 PDT
(In reply to Jack Wellborn from comment #96)
> Created attachment 393066 [details]
> Revenge of the loading bug?
> 
> I am worried this might have regressed as I am seeing the same behavior on
> Twitter for the past few weeks.
> 
> 1. Page loads fine.
> 2. Page failed load sporadically.
> 3. Restarting STP (or using private browsing) resolves the issue.

I see people above mentioning that deleting the cache was a workaround for the previous bug. Does that also work for what you're seeing now?
Comment 99 Jack Wellborn 2020-04-02 06:48:25 PDT
Sorry for the delay here. Replies were ending up in junk for some reason, and then I wanted to give it a few more days after updating to STP 103. Looks like whatever was going on has passed.