Bug 204871 - RESOLVED DUPLICATE of bug 215248
sched_yield can cause calling thread to be preempted for 10 ms on Darwin
https://bugs.webkit.org/show_bug.cgi?id=204871
Summary: sched_yield can cause calling thread to be preempted for 10 ms on Darwin
Ben Nham
Reported 2019-12-04 20:07:20 PST
We currently use sched_yield in both bmalloc and WTF::Lock. This can lead to the calling thread being preempted for up to 10 ms even if the lock was relinquished much earlier than that, because of the way this is implemented on Darwin:

- sched_yield calls the swtch_pri syscall
- swtch_pri depresses the calling thread's priority to 0 for thread_depress_time milliseconds
- thread_depress_time is set to one scheduling quantum in sched_*.c
- most of our devices run with a timer tick of 10 ms, so thread_depress_time is 10 ms
- if the system is busy with enough >0 pri work for 10 ms to saturate all possible CPUs, then the calling thread blocks for 10 ms

We actually see exactly this in a number of MobileSafari launch tests (this originally came up as a launch test variability issue from the perf team).

An alternative is to use thread_switch, which can depress the priority of the calling thread for a configurable amount of time with millisecond granularity.
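For context, here is a rough sketch of the two yield flavors discussed in this report. The helper names are made up for illustration, and the 1 ms depress interval matches the suggestion in comment 2 below; this is not WebKit's actual yield code.

```cpp
// Sketch only: the two Darwin yield primitives discussed in this bug.
#include <mach/mach.h>           // swtch_pri, thread_switch prototypes
#include <mach/thread_switch.h>  // SWITCH_OPTION_DEPRESS
#include <sched.h>               // sched_yield

// What sched_yield() boils down to on Darwin: swtch_pri() depresses the
// caller's priority for a full scheduler quantum (~10 ms with a 10 ms tick).
inline void yieldFullQuantum()
{
    swtch_pri(0);
}

// The proposed alternative: depress the caller's priority for only 1 ms.
inline void yieldWithShortDepress()
{
    thread_switch(MACH_PORT_NULL, SWITCH_OPTION_DEPRESS, 1 /* ms */);
}
```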
Attachments
Patch (3.73 KB, patch)
2019-12-04 20:15 PST, Ben Nham
fpizlo: review-
Ben Nham
Comment 1 2019-12-04 20:09:08 PST
Ben Nham
Comment 2 2019-12-04 20:10:42 PST
I ran this by Daniel in Core OS (who is one of the scheduler maintainers) and he said that thread_switch(MACH_PORT_NULL, SWITCH_OPTION_DEPRESS, 1) would do basically what sched_yield does, except only depress the priority for 1 ms instead of 10 ms. Of course the kernel folks would rather we use os_unfair_lock rather than rolling our own locks but I think that's probably a separate discussion.
Filip Pizlo
Comment 3 2019-12-04 20:14:59 PST
(In reply to Ben Nham from comment #2)
> I ran this by Daniel in Core OS (who is one of the scheduler maintainers)
> and he said that thread_switch(MACH_PORT_NULL, SWITCH_OPTION_DEPRESS, 1)
> would do basically what sched_yield does, except only depress the priority
> for 1 ms instead of 10 ms.
>
> Of course the kernel folks would rather we use os_unfair_lock rather than
> rolling our own locks but I think that's probably a separate discussion.

We use sched_yield because it’s faster than what Daniel suggests. We have tested this many times.

We use our own locks because they are faster for WebKit and because they can fit into 2 bits. We use that fact in JSCell. Even WTF::Lock is only one byte, which is 4x smaller than os_unfair_lock.
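To make the size argument above concrete, here is a minimal sketch of a one-byte lock next to os_unfair_lock. The TinyLock type is invented for illustration and is not WTF::Lock's actual algorithm (WTF::Lock parks contended threads via ParkingLot rather than yielding in a loop); it assumes Darwin's <os/lock.h> and a toolchain where std::atomic<uint8_t> occupies one byte.

```cpp
#include <atomic>
#include <cstdint>
#include <os/lock.h>  // os_unfair_lock, for the size comparison only
#include <sched.h>

// Illustrative one-byte test-and-set lock with a yield in its contention path.
struct TinyLock {
    std::atomic<uint8_t> isLocked { 0 };

    void lock()
    {
        // A real implementation would spin a bounded number of times and then
        // park the thread; yielding forever is only for illustration.
        while (isLocked.exchange(1, std::memory_order_acquire))
            sched_yield();
    }

    void unlock() { isLocked.store(0, std::memory_order_release); }
};

static_assert(sizeof(TinyLock) == 1, "fits in one byte, like WTF::Lock");
static_assert(sizeof(os_unfair_lock) == 4, "os_unfair_lock is four bytes");
```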
Ben Nham
Comment 4 2019-12-04 20:15:27 PST
Reopening to attach new patch.
Ben Nham
Comment 5 2019-12-04 20:15:28 PST
Ben Nham
Comment 6 2019-12-04 20:19:11 PST
How are we testing thread_switch vs sched_yield? There is some difference in using a Mach trap vs. a BSD syscall but I am somewhat surprised it's that large.

I have brought up lowering the amount of time that swtch_pri depresses the priority as well, but didn't get any response about that.

In any case, we are definitely seeing 10 ms yields on phones (see the radar); whether the calling thread could have woken up earlier is harder to determine from the profile that I was looking at. When the calling thread is the main thread then those 10 ms can count.
Ben Nham
Comment 7 2019-12-04 20:26:27 PST
Actually both swtch_pri and thread_switch seem to be Mach traps. There is a bit more logic in the latter call on the kernel side, but I am somewhat surprised that there would be that much of a difference. But if you have a benchmark in mind that I can try out I would be happy to do that.
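As a possible starting point for the benchmark question, here is a rough microbenchmark sketch (function name and iteration count chosen arbitrarily) comparing the raw per-call cost of sched_yield against the thread_switch variant on an otherwise idle system. It deliberately does not model contended locks, which is what the perf concerns in this thread are really about, so it is a sanity check rather than the requested perf data.

```cpp
#include <chrono>
#include <cstdio>
#include <mach/mach.h>
#include <mach/thread_switch.h>
#include <sched.h>

// Average nanoseconds per call of a yield primitive, measured naively.
template<typename F>
static double nsPerCall(F&& yieldOnce, int iterations = 100000)
{
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        yieldOnce();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(end - start).count() / iterations;
}

int main()
{
    std::printf("sched_yield:   %.0f ns/call\n", nsPerCall([] { sched_yield(); }));
    std::printf("thread_switch: %.0f ns/call\n",
        nsPerCall([] { thread_switch(MACH_PORT_NULL, SWITCH_OPTION_DEPRESS, 1); }));
    return 0;
}
```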
Filip Pizlo
Comment 8 2019-12-04 20:42:43 PST
(In reply to Ben Nham from comment #6)
> How are we testing thread_switch vs sched_yield? There is some difference in
> using a Mach trap vs. a BSD syscall but I am somewhat surprised it's that
> large.
>
> I have brought up lowering the amount of time that swtch_pri depresses the
> priority as well, but didn't get any response about that.
>
> In any case, we are definitely seeing 10 ms yields on phones (see the
> radar); whether the calling thread could have woken up earlier is harder to
> determine from the profile that I was looking at. When the calling thread is
> the main thread then those 10 ms can count.

I don’t think you’re understanding me. Your patch is going to be a performance regression.
Filip Pizlo
Comment 9 2019-12-04 20:43:25 PST
Comment on attachment 384875 [details]
Patch

We have tried this change before. It was bad for perf. Unless you’ve got overwhelming perf data, this ain’t a change we are going to land.
Filip Pizlo
Comment 10 2019-12-04 20:46:19 PST
(In reply to Ben Nham from comment #7)
> Actually both swtch_pri and thread_switch seem to be Mach traps. There is a
> bit more logic in the latter call on the kernel side, but I am somewhat
> surprised that there would be that much of a difference. But if you have a
> benchmark in mind that I can try out I would be happy to do that.

Any benchmark? We have tried your exact change about 10 times throughout the history of our locks. We have run all the benchmarks. We know it’s a bad idea.
Ben Nham
Comment 11 2020-08-08 17:30:27 PDT
We hadn't re-run the benchmarks since xnu made another change to thread_switch a couple of years ago to more closely match the behavior of swtch_pri. I wasn't able to finish running the benchmarks due to some automation issues, but it looks like Saam was able to do it last week. Duping to Saam's bug, which has the exact same patch.

*** This bug has been marked as a duplicate of bug 215248 ***