Bug 313710
| Summary: | REGRESSION(311577@main): [GTK][GStreamer][Rice] WebKitNetworkProcess is using massive amounts of CPU and spawning several threads named webrtc-rice-XX | ||
|---|---|---|---|
| Product: | WebKit | Reporter: | Carlos Alberto Lopez Perez <clopez> |
| Component: | WebKitGTK | Assignee: | Philippe Normand <philn> |
| Status: | RESOLVED FIXED | ||
| Severity: | Normal | CC: | bugs-noreply, philn |
| Priority: | P2 | ||
| Version: | WebKit Nightly Build | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| See Also: | https://bugs.webkit.org/show_bug.cgi?id=310005 | ||
Carlos Alberto Lopez Perez
The GTK test bot has been under unusual pressure lately: tests are randomly exiting early due to timeouts.
The cause seems to be that WebKitNetworkProcess sometimes starts using a lot of CPU, with several threads named "webrtc-rice-XX".
This happens randomly on some layout tests, so it is not easy to reproduce by running one test alone.
The best way I found to reproduce this is:
1. Run:
Tools/Scripts/run-webkit-tests --no-show-results --no-new-test-results --no-retry-failures --no-sample-on-timeout --no-build --results-directory layout-test-results --debug-rwt-logging --release --gtk --no-timeout -f webrtc
2. In another terminal tab, check the CPU usage of the WebKitNetworkProcess processes with htop (you can enable thread names via F2->Display Options->Show custom thread names). You will see many webrtc-rice-XX threads using a lot of CPU.
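For a scriptable alternative to watching htop in step 2, here is a minimal Python sketch that reads the Linux /proc filesystem directly. It assumes the process name appears truncated to "WebKitNetworkPr" (as in top output) and that the threads of interest start with "webrtc-rice-". The function names `parse_task_stat` and `rice_thread_cpu` are made up for this illustration and are not part of any WebKit tooling.

```python
import os
from pathlib import Path

# Assumed names: the kernel truncates process names to 15 characters.
PROCESS_PREFIX = "WebKitNetworkPr"
THREAD_PREFIX = "webrtc-rice-"


def parse_task_stat(stat_line):
    """Return (thread_name, cpu_ticks) from one /proc/<pid>/task/<tid>/stat line."""
    # The name sits in parentheses and may itself contain spaces or ')',
    # so split on the last ')' and then on the first '('.
    head, _, tail = stat_line.rpartition(")")
    name = head.split("(", 1)[1]
    fields = tail.split()
    # fields[0] is the state (field 3 in proc(5)); utime and stime are
    # fields 14 and 15, i.e. indexes 11 and 12 here.
    return name, int(fields[11]) + int(fields[12])


def rice_thread_cpu(proc_root="/proc"):
    """Map pid -> (matching thread count, CPU seconds used by those threads)."""
    ticks_per_second = os.sysconf("SC_CLK_TCK")
    usage = {}
    for proc_dir in Path(proc_root).iterdir():
        if not proc_dir.name.isdigit():
            continue
        try:
            if not (proc_dir / "comm").read_text().startswith(PROCESS_PREFIX):
                continue
            count, ticks = 0, 0
            for task_dir in (proc_dir / "task").iterdir():
                name, cpu = parse_task_stat((task_dir / "stat").read_text())
                if name.startswith(THREAD_PREFIX):
                    count += 1
                    ticks += cpu
        except OSError:
            continue  # the process or thread exited while being read
        usage[int(proc_dir.name)] = (count, ticks / ticks_per_second)
    return usage


if __name__ == "__main__":
    for pid, (count, seconds) in sorted(rice_thread_cpu().items()):
        print(f"{pid}: {count} {THREAD_PREFIX}* threads, {seconds:.1f}s CPU")
```

Running it twice a few seconds apart and comparing the CPU seconds shows which processes are still spinning. The values are cumulative CPU time, not the instantaneous percentage that htop displays.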
Carlos Alberto Lopez Perez
Note: passing --no-timeout to run-webkit-tests is needed so the tool doesn't kill WTR. The idea is that those tests should not enter that very long (endless?) busy loop, even when --no-timeout is passed.
Carlos Alberto Lopez Perez
I have bisected this: it is caused by 311577@main.
Confirmed by:
- Check out 311576@main and run test 1, wait for run-webkit-tests to print lines, then wait 1 more minute: you see only one WebKitWebProcess spinning the CPU, which seems unrelated to this bug.
- Check out 311577@main and run test 1, wait for run-webkit-tests to print lines, then wait 1 more minute: you see the same WebKitWebProcess spinning the CPU, but also many more WebKitNetworkProcess instances using even more CPU.
This is a "top" snapshot taken around 2 minutes after run-webkit-tests stopped printing lines.
Note that there is even a WebKitNetworkProcess with more than 20 minutes of CPU time used, burning through several cores (636% CPU usage).
That doesn't happen before 311577@main.
Note: I also tested reverting https://github.com/Igalia/webkit-container-sdk/commit/c228acd in the SDK, but the issue keeps happening with librice v0.3.0.
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1561064 clopez 20 0 86.0g 195388 67096 S 636.0 0.1 20:55.51 WebKitNetworkPr
1560495 clopez 20 0 72.3g 95576 67352 S 147.4 0.1 4:47.17 WebKitNetworkPr
1560676 clopez 20 0 71.9g 95596 66452 S 144.7 0.1 4:25.84 WebKitNetworkPr
1560835 clopez 20 0 72.7g 97464 67248 S 127.2 0.1 3:39.79 WebKitNetworkPr
1560862 clopez 20 0 71.8g 95340 66444 S 121.9 0.1 3:49.37 WebKitNetworkPr
1560500 clopez 20 0 72.4g 100792 69296 S 103.5 0.1 3:50.74 WebKitNetworkPr
1560891 clopez 20 0 71.4g 95452 67348 S 96.5 0.1 2:38.17 WebKitNetworkPr
1561016 clopez 20 0 71.5g 95536 67400 S 96.5 0.1 3:00.53 WebKitNetworkPr
1560900 clopez 20 0 71.6g 95228 66436 S 93.9 0.1 2:46.44 WebKitNetworkPr
1560861 clopez 20 0 71.3g 95080 66328 S 88.6 0.1 2:40.04 WebKitNetworkPr
1560694 clopez 20 0 72.1g 95272 66432 S 83.3 0.1 2:48.20 WebKitNetworkPr
1560923 clopez 20 0 72.0g 95504 66508 S 78.1 0.1 3:09.31 WebKitNetworkPr
1560875 clopez 20 0 71.7g 95884 66640 S 60.5 0.1 3:15.21 WebKitNetworkPr
1560839 clopez 20 0 71.4g 95408 67376 S 57.0 0.1 2:13.37 WebKitNetworkPr
1560897 clopez 20 0 71.0g 95248 66472 S 57.0 0.1 1:32.37 WebKitNetworkPr
1561380 clopez 20 0 81.5g 632108 283636 R 42.1 0.5 1:41.94 WebKitWebProces
1560725 clopez 20 0 71.1g 95328 66644 S 37.7 0.1 1:29.10 WebKitNetworkPr
1560951 clopez 20 0 71.1g 95348 66636 S 36.8 0.1 2:09.71 WebKitNetworkPr
1560745 clopez 20 0 71.1g 95040 67180 S 31.6 0.1 0:42.01 WebKitNetworkPr
1560902 clopez 20 0 71.0g 94756 66320 S 31.6 0.1 1:39.89 WebKitNetworkPr
1560873 clopez 20 0 71.5g 95416 66500 S 28.1 0.1 0:43.34 WebKitNetworkPr
1561033 clopez 20 0 70.9g 95140 67172 S 27.2 0.1 0:48.47 WebKitNetworkPr
1560724 clopez 20 0 71.0g 95168 66404 S 23.7 0.1 0:49.81 WebKitNetworkPr
Philippe Normand
Pull request: https://github.com/WebKit/WebKit/pull/63960
Philippe Normand
I think the recv GSource blocks on TCP socket polling, or maybe there's a bug in rice-io. I don't have time to investigate this further now, so we went for a short-term workaround to make the bots happy...
EWS
Committed 312352@main (5ef5c9f38896): <https://commits.webkit.org/312352@main>
Reviewed commits have been landed. Closing PR #63960 and removing active labels.
Philippe Normand
We'd need a proper patch instead of a workaround. Re-opening.
Philippe Normand
Pull request: https://github.com/WebKit/WebKit/pull/64664
EWS
Committed 313309@main (ecec9b6009d9): <https://commits.webkit.org/313309@main>
Reviewed commits have been landed. Closing PR #64664 and removing active labels.