Bug 241309

Summary: Set MallocProbGuard env var when running API tests and layout tests in Debug configuration
Product: WebKit Reporter: David Kilzer (:ddkilzer) <ddkilzer>
Component: Tools / TestsAssignee: David Kilzer (:ddkilzer) <ddkilzer>
Status: NEW    
Severity: Normal CC: ap, ews-watchlist, ggaren, glenn, jbedard, ryanhaddad, webkit-bug-importer, wilander
Priority: P2 Keywords: InRadar
Version: Other   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch v1
none
Patch v2
none
Patch v3 ews-feeder: commit-queue-

David Kilzer (:ddkilzer)
Reported 2022-06-04 13:17:58 PDT
Set MallocProbGuard environment variable when running API tests and layout tests in Debug configuration. For a single run, this is not likely to find any bugs, but over hundreds of runs over time, it will find bugs in WebKit and in dependent frameworks/libraries. <rdar://91786733>
Attachments
Patch v1 (5.25 KB, patch)
2022-06-04 13:21 PDT, David Kilzer (:ddkilzer)
no flags
Patch v2 (5.23 KB, patch)
2022-06-04 13:39 PDT, David Kilzer (:ddkilzer)
no flags
Patch v3 (5.30 KB, patch)
2022-06-05 08:47 PDT, David Kilzer (:ddkilzer)
ews-feeder: commit-queue-
David Kilzer (:ddkilzer)
Comment 1 2022-06-04 13:21:39 PDT
Created attachment 460029 [details] Patch v1
David Kilzer (:ddkilzer)
Comment 2 2022-06-04 13:39:57 PDT
Created attachment 460030 [details] Patch v2
Alexey Proskuryakov
Comment 3 2022-06-04 14:13:08 PDT
Comment on attachment 460030 [details] Patch v2 View in context: https://bugs.webkit.org/attachment.cgi?id=460030&action=review > COMMIT_MESSAGE:1 > +Set MallocProbGuard env var when running API tests and layout tests in Debug configuration What is the performance impact of this? Debug tests are already so slow that there zero space for nice to have things. Also, anything that simply increases flakiness is a no-go. This is what fuzzing infra is for; for regular tests, the goal is to reduce flakiness.
David Kilzer (:ddkilzer)
Comment 4 2022-06-05 08:44:04 PDT
(In reply to Alexey Proskuryakov from comment #3) > What is the performance impact of this? Debug tests are already so slow that > there zero space for nice to have things. I couldn't even tell it was enabled on my 2015 MBP when running layout tests, though I verified it was enabled via lldb. This is nothing like original Guard Malloc. Are there perf tests for Debug test runs so we can measure? > Also, anything that simply increases flakiness is a no-go. This is what > fuzzing infra is for; for regular tests, the goal is to reduce flakiness. This will not increase flakiness. The types of bugs it finds are already causing flakey crashes--this just makes a flakey crash actionable when it happens and the memory is being tracked. Over time, this will actually decrease some causes of flakey crashes by making those crashes actionable.
David Kilzer (:ddkilzer)
Comment 5 2022-06-05 08:47:58 PDT
Created attachment 460039 [details] Patch v3
David Kilzer (:ddkilzer)
Comment 6 2022-06-05 08:48:48 PDT
(In reply to David Kilzer (:ddkilzer) from comment #5) > Created attachment 460039 [details] > Patch v3 The only change here is to not enable MallocProbGuard when original Guard Malloc is enabled.
Alexey Proskuryakov
Comment 7 2022-06-05 13:08:46 PDT
> This will not increase flakiness. I do not think that this is correct, and I continue to not see any point in doing this.
John Wilander
Comment 8 2022-06-06 12:05:59 PDT
(In reply to Alexey Proskuryakov from comment #7) > > This will not increase flakiness. > > I do not think that this is correct, and I continue to not see any point in > doing this. Alexey, could you share some specific concerns, please? It may be that it's a bad idea to enable this but I'd like to understand that assessment. As David explained, PGM doesn't produce more crashes. It just makes existing crashes actionable in that memory allocation traces are retained and reported. Actionable bug reports should lead to bug fixing, which in turn will lead to less flakiness. It is also very easy to revert the change if we find it to be detrimental to our test stability goals. Thanks!
Alexey Proskuryakov
Comment 9 2022-06-06 13:04:57 PDT
From all I can tell, enabling PGM on correctness tests is: - Very low benefit. We try to invest tests that catch a lot of issues, and I haven't seen evidence of PGM finding even one new issue on regression tests yet (examples that can be counted using fingers wouldn't be of much significance, to be clear). - Medium risk. As a non-standard mode, it will sooner or later block testing due to issues that are not hit in normal mode (like crash logs not being generated, or just making everything crash due to a minor issue in an underlying framework, or memory allocator temporary not working with MallocProbGuard=1 at all). - Not the right place to do it. We have all kinds of stress racks, correctness tests are not that. > As David explained, PGM doesn't produce more crashes. I doubt that this is accurate. There are always issues that are benign or mostly benign.
Geoffrey Garen
Comment 10 2022-06-06 13:28:02 PDT
Why is this controversial at all? We're talking about a technology that adds a use-after-free and bounds check assertion to literally zero or one objects per test. Surely our existing debug assertions, in our allocators alone, change behavior in non-standard ways much more than that already? > - Very low benefit. We try to invest tests that catch a lot of issues, and I > haven't seen evidence of PGM finding even one new issue on regression tests > yet (examples that can be counted using fingers wouldn't be of much > significance, to be clear). Seems like an impossible burden of proof to meet? How would we produce the specific data that PGM catches bugs when running layout tests on bots without... turning it on and running layout tests on bots? > - Medium risk. As a non-standard mode, it will sooner or later block testing > due to issues that are not hit in normal mode (like crash logs not being > generated, or just making everything crash due to a minor issue in an > underlying framework, or memory allocator temporary not working with > MallocProbGuard=1 at all). This is a false premise. PGM enabled is a customer-shipping mode. > - Not the right place to do it. We have all kinds of stress racks, > correctness tests are not that. The stated purpose of PGM on layout tests is to catch bugs that are not caught by stress racks because technologies like Asan and Guard Malloc change behavior and timing too much, especially for thread-safety issues. Stress racks are not a viable way to find bugs not caught by stress racks. Why does an up-front guesstimate about risk or reward matter? What is the downside to trying out PGM and then reverting if it doesn't find any bugs, or if any of the feared side-effects actually materialize in practice?
Alexey Proskuryakov
Comment 11 2022-06-06 13:59:38 PDT
I have the same impression - this is so obviously wrong that I don't understand why there is any controversy. > Why does an up-front guesstimate about risk or reward matter? What is the downside to trying out PGM and then reverting if it doesn't find any bugs, or if any of the feared side-effects actually materialize in practice? We have intelligence and experience, and don't just throw patches at a wall to see which ones stick. We also prefer to catch mistakes before landing, and not spend days or weeks of time to see what exactly caused an observable symptom. > This is a false premise. PGM enabled is a customer-shipping mode. Not false enough. It's enabled rarely, and issues can go unnoticed for a long time, and stay low priority once noticed.
Alexey Proskuryakov
Comment 12 2022-06-06 15:09:48 PDT
> Seems like an impossible burden of proof to meet? How would we produce the specific data that PGM catches bugs when running layout tests on bots without... turning it on and running layout tests on bots? Sorry, missed this question. Typically, this is done by doing it locally. We'd have no trouble demonstrating that a debug build catches many issues that a release one doesn't.
Geoffrey Garen
Comment 13 2022-06-06 16:49:03 PDT
> > Why does an up-front guesstimate about risk or reward matter? What is the downside to trying out PGM and then reverting if it doesn't find any bugs, or if any of the feared side-effects actually materialize in practice? > > We have intelligence and experience Can you use your intelligence and experience to give any reasonable example of a bad thing that will happen when, for zero to one objects per test, we assert that it is accessed in-bounds and not used after being freed? And why those bad things are not already happening for all the ASSERTs we already have regarding bounds checks and use after free? > , and don't just throw patches at a wall > to see which ones stick. This is an inappropriate (and inaccurate) way to describe a year of Dave's hard work on PGM, and his good faith effort to use it to improve quality in WebKit.
Alexey Proskuryakov
Comment 14 2022-06-06 17:11:14 PDT
I was not describing Dave's work like that. I was very clearly responding to your suggestion, immediately quoted above. You are not helping the cause by putting words in my mouth, and otherwise lowering the level of this discussion. It is also going in circles now, not adding further value. Perhaps it would be best to let David as the person proposing this change, and having the relevant knowledge, to speak for it?
David Kilzer (:ddkilzer)
Comment 15 2022-06-07 13:34:55 PDT
(In reply to Alexey Proskuryakov from comment #14) > [...] Perhaps it would be best to let David as the person > proposing this change, and having the relevant knowledge, to speak for it? I'm not sure what else to say without specific questions (re)stated after the comments above, but at risk of repeating myself, here goes: - PGM is designed to be low-impact and to not introduce stability issues, which has already been proven over the last year. - The only thing this patch does is to enable PGM for all WebKit test processes on Debug builds so that (over a period of time and running thousands of tests) actionable crashes are more likely to be found. - The bugs found could have caused crashes anyway, so this just makes those crashes actionable. - If PGM is hitting a crash frequently, it means that issue was already happening at a high frequency, and is likely causing crashes or flakiness anyway. - It's easy to turn PGM back off (revert the patch) if such a situation is found. - This patch does not tune any default PGM parameters, but I did try some layout test runs locally doing that, and I proved that I could reliably hit (rdar://91711985) on multiple tests in one run. (The default parameter tuning does not hit this crash, as seen by the test results for "Patch v3" above.) - The reason Debug builds were chosen is that they already enable system malloc (last I knew), so enabling PGM on them would also find issues in WebKit, not just dependent frameworks and libraries. I don't think this change will appreciably slow down Debug layout tests, either. Obviously it won't be zero impact, but I think the result will be negligible. Do you have specific examples of what you're concerned that might happen if PGM is enabled?
David Kilzer (:ddkilzer)
Comment 16 2022-06-07 13:38:25 PDT
I should also note that there is nothing your team has to do after enabling this (w.r.t. to filing bugs for crashes) as PGM crash logs are collected and reviewed daily for new issues (currently by me). Also, when PGM hits a crash on a test run, the chances that it will be hit a second time are extremely low (due to its design not to cause stability issues).
Note You need to log in before you can comment on or make changes to this bug.