Bug 259773
Field | Value
---|---
Summary | REGRESSION(264514@main): [WPE] Lot of crashes when running layout tests on machines without a GPU
Product | WebKit
Component | WPE WebKit
Status | RESOLVED FIXED
Severity | Normal
Priority | P2
Version | WebKit Nightly Build
Hardware | Unspecified
OS | Unspecified
Reporter | Carlos Alberto Lopez Perez <clopez>
Assignee | Philippe Normand <philn>
CC | bugs-noreply, philn
See Also | https://bugs.webkit.org/show_bug.cgi?id=257321
Carlos Alberto Lopez Perez
264514@main updated several recipes in the Flatpak SDK.
Among them, it updated the FDO junction from freedesktop-sdk-22.08.11-24-g67b3605b402cffed2db0344f1da3a6acb5b6c914 to freedesktop-sdk-22.08.11-127-g7bed8a0d05bfd13939862b30e7080d1ffd5f635b
That in turn updated Mesa from mesa-23.0.3-0-g77661a60228061aa5d78107e0de5c58c803a57a4 to mesa-23.1.0-0-gbe4f7fb656180ab55a50eff01f36672b0bf5f146
The new version of Mesa has a regression that is causing massive crashes on the WPE bots that run without a GPU (which happen to be all the EWS machines).
You can see that they have been exiting early with more than 500 crashes since then: https://ews-build.webkit.org/#/builders/34
I have been able to reproduce the issue by building Mesa with jhbuild (for faster iteration), and I'm now bisecting Mesa to find out which commit caused it.
Carlos Alberto Lopez Perez
Bisect finished.
This is the Mesa commit causing the issue:
- b8da022da410519c95d5726fc92a9ee5731de5e8 is the first bad commit
- https://gitlab.freedesktop.org/mesa/mesa/-/commit/b8da022da410519c95d5726fc92a9ee5731de5e8
It is not obvious to me why this Mesa commit is causing these crashes. I will try to debug it with GDB.
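For readers unfamiliar with the technique, the bisection mentioned above is essentially a binary search over an ordered commit range. The following is only an illustrative sketch, not the procedure actually used here: the commit names are placeholders, and the `is_bad` predicate is a hypothetical stand-in for "build Mesa at this commit and check whether the layout tests crash".

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest, commits[0] is known good,
    commits[-1] is known bad, and every commit after the first bad one is
    also bad.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return commits[hi]


# Hypothetical history: placeholder names, not real Mesa commits.
history = ["c0", "c1", "c2", "c3", "c4", "c5", "c6"]
regressed_at = history.index("c4")
print(first_bad_commit(history, lambda c: history.index(c) >= regressed_at))
```

The same search finds a fixing commit if the predicate is inverted, so that "bad" means "the issue no longer reproduces".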
Carlos Alberto Lopez Perez
The issue (crashes) is not reproducible with Mesa main, so it seems this has been fixed on the main branch.
I will do a reverse bisect to find the commit that fixed it.
Philippe Normand
Thanks for investigating this, Carlos!
Do you know if Mesa 23.1.5 is also affected by this?
Philippe Normand
Do you have a bt of the crash? The EWS doesn't seem to log them...
Carlos Alberto Lopez Perez
I had trouble bisecting the main branch because I was not able to build several commits in the range.
However, I tested the stable branches:
- 23.1.5 : works fine
- 23.1.4 : crashes
Bisecting between both tags:
- The commit fixing the issue is: c795abedd2c8d293ad5e9bfac1f8b362260e5bf8
https://gitlab.freedesktop.org/mesa/mesa/-/commit/c795abedd2c8d293ad5e9bfac1f8b362260e5bf8
Related MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24347
So we need to update Mesa to 23.1.5 to fix this issue on the bots.
I didn't bother to get a GDB backtrace, but I can get one easily if you think it is useful; just let me know if you want it.
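As a rough illustration of the conclusion above, the following sketch checks whether a `git describe`-style Mesa version string falls in the range that looks affected. It rests on an assumption that is not verified in this bug: only 23.1.0 (the version the bots picked up) and 23.1.4 were reported as crashing, and the sketch treats every release between them as affected too. The function names are hypothetical.

```python
import re

# Assumption: every release from 23.1.0 up to (not including) 23.1.5 is
# affected. Only 23.1.0 and 23.1.4 were actually reported as crashing.
FIRST_KNOWN_BAD = (23, 1, 0)
FIRST_FIXED = (23, 1, 5)


def mesa_version(describe: str) -> tuple:
    """Extract (major, minor, patch) from a string like 'mesa-23.1.0-0-g<hash>'."""
    match = re.match(r"mesa-(\d+)\.(\d+)\.(\d+)", describe)
    if match is None:
        raise ValueError(f"unrecognized Mesa version string: {describe!r}")
    return tuple(int(part) for part in match.groups())


def likely_affected(describe: str) -> bool:
    """True if the version lies in the assumed affected range."""
    return FIRST_KNOWN_BAD <= mesa_version(describe) < FIRST_FIXED


print(likely_affected("mesa-23.0.3-0-g77661a60228061aa5d78107e0de5c58c803a57a4"))
print(likely_affected("mesa-23.1.0-0-gbe4f7fb656180ab55a50eff01f36672b0bf5f146"))
```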
Philippe Normand
(In reply to Carlos Alberto Lopez Perez from comment #5)
> So we need to update Mesa to 23.1.5 to fix this issue on the bots.
>
I can submit a PR.
>
> I didn't bother to get a GDB backtrace, but I can get one easily if you
> think it is useful; just let me know if you want it.
I was just asking out of curiosity.
Carlos Alberto Lopez Perez
(In reply to Philippe Normand from comment #6)
> (In reply to Carlos Alberto Lopez Perez from comment #5)
> > So we need to update Mesa to 23.1.5 to fix this issue on the bots.
> >
>
> I can submit a PR.
>
Great. I appreciate that. Thanks!
> >
> > I didn't bother to get a GDB backtrace, but I can get one easily if you
> > think it is useful; just let me know if you want it.
>
> I was just asking out of curiosity.
Ok. Will try to get some backtrace.
Philippe Normand
Pull request: https://github.com/WebKit/WebKit/pull/16388
Carlos Alberto Lopez Perez
(In reply to Carlos Alberto Lopez Perez from comment #7)
> Ok. Will try to get some backtrace.
I tried to debug this, but it is a bit complicated.
There isn't a crash as such.
And the issue is not in the WebProcess but in the UIProcess.
I can reproduce this with both WebKitTestRunner and Cog.
On stderr you can see this:
libEGL warning: failed to get driver name for fd 0
libEGL warning: MESA-LOADER: failed to retrieve device information
But I couldn't get much more useful information.
It seems that some OpenGL/EGL API call that is not expected to fail is failing, and
then everything goes wrong.
EWS
Committed 266576@main (87a213be3f95): <https://commits.webkit.org/266576@main>
Reviewed commits have been landed. Closing PR #16388 and removing active labels.
Carlos Alberto Lopez Perez
(In reply to Carlos Alberto Lopez Perez from comment #9)
> Seems some OpenGL/EGL API call that is not expected to fail is failing and
> then everything goes wrong.
Also, the issue is much easier to trigger if you run several WebKitTestRunners in parallel.
Carlos Alberto Lopez Perez
(In reply to Carlos Alberto Lopez Perez from comment #9)
> On stderr you can see this:
>
> libEGL warning: failed to get driver name for fd 0
> libEGL warning: MESA-LOADER: failed to retrieve device information
>
Another important thing is that the log is also full of warnings like this:
EGLDisplay Initialization failed: EGL_NOT_INITIALIZED
But that warning (EGL_NOT_INITIALIZED) also happens with the new Mesa version that fixes the issue, and it was also happening before this regression, so it is not new.
Things work despite that EGL_NOT_INITIALIZED warning, but maybe we should look into it, as this warning does not seem normal.
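When reading a bot's stderr, it may help to count the messages quoted in this bug separately, since EGL_NOT_INITIALIZED is reported above as pre-existing. The sketch below is only a hypothetical helper: the message strings are copied from the comments, while the category names and the `summarize` function are made up for illustration. Whether the two libEGL warnings are absent with the fixed Mesa was not stated, so they are labelled only as "observed while reproducing".

```python
from collections import Counter

# Messages quoted in this bug. The grouping is an assumption for illustration.
OBSERVED_WHILE_REPRODUCING = (
    "libEGL warning: failed to get driver name for fd 0",
    "libEGL warning: MESA-LOADER: failed to retrieve device information",
)
PRE_EXISTING = ("EGLDisplay Initialization failed: EGL_NOT_INITIALIZED",)


def summarize(stderr_text: str) -> Counter:
    """Count log lines per category; lines matching neither are ignored."""
    counts = Counter()
    for line in stderr_text.splitlines():
        if any(msg in line for msg in OBSERVED_WHILE_REPRODUCING):
            counts["observed_while_reproducing"] += 1
        elif any(msg in line for msg in PRE_EXISTING):
            counts["pre_existing"] += 1
    return counts


sample = "\n".join([
    "libEGL warning: failed to get driver name for fd 0",
    "libEGL warning: MESA-LOADER: failed to retrieve device information",
    "EGLDisplay Initialization failed: EGL_NOT_INITIALIZED",
    "some unrelated output",
])
print(dict(summarize(sample)))
```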