Bug 225099

Summary: [WPE][GTK] More correct fixes for stack size issues on musl libc
Product: WebKit Reporter: Daniel Kolesa <dkolesa>
Component: Web Template FrameworkAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: aperez, benjamin, cdumez, clopez, cmarcelo, ews-watchlist, keith_miller, mark.lam, msaboff, pmatos, psaavedra, raj.khem, saam, tzagallo, webkit-bug-importer, ysuzuki
Priority: P2 Keywords: InRadar
Version: WebKit Local Build   
Hardware: Other   
OS: Linux   
Bug Depends on:    
Bug Blocks: 187485, 226244    
Attachments:
Description Flags
patch
none
fixed style, typos
mark.lam: review+
use main stack boundary logic on linux universally
none
fixed patch
mark.lam: review+
fixed changelog and other comment typos none

Description Daniel Kolesa 2021-04-27 06:27:12 PDT
[WPE][GTK] More correct fixes for stack size issues on musl libc
    
Partial revert https://bugs.webkit.org/show_bug.cgi?id=210068
    
While the changes in r236306 stopped JSC from crashing outright, they are not correct, since they also make it rather unstable.
   
To counter this, increase stack size for threads on musl to 1M, to match other systems but not be overly greedy like e.g. glibc.
    
While at it, the previous approach to musl thread stack size was breaking use of DEFAULT_THREAD_STACK_SIZE_IN_KB (if defined) as well as not properly taking care of the unused parameter. Move the code to a more appropriate place, which solves these problems.
    
All this is however not enough, since there is still the main thread; on musl, attempting to get the stack size of main thread via the API provided by pthreads results in the value of 128k, which is the reserved kernel size. WebKit then uses this to check the stack size of all threads, including main. That results in small stack bounds for the main thread, which results in JSC thinking it has overflown the stack, and crashes many websites.
    
In order to fix this, we need to do something similar to what is already being done for Drawin - use the resource limits interface for main thread instead.
    
Finally, drop the special casing in JSC options. It is no longer necessary - softReservedZoneSize was causing a crash at a value of 128K because of the main thread stack bounds, and this is now anon-issue. We can keep the maxPerThreadStackUsage at 5M as well; there is no fundamental difference from how things are done on glibc anymore.
Comment 1 Daniel Kolesa 2021-04-27 06:31:09 PDT
Created attachment 427141 [details]
patch

Some comments for potential review:

- I considered using WTF::isMainThread() but decided against it; since this piece of code is specific to Linux, I thought it'd be best to use the lowest level API in order to avoid abstractions potentially causing issues and stay robust
- Ideally I'd like to change the '#if OS(LINUX) && !defined(__BIONIC__) && !defined(__GLIBC__)' to just '#if OS(LINUX)' in StackBounds.cpp. Common paths are better tested, and this should not hurt non-musl systems; finally, fewer ifdefs are better than more. I included a strict conditional for now to be conservative. I'll leave it up to reviewer which path we'll end up taking.
Comment 2 Daniel Kolesa 2021-04-27 06:38:41 PDT
Created attachment 427144 [details]
fixed style, typos
Comment 3 Adrian Perez 2021-04-27 07:15:56 PDT
Comment on attachment 427144 [details]
fixed style, typos

Nice one, thanks!

Patch LGTM, but I also think it would be positive to remove conditional compilation
guards and use the same code with the different libcs; to avoid having one code path
less exercised than the others.

I will try to get some JSC developer to also take a look at this and give the
final r+; it seems like a good idea to have another pair of eyes go over this
patch.

View in context: https://bugs.webkit.org/attachment.cgi?id=427144&action=review

> Source/WTF/wtf/StackBounds.cpp:125
> +    if (getpid() == static_cast<pid_t>(syscall(SYS_gettid))) {

I would be in favor of moving this check into a Linux-specific implementation of
WTF::isMainThread(). Currently Linux uses the implementation from MainThreadGeneric.cpp
but pthread_main_np() is not available on Linux, so if something attempts to use it
before a call to WTF::initializeThreadPlatform() has been made e.g. constructors it
will most certainly return “false” because the “mainThread” variable would have not
been yet initialized.

Nevertheless, I think that could be in a follow-up patch, and unless somebody has
a different opinion, should not block landing this.
Comment 4 Daniel Kolesa 2021-04-27 07:20:44 PDT
I should probably adjust the comments a bit more to clarify things - notably, it's not that musl always returns 128k, it actually calculates the currently available size (which happens to be 128k as that's what the initial size is at the point; this was confirmed by the musl author, and that it's intended behavior)

so the solutions are really two - either do what i'm doing (but update the comment) or pre-poke the stack (which i don't think is necessary - it's fine that it grows, it's just that the stack boundary is created once on startup and never updated later, so it should represent the maximum rather than current)

anyway, after someone involved with JSC confirms it's fine, I will do a final update on the patch along with other potential review changes.
Comment 5 Khem Raj 2021-04-27 07:33:04 PDT
Comment on attachment 427144 [details]
fixed style, typos

thanks for this patch, I have tried it and seems to work ok in my case so far.
Comment 6 Mark Lam 2021-04-27 10:16:51 PDT
Comment on attachment 427144 [details]
fixed style, typos

View in context: https://bugs.webkit.org/attachment.cgi?id=427144&action=review

r=me with suggestion for USE(MUSL).  Feel free to do that as clean up in a follow up patch.

> Source/WTF/wtf/StackBounds.cpp:117
> +#if OS(LINUX) && !defined(__BIONIC__) && !defined(__GLIBC__)

I see this set of conditionals used in more than 1 place (3 in this patch).  Would it be better to define USE_MUSL in PlatformUse.h and check #if USE(MUSL) here (and in the other places)?  Is this code only needed for musl?
Comment 7 Mark Lam 2021-04-27 10:23:02 PDT
(In reply to Daniel Kolesa from comment #4)
> I should probably adjust the comments a bit more to clarify things -
> notably, it's not that musl always returns 128k, it actually calculates the
> currently available size (which happens to be 128k as that's what the
> initial size is at the point; this was confirmed by the musl author, and
> that it's intended behavior)
> 
> so the solutions are really two - either do what i'm doing (but update the
> comment) or pre-poke the stack (which i don't think is necessary - it's fine
> that it grows, it's just that the stack boundary is created once on startup
> and never updated later, so it should represent the maximum rather than
> current)
> 
> anyway, after someone involved with JSC confirms it's fine, I will do a
> final update on the patch along with other potential review changes.

FYI, the StackBounds is expected to be constant.  Client code can cache it to avoid continually probing for the bounds.  One can expect that whatever it returns the first time is the bounds the code will use for the lifecycle of the process.  StackBounds should never lie and claim to have more stack than is actually available.  The downside of doing so is that it will crash when the client use stack memory that isn't actually there.

And yes, adding the details to the ChangeLog will probably be helpful as future readers of this code will be consulting it for the rationale of your implementation.
Comment 8 Daniel Kolesa 2021-04-27 10:38:27 PDT
> FYI, the StackBounds is expected to be constant.  Client code can cache it
> to avoid continually probing for the bounds.  One can expect that whatever
> it returns the first time is the bounds the code will use for the lifecycle
> of the process.  StackBounds should never lie and claim to have more stack
> than is actually available.  The downside of doing so is that it will crash
> when the client use stack memory that isn't actually there.
> 
> And yes, adding the details to the ChangeLog will probably be helpful as
> future readers of this code will be consulting it for the rationale of your
> implementation.

well, that's the assumption we're working with in this patch - for main thread, it's initialized once, to the size the stack can have at maximum

on glibc and most other libc's, pthread_attr_getstack will already return the maximum, while on musl, it returns the current size (it will automatically grow though, unlike thread stacks which never grow further)

> I see this set of conditionals used in more than 1 place (3 in this patch). 
> Would it be better to define USE_MUSL in PlatformUse.h and check #if
> USE(MUSL) here (and in the other places)?  Is this code only needed for musl?

that's the thing - it's technically not needed on glibc/bionic (since the pthread api already returns what we expect), but if we also enabled it for glibc/bionic, it wouldn't hurt anything

which is why i'd perhaps like to change the conditional to just '#if OS(LINUX)' and use it everywhere, to avoid needless platform checks

that conditional will obviously catch other libc's than musl as well; there's no telling what kind of behavior they will actually have - so we should play it safe for those either way

musl has no test macro of its own, intentionally (in order to encourage people to write portable code)
Comment 9 Mark Lam 2021-04-27 10:43:53 PDT
(In reply to Daniel Kolesa from comment #8)
> > I see this set of conditionals used in more than 1 place (3 in this patch). 
> > Would it be better to define USE_MUSL in PlatformUse.h and check #if
> > USE(MUSL) here (and in the other places)?  Is this code only needed for musl?
> 
> that's the thing - it's technically not needed on glibc/bionic (since the
> pthread api already returns what we expect), but if we also enabled it for
> glibc/bionic, it wouldn't hurt anything
> 
> which is why i'd perhaps like to change the conditional to just '#if
> OS(LINUX)' and use it everywhere, to avoid needless platform checks
> 
> that conditional will obviously catch other libc's than musl as well;
> there's no telling what kind of behavior they will actually have - so we
> should play it safe for those either way
> 
> musl has no test macro of its own, intentionally (in order to encourage
> people to write portable code)

Can't you add this in PlatformUse.h:

#if OS(LINUX) && !defined(__BIONIC__) && !defined(__GLIBC__)
#define USE_MUSL
#endif

... and check USE(MUSL) in all the places instead?  That is unless you intend to use this code for other than musl.  Anyway, I just think it's error prone to always have a long string of conditionals like that that needs to be copy-pasted to multiple places.  That's my $.02.
Comment 10 Daniel Kolesa 2021-04-27 10:48:19 PDT
(In reply to Mark Lam from comment #9)
> (In reply to Daniel Kolesa from comment #8)
> > > I see this set of conditionals used in more than 1 place (3 in this patch). 
> > > Would it be better to define USE_MUSL in PlatformUse.h and check #if
> > > USE(MUSL) here (and in the other places)?  Is this code only needed for musl?
> > 
> > that's the thing - it's technically not needed on glibc/bionic (since the
> > pthread api already returns what we expect), but if we also enabled it for
> > glibc/bionic, it wouldn't hurt anything
> > 
> > which is why i'd perhaps like to change the conditional to just '#if
> > OS(LINUX)' and use it everywhere, to avoid needless platform checks
> > 
> > that conditional will obviously catch other libc's than musl as well;
> > there's no telling what kind of behavior they will actually have - so we
> > should play it safe for those either way
> > 
> > musl has no test macro of its own, intentionally (in order to encourage
> > people to write portable code)
> 
> Can't you add this in PlatformUse.h:
> 
> #if OS(LINUX) && !defined(__BIONIC__) && !defined(__GLIBC__)
> #define USE_MUSL
> #endif
> 
> ... and check USE(MUSL) in all the places instead?  That is unless you
> intend to use this code for other than musl.  Anyway, I just think it's
> error prone to always have a long string of conditionals like that that
> needs to be copy-pasted to multiple places.  That's my $.02.

that won't be true, because that doesn't check for musl, it checks for every c library except glibc and bionic (so also e.g. uclibc and so on)

as i said, the best thing would probably be to use the the code universally, without checking which libc it is; it should work everywhere as is, so there is no need to check for libc here
Comment 11 Mark Lam 2021-04-27 10:51:32 PDT
(In reply to Daniel Kolesa from comment #10)
> (In reply to Mark Lam from comment #9)
> > Can't you add this in PlatformUse.h:
> > 
> > #if OS(LINUX) && !defined(__BIONIC__) && !defined(__GLIBC__)
> > #define USE_MUSL
> > #endif
> > 
> > ... and check USE(MUSL) in all the places instead?  That is unless you
> > intend to use this code for other than musl.  Anyway, I just think it's
> > error prone to always have a long string of conditionals like that that
> > needs to be copy-pasted to multiple places.  That's my $.02.
> 
> that won't be true, because that doesn't check for musl, it checks for every
> c library except glibc and bionic (so also e.g. uclibc and so on)
> 
> as i said, the best thing would probably be to use the the code universally,
> without checking which libc it is; it should work everywhere as is, so there
> is no need to check for libc here

SGTM.
Comment 12 Daniel Kolesa 2021-04-27 11:09:13 PDT
Created attachment 427171 [details]
use main stack boundary logic on linux universally

switched to not checking libc, instead enabling the new code for all Linux universally

I kept the old conditional in the thread stack size code, as there is no need to adjust it on those two platforms specifically
Comment 13 Daniel Kolesa 2021-04-27 11:12:38 PDT
Created attachment 427173 [details]
fixed patch

apologies, i accidentally reverted some changes in the patch
Comment 14 Mark Lam 2021-04-27 11:24:50 PDT
Comment on attachment 427173 [details]
fixed patch

View in context: https://bugs.webkit.org/attachment.cgi?id=427173&action=review

r=me

> Source/WTF/ChangeLog:33
> +        we allows us to drop the options special-casing that was in place.

/we allows/it allows/
Comment 15 Daniel Kolesa 2021-04-27 11:26:45 PDT
i'll let the CI finish, and then i'll fix the typo
Comment 16 Daniel Kolesa 2021-04-27 15:46:49 PDT
Created attachment 427207 [details]
fixed changelog and other comment typos

fixed the typo and some others in the changelogs and comments (no code changes)

CI seems to have passed on the previous run.
Comment 17 EWS 2021-04-28 00:36:53 PDT
Committed r276695 (237109@main): <https://commits.webkit.org/237109@main>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 427207 [details].
Comment 18 Radar WebKit Bug Importer 2021-04-28 00:37:18 PDT
<rdar://problem/77252578>