Bug 209670

Summary: REGRESSION(r258857): Broke aarch64 JSCOnly CI
Product: WebKit Reporter: Carlos Alberto Lopez Perez <clopez>
Component: JavaScriptCoreAssignee: Carlos Alberto Lopez Perez <clopez>
Status: RESOLVED FIXED    
Severity: Normal CC: benjamin, cdumez, cmarcelo, ews-watchlist, fpizlo, ggaren, jbedard, keith_miller, mark.lam, mcatanzaro, msaboff, saam, tpopela, tzagallo, webkit-bug-importer, zan
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
See Also: https://bugs.webkit.org/show_bug.cgi?id=209236
Bug Depends on: 209360    
Bug Blocks: 209236    
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Crash backtrace
none
Patch none

Description Carlos Alberto Lopez Perez 2020-03-27 11:42:13 PDT
The JSCOnly ARM64 its crashing all of the time since r258857

https://build.webkit.org/builders/JSCOnly%20Linux%20AArch64%20Release/builds/5810
Comment 1 Carlos Alberto Lopez Perez 2020-03-27 11:47:19 PDT
Created attachment 394739 [details]
Patch
Comment 2 Michael Catanzaro 2020-03-27 11:48:33 PDT
Try changing it to 4 KB (or 16 KB), see if that makes a difference?

It really *shouldn't*, but if you're seeing crashes then clearly it does....
Comment 3 Carlos Alberto Lopez Perez 2020-03-27 11:50:01 PDT
Created attachment 394741 [details]
Patch
Comment 4 Michael Catanzaro 2020-03-27 11:54:07 PDT
Don't revert my ChangeLog entries. :P There is a 'webkit-patch rollout' command that will do the revert for you without messing up the ChangeLog.

That said, this change is passing our aarch64 CI. Must be related to different page size. Try this patch:

$ git diff
diff --git a/Source/WTF/wtf/PageBlock.h b/Source/WTF/wtf/PageBlock.h
index 425c9d9e662f..dd56f973cd09 100644
--- a/Source/WTF/wtf/PageBlock.h
+++ b/Source/WTF/wtf/PageBlock.h
@@ -49,9 +49,9 @@ namespace WTF {
 // Use 64 KiB for any unknown CPUs to be conservative.
 #if OS(DARWIN) || PLATFORM(PLAYSTATION)
 constexpr size_t CeilingOnPageSize = 16 * KB;
-#elif OS(WINDOWS) || CPU(MIPS) || CPU(X86) || CPU(X86_64) || CPU(ARM)
+#elif OS(WINDOWS) || CPU(MIPS) || CPU(X86) || CPU(X86_64) || CPU(ARM) || CPU(ARM64)
 constexpr size_t CeilingOnPageSize = 4 * KB;
-#elif CPU(UNKNOWN) || CPU(PPC) || CPU(PPC64) || CPU(PPC64LE) || CPU(ARM64)
+#elif CPU(UNKNOWN) || CPU(PPC) || CPU(PPC64) || CPU(PPC64LE)
 constexpr size_t CeilingOnPageSize = 64 * KB;
 #else
 #error Must set CeilingOnPageSize in PageBlock.h when adding a new CPU architecture!

That will definitely break our CI, because we use 64 KB pages, but at least that's the first step to figuring out what went wrong. Something must be wrong somewhere, because it should be completely safe to use any value larger than the actual page size for CeilingOnPageSize, as long as it is a multiple of the real page size....
Comment 5 Carlos Alberto Lopez Perez 2020-03-27 11:59:08 PDT
(In reply to Michael Catanzaro from comment #4)
> Don't revert my ChangeLog entries. :P There is a 'webkit-patch rollout'
> command that will do the revert for you without messing up the ChangeLog.
> 

Yes, but it failed because of r259018
So I had to do the revert manually.. I fixed the changelog thing already. Let's wait for EWS green before landing.
Comment 6 Michael Catanzaro 2020-03-27 12:00:22 PDT
BTW, is it crashing in JSC Config::permanentlyFreeze?
Comment 7 Michael Catanzaro 2020-03-27 12:04:28 PDT
Plan is:

 * Try 4 KB page size first
 * If that doesn't immediately fix the problem, land clopez's revert and figure it out next week.
Comment 8 Michael Catanzaro 2020-03-27 12:08:32 PDT
Created attachment 394743 [details]
Patch
Comment 9 EWS 2020-03-27 12:48:08 PDT
Committed r259134: <https://trac.webkit.org/changeset/259134>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 394743 [details].
Comment 10 Radar WebKit Bug Importer 2020-03-27 12:49:15 PDT
<rdar://problem/60980423>
Comment 11 Michael Catanzaro 2020-03-28 08:26:03 PDT
Looks like that fixed the upstream bot.

Now if only it didn't also break our internal bot, I would be happy....

Can you find where it was crashing, please? (gdb backtrace, not strace.) This change should only have affected MarkedBlock.h and JSCConfig.h. And in both cases, using any multiple of the system page size should have been perfectly safe.
Comment 12 Zan Dobersek 2020-03-30 03:40:39 PDT
Created attachment 394895 [details]
Crash backtrace
Comment 13 Zan Dobersek 2020-03-30 03:46:54 PDT
(In reply to Zan Dobersek from comment #12)
> Created attachment 394895 [details]
> Crash backtrace

The change leading to the crashes is the MarkedBlock size increase. The backtrace shows the crash, but it's more of a side-effect of that size increase than anything else.

FWIW the crash doesn't occur when bmalloc is disabled via `Malloc=1`.

Regardless of the fix for this crash, I don't think the increase to 64kB as the page-size ceiling is welcome for ARM64 platforms that don't need it, since it leads to quadrupling the size of the MarkedBlock objects.
Comment 14 Michael Catanzaro 2020-03-30 07:03:08 PDT
(In reply to Zan Dobersek from comment #13)
> FWIW the crash doesn't occur when bmalloc is disabled via `Malloc=1`.

Ahhh, there we go, that explains it. Yes, bmalloc has to be disabled when using sizes > 16 KB. The default CeilingOnPageSize of 64 KB is OK because bmalloc is disabled by default except on whitelisted architectures. It fails only on aarch64 because we have aarch64 whitelisted in WebKitFeatures.cmake. But RHEL's build manually overrides this by passing -DUSE_SYSTEM_MALLOC=ON for aarch64. I created bug #209360 to ask if it would be possible to fix that, but no response yet.

We could fix this by checking the page size at build time using sysconf(), but Carlos Lopez pointed out that will fail when cross-compiling, which is why I wound up just hardcoding page size guesses. So as long as bmalloc crashes in this configuration, the only solutions I can think of are (a) disable bmalloc for aarch64, even though it works on most systems, or (b) require that it be manually disabled downstream, and also patch the page size downstream in PageBlock.h. You can imagine I don't like either option very much. :)

Have you ever done any performance testing of bmalloc on aarch64? I'm curious to know whether it's any better than the system allocator. I assume it's probably not, but it still contains important heap security features, so it would be a shame to turn it off if not absolutely required. Maybe somebody who understands bmalloc could help with bug #209360.

> Regardless of the fix for this crash, I don't think the increase to 64kB as
> the page-size ceiling is welcome for ARM64 platforms that don't need it,
> since it leads to quadrupling the size of the MarkedBlock objects.

FWIW, the 64 KB page size is apparently performance-critical for some reason. I don't know or understand why. Maybe something about cache locality?
Comment 15 Carlos Alberto Lopez Perez 2020-03-30 08:06:34 PDT
(In reply to Michael Catanzaro from comment #14)
> (In reply to Zan Dobersek from comment #13)
> > FWIW the crash doesn't occur when bmalloc is disabled via `Malloc=1`.
> 
> Ahhh, there we go, that explains it. Yes, bmalloc has to be disabled when
> using sizes > 16 KB. The default CeilingOnPageSize of 64 KB is OK because
> bmalloc is disabled by default except on whitelisted architectures. It fails
> only on aarch64 because we have aarch64 whitelisted in WebKitFeatures.cmake.
> But RHEL's build manually overrides this by passing -DUSE_SYSTEM_MALLOC=ON
> for aarch64. I created bug #209360 to ask if it would be possible to fix
> that, but no response yet.
> 
> We could fix this by checking the page size at build time using sysconf(),
> but Carlos Lopez pointed out that will fail when cross-compiling, which is
> why I wound up just hardcoding page size guesses. So as long as bmalloc
> crashes in this configuration, the only solutions I can think of are (a)
> disable bmalloc for aarch64, even though it works on most systems, or (b)
> require that it be manually disabled downstream, and also patch the page
> size downstream in PageBlock.h. You can imagine I don't like either option
> very much. :)
> 

I also don't like any of the options, but I strongly prefer b) over a).

AFAIK, all Linux ARM64 distributions but RHEL use a default 4KB page size, so I don't think crippling the majority of our users its acceptable because of an exception. Therefore I prefer if RHEL carries a patch downstream for this problem unless we can find a better fix.

If you want to avoid carrying a patch, we can maybe add a new CMake internal config option that if set will take preference over the previous options: something like -DJSC_PAGE_SIZE=value and then you can define that to the value desired on the RHEL spec file.

> Have you ever done any performance testing of bmalloc on aarch64? I'm
> curious to know whether it's any better than the system allocator. I assume
> it's probably not, but it still contains important heap security features,
> so it would be a shame to turn it off if not absolutely required. Maybe
> somebody who understands bmalloc could help with bug #209360.
> 
> > Regardless of the fix for this crash, I don't think the increase to 64kB as
> > the page-size ceiling is welcome for ARM64 platforms that don't need it,
> > since it leads to quadrupling the size of the MarkedBlock objects.
> 
> FWIW, the 64 KB page size is apparently performance-critical for some
> reason. I don't know or understand why. Maybe something about cache locality?

A 64KB page size on ARM64 allows for faster TLB lookups. However, it can also cause performance penalties if you are dealing with small data structures that have to be page aligned. It would be interesting to see some benchmark results to see how changing this value affects the performance of JSC.

https://www.kernel.org/doc/Documentation/arm64/memory.txt
https://wiki.ubuntu.com/ARM64/performance#A4k_page_size_vs_64k_page_size
Comment 16 Michael Catanzaro 2020-03-30 08:54:17 PDT
(In reply to Carlos Alberto Lopez Perez from comment #15)
> AFAIK, all Linux ARM64 distributions but RHEL use a default 4KB page size,
> so I don't think crippling the majority of our users its acceptable because
> of an exception. Therefore I prefer if RHEL carries a patch downstream for
> this problem unless we can find a better fix.

I agree.

> If you want to avoid carrying a patch, we can maybe add a new CMake internal
> config option that if set will take preference over the previous options:
> something like -DJSC_PAGE_SIZE=value and then you can define that to the
> value desired on the RHEL spec file.

I think that's the best we can do if fixing bmalloc is not possible.

But I also think that fixing bmalloc should be possible. :)
Comment 17 Michael Catanzaro 2020-03-30 08:55:45 PDT
CC: bmalloc devs, since this has turned into a bmalloc bug.
Comment 18 Michael Catanzaro 2020-03-30 09:10:13 PDT
(In reply to Michael Catanzaro from comment #14)
> We could fix this by checking the page size at build time using sysconf(),
> but Carlos Lopez pointed out that will fail when cross-compiling, which is
> why I wound up just hardcoding page size guesses.

Maybe a build-time check would not be such a bad solution.

Linux distros do not care about cross building (unless you are Gentoo). All distros are going to be building aarch64 packages natively using powerful aarch64 servers.

Embedded systems devs also are not going to care in this particular case, because the page size check on the host system will return 4 KB, because nobody is going to be cross-building from anything except an x86_64 host. And that will match the target system, because nobody is going to be using something different than 4 KB on embedded systems. So even though it's definitely incorrect to do this, it should work OK in practice... trying to get page size at build time is incorrect anyway, we're just guessing here and hoping it doesn't change....
Comment 19 Carlos Alberto Lopez Perez 2020-03-30 09:44:29 PDT
(In reply to Michael Catanzaro from comment #18)
> (In reply to Michael Catanzaro from comment #14)
> > We could fix this by checking the page size at build time using sysconf(),
> > but Carlos Lopez pointed out that will fail when cross-compiling, which is
> > why I wound up just hardcoding page size guesses.
> 
> Maybe a build-time check would not be such a bad solution.
> 
> Linux distros do not care about cross building (unless you are Gentoo). All
> distros are going to be building aarch64 packages natively using powerful
> aarch64 servers.
> 
> Embedded systems devs also are not going to care in this particular case,
> because the page size check on the host system will return 4 KB, because
> nobody is going to be cross-building from anything except an x86_64 host.
> And that will match the target system, because nobody is going to be using
> something different than 4 KB on embedded systems. So even though it's
> definitely incorrect to do this, it should work OK in practice... trying to
> get page size at build time is incorrect anyway, we're just guessing here
> and hoping it doesn't change....

I prefer that we do not do this, mainly because its incorrect and can lead to subtle bugs that are hard to diagnose. For example: What will happen if you cross-build for PPC64LE on AMD64?

Also I don't think we can't affirm without doubt that "nobody is going to be cross-building from anything except an x86_64 host". That it's something we don't know. There are powerful boxes out there running on other architectures like PPC64 (Raptor Talos) or even ARM64.
Comment 20 Zan Dobersek 2020-03-30 09:51:45 PDT
(In reply to Michael Catanzaro from comment #14)
> Have you ever done any performance testing of bmalloc on aarch64? I'm
> curious to know whether it's any better than the system allocator. I assume
> it's probably not, but it still contains important heap security features,
> so it would be a shame to turn it off if not absolutely required. Maybe
> somebody who understands bmalloc could help with bug #209360.
> 

bmalloc is better than the system allocator both in performance and efficiency.

It's unreasonable to turn off bmalloc by default for platforms that can use it.
Comment 21 Michael Catanzaro 2020-03-30 11:12:00 PDT
OK, good to know. Hopefully we can fix bmalloc.

Otherwise, we don't have a ton of options here.
Comment 22 Saam Barati 2020-03-31 21:24:10 PDT
My guess is the commit/decommit of IsoHeaps both in JS and bmalloc are an issue here. What happens when you commit something smaller than page size? You may need to change those to use 64k. I think for bmalloc IsoHeaps, that should be trivial. I haven’t thought if this is trivial to do in JS, but I suspect it could be.
Comment 23 Saam Barati 2020-03-31 21:31:18 PDT
(In reply to Saam Barati from comment #22)
> My guess is the commit/decommit of IsoHeaps both in JS and bmalloc are an
> issue here. What happens when you commit something smaller than page size?
> You may need to change those to use 64k. I think for bmalloc IsoHeaps, that
> should be trivial. I haven’t thought if this is trivial to do in JS, but I
> suspect it could be.

Actually, it seems like the code in the JS heap should just work with 64k blocks.

However, y'all didn't implement this inside bmalloc. You'll want to switch bmalloc's IsoHeaps to use CeilingOnPageSize sized blocks, and the configSizeToProtect to be CeilingOnPageSize inside bmalloc too.
Comment 24 Saam Barati 2020-03-31 21:37:57 PDT
These seem problematic:
```
class IsoPageBase {
public:    
    static constexpr size_t pageSize = 16384;
```

```
constexpr size_t configSizeToProtect = 16 * bmalloc::Sizes::kB;
```

You might want to tune `m_smallLineMetadata` for 64k pages
Comment 25 Michael Catanzaro 2020-04-09 14:52:37 PDT
BTW, upstream kernel build options:

config ARM64_4K_PAGES
	bool "4KB"
	help
	  This feature enables 4KB pages support.

config ARM64_16K_PAGES
	bool "16KB"
	help
	  The system will use 16KB pages support. AArch32 emulation
	  requires applications compiled with 16K (or a multiple of 16K)
	  aligned segments.

config ARM64_64K_PAGES
	bool "64KB"
	help
	  This feature enables 64KB pages support (4KB by default)
	  allowing only two levels of page tables and faster TLB
	  look-up. AArch32 emulation requires applications compiled
	  with 64K aligned segments.
Comment 26 Michael Catanzaro 2020-04-09 15:01:37 PDT
Of the WTF CPU types that are not UNKNOWN, it looks like MIPS also supports page sizes up to 64 KB.
Comment 27 Michael Catanzaro 2020-04-09 15:36:56 PDT
Created attachment 396020 [details]
Patch
Comment 28 Michael Catanzaro 2020-04-09 15:38:50 PDT
(In reply to Carlos Alberto Lopez Perez from comment #0)
> The JSCOnly ARM64 its crashing all of the time since r258857

We don't have an EWS for it... do we?

I will test with the MIPS EWS, I guess.
Comment 29 Carlos Alberto Lopez Perez 2020-04-09 15:50:34 PDT
(In reply to Michael Catanzaro from comment #28)
> (In reply to Carlos Alberto Lopez Perez from comment #0)
> > The JSCOnly ARM64 its crashing all of the time since r258857
> 
> We don't have an EWS for it... do we?
> 
> I will test with the MIPS EWS, I guess.

No... no EWS at this moment. Only tests with buildbot at build.webkit.org after the fact
Comment 30 Michael Catanzaro 2020-04-09 17:19:11 PDT
MIPS EWS seems hung (stalled for five hours).
Comment 31 Michael Catanzaro 2020-05-15 14:25:53 PDT
Regression is fixed, we can continue in another bug in the future.
Comment 32 Michael Catanzaro 2020-11-17 07:32:31 PST
(In reply to Zan Dobersek from comment #13)
> Regardless of the fix for this crash, I don't think the increase to 64kB as
> the page-size ceiling is welcome for ARM64 platforms that don't need it,
> since it leads to quadrupling the size of the MarkedBlock objects.

This is finally addressed by r269396. We continue to default to assuming page size is 4 KB to avoid regressing embedded systems, but allow distributions to declare that they need 64 KB at build time if needed. If that option is used, the size of MarkedBlock and other objects that depend on system page size (notably JSCConfig/WTFConfig) increases, and bmalloc and JIT both get disabled. Otherwise, no change.