Bug 196091 - B3::Value should have different kinds of adjacency lists
Summary: B3::Value should have different kinds of adjacency lists
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Robin Morisset
URL:
Keywords: InRadar
Depends on: 196092 196093 196105 196110
Blocks:
  Show dependency treegraph
 
Reported: 2019-03-21 10:43 PDT by Robin Morisset
Modified: 2019-04-15 16:54 PDT (History)
8 users (show)

See Also:


Attachments
WIP (65.58 KB, patch)
2019-03-22 19:11 PDT, Robin Morisset
no flags Details | Formatted Diff | Diff
WIP_nearlyDone (78.69 KB, patch)
2019-03-25 11:51 PDT, Robin Morisset
no flags Details | Formatted Diff | Diff
WIP_reallyNearlyDone (77.96 KB, patch)
2019-03-25 12:46 PDT, Robin Morisset
no flags Details | Formatted Diff | Diff
Patch (76.08 KB, patch)
2019-03-25 13:00 PDT, Robin Morisset
no flags Details | Formatted Diff | Diff
Patch (84.24 KB, patch)
2019-03-25 17:51 PDT, Robin Morisset
no flags Details | Formatted Diff | Diff
Patch (84.21 KB, patch)
2019-03-25 17:58 PDT, Robin Morisset
no flags Details | Formatted Diff | Diff
Patch (83.86 KB, patch)
2019-03-27 15:47 PDT, Robin Morisset
no flags Details | Formatted Diff | Diff
Patch (83.92 KB, patch)
2019-03-29 12:38 PDT, Robin Morisset
fpizlo: review+
Details | Formatted Diff | Diff
Numbers file measuring the JetStream2 impact (418.56 KB, application/zip)
2019-04-15 14:23 PDT, Robin Morisset
no flags Details
PatchForLanding (83.91 KB, patch)
2019-04-15 16:13 PDT, Robin Morisset
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Robin Morisset 2019-03-21 10:43:07 PDT
B3::Value currently uses Vector<Value*, 3> as its adjacency list, taking 40 bytes no matter the type of value.
In particular, things like constants (which are very common) waste 40 of the 80 bytes they use. And a similar although less extreme thing is true of Identity (wastes 32 bytes out of 72), binary operations (waste 24 bytes out of 72), etc..
Since many kind of b3 values are already subclasses of B3::Value, I hope to try the following plan:
- Remove setOpcodeUnsafely and setKindUnsafely. They are only used in two places and are fairly easy to fix
- Replace all code "children().size()" by numChildren().
- Sort every kind of b3 value into one subclass of B3Value (creating some subclasses for the values that are currently represented by a bare B3::Value*)
- Make B3::Value an abstract class, and the methods that access m_children pure virtual, to be implemented by the subclasses (or by some intermediate ValueImpl that the subclasses inherit from and that inherit from Value)
- Split ValueImpl into three: ValueImpl<FixedAdjacencyList<n>>, ValueImpl<BoundedAdjacencyList<n>> and ValueImpl<UnboundedAdjacencyList>, using respectively a bare array, an array + a size, or a Vector<> as their implementation of m_children.

One slightly tricky part is the following functions:
- replaceWithOops
- replaceWithPhi
- replaceWithNop
- replaceWithNopIgnoringType
- replaceWithJump
- replaceWithBottom
- replaceWithIdentity
The implementations will have to be moved to a templated function that does a static_assert that the actual parameterized ValueType is large enough to hold the value we are trying to store in it (in other words, we want to make sure that Oops/Phi/Nop/Jump/Bottom/Identity all have the same size, and are the smallest of B3Values).
Then the actual methods will become pure virtual, and have to be implemented by a call to this function in the subclasses, enforcing that all subclasses must be large enough to fit an Identity (which will probably require one word of padding for Oops/Jump/etc..).

I will have to verify that the overhead of calling these virtual functions is negligible. I am optimistic because we only rarely access the children of a B3::Value without first switching on its opcode, at which point we know its actual type, and can use it if we see a performance cost.
Comment 1 Filip Pizlo 2019-03-21 13:46:24 PDT
(In reply to Robin Morisset from comment #0)
> B3::Value currently uses Vector<Value*, 3> as its adjacency list, taking 40
> bytes no matter the type of value.
> In particular, things like constants (which are very common) waste 40 of the
> 80 bytes they use. And a similar although less extreme thing is true of
> Identity (wastes 32 bytes out of 72), binary operations (waste 24 bytes out
> of 72), etc..
> Since many kind of b3 values are already subclasses of B3::Value, I hope to
> try the following plan:
> - Remove setOpcodeUnsafely and setKindUnsafely. They are only used in two
> places and are fairly easy to fix
> - Replace all code "children().size()" by numChildren().
> - Sort every kind of b3 value into one subclass of B3Value (creating some
> subclasses for the values that are currently represented by a bare
> B3::Value*)
> - Make B3::Value an abstract class, and the methods that access m_children
> pure virtual, to be implemented by the subclasses (or by some intermediate
> ValueImpl that the subclasses inherit from and that inherit from Value)
> - Split ValueImpl into three: ValueImpl<FixedAdjacencyList<n>>,
> ValueImpl<BoundedAdjacencyList<n>> and ValueImpl<UnboundedAdjacencyList>,
> using respectively a bare array, an array + a size, or a Vector<> as their
> implementation of m_children.

How about do this without any virtual anything?

1) Value allocated its adjacency list as a variable length array at the tail of Value rather than a Vector with inline capacity. 

2) have something much more clever for out-of-line children, for example where every access branches (small index -> inline, large index -> out of line). 

> 
> One slightly tricky part is the following functions:
> - replaceWithOops
> - replaceWithPhi
> - replaceWithNop
> - replaceWithNopIgnoringType
> - replaceWithJump
> - replaceWithBottom
> - replaceWithIdentity
> The implementations will have to be moved to a templated function that does
> a static_assert that the actual parameterized ValueType is large enough to
> hold the value we are trying to store in it (in other words, we want to make
> sure that Oops/Phi/Nop/Jump/Bottom/Identity all have the same size, and are
> the smallest of B3Values).
> Then the actual methods will become pure virtual, and have to be implemented
> by a call to this function in the subclasses, enforcing that all subclasses
> must be large enough to fit an Identity (which will probably require one
> word of padding for Oops/Jump/etc..).
> 
> I will have to verify that the overhead of calling these virtual functions
> is negligible. I am optimistic because we only rarely access the children of
> a B3::Value without first switching on its opcode, at which point we know
> its actual type, and can use it if we see a performance cost.
Comment 2 Robin Morisset 2019-03-21 13:55:31 PDT
> 
> How about do this without any virtual anything?
> 
> 1) Value allocated its adjacency list as a variable length array at the tail
> of Value rather than a Vector with inline capacity.

This was my original idea, the problem is that there are already quite a few subclasses of Value, and they add fields to it. For example constants add the constant value, MemoryValue adds a heap range, a fence range and an offset, etc..
So it seems to me that these fields would be after the end of the fields of Value, and thus conflict with the variable size array.
Comment 3 Filip Pizlo 2019-03-21 13:57:13 PDT
(In reply to Robin Morisset from comment #2)
> > 
> > How about do this without any virtual anything?
> > 
> > 1) Value allocated its adjacency list as a variable length array at the tail
> > of Value rather than a Vector with inline capacity.
> 
> This was my original idea, the problem is that there are already quite a few
> subclasses of Value, and they add fields to it. For example constants add
> the constant value, MemoryValue adds a heap range, a fence range and an
> offset, etc..
> So it seems to me that these fields would be after the end of the fields of
> Value, and thus conflict with the variable size array.

Then prepend the adjaceny list to Valur and make it negatively indexed. :-)
Comment 4 Robin Morisset 2019-03-21 14:05:46 PDT
(In reply to Filip Pizlo from comment #3)
> (In reply to Robin Morisset from comment #2)
> > > 
> > > How about do this without any virtual anything?
> > > 
> > > 1) Value allocated its adjacency list as a variable length array at the tail
> > > of Value rather than a Vector with inline capacity.
> > 
> > This was my original idea, the problem is that there are already quite a few
> > subclasses of Value, and they add fields to it. For example constants add
> > the constant value, MemoryValue adds a heap range, a fence range and an
> > offset, etc..
> > So it seems to me that these fields would be after the end of the fields of
> > Value, and thus conflict with the variable size array.
> 
> Then prepend the adjaceny list to Valur and make it negatively indexed. :-)

I had not thought of that trick. Thanks, I'll change my approach to that.
Comment 5 Robin Morisset 2019-03-21 14:49:14 PDT
The size of the children (0, 1, 2, 3, or varargs) will be stored in a field added in the free byte between m_type and m_origin.

In order to correctly support replaceWithIdentity (which needs one child), even values with no children will have one extra free word allocated before them (we could optionally make constants store their value there eventually, in a separate patch).
Comment 6 Robin Morisset 2019-03-22 19:11:09 PDT
Created attachment 365794 [details]
WIP

Not ready for review, just sending it as a backup.
Still needs to be done:
- Adding an explicit copy constructor that calls buildAdjacencyList (this currently causes the patch to fail testb3)
- probably more bugs still to fix..
- Add an optimization where the methods when called on a subclass don't need to test whether or not the value is a varargs
- General clean-up, moving things to the .cpp from the headers, etc..
Comment 7 Robin Morisset 2019-03-25 11:51:45 PDT
Created attachment 365879 [details]
WIP_nearlyDone

The last parts remaining are running benchmarks to get performance numbers, and writing the changelog.
Comment 8 Build Bot 2019-03-25 11:57:30 PDT
Attachment 365879 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/b3/B3Const32Value.h:87:  Wrong number of spaces before statement. (expected: 8)  [whitespace/indent] [4]
ERROR: Source/JavaScriptCore/b3/B3Const32Value.h:88:  Wrong number of spaces before statement. (expected: 8)  [whitespace/indent] [4]
ERROR: Source/JavaScriptCore/b3/B3Value.h:125:  An else statement can be removed when the prior "if" concludes with a return, break, continue or goto statement.  [readability/control_flow] [4]
ERROR: Source/JavaScriptCore/b3/B3Value.h:135:  An else statement can be removed when the prior "if" concludes with a return, break, continue or goto statement.  [readability/control_flow] [4]
ERROR: Source/JavaScriptCore/b3/B3Value.h:385:  A case label should not be indented, but line up with its switch statement.  [whitespace/indent] [4]
ERROR: Source/JavaScriptCore/b3/B3Value.h:529:  Extra space before last semicolon. If this should be an empty statement, use { } instead.  [whitespace/semicolon] [5]
ERROR: Source/JavaScriptCore/b3/B3ValueInlines.h:55:  Missing space before ( in switch(  [whitespace/parens] [5]
ERROR: Source/JavaScriptCore/b3/B3ValueInlines.h:167:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/B3ValueInlines.h:173:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/B3MoveConstants.cpp:42:  Bad include order. Mixing system and custom headers.  [build/include_order] [4]
ERROR: Source/JavaScriptCore/b3/B3CCallValue.h:60:  Should have a space between // and comment  [whitespace/comments] [4]
ERROR: Source/JavaScriptCore/b3/B3CCallValue.h:69:  Should have a space between // and comment  [whitespace/comments] [4]
ERROR: Source/JavaScriptCore/b3/B3Const64Value.h:87:  Wrong number of spaces before statement. (expected: 8)  [whitespace/indent] [4]
ERROR: Source/JavaScriptCore/b3/B3Const64Value.h:88:  Wrong number of spaces before statement. (expected: 8)  [whitespace/indent] [4]
Total errors found: 14 in 38 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 9 Robin Morisset 2019-03-25 12:46:02 PDT
Created attachment 365884 [details]
WIP_reallyNearlyDone

Fix the build and lint issues.
Comment 10 Robin Morisset 2019-03-25 13:00:13 PDT
Created attachment 365887 [details]
Patch

Fixing one last build issue (a missing import of wtf/Vector.h). Really surprising that they did not show up on my machine.
Comment 11 Robin Morisset 2019-03-25 17:51:11 PDT
Created attachment 365928 [details]
Patch

- Added a changelog
- Fixed the build issue (it was just a missing import of wtf/Vector.h)
- Made B3::Value WTF_MAKE_FAST_ALLOCATED again (I had removed it because my new did not match my delete temporarily and forgot about it).
Comment 12 Robin Morisset 2019-03-25 17:54:36 PDT
I launched two A/B tasks to see the effect on RAMification and JetStream2:
https://perf-safari.apple.com/v3/#/analysis/task/6342
https://perf-safari.apple.com/v3/#/analysis/task/6341

I think my earlier confusing results were due to having deleted WTF_MAKE_FAST_ALLOCATED.
Comment 13 Robin Morisset 2019-03-25 17:58:44 PDT
Created attachment 365931 [details]
Patch

And this time putting back #include <wtf/FastMalloc.h> before the bots complain (for some reason it compiled find on my system despite lacking it).
Comment 14 Robin Morisset 2019-03-26 13:33:20 PDT
This was a regression on RAMification, I think because I removed the inline capacity in the VarArgs case. I am now looking at adding it back, ideally tuning it per opcode, and depending on whether we are creating a value from scratch or copying another one.
Comment 15 Robin Morisset 2019-03-27 15:47:35 PDT
Created attachment 366116 [details]
Patch

This uses an inline capacity of 3 (as before) for the VarArgs case. I have an alternative patch that customizes the inline capacity to the opcode, but the implementation gets extremely gross, for (I think) little benefit.
A global run on RAMification was inconclusive (dominated by noise in LuaJSFight among other things), so I ran 4 runs of RAMification on my machine, only looking at the peak footprint of the wasm benchmarks, after having disabled their call to main to only measure the compilation footprint.
The result was a rather disappointing 1% reduction in peak footprint.
I'd like to land this since it is still an improvement, and monitor the bots for any regression/progression.
Comment 16 Robin Morisset 2019-03-29 12:38:21 PDT
Created attachment 366298 [details]
Patch

Almost the same as the last one, just add a missing cast to (hopefully) fix the build failure on the gtk bot.
Comment 17 Saam Barati 2019-04-04 16:22:30 PDT
What’s the results on memory on RAMification?
Comment 18 Robin Morisset 2019-04-04 16:59:20 PDT
(In reply to Saam Barati from comment #17)
> What’s the results on memory on RAMification?

It is a regression, purely from LuaJSFight, which this has no way of altering in any way.. I'll relaunch the bots with more repetitions to bring the noise down.
Comment 19 Robin Morisset 2019-04-15 14:22:47 PDT
Here are some basic results from running JetStream2 with some instrumentation in B3Generate.cpp

This patch only saves an average of 13kB per function being compiled to B3. There are some significant outliers, here is the top 10:
57262	1781296	Tsf-wasm
13780	403232	OfflineAssembler
9715	271584	Regexp
7331	205880	Regexp
6237	184896	Gbemu
5361	163424	Mandreel
5559	160184	Regexp
5490	157752	Typescript
5191	145760	Typescript
5077	144680	Typescript
(left: number of values allocated, middle: number of bytes saved on those, right: jetstream2 benchmark it came from)

Here is the top outlier per benchmark that reach B3:
3264	98424	3d-cube-sp
3827	119768	3d-raytrace-SP
3055	89216	Acorn-wtb
1560	44784	ai-astar
713	23744	Air
1691	49912	async-fs
2856	81608	babylon
3130	86392	babylon-wtb
1005	30568	base64-SP
1120	32296	Basic
2724	75528	Box2D
1842	53288	Cdjs
1358	40008	Chai-wtb
817	24280	coffeescript-wtb
676	19312	Crypto
1769	51408	crypto-aes-sp
4771	138160	crypto-md5-sp
1104	32592	crypto-sha1-SP
2703	82136	date-format-tofte-sp
1645	49176	Date-format-xparb-sp
1231	35768	delta-blue
647	19016	earley-boyer
2407	71424	Espree-wtb
1087	32520	flightPlanner
1108	33176	Gaussian-blur
6237	184896	Gbemu
1079	30944	gcc-loops-wasm
1136	32576	hash-map
3044	90432	HashSet-wasm
3002	88296	jshint-wtb
3147	90416	json-stringify-inspector (only function to reach B3)
1609	49736	lebab-wtb
5361	163424	Mandreel
1095	31632	ML
637	17496	N-body-SP
690	17784	Navier-stokes
4075	125600	Octane-zlib
13780	403232	OfflineAssembler
3563	107248	Pdfs
1084	30632	Prepack-wtb
118	3632	quicksort-wasm
3400	98560	Raytrace
9715	271584	Regexp
997	28472	Richards
308	9480	Richards-wasm
986	30280	splay
1268	35336	Stanford-crypto-aes
1098	31288	Stanford-crypto-pbkdf2
646	19400	Stanford-crypto-sha256
720	20264	string-unpack-code-SP
1021	30800	tagcloud-SP
57262	1781296	Tsf-wasm
5490	157752	Typescript
1163	37080	Uglify-js-wtb
2908	83256	UniPoker
3251	92808	WSL
Comment 20 Robin Morisset 2019-04-15 14:23:44 PDT
Created attachment 367458 [details]
Numbers file measuring the JetStream2 impact

From which the comment above was written.
Comment 21 Robin Morisset 2019-04-15 16:13:29 PDT
Created attachment 367474 [details]
PatchForLanding
Comment 22 WebKit Commit Bot 2019-04-15 16:53:29 PDT
Comment on attachment 367474 [details]
PatchForLanding

Clearing flags on attachment: 367474

Committed r244309: <https://trac.webkit.org/changeset/244309>
Comment 23 WebKit Commit Bot 2019-04-15 16:53:31 PDT
All reviewed patches have been landed.  Closing bug.
Comment 24 Radar WebKit Bug Importer 2019-04-15 16:54:21 PDT
<rdar://problem/49923813>