WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
154808
NewRegexp should not prevent inlining
https://bugs.webkit.org/show_bug.cgi?id=154808
Summary
NewRegexp should not prevent inlining
Filip Pizlo
Reported
2016-02-28 21:49:09 PST
It currently prevents inlining because the Node uses indices into the CodeBlock.
Attachments
Proposed Patch
(10.57 KB, patch)
2016-08-20 01:05 PDT
,
Caio Lima
no flags
Details
Formatted Diff
Diff
Archive of layout-test-results from ews122 for ios-simulator-elcapitan-wk2
(690.74 KB, application/zip)
2016-08-20 03:26 PDT
,
Build Bot
no flags
Details
Patch
(13.67 KB, patch)
2016-08-22 09:19 PDT
,
Caio Lima
no flags
Details
Formatted Diff
Diff
Patch
(11.31 KB, patch)
2016-08-24 09:08 PDT
,
Caio Lima
no flags
Details
Formatted Diff
Diff
Regexp Inline Benchmark 2
(78.91 KB, text/plain)
2016-08-24 23:11 PDT
,
Caio Lima
no flags
Details
Regex inline benchmark 1
(77.62 KB, text/plain)
2016-08-24 23:12 PDT
,
Caio Lima
no flags
Details
Show Obsolete
(2)
View All
Add attachment
proposed patch, testcase, etc.
Caio Lima
Comment 1
2016-08-19 12:05:44 PDT
Working on It. I already have an implementation to propose, however I am investigating if it is Safe because of Garbage Colector. I am going to post it tonight.
Caio Lima
Comment 2
2016-08-20 01:05:02 PDT
Created
attachment 286537
[details]
Proposed Patch This version that I propose. I am investigating if there is a possibility of callee CodeBock be GCed because if it happens, RegExp are potentially candidates to be collected and Mache Code Block are going to point o invalid RegExp. The current test that I am using right now is: ``` function toInline(a) { var re = /cc+/; print("test ccc: "+re.test("ccc")); print("test abc: "+re.test("abc")); return 0; } function f(a) { toInline(a) var re = /(ab)+/; print("test ba: " + re.test("ba")); return 0; } noInline(f); for (let i = 0; i < 100000; i++) { f(2); } gc(); print("GC Passed"); let a = Array(10000000); for (let i = 0; i < 10000000; i++) { a[i] = {value: i}; } print("Objects Allocated"); for (let i = 0; i < 100; i++) { f(4); } ``` However it is possible write more test codes with correct asserts. This testing is working properly but I think it is worth more investigation.
Build Bot
Comment 3
2016-08-20 03:26:44 PDT
Comment on
attachment 286537
[details]
Proposed Patch
Attachment 286537
[details]
did not pass ios-sim-ews (ios-simulator-wk2): Output:
http://webkit-queues.webkit.org/results/1906777
New failing tests: fast/scrolling/ios/scroll-events-back-forward.html
Build Bot
Comment 4
2016-08-20 03:26:47 PDT
Created
attachment 286540
[details]
Archive of layout-test-results from ews122 for ios-simulator-elcapitan-wk2 The attached test failures were seen while running run-webkit-tests on the ios-sim-ews. Bot: ews122 Port: ios-simulator-elcapitan-wk2 Platform: Mac OS X 10.11.5
Caio Lima
Comment 5
2016-08-22 09:19:13 PDT
Created
attachment 286601
[details]
Patch
Geoffrey Garen
Comment 6
2016-08-22 11:11:57 PDT
Comment on
attachment 286601
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=286601&action=review
I think a better model might be for every CodeBlock always to copy its RegExp array out of its unlinked counterpart and into a unique vector. This will waste a little space, but we think it's rare, so it's probably acceptable. The nice thing about this change is that we won't need two different modes for accessing regexps. If we do need two different modes to save space, I would call the first mode "unlinkedRegExp()" and the second mode "jitCodeRegExp()", and I would store the regexeps used only by optimizing compilers on the JITCode object.
> Source/JavaScriptCore/ChangeLog:9 > + when it contains NewRegExp nodes. In current implementation, it is not
when the callee contains In the current implementation, inlining
> Source/JavaScriptCore/ChangeLog:10 > + possible because the RegExp references are being stored in their function's
are stored
> Source/JavaScriptCore/ChangeLog:12 > + NewRegExp node is emmited with the RegExp's index in CodeBlock vector
emitted in the CodeBlock vector
> Source/JavaScriptCore/ChangeLog:13 > + as parameter to be retrived in futher phases of compilation
as a parameter retrieved
> Source/JavaScriptCore/ChangeLog:16 > + As solution, we are remmaping the RegExps from its callee inline canditate
As a solution, we remap each RegExp candidate
> Source/JavaScriptCore/ChangeLog:17 > + to its machine code block in inline phase into
in the inlining phase
> Source/JavaScriptCore/ChangeLog:24 > + It is important notice that we created a new Vector because
to notice
> Source/JavaScriptCore/ChangeLog:27 > + out of compilation thread.
from the compilation thread
> Source/JavaScriptCore/bytecode/CodeBlock.h:557 > + unsigned addRegExpList(RegExp* regexp)
This function name sounds like you're adding a list of RegExps. I don't think it differs clearly enough from CodeBlock::regexp(). It is a little surprising that CodeBlock has two different RegExp stores, one that forwards to the unlinked code block and one that is local.
> Source/JavaScriptCore/bytecode/CodeBlock.h:900 > + Vector<RegExp*> m_regExps;
Why is it OK to store raw pointers to RegExp without a write barrier operation? You've explained why we can't do write barrier operations on the compilation thread, but I don't think you've explained why it's OK just to skip them -- and I don't think it is OK.
> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:5401 > +
Revert.
> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:5422 > +
Revert.
> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:5428 > + // We need to remap and copy RegExps from codeBlock->unlinkedCodeBlock() to > + // byteCodeParser->m_codeBlock because it allow us inline functions with NewRegExp. > + // nodes.
I don't fully understand this comment.
Caio Lima
Comment 7
2016-08-22 22:45:41 PDT
Comment on
attachment 286601
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=286601&action=review
>> Source/JavaScriptCore/bytecode/CodeBlock.h:900 >> + Vector<RegExp*> m_regExps; > > Why is it OK to store raw pointers to RegExp without a write barrier operation? You've explained why we can't do write barrier operations on the compilation thread, but I don't think you've explained why it's OK just to skip them -- and I don't think it is OK.
I am using this Vector to store the RegExp from inlined functions and also store the RegExp from its own m_unlikedCode. In my current investigation and I tried to replicate it in JSTests/stress/new-regex-inline.js, and the RegExp pointer is valid while its UnlikedCode is not collected by GC. The WriteBarrie is strictly necessary in this case, otherwise it is not safe manipulate RegExp objects. So, In my understand, we need to keep unlinkedCodeBlock->m_regexps [1], because it is the reason that RegExp* aren't collected by GC and remain valid in execution context. However, I agree with you that we can improve this implementation. I have the following changes in my mind: 1. remove all regExpList* methods I created; 2. Store RegEx from inlined functions in CodeBlock->m_rareData->m_inlinedRegexps as Vector<RegExp*> (consider it as a Weak pointer for "GC eyes"); 3. In regex() check if "index < m_unlinkedCode->numberOfRegExps()" and if it is false, get the element in CodeBlock->m_rareData->m_inlinedRegexps (we avoid waste space); To 2. be valid, what I need to be sure is if (Inlined Function CodeBlock)->m_unlikedCode->regexps [1] is not collected in GC phase. I accept help here to be 100% sure that it is a valid implementation. I am trying to debug GC with JSTests/stress/new-regex-inline.js and also --logGC=2. It is extremely helpful if you know a way to check if (Inlined Function CodeBlock)->m_unlikedCode is collected in "gc()" and I would be happy learning it. My Assumption is that if UnlinkedCode is not collected, m_regexps are pointed to them and won't be collected too. Does it make sense to you? [1] -
https://github.com/WebKit/webkit/blob/e9acddff34511e7f3e184b6e6f34897027e8b61e/Source/JavaScriptCore/bytecode/UnlinkedCodeBlock.h#L452
>> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:5428 >> + // nodes. > > I don't fully understand this comment.
I think it is confusing. What I am trying to say here is that we are mapping regExps from callee into machine code block to enable inlining, and we are also "mapping" machine code block's own RegExps into m_regexpRemap (but it is an identity mapping m_regexpRemap[i] == byteCodeParser->m_codeBlock->m_unlinkedCodeBlock->m_regexps[i]). Since we are planing change this implementation, it is going to be Out of Date.
Filip Pizlo
Comment 8
2016-08-22 23:41:40 PDT
Comment on
attachment 286601
[details]
Patch You shouldn't be adding any new fields to CodeBlock. The DFG already has a mechanism for strongly marking objects known to the compiler such as the RegExps. It's called freezing: just do m_graph.freezeStrongly(...). You don't need to record the RegExp object anywhere other than the NewRegExp node. That node should not have an index into any vectors. It should just hold a pointer to RegExp. We already do this for other node types. See for example Node::cellOperand(). Notice that its already compatible with freezing.
Caio Lima
Comment 9
2016-08-23 11:29:43 PDT
(In reply to
comment #8
)
> Comment on
attachment 286601
[details]
> Patch > > You shouldn't be adding any new fields to CodeBlock. > > The DFG already has a mechanism for strongly marking objects known to the > compiler such as the RegExps. It's called freezing: just do > m_graph.freezeStrongly(...). > > You don't need to record the RegExp object anywhere other than the NewRegExp > node. That node should not have an index into any vectors. It should just > hold a pointer to RegExp. We already do this for other node types. See for > example Node::cellOperand(). Notice that its already compatible with > freezing.
Thank you for this info. I really didn't know about that. I took a look in code using the freezing and it looks simpler and more correct than my current patch. Just for curiosity, as I understood, this Freeze mechanism interact with the GC, right? If yes, How does it happen?
Filip Pizlo
Comment 10
2016-08-23 11:54:45 PDT
(In reply to
comment #9
)
> (In reply to
comment #8
) > > Comment on
attachment 286601
[details]
> > Patch > > > > You shouldn't be adding any new fields to CodeBlock. > > > > The DFG already has a mechanism for strongly marking objects known to the > > compiler such as the RegExps. It's called freezing: just do > > m_graph.freezeStrongly(...). > > > > You don't need to record the RegExp object anywhere other than the NewRegExp > > node. That node should not have an index into any vectors. It should just > > hold a pointer to RegExp. We already do this for other node types. See for > > example Node::cellOperand(). Notice that its already compatible with > > freezing. > > Thank you for this info. I really didn't know about that. I took a look in > code using the freezing and it looks simpler and more correct than my > current patch. Just for curiosity, as I understood, this Freeze mechanism > interact with the GC, right? If yes, How does it happen?
There's a lot of logic there: - Freezing causes the resulting CodeBlock to have either a strong or weak reference to the frozen object, depending on whether you froze it strongly or weakly. - Freezing immediately causes the ongoing DFG compilation plan to track the reference if a GC happens while the compiler is running. The GC knows how to safepoint the compiler. This means that from the compiler's standpoint, GCs can only happen at well-defined points: either before the compiler started, during B3 compilation, or after the compiler finished. There are a lot of other details; I can't remember all of them off the top of my head. The short version is just: we already use freezing a lot, and it's designed exactly for what you want: you have some object that the generated code will refer to and you want to make sure that this object gets marked.
Caio Lima
Comment 11
2016-08-23 15:40:26 PDT
(In reply to
comment #10
)
> > There's a lot of logic there: > > - Freezing causes the resulting CodeBlock to have either a strong or weak > reference to the frozen object, depending on whether you froze it strongly > or weakly. > > - Freezing immediately causes the ongoing DFG compilation plan to track the > reference if a GC happens while the compiler is running. The GC knows how > to safepoint the compiler. This means that from the compiler's standpoint, > GCs can only happen at well-defined points: either before the compiler > started, during B3 compilation, or after the compiler finished. > > There are a lot of other details; I can't remember all of them off the top > of my head. The short version is just: we already use freezing a lot, and > it's designed exactly for what you want: you have some object that the > generated code will refer to and you want to make sure that this object gets > marked.
Thank You so much.
Caio Lima
Comment 12
2016-08-24 09:08:09 PDT
Created
attachment 286856
[details]
Patch
Caio Lima
Comment 13
2016-08-24 09:45:22 PDT
I am going to run benchmarks and compare to check for any improvement/regress this Patch can cause.
Geoffrey Garen
Comment 14
2016-08-24 14:26:31 PDT
Comment on
attachment 286856
[details]
Patch Looks much better. r=me
Caio Lima
Comment 15
2016-08-24 23:11:32 PDT
Created
attachment 286948
[details]
Regexp Inline Benchmark 2
Caio Lima
Comment 16
2016-08-24 23:12:02 PDT
Created
attachment 286949
[details]
Regex inline benchmark 1
Caio Lima
Comment 17
2016-08-24 23:14:57 PDT
Benchmarks are showing possible ~0.63% regression in general results of Benchmark 2. We may be interested in these results: 1. might be 1.1097x faster" in Octane 2. V8Spider regex - neutral 3. Sunspider regexp-dna - might be 1.3842x slower 4. ~11 regressions with ~2% of regression In benchmark 1, Sunspider regex-dna is resulting in "might be 1.0296x slower" I think that this Patch seems neutral in with some noise in the second run.
WebKit Commit Bot
Comment 18
2016-08-24 23:16:17 PDT
Comment on
attachment 286856
[details]
Patch Rejecting
attachment 286856
[details]
from commit-queue.
ticaiolima@gmail.com
does not have committer permissions according to
http://trac.webkit.org/browser/trunk/Tools/Scripts/webkitpy/common/config/contributors.json
. - If you do not have committer rights please read
http://webkit.org/coding/contributing.html
for instructions on how to use bugzilla flags. - If you have committer rights please correct the error in Tools/Scripts/webkitpy/common/config/contributors.json by adding yourself to the file (no review needed). The commit-queue restarts itself every 2 hours. After restart the commit-queue will correctly respect your committer rights.
Caio Lima
Comment 19
2016-08-24 23:17:41 PDT
(In reply to
comment #18
)
> Comment on
attachment 286856
[details]
> Patch > > Rejecting
attachment 286856
[details]
from commit-queue. > >
ticaiolima@gmail.com
does not have committer permissions according to >
http://trac.webkit.org/browser/trunk/Tools/Scripts/webkitpy/common/config/
> contributors.json. > > - If you do not have committer rights please read >
http://webkit.org/coding/contributing.html
for instructions on how to use > bugzilla flags. > > - If you have committer rights please correct the error in > Tools/Scripts/webkitpy/common/config/contributors.json by adding yourself to > the file (no review needed). The commit-queue restarts itself every 2 > hours. After restart the commit-queue will correctly respect your committer > rights.
I can't cq+ =).
WebKit Commit Bot
Comment 20
2016-08-25 00:05:47 PDT
Comment on
attachment 286856
[details]
Patch Clearing flags on attachment: 286856 Committed
r204958
: <
http://trac.webkit.org/changeset/204958
>
WebKit Commit Bot
Comment 21
2016-08-25 00:05:52 PDT
All reviewed patches have been landed. Closing bug.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug