Bug 271610
Summary: | Web Inspector: Generating JS builtins for createInspectorInjectedScript is very slow | ||
---|---|---|---|
Product: | WebKit | Reporter: | Philippe Normand <philn> |
Component: | Web Inspector | Assignee: | Karl Dubost <karlcow> |
Status: | RESOLVED FIXED | ||
Severity: | Normal | CC: | hi, inspector-bugzilla-changes, karlcow, qianlangchen, webkit-bug-importer |
Priority: | P2 | Keywords: | InRadar |
Version: | WebKit Nightly Build | ||
Hardware: | All | ||
OS: | All | ||
Bug Depends on: | 152294 | ||
Bug Blocks: |
Philippe Normand
Takes 30 to 40 seconds and blocks the build... That function nests so many other functions, BuiltinsCollection._parse_functions() is the bottleneck, using various regular expressions, looking for patterns on the whole file every time.
Would it be possible to move some of these nested functions out? Was there a specific reason for the current design?
Attachments | ||
---|---|---|
Add attachment proposed patch, testcase, etc. |
Philippe Normand
time python Source/JavaScriptCore/Scripts/generate-js-builtins.py --framework JavaScriptCore --output-directory /tmp/ --combined Source/JavaScriptCore/builtins/*.js Source/JavaSc
riptCore/inspector/InjectedScriptSource.js
________________________________________________________
Executed in 43.86 secs fish external
usr time 43.68 secs 394.00 micros 43.68 secs
sys time 0.05 secs 91.00 micros 0.05 secs
Radar WebKit Bug Importer
<rdar://problem/125703146>
Karl Dubost
https://searchfox.org/wubkat/rev/473ca5f8512b88edd7e82c8783e7e09158f17ba1/Source/JavaScriptCore/Scripts/wkbuiltins/builtins_model.py#314-373
This is calling
matches = [func for func in functionHeadRegExp.finditer(text)]
which is
functionHeadRegExp = re.compile(r"(?:@[\w|=\[\] \"\.]+\s*\n)*(?:async\s+)?function\s+\w+\s*\(.*?\)", re.MULTILINE | re.DOTALL)
which then ends up in another for loop:
for match in matches:
The latest big change on this code was done on May 2022
https://bugs.webkit.org/show_bug.cgi?id=152294
https://github.com/WebKit/WebKit/commit/1d75ec6c3ee1575325c7af691084fb59a6e7587c
I wonder about this too.
https://searchfox.org/wubkat/rev/473ca5f8512b88edd7e82c8783e7e09158f17ba1/Source/JavaScriptCore/Scripts/wkbuiltins/builtins_model.py#163
Karl Dubost
TLDR
As a starting reference, on my laptop, I get:
16.54s user 0.04s system 99% cpu 16.715 total
but most of that time is used by one file: InjectedScriptSource.js, if I remove it I get:
0.93s user 0.02s system 97% cpu 0.973 total
More details:
_parse_functions is relatively fast for most of the iterations, but indeed InjectedScriptSource.js is one of the bottlenecks. Large file.
This is used here.
https://searchfox.org/wubkat/rev/473ca5f8512b88edd7e82c8783e7e09158f17ba1/Source/JavaScriptCore/Scripts/wkbuiltins/builtins_model.py#215-217
It chokes only on one iteration. Replacing the code by:
import time
tic = time.perf_counter()
parsed_functions = self._parse_functions(text)
toc = time.perf_counter()
print(f"_parse_functions: {toc - tic:0.4f}s / {filename} / {len(text)} chars")
The number of matches is given by length(matches) on
matches = [func for func in functionHeadRegExp.finditer(text)]
nb of matches: 6 / _parse_functions: 0.0461s / ArrayConstructor.js / 7841 chars
nb of matches: 2 / _parse_functions: 0.0053s / ArrayIteratorPrototype.js / 3003 chars
nb of matches: 34 / _parse_functions: 0.1062s / ArrayPrototype.js / 25784 chars
nb of matches: 8 / _parse_functions: 0.0278s / AsyncFromSyncIteratorPrototype.js / 6395 chars
nb of matches: 3 / _parse_functions: 0.0063s / AsyncFunctionPrototype.js / 2819 chars
nb of matches: 21 / _parse_functions: 0.0587s / AsyncGeneratorPrototype.js / 11595 chars
nb of matches: 1 / _parse_functions: 0.0001s / AsyncIteratorPrototype.js / 1460 chars
nb of matches: 3 / _parse_functions: 0.0009s / FunctionPrototype.js / 2042 chars
nb of matches: 4 / _parse_functions: 0.0081s / GeneratorPrototype.js / 3878 chars
nb of matches: 1 / _parse_functions: 0.0003s / GlobalObject.js / 1676 chars
nb of matches: 1 / _parse_functions: 0.0017s / GlobalOperations.js / 1985 chars
nb of matches: 2 / _parse_functions: 0.0109s / InternalPromiseConstructor.js / 3159 chars
nb of matches: 4 / _parse_functions: 0.0046s / IteratorHelpers.js / 3598 chars
nb of matches: 1 / _parse_functions: 0.0001s / IteratorPrototype.js / 1447 chars
nb of matches: 1 / _parse_functions: 0.0056s / MapConstructor.js / 2614 chars
nb of matches: 2 / _parse_functions: 0.0024s / MapIteratorPrototype.js / 2331 chars
nb of matches: 1 / _parse_functions: 0.0016s / MapPrototype.js / 1902 chars
nb of matches: 20 / _parse_functions: 0.0521s / ModuleLoader.js / 24361 chars
nb of matches: 1 / _parse_functions: 0.0003s / NumberConstructor.js / 1588 chars
nb of matches: 2 / _parse_functions: 0.0066s / ObjectConstructor.js / 3223 chars
nb of matches: 18 / _parse_functions: 0.0780s / PromiseConstructor.js / 13018 chars
nb of matches: 29 / _parse_functions: 0.0746s / PromiseOperations.js / 21115 chars
nb of matches: 3 / _parse_functions: 0.0078s / PromisePrototype.js / 3204 chars
nb of matches: 5 / _parse_functions: 0.0203s / ProxyHelpers.js / 5748 chars
nb of matches: 4 / _parse_functions: 0.0018s / ReflectObject.js / 2801 chars
nb of matches: 13 / _parse_functions: 0.2080s / RegExpPrototype.js / 22590 chars
nb of matches: 1 / _parse_functions: 0.0091s / RegExpStringIteratorPrototype.js / 2694 chars
nb of matches: 2 / _parse_functions: 0.0018s / SetIteratorPrototype.js / 2219 chars
nb of matches: 9 / _parse_functions: 0.0362s / SetPrototype.js / 9197 chars
nb of matches: 4 / _parse_functions: 0.0049s / ShadowRealmPrototype.js / 3285 chars
nb of matches: 1 / _parse_functions: 0.0038s / StringConstructor.js / 2200 chars
nb of matches: 1 / _parse_functions: 0.0082s / StringIteratorPrototype.js / 2588 chars
nb of matches: 29 / _parse_functions: 0.0356s / StringPrototype.js / 13145 chars
nb of matches: 2 / _parse_functions: 0.0283s / TypedArrayConstructor.js / 4428 chars
nb of matches: 14 / _parse_functions: 0.0224s / TypedArrayPrototype.js / 9332 chars
nb of matches: 2 / _parse_functions: 0.0005s / WebAssembly.js / 1725 chars
nb of matches: 20 / _parse_functions: 15.9046s / InjectedScriptSource.js / 62229 chars
Let's see inside:
matches = [func for func in functionHeadRegExp.finditer(text)]
is not the bottleneck.
matches perf: 0.0002s / matches: 6
matches perf: 0.0001s / matches: 2
matches perf: 0.0010s / matches: 34
matches perf: 0.0002s / matches: 8
matches perf: 0.0001s / matches: 3
matches perf: 0.0004s / matches: 21
matches perf: 0.0000s / matches: 1
matches perf: 0.0000s / matches: 3
matches perf: 0.0001s / matches: 4
matches perf: 0.0000s / matches: 1
matches perf: 0.0000s / matches: 1
matches perf: 0.0001s / matches: 2
matches perf: 0.0001s / matches: 4
matches perf: 0.0000s / matches: 1
matches perf: 0.0000s / matches: 1
matches perf: 0.0000s / matches: 2
matches perf: 0.0000s / matches: 1
matches perf: 0.0004s / matches: 20
matches perf: 0.0000s / matches: 1
matches perf: 0.0001s / matches: 2
matches perf: 0.0004s / matches: 18
matches perf: 0.0007s / matches: 29
matches perf: 0.0001s / matches: 3
matches perf: 0.0002s / matches: 5
matches perf: 0.0000s / matches: 4
matches perf: 0.0006s / matches: 13
matches perf: 0.0001s / matches: 1
matches perf: 0.0000s / matches: 2
matches perf: 0.0003s / matches: 9
matches perf: 0.0001s / matches: 4
matches perf: 0.0000s / matches: 1
matches perf: 0.0000s / matches: 1
matches perf: 0.0004s / matches: 29
matches perf: 0.0001s / matches: 2
matches perf: 0.0003s / matches: 14
matches perf: 0.0000s / matches: 2
matches perf: 0.0020s / matches: 20
Ah found the thing which takes time is the return statement.
tuc = time.perf_counter()
print(f"before list(map): {tuc - tic:0.8f}s")
thingtoreturn = list(map(BuiltinFunction.fromString, functionStrings))
tec = time.perf_counter()
print(f"before return: {tec - tic:0.8f}s")
return thingtoreturn
We get
----
matches perf: 0.00086213s / matches: 6
after for loop: 0.00204904s
before list(map): 0.00206413s
before return: 0.10147529s
----
matches perf: 0.00015438s / matches: 2
after for loop: 0.00035775s
before list(map): 0.00036813s
before return: 0.01159988s
----
matches perf: 0.00195908s / matches: 34
after for loop: 0.00354625s
before list(map): 0.00356558s
before return: 0.17573083s
----
matches perf: 0.00039850s / matches: 8
after for loop: 0.00065675s
before list(map): 0.00066321s
before return: 0.03363075s
----
matches perf: 0.00006025s / matches: 3
after for loop: 0.00013025s
before list(map): 0.00013296s
before return: 0.00832750s
----
matches perf: 0.00067088s / matches: 21
after for loop: 0.00126525s
before list(map): 0.00127637s
before return: 0.08663021s
----
matches perf: 0.00000875s / matches: 1
after for loop: 0.00003654s
before list(map): 0.00004133s
before return: 0.00020388s
----
matches perf: 0.00005858s / matches: 3
after for loop: 0.00018046s
before list(map): 0.00020887s
before return: 0.00143387s
----
matches perf: 0.00019867s / matches: 4
after for loop: 0.00035454s
before list(map): 0.00036325s
before return: 0.00981967s
----
matches perf: 0.00001000s / matches: 1
after for loop: 0.00002362s
before list(map): 0.00002583s
before return: 0.00036204s
----
matches perf: 0.00005350s / matches: 1
after for loop: 0.00012946s
before list(map): 0.00013804s
before return: 0.00283438s
----
matches perf: 0.00006054s / matches: 2
after for loop: 0.00014371s
before list(map): 0.00014733s
before return: 0.01643325s
----
matches perf: 0.00015025s / matches: 4
after for loop: 0.00038112s
before list(map): 0.00039075s
before return: 0.00595804s
----
matches perf: 0.00002438s / matches: 1
after for loop: 0.00005275s
before list(map): 0.00005858s
before return: 0.00020354s
----
matches perf: 0.00006971s / matches: 1
after for loop: 0.00013025s
before list(map): 0.00013554s
before return: 0.00682037s
----
matches perf: 0.00004017s / matches: 2
after for loop: 0.00008846s
before list(map): 0.00009133s
before return: 0.00295483s
----
matches perf: 0.00002750s / matches: 1
after for loop: 0.00007175s
before list(map): 0.00007696s
before return: 0.00219413s
----
matches perf: 0.00042275s / matches: 20
after for loop: 0.00098187s
before list(map): 0.00098813s
before return: 0.05855404s
----
matches perf: 0.00001100s / matches: 1
after for loop: 0.00005625s
before list(map): 0.00006233s
before return: 0.00034088s
----
matches perf: 0.00013438s / matches: 2
after for loop: 0.00021438s
before list(map): 0.00021813s
before return: 0.00783958s
----
matches perf: 0.00047179s / matches: 18
after for loop: 0.00107554s
before list(map): 0.00108154s
before return: 0.08648308s
----
matches perf: 0.00065763s / matches: 29
after for loop: 0.00149079s
before list(map): 0.00151117s
before return: 0.07941017s
----
matches perf: 0.00006600s / matches: 3
after for loop: 0.00018075s
before list(map): 0.00020383s
before return: 0.00815983s
----
matches perf: 0.00017421s / matches: 5
after for loop: 0.00040908s
before list(map): 0.00042392s
before return: 0.02092254s
----
matches perf: 0.00005058s / matches: 4
after for loop: 0.00010371s
before list(map): 0.00011133s
before return: 0.00197779s
----
matches perf: 0.00065625s / matches: 13
after for loop: 0.00158467s
before list(map): 0.00159621s
before return: 0.21502942s
----
matches perf: 0.00005500s / matches: 1
after for loop: 0.00012642s
before list(map): 0.00012967s
before return: 0.00934592s
----
matches perf: 0.00003154s / matches: 2
after for loop: 0.00007296s
before list(map): 0.00007908s
before return: 0.00186450s
----
matches perf: 0.00027429s / matches: 9
after for loop: 0.00065712s
before list(map): 0.00066079s
before return: 0.03788817s
----
matches perf: 0.00006625s / matches: 4
after for loop: 0.00014829s
before list(map): 0.00015296s
before return: 0.00489046s
----
matches perf: 0.00003221s / matches: 1
after for loop: 0.00008021s
before list(map): 0.00008400s
before return: 0.00396992s
----
matches perf: 0.00005129s / matches: 1
after for loop: 0.00012296s
before list(map): 0.00012596s
before return: 0.00840154s
----
matches perf: 0.00042504s / matches: 29
after for loop: 0.00095896s
before list(map): 0.00096475s
before return: 0.03761792s
----
matches perf: 0.00010217s / matches: 2
after for loop: 0.00024237s
before list(map): 0.00024500s
before return: 0.02967329s
----
matches perf: 0.00025683s / matches: 14
after for loop: 0.00061087s
before list(map): 0.00061604s
before return: 0.02288504s
----
matches perf: 0.00001508s / matches: 2
after for loop: 0.00003850s
before list(map): 0.00004042s
before return: 0.00047821s
----
matches perf: 0.00228304s / matches: 20
after for loop: 0.00531071s
before list(map): 0.00531458s
before return: 15.74802542s
So the
list(map(BuiltinFunction.fromString, functionStrings))
is the thing to improve.
Karl Dubost
diving a bit more in the rabbit hole.
import time
print(f"----")
tic = time.perf_counter()
themap = map(BuiltinFunction.fromString, functionStrings)
toc = time.perf_counter()
print(f"how long for map: {toc - tic:0.8f}s")
tec = time.perf_counter()
thingtoreturn = list(themap)
tuc = time.perf_counter()
print(f"how long for list: {tuc - tec:0.8f}s")
return thingtoreturn
map(BuiltinFunction.fromString, functionStrings) is very fast.
but list() is the slow bit.
----
how long for map: 0.00000067s
how long for list: 0.04454333s
----
how long for map: 0.00000054s
how long for list: 0.00522896s
----
how long for map: 0.00000025s
how long for list: 0.10435229s
----
how long for map: 0.00000008s
how long for list: 0.02742804s
----
how long for map: 0.00000029s
how long for list: 0.00610137s
----
how long for map: 0.00000008s
how long for list: 0.05791729s
----
how long for map: 0.00000013s
how long for list: 0.00007237s
----
how long for map: 0.00000013s
how long for list: 0.00085775s
----
how long for map: 0.00000017s
how long for list: 0.00789221s
----
how long for map: 0.00000008s
how long for list: 0.00032267s
----
how long for map: 0.00000038s
how long for list: 0.00170508s
----
how long for map: 0.00000021s
how long for list: 0.01071083s
----
how long for map: 0.00000017s
how long for list: 0.00450279s
----
how long for map: 0.00000017s
how long for list: 0.00004983s
----
how long for map: 0.00000029s
how long for list: 0.00536333s
----
how long for map: 0.00000017s
how long for list: 0.00225646s
----
how long for map: 0.00000008s
how long for list: 0.00150821s
----
how long for map: 0.00000008s
how long for list: 0.05064929s
----
how long for map: 0.00000029s
how long for list: 0.00026858s
----
how long for map: 0.00000013s
how long for list: 0.00655154s
----
how long for map: 0.00000025s
how long for list: 0.07719392s
----
how long for map: 0.00000017s
how long for list: 0.07323875s
----
how long for map: 0.00000008s
how long for list: 0.00774633s
----
how long for map: 0.00000017s
how long for list: 0.01983171s
----
how long for map: 0.00000017s
how long for list: 0.00177975s
----
how long for map: 0.00000021s
how long for list: 0.20562171s
----
how long for map: 0.00000063s
how long for list: 0.00885329s
----
how long for map: 0.00000054s
how long for list: 0.00178362s
----
how long for map: 0.00000033s
how long for list: 0.03592921s
----
how long for map: 0.00000038s
how long for list: 0.00455142s
----
how long for map: 0.00000017s
how long for list: 0.00376550s
----
how long for map: 0.00000008s
how long for list: 0.00770092s
----
how long for map: 0.00000008s
how long for list: 0.03476367s
----
how long for map: 0.00000033s
how long for list: 0.02836421s
----
how long for map: 0.00000021s
how long for list: 0.02175638s
----
how long for map: 0.00000008s
how long for list: 0.00043421s
----
how long for map: 0.00000017s
how long for list: 15.65461208s
Karl Dubost
map is an iterator, list will convert it to a real list, but it does it one at a time.
Maybe the next step to try would be to replace the map with a list comprehension
Nope… it's no better.
In
list(map(BuiltinFunction.fromString, functionStrings))
the functionStrings iterator for InjectedScriptSource.js
has only 3 items, two short and the last one very long.
[418, 254, 57055]
The two firsts and the beginning of the third one which is a long list of functions.
######################
@linkTimeConstant
function createObjectWithoutPrototype(/**/)
{
if (arguments.length % 2 !== 0)
@throwRangeError("`createObjectWithoutPrototype` requires that each key argument be followed by a value, resulting in an even number of arguments.");
let object = @Object.@create(null)
for (let i = 0; i < arguments.length; i += 2)
object[arguments[i]] = arguments[i + 1];
return object;
}
######################
@linkTimeConstant
function createArrayWithoutPrototype(/**/)
{
let array = new @Array(arguments.length);
array.__proto__ = null;
for (let i = 0; i < arguments.length; ++i)
@arrayPush(array, arguments[i]);
return array;
}
######################
@linkTimeConstant
@visibility=PrivateRecursive
function createInspectorInjectedScript(InjectedScriptHost, inspectedGlobalObject, injectedScriptId)
{
function PrototypelessObjectBase() {}
PrototypelessObjectBase.prototype = null;
function toString(obj)
{
return @String(obj);
}
function toStringDescription(obj)
{
if (obj === 0 && 1 / obj < 0)
return "-0";
if (isBigInt(obj))
return toString(obj) + "n";
return toString(obj);
}
etc…
Karl Dubost
I wonder if it's because of the nesting of functions.
```
function createInspectorInjectedScript(InjectedScriptHost, inspectedGlobalObject, injectedScriptId)
{
function PrototypelessObjectBase() {}
… a lot of code
return injectedScript;
}
```
Karl Dubost
next step would be to checkout what `BuiltinFunction.fromString` is doing for this
function createInspectorInjectedScript(InjectedScriptHost, inspectedGlobalObject, injectedScriptId) {…}
Because this is basically the issue.
Karl Dubost
This patch for Bug 152294 might have introduced the issue: https://github.com/WebKit/WebKit/commit/1d75ec6c3ee1575325c7af691084fb59a6e7587c#diff-226d8e95d34f44f09530c7ad6abf681a6cc386fe0b0f3884b53e0ae85afae049R56-R57
Karl Dubost
If I remove the wrapper, the gain in speed is huge.
The wrapper is:
function createInspectorInjectedScript(InjectedScriptHost, inspectedGlobalObject, injectedScriptId)
{ }
so with Source/JavaScriptCore/Scripts/wkbuiltins/builtins_model.py
functionStrings = [text[start:end].strip() for (start, end) in functionBounds]
import time
print(f"----")
tic = time.perf_counter()
thingtoreturn = list(map(BuiltinFunction.fromString, functionStrings))
toc = time.perf_counter()
print(f"how long for BuiltinFunction.fromString: {toc - tic:0.8f}s")
return thingtoreturn
% python3 Source/JavaScriptCore/Scripts/generate-js-builtins.py --framework JavaScriptCore --output-directory /tmp/ --combined Source/JavaScriptCore/inspector/InjectedScriptSource.js
how long for BuiltinFunction.fromString:
with wrapper: 15.52224183s
without wrapper: 0.08558171s
Karl Dubost
timing the different regex inside
BuiltinFunction.fromString
% python3 Source/JavaScriptCore/Scripts/generate-js-builtins.py --framework JavaScriptCore --output-directory /tmp/ --combined Source/JavaScriptCore/inspector/InjectedScriptSource.js
--------
Inside fromString
multilineCommentRegExp: 0.00000183s
functionIntrinsicRegExp: 0.00029496s
functionOverriddenNameRegExp: 0.00030350s
functionNameRegExp: 0.00000129s
function_name: createObjectWithoutPrototype
functionVisibilityRegExp: 0.00030337s
--------
Inside fromString
multilineCommentRegExp: 0.00000067s
functionIntrinsicRegExp: 0.00011783s
functionOverriddenNameRegExp: 0.00011892s
functionNameRegExp: 0.00000079s
function_name: createArrayWithoutPrototype
functionVisibilityRegExp: 0.00011775s
--------
Inside fromString
multilineCommentRegExp: 0.00001742s
functionIntrinsicRegExp: 7.45201550s
functionOverriddenNameRegExp: 5.22347075s
functionNameRegExp: 0.00006025s
function_name: createInspectorInjectedScript
functionVisibilityRegExp: 0.00018467s
how long for BuiltinFunction.fromString: 17.81850321s
These are the two RegExp which are not performant.
functionIntrinsicRegExp: 7.45201550s
functionOverriddenNameRegExp: 5.22347075s
functionIntrinsicRegExp = re.compile(r".*^@intrinsic=(\w+)", re.MULTILINE | re.DOTALL)
functionOverriddenNameRegExp = re.compile(r".*^@overriddenName=(\".+\")$", re.MULTILINE | re.DOTALL)
Karl Dubost
Removing the wrapper function in InjectedScriptSource.js
and suddenly the performances are good.
Inside fromString
multilineCommentRegExp: 0.00000067s
* functionIntrinsicRegExp: 0.00251513s
* functionOverriddenNameRegExp: 0.00258671s
functionNameRegExp: 0.00000108s
function_name: CommandLineAPI
functionVisibilityRegExp: 0.00245400s
Karl Dubost
As a matter of fact
There is no
@intrinsic
@overriddenName
in InjectedScriptSource.js
I have a fix, which reduces the full time to 6s.
So there's another regex not performant. Let see.
Karl Dubost
Ok the 3rd one is the nested functionVisibilityRegExp
# This is ok. No issue
visibilityMatch = functionVisibilityRegExp.search(function_source)
if visibilityMatch:
visibility = visibilityMatch.group(1)
# This one is costly
function_source = functionVisibilityRegExp.sub("", function_source)
elif is_link_time_constant:
visibility = "Private"
Profiling the code:
Full Time of execution:
33.5% functionIntrinsicRegExp = re.compile(r".*^@intrinsic=(\w+)", re.MULTILINE | re.DOTALL)
32.8% functionOverriddenNameRegExp = re.compile(r".*^@overriddenName=(\".+\")$", re.MULTILINE | re.DOTALL)
33.7% functionVisibilityRegExp = re.compile(r".*^@visibility=(\w+)", re.MULTILINE | re.DOTALL)
Let's fix this :)
Karl Dubost
Pull request: https://github.com/WebKit/WebKit/pull/26777
Karl Dubost
I have a better patch.
main:
21.63s user 0.04s system 99% cpu 21.710 total
my second patch:
0.10s user 0.03s system 84% cpu 0.157 total
:) There's one little fix to check to be extra-sure.
EWS
Committed 277085@main (2a64b0eff0e3): <https://commits.webkit.org/277085@main>
Reviewed commits have been landed. Closing PR #26777 and removing active labels.
Philippe Normand
Thanks for fixing this, Karl!