Summary: | Web Inspector: Generating JS builtins for createInspectorInjectedScript is very slow | ||
---|---|---|---|
Product: | WebKit | Reporter: | Philippe Normand <philn> |
Component: | Web Inspector | Assignee: | Karl Dubost <karlcow> |
Status: | RESOLVED FIXED | ||
Severity: | Normal | CC: | hi, inspector-bugzilla-changes, karlcow, qianlangchen, webkit-bug-importer |
Priority: | P2 | Keywords: | InRadar |
Version: | WebKit Nightly Build | ||
Hardware: | All | ||
OS: | All | ||
Bug Depends on: | 152294 | ||
Bug Blocks: |
Description
Philippe Normand
2024-03-24 09:15:10 PDT
time python Source/JavaScriptCore/Scripts/generate-js-builtins.py --framework JavaScriptCore --output-directory /tmp/ --combined Source/JavaScriptCore/builtins/*.js Source/JavaSc riptCore/inspector/InjectedScriptSource.js ________________________________________________________ Executed in 43.86 secs fish external usr time 43.68 secs 394.00 micros 43.68 secs sys time 0.05 secs 91.00 micros 0.05 secs https://searchfox.org/wubkat/rev/473ca5f8512b88edd7e82c8783e7e09158f17ba1/Source/JavaScriptCore/Scripts/wkbuiltins/builtins_model.py#314-373 This is calling matches = [func for func in functionHeadRegExp.finditer(text)] which is functionHeadRegExp = re.compile(r"(?:@[\w|=\[\] \"\.]+\s*\n)*(?:async\s+)?function\s+\w+\s*\(.*?\)", re.MULTILINE | re.DOTALL) which then ends up in another for loop: for match in matches: The latest big change on this code was done on May 2022 https://bugs.webkit.org/show_bug.cgi?id=152294 https://github.com/WebKit/WebKit/commit/1d75ec6c3ee1575325c7af691084fb59a6e7587c I wonder about this too. https://searchfox.org/wubkat/rev/473ca5f8512b88edd7e82c8783e7e09158f17ba1/Source/JavaScriptCore/Scripts/wkbuiltins/builtins_model.py#163 TLDR As a starting reference, on my laptop, I get: 16.54s user 0.04s system 99% cpu 16.715 total but most of that time is used by one file: InjectedScriptSource.js, if I remove it I get: 0.93s user 0.02s system 97% cpu 0.973 total More details: _parse_functions is relatively fast for most of the iterations, but indeed InjectedScriptSource.js is one of the bottlenecks. Large file. This is used here. https://searchfox.org/wubkat/rev/473ca5f8512b88edd7e82c8783e7e09158f17ba1/Source/JavaScriptCore/Scripts/wkbuiltins/builtins_model.py#215-217 It chokes only on one iteration. Replacing the code by: import time tic = time.perf_counter() parsed_functions = self._parse_functions(text) toc = time.perf_counter() print(f"_parse_functions: {toc - tic:0.4f}s / {filename} / {len(text)} chars") The number of matches is given by length(matches) on matches = [func for func in functionHeadRegExp.finditer(text)] nb of matches: 6 / _parse_functions: 0.0461s / ArrayConstructor.js / 7841 chars nb of matches: 2 / _parse_functions: 0.0053s / ArrayIteratorPrototype.js / 3003 chars nb of matches: 34 / _parse_functions: 0.1062s / ArrayPrototype.js / 25784 chars nb of matches: 8 / _parse_functions: 0.0278s / AsyncFromSyncIteratorPrototype.js / 6395 chars nb of matches: 3 / _parse_functions: 0.0063s / AsyncFunctionPrototype.js / 2819 chars nb of matches: 21 / _parse_functions: 0.0587s / AsyncGeneratorPrototype.js / 11595 chars nb of matches: 1 / _parse_functions: 0.0001s / AsyncIteratorPrototype.js / 1460 chars nb of matches: 3 / _parse_functions: 0.0009s / FunctionPrototype.js / 2042 chars nb of matches: 4 / _parse_functions: 0.0081s / GeneratorPrototype.js / 3878 chars nb of matches: 1 / _parse_functions: 0.0003s / GlobalObject.js / 1676 chars nb of matches: 1 / _parse_functions: 0.0017s / GlobalOperations.js / 1985 chars nb of matches: 2 / _parse_functions: 0.0109s / InternalPromiseConstructor.js / 3159 chars nb of matches: 4 / _parse_functions: 0.0046s / IteratorHelpers.js / 3598 chars nb of matches: 1 / _parse_functions: 0.0001s / IteratorPrototype.js / 1447 chars nb of matches: 1 / _parse_functions: 0.0056s / MapConstructor.js / 2614 chars nb of matches: 2 / _parse_functions: 0.0024s / MapIteratorPrototype.js / 2331 chars nb of matches: 1 / _parse_functions: 0.0016s / MapPrototype.js / 1902 chars nb of matches: 20 / _parse_functions: 0.0521s / ModuleLoader.js / 24361 chars nb of matches: 1 / _parse_functions: 0.0003s / NumberConstructor.js / 1588 chars nb of matches: 2 / _parse_functions: 0.0066s / ObjectConstructor.js / 3223 chars nb of matches: 18 / _parse_functions: 0.0780s / PromiseConstructor.js / 13018 chars nb of matches: 29 / _parse_functions: 0.0746s / PromiseOperations.js / 21115 chars nb of matches: 3 / _parse_functions: 0.0078s / PromisePrototype.js / 3204 chars nb of matches: 5 / _parse_functions: 0.0203s / ProxyHelpers.js / 5748 chars nb of matches: 4 / _parse_functions: 0.0018s / ReflectObject.js / 2801 chars nb of matches: 13 / _parse_functions: 0.2080s / RegExpPrototype.js / 22590 chars nb of matches: 1 / _parse_functions: 0.0091s / RegExpStringIteratorPrototype.js / 2694 chars nb of matches: 2 / _parse_functions: 0.0018s / SetIteratorPrototype.js / 2219 chars nb of matches: 9 / _parse_functions: 0.0362s / SetPrototype.js / 9197 chars nb of matches: 4 / _parse_functions: 0.0049s / ShadowRealmPrototype.js / 3285 chars nb of matches: 1 / _parse_functions: 0.0038s / StringConstructor.js / 2200 chars nb of matches: 1 / _parse_functions: 0.0082s / StringIteratorPrototype.js / 2588 chars nb of matches: 29 / _parse_functions: 0.0356s / StringPrototype.js / 13145 chars nb of matches: 2 / _parse_functions: 0.0283s / TypedArrayConstructor.js / 4428 chars nb of matches: 14 / _parse_functions: 0.0224s / TypedArrayPrototype.js / 9332 chars nb of matches: 2 / _parse_functions: 0.0005s / WebAssembly.js / 1725 chars nb of matches: 20 / _parse_functions: 15.9046s / InjectedScriptSource.js / 62229 chars Let's see inside: matches = [func for func in functionHeadRegExp.finditer(text)] is not the bottleneck. matches perf: 0.0002s / matches: 6 matches perf: 0.0001s / matches: 2 matches perf: 0.0010s / matches: 34 matches perf: 0.0002s / matches: 8 matches perf: 0.0001s / matches: 3 matches perf: 0.0004s / matches: 21 matches perf: 0.0000s / matches: 1 matches perf: 0.0000s / matches: 3 matches perf: 0.0001s / matches: 4 matches perf: 0.0000s / matches: 1 matches perf: 0.0000s / matches: 1 matches perf: 0.0001s / matches: 2 matches perf: 0.0001s / matches: 4 matches perf: 0.0000s / matches: 1 matches perf: 0.0000s / matches: 1 matches perf: 0.0000s / matches: 2 matches perf: 0.0000s / matches: 1 matches perf: 0.0004s / matches: 20 matches perf: 0.0000s / matches: 1 matches perf: 0.0001s / matches: 2 matches perf: 0.0004s / matches: 18 matches perf: 0.0007s / matches: 29 matches perf: 0.0001s / matches: 3 matches perf: 0.0002s / matches: 5 matches perf: 0.0000s / matches: 4 matches perf: 0.0006s / matches: 13 matches perf: 0.0001s / matches: 1 matches perf: 0.0000s / matches: 2 matches perf: 0.0003s / matches: 9 matches perf: 0.0001s / matches: 4 matches perf: 0.0000s / matches: 1 matches perf: 0.0000s / matches: 1 matches perf: 0.0004s / matches: 29 matches perf: 0.0001s / matches: 2 matches perf: 0.0003s / matches: 14 matches perf: 0.0000s / matches: 2 matches perf: 0.0020s / matches: 20 Ah found the thing which takes time is the return statement. tuc = time.perf_counter() print(f"before list(map): {tuc - tic:0.8f}s") thingtoreturn = list(map(BuiltinFunction.fromString, functionStrings)) tec = time.perf_counter() print(f"before return: {tec - tic:0.8f}s") return thingtoreturn We get ---- matches perf: 0.00086213s / matches: 6 after for loop: 0.00204904s before list(map): 0.00206413s before return: 0.10147529s ---- matches perf: 0.00015438s / matches: 2 after for loop: 0.00035775s before list(map): 0.00036813s before return: 0.01159988s ---- matches perf: 0.00195908s / matches: 34 after for loop: 0.00354625s before list(map): 0.00356558s before return: 0.17573083s ---- matches perf: 0.00039850s / matches: 8 after for loop: 0.00065675s before list(map): 0.00066321s before return: 0.03363075s ---- matches perf: 0.00006025s / matches: 3 after for loop: 0.00013025s before list(map): 0.00013296s before return: 0.00832750s ---- matches perf: 0.00067088s / matches: 21 after for loop: 0.00126525s before list(map): 0.00127637s before return: 0.08663021s ---- matches perf: 0.00000875s / matches: 1 after for loop: 0.00003654s before list(map): 0.00004133s before return: 0.00020388s ---- matches perf: 0.00005858s / matches: 3 after for loop: 0.00018046s before list(map): 0.00020887s before return: 0.00143387s ---- matches perf: 0.00019867s / matches: 4 after for loop: 0.00035454s before list(map): 0.00036325s before return: 0.00981967s ---- matches perf: 0.00001000s / matches: 1 after for loop: 0.00002362s before list(map): 0.00002583s before return: 0.00036204s ---- matches perf: 0.00005350s / matches: 1 after for loop: 0.00012946s before list(map): 0.00013804s before return: 0.00283438s ---- matches perf: 0.00006054s / matches: 2 after for loop: 0.00014371s before list(map): 0.00014733s before return: 0.01643325s ---- matches perf: 0.00015025s / matches: 4 after for loop: 0.00038112s before list(map): 0.00039075s before return: 0.00595804s ---- matches perf: 0.00002438s / matches: 1 after for loop: 0.00005275s before list(map): 0.00005858s before return: 0.00020354s ---- matches perf: 0.00006971s / matches: 1 after for loop: 0.00013025s before list(map): 0.00013554s before return: 0.00682037s ---- matches perf: 0.00004017s / matches: 2 after for loop: 0.00008846s before list(map): 0.00009133s before return: 0.00295483s ---- matches perf: 0.00002750s / matches: 1 after for loop: 0.00007175s before list(map): 0.00007696s before return: 0.00219413s ---- matches perf: 0.00042275s / matches: 20 after for loop: 0.00098187s before list(map): 0.00098813s before return: 0.05855404s ---- matches perf: 0.00001100s / matches: 1 after for loop: 0.00005625s before list(map): 0.00006233s before return: 0.00034088s ---- matches perf: 0.00013438s / matches: 2 after for loop: 0.00021438s before list(map): 0.00021813s before return: 0.00783958s ---- matches perf: 0.00047179s / matches: 18 after for loop: 0.00107554s before list(map): 0.00108154s before return: 0.08648308s ---- matches perf: 0.00065763s / matches: 29 after for loop: 0.00149079s before list(map): 0.00151117s before return: 0.07941017s ---- matches perf: 0.00006600s / matches: 3 after for loop: 0.00018075s before list(map): 0.00020383s before return: 0.00815983s ---- matches perf: 0.00017421s / matches: 5 after for loop: 0.00040908s before list(map): 0.00042392s before return: 0.02092254s ---- matches perf: 0.00005058s / matches: 4 after for loop: 0.00010371s before list(map): 0.00011133s before return: 0.00197779s ---- matches perf: 0.00065625s / matches: 13 after for loop: 0.00158467s before list(map): 0.00159621s before return: 0.21502942s ---- matches perf: 0.00005500s / matches: 1 after for loop: 0.00012642s before list(map): 0.00012967s before return: 0.00934592s ---- matches perf: 0.00003154s / matches: 2 after for loop: 0.00007296s before list(map): 0.00007908s before return: 0.00186450s ---- matches perf: 0.00027429s / matches: 9 after for loop: 0.00065712s before list(map): 0.00066079s before return: 0.03788817s ---- matches perf: 0.00006625s / matches: 4 after for loop: 0.00014829s before list(map): 0.00015296s before return: 0.00489046s ---- matches perf: 0.00003221s / matches: 1 after for loop: 0.00008021s before list(map): 0.00008400s before return: 0.00396992s ---- matches perf: 0.00005129s / matches: 1 after for loop: 0.00012296s before list(map): 0.00012596s before return: 0.00840154s ---- matches perf: 0.00042504s / matches: 29 after for loop: 0.00095896s before list(map): 0.00096475s before return: 0.03761792s ---- matches perf: 0.00010217s / matches: 2 after for loop: 0.00024237s before list(map): 0.00024500s before return: 0.02967329s ---- matches perf: 0.00025683s / matches: 14 after for loop: 0.00061087s before list(map): 0.00061604s before return: 0.02288504s ---- matches perf: 0.00001508s / matches: 2 after for loop: 0.00003850s before list(map): 0.00004042s before return: 0.00047821s ---- matches perf: 0.00228304s / matches: 20 after for loop: 0.00531071s before list(map): 0.00531458s before return: 15.74802542s So the list(map(BuiltinFunction.fromString, functionStrings)) is the thing to improve. diving a bit more in the rabbit hole. import time print(f"----") tic = time.perf_counter() themap = map(BuiltinFunction.fromString, functionStrings) toc = time.perf_counter() print(f"how long for map: {toc - tic:0.8f}s") tec = time.perf_counter() thingtoreturn = list(themap) tuc = time.perf_counter() print(f"how long for list: {tuc - tec:0.8f}s") return thingtoreturn map(BuiltinFunction.fromString, functionStrings) is very fast. but list() is the slow bit. ---- how long for map: 0.00000067s how long for list: 0.04454333s ---- how long for map: 0.00000054s how long for list: 0.00522896s ---- how long for map: 0.00000025s how long for list: 0.10435229s ---- how long for map: 0.00000008s how long for list: 0.02742804s ---- how long for map: 0.00000029s how long for list: 0.00610137s ---- how long for map: 0.00000008s how long for list: 0.05791729s ---- how long for map: 0.00000013s how long for list: 0.00007237s ---- how long for map: 0.00000013s how long for list: 0.00085775s ---- how long for map: 0.00000017s how long for list: 0.00789221s ---- how long for map: 0.00000008s how long for list: 0.00032267s ---- how long for map: 0.00000038s how long for list: 0.00170508s ---- how long for map: 0.00000021s how long for list: 0.01071083s ---- how long for map: 0.00000017s how long for list: 0.00450279s ---- how long for map: 0.00000017s how long for list: 0.00004983s ---- how long for map: 0.00000029s how long for list: 0.00536333s ---- how long for map: 0.00000017s how long for list: 0.00225646s ---- how long for map: 0.00000008s how long for list: 0.00150821s ---- how long for map: 0.00000008s how long for list: 0.05064929s ---- how long for map: 0.00000029s how long for list: 0.00026858s ---- how long for map: 0.00000013s how long for list: 0.00655154s ---- how long for map: 0.00000025s how long for list: 0.07719392s ---- how long for map: 0.00000017s how long for list: 0.07323875s ---- how long for map: 0.00000008s how long for list: 0.00774633s ---- how long for map: 0.00000017s how long for list: 0.01983171s ---- how long for map: 0.00000017s how long for list: 0.00177975s ---- how long for map: 0.00000021s how long for list: 0.20562171s ---- how long for map: 0.00000063s how long for list: 0.00885329s ---- how long for map: 0.00000054s how long for list: 0.00178362s ---- how long for map: 0.00000033s how long for list: 0.03592921s ---- how long for map: 0.00000038s how long for list: 0.00455142s ---- how long for map: 0.00000017s how long for list: 0.00376550s ---- how long for map: 0.00000008s how long for list: 0.00770092s ---- how long for map: 0.00000008s how long for list: 0.03476367s ---- how long for map: 0.00000033s how long for list: 0.02836421s ---- how long for map: 0.00000021s how long for list: 0.02175638s ---- how long for map: 0.00000008s how long for list: 0.00043421s ---- how long for map: 0.00000017s how long for list: 15.65461208s map is an iterator, list will convert it to a real list, but it does it one at a time. Maybe the next step to try would be to replace the map with a list comprehension Nope… it's no better. In list(map(BuiltinFunction.fromString, functionStrings)) the functionStrings iterator for InjectedScriptSource.js has only 3 items, two short and the last one very long. [418, 254, 57055] The two firsts and the beginning of the third one which is a long list of functions. ###################### @linkTimeConstant function createObjectWithoutPrototype(/**/) { if (arguments.length % 2 !== 0) @throwRangeError("`createObjectWithoutPrototype` requires that each key argument be followed by a value, resulting in an even number of arguments."); let object = @Object.@create(null) for (let i = 0; i < arguments.length; i += 2) object[arguments[i]] = arguments[i + 1]; return object; } ###################### @linkTimeConstant function createArrayWithoutPrototype(/**/) { let array = new @Array(arguments.length); array.__proto__ = null; for (let i = 0; i < arguments.length; ++i) @arrayPush(array, arguments[i]); return array; } ###################### @linkTimeConstant @visibility=PrivateRecursive function createInspectorInjectedScript(InjectedScriptHost, inspectedGlobalObject, injectedScriptId) { function PrototypelessObjectBase() {} PrototypelessObjectBase.prototype = null; function toString(obj) { return @String(obj); } function toStringDescription(obj) { if (obj === 0 && 1 / obj < 0) return "-0"; if (isBigInt(obj)) return toString(obj) + "n"; return toString(obj); } etc… I wonder if it's because of the nesting of functions. ``` function createInspectorInjectedScript(InjectedScriptHost, inspectedGlobalObject, injectedScriptId) { function PrototypelessObjectBase() {} … a lot of code return injectedScript; } ``` next step would be to checkout what `BuiltinFunction.fromString` is doing for this function createInspectorInjectedScript(InjectedScriptHost, inspectedGlobalObject, injectedScriptId) {…} Because this is basically the issue. This patch for Bug 152294 might have introduced the issue: https://github.com/WebKit/WebKit/commit/1d75ec6c3ee1575325c7af691084fb59a6e7587c#diff-226d8e95d34f44f09530c7ad6abf681a6cc386fe0b0f3884b53e0ae85afae049R56-R57 If I remove the wrapper, the gain in speed is huge. The wrapper is: function createInspectorInjectedScript(InjectedScriptHost, inspectedGlobalObject, injectedScriptId) { } so with Source/JavaScriptCore/Scripts/wkbuiltins/builtins_model.py functionStrings = [text[start:end].strip() for (start, end) in functionBounds] import time print(f"----") tic = time.perf_counter() thingtoreturn = list(map(BuiltinFunction.fromString, functionStrings)) toc = time.perf_counter() print(f"how long for BuiltinFunction.fromString: {toc - tic:0.8f}s") return thingtoreturn % python3 Source/JavaScriptCore/Scripts/generate-js-builtins.py --framework JavaScriptCore --output-directory /tmp/ --combined Source/JavaScriptCore/inspector/InjectedScriptSource.js how long for BuiltinFunction.fromString: with wrapper: 15.52224183s without wrapper: 0.08558171s timing the different regex inside BuiltinFunction.fromString % python3 Source/JavaScriptCore/Scripts/generate-js-builtins.py --framework JavaScriptCore --output-directory /tmp/ --combined Source/JavaScriptCore/inspector/InjectedScriptSource.js -------- Inside fromString multilineCommentRegExp: 0.00000183s functionIntrinsicRegExp: 0.00029496s functionOverriddenNameRegExp: 0.00030350s functionNameRegExp: 0.00000129s function_name: createObjectWithoutPrototype functionVisibilityRegExp: 0.00030337s -------- Inside fromString multilineCommentRegExp: 0.00000067s functionIntrinsicRegExp: 0.00011783s functionOverriddenNameRegExp: 0.00011892s functionNameRegExp: 0.00000079s function_name: createArrayWithoutPrototype functionVisibilityRegExp: 0.00011775s -------- Inside fromString multilineCommentRegExp: 0.00001742s functionIntrinsicRegExp: 7.45201550s functionOverriddenNameRegExp: 5.22347075s functionNameRegExp: 0.00006025s function_name: createInspectorInjectedScript functionVisibilityRegExp: 0.00018467s how long for BuiltinFunction.fromString: 17.81850321s These are the two RegExp which are not performant. functionIntrinsicRegExp: 7.45201550s functionOverriddenNameRegExp: 5.22347075s functionIntrinsicRegExp = re.compile(r".*^@intrinsic=(\w+)", re.MULTILINE | re.DOTALL) functionOverriddenNameRegExp = re.compile(r".*^@overriddenName=(\".+\")$", re.MULTILINE | re.DOTALL) Removing the wrapper function in InjectedScriptSource.js and suddenly the performances are good. Inside fromString multilineCommentRegExp: 0.00000067s * functionIntrinsicRegExp: 0.00251513s * functionOverriddenNameRegExp: 0.00258671s functionNameRegExp: 0.00000108s function_name: CommandLineAPI functionVisibilityRegExp: 0.00245400s As a matter of fact There is no @intrinsic @overriddenName in InjectedScriptSource.js I have a fix, which reduces the full time to 6s. So there's another regex not performant. Let see. Ok the 3rd one is the nested functionVisibilityRegExp # This is ok. No issue visibilityMatch = functionVisibilityRegExp.search(function_source) if visibilityMatch: visibility = visibilityMatch.group(1) # This one is costly function_source = functionVisibilityRegExp.sub("", function_source) elif is_link_time_constant: visibility = "Private" Profiling the code: Full Time of execution: 33.5% functionIntrinsicRegExp = re.compile(r".*^@intrinsic=(\w+)", re.MULTILINE | re.DOTALL) 32.8% functionOverriddenNameRegExp = re.compile(r".*^@overriddenName=(\".+\")$", re.MULTILINE | re.DOTALL) 33.7% functionVisibilityRegExp = re.compile(r".*^@visibility=(\w+)", re.MULTILINE | re.DOTALL) Let's fix this :) Pull request: https://github.com/WebKit/WebKit/pull/26777 I have a better patch. main: 21.63s user 0.04s system 99% cpu 21.710 total my second patch: 0.10s user 0.03s system 84% cpu 0.157 total :) There's one little fix to check to be extra-sure. Committed 277085@main (2a64b0eff0e3): <https://commits.webkit.org/277085@main> Reviewed commits have been landed. Closing PR #26777 and removing active labels. Thanks for fixing this, Karl! |