Bug 26316

Summary: 100% CPU hang for several seconds on large file syntax highlight.
Product: WebKit Reporter: Pavel Feldman <pfeldman>
Component: Web Inspector (Deprecated)Assignee: Nobody <webkit-unassigned>
Status: RESOLVED WORKSFORME    
Severity: Normal CC: apavlov, gsherloc, joepeck, keishi, pmuellr, rik, timothy
Priority: P3 Keywords: InRadar
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Attachments:
Description Flags
sample
none
Use web Worker for syntax highlight none

Description Pavel Feldman 2009-06-11 05:00:59 PDT
1. Open slashdot.org in a new tab (make sure you get a new process)
2. Open inspector
3. Switch to resources tab
4. Click on all-minified.js
5. Click on idlecore-tidied.css

Expected: focus is moved to the css file
Actual: 100% CPU use until all-minified is syntax highlighted.
Comment 1 Gavin Sherlock 2009-06-11 10:06:15 PDT
I see this with Safari 4 - sample attached.  Seems to be spending time in:

617 JSC::RegExpObject::match(JSC::ExecState*, JSC::ArgList const&)
                617 JSC::RegExpConstructor::performMatch(JSC::RegExp*, JSC::UString const&, int, int&, int&, int**)
                              617 JSC::RegExp::match(JSC::UString const&, int, WTF::OwnArrayPtr<int>*)
                                          617 JSC::Yarr::executeRegex(JSC::Yarr::RegexCodeBlock&, unsigned short const*, unsigned int, unsigned int, int*, int)
Comment 2 Gavin Sherlock 2009-06-11 10:06:44 PDT
Created attachment 31165 [details]
sample
Comment 3 Gavin Sherlock 2009-06-11 10:12:19 PDT
Hang also occurs in r44591
Comment 4 Gavin Sherlock 2009-06-20 21:00:36 PDT
Reproduces as far back as r37381, and still reproduces in r44906 (I'm using 10.5.8 with Safari 3.2.1, but it also occurs using Safari 4.01.  r37381 predates YARR, so it is not a problem introduced by that.  Seems like the regex engine is the problem, and this task might make a good speed test.
Comment 5 Gavin Sherlock 2009-06-20 22:06:20 PDT
That should be I'm using 10.5.7....
Comment 6 Patrick Mueller 2009-06-22 11:19:37 PDT
That's really annoying.

Of course, all that time is more or less completely wasted because the file being parsed isn't really debuggable in the first place.  And a shame, because there's plenty of useful debug function (resource measurements) which can be made on that site, which you are locked out of for a certain amount of time if you inadvertently click on that file.

So one idea would be - don't parse that JavaScript.  Set a limit - say 100K - on the size of a JS file which will have syntax highlighting applied to it.  

Another heuristic could be to try to identify "minified" JS files - which also won't be very debuggable - you could do this by looking for very long lines, though not quite sure what the heuristic would be there, to keep from unintentionally labelling files that happen to contain long lines as "do not parse".

In a perfect world, where JS developers had debug concerns in mind, we might imagine annotating a JS file with a special comment "// not debuggable", in which case debuggers might decline to provide some amount of debug friendliness; but that's not very realistic; not many developers care about debugging, it seems.
Comment 7 Gavin Sherlock 2009-06-22 11:22:29 PDT
This this be a P1?  The guidelines say:

Any reproducible crash or hang.

is a P1, and I think this qualifies as a reproducible hang.
Comment 8 Gavin Barraclough 2009-06-22 11:40:02 PDT
<rdar://problem/6994458> Hang using Web Inspector on slashdot.org
Comment 9 Patrick Mueller 2009-06-24 11:29:02 PDT
Just a quick note - tried on the latest Chrome dev channel (3.0.189.0) on Windows, and the (presumed) parsing took a long time there as well.  Not as long, but still annoyingly long.  Locked up the UI in the Inspector window, but nothing else, which was expected.
Comment 10 Keishi Hattori 2009-07-12 13:44:07 PDT
I optimized the regexp and dom code and was able make it much faster (165974ms -> 8503ms) but it's still too slow. I investigated a C++ implementation but it's quite difficult. Maybe I'll rewrite it in JavaScript.
Comment 11 Patrick Mueller 2009-07-12 15:28:41 PDT
It's NEVER going to be fast enough.  Fairly standard for these days for folks to "optimize" their JS by combining it into large wads for a single download - even Web Inspector itself does this!  There's no sense in parsing these wads of code that people have no intention of browsing through.  I vote for a byte limit on the size of a JS file that parsing will be done on.
Comment 12 Patrick Mueller 2009-07-20 07:35:41 PDT
I just noticed WebKit seems to be building with worker support (never noticed it before, maybe it's been in a while).  Moving the syntax highlighting to a worker would seem to make sense.
Comment 13 Keishi Hattori 2009-07-20 07:54:27 PDT
(In reply to comment #12)
> I just noticed WebKit seems to be building with worker support (never noticed
> it before, maybe it's been in a while).  Moving the syntax highlighting to a
> worker would seem to make sense.

I did some research earlier. I may be wrong but Workers can't work on the DOM so 40-50 % of the work load will still be on the main thread. 

It could still be something I could try.
Comment 14 Patrick Mueller 2009-07-20 08:34:43 PDT
Wouldn't need to work directly off the DOM; you'd feed it the source string, it would send back lines of HTML to replace in the DOM.  Or something.
Comment 15 Keishi Hattori 2009-07-20 10:48:56 PDT
I was able to implement the current code in worker. It's good that it doesn't block the UI thread but it's unreasonably slow. (2x-10x slower)
I'll upload the patch when I get it cleaned up and ready.
Comment 16 Patrick Mueller 2009-07-20 11:31:08 PDT
Excellent.  If we can't get the speed back somehow, then perhaps we can conditionally use the worker when the size of the source exceeds some value, like 20-30K.  Let's the raw "do it every time" patch first though.
Comment 17 Keishi Hattori 2009-07-21 02:02:17 PDT
Created attachment 33160 [details]
Use web Worker for syntax highlight

The worker file is a strange file extension because "*.js" files are removed in the build script.

Also features better syntax highlighting and a few optimizations.
Comment 18 Patrick Mueller 2009-07-21 07:34:22 PDT
> "*.js" files are removed in the build script

arghh.  I've added a pointer to this bug in Bug 26272 (a bug I opened asking for the .js files to stop getting bundled in the build).

Even so, might be better if you created a subdirectory for the worker.  I've been wondering about organizing the scripts in front-end anyway into a set of subdirectories.

It appears you are sending each line over to the worker to be processed.  I suspect, especially for large files, that if this was chunked so you sent 100 or so lines at a time, you'd see some performance improvements.  Assuming the worker messaging overhead is somewhat significant.
Comment 19 Keishi Hattori 2009-07-21 09:09:43 PDT
(In reply to comment #18)
> arghh.  I've added a pointer to this bug in Bug 26272 (a bug I opened asking
> for the .js files to stop getting bundled in the build).

We don't bundle for debug builds so I feel this is OK (if there really is a noticeable speed difference in bundling). I think creating a subdirectory for worker scripts is a good solution.

> It appears you are sending each line over to the worker to be processed.  I
> suspect, especially for large files, that if this was chunked so you sent 100
> or so lines at a time, you'd see some performance improvements.  Assuming the
> worker messaging overhead is somewhat significant.

I suspect there is some message overhead but that doesn't explain why a few long lines has a larger performance deterioration(relative to the non-worker version) than a lot of short lines. Strange...

This is going to be a half baked solution anyway. 

The ultimate solution is doing it in JSC.  We need to modify the Lexer so it handles the white spaces.
And drastically modify HTMLViewSourceDocument which is closely tied to the HTMLTokenizer.
 
I was curious as to how far we can push this but I think I've put enough effort into the JS solution and I'm better off working on the C++ solution.
Comment 20 Timothy Hatcher 2009-07-21 09:12:12 PDT
I think a subdirectory would be good so we can keep .js extensions for workers.
Comment 21 Patrick Mueller 2009-07-21 10:36:39 PDT
"doing it in JSC" ... hopefully that doesn't mean that we have to have the V8 folks  also have to "do it in V8".   I don't know enough about what you're proposing to tell.

Also, thought I'd point out that FireBug doesn't seem to syntax color the JS it displays.  I like the highlighting, but could live without it.  In particular re: comment 6, it's more likely that the larger the JS file, the less likely I'm going to be looking at it, because it's probably some library code vs. code I've actually written at least at development time.  

So, I'm still in favor of just doing a size check and punting on the highlighting if it reaches a certain threshold - absolute size, or line length, whatever.
Comment 22 Joseph Pecoraro 2009-07-21 10:45:19 PDT
(In reply to comment #21)
> Also, thought I'd point out that FireBug doesn't seem to syntax color the JS it
> displays.

It might be worth mentioning that there is a FireBug extension that supports it. I used to use it:
https://addons.mozilla.org/en-US/firefox/addon/7575

> I like the highlighting, but could live without it.

I haven't actually applied the patch to try it locally. But the way I understand it is that the syntax highlighting gets applied, on any size file, without blocking the UI.  During this processing is the black+white text available in the meantime?  The problem I would see here is a potential "flicker" where there isn't highlighting and then boom there is.
Comment 23 Timothy Hatcher 2009-07-21 10:51:32 PDT
> I haven't actually applied the patch to try it locally. But the way I
> understand it is that the syntax highlighting gets applied, on any size file,
> without blocking the UI.  During this processing is the black+white text
> available in the meantime?  The problem I would see here is a potential
> "flicker" where there isn't highlighting and then boom there is.


Thats what I would expect. But there shouldn't be any flicker, just "color appears". The text should not shift or anything like that, and the rendering should happen in one step.
Comment 24 Joseph Pecoraro 2009-10-03 22:10:41 PDT
The typical hang scenario is when encountering 1 very long line.  Maybe we should act on that? A line with then 300 characters is likely to be minified and undebuggable anyways.  Skipping it in the highlighter shouldn't cause too many problems (if its minified then multiline things like comments would have been stripped already).
Comment 25 Patrick Mueller 2009-10-04 05:59:19 PDT
I'm certainly in favor of "skipping" parsing for bits that don't need to be parsed.  And I have run into code which is sort of "mixed" - some minified code mixed with some plain old code.  So skipping just the minified bits seems like it would work for me.

I think I'd also be happy with skipping files over a certain size, completely.  Or perhaps given your line-length insight, skipping over files with lines greater than a certain length.
Comment 26 Alexander Pavlov (apavlov) 2010-04-07 08:23:55 PDT
Fixed by pfeldman during the TextViewer optimization.