Bug 131004 - Add a --functionsToDFGCompile option.
Summary: Add a --functionsToDFGCompile option.
Status: RESOLVED DUPLICATE of bug 132437
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Mark Lam
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-03-31 19:36 PDT by Mark Lam
Modified: 2014-05-22 15:42 PDT (History)
7 users (show)

See Also:


Attachments
the patch. (8.91 KB, patch)
2014-03-31 19:49 PDT, Mark Lam
mark.lam: review-
Details | Formatted Diff | Diff
patch 2 (9.78 KB, patch)
2014-03-31 23:31 PDT, Mark Lam
no flags Details | Formatted Diff | Diff
patch 3: removed unnecessary white space checks (9.45 KB, patch)
2014-04-01 08:14 PDT, Mark Lam
ggaren: review-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Mark Lam 2014-03-31 19:36:13 PDT
... because sometimes, --bytecodeRangeToDFGCompile just isn't enough to minimize the functions to compile.

If --functionsToDFGCompile is unspecified, all functions will be compiled (i.e. unrestricted).  If we wish to filter the functions to be compiled to a specified list (by name), we can now do so in 2 ways:
1. specify the list on the command line:
    --functionsToDFGCompile=functionA,functionB,functionC
2. provide a list of function names in a file:
    --functionsToDFGCompile=myListOfFunctionNames.txt

The benefit of using the list is that it also support C++ style comments.  For example:

=== BEGIN myListOfFunctionNames.txt ===
functionA
// functionB
functionC
/*
functionD
functionE
*/
=== END myListOfFunctionNames.txt ===

With this example list, only functionA and function C will be compiled.  This makes it easier for us to grab a list of functions that were compiled using --reportCompileTimes to make the compile list file, and then selectively commenting out entries in that list to isolate the minimum set of functions that need to be compiled in order to reproduce an issue.

Technically, in the implementation of --functionsToDFGCompile, the file option loads the file into a big string and parses that string as if it came from the command line.  Hence, technically, the command line list option also supports commenting, but it wouldn't be how one would typically use that option.
Comment 1 Mark Lam 2014-03-31 19:49:20 PDT
Created attachment 228234 [details]
the patch.
Comment 2 Mark Lam 2014-03-31 19:49:52 PDT
Comment on attachment 228234 [details]
the patch.

View in context: https://bugs.webkit.org/attachment.cgi?id=228234&action=review

> Source/JavaScriptCore/ChangeLog:9
> +        minimize the list of functions to compile in order to reproduce an issue

"minimize" ==> "minimal".
Comment 3 WebKit Commit Bot 2014-03-31 19:50:52 PDT
Attachment 228234 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/runtime/Options.h:308:  enum members should use InterCaps with an initial capital letter.  [readability/enum_casing] [4]
Total errors found: 1 in 4 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 4 Geoffrey Garen 2014-03-31 21:04:19 PDT
Comment on attachment 228234 [details]
the patch.

View in context: https://bugs.webkit.org/attachment.cgi?id=228234&action=review

> Source/JavaScriptCore/ChangeLog:41
> +        Hence, we can use indentations, line breaks, and comments in the compilation
> +        list.  Here's an example of such a file:
> +
> +            // Start of myListOfFunctionNames.txt
> +            functionA,
> +            // functionB
> +            functionC
> +            /*
> +            functionD
> +            functionE
> +            */
> +            // End of myListOfFunctionNames.txt

Do we really need a fully generalized lexer and parser for this one little command-line option? And not just one but two different comment syntaxes? Really?

> Source/JavaScriptCore/dfg/DFGCapabilities.cpp:168
> +static inline bool inOKToDFGCompileList(CodeBlock* codeBlock)
> +{
> +    return !Options::functionsToDFGCompile() || slowInOKToDFGCompileList(codeBlock);
> +}

All of this command-line handling belongs in Options.cpp. It has nothing to do with the capabilities of the DFG.
Comment 5 Mark Lam 2014-03-31 21:41:16 PDT
(In reply to comment #4)
> (From update of attachment 228234 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=228234&action=review
> 
> > Source/JavaScriptCore/ChangeLog:41
> > +        Hence, we can use indentations, line breaks, and comments in the compilation
> > +        list.  Here's an example of such a file:
> > +
> > +            // Start of myListOfFunctionNames.txt
> > +            functionA,
> > +            // functionB
> > +            functionC
> > +            /*
> > +            functionD
> > +            functionE
> > +            */
> > +            // End of myListOfFunctionNames.txt
> 
> Do we really need a fully generalized lexer and parser for this one little command-line option? And not just one but two different comment syntaxes? Really?

You may already know this, but just to make sure that we’re on the same page, here’s the use case scenario I have in mind and indeed encountered a need for: 

1. After reducing the set of compiled function to some small-ish set using --bytecodeRangeToDFGCompile, I have a list of functions produced by --reportCompileTimes.

Note: I’ve already determined that I can no longer reduce the bytecode range further or the issue will cease to manifest.  However, with this bytecode range, I still have about 10 functions to work with.  However, depending on the workload, the list could have easily been 100 functions or some unwieldy number. 

2. I put this list in a file use the --functionsToDFGCompile to only filter out any functions on in that file.

Being able to use /* */ comments allows me to rule out groups of functions at a time and employ a bisecting search.   Note that the search requires that I speculatively remove groups of functions and be able to add them (or subsets of them) back in if my speculation fails to be true.  Without comments, I will have to do a lot of cutting and pasting into and out of that file, and this can be error prone.  With /* */ comments, it’s easier to comment out and in groups of functions while allowing me to see at a glance which groups of functions I’m including / excluding.

The // comments is just a convenience.  We can certainly make do with just the /* */ comments, but it was not difficult to add.  So, I added it.


> > Source/JavaScriptCore/dfg/DFGCapabilities.cpp:168
> > +static inline bool inOKToDFGCompileList(CodeBlock* codeBlock)
> > +{
> > +    return !Options::functionsToDFGCompile() || slowInOKToDFGCompileList(codeBlock);
> > +}
> 
> All of this command-line handling belongs in Options.cpp. It has nothing to do with the capabilities of the DFG.

I agree.  I will move them over.
Comment 6 Mark Lam 2014-03-31 23:31:14 PDT
Created attachment 228246 [details]
patch 2
Comment 7 WebKit Commit Bot 2014-03-31 23:34:05 PDT
Attachment 228246 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/runtime/Options.h:315:  enum members should use InterCaps with an initial capital letter.  [readability/enum_casing] [4]
Total errors found: 1 in 4 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 8 Mark Hahnenberg 2014-04-01 07:45:41 PDT
Comment on attachment 228246 [details]
patch 2

View in context: https://bugs.webkit.org/attachment.cgi?id=228246&action=review

> Source/JavaScriptCore/runtime/Options.cpp:427
> +}

I agree with Geoff--this could be way simpler. If it were me, I'd get rid of the inline option (for which you haven't listed a use case), treat the file list as one function per line, and just allow one-line comments that can only start at the beginning of the line. No need to keep track of the current position in the file, no need to worry about whitespace, no need to write weird pseudo regular expressions. Just read line by line (using fgetln or something similar), look at the very beginning of the line for comments, and assume everything else is a function.

You should be able to add/remove blocks of one-line comments very easily in almost any text editor, which allows you to bisect like you want. 

In this case I believe it's better to think like a scripter and not like a compiler writer :-)
Comment 9 Mark Lam 2014-04-01 08:14:11 PDT
Created attachment 228286 [details]
patch 3: removed unnecessary white space checks
Comment 10 WebKit Commit Bot 2014-04-01 08:16:03 PDT
Attachment 228286 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/runtime/Options.h:315:  enum members should use InterCaps with an initial capital letter.  [readability/enum_casing] [4]
Total errors found: 1 in 4 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 11 Mark Lam 2014-04-01 08:42:37 PDT
(In reply to comment #8)
> I agree with Geoff--this could be way simpler. If it were me, I'd get rid of the inline option (for which you haven't listed a use case), treat the file list as one function per line, and just allow one-line comments that can only start at the beginning of the line. No need to keep track of the current position in the file, no need to worry about whitespace, no need to write weird pseudo regular expressions. Just read line by line (using fgetln or something similar), look at the very beginning of the line for comments, and assume everything else is a function.

You’re only thinking of the mode where we load the function names from a file.  I also support the ability to specify the function names on the command line.  This is useful and convenient when the list of functions is small.  Are you also suggesting that I get rid of the option to specify the list on the command line?  fgetln() will not work if I’m parsing the list from the command line.

That said, your comment made me realize that I don’t need to explicitly check for white space.  Any illegal characters are effectively whitespace and are already ignored.  My 3rd patch removes the explicit white space checks.

> You should be able to add/remove blocks of one-line comments very easily in almost any text editor, which allows you to bisect like you want. 

Maybe I need to learn how to use my text editor better.

I still stand by my implementation.  I think it provides many advantages and conveniences in the workflow of debugging this class of bugs, and I don’t think the code to make it work isn’t all that complex.  These advantages include being able to specify the list on the command line (useful for short lists).  Once we support parsing the list from the command line, I have to scan the character stream anyway, and hence, it isn’t difficult to do a little bit of work to skip comments.

In the end, what are you guys objecting to? The complexity of the code?  This code isn’t that complex, and is not in any critical path.  The amount of memory space taken up by the code?  I think this code isn’t that big.

My goal is to make the JIT debugging workflow (for this class of JIT bugs) as convenient and simple as possible without too much cost.  I think this patch achieves that.  If you guys are strongly opposed to this patch, I can replace it with the fgetln() approach and remove the convenience of specifying the list on the command line.  Please advise.  Thanks.
Comment 12 Mark Lam 2014-04-01 08:45:16 PDT
(In reply to comment #11)
> (In reply to comment #8)
> > I'd get rid of the inline option (for which you haven't listed a use case), ...

One more note for consideration.  When choosing which blocks of functions to comment out, I sometimes like to leave notes for myself to track what and why I’m commenting out certain blocks or individual functions.  I find the single line comments useful for this.  Again, it is possible to use the block /* */ comments for this, but it’s less convenient.
Comment 13 Mark Hahnenberg 2014-04-01 09:56:29 PDT
(In reply to comment #11)

> You’re only thinking of the mode where we load the function names from a file.  I also support the ability to specify the function names on the command line.  This is useful and convenient when the list of functions is small.  Are you also suggesting that I get rid of the option to specify the list on the command line?  fgetln() will not work if I’m parsing the list from the command line.

I explicitly said I'd forgo the "inline" option in favor of just having the file. You said yourself that you already had the function names in a file from --reportCompileTimes.

> Maybe I need to learn how to use my text editor better.

Vim: ctrl + v to start a visual block, 'd' to delete the block
Emacs: c-spce to set the mark, select the block similar to vim but with one extra column, c-x r k to kill the block.
Xcode: cmd + / to toggle comments
This is a common feature in most text editors/IDEs.

> In the end, what are you guys objecting to? The complexity of the code?  This code isn’t that complex, and is not in any critical path.  The amount of memory space taken up by the code?  I think this code isn’t that big.

It just feels over engineered. Simplicity is a Good Thing (TM), both in implementation and in use. Finding the right balance is what we're trying to do here.
Comment 14 Mark Hahnenberg 2014-04-01 09:57:24 PDT
(In reply to comment #12)
> (In reply to comment #11)
> > (In reply to comment #8)
> > > I'd get rid of the inline option (for which you haven't listed a use case), ...
> 
> One more note for consideration.  When choosing which blocks of functions to comment out, I sometimes like to leave notes for myself to track what and why I’m commenting out certain blocks or individual functions.  I find the single line comments useful for this.  Again, it is possible to use the block /* */ comments for this, but it’s less convenient.

Like I said, I'd get rid of the '/* */' comments in favor of '//' comments because they're easier to check for at the beginning of each line.
Comment 15 Geoffrey Garen 2014-04-01 10:11:30 PDT
Comment on attachment 228286 [details]
patch 3: removed unnecessary white space checks

View in context: https://bugs.webkit.org/attachment.cgi?id=228286&action=review

This feature isn't very useful to me, for a few reasons:

(1) It relies on function names, but not all functions have names, and names can overlap. This would be much more useful to me if its input were "filename:line,column" instead.

(2) It doesn't produce the initial exhaustive file, that I would start cutting down.

(3) There are enough special modes here -- three kinds of comments (//, /* */, and certain combinations of "non-function" characters), a special header delimiter if you're going to supply a file, all text assumed to be ASCII, limited range of parseable characters -- that I'm probably going to have to re-read the parsers every time I use this feature, and the parsers aren't super readable.

Now, maybe I should say r+ because you used this feature once and you like it, and it's only debugging code anyway. But I think the bar needs to be higher for code that we agree to maintain together.

I'm not convinced when you list out all the ways you could use all the features here. Every feature can be used in some circumstance, but that doesn't make every feature good, nor does it make all implementations of the same feature equally good. We need to weigh a feature against other ways to accomplish the same task, and against simplicity, clarity, maintainability, and robustness.

> Source/JavaScriptCore/runtime/Options.cpp:85
> +static bool parse(const char* string, const char*& value)
> +{
> +    value = string;
> +    return true;
> +}

This function returns bool but never returns false. Confusing.

> Source/JavaScriptCore/runtime/Options.cpp:352
> +struct FunctionsFilter {

This should be a class.

> Source/JavaScriptCore/runtime/Options.cpp:359
> +    Vector<CString> functionNames;

This should be Vector<String>.

> Source/JavaScriptCore/runtime/Options.cpp:362
> +static inline bool isFunctionNameStart(char c)

This is not a correct definition of the start of a function name in JavaScript. Many important details are missing, including escapes and Unicode.

> Source/JavaScriptCore/runtime/Options.cpp:367
> +static inline bool isFunctionNameChar(char c)

Ditto.

> Source/JavaScriptCore/runtime/Options.cpp:376
> +    char c = *p;
> +    while (c) {

It is better for a parser to operate on a set of characters and a length. In-band signaling of end-of-data often produces bugs.

> Source/JavaScriptCore/runtime/Options.cpp:378
> +        // Skip // or /* */ comment entries:
> +        if (c && c == '/') {

If you factored this out into a helper function, you wouldn't need a comment to give it a name.

> Source/JavaScriptCore/runtime/Options.cpp:381
> +                continue;

continue is misleading / slow here. You want break.

> Source/JavaScriptCore/runtime/Options.cpp:386
> +

Extra newline.

You should put some explicit control flow here to make it easier to follow.

> Source/JavaScriptCore/runtime/Options.cpp:432
> +    std::unique_ptr<char[]> buffer(new char[size + 1]);

This should be "auto buffer = make_unique<>...".

> Source/JavaScriptCore/runtime/Options.cpp:459
> +    if (filters.isInitialized && !filters.functionNames.size())
> +        return true;
> +
> +    if (!filters.isInitialized) {
> +        if (!strncmp(filterOption, "file:", 5))
> +            parseFunctionNamesInFile(filters, &filterOption[5]);
> +        else
> +            parseFunctionNamesInString(filters, filterOption);
> +        filters.isInitialized = true;
> +    }

Lazy initialization should be built into the class.
Comment 16 Michael Saboff 2014-04-01 10:41:41 PDT
I will join in the discussion as well.  The tool seems bigger than the problem to solve. I concur with simple is better.

IF we add a file name filter, I suggest the following
* The function names need to be the "cooked" or hashed name, e.g. foo#fTeGe3, that a compiled function is known by internally. That way we handle global functions, etc.  This isn't too much to ask, since this option will likely only be used after some range pruning.

 - for command line options:
* A comma separated list of hash named functions for the option value.
 - or for a file:
* It should be one function per line.
* No parsing, just grab the whole line, prune whitespace and add to vector.
* A simple comment of leading # or // to kill a line (pick one).

If we add a function name file parser that is only used for a debugging option, we are likely adding single use code subject to bit rot.
Comment 17 Mark Lam 2014-05-22 15:42:26 PDT
https://bugs.webkit.org/show_bug.cgi?id=132437 took care of the implementation that we decided to implement in the end.  No need to do any further work on this.

*** This bug has been marked as a duplicate of bug 132437 ***