Bug 19865

Summary: Sampling tool.
Product: WebKit Reporter: Gavin Barraclough <barraclough>
Component: JavaScriptCoreAssignee: Gavin Barraclough <barraclough>
Status: RESOLVED FIXED    
Severity: Normal CC: ap, aroben, zwarich
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Mac   
OS: OS X 10.5   
Attachments:
Description Flags
patch
darin: review-
[BROKEN] New implementation. (8% regression)
none
New, cleaned-up implementation - no known bugs, no regressions.
ggaren: review+
(sunspider results) none

Description Gavin Barraclough 2008-07-03 02:20:53 PDT
Add a sampling tool to the VM to count time in opcodes and within functions.
Comment 1 Gavin Barraclough 2008-07-03 02:22:06 PDT
Created attachment 22062 [details]
patch
Comment 2 Alexey Proskuryakov 2008-07-03 04:44:40 PDT
A strawman question - cannot such information be obtained non-intrusively with dtrace dynamic probes?
Comment 3 Gavin Barraclough 2008-07-03 05:09:48 PDT
(In reply to comment #2)
> A strawman question - cannot such information be obtained non-intrusively with
> dtrace dynamic probes?

That's a good question – and one I should probably look into more, since I don't know dtrace that well.

Dumping bytecode may be more tricky using dtrace – (I don't know if dtrace can be used to trigger insertion of a call to a function within the process being traced) – though I guess it probably could extract sourcefile name & line#, and bytecode could likely be dumped from a separate run.


Comment 4 Adam Roben (:aroben) 2008-07-03 05:45:05 PDT
(In reply to comment #2)
> A strawman question - cannot such information be obtained non-intrusively with
> dtrace dynamic probes?

Certainly not on platforms that don't have dtrace, such as Windows.
Comment 5 Gavin Barraclough 2008-07-03 06:27:11 PDT
> Certainly not on platforms that don't have dtrace, such as Windows.

Which is also an interesting question – I have no idea whether Windows would be happy or not with my use of pthread.h!

My copy of XP just arrived today, and I think there is a threading abstraction in WTF? – I'll take a look at making sure this is Windows friendly.

Comment 6 Alexey Proskuryakov 2008-07-04 03:07:52 PDT
(In reply to comment #4)
> Certainly not on platforms that don't have dtrace, such as Windows.

That's of course true, but since this information is not platform-dependent, it might be sufficient to have Mac-only tools for this.
Comment 7 Darin Adler 2008-07-06 19:46:46 PDT
Comment on attachment 22062 [details]
patch

Looks good!

There's a tab in the ChangeLog, so the patch can't be landed as-is.

I think the various OpcodeSampling functions should be defined as empty inlines when opcode sampling is off. That way we don't have to wrap every single call in a separate #if.

Any globals that are not used in the header don't really need to be static data members -- they can just be normal globals with internal linkage. This will allow more changes without changing the header. This includes opcodeCounts, 

     if (jsValues.size() && (exec != 0)) {

In WebKit coding style this would just be "&& exec" without the != 0.

    #pragma unused(samplingLogger)

We don't use bare #pragma unused directives. We need to wrap that in a macro of some sort; maybe UNUSED_PARAM? But also, I don't understand the point of putting this particular into a local variable and then marking it unused. Why do we need the local variable?

    struct CodeBlockSample_t {

We do not use the "_t" suffix in classes defined in the WebKit project.

    typedef WTF::HashMap<uint64_t, CodeBlockSample_t *> CodeBlockSampleMap_t;

No space before the "*" in WebKit coding style.

    void* OpcodeSampling::sampledVPC = (void*)-1;

If you absolutely must use a special value here, please use reinterpret_cast rather than a C-style cast.

     CodeBlockSample_t* leftValue = *(CodeBlockSample_t**)left;
     CodeBlockSample_t* rightValue = *(CodeBlockSample_t**)right;

These should be const* and should use static_cast rather than C-style cast.

     if (leftValue->totalCount < rightValue->totalCount)
         return 1;
     else if (leftValue->totalCount > rightValue->totalCount)
         return -1;
     else
         return 0;

We don't do else after return in the WebKit coding style.

     // calcualtion shouldn't overflow provided length is less that 90,000 or so!

Typo: "calcualtion".

    uint64_t fletcheresqueChecksum32(uint32_t *input, int length)

This should be marked static so it gets internal linkage. Also the "*" should be next to the type. And since this reads the input, it should be const uint32_t. Also, we normally just use unsigned for this sort of thing except in very unusual situations. The specific-sized variants aren't helpful since the code already relies on int being 32-bits.

    usleep(1000000 / DUMP_OPCODE_SAMPLING);

This technique seems economical but unnecessarily subtle. Could the frequency just be a separate macro instead?

    if (vPC != (void *)-1) {

The use of the magic value here would be better if it was a named constant.

    if (size > 4096) {
        printf("Skipping CodeBlock: #%016llx (looks malformed)\n", hash);
    }
    else {

    if (codeOffset < blockSample->size) {
        blockSample->vpcCounts[codeOffset]++;
    }

In WebKit coding style we don't put braces around single line if statement bodies. We also put the else on the same line as the closing brace.

    ptrdiff_t codeOffset = ((ptrdiff_t)(OpcodeSampling::sampledVPC) - (ptrdiff_t)(codeBlock->instructions.begin()))/sizeof(void*);

We use the C++ casts rather than the C-style casts.

    static const char* nonOpcodeSamplingNames[OpcodeSampling::numSamplingIDs - numOpcodeIDs] = {

This ought to have a const after the * too, since the array entries are const, not just the characters.

    inline const char* samplingNames(int index)

This should be marked static so it gets internal linkage.

         typedef enum {
             // all opcode ids are also valid sample ids.
             sampling_next = (op_end + 1), // This should come first.
             sampling_nativeCall,
             sampling_unknown // sampling_no_end MUST come at the end of the list.
         } SamplingID;

Normally you'd just say enum SamplingID rather than typedef enum ... SamplingID.

    static void *sampledVPC;

Use "void*" instead of "void *".

    static CodeBlock *sampledCodeBlock;

Use "CodeBlock*" instead of "CodeBlock *".

I'm going to say review- since there are some simple small things that could easily be done to make this match more of normal WebKit style, but I think it's almost suitable to land as-is.
Comment 8 Darin Adler 2008-07-06 19:47:03 PDT
Comment on attachment 22062 [details]
patch

Oops, forgot. Marking review-.
Comment 9 Gavin Barraclough 2008-07-09 10:54:00 PDT
Darin,

Thank you for the review – and the thorough comments.  Switching coding conventions can be tricky, but I really hoped I'd done a better job than that – so I do apologize for leaving so many issues to be identified.

I'm about to put up a new patch, which I hope should fix all the style bugs (as well as resolving the problem with using the tool within Safari).  I am going to have to take at least one more swing at this, since there are a couple of new design issues I need resolve before a final patch will be ready.

cheers,
G.

Comment 10 Gavin Barraclough 2008-07-09 11:30:33 PDT
Created attachment 22186 [details]
[BROKEN] New implementation. (8% regression)

This patch gets the tool working in Safari, but degrades performance due to a leak – for Program/EvalCodeBlocks the CodeBlock destructor is being called, so code blocks are not removed from the globalObject's array.  Issues to resolve:


(1) Calling the correct destructors on CodeBlocks, with the CodeBlocks being ref-counted - possible solutions:

i) Add a virtual destructor to CodeBlock, so the correct destructor is called, or achieve similar effect (e.g. could check codeType in a non-virtual ~CodeBlock, and static_cast to ProgramCodeBlock to remove self from the globalObject.

ii) Don't inherit RefCounted at the CodeBlock level - i.e. make ProgramCodeBlock, EvalCodeBlock and a new FunctionCodeBlock multiple inherit from both CodeBlock and RefCounted.

iii) Flatten the inheritance hierarchy, remove Program/EvalCodeBlock, give CodeBlock a globalObject pointer (null in the case of function code).


(2) Machine::isOpcode / Machine:: getOpcode / Machine:: getOpcodeID have been made static, so that I can dump CodeBlock objects after execution.  Firstly, does this sound sensible?

* if so, I will need find an appropriate place to create and destroy the map (currently being leaked) – Machine::Machine and Machine::~Machine will presumably not be suitable – assuming that multiple machines may exist concurrently.

* if not, I guess I need dump the codeblocks no later than the machine is destroyed, so associate one OpcodeSampling object with each Machine, and dump opcodes with the corresponding machine is destructed.
Comment 11 Geoffrey Garen 2008-07-10 15:01:47 PDT
> (1) Calling the correct destructors on CodeBlocks, with the CodeBlocks being
> ref-counted - possible solutions:
> 
> i) Add a virtual destructor to CodeBlock, so the correct destructor is called,
> or achieve similar effect (e.g. could check codeType in a non-virtual
> ~CodeBlock, and static_cast to ProgramCodeBlock to remove self from the
> globalObject.
> 
> ii) Don't inherit RefCounted at the CodeBlock level - i.e. make
> ProgramCodeBlock, EvalCodeBlock and a new FunctionCodeBlock multiple inherit
> from both CodeBlock and RefCounted.
> 
> iii) Flatten the inheritance hierarchy, remove Program/EvalCodeBlock, give
> CodeBlock a globalObject pointer (null in the case of function code).

We usually solve this issue with the RefCounted class by using virtual destructors. For example, the parse tree uses virtual destructors, as does the DOM in WebCore. So, that's probably the best solution along this path.

That said, I have to apologize for leading you a little astray here! I think there's a better solution, and I should have been more clear about it before.

A CodeBlock is the lazily generated portion of a ScopeNode. Therefore, the ScopeNode owns the CodeBlock. (There are three kinds of ScopeNode -- ProgramNode, EvalNode, and FunctionBodyNode -- corresponding to the three kinds of top-level code in JavaScript.) It's probably easiest to work within this model by hashing ScopeNodes, which are already refcounted, instead of CodeBlocks. (Each points to the other, so this is easy enough to do.)

Does that seem reasonable?

> (2) Machine::isOpcode / Machine:: getOpcode / Machine:: getOpcodeID have been
> made static, so that I can dump CodeBlock objects after execution.  Firstly,
> does this sound sensible?
> 
> * if so, I will need find an appropriate place to create and destroy the map
> (currently being leaked) – Machine::Machine and Machine::~Machine will
> presumably not be suitable – assuming that multiple machines may exist
> concurrently.
> 
> * if not, I guess I need dump the codeblocks no later than the machine is
> destroyed, so associate one OpcodeSampling object with each Machine, and dump
> opcodes with the corresponding machine is destructed.

Harumph. This is a little challenging.

I agree that the duplicate data structures are not ideal. Multiple machines are rare, though, and the data structures are small. My main concern is getting your code to work :).

My goal in tying the table to the machine was to make the machine a gateway to the table, which guaranteed that the table was appropriately initialized before being used. My secondary goal, I guess, was to give the table a clear lifetime.

I see two ways to solve this problem. One way is to use a shared table like you have. The appropriate startup hook would be initializeThreadingOnce(). There's no need for a shutdown hook. (BTW, in our coding style, "m_" stands for member and "s_" stands for "static", so you would want to change m_ to s_ when doing this.) It's a little less elegant to rely on initializeThreadingOnce(), but that ship sailed long ago :).

Another way to solve this problem is to use an explicit hook for dumping sampled data, which provides an ExecState to dump with, instead of relying on an implicit destructor. I see two advantages to this solution: (1) You get a better dump and your code becomes simpler, since you can provide a real ExecState when dumping; (2) The machine's opcode tables, being non-global, are faster to access. (There's a substantial penalty for accessing globals in 32bit Mach-O frameworks.)

For example, in JavaScriptCore's Shell.cpp, jscmain could invoke the sampling dumper after execution. That would cover command-line SunSpider. In WebCore, with sampling enabled, maybe we would dump whenever the last webpage closed, or something like that.

What do you think?
Comment 12 Gavin Barraclough 2008-07-23 10:14:26 PDT
Created attachment 22454 [details]
New, cleaned-up implementation - no known bugs, no regressions.
Comment 13 Gavin Barraclough 2008-07-23 10:16:38 PDT
Created attachment 22455 [details]
(sunspider results)
Comment 14 Geoffrey Garen 2008-07-23 15:01:03 PDT
Comment on attachment 22454 [details]
New, cleaned-up implementation - no known bugs, no regressions.

This patch looks much improved. I have some minor comments, but I think you can land the patch as-is, and address these comments in a separate patch.

MACHINE_SAMPLING_callingNativeFunction:

We've decided to use the word "host" rather than native to indicate functions built into the library.

I think you only added this call to op_call. You need to add it to op_construct, too.

I'm worried about what happens if somebody adds an opcode but forgets to update the opcodeNames array. One way to prevent this would be to have a function with a switch statement, instead of an array. That way, the compiler would warn if not all the opcodes were covered by the switch. Another solution would be to ASSERT somewhere that the number of elements in the array was equal to the number of opcodes.

+            for (ScopeSampleRecordMap::iterator iter = m_scopeSampleMap->begin(); iter != m_scopeSampleMap->end(); ++iter)
+                delete iter->second;

There's a helper function for this: deleteAllValues.

+            delete m_scopeSampleMap;

There's a helper template for this: OwnPtr.

+ SAMPLING_TOOL_ENABLED

We have a specific style of macros -- USE, HAVE, etc. -- for #defines like this. I think "ENABLE(SAMPLING_TOOL)" would be most appropriate in this case. You can read more about these macros in wtf/Platform.h.

ScopeSampleRecord and SamplingTool should get their own files. Generally, we try to put each class in its own file.

+            ScopeSampleRecord* record = m_scopeSampleMap->get(codeBlock->ownerNode);
+            if (record)

You can move the declaration and assignment for "record" into the "if()". This has the benefit of reducing the number of lines of code, and guaranteeing that you cannot use the pointer if it is null (because it's only in scope if it is not null).

+    const LineCountInfo *leftLineCount = reinterpret_cast<const LineCountInfo *>(left);

Lines like this should put the "*" next to the type name, not one space away.
Comment 15 Gavin Barraclough 2008-07-23 15:41:22 PDT
Sending        JavaScriptCore/ChangeLog
Sending        JavaScriptCore/JavaScriptCore.exp
Sending        JavaScriptCore/VM/Machine.cpp
Sending        JavaScriptCore/VM/Machine.h
Sending        JavaScriptCore/VM/Opcode.cpp
Sending        JavaScriptCore/VM/Opcode.h
Sending        JavaScriptCore/kjs/Shell.cpp
Sending        JavaScriptCore/kjs/nodes.cpp
Transmitting file data ........
Committed revision 35305.