Bug 20645 - Duplicated constants
Summary: Duplicated constants
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: PC All
: P2 Normal
Assignee: Nobody
Depends on:
Reported: 2008-09-04 04:32 PDT by Judit Jász
Modified: 2008-09-21 13:35 PDT (History)
1 user (show)

See Also:

proposed patch (2.95 KB, patch)
2008-09-04 04:34 PDT, Judit Jász
ggaren: review-
Details | Formatted Diff | Diff
Modification of the solution according to the reviewer's comments. (3.52 KB, patch)
2008-09-05 05:43 PDT, Judit Jász
ggaren: review-
Details | Formatted Diff | Diff
Improvements of the earlier patch according to the comments. (6.33 KB, patch)
2008-09-19 07:40 PDT, Judit Jász
darin: review-
Details | Formatted Diff | Diff
Updating the patch to the actual revision. (3.49 KB, patch)
2008-09-21 12:29 PDT, Judit Jász
darin: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Judit Jász 2008-09-04 04:32:17 PDT
The CodeGenerator differentiates the constants with their JSValue* pointer instead of their contents. In this way many cases there are different constants with the same values. The memory consumption of a particular CodeBlock can be better if it would not duplicate the constants with the same values.
Comment 1 Judit Jász 2008-09-04 04:34:19 PDT
Created attachment 23166 [details]
proposed patch
Comment 2 Geoffrey Garen 2008-09-04 08:47:47 PDT
Comment on attachment 23166 [details]
proposed patch

Is the point of this patch just to unique numbers and strings within a codeblock? I think it is. For everything else, comparing JSValue* would be enough.

Rather than making a JSValue*, then later extracting and comparing the underlying data, I think we should just to add a version of emitLoad() for Identifier, and one for double, which did the unique-ing based on hashes of Identifier and double. 

That would eliminate a lot of complexity, and substantially improve the double hash function, which in this patch just truncates the double to unsigned. It would also reduce memory and GC pressure, in the case where you actually found a duplicate constant (since you would avoid allocating the duplicate just to discover that it was a duplicate).
Comment 3 Judit Jász 2008-09-05 05:43:16 PDT
Created attachment 23190 [details]
Modification of the solution according to the reviewer's comments.
Comment 4 Geoffrey Garen 2008-09-18 08:58:03 PDT
Thanks, Judit, this latest patch looks a lot better.

Three comments:

1. I'd like to improve the string map a bit. Right now, the typedef is 

+        typedef HashMap<RefPtr<UString::Rep>, JSValue*> StringMap;

We can get rid of the RefPtr in that typedef. The string is owned by the syntax tree, so there's no need for the code generator to ref and deref it.

Also, because we know that we're hashing Identifier strings, we can use IdentifierRepHash, which is optimized for the case where the string has already computed a hash code.

2. Though your patch didn't introduce this problem, it made me realize that emitLoad for UString and its use are a little weird:

+        RegisterID* emitLoad(RegisterID* dst, const UString&);

-    return generator.emitLoad(dst, jsOwnedString(generator.globalExec(), Identifier(generator.globalExec(), m_value).ustring()));
+    return generator.emitLoad(dst, m_value);

emitLoad is declared to operate on UStrings, but in fact it only wants to operate on Identifiers!

A more efficient solution would be to declare emitLoad to operate on an Identifier: "RegisterID* emitLoad(RegisterID* dst, const Identifier&);". And StringNode, instead of holding m_value as a UString, should hold m_value as an Identifier. That way, we won't have to make a temporary Identifier every time we encounter a string in the syntax tree.

3. A simpler way to write

+    NumberMap::iterator iterator = m_numberMap.find(d);
+    if (iterator != m_numberMap.end())
+        return emitLoad(dst, iterator->second);


if (JSValue* v = m_numberMap.get(d))
    return emitLoad(dst, v);

The same goes for emitLoad of a string.

I'm going to say r- because I think these would be good improvements to make, but overall this patch looks really good!
Comment 5 Judit Jász 2008-09-19 07:40:14 PDT
Created attachment 23565 [details]
Improvements of the earlier patch according  to the comments.
Comment 6 Darin Adler 2008-09-21 11:36:22 PDT
Comment on attachment 23565 [details]
Improvements of the earlier patch according  to the comments.

Patch looks good, but it needs to be merged with a change I made to StringNode. Sorry!
Comment 7 Judit Jász 2008-09-21 12:29:14 PDT
Created attachment 23630 [details]
Updating the patch to the actual revision.
Comment 8 Darin Adler 2008-09-21 13:26:55 PDT
Comment on attachment 23630 [details]
Updating the patch to the actual revision.

This looks good. Does it slow SunSpider down, or speed it up, or have no effect?

+        Reviewed by NOBODY (OOPS!).
+        Duplications of constants values of CodeBlocks are eliminated.

ChangeLog should include the bug number. It's also better if the list of files and functions below says what was done at each call site.

+    JSValue* number = NULL; 

This variable doesn't need to be initialized, because it's always set on the next line. Also, our coding style says we use 0 instead of NULL.

Further, this can be written in a way that doesn't hash twice, using more advanced features of the HashMap.

    pair<NumberMap::iterator, bool> addResult = m_numberMap.add(d, 0);
    JSValue* number;
    if (!addResult.second)
        number = addResult.first->second;
    else {
        number = jsNumber(globalExec(), d);
        addResult.first->second = number;
    return emitLoad(dst, number);

While the above code looks a bit more confusing, it does only one hash table lookup, while the code in your patch does two.

You can do the same thing for the string case.

+    JSValue* str = NULL; 

I'd suggest using "string" instead of "str".

+        typedef HashMap<UString::Rep*, JSValue*, IdentifierRepHash > IdentifierStringMap;

No need for a space before the ">" symbol.

I'm going to say r=me because my suggestions are probably optional; I think I'll make the improvements and then test performance.
Comment 9 Darin Adler 2008-09-21 13:35:38 PDT