Bug 8049 - StringImpl hash traits deleted value creates an init routine for WebCore
Summary: StringImpl hash traits deleted value creates an init routine for WebCore
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKit Misc. (show other bugs)
Version: 420+
Hardware: Mac OS X 10.4
: P1 Major
Assignee: Darin Adler
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2006-03-29 04:54 PST by Darin Adler
Modified: 2006-04-05 14:29 PDT (History)
1 user (show)

See Also:


Attachments
patch with detailed change log (layout tests all pass) (129.18 KB, patch)
2006-03-29 06:01 PST, Darin Adler
mjs: review-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Darin Adler 2006-03-29 04:54:20 PST
WebCore has an global initializer because of the deleted value for StringImpl hash tables. This needs to be fixed.
Comment 1 Darin Adler 2006-03-29 04:54:46 PST
<rdar://problem/4442248> REGRESSION: WebCore has init routines
Comment 2 Darin Adler 2006-03-29 06:01:02 PST
Created attachment 7370 [details]
patch with detailed change log (layout tests all pass)
Comment 3 Maciej Stachowiak 2006-03-31 02:22:24 PST
WebCore portion reviewed. Now moving on to the JavaScriptCore part.
Comment 4 Maciej Stachowiak 2006-03-31 02:54:57 PST
+        const ValueType& operator*() const { return *(const ValueType*)&*m_impl; }
+        const ValueType* operator->() const { return &**this; }

Maybe iterators should have a get() method? That would make these a bit easier to understand. I had to think about the operator-> for a while before I got it. Wouldn't it actually be simpler to write operator* in terms of operator-> rather than vice versa? (although I guess you'd have to call it explicitly).

> RefCountManager

May I suggest RefCounter as the base name for this family of classes? I distrust classes with Manager in the name.

+    template<typename T, typename U, typename V>
+    inline typename HashSet<T, U, V>::iterator HashSet<T, U, V>::begin()

I'm not sure I dig this T, U, V[, W, X] convention. Why do this instead of making the template parameter names explicit? It seems like this could make it harder to figure out what's what in error messages.

-        typedef HashSet<Value, HashFunctions, Traits> T;
-        typename T::iterator end = collection.end();
-        for (typename T::iterator it = collection.begin(); it != end; ++it)
-            delete (*it);
+        typedef typename HashTableType::iterator iterator;
+        iterator end = collection.end();
+        for (iterator it = collection.begin(); it != end; ++it)
+            delete *(ValueType*)&*it;

Not sure why this change - wouldn't using the HashSet's own iterator still work? The *)&* is a bit perlish.


+    template<> struct IntTypes<1> { typedef signed char SignedType; typedef unsigned char UnsignedType; };
+    template<> struct IntTypes<2> { typedef short SignedType; typedef unsigned short UnsignedType; };
+    template<> struct IntTypes<4> { typedef int SignedType; typedef unsigned UnsignedType; };
+    template<> struct IntTypes<8> { typedef long long SignedType; typedef unsigned long long UnsignedType; };

Here and in the intHash function below, wouldn't it be better to use int8_t, uint32_t, etc, to be explicit about the size of the type? Seems like the types are being used here in a way that is mainly about their exact size, so that would be more expressive IMO.

------------

In general, I don't think the inlining is being done at the right level to get good code sharing out of this.  Ideally, every HashSet or HashMap method would be inline and would call some non-inline HashTable method (or inline if sufficiently trivial) while at the same time HashMap had inline versions for its own internal use. To cite an example I think is wrong:

HashSet::add - non-inline
HashTable::add - inline
HashMap::add - non-inline

It should be the other way around now for good code sharing, since HashMap and HashSet have the true key / value types embedded in their type signature, but HashTable only knows about the storage types.

Otherwise, this looks like a really good change.

Most of these are nitpicks, but I think this needs performance testing and the inlining issue needs to be addressed. One thing to be wary of is that some HashTable methods are both called internally by HashTable and from HashMap and HashSet, these may need both inline and non-inline variants.

r- for this and the performance testing, but call it an r+ if you address these issues.

Comment 5 Darin Adler 2006-03-31 08:27:26 PST
(In reply to comment #4)
> +        const ValueType& operator*() const { return *(const ValueType*)&*m_impl; }
> +        const ValueType* operator->() const { return &**this; }
> 
> Maybe iterators should have a get() method? That would make these a bit easier
> to understand.

Sure that'd be fine.

> Wouldn't it actually be simpler to write operator* in terms of operator->
> rather than vice versa? (although I guess you'd have to call it explicitly).

Nah, I had it that way at first, but I like this way better. It seems clear that a get that returns a pointer would be the best way to make this readable.

> > RefCountManager
> 
> May I suggest RefCounter as the base name for this family of classes? I
> distrust classes with Manager in the name.

OK. I don't think these templates are really "counters" but I like your name anyway.

> +    template<typename T, typename U, typename V>
> +    inline typename HashSet<T, U, V>::iterator HashSet<T, U, V>::begin()
> 
> I'm not sure I dig this T, U, V[, W, X] convention. Why do this instead of
> making the template parameter names explicit? It seems like this could make it
> harder to figure out what's what in error messages.

I didn't experience any problems with error messages (and I saw quite a few while working on this).

With the full names being repeated over and over again, I find the code very hard to read. The terse names are great when the template parameters aren't even being used.

But it's a matter of taste and I'd be happy to put long names back for everything.

> -        typedef HashSet<Value, HashFunctions, Traits> T;
> -        typename T::iterator end = collection.end();
> -        for (typename T::iterator it = collection.begin(); it != end; ++it)
> -            delete (*it);
> +        typedef typename HashTableType::iterator iterator;
> +        iterator end = collection.end();
> +        for (iterator it = collection.begin(); it != end; ++it)
> +            delete *(ValueType*)&*it;
> 
> Not sure why this change - wouldn't using the HashSet's own iterator still
> work? The *)&* is a bit perlish.

I think it would work. I'll make the change.

This code went through many iterations and I guess it's now back where we started. I still think my approach of using a typedef for the iterator rather than for T is nicer so I'll probably do that.

> +    template<> struct IntTypes<1> { typedef signed char SignedType; typedef
> unsigned char UnsignedType; };
> +    template<> struct IntTypes<2> { typedef short SignedType; typedef unsigned
> short UnsignedType; };
> +    template<> struct IntTypes<4> { typedef int SignedType; typedef unsigned
> UnsignedType; };
> +    template<> struct IntTypes<8> { typedef long long SignedType; typedef
> unsigned long long UnsignedType; };
> 
> Here and in the intHash function below, wouldn't it be better to use int8_t,
> uint32_t, etc, to be explicit about the size of the type?

Yes, I think it would.

> In general, I don't think the inlining is being done at the right level to get
> good code sharing out of this.  Ideally, every HashSet or HashMap method would
> be inline and would call some non-inline HashTable method (or inline if
> sufficiently trivial) while at the same time HashMap had inline versions for
> its own internal use.

I agree.

> To cite an example I think is wrong:
> 
> HashSet::add - non-inline
> HashTable::add - inline
> HashMap::add - non-inline
> 
> It should be the other way around now for good code sharing, since HashMap and
> HashSet have the true key / value types embedded in their type signature, but
> HashTable only knows about the storage types.

You picked the one example that's different from the others, and for this one I believe you are mistaken.

The HashTable::add used here is a template that takes an additional parameter for the translator. Because of that, I don't think we can get any code reuse, so it makes sense of the HashTable::add to be inline and the HashSet and HashMap ones not to be.

But the principle seems clearly correct for almost every other function.