Bug 31749 - ER: Add a few C-string alternatives to methods that take AtomicString
Summary: ER: Add a few C-string alternatives to methods that take AtomicString
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P3 Enhancement
Assignee: Jens Alfke
URL:
Keywords:
Depends on:
Blocks: 30187
  Show dependency treegraph
 
Reported: 2009-11-20 15:07 PST by Jens Alfke
Modified: 2009-12-02 12:47 PST (History)
5 users (show)

See Also:


Attachments
patch (24.98 KB, patch)
2009-11-20 15:08 PST, Jens Alfke
no flags Details | Formatted Diff | Diff
patch 2 (smaller) (15.31 KB, patch)
2009-11-23 14:59 PST, Jens Alfke
darin: review-
Details | Formatted Diff | Diff
patch 3 (15.54 KB, patch)
2009-11-30 15:12 PST, Jens Alfke
darin: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Jens Alfke 2009-11-20 15:07:59 PST
In preparation for disabling implicit char*-to-AtomicString conversions (bug 30187) I'm adding a few char* equivalents of commonly-used WebCore methods that take an AtomicString parameter:

HTTPHeaderMap::get
HTTPHeaderMap::contains
ResourceRequestBase::httpHeaderField
ResourceRequestBase::setHTTPHeaderField
ResourceResponseBase::httpHeaderField
QualifiedName::QualifiedName
Element::setCStringAttribute [different name to prevent accidental usage]

I've also added ResourceRequestBase::contentType, as there were a number of calls to httpHeaderField("Content-Type").

To make HTTP header lookup by C string more efficient, I ported the alternate-key-type support from HashSet into HashMap. This allows HTTPHeaderMap to look up keys by char* without having to create a temporary AtomicString.

I didn't similarly optimize _setting_ HTTP headers, because that doesn't occur nearly as much. The new setHTTPHeaderField(char*) is just a convenience, to avoid having to insert a bunch of occurrences of "AtomicString(...)" in calling code.

Similarly, Element::setCStringAttribute is only a convenience, for use in code that isn't performance-sensitive, for example ImageDocument.cpp. I did give it a different name so that people won't call it by accident or through laziness; in general it's better to create a static AtomicString instead of using a string literal.

Finally, I collected the C string literals for HTTP header names into a group of symbolic constants in the HTTPHeaders namespace (in HTTPHeaderMap.h). I originally did this to make them into AtomicStrings, but that didn't work out for thread-safety reasons; I still think it's valuable to do for code cleanliness, though.
Comment 1 Jens Alfke 2009-11-20 15:08:35 PST
Created attachment 43616 [details]
patch
Comment 2 Jens Alfke 2009-11-23 14:59:44 PST
Created attachment 43738 [details]
patch 2 (smaller)

I've removed some of the nonessential parts to make this patch smaller and easier to read.
Comment 3 Darin Adler 2009-11-30 11:59:09 PST
Comment on attachment 43738 [details]
patch 2 (smaller)

> +void Element::setCStringAttribute(const QualifiedName& name, const char* cStringValue) {
> +    ExceptionCode ec;
> +    setAttribute(name, AtomicString(cStringValue), ec);
> +}

Brace needs to go on the next line.

> +    // Please don't use setCStringAttribute in performance-sensitive code;
> +    // use a static AtomicString value instead to avoid the conversion overhead.
> +    void setCStringAttribute(const QualifiedName&, const char* cStringValue);

I am not entirely comfortable with adding this and then advising people not to use it. Also, is there a reason we do this with a separate name and not with overloading?

> -QualifiedName::QualifiedName(const AtomicString& p, const AtomicString& l, const AtomicString& n)
> +void QualifiedName::init(const AtomicString& p, const AtomicString& l, const AtomicString& n)
>  {
>      if (!gNameCache)
>          gNameCache = new QNameSet;
> @@ -62,6 +62,16 @@ QualifiedName::QualifiedName(const Atomi
>          m_impl->ref();
>  }

I think this could be marked inline so we don't do an extra function call just to get in; or maybe the function call is worth it. If we do inline, we would probably get better code if we put the line of code that creates the QNameSet into a separate non-inline function.

    if (!gNameCache)
        createNameCache();

Something like that.

>      QualifiedName(const AtomicString& prefix, const AtomicString& localName, const AtomicString& namespaceURI);
> +    QualifiedName(const AtomicString& prefix, const char* localName, const AtomicString& namespaceURI);

Does this need advice about when to use it and when to not use it?

> +// Magic adapter that allows the HashMap to take C strings as keys.

Not a big fan of the word "Magic" in this comment.

> +    static unsigned hash(const char* c)
> +    {
> +        return CaseFoldingHash::hash(c);
> +    }

The use of "c" as the argument name here is a bit strange. I would name it string.

> +    static bool equal(AtomicString key, const char* s)

I think this needs to take const AtomicString& to avoid causing unnecessary reference count thrash.

> +        return equalIgnoringCase(key,s);

We use spaces after commas, so "key" needs a space after it.

> +    static void translate(AtomicString& location, const char* const& c, unsigned /*hash*/)
> +    {
> +        location = AtomicString(c);
> +    }

We should make sure we have a member function on AtomicString so we don't have to construct a temporary AtomicString just to do this operation.

I think the "c" argument can just be const char* and doesn't need to be const char* const&. And also it's a strange name.

> +String HTTPHeaderMap::get(const char* name) const
> +{
> +    const_iterator i = find<const char*, CaseFoldingCStringTranslator>(name);
> +    if (i == end())
> +        return String();
> +    else 
> +        return i->second;
> +}

We normally do not do else after return.

> +bool HTTPHeaderMap::contains(const char* name) const {

Brace needs to be on the next line.

> +pair<HTTPHeaderMap::iterator, bool> HTTPHeaderMap::add(const char *name, String value)

The "*" for const char* needs to be next to "char". The argument should be const String& instead of String to avoid causing unnecessary reference count thrash.

> +        String get(const AtomicString &name) const

The "&" should be moved next to the AtomicString.

> +        pair<iterator, bool> add(AtomicString name, String value)

This should take a const AtomicString& and a const String&.

> +        pair<iterator, bool> add(const char *name, String value);

The "*" should be moved next to the char. This should take a const String&.

> +        void setHTTPHeaderField(const char* name, const String& value) { setHTTPHeaderField(AtomicString(name), value); }

Could we put this out of line?

I think at least a few of the suggestions above are mandatory, so review-.
Comment 4 Jens Alfke 2009-11-30 15:12:32 PST
>I am not entirely comfortable with adding this and then advising people not to
>use it. 

It's not that people shouldn't use it; it's a trade-off of performance vs (convenience + code size). In a lot of places it probably doesn't matter; the comment just says not to use it in performance-sensitive code.

>Also, is there a reason we do this with a separate name and not with overloading?

To make the choice explicit by the caller, which is the theme of this series of patches. It's very easy to type setAttribute("foo",...) without realizing the conversion that's going to be incurred, so this makes the caller think about it.

>I think this could be marked inline so we don't do an extra function call just
>to get in; or maybe the function call is worth it.

The constructor is called in a lot of places in the code to create all the standard identifiers, so I'd rather not inline it. What happens instead is that in the .cpp file the init() method gets inlined into both constructors, so I don't think there is any net effect on performance by this change.

>Does this need advice about when to use it and when to not use it?

In theory, but I don't think there are any situations where we create QualifiedNames in performance-critical code.

>Not a big fan of the word "Magic" in this comment.

I'll change it to "Sorcerous", or maybe "Thaumaturgical". ;-)

>The use of "c" as the argument name here is a bit strange.
>I think the "c" argument can just be const char* and doesn't need to be const
>char* const&. And also it's a strange name.

Fixed. (BTW, those lines are directly copied from AtomicString.cpp, lines 49 and 66).

I've fixed the other things you pointed out and will upload a new patch.
Comment 5 Jens Alfke 2009-11-30 15:12:57 PST
Created attachment 44043 [details]
patch 3
Comment 6 Adam Barth 2009-11-30 15:14:35 PST
Attachment 44043 [details] did not pass style-queue:

Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1
Done processing WebCore/platform/network/HTTPHeaderMap.cpp
JavaScriptCore/wtf/HashMap.h:127:  Code inside a namespace should not be indented.  [whitespace/indent] [4]
JavaScriptCore/wtf/HashMap.h:215:  Code inside a namespace should not be indented.  [whitespace/indent] [4]
JavaScriptCore/wtf/HashMap.h:224:  Code inside a namespace should not be indented.  [whitespace/indent] [4]
JavaScriptCore/wtf/HashMap.h:233:  Code inside a namespace should not be indented.  [whitespace/indent] [4]
JavaScriptCore/wtf/HashMap.h:262:  Code inside a namespace should not be indented.  [whitespace/indent] [4]
Done processing JavaScriptCore/wtf/HashMap.h
Done processing WebCore/platform/network/ResourceRequestBase.cpp
Done processing WebCore/dom/Element.h
Done processing WebCore/dom/QualifiedName.h
Done processing WebCore/platform/network/ResourceResponseBase.cpp
Done processing WebCore/platform/network/HTTPHeaderMap.h
Done processing WebCore/platform/network/ResourceResponseBase.h
Done processing WebCore/dom/Element.cpp
Done processing WebCore/dom/QualifiedName.cpp
Done processing WebCore/platform/network/ResourceRequestBase.h
Total errors found: 5
Comment 7 Jens Alfke 2009-11-30 15:34:57 PST
>Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1
>Done processing WebCore/platform/network/HTTPHeaderMap.cpp
>JavaScriptCore/wtf/HashMap.h:127:  Code inside a namespace should not be
>indented.  [whitespace/indent] [4]

HashMap.h was already indented. All I did was make my changes follow the indentation of the surrounding code.

It might be a better idea to hold off on automated style nags until you can get top-of-tree to pass without any warnings?
Comment 8 Darin Adler 2009-12-01 10:29:54 PST
(In reply to comment #4)
> >I think this could be marked inline so we don't do an extra function call just
> >to get in; or maybe the function call is worth it.
> 
> The constructor is called in a lot of places in the code to create all the
> standard identifiers, so I'd rather not inline it. What happens instead is that
> in the .cpp file the init() method gets inlined into both constructors, so I
> don't think there is any net effect on performance by this change.

My comment was about the init function. It's not marked inline, so I don't think it will get inlined into the constructors in as many situations as it would if it was marked inline.
Comment 9 Darin Adler 2009-12-01 10:37:34 PST
(In reply to comment #4)
> >Also, is there a reason we do this with a separate name and not with overloading?
> 
> To make the choice explicit by the caller, which is the theme of this series of
> patches. It's very easy to type setAttribute("foo",...) without realizing the
> conversion that's going to be incurred, so this makes the caller think about
> it.

I think the same issue applies to the QualifiedName constructor, but there we are using overloading. And ResourceRequestBase::setHTTPHeaderField. Are there different circumstances that make that the right tradeoff in those cases?
Comment 10 Darin Adler 2009-12-01 10:39:13 PST
Comment on attachment 44043 [details]
patch 3

> +        // An alternate version of add() that finds the object by hashing and comparing
> +        // with some other type, to avoid the cost of type conversion if the object is already
> +        // in the table. HashTranslator must have the following methods:

Better terminology is probably "function members", not "methods", as in the previous comment.

> +void QualifiedName::init(const AtomicString& p, const AtomicString& l, const AtomicString& n)

I think we should consider marking this member function inline so it gets inlined into the constructors. Not sure it's needed, but maybe it would be better than the extra level of function call. I raised this issue last time, but I think you may have misunderstood my comment.

> +String HTTPHeaderMap::get(const char* name) const
> +{
> +    const_iterator i = find<const char*, CaseFoldingCStringTranslator>(name);
> +    if (i == end())
> +        return String();
> +    return i->second;
> +}

Should we offer a HashMap::get that works with a custom key translator? If we had one we could use it here.

> +        String get(const AtomicString &name) const

Wrong position here for the "&" character. It should be by the type name. I wonder why that style checker script didn't notice.

> +        pair<iterator, bool> add(const AtomicString &name, const String &value)

And here.

> +        // Alternate accessors that are faster than converting the char* to AtomicString first

Need a period here.

> +        pair<iterator, bool> add(const char* name, const String &value);

Wrong position here for the "&" character. It should be by the type name.

It's probably not good that the HTTPHeaderMap::get function takes an const AtomicString& rather than a const String&. If the caller has a String but not an AtomicString, it will make things slower with no benefit. Similar issue, but debatable with the HTTPHeaderMap::add function -- tradeoff is slightly different because the string needs to be made into an AtomicString if it's a new key. I see no downside to taking a const String& in the get function, though.

r=me, but I think you could do some further refinement here
Comment 11 Jens Alfke 2009-12-01 13:53:28 PST
>Should we offer a HashMap::get that works with a custom key translator? If we
>had one we could use it here.

I thought about it, but there wasn't an equivalent in HashSet's translated-key code that I could crib from, and I don't trust my template fu enough to write one. Maybe Maciej could look at that (I never heard back from him about this patch.)

>I think we should consider marking this member function inline so it gets
>inlined into the constructors.

It already does get inlined by the -O3 optimizer (I disassembled to make sure.) IMHO I'd rather leave it up to the optimizer to make the decision in this case, esp. since I can imagine some WebKit clients (like mobile devices) may want different size/speed tradeoffs.

>Wrong position here for the "&" character.

Ack! I'm sorry I keep messing these up. I swear to god I pored over my last patch to make sure I hadn't left any punctuation in the ri^H^Hwrong place, but I was clearly being blind to ampersands. I'll fix those, and the comments too.

>It's probably not good that the HTTPHeaderMap::get function takes an const
>AtomicString& rather than a const String&. If the caller has a String but not
>an AtomicString, it will make things slower with no benefit.

Are you suggesting to implement get(String) by writing another alternate-key adapter? Or do you think the class declaration itself should change to be based on a HashMap<String,String>?
Comment 12 Jens Alfke 2009-12-01 16:07:21 PST
Committed r51566 (incorporating spelling and comment fixes.)
Comment 13 Darin Adler 2009-12-01 17:56:47 PST
(In reply to comment #11)
> It already does get inlined by the -O3 optimizer (I disassembled to make sure.)

I don't think the Mac OS X WebKit is built with -O3.
Comment 14 Mark Rowe (bdash) 2009-12-01 18:15:08 PST
(In reply to comment #13)
> (In reply to comment #11)
> > It already does get inlined by the -O3 optimizer (I disassembled to make sure.)
> 
> I don't think the Mac OS X WebKit is built with -O3.

From WebCore/Configurations/Base.xcconfig:
GCC_OPTIMIZATION_LEVEL_normal = 2;
GCC_OPTIMIZATION_LEVEL_debug = 0;
Comment 15 Jens Alfke 2009-12-02 12:47:55 PST
Ah, good point Mark. I was disassembling in Chromium's WebCore.xcodeproj, not Safari's.
I tried this with Safari's WebCore project, and the init method is not inlined. Sorry about that.

Whether this is worthwhile depends on how many QualifiedName objects get constructed. There's one instance per unique HTML/XML tag, i.e. a couple of hundred at most. Shaving a couple of nanoseconds off of each constructor doesn't seem like it would make any difference. You can file a new bug on this if you disagree, though.