Bug 136184

Summary: Remove unnecessary WTFString creation in Dictionary getter
Product: WebKit Reporter: Joseph Pecoraro <joepeck>
Component: WebCore Misc.Assignee: Joseph Pecoraro <joepeck>
Status: RESOLVED FIXED    
Severity: Normal CC: benjamin, commit-queue, darin, joepeck
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
[PATCH] Proposed Fix
none
[PATCH] Proposed Fix - should apply none

Description Joseph Pecoraro 2014-08-22 16:48:12 PDT
Dictionary::getWithUndefinedOrNullCheck was taking a WTFString but could just take a string literal. Its only caller uses a literal, and the rest of the code expects literals.
Comment 1 Joseph Pecoraro 2014-08-22 16:49:22 PDT
Created attachment 237012 [details]
[PATCH] Proposed Fix
Comment 2 Joseph Pecoraro 2014-08-22 16:50:11 PDT
Created attachment 237013 [details]
[PATCH] Proposed Fix - should apply
Comment 3 Benjamin Poulain 2014-08-22 17:12:45 PDT
Comment on attachment 237013 [details]
[PATCH] Proposed Fix - should apply

View in context: https://bugs.webkit.org/attachment.cgi?id=237013&action=review

> Source/WebCore/bindings/js/JSDictionary.cpp:272
> +    if (tryGetProperty(propertyName, value) != PropertyFound || value.isUndefinedOrNull())

lol!
Comment 4 WebKit Commit Bot 2014-08-22 17:43:39 PDT
Comment on attachment 237013 [details]
[PATCH] Proposed Fix - should apply

Clearing flags on attachment: 237013

Committed r172876: <http://trac.webkit.org/changeset/172876>
Comment 5 WebKit Commit Bot 2014-08-22 17:43:41 PDT
All reviewed patches have been landed.  Closing bug.
Comment 6 Darin Adler 2014-08-24 11:57:41 PDT
Comment on attachment 237013 [details]
[PATCH] Proposed Fix - should apply

View in context: https://bugs.webkit.org/attachment.cgi?id=237013&action=review

>> Source/WebCore/bindings/js/JSDictionary.cpp:272
>> +    if (tryGetProperty(propertyName, value) != PropertyFound || value.isUndefinedOrNull())
> 
> lol!

Great change. But are any callers passing in Latin-1 strings rather than UTF-8 strings? Nothing in const char* makes it clear whether the strings must be ASCII, or can be Latin-1 or UTF-8.
Comment 7 Joseph Pecoraro 2014-08-25 11:41:16 PDT
(In reply to comment #6)
> (From update of attachment 237013 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=237013&action=review
> 
> >> Source/WebCore/bindings/js/JSDictionary.cpp:272
> >> +    if (tryGetProperty(propertyName, value) != PropertyFound || value.isUndefinedOrNull())
> > 
> > lol!
> 
> Great change. But are any callers passing in Latin-1 strings rather than UTF-8 strings? Nothing in const char* makes it clear whether the strings must be ASCII, or can be Latin-1 or UTF-8.

There is only one caller passing an ASCII string:

    options.getWithUndefinedOrNullCheck("keyPath", keyPathString)

The choice of "const char*" was to match the rest of the APIs in JSDictionary. Namely:

>    bool tryGetProperty(const char* propertyName, Result&) const;
>    bool get(const char* propertyName, Result&) const;

You've given a similar comment in a few patches and I think I'm missing the point you're making. Are you suggesting we should clarify the API by having it take an LChar*/UChar*? I still think in cases where a string literal is given (like the above) we will need a "const char*" method casting and forwarding to LChar*/UChar*. I'm not sure if any compile time checking is gained.
Comment 8 Darin Adler 2014-08-25 12:17:20 PDT
(In reply to comment #7)
> Are you suggesting we should clarify the API by having it take an LChar*/UChar*?

No.

I’ll come talk to you in person about this some time. My problem with functions that take const char* is that any function we have that takes a const char* without a length:

1) Has to do a strlen at runtime, can’t fold it away at compile time. But I realized that ASCIILiteral has the same issue, so not sure what I was all worked up about.

And even if it has a length:

2) It is ambiguous about how it handles non-ASCII characters. char* might mean Latin-1 or UTF-8 or it might be an ASCII-only function. It’s not great to have functions where that is ambiguous, and it’s not great to add more. It’s true that LChar* always means Latin-1, but it’s not suitable for use with literals because you have to cast every time you use it.
Comment 9 Joseph Pecoraro 2014-08-25 15:36:38 PDT
(In reply to comment #8)
> (In reply to comment #7)
> > Are you suggesting we should clarify the API by having it take an LChar*/UChar*?
> 
> No.
> 
> I’ll come talk to you in person about this some time. My problem with functions that take const char* is that any function we have that takes a const char* without a length:
> 
> 1) Has to do a strlen at runtime, can’t fold it away at compile time. But I realized that ASCIILiteral has the same issue, so not sure what I was all worked up about.

Well, we may still be able to eliminate that using ASCIILiteral.

That was going to be my next step. Give ASCIILiteral the constructor that knows the length at compile time, then check if there is a performance improvement. After talking with Ben, this might be worth doing on most architectures. I think the reason we originally avoided it with ASCIILiteral was on older ARM architectures the two argument constructor was hurting more then helping.

The change may look something like this (completely untested):

>  class ASCIILiteral {
>  public:
> -    explicit ASCIILiteral(const char* characters) : m_characters(characters) { }
> +    template<unsigned charactersCount>
> +    explicit ASCIILiteral(const char (&characters)[charactersCount])
> +        : m_characters(characters)
> +        , m_length(charactersCount) { }
>
>      operator const char*() { return m_characters; }
> +    unsigned length() const { return m_length; }
>  
>  private:
>      const char* m_characters;
> +    unsigned m_length;     
>  };

Then modifying WTF::String(ASCIILiteral) constructor to avoid strlen():

> -String::String(ASCIILiteral characters)
> -    : m_impl(StringImpl::createFromLiteral(characters))
> -{
> -}
>
> +String::String(ASCIILiteral literal)
> +    : m_impl(StringImpl::createFromLiteral(literal, literal.length()))
> +{
> +}

Or just removing ASCIILiteral in favor of the String(const char (&characters)[charactersCount]) constructor. Each of these approaches could be #ifdef'd based on the port if this really is still an issue on a particular architecture.

I believe Ben wanted to take this even further and somehow have an optimized constructor passing the string and length in a single pointer. So we still get the one argument constructor but get all the data we need without strlen().


> And even if it has a length:
> 
> 2) It is ambiguous about how it handles non-ASCII characters. char* might mean Latin-1 or UTF-8 or it might be an ASCII-only function. It’s not great to have functions where that is ambiguous, and it’s not great to add more. It’s true that LChar* always means Latin-1, but it’s not suitable for use with literals because you have to cast every time you use it.

I see. Any ideas on how to solve this? One thing I like about "ASCIILiteral(...)" is that we pretty much know the contents are ASCII. We could have a LATIN1("...") macro that does the cast for us and is slightly more pleasing to the eye then a large static_cast, but I tend to prefer being explicit where possible.