Bug 170925 - Using StringView.split() instead of String.split() in some places
Summary: Using StringView.split() instead of String.split() in some places
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: WebKit Local Build
Hardware: All All
: P2 Normal
Assignee: Daniel Bates
URL:
Keywords:
Depends on: 171396 171405
Blocks:
  Show dependency treegraph
 
Reported: 2017-04-17 17:08 PDT by Daniel Bates
Modified: 2017-05-02 16:15 PDT (History)
4 users (show)

See Also:


Attachments
Patch (14.72 KB, patch)
2017-04-17 17:09 PDT, Daniel Bates
no flags Details | Formatted Diff | Diff
Patch (23.60 KB, patch)
2017-04-18 12:22 PDT, Daniel Bates
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ews116 for mac-elcapitan (1.81 MB, application/zip)
2017-04-18 14:35 PDT, Build Bot
no flags Details
Patch (23.78 KB, patch)
2017-04-18 16:38 PDT, Daniel Bates
no flags Details | Formatted Diff | Diff
Patch (26.58 KB, patch)
2017-04-28 17:22 PDT, Daniel Bates
darin: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Daniel Bates 2017-04-17 17:08:49 PDT
Use StringView.split() instead of String.split() in some places to avoid intermediary allocation of substrings.
Comment 1 Daniel Bates 2017-04-17 17:09:52 PDT
Created attachment 307323 [details]
Patch
Comment 2 Sam Weinig 2017-04-17 18:42:37 PDT
Comment on attachment 307323 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=307323&action=review

> Source/WebCore/accessibility/AccessibilityObject.cpp:2241
> +    for (StringView roleName : StringView(value).split(' ')) {

I would use auto instead of StringView here.

> Source/WebCore/accessibility/AccessibilityObject.cpp:2242
> +        if (AccessibilityRole role = ariaRoleMap().get(roleName.toStringWithoutCopying()))

Seems unfortunate that you have to allocate a String here (though it's good you don't have to copy the characters.  Seems like a good place to use a HashTranslator.

> Source/WebCore/accessibility/AccessibilityObject.cpp:3079
> +    for (StringView idName : StringView(idList).split(' ')) {

Same, I would use auto.

> Source/WebCore/accessibility/AccessibilityObject.cpp:3080
> +        if (Element* idElement = treeScope.getElementById(idName.toStringWithoutCopying()))

I think here you want toAtomicString(), as getElementById will probably go ahead and do that if you don't.

> Source/WebCore/html/parser/XSSAuditor.cpp:245
> +        if (protocolIsJavaScript(value.toStringWithoutCopying()))

Seems like protocolIsJavaScript should have an overload that takes a StringView.
Comment 3 Daniel Bates 2017-04-18 12:22:52 PDT
Created attachment 307402 [details]
Patch

Updated patch based on feedback from Sam Weinig.
Comment 4 Build Bot 2017-04-18 14:35:07 PDT
Comment on attachment 307402 [details]
Patch

Attachment 307402 [details] did not pass mac-debug-ews (mac):
Output: http://webkit-queues.webkit.org/results/3558947

New failing tests:
http/tests/inspector/network/resource-sizes-network.html
Comment 5 Build Bot 2017-04-18 14:35:09 PDT
Created attachment 307418 [details]
Archive of layout-test-results from ews116 for mac-elcapitan

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews116  Port: mac-elcapitan  Platform: Mac OS X 10.11.6
Comment 6 Daniel Bates 2017-04-18 16:38:53 PDT
Created attachment 307434 [details]
Patch

Fix the GTK EWS; rename the parameter to WebCore::semicolonSeparatedValueContainsJavaScriptURL() so that its name does not collide with the name of the for-loop variable.
Comment 7 Darin Adler 2017-04-18 18:43:51 PDT
Comment on attachment 307434 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=307434&action=review

> Source/WTF/wtf/HashMap.h:301
> +    KeyValuePairType* entry = const_cast<HashTableType&>(m_impl).template lookup<HashMapTranslatorAdapter<KeyValuePairTraits, HashTranslator>>(value);

Seems like this should be auto or auto* instead of KeyValuePairType*.

> Source/WTF/wtf/HashMap.h:406
> +    return get<typename HashTableType::IdentityTranslatorType>(key);

Why doesn’t this use IdentityTranslatorType instead of typename HashTableType::IdentityTranslatorType?

> Source/WebCore/accessibility/AccessibilityObject.cpp:2242
> +        if (AccessibilityRole role = ariaRoleMap().get<typename ARIARoleMap::IdentityTranslatorType>(roleName)) // Identity translator supports StringView; get() does not.

That is so peculiar. Even with the comment it’s so non-obvious that an "identity" translator would allow things that the get function would not allow itself. The comment just makes me wonder why! I wonder if there is some more elegant way to handle this. Or some clearer way to explain why explicit use of the identity translator makes this work.

> Source/WebCore/accessibility/AccessibilityObject.cpp:3080
> +        if (Element* idElement = treeScope.getElementById(idName.toAtomicString()))

Could consider optimizing this by using toExistingAtomicString; if that function exists and we would also have to make sure null string is handled properly.

Could also use auto* here instead of Element*.

> Source/WebCore/platform/URL.cpp:1149
> +    return protocolIs(url.toStringWithoutCopying(), "javascript");

This is an inefficient algorithm, allocating memory every time. We can do this much more efficiently by either copying and pasting the body of the protocolIs(const String&, const char*) function or using templates to make a StringView version.

Also there is a very strange FIXME above protocolIs(const String&, const char*) that says: "Why is this different than protocolIs(StringView, const char*)?' That seems to imply that there was a protocolIs function that took a StringView at one time. Perhaps it was deleted? The FIXME should go.

> Source/WebCore/platform/graphics/avfoundation/CDMPrivateMediaSourceAVFObjC.mm:107
> +    StringView keySystem = m_cdm->keySystem();

This looks unsafe. What type does m_cdm->keySystem() return? If it returns String then this is unsafe. If it returns const String& then this is safe. I worry that someone will change it from returning const String& to returning String and then make this code invalid. Hard to write this so it’s obviously safe. Think of StringView like a raw pointer. There’s always the question of why the StringView is still good after you initialize it from the result of an expression that might involve a temporary String.

Generally we should avoid local variables of type StringView that are initialized from a temporary String or look like they might be. Arguments of type StringView don’t have this problem, many cases of local variable StringView that are not initialized from temporary Strings are fine, and various other idioms are OK.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:1476
> +        auto segments = StringView(m_fontFaceElement->attributeWithoutSynchronization(SVGNames::font_weightAttr)).split(' ');
> +        for (auto segment : segments) {

I would have written this;

    auto& weight = m_fontFaceElement->attributeWithoutSynchronization(SVGNames::font_weightAttr);
    for (auto segment: StringView(weight).split(' ')) {

Makes the object lifetimes more obviously correct. Otherwise, you have the puzzle of why it’s OK for the StringView to outlive the result of attributeWithoutSynchronization.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:1489
> +        segments = StringView(m_fontFaceElement->attributeWithoutSynchronization(SVGNames::font_styleAttr)).split(' ');
> +        for (auto segment : segments) {

Same issue here.

> Source/WebCore/testing/Internals.cpp:3509
> +void Internals::setPageMuted(const String& statesString)

I think you can just change the argument type to StringView instead.
Comment 8 Sam Weinig 2017-04-18 19:08:04 PDT
(In reply to Darin Adler from comment #7)
> Comment on attachment 307434 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=307434&action=review

> 
> > Source/WTF/wtf/HashMap.h:406
> > +    return get<typename HashTableType::IdentityTranslatorType>(key);
> 
> Why doesn’t this use IdentityTranslatorType instead of typename
> HashTableType::IdentityTranslatorType?
> 
> > Source/WebCore/accessibility/AccessibilityObject.cpp:2242
> > +        if (AccessibilityRole role = ariaRoleMap().get<typename ARIARoleMap::IdentityTranslatorType>(roleName)) // Identity translator supports StringView; get() does not.
> 
> That is so peculiar. Even with the comment it’s so non-obvious that an
> "identity" translator would allow things that the get function would not
> allow itself. The comment just makes me wonder why! I wonder if there is
> some more elegant way to handle this. Or some clearer way to explain why
> explicit use of the identity translator makes this work.

One option, though I am not 100% sure this would work, would be to change HashMap::get from:

    MappedPeekType get(const KeyType&) const;

to

    template<typename K>
    MappedPeekType get(K&&) const;

And make get call m_impl.lookup<IdentityTranslator>(std::forward<K>(k)) (or some such). I would probably make sense to also do something like static_assert that HashTraits::hash(k) is invokable, to make errors slightly more clear.
Comment 9 Sam Weinig 2017-04-18 19:13:32 PDT
(In reply to Sam Weinig from comment #8)
> (In reply to Darin Adler from comment #7)
> > Comment on attachment 307434 [details]
> > Patch
> > 
> > View in context:
> > https://bugs.webkit.org/attachment.cgi?id=307434&action=review
> 
> > 
> > > Source/WTF/wtf/HashMap.h:406
> > > +    return get<typename HashTableType::IdentityTranslatorType>(key);
> > 
> > Why doesn’t this use IdentityTranslatorType instead of typename
> > HashTableType::IdentityTranslatorType?
> > 
> > > Source/WebCore/accessibility/AccessibilityObject.cpp:2242
> > > +        if (AccessibilityRole role = ariaRoleMap().get<typename ARIARoleMap::IdentityTranslatorType>(roleName)) // Identity translator supports StringView; get() does not.
> > 
> > That is so peculiar. Even with the comment it’s so non-obvious that an
> > "identity" translator would allow things that the get function would not
> > allow itself. The comment just makes me wonder why! I wonder if there is
> > some more elegant way to handle this. Or some clearer way to explain why
> > explicit use of the identity translator makes this work.
> 
> One option, though I am not 100% sure this would work, would be to change
> HashMap::get from:
> 
>     MappedPeekType get(const KeyType&) const;
> 
> to
> 
>     template<typename K>
>     MappedPeekType get(K&&) const;
> 
> And make get call m_impl.lookup<IdentityTranslator>(std::forward<K>(k)) (or
> some such). I would probably make sense to also do something like
> static_assert that HashTraits::hash(k) is invokable, to make errors slightly
> more clear.

Alternatively, writing a specific Translator just for StringView might be the most straightforward and clear way to go.
Comment 10 Daniel Bates 2017-04-19 11:58:20 PDT
(In reply to Darin Adler from comment #7)
> Comment on attachment 307434 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=307434&action=review
> 
> > Source/WTF/wtf/HashMap.h:301
> > +    KeyValuePairType* entry = const_cast<HashTableType&>(m_impl).template lookup<HashMapTranslatorAdapter<KeyValuePairTraits, HashTranslator>>(value);
> 
> Seems like this should be auto or auto* instead of KeyValuePairType*.
> 

I can make this change if strongly desired. I feel doing this makes the code less readable. I understand the advantages of using auto/auto* to prevent narrowing conversions and make the code future proof. Can you please enlighten me on how making your proposed change makes it easier for a person reading this code to discover that the right-hand side of the expression returns an object of type KeyValuePairType, ultimately derived from struct KeyValuePair, that exposes member fields key and value? Suppose in the future KeyValuePairType is made to be effectively an alias for type NameValuePair and a NameValuePair object has member fields name and value. For some reason a person needs to modify this function (HashMap::get<>()) to access the key in the looked up result. How does your proposed change make it easier to discover that entry->name is the way to access this information? Are you assuming that a person working on this code will be using an editor with autocompletion? Are you assuming that a person using knowledge of typical hash table/map operations and intuition could determine this? If the latter, how does a person avoid the need to guess-and-check (e.g. try entry->key then entry->name)?

> > Source/WTF/wtf/HashMap.h:406
> > +    return get<typename HashTableType::IdentityTranslatorType>(key);
> 
> Why doesn’t this use IdentityTranslatorType instead of typename
> HashTableType::IdentityTranslatorType?
> 

Will change to use IdentityTranslatorType.

> > Source/WebCore/accessibility/AccessibilityObject.cpp:2242
> > +        if (AccessibilityRole role = ariaRoleMap().get<typename ARIARoleMap::IdentityTranslatorType>(roleName)) // Identity translator supports StringView; get() does not.
> 
> That is so peculiar. Even with the comment it’s so non-obvious that an
> "identity" translator would allow things that the get function would not
> allow itself. The comment just makes me wonder why! I wonder if there is
> some more elegant way to handle this. Or some clearer way to explain why
> explicit use of the identity translator makes this work.
> 

I am also not happy with this code and open to suggestions. I considered writing a HashTranslator-specific to the use case in this code, case-insensitive hashing a of a StringView. I chose not to take this approach because it would duplicate code in ASCIICaseInsensitiveHash. I also spoke to Anders Carlsson yesterday (04/19) about these approaches as well as the pecular code in question and asked for suggestions. In this discussion I brought up the idea Sam Weinig mentioned in comment #8 about making HashMap::get() take a templated type as its parameter instead of const KeyType&. Anders felt this was error prone and could lead to subtle bugs where we may return the wrong result.

Another idea is to pass ASCIICaseInsensitiveHash for "typename ARIARoleMap::IdentityTranslatorType" in the above code or to define an alias for it, maybe ASCIICaseInsensitiveStringViewHash. At the time I wrote the patch I chose against these approaches because I was worried that ASCIICaseInsensitiveStringViewHash could get out-of-sync with the hash function defined for the ARIA role map (ASCIICaseInsensitiveHash). Maybe there is nothing to worry?

> > Source/WebCore/accessibility/AccessibilityObject.cpp:3080
> > +        if (Element* idElement = treeScope.getElementById(idName.toAtomicString()))
> 
> Could consider optimizing this by using toExistingAtomicString; if that
> function exists and we would also have to make sure null string is handled
> properly.
> 
> Could also use auto* here instead of Element*.
> 

Will use auto*.

> > Source/WebCore/platform/URL.cpp:1149
> > +    return protocolIs(url.toStringWithoutCopying(), "javascript");
> 
> This is an inefficient algorithm, allocating memory every time. We can do
> this much more efficiently by either copying and pasting the body of the
> protocolIs(const String&, const char*) function or using templates to make a
> StringView version.
> 
> Also there is a very strange FIXME above protocolIs(const String&, const
> char*) that says: "Why is this different than protocolIs(StringView, const
> char*)?' 

Oops! I meant to remove this comment as it is out-of-date.

> That seems to imply that there was a protocolIs function that took
> a StringView at one time.

> Perhaps it was deleted? The FIXME should go.

Yes, the overload referenced in the comment was removed.

> 
> > Source/WebCore/platform/graphics/avfoundation/CDMPrivateMediaSourceAVFObjC.mm:107
> > +    StringView keySystem = m_cdm->keySystem();
> 
> This looks unsafe. What type does m_cdm->keySystem() return? If it returns
> String then this is unsafe. If it returns const String& then this is safe.

It returns a const String&.

> I worry that someone will change it from returning const String& to returning
> String and then make this code invalid.

I agree assuming that CDM::keySystem() will always returns a const String& is error prone. I will change this code to read:

String keySystem = m_cdm->keySystem(); // Local copy for StringView usage
StringView keySystemStringView { keySystem };
ASSERT(validKeySystemRE().match(keySystem) >= 0);
...

(I am open to name suggestions).

And will update the remainder of the code in this function to use keySystemStringView, as applicable.
 

> Hard to write this so it’s obviously
> safe. Think of StringView like a raw pointer. There’s always the question of
> why the StringView is still good after you initialize it from the result of
> an expression that might involve a temporary String.
> 
> Generally we should avoid local variables of type StringView that are
> initialized from a temporary String or look like they might be. Arguments of
> type StringView don’t have this problem, many cases of local variable
> StringView that are not initialized from temporary Strings are fine, and
> various other idioms are OK.
> 
> > Source/WebCore/svg/SVGToOTFFontConversion.cpp:1476
> > +        auto segments = StringView(m_fontFaceElement->attributeWithoutSynchronization(SVGNames::font_weightAttr)).split(' ');
> > +        for (auto segment : segments) {
> 
> I would have written this;
> 
>     auto& weight =
> m_fontFaceElement->attributeWithoutSynchronization(SVGNames::
> font_weightAttr);
>     for (auto segment: StringView(weight).split(' ')) {
> 
> Makes the object lifetimes more obviously correct. Otherwise, you have the
> puzzle of why it’s OK for the StringView to outlive the result of
> attributeWithoutSynchronization.
> 

Will change.

> > Source/WebCore/svg/SVGToOTFFontConversion.cpp:1489
> > +        segments = StringView(m_fontFaceElement->attributeWithoutSynchronization(SVGNames::font_styleAttr)).split(' ');
> > +        for (auto segment : segments) {
> 
> Same issue here.
> 

Will change.

> > Source/WebCore/testing/Internals.cpp:3509
> > +void Internals::setPageMuted(const String& statesString)
> 
> I think you can just change the argument type to StringView instead.

Will change.
Comment 11 Daniel Bates 2017-04-19 12:11:09 PDT
(In reply to Darin Adler from comment #7)
> > Source/WebCore/accessibility/AccessibilityObject.cpp:3080
> > +        if (Element* idElement = treeScope.getElementById(idName.toAtomicString()))
> 
> Could consider optimizing this by using toExistingAtomicString; if that
> function exists and we would also have to make sure null string is handled
> properly.
> 

I did not mean to ignore this comment in my reply in comment #10. StringView::toExistingAtomicString() does not exist.
Comment 12 Darin Adler 2017-04-19 15:39:55 PDT
Comment on attachment 307434 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=307434&action=review

>>> Source/WTF/wtf/HashMap.h:301
>>> +    KeyValuePairType* entry = const_cast<HashTableType&>(m_impl).template lookup<HashMapTranslatorAdapter<KeyValuePairTraits, HashTranslator>>(value);
>> 
>> Seems like this should be auto or auto* instead of KeyValuePairType*.
> 
> I can make this change if strongly desired. I feel doing this makes the code less readable. I understand the advantages of using auto/auto* to prevent narrowing conversions and make the code future proof. Can you please enlighten me on how making your proposed change makes it easier for a person reading this code to discover that the right-hand side of the expression returns an object of type KeyValuePairType, ultimately derived from struct KeyValuePair, that exposes member fields key and value? Suppose in the future KeyValuePairType is made to be effectively an alias for type NameValuePair and a NameValuePair object has member fields name and value. For some reason a person needs to modify this function (HashMap::get<>()) to access the key in the looked up result. How does your proposed change make it easier to discover that entry->name is the way to access this information? Are you assuming that a person working on this code will be using an editor with autocompletion? Are you assuming that a person using knowledge of typical hash table/map operations and intuition could determine this? If the latter, how does a person avoid the need to guess-and-check (e.g. try entry->key then entry->name)?

If there was no null check we would likely not name the type because we wouldn’t need a local variable. The code is:

    look up the entry in the hash map
    return the value from inside that entry (unless there is none)

I don’t think that the exact type used for entries in the hash map is relevant to understanding the code is correct. The code "entry->value" is self explanatory without knowing the specific type of entry.

>>> Source/WebCore/accessibility/AccessibilityObject.cpp:3080
>>> +        if (Element* idElement = treeScope.getElementById(idName.toAtomicString()))
>> 
>> Could consider optimizing this by using toExistingAtomicString; if that function exists and we would also have to make sure null string is handled properly.
>> 
>> Could also use auto* here instead of Element*.
> 
> Will use auto*.

At some point we should add toExistingAtomicString; it’s wasteful to create an atomic string and then destroy it in the case where the name is not already an atomic string.

>>> Source/WebCore/platform/URL.cpp:1149
>>> +    return protocolIs(url.toStringWithoutCopying(), "javascript");
>> 
>> This is an inefficient algorithm, allocating memory every time. We can do this much more efficiently by either copying and pasting the body of the protocolIs(const String&, const char*) function or using templates to make a StringView version.
>> 
>> Also there is a very strange FIXME above protocolIs(const String&, const char*) that says: "Why is this different than protocolIs(StringView, const char*)?' That seems to imply that there was a protocolIs function that took a StringView at one time. Perhaps it was deleted? The FIXME should go.
> 
> Oops! I meant to remove this comment as it is out-of-date.

Removing the comment is good. Using an efficient algorithm is important in the long run; we don’t want to allocate memory just to compare a URL against a protocol.
Comment 13 Sam Weinig 2017-04-20 09:39:11 PDT
(In reply to Daniel Bates from comment #10)
> (In reply to Darin Adler from comment #7)
> > Comment on attachment 307434 [details]
> > Patch
> > 
> > > Source/WebCore/accessibility/AccessibilityObject.cpp:2242
> > > +        if (AccessibilityRole role = ariaRoleMap().get<typename ARIARoleMap::IdentityTranslatorType>(roleName)) // Identity translator supports StringView; get() does not.
> > 
> > That is so peculiar. Even with the comment it’s so non-obvious that an
> > "identity" translator would allow things that the get function would not
> > allow itself. The comment just makes me wonder why! I wonder if there is
> > some more elegant way to handle this. Or some clearer way to explain why
> > explicit use of the identity translator makes this work.
> > 
> 
> I am also not happy with this code and open to suggestions. I considered
> writing a HashTranslator-specific to the use case in this code,
> case-insensitive hashing a of a StringView. I chose not to take this
> approach because it would duplicate code in ASCIICaseInsensitiveHash. I also
> spoke to Anders Carlsson yesterday (04/19) about these approaches as well as
> the pecular code in question and asked for suggestions. In this discussion I
> brought up the idea Sam Weinig mentioned in comment #8 about making
> HashMap::get() take a templated type as its parameter instead of const
> KeyType&. Anders felt this was error prone and could lead to subtle bugs
> where we may return the wrong result.
>

I am curious in what ways it would be error prone? Perhaps we could figure out a way to avoid those issues.
Comment 14 Sam Weinig 2017-04-20 09:43:40 PDT
(In reply to Darin Adler from comment #12)
> Comment on attachment 307434 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=307434&action=review
> 
> >>> Source/WebCore/accessibility/AccessibilityObject.cpp:3080
> >>> +        if (Element* idElement = treeScope.getElementById(idName.toAtomicString()))
> >> 
> >> Could consider optimizing this by using toExistingAtomicString; if that function exists and we would also have to make sure null string is handled properly.

Alternatively, we could add an additional overload to getElementById() that takes a StringView, since getElementById() already has the optimization for String, where it does nothing, if an atomic string cannot be made. That way, we could avoid exposing the fact knowledge that if an atomic string is not present, there can be no associated element. But maybe we already expose that fact for the bindings. Either way, we should add toExistingAtomicString() to StringView since it will no doubt be useful elsewhere.
Comment 15 Daniel Bates 2017-04-28 14:30:43 PDT
(In reply to Sam Weinig from comment #14)
> (In reply to Darin Adler from comment #12)
> > Comment on attachment 307434 [details]
> > Patch
> > 
> > View in context:
> > https://bugs.webkit.org/attachment.cgi?id=307434&action=review
> > 
> > >>> Source/WebCore/accessibility/AccessibilityObject.cpp:3080
> > >>> +        if (Element* idElement = treeScope.getElementById(idName.toAtomicString()))
> > >> 
> > >> Could consider optimizing this by using toExistingAtomicString; if that function exists and we would also have to make sure null string is handled properly.
> 
> Alternatively, we could add an additional overload to getElementById() that
> takes a StringView, since getElementById() already has the optimization for
> String, where it does nothing, if an atomic string cannot be made. That way,
> we could avoid exposing the fact knowledge that if an atomic string is not
> present, there can be no associated element. But maybe we already expose
> that fact for the bindings. Either way, we should add
> toExistingAtomicString() to StringView since it will no doubt be useful
> elsewhere.

Filed bug #171405 to add StringView::toExistingAtomicString(). I will update this patch to add an overload of TreeScope::getElementById() that takes a StringView and is implemented in terms of StringView::toExistingAtomicString(). Notice that both the String and AtomicString constructors take a RefPtr<AtomicStringImpl> (returned by StringView::toExistingAtomicString()). I chose to also add the overload to avoid the need to disambiguate between TreeScope::getElementById(const String&) and TreeScope::getElementById(const AtomicString&) at the call site. Therefore the code at the call site will read:

...
if (Element* idElement = treeScope.getElementById(idName))
...
Comment 16 Daniel Bates 2017-04-28 15:25:55 PDT
(In reply to Darin Adler from comment #12)
> [...]
> >>> Source/WebCore/accessibility/AccessibilityObject.cpp:3080
> >>> +        if (Element* idElement = treeScope.getElementById(idName.toAtomicString()))
> >> 
> >> Could consider optimizing this by using toExistingAtomicString; if that function exists and we would also have to make sure null string is handled properly.
> >> 
> >> Could also use auto* here instead of Element*.
> > 
> > Will use auto*.
> 
> At some point we should add toExistingAtomicString; it’s wasteful to create
> an atomic string and then destroy it in the case where the name is not
> already an atomic string.
> 

Added StringView::toExistingAtomicString() in bug #171405. Will update patch to make use of it. See comment #15 for more details.

> >>> Source/WebCore/platform/URL.cpp:1149
> >>> +    return protocolIs(url.toStringWithoutCopying(), "javascript");
> >> 
> >> This is an inefficient algorithm, allocating memory every time. We can do this much more efficiently by either copying and pasting the body of the protocolIs(const String&, const char*) function or using templates to make a StringView version.
> >> 
> >> Also there is a very strange FIXME above protocolIs(const String&, const char*) that says: "Why is this different than protocolIs(StringView, const char*)?' That seems to imply that there was a protocolIs function that took a StringView at one time. Perhaps it was deleted? The FIXME should go.
> > 
> > Oops! I meant to remove this comment as it is out-of-date.
> 
> Removing the comment is good. Using an efficient algorithm is important in
> the long run; we don’t want to allocate memory just to compare a URL against
> a protocol.

Removed out-of-date comment and added WebCore::protocolIs(StringView) with a template-based implementation in bug #171396. Will update patch to make use of it.
Comment 17 Daniel Bates 2017-04-28 17:22:29 PDT
Created attachment 308626 [details]
Patch

Updated patch based on feedback from Darin Adler and Sam Weinig.
Comment 18 Darin Adler 2017-05-01 09:30:22 PDT
Comment on attachment 308626 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=308626&action=review

> Source/WTF/wtf/HashMap.h:72
> +    using IdentityTranslatorType = typename HashTableType::IdentityTranslatorType;

Seems fine to have this. Not 100% sure it should be public; would probably be fine to have it private.

> Source/WebCore/accessibility/AccessibilityObject.cpp:2131
> +// FIXME: Find a way to incorporate this functionality into ASCIICaseInsensitiveHash and allow
> +// a HashMap whose keys are type String to perform operations when given a key of type StringView.
> +struct ASCIICaseInsensitiveStringViewHashTranslator {
> +    static unsigned hash(StringView key)
> +    {
> +        if (key.is8Bit())
> +            return ASCIICaseInsensitiveHash::hash(key.characters8(), key.length());
> +        return ASCIICaseInsensitiveHash::hash(key.characters16(), key.length());
> +    }
> +
> +    static bool equal(const String& a, StringView b)
> +    {
> +        return equalIgnoringASCIICaseCommon(a, b);
> +    }
> +};

That comment seems fine, but it’s unclear why even in its current form this is here and not in the the header with ASCIICaseSensitiveHash.

> Source/WebCore/dom/TreeScope.cpp:122
> +    if (RefPtr<AtomicStringImpl> atomicElementId = elementId.toExistingAtomicString())

I would use auto here instead of writing out RefPtr.

> Source/WebCore/html/parser/XSSAuditor.cpp:242
> +static bool semicolonSeparatedValueContainsJavaScriptURL(const String& semicolonSeparatedValue)

Should change argument type to StringView.

The one call site for this function calls stripLeadingAndTrailingHTMLSpaces before calling this function. This is peculiar because it means that leading HTML spaces on the entire value are ignored, but not spaces after semicolons. I am not sure that is actually the desired behavior. Also, stripping trailing spaces is pointless; such spaces will be ignored anyway. We might want to do that stripping of leading spaces on each substring before calling protocolIsJavaScript. Note that stripLeadingAndTrailingHTMLSpaces or just stripping leading spaces are natural operations to support on a StringView rather than a String when the result is going to inspected rather than stored. There is no need to allocate a new String for the substring computed after trimming off possible spaces. Generally speaking StringView is excellent for substring operations like this one as long as the substring is not going to be turned back into a String for storage. Worth coming back to and tidying up later.
Comment 19 Daniel Bates 2017-05-02 15:48:28 PDT
(In reply to Darin Adler from comment #18)
> Comment on attachment 308626 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=308626&action=review
> 
> > Source/WTF/wtf/HashMap.h:72
> > +    using IdentityTranslatorType = typename HashTableType::IdentityTranslatorType;
> 
> Seems fine to have this. Not 100% sure it should be public; would probably
> be fine to have it private.
> 

Will make private.

> > Source/WebCore/accessibility/AccessibilityObject.cpp:2131
> > +// FIXME: Find a way to incorporate this functionality into ASCIICaseInsensitiveHash and allow
> > +// a HashMap whose keys are type String to perform operations when given a key of type StringView.
> > +struct ASCIICaseInsensitiveStringViewHashTranslator {
> > +    static unsigned hash(StringView key)
> > +    {
> > +        if (key.is8Bit())
> > +            return ASCIICaseInsensitiveHash::hash(key.characters8(), key.length());
> > +        return ASCIICaseInsensitiveHash::hash(key.characters16(), key.length());
> > +    }
> > +
> > +    static bool equal(const String& a, StringView b)
> > +    {
> > +        return equalIgnoringASCIICaseCommon(a, b);
> > +    }
> > +};
> 
> That comment seems fine, but it’s unclear why even in its current form this
> is here and not in the the header with ASCIICaseSensitiveHash.
> 

Moved to file Source/WTF/wtf/text/StringHash.h (where ASCIICaseSensitiveHash is defined).

> > Source/WebCore/dom/TreeScope.cpp:122
> > +    if (RefPtr<AtomicStringImpl> atomicElementId = elementId.toExistingAtomicString())
> 
> I would use auto here instead of writing out RefPtr.
> 

Will use auto.

> > Source/WebCore/html/parser/XSSAuditor.cpp:242
> > +static bool semicolonSeparatedValueContainsJavaScriptURL(const String& semicolonSeparatedValue)
> 
> Should change argument type to StringView.
> 

Will change.

> The one call site for this function calls stripLeadingAndTrailingHTMLSpaces
> before calling this function. This is peculiar because it means that leading
> HTML spaces on the entire value are ignored, but not spaces after
> semicolons. I am not sure that is actually the desired behavior. Also,
> stripping trailing spaces is pointless; such spaces will be ignored anyway.
> We might want to do that stripping of leading spaces on each substring
> before calling protocolIsJavaScript. Note that
> stripLeadingAndTrailingHTMLSpaces or just stripping leading spaces are
> natural operations to support on a StringView rather than a String when the
> result is going to inspected rather than stored. There is no need to
> allocate a new String for the substring computed after trimming off possible
> spaces. Generally speaking StringView is excellent for substring operations
> like this one as long as the substring is not going to be turned back into a
> String for storage. Worth coming back to and tidying up later.

I hope you do not mind that I defer looking into this issue for now.
Comment 20 Daniel Bates 2017-05-02 15:51:48 PDT
Committed r216102: <http://trac.webkit.org/changeset/216102>
Comment 21 Daniel Bates 2017-05-02 16:08:01 PDT
Committed build fix in <http://trac.webkit.org/changeset/216105>.
Comment 22 Daniel Bates 2017-05-02 16:15:53 PDT
Fixed up ChangeLog entry in <http://trac.webkit.org/changeset/216106>.