145109 – Move AtomicStringImpl table related operations from AtomicString to AtomicStringImpl

RESOLVED FIXED 145109

Move AtomicStringImpl table related operations from AtomicString to AtomicStringImpl

https://bugs.webkit.org/show_bug.cgi?id=145109

Summary Move AtomicStringImpl table related operations from AtomicString to AtomicStr...

Yusuke Suzuki

Reported 2015-05-17 12:34:56 PDT

Move AtomicStringImpl table related operations from AtomicString to AtomicStringImpl

Attachments
Patch (82.79 KB, patch) 2015-05-17 12:36 PDT, Yusuke Suzuki	no flags	Details Formatted Diff Diff
Patch (82.76 KB, patch) 2015-05-17 12:37 PDT, Yusuke Suzuki	darin: review+	Details Formatted Diff Diff
Show Obsolete (1) View All Add attachment proposed patch, testcase, etc.

Yusuke Suzuki

Comment 1 2015-05-17 12:36:52 PDT

Created attachment 253292 [details] Patch

Yusuke Suzuki

Comment 2 2015-05-17 12:37:53 PDT

Created attachment 253293 [details] Patch

WebKit Commit Bot

Comment 3 2015-05-17 12:39:34 PDT

Attachment 253293 [details] did not pass style-queue: ERROR: Source/WTF/wtf/text/AtomicString.h:45: Should be indented on a separate line, with the colon or comma first on that line. [whitespace/indent] [4] ERROR: Source/WTF/wtf/text/AtomicString.h:46: Should be indented on a separate line, with the colon or comma first on that line. [whitespace/indent] [4] ERROR: Source/WTF/wtf/text/AtomicString.h:47: Should be indented on a separate line, with the colon or comma first on that line. [whitespace/indent] [4] ERROR: Source/WTF/wtf/text/AtomicString.h:48: Should be indented on a separate line, with the colon or comma first on that line. [whitespace/indent] [4] ERROR: Source/WTF/wtf/text/AtomicString.h:49: Should be indented on a separate line, with the colon or comma first on that line. [whitespace/indent] [4] ERROR: Source/WTF/wtf/text/AtomicString.h:50: Should be indented on a separate line, with the colon or comma first on that line. [whitespace/indent] [4] ERROR: Source/WTF/wtf/text/AtomicString.h:61: Should be indented on a separate line, with the colon or comma first on that line. [whitespace/indent] [4] ERROR: Source/WTF/wtf/text/AtomicString.h:164: Should be indented on a separate line, with the colon or comma first on that line. [whitespace/indent] [4] ERROR: Source/WTF/wtf/text/AtomicString.h:167: Should be indented on a separate line, with the colon or comma first on that line. [whitespace/indent] [4] ERROR: Source/WTF/wtf/text/WTFString.h:121: Should be indented on a separate line, with the colon or comma first on that line. [whitespace/indent] [4] ERROR: Source/WTF/wtf/text/WTFString.h:122: Should be indented on a separate line, with the colon or comma first on that line. [whitespace/indent] [4] ERROR: Source/WTF/wtf/text/WTFString.h:123: Should be indented on a separate line, with the colon or comma first on that line. [whitespace/indent] [4] Total errors found: 12 in 29 files If any of these errors are false positives, please file a bug against check-webkit-style.

Yusuke Suzuki

Comment 4 2015-05-18 08:53:55 PDT

Comment on attachment 253293 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=253293&action=review Added comments > Source/WTF/wtf/text/AtomicString.h:50 > + AtomicString(const UChar* s) : m_string(AtomicStringImpl::add(s)) { } These follow the original code's style. And it causes cpplint's warnings. Is it better to change it to the following? AtomicString(const LChar* s) : m_string(AtomicStringImpl::add(s)) { }

Darin Adler

Comment 5 2015-05-19 08:56:29 PDT

(In reply to comment #4) > These follow the original code's style. And it causes cpplint's warnings. > Is it better to change it to the following? > AtomicString(const LChar* s) > : m_string(AtomicStringImpl::add(s)) > { > } I like that style and yes that’s how I would do it. But, better, I would not put the function definition into the class definition. I would declare it in the class and define it after the class. The class definition itself is more readable if it’s not full of function bodies.

Darin Adler

Comment 6 2015-05-19 09:06:30 PDT

Comment on attachment 253293 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=253293&action=review > Source/JavaScriptCore/runtime/TypeSet.cpp:355 > - String property = String((*iter)); > + String property = String(iter->get()); I don’t understand this change. > Source/WTF/wtf/text/AtomicString.cpp:108 > + if (RefPtr<AtomicStringImpl> impl = AtomicStringImpl::addFromUTF8(charactersStart, charactersEnd)) { Should use auto here instead of writing out the RefPtr type. > Source/WTF/wtf/text/AtomicString.cpp:110 > + AtomicString atomicString(impl.get()); > + return atomicString; I don’t understand the reason for the local variable here. I would write: return impl.get(); Or if that doesn’t work: return AtomicString { impl.get() }; > Source/WTF/wtf/text/AtomicString.cpp:112 > + return nullAtom; I prefer early return for the error case, although that gets rid of the scoping. >> Source/WTF/wtf/text/AtomicString.h:50 >> + AtomicString(const UChar* s) : m_string(AtomicStringImpl::add(s)) { } > > These follow the original code's style. And it causes cpplint's warnings. > Is it better to change it to the following? > AtomicString(const LChar* s) > : m_string(AtomicStringImpl::add(s)) > { > } I like that style and yes that’s how I would do it. But, better, I would not put the function definition into the class definition. I would declare it in the class and define it after the class. The class definition itself is more readable if it’s not full of function bodies. > Source/WTF/wtf/text/AtomicStringImpl.h:33 > +private: > + AtomicStringImpl() = delete; I suggest putting this down in the private section below. > Source/WTF/wtf/text/AtomicStringImpl.h:39 > + static AtomicStringImpl* lookup(LChar* characters, unsigned length) > + { > + return lookupInternal(characters, length); > + } The verb "look up" is two words. So the function name should be lookUp. Not "lookup", which is a noun, not a verb. Should return a reference, not a pointer. > Source/WTF/wtf/text/AtomicStringImpl.h:43 > + static AtomicStringImpl* lookup(UChar* characters, unsigned length) > + { > + return lookupInternal(characters, length); > + } The verb "look up" is two words. So the function name should be lookUp. Not "lookup", which is a noun, not a verb. Should return a reference, not a pointer. > Source/WTF/wtf/text/AtomicStringImpl.h:49 > + static AtomicStringImpl* lookup(StringImpl* string) > + { > + if (!string || string->isAtomic()) > + return static_cast<AtomicStringImpl*>(string); > + return lookupSlowCase(*string); > + } The verb "look up" is two words. So the function name should be lookUp. Not "lookup", which is a noun, not a verb. Should return a reference, not a pointer. Can we have one that takes and returns a reference? The one that takes and returns a pointer can call the one that takes and returns a reference after doing the null check. > Source/WTF/wtf/text/AtomicStringImpl.h:60 > + WTF_EXPORT_STRING_API static RefPtr<AtomicStringImpl> add(const LChar*); > + ALWAYS_INLINE static RefPtr<AtomicStringImpl> add(const char* s) { return add(reinterpret_cast<const LChar*>(s)); }; > + WTF_EXPORT_STRING_API static RefPtr<AtomicStringImpl> add(const LChar*, unsigned length); > + WTF_EXPORT_STRING_API static RefPtr<AtomicStringImpl> add(const UChar*, unsigned length); > + ALWAYS_INLINE static RefPtr<AtomicStringImpl> add(const char* s, unsigned length) { return add(reinterpret_cast<const LChar*>(s), length); }; > + WTF_EXPORT_STRING_API static Ref<AtomicStringImpl> add(const UChar*, unsigned length, unsigned existingHash); > + WTF_EXPORT_STRING_API static RefPtr<AtomicStringImpl> add(const UChar*); > + WTF_EXPORT_STRING_API static RefPtr<AtomicStringImpl> add(StringImpl*, unsigned offset, unsigned length); Any of these that can’t return null should be returning Ref, not RefPtr. I guess maybe they already are. > Source/WTF/wtf/text/AtomicStringImpl.h:67 > + WTF_EXPORT_STRING_API static Ref<AtomicStringImpl> addFromLiteralData(const char* characters, unsigned length); I’m not sure “add from” is the best naming here. Maybe just addLiteralData. But I’m also not sure what “literal data” is. I think maybe addLiteral is the best name. > Source/WTF/wtf/text/AtomicStringImpl.h:70 > + // AtomicStringImpl::addFromUTF8 will return a nullptr if > + // the input data contains invalid UTF-8 sequences. Don’t need to say the name of the function in the comment: // Returns null if the input data contains an invalid UTF-8 sequence. > Source/WTF/wtf/text/AtomicStringImpl.h:71 > + WTF_EXPORT_STRING_API static RefPtr<AtomicStringImpl> addFromUTF8(const char* start, const char* end); This should be addUTF8, because from doesn’t really make sense. I don’t understand why we use a start and end pointer here, though! > Source/WTF/wtf/text/AtomicStringImpl.h:80 > + return static_cast<AtomicStringImpl*>(string); This should just be: return nullptr; > Source/WTF/wtf/text/StringImpl.h:333 > + // To allow generating the destructor in the base classes, we expose the destructor as protected. Don’t need that comment. That’s what protected means. Should move this down because we try to organize our classes with public first, then protected, then private, without interleaving them. > Source/WTF/wtf/text/WTFString.h:119 > String(Ref<StringImpl>&& impl) : m_impl(WTF::move(impl)) { } > String(RefPtr<StringImpl>&& impl) : m_impl(impl) { } Doesn’t make sense that one of these uses WTF::move and the other does not! > Source/WTF/wtf/text/WTFString.h:121 > + String(PassRefPtr<AtomicStringImpl> impl) : m_impl(impl) { } Do we really need to add this one? PassRefPtr is deprecated. Can we just fix all the call sites, or are there too many? > Source/WTF/wtf/text/WTFString.h:123 > + String(Ref<AtomicStringImpl>&& impl) : m_impl(WTF::move(impl)) { } > + String(RefPtr<AtomicStringImpl>&& impl) : m_impl(impl) { } Doesn’t make sense that one of these uses WTF::move and the other does not!

Yusuke Suzuki

Comment 7 2015-05-19 09:33:57 PDT

(In reply to comment #5) > (In reply to comment #4) > > These follow the original code's style. And it causes cpplint's warnings. > > Is it better to change it to the following? > > AtomicString(const LChar* s) > > : m_string(AtomicStringImpl::add(s)) > > { > > } > > I like that style and yes that’s how I would do it. But, better, I would not > put the function definition into the class definition. I would declare it in > the class and define it after the class. The class definition itself is more > readable if it’s not full of function bodies. OK. So following the style, declaring the prototype in class body and define it with `inline` in the header.

Yusuke Suzuki

Comment 8 2015-05-19 11:18:26 PDT

Comment on attachment 253293 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=253293&action=review Thank you for your review! >> Source/JavaScriptCore/runtime/TypeSet.cpp:355 >> + String property = String(iter->get()); > > I don’t understand this change. This is due to ambiguity of the following things. String(RefPtr<AtomicStringImpl>&& impl) : m_impl(impl) { } String(PassRefPtr<AtomicStringImpl> impl) : m_impl(impl) { } String(PassRefPtr<StringImpl> impl) : m_impl(impl) { } OK, so in this time, I've just changed it to `**iter` to extract StringImpl& reference. >> Source/WTF/wtf/text/AtomicString.cpp:108 >> + if (RefPtr<AtomicStringImpl> impl = AtomicStringImpl::addFromUTF8(charactersStart, charactersEnd)) { > > Should use auto here instead of writing out the RefPtr type. Make sense! Changed. >> Source/WTF/wtf/text/AtomicString.cpp:110 >> + return atomicString; > > I don’t understand the reason for the local variable here. I would write: > > return impl.get(); > > Or if that doesn’t work: > > return AtomicString { impl.get() }; `return impl.get();` works. Fixed. >> Source/WTF/wtf/text/AtomicString.cpp:112 >> + return nullAtom; > > I prefer early return for the error case, although that gets rid of the scoping. OK, rewrite it to auto impl = ... if (!impl) return nullAtom; return impl.get(); >>> Source/WTF/wtf/text/AtomicString.h:50 >>> + AtomicString(const UChar* s) : m_string(AtomicStringImpl::add(s)) { } >> >> These follow the original code's style. And it causes cpplint's warnings. >> Is it better to change it to the following? >> AtomicString(const LChar* s) >> : m_string(AtomicStringImpl::add(s)) >> { >> } > > I like that style and yes that’s how I would do it. But, better, I would not put the function definition into the class definition. I would declare it in the class and define it after the class. The class definition itself is more readable if it’s not full of function bodies. Fixed :D >> Source/WTF/wtf/text/AtomicStringImpl.h:33 >> + AtomicStringImpl() = delete; > > I suggest putting this down in the private section below. Fixed. >> Source/WTF/wtf/text/AtomicStringImpl.h:39 >> + } > > The verb "look up" is two words. So the function name should be lookUp. Not "lookup", which is a noun, not a verb. Should return a reference, not a pointer. Ah, but, when the AtomicStringImpl's table doesn't contain AtomicStringImpl* for the given string, it returns nullptr. Changed `lookup` to `lookUp`. >> Source/WTF/wtf/text/AtomicStringImpl.h:43 >> + } > > The verb "look up" is two words. So the function name should be lookUp. Not "lookup", which is a noun, not a verb. Should return a reference, not a pointer. Fixed the name. >> Source/WTF/wtf/text/AtomicStringImpl.h:60 >> + WTF_EXPORT_STRING_API static RefPtr<AtomicStringImpl> add(StringImpl*, unsigned offset, unsigned length); > > Any of these that can’t return null should be returning Ref, not RefPtr. I guess maybe they already are. Seeing the implementation, when the given char* (const UChar* etc.) is null, it returns nullptr. Some (like `add(const UChar*, unsigned length, unsigned existingHash);`) returns Ref, because they require the given pointer is not nullptr. (since there's existingHash value, it is not nullptr in the used context I guess.) >> Source/WTF/wtf/text/AtomicStringImpl.h:67 >> + WTF_EXPORT_STRING_API static Ref<AtomicStringImpl> addFromLiteralData(const char* characters, unsigned length); > > I’m not sure “add from” is the best naming here. Maybe just addLiteralData. But I’m also not sure what “literal data” is. I think maybe addLiteral is the best name. Changed to `addLiteral`. >> Source/WTF/wtf/text/AtomicStringImpl.h:70 >> + // the input data contains invalid UTF-8 sequences. > > Don’t need to say the name of the function in the comment: > > // Returns null if the input data contains an invalid UTF-8 sequence. Fixed. >> Source/WTF/wtf/text/AtomicStringImpl.h:71 >> + WTF_EXPORT_STRING_API static RefPtr<AtomicStringImpl> addFromUTF8(const char* start, const char* end); > > This should be addUTF8, because from doesn’t really make sense. I don’t understand why we use a start and end pointer here, though! I've changed the API name to addUTF8. Haha, I guess that the given UTF8 is user-input; it may contain \0 and to avoid performing strlen(str) accidentally, we manage the range with pointers. But it's my guess. >> Source/WTF/wtf/text/AtomicStringImpl.h:80 >> + return static_cast<AtomicStringImpl*>(string); > > This should just be: > > return nullptr; Fixed. >> Source/WTF/wtf/text/StringImpl.h:333 >> + // To allow generating the destructor in the base classes, we expose the destructor as protected. > > Don’t need that comment. That’s what protected means. > > Should move this down because we try to organize our classes with public first, then protected, then private, without interleaving them. OK, removed the comment and moved it down. >> Source/WTF/wtf/text/WTFString.h:119 >> String(RefPtr<StringImpl>&& impl) : m_impl(impl) { } > > Doesn’t make sense that one of these uses WTF::move and the other does not! OK, use WTF::move. >> Source/WTF/wtf/text/WTFString.h:121 >> + String(PassRefPtr<AtomicStringImpl> impl) : m_impl(impl) { } > > Do we really need to add this one? PassRefPtr is deprecated. Can we just fix all the call sites, or are there too many? At least, dropping this can be done! >> Source/WTF/wtf/text/WTFString.h:123 >> + String(RefPtr<AtomicStringImpl>&& impl) : m_impl(impl) { } > > Doesn’t make sense that one of these uses WTF::move and the other does not! Fixed.

Yusuke Suzuki

Comment 9 2015-05-19 11:22:46 PDT

Comment on attachment 253293 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=253293&action=review >>> Source/JavaScriptCore/runtime/TypeSet.cpp:355 >>> + String property = String(iter->get()); >> >> I don’t understand this change. > > This is due to ambiguity of the following things. > String(RefPtr<AtomicStringImpl>&& impl) : m_impl(impl) { } > String(PassRefPtr<AtomicStringImpl> impl) : m_impl(impl) { } > String(PassRefPtr<StringImpl> impl) : m_impl(impl) { } > > OK, so in this time, I've just changed it to `**iter` to extract StringImpl& reference. Ah, it may be null, so use `String property = iter->get();`

Darin Adler

Comment 10 2015-05-19 13:39:33 PDT

Comment on attachment 253293 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=253293&action=review >>> Source/WTF/wtf/text/AtomicStringImpl.h:71 >>> + WTF_EXPORT_STRING_API static RefPtr<AtomicStringImpl> addFromUTF8(const char* start, const char* end); >> >> This should be addUTF8, because from doesn’t really make sense. I don’t understand why we use a start and end pointer here, though! > > I've changed the API name to addUTF8. > Haha, I guess that the given UTF8 is user-input; it may contain \0 and to avoid performing strlen(str) accidentally, we manage the range with pointers. > But it's my guess. I was wondering more about why two pointers rather than start and length. We almost always use pointer and length for this kind of thing, as in addFromLiteralData above.

Yusuke Suzuki

Comment 11 2015-05-19 19:01:29 PDT

Comment on attachment 253293 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=253293&action=review >>>> Source/WTF/wtf/text/AtomicStringImpl.h:71 >>>> + WTF_EXPORT_STRING_API static RefPtr<AtomicStringImpl> addFromUTF8(const char* start, const char* end); >>> >>> This should be addUTF8, because from doesn’t really make sense. I don’t understand why we use a start and end pointer here, though! >> >> I've changed the API name to addUTF8. >> Haha, I guess that the given UTF8 is user-input; it may contain \0 and to avoid performing strlen(str) accidentally, we manage the range with pointers. >> But it's my guess. > > I was wondering more about why two pointers rather than start and length. We almost always use pointer and length for this kind of thing, as in addFromLiteralData above. Ah, seeing the caller side of this function, I've found the edge case. In AtomicString::fromUTF8(const char* characters), we call it with the form, fromUTF8Internal(characters, 0); Here, 0 means nullptr. When the end is nullptr, this `addUTF8` works specially. (`calculateStringHashAndLengthFromUTF8MaskingTop8Bits` handles this) This is a little bit error prone design. In the meantime, I'll fix it in the separate patch.

Yusuke Suzuki

Comment 12 2015-05-19 21:19:25 PDT

Committed r184612: <http://trac.webkit.org/changeset/184612>

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware Unspecified

OS Unspecified

Product WebKit

Component New Bugs

Assignee

Yusuke Suzuki

Reported

2015-05-17 12:34 PDT

Modified

2015-05-19 21:19 PDT History

CC List

3 users Show

URL

Keywords

Depends on

Blocks

144848

Dependencies

tree graph