RESOLVED FIXED 205323
[JSC] 8Bit JSRopeString can contain 16Bit string in its rope
https://bugs.webkit.org/show_bug.cgi?id=205323
Summary [JSC] 8Bit JSRopeString can contain 16Bit string in its rope
Yusuke Suzuki
Reported 2019-12-16 20:16:22 PST
[JSC] 8Bit JSRopeString can contain 16Bit string in its rope
Attachments
Patch (14.03 KB, patch)
2019-12-16 20:23 PST, Yusuke Suzuki
mark.lam: review+
Yusuke Suzuki
Comment 1 2019-12-16 20:23:01 PST
Yusuke Suzuki
Comment 2 2019-12-16 20:25:02 PST
Yusuke Suzuki
Comment 3 2019-12-16 20:29:10 PST
Comment on attachment 385850 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=385850&action=review > Source/JavaScriptCore/runtime/JSString.cpp:164 > return; This is OK. When creating substring JSRopeString, we always resolve the substringBase before creating JSRopeString. This means that we already know the flag of result JSString. And 8Bit flag of resolved JSString never changes.
Mark Lam
Comment 4 2019-12-17 10:57:38 PST
Comment on attachment 385850 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=385850&action=review r=me. I wish there's a way to assert that we're not copying 16-bit chars that don't fit into 8-bit chars, but so far, I think that such an assertion is too expensive to have enabled all the time even for a debug build. Can you file a bug to investigate options for testing this? > Source/JavaScriptCore/runtime/JSString.h:594 > void resolveRopeSlowCase8(LChar*) const; I think this function is now deleted in the implementation. Please remove this declaration.
Yusuke Suzuki
Comment 5 2019-12-17 14:04:28 PST
Comment on attachment 385850 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=385850&action=review Thanks! Filed https://bugs.webkit.org/show_bug.cgi?id=205355 >> Source/JavaScriptCore/runtime/JSString.h:594 >> void resolveRopeSlowCase8(LChar*) const; > > I think this function is now deleted in the implementation. Please remove this declaration. Fixed.
Yusuke Suzuki
Comment 6 2019-12-17 14:12:05 PST
Darin Adler
Comment 7 2019-12-17 17:11:56 PST
Comment on attachment 385850 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=385850&action=review > Source/WTF/wtf/text/StringImpl.h:1089 > -template<typename CharacterType> inline void StringImpl::copyCharacters(CharacterType* destination, const CharacterType* source, unsigned numCharacters) > +template<typename SourceCharacterType, typename DestinationCharacterType> > +inline void StringImpl::copyCharacters(DestinationCharacterType* destination, const SourceCharacterType* source, unsigned numCharacters) > { > - if (numCharacters == 1) { > - *destination = *source; > - return; > + if constexpr (std::is_same<SourceCharacterType, DestinationCharacterType>::value) { > + if (numCharacters == 1) { > + *destination = *source; > + return; > + } > + memcpy(destination, source, numCharacters * sizeof(DestinationCharacterType)); > + } else { > + for (unsigned i = 0; i < numCharacters; ++i) > + destination[i] = source[i]; > } > - memcpy(destination, source, numCharacters * sizeof(CharacterType)); > -} > - > -ALWAYS_INLINE void StringImpl::copyCharacters(UChar* destination, const LChar* source, unsigned numCharacters) > -{ > - for (unsigned i = 0; i < numCharacters; ++i) > - destination[i] = source[i]; > } What’s the rationale for these changes? Do they make the implementation more efficient? Or are they more elegant? I like the old version better and it seems like it generates identical code, but maybe I am missing something.
Yusuke Suzuki
Comment 8 2019-12-17 17:29:59 PST
Comment on attachment 385850 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=385850&action=review >> Source/WTF/wtf/text/StringImpl.h:1089 >> } > > What’s the rationale for these changes? Do they make the implementation more efficient? Or are they more elegant? I like the old version better and it seems like it generates identical code, but maybe I am missing something. The rational behind this is adding `copyCharacters(LChar*, UChar*, unsigned)` version of this function, which does not exist previously. This is necessary to implement JSRopeString's resolution fixed in this patch.
Darin Adler
Comment 9 2019-12-18 09:28:37 PST
Comment on attachment 385850 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=385850&action=review >>> Source/WTF/wtf/text/StringImpl.h:1089 >>> } >> >> What’s the rationale for these changes? Do they make the implementation more efficient? Or are they more elegant? I like the old version better and it seems like it generates identical code, but maybe I am missing something. > > The rational behind this is adding `copyCharacters(LChar*, UChar*, unsigned)` version of this function, which does not exist previously. > This is necessary to implement JSRopeString's resolution fixed in this patch. Could we have named that one differently or does it need to be an overload? I don’t think it’s great to have functions that are guaranteed correct on any data assuming the buffer sizes are correct that have the same name as a function that must only be called if the precondition on the characters is correct. It’s fine to have both functions, but I’d want the one that narrows to have a different name unless there’s a reason it shouldn’t.
Yusuke Suzuki
Comment 10 2020-01-03 18:32:40 PST
Comment on attachment 385850 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=385850&action=review >>>> Source/WTF/wtf/text/StringImpl.h:1089 >>>> } >>> >>> What’s the rationale for these changes? Do they make the implementation more efficient? Or are they more elegant? I like the old version better and it seems like it generates identical code, but maybe I am missing something. >> >> The rational behind this is adding `copyCharacters(LChar*, UChar*, unsigned)` version of this function, which does not exist previously. >> This is necessary to implement JSRopeString's resolution fixed in this patch. > > Could we have named that one differently or does it need to be an overload? I don’t think it’s great to have functions that are guaranteed correct on any data assuming the buffer sizes are correct that have the same name as a function that must only be called if the precondition on the characters is correct. It’s fine to have both functions, but I’d want the one that narrows to have a different name unless there’s a reason it shouldn’t. Make sense! I'll do it in a follow-up patch.
Note You need to log in before you can comment on or make changes to this bug.