Summary: | UTF-16 code points compare() for String objects | ||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Luiz Agostini <luiz> | ||||||||||||||||
Component: | WebCore Misc. | Assignee: | Luiz Agostini <luiz> | ||||||||||||||||
Status: | RESOLVED FIXED | ||||||||||||||||||
Severity: | Normal | CC: | ap, commit-queue, darin, hausmann, kenneth, tonikitoo, webkit.review.bot | ||||||||||||||||
Priority: | P2 | ||||||||||||||||||
Version: | 528+ (Nightly build) | ||||||||||||||||||
Hardware: | All | ||||||||||||||||||
OS: | All | ||||||||||||||||||
Bug Depends on: | |||||||||||||||||||
Bug Blocks: | 39220 | ||||||||||||||||||
Attachments: |
|
Description
Luiz Agostini
2010-05-25 17:16:38 PDT
Created attachment 57056 [details]
patch 1
Comment on attachment 57056 [details]
patch 1
This is not an ASCII-only comparison. It's comparing UTF-16 code points, so asciiOnlyCompare is not the right name for it. ASCII doesn't enter into it. Could you explain your goals here? Where do you want to use this function?
Attachment 57056 [details] did not pass style-queue:
Failed to run "['WebKitTools/Scripts/check-webkit-style', '--no-squash']" exit_code: 1
JavaScriptCore/wtf/text/StringImpl.cpp:486: l is incorrectly named. Don't use the single letter 'l' as an identifier name. [readability/naming] [4]
Total errors found: 1 in 6 files
If any of these errors are false positives, please file a bug against check-webkit-style.
(In reply to comment #2) > (From update of attachment 57056 [details]) > This is not an ASCII-only comparison. It's comparing UTF-16 code points, so asciiOnlyCompare is not the right name for it. ASCII doesn't enter into it. Could you explain your goals here? Where do you want to use this function? I need to sort a vector of String objects in lexicographical order. For that I plan to use std::sort. The problem is that I did not find a way to do it easily. Looking for sorts in webkit I found this return strcmp(parentDirectory().utf8().data(), compareTo.parentDirectory().utf8().data()); in PluginPackage::compare. This seems to be very expensive because it does a lot of copy/conversion to do the comparison. Is there any good option for that? The name is not good. I chose it just because the method do not consider localizations and is ok for ascii. Created attachment 57059 [details]
patch 2
Comment on attachment 57059 [details] patch 2 Seems fine to move this code where it can be used on a WTF string. > int compare(const UString& s1, const UString& s2) The old version of this considers a null string equal to an empty string. The new version considers a null string to be less than an empty string. Are you sure it's OK to change behavior in that way? Do we really need a UString::compare member function? Why is that better than what we already had? > + static int compare(const StringImpl*, const StringImpl*); Is there some reason this is better as a String and StringImpl static member instead of being a free function? I think a free function would be better. (In reply to comment #6) > (From update of attachment 57059 [details]) > Seems fine to move this code where it can be used on a WTF string. > > > int compare(const UString& s1, const UString& s2) > > The old version of this considers a null string equal to an empty string. The new version considers a null string to be less than an empty string. Are you sure it's OK to change behavior in that way? no. do not want to change any behavior. > > Do we really need a UString::compare member function? Why is that better than what we already had? > > > + static int compare(const StringImpl*, const StringImpl*); > > Is there some reason this is better as a String and StringImpl static member instead of being a free function? I think a free function would be better. I am concerned about overloads because of all these string types and all those not explicit constructors. I think that some scope delimiters would help. :) Created attachment 57065 [details]
patch 3
I still don't understand what this function is for. You said that you needed lexicographical order, but that's not what it implements. Could you perhaps want Collator.h? If it's to be exposed, it needs a name that makes it clear that the function does something that's almost never appropriate. (In reply to comment #9) > I still don't understand what this function is for. You said that you needed lexicographical order, but that's not what it implements. Could you perhaps want Collator.h? > > If it's to be exposed, it needs a name that makes it clear that the function does something that's almost never appropriate. I want to compare lists of strings. I will sort then and then compare each string. Just to identify lists that contain exactly the same strings any comparison is ok. It happens that the strings I will manipulate are compose by ascii characters only. I am considering that this proposed code point comparison, for ascii characters only in english, would be lexicographical. Collator.h may be used. (In reply to comment #10) > (In reply to comment #9) > > I still don't understand what this function is for. You said that you needed lexicographical order, but that's not what it implements. Could you perhaps want Collator.h? > > > > If it's to be exposed, it needs a name that makes it clear that the function does something that's almost never appropriate. > > I want to compare lists of strings. I will sort then and then compare each string. Just to identify lists that contain exactly the same strings any comparison is ok. > > It happens that the strings I will manipulate are compose by ascii characters only. I am considering that this proposed code point comparison, for ascii characters only in english, would be lexicographical. > > Collator.h may be used. Actual use cases is serialisation of media queries (bug 39220). I see. Comparing code points is generally not sufficient even for equality checking (because the same character can be encoded in composed and decomposed forms). This will of course work for ASCII, but it's important to not make this function look like general purpose one.
> Actual use cases is serialisation of media queries (bug 39220).
Are these case sensitive?
(In reply to comment #12) > I see. Comparing code points is generally not sufficient even for equality checking (because the same character can be encoded in composed and decomposed forms). This will of course work for ASCII, but it's important to not make this function look like general purpose one. > > > Actual use cases is serialisation of media queries (bug 39220). > > Are these case sensitive? They are supposed to be written in lowercase, but if you write "NOT" it will be seen as "not". I am not sure if it is allowed to write (COLOR) instead of (color), but that is at least not handled by the parser right now. (In reply to comment #13) > (In reply to comment #12) > > I see. Comparing code points is generally not sufficient even for equality checking (because the same character can be encoded in composed and decomposed forms). This will of course work for ASCII, but it's important to not make this function look like general purpose one. > > > > > Actual use cases is serialisation of media queries (bug 39220). > > > > Are these case sensitive? > > They are supposed to be written in lowercase, but if you write "NOT" it will be seen as "not". I am not sure if it is allowed to write (COLOR) instead of (color), but that is at least not handled by the parser right now. The comparison if the strings is case sensitive. (http://dev.w3.org/csswg/cssom/#comparing-media-queries) Created attachment 57137 [details]
patch 4
Darin, could you take a look if you have time? Comment on attachment 57137 [details]
patch 4
It’s fine to have a comparison function that just compares the UChars in a string and does no case folding or normalization.
A) But I don't see any reason to make this comparison function be a class member. Neither a static member nor a non-static member. It should just be a free function, the way the function in UString.h already is. We can overload it to take more types. This patch instead adds a member function to StringImpl, but there's no reason for it to be a member function. It should just be a free function.
I think codePointCompare is an OK name for the function. Better than just "compare" since it might convince people not to use it in cases where they shouldn’t.
B) We should rename the one that works on UString too, to be consistent. I believe it's only called in exactly one place in JavaScriptCore.
C) The comparison function for UString that just calls through to the codePointCompare that takes StringImpl* should be inlined. It's in a hot code path and I don't want an extra level of function call added.
Now that I see the one place where codePointCompare is called inside JavaScriptCore, I realize that it's a code path where the strings can never be null strings. That has two implications:
D) We don't have to treat null strings and empty strings as equal.
E) If we wanted to we could make things slightly more efficient by making that call site not even check for null.
review-, but I think you're getting close now to something we should land. You should fix at least (A) and (C) above. And (B) is also a good idea and probably easy to do. No real need to do (D) or (E), though.
Created attachment 57251 [details]
patch 5
(In reply to comment #17) > (From update of attachment 57137 [details]) > It’s fine to have a comparison function that just compares the UChars in a string and does no case folding or normalization. > > A) But I don't see any reason to make this comparison function be a class member. Neither a static member nor a non-static member. It should just be a free function, the way the function in UString.h already is. We can overload it to take more types. This patch instead adds a member function to StringImpl, but there's no reason for it to be a member function. It should just be a free function. done > > I think codePointCompare is an OK name for the function. Better than just "compare" since it might convince people not to use it in cases where they shouldn’t. done > > B) We should rename the one that works on UString too, to be consistent. I believe it's only called in exactly one place in JavaScriptCore. done > > C) The comparison function for UString that just calls through to the codePointCompare that takes StringImpl* should be inlined. It's in a hot code path and I don't want an extra level of function call added. done > > Now that I see the one place where codePointCompare is called inside JavaScriptCore, I realize that it's a code path where the strings can never be null strings. That has two implications: > > D) We don't have to treat null strings and empty strings as equal. > E) If we wanted to we could make things slightly more efficient by making that call site not even check for null. Because of not being a committer, although I am always watching the bots I may not be able to react in time in an eventual problem with any of the platforms. That is why I decided not to make D and E for now. I am currently waiting for a committer nomination approval. :) > > review-, but I think you're getting close now to something we should land. You should fix at least (A) and (C) above. And (B) is also a good idea and probably easy to do. No real need to do (D) or (E), though. thanks for the review so far. :) Comment on attachment 57251 [details] patch 5 > + friend int codePointCompare(const UString&, const UString&); There's no need for this function to be a friend. > + inline int codePointCompare(const UString& s1, const UString& s2) > + { > + return codePointCompare(s1.m_rep.get(), s2.m_rep.get()); > + } This can use the rep() function instead of getting at m_rep directly. r=me if you remove the unnecessary friendship Created attachment 57274 [details]
patch 6
Comment on attachment 57274 [details] patch 6 > +int codePointCompare(const String& a, const String& b); Damn, forgot to ask you to remove these argument names. We can always do that later. (In reply to comment #22) > (From update of attachment 57274 [details]) > > +int codePointCompare(const String& a, const String& b); > > Damn, forgot to ask you to remove these argument names. We can always do that later. I can do it now. just one minute Created attachment 57276 [details]
patch 7
Comment on attachment 57276 [details] patch 7 Clearing flags on attachment: 57276 Committed r60332: <http://trac.webkit.org/changeset/60332> All reviewed patches have been landed. Closing bug. Luiz, this is currently scheduled for integration into the 2.0 release branch. Is this intentional? The patch doesn't apply - due to renamings - so I'd need your help if we really want to integrate it :) (In reply to comment #27) > Luiz, this is currently scheduled for integration into the 2.0 release branch. Is this intentional? > > The patch doesn't apply - due to renamings - so I'd need your help if we really want to integrate it :) Simon, bugs 39220 and 37205 depend on this one. I expect bug 39220 to get in soon. 37205 depends on how specification evolves. I think that we could wait until 37205 get in and then integrate them all. (In reply to comment #28) > (In reply to comment #27) > > Luiz, this is currently scheduled for integration into the 2.0 release branch. Is this intentional? > > > > The patch doesn't apply - due to renamings - so I'd need your help if we really want to integrate it :) > > Simon, > > bugs 39220 and 37205 depend on this one. I expect bug 39220 to get in soon. 37205 depends on how specification evolves. > > I think that we could wait until 37205 get in and then integrate them all. I see, that makes sense. Could you make a version of the string patch that applies against qtwebkit-2.0? It should be straightforward, most of the string classes simply moved from WebCore/platform/text into JavaScriptCore/wtf/text but the methods and class names remained. Luiz: ping. I need your help with a patch for the release branch :) |