WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
39701
UTF-16 code points compare() for String objects
https://bugs.webkit.org/show_bug.cgi?id=39701
Summary
UTF-16 code points compare() for String objects
Luiz Agostini
Reported
2010-05-25 17:16:38 PDT
There is no good option for ascii String comparison that can be used for lexicographical sort in webkit.
Attachments
patch 1
(5.84 KB, patch)
2010-05-25 17:26 PDT
,
Luiz Agostini
darin
: review-
Details
Formatted Diff
Diff
patch 2
(5.81 KB, patch)
2010-05-25 18:15 PDT
,
Luiz Agostini
no flags
Details
Formatted Diff
Diff
patch 3
(6.50 KB, patch)
2010-05-25 19:50 PDT
,
Luiz Agostini
no flags
Details
Formatted Diff
Diff
patch 4
(6.14 KB, patch)
2010-05-26 14:26 PDT
,
Luiz Agostini
darin
: review-
Details
Formatted Diff
Diff
patch 5
(7.44 KB, patch)
2010-05-27 09:05 PDT
,
Luiz Agostini
darin
: review-
darin
: commit-queue-
Details
Formatted Diff
Diff
patch 6
(7.13 KB, patch)
2010-05-27 14:05 PDT
,
Luiz Agostini
darin
: review+
Details
Formatted Diff
Diff
patch 7
(7.13 KB, patch)
2010-05-27 14:24 PDT
,
Luiz Agostini
no flags
Details
Formatted Diff
Diff
Show Obsolete
(6)
View All
Add attachment
proposed patch, testcase, etc.
Luiz Agostini
Comment 1
2010-05-25 17:26:52 PDT
Created
attachment 57056
[details]
patch 1
Darin Adler
Comment 2
2010-05-25 17:28:37 PDT
Comment on
attachment 57056
[details]
patch 1 This is not an ASCII-only comparison. It's comparing UTF-16 code points, so asciiOnlyCompare is not the right name for it. ASCII doesn't enter into it. Could you explain your goals here? Where do you want to use this function?
WebKit Review Bot
Comment 3
2010-05-25 17:29:05 PDT
Attachment 57056
[details]
did not pass style-queue: Failed to run "['WebKitTools/Scripts/check-webkit-style', '--no-squash']" exit_code: 1 JavaScriptCore/wtf/text/StringImpl.cpp:486: l is incorrectly named. Don't use the single letter 'l' as an identifier name. [readability/naming] [4] Total errors found: 1 in 6 files If any of these errors are false positives, please file a bug against check-webkit-style.
Luiz Agostini
Comment 4
2010-05-25 17:39:31 PDT
(In reply to
comment #2
)
> (From update of
attachment 57056
[details]
) > This is not an ASCII-only comparison. It's comparing UTF-16 code points, so asciiOnlyCompare is not the right name for it. ASCII doesn't enter into it. Could you explain your goals here? Where do you want to use this function?
I need to sort a vector of String objects in lexicographical order. For that I plan to use std::sort. The problem is that I did not find a way to do it easily. Looking for sorts in webkit I found this return strcmp(parentDirectory().utf8().data(), compareTo.parentDirectory().utf8().data()); in PluginPackage::compare. This seems to be very expensive because it does a lot of copy/conversion to do the comparison. Is there any good option for that? The name is not good. I chose it just because the method do not consider localizations and is ok for ascii.
Luiz Agostini
Comment 5
2010-05-25 18:15:04 PDT
Created
attachment 57059
[details]
patch 2
Darin Adler
Comment 6
2010-05-25 18:34:29 PDT
Comment on
attachment 57059
[details]
patch 2 Seems fine to move this code where it can be used on a WTF string.
> int compare(const UString& s1, const UString& s2)
The old version of this considers a null string equal to an empty string. The new version considers a null string to be less than an empty string. Are you sure it's OK to change behavior in that way? Do we really need a UString::compare member function? Why is that better than what we already had?
> + static int compare(const StringImpl*, const StringImpl*);
Is there some reason this is better as a String and StringImpl static member instead of being a free function? I think a free function would be better.
Luiz Agostini
Comment 7
2010-05-25 19:39:21 PDT
(In reply to
comment #6
)
> (From update of
attachment 57059
[details]
) > Seems fine to move this code where it can be used on a WTF string. > > > int compare(const UString& s1, const UString& s2) > > The old version of this considers a null string equal to an empty string. The new version considers a null string to be less than an empty string. Are you sure it's OK to change behavior in that way?
no. do not want to change any behavior.
> > Do we really need a UString::compare member function? Why is that better than what we already had? > > > + static int compare(const StringImpl*, const StringImpl*); > > Is there some reason this is better as a String and StringImpl static member instead of being a free function? I think a free function would be better.
I am concerned about overloads because of all these string types and all those not explicit constructors. I think that some scope delimiters would help. :)
Luiz Agostini
Comment 8
2010-05-25 19:50:18 PDT
Created
attachment 57065
[details]
patch 3
Alexey Proskuryakov
Comment 9
2010-05-25 19:57:56 PDT
I still don't understand what this function is for. You said that you needed lexicographical order, but that's not what it implements. Could you perhaps want Collator.h? If it's to be exposed, it needs a name that makes it clear that the function does something that's almost never appropriate.
Luiz Agostini
Comment 10
2010-05-25 20:33:28 PDT
(In reply to
comment #9
)
> I still don't understand what this function is for. You said that you needed lexicographical order, but that's not what it implements. Could you perhaps want Collator.h? > > If it's to be exposed, it needs a name that makes it clear that the function does something that's almost never appropriate.
I want to compare lists of strings. I will sort then and then compare each string. Just to identify lists that contain exactly the same strings any comparison is ok. It happens that the strings I will manipulate are compose by ascii characters only. I am considering that this proposed code point comparison, for ascii characters only in english, would be lexicographical. Collator.h may be used.
Luiz Agostini
Comment 11
2010-05-25 21:01:29 PDT
(In reply to
comment #10
)
> (In reply to
comment #9
) > > I still don't understand what this function is for. You said that you needed lexicographical order, but that's not what it implements. Could you perhaps want Collator.h? > > > > If it's to be exposed, it needs a name that makes it clear that the function does something that's almost never appropriate. > > I want to compare lists of strings. I will sort then and then compare each string. Just to identify lists that contain exactly the same strings any comparison is ok. > > It happens that the strings I will manipulate are compose by ascii characters only. I am considering that this proposed code point comparison, for ascii characters only in english, would be lexicographical. > > Collator.h may be used.
Actual use cases is serialisation of media queries (
bug 39220
).
Alexey Proskuryakov
Comment 12
2010-05-26 00:13:46 PDT
I see. Comparing code points is generally not sufficient even for equality checking (because the same character can be encoded in composed and decomposed forms). This will of course work for ASCII, but it's important to not make this function look like general purpose one.
> Actual use cases is serialisation of media queries (
bug 39220
).
Are these case sensitive?
Kenneth Rohde Christiansen
Comment 13
2010-05-26 05:32:49 PDT
(In reply to
comment #12
)
> I see. Comparing code points is generally not sufficient even for equality checking (because the same character can be encoded in composed and decomposed forms). This will of course work for ASCII, but it's important to not make this function look like general purpose one. > > > Actual use cases is serialisation of media queries (
bug 39220
). > > Are these case sensitive?
They are supposed to be written in lowercase, but if you write "NOT" it will be seen as "not". I am not sure if it is allowed to write (COLOR) instead of (color), but that is at least not handled by the parser right now.
Luiz Agostini
Comment 14
2010-05-26 13:41:29 PDT
(In reply to
comment #13
)
> (In reply to
comment #12
) > > I see. Comparing code points is generally not sufficient even for equality checking (because the same character can be encoded in composed and decomposed forms). This will of course work for ASCII, but it's important to not make this function look like general purpose one. > > > > > Actual use cases is serialisation of media queries (
bug 39220
). > > > > Are these case sensitive? > > They are supposed to be written in lowercase, but if you write "NOT" it will be seen as "not". I am not sure if it is allowed to write (COLOR) instead of (color), but that is at least not handled by the parser right now.
The comparison if the strings is case sensitive. (
http://dev.w3.org/csswg/cssom/#comparing-media-queries
)
Luiz Agostini
Comment 15
2010-05-26 14:26:35 PDT
Created
attachment 57137
[details]
patch 4
Luiz Agostini
Comment 16
2010-05-26 17:08:21 PDT
Darin, could you take a look if you have time?
Darin Adler
Comment 17
2010-05-26 17:29:36 PDT
Comment on
attachment 57137
[details]
patch 4 It’s fine to have a comparison function that just compares the UChars in a string and does no case folding or normalization. A) But I don't see any reason to make this comparison function be a class member. Neither a static member nor a non-static member. It should just be a free function, the way the function in UString.h already is. We can overload it to take more types. This patch instead adds a member function to StringImpl, but there's no reason for it to be a member function. It should just be a free function. I think codePointCompare is an OK name for the function. Better than just "compare" since it might convince people not to use it in cases where they shouldn’t. B) We should rename the one that works on UString too, to be consistent. I believe it's only called in exactly one place in JavaScriptCore. C) The comparison function for UString that just calls through to the codePointCompare that takes StringImpl* should be inlined. It's in a hot code path and I don't want an extra level of function call added. Now that I see the one place where codePointCompare is called inside JavaScriptCore, I realize that it's a code path where the strings can never be null strings. That has two implications: D) We don't have to treat null strings and empty strings as equal. E) If we wanted to we could make things slightly more efficient by making that call site not even check for null. review-, but I think you're getting close now to something we should land. You should fix at least (A) and (C) above. And (B) is also a good idea and probably easy to do. No real need to do (D) or (E), though.
Luiz Agostini
Comment 18
2010-05-27 09:05:42 PDT
Created
attachment 57251
[details]
patch 5
Luiz Agostini
Comment 19
2010-05-27 09:24:04 PDT
(In reply to
comment #17
)
> (From update of
attachment 57137
[details]
) > It’s fine to have a comparison function that just compares the UChars in a string and does no case folding or normalization. > > A) But I don't see any reason to make this comparison function be a class member. Neither a static member nor a non-static member. It should just be a free function, the way the function in UString.h already is. We can overload it to take more types. This patch instead adds a member function to StringImpl, but there's no reason for it to be a member function. It should just be a free function.
done
> > I think codePointCompare is an OK name for the function. Better than just "compare" since it might convince people not to use it in cases where they shouldn’t.
done
> > B) We should rename the one that works on UString too, to be consistent. I believe it's only called in exactly one place in JavaScriptCore.
done
> > C) The comparison function for UString that just calls through to the codePointCompare that takes StringImpl* should be inlined. It's in a hot code path and I don't want an extra level of function call added.
done
> > Now that I see the one place where codePointCompare is called inside JavaScriptCore, I realize that it's a code path where the strings can never be null strings. That has two implications: > > D) We don't have to treat null strings and empty strings as equal. > E) If we wanted to we could make things slightly more efficient by making that call site not even check for null.
Because of not being a committer, although I am always watching the bots I may not be able to react in time in an eventual problem with any of the platforms. That is why I decided not to make D and E for now. I am currently waiting for a committer nomination approval. :)
> > review-, but I think you're getting close now to something we should land. You should fix at least (A) and (C) above. And (B) is also a good idea and probably easy to do. No real need to do (D) or (E), though.
thanks for the review so far. :)
Darin Adler
Comment 20
2010-05-27 13:11:49 PDT
Comment on
attachment 57251
[details]
patch 5
> + friend int codePointCompare(const UString&, const UString&);
There's no need for this function to be a friend.
> + inline int codePointCompare(const UString& s1, const UString& s2) > + { > + return codePointCompare(s1.m_rep.get(), s2.m_rep.get()); > + }
This can use the rep() function instead of getting at m_rep directly. r=me if you remove the unnecessary friendship
Luiz Agostini
Comment 21
2010-05-27 14:05:47 PDT
Created
attachment 57274
[details]
patch 6
Darin Adler
Comment 22
2010-05-27 14:08:58 PDT
Comment on
attachment 57274
[details]
patch 6
> +int codePointCompare(const String& a, const String& b);
Damn, forgot to ask you to remove these argument names. We can always do that later.
Luiz Agostini
Comment 23
2010-05-27 14:18:46 PDT
(In reply to
comment #22
)
> (From update of
attachment 57274
[details]
) > > +int codePointCompare(const String& a, const String& b); > > Damn, forgot to ask you to remove these argument names. We can always do that later.
I can do it now. just one minute
Luiz Agostini
Comment 24
2010-05-27 14:24:35 PDT
Created
attachment 57276
[details]
patch 7
WebKit Commit Bot
Comment 25
2010-05-27 16:44:06 PDT
Comment on
attachment 57276
[details]
patch 7 Clearing flags on attachment: 57276 Committed
r60332
: <
http://trac.webkit.org/changeset/60332
>
WebKit Commit Bot
Comment 26
2010-05-27 16:44:13 PDT
All reviewed patches have been landed. Closing bug.
Simon Hausmann
Comment 27
2010-05-31 05:44:11 PDT
Luiz, this is currently scheduled for integration into the 2.0 release branch. Is this intentional? The patch doesn't apply - due to renamings - so I'd need your help if we really want to integrate it :)
Luiz Agostini
Comment 28
2010-05-31 06:47:48 PDT
(In reply to
comment #27
)
> Luiz, this is currently scheduled for integration into the 2.0 release branch. Is this intentional? > > The patch doesn't apply - due to renamings - so I'd need your help if we really want to integrate it :)
Simon, bugs 39220 and 37205 depend on this one. I expect
bug 39220
to get in soon. 37205 depends on how specification evolves. I think that we could wait until 37205 get in and then integrate them all.
Simon Hausmann
Comment 29
2010-05-31 07:46:00 PDT
(In reply to
comment #28
)
> (In reply to
comment #27
) > > Luiz, this is currently scheduled for integration into the 2.0 release branch. Is this intentional? > > > > The patch doesn't apply - due to renamings - so I'd need your help if we really want to integrate it :) > > Simon, > > bugs 39220 and 37205 depend on this one. I expect
bug 39220
to get in soon. 37205 depends on how specification evolves. > > I think that we could wait until 37205 get in and then integrate them all.
I see, that makes sense. Could you make a version of the string patch that applies against qtwebkit-2.0? It should be straightforward, most of the string classes simply moved from WebCore/platform/text into JavaScriptCore/wtf/text but the methods and class names remained.
Simon Hausmann
Comment 30
2010-07-01 01:06:37 PDT
Luiz: ping. I need your help with a patch for the release branch :)
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug