Bug 147604 - [INTL] Implement Collator Compare Functions
Summary: [INTL] Implement Collator Compare Functions
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks: 90906
  Show dependency treegraph
 
Reported: 2015-08-03 17:18 PDT by Andy VanWagoner
Modified: 2015-12-17 17:36 PST (History)
15 users (show)

See Also:


Attachments
Patch (26.39 KB, patch)
2015-10-22 03:51 PDT, Sukolsak Sakshuwong
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ews101 for mac-mavericks (646.85 KB, application/zip)
2015-10-22 04:24 PDT, Build Bot
no flags Details
Archive of layout-test-results from ews105 for mac-mavericks-wk2 (687.44 KB, application/zip)
2015-10-22 04:29 PDT, Build Bot
no flags Details
Archive of layout-test-results from ews112 for mac-yosemite (775.51 KB, application/zip)
2015-10-22 04:46 PDT, Build Bot
no flags Details
Patch (25.90 KB, patch)
2015-10-22 07:24 PDT, Sukolsak Sakshuwong
no flags Details | Formatted Diff | Diff
Patch (56.75 KB, patch)
2015-11-06 18:12 PST, Sukolsak Sakshuwong
no flags Details | Formatted Diff | Diff
Patch (57.29 KB, patch)
2015-11-08 15:25 PST, Sukolsak Sakshuwong
no flags Details | Formatted Diff | Diff
Patch (63.91 KB, patch)
2015-12-15 06:12 PST, Sukolsak Sakshuwong
no flags Details | Formatted Diff | Diff
Patch (65.15 KB, patch)
2015-12-15 12:12 PST, Sukolsak Sakshuwong
no flags Details | Formatted Diff | Diff
Patch for landing (65.97 KB, patch)
2015-12-16 11:04 PST, Sukolsak Sakshuwong
sukolsak: commit-queue-
Details | Formatted Diff | Diff
Patch (66.17 KB, patch)
2015-12-17 00:54 PST, Sukolsak Sakshuwong
no flags Details | Formatted Diff | Diff
Patch (65.99 KB, patch)
2015-12-17 04:12 PST, Sukolsak Sakshuwong
no flags Details | Formatted Diff | Diff
Patch for landing (65.59 KB, patch)
2015-12-17 16:35 PST, Sukolsak Sakshuwong
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Andy VanWagoner 2015-08-03 17:18:47 PDT
Implement ECMA-402 2.0 10.3.4 Collator Compare Functions
http://ecma-international.org/publications/standards/Ecma-402.htm

This makes the compare function given by Intl.Collator.prototype.compare locale-aware and sensitive to the passed in options.
Comment 1 Sukolsak Sakshuwong 2015-10-22 03:51:31 PDT
Created attachment 263817 [details]
Patch
Comment 2 Build Bot 2015-10-22 04:24:55 PDT
Comment on attachment 263817 [details]
Patch

Attachment 263817 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.webkit.org/results/321178

New failing tests:
js/intl-collator.html
Comment 3 Build Bot 2015-10-22 04:24:57 PDT
Created attachment 263820 [details]
Archive of layout-test-results from ews101 for mac-mavericks

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews101  Port: mac-mavericks  Platform: Mac OS X 10.9.5
Comment 4 Build Bot 2015-10-22 04:29:27 PDT
Comment on attachment 263817 [details]
Patch

Attachment 263817 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.webkit.org/results/321183

New failing tests:
js/intl-collator.html
Comment 5 Build Bot 2015-10-22 04:29:29 PDT
Created attachment 263821 [details]
Archive of layout-test-results from ews105 for mac-mavericks-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews105  Port: mac-mavericks-wk2  Platform: Mac OS X 10.9.5
Comment 6 Build Bot 2015-10-22 04:46:04 PDT
Comment on attachment 263817 [details]
Patch

Attachment 263817 [details] did not pass mac-debug-ews (mac):
Output: http://webkit-queues.webkit.org/results/321210

New failing tests:
js/intl-collator.html
Comment 7 Build Bot 2015-10-22 04:46:07 PDT
Created attachment 263823 [details]
Archive of layout-test-results from ews112 for mac-yosemite

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews112  Port: mac-yosemite  Platform: Mac OS X 10.10.5
Comment 8 Sukolsak Sakshuwong 2015-10-22 07:24:17 PDT
Created attachment 263825 [details]
Patch

Add <meta charset="utf-8"> to LayoutTests/js/intl-collator.html
Comment 9 Darin Adler 2015-10-31 15:29:45 PDT
Comment on attachment 263825 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=263825&action=review

Looks OK. I see a lot of room for refinement and improvement.

> Source/JavaScriptCore/runtime/IntlCollator.h:61
> +    bool initializedCollator() const { return m_initializedCollator; }

The naming here is unfortunate. This sounds like a function that returns a collator that has been initialized.

> Source/JavaScriptCore/runtime/IntlCollator.h:62
> +    void setInitializedCollator(bool initializedCollator) { m_initializedCollator = initializedCollator; }

The naming here is unfortunate. This sounds like a function that takes a collator as an argument, not a function that takes a boolean.

Using public functions for this is taking the standards language too literally. We should keep this boolean private and write a member function that initializes if needed and returns if not.

> Source/JavaScriptCore/runtime/IntlCollatorConstructor.h:63
> +IntlCollator* initializeCollator(ExecState*, IntlCollator*, JSValue locales, JSValue optionsValue);

The interface to this function is strange. Why does the function both take a collator as an argument and return one. If the function needs to indicate failure, I suggest it return a boolean or error code. The ExecState argument should be ExecState&, not ExecState*, and the collator argument should be IntlCollator&, not IntlCollator*.

> Source/JavaScriptCore/runtime/IntlCollatorPrototype.cpp:90
> +static int compareStrings(IntlCollator* collator, const String& x, const String& y)

This should take an IntlCollator&, not an IntlCollator*.

Also should probably take two StringView arguments, not two const String& arguments.

We’re going to need performance tests for this. Right now this is super-slow and we want a version that has reasonably fast performance.

> Source/JavaScriptCore/runtime/IntlCollatorPrototype.cpp:94
> +    UCollator* icuCollator = ucol_open(collator->locale().utf8().data(), &status);

For acceptable performance, I think it’s important that we not use ucol_open and ucol_close every time. I tried that in code in WebCore and found that everything was much too slow. I think we probably need to put the UCollator in the IntlCollator, not create a new one every time.

Doing all that work to initialize the collator attributes every time we compare a pair of strings is going to result in very poor performance.

We also need to come up with a fast path so we aren’t converting simple strings with all ASCII characters to 16 bit just to call ICU with them. Again, we’ve run into this in the past. We need performance tests for this.

> Source/JavaScriptCore/runtime/IntlCollatorPrototype.cpp:96
> +        return codePointCompare(x, y);

It is really OK to silently do the wrong thing if ucol_open fails?

> Source/JavaScriptCore/runtime/IntlCollatorPrototype.cpp:100
> +    const String& sensitivity = collator->sensitivity();

Is this attribute string itself supposed to be case folding or case sensitive? This is not a good pattern, doing string comparisons to convert the collator’s sensitivity setting into arguments for the ICU collator every time we compare a string.

> Source/JavaScriptCore/runtime/IntlCollatorPrototype.cpp:129
> +        return codePointCompare(x, y);

It is really OK to silently do the wrong thing if one of the ucol_setAttribute calls fail?

> Source/JavaScriptCore/runtime/IntlCollatorPrototype.cpp:134
> +    String x16Bit = x.is8Bit() ? String::make16BitFrom8BitSource(x.characters8(), x.length()) : x;
> +    String y16Bit = y.is8Bit() ? String::make16BitFrom8BitSource(y.characters8(), y.length()) : y;
> +    UCollationResult result = ucol_strcoll(icuCollator, x16Bit.characters16(), x16Bit.length(), y16Bit.characters16(), y16Bit.length());

This is the wrong idiom for making a temporary 16-bit copy of a string. The correct idiom is to use upconvertedCharacters(). It works like this:

   UCollationResult result = ucol_strcoll(icuCollator, StringView(x).upconvertedCharacters(), x.length(), StringView(y).upconvertedCharacters(), y.length());

> Source/JavaScriptCore/runtime/IntlCollatorPrototype.cpp:147
> +    switch (result) {
> +    case UCOL_EQUAL:
> +        return 0;
> +    case UCOL_GREATER:
> +        return 1;
> +    case UCOL_LESS:
> +        return -1;
> +    }
> +    ASSERT_NOT_REACHED();
> +    return 0;

It’s OK to try to use some abstraction here and not assume the values, but it’s really not useful. This named constants are just ICU names for these same values. There is no benefit to the switch. This can and should just be:

    return result;

> Source/JavaScriptCore/runtime/IntlCollatorPrototype.cpp:157
> +    RELEASE_ASSERT(collator);

I don’t see why we would have this assertion in release code.

> Source/JavaScriptCore/runtime/IntlObject.cpp:118
> +    return String(uloc_getDefault()).replace('_', '-');

We should stop copying and pasting the replace we are doing here.
Comment 10 Sukolsak Sakshuwong 2015-11-03 11:33:56 PST
Thanks for the review. I will fix these. I have some questions.

(In reply to comment #9)
> > Source/JavaScriptCore/runtime/IntlCollatorPrototype.cpp:96
> > +        return codePointCompare(x, y);
> 
> It is really OK to silently do the wrong thing if ucol_open fails?

I am not sure. I was wondering about that too. Then I checked StringImpl.cpp and found a few methods that do the wrong thing (returning "*this") when an ICU function fails (e.g. u_strToUpper.) So I did a similar thing. Should we do RELEASE_ASSERT(U_SUCCESS(status)) instead? If we should use codePointCompare, is it OK if I implement codePointCompare for StringViews?

> > Source/JavaScriptCore/runtime/IntlCollatorPrototype.cpp:100
> > +    const String& sensitivity = collator->sensitivity();
> 
> Is this attribute string itself supposed to be case folding or case
> sensitive? This is not a good pattern, doing string comparisons to convert
> the collator’s sensitivity setting into arguments for the ICU collator every
> time we compare a string.

Case sensitive. Should I store an enum class instead and convert it to a string in resolvedOptions()?

> > Source/JavaScriptCore/runtime/IntlCollatorPrototype.cpp:94
> > +    UCollator* icuCollator = ucol_open(collator->locale().utf8().data(), &status);
> 
> For acceptable performance, I think it’s important that we not use ucol_open
> and ucol_close every time. I tried that in code in WebCore and found that
> everything was much too slow. I think we probably need to put the UCollator
> in the IntlCollator, not create a new one every time.
> 
> Doing all that work to initialize the collator attributes every time we
> compare a pair of strings is going to result in very poor performance.
> 
> We also need to come up with a fast path so we aren’t converting simple
> strings with all ASCII characters to 16 bit just to call ICU with them.
> Again, we’ve run into this in the past. We need performance tests for this.

Is it OK if I address these in a separate patch?
Comment 11 Sukolsak Sakshuwong 2015-11-06 18:12:30 PST
Created attachment 264986 [details]
Patch
Comment 12 Darin Adler 2015-11-07 16:26:59 PST
Comment on attachment 264986 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=264986&action=review

> Source/JavaScriptCore/runtime/IntlCollator.cpp:98
> +                String collation(keywordValue);

Are these keyword value strings encoded as ASCII, Latin-1, or UTF-8? If the answer is ASCII or Latin-1, then this code is OK. If the answer is UTF-8, then this line should be:

    String collation = String::fromUTF8(keywordValue);
Comment 13 Sukolsak Sakshuwong 2015-11-07 17:26:31 PST
(In reply to comment #12)
> Comment on attachment 264986 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=264986&action=review
> 
> > Source/JavaScriptCore/runtime/IntlCollator.cpp:98
> > +                String collation(keywordValue);
> 
> Are these keyword value strings encoded as ASCII, Latin-1, or UTF-8?

ASCII. The Unicode Technical Standard #35 says "A Unicode Collation Identifier defines a type of collation (sort order). The valid values are those name attribute values in the type elements of bcp47/collation.xml." According to http://unicode.org/repos/cldr/trunk/common/bcp47/collation.xml, all the valid values are ASCII.
Comment 14 Sukolsak Sakshuwong 2015-11-08 15:25:52 PST
Created attachment 265024 [details]
Patch

Cache UCollator in IntlCollator
Comment 15 Sukolsak Sakshuwong 2015-11-08 15:28:36 PST
(In reply to comment #9)
> Comment on attachment 263825 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=263825&action=review
> 
> Looks OK. I see a lot of room for refinement and improvement.
> 
> > Source/JavaScriptCore/runtime/IntlCollator.h:61
> > +    bool initializedCollator() const { return m_initializedCollator; }
> 
> The naming here is unfortunate. This sounds like a function that returns a
> collator that has been initialized.
> 
> > Source/JavaScriptCore/runtime/IntlCollator.h:62
> > +    void setInitializedCollator(bool initializedCollator) { m_initializedCollator = initializedCollator; }
> 
> The naming here is unfortunate. This sounds like a function that takes a
> collator as an argument, not a function that takes a boolean.
> 
> Using public functions for this is taking the standards language too
> literally. We should keep this boolean private and write a member function
> that initializes if needed and returns if not.
> 
> > Source/JavaScriptCore/runtime/IntlCollatorConstructor.h:63
> > +IntlCollator* initializeCollator(ExecState*, IntlCollator*, JSValue locales, JSValue optionsValue);
> 
> The interface to this function is strange. Why does the function both take a
> collator as an argument and return one. If the function needs to indicate
> failure, I suggest it return a boolean or error code. The ExecState argument
> should be ExecState&, not ExecState*, and the collator argument should be
> IntlCollator&, not IntlCollator*.

I made initializeCollator, resolvedOptions, and compareStrings member functions of IntlCollator. We now don't need to expose the internal slots of IntlCollator objects such as usage, locale, collation, sensitivity, etc.

I also made initializeCollator returns if the collator has already been initialized.

> > Source/JavaScriptCore/runtime/IntlCollatorPrototype.cpp:90
> > +static int compareStrings(IntlCollator* collator, const String& x, const String& y)
> 
> This should take an IntlCollator&, not an IntlCollator*.
> 
> Also should probably take two StringView arguments, not two const String&
> arguments.

Fixed.

> > Source/JavaScriptCore/runtime/IntlCollatorPrototype.cpp:94
> > +    UCollator* icuCollator = ucol_open(collator->locale().utf8().data(), &status);
> 
> For acceptable performance, I think it’s important that we not use ucol_open
> and ucol_close every time. I tried that in code in WebCore and found that
> everything was much too slow. I think we probably need to put the UCollator
> in the IntlCollator, not create a new one every time.
> 
> Doing all that work to initialize the collator attributes every time we
> compare a pair of strings is going to result in very poor performance.
> 
> We also need to come up with a fast path so we aren’t converting simple
> strings with all ASCII characters to 16 bit just to call ICU with them.
> Again, we’ve run into this in the past. We need performance tests for this.

Cached UCollator in IntlCollator and called ucol_close in IntlCollator::~IntlCollator().

> > Source/JavaScriptCore/runtime/IntlCollatorPrototype.cpp:96
> > +        return codePointCompare(x, y);
> 
> It is really OK to silently do the wrong thing if ucol_open fails?

Used RELEASE_ASSERT(U_SUCCESS(status)).

> > Source/JavaScriptCore/runtime/IntlCollatorPrototype.cpp:100
> > +    const String& sensitivity = collator->sensitivity();
> 
> Is this attribute string itself supposed to be case folding or case
> sensitive? This is not a good pattern, doing string comparisons to convert
> the collator’s sensitivity setting into arguments for the ICU collator every
> time we compare a string.

Added enum classes for [[sensitivity]] and [[usage]] and added functions for converting them to strings.

> > Source/JavaScriptCore/runtime/IntlCollatorPrototype.cpp:134
> > +    String x16Bit = x.is8Bit() ? String::make16BitFrom8BitSource(x.characters8(), x.length()) : x;
> > +    String y16Bit = y.is8Bit() ? String::make16BitFrom8BitSource(y.characters8(), y.length()) : y;
> > +    UCollationResult result = ucol_strcoll(icuCollator, x16Bit.characters16(), x16Bit.length(), y16Bit.characters16(), y16Bit.length());
> 
> This is the wrong idiom for making a temporary 16-bit copy of a string. The
> correct idiom is to use upconvertedCharacters().

Fixed.

> > Source/JavaScriptCore/runtime/IntlCollatorPrototype.cpp:147
> > +    switch (result) {
> > +    case UCOL_EQUAL:
> > +        return 0;
> > +    case UCOL_GREATER:
> > +        return 1;
> > +    case UCOL_LESS:
> > +        return -1;
> > +    }
> > +    ASSERT_NOT_REACHED();
> > +    return 0;
> 
> It’s OK to try to use some abstraction here and not assume the values, but
> it’s really not useful. This named constants are just ICU names for these
> same values. There is no benefit to the switch. This can and should just be:
> 
>     return result;

Fixed.

> > Source/JavaScriptCore/runtime/IntlCollatorPrototype.cpp:157
> > +    RELEASE_ASSERT(collator);
> 
> I don’t see why we would have this assertion in release code.

Changed to ASSERT(collator).

> > Source/JavaScriptCore/runtime/IntlObject.cpp:118
> > +    return String(uloc_getDefault()).replace('_', '-');
> 
> We should stop copying and pasting the replace we are doing here.

Added convertICULocaleToBCP47LanguageTag().
Comment 16 Darin Adler 2015-11-17 08:49:34 PST
Comment on attachment 265024 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=265024&action=review

Looks good.

We need some performance tests here. This code is highly likely to be super-slow because the memory allocation; much of it is unnecessary. Performance tests will help us make the right decisions about further optimization.

Code looks generally good but we need a lot more test coverage. I see lots of what might be minor mistakes in the code and the tests could show us if these are OK or if these are actually exposed to the JavaScript code.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:91
> +static Vector<String> sortLocaleData(const String& locale, const String& key)

I don’t like the interface to this function. If “key” can only be “co” or “kn”, then why are we passing a string, and not something more like an enum? It’s not good to have a function that takes a string, but only accepts a limited set of values, and requires the caller to do that checking (since it does ASSERT_NOT_REACHED).

> Source/JavaScriptCore/runtime/IntlCollator.cpp:120
> +                String collation(keywordValue);
> +
> +                // 10.2.3 "The values "standard" and "search" must not be used as elements in any [[sortLocaleData]][locale].co and [[searchLocaleData]][locale].co array."
> +                if (collation == "standard" || collation == "search")
> +                    continue;
> +
> +                // Map keyword values to BCP 47 equivalents.
> +                if (collation == "dictionary")
> +                    collation = ASCIILiteral("dict");
> +                else if (collation == "gb2312han")
> +                    collation = ASCIILiteral("gb2312");
> +                else if (collation == "phonebook")
> +                    collation = ASCIILiteral("phonebk");
> +                else if (collation == "traditional")
> +                    collation = ASCIILiteral("trad");
> +
> +                keyLocaleData.append(collation);

Would be nice to optimize this to only create a String after determining that it’s what we want to append. We can use strcmp on keywordValue instead of converting to a String and using ==.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:126
> +        keyLocaleData.append(ASCIILiteral("false"));
> +        keyLocaleData.append(ASCIILiteral("true"));

Should use reserveInitialCapacity and uncheckedAppend.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:132
> +static Vector<String> searchLocaleData(const String&, const String& key)

I don’t like the interface to this function. If “key” can only be “co”, “kn”, or “sensitivity”, then why are we passing a string, and not something more like an enum? It’s not good to have a function that takes a string, but only accepts a limited set of values, and requires the caller to do that checking (since it does ASSERT_NOT_REACHED).

> Source/JavaScriptCore/runtime/IntlCollator.cpp:138
> +        keyLocaleData.append(String());

Should use reserveInitialCapacity and uncheckedAppend.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:141
> +        keyLocaleData.append(ASCIILiteral("false"));
> +        keyLocaleData.append(ASCIILiteral("true"));

Should use reserveInitialCapacity and uncheckedAppend.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:144
> +        keyLocaleData.append("variant");

Should use reserveInitialCapacity and uncheckedAppend.

Missing ASCIILiteral here.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:150
> +void IntlCollator::initializeCollator(ExecState& exec, JSValue locales, JSValue optionsValue)

In new code we should name the argument “state” rather than “exec”.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:155
> +    m_initializedCollator = true;

Why set this here instead of at the end of the function? The pseudo code only sets it once we are past all the possible exceptions. Do we have test cases covering this?

> Source/JavaScriptCore/runtime/IntlCollator.cpp:157
> +    // 1. If collator has an [[initializedIntlObject]] internal slot with value true, throw a TypeError exception.

Where is that step? Seems missing.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:180
> +    const HashSet<String> usages({ ASCIILiteral("sort"), ASCIILiteral("search") });

Does this really need to be a HashSet? It’s quite inefficient to create and destroy a HashSet every time through this function. And if the set has a small number of entries, HashSet is not as efficient as, say, a sorted list with binary search, or even a linear search through an array. We can write the function so it takes a std::array or a std::initializer_list and make it much more efficient.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:181
> +    String usageString = getIntlStringOption(&exec, options, exec.vm().propertyNames->usage, usages, "usage must be either \"sort\" or \"search\"", ASCIILiteral("sort"));

WebKit coding style does not use “get” in the names of functions like this one.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:207
> +    const HashSet<String> matchers({ ASCIILiteral("lookup"), ASCIILiteral("best fit") });

Same comment about the use of HashSet.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:237
> +        const HashSet<String> caseFirsts({ ASCIILiteral("upper"), ASCIILiteral("lower"), ASCIILiteral("false") });

Same comment about the use of HashSet.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:285
> +    const HashSet<String> sensitivities({ ASCIILiteral("base"), ASCIILiteral("accent"), ASCIILiteral("case"), ASCIILiteral("variant") });

Same comment about the use of HashSet.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:337
> +        m_collator = ucol_open(m_locale.utf8().data(), &status);

Is it safe to do this if m_initializedCollator is false? We need test cases that cover this.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:338
> +        RELEASE_ASSERT(U_SUCCESS(status));

Why RELEASE_ASSERT here? Seems overkill, but maybe I am missing something.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:342
> +        switch (m_sensitivity) {

The value of m_sensitivity could be uninitialized here. What is this function supposed to do when initializedCollator is false?

> Source/JavaScriptCore/runtime/IntlCollator.cpp:362
> +        UColAttributeValue numericCollation = m_numeric ? UCOL_ON : UCOL_OFF;

The value of m_numeric could be uninitialized here. What is this function supposed to do when initializedCollator is false?

> Source/JavaScriptCore/runtime/IntlCollator.cpp:366
> +        // FIXME: Setting UCOL_ALTERNATE_HANDLING to UCOL_SHIFTED causes punctuation and whitespace to be
> +        // ignored. There is currently no way to ignore only punctuation.

We need to construct test cases showing this problem, rather than just a FIXME.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:367
> +        UColAttributeValue alternateHandling = m_ignorePunctuation ? UCOL_SHIFTED : UCOL_DEFAULT;

The value of m_ignorePunctuation could be uninitialized here. What is this function supposed to do when initializedCollator is false?

> Source/JavaScriptCore/runtime/IntlCollator.cpp:373
> +        RELEASE_ASSERT(U_SUCCESS(status));

Why RELEASE_ASSERT here? Why is a single RELEASE_ASSERT sufficient?

> Source/JavaScriptCore/runtime/IntlCollator.cpp:375
> +    return ucol_strcoll(m_collator, x.upconvertedCharacters(), x.length(), y.upconvertedCharacters(), y.length());

If the strings are entirely ASCII, we could use ucol_strcollUTF8. But since it’s hard to efficiently check if a string is UTF-8, the right thing to do to optimize is probably to use ucol_strcollIter, and write a UCharIterator that understands how to iterate a StringView.

An example of how to do this with ucol.h can be seen in the CollatorICU.cpp file. The Collator::collate function does all of this.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:378
> +const char* IntlCollator::getUsageString(Usage usage)

WebKit coding style does not use “get” in the names of functions like this one.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:390
> +const char* IntlCollator::getSensitivityString(Sensitivity sensitivity)

WebKit coding style does not use “get” in the names of functions like this one.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:406
> +JSObject* IntlCollator::resolvedOptions(ExecState& exec)

In new code we should call this argument “state” rather than “exec”.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:418
> +    options->putDirect(vm, vm.propertyNames->locale, jsString(&exec, m_locale));

Will this do the right thing when m_locale is null? We need test cases that cover this, including all the early exit cases that can happen in initializeCollator.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:419
> +    options->putDirect(vm, vm.propertyNames->usage, jsString(&exec, ASCIILiteral(getUsageString(m_usage))));

The value of m_usage could be uninitialized here, so we could actually hit the ASSERT_NOT_REACHED. We need test cases that cover this, including all the early exit cases that can happen in initializeCollator, and I suggest initializing this data member in the class definition in the header.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:420
> +    options->putDirect(vm, vm.propertyNames->sensitivity, jsString(&exec, ASCIILiteral(getSensitivityString(m_sensitivity))));

The value of m_sensitivity could be uninitialized here, so we could actually hit the ASSERT_NOT_REACHED. We need test cases that cover this, including all the early exit cases that can happen in initializeCollator, and I suggest initializing this data member in the class definition in the header.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:421
> +    options->putDirect(vm, vm.propertyNames->ignorePunctuation, jsBoolean(m_ignorePunctuation));

The value of m_ignorePunctuation could be uninitialized here. We need test cases that cover this, including all the early exit cases that can happen in initializeCollator, and I suggest initializing this data member in the class definition in the header.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:422
> +    options->putDirect(vm, vm.propertyNames->collation, jsString(&exec, m_collation));

Will this do the right thing when m_collation is null? We need test cases that cover this, including all the early exit cases that can happen in initializeCollator.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:423
> +    options->putDirect(vm, vm.propertyNames->numeric, jsBoolean(m_numeric));

The value of m_numeric could be uninitialized here. We need test cases that cover this, including all the early exit cases that can happen in initializeCollator, and I suggest initializing this data member in the class definition in the header.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:429
> +    m_boundCompare.set(vm, this, format);

Do we have test cases covering what happens when we pass in an object of the wrong type?

> Source/JavaScriptCore/runtime/IntlCollatorPrototype.cpp:86
> +static EncodedJSValue JSC_HOST_CALL IntlCollatorFuncCompare(ExecState* exec)

In new code this argument should be named “state” rather than “exec”.
Comment 17 Sukolsak Sakshuwong 2015-12-15 06:12:30 PST
Created attachment 267366 [details]
Patch
Comment 18 Sukolsak Sakshuwong 2015-12-15 06:27:12 PST
Thanks for the review.

(In reply to comment #16)
> Comment on attachment 265024 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=265024&action=review
> 
> Looks good.
> 
> We need some performance tests here. This code is highly likely to be
> super-slow because the memory allocation; much of it is unnecessary.
> Performance tests will help us make the right decisions about further
> optimization.

I have removed some unnecessary memory allocation in this patch. Could you please let me know where is the right place to add performance tests for this? Is it JavaScriptCore/tests/stress?

> > Source/JavaScriptCore/runtime/IntlCollator.cpp:91
> > +static Vector<String> sortLocaleData(const String& locale, const String& key)
> 
> I don’t like the interface to this function. If “key” can only be “co” or
> “kn”, then why are we passing a string, and not something more like an enum?
> It’s not good to have a function that takes a string, but only accepts a
> limited set of values, and requires the caller to do that checking (since it
> does ASSERT_NOT_REACHED).

Fixed by using array indices. I'm not sure if it's the best way to do it. I tried using enums but found the code too ugly, because resolveLocale() needs to work with other classes. So I had to cast enum to int back and forth and pass a function for converting enums to strings to resolveLocale().

> > Source/JavaScriptCore/runtime/IntlCollator.cpp:120
> > +                String collation(keywordValue);
> > +
> > +                // 10.2.3 "The values "standard" and "search" must not be used as elements in any [[sortLocaleData]][locale].co and [[searchLocaleData]][locale].co array."
> > +                if (collation == "standard" || collation == "search")
> > +                    continue;
> > +
> > +                // Map keyword values to BCP 47 equivalents.
> > +                if (collation == "dictionary")
> > +                    collation = ASCIILiteral("dict");
> > +                else if (collation == "gb2312han")
> > +                    collation = ASCIILiteral("gb2312");
> > +                else if (collation == "phonebook")
> > +                    collation = ASCIILiteral("phonebk");
> > +                else if (collation == "traditional")
> > +                    collation = ASCIILiteral("trad");
> > +
> > +                keyLocaleData.append(collation);
> 
> Would be nice to optimize this to only create a String after determining
> that it’s what we want to append. We can use strcmp on keywordValue instead
> of converting to a String and using ==.

Done.

> > Source/JavaScriptCore/runtime/IntlCollator.cpp:126
> > +        keyLocaleData.append(ASCIILiteral("false"));
> > +        keyLocaleData.append(ASCIILiteral("true"));
> 
> Should use reserveInitialCapacity and uncheckedAppend.

> > Source/JavaScriptCore/runtime/IntlCollator.cpp:138
> > +        keyLocaleData.append(String());
> 
> Should use reserveInitialCapacity and uncheckedAppend.

Done.

> > Source/JavaScriptCore/runtime/IntlCollator.cpp:150
> > +void IntlCollator::initializeCollator(ExecState& exec, JSValue locales, JSValue optionsValue)
> 
> In new code we should name the argument “state” rather than “exec”.

> > Source/JavaScriptCore/runtime/IntlCollator.cpp:181
> > +    String usageString = getIntlStringOption(&exec, options, exec.vm().propertyNames->usage, usages, "usage must be either \"sort\" or \"search\"", ASCIILiteral("sort"));
> 
> WebKit coding style does not use “get” in the names of functions like this
> one.

Fixed in Bug 151491.

> > Source/JavaScriptCore/runtime/IntlCollator.cpp:155
> > +    m_initializedCollator = true;
> 
> Why set this here instead of at the end of the function? The pseudo code
> only sets it once we are past all the possible exceptions. Do we have test
> cases covering this?

Fixed. There are test cases covering this.

> > Source/JavaScriptCore/runtime/IntlCollator.cpp:157
> > +    // 1. If collator has an [[initializedIntlObject]] internal slot with value true, throw a TypeError exception.
> 
> Where is that step? Seems missing.

I don't implement it because I don't think the condition "collator.[[initializedIntlObject]] == true" can ever be true. According to the spec, this is the only place that calls InitializeCollator():

    "10.1.2 Intl.Collator([ locales [, options]])
    
    When the Intl.Collator function is called with optional arguments locales and options the following steps are taken:
    
    1. If NewTarget is undefined, let newTarget be the active function object, else let newTarget be NewTarget.
    2. Let collator be OrdinaryCreateFromConstructor(newTarget, %CollatorPrototype%).
    3. ReturnIfAbrupt(collator).
    4. Return InitializeCollator(collator, locales, options)."

It seems that we always call InitializeCollator() with a new object, unless I misunderstand how OrdinaryCreateFromConstructor works. So, the object should never have the internal slot "initializedIntlObject" set.

> > Source/JavaScriptCore/runtime/IntlCollator.cpp:180
> > +    const HashSet<String> usages({ ASCIILiteral("sort"), ASCIILiteral("search") });
> 
> Does this really need to be a HashSet? It’s quite inefficient to create and
> destroy a HashSet every time through this function. And if the set has a
> small number of entries, HashSet is not as efficient as, say, a sorted list
> with binary search, or even a linear search through an array. We can write
> the function so it takes a std::array or a std::initializer_list and make it
> much more efficient.

Fixed by using std::initializer_list.

> > Source/JavaScriptCore/runtime/IntlCollator.cpp:337
> > +        m_collator = ucol_open(m_locale.utf8().data(), &status);
> 
> Is it safe to do this if m_initializedCollator is false? We need test cases
> that cover this.

m_initializedCollator can never be false if we reach this point. I have made it a bit more explicit in the new patch.

> > Source/JavaScriptCore/runtime/IntlCollator.cpp:338
> > +        RELEASE_ASSERT(U_SUCCESS(status));
> 
> Why RELEASE_ASSERT here? Seems overkill, but maybe I am missing something.

Fixed by throwing an exception instead.

> > Source/JavaScriptCore/runtime/IntlCollator.cpp:342
> > +        switch (m_sensitivity) {
> 
> The value of m_sensitivity could be uninitialized here. What is this
> function supposed to do when initializedCollator is false?
> 
> > Source/JavaScriptCore/runtime/IntlCollator.cpp:362
> > +        UColAttributeValue numericCollation = m_numeric ? UCOL_ON : UCOL_OFF;
> 
> The value of m_numeric could be uninitialized here. What is this function
> supposed to do when initializedCollator is false?
> 

> > Source/JavaScriptCore/runtime/IntlCollator.cpp:418
> > +    options->putDirect(vm, vm.propertyNames->locale, jsString(&exec, m_locale));
> 
> Will this do the right thing when m_locale is null? We need test cases that
> cover this, including all the early exit cases that can happen in
> initializeCollator.
> 
> > Source/JavaScriptCore/runtime/IntlCollator.cpp:419
> > +    options->putDirect(vm, vm.propertyNames->usage, jsString(&exec, ASCIILiteral(getUsageString(m_usage))));
> 
> The value of m_usage could be uninitialized here, so we could actually hit
> the ASSERT_NOT_REACHED. We need test cases that cover this, including all
> the early exit cases that can happen in initializeCollator, and I suggest
> initializing this data member in the class definition in the header.

There are two ways to get a Collator object:

1. Create a Collator object with "new Intl.Collator(...)" or "Intl.Collator(...)". initializeCollator() is called when the object is created. If initializeCollator() succeeds, all these values are initialized. If an exception is thrown, the Collator object is never returned to the user. So it's not possible to read uninitialized values of Collator objects that are created this way.

2. Use Intl.Collator.prototype. The spec says "The Intl.Collator prototype object is itself an Intl.Collator instance ... whose internal slots are set as if it had been constructed by the expression Construct(%Collator%)" (ECMA-402, 10.3). However, I don't want to call initializeCollator() when Intl.Collator.prototype is created, because Intl.Collator.prototype is created when JSGlobalObject is created. So, I do lazy initialization: check whether the object has been initialized before any read of its values. If not, initialize it. There are only two methods that read the values: resolvedOptions() and compareStrings(). The new patch should make it more explicit.

So, I don't think it's possible to create a test case that reads uninitialized values. We do have test cases where we check the default options and test the default behaviors.

> > Source/JavaScriptCore/runtime/IntlCollator.cpp:366
> > +        // FIXME: Setting UCOL_ALTERNATE_HANDLING to UCOL_SHIFTED causes punctuation and whitespace to be
> > +        // ignored. There is currently no way to ignore only punctuation.
> 
> We need to construct test cases showing this problem, rather than just a
> FIXME.

Done.

> > Source/JavaScriptCore/runtime/IntlCollator.cpp:373
> > +        RELEASE_ASSERT(U_SUCCESS(status));
> 
> Why RELEASE_ASSERT here? Why is a single RELEASE_ASSERT sufficient?

Fixed by throwing an exception and adding multiple checks for U_FAILURE(status).

> > Source/JavaScriptCore/runtime/IntlCollator.cpp:375
> > +    return ucol_strcoll(m_collator, x.upconvertedCharacters(), x.length(), y.upconvertedCharacters(), y.length());
> 
> If the strings are entirely ASCII, we could use ucol_strcollUTF8. But since
> it’s hard to efficiently check if a string is UTF-8, the right thing to do
> to optimize is probably to use ucol_strcollIter, and write a UCharIterator
> that understands how to iterate a StringView.
> 
> An example of how to do this with ucol.h can be seen in the CollatorICU.cpp
> file. The Collator::collate function does all of this.

Done.

> > Source/JavaScriptCore/runtime/IntlCollator.cpp:378
> > +const char* IntlCollator::getUsageString(Usage usage)
> 
> WebKit coding style does not use “get” in the names of functions like this
> one.
> 
> > Source/JavaScriptCore/runtime/IntlCollator.cpp:390
> > +const char* IntlCollator::getSensitivityString(Sensitivity sensitivity)
> 
> WebKit coding style does not use “get” in the names of functions like this
> one.

Fixed.

> > Source/JavaScriptCore/runtime/IntlCollator.cpp:429
> > +    m_boundCompare.set(vm, this, format);
> 
> Do we have test cases covering what happens when we pass in an object of the
> wrong type?

If you mean something like "new Intl.Collator().compare.call(5, 'a', 'b')", we do. See http://trac.webkit.org/browser/trunk/LayoutTests/js/script-tests/intl-collator.js#L229
Comment 19 Darin Adler 2015-12-15 08:45:34 PST
Comment on attachment 267366 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=267366&action=review

Looks like great work. Many good improvements, not just the new feature.

review- because the string object lifetime mistake in IntlCollatorFuncCompare is a serious one.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:104
> +        keyLocaleData.append(String());

Alternate syntax for null in this context is { } rather than String(). I slightly prefer the former.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:125
> +                keyLocaleData.append(String(collation));

This treats the keyword value strings as ASCII or Latin-1, not UTF-8. Is that correct? Also, an explicit conversion to String() is not needed; please just write append(collation).

To create the strings as UTF-8, we would use String::fromUTF8(collation) but we also have to consider that the string might be invalid UTF-8 which would result in a null string.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:150
> +        keyLocaleData.append(String());

Alternate syntax for null in this context is { } rather than String(). I slightly prefer the former.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:170
> +    Vector<String> requestedLocales = canonicalizeLocaleList(state, locales);

I like to use auto for the return value type in a case like this.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:220
> +    opt.set(ASCIILiteral("localeMatcher"), matcher);

More efficient to use the add function here rather than the set function.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:241
> +        opt.set(ASCIILiteral("kn"), numericString);

Ditto.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:247
> +        opt.set(ASCIILiteral("kf"), caseFirst);

Ditto.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:252
> +    const HashSet<String>& availableLocales = state.callee()->globalObject()->intlCollatorAvailableLocales();

I like auto& here better for the return type.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:253
> +    HashMap<String, String> result = resolveLocale(availableLocales, requestedLocales, opt, relevantExtensionKeys, sizeof(relevantExtensionKeys) / sizeof(relevantExtensionKeys[0]), localeData);

I like auto here better for the return type.

WTF_ARRAY_LENGTH is preferred over that sizeof expression. Or perhaps we can use std::array and use relevantExtensionKeys.size().

> Source/JavaScriptCore/runtime/IntlCollator.cpp:329
> +        ASSERT(!state.hadException());

Why can we assert this? What guarantees it is true. If we can assert here, why can’t we assert along the way doing each step inside initializeCollator itself?

> Source/JavaScriptCore/runtime/IntlCollator.cpp:332
> +    if (!m_collator) {

I suggest putting the code to allocate a new collator into a separate function. So the code here would be:

    if (!m_collator) {
        createCollator();
        if (!m_collator)
            return state.vm().throwException(&state, createError(&state, ASCIILiteral("error message here")));
    }

The code to call initializeCollator could also possibly go into the createCollator function for the same reason.

I think this would make this function easier to read and possible even more efficient.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:379
> +        UColAttributeValue strength;
> +        UColAttributeValue caseLevel = UCOL_OFF;
> +        switch (m_sensitivity) {
> +        case Sensitivity::Base:
> +            strength = UCOL_PRIMARY;
> +            break;
> +        case Sensitivity::Accent:
> +            strength = UCOL_SECONDARY;
> +            break;
> +        case Sensitivity::Case:
> +            strength = UCOL_PRIMARY;
> +            caseLevel = UCOL_ON;
> +            break;
> +        case Sensitivity::Variant:
> +            strength = UCOL_TERTIARY;
> +            break;
> +        default:
> +            ASSERT_NOT_REACHED();
> +        }
> +        ucol_setAttribute(collator, UCOL_STRENGTH, strength, &status);
> +        if (U_FAILURE(status))
> +            goto fail;
> +        ucol_setAttribute(collator, UCOL_CASE_LEVEL, caseLevel, &status);
> +        if (U_FAILURE(status))
> +            goto fail;
> +
> +        ucol_setAttribute(collator, UCOL_NUMERIC_COLLATION, m_numeric ? UCOL_ON : UCOL_OFF, &status);
> +        if (U_FAILURE(status))
> +            goto fail;
> +
> +        // FIXME: Setting UCOL_ALTERNATE_HANDLING to UCOL_SHIFTED causes punctuation and whitespace to be
> +        // ignored. There is currently no way to ignore only punctuation.
> +        ucol_setAttribute(collator, UCOL_ALTERNATE_HANDLING, m_ignorePunctuation ? UCOL_SHIFTED : UCOL_DEFAULT, &status);
> +        if (U_FAILURE(status))
> +            goto fail;
> +
> +        // "The method is required to return 0 when comparing Strings that are considered canonically
> +        // equivalent by the Unicode standard."
> +        ucol_setAttribute(collator, UCOL_NORMALIZATION_MODE, UCOL_ON, &status);
> +        if (U_FAILURE(status))
> +            goto fail;
> +

Is there any other place we can share this code with? It’s pretty long winded and I seem to have seen code like this before.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:384
> +        return state.vm().throwException(&state, createError(&state, ASCIILiteral("ucol_setAttribute failed.")));

This does not seem like an acceptable string for an error seen by the web. When would this exception occur? Why is it useful to give the function name ucol_setAttribute to the web developer?

> Source/JavaScriptCore/runtime/IntlCollator.h:50
> +    JSValue compareStrings(ExecState&, StringView x, StringView y);

I don’t think the names “x” and “y” here add anything. Please omit them.

> Source/JavaScriptCore/runtime/IntlCollator.h:67
> +    enum class Usage {
> +        Sort,
> +        Search
> +    };

I would find this more readable as a one-liner.

> Source/JavaScriptCore/runtime/IntlCollator.h:74
> +    enum class Sensitivity {
> +        Base,
> +        Accent,
> +        Case,
> +        Variant
> +    };

I would find this more readable as a one-liner.

> Source/JavaScriptCore/runtime/IntlCollatorPrototype.cpp:98
> +    StringView x = state->argument(0).toString(state)->view(state);

This won’t work. We are creating a StringView pointing to the result of toString, but then the garbage collector is allowed to destroy that object. Instead the local variable needs to be of type JSString* and the call to view needs to be separate. I suggest calling view in the line that calls compareStrings.

> Source/JavaScriptCore/runtime/IntlCollatorPrototype.cpp:104
> +    StringView y = state->argument(1).toString(state)->view(state);

Ditto.

> Source/JavaScriptCore/runtime/IntlObject.cpp:118
> +    String locale(uloc_getDefault());

Is the result of uloc_getDefault guaranteed to be ASCII or Latin-1? Or could it be UTF-8? If it’s UTF-8 then we need to write this code differently.

> Source/JavaScriptCore/runtime/IntlObject.cpp:202
>              state.vm().throwException(&state, createRangeError(&state, String(notFound)));

This explicit conversion to String should not be needed. Please remove it.

> Source/JavaScriptCore/runtime/IntlObject.cpp:211
> -    return fallback;
> +    return String(fallback);

This explicit conversion to String should not be needed. Please don’t make this change.

> Source/JavaScriptCore/runtime/IntlObject.cpp:759
> -                        supportedExtensionAddition = "-" + key + '-' + value;
> +                        supportedExtensionAddition = String("-") + key + '-' + value;

More efficient idiom for this is:

    supportedExtensionAddition = makeString('-', key, '-', value);

Avoids extra work to allocate and destroy a single character string for "-".
Comment 20 Sukolsak Sakshuwong 2015-12-15 12:12:14 PST
Created attachment 267384 [details]
Patch
Comment 21 Sukolsak Sakshuwong 2015-12-15 12:15:46 PST
Thanks for the review.

Regarding the string object lifetime bug, it looks like there are a few places in StringPrototype.cpp that do that. For example, https://github.com/WebKit/webkit/blob/master/Source/JavaScriptCore/runtime/StringPrototype.cpp#L801 Should I file a bug for that?

(In reply to comment #19)
> Comment on attachment 267366 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=267366&action=review
> 
> > Source/JavaScriptCore/runtime/IntlCollator.cpp:104
> > +        keyLocaleData.append(String());
> 
> Alternate syntax for null in this context is { } rather than String(). I
> slightly prefer the former.
 
Fixed.

> > Source/JavaScriptCore/runtime/IntlCollator.cpp:125
> > +                keyLocaleData.append(String(collation));
> 
> This treats the keyword value strings as ASCII or Latin-1, not UTF-8. Is
> that correct? Also, an explicit conversion to String() is not needed; please
> just write append(collation).
> 
> To create the strings as UTF-8, we would use String::fromUTF8(collation) but
> we also have to consider that the string might be invalid UTF-8 which would
> result in a null string.
 
Changed to append(collation). collation can only be ASCII. See Comment 13 (https://bugs.webkit.org/show_bug.cgi?id=147604#c13).

> > Source/JavaScriptCore/runtime/IntlCollator.cpp:170
> > +    Vector<String> requestedLocales = canonicalizeLocaleList(state, locales);
> 
> I like to use auto for the return value type in a case like this.

Fixed.

> > Source/JavaScriptCore/runtime/IntlCollator.cpp:220
> > +    opt.set(ASCIILiteral("localeMatcher"), matcher);
> 
> More efficient to use the add function here rather than the set function.
> 
> > Source/JavaScriptCore/runtime/IntlCollator.cpp:241
> > +        opt.set(ASCIILiteral("kn"), numericString);
> 
> Ditto.
> 
> > Source/JavaScriptCore/runtime/IntlCollator.cpp:247
> > +        opt.set(ASCIILiteral("kf"), caseFirst);
> 
> Ditto.

Fixed.

> > Source/JavaScriptCore/runtime/IntlCollator.cpp:252
> > +    const HashSet<String>& availableLocales = state.callee()->globalObject()->intlCollatorAvailableLocales();
> 
> I like auto& here better for the return type.

Fixed.

> > Source/JavaScriptCore/runtime/IntlCollator.cpp:253
> > +    HashMap<String, String> result = resolveLocale(availableLocales, requestedLocales, opt, relevantExtensionKeys, sizeof(relevantExtensionKeys) / sizeof(relevantExtensionKeys[0]), localeData);
> 
> I like auto here better for the return type.
> 
> WTF_ARRAY_LENGTH is preferred over that sizeof expression. Or perhaps we can
> use std::array and use relevantExtensionKeys.size().

Fixed by using auto and WTF_ARRAY_LENGTH.

> > Source/JavaScriptCore/runtime/IntlCollator.cpp:329
> > +        ASSERT(!state.hadException());
> 
> Why can we assert this? What guarantees it is true. If we can assert here,
> why can’t we assert along the way doing each step inside initializeCollator
> itself?

The spec guarantees that it is true. According to the steps in ECMA-402, 10.1.1, InitializeCollator (collator, locales, options) will not throw if locales and options are undefined.

Another reason is that the spec says that "The Intl.Collator prototype object is itself an Intl.Collator instance ... whose internal slots are set as if it had been constructed by the expression Construct(%Collator%)". If InitializeCollator(collator, undefined, undefined) could throw, then Intl.Collator.prototype could throw when it was created, which doesn't make sense.

We can't assert inside initializeCollator because locales and options are not always undefined.

> > Source/JavaScriptCore/runtime/IntlCollator.cpp:332
> > +    if (!m_collator) {
> 
> I suggest putting the code to allocate a new collator into a separate
> function. So the code here would be:
> 
>     if (!m_collator) {
>         createCollator();
>         if (!m_collator)
>             return state.vm().throwException(&state, createError(&state,
> ASCIILiteral("error message here")));
>     }
> 
> The code to call initializeCollator could also possibly go into the
> createCollator function for the same reason.
> 
> I think this would make this function easier to read and possible even more
> efficient.

Fixed.

> Is there any other place we can share this code with? It’s pretty long
> winded and I seem to have seen code like this before.

I think this is the only place that uses this code.

> > Source/JavaScriptCore/runtime/IntlCollator.cpp:384
> > +        return state.vm().throwException(&state, createError(&state, ASCIILiteral("ucol_setAttribute failed.")));
> 
> This does not seem like an acceptable string for an error seen by the web.
> When would this exception occur? Why is it useful to give the function name
> ucol_setAttribute to the web developer?

Changed the message to "Failed to compare strings." The ICU spec doesn't say what can cause an error for these functions.

> > Source/JavaScriptCore/runtime/IntlCollator.h:50
> > +    JSValue compareStrings(ExecState&, StringView x, StringView y);
> 
> I don’t think the names “x” and “y” here add anything. Please omit them.
> 
> > Source/JavaScriptCore/runtime/IntlCollator.h:67
> > +    enum class Usage {
> > +        Sort,
> > +        Search
> > +    };
> 
> I would find this more readable as a one-liner.
> 
> > Source/JavaScriptCore/runtime/IntlCollator.h:74
> > +    enum class Sensitivity {
> > +        Base,
> > +        Accent,
> > +        Case,
> > +        Variant
> > +    };
> 
> I would find this more readable as a one-liner.

Fixed.

> > Source/JavaScriptCore/runtime/IntlCollatorPrototype.cpp:98
> > +    StringView x = state->argument(0).toString(state)->view(state);
> 
> This won’t work. We are creating a StringView pointing to the result of
> toString, but then the garbage collector is allowed to destroy that object.
> Instead the local variable needs to be of type JSString* and the call to
> view needs to be separate. I suggest calling view in the line that calls
> compareStrings.
> 
> > Source/JavaScriptCore/runtime/IntlCollatorPrototype.cpp:104
> > +    StringView y = state->argument(1).toString(state)->view(state);
> 
> Ditto.

Fixed.

> > Source/JavaScriptCore/runtime/IntlObject.cpp:118
> > +    String locale(uloc_getDefault());
> 
> Is the result of uloc_getDefault guaranteed to be ASCII or Latin-1? Or could
> it be UTF-8? If it’s UTF-8 then we need to write this code differently.

ASCII, according to http://userguide.icu-project.org/locale

> > Source/JavaScriptCore/runtime/IntlObject.cpp:202
> >              state.vm().throwException(&state, createRangeError(&state, String(notFound)));
> 
> This explicit conversion to String should not be needed. Please remove it.
> 
> > Source/JavaScriptCore/runtime/IntlObject.cpp:211
> > -    return fallback;
> > +    return String(fallback);
> 
> This explicit conversion to String should not be needed. Please don’t make
> this change.
> 
> > Source/JavaScriptCore/runtime/IntlObject.cpp:759
> > -                        supportedExtensionAddition = "-" + key + '-' + value;
> > +                        supportedExtensionAddition = String("-") + key + '-' + value;
> 
> More efficient idiom for this is:
> 
>     supportedExtensionAddition = makeString('-', key, '-', value);
> 
> Avoids extra work to allocate and destroy a single character string for "-".

Fixed.
Comment 22 Darin Adler 2015-12-16 08:45:12 PST
(In reply to comment #21)
> Regarding the string object lifetime bug, it looks like there are a few
> places in StringPrototype.cpp that do that. For example,
> https://github.com/WebKit/webkit/blob/master/Source/JavaScriptCore/runtime/
> StringPrototype.cpp#L801 Should I file a bug for that?

Yes! Please cc me, Andreas Kling, and Geoff Garen.
Comment 23 Darin Adler 2015-12-16 08:57:48 PST
Comment on attachment 267384 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=267384&action=review

> Source/JavaScriptCore/runtime/IntlCollator.cpp:277
> +    m_numeric = (result.get(ASCIILiteral("kn")) == "true");

Doesn’t seem to be checking the type for boolean like the pseudocode says we should. Do we have tests covering that?

> Source/JavaScriptCore/runtime/IntlCollator.cpp:336
> +    UColAttributeValue strength;

I suggest initializing this to UCOL_PRIMARY. Would fix a warning on Windows.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:396
> +    UCharIterator iteratorX = createIterator(x);
> +    UCharIterator iteratorY = createIterator(y);

Looks like there is a linking problem on Windows.

     Creating library C:/cygwin/home/buildbot/WebKit/WebKitBuild/Release/lib32/JavaScriptCore.lib and object C:/cygwin/home/buildbot/WebKit/WebKitBuild/Release/lib32/JavaScriptCore.exp
IntlCollator.obj : error LNK2019: unresolved external symbol "struct UCharIterator __cdecl WTF::createIterator(class WTF::StringView)" (?createIterator@WTF@@YA?AUUCharIterator@@VStringView@1@@Z) referenced in function "public: class JSC::JSValue __thiscall JSC::IntlCollator::compareStrings(class JSC::ExecState &,class WTF::StringView,class WTF::StringView)" (?compareStrings@IntlCollator@JSC@@QAE?AVJSValue@2@AAVExecState@2@VStringView@WTF@@1@Z) [C:\cygwin\home\buildbot\WebKit\WebKitBuild\Release\Source\JavaScriptCore\JavaScriptCore.vcxproj]

I think this is because the declaration of createIterator in Collator.h needs WTF_EXPORT_PRIVATE.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:397
> +    int result = ucol_strcollIter(m_collator, &iteratorX, &iteratorY, &status);

auto is slightly better than int here

> Source/JavaScriptCore/runtime/IntlCollatorConstructor.cpp:111
> +    if (state->hadException())
> +        return JSValue::encode(jsUndefined());

We don’t need this code. There is no harm in returning the collator when there’s an exception; the return value will be ignored. It would be better to omit this.

> Source/JavaScriptCore/runtime/IntlCollatorConstructor.cpp:133
> +    if (state->hadException())
> +        return JSValue::encode(jsUndefined());

We don’t need this code. There is no harm in returning the collator when there’s an exception; the return value will be ignored. It would be better to omit this.

> Source/JavaScriptCore/runtime/IntlObject.cpp:118
> +    String locale(uloc_getDefault());

We normally prefer the = syntax to the function call style syntax for code like this.
Comment 24 Sukolsak Sakshuwong 2015-12-16 11:04:02 PST
Created attachment 267471 [details]
Patch for landing
Comment 25 Sukolsak Sakshuwong 2015-12-16 11:09:36 PST
Thanks!

(In reply to comment #23)
> Comment on attachment 267384 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=267384&action=review
> 
> > Source/JavaScriptCore/runtime/IntlCollator.cpp:277
> > +    m_numeric = (result.get(ASCIILiteral("kn")) == "true");
> 
> Doesn’t seem to be checking the type for boolean like the pseudocode says we
> should. Do we have tests covering that?

The pseudocode says "If the name given in the Type column of the row is 'boolean', let value be the result of comparing value with 'true'." "Type" here refers to the text in the Type column of Table 1 in the spec, which looks like the following:

-------------------------------------------------------
Key | Property  | Type      | Values
-------------------------------------------------------
kn  | numeric   | "boolean" |
kf  | caseFirst | "string"  | "upper", "lower", "false"
-------------------------------------------------------

We have tests covering numeric of various values. See http://trac.webkit.org/browser/trunk/LayoutTests/js/script-tests/intl-collator.js#L88 We test

- {numeric: true}
- {numeric: false}  (Only this one results in m_numeric == false)
- {numeric: 'false'}
- {numeric: { }}
- { get numeric() { throw 42; } }

> > Source/JavaScriptCore/runtime/IntlCollator.cpp:336
> > +    UColAttributeValue strength;
> 
> I suggest initializing this to UCOL_PRIMARY. Would fix a warning on Windows.

Done.

> > Source/JavaScriptCore/runtime/IntlCollator.cpp:396
> > +    UCharIterator iteratorX = createIterator(x);
> > +    UCharIterator iteratorY = createIterator(y);
> 
> Looks like there is a linking problem on Windows.
> 

> I think this is because the declaration of createIterator in Collator.h
> needs WTF_EXPORT_PRIVATE.

Done.

> > Source/JavaScriptCore/runtime/IntlCollator.cpp:397
> > +    int result = ucol_strcollIter(m_collator, &iteratorX, &iteratorY, &status);
> 
> auto is slightly better than int here

Done.

> > Source/JavaScriptCore/runtime/IntlCollatorConstructor.cpp:111
> > +    if (state->hadException())
> > +        return JSValue::encode(jsUndefined());
> 
> We don’t need this code. There is no harm in returning the collator when
> there’s an exception; the return value will be ignored. It would be better
> to omit this.
> 
> > Source/JavaScriptCore/runtime/IntlCollatorConstructor.cpp:133
> > +    if (state->hadException())
> > +        return JSValue::encode(jsUndefined());
> 
> We don’t need this code. There is no harm in returning the collator when
> there’s an exception; the return value will be ignored. It would be better
> to omit this.

Done.

> > Source/JavaScriptCore/runtime/IntlObject.cpp:118
> > +    String locale(uloc_getDefault());
> 
> We normally prefer the = syntax to the function call style syntax for code
> like this.

Done.
Comment 26 Sukolsak Sakshuwong 2015-12-17 00:54:16 PST
Created attachment 267546 [details]
Patch

Try fixing the build issue on Windows by including <unicode/uiter.h> in Collator.h
Comment 27 Sukolsak Sakshuwong 2015-12-17 04:12:04 PST
Created attachment 267556 [details]
Patch
Comment 28 Sukolsak Sakshuwong 2015-12-17 04:16:08 PST
Three new changes:

1. Include <unicode/uiter.h> in Collator.h to fix the build issue on Windows.

I tried using WTF_EXPORT_PRIVATE on UCharIterator createIterator(StringView) in Collator.h but I got the same error message, except that there was "__declspec(dllimport)" at the beginning of the declaration (https://webkit-queues.webkit.org/results/567993). It looks like we need to include the definition of UCharIterator.

2. Only check for U_FAILURE(status) after the last ucol_setAttribute.

ucol_setAttribute returns immediately if U_FAILURE(status) is true. (See the source code at http://source.icu-project.org/repos/icu/icu/trunk/source/i18n/ucol.cpp) So, there's no harm in calling ucol_setAttribute consecutively. The ICU coding guidelines also state that "It is not necessary to check for U_FAILURE() immediately before calling a function that takes a UErrorCode parameter, because that function is supposed to check for failure." (http://userguide.icu-project.org/dev/codingguidelines)

It's super unlikely that ucol_setAttribute will ever throw an error. From the source code, ucol_setAttribute is just setting flags. The only error that it will ever throw is U_ILLEGAL_ARGUMENT_ERROR. (http://source.icu-project.org/repos/icu/icu/trunk/source/i18n/collationsettings.cpp) But I still do a run-time check instead of ASSERT just in case.

3. Use jsNontrivialString to create strings for usage and sensitivity.

It's obvious from the source code that these strings are longer than one character.
Comment 29 Darin Adler 2015-12-17 13:23:02 PST
Comment on attachment 267556 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=267556&action=review

Looks like Windows is still failing to build :-(

Looks OK once Windows builds.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:304
> +    if (sensitivityString.isNull()) {
> +        // a. If u is "sort", then let s be "variant".
> +        // b. Else
> +        //    i. Let dataLocale be the value of r.[[dataLocale]].
> +        //    ii. Let dataLocaleData be Get(localeData, dataLocale).
> +        //    iii. Let s be Get(dataLocaleData, "sensitivity").
> +        // 10.2.3 "[[searchLocaleData]][locale] must have a sensitivity property with a String value equal to "base", "accent", "case", or "variant" for all locale values."
> +        sensitivityString = ASCIILiteral("variant");
> +    }
> +    // 27. Set collator.[[sensitivity]] to s.
> +    if (sensitivityString == "base")
> +        m_sensitivity = Sensitivity::Base;
> +    else if (sensitivityString == "accent")
> +        m_sensitivity = Sensitivity::Accent;
> +    else if (sensitivityString == "case")
> +        m_sensitivity = Sensitivity::Case;
> +    else if (sensitivityString == "variant")
> +        m_sensitivity = Sensitivity::Variant;
> +    else
> +        ASSERT_NOT_REACHED();

Seems a little silly to do this as two steps. It could just be:

    else if (sensitivityString.isNull() || sensitivityString == "variant")
        m_sensitivity = Sensitivity::Variant;

Or even skip the ASSERT_NOT_REACHED thing and do this:

    else
        m_sensitivity = Sensitivity::Variant;

> Source/JavaScriptCore/runtime/IntlCollator.cpp:325
> +void IntlCollator::createCollator(ExecState& state)
> +{

I think we could use an ASSERT(!m_collator) at the start of this function to help make it clear locally in this function that we won’t leak m_collator by overwriting it without closing it.

> Source/JavaScriptCore/runtime/IntlCollatorPrototype.cpp:93
> +    // 1. Let collator be the this value.
> +    IntlCollator* collator = jsDynamicCast<IntlCollator*>(state->thisValue());
> +
> +    // 2. Assert: Type(collator) is Object and collator has an [[initializedCollator]] internal slot whose value is true.
> +    ASSERT(collator);

I’m really surprised that “Assert” in the specification really means what ASSERT means in our code. I would have expected that an assertion in the specification is more about conditions that would lead to an exception as opposed to conditions that can’t possible be true because we are guaranteed the caller won’t violate certain invariants.

Is it really true that no one can call this function with a thisValue that is not an IntlCollator*? If so, then why do a jsDynamicCast rather than a static_cast?
Comment 30 Sukolsak Sakshuwong 2015-12-17 15:21:38 PST
(In reply to comment #29)
> Comment on attachment 267556 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=267556&action=review
> 
> Looks like Windows is still failing to build :-(
> 
> Looks OK once Windows builds.
> 
> > Source/JavaScriptCore/runtime/IntlCollator.cpp:304
> > +    if (sensitivityString.isNull()) {
> > +        // a. If u is "sort", then let s be "variant".
> > +        // b. Else
> > +        //    i. Let dataLocale be the value of r.[[dataLocale]].
> > +        //    ii. Let dataLocaleData be Get(localeData, dataLocale).
> > +        //    iii. Let s be Get(dataLocaleData, "sensitivity").
> > +        // 10.2.3 "[[searchLocaleData]][locale] must have a sensitivity property with a String value equal to "base", "accent", "case", or "variant" for all locale values."
> > +        sensitivityString = ASCIILiteral("variant");
> > +    }
> > +    // 27. Set collator.[[sensitivity]] to s.
> > +    if (sensitivityString == "base")
> > +        m_sensitivity = Sensitivity::Base;
> > +    else if (sensitivityString == "accent")
> > +        m_sensitivity = Sensitivity::Accent;
> > +    else if (sensitivityString == "case")
> > +        m_sensitivity = Sensitivity::Case;
> > +    else if (sensitivityString == "variant")
> > +        m_sensitivity = Sensitivity::Variant;
> > +    else
> > +        ASSERT_NOT_REACHED();
> 
> Seems a little silly to do this as two steps. It could just be:
> 
>     else if (sensitivityString.isNull() || sensitivityString == "variant")
>         m_sensitivity = Sensitivity::Variant;
> 
> Or even skip the ASSERT_NOT_REACHED thing and do this:
> 
>     else
>         m_sensitivity = Sensitivity::Variant;
> 
> > Source/JavaScriptCore/runtime/IntlCollator.cpp:325
> > +void IntlCollator::createCollator(ExecState& state)
> > +{
> 
> I think we could use an ASSERT(!m_collator) at the start of this function to
> help make it clear locally in this function that we won’t leak m_collator by
> overwriting it without closing it.
> 
> > Source/JavaScriptCore/runtime/IntlCollatorPrototype.cpp:93
> > +    // 1. Let collator be the this value.
> > +    IntlCollator* collator = jsDynamicCast<IntlCollator*>(state->thisValue());
> > +
> > +    // 2. Assert: Type(collator) is Object and collator has an [[initializedCollator]] internal slot whose value is true.
> > +    ASSERT(collator);
> 
> I’m really surprised that “Assert” in the specification really means what
> ASSERT means in our code. I would have expected that an assertion in the
> specification is more about conditions that would lead to an exception as
> opposed to conditions that can’t possible be true because we are guaranteed
> the caller won’t violate certain invariants.
> 
> Is it really true that no one can call this function with a thisValue that
> is not an IntlCollator*? If so, then why do a jsDynamicCast rather than a
> static_cast?
Comment 31 Sukolsak Sakshuwong 2015-12-17 15:31:13 PST
Oops, sorry. I accidentally clicked Save Changes. I will fix the linking issue on Windows.

(In reply to comment #29)
> Seems a little silly to do this as two steps. It could just be:
> 
>     else if (sensitivityString.isNull() || sensitivityString == "variant")
>         m_sensitivity = Sensitivity::Variant;
> 
> Or even skip the ASSERT_NOT_REACHED thing and do this:
> 
>     else
>         m_sensitivity = Sensitivity::Variant;

Will do.

> > Source/JavaScriptCore/runtime/IntlCollator.cpp:325
> > +void IntlCollator::createCollator(ExecState& state)
> > +{
> 
> I think we could use an ASSERT(!m_collator) at the start of this function to
> help make it clear locally in this function that we won’t leak m_collator by
> overwriting it without closing it.

Will do.

> > Source/JavaScriptCore/runtime/IntlCollatorPrototype.cpp:93
> > +    // 1. Let collator be the this value.
> > +    IntlCollator* collator = jsDynamicCast<IntlCollator*>(state->thisValue());
> > +
> > +    // 2. Assert: Type(collator) is Object and collator has an [[initializedCollator]] internal slot whose value is true.
> > +    ASSERT(collator);
> 
> I’m really surprised that “Assert” in the specification really means what
> ASSERT means in our code. I would have expected that an assertion in the
> specification is more about conditions that would lead to an exception as
> opposed to conditions that can’t possible be true because we are guaranteed
> the caller won’t violate certain invariants.
> 
> Is it really true that no one can call this function with a thisValue that
> is not an IntlCollator*? If so, then why do a jsDynamicCast rather than a
> static_cast?

It's true. Here's the definition of Intl.Collator.prototype.compare:

    "the phrase 'this Collator object' refers to the object that is the this value for the invocation of the function; a TypeError exception is thrown if the this value is not an object or an object that does not have an [[initializedCollator]] internal slot with value true."

    "1. Let collator be this Collator object.
    2. If collator.[[boundCompare]] is undefined, then
        a. Let F be a new built-in function object as defined in 11.3.4.
        b. The value of F’s length property is 2.
        c. Let bc be BoundFunctionCreate(F, «this value»).
        d. Set collator.[[boundCompare]] to bc.
    3. Return collator.[[boundCompare]]."

"F" in Step 2 is our IntlCollatorFuncCompare function. Since it's bound to a value that we have already checked that it is a Collator, the spec can assert that.

We do have a test case "new Intl.Collator().compare.call(5, 'a', 'b')". As expected, it does not throw.

I will change jsDynamicCast to jsCast.
Comment 32 Alex Christensen 2015-12-17 16:31:33 PST
I tried building the patch locally, and it built fine.  Let's land this and see what happens.
Comment 33 Sukolsak Sakshuwong 2015-12-17 16:35:40 PST
Created attachment 267598 [details]
Patch for landing
Comment 34 WebKit Commit Bot 2015-12-17 17:36:20 PST
Comment on attachment 267598 [details]
Patch for landing

Clearing flags on attachment: 267598

Committed r194253: <http://trac.webkit.org/changeset/194253>
Comment 35 WebKit Commit Bot 2015-12-17 17:36:28 PST
All reviewed patches have been landed.  Closing bug.