Bug 147599 - [INTL] Implement supportedLocalesOf on Intl Constructors
Summary: [INTL] Implement supportedLocalesOf on Intl Constructors
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Andy VanWagoner
URL:
Keywords:
Depends on:
Blocks: 90906
  Show dependency treegraph
 
Reported: 2015-08-03 16:46 PDT by Andy VanWagoner
Modified: 2015-09-15 10:43 PDT (History)
2 users (show)

See Also:


Attachments
Patch (176.30 KB, patch)
2015-08-18 18:42 PDT, Andy VanWagoner
no flags Details | Formatted Diff | Diff
Patch (176.28 KB, patch)
2015-08-19 11:34 PDT, Andy VanWagoner
no flags Details | Formatted Diff | Diff
Patch (176.43 KB, patch)
2015-08-29 12:40 PDT, Andy VanWagoner
no flags Details | Formatted Diff | Diff
Patch (176.30 KB, patch)
2015-09-02 17:55 PDT, Andy VanWagoner
no flags Details | Formatted Diff | Diff
Patch (176.45 KB, patch)
2015-09-10 19:05 PDT, Andy VanWagoner
no flags Details | Formatted Diff | Diff
Patch (176.73 KB, patch)
2015-09-12 07:56 PDT, Andy VanWagoner
no flags Details | Formatted Diff | Diff
Patch (177.99 KB, patch)
2015-09-12 10:12 PDT, Andy VanWagoner
no flags Details | Formatted Diff | Diff
Patch (177.87 KB, patch)
2015-09-15 07:59 PDT, Andy VanWagoner
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Andy VanWagoner 2015-08-03 16:46:32 PDT
Implement ECMA-402 2.0 supportedLocalesOf
http://ecma-international.org/publications/standards/Ecma-402.htm

10.2.2 Intl.Collator.supportedLocalesOf (locales [, options ])
11.2.2 Intl.NumberFormat.supportedLocalesOf (locales [, options ])
12.2.2 Intl.DateTimeFormat.supportedLocalesOf (locales [, options ])

All of these functions operate in similar ways and rely on abstract operations:

6.2.3 CanonicalizeLanguageTag (locale)
9.2.1 CanonicalizeLocaleList (locales)
Comment 1 Andy VanWagoner 2015-08-18 18:42:03 PDT
Created attachment 259338 [details]
Patch
Comment 2 Andy VanWagoner 2015-08-19 10:29:28 PDT
Some style check errors were lost in the sea of style problems with the added icu headers. I will fix the errors in IntlObject.cpp.
Comment 3 Andy VanWagoner 2015-08-19 11:34:39 PDT
Created attachment 259382 [details]
Patch
Comment 4 Benjamin Poulain 2015-08-27 18:18:56 PDT
Comment on attachment 259382 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=259382&action=review

I am finally starting on this.
Some quick comments first.

> Source/JavaScriptCore/runtime/IntlObject.cpp:62
> +String getCanonicalLangTag(const Vector<String>&);
> +String getPrivateUseLangTag(const Vector<String>&, size_t);
> +String getGrandfatheredLangTag(const String&);
> +String canonicalizeLanguageTag(String);
> +String defaultLocale();
> +String bestAvailableLocale(const HashSet<String>&, String);
> +JSArray* lookupSupportedLocales(ExecState*, const HashSet<String>&, JSArray*);
> +JSArray* bestFitSupportedLocales(ExecState*, const HashSet<String>&, JSArray*);
> +String getIntlStringOption(ExecState*, JSValue, PropertyName, const HashSet<String>&, const char*, String);
> +bool getIntlBooleanOption(ExecState*, JSValue, PropertyName, bool);

You don't need to declare everything ahead of time when the function is in the same file.

If it is 
1) static
2) defined before it is used

you can use it without previous declaration.
Static function also tell the compiler that they are not available anywhere else than this file, which is a good thing to know for optimizing the code.

> Source/JavaScriptCore/runtime/IntlObject.cpp:119
> +String canonicalizeLanguageTag(String locale)

You can pass the argument as const String& to avoid ref-deref of the input.

> Source/JavaScriptCore/runtime/IntlObject.cpp:136
> +    size_t numParts = parts.size();
> +    if (numParts > 0) {

You can replace those two lines by
    if (!parts.isEmpty()) {

> Source/JavaScriptCore/runtime/IntlObject.cpp:150
> +{

Since parts not being empty is a precondition of this function, you can make that explicit by using an assertion:
ASSERT(!parts.isEmpty());

> Source/JavaScriptCore/runtime/IntlObject.cpp:158
> +    String currentPart = parts[currentIndex];

const String& currentPart = ...

To avoid a ref-deref.

> Source/JavaScriptCore/runtime/IntlObject.cpp:165
> +    canonical.append(currentPart.lower());

You can safely use convertToASCIILowercase().
String::lower() is a bit slower since it needs to checks for full unicode conversion. Here you know it can't happen by design since you would have escaped the previous branch.

> Source/JavaScriptCore/runtime/IntlObject.cpp:170
> +        for (int times = 0; times < 3 && currentIndex < numParts; ++times) {

int -> unsigned.

> Source/JavaScriptCore/runtime/IntlObject.cpp:172
> +            currentPart = parts[currentIndex];
> +            currentLength = currentPart.length();

Better use locals in this block rather than use the more global names.
You are gonna make the compiler's life simpler as the liveness analysis is guaranteed to end with the block.

> Source/JavaScriptCore/runtime/IntlObject.cpp:175
> +                canonical.append("-");

canonical.append('-');

StringBuilder has a fast path for single characters.

> Source/JavaScriptCore/runtime/IntlObject.cpp:176
> +                canonical.append(currentPart.lower());

lower()->convertToASCIILowercase()

> Source/JavaScriptCore/runtime/IntlObject.cpp:284
> +        [] (String a, String b) -> bool {

String -> const String&

> Source/JavaScriptCore/runtime/IntlObject.cpp:358
> +    //  grandfathered = irregular / regular
> +    HashMap<String, String> tagMap = {

Can you add a performance note?
FIXME: convert to a compile time hash table if this is causing performance issues.

> Source/JavaScriptCore/runtime/IntlObject.cpp:389
> +    String lowerTag = locale.lower();

Should this really be a full conversion? Most spec requires ASCII lower which is obtained by convertToASCIILowercase().
I'll have to check BCP47 a bit.

> Source/JavaScriptCore/runtime/IntlObject.cpp:391
> +    if (tagMap.contains(lowerTag))
> +        return tagMap.get(lowerTag);

You can just do return tagMap.get(lowerTag). It will return a default constructed object if the key is not present.

> Source/JavaScriptCore/runtime/IntlObject.cpp:420
> +    // 2. Let seen be an empty List.
> +    // Done before to also return in step 1, if needed.

This looks cleaner than the actual spec :)

> Source/JavaScriptCore/runtime/IntlObject.cpp:423
> +    JSObject* oLocales;

Terrible name...:(
I wish the spec had picked something better.

> Source/JavaScriptCore/runtime/IntlObject.cpp:426
> +        JSArray* aLocales = JSArray::tryCreateUninitialized(vm, globalObject->arrayStructureForIndexingTypeDuringAllocation(ArrayWithUndecided), 1);

You could also use ArrayWithContiguous instead of ArrayWithUndecided.

It does not matter much. ArrayWithUndecided is an array with contiguous values without any particular type. ArrayWithContiguous is a contiguous array with any type, it is used when you put any non-numeric value in a contiguous array.

The next line, initializeIndex() will convert the Undecided shape into a Contiguous shape because "locales" is a string.

> Source/JavaScriptCore/runtime/IntlObject.cpp:445
> +    int32_t length = lengthProperty.toInt32(exec);
> +    if (length < 0)

Looks like the spec defines ToLength on 53 bits.
I am okay with this for now, this is really a detail and we have the same bug all over JSArray.

> Source/JavaScriptCore/runtime/IntlObject.cpp:449
> +    HashSet<String> seenSet = HashSet<String>();

You can do:
    HashSet<String> seenSet;

> Source/JavaScriptCore/runtime/IntlObject.cpp:496
> +            if (!seenSet.contains(canonicalizedTag)) {
> +                seenSet.add(canonicalizedTag);

There is a trick to do this faster.

If you do set.contains() followed by set.add(), you have 2 hash lookup. They are both in amortized O(1) but you still do the operation twice.
Instead, you can do
    if (seenSet.add(canonicalizedTag).isNewEntry)
        seen->push(exec, jsString(exec, canonicalizedTag));

> Source/JavaScriptCore/runtime/IntlObject.cpp:582
> +            builder.append("-");

'-'
Comment 5 Andy VanWagoner 2015-08-28 09:36:30 PDT
Thanks! I'll make these changes as soon as I can.

(In reply to comment #4)
> Comment on attachment 259382 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=259382&action=review
> 
> I am finally starting on this.
> Some quick comments first.
> 
> > Source/JavaScriptCore/runtime/IntlObject.cpp:62
> > +String getCanonicalLangTag(const Vector<String>&);
> > +String getPrivateUseLangTag(const Vector<String>&, size_t);
> > +String getGrandfatheredLangTag(const String&);
> > +String canonicalizeLanguageTag(String);
> > +String defaultLocale();
> > +String bestAvailableLocale(const HashSet<String>&, String);
> > +JSArray* lookupSupportedLocales(ExecState*, const HashSet<String>&, JSArray*);
> > +JSArray* bestFitSupportedLocales(ExecState*, const HashSet<String>&, JSArray*);
> > +String getIntlStringOption(ExecState*, JSValue, PropertyName, const HashSet<String>&, const char*, String);
> > +bool getIntlBooleanOption(ExecState*, JSValue, PropertyName, bool);
> 
> You don't need to declare everything ahead of time when the function is in
> the same file.
> 
> If it is 
> 1) static
> 2) defined before it is used
> 
> you can use it without previous declaration.
> Static function also tell the compiler that they are not available anywhere
> else than this file, which is a good thing to know for optimizing the code.
> 
> > Source/JavaScriptCore/runtime/IntlObject.cpp:119
> > +String canonicalizeLanguageTag(String locale)
> 
> You can pass the argument as const String& to avoid ref-deref of the input.
> 
> > Source/JavaScriptCore/runtime/IntlObject.cpp:136
> > +    size_t numParts = parts.size();
> > +    if (numParts > 0) {
> 
> You can replace those two lines by
>     if (!parts.isEmpty()) {
> 
> > Source/JavaScriptCore/runtime/IntlObject.cpp:150
> > +{
> 
> Since parts not being empty is a precondition of this function, you can make
> that explicit by using an assertion:
> ASSERT(!parts.isEmpty());
> 
> > Source/JavaScriptCore/runtime/IntlObject.cpp:158
> > +    String currentPart = parts[currentIndex];
> 
> const String& currentPart = ...
> 
> To avoid a ref-deref.
> 
> > Source/JavaScriptCore/runtime/IntlObject.cpp:165
> > +    canonical.append(currentPart.lower());
> 
> You can safely use convertToASCIILowercase().
> String::lower() is a bit slower since it needs to checks for full unicode
> conversion. Here you know it can't happen by design since you would have
> escaped the previous branch.
> 
> > Source/JavaScriptCore/runtime/IntlObject.cpp:170
> > +        for (int times = 0; times < 3 && currentIndex < numParts; ++times) {
> 
> int -> unsigned.
> 
> > Source/JavaScriptCore/runtime/IntlObject.cpp:172
> > +            currentPart = parts[currentIndex];
> > +            currentLength = currentPart.length();
> 
> Better use locals in this block rather than use the more global names.
> You are gonna make the compiler's life simpler as the liveness analysis is
> guaranteed to end with the block.
> 
> > Source/JavaScriptCore/runtime/IntlObject.cpp:175
> > +                canonical.append("-");
> 
> canonical.append('-');
> 
> StringBuilder has a fast path for single characters.
> 
> > Source/JavaScriptCore/runtime/IntlObject.cpp:176
> > +                canonical.append(currentPart.lower());
> 
> lower()->convertToASCIILowercase()
> 
> > Source/JavaScriptCore/runtime/IntlObject.cpp:284
> > +        [] (String a, String b) -> bool {
> 
> String -> const String&
> 
> > Source/JavaScriptCore/runtime/IntlObject.cpp:358
> > +    //  grandfathered = irregular / regular
> > +    HashMap<String, String> tagMap = {
> 
> Can you add a performance note?
> FIXME: convert to a compile time hash table if this is causing performance
> issues.
> 
> > Source/JavaScriptCore/runtime/IntlObject.cpp:389
> > +    String lowerTag = locale.lower();
> 
> Should this really be a full conversion? Most spec requires ASCII lower
> which is obtained by convertToASCIILowercase().
> I'll have to check BCP47 a bit.
> 
> > Source/JavaScriptCore/runtime/IntlObject.cpp:391
> > +    if (tagMap.contains(lowerTag))
> > +        return tagMap.get(lowerTag);
> 
> You can just do return tagMap.get(lowerTag). It will return a default
> constructed object if the key is not present.
> 
> > Source/JavaScriptCore/runtime/IntlObject.cpp:420
> > +    // 2. Let seen be an empty List.
> > +    // Done before to also return in step 1, if needed.
> 
> This looks cleaner than the actual spec :)
> 
> > Source/JavaScriptCore/runtime/IntlObject.cpp:423
> > +    JSObject* oLocales;
> 
> Terrible name...:(
> I wish the spec had picked something better.
> 
> > Source/JavaScriptCore/runtime/IntlObject.cpp:426
> > +        JSArray* aLocales = JSArray::tryCreateUninitialized(vm, globalObject->arrayStructureForIndexingTypeDuringAllocation(ArrayWithUndecided), 1);
> 
> You could also use ArrayWithContiguous instead of ArrayWithUndecided.
> 
> It does not matter much. ArrayWithUndecided is an array with contiguous
> values without any particular type. ArrayWithContiguous is a contiguous
> array with any type, it is used when you put any non-numeric value in a
> contiguous array.
> 
> The next line, initializeIndex() will convert the Undecided shape into a
> Contiguous shape because "locales" is a string.
> 
> > Source/JavaScriptCore/runtime/IntlObject.cpp:445
> > +    int32_t length = lengthProperty.toInt32(exec);
> > +    if (length < 0)
> 
> Looks like the spec defines ToLength on 53 bits.
> I am okay with this for now, this is really a detail and we have the same
> bug all over JSArray.
> 
> > Source/JavaScriptCore/runtime/IntlObject.cpp:449
> > +    HashSet<String> seenSet = HashSet<String>();
> 
> You can do:
>     HashSet<String> seenSet;
> 
> > Source/JavaScriptCore/runtime/IntlObject.cpp:496
> > +            if (!seenSet.contains(canonicalizedTag)) {
> > +                seenSet.add(canonicalizedTag);
> 
> There is a trick to do this faster.
> 
> If you do set.contains() followed by set.add(), you have 2 hash lookup. They
> are both in amortized O(1) but you still do the operation twice.
> Instead, you can do
>     if (seenSet.add(canonicalizedTag).isNewEntry)
>         seen->push(exec, jsString(exec, canonicalizedTag));
> 
> > Source/JavaScriptCore/runtime/IntlObject.cpp:582
> > +            builder.append("-");
> 
> '-'
Comment 6 Andy VanWagoner 2015-08-29 12:40:57 PDT
Created attachment 260225 [details]
Patch
Comment 7 Benjamin Poulain 2015-08-31 21:37:04 PDT
Comment on attachment 260225 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=260225&action=review

Quick comments. I emailed the JS team to get help on your patch.

> Source/JavaScriptCore/runtime/IntlObject.cpp:108
> +String defaultLocale()

This does not seem to be used anywhere.

> Source/JavaScriptCore/runtime/IntlObject.cpp:112
> +    String loc = String(def, strlen(def));

You can use
    String loc(def);
or just
    String locale(uloc_getDefault());
The string constructor taking a char* use strlen to find the length by default.

> Source/JavaScriptCore/runtime/IntlObject.cpp:551
> +        size_t pos = candidate.reverseFind("-");

I believe there is a faster String::reverseFind('-') available.

> Source/JavaScriptCore/runtime/IntlObject.cpp:556
> +        if (pos >= 2 && candidate.substring(pos-2, 1) == "-")

You really don't want to do candidate.substring(pos-2, 1) to check a character because it is a heavy operation. A new string of size 1 is allocated, initialized, the character copied. Then it is compared to "-" and immediately destroyed.

Instead, you could to:
    if (post >= 2 && pos - 2 <= candidate.length && candidate[post - 2] == '-')

> Source/JavaScriptCore/runtime/IntlObject.h:62
> +JSArray* canonicalizeLocaleList(ExecState*, JSValue);
> +JSValue supportedLocales(ExecState*, const HashSet<String>&, JSArray*, JSValue);

It would be worth naming some of those arguments for clarity. For example, in the first one, adding the name "locales" for the second argument.

> Source/JavaScriptCore/runtime/JSCJSValue.cpp:60
> +{

Can you please add a reference to the spec? (http://www.ecma-international.org/ecma-262/6.0/#sec-tolength)
This seems valuable since this code is non obvious.

> Source/JavaScriptCore/runtime/JSGlobalObject.cpp:1004
> +            String str = String(udat_getAvailable(i));

This could be 
    String str(udat_getAvailable(i));

The variable could be more descriptive, "localeName" maybe?

> Source/JavaScriptCore/runtime/JSGlobalObject.cpp:1018
> +            String str = String(unum_getAvailable(i));

This could be 
    String str(unum_getAvailable(i));

The variable could be more descriptive, "localeName" maybe?
Comment 8 Mark Lam 2015-09-01 09:20:44 PDT
Comment on attachment 260225 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=260225&action=review

> Source/JavaScriptCore/icu/unicode/ucal.h:5
> + *******************************************************************************
> + * Copyright (C) 1996-2010, International Business Machines Corporation and
> + * others. All Rights Reserved.
> + *******************************************************************************

Just curious, Andy, did IBM release the right to use this and the other files copyrighted by them?  If so, under what terms?  Are the terms compatible with the WebKit BSD license (see https://www.webkit.org/coding/bsd-license.html)?

I'm not a lawyer, but I think if IBM is willing to open source this code, their license / terms should be stated here.  This header, read literally, suggests that we have no right to use these files.
Comment 9 Andy VanWagoner 2015-09-01 10:30:31 PDT
Comment on attachment 260225 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=260225&action=review

>> Source/JavaScriptCore/icu/unicode/ucal.h:5
>> + *******************************************************************************
> 
> Just curious, Andy, did IBM release the right to use this and the other files copyrighted by them?  If so, under what terms?  Are the terms compatible with the WebKit BSD license (see https://www.webkit.org/coding/bsd-license.html)?
> 
> I'm not a lawyer, but I think if IBM is willing to open source this code, their license / terms should be stated here.  This header, read literally, suggests that we have no right to use these files.

We already have several of the ICU header files in this folder, and the existing ones contain the same notice, so I assumed we are able to use them without issues.
Comment 10 Andy VanWagoner 2015-09-02 17:55:15 PDT
Created attachment 260461 [details]
Patch
Comment 11 Benjamin Poulain 2015-09-09 20:39:20 PDT
Comment on attachment 260461 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=260461&action=review

Some more comments from reading from the bottom. I have not found any bug so far.

> Source/JavaScriptCore/runtime/IntlObject.cpp:530
> +static String bestAvailableLocale(const HashSet<String>& availableLocales, String locale)

const String& locale

> Source/JavaScriptCore/runtime/IntlObject.cpp:595
> +        sLocale.split("-", parts);

split('-', parts)

> Source/JavaScriptCore/runtime/IntlObject.cpp:597
> +        size_t pp = parts.size();

pp -> partSize

> Source/JavaScriptCore/runtime/IntlObject.cpp:600
> +        for (size_t p = 1; p < pp;) {

Why not have the "++p" in the for() statement?

> Source/JavaScriptCore/runtime/IntlObject.cpp:603
> +                while (++p < pp && parts[p].length() > 1) { }

I am afraid this could be read as if the "continue" was the only statement inside the while() loop.

It may be better to have:
    ++p
    while (p << partSize && ...) {
        // Skip part.
        ++p;
    }
    continue;
?

> Source/JavaScriptCore/runtime/IntlObject.cpp:642
> +        const HashSet<String> matchers = HashSet<String>({ ASCIILiteral("lookup"), ASCIILiteral("best fit") });

const HashSet<String> matchers({ ASCIILiteral("lookup"), ASCIILiteral("best fit") });

> Source/JavaScriptCore/runtime/IntlObject.cpp:672
> +    PropertyNameArray keys = PropertyNameArray(exec, PropertyNameMode::Strings);

PropertyNameArray keys(exec, PropertyNameMode::Strings);

> Source/JavaScriptCore/runtime/IntlObject.cpp:675
> +    PropertyDescriptor desc = PropertyDescriptor();

PropertyDescriptor desc;
Comment 12 Andy VanWagoner 2015-09-10 17:26:22 PDT
Comment on attachment 260461 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=260461&action=review

>> Source/JavaScriptCore/runtime/IntlObject.cpp:600
>> +        for (size_t p = 1; p < pp;) {
> 
> Why not have the "++p" in the for() statement?

because it would execute after the continue and skip the next singleton.

>> Source/JavaScriptCore/runtime/IntlObject.cpp:603
>> +                while (++p < pp && parts[p].length() > 1) { }
> 
> I am afraid this could be read as if the "continue" was the only statement inside the while() loop.
> 
> It may be better to have:
>     ++p
>     while (p << partSize && ...) {
>         // Skip part.
>         ++p;
>     }
>     continue;
> ?

My code here isn't easy to follow, I'll try a different approach.
Comment 13 Andy VanWagoner 2015-09-10 19:05:54 PDT
Created attachment 260983 [details]
Patch
Comment 14 Benjamin Poulain 2015-09-11 14:15:29 PDT
Comment on attachment 260983 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=260983&action=review

Some more comments, trying to find untested branches.

> Source/JavaScriptCore/runtime/IntlObject.cpp:192
> +        if (!numExtParts)
> +            return String();

Do you have test covering this case? I did not see it.

> Source/JavaScriptCore/runtime/IntlObject.cpp:215
> +    if (languageLength < 2 || languageLength > 8 || !language.isAllSpecialCharacters<isASCIIAlpha>())

I was very confused by this, I believed this to be wrong for a while, added some comments, then saw the "languageLength <= 3" later.

I believe it could be clearer to:
1) Add the full grammar with the 4-8 rules
2) Maybe put "languageLength <= 3" in a temporary earlier to clarify the "language". Proposal:
    if (language < 2)
        return String();
    bool languageCanBeISO639PlusExtlang = languageLength <= 3;
    if (!languageCanBeISO639PlusExtlang && languageLength > 8 && ...)
       return String()

What do you think?

> Source/JavaScriptCore/runtime/IntlObject.cpp:246
> +            canonical.append(script.substring(1, 3).convertToASCIILowercase());

You are losing me there. Why do you remove the first character?

> Source/JavaScriptCore/runtime/IntlObject.cpp:491
> +        if (kPresent) {

I don't see a test case for "kPresent == false". Can you please add one if it is missing?


> Source/JavaScriptCore/runtime/IntlObject.cpp:514
> +            String canonicalizedTag = canonicalizeLanguageTag(tag->value(exec));

Can you please add a test of an empty string for the tag?
Comment 15 Andy VanWagoner 2015-09-11 14:44:17 PDT
Comment on attachment 260983 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=260983&action=review

>> Source/JavaScriptCore/runtime/IntlObject.cpp:192
>> +            return String();
> 
> Do you have test covering this case? I did not see it.

Invalid language tag: en-x takes this path.

>> Source/JavaScriptCore/runtime/IntlObject.cpp:215
>> +    if (languageLength < 2 || languageLength > 8 || !language.isAllSpecialCharacters<isASCIIAlpha>())
> 
> I was very confused by this, I believed this to be wrong for a while, added some comments, then saw the "languageLength <= 3" later.
> 
> I believe it could be clearer to:
> 1) Add the full grammar with the 4-8 rules
> 2) Maybe put "languageLength <= 3" in a temporary earlier to clarify the "language". Proposal:
>     if (language < 2)
>         return String();
>     bool languageCanBeISO639PlusExtlang = languageLength <= 3;
>     if (!languageCanBeISO639PlusExtlang && languageLength > 8 && ...)
>        return String()
> 
> What do you think?

Yeah, this can probably be made clearer.

>> Source/JavaScriptCore/runtime/IntlObject.cpp:246
>> +            canonical.append(script.substring(1, 3).convertToASCIILowercase());
> 
> You are losing me there. Why do you remove the first character?

The first character was added capitalized, the rest added lowercased. Like Hans in zh-Hans-TW.
Comment 16 Andy VanWagoner 2015-09-12 07:56:07 PDT
Created attachment 261054 [details]
Patch
Comment 17 Andy VanWagoner 2015-09-12 10:12:59 PDT
Created attachment 261056 [details]
Patch
Comment 18 Benjamin Poulain 2015-09-14 21:07:03 PDT
Comment on attachment 261056 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=261056&action=review

Review in progress

> Source/JavaScriptCore/runtime/IntlObject.cpp:247
> +            canonical.append(toASCIIUpper(script[0]));

I am an idiot for missing that one :)

> Source/JavaScriptCore/runtime/IntlObject.cpp:282
> +        const String& lowerVariant = variant.convertToASCIILowercase();

I am not a big fan of this feature of C++.

I think using "String" instead of "const String&" makes it more explicit that the lifetime of the reference is scoped by lowerVariant.

You don't have to change that, just a comment.

> Source/JavaScriptCore/runtime/IntlObject.cpp:285
> +        if (subtags.contains(lowerVariant))
> +            return String();
> +        subtags.add(lowerVariant);

You can use add(lowerVariant).isNewEntry to avoid a double lookup.

> Source/JavaScriptCore/runtime/IntlObject.cpp:310
> +        if (subtags.contains(singleton))
> +            return String();
> +        subtags.add(singleton);

if (!subtags.add(singleton).isNewEntry)

To avoid a double-lookup.

> Source/JavaScriptCore/runtime/IntlObject.cpp:365
> +static String getGrandfatheredLangTag(const String& locale)

note to self: resume review from here
Comment 19 Benjamin Poulain 2015-09-14 22:41:43 PDT
Comment on attachment 261056 [details]
Patch

No more comment!

This patch is in great shape, let's land it!

Can you do the minor hash map update?
Comment 20 Andy VanWagoner 2015-09-15 07:59:21 PDT
Created attachment 261192 [details]
Patch
Comment 21 WebKit Commit Bot 2015-09-15 10:43:42 PDT
Comment on attachment 261192 [details]
Patch

Clearing flags on attachment: 261192

Committed r189811: <http://trac.webkit.org/changeset/189811>
Comment 22 WebKit Commit Bot 2015-09-15 10:43:47 PDT
All reviewed patches have been landed.  Closing bug.