Bug 229313

Summary: Intl.DateTimeFormat incorrectly parses patterns with 'h' literal
Product: WebKit Reporter: Devon Govett <govett>
Component: JavaScriptCoreAssignee: Yusuke Suzuki <ysuzuki>
Status: RESOLVED FIXED    
Severity: Normal CC: ews-watchlist, keith_miller, mark.lam, msaboff, ross.kirsling, saam, tzagallo, webkit-bug-importer, ysuzuki
Priority: P2 Keywords: InRadar
Version: Safari Technology Preview   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch ross.kirsling: review+

Description Devon Govett 2021-08-19 17:16:46 PDT
WebKit parses the date/time pattern returned from ICU to do two things: 

1. Determine the `resolvedOptions` - https://github.com/WebKit/WebKit/blob/796333669f054bfaf1e50247d659dfa8834182a7/Source/JavaScriptCore/runtime/IntlDateTimeFormat.cpp#L318
2. To replace the hour part in the pattern when the `hourCycle` option is specified - https://github.com/WebKit/WebKit/blob/796333669f054bfaf1e50247d659dfa8834182a7/Source/JavaScriptCore/runtime/IntlDateTimeFormat.cpp#L486

However, these methods do not take into account that there may be literal parts in quotes which may contain otherwise meaningful symbols which should be ignored. For example, the hour pattern in the `fr` locale has the following pattern: `HH 'h'` https://github.com/unicode-org/cldr/blob/018b55eff7ceb389c7e3fc44e2f657eae3b10b38/common/main/fr.xml#L2962. When WebKit parses this pattern, it interprets the `h` as a symbol but it should be ignored instead.

This manifests in two ways:

1. The resolved `hour12` and `hourCycle` options for the French locale (and any other locale with patterns like this) are incorrect. For example, `new Intl.DateTimeFormat("fr", {hour: "numeric", hour12: false}).resolvedOptions().hour12` results in `true` rather than false, even though the date is formatted in 24 hour time: `new Intl.DateTimeFormat("fr", {hour: "numeric", hour12: false}).format(new Date(2021, 2, 3, 23))` results in `23 h`.

2. When specifying the `hourCycle` option, literals in the pattern are replaced resulting in incorrect formatting. For example, `new Intl.DateTimeFormat("fr", {hour: "numeric", hourCycle: 'h24'}).format(new Date(2021, 2, 3, 23))` results in `23 k` rather than `23 h`, and `new Intl.DateTimeFormat("fr", {hour: "numeric", hourCycle: 'h23'}).format(new Date(2021, 2, 3, 23))` results in `23 H`.
Comment 1 Yusuke Suzuki 2021-08-20 17:09:08 PDT
Nice catch!
Comment 2 Radar WebKit Bug Importer 2021-08-26 17:17:18 PDT
<rdar://problem/82414310>
Comment 3 Yusuke Suzuki 2021-08-26 21:13:26 PDT
Created attachment 436610 [details]
Patch
Comment 4 Ross Kirsling 2021-08-26 21:59:23 PDT
Comment on attachment 436610 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=436610&action=review

r=me with comments (about the comments :P)

> Source/JavaScriptCore/runtime/IntlDateTimeFormat.cpp:313
> +    // 'ICU''s change' is "ICU's change" literal text, but even if we split this text into two literal texts,

The use of quotes on this line confused me at first—it would be easier to understand if you quote the first part too, maybe like `'ICU''s change'`.

> Source/JavaScriptCore/runtime/IntlDateTimeFormat.cpp:346
> +    // 2. Literal text, which is output as-is when formatting, and must closely match when
> +    //    parsing. Literal text can include: Any characters other than A..Z and a..z,
> +    //    including spaces and punctuation. Any text between single vertical quotes ('xxxx'),
> +    //    which may include A..Z and a..z as literal text. Two adjacent single vertical quotes
> +    //    (''), which represent a literal single quote, either inside or outside quoted text.

This is okay but since the last three sentences are intended as a bullet list, it seems like you should write it as one.
Comment 5 Yusuke Suzuki 2021-08-27 00:09:48 PDT
Comment on attachment 436610 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=436610&action=review

Thanks!

>> Source/JavaScriptCore/runtime/IntlDateTimeFormat.cpp:313
>> +    // 'ICU''s change' is "ICU's change" literal text, but even if we split this text into two literal texts,
> 
> The use of quotes on this line confused me at first—it would be easier to understand if you quote the first part too, maybe like `'ICU''s change'`.

Fixed.

>> Source/JavaScriptCore/runtime/IntlDateTimeFormat.cpp:346
>> +    //    (''), which represent a literal single quote, either inside or outside quoted text.
> 
> This is okay but since the last three sentences are intended as a bullet list, it seems like you should write it as one.

Fixed.
Comment 6 Yusuke Suzuki 2021-08-27 00:15:18 PDT
Committed r281688 (241038@main): <https://commits.webkit.org/241038@main>