Bug 229313 - Intl.DateTimeFormat incorrectly parses patterns with 'h' literal
Summary: Intl.DateTimeFormat incorrectly parses patterns with 'h' literal
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: Safari Technology Preview
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Yusuke Suzuki
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2021-08-19 17:16 PDT by Devon Govett
Modified: 2021-08-27 00:15 PDT (History)
9 users (show)

See Also:


Attachments
Patch (12.57 KB, patch)
2021-08-26 21:13 PDT, Yusuke Suzuki
ross.kirsling: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Devon Govett 2021-08-19 17:16:46 PDT
WebKit parses the date/time pattern returned from ICU to do two things: 

1. Determine the `resolvedOptions` - https://github.com/WebKit/WebKit/blob/796333669f054bfaf1e50247d659dfa8834182a7/Source/JavaScriptCore/runtime/IntlDateTimeFormat.cpp#L318
2. To replace the hour part in the pattern when the `hourCycle` option is specified - https://github.com/WebKit/WebKit/blob/796333669f054bfaf1e50247d659dfa8834182a7/Source/JavaScriptCore/runtime/IntlDateTimeFormat.cpp#L486

However, these methods do not take into account that there may be literal parts in quotes which may contain otherwise meaningful symbols which should be ignored. For example, the hour pattern in the `fr` locale has the following pattern: `HH 'h'` https://github.com/unicode-org/cldr/blob/018b55eff7ceb389c7e3fc44e2f657eae3b10b38/common/main/fr.xml#L2962. When WebKit parses this pattern, it interprets the `h` as a symbol but it should be ignored instead.

This manifests in two ways:

1. The resolved `hour12` and `hourCycle` options for the French locale (and any other locale with patterns like this) are incorrect. For example, `new Intl.DateTimeFormat("fr", {hour: "numeric", hour12: false}).resolvedOptions().hour12` results in `true` rather than false, even though the date is formatted in 24 hour time: `new Intl.DateTimeFormat("fr", {hour: "numeric", hour12: false}).format(new Date(2021, 2, 3, 23))` results in `23 h`.

2. When specifying the `hourCycle` option, literals in the pattern are replaced resulting in incorrect formatting. For example, `new Intl.DateTimeFormat("fr", {hour: "numeric", hourCycle: 'h24'}).format(new Date(2021, 2, 3, 23))` results in `23 k` rather than `23 h`, and `new Intl.DateTimeFormat("fr", {hour: "numeric", hourCycle: 'h23'}).format(new Date(2021, 2, 3, 23))` results in `23 H`.
Comment 1 Yusuke Suzuki 2021-08-20 17:09:08 PDT
Nice catch!
Comment 2 Radar WebKit Bug Importer 2021-08-26 17:17:18 PDT
<rdar://problem/82414310>
Comment 3 Yusuke Suzuki 2021-08-26 21:13:26 PDT
Created attachment 436610 [details]
Patch
Comment 4 Ross Kirsling 2021-08-26 21:59:23 PDT
Comment on attachment 436610 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=436610&action=review

r=me with comments (about the comments :P)

> Source/JavaScriptCore/runtime/IntlDateTimeFormat.cpp:313
> +    // 'ICU''s change' is "ICU's change" literal text, but even if we split this text into two literal texts,

The use of quotes on this line confused me at first—it would be easier to understand if you quote the first part too, maybe like `'ICU''s change'`.

> Source/JavaScriptCore/runtime/IntlDateTimeFormat.cpp:346
> +    // 2. Literal text, which is output as-is when formatting, and must closely match when
> +    //    parsing. Literal text can include: Any characters other than A..Z and a..z,
> +    //    including spaces and punctuation. Any text between single vertical quotes ('xxxx'),
> +    //    which may include A..Z and a..z as literal text. Two adjacent single vertical quotes
> +    //    (''), which represent a literal single quote, either inside or outside quoted text.

This is okay but since the last three sentences are intended as a bullet list, it seems like you should write it as one.
Comment 5 Yusuke Suzuki 2021-08-27 00:09:48 PDT
Comment on attachment 436610 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=436610&action=review

Thanks!

>> Source/JavaScriptCore/runtime/IntlDateTimeFormat.cpp:313
>> +    // 'ICU''s change' is "ICU's change" literal text, but even if we split this text into two literal texts,
> 
> The use of quotes on this line confused me at first—it would be easier to understand if you quote the first part too, maybe like `'ICU''s change'`.

Fixed.

>> Source/JavaScriptCore/runtime/IntlDateTimeFormat.cpp:346
>> +    //    (''), which represent a literal single quote, either inside or outside quoted text.
> 
> This is okay but since the last three sentences are intended as a bullet list, it seems like you should write it as one.

Fixed.
Comment 6 Yusuke Suzuki 2021-08-27 00:15:18 PDT
Committed r281688 (241038@main): <https://commits.webkit.org/241038@main>