Bug 170720 - [JSC] Date.parse should accept wider range of representation
Summary: [JSC] Date.parse should accept wider range of representation
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Yusuke Suzuki
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-04-11 02:12 PDT by Yusuke Suzuki
Modified: 2017-07-25 12:46 PDT (History)
17 users (show)

See Also:


Attachments
Patch (8.07 KB, patch)
2017-04-13 05:23 PDT, Yusuke Suzuki
darin: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Yusuke Suzuki 2017-04-11 02:12:43 PDT
Date.parse now conforms the spec. But it is a bit strict and users want to parse some more relaxed ones[1].
What we would like is adding a fallback path to try to parse it if the current Date.parse fails to parse.

[1]: https://github.com/ariya/phantomjs/issues/11151
Comment 1 Yusuke Suzuki 2017-04-13 05:23:11 PDT
Created attachment 306985 [details]
Patch
Comment 2 Darin Adler 2017-04-13 11:27:16 PDT
Comment on attachment 306985 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=306985&action=review

> Source/WTF/wtf/DateMath.cpp:897
> +    std::optional<int> year = std::nullopt;

No need for "= std::nullopt". That’s what optionals do when not specified.

> Source/WTF/wtf/DateMath.cpp:1132
> +        // 1. According to the V8, this default value is compatible to the old KJS. While current V8 uses 2001,
> +        //    SpiderMonkey does not have such a fallback path. So this reason is not big deal.

I am a bit confused by this comment. V8 is the name of a JavaScript engine, not a person, so I don’t know what "according to the V8" means; maybe someone on the V8 team did some research about the behavior of older versions of WebKit? I don’t know what "compatible to the old KJS" means or why it’s relevant to our decision of what behavior is best. “The old KJS” seems quite unimportant; very few websites were tailored to assume the behavior of the KJS engine since it wasn’t used by all that many people. So maybe it’s simply a peculiar way to talk about older versions of WebKit and JavaScriptCore. The term “current V8” is not good since the comment is likely to last a long time and this may no longer be true in the future, so we should probably say this in a different way. Sometimes I use the phrase "as of this writing". Here’s a comment I might write, but I don’t know if it’s accurate:

    // 1. Year 2000 was used in older versions of JavaScriptCore, thus some older content
    //    tested only with WebKit might rely on this behavior (according to research done by the V8 team).
    //    (As of April 2017, V8 is using the year 2001 and Spider Monkey is not doing this kind of fallback.)

But when written this way it becomes clear that it’s a little strange to rely on research done by the V8 team to know what the behavior of old versions of WebKit is. Can’t we just check this rather than relying on what they say?

Also, it would be nice to be precise rather than just "older versions of JavaScriptCore".
Comment 3 Yusuke Suzuki 2017-04-14 00:56:14 PDT
Comment on attachment 306985 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=306985&action=review

>> Source/WTF/wtf/DateMath.cpp:1132
>> +        //    SpiderMonkey does not have such a fallback path. So this reason is not big deal.
> 
> I am a bit confused by this comment. V8 is the name of a JavaScript engine, not a person, so I don’t know what "according to the V8" means; maybe someone on the V8 team did some research about the behavior of older versions of WebKit? I don’t know what "compatible to the old KJS" means or why it’s relevant to our decision of what behavior is best. “The old KJS” seems quite unimportant; very few websites were tailored to assume the behavior of the KJS engine since it wasn’t used by all that many people. So maybe it’s simply a peculiar way to talk about older versions of WebKit and JavaScriptCore. The term “current V8” is not good since the comment is likely to last a long time and this may no longer be true in the future, so we should probably say this in a different way. Sometimes I use the phrase "as of this writing". Here’s a comment I might write, but I don’t know if it’s accurate:
> 
>     // 1. Year 2000 was used in older versions of JavaScriptCore, thus some older content
>     //    tested only with WebKit might rely on this behavior (according to research done by the V8 team).
>     //    (As of April 2017, V8 is using the year 2001 and Spider Monkey is not doing this kind of fallback.)
> 
> But when written this way it becomes clear that it’s a little strange to rely on research done by the V8 team to know what the behavior of old versions of WebKit is. Can’t we just check this rather than relying on what they say?
> 
> Also, it would be nice to be precise rather than just "older versions of JavaScriptCore".

I've dived into old logs... This function is added at r34872 and not changed so much. While year is first initialized 0 and later it will be converted to 2000, we do not produce any values without parsing year before this patch. So, users won't see 2000 default value I think.
Comment 4 Yusuke Suzuki 2017-04-14 01:00:04 PDT
Committed r215359: <http://trac.webkit.org/changeset/215359>
Comment 5 Simon Fraser (smfr) 2017-07-24 18:51:07 PDT
This caused the graphs on https://www.recode.net/2017/7/18/15994992/isabel-ge-mahe-apple-china-managing-director to render incorrectly.
Comment 6 Darin Adler 2017-07-24 22:24:49 PDT
Do we know why it caused that, specifically what date strings were being passed?
Comment 7 Yusuke Suzuki 2017-07-25 05:04:34 PDT
Oh, that's very interesting b/c it means some libs would depend on super tricky behavior. I'll setup the environment to check what is going.
Comment 8 Simon Fraser (smfr) 2017-07-25 08:35:22 PDT
I didn't have time to do any investigation other than autospading.
Comment 9 Yusuke Suzuki 2017-07-25 11:35:59 PDT
I've investigated it.

## Situation

This web site passes a string like "Jun. ‘15" to `Date.parse`.
To make this site work correctly, we need to return NaN for this value.

This patch relaxes Date parsing, then, for the above case, we return 2000/Jun/15 Date.
This is expected behavior. As you can see the phantomjs issue in this bug, webdevs want to generate ????/Jun/15 for a string like "Jun 15".

Why Chrome works fine is that it returns NaN for "Jun. ‘15". But it returns 2001/Jun/15 for "Jun. 15", "Jun 15" etc.

## In terms of the spec

ECMA262's guarantee about Date.parse is not-strict. ECMA262 requires that some descriptive format can be parsed by `Date.parse` correctly. In the case of the other values, Date.parse may either succeed or fail. It's implementation dependent heuristic. So, with regard to the spec, our current behavior is correct.
And ideally, users should not assume that the value must be rejected by Date.parse because the spec allows us to parse it with some heuristic.
But the real world is not ideal....... :(

https://tc39.github.io/ecma262/#sec-date.parse

## Possible solutions

1. Revert this patch

By reverting this patch, we return NaN for "Jun. ‘15". But instead, we lose the ability to return 2001/Jun/15 for "Jun. 15", "Jun 15" etc. which is requested in phantomjs's issue (And sometimes we see this request in twitter).

2. Add a hack for "Jun. ‘15" case

Add an additional heuristic for "Jun. ‘15" not to parse it as Date. Possible thing is that, if we see some leading text before the date (15), we do not recognize it as date. But it would break some existing heuristic.

3. Implement the heuristic completely same to the V8's one

Strictly speaking, these heuristics are not specified in the spec. So, the request in the phantomjs issue is basically saying "We want to parse Date as if V8 parses it".
The simplest solution for that is importing the heuristics in V8.

4. Request the site to fix it

Ideally, anybody should not assume that Date.parse should fail with a specific format. The heuristics could recognize it.


To me, for the short term solution, (1) should be appropriate. (2) is a bit too dangerous.
For the long term solution, I think we have (1), (3), and (4).
If we do not want to relax it, (1) and leave the current heuristic as is is the solution.
If we would like to accept some form of Date which is requested in the original issue, we have (3) or (4).

If we need to fix it quickly right now, we should take (1).
Comment 10 Simon Fraser (smfr) 2017-07-25 11:39:58 PDT
How do Gecko and Edge's Date parsing compare?
Comment 11 Yusuke Suzuki 2017-07-25 11:48:39 PDT
(In reply to Simon Fraser (smfr) from comment #10)
> How do Gecko and Edge's Date parsing compare?

Because I don't have Windows machine in my office, I cannot check it in Edge right now (anyone has Edge machine?).
Firefox has different heuristic mechanism from V8.
And it returns NaN for "Jun. ‘15", "Jun 15", and "Jun. 15".
Comment 12 Simon Fraser (smfr) 2017-07-25 11:52:19 PDT
You can get an Edge VM at https://developer.microsoft.com/en-us/microsoft-edge/tools/vms/
Comment 13 Yusuke Suzuki 2017-07-25 12:46:45 PDT
(In reply to Simon Fraser (smfr) from comment #12)
> You can get an Edge VM at
> https://developer.microsoft.com/en-us/microsoft-edge/tools/vms/

Thank you! I've just downloaded and tested.
Edge shows the same behavior to Firefox.
And it returns NaN for "Jun. ‘15", "Jun 15", and "Jun. 15".

So, not relaxing our heuristic is one idea. Basically, phantom.js's issue is raised because it is different from V8.