Bug 31340 - Introduce a class representing HTML5 date&time values
Summary: Introduce a class representing HTML5 date&time values
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Forms (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks: 31342
  Show dependency treegraph
 
Reported: 2009-11-11 02:14 PST by Kent Tamura
Modified: 2009-11-16 21:15 PST (History)
2 users (show)

See Also:


Attachments
ISO 8601 class (rev.5) (27.45 KB, patch)
2009-11-11 02:32 PST, Kent Tamura
levin: review-
Details | Formatted Diff | Diff
ISO 8601 class (rev.6) (28.37 KB, patch)
2009-11-12 03:24 PST, Kent Tamura
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Kent Tamura 2009-11-11 02:14:55 PST
Moves a patch from Bug#29004.
Comment 1 Kent Tamura 2009-11-11 02:32:46 PST
Created attachment 42943 [details]
ISO 8601 class (rev.5)

* Avoid 1-letter variable names except "i".
Comment 2 David Levin 2009-11-11 19:35:44 PST
I'm working on reviewing this.
Comment 3 David Levin 2009-11-12 00:41:30 PST
Comment on attachment 42943 [details]
ISO 8601 class (rev.5)

> diff --git a/WebCore/ChangeLog b/WebCore/ChangeLog
> +2009-09-08  Kent Tamura  <tkent@chromium.org>
> +
> +        Reviewed by NOBODY (OOPS!).
> +
> +        Introduce WebCore::ISODateTime class.
> +        https://bugs.webkit.org/show_bug.cgi?id=29004

This is the wrong bug number.

...
> +        (WebCore::ISODateTime::month):
> +        (WebCore::ISODateTime::fullYear):
> +        (WebCore::ISODateTime::week):
> +        (WebCore::ISODateTime::):

It is good to add per function comments in the change log in general or at least per file.


> diff --git a/WebCore/html/ISODateTime.cpp b/WebCore/html/ISODateTime.cpp
> @@ -0,0 +1,423 @@
> +/*
> + * Copyright (c) 2009, Google Inc. All rights reserved.

Use a capital C and no comma after the year.

"Copyright (C) 2009 Google Inc. All rights reserved."

> +static const int maxDayOfMonths[12] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};

It would be better to name this array daysInMonth which will help distinquish it from the function with a similar name.


>
> +static int dayOfWeek(int year, int month, int day)
> +{
> +    ++month;
> +
> +    // Zeller's congruence
> +    if (month <= 2) {

It looks like you're treating month as if it is 1 based (making Jan be 13 and Feb 14 which is what the algorithm calls for) but in other places you treat it as one based including this call for example "int day = dayOfWeek(m_year, 0, 1);"

I don't understand how this is working. If this is a bug, please increase your test coverage to expose it.


> +        month += 12;
> +        year--;
> +    }
> +    int highYear = year / 100;
> +    int lowYear = year % 100;
> +    int result = (day + 13 * (month + 1) / 5 + lowYear + lowYear / 4 + highYear / 4 - 2 * highYear - 1) % 7;
> +    if (result < 0)
> +        result += 7;

Why not do 
    int result = (day + 13 * (month + 1) / 5 + lowYear + lowYear / 4 + highYear / 4 + 5 * highYear + 6) % 7;
and get rid of the if (result < 0)?

> +    return result;   // Sunday origin

One space before end of line comments but I feel like the Sunday origin comment should come before the equation above since the it the reason for the + 6.

> +int ISODateTime::maxWeek() const

I like maxWeekNumber as the name. (I don't know what the max week is?)


> +static unsigned countDigits(const UChar* src, unsigned length, unsigned start)
> +{
> +    unsigned count = start;

count isn't a count. It is an index. (It get especially confusing when you see the return value for the count of digits is "count - start".
I'd suggest changing the variable name to index.

> +    for (; count < length; ++count) {
> +        if (!isASCIIDigit(src[count]))
> +            break;
> +    }
> +    return count - start;
> +}
> +
> +// Very strict integer parser.  Not allow leading whitespaces unlike charactersToIntStrict().

Please only use one space after periods in comments.

"Not allow leading whitespaces unlike charactersToIntStrict()." -> "Do not allow leading or trailing whitespaces unlike charactersToIntStrict()."


> +static bool toInt(const UChar* src, unsigned length, unsigned parseStart, unsigned parseLength, int* out)

Because "out" may never be 0, use "int& out".

> +{
>
> +        int digit = *current - '0';
> +        if (value > (INT_MAX - digit) / 10)  // overflow

One space before end of line comments.

Also, this line doesn't cause an overflow, so I'd change the comment: "// Check for overflow."


> +bool ISODateTime::parseYear(const UChar* src, unsigned length, unsigned start, unsigned* end)

"end" may never be 0 so use "unsigned& end"

> +bool ISODateTime::addDay(int dayDiff)
> +{
> +    ASSERT(m_monthDay);
> +
> +    int day = m_monthDay + dayDiff;
> +    if (day > maxDayOfMonth(m_year, m_month)) {
> +        day = m_monthDay;
> +        int month = m_month;
> +        int year = m_year;
> +        for (; dayDiff > 0; --dayDiff) {
> +            ++day;
> +            if (day > maxDayOfMonth(year, month)) {

Doing this look up repeatedly in a loop pains me (especially in Febuary in a leap year). Why not store the result and only change it when the month/year changes below?

After looking over the code more, I take it back because I think this is never used to move very many days. Please correct me if I'm wrong.

> +                day = 1;
> +                ++month;
> +                if (month >= 12) {  // month is 0-origin.

Single space before end of line comments.

> +                    month = 0;
> +                    ++year;
> +                    if (year < 0)  // Overflow.

Single space before end of line comments.

"// Check for overflow."


> +bool ISODateTime::addMinute(int minute)
> +{
> +    int carry;
> +    // min can be negative or greater than 59

Add "." to end of comment.

> +
> +// Parses a timezone part, and adjust year, month, monthDay, hour, minute, second, millesecond.

"millesecond" should be "millisecond"

> +bool ISODateTime::parseTimeZone(const UChar* src, unsigned length, unsigned start, unsigned* end)

Use unsigned& end.

> +{
> +    if (start >= length)
> +        return false;
> +    unsigned i = start;
> +    int hour;
> +    int minute;
> +    if (src[i] == 'Z') {
> +        ++i;
> +        hour = 0;
> +        minute = 0;

Why do you set the hour and minute in this case at all? Why not just move the addMinute into the else?

> +    } else {

> +        if (i >= length || src[i++] != ':')
> +            return false;

It would be clear to move the i++ out of the src[i++] and put it here.

> +    // Subtract the timezone offset.
> +    if (!addMinute( -(hour * 60 + minute)))

No space after open paren.

> +        return false;
> +    *end = i;
> +    return true;
> +}
> +
> +bool ISODateTime::parseMonth(const UChar* src, unsigned length, unsigned start, unsigned* end)

Use unsigned& end

> +{
> +    ASSERT(src);
> +    ASSERT(end);
> +    unsigned i;

Consider using index instead of i.

> +    if (!parseYear(src, length, start, &i))
> +        return false;
> +    if (i >= length || src[i++] != '-')
> +        return false;

It would be clear to move the i++ out of the src[i++] and put it here.


> +bool ISODateTime::parseDate(const UChar* src, unsigned length, unsigned start, unsigned* end)

unsigned& end

> +{
> +    ASSERT(src);
> +    ASSERT(end);
> +    unsigned i;

Consider using index instead of i.

> +    if (!parseMonth(src, length, start, &i))
> +        return false;
> +    // '-' and 2-digits are needed.
> +    if (i + 2 >= length)
> +        return false;
> +    if (src[i++] != '-')
> +        return false;

It would be clear to move the i++ out of the src[i++] and put it here.


> +bool ISODateTime::parseWeek(const UChar* src, unsigned length, unsigned start, unsigned* end)

unsigned& end


> +    unsigned i;

Consider using index instead of i.

> +    if (!parseYear(src, length, start, &i))
> +        return false;
> +    // 4 characters ('-' 'W' digit digit) are needed.
> +    if (i + 3 >= length)
> +        return false;
> +    if (src[i++] != '-')
> +        return false;
> +    if (src[i++] != 'W')
> +        return false;

I think it would be clearer to have the comparison and the after the comparison succeeds to adavance the index. (Move the i++ out of the if statements.)

> +bool ISODateTime::parseTime(const UChar* src, unsigned length, unsigned start, unsigned* end)

unsigned& end

> +{
> +    unsigned i = start + 2;

Consider using index instead of i.

> +    if (i >= length)
> +        return false;
> +    if (src[i++] != ':')
> +        return false;

Consider moving i++ here.

> +    // Optional second part
> +    if (i + 2 < length && src[i] == ':') {
> +        int second;

> +        if (toInt(src, length, i + 1, 2, &second) && second >= 0 && second <= 59) {

Isn't it an error if the seconds are present but not in the proper range?

> +            m_second = second;
> +            i += 3;

> +
> +            // Optional fractional second part
> +            if (i < length && src[i] == '.') {

Consider moving the ++i and making it just i++.

> +                unsigned digitsLength = countDigits(src, length, ++i);
> +                if (digitsLength > 0) {

From my reading of http://www.w3.org/TR/NOTE-datetime, once there is a period, there should be one or more digits. It sounds like anything else is an error.

> +                    bool ok;
> +                    int millisecond;
> +                    if (digitsLength == 1) {
> +                        ok = toInt(src, length, i, 1, &millisecond);
> +                        millisecond *= 100;
> +                    } else if (digitsLength == 2) {
> +                        ok = toInt(src, length, i, 2, &millisecond);
> +                        millisecond *= 10;
> +                    } else // digitsLength >= 3
> +                        ok = toInt(src, length, i, 3, &millisecond);
> +                    if (ok) {
> +                        m_millisecond = millisecond;
> +                        i += digitsLength;
> +                    } else
> +                        --i;

I don't understand this --i. It is "unreading" the '.' but why isn't that done when digitsLength is 0 as well?

> +bool ISODateTime::parseDateTimeLocal(const UChar* src, unsigned length, unsigned start, unsigned* end)

unsigned& end

> +{
> +    ASSERT(src);
> +    ASSERT(end);
> +    unsigned i;

Consider using index instead of i.

> +    if (src[i++] != 'T')
> +        return false;

Consider moving i++ here.

> +    return parseTime(src, length, i, end);
> +}
> +
> +bool ISODateTime::parseDateTime(const UChar* src, unsigned length, unsigned start, unsigned* end)
> +{
> +    ASSERT(src);
> +    ASSERT(end);
> +    unsigned i;

Consider using index instead of i.

> +    if (src[i++] != 'T')
> +        return false;

Consider moving i++ here.

> diff --git a/WebCore/html/ISODateTime.h b/WebCore/html/ISODateTime.h
> + * Copyright (c) 2009, Google Inc. All rights reserved.

 capital (C), no comma after 2009

> +namespace WebCore {
> +
> +    // The following six functions parse the input `src' of which length is

"of which" -> "whose"

> +    // `length', and updates some fields of this instance. The parsing starts at
> +    // src[start] and examines characters before src[length].
> +    // `src', `end' and `date' must be non-null. The `src' string doesn't needs to

"needs" -> "need"



> +    // The functions return true if the parsing succeeds, and sets the next index
> +    // of the last consumed character to `*end'. The leading extra characters cause
Consider using this:
    // The functions return true if the parsing succeeds and set 'end' to the index after
    // the last character consumed. Extra leading characters cause
    // parse failures, and trailing extra characters don't cause parse failures.


> +    // m_weekDay values
> +    enum {
> +        SUNDAY = 0,
> +        MONDAY,
> +        TUESDAY,
> +        WEDNESDAY,
> +        THURSDAY,
> +        FRIDAY,
> +        SATURDAY,
> +    };

Enum members should user InterCaps with an initial capital letter.
 - http://webkit.org/coding/coding-style.html
Comment 4 Kent Tamura 2009-11-12 03:24:50 PST
Created attachment 43055 [details]
ISO 8601 class (rev.6)

(In reply to comment #3)
> > +        https://bugs.webkit.org/show_bug.cgi?id=29004
> 
> This is the wrong bug number.

Fixed.

> It is good to add per function comments in the change log in general or at
> least per file.

Added some comments.

> "Copyright (C) 2009 Google Inc. All rights reserved."

Fixed.

> > +static const int maxDayOfMonths[12] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
> 
> It would be better to name this array daysInMonth which will help distinquish
> it from the function with a similar name.

Done.

> > +static int dayOfWeek(int year, int month, int day)
> > +{
> > +    ++month;
> > +
> > +    // Zeller's congruence
> > +    if (month <= 2) {
> 
> It looks like you're treating month as if it is 1 based (making Jan be 13 and
> Feb 14 which is what the algorithm calls for) but in other places you treat it
> as one based including this call for example "int day = dayOfWeek(m_year, 0,
> 1);"
> 
> I don't understand how this is working. If this is a bug, please increase your
> test coverage to expose it.

It's not a bug.
 - ISODateTime::m_month is 0-based to follow "struct tm".
 - month values at other places are 0-based.
 - The month parameter of this function is also 0-based.

I introduced a new local variable named `shiftedMonth' for readability, and added some comments.

> Why not do 
>     int result = (day + 13 * (month + 1) / 5 + lowYear + lowYear / 4 + highYear
> / 4 + 5 * highYear + 6) % 7;
> and get rid of the if (result < 0)?

Done.

> > +    return result;   // Sunday origin
> One space before end of line comments but I feel like the Sunday origin comment
> should come before the equation above since the it the reason for the + 6.

Done.

> > +int ISODateTime::maxWeek() const
> 
> I like maxWeekNumber as the name. (I don't know what the max week is?)

Done.

> > +static unsigned countDigits(const UChar* src, unsigned length, unsigned start)
> > +{
> > +    unsigned count = start;
> 
> count isn't a count. It is an index. (It get especially confusing when you see
> the return value for the count of digits is "count - start".
> I'd suggest changing the variable name to index.

Done.

> > +// Very strict integer parser.  Not allow leading whitespaces unlike charactersToIntStrict().
> 
> Please only use one space after periods in comments.

Done.

> "Not allow leading whitespaces unlike charactersToIntStrict()." -> "Do not
> allow leading or trailing whitespaces unlike charactersToIntStrict()."

Done.

> > +static bool toInt(const UChar* src, unsigned length, unsigned parseStart, unsigned parseLength, int* out)
> 
> Because "out" may never be 0, use "int& out".

Done for this and all of "unsigned* end".

> > +        if (value > (INT_MAX - digit) / 10)  // overflow
> 
> One space before end of line comments.

Done.

> Also, this line doesn't cause an overflow, so I'd change the comment: "// Check
> for overflow."

Done.

> > +        for (; dayDiff > 0; --dayDiff) {
> > +            ++day;
> > +            if (day > maxDayOfMonth(year, month)) {
> 
> Doing this look up repeatedly in a loop pains me (especially in Febuary in a
> leap year). Why not store the result and only change it when the month/year
> changes below?

Done.

> After looking over the code more, I take it back because I think this is never
> used to move very many days. Please correct me if I'm wrong.

Right.  This function is used to adjust timezone offset for now.
However we'll use this function to implement HTMLInputElement::stepUp() and steDown().
So it will be possible to move many days.

> > +                if (month >= 12) {  // month is 0-origin.
> 
> Single space before end of line comments.

Done.

> > +                    if (year < 0)  // Overflow.
> 
> Single space before end of line comments.

Done.

> "// Check for overflow."

Done.

> > +    // min can be negative or greater than 59
> 
> Add "." to end of comment.

Done.

> > +// Parses a timezone part, and adjust year, month, monthDay, hour, minute, second, millesecond.
> 
> "millesecond" should be "millisecond"

Done.

> > +    if (src[i] == 'Z') {
> > +        ++i;
> > +        hour = 0;
> > +        minute = 0;
> 
> Why do you set the hour and minute in this case at all? Why not just move the
> addMinute into the else?

Removed them and addMinute() for 'Z' case.

> > +        if (i >= length || src[i++] != ':')
> > +            return false;
> 
> It would be clear to move the i++ out of the src[i++] and put it here.

Done for all of src[i++] -> src[i] then i++.

> > +    if (!addMinute( -(hour * 60 + minute)))
> 
> No space after open paren.

Done.

> > +    ASSERT(src);
> > +    ASSERT(end);
> > +    unsigned i;
> 
> Consider using index instead of i.

Done for all of i -> index.

> > +    // Optional second part
> > +    if (i + 2 < length && src[i] == ':') {
> > +        int second;
> 
> > +        if (toInt(src, length, i + 1, 2, &second) && second >= 0 && second <= 59) {
> 
> Isn't it an error if the seconds are present but not in the proper range?

The policy of these parse*() functions is "consume only valid part of the input".
Even If the second or fractional second part is invalid, hh:mm part should be parsed successfully.

> > +                unsigned digitsLength = countDigits(src, length, ++i);
> > +                if (digitsLength > 0) {
> 
> From my reading of http://www.w3.org/TR/NOTE-datetime, once there is a period,
> there should be one or more digits. It sounds like anything else is an error.

Ditto.

> > +                    bool ok;
> > +                    int millisecond;
> > +                    if (digitsLength == 1) {
> > +                        ok = toInt(src, length, i, 1, &millisecond);
> > +                        millisecond *= 100;
> > +                    } else if (digitsLength == 2) {
> > +                        ok = toInt(src, length, i, 2, &millisecond);
> > +                        millisecond *= 10;
> > +                    } else // digitsLength >= 3
> > +                        ok = toInt(src, length, i, 3, &millisecond);
> > +                    if (ok) {
> > +                        m_millisecond = millisecond;
> > +                        i += digitsLength;
> > +                    } else
> > +                        --i;
> 
> I don't understand this --i. It is "unreading" the '.' but why isn't that done
> when digitsLength is 0 as well?

This "else" clause was attached to a wrong "if".  It should be attached to "if (digitsLength > 0)" to recover from an input like "23:59:59."
I found "if (ok)" made no sense because these toInt() calls parse number-only strings.
I changed the code here.

> > diff --git a/WebCore/html/ISODateTime.h b/WebCore/html/ISODateTime.h
> > + * Copyright (c) 2009, Google Inc. All rights reserved.
> 
>  capital (C), no comma after 2009

Done.

> > +    // The following six functions parse the input `src' of which length is
> 
> "of which" -> "whose"

Done.

> > +    // `src', `end' and `date' must be non-null. The `src' string doesn't needs to
> 
> "needs" -> "need"

Done.

> > +    // The functions return true if the parsing succeeds, and sets the next index
> > +    // of the last consumed character to `*end'. The leading extra characters cause
> Consider using this:
>     // The functions return true if the parsing succeeds and set 'end' to the
> index after
>     // the last character consumed. Extra leading characters cause
>     // parse failures, and trailing extra characters don't cause parse
> failures.

Done.

> > +    enum {
> > +        SUNDAY = 0,
> > +        MONDAY,
> > +        TUESDAY,
> > +        WEDNESDAY,
> > +        THURSDAY,
> > +        FRIDAY,
> > +        SATURDAY,
> > +    };
> 
> Enum members should user InterCaps with an initial capital letter.
>  - http://webkit.org/coding/coding-style.html

Done.


Thank you very very much for loooong review!
Comment 5 David Levin 2009-11-16 10:53:34 PST
Comment on attachment 43055 [details]
ISO 8601 class (rev.6)

Thanks! Please consider addressing these final nits on landing.

> diff --git a/WebCore/ChangeLog b/WebCore/ChangeLog

This is a much better changelog!


> diff --git a/WebCore/html/ISODateTime.cpp b/WebCore/html/ISODateTime.cpp
> +// The oldest day of Gregorian Calendar is 1582-10-15.  We don't support dates older than it.

One space after periods.

> +static const int gregorianStartYear = 1582;
> +static const int gregorianStartMonth = 9; // 0-based October.

Use whole sentences in general. 0-based October doesn't make sense. It sound like October is the 0th month. What you mean is that January is 0, so this month 9 is really October.
Here's an idea for your comment  "// This is October, since months are 0 based."

>
> +static int dayOfWeek(int year, int month, int day)
> +{
> +    int shiftedMonth = month + 2;
> +    // 2:January, 3:Feburuary, 4:March, ...

Thanks! You had this before with the month++ and I missed it, but I find this new form much easier to read.


> +// Very strict integer parser. Do not allow leading or trailing whitespaces unlike charactersToIntStrict().

s/whitespaces/whitespace/
Comment 6 Kent Tamura 2009-11-16 21:00:05 PST
(In reply to comment #5)

Thank you!  I'll fix them and land the patch manually.
Comment 7 Kent Tamura 2009-11-16 21:15:33 PST
Landed as r51063
http://trac.webkit.org/changeset/51063