Bug 62882 - WebVTT parser
Summary: WebVTT parser
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Media Elements (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Anna Cavender
URL:
Keywords:
Depends on: 62893 64132 64921
Blocks: 43668
  Show dependency treegraph
 
Reported: 2011-06-17 10:55 PDT by Anna Cavender
Modified: 2011-08-05 21:12 PDT (History)
11 users (show)

See Also:


Attachments
Patch (25.03 KB, patch)
2011-07-01 18:10 PDT, Anna Cavender
no flags Details | Formatted Diff | Diff
Addressing comments and making more spec compliant. (33.08 KB, patch)
2011-07-07 17:57 PDT, Anna Cavender
no flags Details | Formatted Diff | Diff
Correcting some typos (32.92 KB, patch)
2011-07-07 22:38 PDT, Anna Cavender
no flags Details | Formatted Diff | Diff
addressing comments, tests coming soon (40.08 KB, patch)
2011-07-14 11:21 PDT, Anna Cavender
no flags Details | Formatted Diff | Diff
addressing eric's comments (40.32 KB, patch)
2011-07-28 17:24 PDT, Anna Cavender
no flags Details | Formatted Diff | Diff
updating to ToT so patch will apply (40.38 KB, patch)
2011-07-28 18:23 PDT, Anna Cavender
no flags Details | Formatted Diff | Diff
addressing eric's comments (41.99 KB, patch)
2011-08-02 22:09 PDT, Anna Cavender
no flags Details | Formatted Diff | Diff
some nits and int to long casting removed (42.53 KB, patch)
2011-08-05 16:14 PDT, Anna Cavender
no flags Details | Formatted Diff | Diff
fixing some typos (42.54 KB, patch)
2011-08-05 17:39 PDT, Anna Cavender
no flags Details | Formatted Diff | Diff
updating to ToT (42.48 KB, patch)
2011-08-05 18:31 PDT, Anna Cavender
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Anna Cavender 2011-06-17 10:55:57 PDT
Implement a WebVTT parser for text track files.
Comment 1 Anna Cavender 2011-07-01 18:10:37 PDT
Created attachment 99544 [details]
Patch
Comment 2 Anna Cavender 2011-07-01 18:17:02 PDT
A few notes about this WebVTT parser:

There are aspects of the WebVTT parser spec that are not present in this patch:
* cue class span <c></c>
* cue voice span <v></v>
* cue timestamp (within the cue)
* detecting the BOM character at the front of a WebVTT file

Some of the html features of WebVTT we get for free in WebKit, specifically:
<i></i>
<b></b>
<u></u>
<ruby></ruby>

Thanks!
-Anna
Comment 3 Eric Carlson 2011-07-01 22:16:31 PDT
cc'd Eric Seidel because he and Adam wrote the HTML5 parser and I know squat about parsing
Comment 4 Eric Carlson 2011-07-01 22:49:51 PDT
Comment on attachment 99544 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=99544&action=review

> Source/WebCore/platform/track/CueParser.cpp:72
> +    return url.contains(".vtt");
>  }

Won't this match ".vtt" *anywhere in the string? Is this type a MIME type of a file name? If the former, I think it would be good if this took a ContentType as MediaPlayer::supportsType does. If the former, the function name should make it clear that the parameter is a file name.

> Source/WebCore/platform/track/CueParser.cpp:92
> +        if (firstLine.contains("WEBVTT"))
> +            m_private = WebVTTParser::create(this);

Is it OK that this will match "WEBVTT" anywhere in the first line of the file? The parser fails if the first line is not "WEBVTT" or "WEBVTT FILE", this should probably do the same.

> Source/WebCore/platform/track/CueParser.cpp:93
> +        m_client->trackLoadStarted();

Should trackLoadStarted() be called if the parser is not created?

> Source/WebCore/platform/track/WebVTTParser.cpp:99
> +        case TIMINGSANDSETTINGS:
> +            // 2-4 - WebVTT cue timings, white space, and optional WebVTT cue settings.

You should have FIXMEs for the features not supported yet.

> Source/WebCore/platform/track/WebVTTParser.cpp:163
> +    // 6-9 - If the next three characters are not "-->", abort and return failure.
> +    if (input.substring(position, 3) != "-->")
> +        return false;
> +    position += 3;
> +    position = skipWhiteSpace(input, position);
> +    // 10-11 - Collect a WebVTT timestamp. If that fails, then abort and return failure. Otherwise, let cue's text track cue end time be the collected time.

Nit here an elsewhere: I think it would be slightly easier to read if there was a blank line before each spec step/comment .
Comment 5 Eric Seidel 2011-07-01 23:47:51 PDT
Comment on attachment 99544 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=99544&action=review

Deifnitely need some sort of testing for this patch to be seriously considered. :)

> Source/WebCore/WebCore.xcodeproj/project.pbxproj:10457
> +		B10B740C13BE80BF00BC1C26 /* WebVTTParser.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = WebVTTParser.h; path = platform/track/WebVTTParser.h; sourceTree = SOURCE_ROOT; };
> +		B10B740D13BE80BF00BC1C26 /* WebVTTParser.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = WebVTTParser.cpp; path = platform/track/WebVTTParser.cpp; sourceTree = SOURCE_ROOT; };

These should be group relative, not SOURCE_ROOT relative.

> Source/WebCore/platform/track/CueParser.cpp:76
> +    // This may handle other mime types as well one day.

I don't think the comment adds much.

> Source/WebCore/platform/track/CueParser.cpp:88
> +        String firstLine = content.substring(0, content.find("\n"));

It's more efficient to just search for what you want instead of splitting the string on \n.  Is there a spec which covers this?

> Source/WebCore/platform/track/CueParser.cpp:89
> +        if (firstLine == emptyString())

firstLine.isEmpty()

> Source/WebCore/platform/track/CueParser.cpp:90
> +          firstLine = content.substring(0, content.find("\r"));

indent is odd.  Also, why search for \r after having found an empty line with \n?

> Source/WebCore/platform/track/WebVTTParser.cpp:42
> +    : m_state(HEADER)

I don't think all-caps is normal style for enums.

> Source/WebCore/platform/track/WebVTTParser.cpp:47
> +    , m_currentContent(emptyString())
> +    , m_currentSettings(emptyString())

You could just use "' here.  Is it important that these be "" instead of null strings?

> Source/WebCore/platform/track/WebVTTParser.cpp:56
> +    for (size_t i = 0; i < m_cuelist.size(); ++i)
> +        outputCues.append(m_cuelist[i]);

Doesn't vector have a copy method?

> Source/WebCore/platform/track/WebVTTParser.cpp:62
> +    // The following is based on WHATWG WebVTT 4.8.10.13.2 Syntax.

Thank you, that is helpful.

> Source/WebCore/platform/track/WebVTTParser.cpp:63
> +    String inputBytes = String::fromUTF8(data);

This is not the normal way we do text decoding.  Are you sure this is always utf8?

> Source/WebCore/platform/track/WebVTTParser.cpp:65
> +    splitByLineTerminators(inputBytes, lines);

Your parser does lots of string mallocs.  Which is OK, but not very efficient.   I'm not sure how performance senstiave this code is, if at all.

> Source/WebCore/platform/track/WebVTTParser.cpp:68
> +    for (Vector<String>::iterator line = lines.begin(); line != lines.end(); line++) {
> +        String currentLine = *line;

You could easily do this by searching for the next marker and keeping a start/end pointer and creating a string from that (or not even bothering creating a string).  Again, depends on what your constraits are.

> Source/WebCore/platform/track/WebVTTParser.cpp:74
> +            if (currentLine.stripWhiteSpace() != "WEBVTT" && currentLine.stripWhiteSpace() != "WEBVTT FILE")

Are you intentionally allowing leading spaces?  This code does...
Comment 6 Eric Seidel 2011-07-01 23:48:26 PDT
Darin is also particularly knowledgable about parsers, but then again he's particularly knowledgable about many things... :)
Comment 7 Anna Cavender 2011-07-07 17:56:52 PDT
Comment on attachment 99544 [details]
Patch

Thanks for the fast response on reviews!  Eric Seidel, what kind of testing would you like to see?  And would you be willing to talk with me separately about that?

Thanks!
-Anna

View in context: https://bugs.webkit.org/attachment.cgi?id=99544&action=review

>> Source/WebCore/WebCore.xcodeproj/project.pbxproj:10457
>> +		B10B740D13BE80BF00BC1C26 /* WebVTTParser.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = WebVTTParser.cpp; path = platform/track/WebVTTParser.cpp; sourceTree = SOURCE_ROOT; };
> 
> These should be group relative, not SOURCE_ROOT relative.

Done.

>> Source/WebCore/platform/track/CueParser.cpp:72
>>  }
> 
> Won't this match ".vtt" *anywhere in the string? Is this type a MIME type of a file name? If the former, I think it would be good if this took a ContentType as MediaPlayer::supportsType does. If the former, the function name should make it clear that the parameter is a file name.

I decided to take this out for now as it will primarily be used for <source> children of <track> tags, which we decided not to include yet.

>> Source/WebCore/platform/track/CueParser.cpp:76
>> +    // This may handle other mime types as well one day.
> 
> I don't think the comment adds much.

OK, it's gone.

>> Source/WebCore/platform/track/CueParser.cpp:88
>> +        String firstLine = content.substring(0, content.find("\n"));
> 
> It's more efficient to just search for what you want instead of splitting the string on \n.  Is there a spec which covers this?

I agree.  I changed it to more closely follow the WHATWG WEBVTT text/vtt spec, even though I think it's still under consideration: http://www.whatwg.org/specs/web-apps/current-work/multipage/iana.html#text/vtt

>> Source/WebCore/platform/track/CueParser.cpp:89
>> +        if (firstLine == emptyString())
> 
> firstLine.isEmpty()

Gone.

>> Source/WebCore/platform/track/CueParser.cpp:90
>> +          firstLine = content.substring(0, content.find("\r"));
> 
> indent is odd.  Also, why search for \r after having found an empty line with \n?

Gone.

>> Source/WebCore/platform/track/CueParser.cpp:92
>> +            m_private = WebVTTParser::create(this);
> 
> Is it OK that this will match "WEBVTT" anywhere in the first line of the file? The parser fails if the first line is not "WEBVTT" or "WEBVTT FILE", this should probably do the same.

No, good call.  I've changed it to follow spec (optional BOM character followed by "WEBVTT").

>> Source/WebCore/platform/track/CueParser.cpp:93
>> +        m_client->trackLoadStarted();
> 
> Should trackLoadStarted() be called if the parser is not created?

Also no.  Changed to only call trackLoadStarted when parser is created.

>> Source/WebCore/platform/track/WebVTTParser.cpp:42
>> +    : m_state(HEADER)
> 
> I don't think all-caps is normal style for enums.

Gone.

>> Source/WebCore/platform/track/WebVTTParser.cpp:47
>> +    , m_currentSettings(emptyString())
> 
> You could just use "' here.  Is it important that these be "" instead of null strings?

Nope, not important.  I've removed these initializers.

>> Source/WebCore/platform/track/WebVTTParser.cpp:56
>> +        outputCues.append(m_cuelist[i]);
> 
> Doesn't vector have a copy method?

Nice catch, thanks.

>> Source/WebCore/platform/track/WebVTTParser.cpp:63
>> +    String inputBytes = String::fromUTF8(data);
> 
> This is not the normal way we do text decoding.  Are you sure this is always utf8?

Yes, WebVTT files must always be UTF-8 (see http://www.whatwg.org/specs/web-apps/current-work/multipage/the-iframe-element.html#webvtt-file).  I'm happy to take suggestions though if this looks strange to you.

>> Source/WebCore/platform/track/WebVTTParser.cpp:65
>> +    splitByLineTerminators(inputBytes, lines);
> 
> Your parser does lots of string mallocs.  Which is OK, but not very efficient.   I'm not sure how performance senstiave this code is, if at all.

I agree, I've changed it to better match the spec, which I believe should be more efficient.  Performance isn't a big deal here as text tracks ought to be relatively small compared to the video/media they accompany, but we should do it right :).

>> Source/WebCore/platform/track/WebVTTParser.cpp:68
>> +        String currentLine = *line;
> 
> You could easily do this by searching for the next marker and keeping a start/end pointer and creating a string from that (or not even bothering creating a string).  Again, depends on what your constraits are.

Yep, you're right.  Changed.

>> Source/WebCore/platform/track/WebVTTParser.cpp:74
>> +            if (currentLine.stripWhiteSpace() != "WEBVTT" && currentLine.stripWhiteSpace() != "WEBVTT FILE")
> 
> Are you intentionally allowing leading spaces?  This code does...

Gone.

>> Source/WebCore/platform/track/WebVTTParser.cpp:99
>> +            // 2-4 - WebVTT cue timings, white space, and optional WebVTT cue settings.
> 
> You should have FIXMEs for the features not supported yet.

Done.

>> Source/WebCore/platform/track/WebVTTParser.cpp:163
>> +    // 10-11 - Collect a WebVTT timestamp. If that fails, then abort and return failure. Otherwise, let cue's text track cue end time be the collected time.
> 
> Nit here an elsewhere: I think it would be slightly easier to read if there was a blank line before each spec step/comment .

Done.
Comment 8 Anna Cavender 2011-07-07 17:57:19 PDT
Created attachment 100055 [details]
Addressing comments and making more spec compliant.
Comment 9 Eric Seidel 2011-07-07 18:01:09 PDT
Comment on attachment 100055 [details]
Addressing comments and making more spec compliant.

The code is not excercised in any test.  Past experiance has taught me that if code isn't tested, it's broken.  So we just need some way to exercise this code.  Whether by using existing run-webkit-tests LayoutTests or the unit testing code available from WTF, or something new. :)
Comment 10 Gustavo Noronha (kov) 2011-07-07 18:34:29 PDT
Comment on attachment 100055 [details]
Addressing comments and making more spec compliant.

Attachment 100055 [details] did not pass gtk-ews (gtk):
Output: http://queues.webkit.org/results/9001256
Comment 11 Anna Cavender 2011-07-07 22:38:25 PDT
Created attachment 100074 [details]
Correcting some typos

Correcting some typos
Comment 12 Gustavo Noronha (kov) 2011-07-07 22:41:59 PDT
Comment on attachment 100074 [details]
Correcting some typos

Attachment 100074 [details] did not pass gtk-ews (gtk):
Output: http://queues.webkit.org/results/8999417
Comment 13 Eric Carlson 2011-07-08 12:25:02 PDT
Comment on attachment 100074 [details]
Correcting some typos

I agree with Eric, tests for this parser are absolutely essential! 

Even though none of the ports builds with VIDEO_TRACK, you obviously build with it on so you can run the tests yourself to make sure the parser behaves correctly, and to make sure you don't accidentally break it as you add more functionality. 


View in context: https://bugs.webkit.org/attachment.cgi?id=100074&action=review

> Source/WebCore/html/TextTrackCue.cpp:160
> +            while (position < input.length() && input[position] != ' ' && input[position] != '\t')

Why break on a tab, the spec says "Collect a sequence of characters that are not space characters"?

> Source/WebCore/html/TextTrackCue.cpp:165
> +            if (equalIgnoringCase(writingDirection, "vertical"))
> +                m_writingDirection = VerticalGrowingLeft;
> +            if (equalIgnoringCase(writingDirection, "vertical-lr"))
> +                m_writingDirection = VerticalGrowingRight;

The spec says the direction strings are case sensitive: "If value is a case-sensitive match for the string "vertical", then let cue's text track cue..."

> Source/WebCore/html/TextTrackCue.cpp:174
> +            if (position <= input.length() && input[position] != ' ' && input[position] != '\t')
> +                break;

Ditto about stopping on a tab. 

In the case where there is a non-space character immediately after an 'L', the spec says to skip everything *up to* the next space: "Collect a sequence of characters that are not space characters and discard them". Just breaking out of the case will skip up to the first non-space. This is different from the other terminating conditions for this stage.

> Source/WebCore/html/TextTrackCue.cpp:176
> +            if (linePosition.substring(1, linePosition.length()-1).contains('-') || linePosition.substring(0, linePosition.length()-1).contains('%'))
> +                break;

Doesn't "linePosition.substring(1, linePosition.length()-1" return a new string identical to "linePosition"? Is it necessary to copy the string to look for a character?

> Source/WebCore/html/TextTrackCue.cpp:177
> +            if (linePosition[0] == '-' && linePosition[linePosition.length()-1] == '%')

Nit here and elsewhere: "xxx-1" should have a space before and after the minus sign.

> Source/WebCore/html/TextTrackCue.cpp:187
> +            if (linePosition[linePosition.length()-1] == '%') {
> +                number = linePosition.substring(0, linePosition.length()-1).toInt64();
> +                if (number < 0 || number > 100)
> +                    break;
> +                m_snapToLines = false;
> +            } else
> +                number = linePosition.substring(0, linePosition.length()).toInt64();
> +            m_linePosition = number;

1) Here again, it isn't necessary to create a substring from 0..length. 

2) You skipped step 3: "If value does not contain at least one character in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9), then jump back to the step labeled settings".

3) I believe that toInt64() will convert everything up to the first non-digit character to a number so you can simplify this:

bool validNumber;
number = linePosition.substring(0, linePosition.length()).toInt64(&validNumber);
if (!validNumber)
    break;
if (linePosition[linePosition.length()-1] == '%') {
    if (number < 0 || number > 100)
        break;
    m_snapToLines = false;
}
m_linePosition = number;

> Source/WebCore/html/TextTrackCue.cpp:198
> +            if (input[position++] != '%')
> +                break;

If the character is not a '%' you should discard everything *up to* the next space.

> Source/WebCore/html/TextTrackCue.cpp:200
> +            if (position < input.length() && input[position] != ' ' && input[position] != '\t')
> +                break;

Break on tab? In this case you should also skip everything  up to the next space.

> Source/WebCore/html/TextTrackCue.cpp:213
> +            while (position < input.length() && '0' <= input[position] && input[position] <= '9')
> +                cueSize.append(input[position++]);

The "Collect a sequence of characters that are in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9)" pattern is used often enough that it calls out for a function instead of repeating the code each time.

> Source/WebCore/html/TextTrackCue.cpp:219
> +            if (position < input.length() && input[position] != ' ' && input[position] != '\t')
> +                break;

Break on tab? In this case you should also skip everything  up to the next space.

> Source/WebCore/html/TextTrackCue.cpp:232
> +            while (position < input.length() && input[position] != ' ' && input[position] != '\t')
> +                cueAlignment.append(input[position++]);

Break on tab?

> Source/WebCore/html/TextTrackCue.cpp:238
> +            if (equalIgnoringCase(cueAlignment, "start"))
> +                m_cueAlignment = Start;
> +            if (equalIgnoringCase(cueAlignment, "middle"))
> +                m_cueAlignment = Middle;
> +            if (equalIgnoringCase(cueAlignment, "end"))
> +                m_cueAlignment = End;

These comparisons should all be case sensitive.

> Source/WebCore/html/TextTrackCue.cpp:244
> +        while (position < input.length() && (input[position] == ' ' || input[position] == '\t'))
> +            position++;

Break on tab?

> Source/WebCore/html/TextTrackCue.h:84
>      String m_settings;

Is it useful to retain the settings as a string in a non-debug build?

> Source/WebCore/platform/track/WebVTTParser.cpp:65
> +    // 5-8 - Collect the first line and check that it is "WEBVTT" (we also check for "WEBVTT FILE" for legacy reasons).
> +    String line;
> +    while (position < input.length() && input[position] != '\r' && input[position] != '\n')
> +        line.append(input[position++]);

A question that the other Eric may know the answer to: does it make sense to use a SegmentedString in at least this outer processing loop because WebVTT files are line based?

> Source/WebCore/platform/track/WebVTTParser.cpp:93
> +Header:
> +    // 13-18 - Allow a header (comment area) under the WEBVTT line.
> +    line = "";
> +    while (position < input.length() && input[position] != '\r' && input[position] != '\n')
> +        line.append(input[position++]);
> +    position = skipLineTerminator(input, position);
> +    if (!line.isEmpty())
> +        goto Header;
> +
> +CueLoop:
> +    // 19-29 - Allow any number of line terminators, then initialize new cue values.
> +    while (position < input.length() && (input[position] == '\r' || input[position] == '\n'))
> +        position++;
> +    resetCueValues();
> +

Instead of using inline code and jumping between labels, I think the code will be easier to follow (and debug later) if you are able to break it up into functions and structure it as a state machine. Something like:

enum ParserState {
        Initial,
        CueLoop,
        ... etc ...
    };

    ParserState state = Initial;

    while (position < input.length()) {
        switch (state) {
        case Initial: {
            if (!requireFileIdentifier(input, position))
                return;
            skipHeader(input, position);
            state = CueLoop;
            break;
        }

        case CueLoop: {
            state = parseCue(input, position, line);
            break;
        }

... etc ...

> Source/WebCore/platform/track/WebVTTParser.cpp:173
> +    if (input.substring(position, 3) != "-->")
> +        return false;

You could use find() instead of allocating a new string.

> Source/WebCore/platform/track/WebVTTParser.cpp:206
> +    String digits1;
> +    while (position < input.length() && '0' <= input[position] && input[position] <= '9')
> +        digits1.append(input[position++]);
> +    int value1 = digits1.toInt();

This should be able to use the same function you will create to parse numbers for parseSettings().

> Source/WebCore/platform/track/WebVTTParser.cpp:220
> +    // 8-12 - Collect the next sequence of 0-9 after ':' (must be 2 chars).
> +    if (position >= input.length() || input[position++] != ':')
> +        return malformedTime;
> +    if (position >= input.length() || !('0' <= input[position] && input[position] <= '9'))
> +        return malformedTime;
> +    String digits2;
> +    while (position < input.length() && '0' <= input[position] && input[position] <= '9')
> +        digits2.append(input[position++]);
> +    int value2 = digits2.toInt();

Here too.

> Source/WebCore/platform/track/WebVTTParser.cpp:237
> +        while (position < input.length() && '0' <= input[position] && input[position] <= '9')
> +            digits3.append(input[position++]);
> +        if (digits3.length() != 2)
> +            return malformedTime;
> +        value3 = digits3.toInt();

And here.

> Source/WebCore/platform/track/WebVTTParser.cpp:256
> +    // 14-19 - Collect next sequence of 0-9 after '.' (must be 3 chars).
> +    if (position >= input.length() || input[position++] != '.')
> +        return malformedTime;
> +    if (position >= input.length() || !('0' <= input[position] && input[position] <= '9'))
> +        return malformedTime;
> +    String digits4;
> +    while (position < input.length() && '0' <= input[position] && input[position] <= '9')
> +        digits4.append(input[position++]);
> +    if (digits4.length() != 3)
> +        return malformedTime;
> +    int value4 = digits4.toInt();
> +    if (value2 > 59 || value3 > 59)
> +        return malformedTime;

And here.

> Source/WebCore/platform/track/WebVTTParser.cpp:259
> +    return value1 * secondsInHour + value2 * secondsInMinute + value3 + (double)value4 / 1000;

Parentheses around the terms would make this much easier to parse visually.
Comment 14 Silvia Pfeiffer 2011-07-11 03:18:41 PDT
(In reply to comment #13)
> (From update of attachment 100074 [details])
> > Source/WebCore/html/TextTrackCue.cpp:160
> > +            while (position < input.length() && input[position] != ' ' && input[position] != '\t')
> 
> Why break on a tab, the spec says "Collect a sequence of characters that are not space characters"?

The "space characters" that the WebVTT spec refers to are defined here: http://www.whatwg.org/specs/web-apps/current-work/multipage/common-microsyntaxes.html#space-character and include U+0020 SPACE, U+0009 CHARACTER TABULATION (tab), U+000A LINE FEED (LF), U+000C FORM FEED (FF), and U+000D CARRIAGE RETURN (CR). Since we have done away with the newlines before getting to the settings parsing, I think Anna's code is right.


>
> > Source/WebCore/html/TextTrackCue.cpp:244
> > +        while (position < input.length() && (input[position] == ' ' || input[position] == '\t'))
> > +            position++;

I don't think this loop is in the spec. This loop removes more than one "space" separation character from between the individual settings. However, the spec does not contain that. Maybe it should?

What is missing, however, is the "Otherwise" part from the settings parsing: namely the bit that picks up on surplus characters from faulty settings and parses through to the next "space" character.
Comment 15 Anna Cavender 2011-07-14 11:21:22 PDT
Comment on attachment 100074 [details]
Correcting some typos

View in context: https://bugs.webkit.org/attachment.cgi?id=100074&action=review

>>> Source/WebCore/html/TextTrackCue.cpp:160
>>> +            while (position < input.length() && input[position] != ' ' && input[position] != '\t')
>> 
>> Why break on a tab, the spec says "Collect a sequence of characters that are not space characters"?
> 
> The "space characters" that the WebVTT spec refers to are defined here: http://www.whatwg.org/specs/web-apps/current-work/multipage/common-microsyntaxes.html#space-character and include U+0020 SPACE, U+0009 CHARACTER TABULATION (tab), U+000A LINE FEED (LF), U+000C FORM FEED (FF), and U+000D CARRIAGE RETURN (CR). Since we have done away with the newlines before getting to the settings parsing, I think Anna's code is right.

Yes, Silvia is right, but those tab characters are ugly.  I've moved them into a separate function.

>> Source/WebCore/html/TextTrackCue.cpp:165
>> +                m_writingDirection = VerticalGrowingRight;
> 
> The spec says the direction strings are case sensitive: "If value is a case-sensitive match for the string "vertical", then let cue's text track cue..."

Done.  Thanks, I misread that.

>> Source/WebCore/html/TextTrackCue.cpp:174
>> +                break;
> 
> Ditto about stopping on a tab. 
> 
> In the case where there is a non-space character immediately after an 'L', the spec says to skip everything *up to* the next space: "Collect a sequence of characters that are not space characters and discard them". Just breaking out of the case will skip up to the first non-space. This is different from the other terminating conditions for this stage.

Done.  Ah, I see my mistake.  Should be corrected now, thanks!

>> Source/WebCore/html/TextTrackCue.cpp:176
>> +                break;
> 
> Doesn't "linePosition.substring(1, linePosition.length()-1" return a new string identical to "linePosition"? Is it necessary to copy the string to look for a character?

Well, we want to check that any character other than the first is not a '-' and that any character other than the last is not a '%', so the strings really are substrings.  But, you're right, it's not necessary to copy; I changed it to use find() instead of contains().

>> Source/WebCore/html/TextTrackCue.cpp:177
>> +            if (linePosition[0] == '-' && linePosition[linePosition.length()-1] == '%')
> 
> Nit here and elsewhere: "xxx-1" should have a space before and after the minus sign.

Done.

>> Source/WebCore/html/TextTrackCue.cpp:187
>> +            m_linePosition = number;
> 
> 1) Here again, it isn't necessary to create a substring from 0..length. 
> 
> 2) You skipped step 3: "If value does not contain at least one character in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9), then jump back to the step labeled settings".
> 
> 3) I believe that toInt64() will convert everything up to the first non-digit character to a number so you can simplify this:
> 
> bool validNumber;
> number = linePosition.substring(0, linePosition.length()).toInt64(&validNumber);
> if (!validNumber)
>     break;
> if (linePosition[linePosition.length()-1] == '%') {
>     if (number < 0 || number > 100)
>         break;
>     m_snapToLines = false;
> }
> m_linePosition = number;

Done. Nice! Thanks!

>> Source/WebCore/html/TextTrackCue.cpp:198
>> +                break;
> 
> If the character is not a '%' you should discard everything *up to* the next space.

Done.

>> Source/WebCore/html/TextTrackCue.cpp:200
>> +                break;
> 
> Break on tab? In this case you should also skip everything  up to the next space.

Done.

>> Source/WebCore/html/TextTrackCue.cpp:213
>> +                cueSize.append(input[position++]);
> 
> The "Collect a sequence of characters that are in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9)" pattern is used often enough that it calls out for a function instead of repeating the code each time.

Done.  Though I am unsure if making this method static in TextTrackCue as I have done is the right approach...

>> Source/WebCore/html/TextTrackCue.cpp:219
>> +                break;
> 
> Break on tab? In this case you should also skip everything  up to the next space.

Done.

>> Source/WebCore/html/TextTrackCue.cpp:232
>> +                cueAlignment.append(input[position++]);
> 
> Break on tab?

Done.

>> Source/WebCore/html/TextTrackCue.cpp:238
>> +                m_cueAlignment = End;
> 
> These comparisons should all be case sensitive.

Done.

>>> Source/WebCore/html/TextTrackCue.cpp:244
>>> +            position++;
>> 
>> Break on tab?
> 
> I don't think this loop is in the spec. This loop removes more than one "space" separation character from between the individual settings. However, the spec does not contain that. Maybe it should?
> 
> What is missing, however, is the "Otherwise" part from the settings parsing: namely the bit that picks up on surplus characters from faulty settings and parses through to the next "space" character.

I decided to keep the loop.  Some sort of check is necessary, otherwise position will not advance properly.  And I do think the spec should be tolerant of more than one space character between settings.

Done re Otherwise.  Much better :).

>> Source/WebCore/html/TextTrackCue.h:84
>>      String m_settings;
> 
> Is it useful to retain the settings as a string in a non-debug build?

Done.  It's gone.

>> Source/WebCore/platform/track/WebVTTParser.cpp:65
>> +        line.append(input[position++]);
> 
> A question that the other Eric may know the answer to: does it make sense to use a SegmentedString in at least this outer processing loop because WebVTT files are line based?

I've moved back to a state machine model, this time collecting on a line-by-line basis without doing all the string mallocs I had before.  I think this is a good compromise.

>> Source/WebCore/platform/track/WebVTTParser.cpp:93
>> +
> 
> Instead of using inline code and jumping between labels, I think the code will be easier to follow (and debug later) if you are able to break it up into functions and structure it as a state machine. Something like:
> 
> enum ParserState {
>         Initial,
>         CueLoop,
>         ... etc ...
>     };
> 
>     ParserState state = Initial;
> 
>     while (position < input.length()) {
>         switch (state) {
>         case Initial: {
>             if (!requireFileIdentifier(input, position))
>                 return;
>             skipHeader(input, position);
>             state = CueLoop;
>             break;
>         }
> 
>         case CueLoop: {
>             state = parseCue(input, position, line);
>             break;
>         }
> 
> ... etc ...

Ha!  I agree, that's more like how I had it before.  The new patch is updated.

>> Source/WebCore/platform/track/WebVTTParser.cpp:173
>> +        return false;
> 
> You could use find() instead of allocating a new string.

Done.

>> Source/WebCore/platform/track/WebVTTParser.cpp:206
>> +    int value1 = digits1.toInt();
> 
> This should be able to use the same function you will create to parse numbers for parseSettings().

Done.

>> Source/WebCore/platform/track/WebVTTParser.cpp:220
>> +    int value2 = digits2.toInt();
> 
> Here too.

Done.

>> Source/WebCore/platform/track/WebVTTParser.cpp:237
>> +        value3 = digits3.toInt();
> 
> And here.

Done.

>> Source/WebCore/platform/track/WebVTTParser.cpp:256
>> +        return malformedTime;
> 
> And here.

Done.

>> Source/WebCore/platform/track/WebVTTParser.cpp:259
>> +    return value1 * secondsInHour + value2 * secondsInMinute + value3 + (double)value4 / 1000;
> 
> Parentheses around the terms would make this much easier to parse visually.

Done.
Comment 16 Anna Cavender 2011-07-14 11:21:49 PDT
Created attachment 100833 [details]
addressing comments, tests coming soon
Comment 17 Anna Cavender 2011-07-14 11:29:39 PDT
Regarding tests for this parser, the test cases we have been using internally are here:
http://src.chromium.org/viewvc/chrome?view=rev&revision=92193

Should we make these part of WebKit?  Automated tests would obviously be best, but there is not much to test yet at this stage (in terms of a working feature).  Suggestions/ideas welcome!

-Anna
Comment 18 Eric Carlson 2011-07-15 13:57:26 PDT
(In reply to comment #17)
> Regarding tests for this parser, the test cases we have been using internally are here:
> http://src.chromium.org/viewvc/chrome?view=rev&revision=92193
> 
> Should we make these part of WebKit?  Automated tests would obviously be best, but there is not much to test yet at this stage (in terms of a working feature).  Suggestions/ideas welcome!
> 
The point is to add tests for every feature in the codebase. While you can't make tests for <track> yet because it isn't completely functional, you can add tests for the parser features you have implemented - parser correctness, handling malformed input, etc, etc.
Comment 19 Eric Carlson 2011-07-22 08:23:37 PDT
Comment on attachment 100833 [details]
addressing comments, tests coming soon

View in context: https://bugs.webkit.org/attachment.cgi?id=100833&action=review

Marking r- for the slow WTF::String idioms.


> Source/WebCore/html/TextTrackCue.cpp:73
> +    String digits;
> +    while (*position < input.length() && charIsANumber(input[*position]))
> +        digits.append(input[(*position)++]);
> +    return digits;

Darin's "Slow idioms with WTF::String" email (https://lists.webkit.org/pipermail/webkit-dev/2011-July/017587.html) flags more than one call to WTF::String::append as one of the "inefficient ways to build a new string". He suggest that WTF::StringBuilder is a much more efficient way to build a string in a loop.

> Source/WebCore/html/TextTrackCue.cpp:185
> +            while (position < input.length() && !charIsASpaceCharacter(input[position]))
> +                writingDirection.append(input[position++]);

Scanning past all non-space characters is done in several places. Can it be factored into a shared function?

> Source/WebCore/html/TextTrackCue.cpp:196
> +            while (position < input.length() && (input[position] == '-' || input[position] == '%' || charIsANumber(input[position])))
> +                linePosition.append(input[position++]);

WTF::String::append

> Source/WebCore/html/TextTrackCue.cpp:259
> +            while (position < input.length() && !charIsASpaceCharacter(input[position]))
> +                cueAlignment.append(input[position++]);

WTF::String::append

> Source/WebCore/platform/track/CueParser.cpp:79
> +    if (!m_private)
> +        if (response.mimeType() == "text/vtt")
> +            m_private = WebVTTParser::create(this);
>      m_client->trackLoadStarted();

Does it make sense to call trackLoadStarted for MIME types we don't support? Might be better to have a function that creates the parser and calls trackLoadStarted that you can call from here and ddiReceiveData.

> Source/WebCore/platform/track/CueParser.cpp:89
> +        if (WebVTTParser::requireFileIdentifier(data, len)) {
> +            m_private = WebVTTParser::create(this);
> +            m_client->trackLoadStarted();
> +        }

This will result in trackLoadStarted being called twice if the files doesn't have the right MIME type.

> Source/WebCore/platform/track/WebVTTParser.cpp:52
> +    while (position < input.length() && input[position] != '\r' && input[position] != '\n')
> +        line.append(input[position++]);

More WTF::String::append.

> Source/WebCore/platform/track/WebVTTParser.cpp:84
> +    String input = String::fromUTF8(data, length);
> +    unsigned position = 0;
> +
> +    do {
> +        String line = collectNextLine(input, &position);

Creating a String of the entire data buffer and then creating another String for each line seems needlessly inefficient. Could collectNextLine take a pointer to the data, length, and offset and create a string for a line from that?

> Source/WebCore/platform/track/WebVTTParser.cpp:90
> +            if (!requireFileIdentifier(data, length))
> +                return;

Creating a string from the entire buffer at the top of the function and then creating it again in requireFileIdentifier is not very efficient. Can you pass "line" to requireFileIdentifier?

> Source/WebCore/platform/track/WebVTTParser.cpp:93
> +            m_state = Header;
> +        
> +        case Header:

If you are falling through to the 'Header' case on purpose, please add a comment.

> Source/WebCore/platform/track/WebVTTParser.cpp:148
> +    while (position < input.length() && !TextTrackCue::charIsASpaceCharacter(input[position]))
> +        start.append(input[position++]);

WTF::String::append

> Source/WebCore/platform/track/WebVTTParser.cpp:164
> +    while (position < input.length() && !TextTrackCue::charIsASpaceCharacter(input[position]))
> +      end.append(input[position++]);

WTF::String::append

> Source/WebCore/platform/track/WebVTTParser.cpp:184
> +    if (!m_currentContent.isEmpty())
> +        m_currentContent += "\n";
> +    m_currentContent += line;

Darin's email notes that building a String with the += operator is as inefficient as WTF::String::append.

> Source/WebCore/platform/track/WebVTTParser.cpp:279
> +    String line = "";
> +    while (*position < input.length() && input[*position] != '\r' && input[*position] != '\n')
> +        line.append(input[(*position)++]);
> +    return line;

WTF::String::append
Comment 20 Anna Cavender 2011-07-28 17:24:14 PDT
Comment on attachment 100833 [details]
addressing comments, tests coming soon

View in context: https://bugs.webkit.org/attachment.cgi?id=100833&action=review

>> Source/WebCore/html/TextTrackCue.cpp:73
>> +    return digits;
> 
> Darin's "Slow idioms with WTF::String" email (https://lists.webkit.org/pipermail/webkit-dev/2011-July/017587.html) flags more than one call to WTF::String::append as one of the "inefficient ways to build a new string". He suggest that WTF::StringBuilder is a much more efficient way to build a string in a loop.

Thanks for the heads up!  I've changed it.

>> Source/WebCore/html/TextTrackCue.cpp:185
>> +                writingDirection.append(input[position++]);
> 
> Scanning past all non-space characters is done in several places. Can it be factored into a shared function?

Good call.  Done.

>> Source/WebCore/html/TextTrackCue.cpp:196
>> +                linePosition.append(input[position++]);
> 
> WTF::String::append

Done.

>> Source/WebCore/html/TextTrackCue.cpp:259
>> +                cueAlignment.append(input[position++]);
> 
> WTF::String::append

Done.

>> Source/WebCore/platform/track/CueParser.cpp:79
>>      m_client->trackLoadStarted();
> 
> Does it make sense to call trackLoadStarted for MIME types we don't support? Might be better to have a function that creates the parser and calls trackLoadStarted that you can call from here and ddiReceiveData.

Done.

>> Source/WebCore/platform/track/CueParser.cpp:89
>> +        }
> 
> This will result in trackLoadStarted being called twice if the files doesn't have the right MIME type.

Done. The new createWebVTTParser method only calls trackLoadStarted when the parser is created and createWebVTTParser will only be called if we don't yet have a parser, so it should only be called once now.  Thanks and good catch.

>> Source/WebCore/platform/track/WebVTTParser.cpp:52
>> +        line.append(input[position++]);
> 
> More WTF::String::append.

Done.

>> Source/WebCore/platform/track/WebVTTParser.cpp:84
>> +        String line = collectNextLine(input, &position);
> 
> Creating a String of the entire data buffer and then creating another String for each line seems needlessly inefficient. Could collectNextLine take a pointer to the data, length, and offset and create a string for a line from that?

Done.

>> Source/WebCore/platform/track/WebVTTParser.cpp:90
>> +                return;
> 
> Creating a string from the entire buffer at the top of the function and then creating it again in requireFileIdentifier is not very efficient. Can you pass "line" to requireFileIdentifier?

Hmm.  I was trying to avoid creating a string in CueParser when it checks the file content.  I've decide to instead create a string for the first line (not the entire buffer) in requireFileIdentifier.  Hope that's ok!

>> Source/WebCore/platform/track/WebVTTParser.cpp:93
>> +        case Header:
> 
> If you are falling through to the 'Header' case on purpose, please add a comment.

Whoops, no, that was a typo.  Good catch.

>> Source/WebCore/platform/track/WebVTTParser.cpp:148
>> +        start.append(input[position++]);
> 
> WTF::String::append

Done.

>> Source/WebCore/platform/track/WebVTTParser.cpp:164
>> +      end.append(input[position++]);
> 
> WTF::String::append

Done.

>> Source/WebCore/platform/track/WebVTTParser.cpp:184
>> +    m_currentContent += line;
> 
> Darin's email notes that building a String with the += operator is as inefficient as WTF::String::append.

Done.  Changed m_currentContent to a StringBuilder

>> Source/WebCore/platform/track/WebVTTParser.cpp:279
>> +    return line;
> 
> WTF::String::append

Done.
Comment 21 Anna Cavender 2011-07-28 17:24:52 PDT
Created attachment 102315 [details]
addressing eric's comments
Comment 22 Anna Cavender 2011-07-28 18:23:25 PDT
Created attachment 102319 [details]
updating to ToT so patch will apply
Comment 23 Eric Carlson 2011-08-01 15:11:57 PDT
Comment on attachment 102319 [details]
updating to ToT so patch will apply

View in context: https://bugs.webkit.org/attachment.cgi?id=102319&action=review

First, I apologize that it took me to long to get this reviewed :-(

Second, while this is much closer than the last time around, there are enough minor issues that I am going to mark it r- for now.


> Source/WebCore/html/TextTrackCue.cpp:67
> +bool TextTrackCue::charIsANumber(char c)
> +{
> +    return '0' <= c && c <= '9';
> +}

Is there any reason to not use WTF::isASCIIDigit instead of defining this here? If you think it is worth having a new function to emphasize that is follows the webVTT rules, maybe "isWebVTTDigit" and make it an inline?


> Source/WebCore/html/TextTrackCue.cpp:69
> +String TextTrackCue::collectNumberCharacters(const String& input, unsigned* position)

Nit: While you are collecting number characters, you return a string and advance the position. Maybe "collectWebVTTDigits", or "parseWebVTTDigits"?

Actually, you could put this and the other static parsing functions in this class into WebVTTParser so that "WebVTT" in the name isn't necessary and they are all in the same place.


> Source/WebCore/html/TextTrackCue.cpp:80
> +bool TextTrackCue::charIsASpaceCharacter(char c)
> +{
> +    // WebVTT space characters are U+0020 SPACE, U+0009 CHARACTER TABULATION (tab), U+000A LINE FEED (LF), U+000C FORM FEED (FF), and U+000D CARRIAGE RETURN (CR).
> +    return c == ' ' || c == '\t' || c == '\n' || c == '\f' || c == '\r';

Nit: I don't think "char" prefix is necessary, and you definitely don't need both the prefix and "Character" suffix. Maybe "isWebVTTSpace"? 

Also, this could be inline.

> Source/WebCore/html/TextTrackCue.cpp:83
> +String TextTrackCue::collectString(const String& input, unsigned* position)

Nit: "collectString" is accurate but not especially descriptive. You collect up to the next space and advance the position, so maybe "collectWebVTTWord", or "parseWebVTTWord"?

> Source/WebCore/html/TextTrackCue.cpp:184
> +        // 5-7 - If the character at position is not ':', set setting to empty string.
> +        if (input[position++] != ':')
> +            setting = ' ';

Does the spec consider a space to be the same as an empty string?

> Source/WebCore/html/TextTrackCue.cpp:191
> +        switch (setting) {
> +        case 'D':
> +            {

Here and elsewhere, I think it would be helpful to quote (or summarize) at least some of the parsing text from the spec. In addition to making it easier for someone that comes along later to understand what the logic is supposed to to, I think it will make somewhat easier to spot parts that need to be changed later because of spec changes.

> Source/WebCore/html/TextTrackCue.cpp:196
> +            if (writingDirection == "vertical")
> +                m_writingDirection = VerticalGrowingLeft;
> +            if (writingDirection == "vertical-lr")
> +                m_writingDirection = VerticalGrowingRight;

The string can't have both values, so an "else" here will make this slightly more efficient.

> Source/WebCore/html/TextTrackCue.cpp:212
> +            long number = linePosition.toInt64(&validNumber);

Do you need this implicit cast from int64_t to long?

> Source/WebCore/html/TextTrackCue.cpp:235
> +            long number = textPosition.toInt64(&validNumber);

Implicit cast?

> Source/WebCore/html/TextTrackCue.cpp:255
> +            long number = cueSize.toInt64(&validNumber);

Implicit cast?

> Source/WebCore/html/TextTrackCue.cpp:271
> +            if (cueAlignment == "start")
> +                m_cueAlignment = Start;
> +            if (cueAlignment == "middle")
> +                m_cueAlignment = Middle;
> +            if (cueAlignment == "end")
> +                m_cueAlignment = End;

"else if" will make this code slightly more efficient.

> Source/WebCore/html/TextTrackCue.cpp:279
> +        continue;
> +Otherwise:

Nit: a blank line after the "continue" will make this more readable IMO.

> Source/WebCore/platform/track/WebVTTParser.cpp:41
> +bool WebVTTParser::requireFileIdentifier(const char* data, int length)

I was confused about what this function did when I saw the name above, "require" as the first part of the name made me think it must enforce a requirement of some sort. Because the function checks to see if the byte stream has a required identifier, maybe "hasRequiredFileIdentifier"?

> Source/WebCore/platform/track/WebVTTParser.cpp:72
> +void WebVTTParser::parseBytes(const char* data, int length)

Here and elsewhere - why do you pass length around as a signed int but use unsigned inside of the functions?

> Source/WebCore/platform/track/WebVTTParser.cpp:114
> +            m_state = collectCueText(line);
> +            if ((int)position >= length - 1 && m_state == CueText)
> +                processCueText();

If "length" has to be signed (see above), you should use a static_cast rather than a C-style cast here.

> Source/WebCore/platform/track/WebVTTParser.cpp:122
> +    } while ((int)position < length);

C-style cast.

> Source/WebCore/platform/track/WebVTTParser.cpp:142
> +    String start = TextTrackCue::collectString(line, &position);
> +    m_currentStartTime = collectTimeStamp(start);

It looks like you always put a word into a local string that is only used to pass to collectTimeStamp. Is there any reason to not have collectTimeStamp take line and position like your other parsing functions?

> Source/WebCore/platform/track/WebVTTParser.cpp:175
> +WebVTTParser::ParseState WebVTTParser::collectCueText(const String& line)
> +{
> +    if (line.isEmpty()) {
> +        processCueText();
> +        return Id;
> +    }
> +    if (!m_currentContent.isEmpty())
> +        m_currentContent.append("\n");
> +    m_currentContent.append(line);
> +    return CueText;
> +}

I think the current factoring is confusing. Some of the logic for what makes a good cue is here and some is back in parseBytes. You sometimes call processCueText here, and sometimes call it in parseBytes. It might be clearer if you combined processCueText and collectCueText so all of the logic is in one place. Could this work?

> Source/WebCore/platform/track/WebVTTParser.cpp:279
> +void WebVTTParser::skipLineTerminator(const char* data, int length, unsigned* position)
> +{
> +    if ((int)*position >= length)
> +        return;
> +    if (data[*position] == '\r')
> +        (*position)++;
> +    if ((int)*position >= length)
> +        return;
> +    if (data[*position] == '\n')
> +        (*position)++;
> +}

Is there a reason that "length" needs to be signed? If so, the casts should no be C-style.

> Source/WebCore/platform/track/WebVTTParser.cpp:289
> +String WebVTTParser::collectNextLine(const char* data, int length, unsigned* position)
> +{
> +    unsigned oldPosition = *position;
> +    while ((int)*position < length && data[*position] != '\r' && data[*position] != '\n')
> +        (*position)++;
> +    String line = String::fromUTF8(data + oldPosition, *position - oldPosition);
> +    skipLineTerminator(data, length, position);
> +    return line;
> +}

Ditto.

> Source/WebCore/platform/track/WebVTTParser.h:43
> +static const int secondsInHour = 3600;
> +static const int secondsInMinute = 60;

Minor nit: "Per" might be better than "In", eg. "secondsPerHour" and "secondsPerMinute"
Comment 24 Anna Cavender 2011-08-02 22:09:10 PDT
Comment on attachment 102319 [details]
updating to ToT so patch will apply

Thanks for the review!

View in context: https://bugs.webkit.org/attachment.cgi?id=102319&action=review

>> Source/WebCore/html/TextTrackCue.cpp:67
>> +}
> 
> Is there any reason to not use WTF::isASCIIDigit instead of defining this here? If you think it is worth having a new function to emphasize that is follows the webVTT rules, maybe "isWebVTTDigit" and make it an inline?

Done.  Didn't know isASCIIDigit existed, thanks!

>> Source/WebCore/html/TextTrackCue.cpp:69
>> +String TextTrackCue::collectNumberCharacters(const String& input, unsigned* position)
> 
> Nit: While you are collecting number characters, you return a string and advance the position. Maybe "collectWebVTTDigits", or "parseWebVTTDigits"?
> 
> Actually, you could put this and the other static parsing functions in this class into WebVTTParser so that "WebVTT" in the name isn't necessary and they are all in the same place.

Done and Done.  Renamed to collectDigits (to keep with spec language) and moved all of these static functions to WebVTTParser.

>> Source/WebCore/html/TextTrackCue.cpp:80

> 
> Nit: I don't think "char" prefix is necessary, and you definitely don't need both the prefix and "Character" suffix. Maybe "isWebVTTSpace"? 
> 
> Also, this could be inline.

Done.  Good call about the name :).  Renamed to isASpace, moved in WebVTTParser, and inlined.

>> Source/WebCore/html/TextTrackCue.cpp:83
>> +String TextTrackCue::collectString(const String& input, unsigned* position)
> 
> Nit: "collectString" is accurate but not especially descriptive. You collect up to the next space and advance the position, so maybe "collectWebVTTWord", or "parseWebVTTWord"?

Done.  Renamed to collectWord and moved to WebVTTParser

>> Source/WebCore/html/TextTrackCue.cpp:184
>> +            setting = ' ';
> 
> Does the spec consider a space to be the same as an empty string?

Done.  Oops, these are not equivalent, changed to '\0'

>> Source/WebCore/html/TextTrackCue.cpp:191
>> +            {
> 
> Here and elsewhere, I think it would be helpful to quote (or summarize) at least some of the parsing text from the spec. In addition to making it easier for someone that comes along later to understand what the logic is supposed to to, I think it will make somewhat easier to spot parts that need to be changed later because of spec changes.

Done.

>> Source/WebCore/html/TextTrackCue.cpp:196
>> +                m_writingDirection = VerticalGrowingRight;
> 
> The string can't have both values, so an "else" here will make this slightly more efficient.

Done.

>> Source/WebCore/html/TextTrackCue.cpp:212
>> +            long number = linePosition.toInt64(&validNumber);
> 
> Do you need this implicit cast from int64_t to long?

I (probably mistakenly) thought toInt64 would be more accurate when cast to a long than toInt.  You prefer toInt?  WTFString doesn't have a toLong :).  I've changed to toInt here and elsewhere.

>> Source/WebCore/html/TextTrackCue.cpp:235
>> +            long number = textPosition.toInt64(&validNumber);
> 
> Implicit cast?

Done.  Changed to toInt, hope that's ok

>> Source/WebCore/html/TextTrackCue.cpp:255
>> +            long number = cueSize.toInt64(&validNumber);
> 
> Implicit cast?

Done.  Changed to toInt, hope that's ok

>> Source/WebCore/html/TextTrackCue.cpp:271
>> +                m_cueAlignment = End;
> 
> "else if" will make this code slightly more efficient.

Done.

>> Source/WebCore/html/TextTrackCue.cpp:279
>> +Otherwise:
> 
> Nit: a blank line after the "continue" will make this more readable IMO.

Done.

>> Source/WebCore/platform/track/WebVTTParser.cpp:41
>> +bool WebVTTParser::requireFileIdentifier(const char* data, int length)
> 
> I was confused about what this function did when I saw the name above, "require" as the first part of the name made me think it must enforce a requirement of some sort. Because the function checks to see if the byte stream has a required identifier, maybe "hasRequiredFileIdentifier"?

Done.

>> Source/WebCore/platform/track/WebVTTParser.cpp:72
>> +void WebVTTParser::parseBytes(const char* data, int length)
> 
> Here and elsewhere - why do you pass length around as a signed int but use unsigned inside of the functions?

Hmm.  unsigned is easier to use inside the functions for comparison with length().  int is left over from CueParser::didReceiveData(const char*, int) which is from ThreadableLoaderClient.  I suppose I should be consistent.  WebVTTParser will now consistently use unsigned.

>> Source/WebCore/platform/track/WebVTTParser.cpp:114
>> +                processCueText();
> 
> If "length" has to be signed (see above), you should use a static_cast rather than a C-style cast here.

Done.

>> Source/WebCore/platform/track/WebVTTParser.cpp:122
>> +    } while ((int)position < length);
> 
> C-style cast.

Done.

>> Source/WebCore/platform/track/WebVTTParser.cpp:142
>> +    m_currentStartTime = collectTimeStamp(start);
> 
> It looks like you always put a word into a local string that is only used to pass to collectTimeStamp. Is there any reason to not have collectTimeStamp take line and position like your other parsing functions?

Done.

>> Source/WebCore/platform/track/WebVTTParser.cpp:175
>> +}
> 
> I think the current factoring is confusing. Some of the logic for what makes a good cue is here and some is back in parseBytes. You sometimes call processCueText here, and sometimes call it in parseBytes. It might be clearer if you combined processCueText and collectCueText so all of the logic is in one place. Could this work?

I went ahead and moved all of the logic into collectCueText, but left processCueText as a separate method.  Later when we actually process the cue text properly (patch coming soon!), that function will be much larger.

>> Source/WebCore/platform/track/WebVTTParser.cpp:279
>> +}
> 
> Is there a reason that "length" needs to be signed? If so, the casts should no be C-style.

Done.

>> Source/WebCore/platform/track/WebVTTParser.cpp:289
>> +}
> 
> Ditto.

Done.

>> Source/WebCore/platform/track/WebVTTParser.h:43
>> +static const int secondsInMinute = 60;
> 
> Minor nit: "Per" might be better than "In", eg. "secondsPerHour" and "secondsPerMinute"

Done.
Comment 25 Anna Cavender 2011-08-02 22:09:52 PDT
Created attachment 102739 [details]
addressing eric's comments
Comment 26 Eric Carlson 2011-08-05 14:01:09 PDT
Comment on attachment 102739 [details]
addressing eric's comments

View in context: https://bugs.webkit.org/attachment.cgi?id=102739&action=review

This is *really* close, and I would r+ it despite the nit-picky comments, but the long-int issue really is a problem so I am marking this r-.



> Source/WebCore/html/TextTrackCue.cpp:157
> +        // 5-7 - If the character at position is not ':', set setting to empty string.
> +        if (input[position++] != ':')
> +            setting = '\0';

Nit: I find a blank line before a comment can make it easier to read. Please consider adding some white space here and elsewhere the next time you edit this file.

> Source/WebCore/html/TextTrackCue.cpp:180
> +            String linePosition = linePositionBuilder.toString();
> +            if (position < input.length() && !WebVTTParser::isASpace(input[position]))
> +                goto Otherwise;

Nit: You should switch the order of these because "linePosition" is not used if the test fails.

> Source/WebCore/html/TextTrackCue.cpp:181
> +            // 3-5 - If there is not at least one digit character, a '-' occurs anywhere other than the front, or a '%' occurs anywhere other than the end, then ignore this setting and keep looking.

Nit: this line is too long to read without scrolling in an editor that isn't set to wrap lines. Please break the line into smaller chunks the next time you edit this file.

> Source/WebCore/html/TextTrackCue.cpp:189
> +            long number = linePosition.toInt(&validNumber);

String::toInt return an "int", not a "long". long and int not the some in some compiler/processor configurations.

> Source/WebCore/html/TextTrackCue.cpp:191
> +            if (!validNumber)
> +                break;

I assume "validNumber" won't be false if the string has a trailing '%'?

> Source/WebCore/html/TextTrackCue.cpp:205
> +            // 1-2 - Collect characterss that are digits.

Nit: typo - characterss

> Source/WebCore/html/TextTrackCue.cpp:214
> +            // 4-6 - Ensure no other chars in this setting and setting is not empty.
> +            if (position < input.length() && !WebVTTParser::isASpace(input[position]))
> +                goto Otherwise;

Won't this allow characters at the end of a line after a space?

> Source/WebCore/html/TextTrackCue.cpp:219
> +            long number = textPosition.toInt(&validNumber);

long != int

> Source/WebCore/html/TextTrackCue.cpp:239
> +            // 4-6 - Ensure no other chars in this setting and setting is not empty.
> +            if (position < input.length() && !WebVTTParser::isASpace(input[position]))
> +                goto Otherwise;

Ditto.

> Source/WebCore/html/TextTrackCue.cpp:244
> +            long number = cueSize.toInt(&validNumber);

long != int

> Source/WebCore/platform/track/CueParser.cpp:80
>  void CueParser::didReceiveData(const char* data, int len)

Nit: no need to abbreviate "len".

> Source/WebCore/platform/track/WebVTTParser.cpp:55
> +    if (line.length() > 6 && (line.substring(0, 6) != "WEBVTT" || (line[6] != ' ' && line[6] != '\t')))

It would be good to have some parentheses here to make the precedence of the comparisons explicit.

> Source/WebCore/platform/track/WebVTTParser.cpp:251
> +        if (*position >= line.length() || line[*position] != ':')
> +            return malformedTime;
> +        (*position)++;

Just above, you increment "position" in the test for ':' 

  if (*position >= line.length() || line[(*position)++] != ':')
      return malformedTime;

But here you don't increment it until after the check. Is it a problem that you increment past the invalid character above before returning? If not, why not do the same thing here?
Comment 27 Anna Cavender 2011-08-05 16:13:32 PDT
Comment on attachment 102739 [details]
addressing eric's comments

Thanks Eric!  Another patch is on the way.

View in context: https://bugs.webkit.org/attachment.cgi?id=102739&action=review

>> Source/WebCore/html/TextTrackCue.cpp:157
>> +            setting = '\0';
> 
> Nit: I find a blank line before a comment can make it easier to read. Please consider adding some white space here and elsewhere the next time you edit this file.

Done.  Sorry about that.

>> Source/WebCore/html/TextTrackCue.cpp:180
>> +                goto Otherwise;
> 
> Nit: You should switch the order of these because "linePosition" is not used if the test fails.

Done.

>> Source/WebCore/html/TextTrackCue.cpp:181
>> +            // 3-5 - If there is not at least one digit character, a '-' occurs anywhere other than the front, or a '%' occurs anywhere other than the end, then ignore this setting and keep looking.
> 
> Nit: this line is too long to read without scrolling in an editor that isn't set to wrap lines. Please break the line into smaller chunks the next time you edit this file.

Done.

>> Source/WebCore/html/TextTrackCue.cpp:189
>> +            long number = linePosition.toInt(&validNumber);
> 
> String::toInt return an "int", not a "long". long and int not the some in some compiler/processor configurations.

Done.  Changed member variables to be ints rather than longs.

>> Source/WebCore/html/TextTrackCue.cpp:191
>> +                break;
> 
> I assume "validNumber" won't be false if the string has a trailing '%'?

True.  Added a comment for clarity.

>> Source/WebCore/html/TextTrackCue.cpp:205
>> +            // 1-2 - Collect characterss that are digits.
> 
> Nit: typo - characterss

Done.

>> Source/WebCore/html/TextTrackCue.cpp:214
>> +                goto Otherwise;
> 
> Won't this allow characters at the end of a line after a space?

No, this ensures that there is no garbage directly after the % (i.e. a space is required after each setting).  If something other than a space is found, Otherwise will eat it up and ignore the setting.

>> Source/WebCore/html/TextTrackCue.cpp:219
>> +            long number = textPosition.toInt(&validNumber);
> 
> long != int

Done.

>> Source/WebCore/html/TextTrackCue.cpp:239
>> +                goto Otherwise;
> 
> Ditto.

Same thing here.  No garbage allowed after the setting (a space is required).

>> Source/WebCore/html/TextTrackCue.cpp:244
>> +            long number = cueSize.toInt(&validNumber);
> 
> long != int

Done.

>> Source/WebCore/platform/track/CueParser.cpp:80
>>  void CueParser::didReceiveData(const char* data, int len)
> 
> Nit: no need to abbreviate "len".

Done.

>> Source/WebCore/platform/track/WebVTTParser.cpp:55
>> +    if (line.length() > 6 && (line.substring(0, 6) != "WEBVTT" || (line[6] != ' ' && line[6] != '\t')))
> 
> It would be good to have some parentheses here to make the precedence of the comparisons explicit.

I already have parentheses for each binary operation and wasn't sure where to add more, so instead I added the spec language in a comment and separated into two lines.  Hope that's more clear!

>> Source/WebCore/platform/track/WebVTTParser.cpp:251
>> +        (*position)++;
> 
> Just above, you increment "position" in the test for ':' 
> 
>   if (*position >= line.length() || line[(*position)++] != ':')
>       return malformedTime;
> 
> But here you don't increment it until after the check. Is it a problem that you increment past the invalid character above before returning? If not, why not do the same thing here?

Done.  Good call.
Comment 28 Anna Cavender 2011-08-05 16:14:24 PDT
Created attachment 103129 [details]
some nits and int to long casting removed
Comment 29 Anna Cavender 2011-08-05 17:39:27 PDT
Created attachment 103135 [details]
fixing some typos
Comment 30 Eric Carlson 2011-08-05 17:47:00 PDT
Comment on attachment 103135 [details]
fixing some typos

Thanks!
Comment 31 Anna Cavender 2011-08-05 18:31:39 PDT
Created attachment 103136 [details]
updating to ToT
Comment 32 WebKit Review Bot 2011-08-05 19:01:06 PDT
Comment on attachment 103136 [details]
updating to ToT

Clearing flags on attachment: 103136

Committed r92538: <http://trac.webkit.org/changeset/92538>
Comment 33 WebKit Review Bot 2011-08-05 19:01:13 PDT
All reviewed patches have been landed.  Closing bug.
Comment 34 Mark Rowe (bdash) 2011-08-05 19:54:44 PDT
This change broke the Mac build.  All of the track-related files appear to have been added to the incorrect place in the Xcode project so it’s not even obvious how to fix it.
Comment 35 Ryosuke Niwa 2011-08-05 20:07:37 PDT
This patch broke Mac builds.
Comment 36 Mark Rowe (bdash) 2011-08-05 20:09:46 PDT
Whomever added all of these track-related files to the Xcode project made a complete mess of it. Some were added inside the “svg" group, others inside the “testing” group.  Neither of those correspond to where the code lives.

I’ve attempted to fix the build and clean up this mess in r92542.  I’d very much appreciate it if slightly more care could be taken in the future.
Comment 37 Anna Cavender 2011-08-05 21:12:58 PDT
Thanks so much mrowe!  And sorry for the inconvenience, I'm not sure how they got so botched.  I appreciate your help.