Bug 81123 - cue.text fails for some track element cues
Summary: cue.text fails for some track element cues
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Media (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Brendan Long
URL:
Keywords: InRadar
Depends on:
Blocks: 43668
  Show dependency treegraph
 
Reported: 2012-03-14 09:20 PDT by Sam Dutton
Modified: 2013-10-22 11:44 PDT (History)
16 users (show)

See Also:


Attachments
Proposed patch (24.06 KB, patch)
2012-04-06 18:15 PDT, Eric Carlson
webkit.review.bot: commit-queue-
Details | Formatted Diff | Diff
Archive of layout-test-results from ec2-cr-linux-04 (6.26 MB, application/zip)
2012-04-09 07:06 PDT, WebKit Review Bot
no flags Details
Proposed patch (27.48 KB, patch)
2012-04-13 15:22 PDT, Eric Carlson
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ec2-cr-linux-03 (6.40 MB, application/zip)
2012-04-14 04:45 PDT, WebKit Review Bot
no flags Details
Rebase patch (26.91 KB, patch)
2013-09-26 15:02 PDT, Brendan Long
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from webkit-ews-16 for mac-mountainlion-wk2 (502.04 KB, application/zip)
2013-09-26 15:54 PDT, Build Bot
no flags Details
Archive of layout-test-results from webkit-ews-03 for mac-mountainlion (778.57 KB, application/zip)
2013-09-26 16:18 PDT, Build Bot
no flags Details
captions-long.vtt (9.37 KB, application/octet-stream)
2013-09-26 17:01 PDT, Eric Carlson
no flags Details
Archive of layout-test-results from webkit-ews-07 for mac-mountainlion (527.59 KB, application/zip)
2013-09-26 17:24 PDT, Build Bot
no flags Details
Buffer data in WebVTTParser (23.20 KB, patch)
2013-10-07 17:47 PDT, Brendan Long
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from webkit-ews-16 for mac-mountainlion-wk2 (991.22 KB, application/zip)
2013-10-07 19:51 PDT, Build Bot
no flags Details
Archive of layout-test-results from webkit-ews-01 for mac-mountainlion (689.50 KB, application/zip)
2013-10-07 20:13 PDT, Build Bot
no flags Details
Archive of layout-test-results from webkit-ews-09 for mac-mountainlion-wk2 (1.09 MB, application/zip)
2013-10-07 20:30 PDT, Build Bot
no flags Details
Archive of layout-test-results from webkit-ews-08 for mac-mountainlion (648.62 KB, application/zip)
2013-10-07 20:33 PDT, Build Bot
no flags Details
Fix buffer logic (22.99 KB, patch)
2013-10-08 13:11 PDT, Brendan Long
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from webkit-ews-13 for mac-mountainlion-wk2 (672.42 KB, application/zip)
2013-10-08 13:51 PDT, Build Bot
no flags Details
Archive of layout-test-results from webkit-ews-07 for mac-mountainlion (674.93 KB, application/zip)
2013-10-08 14:00 PDT, Build Bot
no flags Details
Fix BOM and add NULL check for m_parser (23.60 KB, patch)
2013-10-08 15:01 PDT, Brendan Long
no flags Details | Formatted Diff | Diff
Use m_state to simplify BOM and Finished handling. (23.75 KB, patch)
2013-10-08 19:39 PDT, Brendan Long
no flags Details | Formatted Diff | Diff
Only use buffer when needed (23.60 KB, patch)
2013-10-10 01:03 PDT, Brendan Long
no flags Details | Formatted Diff | Diff
Simplify hasRequiredFileIdentifier (24.56 KB, patch)
2013-10-10 09:18 PDT, Brendan Long
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from webkit-ews-13 for mac-mountainlion-wk2 (551.29 KB, application/zip)
2013-10-10 10:17 PDT, Build Bot
no flags Details
Archive of layout-test-results from webkit-ews-04 for mac-mountainlion (567.90 KB, application/zip)
2013-10-10 10:39 PDT, Build Bot
no flags Details
Fix end-of-line handling in hasRequiredFileIdentifier (24.63 KB, patch)
2013-10-10 11:33 PDT, Brendan Long
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Sam Dutton 2012-03-14 09:20:04 PDT
cue.text appears to lead to 'Unexpected end of input' error.

Go to  http://samdutton.net/track/bug/cueText.html

View console output.

Does not appear to be any problem with .vtt file -- have tried resaving, etc. -- sometimes fails on different cues. 

Have also tried adding quotes to all values.

19.0.1065.0 canary
Comment 1 Eric Carlson 2012-03-15 06:51:49 PDT
(In reply to comment #0)
> cue.text appears to lead to 'Unexpected end of input' error.
> 
> Go to  http://samdutton.net/track/bug/cueText.html
> 
I greatly appreciate your bug reports, but it would be extremely helpful if your test cases also included a .mp4 video file. I investigate all text track bugs and have fixed most of the ones filed in the last six months. Because I use Safari for development, before I can examine one of your test cases I have to download all of the resources it uses, export an MPEG-4 version of your video, and re-host the example.

Don't get me wrong, I would *much* rather have your WebM-only test cases than nothing at all! I only mean that it will save me a significant amount of time if you are able to create test cases with both WebM and MPEG-4 videos.
Comment 2 Sam Dutton 2012-03-15 08:14:24 PDT
> I greatly appreciate your bug reports, but it would be extremely helpful if your test cases also included a .mp4 video file.

Great idea -- I've added this to all the test cases in the http://samdutton.net/track/bug directory.
Comment 3 Eric Carlson 2012-03-15 08:19:39 PDT
(In reply to comment #2)
> > I greatly appreciate your bug reports, but it would be extremely helpful if your test cases also included a .mp4 video file.
> 
> Great idea -- I've added this to all the test cases in the http://samdutton.net/track/bug directory.

Thank you very much!
Comment 4 Radar WebKit Bug Importer 2012-03-15 08:29:22 PDT
<rdar://problem/11054940>
Comment 5 Eric Carlson 2012-04-06 18:15:35 PDT
Created attachment 136115 [details]
Proposed patch
Comment 6 WebKit Review Bot 2012-04-09 07:06:16 PDT
Comment on attachment 136115 [details]
Proposed patch

Attachment 136115 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/12372184

New failing tests:
media/track/track-long-captions-file.html
media/track/track-webvtt-tc028-unsupported-markup.html
media/track/track-cues-pause-on-exit.html
media/track/track-cues-sorted-before-dispatch.html
Comment 7 WebKit Review Bot 2012-04-09 07:06:21 PDT
Created attachment 136218 [details]
Archive of layout-test-results from ec2-cr-linux-04

The attached test failures were seen while running run-webkit-tests on the chromium-ews.
Bot: ec2-cr-linux-04  Port: <class 'webkitpy.common.config.ports.ChromiumXVFBPort'>  Platform: Linux-2.6.35-28-virtual-x86_64-with-Ubuntu-10.10-maverick
Comment 8 Eric Carlson 2012-04-13 15:21:13 PDT
(In reply to comment #7)
> Created an attachment (id=136218) [details]
> Archive of layout-test-results from ec2-cr-linux-04
> 
> The attached test failures were seen while running run-webkit-tests on the chromium-ews.
> Bot: ec2-cr-linux-04  Port: <class 'webkitpy.common.config.ports.ChromiumXVFBPort'>  Platform: Linux-2.6.35-28-virtual-x86_64-with-Ubuntu-10.10-maverick

The failing media tests are: 
    track-webvtt-tc028-unsupported-markup.html - will fix
    track-cues-pause-on-exit.html - this is https://bugs.webkit.org/show_bug.cgi?id=80067
    track-cues-sorted-before-dispatch.html - this is https://bugs.webkit.org/show_bug.cgi?id=77862
Comment 9 Eric Carlson 2012-04-13 15:22:15 PDT
Created attachment 137163 [details]
Proposed patch
Comment 10 WebKit Review Bot 2012-04-14 04:45:46 PDT
Comment on attachment 137163 [details]
Proposed patch

Attachment 137163 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/12399746

New failing tests:
media/track/track-long-captions-file.html
media/track/track-cues-pause-on-exit.html
media/track/track-cues-sorted-before-dispatch.html
Comment 11 WebKit Review Bot 2012-04-14 04:45:54 PDT
Created attachment 137205 [details]
Archive of layout-test-results from ec2-cr-linux-03

The attached test failures were seen while running run-webkit-tests on the chromium-ews.
Bot: ec2-cr-linux-03  Port: <class 'webkitpy.common.config.ports.ChromiumXVFBPort'>  Platform: Linux-2.6.35-28-virtual-x86_64-with-Ubuntu-10.10-maverick
Comment 12 Anna Cavender 2012-08-15 10:51:05 PDT
What ever happened to this bug and the associated patch?  Eric, I have a hunch that this may fix the timeouts we are seeing on track-cues-sorted-before-dispatch.html.  WDYT?

http://test-results.appspot.com/dashboards/flakiness_dashboard.html#showExpectations=true&tests=track-cues-sorted-before-dispatch
Comment 13 Darin Adler 2013-05-14 10:33:56 PDT
Comment on attachment 137163 [details]
Proposed patch

This old patch looks fine. Iā€™m setting review+ but not sure where we stand on it.
Comment 14 Brendan Long 2013-09-26 15:00:47 PDT
The patch refers to captions-webvtt/captions-long.vtt, but it doesn't seem to exist.
Comment 15 Brendan Long 2013-09-26 15:02:10 PDT
Created attachment 212756 [details]
Rebase patch

This isn't done yet, since captions-long.vtt doesn't seem to exist..
Comment 16 Build Bot 2013-09-26 15:54:26 PDT
Comment on attachment 212756 [details]
Rebase patch

Attachment 212756 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.appspot.com/results/2617023

New failing tests:
media/track/track-cue-rendering-rtl.html
media/track/track-webvtt-tc022-entities.html
media/track/track-webvtt-tc023-markup.html
media/track/track-long-captions-file.html
media/track/track-webvtt-tc028-unsupported-markup.html
media/track/track-webvtt-tc024-timestamp.html
Comment 17 Build Bot 2013-09-26 15:54:28 PDT
Created attachment 212763 [details]
Archive of layout-test-results from webkit-ews-16 for mac-mountainlion-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: webkit-ews-16  Port: mac-mountainlion-wk2  Platform: Mac OS X 10.8.5
Comment 18 Build Bot 2013-09-26 16:18:23 PDT
Comment on attachment 212756 [details]
Rebase patch

Attachment 212756 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.appspot.com/results/2629002

New failing tests:
media/track/track-cue-rendering-rtl.html
media/track/track-webvtt-tc022-entities.html
media/track/track-webvtt-tc023-markup.html
media/track/track-long-captions-file.html
media/track/track-webvtt-tc028-unsupported-markup.html
media/track/track-webvtt-tc024-timestamp.html
Comment 19 Build Bot 2013-09-26 16:18:26 PDT
Created attachment 212766 [details]
Archive of layout-test-results from webkit-ews-03 for mac-mountainlion

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: webkit-ews-03  Port: mac-mountainlion  Platform: Mac OS X 10.8.5
Comment 20 Eric Carlson 2013-09-26 17:01:58 PDT
Created attachment 212769 [details]
captions-long.vtt

VTT file with 100 captions
Comment 21 Build Bot 2013-09-26 17:24:51 PDT
Comment on attachment 212756 [details]
Rebase patch

Attachment 212756 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.appspot.com/results/2635024

New failing tests:
media/track/track-cue-rendering-rtl.html
media/track/track-webvtt-tc022-entities.html
media/track/track-webvtt-tc023-markup.html
media/track/track-long-captions-file.html
media/track/track-webvtt-tc028-unsupported-markup.html
media/track/track-webvtt-tc024-timestamp.html
Comment 22 Build Bot 2013-09-26 17:24:54 PDT
Created attachment 212772 [details]
Archive of layout-test-results from webkit-ews-07 for mac-mountainlion

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: webkit-ews-07  Port: mac-mountainlion  Platform: Mac OS X 10.8.5
Comment 23 Brendan Long 2013-09-27 13:32:56 PDT
The way this changes tests is suspicious. Shouldn't a line break either be preserved, or be converted to a space, not just removed?
Comment 24 Brendan Long 2013-10-07 10:50:56 PDT
According to the spec[1]:

> Line breaks in cues are honored. User agents will also insert extra line breaks if necessary to fit the cue in the cue's width. In general, therefore, authors are encouraged to write cues all on one line except when a line break is definitely necessary, and to not manually line-wrap for aesthetic reasons alone.

So, the test change where \n is removed from the expected cues seems to be wrong..

[1] http://dev.w3.org/html5/webvtt/#cues-with-multiple-lines
Comment 25 Brendan Long 2013-10-07 17:47:09 PDT
Created attachment 213632 [details]
Buffer data in WebVTTParser

The running theme in a lot of bug reports (see also #95691 and possibly #97097) is that we cut off parsing in the wrong place. This patch adds a buffer. For some reason the test gets a cross-origin error on WebKitGTK though, and I'm not entirely sure if just removing deprecatedDidReceiveCachedResource is correct (although it seems to work, and the FIXME comment suggests that that's what we should do).
Comment 26 Build Bot 2013-10-07 19:51:46 PDT
Comment on attachment 213632 [details]
Buffer data in WebVTTParser

Attachment 213632 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.appspot.com/results/3422067

New failing tests:
media/track/track-cue-rendering-rtl.html
media/track/track-mode.html
media/track/track-cues-pause-on-exit.html
media/track/track-cue-mutable.html
media/track/track-remove-by-setting-innerHTML.html
media/track/track-cue-rendering-with-padding.html
media/track/track-cue-rendering.html
media/track/track-long-captions-file.html
media/track/track-cue-nothing-to-render.html
media/track/track-webvtt-tc003-newlines.html
media/track/track-large-timestamp.html
media/track/track-webvtt-tc002-bom.html
media/track/track-active-cues.html
media/track/track-cue-container-rendering-position.html
Comment 27 Build Bot 2013-10-07 19:51:49 PDT
Created attachment 213641 [details]
Archive of layout-test-results from webkit-ews-16 for mac-mountainlion-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: webkit-ews-16  Port: mac-mountainlion-wk2  Platform: Mac OS X 10.8.5
Comment 28 Build Bot 2013-10-07 20:13:34 PDT
Comment on attachment 213632 [details]
Buffer data in WebVTTParser

Attachment 213632 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.appspot.com/results/3710079

New failing tests:
media/track/track-cue-rendering-rtl.html
media/track/track-mode.html
media/track/track-cues-pause-on-exit.html
media/track/track-cue-mutable.html
media/track/track-remove-by-setting-innerHTML.html
media/track/track-cue-rendering-with-padding.html
media/track/track-cue-rendering.html
media/track/track-long-captions-file.html
media/track/track-cue-nothing-to-render.html
media/track/track-webvtt-tc003-newlines.html
media/track/track-large-timestamp.html
media/track/track-webvtt-tc002-bom.html
media/track/track-active-cues.html
media/track/track-cue-container-rendering-position.html
Comment 29 Build Bot 2013-10-07 20:13:38 PDT
Created attachment 213643 [details]
Archive of layout-test-results from webkit-ews-01 for mac-mountainlion

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: webkit-ews-01  Port: mac-mountainlion  Platform: Mac OS X 10.8.5
Comment 30 Build Bot 2013-10-07 20:30:25 PDT
Comment on attachment 213632 [details]
Buffer data in WebVTTParser

Attachment 213632 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.appspot.com/results/3342063

New failing tests:
media/track/track-cue-rendering-rtl.html
media/track/track-mode.html
media/track/track-cues-pause-on-exit.html
media/track/track-cue-mutable.html
media/track/track-remove-by-setting-innerHTML.html
media/track/track-cue-rendering-with-padding.html
media/track/track-cue-rendering.html
media/track/track-long-captions-file.html
media/track/track-cue-nothing-to-render.html
media/track/track-webvtt-tc003-newlines.html
media/track/track-large-timestamp.html
media/track/track-webvtt-tc002-bom.html
media/track/track-active-cues.html
media/track/track-cue-container-rendering-position.html
Comment 31 Build Bot 2013-10-07 20:30:28 PDT
Created attachment 213645 [details]
Archive of layout-test-results from webkit-ews-09 for mac-mountainlion-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: webkit-ews-09  Port: mac-mountainlion-wk2  Platform: Mac OS X 10.8.5
Comment 32 Build Bot 2013-10-07 20:33:46 PDT
Comment on attachment 213632 [details]
Buffer data in WebVTTParser

Attachment 213632 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.appspot.com/results/3746012

New failing tests:
media/track/track-cue-rendering-rtl.html
media/track/track-mode.html
media/track/track-cues-pause-on-exit.html
media/track/track-cue-mutable.html
media/track/track-remove-by-setting-innerHTML.html
media/track/track-cue-rendering-with-padding.html
media/track/track-cue-rendering.html
media/track/track-long-captions-file.html
media/track/track-cue-nothing-to-render.html
media/track/track-webvtt-tc003-newlines.html
media/track/track-large-timestamp.html
media/track/track-webvtt-tc002-bom.html
media/track/track-active-cues.html
media/track/track-cue-container-rendering-position.html
Comment 33 Build Bot 2013-10-07 20:33:49 PDT
Created attachment 213646 [details]
Archive of layout-test-results from webkit-ews-08 for mac-mountainlion

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: webkit-ews-08  Port: mac-mountainlion  Platform: Mac OS X 10.8.5
Comment 34 Brendan Long 2013-10-08 13:11:05 PDT
Created attachment 213710 [details]
Fix buffer logic

My original version of this patch had a lot of little problems. deprecatedDidReceiveCachedResource() is definitely needed (although maybe it can be replaced with dataReceived()?). With that in place, m_cueParser->fileFinished() needs to go in notifyFinished() (so it doesn't get called every time we get data). I also added m_finished with an ASSERT so this doesn't bite anyone else in the future. I also messed up the handling of line breaks: According to the spec, any of \r, \n, and \r\n are all allowed. The current patch should fix all of these things, and the test passes on WebKitGTK (I'm building a different patch on my Mac, but I'll go to that if this fails on EWS again).
Comment 35 Build Bot 2013-10-08 13:51:28 PDT
Comment on attachment 213710 [details]
Fix buffer logic

Attachment 213710 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.appspot.com/results/3751126

New failing tests:
media/track/track-cues-pause-on-exit.html
media/track/track-webvtt-tc000-empty.html
media/track/track-mode.html
media/track/track-cue-container-rendering-position.html
media/track/track-cue-mutable.html
media/track/track-cue-rendering-rtl.html
media/track/track-cue-rendering-with-padding.html
media/track/track-load-error-readyState.html
media/track/track-long-captions-file.html
media/track/track-cue-nothing-to-render.html
media/track/track-large-timestamp.html
media/track/track-webvtt-tc002-bom.html
media/track/track-addtrack-kind.html
media/track/track-active-cues.html
Comment 36 Build Bot 2013-10-08 13:51:31 PDT
Created attachment 213716 [details]
Archive of layout-test-results from webkit-ews-13 for mac-mountainlion-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: webkit-ews-13  Port: mac-mountainlion-wk2  Platform: Mac OS X 10.8.5
Comment 37 Build Bot 2013-10-08 14:00:34 PDT
Comment on attachment 213710 [details]
Fix buffer logic

Attachment 213710 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.appspot.com/results/3760061

New failing tests:
media/track/track-cues-pause-on-exit.html
media/track/track-webvtt-tc000-empty.html
media/track/track-mode.html
media/track/track-cue-container-rendering-position.html
media/track/track-cue-mutable.html
media/track/track-cue-rendering-rtl.html
media/track/track-cue-rendering-with-padding.html
media/track/track-load-error-readyState.html
media/track/track-long-captions-file.html
media/track/track-cue-nothing-to-render.html
media/track/track-large-timestamp.html
media/track/track-webvtt-tc002-bom.html
media/track/track-active-cues.html
Comment 38 Build Bot 2013-10-08 14:00:38 PDT
Created attachment 213718 [details]
Archive of layout-test-results from webkit-ews-07 for mac-mountainlion

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: webkit-ews-07  Port: mac-mountainlion  Platform: Mac OS X 10.8.5
Comment 39 Brendan Long 2013-10-08 15:01:27 PDT
Created attachment 213723 [details]
Fix BOM and add NULL check for m_parser

There were two remaining problems: Because this patch converts to a String up-front, the BOM gets converted to a UTF-16 BOM (0xFEFF instead of 0xEFBBBF). This solution handles that comparing against String::fromUTF8(xEFxBBFF). I could do a more complicated version if you think it's important to explicitly check the original data for 0xEFBBFF. The other problem was that I didn't check if m_parser existed before using it in TrackLoader::notifyFinished().
Comment 40 Glenn Adams 2013-10-08 15:10:18 PDT
(In reply to comment #39)
> Created an attachment (id=213723) [details]
> Fix BOM and add NULL check for m_parser
> 
> There were two remaining problems: Because this patch converts to a String up-front, the BOM gets converted to a UTF-16 BOM (0xFEFF instead of 0xEFBBBF). This solution handles that comparing against String::fromUTF8(xEFxBBFF). I could do a more complicated version if you think it's important to explicitly check the original data for 0xEFBBFF. The other problem was that I didn't check if m_parser existed before using it in TrackLoader::notifyFinished().

What do you mean by "check the original data for 0xEFBBBF"? Do you mean check UTF-8 encoded string for the presence of the three contiguous bytes 0xEF 0xBB 0xBF
Comment 41 Build Bot 2013-10-08 15:28:15 PDT
Comment on attachment 213723 [details]
Fix BOM and add NULL check for m_parser

Attachment 213723 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.appspot.com/results/3759070
Comment 42 Brendan Long 2013-10-08 15:29:10 PDT
(In reply to comment #40)
> What do you mean by "check the original data for 0xEFBBBF"? Do you mean check UTF-8 encoded string for the presence of the three contiguous bytes 0xEF 0xBB 0xBF

I mean that technically we can't check `line` for the UTF-8 BOM because it's a UTF-16 String, and UTF-16 and UTF-8 have different BOMs. The way this was handled previously was by using a Vector<char> instead of `line`. That seems wasteful now that we buffer everything, but I could take that approach if reviewers don't like this.

I'm pretty sure it makes no difference, but I figured I'd mention it since it's "surprising".
Comment 43 Glenn Adams 2013-10-08 15:35:32 PDT
(In reply to comment #42)
> (In reply to comment #40)
> > What do you mean by "check the original data for 0xEFBBBF"? Do you mean check UTF-8 encoded string for the presence of the three contiguous bytes 0xEF 0xBB 0xBF
> 
> I mean that technically we can't check `line` for the UTF-8 BOM because it's a UTF-16 String, and UTF-16 and UTF-8 have different BOMs. The way this was handled previously was by using a Vector<char> instead of `line`. That seems wasteful now that we buffer everything, but I could take that approach if reviewers don't like this.
> 
> I'm pretty sure it makes no difference, but I figured I'd mention it since it's "surprising".

If the originally encoded UTF-8 content has been converted to 16-bit UTF-16, then it is pointless checking for the UTF-8 encoding of BOM since it will have been converted to 0xFEFF. After conversion, the BOM could be discarded or ignored because it no longer has a role to play.
Comment 44 Build Bot 2013-10-08 15:44:13 PDT
Comment on attachment 213723 [details]
Fix BOM and add NULL check for m_parser

Attachment 213723 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.appspot.com/results/3422306
Comment 45 Brendan Long 2013-10-08 15:45:59 PDT
(In reply to comment #43)
> If the originally encoded UTF-8 content has been converted to 16-bit UTF-16, then it is pointless checking for the UTF-8 encoding of BOM since it will have been converted to 0xFEFF.

Yes, exactly.

> After conversion, the BOM could be discarded or ignored because it no longer has a role to play.

We're required to check for the BOM by the WebVTT spec. What I'm saying is that my code technically converts the data to UTF-16 and checks for a UTF-16 BOM instead of directly checking for a UTF-8 BOM. I don't think this changes anything, but it's worth pointing out.
Comment 46 Glenn Adams 2013-10-08 18:24:45 PDT
(In reply to comment #45)
> (In reply to comment #43)
> > If the originally encoded UTF-8 content has been converted to 16-bit UTF-16, then it is pointless checking for the UTF-8 encoding of BOM since it will have been converted to 0xFEFF.
> 
> Yes, exactly.
> 
> > After conversion, the BOM could be discarded or ignored because it no longer has a role to play.
> 
> We're required to check for the BOM by the WebVTT spec. What I'm saying is that my code technically converts the data to UTF-16 and checks for a UTF-16 BOM instead of directly checking for a UTF-8 BOM. I don't think this changes anything, but it's worth pointing out.

First, there is no specific requirement for use of the BOM. It is simply an optional three byte prefix prior to the string "WEBVTT", and it (BOM) only applies in the UTF-8 encoded form, and further, it is optional.

So, if you've already transcoded from UTF-8 to UTF-16, then if there was a BOM in the UTF-8 source stream, it has been either discarded or transcoded to 0xFEFF. At that point, there is nothing else to do with it. It could have been dropped when transcoding into UTF-16 without changing any behavior.
Comment 47 Glenn Adams 2013-10-08 18:33:36 PDT
(In reply to comment #46)
> (In reply to comment #45)
> > (In reply to comment #43)
> > > If the originally encoded UTF-8 content has been converted to 16-bit UTF-16, then it is pointless checking for the UTF-8 encoding of BOM since it will have been converted to 0xFEFF.
> > 
> > Yes, exactly.
> > 
> > > After conversion, the BOM could be discarded or ignored because it no longer has a role to play.
> > 
> > We're required to check for the BOM by the WebVTT spec. What I'm saying is that my code technically converts the data to UTF-16 and checks for a UTF-16 BOM instead of directly checking for a UTF-8 BOM. I don't think this changes anything, but it's worth pointing out.
> 
> First, there is no specific requirement for use of the BOM. It is simply an optional three byte prefix prior to the string "WEBVTT", and it (BOM) only applies in the UTF-8 encoded form, and further, it is optional.
> 
> So, if you've already transcoded from UTF-8 to UTF-16, then if there was a BOM in the UTF-8 source stream, it has been either discarded or transcoded to 0xFEFF. At that point, there is nothing else to do with it. It could have been dropped when transcoding into UTF-16 without changing any behavior.

Perhaps I should add by quoting Unicode Section 16.8 [1]:

"Systems that use the byte order mark must recognize when an initial U+FEFF signals the byte order. In those cases, it is not part of the textual content and should be removed before processing, because otherwise it may be mistaken for a legitimate zero width no-break space."

So you can discard it if you want.

The only function of the UTF-8 BOM in WebVTT is for signature detection, and not determining byte order.

[1] http://www.unicode.org/versions/Unicode6.2.0/ch16.pdf
Comment 48 Brendan Long 2013-10-08 19:00:32 PDT
Glenn, I wasn't saying that it's a problem (I wouldn't have included it in my patch if I thought it was), I just thought it was something reviewers should be aware of. I think I have an idea of how to handle this better anyway though, so hopefully it won't actually matter.
Comment 49 Glenn Adams 2013-10-08 19:14:56 PDT
(In reply to comment #48)
> Glenn, I wasn't saying that it's a problem (I wouldn't have included it in my patch if I thought it was), I just thought it was something reviewers should be aware of. I think I have an idea of how to handle this better anyway though, so hopefully it won't actually matter.

Sure. Just trying to make sure we're on the same page.
Comment 50 Brendan Long 2013-10-08 19:39:24 PDT
Created attachment 213743 [details]
Use m_state to simplify BOM and Finished handling.

I spent some time today and got DumpRenderTree working on my Mac again (apparently it's just broken in debug builds?), so this time I can verify on my own machine that the tests pass. I removed m_finished and just added a Finished state. I also added a CheckedBOM state, and moved the BOM check to near the top of parseBytes, which lets me check for the BOM while I'm still dealing with the raw data (this is simpler, and much less confusing).
Comment 51 Brendan Long 2013-10-10 01:03:19 PDT
Created attachment 213857 [details]
Only use buffer when needed

Ok, here's one more improvement. It bothered me that my patch was buffering everything, even if we didn't need it. This version only copies data to the buffer if we run out of data without finding a line ending. Because of this, collectNextLine() doesn't need to look at the contents of the buffer, since we already know it doesn't contain any line endings. More importantly, since it always contains one partial line, instead of removing from the front of the vector, we can just clear it, which is significantly more efficient.
Comment 52 Brendan Long 2013-10-10 01:05:58 PDT
I'm also planning to replace the `unsigned* position` parameters with `unsigned& position`, but I'd rather do that separately.
Comment 53 Brendan Long 2013-10-10 09:18:06 PDT
Created attachment 213890 [details]
Simplify hasRequiredFileIdentifier

The last thing that really bothered me in this patch was the substring creation in hasRequiredFileIdentifier. The new version just walks through the string, which avoids the copies and ends up simpler anyway. I also made the test use canplaythrough since the addtrack event was making it flaky. I think I'm actually done this time.
Comment 54 Build Bot 2013-10-10 10:17:52 PDT
Comment on attachment 213890 [details]
Simplify hasRequiredFileIdentifier

Attachment 213890 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.appspot.com/results/3879050

New failing tests:
media/track/track-cue-rendering-rtl.html
media/track/track-mode.html
media/track/track-cues-pause-on-exit.html
media/track/track-cue-mutable.html
media/track/track-remove-by-setting-innerHTML.html
media/track/track-cue-rendering-with-padding.html
media/track/track-cue-nothing-to-render.html
media/track/track-large-timestamp.html
media/track/track-webvtt-tc002-bom.html
media/track/track-active-cues.html
media/track/track-cue-container-rendering-position.html
Comment 55 Build Bot 2013-10-10 10:17:55 PDT
Created attachment 213898 [details]
Archive of layout-test-results from webkit-ews-13 for mac-mountainlion-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: webkit-ews-13  Port: mac-mountainlion-wk2  Platform: Mac OS X 10.8.5
Comment 56 Build Bot 2013-10-10 10:39:10 PDT
Comment on attachment 213890 [details]
Simplify hasRequiredFileIdentifier

Attachment 213890 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.appspot.com/results/3874044

New failing tests:
media/track/track-cue-rendering-rtl.html
media/track/track-mode.html
media/track/track-cues-pause-on-exit.html
media/track/track-cue-mutable.html
media/track/track-remove-by-setting-innerHTML.html
media/track/track-cue-rendering-with-padding.html
media/track/track-cue-nothing-to-render.html
media/track/track-large-timestamp.html
media/track/track-webvtt-tc002-bom.html
media/track/track-active-cues.html
media/track/track-cue-container-rendering-position.html
Comment 57 Build Bot 2013-10-10 10:39:14 PDT
Created attachment 213900 [details]
Archive of layout-test-results from webkit-ews-04 for mac-mountainlion

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: webkit-ews-04  Port: mac-mountainlion  Platform: Mac OS X 10.8.5
Comment 58 Brendan Long 2013-10-10 11:33:07 PDT
Created attachment 213907 [details]
Fix end-of-line handling in hasRequiredFileIdentifier

I forgot to increment the position variable. Since it was hard to notice, I wrote the function differently so it's always obvious what the line position should be.
Comment 59 Brendan Long 2013-10-14 13:54:15 PDT
Is there anything more I need to do on this to get it reviewed/accepted?
Comment 60 Eric Carlson 2013-10-22 11:10:11 PDT
(In reply to comment #59)
> Is there anything more I need to do on this to get it reviewed/accepted?

Sorry, I have been out of the office for two weeks. I will get to this in the next day or two.
Comment 61 WebKit Commit Bot 2013-10-22 11:44:24 PDT
Comment on attachment 213907 [details]
Fix end-of-line handling in hasRequiredFileIdentifier

Clearing flags on attachment: 213907

Committed r157801: <http://trac.webkit.org/changeset/157801>
Comment 62 WebKit Commit Bot 2013-10-22 11:44:31 PDT
All reviewed patches have been landed.  Closing bug.