cue.text appears to lead to 'Unexpected end of input' error.
Go to http://samdutton.net/track/bug/cueText.html
View console output.
Does not appear to be any problem with .vtt file -- have tried resaving, etc. -- sometimes fails on different cues.
Have also tried adding quotes to all values.
19.0.1065.0 canary
(In reply to comment #0)
> cue.text appears to lead to 'Unexpected end of input' error.
>
> Go to http://samdutton.net/track/bug/cueText.html
>
I greatly appreciate your bug reports, but it would be extremely helpful if your test cases also included a .mp4 video file. I investigate all text track bugs and have fixed most of the ones filed in the last six months. Because I use Safari for development, before I can examine one of your test cases I have to download all of the resources it uses, export an MPEG-4 version of your video, and re-host the example.
Don't get me wrong, I would *much* rather have your WebM-only test cases than nothing at all! I only mean that it will save me a significant amount of time if you are able to create test cases with both WebM and MPEG-4 videos.
> I greatly appreciate your bug reports, but it would be extremely helpful if your test cases also included a .mp4 video file.
Great idea -- I've added this to all the test cases in the http://samdutton.net/track/bug directory.
(In reply to comment #2)
> > I greatly appreciate your bug reports, but it would be extremely helpful if your test cases also included a .mp4 video file.
>
> Great idea -- I've added this to all the test cases in the http://samdutton.net/track/bug directory.
Thank you very much!
Created attachment 136218[details]
Archive of layout-test-results from ec2-cr-linux-04
The attached test failures were seen while running run-webkit-tests on the chromium-ews.
Bot: ec2-cr-linux-04 Port: <class 'webkitpy.common.config.ports.ChromiumXVFBPort'> Platform: Linux-2.6.35-28-virtual-x86_64-with-Ubuntu-10.10-maverick
(In reply to comment #7)
> Created an attachment (id=136218) [details]
> Archive of layout-test-results from ec2-cr-linux-04
>
> The attached test failures were seen while running run-webkit-tests on the chromium-ews.
> Bot: ec2-cr-linux-04 Port: <class 'webkitpy.common.config.ports.ChromiumXVFBPort'> Platform: Linux-2.6.35-28-virtual-x86_64-with-Ubuntu-10.10-maverick
The failing media tests are:
track-webvtt-tc028-unsupported-markup.html - will fix
track-cues-pause-on-exit.html - this is https://bugs.webkit.org/show_bug.cgi?id=80067
track-cues-sorted-before-dispatch.html - this is https://bugs.webkit.org/show_bug.cgi?id=77862
Created attachment 137205[details]
Archive of layout-test-results from ec2-cr-linux-03
The attached test failures were seen while running run-webkit-tests on the chromium-ews.
Bot: ec2-cr-linux-03 Port: <class 'webkitpy.common.config.ports.ChromiumXVFBPort'> Platform: Linux-2.6.35-28-virtual-x86_64-with-Ubuntu-10.10-maverick
Comment on attachment 212756[details]
Rebase patch
Attachment 212756[details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.appspot.com/results/2617023
New failing tests:
media/track/track-cue-rendering-rtl.html
media/track/track-webvtt-tc022-entities.html
media/track/track-webvtt-tc023-markup.html
media/track/track-long-captions-file.html
media/track/track-webvtt-tc028-unsupported-markup.html
media/track/track-webvtt-tc024-timestamp.html
Created attachment 212763[details]
Archive of layout-test-results from webkit-ews-16 for mac-mountainlion-wk2
The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: webkit-ews-16 Port: mac-mountainlion-wk2 Platform: Mac OS X 10.8.5
Comment on attachment 212756[details]
Rebase patch
Attachment 212756[details] did not pass mac-ews (mac):
Output: http://webkit-queues.appspot.com/results/2629002
New failing tests:
media/track/track-cue-rendering-rtl.html
media/track/track-webvtt-tc022-entities.html
media/track/track-webvtt-tc023-markup.html
media/track/track-long-captions-file.html
media/track/track-webvtt-tc028-unsupported-markup.html
media/track/track-webvtt-tc024-timestamp.html
Created attachment 212766[details]
Archive of layout-test-results from webkit-ews-03 for mac-mountainlion
The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: webkit-ews-03 Port: mac-mountainlion Platform: Mac OS X 10.8.5
Comment on attachment 212756[details]
Rebase patch
Attachment 212756[details] did not pass mac-ews (mac):
Output: http://webkit-queues.appspot.com/results/2635024
New failing tests:
media/track/track-cue-rendering-rtl.html
media/track/track-webvtt-tc022-entities.html
media/track/track-webvtt-tc023-markup.html
media/track/track-long-captions-file.html
media/track/track-webvtt-tc028-unsupported-markup.html
media/track/track-webvtt-tc024-timestamp.html
Created attachment 212772[details]
Archive of layout-test-results from webkit-ews-07 for mac-mountainlion
The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: webkit-ews-07 Port: mac-mountainlion Platform: Mac OS X 10.8.5
According to the spec[1]:
> Line breaks in cues are honored. User agents will also insert extra line breaks if necessary to fit the cue in the cue's width. In general, therefore, authors are encouraged to write cues all on one line except when a line break is definitely necessary, and to not manually line-wrap for aesthetic reasons alone.
So, the test change where \n is removed from the expected cues seems to be wrong..
[1] http://dev.w3.org/html5/webvtt/#cues-with-multiple-lines
Created attachment 213632[details]
Buffer data in WebVTTParser
The running theme in a lot of bug reports (see also #95691 and possibly #97097) is that we cut off parsing in the wrong place. This patch adds a buffer. For some reason the test gets a cross-origin error on WebKitGTK though, and I'm not entirely sure if just removing deprecatedDidReceiveCachedResource is correct (although it seems to work, and the FIXME comment suggests that that's what we should do).
Created attachment 213641[details]
Archive of layout-test-results from webkit-ews-16 for mac-mountainlion-wk2
The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: webkit-ews-16 Port: mac-mountainlion-wk2 Platform: Mac OS X 10.8.5
Created attachment 213643[details]
Archive of layout-test-results from webkit-ews-01 for mac-mountainlion
The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: webkit-ews-01 Port: mac-mountainlion Platform: Mac OS X 10.8.5
Created attachment 213645[details]
Archive of layout-test-results from webkit-ews-09 for mac-mountainlion-wk2
The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: webkit-ews-09 Port: mac-mountainlion-wk2 Platform: Mac OS X 10.8.5
Created attachment 213646[details]
Archive of layout-test-results from webkit-ews-08 for mac-mountainlion
The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: webkit-ews-08 Port: mac-mountainlion Platform: Mac OS X 10.8.5
Created attachment 213710[details]
Fix buffer logic
My original version of this patch had a lot of little problems. deprecatedDidReceiveCachedResource() is definitely needed (although maybe it can be replaced with dataReceived()?). With that in place, m_cueParser->fileFinished() needs to go in notifyFinished() (so it doesn't get called every time we get data). I also added m_finished with an ASSERT so this doesn't bite anyone else in the future. I also messed up the handling of line breaks: According to the spec, any of \r, \n, and \r\n are all allowed. The current patch should fix all of these things, and the test passes on WebKitGTK (I'm building a different patch on my Mac, but I'll go to that if this fails on EWS again).
Created attachment 213716[details]
Archive of layout-test-results from webkit-ews-13 for mac-mountainlion-wk2
The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: webkit-ews-13 Port: mac-mountainlion-wk2 Platform: Mac OS X 10.8.5
Created attachment 213718[details]
Archive of layout-test-results from webkit-ews-07 for mac-mountainlion
The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: webkit-ews-07 Port: mac-mountainlion Platform: Mac OS X 10.8.5
Created attachment 213723[details]
Fix BOM and add NULL check for m_parser
There were two remaining problems: Because this patch converts to a String up-front, the BOM gets converted to a UTF-16 BOM (0xFEFF instead of 0xEFBBBF). This solution handles that comparing against String::fromUTF8(xEFxBBFF). I could do a more complicated version if you think it's important to explicitly check the original data for 0xEFBBFF. The other problem was that I didn't check if m_parser existed before using it in TrackLoader::notifyFinished().
(In reply to comment #39)
> Created an attachment (id=213723) [details]
> Fix BOM and add NULL check for m_parser
>
> There were two remaining problems: Because this patch converts to a String up-front, the BOM gets converted to a UTF-16 BOM (0xFEFF instead of 0xEFBBBF). This solution handles that comparing against String::fromUTF8(xEFxBBFF). I could do a more complicated version if you think it's important to explicitly check the original data for 0xEFBBFF. The other problem was that I didn't check if m_parser existed before using it in TrackLoader::notifyFinished().
What do you mean by "check the original data for 0xEFBBBF"? Do you mean check UTF-8 encoded string for the presence of the three contiguous bytes 0xEF 0xBB 0xBF
(In reply to comment #40)
> What do you mean by "check the original data for 0xEFBBBF"? Do you mean check UTF-8 encoded string for the presence of the three contiguous bytes 0xEF 0xBB 0xBF
I mean that technically we can't check `line` for the UTF-8 BOM because it's a UTF-16 String, and UTF-16 and UTF-8 have different BOMs. The way this was handled previously was by using a Vector<char> instead of `line`. That seems wasteful now that we buffer everything, but I could take that approach if reviewers don't like this.
I'm pretty sure it makes no difference, but I figured I'd mention it since it's "surprising".
(In reply to comment #42)
> (In reply to comment #40)
> > What do you mean by "check the original data for 0xEFBBBF"? Do you mean check UTF-8 encoded string for the presence of the three contiguous bytes 0xEF 0xBB 0xBF
>
> I mean that technically we can't check `line` for the UTF-8 BOM because it's a UTF-16 String, and UTF-16 and UTF-8 have different BOMs. The way this was handled previously was by using a Vector<char> instead of `line`. That seems wasteful now that we buffer everything, but I could take that approach if reviewers don't like this.
>
> I'm pretty sure it makes no difference, but I figured I'd mention it since it's "surprising".
If the originally encoded UTF-8 content has been converted to 16-bit UTF-16, then it is pointless checking for the UTF-8 encoding of BOM since it will have been converted to 0xFEFF. After conversion, the BOM could be discarded or ignored because it no longer has a role to play.
(In reply to comment #43)
> If the originally encoded UTF-8 content has been converted to 16-bit UTF-16, then it is pointless checking for the UTF-8 encoding of BOM since it will have been converted to 0xFEFF.
Yes, exactly.
> After conversion, the BOM could be discarded or ignored because it no longer has a role to play.
We're required to check for the BOM by the WebVTT spec. What I'm saying is that my code technically converts the data to UTF-16 and checks for a UTF-16 BOM instead of directly checking for a UTF-8 BOM. I don't think this changes anything, but it's worth pointing out.
(In reply to comment #45)
> (In reply to comment #43)
> > If the originally encoded UTF-8 content has been converted to 16-bit UTF-16, then it is pointless checking for the UTF-8 encoding of BOM since it will have been converted to 0xFEFF.
>
> Yes, exactly.
>
> > After conversion, the BOM could be discarded or ignored because it no longer has a role to play.
>
> We're required to check for the BOM by the WebVTT spec. What I'm saying is that my code technically converts the data to UTF-16 and checks for a UTF-16 BOM instead of directly checking for a UTF-8 BOM. I don't think this changes anything, but it's worth pointing out.
First, there is no specific requirement for use of the BOM. It is simply an optional three byte prefix prior to the string "WEBVTT", and it (BOM) only applies in the UTF-8 encoded form, and further, it is optional.
So, if you've already transcoded from UTF-8 to UTF-16, then if there was a BOM in the UTF-8 source stream, it has been either discarded or transcoded to 0xFEFF. At that point, there is nothing else to do with it. It could have been dropped when transcoding into UTF-16 without changing any behavior.
(In reply to comment #46)
> (In reply to comment #45)
> > (In reply to comment #43)
> > > If the originally encoded UTF-8 content has been converted to 16-bit UTF-16, then it is pointless checking for the UTF-8 encoding of BOM since it will have been converted to 0xFEFF.
> >
> > Yes, exactly.
> >
> > > After conversion, the BOM could be discarded or ignored because it no longer has a role to play.
> >
> > We're required to check for the BOM by the WebVTT spec. What I'm saying is that my code technically converts the data to UTF-16 and checks for a UTF-16 BOM instead of directly checking for a UTF-8 BOM. I don't think this changes anything, but it's worth pointing out.
>
> First, there is no specific requirement for use of the BOM. It is simply an optional three byte prefix prior to the string "WEBVTT", and it (BOM) only applies in the UTF-8 encoded form, and further, it is optional.
>
> So, if you've already transcoded from UTF-8 to UTF-16, then if there was a BOM in the UTF-8 source stream, it has been either discarded or transcoded to 0xFEFF. At that point, there is nothing else to do with it. It could have been dropped when transcoding into UTF-16 without changing any behavior.
Perhaps I should add by quoting Unicode Section 16.8 [1]:
"Systems that use the byte order mark must recognize when an initial U+FEFF signals the byte order. In those cases, it is not part of the textual content and should be removed before processing, because otherwise it may be mistaken for a legitimate zero width no-break space."
So you can discard it if you want.
The only function of the UTF-8 BOM in WebVTT is for signature detection, and not determining byte order.
[1] http://www.unicode.org/versions/Unicode6.2.0/ch16.pdf
Glenn, I wasn't saying that it's a problem (I wouldn't have included it in my patch if I thought it was), I just thought it was something reviewers should be aware of. I think I have an idea of how to handle this better anyway though, so hopefully it won't actually matter.
(In reply to comment #48)
> Glenn, I wasn't saying that it's a problem (I wouldn't have included it in my patch if I thought it was), I just thought it was something reviewers should be aware of. I think I have an idea of how to handle this better anyway though, so hopefully it won't actually matter.
Sure. Just trying to make sure we're on the same page.
Created attachment 213743[details]
Use m_state to simplify BOM and Finished handling.
I spent some time today and got DumpRenderTree working on my Mac again (apparently it's just broken in debug builds?), so this time I can verify on my own machine that the tests pass. I removed m_finished and just added a Finished state. I also added a CheckedBOM state, and moved the BOM check to near the top of parseBytes, which lets me check for the BOM while I'm still dealing with the raw data (this is simpler, and much less confusing).
Created attachment 213857[details]
Only use buffer when needed
Ok, here's one more improvement. It bothered me that my patch was buffering everything, even if we didn't need it. This version only copies data to the buffer if we run out of data without finding a line ending. Because of this, collectNextLine() doesn't need to look at the contents of the buffer, since we already know it doesn't contain any line endings. More importantly, since it always contains one partial line, instead of removing from the front of the vector, we can just clear it, which is significantly more efficient.
Created attachment 213890[details]
Simplify hasRequiredFileIdentifier
The last thing that really bothered me in this patch was the substring creation in hasRequiredFileIdentifier. The new version just walks through the string, which avoids the copies and ends up simpler anyway. I also made the test use canplaythrough since the addtrack event was making it flaky. I think I'm actually done this time.
Created attachment 213898[details]
Archive of layout-test-results from webkit-ews-13 for mac-mountainlion-wk2
The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: webkit-ews-13 Port: mac-mountainlion-wk2 Platform: Mac OS X 10.8.5
Created attachment 213900[details]
Archive of layout-test-results from webkit-ews-04 for mac-mountainlion
The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: webkit-ews-04 Port: mac-mountainlion Platform: Mac OS X 10.8.5
Created attachment 213907[details]
Fix end-of-line handling in hasRequiredFileIdentifier
I forgot to increment the position variable. Since it was hard to notice, I wrote the function differently so it's always obvious what the line position should be.
(In reply to comment #59)
> Is there anything more I need to do on this to get it reviewed/accepted?
Sorry, I have been out of the office for two weeks. I will get to this in the next day or two.
2012-04-06 18:15 PDT, Eric Carlson
2012-04-09 07:06 PDT, WebKit Review Bot
2012-04-13 15:22 PDT, Eric Carlson
2012-04-14 04:45 PDT, WebKit Review Bot
2013-09-26 15:02 PDT, Brendan Long
2013-09-26 15:54 PDT, Build Bot
2013-09-26 16:18 PDT, Build Bot
2013-09-26 17:01 PDT, Eric Carlson
2013-09-26 17:24 PDT, Build Bot
2013-10-07 17:47 PDT, Brendan Long
2013-10-07 19:51 PDT, Build Bot
2013-10-07 20:13 PDT, Build Bot
2013-10-07 20:30 PDT, Build Bot
2013-10-07 20:33 PDT, Build Bot
2013-10-08 13:11 PDT, Brendan Long
2013-10-08 13:51 PDT, Build Bot
2013-10-08 14:00 PDT, Build Bot
2013-10-08 15:01 PDT, Brendan Long
2013-10-08 19:39 PDT, Brendan Long
2013-10-10 01:03 PDT, Brendan Long
2013-10-10 09:18 PDT, Brendan Long
2013-10-10 10:17 PDT, Build Bot
2013-10-10 10:39 PDT, Build Bot
2013-10-10 11:33 PDT, Brendan Long