Bug 201086

Summary: Privacy issue and spec violation in handling URLs in WebVTT STYLE blocks
Product: WebKit Reporter: Simon Pieters (:zcorpan) <zcorpan>
Component: MediaAssignee: Brent Fulgham <bfulgham>
Status: RESOLVED FIXED    
Severity: Normal CC: allan.jensen, benjamin, bfulgham, calvaris, cdumez, darin, dbates, eric.carlson, esprehn+autocc, ews-watchlist, glenn, gyuyoung.kim, jer.noble, kangil.han, koivisto, macpherson, menard, philipj, pvollan, sam, webkit-bug-importer, wilander
Priority: P2 Keywords: InRadar
Version: Safari Technology Preview   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
Patch v2
none
Patch
none
Patch
none
Patch
darin: review+
patch
none
patch
none
patch bfulgham: review+

Simon Pieters (:zcorpan)
Reported 2019-08-23 13:22:29 PDT
Test: https://zcorpan.github.io/live-webvtt-viewer/#vtt=WEBVTT%0A%0ASTYLE%0A%3A%3Acue+%7B+background-image%3A+url(https%3A%2F%2Fwww.apple.com%2Fac%2Fflags%2F1%2Fimages%2Fus%2F32.png)%3B+%7D%0A%0A00%3A00%3A00.000+--%3E+00%3A00%3A10.000%0AHello+world.%0A The WebVTT file is: WEBVTT STYLE ::cue { background-image: url(https://www.apple.com/ac/flags/1/images/us/32.png); } 00:00:00.000 --> 00:00:10.000 Hello world. In Safari TP Release 90 (Safari 13.1, WebKit 14609.1.2.2) on macOS Mojave 10.14.6, this fetches the URL and applies the background image. This violates the spec and is a privacy problem as it allows third party tracking of how users watch videos. The WebVTT specification states: > For the purpose of resolving URLs in STYLE blocks of a WebVTT file, or any URLs in resources referenced from STYLE blocks of a WebVTT file, if the URL’s scheme is not "data", then the user agent must act as if the URL failed to resolve. > > Supporting external resources with @import or background-image would be a new ability for media elements and track elements to issue network requests as the user watches the video, which could be a privacy issue. https://w3c.github.io/webvtt/#obtaining-css-boxes > Such style sheets cannot fetch any external resources, and it is important for privacy that user agents do not allow this. Otherwise, WebVTT files could be authored such that a third party is notified when the user watches a particular video, and even the current time in that video. https://w3c.github.io/webvtt/#styling-security
Attachments
Patch (1.97 KB, patch)
2020-01-20 12:49 PST, Brent Fulgham
no flags
Patch v2 (5.49 KB, patch)
2020-01-21 15:00 PST, Brent Fulgham
no flags
Patch (8.30 KB, patch)
2020-01-22 21:36 PST, Brent Fulgham
no flags
Patch (17.11 KB, patch)
2020-01-23 09:22 PST, Brent Fulgham
no flags
Patch (17.10 KB, patch)
2020-01-23 15:32 PST, Brent Fulgham
darin: review+
patch (12.38 KB, patch)
2020-01-27 05:18 PST, Antti Koivisto
no flags
patch (9.96 KB, patch)
2020-01-27 06:30 PST, Antti Koivisto
no flags
patch (9.92 KB, patch)
2020-01-27 06:43 PST, Antti Koivisto
bfulgham: review+
Radar WebKit Bug Importer
Comment 1 2019-08-23 15:52:46 PDT
Brent Fulgham
Comment 2 2020-01-20 12:49:09 PST
Brent Fulgham
Comment 3 2020-01-20 12:54:59 PST
Comment on attachment 388250 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=388250&action=review > Source/WebCore/html/track/WebVTTParser.cpp:403 > return true; @Antti: On lines 381 -385 we check for 'namespace' and 'import' rules, because those are not allowed by the VTT specification. Would it be better if we passed an AllowedRulesType to StyleSheetContents::create() that blocked them? I can't tell from the header if that would mean using 'AllowCharsetRules' or if that means 'RegularRules'. > Source/WebCore/html/track/WebVTTParser.cpp:410 > + return true; @Antti: The rules set we get back from the parser only gives us read-only access to the styles, so we can't follow the spec here and switch the non-data URI to some 'non-resolvable' URL. My patch here solves the privacy problem by throwing out the whole style sheet, but that doesn't really follow the spec, which says we should act as if that image URL was non-resolvable. What can I do here?
Brent Fulgham
Comment 4 2020-01-20 12:55:50 PST
CSS experts: I would appreciate any guidance on the two questions I asked in the review comments above.
Sam Weinig
Comment 5 2020-01-20 16:09:21 PST
Seems like this should either be a new bit on the CSSParserContext or a new CSSParserMode, and the checking should be done in the parser. Does this restriction really only apply to background-image? I would gather from the spec text it should be applied for any external resource.
Simon Pieters (:zcorpan)
Comment 6 2020-01-21 03:03:39 PST
Yes, it applies to any external resource. CSS has many ways to fetch things (images, fonts, other stylehsets, in the future maybe also color profiles or whatever). I think it would be ideal to block fetches or URL parsing in a way that catches everything, including future additions to CSS. There is no restriction for @namespace rules. They don't fetch anything, and aren't even resolved as URLs. > A URI string parsed from the URI syntax must be treated as a literal string: as with the STRING syntax, no URI-specific normalization is applied. https://drafts.csswg.org/css-namespaces-3/#syntax @import rules are also allowed, with data: URLs, though there's no direct use case for that.
Simon Pieters (:zcorpan)
Comment 7 2020-01-21 03:13:18 PST
(In reply to Brent Fulgham from comment #3) > @Antti: The rules set we get back from the parser only gives us read-only > access to the styles, so we can't follow the spec here and switch the > non-data URI to some 'non-resolvable' URL. To clarify what the spec requires: it doesn't require that you change the URL. The CSSOM isn't exposed to the page, so the only thing that can be observed is whether a fetch happens. Acting as if the URL fails to resolve means that a fetch is not allowed to happen.
Brent Fulgham
Comment 8 2020-01-21 11:06:05 PST
(In reply to Sam Weinig from comment #5) > Seems like this should either be a new bit on the CSSParserContext or a new > CSSParserMode, and the checking should be done in the parser. > > Does this restriction really only apply to background-image? I would gather > from the spec text it should be applied for any external resource. Ideally, I think we would be able to run the CSSParser in a mode where it only accepted data URI.
Brent Fulgham
Comment 9 2020-01-21 11:06:59 PST
(In reply to Simon Pieters from comment #7) > (In reply to Brent Fulgham from comment #3) > > @Antti: The rules set we get back from the parser only gives us read-only > > access to the styles, so we can't follow the spec here and switch the > > non-data URI to some 'non-resolvable' URL. > > To clarify what the spec requires: it doesn't require that you change the > URL. The CSSOM isn't exposed to the page, so the only thing that can be > observed is whether a fetch happens. Acting as if the URL fails to resolve > means that a fetch is not allowed to happen. Understood. I was speaking in terms of the implementation, where the obvious approach would be to discard the URL in cases where it was not allowed. But I agree with Sam that the right approach is to revise the parser so that we can operate in a mode that does not allow external loads.
Antti Koivisto
Comment 10 2020-01-21 13:34:30 PST
WebVTT specific parser mode is one good approach. I think everything goes via CSSParserContext::completeURL(). We can probably just safely empty non-data URLs there. The mode could also ignore/bail out for other stuff that is not valid for WebVTT. Another approach would be to avoid loading in Style::loadPendingResources. The code can probably somehow find out it is a WebVTT context and avoid loading non-data URLs. This approach still needs to prevent font, @import etc loads by other means. The current patch is not good, there are many properties besides background-image that load resources.
Brent Fulgham
Comment 11 2020-01-21 15:00:15 PST
Created attachment 388357 [details] Patch v2 Patch version 2: Modify the CSS Parser to exclude non-data URI.
Darin Adler
Comment 12 2020-01-21 15:18:08 PST
Comment on attachment 388357 [details] Patch v2 View in context: https://bugs.webkit.org/attachment.cgi?id=388357&action=review Seems too easy to forget to add this check. Seems like we should make completeURL and just add null checks at the call site > Source/WebCore/css/parser/CSSParserMode.h:46 > + // WebVTT places limitations on external resources Add a period to match the style of the comments above? > Source/WebCore/css/parser/CSSParserMode.h:78 > + return cssParserMode == UASheetMode || cssParserMode == HTMLStandardMode || cssParserMode == SVGAttributeMode || cssParserMode == WebVTTMode; If we use switch statements we get warnings when we forget to cover new modes. Is this really the only place that we check the mode against HTMLStandardMode? > Source/WebCore/css/parser/CSSPropertyParser.cpp:4328 > + if (uri.isNull()) Why the change to call this “uri”? > Source/WebCore/css/parser/CSSPropertyParser.cpp:4333 > + return nullptr; This is indented 5 instead of 4.
Darin Adler
Comment 13 2020-01-21 15:49:05 PST
I meant that we should make completeURL do the check and return null. The have the callers check for null.
Brent Fulgham
Comment 14 2020-01-21 17:27:49 PST
Comment on attachment 388357 [details] Patch v2 View in context: https://bugs.webkit.org/attachment.cgi?id=388357&action=review >> Source/WebCore/css/parser/CSSParserMode.h:46 >> + // WebVTT places limitations on external resources > > Add a period to match the style of the comments above? Yes >> Source/WebCore/css/parser/CSSParserMode.h:78 >> + return cssParserMode == UASheetMode || cssParserMode == HTMLStandardMode || cssParserMode == SVGAttributeMode || cssParserMode == WebVTTMode; > > If we use switch statements we get warnings when we forget to cover new modes. Is this really the only place that we check the mode against HTMLStandardMode? I'll double-check. I'm not familiar with this code, so could have easily missed something. >> Source/WebCore/css/parser/CSSPropertyParser.cpp:4328 >> + if (uri.isNull()) > > Why the change to call this “uri”? I wanted to use 'url' below, and other places this pattern had 'uri' for the string, and 'url' for the URL object. But your later comment sort of removes the need for this. >> Source/WebCore/css/parser/CSSPropertyParser.cpp:4333 >> + return nullptr; > > This is indented 5 instead of 4. :-(
Brent Fulgham
Comment 15 2020-01-21 17:35:28 PST
(In reply to Darin Adler from comment #12) > Comment on attachment 388357 [details] > Patch v2 > > View in context: > https://bugs.webkit.org/attachment.cgi?id=388357&action=review > > Seems too easy to forget to add this check. Seems like we should make > completeURL and just add null checks at the call site I didn't do that at first because 'completeURL' returns a url object that is just blindly consumed by the callers. It seemed like these various use cases would exit early when the URL string was empty, which didn't give me an opportunity to wipe out the URL for the non-data URL case. Maybe I should change 'completeURL' to return an Optional<URL> and null it out if the rule is violated? Or, I could just check that the returned URL is not 'isNull()' as you suggest. That's probably better.
Antti Koivisto
Comment 16 2020-01-22 00:01:30 PST
Why can't you just return an invalid URL ("return URL()" that is) from completeURL? That already happens on parse failure so the case should be fully handled through the code. It would be the simplest solution and also makes logical sense as the URL is invalid in this context.
Brent Fulgham
Comment 17 2020-01-22 21:36:06 PST
Darin Adler
Comment 18 2020-01-22 21:48:01 PST
Comment on attachment 388517 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=388517&action=review > Source/WebCore/css/parser/CSSParserMode.h:88 > + return false; Could add ASSERT_NOT_REACHED here. > Source/WebCore/dom/InlineStyleSheetOwner.cpp:62 > + if (auto* parent = element.parentNode()) { > + if (is<VTTCueBox>(*parent)) > + result.mode = WebVTTMode; > + } This changes a lot of behavior, doesn’t just add the restrictions on external URLs, since this overwrites UASheetMode with a mode that is treated like HTMLStandardMode. Is all of that change desirable? If so, do we have test coverage for these desirable change? > Source/WebCore/dom/Node.h:201 > + virtual bool isWebVTTCueBoxElement() const { return false; } Unnecessary to add the word "Element" to the function name here, since the class name is WebVTTCueBox. > Source/WebCore/html/track/VTTCue.h:72 > + bool isWebVTTCueBoxElement() const override { return true; } Could be final rather than override.
Antti Koivisto
Comment 19 2020-01-23 05:18:29 PST
Comment on attachment 388517 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=388517&action=review > Source/WebCore/dom/InlineStyleSheetOwner.cpp:61 > + if (shadowRoot && shadowRoot->mode() == ShadowRootMode::UserAgent) { > result.mode = UASheetMode; > + if (auto* parent = element.parentNode()) { > + if (is<VTTCueBox>(*parent)) > + result.mode = WebVTTMode; What is this change trying to achieve? The bug is about parsing WebVTT STYLE blocks, those are handled by the change in WebVTTParser.
Brent Fulgham
Comment 20 2020-01-23 08:59:32 PST
Comment on attachment 388517 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=388517&action=review >> Source/WebCore/css/parser/CSSParserMode.h:88 >> + return false; > > Could add ASSERT_NOT_REACHED here. Will do. >> Source/WebCore/dom/InlineStyleSheetOwner.cpp:62 >> + } > > This changes a lot of behavior, doesn’t just add the restrictions on external URLs, since this overwrites UASheetMode with a mode that is treated like HTMLStandardMode. Is all of that change desirable? If so, do we have test coverage for these desirable change? If the Style element being processed is for a VTTCueBox, we need the CSSParserContext to operate in WebVTT mode. Otherwise it will take the CSS style that was approved by WebVTTParser.cpp and interpret it in UASheetMode, which will allow the URLs to be loaded. This code is somewhat confusing to me, because the WebVTTParser class really seems to be more of a validator that accepts or rejects the provided style. If it accepts the style, it passes the string (not the parsed style object) to the rest of the system. Once we reach this point, we have a string representing CSS (complete with full URLs) that is then re-processed as UASheetMode. The change here is meant to make the CSS handling honor the WebVTT restrictions. Maybe a better approach would be for WebVTTParser to throw away the string it was passed, and generate a new string (which has been sanitized to match WebVTT requirements). An even better approach would be for the WebVTTParser to produce a style object that could be used by later code, rather than this existing design that causes all WebVTT styles to be parsed two times. >> Source/WebCore/html/track/VTTCue.h:72 >> + bool isWebVTTCueBoxElement() const override { return true; } > > Could be final rather than override. Will do.
Brent Fulgham
Comment 21 2020-01-23 09:14:59 PST
Comment on attachment 388517 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=388517&action=review >>> Source/WebCore/dom/InlineStyleSheetOwner.cpp:62 >>> + } >> >> This changes a lot of behavior, doesn’t just add the restrictions on external URLs, since this overwrites UASheetMode with a mode that is treated like HTMLStandardMode. Is all of that change desirable? If so, do we have test coverage for these desirable change? > > If the Style element being processed is for a VTTCueBox, we need the CSSParserContext to operate in WebVTT mode. Otherwise it will take the CSS style that was approved by WebVTTParser.cpp and interpret it in UASheetMode, which will allow the URLs to be loaded. > > This code is somewhat confusing to me, because the WebVTTParser class really seems to be more of a validator that accepts or rejects the provided style. If it accepts the style, it passes the string (not the parsed style object) to the rest of the system. Once we reach this point, we have a string representing CSS (complete with full URLs) that is then re-processed as UASheetMode. > > The change here is meant to make the CSS handling honor the WebVTT restrictions. > > Maybe a better approach would be for WebVTTParser to throw away the string it was passed, and generate a new string (which has been sanitized to match WebVTT requirements). > > An even better approach would be for the WebVTTParser to produce a style object that could be used by later code, rather than this existing design that causes all WebVTT styles to be parsed two times. I'm uploading a modified patch that changes the various tests for "mode == UASheetMode" to the existing predicate "isUASheetBehavior", which I've modified to understand WebVTTMode, too. This should prevent any fallout from that part of the change. In practice, WebVTT styles use is fairly limited, so we probably wouldn't have seen much impact (the change would not make UASheetMode fail to work as expected), and it doesn't seem like WebVTT styles ever use things that are not HTMLStandardMode -- though I do notice that UASheetMode does not seem to allow deferred style, so making sure we enforce UASheetMode for WebVTTMode is probably wise.
Brent Fulgham
Comment 22 2020-01-23 09:22:38 PST
Brent Fulgham
Comment 23 2020-01-23 15:32:56 PST
Darin Adler
Comment 24 2020-01-23 18:23:02 PST
Comment on attachment 388602 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=388602&action=review > Source/WebCore/css/StyleSheetContents.cpp:347 > + p.parseSheet(this, sheetText, !isUASheetBehavior(parserContext().mode) ? CSSParser::RuleParsing::Deferred : CSSParser::RuleParsing::Normal); Might want to reverse these two rather than having a ! in front of the function call. So easy to miss that. > Source/WebCore/css/parser/CSSParserMode.h:47 > // User agent stylesheets are parsed in standards mode but also allows internal properties and values. > - UASheetMode > + UASheetMode, > + // WebVTT places limitations on external resources. > + WebVTTMode Why do we want to allow internal properties and values in a VTT style? > Source/WebCore/html/track/WebVTTParser.cpp:376 > - auto contents = StyleSheetContents::create(); > + auto contents = StyleSheetContents::create(CSSParserContext(WebVTTMode)); This is now a much bigger change than before. It changes this to UA sheet behavior. Is there anything bad with that change?
Darin Adler
Comment 25 2020-01-26 21:25:26 PST
Comment on attachment 388602 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=388602&action=review I’m going to say r=me but I do still have open questions listed above. > Source/WebCore/css/parser/CSSParserContext.h:75 > + URL completeURL(const String& url) const; I think it’s a little strange that this function member is stuck in between two data members of this struct. I’d move it down so it’s last for style’s sake. > Source/WebCore/css/parser/CSSParserContext.h:78 > }; Should come back and fix some other things in this header. For example, I don’t think CSSParserContextHash::hash should be inlined in the header; should be moved to the .cpp file. And we should use WTF_MAKE_STRUCT_FAST_ALLOCATED so we don't need to say "public:". >> Source/WebCore/css/parser/CSSParserMode.h:47 >> + WebVTTMode > > Why do we want to allow internal properties and values in a VTT style? I really don’t understand why a WebVTT stylesheet is more like the user agent stylesheet in this respect. The special user-agent stylesheet only internal stuff seems like it should not be allowed in WebVTT. > Source/WebCore/dom/InlineStyleSheetOwner.cpp:62 > + if (auto* parent = element.parentNode()) { > + if (is<VTTCueBox>(*parent)) > + result.mode = WebVTTMode; > + } Seems too mysterious to do this without a comment.
Antti Koivisto
Comment 26 2020-01-27 01:05:23 PST
Comment on attachment 388602 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=388602&action=review >> Source/WebCore/dom/InlineStyleSheetOwner.cpp:62 >> result.mode = UASheetMode; >> + if (auto* parent = element.parentNode()) { >> + if (is<VTTCueBox>(*parent)) >> + result.mode = WebVTTMode; >> + } > > Seems too mysterious to do this without a comment. This is too hackish to put on a core code path. I think a better approach is to re-serialize the parsed stylesheet with data URLs cleaned up. Brent, how about if I finish this patch?
Antti Koivisto
Comment 27 2020-01-27 05:18:30 PST
Antti Koivisto
Comment 28 2020-01-27 06:30:27 PST
Antti Koivisto
Comment 29 2020-01-27 06:43:52 PST
Brent Fulgham
Comment 30 2020-01-27 09:26:14 PST
Comment on attachment 388855 [details] patch Yes, this is much better. r=me.
Antti Koivisto
Comment 31 2020-01-27 09:41:15 PST
Darin Adler
Comment 32 2020-01-27 12:31:26 PST
Comment on attachment 388855 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=388855&action=review > Source/WebCore/html/track/WebVTTParser.cpp:416 > + sanitizedStyleSheetBuilder.append(selectorText); > + sanitizedStyleSheetBuilder.appendLiteral(" { "); > + sanitizedStyleSheetBuilder.append(styleRule.properties().asText()); > + sanitizedStyleSheetBuilder.appendLiteral(" }\n"); This can be done on a single line: sanitizedStyleSheetBuilder.append(selectorText, " { ", styleRule.properties().asText(), " }\n");
Antti Koivisto
Comment 33 2020-01-27 23:27:23 PST
> This can be done on a single line: > > sanitizedStyleSheetBuilder.append(selectorText, " { ", > styleRule.properties().asText(), " }\n"); Fixed in https://trac.webkit.org/r255227
Note You need to log in before you can comment on or make changes to this bug.