Bug 201086 - Privacy issue and spec violation in handling URLs in WebVTT STYLE blocks
Summary: Privacy issue and spec violation in handling URLs in WebVTT STYLE blocks
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Media (show other bugs)
Version: Safari Technology Preview
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Brent Fulgham
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2019-08-23 13:22 PDT by Simon Pieters (:zcorpan)
Modified: 2020-01-27 23:27 PST (History)
22 users (show)

See Also:


Attachments
Patch (1.97 KB, patch)
2020-01-20 12:49 PST, Brent Fulgham
no flags Details | Formatted Diff | Diff
Patch v2 (5.49 KB, patch)
2020-01-21 15:00 PST, Brent Fulgham
no flags Details | Formatted Diff | Diff
Patch (8.30 KB, patch)
2020-01-22 21:36 PST, Brent Fulgham
no flags Details | Formatted Diff | Diff
Patch (17.11 KB, patch)
2020-01-23 09:22 PST, Brent Fulgham
no flags Details | Formatted Diff | Diff
Patch (17.10 KB, patch)
2020-01-23 15:32 PST, Brent Fulgham
darin: review+
Details | Formatted Diff | Diff
patch (12.38 KB, patch)
2020-01-27 05:18 PST, Antti Koivisto
no flags Details | Formatted Diff | Diff
patch (9.96 KB, patch)
2020-01-27 06:30 PST, Antti Koivisto
no flags Details | Formatted Diff | Diff
patch (9.92 KB, patch)
2020-01-27 06:43 PST, Antti Koivisto
bfulgham: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Simon Pieters (:zcorpan) 2019-08-23 13:22:29 PDT
Test: https://zcorpan.github.io/live-webvtt-viewer/#vtt=WEBVTT%0A%0ASTYLE%0A%3A%3Acue+%7B+background-image%3A+url(https%3A%2F%2Fwww.apple.com%2Fac%2Fflags%2F1%2Fimages%2Fus%2F32.png)%3B+%7D%0A%0A00%3A00%3A00.000+--%3E+00%3A00%3A10.000%0AHello+world.%0A

The WebVTT file is:

WEBVTT

STYLE
::cue { background-image: url(https://www.apple.com/ac/flags/1/images/us/32.png); }

00:00:00.000 --> 00:00:10.000
Hello world.

In Safari TP Release 90 (Safari 13.1, WebKit 14609.1.2.2) on macOS Mojave 10.14.6, this fetches the URL and applies the background image. This violates the spec and is a privacy problem as it allows third party tracking of how users watch videos.


The WebVTT specification states:

> For the purpose of resolving URLs in STYLE blocks of a WebVTT file, or any URLs in resources referenced from STYLE blocks of a WebVTT file, if the URL’s scheme is not "data", then the user agent must act as if the URL failed to resolve.
>
> Supporting external resources with @import or background-image would be a new ability for media elements and track elements to issue network requests as the user watches the video, which could be a privacy issue.

https://w3c.github.io/webvtt/#obtaining-css-boxes

> Such style sheets cannot fetch any external resources, and it is important for privacy that user agents do not allow this. Otherwise, WebVTT files could be authored such that a third party is notified when the user watches a particular video, and even the current time in that video.

https://w3c.github.io/webvtt/#styling-security
Comment 1 Radar WebKit Bug Importer 2019-08-23 15:52:46 PDT
<rdar://problem/54658121>
Comment 2 Brent Fulgham 2020-01-20 12:49:09 PST
Created attachment 388250 [details]
Patch
Comment 3 Brent Fulgham 2020-01-20 12:54:59 PST
Comment on attachment 388250 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=388250&action=review

> Source/WebCore/html/track/WebVTTParser.cpp:403
>              return true;

@Antti: On lines 381 -385 we check for 'namespace' and 'import' rules, because those are not allowed by the VTT specification.

Would it be better if we passed an AllowedRulesType to StyleSheetContents::create() that blocked them? I can't tell from the header if that would mean using 'AllowCharsetRules' or if that means 'RegularRules'.

> Source/WebCore/html/track/WebVTTParser.cpp:410
> +                    return true;

@Antti: The rules set we get back from the parser only gives us read-only access to the styles, so we can't follow the spec here and switch the non-data URI to some 'non-resolvable' URL.

My patch here solves the privacy problem by throwing out the whole style sheet, but that doesn't really follow the spec, which says we should act as if that image URL was non-resolvable.

What can I do here?
Comment 4 Brent Fulgham 2020-01-20 12:55:50 PST
CSS experts: I would appreciate any guidance on the two questions I asked in the review comments above.
Comment 5 Sam Weinig 2020-01-20 16:09:21 PST
Seems like this should either be a new bit on the CSSParserContext or a new CSSParserMode, and the checking should be done in the parser.

Does this restriction really only apply to background-image? I would gather from the spec text it should be applied for any external resource.
Comment 6 Simon Pieters (:zcorpan) 2020-01-21 03:03:39 PST
Yes, it applies to any external resource. CSS has many ways to fetch things (images, fonts, other stylehsets, in the future maybe also color profiles or whatever). I think it would be ideal to block fetches or URL parsing in a way that catches everything, including future additions to CSS.

There is no restriction for @namespace rules. They don't fetch anything, and aren't even resolved as URLs.

> A URI string parsed from the URI syntax must be treated as a literal string: as with the STRING syntax, no URI-specific normalization is applied.
https://drafts.csswg.org/css-namespaces-3/#syntax

@import rules are also allowed, with data: URLs, though there's no direct use case for that.
Comment 7 Simon Pieters (:zcorpan) 2020-01-21 03:13:18 PST
(In reply to Brent Fulgham from comment #3)
> @Antti: The rules set we get back from the parser only gives us read-only
> access to the styles, so we can't follow the spec here and switch the
> non-data URI to some 'non-resolvable' URL.

To clarify what the spec requires: it doesn't require that you change the URL. The CSSOM isn't exposed to the page, so the only thing that can be observed is whether a fetch happens. Acting as if the URL fails to resolve means that a fetch is not allowed to happen.
Comment 8 Brent Fulgham 2020-01-21 11:06:05 PST
(In reply to Sam Weinig from comment #5)
> Seems like this should either be a new bit on the CSSParserContext or a new
> CSSParserMode, and the checking should be done in the parser.
> 
> Does this restriction really only apply to background-image? I would gather
> from the spec text it should be applied for any external resource.

Ideally, I think we would be able to run the CSSParser in a mode where it only accepted data URI.
Comment 9 Brent Fulgham 2020-01-21 11:06:59 PST
(In reply to Simon Pieters from comment #7)
> (In reply to Brent Fulgham from comment #3)
> > @Antti: The rules set we get back from the parser only gives us read-only
> > access to the styles, so we can't follow the spec here and switch the
> > non-data URI to some 'non-resolvable' URL.
> 
> To clarify what the spec requires: it doesn't require that you change the
> URL. The CSSOM isn't exposed to the page, so the only thing that can be
> observed is whether a fetch happens. Acting as if the URL fails to resolve
> means that a fetch is not allowed to happen.

Understood. I was speaking in terms of the implementation, where the obvious approach would be to discard the URL in cases where it was not allowed.

But I agree with Sam that the right approach is to revise the parser so that we can operate in a mode that does not allow external loads.
Comment 10 Antti Koivisto 2020-01-21 13:34:30 PST
WebVTT specific parser mode is one good approach. I think everything goes via CSSParserContext::completeURL(). We can probably just safely empty non-data URLs there. The mode could also ignore/bail out for other stuff that is not valid for WebVTT.

Another approach would be to avoid loading in Style::loadPendingResources. The code can probably somehow find out it is a WebVTT context and avoid loading non-data URLs. This approach still needs to prevent font, @import etc loads by other means.

The current patch is not good, there are many properties besides background-image that load resources.
Comment 11 Brent Fulgham 2020-01-21 15:00:15 PST
Created attachment 388357 [details]
Patch v2

Patch version 2: Modify the CSS Parser to exclude non-data URI.
Comment 12 Darin Adler 2020-01-21 15:18:08 PST
Comment on attachment 388357 [details]
Patch v2

View in context: https://bugs.webkit.org/attachment.cgi?id=388357&action=review

Seems too easy to forget to add this check. Seems like we should make completeURL and just add null checks at the call site

> Source/WebCore/css/parser/CSSParserMode.h:46
> +    // WebVTT places limitations on external resources

Add a period to match the style of the comments above?

> Source/WebCore/css/parser/CSSParserMode.h:78
> +    return cssParserMode == UASheetMode || cssParserMode == HTMLStandardMode || cssParserMode == SVGAttributeMode || cssParserMode == WebVTTMode;

If we use switch statements we get warnings when we forget to cover new modes. Is this really the only place that we check the mode against HTMLStandardMode?

> Source/WebCore/css/parser/CSSPropertyParser.cpp:4328
> +    if (uri.isNull())

Why the change to call this “uri”?

> Source/WebCore/css/parser/CSSPropertyParser.cpp:4333
> +         return nullptr;

This is indented 5 instead of 4.
Comment 13 Darin Adler 2020-01-21 15:49:05 PST
I meant that we should make completeURL do the check and return null. The have the callers check for null.
Comment 14 Brent Fulgham 2020-01-21 17:27:49 PST
Comment on attachment 388357 [details]
Patch v2

View in context: https://bugs.webkit.org/attachment.cgi?id=388357&action=review

>> Source/WebCore/css/parser/CSSParserMode.h:46
>> +    // WebVTT places limitations on external resources
> 
> Add a period to match the style of the comments above?

Yes

>> Source/WebCore/css/parser/CSSParserMode.h:78
>> +    return cssParserMode == UASheetMode || cssParserMode == HTMLStandardMode || cssParserMode == SVGAttributeMode || cssParserMode == WebVTTMode;
> 
> If we use switch statements we get warnings when we forget to cover new modes. Is this really the only place that we check the mode against HTMLStandardMode?

I'll double-check. I'm not familiar with this code, so could have easily missed something.

>> Source/WebCore/css/parser/CSSPropertyParser.cpp:4328
>> +    if (uri.isNull())
> 
> Why the change to call this “uri”?

I wanted to use 'url' below, and other places this pattern had 'uri' for the string, and 'url' for the URL object. But your later comment sort of removes the need for this.

>> Source/WebCore/css/parser/CSSPropertyParser.cpp:4333
>> +         return nullptr;
> 
> This is indented 5 instead of 4.

:-(
Comment 15 Brent Fulgham 2020-01-21 17:35:28 PST
(In reply to Darin Adler from comment #12)
> Comment on attachment 388357 [details]
> Patch v2
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=388357&action=review
> 
> Seems too easy to forget to add this check. Seems like we should make
> completeURL and just add null checks at the call site

I didn't do that at first because 'completeURL' returns a url object that is just blindly consumed by the callers.

It seemed like these various use cases would exit early when the URL string was empty, which didn't give me an opportunity to wipe out the URL for the non-data URL case.

Maybe I should change 'completeURL' to return an Optional<URL> and null it out if the rule is violated?

Or, I could just check that the returned URL is not 'isNull()' as you suggest. That's probably better.
Comment 16 Antti Koivisto 2020-01-22 00:01:30 PST
Why can't you just return an invalid URL ("return URL()" that is) from completeURL? That already happens on parse failure so the case should be fully handled through the code. 

It would be the simplest solution and also makes logical sense as the URL is invalid in this context.
Comment 17 Brent Fulgham 2020-01-22 21:36:06 PST
Created attachment 388517 [details]
Patch
Comment 18 Darin Adler 2020-01-22 21:48:01 PST
Comment on attachment 388517 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=388517&action=review

> Source/WebCore/css/parser/CSSParserMode.h:88
> +    return false;

Could add ASSERT_NOT_REACHED here.

> Source/WebCore/dom/InlineStyleSheetOwner.cpp:62
> +        if (auto* parent = element.parentNode()) {
> +            if (is<VTTCueBox>(*parent))
> +                result.mode = WebVTTMode;
> +        }

This changes a lot of behavior, doesn’t just add the restrictions on external URLs, since this overwrites UASheetMode with a mode that is treated like HTMLStandardMode. Is all of that change desirable? If so, do we have test coverage for these desirable change?

> Source/WebCore/dom/Node.h:201
> +    virtual bool isWebVTTCueBoxElement() const { return false; }

Unnecessary to add the word "Element" to the function name here, since the class name is WebVTTCueBox.

> Source/WebCore/html/track/VTTCue.h:72
> +    bool isWebVTTCueBoxElement() const override { return true; }

Could be final rather than override.
Comment 19 Antti Koivisto 2020-01-23 05:18:29 PST
Comment on attachment 388517 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=388517&action=review

> Source/WebCore/dom/InlineStyleSheetOwner.cpp:61
> +    if (shadowRoot && shadowRoot->mode() == ShadowRootMode::UserAgent) {
>          result.mode = UASheetMode;
> +        if (auto* parent = element.parentNode()) {
> +            if (is<VTTCueBox>(*parent))
> +                result.mode = WebVTTMode;

What is this change trying to achieve? The bug is about parsing WebVTT STYLE blocks, those are handled by the change in WebVTTParser.
Comment 20 Brent Fulgham 2020-01-23 08:59:32 PST
Comment on attachment 388517 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=388517&action=review

>> Source/WebCore/css/parser/CSSParserMode.h:88
>> +    return false;
> 
> Could add ASSERT_NOT_REACHED here.

Will do.

>> Source/WebCore/dom/InlineStyleSheetOwner.cpp:62
>> +        }
> 
> This changes a lot of behavior, doesn’t just add the restrictions on external URLs, since this overwrites UASheetMode with a mode that is treated like HTMLStandardMode. Is all of that change desirable? If so, do we have test coverage for these desirable change?

If the Style element being processed is for a VTTCueBox, we need the CSSParserContext to operate in WebVTT mode. Otherwise it will take the CSS style that was approved by WebVTTParser.cpp and interpret it in UASheetMode, which will allow the URLs to be loaded.

This code is somewhat confusing to me, because the WebVTTParser class really seems to be more of a validator that accepts or rejects the provided style. If it accepts the style, it passes the string (not the parsed style object) to the rest of the system. Once we reach this point, we have a string representing CSS (complete with full URLs) that is then re-processed as UASheetMode.

The change here is meant to make the CSS handling honor the WebVTT restrictions.

Maybe a better approach would be for WebVTTParser to throw away the string it was passed, and generate a new string (which has been sanitized to match WebVTT requirements).

An even better approach would be for the WebVTTParser to produce a style object that could be used by later code, rather than this existing design that causes all WebVTT styles to be parsed two times.

>> Source/WebCore/html/track/VTTCue.h:72
>> +    bool isWebVTTCueBoxElement() const override { return true; }
> 
> Could be final rather than override.

Will do.
Comment 21 Brent Fulgham 2020-01-23 09:14:59 PST
Comment on attachment 388517 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=388517&action=review

>>> Source/WebCore/dom/InlineStyleSheetOwner.cpp:62
>>> +        }
>> 
>> This changes a lot of behavior, doesn’t just add the restrictions on external URLs, since this overwrites UASheetMode with a mode that is treated like HTMLStandardMode. Is all of that change desirable? If so, do we have test coverage for these desirable change?
> 
> If the Style element being processed is for a VTTCueBox, we need the CSSParserContext to operate in WebVTT mode. Otherwise it will take the CSS style that was approved by WebVTTParser.cpp and interpret it in UASheetMode, which will allow the URLs to be loaded.
> 
> This code is somewhat confusing to me, because the WebVTTParser class really seems to be more of a validator that accepts or rejects the provided style. If it accepts the style, it passes the string (not the parsed style object) to the rest of the system. Once we reach this point, we have a string representing CSS (complete with full URLs) that is then re-processed as UASheetMode.
> 
> The change here is meant to make the CSS handling honor the WebVTT restrictions.
> 
> Maybe a better approach would be for WebVTTParser to throw away the string it was passed, and generate a new string (which has been sanitized to match WebVTT requirements).
> 
> An even better approach would be for the WebVTTParser to produce a style object that could be used by later code, rather than this existing design that causes all WebVTT styles to be parsed two times.

I'm uploading a modified patch that changes the various tests for "mode == UASheetMode" to the existing predicate "isUASheetBehavior", which I've modified to understand WebVTTMode, too. This should prevent any fallout from that part of the change.

In practice, WebVTT styles use is fairly limited, so we probably wouldn't have seen much impact (the change would not make UASheetMode fail to work as expected), and it doesn't seem like WebVTT styles ever use things that are not HTMLStandardMode -- though I do notice that UASheetMode does not seem to allow deferred style, so making sure we enforce UASheetMode for WebVTTMode is probably wise.
Comment 22 Brent Fulgham 2020-01-23 09:22:38 PST
Created attachment 388554 [details]
Patch
Comment 23 Brent Fulgham 2020-01-23 15:32:56 PST
Created attachment 388602 [details]
Patch
Comment 24 Darin Adler 2020-01-23 18:23:02 PST
Comment on attachment 388602 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=388602&action=review

> Source/WebCore/css/StyleSheetContents.cpp:347
> +    p.parseSheet(this, sheetText, !isUASheetBehavior(parserContext().mode) ? CSSParser::RuleParsing::Deferred : CSSParser::RuleParsing::Normal);

Might want to reverse these two rather than having a ! in front of the function call. So easy to miss that.

> Source/WebCore/css/parser/CSSParserMode.h:47
>      // User agent stylesheets are parsed in standards mode but also allows internal properties and values.
> -    UASheetMode
> +    UASheetMode,
> +    // WebVTT places limitations on external resources.
> +    WebVTTMode

Why do we want to allow internal properties and values in a VTT style?

> Source/WebCore/html/track/WebVTTParser.cpp:376
> -    auto contents = StyleSheetContents::create();
> +    auto contents = StyleSheetContents::create(CSSParserContext(WebVTTMode));

This is now a much bigger change than before. It changes this to UA sheet behavior. Is there anything bad with that change?
Comment 25 Darin Adler 2020-01-26 21:25:26 PST
Comment on attachment 388602 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=388602&action=review

I’m going to say r=me but I do still have open questions listed above.

> Source/WebCore/css/parser/CSSParserContext.h:75
> +    URL completeURL(const String& url) const;

I think it’s a little strange that this function member is stuck in between two data members of this struct. I’d move it down so it’s last for style’s sake.

> Source/WebCore/css/parser/CSSParserContext.h:78
>  };

Should come back and fix some other things in this header. For example, I don’t think CSSParserContextHash::hash should be inlined in the header; should be moved to the .cpp file. And we should use WTF_MAKE_STRUCT_FAST_ALLOCATED so we don't need to say "public:".

>> Source/WebCore/css/parser/CSSParserMode.h:47
>> +    WebVTTMode
> 
> Why do we want to allow internal properties and values in a VTT style?

I really don’t understand why a WebVTT stylesheet is more like the user agent stylesheet in this respect. The special user-agent stylesheet only internal stuff seems like it should not be allowed in WebVTT.

> Source/WebCore/dom/InlineStyleSheetOwner.cpp:62
> +        if (auto* parent = element.parentNode()) {
> +            if (is<VTTCueBox>(*parent))
> +                result.mode = WebVTTMode;
> +        }

Seems too mysterious to do this without a comment.
Comment 26 Antti Koivisto 2020-01-27 01:05:23 PST
Comment on attachment 388602 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=388602&action=review

>> Source/WebCore/dom/InlineStyleSheetOwner.cpp:62
>>          result.mode = UASheetMode;
>> +        if (auto* parent = element.parentNode()) {
>> +            if (is<VTTCueBox>(*parent))
>> +                result.mode = WebVTTMode;
>> +        }
> 
> Seems too mysterious to do this without a comment.

This is too hackish to put on a core code path. I think a better approach is to re-serialize the parsed stylesheet with data URLs cleaned up.

Brent, how about if I finish this patch?
Comment 27 Antti Koivisto 2020-01-27 05:18:30 PST
Created attachment 388849 [details]
patch
Comment 28 Antti Koivisto 2020-01-27 06:30:27 PST
Created attachment 388852 [details]
patch
Comment 29 Antti Koivisto 2020-01-27 06:43:52 PST
Created attachment 388855 [details]
patch
Comment 30 Brent Fulgham 2020-01-27 09:26:14 PST
Comment on attachment 388855 [details]
patch

Yes, this is much better. r=me.
Comment 31 Antti Koivisto 2020-01-27 09:41:15 PST
https://trac.webkit.org/r255151
Comment 32 Darin Adler 2020-01-27 12:31:26 PST
Comment on attachment 388855 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=388855&action=review

> Source/WebCore/html/track/WebVTTParser.cpp:416
> +        sanitizedStyleSheetBuilder.append(selectorText);
> +        sanitizedStyleSheetBuilder.appendLiteral(" { ");
> +        sanitizedStyleSheetBuilder.append(styleRule.properties().asText());
> +        sanitizedStyleSheetBuilder.appendLiteral(" }\n");

This can be done on a single line:

    sanitizedStyleSheetBuilder.append(selectorText, " { ", styleRule.properties().asText(), "  }\n");
Comment 33 Antti Koivisto 2020-01-27 23:27:23 PST
> This can be done on a single line:
> 
>     sanitizedStyleSheetBuilder.append(selectorText, " { ",
> styleRule.properties().asText(), "  }\n");

Fixed in https://trac.webkit.org/r255227