Bug 184529 - atob() should not accept a vertical tab
Summary: atob() should not accept a vertical tab
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: DOM (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Rob Buis
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2018-04-12 03:00 PDT by Anne van Kesteren
Modified: 2020-05-03 16:47 PDT (History)
7 users (show)

See Also:


Attachments
Patch (8.35 KB, patch)
2020-05-01 12:47 PDT, Rob Buis
no flags Details | Formatted Diff | Diff
Patch (8.34 KB, patch)
2020-05-01 23:51 PDT, Rob Buis
no flags Details | Formatted Diff | Diff
Patch (9.18 KB, patch)
2020-05-02 01:32 PDT, Rob Buis
no flags Details | Formatted Diff | Diff
Patch (9.20 KB, patch)
2020-05-02 03:03 PDT, Rob Buis
no flags Details | Formatted Diff | Diff
Patch (10.77 KB, patch)
2020-05-02 04:00 PDT, Rob Buis
no flags Details | Formatted Diff | Diff
Patch (9.33 KB, patch)
2020-05-02 22:21 PDT, Rob Buis
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Anne van Kesteren 2018-04-12 03:00:21 PDT
Run html/webappapis/atob/base64.html in web-platform-tests after applying https://github.com/w3c/web-platform-tests/pull/10441 to reproduce this.
Comment 1 Rob Buis 2020-05-01 12:47:48 PDT
Created attachment 398219 [details]
Patch
Comment 2 Darin Adler 2020-05-01 18:42:53 PDT
Comment on attachment 398219 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=398219&action=review

This change is OK, but I think it’s possibly needed because of a mistake.

> Source/WTF/wtf/text/Base64.cpp:213
> +                || (!isSpaceOrNewline(ch) || ((options & Base64DiscardVerticalTab) && ch == '\v'))) {

I don’t understand why we ever use the full-Unicode isSpaceOrNewline function. In what context do we need to allow non-ASCII spaces? Do we have cross-browser tests checking this behavior? What spaces exactly do we need to ignore? Does any caller need ignore vertical tabs?
Comment 3 Rob Buis 2020-05-01 23:51:16 PDT
Created attachment 398281 [details]
Patch
Comment 4 Rob Buis 2020-05-02 01:32:51 PDT
Created attachment 398283 [details]
Patch
Comment 5 Rob Buis 2020-05-02 03:03:16 PDT
Created attachment 398285 [details]
Patch
Comment 6 Rob Buis 2020-05-02 04:00:41 PDT
Created attachment 398286 [details]
Patch
Comment 7 Rob Buis 2020-05-02 22:21:03 PDT
Created attachment 398304 [details]
Patch
Comment 8 Rob Buis 2020-05-03 02:35:37 PDT
Comment on attachment 398219 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=398219&action=review

>> Source/WTF/wtf/text/Base64.cpp:213
>> +                || (!isSpaceOrNewline(ch) || ((options & Base64DiscardVerticalTab) && ch == '\v'))) {
> 
> I don’t understand why we ever use the full-Unicode isSpaceOrNewline function. In what context do we need to allow non-ASCII spaces? Do we have cross-browser tests checking this behavior? What spaces exactly do we need to ignore? Does any caller need ignore vertical tabs?

You were right, this was not handling unicode well. I added a test for it and made sure we handle both whitespace and non-whitespace unicode correctly. Also upstreaming the test (https://github.com/web-platform-tests/wpt/pull/23377).
Comment 9 EWS 2020-05-03 02:53:59 PDT
Committed r261061: <https://trac.webkit.org/changeset/261061>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 398304 [details].
Comment 10 Radar WebKit Bug Importer 2020-05-03 02:54:14 PDT
<rdar://problem/62795464>
Comment 11 Darin Adler 2020-05-03 16:47:29 PDT
Comment on attachment 398304 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=398304&action=review

Really nice.

> Source/WTF/wtf/text/Base64.cpp:213
> +                || (!isLatin1(ch) || !isASCIISpace(ch) || ((options & Base64DiscardVerticalTab) && ch == '\v'))) {

Some thoughts about further refinements in the future:

1) We do not need the isLatin1 check here. The isASCIISpace function is guaranteed to return false. It’s not a function that implements "is whitespace if ASCII", but rather one that implements "is ASCII and whitespace".

2) Separately, regarding vertical tab, it’s worth considering that we have these functions:

    isASCIISpace: Based on the traditions of the C language <ctype.h>, thus includes vertical tab (U+000B); not sure how often this is useful in WebKit. At the time I created this I was just trying to make an isspace function that was never affected by the C standard library locale, and that consistently returned false for non-ASCII characters when passed wider character types.

    isHTMLSpace: Based on the HTML specification, does not include vertical tab, otherwise the is identical to isASCIISpace. Very useful in WebKit!

    isHTTPSpace: Based on the HTTP space, does not include form feed (U+000C), otherwise identical to isHTTPSpace.

I guess this means that at some point I would like to rename Base64DiscardVerticalTab to something like Base64UseLegacyASCIISpaceDefinition.

> Source/WTF/wtf/text/Base64.h:46
> +    Base64DiscardVerticalTab = 1 << 2,

This name doesn’t make it clear that it modifies Base64IgnoreSpacesAndNewLines and has no effect if that is not specified. Maybe we should rename it it Base64TreatVerticalTabAsSpace.

> Source/WebCore/page/Base64Utilities.cpp:50
> +    if (!base64Decode(encodedString, out, Base64ValidatePadding | Base64IgnoreSpacesAndNewLines | Base64DiscardVerticalTab))

Do we really need this vertical tab handling for interoperability or compatibility? We should find out. I would love to drop it.

> Source/WebCore/platform/network/DataURLDecoder.cpp:162
> +        if (!base64Decode(unescapedString, buffer, Base64IgnoreSpacesAndNewLines | Base64DiscardVerticalTab))

Do we really need this vertical tab handling for interoperability or compatibility? We should find out. I would love to drop it.