Bug 184529

Summary: atob() should not accept a vertical tab
Product: WebKit Reporter: Anne van Kesteren <annevk>
Component: DOMAssignee: Rob Buis <rbuis>
Status: RESOLVED FIXED    
Severity: Normal CC: benjamin, cdumez, cmarcelo, darin, ews-watchlist, rbuis, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch none

Anne van Kesteren
Reported 2018-04-12 03:00:21 PDT
Run html/webappapis/atob/base64.html in web-platform-tests after applying https://github.com/w3c/web-platform-tests/pull/10441 to reproduce this.
Attachments
Patch (8.35 KB, patch)
2020-05-01 12:47 PDT, Rob Buis
no flags
Patch (8.34 KB, patch)
2020-05-01 23:51 PDT, Rob Buis
no flags
Patch (9.18 KB, patch)
2020-05-02 01:32 PDT, Rob Buis
no flags
Patch (9.20 KB, patch)
2020-05-02 03:03 PDT, Rob Buis
no flags
Patch (10.77 KB, patch)
2020-05-02 04:00 PDT, Rob Buis
no flags
Patch (9.33 KB, patch)
2020-05-02 22:21 PDT, Rob Buis
no flags
Rob Buis
Comment 1 2020-05-01 12:47:48 PDT
Darin Adler
Comment 2 2020-05-01 18:42:53 PDT
Comment on attachment 398219 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=398219&action=review This change is OK, but I think it’s possibly needed because of a mistake. > Source/WTF/wtf/text/Base64.cpp:213 > + || (!isSpaceOrNewline(ch) || ((options & Base64DiscardVerticalTab) && ch == '\v'))) { I don’t understand why we ever use the full-Unicode isSpaceOrNewline function. In what context do we need to allow non-ASCII spaces? Do we have cross-browser tests checking this behavior? What spaces exactly do we need to ignore? Does any caller need ignore vertical tabs?
Rob Buis
Comment 3 2020-05-01 23:51:16 PDT
Rob Buis
Comment 4 2020-05-02 01:32:51 PDT
Rob Buis
Comment 5 2020-05-02 03:03:16 PDT
Rob Buis
Comment 6 2020-05-02 04:00:41 PDT
Rob Buis
Comment 7 2020-05-02 22:21:03 PDT
Rob Buis
Comment 8 2020-05-03 02:35:37 PDT
Comment on attachment 398219 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=398219&action=review >> Source/WTF/wtf/text/Base64.cpp:213 >> + || (!isSpaceOrNewline(ch) || ((options & Base64DiscardVerticalTab) && ch == '\v'))) { > > I don’t understand why we ever use the full-Unicode isSpaceOrNewline function. In what context do we need to allow non-ASCII spaces? Do we have cross-browser tests checking this behavior? What spaces exactly do we need to ignore? Does any caller need ignore vertical tabs? You were right, this was not handling unicode well. I added a test for it and made sure we handle both whitespace and non-whitespace unicode correctly. Also upstreaming the test (https://github.com/web-platform-tests/wpt/pull/23377).
EWS
Comment 9 2020-05-03 02:53:59 PDT
Committed r261061: <https://trac.webkit.org/changeset/261061> All reviewed patches have been landed. Closing bug and clearing flags on attachment 398304 [details].
Radar WebKit Bug Importer
Comment 10 2020-05-03 02:54:14 PDT
Darin Adler
Comment 11 2020-05-03 16:47:29 PDT
Comment on attachment 398304 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=398304&action=review Really nice. > Source/WTF/wtf/text/Base64.cpp:213 > + || (!isLatin1(ch) || !isASCIISpace(ch) || ((options & Base64DiscardVerticalTab) && ch == '\v'))) { Some thoughts about further refinements in the future: 1) We do not need the isLatin1 check here. The isASCIISpace function is guaranteed to return false. It’s not a function that implements "is whitespace if ASCII", but rather one that implements "is ASCII and whitespace". 2) Separately, regarding vertical tab, it’s worth considering that we have these functions: isASCIISpace: Based on the traditions of the C language <ctype.h>, thus includes vertical tab (U+000B); not sure how often this is useful in WebKit. At the time I created this I was just trying to make an isspace function that was never affected by the C standard library locale, and that consistently returned false for non-ASCII characters when passed wider character types. isHTMLSpace: Based on the HTML specification, does not include vertical tab, otherwise the is identical to isASCIISpace. Very useful in WebKit! isHTTPSpace: Based on the HTTP space, does not include form feed (U+000C), otherwise identical to isHTTPSpace. I guess this means that at some point I would like to rename Base64DiscardVerticalTab to something like Base64UseLegacyASCIISpaceDefinition. > Source/WTF/wtf/text/Base64.h:46 > + Base64DiscardVerticalTab = 1 << 2, This name doesn’t make it clear that it modifies Base64IgnoreSpacesAndNewLines and has no effect if that is not specified. Maybe we should rename it it Base64TreatVerticalTabAsSpace. > Source/WebCore/page/Base64Utilities.cpp:50 > + if (!base64Decode(encodedString, out, Base64ValidatePadding | Base64IgnoreSpacesAndNewLines | Base64DiscardVerticalTab)) Do we really need this vertical tab handling for interoperability or compatibility? We should find out. I would love to drop it. > Source/WebCore/platform/network/DataURLDecoder.cpp:162 > + if (!base64Decode(unescapedString, buffer, Base64IgnoreSpacesAndNewLines | Base64DiscardVerticalTab)) Do we really need this vertical tab handling for interoperability or compatibility? We should find out. I would love to drop it.
Note You need to log in before you can comment on or make changes to this bug.