42061 – Make base64Decode ignore unrecognizable characters

RESOLVED DUPLICATE of bug 41510

https://bugs.webkit.org/show_bug.cgi?id=42061

Kwang Yul Seo

Reported 2010-07-12 01:52:27 PDT

Currently, base64Decode returns false immediately when it encounters an unknown character. However, RFC 2045 states: "All line breaks or other characters not found in Table 1 must be ignored by decoding software." Change base64Decode to ignore unrecognizable characters.

This topic was discussed at http://www.mail-archive.com/webkit-dev@lists.webkit.org/msg04525.html

Attachments
Patch (2.02 KB, patch), 2010-07-12 01:54 PDT, Kwang Yul Seo. Flags: darin: review-, darin: commit-queue-

Kwang Yul Seo

Comment 1 2010-07-12 01:54:26 PDT

Created attachment 61199: Patch

Alexey Proskuryakov

Comment 2 2010-07-12 11:45:34 PDT

See also: bug 41510 and bug 23566. Let's decide which one to keep.

Patrick R. Gansterer

Comment 3 2010-07-12 12:21:57 PDT

(In reply to comment #2)
> See also: bug 41510 and bug 23566. Let's decide which one to keep.

This solution breaks window.atob().

Darin Adler

Comment 4 2010-07-12 12:35:58 PDT

(In reply to comment #3)
> This solution breaks window.atob().

I think what you mean is that this patch breaks window.atob(). The general approach does not have to do so, if the patch is done differently so atob follows a different code path.

Darin Adler

Comment 5 2010-07-12 14:15:16 PDT

Comment on attachment 61199: Patch

Do we want this behavior for all callers of the function? RFC 2045 is specifically for the Base64 Content-Transfer-Encoding in MIME, not specifically for data URLs or for the window.atob function.

Kwang Yul Seo

Comment 6 2010-07-12 22:04:17 PDT

(In reply to comment #5)
> (From update of attachment 61199)
> Do we want this behavior for all callers of the function? RFC 2045 is specifically for the Base64 Content-Transfer-Encoding in MIME, not specifically for data URLs or for the window.atob function.

RFC 3548 states:

2.3. Interpretation of non-alphabet characters in encoded data

Base encodings use a specific, reduced, alphabet to encode binary data. Non alphabet characters could exist within base encoded data, caused by data corruption or by design. Non alphabet characters may be exploited as a "covert channel", where non-protocol data can be sent for nefarious purposes. Non alphabet characters might also be sent in order to exploit implementation errors leading to, e.g., buffer overflow attacks.

Implementations MUST reject the encoding if it contains characters outside the base alphabet when interpreting base encoded data, unless the specification referring to this document explicitly states otherwise. Such specifications may, as MIME does, instead state that characters outside the base encoding alphabet should simply be ignored when interpreting data ("be liberal in what you accept"). Note that this means that any CRLF constitute "non alphabet characters" and are ignored. Furthermore, such specifications may consider the pad character, "=", as not part of the base alphabet until the end of the string. If more than the allowed number of pad characters are found at the end of the string, e.g., a base 64 string terminated with "===", the excess pad characters could be ignored.

According to the specification, we must not ignore unexpected characters in the general case.
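The default rule quoted from RFC 3548 (reject anything outside the base alphabet) can be illustrated with a small standalone check. This is only a sketch: the function names below are hypothetical and are not part of WebKit or of the attached patch, and the check looks at characters only, not at length or pad count relative to the data.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true if c is one of the 64 characters of the
// standard base64 alphabet (A-Z, a-z, 0-9, '+', '/').
bool isBase64AlphabetChar(char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
        || (c >= '0' && c <= '9') || c == '+' || c == '/';
}

// Hypothetical strict check in the spirit of RFC 3548 section 2.3:
// every character must be in the alphabet, except for at most two
// pad characters ('=') at the very end of the string.
bool containsOnlyBase64Characters(const std::string& in)
{
    std::size_t end = in.size();
    std::size_t pads = 0;
    // Strip up to two trailing pad characters.
    while (end > 0 && in[end - 1] == '=' && pads < 2) {
        --end;
        ++pads;
    }
    // Anything left that is not in the alphabet (including CR, LF,
    // or a third '=') makes the input invalid under the strict rule.
    for (std::size_t i = 0; i < end; ++i) {
        if (!isBase64AlphabetChar(in[i]))
            return false;
    }
    return true;
}
```

Under this strict reading, "TW\r\nFu" is rejected because CR and LF count as non-alphabet characters, whereas a MIME decoder following RFC 2045 would skip them.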

Darin Adler

Comment 7 2010-07-13 09:34:39 PDT

Comment on attachment 61199: Patch

Thanks for doing the research. We will need to pass an argument to base64Decode indicating what behavior the caller wants.

Patrick R. Gansterer

Comment 8 2010-07-13 10:16:38 PDT

(In reply to comment #7)
> (From update of attachment 61199)
> Thanks for doing the research. We will need to pass an argument to base64Decode indicating what behavior the caller wants.

I think that's exactly what the patch at bug 41510 does.

Darin Adler

Comment 9 2010-07-13 10:27:55 PDT

(In reply to comment #8)
> (In reply to comment #7)
> > (From update of attachment 61199)
> > Thanks for doing the research. We will need to pass an argument to base64Decode indicating what behavior the caller wants.
> I think thats exactly what the patch at bug 41510 does.

Well, it introduces an argument, yes, but it does not ignore all unrecognizable characters. In bug 41510 we have been discussing the merits of ignoring all unrecognizable characters (matching the data URL RFC, I believe) vs. instead ignoring only certain characters (matching some other browsers’ behavior with data URLs). If I remember correctly.

Patrick R. Gansterer

Comment 10 2010-07-13 12:35:31 PDT

(In reply to comment #9)
> Well, it introduces an argument, yes, but it does not ignore all unrecognizable characters. In bug 41510 we have been discussing the merits of ignoring all unrecognizable characters (matching the data URL RFC, I believe) vs. instead ignoring only certain characters (matching some other browsers’ behavior with data URLs). If I remember correctly.

The current version has an enum:

enum Base64DecodePolicy { FailOnInvalidCharacter, IgnoreWhitespace, IgnoreInvalidCharacters };
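To make the three policies concrete, here is a minimal sketch of how a decoder might branch on such an argument. Only the enum and its enumerator names come from the comment above; the function name `base64DecodeSketch`, its signature, the helper functions, and the choice of whitespace characters are assumptions for illustration and are not the code from bug 41510. Padding is handled loosely (every '=' is skipped), which a real decoder would need to validate.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Enumerator names as quoted in this comment; everything else is hypothetical.
enum Base64DecodePolicy { FailOnInvalidCharacter, IgnoreWhitespace, IgnoreInvalidCharacters };

// Returns the 6-bit value of a base64 alphabet character, or -1 otherwise.
int base64Value(char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

// Assumption: which characters count as whitespace is chosen here for illustration.
bool isSketchWhitespace(char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\f';
}

// Decodes `in` into `out`; returns false if the policy rejects a character.
bool base64DecodeSketch(const std::string& in, std::vector<std::uint8_t>& out, Base64DecodePolicy policy)
{
    out.clear();
    std::uint32_t buffer = 0; // accumulates 6-bit groups; only the low bits are used
    int bits = 0;             // number of not-yet-emitted bits in buffer
    for (char c : in) {
        if (c == '=')
            continue; // simplification: padding is skipped, not validated
        int value = base64Value(c);
        if (value < 0) {
            if (policy == IgnoreInvalidCharacters)
                continue;
            if (policy == IgnoreWhitespace && isSketchWhitespace(c))
                continue;
            out.clear();
            return false; // FailOnInvalidCharacter, or non-whitespace garbage
        }
        buffer = (buffer << 6) | static_cast<std::uint32_t>(value);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out.push_back(static_cast<std::uint8_t>((buffer >> bits) & 0xFF));
        }
    }
    return true;
}
```

With this shape, a caller such as window.atob could pass the strict policy while a MIME or data URL caller could pass one of the lenient ones, which is the split Darin Adler asks for in comment 7.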

Kwang Yul Seo

Comment 11 2010-09-05 15:10:49 PDT

*** This bug has been marked as a duplicate of bug 41510 ***

Status RESOLVED

Resolution DUPLICATE of bug 41510

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware All

OS All

Product WebKit

Component Platform

Assignee

Nobody

Reported

2010-07-12 01:52 PDT

Modified

2010-09-05 15:10 PDT

CC List

2 users
