Bug 42061

Summary: Make base64Decode ignore unrecognizable characters
Product: WebKit
Reporter: Kwang Yul Seo <skyul>
Component: Platform
Assignee: Nobody <webkit-unassigned>
Status: RESOLVED DUPLICATE
Severity: Normal
CC: darin, paroga
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Attachments:
Description Flags
Patch darin: review-, darin: commit-queue-

Kwang Yul Seo
Reported 2010-07-12 01:52:27 PDT
Currently, base64Decode returns false immediately when it encounters an unknown character. However, RFC 2045 states: "All line breaks or other characters not found in Table 1 must be ignored by decoding software." Change base64Decode to ignore unrecognizable characters. This topic was discussed in http://www.mail-archive.com/webkit-dev@lists.webkit.org/msg04525.html
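For illustration, the RFC 2045 rule cited in this description can be sketched as a pre-filter that discards everything outside the Base64 alphabet before decoding. This is a minimal sketch, not the attached patch and not WebKit code: `stripIgnorableCharacters` and `isBase64AlphabetOrPad` are hypothetical names, and keeping the `=` pad character is an assumption of the sketch.

```cpp
#include <string>

// Hypothetical helper (not a WebKit function): true for the 64 alphabet
// characters of RFC 2045 Table 1, plus the '=' pad character.
static bool isBase64AlphabetOrPad(char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
        || (c >= '0' && c <= '9') || c == '+' || c == '/' || c == '=';
}

// Hypothetical pre-filter: drops line breaks and any other character that a
// MIME decoder is told to ignore, so a strict decoder can consume the rest.
std::string stripIgnorableCharacters(const std::string& input)
{
    std::string filtered;
    filtered.reserve(input.size());
    for (char c : input) {
        if (isBase64AlphabetOrPad(c))
            filtered.push_back(c);
    }
    return filtered;
}
```

With this filter, a line-wrapped MIME body such as "aGVs\r\nbG8=" is reduced to "aGVsbG8=" before it reaches the decoder.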
Attachments
Patch (2.02 KB, patch)
2010-07-12 01:54 PDT, Kwang Yul Seo
darin: review-
darin: commit-queue-
Kwang Yul Seo
Comment 1 2010-07-12 01:54:26 PDT
Alexey Proskuryakov
Comment 2 2010-07-12 11:45:34 PDT
See also: bug 41510 and bug 23566. Let's decide which one to keep.
Patrick R. Gansterer
Comment 3 2010-07-12 12:21:57 PDT
(In reply to comment #2)
> See also: bug 41510 and bug 23566. Let's decide which one to keep.

This solution breaks window.atob().
Darin Adler
Comment 4 2010-07-12 12:35:58 PDT
(In reply to comment #3)
> This solution breaks window.atob().

I think what you mean is that this patch breaks window.atob(). The general approach does not have to do so, if the patch is done differently so atob follows a different code path.
Darin Adler
Comment 5 2010-07-12 14:15:16 PDT
Comment on attachment 61199: Patch

Do we want this behavior for all callers of the function? RFC 2045 is specifically for the Base64 Content-Transfer-Encoding in MIME, not specifically for data URLs or for the window.atob function.
Kwang Yul Seo
Comment 6 2010-07-12 22:04:17 PDT
(In reply to comment #5)
> (From update of attachment 61199)
> Do we want this behavior for all callers of the function? RFC 2045 is specifically for the Base64 Content-Transfer-Encoding in MIME, not specifically for data URLs or for the window.atob function.

RFC 3548 states:

2.3. Interpretation of non-alphabet characters in encoded data

Base encodings use a specific, reduced, alphabet to encode binary data. Non alphabet characters could exist within base encoded data, caused by data corruption or by design. Non alphabet characters may be exploited as a "covert channel", where non-protocol data can be sent for nefarious purposes. Non alphabet characters might also be sent in order to exploit implementation errors leading to, e.g., buffer overflow attacks.

Implementations MUST reject the encoding if it contains characters outside the base alphabet when interpreting base encoded data, unless the specification referring to this document explicitly states otherwise. Such specifications may, as MIME does, instead state that characters outside the base encoding alphabet should simply be ignored when interpreting data ("be liberal in what you accept"). Note that this means that any CRLF constitute "non alphabet characters" and are ignored. Furthermore, such specifications may consider the pad character, "=", as not part of the base alphabet until the end of the string. If more than the allowed number of pad characters are found at the end of the string, e.g., a base 64 string terminated with "===", the excess pad characters could be ignored.

According to the specification, we must not ignore unexpected characters in the general case.
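As a rough illustration of the strict default in RFC 3548 section 2.3, the check below accepts only alphabet characters followed by trailing padding. It is a sketch under stated assumptions, not WebKit code: `passesStrictCheck` and `isBase64Alphabet` are hypothetical names, and rejecting more than two pad characters is a choice made here (the RFC says the excess "could be ignored").

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true only for the 64 characters of the base alphabet.
// The pad character '=' is deliberately excluded.
static bool isBase64Alphabet(char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
        || (c >= '0' && c <= '9') || c == '+' || c == '/';
}

// Hypothetical strict check: '=' is accepted only as trailing padding, and
// every other character must belong to the base alphabet.
bool passesStrictCheck(const std::string& input)
{
    std::size_t end = input.size();
    std::size_t padCount = 0;
    while (end > 0 && input[end - 1] == '=') {
        --end;
        ++padCount;
    }
    // Assumption of this sketch: more than two pad characters is an error.
    if (padCount > 2)
        return false;
    for (std::size_t i = 0; i < end; ++i) {
        if (!isBase64Alphabet(input[i]))
            return false;
    }
    return true;
}
```

Under this check a CRLF inside the data is a rejection, which is the behavior a caller such as window.atob() might want, while a MIME caller would opt out of it.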
Darin Adler
Comment 7 2010-07-13 09:34:39 PDT
Comment on attachment 61199: Patch

Thanks for doing the research. We will need to pass an argument to base64Decode indicating what behavior the caller wants.
Patrick R. Gansterer
Comment 8 2010-07-13 10:16:38 PDT
(In reply to comment #7)
> (From update of attachment 61199)
> Thanks for doing the research. We will need to pass an argument to base64Decode indicating what behavior the caller wants.

I think that's exactly what the patch at bug 41510 does.
Darin Adler
Comment 9 2010-07-13 10:27:55 PDT
(In reply to comment #8)
> (In reply to comment #7)
> > (From update of attachment 61199)
> > Thanks for doing the research. We will need to pass an argument to base64Decode indicating what behavior the caller wants.
> I think that's exactly what the patch at bug 41510 does.

Well, it introduces an argument, yes, but it does not ignore all unrecognizable characters. In bug 41510 we have been discussing the merits of ignoring all unrecognizable characters (matching the data URL RFC, I believe) vs. instead ignoring only certain characters (matching some other browsers’ behavior with data URLs). If I remember correctly.
Patrick R. Gansterer
Comment 10 2010-07-13 12:35:31 PDT
(In reply to comment #9)
> Well, it introduces an argument, yes, but it does not ignore all unrecognizable characters. In bug 41510 we have been discussing the merits of ignoring all unrecognizable characters (matching the data URL RFC, I believe) vs. instead ignoring only certain characters (matching some other browsers’ behavior with data URLs). If I remember correctly.

The current version of that patch has an enum:

enum Base64DecodePolicy { FailOnInvalidCharacter, IgnoreWhitespace, IgnoreInvalidCharacters };
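A caller-selected policy of this kind might look roughly like the sketch below. Only the enum and its three value names come from this thread. The function `base64DecodeSketch`, its signature, the helpers, and the exact meaning given to each policy (inferred from the value names) are assumptions for illustration, not the patch at bug 41510.

```cpp
#include <cstdint>
#include <string>
#include <vector>

enum Base64DecodePolicy { FailOnInvalidCharacter, IgnoreWhitespace, IgnoreInvalidCharacters };

// Hypothetical helper: maps an alphabet character to its 6-bit value,
// or returns -1 for anything outside the alphabet.
static int sextetValue(char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

static bool isAsciiWhitespace(char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\f';
}

// Hypothetical decoder: the caller picks how non-alphabet input is treated.
bool base64DecodeSketch(const std::string& in, std::vector<std::uint8_t>& out, Base64DecodePolicy policy)
{
    out.clear();
    std::uint32_t buffer = 0; // accumulates 6-bit groups; only the low bits are read
    int bits = 0;             // number of not-yet-emitted bits in buffer
    bool sawPadding = false;
    for (char c : in) {
        if (c == '=') {
            sawPadding = true;
            continue;
        }
        int value = sextetValue(c);
        if (value < 0) {
            if (policy == IgnoreInvalidCharacters)
                continue;
            if (policy == IgnoreWhitespace && isAsciiWhitespace(c))
                continue;
            return false;
        }
        // Simplification of this sketch: data after padding fails under every policy.
        if (sawPadding)
            return false;
        buffer = (buffer << 6) | static_cast<std::uint32_t>(value);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out.push_back(static_cast<std::uint8_t>((buffer >> bits) & 0xFF));
        }
    }
    // Six leftover bits mean a dangling character that cannot form a byte.
    return bits != 6;
}
```

In this arrangement a strict caller such as window.atob() would pass FailOnInvalidCharacter, while a MIME caller would pass IgnoreInvalidCharacters; which policy data URLs should use is the open question discussed in bug 41510.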
Kwang Yul Seo
Comment 11 2010-09-05 15:10:49 PDT
*** This bug has been marked as a duplicate of bug 41510 ***