Bug 42061 - Make base64Decode ignore unrecognizable characters
Summary: Make base64Decode ignore unrecognizable characters
Status: RESOLVED DUPLICATE of bug 41510
Alias: None
Product: WebKit
Classification: Unclassified
Component: Platform
Version: 528+ (Nightly build)
Hardware: All All
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-07-12 01:52 PDT by Kwang Yul Seo
Modified: 2010-09-05 15:10 PDT
CC List: 2 users

See Also:


Attachments
Patch (2.02 KB, patch)
2010-07-12 01:54 PDT, Kwang Yul Seo
darin: review-
darin: commit-queue-

Description Kwang Yul Seo 2010-07-12 01:52:27 PDT
Currently, base64Decode returns false immediately when it encounters an unknown character. However, RFC 2045 states: "All line breaks or other characters not found in Table 1 must be ignored by decoding software." Change base64Decode to ignore unrecognizable characters.

This topic was discussed in http://www.mail-archive.com/webkit-dev@lists.webkit.org/msg04525.html
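
For illustration, the RFC 2045 rule quoted above amounts to dropping every character outside Table 1 before decoding. The following is only a minimal sketch of that filtering step, not the attached patch; the names `isBase64TableCharacter` and `stripNonBase64Characters` are hypothetical, and keeping the pad character '=' is an assumption of the sketch.

```cpp
#include <string>

// True if c is one of the 64 alphabet characters of RFC 2045 Table 1.
// Assumption of this sketch: the pad character '=' is kept as well.
static bool isBase64TableCharacter(char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
        || (c >= '0' && c <= '9') || c == '+' || c == '/' || c == '=';
}

// Hypothetical helper: returns a copy of input with every character that is
// not in the table (line breaks, spaces, anything else) removed.
std::string stripNonBase64Characters(const std::string& input)
{
    std::string result;
    result.reserve(input.size());
    for (std::string::size_type i = 0; i < input.size(); ++i) {
        if (isBase64TableCharacter(input[i]))
            result.push_back(input[i]);
    }
    return result;
}
```

With such a filter, an input like "QU\r\nJD" would reach the decoder as "QUJD" instead of causing an immediate failure.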
Comment 1 Kwang Yul Seo 2010-07-12 01:54:26 PDT
Created attachment 61199
Patch
Comment 2 Alexey Proskuryakov 2010-07-12 11:45:34 PDT
See also: bug 41510 and bug 23566. Let's decide which one to keep.
Comment 3 Patrick R. Gansterer 2010-07-12 12:21:57 PDT
(In reply to comment #2)
> See also: bug 41510 and bug 23566. Let's decide which one to keep.
This solution breaks window.atob().
Comment 4 Darin Adler 2010-07-12 12:35:58 PDT
(In reply to comment #3)
> This solution breaks window.atob().

I think what you mean is that this patch breaks window.atob(). The general approach need not do so if the patch is written so that atob follows a different code path.
Comment 5 Darin Adler 2010-07-12 14:15:16 PDT
Comment on attachment 61199
Patch

Do we want this behavior for all callers of the function? RFC 2045 is specifically for the Base64 Content-Transfer-Encoding in MIME, not specifically for data URLs or for the window.atob function.
Comment 6 Kwang Yul Seo 2010-07-12 22:04:17 PDT
(In reply to comment #5)
> (From update of attachment 61199)
> Do we want this behavior for all callers of the function? RFC 2045 is specifically for the Base64 Content-Transfer-Encoding in MIME, not specifically for data URLs or for the window.atob function.

RFC 3548 states:

2.3.  Interpretation of non-alphabet characters in encoded data

   Base encodings use a specific, reduced, alphabet to encode binary
   data.  Non alphabet characters could exist within base encoded data,
   caused by data corruption or by design.  Non alphabet characters may
   be exploited as a "covert channel", where non-protocol data can be
   sent for nefarious purposes.  Non alphabet characters might also be
   sent in order to exploit implementation errors leading to, e.g.,
   buffer overflow attacks.

   Implementations MUST reject the encoding if it contains characters
   outside the base alphabet when interpreting base encoded data, unless
   the specification referring to this document explicitly states
   otherwise.  Such specifications may, as MIME does, instead state that
   characters outside the base encoding alphabet should simply be
   ignored when interpreting data ("be liberal in what you accept").
   Note that this means that any CRLF constitute "non alphabet
   characters" and are ignored.  Furthermore, such specifications may
   consider the pad character, "=", as not part of the base alphabet
   until the end of the string.  If more than the allowed number of pad
   characters are found at the end of the string, e.g., a base 64 string
   terminated with "===", the excess pad characters could be ignored.


According to the specification, we must not ignore unexpected characters in the general case.
Comment 7 Darin Adler 2010-07-13 09:34:39 PDT
Comment on attachment 61199
Patch

Thanks for doing the research. We will need to pass an argument to base64Decode indicating what behavior the caller wants.
Comment 8 Patrick R. Gansterer 2010-07-13 10:16:38 PDT
(In reply to comment #7)
> (From update of attachment 61199)
> Thanks for doing the research. We will need to pass an argument to base64Decode indicating what behavior the caller wants.
I think that's exactly what the patch at bug 41510 does.
Comment 9 Darin Adler 2010-07-13 10:27:55 PDT
(In reply to comment #8)
> (In reply to comment #7)
> > (From update of attachment 61199)
> > Thanks for doing the research. We will need to pass an argument to base64Decode indicating what behavior the caller wants.
> I think that's exactly what the patch at bug 41510 does.

Well, it introduces an argument, yes, but it does not ignore all unrecognizable characters. In bug 41510 we have been discussing the merits of ignoring all unrecognizable characters (matching the data URL RFC, I believe) vs. instead ignoring only certain characters (matching some other browsers’ behavior with data URLs). If I remember correctly.
Comment 10 Patrick R. Gansterer 2010-07-13 12:35:31 PDT
(In reply to comment #9)
> Well, it introduces an argument, yes, but it does not ignore all unrecognizable characters. In bug 41510 we have been discussing the merits of ignoring all unrecognizable characters (matching the data URL RFC, I believe) vs. instead ignoring only certain characters (matching some other browsers’ behavior with data URLs). If I remember correctly.

The current version of that patch has an enum: Base64DecodePolicy { FailOnInvalidCharacter, IgnoreWhitespace, IgnoreInvalidCharacters };
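
As a rough sketch of how such a policy argument could steer decoding: only the enum and its three values come from the comment above. The function `decodeWithPolicy`, its helpers, and the set of characters treated as whitespace are hypothetical, and this is not the code from bug 41510. Padding is handled loosely (any number of '=' is accepted, but alphabet characters after a '=' are rejected).

```cpp
#include <string>

// Enum as quoted in the comment above.
enum Base64DecodePolicy { FailOnInvalidCharacter, IgnoreWhitespace, IgnoreInvalidCharacters };

// Maps a character to its 6-bit value, or -1 if it is outside the alphabet.
static int base64Value(char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

// Assumption: which characters count as whitespace is a choice of this sketch.
static bool isWhitespaceCharacter(char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\f';
}

// Hypothetical decoder: returns false if the input is rejected under the policy.
bool decodeWithPolicy(const std::string& in, std::string& out, Base64DecodePolicy policy)
{
    out.clear();
    unsigned buffer = 0;   // accumulates 6-bit groups
    int bits = 0;          // number of bits in buffer not yet written out
    bool sawPadding = false;
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        char c = in[i];
        if (c == '=') {
            sawPadding = true;
            continue;
        }
        int value = base64Value(c);
        if (value < 0) {
            if (policy == IgnoreInvalidCharacters)
                continue;
            if (policy == IgnoreWhitespace && isWhitespaceCharacter(c))
                continue;
            return false;  // strict policy, or a non-whitespace invalid character
        }
        if (sawPadding)
            return false;  // alphabet character after padding
        buffer = (buffer << 6) | static_cast<unsigned>(value);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out.push_back(static_cast<char>((buffer >> bits) & 0xFF));
        }
    }
    return true;
}
```

Under this sketch, a strict caller (presumably window.atob()) would pass FailOnInvalidCharacter, while a MIME-style caller following RFC 2045 would pass IgnoreInvalidCharacters.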
Comment 11 Kwang Yul Seo 2010-09-05 15:10:49 PDT

*** This bug has been marked as a duplicate of bug 41510 ***