<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://bugs.webkit.org/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4.1"
          urlbase="https://bugs.webkit.org/"
          
          maintainer="admin@webkit.org"
>

    <bug>
          <bug_id>42061</bug_id>
          
          <creation_ts>2010-07-12 01:52:27 -0700</creation_ts>
          <short_desc>Make base64Decode ignore unrecognizable characters</short_desc>
          <delta_ts>2010-09-05 15:10:49 -0700</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WebKit</product>
          <component>Platform</component>
          <version>528+ (Nightly build)</version>
          <rep_platform>All</rep_platform>
          <op_sys>All</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>DUPLICATE</resolution>
          <dup_id>41510</dup_id>
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>Normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>0</everconfirmed>
          <reporter name="Kwang Yul Seo">skyul</reporter>
          <assigned_to name="Nobody">webkit-unassigned</assigned_to>
          <cc>darin</cc>
    
          <cc>paroga</cc>

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>249720</commentid>
    <comment_count>0</comment_count>
    <who name="Kwang Yul Seo">skyul</who>
    <bug_when>2010-07-12 01:52:27 -0700</bug_when>
    <thetext>Currently, base64Decode returns false immediately when it encounters an unknown character. However, RFC 2045 states: &quot;All line breaks or other characters not found in Table 1 must be ignored by decoding software.&quot; Change base64Decode to ignore unrecognizable characters.

This topic was discussed in http://www.mail-archive.com/webkit-dev@lists.webkit.org/msg04525.html</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>249721</commentid>
    <comment_count>1</comment_count>
      <attachid>61199</attachid>
    <who name="Kwang Yul Seo">skyul</who>
    <bug_when>2010-07-12 01:54:26 -0700</bug_when>
    <thetext>Created attachment 61199
Patch</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>249934</commentid>
    <comment_count>2</comment_count>
    <who name="Alexey Proskuryakov">ap</who>
    <bug_when>2010-07-12 11:45:34 -0700</bug_when>
    <thetext>See also: bug 41510 and bug 23566. Let&apos;s decide which one to keep.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>249958</commentid>
    <comment_count>3</comment_count>
    <who name="Patrick R. Gansterer">paroga</who>
    <bug_when>2010-07-12 12:21:57 -0700</bug_when>
    <thetext>(In reply to comment #2)
&gt; See also: bug 41510 and bug 23566. Let&apos;s decide which one to keep.
This solution breaks window.atob().</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>249968</commentid>
    <comment_count>4</comment_count>
    <who name="Darin Adler">darin</who>
    <bug_when>2010-07-12 12:35:58 -0700</bug_when>
    <thetext>(In reply to comment #3)
&gt; This solution breaks window.atob().

I think what you mean is that this patch breaks window.atob(). The general approach does not have to do so, if the patch is done differently so atob follows a different code path.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>250048</commentid>
    <comment_count>5</comment_count>
      <attachid>61199</attachid>
    <who name="Darin Adler">darin</who>
    <bug_when>2010-07-12 14:15:16 -0700</bug_when>
    <thetext>Comment on attachment 61199
Patch

Do we want this behavior for all callers of the function? RFC 2045 is specifically for the Base64 Content-Transfer-Encoding in MIME, not specifically for data URLs or for the window.atob function.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>250294</commentid>
    <comment_count>6</comment_count>
    <who name="Kwang Yul Seo">skyul</who>
    <bug_when>2010-07-12 22:04:17 -0700</bug_when>
    <thetext>(In reply to comment #5)
&gt; (From update of attachment 61199)
&gt; Do we want this behavior for all callers of the function? RFC 2045 is specifically for the Base64 Content-Transfer-Encoding in MIME, not specifically for data URLs or for the window.atob function.

RFC 3548 states:

2.3.  Interpretation of non-alphabet characters in encoded data

   Base encodings use a specific, reduced, alphabet to encode binary
   data.  Non alphabet characters could exist within base encoded data,
   caused by data corruption or by design.  Non alphabet characters may
   be exploited as a &quot;covert channel&quot;, where non-protocol data can be
   sent for nefarious purposes.  Non alphabet characters might also be
   sent in order to exploit implementation errors leading to, e.g.,
   buffer overflow attacks.

   Implementations MUST reject the encoding if it contains characters
   outside the base alphabet when interpreting base encoded data, unless
   the specification referring to this document explicitly states
   otherwise.  Such specifications may, as MIME does, instead state that
   characters outside the base encoding alphabet should simply be
   ignored when interpreting data (&quot;be liberal in what you accept&quot;).
   Note that this means that any CRLF constitute &quot;non alphabet
   characters&quot; and are ignored.  Furthermore, such specifications may
   consider the pad character, &quot;=&quot;, as not part of the base alphabet
   until the end of the string.  If more than the allowed number of pad
   characters are found at the end of the string, e.g., a base 64 string
   terminated with &quot;===&quot;, the excess pad characters could be ignored.
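
As a rough illustration of the two behaviours this section describes (strict rejection by default, MIME-style skipping when the referring specification asks for it), here is a minimal Python sketch. It is not WebKit code: the function name decode_base64 and the ignore_non_alphabet flag are hypothetical, and it leans on Python's standard base64 module for the actual 4-byte to 3-byte conversion.

```python
import base64
import string

# Standard base64 alphabet (RFC 3548, Table 1), without the pad character.
BASE64_ALPHABET = set(string.ascii_letters + string.digits + "+/")


def decode_base64(text, ignore_non_alphabet=False):
    """Decode base64 text; return bytes, or None if the input is rejected.

    Hypothetical helper for illustration only, not WebKit's base64Decode.
    """
    if ignore_non_alphabet:
        # MIME-style: silently drop everything outside the alphabet,
        # which also drops CRLF and any pad characters.
        body = "".join(ch for ch in text if ch in BASE64_ALPHABET)
    else:
        # Strict default: only trailing padding is tolerated.
        body = text.rstrip("=")
        if any(ch not in BASE64_ALPHABET for ch in body):
            return None
    if len(body) % 4 == 1:
        # A single leftover character cannot encode a whole byte.
        return None
    # Restore the padding that the standard library decoder expects.
    padded = body + "=" * (-len(body) % 4)
    return base64.b64decode(padded)
```

With this sketch, decode_base64("aGVs\r\nbG8=") returns None because of the embedded CRLF, while decode_base64("aGVs\r\nbG8=", ignore_non_alphabet=True) returns b"hello". Making the strict mode the default mirrors the MUST in the quoted text; a caller has to opt in to the lenient mode.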


According to the specification, we must not ignore unexpected characters in the general case.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>250565</commentid>
    <comment_count>7</comment_count>
      <attachid>61199</attachid>
    <who name="Darin Adler">darin</who>
    <bug_when>2010-07-13 09:34:39 -0700</bug_when>
    <thetext>Comment on attachment 61199
Patch

Thanks for doing the research. We will need to pass an argument to base64Decode indicating what behavior the caller wants.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>250592</commentid>
    <comment_count>8</comment_count>
    <who name="Patrick R. Gansterer">paroga</who>
    <bug_when>2010-07-13 10:16:38 -0700</bug_when>
    <thetext>(In reply to comment #7)
&gt; (From update of attachment 61199)
&gt; Thanks for doing the research. We will need to pass an argument to base64Decode indicating what behavior the caller wants.
I think that&apos;s exactly what the patch at bug 41510 does.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>250601</commentid>
    <comment_count>9</comment_count>
    <who name="Darin Adler">darin</who>
    <bug_when>2010-07-13 10:27:55 -0700</bug_when>
    <thetext>(In reply to comment #8)
&gt; (In reply to comment #7)
&gt; &gt; (From update of attachment 61199)
&gt; &gt; Thanks for doing the research. We will need to pass an argument to base64Decode indicating what behavior the caller wants.
&gt; I think thats exactly what the patch at bug 41510 does.

Well, it introduces an argument, yes, but it does not ignore all unrecognizable characters. In bug 41510 we have been discussing the merits of ignoring all unrecognizable characters (matching the data URL RFC, I believe) vs. instead ignoring only certain characters (matching some other browsers’ behavior with data URLs). If I remember correctly.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>250679</commentid>
    <comment_count>10</comment_count>
    <who name="Patrick R. Gansterer">paroga</who>
    <bug_when>2010-07-13 12:35:31 -0700</bug_when>
    <thetext>(In reply to comment #9)
&gt; Well, it introduces an argument, yes, but it does not ignore all unrecognizable characters. In bug 41510 we have been discussing the merits of ignoring all unrecognizable characters (matching the data URL RFC, I believe) vs. instead ignoring only certain characters (matching some other browsers’ behavior with data URLs). If I remember correctly.

The current version of that patch has an enum: Base64DecodePolicy { FailOnInvalidCharacter, IgnoreWhitespace, IgnoreInvalidCharacters };</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>274453</commentid>
    <comment_count>11</comment_count>
    <who name="Kwang Yul Seo">skyul</who>
    <bug_when>2010-09-05 15:10:49 -0700</bug_when>
    <thetext>

*** This bug has been marked as a duplicate of bug 41510 ***</thetext>
  </long_desc>
      
          <attachment
              isobsolete="0"
              ispatch="1"
              isprivate="0"
          >
            <attachid>61199</attachid>
            <date>2010-07-12 01:54:26 -0700</date>
            <delta_ts>2010-07-13 09:34:39 -0700</delta_ts>
            <desc>Patch</desc>
            <filename>base64.patch</filename>
            <type>text/plain</type>
            <size>2066</size>
            <attacher name="Kwang Yul Seo">skyul</attacher>
            
              <data encoding="base64">SW5kZXg6IFdlYkNvcmUvQ2hhbmdlTG9nCj09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09
PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT0KLS0tIFdlYkNvcmUvQ2hhbmdlTG9n
CShyZXZpc2lvbiA2MzA2MykKKysrIFdlYkNvcmUvQ2hhbmdlTG9nCSh3b3JraW5nIGNvcHkpCkBA
IC0xLDMgKzEsMTggQEAKKzIwMTAtMDctMTIgIEt3YW5nIFl1bCBTZW8gIDxza3l1bEBjb21wYW55
MTAwLm5ldD4KKworICAgICAgICBSZXZpZXdlZCBieSBOT0JPRFkgKE9PUFMhKS4KKworICAgICAg
ICBNYWtlIGJhc2U2NERlY29kZSBpZ25vcmUgdW5yZWNvZ25pemFibGUgY2hhcmFjdGVycworICAg
ICAgICBodHRwczovL2J1Z3Mud2Via2l0Lm9yZy9zaG93X2J1Zy5jZ2k/aWQ9NDIwNjEKKworICAg
ICAgICBDdXJyZW50bHksIGJhc2U2NERlY29kZSByZXR1cm5zIGZhbHNlIGltbWVkaWF0ZWx5IHdo
ZW4gaXQgZW5jb3VudGVycyBhCisgICAgICAgIHVua25vd24gY2hhcmFjdGVyLiBIb3dldmVyLCBS
RkMgMjA0NSBzdGF0ZXM6IEFsbCBsaW5lIGJyZWFrcyBvciBvdGhlcgorICAgICAgICBjaGFyYWN0
ZXJzIG5vdCBmb3VuZCBpbiBUYWJsZSAxIG11c3QgYmUgaWdub3JlZCBieSBkZWNvZGluZyBzb2Z0
d2FyZS4KKyAgICAgICAgQ2hhbmdlIGJhc2U2NERlY29kZSB0byBpZ25vcmUgdW5yZWNvZ25pemFi
bGUgY2hhcmFjdGVycy4KKworICAgICAgICAqIHBsYXRmb3JtL3RleHQvQmFzZTY0LmNwcDoKKyAg
ICAgICAgKFdlYkNvcmU6OmJhc2U2NERlY29kZSk6CisKIDIwMTAtMDctMTEgIE1hY2llaiBTdGFj
aG93aWFrICA8bWpzQGFwcGxlLmNvbT4KIAogICAgICAgICBSZXZpZXdlZCBieSBEYW4gQmVybnN0
ZWluLgpJbmRleDogV2ViQ29yZS9wbGF0Zm9ybS90ZXh0L0Jhc2U2NC5jcHAKPT09PT09PT09PT09
PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PQot
LS0gV2ViQ29yZS9wbGF0Zm9ybS90ZXh0L0Jhc2U2NC5jcHAJKHJldmlzaW9uIDYzMDYyKQorKysg
V2ViQ29yZS9wbGF0Zm9ybS90ZXh0L0Jhc2U2NC5jcHAJKHdvcmtpbmcgY29weSkKQEAgLTE0NCwx
OCArMTQ0LDIxIEBAIGJvb2wgYmFzZTY0RGVjb2RlKGNvbnN0IGNoYXIqIGRhdGEsIHVuc2kKICAg
ICB3aGlsZSAobGVuICYmIGRhdGFbbGVuLTFdID09ICc9JykKICAgICAgICAgLS1sZW47CiAKKyAg
ICB1bnNpZ25lZCBvdXRTaXplID0gMDsKICAgICBvdXQuZ3JvdyhsZW4pOwogICAgIGZvciAodW5z
aWduZWQgaWR4ID0gMDsgaWR4IDwgbGVuOyBpZHgrKykgewogICAgICAgICB1bnNpZ25lZCBjaGFy
IGNoID0gZGF0YVtpZHhdOwotICAgICAgICBpZiAoKGNoID4gNDcgJiYgY2ggPCA1OCkgfHwgKGNo
ID4gNjQgJiYgY2ggPCA5MSkgfHwgKGNoID4gOTYgJiYgY2ggPCAxMjMpIHx8IGNoID09ICcrJyB8
fCBjaCA9PSAnLycgfHwgY2ggPT0gJz0nKQotICAgICAgICAgICAgb3V0W2lkeF0gPSBiYXNlNjRE
ZWNNYXBbY2hdOwotICAgICAgICBlbHNlCi0gICAgICAgICAgICByZXR1cm4gZmFsc2U7CisgICAg
ICAgIGlmICgoY2ggPiA0NyAmJiBjaCA8IDU4KSB8fCAoY2ggPiA2NCAmJiBjaCA8IDkxKSB8fCAo
Y2ggPiA5NiAmJiBjaCA8IDEyMykgfHwgY2ggPT0gJysnIHx8IGNoID09ICcvJyB8fCBjaCA9PSAn
PScpIHsKKyAgICAgICAgICAgIG91dFtvdXRTaXplXSA9IGJhc2U2NERlY01hcFtjaF07CisgICAg
ICAgICAgICBvdXRTaXplKys7CisgICAgICAgIH0KICAgICB9CiAKKyAgICBvdXQucmVzaXplKG91
dFNpemUpOworCiAgICAgLy8gNC1ieXRlIHRvIDMtYnl0ZSBjb252ZXJzaW9uCi0gICAgdW5zaWdu
ZWQgb3V0TGVuID0gbGVuIC0gKChsZW4gKyAzKSAvIDQpOwotICAgIGlmICghb3V0TGVuIHx8ICgo
b3V0TGVuICsgMikgLyAzKSAqIDQgPCBsZW4pCisgICAgdW5zaWduZWQgb3V0TGVuID0gb3V0U2l6
ZSAtICgob3V0U2l6ZSArIDMpIC8gNCk7CisgICAgaWYgKCFvdXRMZW4gfHwgKChvdXRMZW4gKyAy
KSAvIDMpICogNCA8IG91dFNpemUpCiAgICAgICAgIHJldHVybiBmYWxzZTsKIAogICAgIHVuc2ln
bmVkIHNpZHggPSAwOwo=
</data>
<flag name="review"
          id="49182"
          type_id="1"
          status="-"
          setter="darin"
    />
    <flag name="commit-queue"
          id="49183"
          type_id="3"
          status="-"
          setter="darin"
    />
          </attachment>
      

    </bug>

</bugzilla>