Bug 14124

Summary: [CURL] Support data URLs
Product: WebKit Reporter: Alp Toker <alp>
Component: New BugsAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal Keywords: Gtk
Priority: P2    
Version: 523.x (Safari 3)   
Hardware: All   
OS: All   
Attachments:
Description Flags
Add data URL support
andersca: review-
Add data URL support, take two
none
Add data URL support, take three (with ChangeLog entry) andersca: review+

Description Alp Toker 2007-06-13 11:37:09 PDT
Data URLs currently cause:

Curl ERROR for url='http://data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABgAAABAAQMAAAA6ZCUhAAAABlBMVEUAAAAup+mvJKLXAAAAAnRSTlMA/1uRIrUAAAAJcEhZcwAADsQAAA7EAZUrDhsAAAAZSURBVHicY2D4z8BACf5PAFNqPkELKDMfANtJT7FBP1fLAAAAAElFTkSuQmCC', error: 'couldn't resolve host name'

The Qt port already has code for this, so a solution that can be shared between ports may be a good idea.
Comment 1 Alp Toker 2007-10-23 14:14:06 PDT
Note that the Qt port no longer has code for this (they must have added the feature to Qt itself), though it's still in SVN history.

This feature is not so hard to implement, but the state of the curl backend makes it little fun to hack on.

This bug continues to be one of the most annoying, since a lot of SVG and canvas tests use data: URLs for inline images, even though it's not a feature used widely on the Web.
Comment 2 Alp Toker 2007-10-26 02:33:22 PDT
Created attachment 16876 [details]
Add data URL support
Comment 3 Anders Carlsson 2007-10-26 16:12:14 PDT
Comment on attachment 16876 [details]
Add data URL support

>     unsigned len = in.size();
>     const char* data = in.data();
>

Can these just be folded into the new base64Decode call now?

>+    return base64Decode(data, len, out);
>+}
>+
>+bool base64Decode(const char* data, unsigned len, Vector<char>& out)
>+{
>+    out.clear();
>+    if (len == 0)
>+        return true;
>+
>+    // If the input string is pathologically large, just return nothing.
>+    if (len > UINT_MAX)
>+        return false;
>+

Why is this change needed?

>     while (len && data[len-1] == '=')
>         --len;
> 

>+static void parseDataUrl(ResourceHandle* job)

Please name the variable "handle" instead of "job".

>+{
>+    DeprecatedString data = job->request().url().url().latin1();

Why do you need to call latin1() here? Also, we recommend against using DeprecatedString in newly written code.

>+
>+    ASSERT(data.startsWith("data:"));

URL schemes are case insensitive so this is not correct.

>+
> void ResourceHandleManager::startJob(ResourceHandle* job)

I take it that startJob is called from the main loop.

> {
>+    KURL kurl = job->request().url();
>+    String protocol = kurl.protocol();
>+
>+    if (protocol == "data") {

URL schemes are insensitive, please use equalIgnoringCase here.

>+        parseDataUrl(job);
>+        return;
>+    }
>+
>     ResourceHandleInternal* d = job->getInternal();
Comment 4 Alp Toker 2007-10-26 16:27:44 PDT
(In reply to comment #3)
> (From update of attachment 16876 [details] [edit])
> >     unsigned len = in.size();
> >     const char* data = in.data();
> >
> 
> Can these just be folded into the new base64Decode call now?

Like this?

    return base64Decode(in.data(), in.size(), out);

Sure.

> 
> >+    return base64Decode(data, len, out);
> >+}
> >+
> >+bool base64Decode(const char* data, unsigned len, Vector<char>& out)
> >+{
> >+    out.clear();
> >+    if (len == 0)
> >+        return true;
> >+
> >+    // If the input string is pathologically large, just return nothing.
> >+    if (len > UINT_MAX)
> >+        return false;
> >+
> 
> Why is this change needed?

These checks were in the original base64Decode(). The first is a base case and seems to still be necessary, but the second is probably redundant.

> 
> >     while (len && data[len-1] == '=')
> >         --len;
> > 
> 
> >+static void parseDataUrl(ResourceHandle* job)
> 
> Please name the variable "handle" instead of "job".

OK.

> 
> >+{
> >+    DeprecatedString data = job->request().url().url().latin1();
> 
> Why do you need to call latin1() here? Also, we recommend against using
> DeprecatedString in newly written code.

Its features were closest to the original code (eg. DeprecatedString::mid()). KURL's KURL::decode_string() also expects and returns DeprecatedString, which we call for percent-encoded data.
 
> 
> >+
> >+    ASSERT(data.startsWith("data:"));
> 
> URL schemes are case insensitive so this is not correct.

OK.

> 
> >+
> > void ResourceHandleManager::startJob(ResourceHandle* job)
> 
> I take it that startJob is called from the main loop.

Yeah. The job then gets queued.

> 
> > {
> >+    KURL kurl = job->request().url();
> >+    String protocol = kurl.protocol();
> >+
> >+    if (protocol == "data") {
> 
> URL schemes are insensitive, please use equalIgnoringCase here.

OK.

> 
> >+        parseDataUrl(job);
> >+        return;
> >+    }
> >+
> >     ResourceHandleInternal* d = job->getInternal();
> 

Comment 5 Anders Carlsson 2007-10-26 16:30:55 PDT
(In reply to comment #4)
> (In reply to comment #3)
> > (From update of attachment 16876 [details] [edit] [edit])
> > >     unsigned len = in.size();
> > >     const char* data = in.data();
> > >
> > 
> > Can these just be folded into the new base64Decode call now?
> 
> Like this?
> 
>     return base64Decode(in.data(), in.size(), out);
> 
> Sure.

Good.

> 
> > 
> > >+    return base64Decode(data, len, out);
> > >+}
> > >+
> > >+bool base64Decode(const char* data, unsigned len, Vector<char>& out)
> > >+{
> > >+    out.clear();
> > >+    if (len == 0)
> > >+        return true;
> > >+
> > >+    // If the input string is pathologically large, just return nothing.
> > >+    if (len > UINT_MAX)
> > >+        return false;
> > >+
> > 
> > Why is this change needed?
> 
> These checks were in the original base64Decode(). The first is a base case and
> seems to still be necessary, but the second is probably redundant.
>

Are they still in the method that takes a vector? If so, can they be removed from there?
 
> > 
> > >     while (len && data[len-1] == '=')
> > >         --len;
> > > 
> > 
> > >+static void parseDataUrl(ResourceHandle* job)
> > 
> > Please name the variable "handle" instead of "job".
> 
> OK.
> 
> > 
> > >+{
> > >+    DeprecatedString data = job->request().url().url().latin1();
> > 
> > Why do you need to call latin1() here? Also, we recommend against using
> > DeprecatedString in newly written code.
> 
> Its features were closest to the original code (eg. DeprecatedString::mid()).
> KURL's KURL::decode_string() also expects and returns DeprecatedString, which
> we call for percent-encoded data.
>

String::substring is the equivalent of DeprecatedString::mid. 

Comment 6 Alp Toker 2007-10-26 18:00:26 PDT
Created attachment 16892 [details]
Add data URL support, take two

This addresses the issues in the review, except for DeprecatedString.

I'm not comfortable with the idea of storing binary data in a string class either way, so I don't think using String instead of DeprecatedString will be any less broken.
Comment 7 Alp Toker 2007-10-26 18:01:58 PDT
Created attachment 16893 [details]
Add data URL support, take three (with ChangeLog entry)
Comment 8 Anders Carlsson 2007-10-27 22:41:12 PDT
Comment on attachment 16893 [details]
Add data URL support, take three (with ChangeLog entry)

Looks great, r=me
Comment 9 Alp Toker 2007-10-28 15:00:17 PDT
Landed in r27189.