Bug 44641 - Implement Base64 HTML entities
Summary: Implement Base64 HTML entities
Status: RESOLVED WONTFIX
Alias: None
Product: WebKit
Classification: Unclassified
Component: DOM
Version: 528+ (Nightly build)
Hardware: All All
Importance: P2 Enhancement
Assignee: Adam Barth
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-08-25 15:41 PDT by Adam Barth
Modified: 2011-05-23 10:16 PDT

See Also:


Attachments
Work in progress (14.54 KB, patch)
2010-08-25 15:42 PDT, Adam Barth
Patch (23.67 KB, patch)
2010-08-25 16:23 PDT, Adam Barth
Patch (23.97 KB, patch)
2010-08-25 16:40 PDT, Adam Barth
Patch (12.78 KB, patch)
2010-09-06 03:20 PDT, Adam Barth
abarth: review-

Description Adam Barth 2010-08-25 15:41:29 PDT
Implement Base64 HTML entities
Comment 1 Adam Barth 2010-08-25 15:42:49 PDT
Created attachment 65484 [details]
Work in progress
Comment 2 Adam Barth 2010-08-25 16:23:15 PDT
Created attachment 65495 [details]
Patch
Comment 3 Eric Seidel (no email) 2010-08-25 16:33:03 PDT
Attachment 65495 [details] did not build on mac:
Build output: http://queues.webkit.org/results/3755634
Comment 4 Alexey Proskuryakov 2010-08-25 16:33:39 PDT
I got curious, and found this explanation: <http://www.mail-archive.com/whatwg@lists.whatwg.org/msg23193.html>. The idea seems to be that it will be slightly easier to use this new mechanism to escape untrusted content (but one would still have to remember to escape, and forgetting to do that is the most common issue AFAIK). An obvious downside is that inserted untrusted content will be unreadable by humans.
Comment 5 Adam Barth 2010-08-25 16:40:38 PDT
Created attachment 65499 [details]
Patch
Comment 6 Adam Barth 2010-08-25 16:46:33 PDT
Yep.  This is not a part of HTML5 (yet).  The goal is to make it easier for folks to add untrusted content to their document while avoiding cross-site scripting.

Here's a design document that shows some of the thinking that led to this design:

https://docs.google.com/document/edit?id=1Uye7FCE7sIouru_9ayiyYRDP_ibjY6ZcOeImWH1pFrE&hl=en&authkey=CLO4uYIN

The design in this patch is simpler than some of the other ideas in that document.
Comment 7 Adam Barth 2010-08-25 16:55:36 PDT
Here's the summary from the email if you don't want to click through.

== Summary ==

HTML should support Base64-encoded entities to make it easier for
authors to include untrusted content in their documents without
risking XSS.  For example,

&%SFRNTDUncyA8Y2FudmFzPiBlbGVtZW50IGlzIGF3ZXNvbWUuCg==;

would decode to "HTML5's <canvas> element is awesome."  Notice that
the < and > characters get emitted by the parser as character tokens.
That means they can't be used by an attacker for XSS.  These entities
can be used safely both in intertag content as well as in attribute
values.
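
As a rough illustration of the decoding step described in this summary, here is a minimal JavaScript (Node.js) sketch. It is not taken from the attached patch: `decodeBase64Entities` and the `BASE64_ENTITY` pattern are hypothetical names, and the sketch works on a plain string rather than inside an HTML tokenizer. It assumes the payload is standard Base64 carrying UTF-8 text.

```javascript
// Hypothetical pattern for the proposed syntax: "&%" + Base64 payload + ";".
const BASE64_ENTITY = /&%([A-Za-z0-9+\/]*={0,2});/g;

// Hypothetical helper: replace each "&%...;" reference with its decoded text.
// Assumes the payload bytes are UTF-8 (as stated later in this bug).
function decodeBase64Entities(input) {
  const utf8 = new TextDecoder("utf-8");
  return input.replace(BASE64_ENTITY, (_match, payload) =>
    utf8.decode(Buffer.from(payload, "base64"))
  );
}

const example = "&%SFRNTDUncyA8Y2FudmFzPiBlbGVtZW50IGlzIGF3ZXNvbWUuCg==;";
const decoded = decodeBase64Entities(example);

// JSON.stringify makes the trailing newline in the payload visible.
console.log(JSON.stringify(decoded));
```

The example payload also encodes a trailing newline, so the decoded text is the quoted sentence followed by "\n". In a real parser the decoded characters would be emitted as character tokens, not spliced back into markup as this string-level sketch does.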
Comment 8 Adam Barth 2010-09-06 03:20:40 PDT
Created attachment 66617 [details]
Patch
Comment 9 Oliver Hunt 2010-09-06 11:30:07 PDT
(In reply to comment #7)
> Here's the summary from the email if you don't want to click through.
> 
> == Summary ==
> 
> HTML should support Base64-encoded entities to make it easier for
> authors to include untrusted content in their documents without
> risking XSS.  For example,
> 
> &%SFRNTDUncyA8Y2FudmFzPiBlbGVtZW50IGlzIGF3ZXNvbWUuCg==;
> 
> would decode to "HTML5's <canvas> element is awesome."  Notice that
> the < and > characters get emitted by the parser as character tokens.
> That means they can't be used by an attacker for XSS.  These entities
> can be used safely both in intertag content as well as in attribute
> values.

What use cases does this solve that aren't already solved by innerText and/or innerStaticHTML ?
Comment 10 Adam Barth 2010-09-06 11:37:23 PDT
> What use cases does this solve that aren't already solved by innerText and/or innerStaticHTML ?

They solve different problems.  innerText/innerStaticHTML let you modify a DOM node safely, whereas base64 entities give you a safe way of transmitting untrusted data from the server to the client.

Put another way, if you want to use innerText/innerStaticHTML, you still need a safe way of getting the untrusted content you want to assign to those properties from the server to the client.  That's the problem that base64 entities solve.
Comment 11 Oliver Hunt 2010-09-06 11:40:55 PDT
(In reply to comment #10)
> > What use cases does this solve that aren't already solved by innerText and/or innerStaticHTML ?
> 
> They solve different problems.  innerText/innerStaticHTML let you modify a DOM node safely, whereas base64 entities give you a safe way of transmitting untrusted data from the server to the client.
> 
> Put another way, if you want to use innerText/innerStaticHTML, you still need a safe way of getting the untrusted content you want to assign to those properties from the server to the client.  That's the problem that base64 entities solve.

There's already base64 decode support in JS (through atob).

Also, what encoding is used in base64?  Based on atob/btoa behaviour, base64 doesn't support multibyte characters, so this needs to be specified.
Comment 12 Adam Barth 2010-09-06 12:16:46 PDT
> there's already base64 decode support in JS (through atob)

Imagine a PHP script that wants to send an untrusted string to the client at a particular point in the output stream.  They can't do the following:

<?php echo "<script>document.write(atob('".base64_encode($untrusted_string)."'));</script>" ?>

because that's XSS.  However, they can do:

<?php echo "&%'".base64_encode($untrusted_string)."';" ?>

That's safe.

> Also what encoding is used in base64?

UTF-8.

> based on atob/btoa behaviour base64 doesn't support multibyte characters, so this needs to be specified.

The btoa behavior is really nutty and also needs to be specified.  :)
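
To make the multibyte concern concrete, here is a small sketch of how `btoa` treats its input. It assumes a runtime that exposes the standard `btoa` and `TextEncoder` globals (browsers, or a recent Node.js); `utf8ToBase64` is a hypothetical helper name, not an existing API.

```javascript
// btoa() treats each character of its input as one byte (code points 0-255 only).
const latin1 = btoa("\u00e9"); // U+00E9 (e with acute) becomes the single byte 0xE9

// Characters above U+00FF cannot be represented that way, so btoa() throws.
let threw = false;
try {
  btoa("\u20ac"); // U+20AC (euro sign)
} catch (err) {
  threw = true;
}

// Hypothetical helper: encode the string as UTF-8 first, then Base64 the bytes.
function utf8ToBase64(text) {
  const bytes = new TextEncoder().encode(text);
  let binary = "";
  for (const byte of bytes) {
    binary += String.fromCharCode(byte);
  }
  return btoa(binary);
}

console.log(latin1);                 // "6Q==" (one byte: 0xE9)
console.log(threw);                  // true
console.log(utf8ToBase64("\u00e9")); // "w6k=" (two bytes: 0xC3 0xA9)
```

The same character therefore yields different Base64 depending on whether it is treated as Latin-1 or UTF-8, which is why the entity proposal has to name its encoding explicitly.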
Comment 13 Adam Barth 2010-09-06 12:18:08 PDT
Rather:

<?php echo "&%".base64_encode($untrusted_string).";" ?>

(removed extra ' characters that snuck in).
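
For readers who do not use PHP, the same server-side step can be sketched in JavaScript (Node.js). `base64Entity` and `renderComment` are hypothetical names, and the `&%...;` syntax is only the proposal from this bug (resolved WONTFIX), so this illustrates the intended output, not something a browser will decode.

```javascript
// Hypothetical helper mirroring the PHP one-liner above:
// UTF-8 encode the untrusted string, Base64 it, and wrap it as "&%...;".
function base64Entity(untrusted) {
  return "&%" + Buffer.from(untrusted, "utf8").toString("base64") + ";";
}

// Hypothetical template function: the untrusted text never appears raw in the markup.
function renderComment(untrustedComment) {
  return "<p>" + base64Entity(untrustedComment) + "</p>";
}

const html = renderComment("<script>alert(1)</script>");
console.log(html);
```

Because the Base64 alphabet contains only letters, digits, `+`, `/` and `=`, the emitted payload cannot contain `<`, `>`, `&` or quote characters, whatever the untrusted string holds.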
Comment 14 Alexey Proskuryakov 2010-11-15 10:19:39 PST
> > Also what encoding is used in base64?
> UTF8.

I think that this needs to explicitly mention what happens to bad UTF-8 (unpaired surrogates, misplaced BOMs, overlong sequences etc). With tests!
Comment 15 Adam Barth 2010-11-15 13:02:52 PST
(In reply to comment #14)
> > > Also what encoding is used in base64?
> > UTF8.
> 
> I think that this needs to explicitly mention what happens to bad UTF-8 (unpaired surrogates, misplaced BOMs, overlong sequences etc). With tests!

Sure.  It should probably do the same thing as http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#preprocessing-the-input-stream without the CR/LF magic (and possibly without the null byte magic).
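
One possible policy for malformed input is to replace bad byte sequences with U+FFFD. The sketch below shows that policy using the standard `TextDecoder`, which substitutes U+FFFD by default. `decodePayload` is a hypothetical name, and this thread leaves the exact behaviour open, so treat this as an assumption and not as what the patch does.

```javascript
// Hypothetical helper: Base64 payload -> bytes -> text, replacing malformed UTF-8.
function decodePayload(base64Payload) {
  const bytes = Buffer.from(base64Payload, "base64");
  // Without { fatal: true }, TextDecoder substitutes U+FFFD for malformed sequences.
  return new TextDecoder("utf-8").decode(bytes);
}

// 0xC0 0xAF is an overlong encoding of "/" and is not valid UTF-8.
const overlong = Buffer.from([0x41, 0xc0, 0xaf, 0x42]).toString("base64");
// 0xED 0xA0 0x80 would encode the lone surrogate U+D800, which is also invalid.
const surrogate = Buffer.from([0xed, 0xa0, 0x80]).toString("base64");

console.log(JSON.stringify(decodePayload(overlong)));
console.log(JSON.stringify(decodePayload(surrogate)));
```

With this policy an overlong "/" never decodes to a real "/", and a lone surrogate never reaches the document; both become replacement characters.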
Comment 16 Adam Barth 2010-12-21 01:17:35 PST
Comment on attachment 66617 [details]
Patch

No love for this patch, apparently.
Comment 17 Adam Barth 2011-05-23 10:16:34 PDT
This idea never got enough traction.