Implement Base64 HTML entities
Created attachment 65484 [details] Work in progress
Created attachment 65495 [details] Patch
Attachment 65495 [details] did not build on mac: Build output: http://queues.webkit.org/results/3755634
I got curious, and found this explanation: <http://www.mail-archive.com/whatwg@lists.whatwg.org/msg23193.html>. The idea seems to be that it will be slightly easier to use this new mechanism to escape untrusted content (but one would still have to remember to escape, and forgetting to do that is the most common issue AFAIK). An obvious downside is that inserted untrusted content will be unreadable by humans.
Created attachment 65499 [details] Patch
Yep. This is not a part of HTML5 (yet). The goal is to make it easier for folks to add untrusted content to their document while avoiding cross-site scripting. Here's a design document that shows some of the thinking that led to this design: https://docs.google.com/document/edit?id=1Uye7FCE7sIouru_9ayiyYRDP_ibjY6ZcOeImWH1pFrE&hl=en&authkey=CLO4uYIN The design in this patch is simpler than some of the other ideas in that document.
Here's the summary from the email if you don't want to click through.
== Summary ==
HTML should support Base64-encoded entities to make it easier for authors to include untrusted content in their documents without risking XSS. For example,
&%SFRNTDUncyA8Y2FudmFzPiBlbGVtZW50IGlzIGF3ZXNvbWUuCg==;
would decode to "HTML5's <canvas> element is awesome." Notice that the < and > characters get emitted by the parser as character tokens. That means they can't be used by an attacker for XSS. These entities can be used safely both in intertag content as well as in attribute values.
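To make the example concrete, here is a minimal JavaScript sketch (for Node.js, and not the code from the patch) of what a consumer of such an entity would do: take the text between "&%" and ";", Base64-decode it, and treat the result as plain character data. The function name decodeBase64Entity is hypothetical, and UTF-8 is assumed as the byte encoding.

```javascript
// Hypothetical helper, not taken from the patch: decodes one "&%...;" entity
// into the character data a parser would emit. Assumes a UTF-8 payload.
function decodeBase64Entity(entity) {
  const match = /^&%([A-Za-z0-9+\/]*={0,2});$/.exec(entity);
  if (match === null) {
    return null; // not a well-formed Base64 entity
  }
  const bytes = Buffer.from(match[1], "base64");
  return new TextDecoder("utf-8").decode(bytes);
}

const text = decodeBase64Entity(
  "&%SFRNTDUncyA8Y2FudmFzPiBlbGVtZW50IGlzIGF3ZXNvbWUuCg==;"
);
// The result is ordinary text; "<" and ">" are just characters in it.
console.log(JSON.stringify(text));
```

Note that the sample payload also encodes a trailing newline after "awesome.", so the decoded string is one character longer than the quoted sentence.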
Created attachment 66617 [details] Patch
(In reply to comment #7)
> Here's the summary from the email if you don't want to click through.
>
> == Summary ==
>
> HTML should support Base64-encoded entities to make it easier for
> authors to include untrusted content in their documents without
> risking XSS. For example,
>
> &%SFRNTDUncyA8Y2FudmFzPiBlbGVtZW50IGlzIGF3ZXNvbWUuCg==;
>
> would decode to "HTML5's <canvas> element is awesome." Notice that
> the < and > characters get emitted by the parser as character tokens.
> That means they can't be used by an attacker for XSS. These entities
> can be used safely both in intertag content as well as in attribute
> values.
What use cases does this solve that aren't already solved by innerText and/or innerStaticHTML?
> What use cases does this solve that aren't already solved by innerText and/or innerStaticHTML ?
They solve different problems. innerText/innerStaticHTML let you modify a DOM node safely, whereas base64 entities give you a safe way of transmitting untrusted data from the server to the client.
Put another way, if you want to use innerText/innerStaticHTML, you still need a safe way of getting the untrusted content you want to assign to those properties from the server to the client. That's the problem that base64 entities solve.
(In reply to comment #10)
> > What use cases does this solve that aren't already solved by innerText and/or innerStaticHTML ?
>
> They solve different problems. innerText/innerStaticHTML let you modify a DOM node safely, whereas base64 entities give you a safe way of transmitting untrusted data from the server to the client.
>
> Put another way, if you want to use innerText/innerStaticHTML, you still need a safe way of getting the untrusted content you want to assign to those properties from the server to the client. That's the problem that base64 entities solve.
There's already base64 decode support in JS (through atob).
Also, what encoding is used in base64? Based on atob/btoa behaviour, base64 doesn't support multibyte characters, so this needs to be specified.
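The multibyte concern can be seen directly. A minimal sketch, assuming a runtime that provides the standard global atob (browsers, recent Node.js): atob is the decoding half of the atob/btoa pair and returns one character per decoded byte, so UTF-8 text needs an extra decoding step. The helper name utf8FromBinaryString is made up for illustration.

```javascript
// "w6k=" is the Base64 form of the two UTF-8 bytes C3 A9, i.e. "é".
const binary = atob("w6k=");
console.log(binary.length); // one character per byte, not per code point

// Hypothetical helper: reinterpret atob's byte-per-character output as UTF-8.
function utf8FromBinaryString(binaryString) {
  const bytes = Uint8Array.from(binaryString, (ch) => ch.charCodeAt(0));
  return new TextDecoder("utf-8").decode(bytes);
}

const decoded = utf8FromBinaryString(binary);
console.log(decoded);
```

Without the second step the caller sees two Latin-1 characters instead of one "é", which is why the encoding of the payload has to be pinned down.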
> there's already base64 decode support in JS (through atob)
Imagine a PHP script that wants to send an untrusted string to the client at a particular point in the output stream. They can't do the following:
<?php echo "<script>document.write(atob('".base64_encode($untrusted_string)."'));</script>" ?>
because that's XSS (document.write parses the decoded string as markup). However, they can do:
<?php echo "&%'".base64_encode($untrusted_string)."';" ?>
That's safe.
> Also what encoding is used in base64?
UTF8.
> based on atob/btoa behaviour base64 doesn't support multibyte characters, so this needs to be specified.
The btoa behavior is really nutty and also needs to be specified. :)
Rather: <?php echo "&%".base64_encode($untrusted_string).";" ?> (removed extra ' characters that snuck in).
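The same server-side step, sketched in JavaScript for Node.js rather than PHP. The function name toBase64Entity is hypothetical. The point of the sketch is that the output consists only of "&%", characters from the Base64 alphabet, and ";", so nothing in the untrusted string can end the entity early or open a tag.

```javascript
// Hypothetical counterpart of the PHP one-liner: wraps untrusted text in "&%...;".
function toBase64Entity(untrusted) {
  // Encode the string as UTF-8 bytes first, then Base64-encode those bytes.
  const payload = Buffer.from(untrusted, "utf8").toString("base64");
  return "&%" + payload + ";";
}

const attack = "<script>alert(1)</script>";
const entity = toBase64Entity(attack);
console.log(entity);

// Every character between the delimiters is in the Base64 alphabet.
console.log(/^&%[A-Za-z0-9+\/=]*;$/.test(entity));
```

The author still has to remember to call the encoder at every output point; the encoding only guarantees that an encoded string cannot escape its context.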
> > Also what encoding is used in base64?
> UTF8.
I think that this needs to explicitly mention what happens to bad UTF-8 (unpaired surrogates, misplaced BOMs, overlong sequences, etc.). With tests!
(In reply to comment #14)
> > > Also what encoding is used in base64?
> > UTF8.
>
> I think that this needs to explicitly mention what happens to bad UTF-8 (unpaired surrogates, misplaced BOMs, overlong sequences etc). With tests!
Sure. It should probably do the same thing as http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#preprocessing-the-input-stream without the CR/LF magic (and possibly without the null byte magic).
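One possible reading of this, sketched with the standard TextDecoder. This is an assumption for illustration: the thread does not say which decoder the patch uses. In its default, non-fatal mode TextDecoder replaces malformed sequences with U+FFFD instead of passing them through, which is the kind of behavior the requested tests would need to pin down. The names decodeLenient and samples are hypothetical.

```javascript
// Decode raw bytes the way a lenient UTF-8 decoder does: malformed input
// becomes U+FFFD (REPLACEMENT CHARACTER) rather than being passed through.
function decodeLenient(byteValues) {
  return new TextDecoder("utf-8").decode(Uint8Array.from(byteValues));
}

const samples = {
  overlongSlash: [0xc0, 0xaf], // overlong encoding of "/"
  loneSurrogate: [0xed, 0xa0, 0x80], // UTF-8-style encoding of U+D800
  truncated: [0xe2, 0x82], // first two bytes of a three-byte sequence
};

for (const [name, bytes] of Object.entries(samples)) {
  const out = decodeLenient(bytes);
  // Print the code points of the decoded result in hex.
  console.log(name, [...out].map((c) => c.codePointAt(0).toString(16)));
}
```

None of the three malformed inputs decodes to "/" or to a surrogate; each yields only replacement characters.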
Comment on attachment 66617 [details] Patch No love for this patch, apparently.
This idea never got enough traction.