WebKit issue: UTF standards require parsers to reject sequences encoded using more bytes than absolutely necessary (overlong forms); for example, a standard 7-bit character such as "j" encoded as a 2- or 4-byte sequence, either as raw bytes or as an HTML entity. Modify the renderer to reject such sequences: they have no legitimate use, but are routinely abused to carry out cross-site scripting attacks (attempts to close HTML tags and inject code routinely bypass filters when obfuscated this way).
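A minimal sketch of the byte-level case, assuming a strict UTF-8 decoder (Python's is one): "j" is U+006A, a single byte 0x6A, but the same code point can be packed into the two-byte pattern 110xxxxx 10xxxxxx, giving 0xC1 0xAA. A conforming decoder must reject that overlong sequence rather than silently produce "j".

```python
# "j" (U+006A) in canonical UTF-8 is the single byte 0x6A.
canonical = "j".encode("utf-8")          # b'j'

# Overlong two-byte form of the same code point: 0xC1 0xAA.
overlong_j = bytes([0xC0 | (0x6A >> 6), 0x80 | (0x6A & 0x3F)])

try:
    overlong_j.decode("utf-8")
    print("accepted (non-conforming)")
except UnicodeDecodeError:
    # A strict decoder refuses overlong forms outright.
    print("rejected")
```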
Created attachment 19565 [details] test case (works as expected) Yes, our decoder does reject non-shortest UTF forms in all cases I'm aware of. Do you have a specific example of the problem?
Created attachment 20014 [details] reduction
Looks like the only remaining worrisome case is multibyte HTML entities. These could be used to bypass filters that differentiate between absolute and relative URLs, and apply restrictions based on this distinction: <a href="javascriptΪlert(1)">Long HTML entity notation might be used to bypass some URL filters</a> This is not strictly a browser bug, but it has no legitimate uses, and is a common XSS vector against applications, so locking it down is certainly beneficial.
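To illustrate the bypass, here is a small sketch using Python's `html.unescape`, which decodes character references much like an HTML tokenizer does (the zero-padded, semicolon-less entity is an assumed example of the "long notation" described above): a filter that scans the raw attribute value for "javascript:" never sees the colon, but the entity decoder produces it afterwards.

```python
from html import unescape

# Zero-padded decimal character reference for ":" (code point 58),
# written without the terminating semicolon.
raw = "javascript&#0000058alert(1)"

print("javascript:" in raw)   # naive raw-string filter finds nothing
print(unescape(raw))          # what the entity decoder actually yields
```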
In this example, the entity is not only long, but it is not terminated with a semicolon. As such, it is covered by bug 4948. I am not aware of any reason to reject "&#0000058;", though; other browsers handle this just fine, and standards do not disallow it AFAIK.
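For application authors, the practical takeaway is to normalize before filtering. A minimal sketch (the function name `is_safe_href` and the allowed-scheme list are hypothetical, not from this bug report): decode character references the same way the browser will, then inspect the resulting scheme.

```python
from html import unescape
from urllib.parse import urlparse

def is_safe_href(raw_href: str) -> bool:
    # Decode entities BEFORE inspecting the scheme; checking the raw
    # attribute string first is exactly what entity tricks exploit.
    decoded = unescape(raw_href).strip()
    scheme = urlparse(decoded).scheme.lower()
    # Empty scheme covers relative URLs.
    return scheme in ("", "http", "https")
```

The key design point is ordering: any canonicalization the consumer (here, the browser) performs must happen before the security check, not after.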
Indeed, this behavior is covered by the HTML Standard.