165790 – makeString UChar* adaptor is silly

RESOLVED DUPLICATE of bug 190187165790

makeString UChar* adaptor is silly

https://bugs.webkit.org/show_bug.cgi?id=165790

Summary makeString UChar* adaptor is silly

JF Bastien

Reported 2016-12-12 22:52:13 PST

This code seems silly: unsigned length = 0; while (m_characters[length]) ++length; if (length > std::numeric_limits<unsigned>::max()) CRASH(); I'll clean it up as a follow-up to #147142. I'd appreciate guidance on what a good upper-bound is. IMO getting to anything "big" without finding a 0 means we're severely broken already, so crashing is the right action. I'm just not sure: 1. Has this ever happened? Even ignoring the unsigned comparison to its maximum, we'd need to have 4GiB of continuous non-zero readable data. 2. What would a better limit be for strings? I'm not a fan of hard-coded values, but there's got to be a number which ought to be Good Enough for Everyone™.

Attachments
patch (1.91 KB, patch) 2016-12-15 10:40 PST, JF Bastien	no flags	Details Formatted Diff Diff
patch (2.32 KB, patch) 2016-12-19 20:41 PST, JF Bastien	beidson: review-	Details Formatted Diff Diff
Show Obsolete (1) View All Add attachment proposed patch, testcase, etc.

Darin Adler

Comment 1 2016-12-14 09:27:36 PST

String length overflow checks have indeed been critical for security in the past. We started without having almost any of them in WebKit and added lots of them over the years as we resolved specific security issues, and especially as we started dealing with a mix of 32-bit and 64-bit values. Some checks that seem obviously impossible to hit actually can be hit quite easily on 64-bit systems with lots of RAM, when code runs that intentionally tries to get the engine to generate a giant string. It’s not immediately obvious from looking at one function what callers might be directly or indirectly controlled by an attacker looking for a vulnerability. Adding a new lower limit is an interesting project; it’s hard to know what cases might legitimately lead to a long string just from looking at the lowest level. You typically have to look at all the callers to understand what they are doing. Having a lower limit simply changes a "hang" into a "crash", so the benefit of finding a lower value is a modest one, unless we use the limit is some other way, perhaps throwing a JavaScript exception or something like that. Typically when we add the overflow checks, most don’t try to implement a global “sensible policy”, but rather catch things that actually reach points we know the implementation can’t handle without overflowing or such. It’s much harder to come up with an arbitrarily “biggest normal value” limit. Not that it can’t be done, but it’s more challenging. Strings stored in WTF::StringImpl have a maximum length smaller than the maximum 32-bit unsigned integer, depending on how the StringImpl is created. The practical limit is in the neighborhood of 2^31.

Darin Adler

Comment 2 2016-12-14 09:29:16 PST

(In reply to comment #0) > This code seems silly: > > unsigned length = 0; > while (m_characters[length]) > ++length; > > if (length > std::numeric_limits<unsigned>::max()) > CRASH(); Wait, that’s beyond silly. That’s just dead unreachable code. If the length is greater than the maximum unsigned, then we will have already overflowed before checking it. I’m surprised the compiler doesn’t issue an error or warning. The length local variable had a 64-bit type then the code would be “merely silly”.

Darin Adler

Comment 3 2016-12-14 09:29:34 PST

(In reply to comment #2) > The length local variable had a 64-bit type then the code would be “merely > silly”. If the ...

JF Bastien

Comment 4 2016-12-14 09:30:22 PST

(In reply to comment #2) > (In reply to comment #0) > > This code seems silly: > > > > unsigned length = 0; > > while (m_characters[length]) > > ++length; > > > > if (length > std::numeric_limits<unsigned>::max()) > > CRASH(); > > Wait, that’s beyond silly. That’s just dead unreachable code. If the length > is greater than the maximum unsigned, then we will have already overflowed > before checking it. I’m surprised the compiler doesn’t issue an error or > warning. > > The length local variable had a 64-bit type then the code would be “merely > silly”. :-)

Darin Adler

Comment 5 2016-12-14 09:39:17 PST

I suggest we change this to use size_t and otherwise leave it alone, based on the rationale I gave above. But perhaps there’s something even better we can do. Seems unimportant to optimize this case; null-character terminated UChar strings are super rare on iOS and macOS, and come up mostly in when dealing with results of functions on Windows, so this isn’t likely to be used in hot code.

JF Bastien

Comment 6 2016-12-15 10:40:52 PST

Created attachment 297201 [details] patch I don't think size_t is a good idea: on 32-bit platforms we're back to the same problem. On 64-bit platforms we'll just keep counting as we were before: way too much! Better to just call it a day IMO. If we ever encounter a 1GiB string without a null terminator in it then we're already having a bad time. WDYT?

Darin Adler

Comment 7 2016-12-15 12:45:03 PST

(In reply to comment #6) > on 32-bit platforms we're back to the same problem On 32-bit platforms there's no way to overflow a 32-bit usigned integer walking through a string a byte at a time unless the entire computer has no zero bytes in it. So the code is dead code, and code that does no harm. The whole point of the code, I think, is to prevent us from overflowing 32 bits and passing an inappropriately small value to the create function. > On 64-bit platforms we'll just keep counting as we were before: way too > much! Better to just call it a day IMO. I don’t know what “better to call it a day” means in practice. > If we ever encounter a 1GiB string without a null terminator in it then > we're already having a bad time. WDYT? I don’t think we should waste our time choosing an arbitrary limit. I don’t think a 1GiB limit is better than a 4GiB limit. Maybe you could give a concrete example where you think the smaller limit will make things significantly better? I think maybe we should could just remove the check entirely, because the symptom of creating a too-small string might be harmless.

Alexey Proskuryakov

Comment 8 2016-12-17 14:49:50 PST

One scenario where we've seen large blobs of data is dealing with files (especially uploading videos to YouTube). Indeed, it seems unlikely that those will ever be passed through this adapter.

JF Bastien

Comment 9 2016-12-19 20:41:56 PST

Created attachment 297504 [details] patch You're right, the limit isn't helpful and deleting the check is better. I bumped the length to a size_t though, so at least it doesn't wrap around if we do get a huge string (though as was pointed out the subsequent string may have its own arbitrary limit).

Darin Adler

Comment 10 2016-12-20 12:45:42 PST

Comment on attachment 297504 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=297504&action=review > Source/WTF/wtf/text/StringConcatenate.h:147 > + size_t length() const { return m_length; } Where are the callers to this length function? Did we audit them to make sure they correctly handle size_t rather than unsigned?

Darin Adler

Comment 11 2016-12-21 13:24:22 PST

Comment on attachment 297504 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=297504&action=review >> Source/WTF/wtf/text/StringConcatenate.h:147 >> + size_t length() const { return m_length; } > > Where are the callers to this length function? Did we audit them to make sure they correctly handle size_t rather than unsigned? I will likely immediately r+ this patch once I get the answer to this, either by researching it myself, or by you answering here.

JF Bastien

Comment 12 2016-12-21 13:33:52 PST

(In reply to comment #11) > Comment on attachment 297504 [details] > patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=297504&action=review > > >> Source/WTF/wtf/text/StringConcatenate.h:147 > >> + size_t length() const { return m_length; } > > > > Where are the callers to this length function? Did we audit them to make sure they correctly handle size_t rather than unsigned? > > I will likely immediately r+ this patch once I get the answer to this, > either by researching it myself, or by you answering here. Sorry for the slow response rater on this. It's a great question and I want to go through the code properly before getting back to you. I'll do this soon, but have a few other things I want to get to first.

JF Bastien

Comment 13 2018-01-24 09:55:41 PST

Apparently this will be flagged as a tautological compare by upcoming clang: length > std::numeric_limits<unsigned>::max() Should we just remove it for now (since it is tautological) or use size_t as I suggest? I don't have time to audit code as I said I would (then forgot...).

Brady Eidson

Comment 14 2018-02-14 10:36:37 PST

Comment on attachment 297504 [details] patch Patches that have been up for review since 2016 are almost certainly too stale to be relevant to trunk in their current form. If this patch is still important please rebase it and post it for review again.

Filip Pizlo

Comment 15 2018-02-26 21:23:24 PST

(In reply to JF Bastien from comment #13) > Apparently this will be flagged as a tautological compare by upcoming clang: > length > std::numeric_limits<unsigned>::max() > > Should we just remove it for now (since it is tautological) or use size_t as > I suggest? I don't have time to audit code as I said I would (then > forgot...). I think that you should remove the comparison first and then do the size_t thing and audit.

Mark Lam

Comment 16 2018-10-02 13:06:42 PDT

This issue will be handled when we fix https://bugs.webkit.org/show_bug.cgi?id=190187. *** This bug has been marked as a duplicate of bug 190187 ***

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution DUPLICATE

of bug 190187

Priority P2

Severity Normal

Classification Unclassified

Version WebKit Nightly Build

Hardware Unspecified

OS Unspecified

Product WebKit

Component Web Template Framework

Assignee

JF Bastien

Reported

2016-12-12 22:52 PST

Modified

2022-02-28 03:46 PST History

CC List

15 users Show

URL

Keywords

Depends on

147142

Blocks

Dependencies

tree graph