Bug 165790 - makeString UChar* adaptor is silly
Summary: makeString UChar* adaptor is silly
Status: RESOLVED DUPLICATE of bug 190187
Alias: None
Product: WebKit
Classification: Unclassified
Component: Web Template Framework (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: JF Bastien
URL:
Keywords:
Depends on: 147142
Blocks:
  Show dependency treegraph
 
Reported: 2016-12-12 22:52 PST by JF Bastien
Modified: 2022-02-28 03:46 PST (History)
15 users (show)

See Also:


Attachments
patch (1.91 KB, patch)
2016-12-15 10:40 PST, JF Bastien
no flags Details | Formatted Diff | Diff
patch (2.32 KB, patch)
2016-12-19 20:41 PST, JF Bastien
beidson: review-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description JF Bastien 2016-12-12 22:52:13 PST
This code seems silly:

        unsigned length = 0;
        while (m_characters[length])
            ++length;

        if (length > std::numeric_limits<unsigned>::max())
            CRASH();

I'll clean it up as a follow-up to #147142. I'd appreciate guidance on what a good upper-bound is. IMO getting to anything "big" without finding a 0 means we're severely broken already, so crashing is the right action. I'm just not sure:

  1. Has this ever happened? Even ignoring the unsigned comparison to its maximum, we'd need to have 4GiB of continuous non-zero readable data.
  2. What would a better limit be for strings? I'm not a fan of hard-coded values, but there's got to be a number which ought to be Good Enough for Everyone™.
Comment 1 Darin Adler 2016-12-14 09:27:36 PST
String length overflow checks have indeed been critical for security in the past. We started without having almost any of them in WebKit and added lots of them over the years as we resolved specific security issues, and especially as we started dealing with a mix of 32-bit and 64-bit values.

Some checks that seem obviously impossible to hit actually can be hit quite easily on 64-bit systems with lots of RAM, when code runs that intentionally tries to get the engine to generate a giant string. It’s not immediately obvious from looking at one function what callers might be directly or indirectly controlled by an attacker looking for a vulnerability.

Adding a new lower limit is an interesting project; it’s hard to know what cases might legitimately lead to a long string just from looking at the lowest level. You typically have to look at all the callers to understand what they are doing. Having a lower limit simply changes a "hang" into a "crash", so the benefit of finding a lower value is a modest one, unless we use the limit is some other way, perhaps throwing a JavaScript exception or something like that.

Typically when we add the overflow checks, most don’t try to implement a global “sensible policy”, but rather catch things that actually reach points we know the implementation can’t handle without overflowing or such. It’s much harder to come up with an arbitrarily “biggest normal value” limit. Not that it can’t be done, but it’s more challenging.

Strings stored in WTF::StringImpl have a maximum length smaller than the maximum 32-bit unsigned integer, depending on how the StringImpl is created. The practical limit is in the neighborhood of 2^31.
Comment 2 Darin Adler 2016-12-14 09:29:16 PST
(In reply to comment #0)
> This code seems silly:
> 
>         unsigned length = 0;
>         while (m_characters[length])
>             ++length;
> 
>         if (length > std::numeric_limits<unsigned>::max())
>             CRASH();

Wait, that’s beyond silly. That’s just dead unreachable code. If the length is greater than the maximum unsigned, then we will have already overflowed before checking it. I’m surprised the compiler doesn’t issue an error or warning.

The length local variable had a 64-bit type then the code would be “merely silly”.
Comment 3 Darin Adler 2016-12-14 09:29:34 PST
(In reply to comment #2)
> The length local variable had a 64-bit type then the code would be “merely
> silly”.

If the ...
Comment 4 JF Bastien 2016-12-14 09:30:22 PST
(In reply to comment #2)
> (In reply to comment #0)
> > This code seems silly:
> > 
> >         unsigned length = 0;
> >         while (m_characters[length])
> >             ++length;
> > 
> >         if (length > std::numeric_limits<unsigned>::max())
> >             CRASH();
> 
> Wait, that’s beyond silly. That’s just dead unreachable code. If the length
> is greater than the maximum unsigned, then we will have already overflowed
> before checking it. I’m surprised the compiler doesn’t issue an error or
> warning.
> 
> The length local variable had a 64-bit type then the code would be “merely
> silly”.

:-)
Comment 5 Darin Adler 2016-12-14 09:39:17 PST
I suggest we change this to use size_t and otherwise leave it alone, based on the rationale I gave above.

But perhaps there’s something even better we can do.

Seems unimportant to optimize this case; null-character terminated UChar strings are super rare on iOS and macOS, and come up mostly in when dealing with results of functions on Windows, so this isn’t likely to be used in hot code.
Comment 6 JF Bastien 2016-12-15 10:40:52 PST
Created attachment 297201 [details]
patch

I don't think size_t is a good idea: on 32-bit platforms we're back to the same problem.
On 64-bit platforms we'll just keep counting as we were before: way too much! Better to just call it a day IMO.

If we ever encounter a 1GiB string without a null terminator in it then we're already having a bad time. WDYT?
Comment 7 Darin Adler 2016-12-15 12:45:03 PST
(In reply to comment #6)
> on 32-bit platforms we're back to the same problem

On 32-bit platforms there's no way to overflow a 32-bit usigned integer walking through a string a byte at a time unless the entire computer has no zero bytes in it. So the code is dead code, and code that does no harm. The whole point of the code, I think, is to prevent us from overflowing 32 bits and passing an inappropriately small value to the create function.

> On 64-bit platforms we'll just keep counting as we were before: way too
> much! Better to just call it a day IMO.

I don’t know what “better to call it a day” means in practice.

> If we ever encounter a 1GiB string without a null terminator in it then
> we're already having a bad time. WDYT?

I don’t think we should waste our time choosing an arbitrary limit. I don’t think a 1GiB limit is better than a 4GiB limit.

Maybe you could give a concrete example where you think the smaller limit will make things significantly better?

I think maybe we should could just remove the check entirely, because the symptom of creating a too-small string might be harmless.
Comment 8 Alexey Proskuryakov 2016-12-17 14:49:50 PST
One scenario where we've seen large blobs of data is dealing with files (especially uploading videos to YouTube). Indeed, it seems unlikely that those will ever be passed through this adapter.
Comment 9 JF Bastien 2016-12-19 20:41:56 PST
Created attachment 297504 [details]
patch

You're right, the limit isn't helpful and deleting the check is better. I bumped the length to a size_t though, so at least it doesn't wrap around if we do get a huge string (though as was pointed out the subsequent string may have its own arbitrary limit).
Comment 10 Darin Adler 2016-12-20 12:45:42 PST
Comment on attachment 297504 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=297504&action=review

> Source/WTF/wtf/text/StringConcatenate.h:147
> +    size_t length() const { return m_length; }

Where are the callers to this length function? Did we audit them to make sure they correctly handle size_t rather than unsigned?
Comment 11 Darin Adler 2016-12-21 13:24:22 PST
Comment on attachment 297504 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=297504&action=review

>> Source/WTF/wtf/text/StringConcatenate.h:147
>> +    size_t length() const { return m_length; }
> 
> Where are the callers to this length function? Did we audit them to make sure they correctly handle size_t rather than unsigned?

I will likely immediately r+ this patch once I get the answer to this, either by researching it myself, or by you answering here.
Comment 12 JF Bastien 2016-12-21 13:33:52 PST
(In reply to comment #11)
> Comment on attachment 297504 [details]
> patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=297504&action=review
> 
> >> Source/WTF/wtf/text/StringConcatenate.h:147
> >> +    size_t length() const { return m_length; }
> > 
> > Where are the callers to this length function? Did we audit them to make sure they correctly handle size_t rather than unsigned?
> 
> I will likely immediately r+ this patch once I get the answer to this,
> either by researching it myself, or by you answering here.

Sorry for the slow response rater on this. It's a great question and I want to go through the code properly before getting back to you. I'll do this soon, but have a few other things I want to get to first.
Comment 13 JF Bastien 2018-01-24 09:55:41 PST
Apparently this will be flagged as a tautological compare by upcoming clang:
    length > std::numeric_limits<unsigned>::max()

Should we just remove it for now (since it is tautological) or use size_t as I suggest? I don't have time to audit code as I said I would (then forgot...).
Comment 14 Brady Eidson 2018-02-14 10:36:37 PST
Comment on attachment 297504 [details]
patch

Patches that have been up for review since 2016 are almost certainly too stale to be relevant to trunk in their current form.

If this patch is still important please rebase it and post it for review again.
Comment 15 Filip Pizlo 2018-02-26 21:23:24 PST
(In reply to JF Bastien from comment #13)
> Apparently this will be flagged as a tautological compare by upcoming clang:
>     length > std::numeric_limits<unsigned>::max()
> 
> Should we just remove it for now (since it is tautological) or use size_t as
> I suggest? I don't have time to audit code as I said I would (then
> forgot...).

I think that you should remove the comparison first and then do the size_t thing and audit.
Comment 16 Mark Lam 2018-10-02 13:06:42 PDT
This issue will be handled when we fix https://bugs.webkit.org/show_bug.cgi?id=190187.

*** This bug has been marked as a duplicate of bug 190187 ***