Bug 88936

Summary: StringImpl::characters can return NULL for an empty string
Product: WebKit Reporter: Myles C. Maxfield <litherum>
Component: Web Template FrameworkAssignee: Myles C. Maxfield <litherum>
Status: UNCONFIRMED ---    
Severity: Normal CC: ap, benjamin, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch benjamin: review-, benjamin: commit-queue-

Description Myles C. Maxfield 2012-06-12 16:43:55 PDT
StringImpl::characters can return NULL for an empty string
Comment 1 Myles C. Maxfield 2012-06-12 16:45:59 PDT
Created attachment 147191 [details]
Patch
Comment 2 Myles C. Maxfield 2012-06-12 16:50:29 PDT
This is pretty hacky, but I'm not sure I can think of a better way.

Callers of characters() shouldn't be free()ing the pointer, so this should be relatively safe.

One alternative would be to assign a non-NULL value to the m_copyData16 variable, and check for it on destruction, but I think this is a little cleaner. 

Another alternative would be to add a "is-valid" byte to each string, but the overhead of an extra byte to each string is probably not worth it.

If a reviewer can think of a less-hacky way to handle this, I'd be happy to do that instead.
Comment 3 Alexey Proskuryakov 2012-06-13 10:46:32 PDT
The bug says that the function can return NULL, but not why it should. Can you explain that?
Comment 4 Myles C. Maxfield 2012-06-13 12:24:17 PDT
fastMalloc(x) can call malloc(x). According to C99 (about malloc): "If the size of the space requested is zero, the behavior is implementation- defined: either a null pointer is returned, or the behavior is as if the size were some nonzero value, except that the returned pointer shall not be used to access an object."

Should I add this to the comment?
Comment 5 Myles C. Maxfield 2012-06-13 13:50:19 PDT
Moving this check into ICU wrapper functions, since they'll be not as hot as characters()
Comment 6 Myles C. Maxfield 2012-06-13 14:21:13 PDT
Created attachment 147410 [details]
Patch
Comment 7 Ryosuke Niwa 2012-06-13 14:33:54 PDT
Comment on attachment 147410 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=147410&action=review

> Source/WTF/wtf/unicode/icu/CollatorICU.cpp:106
> +    if (!lhs && !lhsLength)
> +      lhs = (const UChar*)1;
> +    if (!rhs && !rhsLength)
> +      rhs = (const UChar*)1;

Please use "" here instead.  I don't see any value in using a made up pointer value like 1.
Comment 8 Myles C. Maxfield 2012-06-13 15:00:44 PDT
Created attachment 147419 [details]
Patch
Comment 9 Darin Adler 2012-06-13 15:29:29 PDT
Comment on attachment 147419 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=147419&action=review

> Source/WTF/wtf/unicode/icu/CollatorICU.cpp:101
> +    // The ICU functions have the property where they assume that a null pointer means an invalid string
> +    // (and therefore won't do the comparison). A null pointer could come about here if an empty string
> +    // was allocated with a malloc() implementation that returns null on a zero-sized malloc (which is
> +    // valid according to C99 section 7.20.3). Therefore, we have to change any valid null pointers before
> +    // passing them to ICU.

Comment is much too long. Should say something more like this:

    // ICU does not allow null pointers for empty strings, but we do.

> Source/WTF/wtf/unicode/icu/CollatorICU.cpp:106
> +    if (!lhs && !lhsLength)
> +      lhs = (const UChar*)"";
> +    if (!rhs && !rhsLength)
> +      rhs = (const UChar*)"";

This is wrong. You can’t just cast the pointer to an empty C string to a UChar* and expect it to work. That will read off the end of the buffer.
Comment 10 Darin Adler 2012-06-13 15:31:31 PDT
Comment on attachment 147419 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=147419&action=review

>> Source/WTF/wtf/unicode/icu/CollatorICU.cpp:106
>> +      rhs = (const UChar*)"";
> 
> This is wrong. You can’t just cast the pointer to an empty C string to a UChar* and expect it to work. That will read off the end of the buffer.

Oh, I see, maybe I am wrong. This can literally be any pointer other than a null pointer! Still, do we really need a typecast? I suggest this:

    UChar character;
    if (!lhsLength)
        lhs = &character;
    if (!rhsLength)
        rhs = &character;

Lets avoid that messy casting and also avoid making the conditions too complex.
Comment 11 Myles C. Maxfield 2012-06-13 15:41:00 PDT
Created attachment 147430 [details]
Patch
Comment 12 Alexey Proskuryakov 2012-06-13 15:56:31 PDT
Comment on attachment 147430 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=147430&action=review

> Source/WTF/wtf/unicode/icu/CollatorICU.cpp:99
> +    UChar dummy;

Can we make it static? Pointers pointing to random places are scary.
Comment 13 Myles C. Maxfield 2012-06-13 16:09:05 PDT
Well, it's the address of a stack variable, so it's not random, but I'll make it static anyway to marginally decrease the size of the stack during the duration of this call :-]
Comment 14 Myles C. Maxfield 2012-06-13 16:09:22 PDT
Created attachment 147440 [details]
Patch
Comment 15 Darin Adler 2012-06-13 18:34:29 PDT
Comment on attachment 147440 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=147440&action=review

> Source/WTF/wtf/unicode/icu/CollatorICU.cpp:99
> +    static UChar dummy;

No need to make this static. A pointer to the stack is just as good as a pointer to a global variable.

> Source/WTF/wtf/unicode/icu/CollatorICU.cpp:100
> +    if (!lhs && !lhsLength)

We don’t need to check both of these. Doesn’t matter which we check, but I don’t want an extra branch to check both. My preference would be to only check the length.

> Source/WTF/wtf/unicode/icu/CollatorICU.cpp:101
> +      lhs = &dummy;

WebKit project indents 4 spaces, not 2.

> Source/WTF/wtf/unicode/icu/CollatorICU.cpp:103
> +    if (!rhs && !rhsLength)
> +      rhs = &dummy;

Same comments again.
Comment 16 Myles C. Maxfield 2012-06-14 09:55:24 PDT
Created attachment 147608 [details]
Patch
Comment 17 Myles C. Maxfield 2012-06-18 13:08:22 PDT
Ping?
Comment 18 Eric Seidel (no email) 2012-06-19 20:56:20 PDT
Comment on attachment 147608 [details]
Patch

How can we test this?
Comment 19 Myles C. Maxfield 2012-06-20 12:40:06 PDT
I found this by running V8's mjsunit tests against JavaScriptCore. This was uncovered by http://v8.googlecode.com/svn/trunk/test/mjsunit/deep-recursion.js lines 52 and 55.

I suppose there are a couple options here:
1) Re-write the test as a C++ unit test, creating a Collator, calling collate with (null, 0, "a", 1) and making sure it doesn't return "Equal." I could also reverse the arguments and make sure the output is reversed. I'm not sure where this unit test should end up living.

2) Add all of mjsunit to the JavaScriptCore tests inside Source/JavaScriptCore/tests

3) Add just deep-recursion.js to the JavaScriptCore tests inside Source/JavaScriptCore/tests

4) Add a comment saying this is tested in v8's mjsunit ;-)

This is complicated by the fact that it looks like the only tests in Source/JavaScriptCore/tests are regular expression tests (not applicable here), performance tests (also not applicable here) and the mozilla test suite. This test doesn't belong in any of those places. Also, adding existing source might get complicated because of possibly-conflicting licenses.

I'm willing to do whatever the community thinks is best.
Comment 20 Myles C. Maxfield 2012-06-28 16:04:08 PDT
Ping?
Comment 21 Darin Adler 2012-08-14 16:27:21 PDT
Comment on attachment 147608 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=147608&action=review

> Source/WTF/ChangeLog:9
> +        * wtf/text/StringImpl.h:
> +        (WTF::StringImpl::characters):

This is not the function nor the file we are modifying.
Comment 22 Myles C. Maxfield 2012-08-25 16:42:19 PDT
Created attachment 160578 [details]
Patch
Comment 23 Benjamin Poulain 2012-08-25 17:18:18 PDT
Comment on attachment 160578 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=160578&action=review

I would:
-modify Collator::collate to take two String or StringImpl instead of the raw data
-add the special cases based on the string classes. Why do we even have to invoke ucol_strcoll if one of the string is empty?
-Add a layout test and if possible add a test through Webkit test API.

Do we need ucol_strcoll if the two strings are 8bits anyway?

> Source/WTF/ChangeLog:8
> +        Additional information of the change such as approach, rationale. Please add per-function descriptions below (OOPS!).

No description.

> Source/WTF/wtf/unicode/icu/CollatorICU.cpp:98
> +

blank line.

> Source/WTF/wtf/unicode/icu/CollatorICU.cpp:99
> +    UChar dummy;

UChar dummy = 0;
Comment 24 Myles C. Maxfield 2012-08-26 01:13:40 PDT
Darin had previously marked the patch as review+. Can you (Benjamin) and him (Darin) come to an agreement about what should be done? I'd be happy to implement whatever solution is best, but I'm not sure what that is.

Thanks,
Myles