Bug 67640

Summary: ReadAV(NULL) @ chrome.dll!ubrk_setText_46 #3c121a35,7beced32,3f8c5464
Product: WebKit Reporter: Berend-Jan Wever <skylined>
Component: CSSAssignee: Jungshik Shin <jshin>
Status: RESOLVED FIXED    
Severity: Normal CC: ap, cachobot, commit-queue, darin, eric, jshin, mitz
Priority: P1    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Attachments:
Description Flags
Repro
none
patch for review
none
patch with a missing file added
none
patch (fix style nits)
mitz: review-
patch updated per mitz's comments mitz: review+

Description Berend-Jan Wever 2011-09-06 06:40:06 PDT
Created attachment 106412 [details]
Repro

Chromium: https://code.google.com/p/chromium/issues/detail?id=95486

Repro:
<style>
  * {
    -webkit-locale: 'xx-_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx';
    -webkit-text-security: disc;
  }
</style>
xx

There seem to be a number of problems here in WebKit and icu that combine to cause this:
- A missing NULL check in acquireLineBreakIterator,
- A hardcoded size limit for a buffer in ures_open,
- Code that fills a buffer completely, leaving no room for a terminating NULL in _canonicalize.

Source code for WebCore::acquireLineBreakIterator [webkit\source\webcore\platform\text\textbreakiteratoricu.cpp]
--------------
TextBreakIterator* acquireLineBreakIterator(const UChar* string, int length, const AtomicString& locale)
{
    UBreakIterator* iterator = LineBreakIteratorPool::sharedPool().take(locale); /// RETURN NULL

    UErrorCode setTextStatus = U_ZERO_ERROR;
    ubrk_setText(iterator, string, length, &setTextStatus);                      /// NULL ptr crash
    if (U_FAILURE(setTextStatus)) {
        LOG_ERROR("ubrk_setText failed with status %d", setTextStatus);
        return 0;
    }

    return reinterpret_cast<TextBreakIterator*>(iterator);
}
--------------
The code path that sets iterator to NULL is:

WebCore::LineBreakIteratorPool::take [webkit\source\webcore\platform\text\linebreakiteratorpoolicu.h]
ubrk_open [icu\source\common\ubrk.cpp]
BreakIterator::createLineInstance [icu\source\common\brkiter.cpp]
BreakIterator::createInstance [icu\source\common\brkiter.cpp]
BreakIterator::makeInstance [icu\source\common\brkiter.cpp]
BreakIterator::buildInstance [icu\source\common\brkiter.cpp]
ures_open [icu\source\common\uresbund.c]

Parital source code for ures_open:
---------------
U_CAPI UResourceBundle*  U_EXPORT2
ures_open(const char* path,
                    const char* localeID,
                    UErrorCode* status)
{
    char canonLocaleID[100];
    UResourceDataEntry *hasData = NULL;
    UResourceBundle *r;

    if(status == NULL || U_FAILURE(*status)) {
        return NULL;
    }

    /* first "canonicalize" the locale ID */
    uloc_getBaseName(localeID, canonLocaleID, sizeof(canonLocaleID), status); /// SEE BELOW ///
    if(U_FAILURE(*status) || *status == U_STRING_NOT_TERMINATED_WARNING) {
        *status = U_ILLEGAL_ARGUMENT_ERROR;
        return NULL;
    }
--------------
The max locale size for uloc_getBaseName is limited to 100 by the hardcoded "canonLocaleID" buffer. It might be wise to switch to a dynamically allocated buffer here, if possible?

Continuing down the code path:

uloc_getBaseName [icu\source\common\uloc.c]
_canonicalize [icu\source\common\uloc.c]

Parital source code for _canonicalize:
---------------
    return u_terminateChars(result, resultCapacity, len, err);
---------------
"resultCapacity" is the hardcoded 100 limit mentioned above. The locale we provided is 100 chars, so result if filled with 100 chars in the code before this return statement. It might be wise for the code to not fill the entire buffer, but leave some space for the terminate chars? Otherwise, all our efforts have been in vain. Continuing down the code path:

u_terminateChars [icu\source\common\ustring.c]
Source code for u_terminateChars:
----------------
U_CAPI int32_t U_EXPORT2
u_terminateChars(char *dest, int32_t destCapacity, int32_t length, UErrorCode *pErrorCode) {
    __TERMINATE_STRING(dest, destCapacity, length, pErrorCode);
    return length;
}
----------------
#define __TERMINATE_STRING(dest, destCapacity, length, pErrorCode)      \
    if(pErrorCode!=NULL && U_SUCCESS(*pErrorCode)) {                    \
        /* not a public function, so no complete argument checking */   \
                                                                        \
        if(length<0) {                                                  \
            /* assume that the caller handles this */                   \
        } else if(length<destCapacity) {                                \
            /* NUL-terminate the string, the NUL fits */                \
            dest[length]=0;                                             \
            /* unset the not-terminated warning but leave all others */ \
            if(*pErrorCode==U_STRING_NOT_TERMINATED_WARNING) {          \
                *pErrorCode=U_ZERO_ERROR;                               \
            }                                                           \
        } else if(length==destCapacity) {                               \
            /* unable to NUL-terminate, but the string itself fit - set a warning code */ \
            *pErrorCode=U_STRING_NOT_TERMINATED_WARNING;                \
        } else /* length>destCapacity */ {                              \
            /* even the string itself did not fit - set an error code */ \
            *pErrorCode=U_BUFFER_OVERFLOW_ERROR;                        \
        }                                                               \
    }
----------------
Since length is 100 and destCapacity is also 100, pErrorCode is set to U_STRING_NOT_TERMINATED_WARNING. Going back up the code path, from ures_open onwards, functions will return NULL leading to a NULL pointer in ubrk_setText.

ubrk_setText [icu\source\common\ubrk.cpp]
----------------
U_CAPI void U_EXPORT2
ubrk_setText(UBreakIterator* bi,
             const UChar*    text,
             int32_t         textLength,
             UErrorCode*     status)
{
    BreakIterator *brit = (BreakIterator *)bi;                  //// bi and brit are NULL
    UText  ut = UTEXT_INITIALIZER;
    utext_openUChars(&ut, text, textLength, status);
    brit->setText(&ut, *status);                                //// brit is NULL -> crash
    // A stack allocated UText wrapping a UCHar * string
    //   can be dumped without explicitly closing it.
}

It might make sense to add a NULL check here.
Comment 1 Jungshik Shin 2011-09-08 12:44:39 PDT
For builds that cannot patch ICU, we'd better add a null check to TextBreakIteratorICU.cpp. 

What layout test should we add?  The attachment here would suffice?
Comment 2 Jungshik Shin 2012-02-15 11:52:45 PST
I have  a simple patch.
Comment 3 Jungshik Shin 2012-02-22 17:23:41 PST
The ICU-side of change I made is not acceptable because it's not in line with ICU C API convention about 'NULL' check.  See  http://bugs.icu-project.org/trac/ticket/9115

I'll upload a webkit change to better handle this situation.
Comment 4 Jungshik Shin 2012-02-23 16:51:54 PST
Created attachment 128604 [details]
patch for review
Comment 5 Jungshik Shin 2012-02-23 16:56:43 PST
Created attachment 128609 [details]
patch with a missing file added
Comment 6 Jungshik Shin 2012-02-23 17:04:17 PST
This bug affects all webkit ports that rely on ICU including Safari on Mac. 

mitz, if you can review it, it'll be greatly appreciated.
Comment 7 WebKit Commit Bot 2012-02-24 02:14:04 PST
Attachment 128609 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'LayoutTests/ChangeLog', u'LayoutTests/fast..." exit_code: 1
LayoutTests/ChangeLog:4:  Line contains tab character.  [whitespace/tab] [5]
LayoutTests/ChangeLog:5:  Line contains tab character.  [whitespace/tab] [5]
LayoutTests/ChangeLog:6:  Line contains tab character.  [whitespace/tab] [5]
Source/WebCore/ChangeLog:4:  Line contains tab character.  [whitespace/tab] [5]
Source/WebCore/ChangeLog:5:  Line contains tab character.  [whitespace/tab] [5]
Source/WebCore/ChangeLog:6:  Line contains tab character.  [whitespace/tab] [5]
Total errors found: 6 in 8 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 8 Jungshik Shin 2012-02-24 16:09:57 PST
Created attachment 128825 [details]
patch (fix style nits)
Comment 9 Jungshik Shin 2012-02-26 22:42:12 PST
Darin, can you take a look in case mitz cannot?  Thanks
Comment 10 mitz 2012-02-26 23:26:37 PST
Comment on attachment 128825 [details]
patch (fix style nits)

View in context: https://bugs.webkit.org/attachment.cgi?id=128825&action=review

Looks good, r- because I think you’re not handling an error case that was previously handled.

> Source/WebCore/platform/text/LineBreakIteratorPoolICU.h:64
> +            bool isLocaleEmpty = locale.isEmpty();

I think the question phrase “isLocaleEmpty” isn’t a great name for this variable. You could call it “localeIsEmpty”.

> Source/WebCore/platform/text/LineBreakIteratorPoolICU.h:67
> +            // locale comes from a web page and it can be invalid leading ICU
> +            // to fail, in which case we fall back to the default locale.

You’re missing a comma after “invalid”.

> Source/WebCore/platform/text/LineBreakIteratorPoolICU.h:75
> +                if (U_FAILURE(openStatus)) {
> +                    LOG_ERROR("ubrk_open failed with status %d", openStatus);
> +                    return 0;
> +                }

This should go outside the “if (!isLocalEmpty && U_FAILURE(openStatus))” block, otherwise we’re not handling the case of ubrk_open failing with the currentTextBreakLocaleID() when locale *is* empty.
Comment 11 Jungshik Shin 2012-02-28 10:54:48 PST
Created attachment 129290 [details]
patch updated per mitz's comments

mitz, thanks for the review.  Can you take another look?
Comment 12 mitz 2012-02-28 11:24:20 PST
Comment on attachment 129290 [details]
patch updated per mitz's comments

r=me!
Comment 13 Jungshik Shin 2012-03-12 10:23:49 PDT
Landed on Feb 28 as 

http://trac.webkit.org/changeset/109144