211142 – U_STRING_NOT_TERMINATED_WARNING ICU must be handled when using the output buffer as a C string

Saam Barati

Reported 2020-04-28 12:48:17 PDT

We have code that checks if we got the U_BUFFER_OVERFLOW_ERROR, and we grow the output buffer and try again when we see such a "warning". We also detect "failure" using U_FAILURE, which doesn't consdier U_STRING_NOT_TERMINATED_WARNING as failure, which seems crazy at first glance, and is leading to bugs in at least some parts of JSC. There are 118 uses of U_BUFFER_OVERFLOW_ERROR in WebKit There are 211 uses of U_FAILURE in WebKit I think we need to audit all of them

Saam Barati

Comment 1 2020-04-28 12:52:17 PDT

Some examples that appear troublesome: ``` String LocaleICU::decimalSymbol(UNumberFormatSymbol symbol) { UErrorCode status = U_ZERO_ERROR; int32_t bufferLength = unum_getSymbol(m_numberFormat, symbol, 0, 0, &status); ASSERT(U_SUCCESS(status) || status == U_BUFFER_OVERFLOW_ERROR); if (U_FAILURE(status) && status != U_BUFFER_OVERFLOW_ERROR) return String(); Vector<UChar> buffer(bufferLength); status = U_ZERO_ERROR; unum_getSymbol(m_numberFormat, symbol, buffer.data(), bufferLength, &status); if (U_FAILURE(status)) return String(); return String::adopt(WTFMove(buffer)); } ``` ``` int32_t parsedLength; auto intermediateLength = uloc_forLanguageTag(input.data(), intermediate.data(), intermediate.size(), &parsedLength, &status); if (status == U_BUFFER_OVERFLOW_ERROR) { intermediate.resize(intermediateLength + 1); status = U_ZERO_ERROR; uloc_forLanguageTag(input.data(), intermediate.data(), intermediateLength + 1, &parsedLength, &status); } if (U_FAILURE(status) || parsedLength != static_cast<int32_t>(input.length())) return String(); ```

Darin Adler

Comment 2 2020-04-28 12:53:26 PDT

I am puzzled by this. What specifically goes wrong in what case?

Saam Barati

Comment 3 2020-04-28 12:53:53 PDT

(In reply to Saam Barati from comment #1) > Some examples that appear troublesome: > ``` > String LocaleICU::decimalSymbol(UNumberFormatSymbol symbol) > { > UErrorCode status = U_ZERO_ERROR; > int32_t bufferLength = unum_getSymbol(m_numberFormat, symbol, 0, 0, > &status); > ASSERT(U_SUCCESS(status) || status == U_BUFFER_OVERFLOW_ERROR); > if (U_FAILURE(status) && status != U_BUFFER_OVERFLOW_ERROR) > return String(); > Vector<UChar> buffer(bufferLength); > status = U_ZERO_ERROR; > unum_getSymbol(m_numberFormat, symbol, buffer.data(), bufferLength, > &status); > if (U_FAILURE(status)) > return String(); > return String::adopt(WTFMove(buffer)); > } > ``` Actually this one might be ok, since we don't rely on the buffer being a valid C string > > ``` > int32_t parsedLength; > auto intermediateLength = uloc_forLanguageTag(input.data(), > intermediate.data(), intermediate.size(), &parsedLength, &status); > if (status == U_BUFFER_OVERFLOW_ERROR) { > intermediate.resize(intermediateLength + 1); > status = U_ZERO_ERROR; > uloc_forLanguageTag(input.data(), intermediate.data(), > intermediateLength + 1, &parsedLength, &status); > } > if (U_FAILURE(status) || parsedLength != > static_cast<int32_t>(input.length())) > return String(); > ``` This isn't ok, since we do rely on it being a C string

Darin Adler

Comment 4 2020-04-28 12:54:01 PDT

U_STRING_NOT_TERMINATED_WARNING is only relevant if we rely on a 0 character terminator, and we do not rely on that in the code above.

Darin Adler

Comment 5 2020-04-28 12:54:16 PDT

I’m not sure there is any problem here at all.

Saam Barati

Comment 6 2020-04-28 12:54:59 PDT

(In reply to Darin Adler from comment #2) > I am puzzled by this. What specifically goes wrong in what case? We will take the output of an ICU call and trust that it's a C string. See: ``` UErrorCode status = U_ZERO_ERROR; Vector<char, 32> intermediate(31); int32_t parsedLength; auto intermediateLength = uloc_forLanguageTag(input.data(), intermediate.data(), intermediate.size(), &parsedLength, &status); if (status == U_BUFFER_OVERFLOW_ERROR) { intermediate.resize(intermediateLength + 1); status = U_ZERO_ERROR; uloc_forLanguageTag(input.data(), intermediate.data(), intermediateLength + 1, &parsedLength, &status); } if (U_FAILURE(status) || parsedLength != static_cast<int32_t>(input.length())) return String(); Vector<char, 32> result(32); auto resultLength = uloc_toLanguageTag(intermediate.data(), result.data(), result.size(), false, &status); if (status == U_BUFFER_OVERFLOW_ERROR) { result.grow(resultLength); status = U_ZERO_ERROR; uloc_toLanguageTag(intermediate.data(), result.data(), resultLength, false, &status); } if (U_FAILURE(status)) return String(); ```

Darin Adler

Comment 7 2020-04-28 12:57:35 PDT

(In reply to Saam Barati from comment #3) > > if (U_FAILURE(status) || parsedLength != > > static_cast<int32_t>(input.length())) > > return String(); > > ``` > > This isn't ok, since we do rely on it being a C string I see now. That’s right! That case is wrong. I’d be surprised if there are a lot of other examples like this, but there may be a few in the new Intl code.

Darin Adler

Comment 8 2020-04-28 12:58:38 PDT

(In reply to Saam Barati from comment #6) > We will take the output of an ICU call and trust that it's a C string. I think it’s great to audit carefully. But I think it’s unlikely that we’ve made this mistake a lot in the past.

Saam Barati

Comment 9 2020-04-28 12:59:00 PDT

(In reply to Darin Adler from comment #7) > (In reply to Saam Barati from comment #3) > > > if (U_FAILURE(status) || parsedLength != > > > static_cast<int32_t>(input.length())) > > > return String(); > > > ``` > > > > This isn't ok, since we do rely on it being a C string > > I see now. That’s right! That case is wrong. I’d be surprised if there are a > lot of other examples like this, but there may be a few in the new Intl code. Yeah my initial reaction of this being terribly wrong was off-base. However, I've spotted a handful in JSC's Intl code already, so there exist a non-trivial amount. It's probably worth doing at least a basic audit of all uses of this.

Saam Barati

Comment 10 2020-04-28 12:59:29 PDT

(In reply to Darin Adler from comment #8) > (In reply to Saam Barati from comment #6) > > We will take the output of an ICU call and trust that it's a C string. > > I think it’s great to audit carefully. But I think it’s unlikely that we’ve > made this mistake a lot in the past. Yeah it seems like we do the right thing in most places, maybe except JSC's Intl code.

Ross Kirsling

Comment 11 2020-04-28 13:14:38 PDT

I'll note that that uloc_forLanguageTag -> uloc_toLanguageTag case was particularly tricky to get working safely (hence the comment I added) so any ideas on how to make it less fragile would be great. The other case I'm aware of where we pass the C string from one ICU API directly into another is here: https://github.com/WebKit/webkit/blob/master/Source/JavaScriptCore/runtime/IntlDateTimeFormat.cpp#L115-L144 But this case is still less horrible because the second call takes an input length.

Darin Adler

Comment 12 2020-04-28 13:35:39 PDT

I want to quibble with the language here. Let’s stay logical: there’s no need for fuzziness here. If we need a null-terminated string, we *must* treat U_STRING_NOT_TERMINATED_WARNING as a “buffer too small” case; unless there is some other way we are adding a null terminator. If we do *not* need a null-terminated string, then we can and should ignore that warning. (In reply to Ross Kirsling from comment #11) > But this case is still less horrible because the second call takes an input > length. Here’s my quibble. That case is not just “less horrible”. It is *correct*. The other case is *incorrect*.

Saam Barati

Comment 13 2020-04-28 16:37:06 PDT

<rdar://problem/62530860>

Saam Barati

Comment 14 2020-04-28 16:45:48 PDT

Created attachment 397904 [details] patch

Darin Adler

Comment 15 2020-04-28 17:04:00 PDT

Comment on attachment 397904 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=397904&action=review Honestly I think needsToGrowToProduceBuffer doesn’t make things easier to read. It’s mainly valuable as a record of the fact that you audited a given call site. On the other hand, needsToGrowToProduceCString is definitely valuable. But only in exactly one place. I think adding the header is overkill, but I won’t stand in your way if you still like it. > Source/JavaScriptCore/runtime/IntlObject.cpp:334 > + if (U_FAILURE(status) || status == U_STRING_NOT_TERMINATED_WARNING || parsedLength != static_cast<int32_t>(input.length())) How can U_STRING_NOT_TERMINATED_WARNING happen here? Seems like we can just leave this out or simply assert it doesn’t happen. We are using the length it returned, so there is no chance it will happen. > Source/JavaScriptCore/runtime/IntlObject.cpp:346 > +#if ASSERT_ENABLED > + bool sawNullTerminator = false; > + for (char c : intermediate) { > + if (c == '\0') { > + sawNullTerminator = true; > + break; > + } > + } > + ASSERT(sawNullTerminator); > +#endif Alternate implementation that does the same thing: ASSERT(intermediate.contains('\0')); > Source/WTF/wtf/text/LineBreakIteratorPoolICU.h:78 > + if (needsToGrowToProduceCString(status)) { This doesn’t seem right. I don’t see code depending on scratchBuffer being null terminated. > Source/WTF/wtf/text/LineBreakIteratorPoolICU.h:79 > scratchBuffer.grow(lengthNeeded + 1); Seems like this should just be lengthNeeded, not + 1.

Mark Lam

Comment 16 2020-04-28 19:52:40 PDT

Comment on attachment 397904 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=397904&action=review > Source/JavaScriptCore/runtime/IntlObject.cpp:154 > - return String(StringImpl::createStaticStringImpl(buffer.data(), length)); > + return String(buffer.data(), length); This is wrong. convertICULocaleToBCP47LanguageTag() is only called from inside std::call_once guarded blocks i.e. it's meant to produce immortal singleton strings. Before I changed it to be StringImpl::createStaticStringImpl(), we had UAFs due to a user of this string freeing it.

Mark Lam

Comment 17 2020-04-28 20:02:53 PDT

Comment on attachment 397904 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=397904&action=review >> Source/JavaScriptCore/runtime/IntlObject.cpp:154 >> + return String(buffer.data(), length); > > This is wrong. convertICULocaleToBCP47LanguageTag() is only called from inside std::call_once guarded blocks i.e. it's meant to produce immortal singleton strings. Before I changed it to be StringImpl::createStaticStringImpl(), we had UAFs due to a user of this string freeing it. I talked to Saam offline, but for the record, there's another important property of StaticStringImpls i.e. that they are safe to be ref'ed and deref'ed from different threads without locks. That's the most important reason this needs to be a StaticStringImpl because it is a singleton that can be used from different threads.

Mark Lam

Comment 18 2020-04-28 20:04:56 PDT

Comment on attachment 397904 [details] patch Restoring Darin's r+ since Saam said he will undo the StringImpl::createStaticStringImpl() change.

Saam Barati

Comment 19 2020-04-28 20:11:26 PDT

Comment on attachment 397904 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=397904&action=review >> Source/JavaScriptCore/runtime/IntlObject.cpp:154 >> + return String(buffer.data(), length); > > This is wrong. convertICULocaleToBCP47LanguageTag() is only called from inside std::call_once guarded blocks i.e. it's meant to produce immortal singleton strings. Before I changed it to be StringImpl::createStaticStringImpl(), we had UAFs due to a user of this string freeing it. As Mark and I discussed on Slack, the actual issue here isn't being called from call_once, but the fact that the resulting string may ref/deref from more than one thread at once. >> Source/JavaScriptCore/runtime/IntlObject.cpp:334 >> + if (U_FAILURE(status) || status == U_STRING_NOT_TERMINATED_WARNING || parsedLength != static_cast<int32_t>(input.length())) > > How can U_STRING_NOT_TERMINATED_WARNING happen here? Seems like we can just leave this out or simply assert it doesn’t happen. We are using the length it returned, so there is no chance it will happen. My thinking here was just ICU failing for some reason, but this would be a weird way for it to fail. >> Source/JavaScriptCore/runtime/IntlObject.cpp:346 >> +#endif > > Alternate implementation that does the same thing: > > ASSERT(intermediate.contains('\0')); That'd be too easy! >> Source/WTF/wtf/text/LineBreakIteratorPoolICU.h:78 >> + if (needsToGrowToProduceCString(status)) { > > This doesn’t seem right. I don’t see code depending on scratchBuffer being null terminated. Yeah it's not needed. I got confused with the AtomString::fromUTF8 version that doesn't take "length" that ends up calling strlen.

Saam Barati

Comment 20 2020-04-28 20:13:34 PDT

Comment on attachment 397904 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=397904&action=review >>> Source/JavaScriptCore/runtime/IntlObject.cpp:334 >>> + if (U_FAILURE(status) || status == U_STRING_NOT_TERMINATED_WARNING || parsedLength != static_cast<int32_t>(input.length())) >> >> How can U_STRING_NOT_TERMINATED_WARNING happen here? Seems like we can just leave this out or simply assert it doesn’t happen. We are using the length it returned, so there is no chance it will happen. > > My thinking here was just ICU failing for some reason, but this would be a weird way for it to fail. I guess part of my reasoning. is that if for some reason ICU again hit this warning, that "status == U_STRING_NOT_TERMINATED_WARNING" is just as bad as U_FAILURE

Saam Barati

Comment 21 2020-04-28 20:39:57 PDT

(In reply to Darin Adler from comment #15) > Comment on attachment 397904 [details] > patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=397904&action=review > > Honestly I think needsToGrowToProduceBuffer doesn’t make things easier to > read. It’s mainly valuable as a record of the fact that you audited a given > call site. On the other hand, needsToGrowToProduceCString is definitely > valuable. But only in exactly one place. I think adding the header is > overkill, but I won’t stand in your way if you still like it. I'm going to keep "needsToGrowToProduceBuffer". The reason I like it is that if you end writing code to use it, it's (hopefully) a forcing function to consider if you should instead be using needsToGrowToProduceCString. > > > Source/JavaScriptCore/runtime/IntlObject.cpp:334 > > + if (U_FAILURE(status) || status == U_STRING_NOT_TERMINATED_WARNING || parsedLength != static_cast<int32_t>(input.length())) > > How can U_STRING_NOT_TERMINATED_WARNING happen here? Seems like we can just > leave this out or simply assert it doesn’t happen. We are using the length > it returned, so there is no chance it will happen. > > > Source/JavaScriptCore/runtime/IntlObject.cpp:346 > > +#if ASSERT_ENABLED > > + bool sawNullTerminator = false; > > + for (char c : intermediate) { > > + if (c == '\0') { > > + sawNullTerminator = true; > > + break; > > + } > > + } > > + ASSERT(sawNullTerminator); > > +#endif > > Alternate implementation that does the same thing: > > ASSERT(intermediate.contains('\0')); > > > Source/WTF/wtf/text/LineBreakIteratorPoolICU.h:78 > > + if (needsToGrowToProduceCString(status)) { > > This doesn’t seem right. I don’t see code depending on scratchBuffer being > null terminated. > > > Source/WTF/wtf/text/LineBreakIteratorPoolICU.h:79 > > scratchBuffer.grow(lengthNeeded + 1); > > Seems like this should just be lengthNeeded, not + 1.

Saam Barati

Comment 22 2020-04-28 20:48:48 PDT

Comment on attachment 397904 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=397904&action=review >> Source/WTF/wtf/text/LineBreakIteratorPoolICU.h:79 >> scratchBuffer.grow(lengthNeeded + 1); > > Seems like this should just be lengthNeeded, not + 1. I believe this is needed. But this is my first time reading ICU documentation, and it's not crystal clear exactly what the invariants are. My reading, and slight guessing, given the ambiguity, at what ICU means by "input buffer containing well-formed locale ID to be modified" is that it's null-terminated. There is also this warning: "NOTE: Unlike almost every other ICU function which takes a buffer, this function will NOT truncate the output text, and will not update the buffer with unterminated text setting a status of U_STRING_NOT_TERMINATED_WARNING.". So I think we need to give it the capacity to write the terminating byte otherwise it won't write any output I'm reading the documentation here: https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/uloc_8h.html#a4ba76d26bf66ef7629a649f6dfc3cc93

Saam Barati

Comment 23 2020-04-28 21:22:37 PDT

(In reply to Saam Barati from comment #22) > Comment on attachment 397904 [details] > patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=397904&action=review > > >> Source/WTF/wtf/text/LineBreakIteratorPoolICU.h:79 > >> scratchBuffer.grow(lengthNeeded + 1); > > > > Seems like this should just be lengthNeeded, not + 1. > > I believe this is needed. But this is my first time reading ICU > documentation, and it's not crystal clear exactly what the invariants are. > My reading, and slight guessing, given the ambiguity, at what ICU means by > "input buffer containing well-formed locale ID to be modified" is that it's > null-terminated. > There is also this warning: "NOTE: Unlike almost every other ICU function > which takes a buffer, this function will NOT truncate the output text, and > will not update the buffer with unterminated text setting a status of > U_STRING_NOT_TERMINATED_WARNING.". So I think we need to give it the > capacity to write the terminating byte otherwise it won't write any output > > I'm reading the documentation here: > https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/uloc_8h. > html#a4ba76d26bf66ef7629a649f6dfc3cc93 I misread their "NOTE". As Ross pointed out on Slack, ICU is calling out that they won't set such a warning. I read this as having a comma between "text" and "setting".

Saam Barati

Comment 24 2020-04-28 21:26:07 PDT

Created attachment 397921 [details] patch for landing

EWS

Comment 25 2020-04-29 00:15:23 PDT

Committed r260882: <https://trac.webkit.org/changeset/260882> All reviewed patches have been landed. Closing bug and clearing flags on attachment 397921 [details].

Attachments
patch (28.49 KB, patch) 2020-04-28 16:45 PDT, Saam Barati	no flags	Details Formatted Diff Diff
patch for landing (28.31 KB, patch) 2020-04-28 21:26 PDT, Saam Barati	no flags	Details Formatted Diff Diff
Show Obsolete (1) View All Add attachment proposed patch, testcase, etc.

andy
annulen
benjamin
cdumez
cmarcelo
darin
ews-watchlist
fpizlo
ggaren
gskachkov
guijemont
gyuyoung.kim
keith_miller
mark.lam
mifenton
msaboff
rmorisset
ross.kirsling
ryuan.choi
sergio
ticaiolima
tzagallo
webkit-bug-importer
ysuzuki