We have code that checks if we got the U_BUFFER_OVERFLOW_ERROR, and we grow the output buffer and try again when we see such a "warning". We also detect "failure" using U_FAILURE, which doesn't consdier U_STRING_NOT_TERMINATED_WARNING as failure, which seems crazy at first glance, and is leading to bugs in at least some parts of JSC. There are 118 uses of U_BUFFER_OVERFLOW_ERROR in WebKit There are 211 uses of U_FAILURE in WebKit I think we need to audit all of them
Some examples that appear troublesome: ``` String LocaleICU::decimalSymbol(UNumberFormatSymbol symbol) { UErrorCode status = U_ZERO_ERROR; int32_t bufferLength = unum_getSymbol(m_numberFormat, symbol, 0, 0, &status); ASSERT(U_SUCCESS(status) || status == U_BUFFER_OVERFLOW_ERROR); if (U_FAILURE(status) && status != U_BUFFER_OVERFLOW_ERROR) return String(); Vector<UChar> buffer(bufferLength); status = U_ZERO_ERROR; unum_getSymbol(m_numberFormat, symbol, buffer.data(), bufferLength, &status); if (U_FAILURE(status)) return String(); return String::adopt(WTFMove(buffer)); } ``` ``` int32_t parsedLength; auto intermediateLength = uloc_forLanguageTag(input.data(), intermediate.data(), intermediate.size(), &parsedLength, &status); if (status == U_BUFFER_OVERFLOW_ERROR) { intermediate.resize(intermediateLength + 1); status = U_ZERO_ERROR; uloc_forLanguageTag(input.data(), intermediate.data(), intermediateLength + 1, &parsedLength, &status); } if (U_FAILURE(status) || parsedLength != static_cast<int32_t>(input.length())) return String(); ```
I am puzzled by this. What specifically goes wrong in what case?
(In reply to Saam Barati from comment #1) > Some examples that appear troublesome: > ``` > String LocaleICU::decimalSymbol(UNumberFormatSymbol symbol) > { > UErrorCode status = U_ZERO_ERROR; > int32_t bufferLength = unum_getSymbol(m_numberFormat, symbol, 0, 0, > &status); > ASSERT(U_SUCCESS(status) || status == U_BUFFER_OVERFLOW_ERROR); > if (U_FAILURE(status) && status != U_BUFFER_OVERFLOW_ERROR) > return String(); > Vector<UChar> buffer(bufferLength); > status = U_ZERO_ERROR; > unum_getSymbol(m_numberFormat, symbol, buffer.data(), bufferLength, > &status); > if (U_FAILURE(status)) > return String(); > return String::adopt(WTFMove(buffer)); > } > ``` Actually this one might be ok, since we don't rely on the buffer being a valid C string > > ``` > int32_t parsedLength; > auto intermediateLength = uloc_forLanguageTag(input.data(), > intermediate.data(), intermediate.size(), &parsedLength, &status); > if (status == U_BUFFER_OVERFLOW_ERROR) { > intermediate.resize(intermediateLength + 1); > status = U_ZERO_ERROR; > uloc_forLanguageTag(input.data(), intermediate.data(), > intermediateLength + 1, &parsedLength, &status); > } > if (U_FAILURE(status) || parsedLength != > static_cast<int32_t>(input.length())) > return String(); > ``` This isn't ok, since we do rely on it being a C string
U_STRING_NOT_TERMINATED_WARNING is only relevant if we rely on a 0 character terminator, and we do not rely on that in the code above.
I’m not sure there is any problem here at all.
(In reply to Darin Adler from comment #2) > I am puzzled by this. What specifically goes wrong in what case? We will take the output of an ICU call and trust that it's a C string. See: ``` UErrorCode status = U_ZERO_ERROR; Vector<char, 32> intermediate(31); int32_t parsedLength; auto intermediateLength = uloc_forLanguageTag(input.data(), intermediate.data(), intermediate.size(), &parsedLength, &status); if (status == U_BUFFER_OVERFLOW_ERROR) { intermediate.resize(intermediateLength + 1); status = U_ZERO_ERROR; uloc_forLanguageTag(input.data(), intermediate.data(), intermediateLength + 1, &parsedLength, &status); } if (U_FAILURE(status) || parsedLength != static_cast<int32_t>(input.length())) return String(); Vector<char, 32> result(32); auto resultLength = uloc_toLanguageTag(intermediate.data(), result.data(), result.size(), false, &status); if (status == U_BUFFER_OVERFLOW_ERROR) { result.grow(resultLength); status = U_ZERO_ERROR; uloc_toLanguageTag(intermediate.data(), result.data(), resultLength, false, &status); } if (U_FAILURE(status)) return String(); ```
(In reply to Saam Barati from comment #3) > > if (U_FAILURE(status) || parsedLength != > > static_cast<int32_t>(input.length())) > > return String(); > > ``` > > This isn't ok, since we do rely on it being a C string I see now. That’s right! That case is wrong. I’d be surprised if there are a lot of other examples like this, but there may be a few in the new Intl code.
(In reply to Saam Barati from comment #6) > We will take the output of an ICU call and trust that it's a C string. I think it’s great to audit carefully. But I think it’s unlikely that we’ve made this mistake a lot in the past.
(In reply to Darin Adler from comment #7) > (In reply to Saam Barati from comment #3) > > > if (U_FAILURE(status) || parsedLength != > > > static_cast<int32_t>(input.length())) > > > return String(); > > > ``` > > > > This isn't ok, since we do rely on it being a C string > > I see now. That’s right! That case is wrong. I’d be surprised if there are a > lot of other examples like this, but there may be a few in the new Intl code. Yeah my initial reaction of this being terribly wrong was off-base. However, I've spotted a handful in JSC's Intl code already, so there exist a non-trivial amount. It's probably worth doing at least a basic audit of all uses of this.
(In reply to Darin Adler from comment #8) > (In reply to Saam Barati from comment #6) > > We will take the output of an ICU call and trust that it's a C string. > > I think it’s great to audit carefully. But I think it’s unlikely that we’ve > made this mistake a lot in the past. Yeah it seems like we do the right thing in most places, maybe except JSC's Intl code.
I'll note that that uloc_forLanguageTag -> uloc_toLanguageTag case was particularly tricky to get working safely (hence the comment I added) so any ideas on how to make it less fragile would be great. The other case I'm aware of where we pass the C string from one ICU API directly into another is here: https://github.com/WebKit/webkit/blob/master/Source/JavaScriptCore/runtime/IntlDateTimeFormat.cpp#L115-L144 But this case is still less horrible because the second call takes an input length.
I want to quibble with the language here. Let’s stay logical: there’s no need for fuzziness here. If we need a null-terminated string, we *must* treat U_STRING_NOT_TERMINATED_WARNING as a “buffer too small” case; unless there is some other way we are adding a null terminator. If we do *not* need a null-terminated string, then we can and should ignore that warning. (In reply to Ross Kirsling from comment #11) > But this case is still less horrible because the second call takes an input > length. Here’s my quibble. That case is not just “less horrible”. It is *correct*. The other case is *incorrect*.
<rdar://problem/62530860>
Created attachment 397904 [details] patch
Comment on attachment 397904 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=397904&action=review Honestly I think needsToGrowToProduceBuffer doesn’t make things easier to read. It’s mainly valuable as a record of the fact that you audited a given call site. On the other hand, needsToGrowToProduceCString is definitely valuable. But only in exactly one place. I think adding the header is overkill, but I won’t stand in your way if you still like it. > Source/JavaScriptCore/runtime/IntlObject.cpp:334 > + if (U_FAILURE(status) || status == U_STRING_NOT_TERMINATED_WARNING || parsedLength != static_cast<int32_t>(input.length())) How can U_STRING_NOT_TERMINATED_WARNING happen here? Seems like we can just leave this out or simply assert it doesn’t happen. We are using the length it returned, so there is no chance it will happen. > Source/JavaScriptCore/runtime/IntlObject.cpp:346 > +#if ASSERT_ENABLED > + bool sawNullTerminator = false; > + for (char c : intermediate) { > + if (c == '\0') { > + sawNullTerminator = true; > + break; > + } > + } > + ASSERT(sawNullTerminator); > +#endif Alternate implementation that does the same thing: ASSERT(intermediate.contains('\0')); > Source/WTF/wtf/text/LineBreakIteratorPoolICU.h:78 > + if (needsToGrowToProduceCString(status)) { This doesn’t seem right. I don’t see code depending on scratchBuffer being null terminated. > Source/WTF/wtf/text/LineBreakIteratorPoolICU.h:79 > scratchBuffer.grow(lengthNeeded + 1); Seems like this should just be lengthNeeded, not + 1.
Comment on attachment 397904 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=397904&action=review > Source/JavaScriptCore/runtime/IntlObject.cpp:154 > - return String(StringImpl::createStaticStringImpl(buffer.data(), length)); > + return String(buffer.data(), length); This is wrong. convertICULocaleToBCP47LanguageTag() is only called from inside std::call_once guarded blocks i.e. it's meant to produce immortal singleton strings. Before I changed it to be StringImpl::createStaticStringImpl(), we had UAFs due to a user of this string freeing it.
Comment on attachment 397904 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=397904&action=review >> Source/JavaScriptCore/runtime/IntlObject.cpp:154 >> + return String(buffer.data(), length); > > This is wrong. convertICULocaleToBCP47LanguageTag() is only called from inside std::call_once guarded blocks i.e. it's meant to produce immortal singleton strings. Before I changed it to be StringImpl::createStaticStringImpl(), we had UAFs due to a user of this string freeing it. I talked to Saam offline, but for the record, there's another important property of StaticStringImpls i.e. that they are safe to be ref'ed and deref'ed from different threads without locks. That's the most important reason this needs to be a StaticStringImpl because it is a singleton that can be used from different threads.
Comment on attachment 397904 [details] patch Restoring Darin's r+ since Saam said he will undo the StringImpl::createStaticStringImpl() change.
Comment on attachment 397904 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=397904&action=review >> Source/JavaScriptCore/runtime/IntlObject.cpp:154 >> + return String(buffer.data(), length); > > This is wrong. convertICULocaleToBCP47LanguageTag() is only called from inside std::call_once guarded blocks i.e. it's meant to produce immortal singleton strings. Before I changed it to be StringImpl::createStaticStringImpl(), we had UAFs due to a user of this string freeing it. As Mark and I discussed on Slack, the actual issue here isn't being called from call_once, but the fact that the resulting string may ref/deref from more than one thread at once. >> Source/JavaScriptCore/runtime/IntlObject.cpp:334 >> + if (U_FAILURE(status) || status == U_STRING_NOT_TERMINATED_WARNING || parsedLength != static_cast<int32_t>(input.length())) > > How can U_STRING_NOT_TERMINATED_WARNING happen here? Seems like we can just leave this out or simply assert it doesn’t happen. We are using the length it returned, so there is no chance it will happen. My thinking here was just ICU failing for some reason, but this would be a weird way for it to fail. >> Source/JavaScriptCore/runtime/IntlObject.cpp:346 >> +#endif > > Alternate implementation that does the same thing: > > ASSERT(intermediate.contains('\0')); That'd be too easy! >> Source/WTF/wtf/text/LineBreakIteratorPoolICU.h:78 >> + if (needsToGrowToProduceCString(status)) { > > This doesn’t seem right. I don’t see code depending on scratchBuffer being null terminated. Yeah it's not needed. I got confused with the AtomString::fromUTF8 version that doesn't take "length" that ends up calling strlen.
Comment on attachment 397904 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=397904&action=review >>> Source/JavaScriptCore/runtime/IntlObject.cpp:334 >>> + if (U_FAILURE(status) || status == U_STRING_NOT_TERMINATED_WARNING || parsedLength != static_cast<int32_t>(input.length())) >> >> How can U_STRING_NOT_TERMINATED_WARNING happen here? Seems like we can just leave this out or simply assert it doesn’t happen. We are using the length it returned, so there is no chance it will happen. > > My thinking here was just ICU failing for some reason, but this would be a weird way for it to fail. I guess part of my reasoning. is that if for some reason ICU again hit this warning, that "status == U_STRING_NOT_TERMINATED_WARNING" is just as bad as U_FAILURE
(In reply to Darin Adler from comment #15) > Comment on attachment 397904 [details] > patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=397904&action=review > > Honestly I think needsToGrowToProduceBuffer doesn’t make things easier to > read. It’s mainly valuable as a record of the fact that you audited a given > call site. On the other hand, needsToGrowToProduceCString is definitely > valuable. But only in exactly one place. I think adding the header is > overkill, but I won’t stand in your way if you still like it. I'm going to keep "needsToGrowToProduceBuffer". The reason I like it is that if you end writing code to use it, it's (hopefully) a forcing function to consider if you should instead be using needsToGrowToProduceCString. > > > Source/JavaScriptCore/runtime/IntlObject.cpp:334 > > + if (U_FAILURE(status) || status == U_STRING_NOT_TERMINATED_WARNING || parsedLength != static_cast<int32_t>(input.length())) > > How can U_STRING_NOT_TERMINATED_WARNING happen here? Seems like we can just > leave this out or simply assert it doesn’t happen. We are using the length > it returned, so there is no chance it will happen. > > > Source/JavaScriptCore/runtime/IntlObject.cpp:346 > > +#if ASSERT_ENABLED > > + bool sawNullTerminator = false; > > + for (char c : intermediate) { > > + if (c == '\0') { > > + sawNullTerminator = true; > > + break; > > + } > > + } > > + ASSERT(sawNullTerminator); > > +#endif > > Alternate implementation that does the same thing: > > ASSERT(intermediate.contains('\0')); > > > Source/WTF/wtf/text/LineBreakIteratorPoolICU.h:78 > > + if (needsToGrowToProduceCString(status)) { > > This doesn’t seem right. I don’t see code depending on scratchBuffer being > null terminated. > > > Source/WTF/wtf/text/LineBreakIteratorPoolICU.h:79 > > scratchBuffer.grow(lengthNeeded + 1); > > Seems like this should just be lengthNeeded, not + 1.
Comment on attachment 397904 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=397904&action=review >> Source/WTF/wtf/text/LineBreakIteratorPoolICU.h:79 >> scratchBuffer.grow(lengthNeeded + 1); > > Seems like this should just be lengthNeeded, not + 1. I believe this is needed. But this is my first time reading ICU documentation, and it's not crystal clear exactly what the invariants are. My reading, and slight guessing, given the ambiguity, at what ICU means by "input buffer containing well-formed locale ID to be modified" is that it's null-terminated. There is also this warning: "NOTE: Unlike almost every other ICU function which takes a buffer, this function will NOT truncate the output text, and will not update the buffer with unterminated text setting a status of U_STRING_NOT_TERMINATED_WARNING.". So I think we need to give it the capacity to write the terminating byte otherwise it won't write any output I'm reading the documentation here: https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/uloc_8h.html#a4ba76d26bf66ef7629a649f6dfc3cc93
(In reply to Saam Barati from comment #22) > Comment on attachment 397904 [details] > patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=397904&action=review > > >> Source/WTF/wtf/text/LineBreakIteratorPoolICU.h:79 > >> scratchBuffer.grow(lengthNeeded + 1); > > > > Seems like this should just be lengthNeeded, not + 1. > > I believe this is needed. But this is my first time reading ICU > documentation, and it's not crystal clear exactly what the invariants are. > My reading, and slight guessing, given the ambiguity, at what ICU means by > "input buffer containing well-formed locale ID to be modified" is that it's > null-terminated. > There is also this warning: "NOTE: Unlike almost every other ICU function > which takes a buffer, this function will NOT truncate the output text, and > will not update the buffer with unterminated text setting a status of > U_STRING_NOT_TERMINATED_WARNING.". So I think we need to give it the > capacity to write the terminating byte otherwise it won't write any output > > I'm reading the documentation here: > https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/uloc_8h. > html#a4ba76d26bf66ef7629a649f6dfc3cc93 I misread their "NOTE". As Ross pointed out on Slack, ICU is calling out that they won't set such a warning. I read this as having a comma between "text" and "setting".
Created attachment 397921 [details] patch for landing
Committed r260882: <https://trac.webkit.org/changeset/260882> All reviewed patches have been landed. Closing bug and clearing flags on attachment 397921 [details].