WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
211142
U_STRING_NOT_TERMINATED_WARNING ICU must be handled when using the output buffer as a C string
https://bugs.webkit.org/show_bug.cgi?id=211142
Summary
U_STRING_NOT_TERMINATED_WARNING ICU must be handled when using the output buf...
Saam Barati
Reported
2020-04-28 12:48:17 PDT
We have code that checks if we got the U_BUFFER_OVERFLOW_ERROR, and we grow the output buffer and try again when we see such a "warning". We also detect "failure" using U_FAILURE, which doesn't consdier U_STRING_NOT_TERMINATED_WARNING as failure, which seems crazy at first glance, and is leading to bugs in at least some parts of JSC. There are 118 uses of U_BUFFER_OVERFLOW_ERROR in WebKit There are 211 uses of U_FAILURE in WebKit I think we need to audit all of them
Attachments
patch
(28.49 KB, patch)
2020-04-28 16:45 PDT
,
Saam Barati
no flags
Details
Formatted Diff
Diff
patch for landing
(28.31 KB, patch)
2020-04-28 21:26 PDT
,
Saam Barati
no flags
Details
Formatted Diff
Diff
Show Obsolete
(1)
View All
Add attachment
proposed patch, testcase, etc.
Saam Barati
Comment 1
2020-04-28 12:52:17 PDT
Some examples that appear troublesome: ``` String LocaleICU::decimalSymbol(UNumberFormatSymbol symbol) { UErrorCode status = U_ZERO_ERROR; int32_t bufferLength = unum_getSymbol(m_numberFormat, symbol, 0, 0, &status); ASSERT(U_SUCCESS(status) || status == U_BUFFER_OVERFLOW_ERROR); if (U_FAILURE(status) && status != U_BUFFER_OVERFLOW_ERROR) return String(); Vector<UChar> buffer(bufferLength); status = U_ZERO_ERROR; unum_getSymbol(m_numberFormat, symbol, buffer.data(), bufferLength, &status); if (U_FAILURE(status)) return String(); return String::adopt(WTFMove(buffer)); } ``` ``` int32_t parsedLength; auto intermediateLength = uloc_forLanguageTag(input.data(), intermediate.data(), intermediate.size(), &parsedLength, &status); if (status == U_BUFFER_OVERFLOW_ERROR) { intermediate.resize(intermediateLength + 1); status = U_ZERO_ERROR; uloc_forLanguageTag(input.data(), intermediate.data(), intermediateLength + 1, &parsedLength, &status); } if (U_FAILURE(status) || parsedLength != static_cast<int32_t>(input.length())) return String(); ```
Darin Adler
Comment 2
2020-04-28 12:53:26 PDT
I am puzzled by this. What specifically goes wrong in what case?
Saam Barati
Comment 3
2020-04-28 12:53:53 PDT
(In reply to Saam Barati from
comment #1
)
> Some examples that appear troublesome: > ``` > String LocaleICU::decimalSymbol(UNumberFormatSymbol symbol) > { > UErrorCode status = U_ZERO_ERROR; > int32_t bufferLength = unum_getSymbol(m_numberFormat, symbol, 0, 0, > &status); > ASSERT(U_SUCCESS(status) || status == U_BUFFER_OVERFLOW_ERROR); > if (U_FAILURE(status) && status != U_BUFFER_OVERFLOW_ERROR) > return String(); > Vector<UChar> buffer(bufferLength); > status = U_ZERO_ERROR; > unum_getSymbol(m_numberFormat, symbol, buffer.data(), bufferLength, > &status); > if (U_FAILURE(status)) > return String(); > return String::adopt(WTFMove(buffer)); > } > ```
Actually this one might be ok, since we don't rely on the buffer being a valid C string
> > ``` > int32_t parsedLength; > auto intermediateLength = uloc_forLanguageTag(input.data(), > intermediate.data(), intermediate.size(), &parsedLength, &status); > if (status == U_BUFFER_OVERFLOW_ERROR) { > intermediate.resize(intermediateLength + 1); > status = U_ZERO_ERROR; > uloc_forLanguageTag(input.data(), intermediate.data(), > intermediateLength + 1, &parsedLength, &status); > } > if (U_FAILURE(status) || parsedLength != > static_cast<int32_t>(input.length())) > return String(); > ```
This isn't ok, since we do rely on it being a C string
Darin Adler
Comment 4
2020-04-28 12:54:01 PDT
U_STRING_NOT_TERMINATED_WARNING is only relevant if we rely on a 0 character terminator, and we do not rely on that in the code above.
Darin Adler
Comment 5
2020-04-28 12:54:16 PDT
I’m not sure there is any problem here at all.
Saam Barati
Comment 6
2020-04-28 12:54:59 PDT
(In reply to Darin Adler from
comment #2
)
> I am puzzled by this. What specifically goes wrong in what case?
We will take the output of an ICU call and trust that it's a C string. See: ``` UErrorCode status = U_ZERO_ERROR; Vector<char, 32> intermediate(31); int32_t parsedLength; auto intermediateLength = uloc_forLanguageTag(input.data(), intermediate.data(), intermediate.size(), &parsedLength, &status); if (status == U_BUFFER_OVERFLOW_ERROR) { intermediate.resize(intermediateLength + 1); status = U_ZERO_ERROR; uloc_forLanguageTag(input.data(), intermediate.data(), intermediateLength + 1, &parsedLength, &status); } if (U_FAILURE(status) || parsedLength != static_cast<int32_t>(input.length())) return String(); Vector<char, 32> result(32); auto resultLength = uloc_toLanguageTag(intermediate.data(), result.data(), result.size(), false, &status); if (status == U_BUFFER_OVERFLOW_ERROR) { result.grow(resultLength); status = U_ZERO_ERROR; uloc_toLanguageTag(intermediate.data(), result.data(), resultLength, false, &status); } if (U_FAILURE(status)) return String(); ```
Darin Adler
Comment 7
2020-04-28 12:57:35 PDT
(In reply to Saam Barati from
comment #3
)
> > if (U_FAILURE(status) || parsedLength != > > static_cast<int32_t>(input.length())) > > return String(); > > ``` > > This isn't ok, since we do rely on it being a C string
I see now. That’s right! That case is wrong. I’d be surprised if there are a lot of other examples like this, but there may be a few in the new Intl code.
Darin Adler
Comment 8
2020-04-28 12:58:38 PDT
(In reply to Saam Barati from
comment #6
)
> We will take the output of an ICU call and trust that it's a C string.
I think it’s great to audit carefully. But I think it’s unlikely that we’ve made this mistake a lot in the past.
Saam Barati
Comment 9
2020-04-28 12:59:00 PDT
(In reply to Darin Adler from
comment #7
)
> (In reply to Saam Barati from
comment #3
) > > > if (U_FAILURE(status) || parsedLength != > > > static_cast<int32_t>(input.length())) > > > return String(); > > > ``` > > > > This isn't ok, since we do rely on it being a C string > > I see now. That’s right! That case is wrong. I’d be surprised if there are a > lot of other examples like this, but there may be a few in the new Intl code.
Yeah my initial reaction of this being terribly wrong was off-base. However, I've spotted a handful in JSC's Intl code already, so there exist a non-trivial amount. It's probably worth doing at least a basic audit of all uses of this.
Saam Barati
Comment 10
2020-04-28 12:59:29 PDT
(In reply to Darin Adler from
comment #8
)
> (In reply to Saam Barati from
comment #6
) > > We will take the output of an ICU call and trust that it's a C string. > > I think it’s great to audit carefully. But I think it’s unlikely that we’ve > made this mistake a lot in the past.
Yeah it seems like we do the right thing in most places, maybe except JSC's Intl code.
Ross Kirsling
Comment 11
2020-04-28 13:14:38 PDT
I'll note that that uloc_forLanguageTag -> uloc_toLanguageTag case was particularly tricky to get working safely (hence the comment I added) so any ideas on how to make it less fragile would be great. The other case I'm aware of where we pass the C string from one ICU API directly into another is here:
https://github.com/WebKit/webkit/blob/master/Source/JavaScriptCore/runtime/IntlDateTimeFormat.cpp#L115-L144
But this case is still less horrible because the second call takes an input length.
Darin Adler
Comment 12
2020-04-28 13:35:39 PDT
I want to quibble with the language here. Let’s stay logical: there’s no need for fuzziness here. If we need a null-terminated string, we *must* treat U_STRING_NOT_TERMINATED_WARNING as a “buffer too small” case; unless there is some other way we are adding a null terminator. If we do *not* need a null-terminated string, then we can and should ignore that warning. (In reply to Ross Kirsling from
comment #11
)
> But this case is still less horrible because the second call takes an input > length.
Here’s my quibble. That case is not just “less horrible”. It is *correct*. The other case is *incorrect*.
Saam Barati
Comment 13
2020-04-28 16:37:06 PDT
<
rdar://problem/62530860
>
Saam Barati
Comment 14
2020-04-28 16:45:48 PDT
Created
attachment 397904
[details]
patch
Darin Adler
Comment 15
2020-04-28 17:04:00 PDT
Comment on
attachment 397904
[details]
patch View in context:
https://bugs.webkit.org/attachment.cgi?id=397904&action=review
Honestly I think needsToGrowToProduceBuffer doesn’t make things easier to read. It’s mainly valuable as a record of the fact that you audited a given call site. On the other hand, needsToGrowToProduceCString is definitely valuable. But only in exactly one place. I think adding the header is overkill, but I won’t stand in your way if you still like it.
> Source/JavaScriptCore/runtime/IntlObject.cpp:334 > + if (U_FAILURE(status) || status == U_STRING_NOT_TERMINATED_WARNING || parsedLength != static_cast<int32_t>(input.length()))
How can U_STRING_NOT_TERMINATED_WARNING happen here? Seems like we can just leave this out or simply assert it doesn’t happen. We are using the length it returned, so there is no chance it will happen.
> Source/JavaScriptCore/runtime/IntlObject.cpp:346 > +#if ASSERT_ENABLED > + bool sawNullTerminator = false; > + for (char c : intermediate) { > + if (c == '\0') { > + sawNullTerminator = true; > + break; > + } > + } > + ASSERT(sawNullTerminator); > +#endif
Alternate implementation that does the same thing: ASSERT(intermediate.contains('\0'));
> Source/WTF/wtf/text/LineBreakIteratorPoolICU.h:78 > + if (needsToGrowToProduceCString(status)) {
This doesn’t seem right. I don’t see code depending on scratchBuffer being null terminated.
> Source/WTF/wtf/text/LineBreakIteratorPoolICU.h:79 > scratchBuffer.grow(lengthNeeded + 1);
Seems like this should just be lengthNeeded, not + 1.
Mark Lam
Comment 16
2020-04-28 19:52:40 PDT
Comment on
attachment 397904
[details]
patch View in context:
https://bugs.webkit.org/attachment.cgi?id=397904&action=review
> Source/JavaScriptCore/runtime/IntlObject.cpp:154 > - return String(StringImpl::createStaticStringImpl(buffer.data(), length)); > + return String(buffer.data(), length);
This is wrong. convertICULocaleToBCP47LanguageTag() is only called from inside std::call_once guarded blocks i.e. it's meant to produce immortal singleton strings. Before I changed it to be StringImpl::createStaticStringImpl(), we had UAFs due to a user of this string freeing it.
Mark Lam
Comment 17
2020-04-28 20:02:53 PDT
Comment on
attachment 397904
[details]
patch View in context:
https://bugs.webkit.org/attachment.cgi?id=397904&action=review
>> Source/JavaScriptCore/runtime/IntlObject.cpp:154 >> + return String(buffer.data(), length); > > This is wrong. convertICULocaleToBCP47LanguageTag() is only called from inside std::call_once guarded blocks i.e. it's meant to produce immortal singleton strings. Before I changed it to be StringImpl::createStaticStringImpl(), we had UAFs due to a user of this string freeing it.
I talked to Saam offline, but for the record, there's another important property of StaticStringImpls i.e. that they are safe to be ref'ed and deref'ed from different threads without locks. That's the most important reason this needs to be a StaticStringImpl because it is a singleton that can be used from different threads.
Mark Lam
Comment 18
2020-04-28 20:04:56 PDT
Comment on
attachment 397904
[details]
patch Restoring Darin's r+ since Saam said he will undo the StringImpl::createStaticStringImpl() change.
Saam Barati
Comment 19
2020-04-28 20:11:26 PDT
Comment on
attachment 397904
[details]
patch View in context:
https://bugs.webkit.org/attachment.cgi?id=397904&action=review
>> Source/JavaScriptCore/runtime/IntlObject.cpp:154 >> + return String(buffer.data(), length); > > This is wrong. convertICULocaleToBCP47LanguageTag() is only called from inside std::call_once guarded blocks i.e. it's meant to produce immortal singleton strings. Before I changed it to be StringImpl::createStaticStringImpl(), we had UAFs due to a user of this string freeing it.
As Mark and I discussed on Slack, the actual issue here isn't being called from call_once, but the fact that the resulting string may ref/deref from more than one thread at once.
>> Source/JavaScriptCore/runtime/IntlObject.cpp:334 >> + if (U_FAILURE(status) || status == U_STRING_NOT_TERMINATED_WARNING || parsedLength != static_cast<int32_t>(input.length())) > > How can U_STRING_NOT_TERMINATED_WARNING happen here? Seems like we can just leave this out or simply assert it doesn’t happen. We are using the length it returned, so there is no chance it will happen.
My thinking here was just ICU failing for some reason, but this would be a weird way for it to fail.
>> Source/JavaScriptCore/runtime/IntlObject.cpp:346 >> +#endif > > Alternate implementation that does the same thing: > > ASSERT(intermediate.contains('\0'));
That'd be too easy!
>> Source/WTF/wtf/text/LineBreakIteratorPoolICU.h:78 >> + if (needsToGrowToProduceCString(status)) { > > This doesn’t seem right. I don’t see code depending on scratchBuffer being null terminated.
Yeah it's not needed. I got confused with the AtomString::fromUTF8 version that doesn't take "length" that ends up calling strlen.
Saam Barati
Comment 20
2020-04-28 20:13:34 PDT
Comment on
attachment 397904
[details]
patch View in context:
https://bugs.webkit.org/attachment.cgi?id=397904&action=review
>>> Source/JavaScriptCore/runtime/IntlObject.cpp:334 >>> + if (U_FAILURE(status) || status == U_STRING_NOT_TERMINATED_WARNING || parsedLength != static_cast<int32_t>(input.length())) >> >> How can U_STRING_NOT_TERMINATED_WARNING happen here? Seems like we can just leave this out or simply assert it doesn’t happen. We are using the length it returned, so there is no chance it will happen. > > My thinking here was just ICU failing for some reason, but this would be a weird way for it to fail.
I guess part of my reasoning. is that if for some reason ICU again hit this warning, that "status == U_STRING_NOT_TERMINATED_WARNING" is just as bad as U_FAILURE
Saam Barati
Comment 21
2020-04-28 20:39:57 PDT
(In reply to Darin Adler from
comment #15
)
> Comment on
attachment 397904
[details]
> patch > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=397904&action=review
> > Honestly I think needsToGrowToProduceBuffer doesn’t make things easier to > read. It’s mainly valuable as a record of the fact that you audited a given > call site. On the other hand, needsToGrowToProduceCString is definitely > valuable. But only in exactly one place. I think adding the header is > overkill, but I won’t stand in your way if you still like it.
I'm going to keep "needsToGrowToProduceBuffer". The reason I like it is that if you end writing code to use it, it's (hopefully) a forcing function to consider if you should instead be using needsToGrowToProduceCString.
> > > Source/JavaScriptCore/runtime/IntlObject.cpp:334 > > + if (U_FAILURE(status) || status == U_STRING_NOT_TERMINATED_WARNING || parsedLength != static_cast<int32_t>(input.length())) > > How can U_STRING_NOT_TERMINATED_WARNING happen here? Seems like we can just > leave this out or simply assert it doesn’t happen. We are using the length > it returned, so there is no chance it will happen. > > > Source/JavaScriptCore/runtime/IntlObject.cpp:346 > > +#if ASSERT_ENABLED > > + bool sawNullTerminator = false; > > + for (char c : intermediate) { > > + if (c == '\0') { > > + sawNullTerminator = true; > > + break; > > + } > > + } > > + ASSERT(sawNullTerminator); > > +#endif > > Alternate implementation that does the same thing: > > ASSERT(intermediate.contains('\0')); > > > Source/WTF/wtf/text/LineBreakIteratorPoolICU.h:78 > > + if (needsToGrowToProduceCString(status)) { > > This doesn’t seem right. I don’t see code depending on scratchBuffer being > null terminated. > > > Source/WTF/wtf/text/LineBreakIteratorPoolICU.h:79 > > scratchBuffer.grow(lengthNeeded + 1); > > Seems like this should just be lengthNeeded, not + 1.
Saam Barati
Comment 22
2020-04-28 20:48:48 PDT
Comment on
attachment 397904
[details]
patch View in context:
https://bugs.webkit.org/attachment.cgi?id=397904&action=review
>> Source/WTF/wtf/text/LineBreakIteratorPoolICU.h:79 >> scratchBuffer.grow(lengthNeeded + 1); > > Seems like this should just be lengthNeeded, not + 1.
I believe this is needed. But this is my first time reading ICU documentation, and it's not crystal clear exactly what the invariants are. My reading, and slight guessing, given the ambiguity, at what ICU means by "input buffer containing well-formed locale ID to be modified" is that it's null-terminated. There is also this warning: "NOTE: Unlike almost every other ICU function which takes a buffer, this function will NOT truncate the output text, and will not update the buffer with unterminated text setting a status of U_STRING_NOT_TERMINATED_WARNING.". So I think we need to give it the capacity to write the terminating byte otherwise it won't write any output I'm reading the documentation here:
https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/uloc_8h.html#a4ba76d26bf66ef7629a649f6dfc3cc93
Saam Barati
Comment 23
2020-04-28 21:22:37 PDT
(In reply to Saam Barati from
comment #22
)
> Comment on
attachment 397904
[details]
> patch > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=397904&action=review
> > >> Source/WTF/wtf/text/LineBreakIteratorPoolICU.h:79 > >> scratchBuffer.grow(lengthNeeded + 1); > > > > Seems like this should just be lengthNeeded, not + 1. > > I believe this is needed. But this is my first time reading ICU > documentation, and it's not crystal clear exactly what the invariants are. > My reading, and slight guessing, given the ambiguity, at what ICU means by > "input buffer containing well-formed locale ID to be modified" is that it's > null-terminated. > There is also this warning: "NOTE: Unlike almost every other ICU function > which takes a buffer, this function will NOT truncate the output text, and > will not update the buffer with unterminated text setting a status of > U_STRING_NOT_TERMINATED_WARNING.". So I think we need to give it the > capacity to write the terminating byte otherwise it won't write any output > > I'm reading the documentation here: >
https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/uloc_8h
. > html#a4ba76d26bf66ef7629a649f6dfc3cc93
I misread their "NOTE". As Ross pointed out on Slack, ICU is calling out that they won't set such a warning. I read this as having a comma between "text" and "setting".
Saam Barati
Comment 24
2020-04-28 21:26:07 PDT
Created
attachment 397921
[details]
patch for landing
EWS
Comment 25
2020-04-29 00:15:23 PDT
Committed
r260882
: <
https://trac.webkit.org/changeset/260882
> All reviewed patches have been landed. Closing bug and clearing flags on
attachment 397921
[details]
.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug