Bug 144257

Summary:

REGRESSION (r183373): ASSERT failed in wtf/SHA1.h

Product:

WebKit

Reporter:

Yusuke Suzuki <ysuzuki>

Component:

JavaScriptCore

Assignee:

Yusuke Suzuki <ysuzuki>

Status:

RESOLVED FIXED

Severity:

Normal

CC:

ap, benjamin, cmarcelo, commit-queue, darin, dbates, fpizlo, ggaren, joepeck, ossy, rniwa

Priority:

Version:

528+ (Nightly build)

Hardware:

Unspecified

OS:

Unspecified

See Also:

https://bugs.webkit.org/show_bug.cgi?id=144339

Bug Depends on:

Bug Blocks:

144263, 142691

Attachments:

Description	Flags
Patch	none
Patch	none
Patch	none
Patch	none
Patch	none
Patch	none
Patch	none

Yusuke Suzuki

Reported 2015-04-27 03:11:28 PDT

https://build.webkit.org/builders/Apple%20Yosemite%20Debug%20JSC%20%28Tests%29/builds/1036/steps/jscore-test/logs/stdio

Attachments
Patch (3.08 KB, patch) 2015-04-27 03:47 PDT, Yusuke Suzuki	no flags	Details Formatted Diff Diff
Patch (3.08 KB, patch) 2015-04-27 04:06 PDT, Yusuke Suzuki	no flags	Details Formatted Diff Diff
Patch (3.89 KB, patch) 2015-04-27 10:17 PDT, Yusuke Suzuki	no flags	Details Formatted Diff Diff
Patch (11.88 KB, patch) 2015-04-28 06:09 PDT, Yusuke Suzuki	no flags	Details Formatted Diff Diff
Patch (13.19 KB, patch) 2015-04-28 06:28 PDT, Yusuke Suzuki	no flags	Details Formatted Diff Diff
Patch (14.69 KB, patch) 2015-04-28 06:31 PDT, Yusuke Suzuki	no flags	Details Formatted Diff Diff
Patch (5.71 KB, patch) 2015-04-29 08:33 PDT, Yusuke Suzuki	no flags	Details Formatted Diff Diff
Show Obsolete (6) View All Add attachment proposed patch, testcase, etc.

Yusuke Suzuki

Comment 1 2015-04-27 03:20:23 PDT

OK. I've found the issue. SHA1 is used to calculate CodeBlockHash. To calculate hash value, we pass the source code UTF-8 string to SHA1::addBytes. However, in SHA1::addBytes, there's assertion `input.length() == strlen(string)`. In the template-literal-syntax.js, we perform `eval` with the script contains "\0". As the result, `strlen(string)` accidentally shortened by the contained "\0", and assertion fails.

Yusuke Suzuki

Comment 2 2015-04-27 03:47:52 PDT

Created attachment 251733 [details] Patch

Csaba Osztrogonác

Comment 3 2015-04-27 04:03:04 PDT

Comment on attachment 251733 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=251733&action=review The fix looks good to me, but I'm not sure if the source code can contain null characters, so I would let JSC experts to review this fix. > Source/JavaScriptCore/ChangeLog:11 > + So when performing `strlen` on the source code's CString, it returnes the incorrect length. typo: returnes -> returns > Source/JavaScriptCore/ChangeLog:19 > + It allow the given bytes to contain a null character. typo: allow -> allows

Yusuke Suzuki

Comment 4 2015-04-27 04:06:25 PDT

Created attachment 251734 [details] Patch

Darin Adler

Comment 5 2015-04-27 09:33:28 PDT

Comment on attachment 251734 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=251734&action=review > Source/JavaScriptCore/bytecode/CodeBlockHash.cpp:47 > - sha1.addBytes(sourceCode.toUTF8()); > + CString sourceCodeBytes = sourceCode.toUTF8(); > + // Since the source code string can contain a null character, > + // pass the bytes and length explicitly instead of passing CString as an argument. > + sha1.addBytes(reinterpret_cast<const uint8_t*>(sourceCodeBytes.data()), sourceCodeBytes.length()); This should be fixed by removing the incorrect assertion from SHA1::addBytes, not by changing the code here at all. That function already has the correct algorithm, it just has a misguided assertion. But at some point, this code should be made be better in other ways. We don’t need a single copy of all the data in UTF-8 just so we can pass it to the SHA1 algorithm; we just need a view of it in chunks that we can pass in. The best API for something like that is not something that allocated a UTF8 string, but rather a streaming decoder that can give us ranges of bytes in UTF-8 form. This enables two optimizations: 1) if the string is stored in an 8-bit buffer and is all ASCII then we can just compute the SHA1 in place without allocating any memory at all; 2) we don’t need to heap allocate a single object big enough to translate the entire thing to UTF-8. Instead we can use a fixed size buffer and not bother with heap allocations at all. An idiom I like to use for this is a “chunky iterator” which returns points to subsequent runs of UTF-8 bytes. The caller decides how big the runs are. I think SourceCode and String need something like this. It makes no sense to copy all the source code into a large memory buffer just to compute an SHA1 of it. One way to do it is to make a function that takes a std::function that gets a pointer and a length to UTF-8 bytes that gets called repeatedly until the entire string is iterated.

Yusuke Suzuki

Comment 6 2015-04-27 10:08:37 PDT

Comment on attachment 251734 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=251734&action=review >> Source/JavaScriptCore/bytecode/CodeBlockHash.cpp:47 >> + sha1.addBytes(reinterpret_cast<const uint8_t*>(sourceCodeBytes.data()), sourceCodeBytes.length()); > > This should be fixed by removing the incorrect assertion from SHA1::addBytes, not by changing the code here at all. That function already has the correct algorithm, it just has a misguided assertion. > > But at some point, this code should be made be better in other ways. We don’t need a single copy of all the data in UTF-8 just so we can pass it to the SHA1 algorithm; we just need a view of it in chunks that we can pass in. The best API for something like that is not something that allocated a UTF8 string, but rather a streaming decoder that can give us ranges of bytes in UTF-8 form. This enables two optimizations: 1) if the string is stored in an 8-bit buffer and is all ASCII then we can just compute the SHA1 in place without allocating any memory at all; 2) we don’t need to heap allocate a single object big enough to translate the entire thing to UTF-8. Instead we can use a fixed size buffer and not bother with heap allocations at all. An idiom I like to use for this is a “chunky iterator” which returns points to subsequent runs of UTF-8 bytes. The caller decides how big the runs are. > > I think SourceCode and String need something like this. It makes no sense to copy all the source code into a large memory buffer just to compute an SHA1 of it. One way to do it is to make a function that takes a std::function that gets a pointer and a length to UTF-8 bytes that gets called repeatedly until the entire string is iterated. You're completely right. We should stream the decoded UTF-8 bytes into SHA1 instead of the current "alloc, convert and use" implementation. I've opened the new issue to implement it[1]. So in the meantime, to fix the debug builds, I'll upload the simple patch dropping the assertion in SHA1. [1]: https://bugs.webkit.org/show_bug.cgi?id=144263

Yusuke Suzuki

Comment 7 2015-04-27 10:17:11 PDT

Created attachment 251754 [details] Patch

Alexey Proskuryakov

Comment 8 2015-04-27 12:54:08 PDT

CString is not a data buffer, so a CString object with a null byte is simply invalid. I don't think that the assertion is incorrect, the original patch looks a lot better to me. We've had enough hard to diagnose bugs caused by null bytes in CStrings in the past to make me want to disallow this in every way possible.

Ryosuke Niwa

Comment 9 2015-04-27 14:09:46 PDT

Committed r183421: <http://trac.webkit.org/changeset/183421>

Ryosuke Niwa

Comment 10 2015-04-27 14:10:33 PDT

(In reply to comment #9) > Committed r183421: <http://trac.webkit.org/changeset/183421> I've skipped three tests failing on Windows.

Darin Adler

Comment 11 2015-04-27 14:58:08 PDT

(In reply to comment #8) > CString is not a data buffer, so a CString object with a null byte is simply > invalid. I don’t agree. That’s not the intent of the CString class, although perhaps it should be in the future. > I don't think that the assertion is incorrect, the original patch > looks a lot better to me. We should use Vector<char> for these purposes if a CString can’t have a null byte. > We've had enough hard to diagnose bugs caused by null bytes in CStrings in > the past to make me want to disallow this in every way possible. OK, but that is a statement about the future, not about the present.

Alexey Proskuryakov

Comment 12 2015-04-27 16:33:52 PDT

> OK, but that is a statement about the future, not about the present. CString is documented like this: // A container for a null-terminated char array supporting copy-on-write // assignment. The contained char array may be null. A null terminated string with null bytes in the middle does not really make sense. And in fact, we allow CStrings whose length is bigger than string length, because null termination is the truth.

Benjamin Poulain

Comment 13 2015-04-27 16:46:32 PDT

Don't forget to unskip the tests when landing.

Yusuke Suzuki

Comment 14 2015-04-28 06:09:55 PDT

Created attachment 251842 [details] Patch

Yusuke Suzuki

Comment 15 2015-04-28 06:10:36 PDT

(In reply to comment #12) > > OK, but that is a statement about the future, not about the present. > > CString is documented like this: > > // A container for a null-terminated char array supporting copy-on-write > // assignment. The contained char array may be null. > > A null terminated string with null bytes in the middle does not really make > sense. And in fact, we allow CStrings whose length is bigger than string > length, because null termination is the truth. Thank you for your comment. > And in fact, we allow CStrings whose length is bigger than string length, because null termination is the truth. So, it seems that the current assertion `input.length() == strlen(string)` is problematic anyway. OK, so I propose the following fix, 1. Dropping SHA1::addBytes(CString) interface. SHA1 should feed byte streams (bytes). So the interface accepting CString is not good I think. And if CString should not contain a NULL character, we should insert assertions into CString instead of SHA1::addBytes(CString). Seeing the code, CodeBlockHash is the only user of this. 2. Add StringImpl::utf8BufferForRange -> Vector<char> function. It returns Vector instead of CString. And we implement utf8ForRange based on this function. 3. Use returned buffer vector in CodeBlockHash. Use a buffer vector instead of CString because CString should not contain a null character. And I've upload the revised patch. darin@ & ap@, could you review this once more?

Yusuke Suzuki

Comment 16 2015-04-28 06:11:56 PDT

I've get an TestWebKitAPI build error. Just dropping this test (SHA1::addBytes(CString)).

Yusuke Suzuki

Comment 17 2015-04-28 06:28:13 PDT

Created attachment 251843 [details] Patch

Yusuke Suzuki

Comment 18 2015-04-28 06:30:10 PDT

And unskip template literal tests...

Yusuke Suzuki

Comment 19 2015-04-28 06:31:46 PDT

Created attachment 251844 [details] Patch

Yusuke Suzuki

Comment 20 2015-04-28 06:33:12 PDT

(In reply to comment #13) > Don't forget to unskip the tests when landing. Thank you! Now win32 issue is also fixed by peavo@ http://trac.webkit.org/changeset/183465 So unskipping the tests in the updated patch.

Darin Adler

Comment 21 2015-04-28 08:24:22 PDT

(In reply to comment #12) > > OK, but that is a statement about the future, not about the present. > > CString is documented like this: > > // A container for a null-terminated char array supporting copy-on-write > // assignment. The contained char array may be null. > > A null terminated string with null bytes in the middle does not really make > sense. And in fact, we allow CStrings whose length is bigger than string > length, because null termination is the truth. If this was actually CString’s contract, then CString::length function would have the strlen assertion, not SHA1::addBytes. We allow CString with null characters in them, and they can be used at call sites that don’t care about null character termination at all and are just using CString as a holder for a string of 8-bit characters. If we didn’t allow this use, the String::UTF8 would need to fail when decoding a null character rather than successfully a CString with an embedded null character. I am happy for us to make this change and forbid this usage, but despite the comment you cite, this is not the full truth about what CString is today.

Darin Adler

Comment 22 2015-04-28 08:28:52 PDT

I do see, however, that there are some CString functions that do not support the strings with embedded null characters the way I thought they did. CString functions that work properly with embedded null characters: data mutableData length isNull isSafeToSendToAnotherThread buffer isHashTableDeletedValue == (when both sides are CString) Functions that do not: hash < == (when one side is a CString and the other is a const char*) It’s clear that we should either forbid the embedded null characters entirely, or make the functions all support them. Alexey, lets figure out what we want to do about this and file a new bug about it.

Alexey Proskuryakov

Comment 23 2015-04-28 10:52:00 PDT

Thank you for the helpful analysis! I filed bug 144339 to track this.

Yusuke Suzuki

Comment 24 2015-04-29 08:17:41 PDT

Changing CString not to contain null character needs large changes and it is now handled in bug 144339. So in this issue, I suggest the simplest solution; just dropping the assertion. If it's not good because CString should not contain a null-character, I think it will be fixed in bug 144339.

Yusuke Suzuki

Comment 25 2015-04-29 08:33:40 PDT

Created attachment 251951 [details] Patch

WebKit Commit Bot

Comment 26 2015-04-29 10:46:09 PDT

Comment on attachment 251951 [details] Patch Clearing flags on attachment: 251951 Committed r183559: <http://trac.webkit.org/changeset/183559>

WebKit Commit Bot

Comment 27 2015-04-29 10:46:14 PDT

All reviewed patches have been landed. Closing bug.

Note You need to log in before you can comment on or make changes to this bug.