WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
Bug 144257
REGRESSION (
r183373
): ASSERT failed in wtf/SHA1.h
https://bugs.webkit.org/show_bug.cgi?id=144257
Summary
REGRESSION (r183373): ASSERT failed in wtf/SHA1.h
Yusuke Suzuki
Reported
2015-04-27 03:11:28 PDT
https://build.webkit.org/builders/Apple%20Yosemite%20Debug%20JSC%20%28Tests%29/builds/1036/steps/jscore-test/logs/stdio
Attachments
Patch
(3.08 KB, patch)
2015-04-27 03:47 PDT
,
Yusuke Suzuki
no flags
Details
Formatted Diff
Diff
Patch
(3.08 KB, patch)
2015-04-27 04:06 PDT
,
Yusuke Suzuki
no flags
Details
Formatted Diff
Diff
Patch
(3.89 KB, patch)
2015-04-27 10:17 PDT
,
Yusuke Suzuki
no flags
Details
Formatted Diff
Diff
Patch
(11.88 KB, patch)
2015-04-28 06:09 PDT
,
Yusuke Suzuki
no flags
Details
Formatted Diff
Diff
Patch
(13.19 KB, patch)
2015-04-28 06:28 PDT
,
Yusuke Suzuki
no flags
Details
Formatted Diff
Diff
Patch
(14.69 KB, patch)
2015-04-28 06:31 PDT
,
Yusuke Suzuki
no flags
Details
Formatted Diff
Diff
Patch
(5.71 KB, patch)
2015-04-29 08:33 PDT
,
Yusuke Suzuki
no flags
Details
Formatted Diff
Diff
Show Obsolete
(6)
View All
Add attachment
proposed patch, testcase, etc.
Yusuke Suzuki
Comment 1
2015-04-27 03:20:23 PDT
OK. I've found the issue. SHA1 is used to calculate CodeBlockHash. To calculate hash value, we pass the source code UTF-8 string to SHA1::addBytes. However, in SHA1::addBytes, there's assertion `input.length() == strlen(string)`. In the template-literal-syntax.js, we perform `eval` with the script contains "\0". As the result, `strlen(string)` accidentally shortened by the contained "\0", and assertion fails.
Yusuke Suzuki
Comment 2
2015-04-27 03:47:52 PDT
Created
attachment 251733
[details]
Patch
Csaba Osztrogonác
Comment 3
2015-04-27 04:03:04 PDT
Comment on
attachment 251733
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=251733&action=review
The fix looks good to me, but I'm not sure if the source code can contain null characters, so I would let JSC experts to review this fix.
> Source/JavaScriptCore/ChangeLog:11 > + So when performing `strlen` on the source code's CString, it returnes the incorrect length.
typo: returnes -> returns
> Source/JavaScriptCore/ChangeLog:19 > + It allow the given bytes to contain a null character.
typo: allow -> allows
Yusuke Suzuki
Comment 4
2015-04-27 04:06:25 PDT
Created
attachment 251734
[details]
Patch
Darin Adler
Comment 5
2015-04-27 09:33:28 PDT
Comment on
attachment 251734
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=251734&action=review
> Source/JavaScriptCore/bytecode/CodeBlockHash.cpp:47 > - sha1.addBytes(sourceCode.toUTF8()); > + CString sourceCodeBytes = sourceCode.toUTF8(); > + // Since the source code string can contain a null character, > + // pass the bytes and length explicitly instead of passing CString as an argument. > + sha1.addBytes(reinterpret_cast<const uint8_t*>(sourceCodeBytes.data()), sourceCodeBytes.length());
This should be fixed by removing the incorrect assertion from SHA1::addBytes, not by changing the code here at all. That function already has the correct algorithm, it just has a misguided assertion. But at some point, this code should be made be better in other ways. We don’t need a single copy of all the data in UTF-8 just so we can pass it to the SHA1 algorithm; we just need a view of it in chunks that we can pass in. The best API for something like that is not something that allocated a UTF8 string, but rather a streaming decoder that can give us ranges of bytes in UTF-8 form. This enables two optimizations: 1) if the string is stored in an 8-bit buffer and is all ASCII then we can just compute the SHA1 in place without allocating any memory at all; 2) we don’t need to heap allocate a single object big enough to translate the entire thing to UTF-8. Instead we can use a fixed size buffer and not bother with heap allocations at all. An idiom I like to use for this is a “chunky iterator” which returns points to subsequent runs of UTF-8 bytes. The caller decides how big the runs are. I think SourceCode and String need something like this. It makes no sense to copy all the source code into a large memory buffer just to compute an SHA1 of it. One way to do it is to make a function that takes a std::function that gets a pointer and a length to UTF-8 bytes that gets called repeatedly until the entire string is iterated.
Yusuke Suzuki
Comment 6
2015-04-27 10:08:37 PDT
Comment on
attachment 251734
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=251734&action=review
>> Source/JavaScriptCore/bytecode/CodeBlockHash.cpp:47 >> + sha1.addBytes(reinterpret_cast<const uint8_t*>(sourceCodeBytes.data()), sourceCodeBytes.length()); > > This should be fixed by removing the incorrect assertion from SHA1::addBytes, not by changing the code here at all. That function already has the correct algorithm, it just has a misguided assertion. > > But at some point, this code should be made be better in other ways. We don’t need a single copy of all the data in UTF-8 just so we can pass it to the SHA1 algorithm; we just need a view of it in chunks that we can pass in. The best API for something like that is not something that allocated a UTF8 string, but rather a streaming decoder that can give us ranges of bytes in UTF-8 form. This enables two optimizations: 1) if the string is stored in an 8-bit buffer and is all ASCII then we can just compute the SHA1 in place without allocating any memory at all; 2) we don’t need to heap allocate a single object big enough to translate the entire thing to UTF-8. Instead we can use a fixed size buffer and not bother with heap allocations at all. An idiom I like to use for this is a “chunky iterator” which returns points to subsequent runs of UTF-8 bytes. The caller decides how big the runs are. > > I think SourceCode and String need something like this. It makes no sense to copy all the source code into a large memory buffer just to compute an SHA1 of it. One way to do it is to make a function that takes a std::function that gets a pointer and a length to UTF-8 bytes that gets called repeatedly until the entire string is iterated.
You're completely right. We should stream the decoded UTF-8 bytes into SHA1 instead of the current "alloc, convert and use" implementation. I've opened the new issue to implement it[1]. So in the meantime, to fix the debug builds, I'll upload the simple patch dropping the assertion in SHA1. [1]:
https://bugs.webkit.org/show_bug.cgi?id=144263
Yusuke Suzuki
Comment 7
2015-04-27 10:17:11 PDT
Created
attachment 251754
[details]
Patch
Alexey Proskuryakov
Comment 8
2015-04-27 12:54:08 PDT
CString is not a data buffer, so a CString object with a null byte is simply invalid. I don't think that the assertion is incorrect, the original patch looks a lot better to me. We've had enough hard to diagnose bugs caused by null bytes in CStrings in the past to make me want to disallow this in every way possible.
Ryosuke Niwa
Comment 9
2015-04-27 14:09:46 PDT
Committed
r183421
: <
http://trac.webkit.org/changeset/183421
>
Ryosuke Niwa
Comment 10
2015-04-27 14:10:33 PDT
(In reply to
comment #9
)
> Committed
r183421
: <
http://trac.webkit.org/changeset/183421
>
I've skipped three tests failing on Windows.
Darin Adler
Comment 11
2015-04-27 14:58:08 PDT
(In reply to
comment #8
)
> CString is not a data buffer, so a CString object with a null byte is simply > invalid.
I don’t agree. That’s not the intent of the CString class, although perhaps it should be in the future.
> I don't think that the assertion is incorrect, the original patch > looks a lot better to me.
We should use Vector<char> for these purposes if a CString can’t have a null byte.
> We've had enough hard to diagnose bugs caused by null bytes in CStrings in > the past to make me want to disallow this in every way possible.
OK, but that is a statement about the future, not about the present.
Alexey Proskuryakov
Comment 12
2015-04-27 16:33:52 PDT
> OK, but that is a statement about the future, not about the present.
CString is documented like this: // A container for a null-terminated char array supporting copy-on-write // assignment. The contained char array may be null. A null terminated string with null bytes in the middle does not really make sense. And in fact, we allow CStrings whose length is bigger than string length, because null termination is the truth.
Benjamin Poulain
Comment 13
2015-04-27 16:46:32 PDT
Don't forget to unskip the tests when landing.
Yusuke Suzuki
Comment 14
2015-04-28 06:09:55 PDT
Created
attachment 251842
[details]
Patch
Yusuke Suzuki
Comment 15
2015-04-28 06:10:36 PDT
(In reply to
comment #12
)
> > OK, but that is a statement about the future, not about the present. > > CString is documented like this: > > // A container for a null-terminated char array supporting copy-on-write > // assignment. The contained char array may be null. > > A null terminated string with null bytes in the middle does not really make > sense. And in fact, we allow CStrings whose length is bigger than string > length, because null termination is the truth.
Thank you for your comment.
> And in fact, we allow CStrings whose length is bigger than string length, because null termination is the truth.
So, it seems that the current assertion `input.length() == strlen(string)` is problematic anyway. OK, so I propose the following fix, 1. Dropping SHA1::addBytes(CString) interface. SHA1 should feed byte streams (bytes). So the interface accepting CString is not good I think. And if CString should not contain a NULL character, we should insert assertions into CString instead of SHA1::addBytes(CString). Seeing the code, CodeBlockHash is the only user of this. 2. Add StringImpl::utf8BufferForRange -> Vector<char> function. It returns Vector instead of CString. And we implement utf8ForRange based on this function. 3. Use returned buffer vector in CodeBlockHash. Use a buffer vector instead of CString because CString should not contain a null character. And I've upload the revised patch. darin@ & ap@, could you review this once more?
Yusuke Suzuki
Comment 16
2015-04-28 06:11:56 PDT
I've get an TestWebKitAPI build error. Just dropping this test (SHA1::addBytes(CString)).
Yusuke Suzuki
Comment 17
2015-04-28 06:28:13 PDT
Created
attachment 251843
[details]
Patch
Yusuke Suzuki
Comment 18
2015-04-28 06:30:10 PDT
And unskip template literal tests...
Yusuke Suzuki
Comment 19
2015-04-28 06:31:46 PDT
Created
attachment 251844
[details]
Patch
Yusuke Suzuki
Comment 20
2015-04-28 06:33:12 PDT
(In reply to
comment #13
)
> Don't forget to unskip the tests when landing.
Thank you! Now win32 issue is also fixed by peavo@
http://trac.webkit.org/changeset/183465
So unskipping the tests in the updated patch.
Darin Adler
Comment 21
2015-04-28 08:24:22 PDT
(In reply to
comment #12
)
> > OK, but that is a statement about the future, not about the present. > > CString is documented like this: > > // A container for a null-terminated char array supporting copy-on-write > // assignment. The contained char array may be null. > > A null terminated string with null bytes in the middle does not really make > sense. And in fact, we allow CStrings whose length is bigger than string > length, because null termination is the truth.
If this was actually CString’s contract, then CString::length function would have the strlen assertion, not SHA1::addBytes. We allow CString with null characters in them, and they can be used at call sites that don’t care about null character termination at all and are just using CString as a holder for a string of 8-bit characters. If we didn’t allow this use, the String::UTF8 would need to fail when decoding a null character rather than successfully a CString with an embedded null character. I am happy for us to make this change and forbid this usage, but despite the comment you cite, this is not the full truth about what CString is today.
Darin Adler
Comment 22
2015-04-28 08:28:52 PDT
I do see, however, that there are some CString functions that do not support the strings with embedded null characters the way I thought they did. CString functions that work properly with embedded null characters: data mutableData length isNull isSafeToSendToAnotherThread buffer isHashTableDeletedValue == (when both sides are CString) Functions that do not: hash < == (when one side is a CString and the other is a const char*) It’s clear that we should either forbid the embedded null characters entirely, or make the functions all support them. Alexey, lets figure out what we want to do about this and file a new bug about it.
Alexey Proskuryakov
Comment 23
2015-04-28 10:52:00 PDT
Thank you for the helpful analysis! I filed
bug 144339
to track this.
Yusuke Suzuki
Comment 24
2015-04-29 08:17:41 PDT
Changing CString not to contain null character needs large changes and it is now handled in
bug 144339
. So in this issue, I suggest the simplest solution; just dropping the assertion. If it's not good because CString should not contain a null-character, I think it will be fixed in
bug 144339
.
Yusuke Suzuki
Comment 25
2015-04-29 08:33:40 PDT
Created
attachment 251951
[details]
Patch
WebKit Commit Bot
Comment 26
2015-04-29 10:46:09 PDT
Comment on
attachment 251951
[details]
Patch Clearing flags on attachment: 251951 Committed
r183559
: <
http://trac.webkit.org/changeset/183559
>
WebKit Commit Bot
Comment 27
2015-04-29 10:46:14 PDT
All reviewed patches have been landed. Closing bug.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug