Bug 187947

Summary: JavaScript string corruption using RegExp with unicode character
Product: WebKit Reporter: Luca <lukeongaro+bugzilla>
Component: JavaScriptCoreAssignee: Alexey Shvayka <ashvayka>
Status: RESOLVED FIXED    
Severity: Normal CC: ap, ashvayka, commit-queue, fpizlo, joepeck, msaboff, rendezvouscp, saam, sukolsak, webkit-bug-importer, ysuzuki
Priority: P2 Keywords: InRadar
Version: Safari 11   
Hardware: All   
OS: All   
Bug Depends on: 205323    
Bug Blocks:    
Attachments:
Description Flags
Patch
ysuzuki: review+, ysuzuki: commit-queue+
Patch none

Luca
Reported 2018-07-24 01:19:14 PDT
When a 2-bytes unicode character is removed from a JavaScript string, a combination of using the string as an object key, and string concatenation can corrupt the string with null bytes. Steps to Reproduce: Running this snippet on Safari, verified at least on Version 11.1.1 (13605.2.8), reproduces the bug (see notes below): https://gist.github.com/lucaong/a7d7a2eee869e2c7afe8b59fc0dfda2b Here is a JSFiddle of the same snippet (read comment): https://jsfiddle.net/DukeLeNoir/mkrfw4g8/ Expected Results vs Actual Result: The script would be expected to find both strings as keys in the object, and therefore alert "All fine" twice. That's indeed what happens on other browsers. On Safari, the script instead alerts twice, the second time from a code path that should not occur. If the alert text is copied and pasted into a file and inspected with a hex dump tool (like xdd), the string is found to be corrupted with unexpected null bytes, explaining why it was not found as a key in the object. Calling any String.prototype method on the corrupted string before using it to index the object "fixes" it (even without using the method return value). Other operations that "fix" the corrupted string are using it in a comparison, logging it with console.log, or accessing one of its characters with []. This affects Safari Version 11.1.1 (13605.2.8) and Safari for iOS, but other versions are likely affected, as shown in this bug report for a popular JavaScript library, caused by the same root issue: https://github.com/olivernn/lunr.js/issues/279 The provided script seems to be the minimum needed to reproduce the bug: using shorter strings, skipping the character enumeration, or any other operation done in this short script seems to cause the bug to disappear.
Attachments
Patch (9.13 KB, patch)
2020-02-06 08:09 PST, Alexey Shvayka
ysuzuki: review+
ysuzuki: commit-queue+
Patch (9.15 KB, patch)
2020-02-06 12:40 PST, Alexey Shvayka
no flags
Radar WebKit Bug Importer
Comment 1 2018-07-24 15:22:50 PDT
Sukolsak Sakshuwong
Comment 2 2018-07-26 18:14:24 PDT
The code from the snippet above causes an assertion failure in Debug mode: ASSERTION FAILED: is8Bit() /webkit/WebKitBuild/Debug/usr/local/include/wtf/text/StringImpl.h(271) : const LChar *WTF::StringImpl::characters8() const 1 0x10c34a419 WTFCrash 2 0x10c34dd19 WTF::StringImpl::characters8() const 3 0x10d6976f4 JSC::JSRopeString::resolveRopeInternal8NoSubstring(unsigned char*) const 4 0x10d69758a JSC::JSRopeString::resolveRopeInternal8(unsigned char*) const 5 0x10d698c82 JSC::JSRopeString::resolveRopeToExistingAtomicString(JSC::ExecState*) const 6 0x10cc9436c JSC::JSString::toExistingAtomicString(JSC::ExecState*) const 7 0x10d36f12f JSC::LLInt::getByVal(JSC::VM&, JSC::ExecState*, JSC::Instruction*, JSC::JSValue, JSC::JSValue) 8 0x10d36eeb5 llint_slow_path_get_by_val 9 0x10c439862 llint_entry 10 0x10c4355b2 vmEntryToJavaScript 11 0x10d28b50a JSC::JITCode::execute(JSC::VM*, JSC::ProtoCallFrame*) 12 0x10d28aab1 JSC::Interpreter::executeProgram(JSC::SourceCode const&, JSC::ExecState*, JSC::JSObject*) 13 0x10d543ad7 JSC::evaluate(JSC::ExecState*, JSC::SourceCode const&, JSC::JSValue, WTF::NakedPtr<JSC::Exception>&) 14 0x10c285010 runWithOptions(GlobalObject*, CommandLine&, bool&) 15 0x10c25c9cc jscmain(int, char**)::$_3::operator()(JSC::VM&, GlobalObject*, bool&) const 16 0x10c244174 int runJSC<jscmain(int, char**)::$_3>(CommandLine, bool, jscmain(int, char**)::$_3 const&) 17 0x10c242c5f jscmain(int, char**) 18 0x10c242bbe main 19 0x7fff6dff4015 start Segmentation fault: 11 Here is a smaller script that reproduces the same bug: var ab16bit = 'abcĀ'.replace(/c.*/, ''); var map = {}; map[ab16bit]; var ropeAB = 'a' + 'b'; var ropeABC = ropeAB + 'c'; map[ropeAB]; map[ropeABC] = 42; console.log(JSON.stringify(map)); // Expected: {"abc":42}. Actual: {"a\u0000c":42}. "map[ab16bit]" creates a 16-bit AtomicString "ab". ropeAB and ropeABC are initially 8-bit JSRopeStrings. "map[ropeAB]" causes ropeAB to resolve into the 16-bit AtomicString "ab". Because of https://webkit.org/b/133574, ropeAB becomes a 16-bit string. However, ropeABC, which points to ropeAB, is still an 8-bit JSRopeString. So, when it is forced to resolve, it copies only the first two bytes of ropeAB, which is "a\0". Thus it returns "a\0c". One way to fix this seems to be that, when an 8-bit JSRopeString becomes 16-bit, it should set all its ancestor JSRopeStrings to be 16-bit. But I'm not sure how.
Yusuke Suzuki
Comment 3 2018-07-26 18:22:58 PDT
(In reply to Sukolsak Sakshuwong from comment #2) > The code from the snippet above causes an assertion failure in Debug mode: > > ASSERTION FAILED: is8Bit() > /webkit/WebKitBuild/Debug/usr/local/include/wtf/text/StringImpl.h(271) : > const LChar *WTF::StringImpl::characters8() const > 1 0x10c34a419 WTFCrash > 2 0x10c34dd19 WTF::StringImpl::characters8() const > 3 0x10d6976f4 JSC::JSRopeString::resolveRopeInternal8NoSubstring(unsigned > char*) const > 4 0x10d69758a JSC::JSRopeString::resolveRopeInternal8(unsigned char*) const > 5 0x10d698c82 > JSC::JSRopeString::resolveRopeToExistingAtomicString(JSC::ExecState*) const > 6 0x10cc9436c JSC::JSString::toExistingAtomicString(JSC::ExecState*) const > 7 0x10d36f12f JSC::LLInt::getByVal(JSC::VM&, JSC::ExecState*, > JSC::Instruction*, JSC::JSValue, JSC::JSValue) > 8 0x10d36eeb5 llint_slow_path_get_by_val > 9 0x10c439862 llint_entry > 10 0x10c4355b2 vmEntryToJavaScript > 11 0x10d28b50a JSC::JITCode::execute(JSC::VM*, JSC::ProtoCallFrame*) > 12 0x10d28aab1 JSC::Interpreter::executeProgram(JSC::SourceCode const&, > JSC::ExecState*, JSC::JSObject*) > 13 0x10d543ad7 JSC::evaluate(JSC::ExecState*, JSC::SourceCode const&, > JSC::JSValue, WTF::NakedPtr<JSC::Exception>&) > 14 0x10c285010 runWithOptions(GlobalObject*, CommandLine&, bool&) > 15 0x10c25c9cc jscmain(int, char**)::$_3::operator()(JSC::VM&, > GlobalObject*, bool&) const > 16 0x10c244174 int runJSC<jscmain(int, char**)::$_3>(CommandLine, bool, > jscmain(int, char**)::$_3 const&) > 17 0x10c242c5f jscmain(int, char**) > 18 0x10c242bbe main > 19 0x7fff6dff4015 start > Segmentation fault: 11 > > Here is a smaller script that reproduces the same bug: > > var ab16bit = 'abcĀ'.replace(/c.*/, ''); > > var map = {}; > map[ab16bit]; > > var ropeAB = 'a' + 'b'; > var ropeABC = ropeAB + 'c'; > > map[ropeAB]; > map[ropeABC] = 42; > console.log(JSON.stringify(map)); // Expected: {"abc":42}. Actual: > {"a\u0000c":42}. > > "map[ab16bit]" creates a 16-bit AtomicString "ab". ropeAB and ropeABC are > initially 8-bit JSRopeStrings. "map[ropeAB]" causes ropeAB to resolve into > the 16-bit AtomicString "ab". Because of https://webkit.org/b/133574, ropeAB > becomes a 16-bit string. However, ropeABC, which points to ropeAB, is still > an 8-bit JSRopeString. So, when it is forced to resolve, it copies only the > first two bytes of ropeAB, which is "a\0". Thus it returns "a\0c". > > One way to fix this seems to be that, when an 8-bit JSRopeString becomes > 16-bit, it should set all its ancestor JSRopeStrings to be 16-bit. But I'm > not sure how. I think the correct fix should be making AtomicString 8-bit if it can be.
Alexey Shvayka
Comment 4 2020-02-06 08:09:24 PST
Yusuke Suzuki
Comment 5 2020-02-06 12:32:49 PST
Comment on attachment 389958 [details] Patch rs=me. I think this is fixed in https://trac.webkit.org/changeset/253648/webkit
Alexey Shvayka
Comment 6 2020-02-06 12:40:04 PST
Created attachment 389982 [details] Patch Set reviewer and link https://bugs.webkit.org/show_bug.cgi?id=205323 in ChangeLog.
WebKit Commit Bot
Comment 7 2020-02-06 13:36:20 PST
The commit-queue encountered the following flaky tests while processing attachment 389982 [details]: editing/spelling/spellcheck-async-remove-frame.html bug 158401 (authors: morrita@google.com, rniwa@webkit.org, and tony@chromium.org) The commit-queue is continuing to process your patch.
WebKit Commit Bot
Comment 8 2020-02-06 13:36:54 PST
Comment on attachment 389982 [details] Patch Clearing flags on attachment: 389982 Committed r255975: <https://trac.webkit.org/changeset/255975>
WebKit Commit Bot
Comment 9 2020-02-06 13:36:55 PST
All reviewed patches have been landed. Closing bug.
Note You need to log in before you can comment on or make changes to this bug.