Bug 187947 - JavaScript string corruption using RegExp with unicode character
Summary: JavaScript string corruption using RegExp with unicode character
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: Safari 11
Hardware: All All
: P2 Normal
Assignee: Alexey Shvayka
URL:
Keywords: InRadar
Depends on: 205323
Blocks:
  Show dependency treegraph
 
Reported: 2018-07-24 01:19 PDT by Luca
Modified: 2020-02-06 13:36 PST (History)
11 users (show)

See Also:


Attachments
Patch (9.13 KB, patch)
2020-02-06 08:09 PST, Alexey Shvayka
ysuzuki: review+
ysuzuki: commit-queue+
Details | Formatted Diff | Diff
Patch (9.15 KB, patch)
2020-02-06 12:40 PST, Alexey Shvayka
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Luca 2018-07-24 01:19:14 PDT
When a 2-bytes unicode character is removed from a JavaScript string, a combination of using the string as an object key, and string concatenation can corrupt the string with null bytes.

Steps to Reproduce:

Running this snippet on Safari, verified at least on Version 11.1.1 (13605.2.8), reproduces the bug (see notes below):

https://gist.github.com/lucaong/a7d7a2eee869e2c7afe8b59fc0dfda2b

Here is a JSFiddle of the same snippet (read comment): https://jsfiddle.net/DukeLeNoir/mkrfw4g8/

Expected Results vs Actual Result:

The script would be expected to find both strings as keys in the object, and therefore alert "All fine" twice. That's indeed what happens on other browsers.

On Safari, the script instead alerts twice, the second time from a code path that should not occur. If the alert text is copied and pasted into a file and inspected with a hex dump tool (like xdd), the string is found to be corrupted with unexpected null bytes, explaining why it was not found as a key in the object.
Calling any String.prototype method on the corrupted string before using it to index the object "fixes" it (even without using the method return value). Other operations that "fix" the corrupted string are using it in a comparison, logging it with console.log, or accessing one of its characters with [].

This affects Safari Version 11.1.1 (13605.2.8) and Safari for iOS, but other versions are likely affected, as shown in this bug report for a popular JavaScript library, caused by the same root issue: https://github.com/olivernn/lunr.js/issues/279

The provided script seems to be the minimum needed to reproduce the bug: using shorter strings, skipping the character enumeration, or any other operation done in this short script seems to cause the bug to disappear.
Comment 1 Radar WebKit Bug Importer 2018-07-24 15:22:50 PDT
<rdar://problem/42558747>
Comment 2 Sukolsak Sakshuwong 2018-07-26 18:14:24 PDT
The code from the snippet above causes an assertion failure in Debug mode:

ASSERTION FAILED: is8Bit()
/webkit/WebKitBuild/Debug/usr/local/include/wtf/text/StringImpl.h(271) : const LChar *WTF::StringImpl::characters8() const
1   0x10c34a419 WTFCrash
2   0x10c34dd19 WTF::StringImpl::characters8() const
3   0x10d6976f4 JSC::JSRopeString::resolveRopeInternal8NoSubstring(unsigned char*) const
4   0x10d69758a JSC::JSRopeString::resolveRopeInternal8(unsigned char*) const
5   0x10d698c82 JSC::JSRopeString::resolveRopeToExistingAtomicString(JSC::ExecState*) const
6   0x10cc9436c JSC::JSString::toExistingAtomicString(JSC::ExecState*) const
7   0x10d36f12f JSC::LLInt::getByVal(JSC::VM&, JSC::ExecState*, JSC::Instruction*, JSC::JSValue, JSC::JSValue)
8   0x10d36eeb5 llint_slow_path_get_by_val
9   0x10c439862 llint_entry
10  0x10c4355b2 vmEntryToJavaScript
11  0x10d28b50a JSC::JITCode::execute(JSC::VM*, JSC::ProtoCallFrame*)
12  0x10d28aab1 JSC::Interpreter::executeProgram(JSC::SourceCode const&, JSC::ExecState*, JSC::JSObject*)
13  0x10d543ad7 JSC::evaluate(JSC::ExecState*, JSC::SourceCode const&, JSC::JSValue, WTF::NakedPtr<JSC::Exception>&)
14  0x10c285010 runWithOptions(GlobalObject*, CommandLine&, bool&)
15  0x10c25c9cc jscmain(int, char**)::$_3::operator()(JSC::VM&, GlobalObject*, bool&) const
16  0x10c244174 int runJSC<jscmain(int, char**)::$_3>(CommandLine, bool, jscmain(int, char**)::$_3 const&)
17  0x10c242c5f jscmain(int, char**)
18  0x10c242bbe main
19  0x7fff6dff4015 start
Segmentation fault: 11

Here is a smaller script that reproduces the same bug:

    var ab16bit = 'abcĀ'.replace(/c.*/, '');

    var map = {};
    map[ab16bit];

    var ropeAB = 'a' + 'b';
    var ropeABC = ropeAB + 'c';

    map[ropeAB];
    map[ropeABC] = 42;
    console.log(JSON.stringify(map)); // Expected: {"abc":42}. Actual: {"a\u0000c":42}.

"map[ab16bit]" creates a 16-bit AtomicString "ab". ropeAB and ropeABC are initially 8-bit JSRopeStrings. "map[ropeAB]" causes ropeAB to resolve into the 16-bit AtomicString "ab". Because of https://webkit.org/b/133574, ropeAB becomes a 16-bit string. However, ropeABC, which points to ropeAB, is still an 8-bit JSRopeString. So, when it is forced to resolve, it copies only the first two bytes of ropeAB, which is "a\0". Thus it returns "a\0c".

One way to fix this seems to be that, when an 8-bit JSRopeString becomes 16-bit, it should set all its ancestor JSRopeStrings to be 16-bit. But I'm not sure how.
Comment 3 Yusuke Suzuki 2018-07-26 18:22:58 PDT
(In reply to Sukolsak Sakshuwong from comment #2)
> The code from the snippet above causes an assertion failure in Debug mode:
> 
> ASSERTION FAILED: is8Bit()
> /webkit/WebKitBuild/Debug/usr/local/include/wtf/text/StringImpl.h(271) :
> const LChar *WTF::StringImpl::characters8() const
> 1   0x10c34a419 WTFCrash
> 2   0x10c34dd19 WTF::StringImpl::characters8() const
> 3   0x10d6976f4 JSC::JSRopeString::resolveRopeInternal8NoSubstring(unsigned
> char*) const
> 4   0x10d69758a JSC::JSRopeString::resolveRopeInternal8(unsigned char*) const
> 5   0x10d698c82
> JSC::JSRopeString::resolveRopeToExistingAtomicString(JSC::ExecState*) const
> 6   0x10cc9436c JSC::JSString::toExistingAtomicString(JSC::ExecState*) const
> 7   0x10d36f12f JSC::LLInt::getByVal(JSC::VM&, JSC::ExecState*,
> JSC::Instruction*, JSC::JSValue, JSC::JSValue)
> 8   0x10d36eeb5 llint_slow_path_get_by_val
> 9   0x10c439862 llint_entry
> 10  0x10c4355b2 vmEntryToJavaScript
> 11  0x10d28b50a JSC::JITCode::execute(JSC::VM*, JSC::ProtoCallFrame*)
> 12  0x10d28aab1 JSC::Interpreter::executeProgram(JSC::SourceCode const&,
> JSC::ExecState*, JSC::JSObject*)
> 13  0x10d543ad7 JSC::evaluate(JSC::ExecState*, JSC::SourceCode const&,
> JSC::JSValue, WTF::NakedPtr<JSC::Exception>&)
> 14  0x10c285010 runWithOptions(GlobalObject*, CommandLine&, bool&)
> 15  0x10c25c9cc jscmain(int, char**)::$_3::operator()(JSC::VM&,
> GlobalObject*, bool&) const
> 16  0x10c244174 int runJSC<jscmain(int, char**)::$_3>(CommandLine, bool,
> jscmain(int, char**)::$_3 const&)
> 17  0x10c242c5f jscmain(int, char**)
> 18  0x10c242bbe main
> 19  0x7fff6dff4015 start
> Segmentation fault: 11
> 
> Here is a smaller script that reproduces the same bug:
> 
>     var ab16bit = 'abcĀ'.replace(/c.*/, '');
> 
>     var map = {};
>     map[ab16bit];
> 
>     var ropeAB = 'a' + 'b';
>     var ropeABC = ropeAB + 'c';
> 
>     map[ropeAB];
>     map[ropeABC] = 42;
>     console.log(JSON.stringify(map)); // Expected: {"abc":42}. Actual:
> {"a\u0000c":42}.
> 
> "map[ab16bit]" creates a 16-bit AtomicString "ab". ropeAB and ropeABC are
> initially 8-bit JSRopeStrings. "map[ropeAB]" causes ropeAB to resolve into
> the 16-bit AtomicString "ab". Because of https://webkit.org/b/133574, ropeAB
> becomes a 16-bit string. However, ropeABC, which points to ropeAB, is still
> an 8-bit JSRopeString. So, when it is forced to resolve, it copies only the
> first two bytes of ropeAB, which is "a\0". Thus it returns "a\0c".
> 
> One way to fix this seems to be that, when an 8-bit JSRopeString becomes
> 16-bit, it should set all its ancestor JSRopeStrings to be 16-bit. But I'm
> not sure how.

I think the correct fix should be making AtomicString 8-bit if it can be.
Comment 4 Alexey Shvayka 2020-02-06 08:09:24 PST
Created attachment 389958 [details]
Patch
Comment 5 Yusuke Suzuki 2020-02-06 12:32:49 PST
Comment on attachment 389958 [details]
Patch

rs=me. I think this is fixed in https://trac.webkit.org/changeset/253648/webkit
Comment 6 Alexey Shvayka 2020-02-06 12:40:04 PST
Created attachment 389982 [details]
Patch

Set reviewer and link https://bugs.webkit.org/show_bug.cgi?id=205323 in ChangeLog.
Comment 7 WebKit Commit Bot 2020-02-06 13:36:20 PST
The commit-queue encountered the following flaky tests while processing attachment 389982 [details]:

editing/spelling/spellcheck-async-remove-frame.html bug 158401 (authors: morrita@google.com, rniwa@webkit.org, and tony@chromium.org)
The commit-queue is continuing to process your patch.
Comment 8 WebKit Commit Bot 2020-02-06 13:36:54 PST
Comment on attachment 389982 [details]
Patch

Clearing flags on attachment: 389982

Committed r255975: <https://trac.webkit.org/changeset/255975>
Comment 9 WebKit Commit Bot 2020-02-06 13:36:55 PST
All reviewed patches have been landed.  Closing bug.