Bug 154863 - WebKitTestRunner and DumpRenderTree do not handle dangling surrogate characters
Summary: WebKitTestRunner and DumpRenderTree do not handle dangling surrogate characters
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Michael Saboff
URL:
Keywords: InRadar
Depends on:
Blocks: 154842
  Show dependency treegraph
 
Reported: 2016-03-01 10:41 PST by Michael Saboff
Modified: 2017-04-11 01:17 PDT (History)
4 users (show)

See Also:


Attachments
Crashing test (547 bytes, text/html)
2016-03-01 10:41 PST, Michael Saboff
no flags Details
Patch (8.62 KB, patch)
2016-06-08 14:53 PDT, Michael Saboff
ap: review+
Details | Formatted Diff | Diff
Patch for Landing (8.94 KB, patch)
2016-06-08 17:51 PDT, Michael Saboff
no flags Details | Formatted Diff | Diff
follow-up fix for EWS (1.71 KB, patch)
2016-06-11 13:19 PDT, Alexey Proskuryakov
no flags Details | Formatted Diff | Diff
Follow-up fix for EWS (1.94 KB, patch)
2016-06-11 16:07 PDT, Alexey Proskuryakov
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Michael Saboff 2016-03-01 10:41:33 PST
Created attachment 272574 [details]
Crashing test

If your run the attached test with DumpRenderTree it doesn’t provide any test output:
Content-Type: text/plain
DumpMalloc: 53440512
ERROR: nil result from [documentElement innerText]#EOF
#EOF

If you run WebKitTestRunner with the test, it crashes:
1   0x106169ad0 WTFCrash
2   0x1061dfdcf WTF::String::fromUTF8(unsigned char const*, unsigned long)
3   0x1061e010f WTF::String::fromUTF8WithLatin1Fallback(unsigned char const*, unsigned long)
4   0x11729c394 WTF::String::fromUTF8WithLatin1Fallback(char const*, unsigned long)
5   0x11729c158 WTR::toWTFString(OpaqueWKString const*)
6   0x117298b5f WTR::toWTFString(WebKit::WKRetainPtr<OpaqueWKString const*> const&)
7   0x1172c6102 WTR::dumpFrameText(OpaqueWKBundleFrame const*, WTF::StringBuilder&)
8   0x1172c6625 WTR::InjectedBundlePage::dump()
9   0x1172c5da2 WTR::InjectedBundlePage::frameDidChangeLocation(OpaqueWKBundleFrame const*, bool)
10  0x1172c4607 WTR::InjectedBundlePage::didFinishLoadForFrame(OpaqueWKBundleFrame const*)
11  0x1172c32e8 WTR::InjectedBundlePage::didFinishLoadForFrame(OpaqueWKBundlePage const*, OpaqueWKBundleFrame const*, void const**, void const*)
12  0x1020c0906 WebKit::InjectedBundlePageLoaderClient::didFinishLoadForFrame(WebKit::WebPage*, WebKit::WebFrame*, WTF::RefPtr<API::Object>&)
13  0x10252e89d WebKit::WebFrameLoaderClient::dispatchDidFinishLoad()
14  0x108c84c2a WebCore::FrameLoader::checkLoadCompleteForThisFrame()
15  0x108c7c6be WebCore::FrameLoader::checkLoadComplete()
16  0x108c7c1cc WebCore::FrameLoader::checkCompleted()
17  0x108c7c1f5 WebCore::FrameLoader::loadDone()
18  0x10849a609 WebCore::CachedResourceLoader::loadDone(WebCore::CachedResource*, bool)
19  0x10a418935 WebCore::SubresourceLoader::notifyDone()
20  0x10a41855a WebCore::SubresourceLoader::didFinishLoading(double)
21  0x10280f827 WebKit::WebResourceLoader::didFinishResourceLoad(double)
22  0x102814d93 void IPC::callMemberFunctionImpl<WebKit::WebResourceLoader, void (WebKit::WebResourceLoader::*)(double), std::__1::tuple<double>, 0ul>(WebKit::WebResourceLoader*, void (WebKit::WebResourceLoader::*)(double), std::__1::tuple<double>&&, std::index_sequence<0ul>)
23  0x102814ce8 void IPC::callMemberFunction<WebKit::WebResourceLoader, void (WebKit::WebResourceLoader::*)(double), std::__1::tuple<double>, std::make_index_sequence<1ul> >(std::__1::tuple<double>&&, WebKit::WebResourceLoader*, void (WebKit::WebResourceLoader::*)(double))
24  0x102813e02 void IPC::handleMessage<Messages::WebResourceLoader::DidFinishResourceLoad, WebKit::WebResourceLoader, void (WebKit::WebResourceLoader::*)(double)>(IPC::MessageDecoder&, WebKit::WebResourceLoader*, void (WebKit::WebResourceLoader::*)(double))
25  0x10281357c WebKit::WebResourceLoader::didReceiveWebResourceLoaderMessage(IPC::Connection&, IPC::MessageDecoder&)
26  0x10222b990 WebKit::NetworkProcessConnection::didReceiveMessage(IPC::Connection&, IPC::MessageDecoder&)
27  0x101fca993 IPC::Connection::dispatchMessage(IPC::MessageDecoder&)
28  0x101fc1811 IPC::Connection::dispatchMessage(std::__1::unique_ptr<IPC::MessageDecoder, std::__1::default_delete<IPC::MessageDecoder> >)
29  0x101fcaf8f IPC::Connection::dispatchOneMessage()
30  0x101fdc2fd IPC::Connection::enqueueIncomingMessage(std::__1::unique_ptr<IPC::MessageDecoder, std::__1::default_delete<IPC::MessageDecoder> >)::$_10::operator()() const
31  0x101fdc2cd void std::__1::__invoke_void_return_wrapper<void>::__call<IPC::Connection::enqueueIncomingMessage(std::__1::unique_ptr<IPC::MessageDecoder, std::__1::default_delete<IPC::MessageDecoder> >)::$_10&>(IPC::Connection::enqueueIncomingMessage(std::__1::unique_ptr<IPC::MessageDecoder, std::__1::default_delete<IPC::MessageDecoder> >)::$_10&&&)
#CRASHED - com.apple.WebKit.WebContent.Development (pid 89331)
LEAK: 1 WebProcessPool
LEAK: 1 WebPageProxy

For WebKitTestRunner, we set “strict” to true when calling convertUTF16ToUTF8(), via the call to WKStringGetUTF8CString(), which will return 0.  We pass that stringLength - 1 which underflows to uint64_max to String::fromUTF8WithLatin1Fallback().

Here is the code:

inline WTF::String toWTFString(WKStringRef string)
{
    size_t bufferSize = WKStringGetMaximumUTF8CStringSize(string);
    auto buffer = std::make_unique<char[]>(bufferSize);
    size_t stringLength = WKStringGetUTF8CString(string, buffer.get(), bufferSize);
    return WTF::String::fromUTF8WithLatin1Fallback(buffer.get(), stringLength - 1);
}

Comment 1 Michael Saboff 2016-03-01 10:41:58 PST
<rdar://problem/24910745>
Comment 2 Michael Saboff 2016-06-08 14:53:33 PDT
Created attachment 280844 [details]
Patch
Comment 3 Alexey Proskuryakov 2016-06-08 15:44:56 PDT
Comment on attachment 280844 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=280844&action=review

> Tools/ChangeLog:12
> +        The code added in DumpRenderTree matches what was changed in WebKitTestRunner.

I think that this needs more of an explanation, I couldn't understand the DumpRenderTree change.

We already start with an NSString returned by [documentElement innerText], why is this round-trip needed? Is it even actually needed? It looks very suspicious.

> Tools/DumpRenderTree/mac/DumpRenderTree.mm:1528
> +    NSString* innerText = [documentElement innerText];

Style: misplaced star, should be "NSString *innerText"

> Tools/DumpRenderTree/mac/DumpRenderTree.mm:1533
> +    WTF::String string = WTF::String::fromUTF8WithLatin1Fallback(buffer.get(), stringLength - 1);

I don't think that WTF:: prefix is ever needed for String, please remove it (twice in this line).
Comment 4 Alexey Proskuryakov 2016-06-08 15:47:07 PDT
Comment on attachment 280844 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=280844&action=review

> LayoutTests/ChangeLog:11
> +        * js/dangling-surrogates.html: Added.

Oh, and I think that this is a wrong place for this test - LayoutTests/js is for tests that are also run by run-webkit-tests, and this one is not like that.

> LayoutTests/js/dangling-surrogates.html:1
> +<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">

Please use HTML5 doctype (<!DOCTYPE html>).

> LayoutTests/js/dangling-surrogates.html:19
> +<script src="../resources/js-test-post.js"></script>

A more modern way to write script tests is to use "js-test.js", then you don't need js-test-post.js.
Comment 5 Michael Saboff 2016-06-08 17:33:16 PDT
Comment on attachment 280844 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=280844&action=review

>> Tools/ChangeLog:12
>> +        The code added in DumpRenderTree matches what was changed in WebKitTestRunner.
> 
> I think that this needs more of an explanation, I couldn't understand the DumpRenderTree change.
> 
> We already start with an NSString returned by [documentElement innerText], why is this round-trip needed? Is it even actually needed? It looks very suspicious.

Changed the last line of the ChangeLog entry to:
                                                                                                                After converting
        to a character buffer via WKStringGetUTF8CStringNonStrict(), we convert back
        to a UTF8 string.  The NSString appendFormat handles the resulting UTF8 string
        without any issue.
        This added conversion code matches how similar code in WebKitTestRunner was changed.

I added the following comment in DumpRenderTree::dumpFramesAsText:

    // We use WKStringGetUTF8CStringNonStrict() to convert innerText to a WK String since
    // WKStringGetUTF8CStringNonStrict() can handle dangling surrogates and the NSString
    // conversion methods cannot. After the conversion to a buffer, we turn that buffer into
    // a CFString via fromUTF8WithLatin1Fallback().createCFString() which can be appended to
    // the result without any conversion.

>> Tools/DumpRenderTree/mac/DumpRenderTree.mm:1528
>> +    NSString* innerText = [documentElement innerText];
> 
> Style: misplaced star, should be "NSString *innerText"

Done.

>> Tools/DumpRenderTree/mac/DumpRenderTree.mm:1533
>> +    WTF::String string = WTF::String::fromUTF8WithLatin1Fallback(buffer.get(), stringLength - 1);
> 
> I don't think that WTF:: prefix is ever needed for String, please remove it (twice in this line).

Removed.

>> LayoutTests/ChangeLog:11
>> +        * js/dangling-surrogates.html: Added.
> 
> Oh, and I think that this is a wrong place for this test - LayoutTests/js is for tests that are also run by run-webkit-tests, and this one is not like that.

I moved the test to fast/text.

>> LayoutTests/js/dangling-surrogates.html:1
>> +<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
> 
> Please use HTML5 doctype (<!DOCTYPE html>).

Done.

>> LayoutTests/js/dangling-surrogates.html:19
>> +<script src="../resources/js-test-post.js"></script>
> 
> A more modern way to write script tests is to use "js-test.js", then you don't need js-test-post.js.

Changed to use js-test.js.
Comment 6 Michael Saboff 2016-06-08 17:51:44 PDT
Created attachment 280861 [details]
Patch for Landing
Comment 7 Michael Saboff 2016-06-09 07:53:19 PDT
Committed r201863: <http://trac.webkit.org/changeset/201863>
Comment 8 Darin Adler 2016-06-09 10:18:09 PDT
Comment on attachment 280844 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=280844&action=review

> Tools/ChangeLog:3
> +        WebKitTestRunner and DumpRenderTree do not handle dangling surrogate characters

What do we mean when we say we want to "handle" dangling surrogates? Do we simply mean that we want to leave them alone?

I think the real problem here is that we should leave strings in UTF-16. I don’t think we should convert them to and from UTF-8. If we leave them as UTF-16, then our code should just leave surrogates alone.

> Tools/DumpRenderTree/mac/DumpRenderTree.mm:1535
> +    [result appendFormat:@"%@\n", string.createCFString().get()];

It seems really strange to add WKStringGetUTF8CStringNonStrict to WebKit2 SPI just so we can use it on one NSString to make another NSString. It’s really peculiar to involve all these various WebKit classes when we are starting with one NSString and trying to create a second NSString.

> Tools/WebKitTestRunner/StringFunctions.h:91
>      size_t bufferSize = WKStringGetMaximumUTF8CStringSize(string);
>      auto buffer = std::make_unique<char[]>(bufferSize);
> -    size_t stringLength = WKStringGetUTF8CString(string, buffer.get(), bufferSize);
> +    size_t stringLength = WKStringGetUTF8CStringNonStrict(string, buffer.get(), bufferSize);
>      return WTF::String::fromUTF8WithLatin1Fallback(buffer.get(), stringLength - 1);

Why are we converting a WKString, which is either Latin-1 or UTF-16 internally, first to UTF-8 and then back to UTF-16? We should come up with a code path that doesn’t involve conversion back and forth to/from UTF-8.
Comment 9 Michael Saboff 2016-06-09 11:06:37 PDT
(In reply to comment #8)
> Comment on attachment 280844 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=280844&action=review
> 
> > Tools/ChangeLog:3
> > +        WebKitTestRunner and DumpRenderTree do not handle dangling surrogate characters
> 
> What do we mean when we say we want to "handle" dangling surrogates? Do we
> simply mean that we want to leave them alone?

I believe that both DumpRenderTree and WebKitTestRunner dump their frame text output as UFT-8.  Therefore "handling" dangling surrogates means that we get a consistent transformation.  A non-strict UTF-16 to UTF-8 conversion will convert dangling surrogates to a 3 character UTF-8 sequence.
 
> I think the real problem here is that we should leave strings in UTF-16. I
> don’t think we should convert them to and from UTF-8. If we leave them as
> UTF-16, then our code should just leave surrogates alone.

If we leave them as UTF-16 we lose the whole string.  This happens because NSString always converts using some format when asked to return some string.  

> > Tools/DumpRenderTree/mac/DumpRenderTree.mm:1535
> > +    [result appendFormat:@"%@\n", string.createCFString().get()];
> 
> It seems really strange to add WKStringGetUTF8CStringNonStrict to WebKit2
> SPI just so we can use it on one NSString to make another NSString. It’s
> really peculiar to involve all these various WebKit classes when we are
> starting with one NSString and trying to create a second NSString.

I wondered about this myself so I discussed it with Anders before starting and he supported creating the SPI.  I needed this non-strict SPI to do the conversion without losing strings that contain dangling surrogates.

> > Tools/WebKitTestRunner/StringFunctions.h:91
> >      size_t bufferSize = WKStringGetMaximumUTF8CStringSize(string);
> >      auto buffer = std::make_unique<char[]>(bufferSize);
> > -    size_t stringLength = WKStringGetUTF8CString(string, buffer.get(), bufferSize);
> > +    size_t stringLength = WKStringGetUTF8CStringNonStrict(string, buffer.get(), bufferSize);
> >      return WTF::String::fromUTF8WithLatin1Fallback(buffer.get(), stringLength - 1);
> 
> Why are we converting a WKString, which is either Latin-1 or UTF-16
> internally, first to UTF-8 and then back to UTF-16? We should come up with a
> code path that doesn’t involve conversion back and forth to/from UTF-8.

This round about process is used so that we end up with an NSString that can be converted to a UTF-8 when we finally output it.
Comment 10 Alexey Proskuryakov 2016-06-11 13:12:42 PDT
This made all result strings leak:

... dumpFramesAsText(WebFrame*) DumpRenderTree.mm:1534 | WKStringCreateWithCFString WKStringCF.mm:57 | API::String::create(WTF::String const&) APIString.h:51 | API::String::create(WTF::String&&) APIString.h:46 | API::Object::newObject(unsigned long, API::Object::Type) APIObject.mm:275 | NSAllocateObject | class_createInstance | calloc | malloc_zone_calloc
Comment 11 Alexey Proskuryakov 2016-06-11 13:18:52 PDT
Re-opening to attach a fix.
Comment 12 Alexey Proskuryakov 2016-06-11 13:19:10 PDT
Created attachment 281110 [details]
follow-up fix for EWS
Comment 13 Alexey Proskuryakov 2016-06-11 16:07:29 PDT
Created attachment 281113 [details]
Follow-up fix for EWS
Comment 14 Alexey Proskuryakov 2016-06-11 16:43:26 PDT
Landed the follow-up in r201981.
Comment 15 Simon Fraser (smfr) 2017-02-10 15:37:09 PST
This causes us to call into WebKit2 from DumpRenderTree on the WebThread on iOS, which is bad: https://bugs.webkit.org/show_bug.cgi?id=168149
Comment 16 Joseph Pecoraro 2017-04-11 01:17:22 PDT
There are a bunch of commented out tests that point to this bug:
LayoutTests/js/script-tests/regexp-unicode.js

    // Make sure we properly handle dangling surrogates and combined surrogates
    // FIXME: These tests are disabled until https://bugs.webkit.org/show_bug.cgi?id=154863 is fixed
    ...

Since this bug is resolved, can those tests be enabled?