RESOLVED FIXED Bug 154863
WebKitTestRunner and DumpRenderTree do not handle dangling surrogate characters
https://bugs.webkit.org/show_bug.cgi?id=154863
Summary WebKitTestRunner and DumpRenderTree do not handle dangling surrogate characters
Michael Saboff
Reported 2016-03-01 10:41:33 PST
Created attachment 272574 [details] Crashing test If your run the attached test with DumpRenderTree it doesn’t provide any test output: Content-Type: text/plain DumpMalloc: 53440512 ERROR: nil result from [documentElement innerText]#EOF #EOF If you run WebKitTestRunner with the test, it crashes: 1 0x106169ad0 WTFCrash 2 0x1061dfdcf WTF::String::fromUTF8(unsigned char const*, unsigned long) 3 0x1061e010f WTF::String::fromUTF8WithLatin1Fallback(unsigned char const*, unsigned long) 4 0x11729c394 WTF::String::fromUTF8WithLatin1Fallback(char const*, unsigned long) 5 0x11729c158 WTR::toWTFString(OpaqueWKString const*) 6 0x117298b5f WTR::toWTFString(WebKit::WKRetainPtr<OpaqueWKString const*> const&) 7 0x1172c6102 WTR::dumpFrameText(OpaqueWKBundleFrame const*, WTF::StringBuilder&) 8 0x1172c6625 WTR::InjectedBundlePage::dump() 9 0x1172c5da2 WTR::InjectedBundlePage::frameDidChangeLocation(OpaqueWKBundleFrame const*, bool) 10 0x1172c4607 WTR::InjectedBundlePage::didFinishLoadForFrame(OpaqueWKBundleFrame const*) 11 0x1172c32e8 WTR::InjectedBundlePage::didFinishLoadForFrame(OpaqueWKBundlePage const*, OpaqueWKBundleFrame const*, void const**, void const*) 12 0x1020c0906 WebKit::InjectedBundlePageLoaderClient::didFinishLoadForFrame(WebKit::WebPage*, WebKit::WebFrame*, WTF::RefPtr<API::Object>&) 13 0x10252e89d WebKit::WebFrameLoaderClient::dispatchDidFinishLoad() 14 0x108c84c2a WebCore::FrameLoader::checkLoadCompleteForThisFrame() 15 0x108c7c6be WebCore::FrameLoader::checkLoadComplete() 16 0x108c7c1cc WebCore::FrameLoader::checkCompleted() 17 0x108c7c1f5 WebCore::FrameLoader::loadDone() 18 0x10849a609 WebCore::CachedResourceLoader::loadDone(WebCore::CachedResource*, bool) 19 0x10a418935 WebCore::SubresourceLoader::notifyDone() 20 0x10a41855a WebCore::SubresourceLoader::didFinishLoading(double) 21 0x10280f827 WebKit::WebResourceLoader::didFinishResourceLoad(double) 22 0x102814d93 void IPC::callMemberFunctionImpl<WebKit::WebResourceLoader, void (WebKit::WebResourceLoader::*)(double), std::__1::tuple<double>, 0ul>(WebKit::WebResourceLoader*, void (WebKit::WebResourceLoader::*)(double), std::__1::tuple<double>&&, std::index_sequence<0ul>) 23 0x102814ce8 void IPC::callMemberFunction<WebKit::WebResourceLoader, void (WebKit::WebResourceLoader::*)(double), std::__1::tuple<double>, std::make_index_sequence<1ul> >(std::__1::tuple<double>&&, WebKit::WebResourceLoader*, void (WebKit::WebResourceLoader::*)(double)) 24 0x102813e02 void IPC::handleMessage<Messages::WebResourceLoader::DidFinishResourceLoad, WebKit::WebResourceLoader, void (WebKit::WebResourceLoader::*)(double)>(IPC::MessageDecoder&, WebKit::WebResourceLoader*, void (WebKit::WebResourceLoader::*)(double)) 25 0x10281357c WebKit::WebResourceLoader::didReceiveWebResourceLoaderMessage(IPC::Connection&, IPC::MessageDecoder&) 26 0x10222b990 WebKit::NetworkProcessConnection::didReceiveMessage(IPC::Connection&, IPC::MessageDecoder&) 27 0x101fca993 IPC::Connection::dispatchMessage(IPC::MessageDecoder&) 28 0x101fc1811 IPC::Connection::dispatchMessage(std::__1::unique_ptr<IPC::MessageDecoder, std::__1::default_delete<IPC::MessageDecoder> >) 29 0x101fcaf8f IPC::Connection::dispatchOneMessage() 30 0x101fdc2fd IPC::Connection::enqueueIncomingMessage(std::__1::unique_ptr<IPC::MessageDecoder, std::__1::default_delete<IPC::MessageDecoder> >)::$_10::operator()() const 31 0x101fdc2cd void std::__1::__invoke_void_return_wrapper<void>::__call<IPC::Connection::enqueueIncomingMessage(std::__1::unique_ptr<IPC::MessageDecoder, std::__1::default_delete<IPC::MessageDecoder> >)::$_10&>(IPC::Connection::enqueueIncomingMessage(std::__1::unique_ptr<IPC::MessageDecoder, std::__1::default_delete<IPC::MessageDecoder> >)::$_10&&&) #CRASHED - com.apple.WebKit.WebContent.Development (pid 89331) LEAK: 1 WebProcessPool LEAK: 1 WebPageProxy For WebKitTestRunner, we set “strict” to true when calling convertUTF16ToUTF8(), via the call to WKStringGetUTF8CString(), which will return 0. We pass that stringLength - 1 which underflows to uint64_max to String::fromUTF8WithLatin1Fallback(). Here is the code: inline WTF::String toWTFString(WKStringRef string) { size_t bufferSize = WKStringGetMaximumUTF8CStringSize(string); auto buffer = std::make_unique<char[]>(bufferSize); size_t stringLength = WKStringGetUTF8CString(string, buffer.get(), bufferSize); return WTF::String::fromUTF8WithLatin1Fallback(buffer.get(), stringLength - 1); } 
Attachments
Crashing test (547 bytes, text/html)
2016-03-01 10:41 PST, Michael Saboff
no flags
Patch (8.62 KB, patch)
2016-06-08 14:53 PDT, Michael Saboff
ap: review+
Patch for Landing (8.94 KB, patch)
2016-06-08 17:51 PDT, Michael Saboff
no flags
follow-up fix for EWS (1.71 KB, patch)
2016-06-11 13:19 PDT, Alexey Proskuryakov
no flags
Follow-up fix for EWS (1.94 KB, patch)
2016-06-11 16:07 PDT, Alexey Proskuryakov
no flags
Michael Saboff
Comment 1 2016-03-01 10:41:58 PST
Michael Saboff
Comment 2 2016-06-08 14:53:33 PDT
Alexey Proskuryakov
Comment 3 2016-06-08 15:44:56 PDT
Comment on attachment 280844 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=280844&action=review > Tools/ChangeLog:12 > + The code added in DumpRenderTree matches what was changed in WebKitTestRunner. I think that this needs more of an explanation, I couldn't understand the DumpRenderTree change. We already start with an NSString returned by [documentElement innerText], why is this round-trip needed? Is it even actually needed? It looks very suspicious. > Tools/DumpRenderTree/mac/DumpRenderTree.mm:1528 > + NSString* innerText = [documentElement innerText]; Style: misplaced star, should be "NSString *innerText" > Tools/DumpRenderTree/mac/DumpRenderTree.mm:1533 > + WTF::String string = WTF::String::fromUTF8WithLatin1Fallback(buffer.get(), stringLength - 1); I don't think that WTF:: prefix is ever needed for String, please remove it (twice in this line).
Alexey Proskuryakov
Comment 4 2016-06-08 15:47:07 PDT
Comment on attachment 280844 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=280844&action=review > LayoutTests/ChangeLog:11 > + * js/dangling-surrogates.html: Added. Oh, and I think that this is a wrong place for this test - LayoutTests/js is for tests that are also run by run-webkit-tests, and this one is not like that. > LayoutTests/js/dangling-surrogates.html:1 > +<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"> Please use HTML5 doctype (<!DOCTYPE html>). > LayoutTests/js/dangling-surrogates.html:19 > +<script src="../resources/js-test-post.js"></script> A more modern way to write script tests is to use "js-test.js", then you don't need js-test-post.js.
Michael Saboff
Comment 5 2016-06-08 17:33:16 PDT
Comment on attachment 280844 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=280844&action=review >> Tools/ChangeLog:12 >> + The code added in DumpRenderTree matches what was changed in WebKitTestRunner. > > I think that this needs more of an explanation, I couldn't understand the DumpRenderTree change. > > We already start with an NSString returned by [documentElement innerText], why is this round-trip needed? Is it even actually needed? It looks very suspicious. Changed the last line of the ChangeLog entry to: After converting to a character buffer via WKStringGetUTF8CStringNonStrict(), we convert back to a UTF8 string. The NSString appendFormat handles the resulting UTF8 string without any issue. This added conversion code matches how similar code in WebKitTestRunner was changed. I added the following comment in DumpRenderTree::dumpFramesAsText: // We use WKStringGetUTF8CStringNonStrict() to convert innerText to a WK String since // WKStringGetUTF8CStringNonStrict() can handle dangling surrogates and the NSString // conversion methods cannot. After the conversion to a buffer, we turn that buffer into // a CFString via fromUTF8WithLatin1Fallback().createCFString() which can be appended to // the result without any conversion. >> Tools/DumpRenderTree/mac/DumpRenderTree.mm:1528 >> + NSString* innerText = [documentElement innerText]; > > Style: misplaced star, should be "NSString *innerText" Done. >> Tools/DumpRenderTree/mac/DumpRenderTree.mm:1533 >> + WTF::String string = WTF::String::fromUTF8WithLatin1Fallback(buffer.get(), stringLength - 1); > > I don't think that WTF:: prefix is ever needed for String, please remove it (twice in this line). Removed. >> LayoutTests/ChangeLog:11 >> + * js/dangling-surrogates.html: Added. > > Oh, and I think that this is a wrong place for this test - LayoutTests/js is for tests that are also run by run-webkit-tests, and this one is not like that. I moved the test to fast/text. >> LayoutTests/js/dangling-surrogates.html:1 >> +<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"> > > Please use HTML5 doctype (<!DOCTYPE html>). Done. >> LayoutTests/js/dangling-surrogates.html:19 >> +<script src="../resources/js-test-post.js"></script> > > A more modern way to write script tests is to use "js-test.js", then you don't need js-test-post.js. Changed to use js-test.js.
Michael Saboff
Comment 6 2016-06-08 17:51:44 PDT
Created attachment 280861 [details] Patch for Landing
Michael Saboff
Comment 7 2016-06-09 07:53:19 PDT
Darin Adler
Comment 8 2016-06-09 10:18:09 PDT
Comment on attachment 280844 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=280844&action=review > Tools/ChangeLog:3 > + WebKitTestRunner and DumpRenderTree do not handle dangling surrogate characters What do we mean when we say we want to "handle" dangling surrogates? Do we simply mean that we want to leave them alone? I think the real problem here is that we should leave strings in UTF-16. I don’t think we should convert them to and from UTF-8. If we leave them as UTF-16, then our code should just leave surrogates alone. > Tools/DumpRenderTree/mac/DumpRenderTree.mm:1535 > + [result appendFormat:@"%@\n", string.createCFString().get()]; It seems really strange to add WKStringGetUTF8CStringNonStrict to WebKit2 SPI just so we can use it on one NSString to make another NSString. It’s really peculiar to involve all these various WebKit classes when we are starting with one NSString and trying to create a second NSString. > Tools/WebKitTestRunner/StringFunctions.h:91 > size_t bufferSize = WKStringGetMaximumUTF8CStringSize(string); > auto buffer = std::make_unique<char[]>(bufferSize); > - size_t stringLength = WKStringGetUTF8CString(string, buffer.get(), bufferSize); > + size_t stringLength = WKStringGetUTF8CStringNonStrict(string, buffer.get(), bufferSize); > return WTF::String::fromUTF8WithLatin1Fallback(buffer.get(), stringLength - 1); Why are we converting a WKString, which is either Latin-1 or UTF-16 internally, first to UTF-8 and then back to UTF-16? We should come up with a code path that doesn’t involve conversion back and forth to/from UTF-8.
Michael Saboff
Comment 9 2016-06-09 11:06:37 PDT
(In reply to comment #8) > Comment on attachment 280844 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=280844&action=review > > > Tools/ChangeLog:3 > > + WebKitTestRunner and DumpRenderTree do not handle dangling surrogate characters > > What do we mean when we say we want to "handle" dangling surrogates? Do we > simply mean that we want to leave them alone? I believe that both DumpRenderTree and WebKitTestRunner dump their frame text output as UFT-8. Therefore "handling" dangling surrogates means that we get a consistent transformation. A non-strict UTF-16 to UTF-8 conversion will convert dangling surrogates to a 3 character UTF-8 sequence. > I think the real problem here is that we should leave strings in UTF-16. I > don’t think we should convert them to and from UTF-8. If we leave them as > UTF-16, then our code should just leave surrogates alone. If we leave them as UTF-16 we lose the whole string. This happens because NSString always converts using some format when asked to return some string. > > Tools/DumpRenderTree/mac/DumpRenderTree.mm:1535 > > + [result appendFormat:@"%@\n", string.createCFString().get()]; > > It seems really strange to add WKStringGetUTF8CStringNonStrict to WebKit2 > SPI just so we can use it on one NSString to make another NSString. It’s > really peculiar to involve all these various WebKit classes when we are > starting with one NSString and trying to create a second NSString. I wondered about this myself so I discussed it with Anders before starting and he supported creating the SPI. I needed this non-strict SPI to do the conversion without losing strings that contain dangling surrogates. > > Tools/WebKitTestRunner/StringFunctions.h:91 > > size_t bufferSize = WKStringGetMaximumUTF8CStringSize(string); > > auto buffer = std::make_unique<char[]>(bufferSize); > > - size_t stringLength = WKStringGetUTF8CString(string, buffer.get(), bufferSize); > > + size_t stringLength = WKStringGetUTF8CStringNonStrict(string, buffer.get(), bufferSize); > > return WTF::String::fromUTF8WithLatin1Fallback(buffer.get(), stringLength - 1); > > Why are we converting a WKString, which is either Latin-1 or UTF-16 > internally, first to UTF-8 and then back to UTF-16? We should come up with a > code path that doesn’t involve conversion back and forth to/from UTF-8. This round about process is used so that we end up with an NSString that can be converted to a UTF-8 when we finally output it.
Alexey Proskuryakov
Comment 10 2016-06-11 13:12:42 PDT
This made all result strings leak: ... dumpFramesAsText(WebFrame*) DumpRenderTree.mm:1534 | WKStringCreateWithCFString WKStringCF.mm:57 | API::String::create(WTF::String const&) APIString.h:51 | API::String::create(WTF::String&&) APIString.h:46 | API::Object::newObject(unsigned long, API::Object::Type) APIObject.mm:275 | NSAllocateObject | class_createInstance | calloc | malloc_zone_calloc
Alexey Proskuryakov
Comment 11 2016-06-11 13:18:52 PDT
Re-opening to attach a fix.
Alexey Proskuryakov
Comment 12 2016-06-11 13:19:10 PDT
Created attachment 281110 [details] follow-up fix for EWS
Alexey Proskuryakov
Comment 13 2016-06-11 16:07:29 PDT
Created attachment 281113 [details] Follow-up fix for EWS
Alexey Proskuryakov
Comment 14 2016-06-11 16:43:26 PDT
Landed the follow-up in r201981.
Simon Fraser (smfr)
Comment 15 2017-02-10 15:37:09 PST
This causes us to call into WebKit2 from DumpRenderTree on the WebThread on iOS, which is bad: https://bugs.webkit.org/show_bug.cgi?id=168149
Joseph Pecoraro
Comment 16 2017-04-11 01:17:22 PDT
There are a bunch of commented out tests that point to this bug: LayoutTests/js/script-tests/regexp-unicode.js // Make sure we properly handle dangling surrogates and combined surrogates // FIXME: These tests are disabled until https://bugs.webkit.org/show_bug.cgi?id=154863 is fixed ... Since this bug is resolved, can those tests be enabled?
Note You need to log in before you can comment on or make changes to this bug.