RESOLVED FIXED 195535
WebKit has too much of its own UTF-8 code and should rely more on ICU's UTF-8 support
https://bugs.webkit.org/show_bug.cgi?id=195535
Summary WebKit has too much of its own UTF-8 code and should rely more on ICU's UTF-8...
Darin Adler
Reported 2019-03-10 14:29:56 PDT
WebKit has too much of its own UTF-8 code and should rely more on ICU's UTF-8 support
Attachments
Patch (47.62 KB, patch)
2019-03-10 14:57 PDT, Darin Adler
no flags
Patch (47.89 KB, patch)
2019-03-10 15:15 PDT, Darin Adler
no flags
Archive of layout-test-results from ews100 for mac-highsierra (663.30 KB, application/zip)
2019-03-10 15:58 PDT, EWS Watchlist
no flags
Archive of layout-test-results from ews106 for mac-highsierra-wk2 (769.16 KB, application/zip)
2019-03-10 16:04 PDT, EWS Watchlist
no flags
Archive of layout-test-results from ews115 for mac-highsierra (574.30 KB, application/zip)
2019-03-10 16:12 PDT, EWS Watchlist
no flags
Patch (50.41 KB, patch)
2019-04-21 12:53 PDT, Darin Adler
no flags
Archive of layout-test-results from ews123 for ios-simulator-wk2 (2.91 MB, application/zip)
2019-04-21 15:03 PDT, EWS Watchlist
no flags
Patch (65.11 KB, patch)
2019-04-21 17:12 PDT, Darin Adler
no flags
Patch (67.00 KB, patch)
2019-04-28 09:12 PDT, Darin Adler
no flags
Patch (69.53 KB, patch)
2019-04-29 19:03 PDT, Darin Adler
ap: review+
Darin Adler
Comment 1 2019-03-10 14:57:18 PDT Comment hidden (obsolete)
Darin Adler
Comment 2 2019-03-10 15:15:33 PDT Comment hidden (obsolete)
EWS Watchlist
Comment 3 2019-03-10 15:58:46 PDT Comment hidden (obsolete)
EWS Watchlist
Comment 4 2019-03-10 15:58:48 PDT Comment hidden (obsolete)
EWS Watchlist
Comment 5 2019-03-10 16:04:08 PDT Comment hidden (obsolete)
EWS Watchlist
Comment 6 2019-03-10 16:04:10 PDT Comment hidden (obsolete)
EWS Watchlist
Comment 7 2019-03-10 16:12:27 PDT Comment hidden (obsolete)
EWS Watchlist
Comment 8 2019-03-10 16:12:29 PDT Comment hidden (obsolete)
Darin Adler
Comment 9 2019-03-10 16:14:06 PDT
Trying to figure out what kind of failure/crash this is.
EWS Watchlist
Comment 10 2019-03-10 17:53:22 PDT Comment hidden (obsolete)
Darin Adler
Comment 11 2019-04-21 12:53:11 PDT Comment hidden (obsolete)
Darin Adler
Comment 12 2019-04-21 12:54:43 PDT
I figured it out: previous patch broke AtomicString::fromUTF8 when passed a null-terminated string, and that’s fixed now.
EWS Watchlist
Comment 13 2019-04-21 15:03:57 PDT Comment hidden (obsolete)
EWS Watchlist
Comment 14 2019-04-21 15:03:59 PDT Comment hidden (obsolete)
jsc-armv7 EWS
Comment 15 2019-04-21 15:17:57 PDT Comment hidden (obsolete)
EWS Watchlist
Comment 16 2019-04-21 15:20:24 PDT Comment hidden (obsolete)
Darin Adler
Comment 17 2019-04-21 17:12:12 PDT Comment hidden (obsolete)
jsc-armv7 EWS
Comment 18 2019-04-21 19:36:28 PDT Comment hidden (obsolete)
EWS Watchlist
Comment 19 2019-04-21 19:40:32 PDT Comment hidden (obsolete)
Darin Adler
Comment 20 2019-04-22 08:34:13 PDT
Not sure how to run those tests, but I’ll look into why they are failing next time I have a chance to work on this.
Alexey Proskuryakov
Comment 21 2019-04-22 09:18:04 PDT
This may not be the optimal way to run just a subset of tests, but at the high level, the command is "run-javascriptcore-tests".
Saam Barati
Comment 22 2019-04-22 11:26:11 PDT
(In reply to Alexey Proskuryakov from comment #21) > This may not be the optimal way to run just a subset of tests, but at the > high level, the command is "run-javascriptcore-tests". You can also run only the layout JSC tests by: $ Tools/Scripts/run-jsc-stress-tests (--release or --debug) LayoutTests/jsc-layout-tests.yaml You can run an individual test using: $ ./Tools/Scripts/run-jsc (--release or --debug) LayoutTests/resources/standalone-pre.js <test> LayoutTests/resources/standalone-post.js e.g: $ ./Tools/Scripts/run-jsc (--release or --debug) LayoutTests/resources/standalone-pre.js LayoutTests/js/script-tests/invalid-utf8-in-syntax-error.js LayoutTests/resources/standalone-post.js
Yusuke Suzuki
Comment 23 2019-04-22 11:32:59 PDT
You can also pass `--filter="text"` option to filter out tests :)
Darin Adler
Comment 24 2019-04-28 09:12:31 PDT Comment hidden (obsolete)
Darin Adler
Comment 25 2019-04-28 09:14:15 PDT
Comments in ChangeLog explain why that one test was failing. Not a real problem, just a test that needed to be rebased since the test *output* now includes a U+FFFD rather than an unpaired UTF-16 surrogate. The test behavior is identical, and correct, in both cases.
Alexey Proskuryakov
Comment 26 2019-04-29 10:46:09 PDT
EWS points out that Windows build is broken: C:\cygwin\home\buildbot\WebKit\Source\WTF\wtf\unicode\UTF8Conversion.cpp(45): error C2220: warning treated as error - no 'object' file generated [C:\cygwin\home\buildbot\WebKit\WebKitBuild\Release\Source\WTF\wtf\WTF.vcxproj] C:\cygwin\home\buildbot\WebKit\Source\WTF\wtf\unicode\UTF8Conversion.cpp(45): warning C4333: '>>': right shift by too large amount, data loss [C:\cygwin\home\buildbot\WebKit\WebKitBuild\Release\Source\WTF\wtf\WTF.vcxproj]
Darin Adler
Comment 27 2019-04-29 13:05:17 PDT
Yes, I have a patch locally on my machine at home that works around that bug in either the Windows compiler's warning logic or in the Windows version of the ICU headers. Planning to upload it for review. Didn’t get a chance to upload it last night.
Darin Adler
Comment 28 2019-04-29 19:03:07 PDT
Darin Adler
Comment 29 2019-04-30 08:22:54 PDT
All EWS tests passing now. Finally ready for someone to review.
Alexey Proskuryakov
Comment 30 2019-04-30 09:30:45 PDT
Comment on attachment 368529 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=368529&action=review Large patch, somewhat superficial review. Overall comment: I don't think that hiding the types in "auto result = convertUTF16ToUTF8" and "auto success = convertLatin1ToUTF8" is an improvement, especially as the types are different, and the plan is to change them further. This took me an outstanding amount of effort to follow, with all the subtle differences with regards to where "target exhausted" is a failure, and where it is a success. > Source/WTF/wtf/text/WTFString.cpp:862 > + if (!convertUTF8ToUTF16(stringCurrent, reinterpret_cast<const char *>(stringStart + length), &bufferCurrent, bufferCurrent + buffer.size())) Pre-existing issue, misplaced star. > Source/WTF/wtf/unicode/UTF8Conversion.cpp:2 > - * Copyright (C) 2007, 2014 Apple Inc. All rights reserved. > + * Copyright (C) 2007-2019 Apple Inc. All rights reserved. Have we actually made non-trivial changes every year? > Source/WebCore/platform/SharedBuffer.cpp:340 > - result = WTF::Unicode::convertLatin1ToUTF8(&d, d + length, &p, p + buffer.size()); > + if (!WTF::Unicode::convertLatin1ToUTF8(&d, d + length, &p, p + buffer.size())) Typically, we export symbols in WTF into the global namespace, and don't use the WTF prefix at call sites. Is there a good reason for WTF::Unicode to be special? > Source/WebKit/Shared/API/APIString.h:26 > +#pragma once Do we now fully rely on pragma once even in cross-platform API headers? > LayoutTests/ChangeLog:14 > + * css3/escape-dom-api-expected.txt: > + * fast/text/dangling-surrogates-expected.txt: > + * js/dom/webidl-type-mapping-expected.txt: > + * js/invalid-utf8-in-syntax-error-expected.txt: > + Updated expected results to have the Unicode replacement character in cases where the > + text contains unpaired surrogates. The tests are still doing the same operations, and > + still getting the same results, but the text output no longer includes illegal UTF-8. It would be useful to mention whether the new behavior matches other browser engines. > LayoutTests/js/invalid-utf8-in-syntax-error.html:1 > +<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"> This should use HTML5 doctype. How did you create the wrapper? I just verified that "make-new-script-test js/invalid-utf8-in-syntax-error.html" creates a modern wrapper, also without pre/post scripts.
Darin Adler
Comment 31 2019-04-30 11:32:49 PDT
Comment on attachment 368529 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=368529&action=review Thank you for the review. > Source/WTF/wtf/text/StringImpl.cpp:1857 > + auto success = convertLatin1ToUTF8(&characters, characters + length, &buffer, buffer + bufferVector.size()); This should be bool success, not auto success! >> Source/WTF/wtf/text/WTFString.cpp:862 >> + if (!convertUTF8ToUTF16(stringCurrent, reinterpret_cast<const char *>(stringStart + length), &bufferCurrent, bufferCurrent + buffer.size())) > > Pre-existing issue, misplaced star. I’ll fix that. >> Source/WTF/wtf/unicode/UTF8Conversion.cpp:2 >> + * Copyright (C) 2007-2019 Apple Inc. All rights reserved. > > Have we actually made non-trivial changes every year? I’ll check and update it correctly. >> Source/WebCore/platform/SharedBuffer.cpp:340 >> + if (!WTF::Unicode::convertLatin1ToUTF8(&d, d + length, &p, p + buffer.size())) > > Typically, we export symbols in WTF into the global namespace, and don't use the WTF prefix at call sites. Is there a good reason for WTF::Unicode to be special? Good point. Worth reconsidering later. I was not changing anything here at this time, but I am willing to look into this. Long term, maybe we should stop doing the "export WTF symbols into the global namespace" entirely. It’s less important in an era where we don’t write out whole identifiers so often. But my choice to follow the pattern of the nearby code here was not intentionally to move us in any direction. >> Source/WebKit/Shared/API/APIString.h:26 >> +#pragma once > > Do we now fully rely on pragma once even in cross-platform API headers? As far as I know, this is not a cross-platform API header. This is a header for a class used in the back end of the C API, not itself part of an API. The API directory is confusingly named since many of the headers in it are to *implement* the API, not part of the API. The entire WebKit API namespace is not itself part of API. >> LayoutTests/ChangeLog:14 >> + still getting the same results, but the text output no longer includes illegal UTF-8. > > It would be useful to mention whether the new behavior matches other browser engines. As I understand it, this is a change in the behavior of our test harnesses, to the text dumping done by DumpRenderTree and WebKitTestRunner; no change to behavior of the browser engine. >> LayoutTests/js/invalid-utf8-in-syntax-error.html:1 >> +<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"> > > This should use HTML5 doctype. > > How did you create the wrapper? I just verified that "make-new-script-test js/invalid-utf8-in-syntax-error.html" creates a modern wrapper, also without pre/post scripts. I’ll use make-new-script-test to regenerate this file. I copied another wrapper from another file in this directory and edited the filename; did not remember there was a make-new-script-test script.
Darin Adler
Comment 32 2019-05-01 08:52:22 PDT
Radar WebKit Bug Importer
Comment 33 2019-05-01 08:54:11 PDT
Shawn Roberts
Comment 34 2019-05-01 10:02:39 PDT
Seeing 4 Test262 JSC failures on Release and Debug after this change. FAIL test/built-ins/RegExp/prototype/Symbol.match/builtin-infer-unicode.js (default) Full Output: Exception: Test262Error: Expected SameValue(«�», «null») to be true FAIL test/built-ins/RegExp/prototype/Symbol.match/builtin-infer-unicode.js (strict mode) Full Output: Exception: Test262Error: Expected SameValue(«�», «null») to be true FAIL test/built-ins/RegExp/prototype/exec/u-lastindex-adv.js (default) Full Output: Exception: Test262Error: Expected SameValue(«�», «null») to be true FAIL test/built-ins/RegExp/prototype/exec/u-lastindex-adv.js (strict mode) Full Output: Exception: Test262Error: Expected SameValue(«�», «null») to be true https://build.webkit.org/builders/Apple%20High%20Sierra%20Release%20Test262%20%28Tests%29/builds/2784/steps/test262-test/logs/stdio https://build.webkit.org/builders/Apple%20High%20Sierra%20Debug%20Test262%20%28Tests%29/builds/2108/steps/test262-test/logs/stdio
Ross Kirsling
Comment 35 2019-05-01 10:11:23 PDT
(In reply to Shawn Roberts from comment #34) > Seeing 4 Test262 JSC failures on Release and Debug after this change. Those are already-failing tests so we just need to update test262/expectations.yaml.
Shawn Roberts
Comment 36 2019-05-01 10:14:35 PDT
Reverted r244821 for reason: Causing Committed r244827: <https://trac.webkit.org/changeset/244827>
Ryan Haddad
Comment 37 2019-05-01 10:22:28 PDT
(In reply to Ross Kirsling from comment #35) > (In reply to Shawn Roberts from comment #34) > > Seeing 4 Test262 JSC failures on Release and Debug after this change. > > Those are already-failing tests so we just need to update > test262/expectations.yaml. Hm, the test run before this revision didn't report the tests as failing. The bottom of the stdio shows: "0 tests newly fail" I do see the failures when I search the test names, though. https://build.webkit.org/builders/Apple%20High%20Sierra%20Debug%20Test262%20%28Tests%29/builds/2107/steps/test262-test/logs/stdio
Ryan Haddad
Comment 38 2019-05-01 10:28:44 PDT
(In reply to Ryan Haddad from comment #37) > (In reply to Ross Kirsling from comment #35) > > (In reply to Shawn Roberts from comment #34) > > > Seeing 4 Test262 JSC failures on Release and Debug after this change. > > > > Those are already-failing tests so we just need to update > > test262/expectations.yaml. Oh, I see what you are saying. The test was already failing but the output changed from: Exception: Test262Error: Expected SameValue(«���», «null») to be true to Exception: Test262Error: Expected SameValue(«�», «null») to be true
Ryan Haddad
Comment 39 2019-05-01 10:33:55 PDT
I re-landed r244821 and r244822 in https://trac.webkit.org/changeset/244828/webkit. Will address the test262 failures separately.
Ross Kirsling
Comment 40 2019-05-01 10:49:35 PDT
(In reply to Ryan Haddad from comment #39) > I re-landed r244821 and r244822 in > https://trac.webkit.org/changeset/244828/webkit. > > Will address the test262 failures separately. Committed r244833: <https://trac.webkit.org/changeset/244833>
Darin Adler
Comment 41 2019-05-01 13:02:56 PDT
I see now. The test262 failures are analogous to the layout tests, where we had to rebase because the test harness now writes out strings differently (more correctly) but the actual test results are unchanged. So the expected results need to be rebased.
Alex Christensen
Comment 42 2020-08-19 13:12:02 PDT
Comment on attachment 368529 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=368529&action=review > Source/WTF/wtf/unicode/UTF8Conversion.cpp:99 > + U8_NEXT(reinterpret_cast<const uint8_t*>(source), sourceOffset, sourceEnd - source, character); It appears we would more closely match the behavior of Chrome and Firefox if we used U8_NEXT_OR_FFFD here instead of U8_NEXT. I'm planning to propose that change in an upcoming patch.
Darin Adler
Comment 43 2020-08-19 13:20:52 PDT
Comment on attachment 368529 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=368529&action=review >> Source/WTF/wtf/unicode/UTF8Conversion.cpp:99 >> + U8_NEXT(reinterpret_cast<const uint8_t*>(source), sourceOffset, sourceEnd - source, character); > > It appears we would more closely match the behavior of Chrome and Firefox if we used U8_NEXT_OR_FFFD here instead of U8_NEXT. I'm planning to propose that change in an upcoming patch. Given that this is a widely used function, I am not sure that all call sites need the same behavior.
Alex Christensen
Comment 44 2020-08-19 17:15:05 PDT
Note You need to log in before you can comment on or make changes to this bug.