RESOLVED FIXED 49581
Cleanup UTF8.cpp
https://bugs.webkit.org/show_bug.cgi?id=49581
Summary Cleanup UTF8.cpp
Patrick R. Gansterer
Reported 2010-11-15 23:45:20 PST
see patch
Attachments
Patch (7.21 KB, patch)
2010-11-15 23:54 PST, Patrick R. Gansterer
darin: review-
darin: commit-queue-
test case for unicode plane 1 characters (1.94 KB, patch)
2010-11-19 06:38 PST, Patrick R. Gansterer
no flags
Patch (7.68 KB, patch)
2010-11-20 03:06 PST, Patrick R. Gansterer
no flags
Patrick R. Gansterer
Comment 1 2010-11-15 23:54:52 PST
WebKit Review Bot
Comment 2 2010-11-15 23:58:18 PST
Attachment 73969 [details] did not pass style-queue: Failed to run "['WebKitTools/Scripts/check-webkit-style', '--diff-files', u'JavaScriptCore/ChangeLog', u'JavaScriptCore/wtf/unicode/UTF8.cpp', u'JavaScriptCore/wtf/unicode/UnicodeMacrosFromICU.h']" exit_code: 1 JavaScriptCore/wtf/unicode/UTF8.cpp:242: A case label should not be indented, but line up with its switch statement. [whitespace/indent] [4] JavaScriptCore/wtf/unicode/UTF8.cpp:242: More than one command on the same line [whitespace/newline] [4] JavaScriptCore/wtf/unicode/UTF8.cpp:243: More than one command on the same line [whitespace/newline] [4] JavaScriptCore/wtf/unicode/UTF8.cpp:244: More than one command on the same line [whitespace/newline] [4] JavaScriptCore/wtf/unicode/UTF8.cpp:245: More than one command on the same line [whitespace/newline] [4] JavaScriptCore/wtf/unicode/UTF8.cpp:246: More than one command on the same line [whitespace/newline] [4] JavaScriptCore/wtf/unicode/UnicodeMacrosFromICU.h:41: Missing spaces around <= [whitespace/operators] [3] Total errors found: 7 in 3 files If any of these errors are false positives, please file a bug against check-webkit-style.
Darin Adler
Comment 3 2010-11-16 10:33:05 PST
Comment on attachment 73969 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=73969&action=review Thanks for making these changes. I think these are an improvement! > JavaScriptCore/wtf/unicode/UTF8.cpp:290 > + *target++ = static_cast<UChar>(character); // normal case Is this typecast really needed? Why? Doesn’t narrowing happen automatically without explicit typecasting. > JavaScriptCore/wtf/unicode/UTF8.cpp:300 > + character -= 0x0010000UL; > + *target++ = U16_LEAD(character); > + *target++ = U16_TRAIL(character); This is wrong. The U16_LEAD macro works on an actual character, not a character that has had subtraction done. So this change will break things! We need to make sure we have testing coverage. > JavaScriptCore/wtf/unicode/UTF8.cpp:307 > + *target++ = 0xFFFD; We should consider moving CharacterNames.h from WebCore into WTF. Then we could use the name replacementCharacter here instead of the constant 0xFFFD.
Patrick R. Gansterer
Comment 4 2010-11-16 13:43:06 PST
(In reply to comment #3) > Thanks for making these changes. I think these are an improvement! I hope so ;-) > > JavaScriptCore/wtf/unicode/UTF8.cpp:300 > > + character -= 0x0010000UL; > > + *target++ = U16_LEAD(character); > > + *target++ = U16_TRAIL(character); > > This is wrong. The U16_LEAD macro works on an actual character, not a character that has had subtraction done. So this change will break things! We need to make sure we have testing coverage. Uups, missed removal of this line. I had some failures in the LayoutTests, but they seam to be unrelated to UTF8 changes. :-/ I'm not 100% familar with all LayouTests. Can you point me to a good example to test this? There are some xml test with invalid unicode sequenzes, but I'm not sure if that's the best way to to this too. > > JavaScriptCore/wtf/unicode/UTF8.cpp:307 > > + *target++ = 0xFFFD; > > We should consider moving CharacterNames.h from WebCore into WTF. Then we could use the name replacementCharacter here instead of the constant 0xFFFD. Created bug 49618 to move CharacterNames.h into WTF.
Darin Adler
Comment 5 2010-11-16 18:08:32 PST
(In reply to comment #4) > I'm not 100% familar with all LayoutTests. Nobody is. There are too many for any one person to know them all. > Can you point me to a good example to test this? I’m not sure there is already one. First you’d have to look at where this function is used, and then figure out a way to make a test that will pass in a UTF-8 sequence for a character that is non-BMP to that code path. I’d be happy to help, but I can’t do it right now.
Darin Adler
Comment 6 2010-11-17 13:57:33 PST
(In reply to comment #5) > First you’d have to look at where this function is used As you know from your AtomicString work, one place this is used is in the XML parser. Knowing that, I think we can easily produce a test using DOMParser.
Darin Adler
Comment 7 2010-11-17 14:00:25 PST
(In reply to comment #6) > As you know from your AtomicString work, one place this is used is in the XML parser. Knowing that, I think we can easily produce a test using DOMParser. That won’t be completely great because in the case of DOMParser, the input is UTF-16, which is then converted to UTF-8, and then back to UTF-16. But it can test if the round trip works, and would have caught the 0x10000 bug.
Darin Adler
Comment 8 2010-11-17 14:01:18 PST
Just adding more characters to fast/encoding/xml-utf-8-default.xml might work.
Patrick R. Gansterer
Comment 9 2010-11-17 14:11:10 PST
(In reply to comment #8) > Just adding more characters to fast/encoding/xml-utf-8-default.xml might work. Ok, thx! I'll look for some characters in the different unicode ranges add them and post a patch.
Patrick R. Gansterer
Comment 10 2010-11-19 06:38:00 PST
Created attachment 74381 [details] test case for unicode plane 1 characters
Darin Adler
Comment 11 2010-11-19 09:57:09 PST
You should check that this test case will catch that bug in convertUTF8ToUTF16, by leaving the bug in and running the test case.
Patrick R. Gansterer
Comment 12 2010-11-19 10:03:39 PST
(In reply to comment #11) > You should check that this test case will catch that bug in convertUTF8ToUTF16, by leaving the bug in and running the test case. Argh, missed to add the comment :-/. I already tried it with the new test case, but it didn't catch an error. Then I inserted a CRASH in the convertUTF8ToUTF16 function, but it didn't crashed. I didn't had the time to look deeper into this issue, but this additonal testcase won't hurt anyway. Maybe you can review bug 49618 (CharacterNames.h) in the meantime.
WebKit Commit Bot
Comment 13 2010-11-19 22:31:40 PST
Comment on attachment 74381 [details] test case for unicode plane 1 characters Rejecting patch 74381 from commit-queue. Failed to run "['./WebKitTools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '--bot-id=eseidel-cq-sl', 'apply-attachment', '--force-clean', '--non-interactive', 74381]" exit_code: 1 Last 500 characters of output: in _process_patch self._main_sequence.run_and_handle_errors(tool, options, state) File "/Projects/CommitQueue/WebKitTools/Scripts/webkitpy/tool/commands/stepsequence.py", line 79, in run_and_handle_errors log(e.message_with_output()) File "/Projects/CommitQueue/WebKitTools/Scripts/webkitpy/common/system/deprecated_logging.py", line 39, in log print >> sys.stderr, string UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in position 292: ordinal not in range(128) Full output: http://queues.webkit.org/results/6155071
Patrick R. Gansterer
Comment 14 2010-11-20 01:26:14 PST
(In reply to comment #12) > (In reply to comment #11) > > You should check that this test case will catch that bug in convertUTF8ToUTF16, by leaving the bug in and running the test case. > I already tried it with the new test case, but it didn't catch an error. Now I found the problem: I missed one toString in bug 45488. With the patch from that bug the test fails as expected.
Patrick R. Gansterer
Comment 15 2010-11-20 03:06:04 PST
WebKit Review Bot
Comment 16 2010-11-20 03:08:35 PST
Attachment 74476 [details] did not pass style-queue: Failed to run "['WebKitTools/Scripts/check-webkit-style', '--diff-files', u'JavaScriptCore/ChangeLog', u'JavaScriptCore/wtf/unicode/UTF8.cpp', u'JavaScriptCore/wtf/unicode/UnicodeMacrosFromICU.h']" exit_code: 1 JavaScriptCore/wtf/unicode/UTF8.cpp:245: A case label should not be indented, but line up with its switch statement. [whitespace/indent] [4] JavaScriptCore/wtf/unicode/UTF8.cpp:245: More than one command on the same line [whitespace/newline] [4] JavaScriptCore/wtf/unicode/UTF8.cpp:246: More than one command on the same line [whitespace/newline] [4] JavaScriptCore/wtf/unicode/UTF8.cpp:247: More than one command on the same line [whitespace/newline] [4] JavaScriptCore/wtf/unicode/UTF8.cpp:248: More than one command on the same line [whitespace/newline] [4] JavaScriptCore/wtf/unicode/UTF8.cpp:249: More than one command on the same line [whitespace/newline] [4] JavaScriptCore/wtf/unicode/UnicodeMacrosFromICU.h:41: Missing spaces around <= [whitespace/operators] [3] Total errors found: 7 in 3 files If any of these errors are false positives, please file a bug against check-webkit-style.
Patrick R. Gansterer
Comment 17 2010-11-21 05:11:39 PST
Patrick R. Gansterer
Comment 18 2010-11-21 05:14:35 PST
webkit-patch land closed bug, so reopen it
Patrick R. Gansterer
Comment 19 2010-11-30 01:52:45 PST
@darin: ping
Darin Adler
Comment 20 2010-11-30 10:37:39 PST
Comment on attachment 74476 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=74476&action=review > JavaScriptCore/wtf/unicode/UTF8.cpp:35 > +const UChar replacementCharacter = 0xFFFD; Since this is inside a .cpp file we should use static to give it internal linkage.
WebKit Commit Bot
Comment 21 2010-11-30 11:54:57 PST
The commit-queue encountered the following flaky tests while processing attachment 74476 [details]: compositing/iframes/overlapped-nested-iframes.html inspector/styles-disable-then-delete.html Please file bugs against the tests. These tests were authored by pfeldman@chromium.org and simon.fraser@apple.com. The commit-queue is continuing to process your patch.
WebKit Commit Bot
Comment 22 2010-11-30 12:38:16 PST
Comment on attachment 74476 [details] Patch Rejecting patch 74476 from commit-queue. Failed to run "['./WebKitTools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '--bot-id=eseidel-cq-sl', 'build', '--no-clean', '--no-update', '--build-style=both']" exit_code: 2 Last 500 characters of output: s/CommitQueue/WebKitBuild/WebCore.build/Debug/WebCore.build/Objects-normal/x86_64/JSWebKitAnimationEvent.o /Projects/CommitQueue/WebKitBuild/Debug/DerivedSources/WebCore/JSWebKitAnimationEvent.cpp normal x86_64 c++ com.apple.compilers.gcc.4_2 CompileC /Projects/CommitQueue/WebKitBuild/WebCore.build/Debug/WebCore.build/Objects-normal/x86_64/JSWebGLShader.o /Projects/CommitQueue/WebKitBuild/Debug/DerivedSources/WebCore/JSWebGLShader.cpp normal x86_64 c++ com.apple.compilers.gcc.4_2 (5 failures) Full output: http://queues.webkit.org/results/6343115
Darin Adler
Comment 23 2010-11-30 13:09:06 PST
Comment on attachment 74476 [details] Patch Try again.
Eric Seidel (no email)
Comment 24 2010-11-30 13:14:56 PST
One of those flakes was bug 50255.
WebKit Commit Bot
Comment 25 2010-11-30 16:13:56 PST
The commit-queue encountered the following flaky tests while processing attachment 74476 [details]: animations/play-state-suspend.html http/tests/websocket/tests/multiple-connections.html Please file bugs against the tests. These tests were authored by abarth@webkit.org and cmarrin@apple.com. The commit-queue is continuing to process your patch.
WebKit Commit Bot
Comment 26 2010-11-30 16:40:33 PST
Comment on attachment 74476 [details] Patch Clearing flags on attachment: 74476 Committed r72979: <http://trac.webkit.org/changeset/72979>
WebKit Commit Bot
Comment 27 2010-11-30 16:40:40 PST
All reviewed patches have been landed. Closing bug.
Note You need to log in before you can comment on or make changes to this bug.