Bug 49581 - Cleanup UTF8.cpp
Summary: Cleanup UTF8.cpp
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Web Template Framework (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Patrick R. Gansterer
URL:
Keywords:
Depends on: 45488 49618
Blocks: 45594
  Show dependency treegraph
 
Reported: 2010-11-15 23:45 PST by Patrick R. Gansterer
Modified: 2010-11-30 16:40 PST (History)
4 users (show)

See Also:


Attachments
Patch (7.21 KB, patch)
2010-11-15 23:54 PST, Patrick R. Gansterer
darin: review-
darin: commit-queue-
Details | Formatted Diff | Diff
test case for unicode plane 1 characters (1.94 KB, patch)
2010-11-19 06:38 PST, Patrick R. Gansterer
no flags Details | Formatted Diff | Diff
Patch (7.68 KB, patch)
2010-11-20 03:06 PST, Patrick R. Gansterer
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Patrick R. Gansterer 2010-11-15 23:45:20 PST
see patch
Comment 1 Patrick R. Gansterer 2010-11-15 23:54:52 PST
Created attachment 73969 [details]
Patch
Comment 2 WebKit Review Bot 2010-11-15 23:58:18 PST
Attachment 73969 [details] did not pass style-queue:

Failed to run "['WebKitTools/Scripts/check-webkit-style', '--diff-files', u'JavaScriptCore/ChangeLog', u'JavaScriptCore/wtf/unicode/UTF8.cpp', u'JavaScriptCore/wtf/unicode/UnicodeMacrosFromICU.h']" exit_code: 1
JavaScriptCore/wtf/unicode/UTF8.cpp:242:  A case label should not be indented, but line up with its switch statement.  [whitespace/indent] [4]
JavaScriptCore/wtf/unicode/UTF8.cpp:242:  More than one command on the same line  [whitespace/newline] [4]
JavaScriptCore/wtf/unicode/UTF8.cpp:243:  More than one command on the same line  [whitespace/newline] [4]
JavaScriptCore/wtf/unicode/UTF8.cpp:244:  More than one command on the same line  [whitespace/newline] [4]
JavaScriptCore/wtf/unicode/UTF8.cpp:245:  More than one command on the same line  [whitespace/newline] [4]
JavaScriptCore/wtf/unicode/UTF8.cpp:246:  More than one command on the same line  [whitespace/newline] [4]
JavaScriptCore/wtf/unicode/UnicodeMacrosFromICU.h:41:  Missing spaces around <=  [whitespace/operators] [3]
Total errors found: 7 in 3 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 3 Darin Adler 2010-11-16 10:33:05 PST
Comment on attachment 73969 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=73969&action=review

Thanks for making these changes. I think these are an improvement!

> JavaScriptCore/wtf/unicode/UTF8.cpp:290
> +                *target++ = static_cast<UChar>(character); // normal case

Is this typecast really needed? Why? Doesn’t narrowing happen automatically without explicit typecasting.

> JavaScriptCore/wtf/unicode/UTF8.cpp:300
> +            character -= 0x0010000UL;
> +            *target++ = U16_LEAD(character);
> +            *target++ = U16_TRAIL(character);

This is wrong. The U16_LEAD macro works on an actual character, not a character that has had subtraction done. So this change will break things! We need to make sure we have testing coverage.

> JavaScriptCore/wtf/unicode/UTF8.cpp:307
> +                *target++ = 0xFFFD;

We should consider moving CharacterNames.h from WebCore into WTF. Then we could use the name replacementCharacter here instead of the constant 0xFFFD.
Comment 4 Patrick R. Gansterer 2010-11-16 13:43:06 PST
(In reply to comment #3)
> Thanks for making these changes. I think these are an improvement!
I hope so ;-)

> > JavaScriptCore/wtf/unicode/UTF8.cpp:300
> > +            character -= 0x0010000UL;
> > +            *target++ = U16_LEAD(character);
> > +            *target++ = U16_TRAIL(character);
> 
> This is wrong. The U16_LEAD macro works on an actual character, not a character that has had subtraction done. So this change will break things! We need to make sure we have testing coverage.
Uups, missed removal of this line. I had some failures in the LayoutTests, but they seam to be unrelated to UTF8 changes. :-/
I'm not 100% familar with all LayouTests. Can you point me to a good example to test this? There are some xml test with invalid unicode sequenzes, but I'm not sure if that's the best way to to this too.

> > JavaScriptCore/wtf/unicode/UTF8.cpp:307
> > +                *target++ = 0xFFFD;
> 
> We should consider moving CharacterNames.h from WebCore into WTF. Then we could use the name replacementCharacter here instead of the constant 0xFFFD.
Created bug 49618 to move CharacterNames.h into WTF.
Comment 5 Darin Adler 2010-11-16 18:08:32 PST
(In reply to comment #4)
> I'm not 100% familar with all LayoutTests.

Nobody is. There are too many for any one person to know them all.

> Can you point me to a good example to test this?

I’m not sure there is already one. First you’d have to look at where this function is used, and then figure out a way to make a test that will pass in a UTF-8 sequence for a character that is non-BMP to that code path. I’d be happy to help, but I can’t do it right now.
Comment 6 Darin Adler 2010-11-17 13:57:33 PST
(In reply to comment #5)
> First you’d have to look at where this function is used

As you know from your AtomicString work, one place this is used is in the XML parser. Knowing that, I think we can easily produce a test using DOMParser.
Comment 7 Darin Adler 2010-11-17 14:00:25 PST
(In reply to comment #6)
> As you know from your AtomicString work, one place this is used is in the XML parser. Knowing that, I think we can easily produce a test using DOMParser.

That won’t be completely great because in the case of DOMParser, the input is UTF-16, which is then converted to UTF-8, and then back to UTF-16. But it can test if the round trip works, and would have caught the 0x10000 bug.
Comment 8 Darin Adler 2010-11-17 14:01:18 PST
Just adding more characters to fast/encoding/xml-utf-8-default.xml might work.
Comment 9 Patrick R. Gansterer 2010-11-17 14:11:10 PST
(In reply to comment #8)
> Just adding more characters to fast/encoding/xml-utf-8-default.xml might work.
Ok, thx! I'll look for some characters in the different unicode ranges add them and post a patch.
Comment 10 Patrick R. Gansterer 2010-11-19 06:38:00 PST
Created attachment 74381 [details]
test case for unicode plane 1 characters
Comment 11 Darin Adler 2010-11-19 09:57:09 PST
You should check that this test case will catch that bug in convertUTF8ToUTF16, by leaving the bug in and running the test case.
Comment 12 Patrick R. Gansterer 2010-11-19 10:03:39 PST
(In reply to comment #11)
> You should check that this test case will catch that bug in convertUTF8ToUTF16, by leaving the bug in and running the test case.
Argh, missed to add the comment :-/.
I already tried it with the new test case, but it didn't catch an error. Then I inserted a CRASH in the convertUTF8ToUTF16 function, but it didn't crashed.
I didn't had the time to look deeper into this issue, but this additonal testcase won't hurt anyway.
Maybe you can review bug 49618 (CharacterNames.h) in the meantime.
Comment 13 WebKit Commit Bot 2010-11-19 22:31:40 PST
Comment on attachment 74381 [details]
test case for unicode plane 1 characters

Rejecting patch 74381 from commit-queue.

Failed to run "['./WebKitTools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '--bot-id=eseidel-cq-sl', 'apply-attachment', '--force-clean', '--non-interactive', 74381]" exit_code: 1
Last 500 characters of output:
 in _process_patch
    self._main_sequence.run_and_handle_errors(tool, options, state)
  File "/Projects/CommitQueue/WebKitTools/Scripts/webkitpy/tool/commands/stepsequence.py", line 79, in run_and_handle_errors
    log(e.message_with_output())
  File "/Projects/CommitQueue/WebKitTools/Scripts/webkitpy/common/system/deprecated_logging.py", line 39, in log
    print >> sys.stderr, string
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in position 292: ordinal not in range(128)

Full output: http://queues.webkit.org/results/6155071
Comment 14 Patrick R. Gansterer 2010-11-20 01:26:14 PST
(In reply to comment #12)
> (In reply to comment #11)
> > You should check that this test case will catch that bug in convertUTF8ToUTF16, by leaving the bug in and running the test case.
> I already tried it with the new test case, but it didn't catch an error.
Now I found the problem: I missed one toString in bug 45488. With the patch from that bug the test fails as expected.
Comment 15 Patrick R. Gansterer 2010-11-20 03:06:04 PST
Created attachment 74476 [details]
Patch
Comment 16 WebKit Review Bot 2010-11-20 03:08:35 PST
Attachment 74476 [details] did not pass style-queue:

Failed to run "['WebKitTools/Scripts/check-webkit-style', '--diff-files', u'JavaScriptCore/ChangeLog', u'JavaScriptCore/wtf/unicode/UTF8.cpp', u'JavaScriptCore/wtf/unicode/UnicodeMacrosFromICU.h']" exit_code: 1
JavaScriptCore/wtf/unicode/UTF8.cpp:245:  A case label should not be indented, but line up with its switch statement.  [whitespace/indent] [4]
JavaScriptCore/wtf/unicode/UTF8.cpp:245:  More than one command on the same line  [whitespace/newline] [4]
JavaScriptCore/wtf/unicode/UTF8.cpp:246:  More than one command on the same line  [whitespace/newline] [4]
JavaScriptCore/wtf/unicode/UTF8.cpp:247:  More than one command on the same line  [whitespace/newline] [4]
JavaScriptCore/wtf/unicode/UTF8.cpp:248:  More than one command on the same line  [whitespace/newline] [4]
JavaScriptCore/wtf/unicode/UTF8.cpp:249:  More than one command on the same line  [whitespace/newline] [4]
JavaScriptCore/wtf/unicode/UnicodeMacrosFromICU.h:41:  Missing spaces around <=  [whitespace/operators] [3]
Total errors found: 7 in 3 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 17 Patrick R. Gansterer 2010-11-21 05:11:39 PST
Committed r72497: <http://trac.webkit.org/changeset/72497>
Comment 18 Patrick R. Gansterer 2010-11-21 05:14:35 PST
webkit-patch land closed bug, so reopen it
Comment 19 Patrick R. Gansterer 2010-11-30 01:52:45 PST
@darin: ping
Comment 20 Darin Adler 2010-11-30 10:37:39 PST
Comment on attachment 74476 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=74476&action=review

> JavaScriptCore/wtf/unicode/UTF8.cpp:35
> +const UChar replacementCharacter = 0xFFFD;

Since this is inside a .cpp file we should use static to give it internal linkage.
Comment 21 WebKit Commit Bot 2010-11-30 11:54:57 PST
The commit-queue encountered the following flaky tests while processing attachment 74476 [details]:

compositing/iframes/overlapped-nested-iframes.html
inspector/styles-disable-then-delete.html

Please file bugs against the tests.  These tests were authored by pfeldman@chromium.org and simon.fraser@apple.com.  The commit-queue is continuing to process your patch.
Comment 22 WebKit Commit Bot 2010-11-30 12:38:16 PST
Comment on attachment 74476 [details]
Patch

Rejecting patch 74476 from commit-queue.

Failed to run "['./WebKitTools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '--bot-id=eseidel-cq-sl', 'build', '--no-clean', '--no-update', '--build-style=both']" exit_code: 2
Last 500 characters of output:
s/CommitQueue/WebKitBuild/WebCore.build/Debug/WebCore.build/Objects-normal/x86_64/JSWebKitAnimationEvent.o /Projects/CommitQueue/WebKitBuild/Debug/DerivedSources/WebCore/JSWebKitAnimationEvent.cpp normal x86_64 c++ com.apple.compilers.gcc.4_2
	CompileC /Projects/CommitQueue/WebKitBuild/WebCore.build/Debug/WebCore.build/Objects-normal/x86_64/JSWebGLShader.o /Projects/CommitQueue/WebKitBuild/Debug/DerivedSources/WebCore/JSWebGLShader.cpp normal x86_64 c++ com.apple.compilers.gcc.4_2
(5 failures)


Full output: http://queues.webkit.org/results/6343115
Comment 23 Darin Adler 2010-11-30 13:09:06 PST
Comment on attachment 74476 [details]
Patch

Try again.
Comment 24 Eric Seidel (no email) 2010-11-30 13:14:56 PST
One of those flakes was bug 50255.
Comment 25 WebKit Commit Bot 2010-11-30 16:13:56 PST
The commit-queue encountered the following flaky tests while processing attachment 74476 [details]:

animations/play-state-suspend.html
http/tests/websocket/tests/multiple-connections.html

Please file bugs against the tests.  These tests were authored by abarth@webkit.org and cmarrin@apple.com.  The commit-queue is continuing to process your patch.
Comment 26 WebKit Commit Bot 2010-11-30 16:40:33 PST
Comment on attachment 74476 [details]
Patch

Clearing flags on attachment: 74476

Committed r72979: <http://trac.webkit.org/changeset/72979>
Comment 27 WebKit Commit Bot 2010-11-30 16:40:40 PST
All reviewed patches have been landed.  Closing bug.