Bug 162537 - Using char16_t instead of wchar_t in TestWebKitAPI URLParser tests
Summary: Using char16_t instead of wchar_t in TestWebKitAPI URLParser tests
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Yusuke Suzuki
URL:
Keywords:
: 161806 (view as bug list)
Depends on:
Blocks:
 
Reported: 2016-09-25 00:21 PDT by Yusuke Suzuki
Modified: 2016-09-27 06:04 PDT (History)
3 users (show)

See Also:


Attachments
Patch (11.15 KB, patch)
2016-09-25 00:27 PDT, Yusuke Suzuki
no flags Details | Formatted Diff | Diff
Patch (11.47 KB, patch)
2016-09-26 00:34 PDT, Yusuke Suzuki
achristensen: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Yusuke Suzuki 2016-09-25 00:21:53 PDT
Using char16_t instead of wchar_t in TestWebKitAPI URLParser tests
Comment 1 Yusuke Suzuki 2016-09-25 00:25:36 PDT
Some build bot (Linux ARM GTK port) fails since we cannot select the specific StringBuilder::append method for wchar_t.
wchar_t implementation highly depends on platforms. sizeof(wchar_t) == 2 in VC++, while sizeof(wchar_t) == 4 in GCC.

Instead of using wchar_t, we use char16_t and C++11 UTF-16 literal in test cases.
Comment 2 Yusuke Suzuki 2016-09-25 00:27:01 PDT
Created attachment 289774 [details]
Patch
Comment 3 Yusuke Suzuki 2016-09-25 00:27:21 PDT
https://build.webkit.org/builders/GTK%20Linux%20ARM%20Release/builds/12283 failing example.
Comment 4 Yusuke Suzuki 2016-09-25 00:28:51 PDT
Watching EWS.
Comment 5 Yusuke Suzuki 2016-09-26 00:34:51 PDT
Created attachment 289801 [details]
Patch
Comment 6 Alex Christensen 2016-09-26 08:14:46 PDT
Comment on attachment 289801 [details]
Patch

r=me
Be careful when committing this.  The commit queue and svn-apply change the emoji to UTF-8 encoded emoji, which breaks the test.
Comment 7 Yusuke Suzuki 2016-09-26 10:10:00 PDT
Committed r206378: <http://trac.webkit.org/changeset/206378>
Comment 8 Yusuke Suzuki 2016-09-26 10:11:23 PDT
(In reply to comment #6)
> Comment on attachment 289801 [details]
> Patch
> 
> r=me
> Be careful when committing this.  The commit queue and svn-apply change the
> emoji to UTF-8 encoded emoji, which breaks the test.

Thanks. Landed.
Comment 9 Darin Adler 2016-09-26 10:29:49 PDT
Seems a bit strange to use char16_t instead of UChar, given that we use UChar everywhere else in WebKit.
Comment 10 Darin Adler 2016-09-26 10:30:32 PDT
Another way to put that is: Maybe we should start using char16_t everywhere instead of UChar!
Comment 11 Yusuke Suzuki 2016-09-26 11:26:38 PDT
(In reply to comment #9)
> Seems a bit strange to use char16_t instead of UChar, given that we use
> UChar everywhere else in WebKit.

This is just used to use u"UTF-16" literal in C++11 in this testing.
This allows us to write UTF-16 chars including emoji directly in source code and makes testing easier.
char16_t is used only before calling utf16String() function.
After converting to WTFString, char16_t is no longer used.

(In reply to comment #10)
> Another way to put that is: Maybe we should start using char16_t everywhere
> instead of UChar!

That is very nice, but, it prevents us from using ICU functions easily.
There are 2 problems I think.

1. Built ICU may use different UChar definition.

If ICU is built with WebKit, we can replace UChar implementation from uint16_t to char16_t by passing -DUCHAR_TYPE=char16_t.
However, if ICU is system's one, we carefully need to align UChar type to the system's one because C++ mangled names are different among char16_t, uint16_t, and wchar_t.

For example, OS X's ICU's UnicodeString (const UChar *text, int32_t textLength) constructor is mangled as `__ZN3icu13UnicodeStringC2EPKti` -> UnicodeString(const unsigned short*, int). So UnicodeString(const char16_t*, int) causes a link error.

If we only use ICU C API, we can ignore this problem. However, it seems that some places use ICU C++ APIs.

2. Windows ICU prefer wchar_t.

Windows has a lot of APIs using wchar_t. So ICU uses wchar_t for UChar in Windows.
This allows us to pass UChar* directly to Windows APIs.
If we change UChar to char16_t, we need to use properly wchar_t and char16_t in those places.

Personally, I think using char16_t instead of using ICU UChar doesn't give us nice benefit at this time. What do you think of?
Comment 12 Carlos Alberto Lopez Perez 2016-09-27 06:04:17 PDT
*** Bug 161806 has been marked as a duplicate of this bug. ***