UTF8->UTF16 conversion in c_utility.cpp doesn't have a Qt4 Unicode implementation
Created attachment 17128 [details] implements qt4 unicode in c_utility.cpp
Created attachment 17129 [details] implements qt4 unicode in c_utility.cpp Ooops.. I need to add my copyright at the top :)
Comment on attachment 17129 [details] implements qt4 unicode in c_utility.cpp The patch needs a ChangeLog entry, and the coding style should follow the style guidelines. I know that the ICU code above does not, but it'd be great if new code would. In particular, the use of C-style casts, and the one-line for statement with strange whitespace usage.
Created attachment 17138 [details] implements Qt 4 Unicode path in c_utility.cpp
Comment on attachment 17138 [details] implements Qt 4 Unicode path in c_utility.cpp + *UTF16Length = wcslen(utf16); This won't work correctly if there are null characters in the string, and won't even compile on platforms with a different definition of wchar_t (such as Mac OS X). I think it would be best to take code from <http://www.unicode.org/Public/PROGRAMS/CVTUTF/ConvertUTF.c>, possibly throwing away the ICU-based implementation. Also, see the Latin-1 fallback code path in ICU branch - I think we'd like to have it on all platforms.
Created attachment 17161 [details] Changes implementation of convertutf8toutf16 to use unicode.org conversion function Alright, changes implementation to use unicode.org sources. ap was iffy on adding the files in their entirety, but I don't see a problem with it.
Tossing my two cents. I care a lot for the binary size. So this adds 30k of code for something already available in the toolkit so please make it optional to use the built-in codec...
Comment on attachment 17161 [details] Changes implementation of convertutf8toutf16 to use unicode.org conversion function Agreed, this is adding a lot of code we just don't need in Qt afaik.
(In reply to comment #7) > So this adds 30k of code for something already available in the toolkit so please make it optional > to use the built-in codec... Hmm, where do these 30K come from? We only use a single tiny function (I assume you do have dead code stripping enabled). I don't think such a trivial operation warrants having platform-specific branches. I was actually considering making a custom UTF-8 decoder for WebCore, too - just as with Latin-1, it's bad for performance to initialize the whole codec registry just for those.
Darin suggests that it would be best to have UTF-8 decoding in one place, such as wtf/unicode. UString::UTF8String() can go there, too.
(In reply to comment #10) > Darin suggests that it would be best to have UTF-8 decoding in one place, such > as wtf/unicode. UString::UTF8String() can go there, too. Doing that in bug 15953. Keeping this bug open for port authors to consider adding platform-specific branches.
Closing this bug since r30794 consolidates this into WebCore::String::fromUTF8, which uses the TextEncoding/TextEncoder infrastructure.