RESOLVED FIXED 15907
Implement USE(QT4_UNICODE) for UTF8 -> UTF16 conversion
https://bugs.webkit.org/show_bug.cgi?id=15907
Summary Implement USE(QT4_UNICODE) for UTF8 -> UTF16 conversion
Justin Haygood
Reported 2007-11-08 14:51:26 PST
UTF8->UTF16 conversion in c_utility.cpp doesn't have a Qt4 Unicode implementation
Attachments
implements qt4 unicode in c_utility.cpp (976 bytes, patch)
2007-11-08 14:52 PST, Justin Haygood
no flags
implements qt4 unicode in c_utility.cpp (1.30 KB, patch)
2007-11-08 14:53 PST, Justin Haygood
mrowe: review-
implements Qt 4 Unicode path in c_utility.cpp (2.07 KB, patch)
2007-11-08 19:47 PST, Justin Haygood
ap: review-
Changes implementation of convertutf8toutf16 to use unicode.org conversion function (30.12 KB, patch)
2007-11-09 13:33 PST, Justin Haygood
staikos: review-
Justin Haygood
Comment 1 2007-11-08 14:52:07 PST
Created attachment 17128 [details] implements qt4 unicode in c_utility.cpp
Justin Haygood
Comment 2 2007-11-08 14:53:20 PST
Created attachment 17129 [details] implements qt4 unicode in c_utility.cpp Ooops.. I need to add my copyright at the top :)
Mark Rowe (bdash)
Comment 3 2007-11-08 14:58:51 PST
Comment on attachment 17129 [details] implements qt4 unicode in c_utility.cpp The patch needs a ChangeLog entry, and the coding style should follow the style guidelines. I know that the ICU code above does not, but it'd be great if new code would. In particular, the use of C-style casts, and the one-line for statement with strange whitespace usage.
Justin Haygood
Comment 4 2007-11-08 19:47:09 PST
Created attachment 17138 [details] implements Qt 4 Unicode path in c_utility.cpp
Alexey Proskuryakov
Comment 5 2007-11-08 23:17:14 PST
Comment on attachment 17138 [details] implements Qt 4 Unicode path in c_utility.cpp + *UTF16Length = wcslen(utf16); This won't work correctly if there are null characters in the string, and won't even compile on platforms with a different definition of wchar_t (such as Mac OS X). I think it would be best to take code from <http://www.unicode.org/Public/PROGRAMS/CVTUTF/ConvertUTF.c>, possibly throwing away the ICU-based implementation. Also, see the Latin-1 fallback code path in ICU branch - I think we'd like to have it on all platforms.
Justin Haygood
Comment 6 2007-11-09 13:33:45 PST
Created attachment 17161 [details] Changes implementation of convertutf8toutf16 to use unicode.org conversion function Alright, changes implementation to use unicode.org sources. ap was iffy on adding the files in their entirety, but I don't see a problem with it.
Holger Freyther
Comment 7 2007-11-10 17:26:52 PST
Tossing my two cents. I care a lot for the binary size. So this adds 30k of code for something already available in the toolkit so please make it optional to use the built-in codec...
George Staikos
Comment 8 2007-11-10 17:42:20 PST
Comment on attachment 17161 [details] Changes implementation of convertutf8toutf16 to use unicode.org conversion function Agreed, this is adding a lot of code we just don't need in Qt afaik.
Alexey Proskuryakov
Comment 9 2007-11-11 02:21:25 PST
(In reply to comment #7) > So this adds 30k of code for something already available in the toolkit so please make it optional > to use the built-in codec... Hmm, where do these 30K come from? We only use a single tiny function (I assume you do have dead code stripping enabled). I don't think such a trivial operation warrants having platform-specific branches. I was actually considering making a custom UTF-8 decoder for WebCore, too - just as with Latin-1, it's bad for performance to initialize the whole codec registry just for those.
Alexey Proskuryakov
Comment 10 2007-11-11 11:15:40 PST
Darin suggests that it would be best to have UTF-8 decoding in one place, such as wtf/unicode. UString::UTF8String() can go there, too.
Alexey Proskuryakov
Comment 11 2007-11-12 05:15:53 PST
(In reply to comment #10) > Darin suggests that it would be best to have UTF-8 decoding in one place, such > as wtf/unicode. UString::UTF8String() can go there, too. Doing that in bug 15953. Keeping this bug open for port authors to consider adding platform-specific branches.
Simon Hausmann
Comment 12 2010-03-23 02:17:40 PDT
Closing this bug since r30794 consolidates this into WebCore::String::fromUTF8, which uses the TextEncoding/TextEncoder infrastructure.
Note You need to log in before you can comment on or make changes to this bug.