15907 – Implement USE(QT4_UNICODE) for UTF8 -> UTF16 conversion

RESOLVED FIXED 15907

Implement USE(QT4_UNICODE) for UTF8 -> UTF16 conversion

https://bugs.webkit.org/show_bug.cgi?id=15907

Summary Implement USE(QT4_UNICODE) for UTF8 -> UTF16 conversion

Justin Haygood

Reported 2007-11-08 14:51:26 PST

UTF8->UTF16 conversion in c_utility.cpp doesn't have a Qt4 Unicode implementation

Attachments
implements qt4 unicode in c_utility.cpp (976 bytes, patch) 2007-11-08 14:52 PST, Justin Haygood	no flags	Details Formatted Diff Diff
implements qt4 unicode in c_utility.cpp (1.30 KB, patch) 2007-11-08 14:53 PST, Justin Haygood	mrowe: review-	Details Formatted Diff Diff
implements Qt 4 Unicode path in c_utility.cpp (2.07 KB, patch) 2007-11-08 19:47 PST, Justin Haygood	ap: review-	Details Formatted Diff Diff
Changes implementation of convertutf8toutf16 to use unicode.org conversion function (30.12 KB, patch) 2007-11-09 13:33 PST, Justin Haygood	staikos: review-	Details Formatted Diff Diff
Show Obsolete (3) View All Add attachment proposed patch, testcase, etc.

Justin Haygood

Comment 1 2007-11-08 14:52:07 PST

Created attachment 17128 [details] implements qt4 unicode in c_utility.cpp

Justin Haygood

Comment 2 2007-11-08 14:53:20 PST

Created attachment 17129 [details] implements qt4 unicode in c_utility.cpp Ooops.. I need to add my copyright at the top :)

Mark Rowe (bdash)

Comment 3 2007-11-08 14:58:51 PST

Comment on attachment 17129 [details] implements qt4 unicode in c_utility.cpp The patch needs a ChangeLog entry, and the coding style should follow the style guidelines. I know that the ICU code above does not, but it'd be great if new code would. In particular, the use of C-style casts, and the one-line for statement with strange whitespace usage.

Justin Haygood

Comment 4 2007-11-08 19:47:09 PST

Created attachment 17138 [details] implements Qt 4 Unicode path in c_utility.cpp

Alexey Proskuryakov

Comment 5 2007-11-08 23:17:14 PST

Comment on attachment 17138 [details] implements Qt 4 Unicode path in c_utility.cpp + *UTF16Length = wcslen(utf16); This won't work correctly if there are null characters in the string, and won't even compile on platforms with a different definition of wchar_t (such as Mac OS X). I think it would be best to take code from <http://www.unicode.org/Public/PROGRAMS/CVTUTF/ConvertUTF.c>, possibly throwing away the ICU-based implementation. Also, see the Latin-1 fallback code path in ICU branch - I think we'd like to have it on all platforms.

Justin Haygood

Comment 6 2007-11-09 13:33:45 PST

Created attachment 17161 [details] Changes implementation of convertutf8toutf16 to use unicode.org conversion function Alright, changes implementation to use unicode.org sources. ap was iffy on adding the files in their entirety, but I don't see a problem with it.

Holger Freyther

Comment 7 2007-11-10 17:26:52 PST

Tossing my two cents. I care a lot for the binary size. So this adds 30k of code for something already available in the toolkit so please make it optional to use the built-in codec...

George Staikos

Comment 8 2007-11-10 17:42:20 PST

Comment on attachment 17161 [details] Changes implementation of convertutf8toutf16 to use unicode.org conversion function Agreed, this is adding a lot of code we just don't need in Qt afaik.

Alexey Proskuryakov

Comment 9 2007-11-11 02:21:25 PST

(In reply to comment #7) > So this adds 30k of code for something already available in the toolkit so please make it optional > to use the built-in codec... Hmm, where do these 30K come from? We only use a single tiny function (I assume you do have dead code stripping enabled). I don't think such a trivial operation warrants having platform-specific branches. I was actually considering making a custom UTF-8 decoder for WebCore, too - just as with Latin-1, it's bad for performance to initialize the whole codec registry just for those.

Alexey Proskuryakov

Comment 10 2007-11-11 11:15:40 PST

Darin suggests that it would be best to have UTF-8 decoding in one place, such as wtf/unicode. UString::UTF8String() can go there, too.

Alexey Proskuryakov

Comment 11 2007-11-12 05:15:53 PST

(In reply to comment #10) > Darin suggests that it would be best to have UTF-8 decoding in one place, such > as wtf/unicode. UString::UTF8String() can go there, too. Doing that in bug 15953. Keeping this bug open for port authors to consider adding platform-specific branches.

Simon Hausmann

Comment 12 2010-03-23 02:17:40 PDT

Closing this bug since r30794 consolidates this into WebCore::String::fromUTF8, which uses the TextEncoding/TextEncoder infrastructure.

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware PC

OS OS X 10.4

Product WebKit

Component Text

Assignee

George Staikos

Reported

2007-11-08 14:51 PST

Modified

2010-03-23 02:17 PDT History

CC List

3 users Show

URL

Keywords Qt

Depends on

Blocks