Bug 15907 - Implement USE(QT4_UNICODE) for UTF8 -> UTF16 conversion
Summary: Implement USE(QT4_UNICODE) for UTF8 -> UTF16 conversion
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Text (show other bugs)
Version: 528+ (Nightly build)
Hardware: PC OS X 10.4
: P2 Normal
Assignee: George Staikos
URL:
Keywords: Qt
Depends on:
Blocks:
 
Reported: 2007-11-08 14:51 PST by Justin Haygood
Modified: 2010-03-23 02:17 PDT (History)
3 users (show)

See Also:


Attachments
implements qt4 unicode in c_utility.cpp (976 bytes, patch)
2007-11-08 14:52 PST, Justin Haygood
no flags Details | Formatted Diff | Diff
implements qt4 unicode in c_utility.cpp (1.30 KB, patch)
2007-11-08 14:53 PST, Justin Haygood
mrowe: review-
Details | Formatted Diff | Diff
implements Qt 4 Unicode path in c_utility.cpp (2.07 KB, patch)
2007-11-08 19:47 PST, Justin Haygood
ap: review-
Details | Formatted Diff | Diff
Changes implementation of convertutf8toutf16 to use unicode.org conversion function (30.12 KB, patch)
2007-11-09 13:33 PST, Justin Haygood
staikos: review-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Justin Haygood 2007-11-08 14:51:26 PST
UTF8->UTF16 conversion in c_utility.cpp doesn't have a Qt4 Unicode implementation
Comment 1 Justin Haygood 2007-11-08 14:52:07 PST
Created attachment 17128 [details]
implements qt4 unicode in c_utility.cpp
Comment 2 Justin Haygood 2007-11-08 14:53:20 PST
Created attachment 17129 [details]
implements qt4 unicode in c_utility.cpp

Ooops.. I need to add my copyright at the top :)
Comment 3 Mark Rowe (bdash) 2007-11-08 14:58:51 PST
Comment on attachment 17129 [details]
implements qt4 unicode in c_utility.cpp

The patch needs a ChangeLog entry, and the coding style should follow the style guidelines.  I know that the ICU code above does not, but it'd be great if new code would.  In particular, the use of C-style casts, and the one-line for statement with strange whitespace usage.
Comment 4 Justin Haygood 2007-11-08 19:47:09 PST
Created attachment 17138 [details]
implements Qt 4 Unicode path in c_utility.cpp
Comment 5 Alexey Proskuryakov 2007-11-08 23:17:14 PST
Comment on attachment 17138 [details]
implements Qt 4 Unicode path in c_utility.cpp

+    *UTF16Length = wcslen(utf16);

This won't work correctly if there are null characters in the string, and won't even compile on platforms with a different definition of wchar_t (such as Mac OS X).

I think it would be best to take code from <http://www.unicode.org/Public/PROGRAMS/CVTUTF/ConvertUTF.c>, possibly throwing away the ICU-based implementation.

Also, see the Latin-1 fallback code path in ICU branch - I think we'd like to have it on all platforms.
Comment 6 Justin Haygood 2007-11-09 13:33:45 PST
Created attachment 17161 [details]
Changes implementation of convertutf8toutf16 to use unicode.org conversion function

Alright, changes implementation to use unicode.org sources. ap was iffy on adding the files in their entirety, but I don't see a problem with it.
Comment 7 Holger Freyther 2007-11-10 17:26:52 PST
Tossing my two cents. I care a lot for the binary size. So this adds 30k of code for something already available in the toolkit so please make it optional to use the built-in codec...
Comment 8 George Staikos 2007-11-10 17:42:20 PST
Comment on attachment 17161 [details]
Changes implementation of convertutf8toutf16 to use unicode.org conversion function

Agreed, this is adding a lot of code we just don't need in Qt afaik.
Comment 9 Alexey Proskuryakov 2007-11-11 02:21:25 PST
(In reply to comment #7)
> So this adds 30k of code for something already available in the toolkit so please make it optional
> to use the built-in codec...

Hmm, where do these 30K come from? We only use a single tiny function (I assume you do have dead code stripping enabled). I don't think such a trivial operation warrants having platform-specific branches.

I was actually considering making a custom UTF-8 decoder for WebCore, too - just as with Latin-1, it's bad for performance to initialize the whole codec registry just for those.
Comment 10 Alexey Proskuryakov 2007-11-11 11:15:40 PST
Darin suggests that it would be best to have UTF-8 decoding in one place, such as wtf/unicode. UString::UTF8String() can go there, too.
Comment 11 Alexey Proskuryakov 2007-11-12 05:15:53 PST
(In reply to comment #10)
> Darin suggests that it would be best to have UTF-8 decoding in one place, such
> as wtf/unicode. UString::UTF8String() can go there, too.

Doing that in bug 15953. Keeping this bug open for port authors to consider adding platform-specific branches.
Comment 12 Simon Hausmann 2010-03-23 02:17:40 PDT
Closing this bug since r30794 consolidates this into WebCore::String::fromUTF8, which uses the TextEncoding/TextEncoder infrastructure.