WebKit Bugzilla
RESOLVED INVALID
27366
TextEncodingDetector that uses Universal Charset Detector
https://bugs.webkit.org/show_bug.cgi?id=27366
Kwang Yul Seo
Reported
2009-07-17 03:44:16 PDT
Add a TextEncodingDetector implementation that uses the Universal Charset Detector from Mozilla. The source code is taken from Mozilla: mozilla-central/extensions/universalchardet/src/base/
Universal Charset Detector is not usually available as a shared C/C++ library, so I included all of the code. The original code consists of many files, but I merged them into a single file so that it can be added easily to the build systems of the many WebKit ports.
I changed the coding style to follow the WebKit Style Guidelines and ran cpplint.py to ensure that there are no style errors. However, I have not changed the class and method names of the Mozilla code; I think it is better to preserve the original names. Please let me know if there is a policy on this.
Currently, there is only one implementation of TextEncodingDetector, TextEncodingDetectorICU. Ports without ICU can use TextEncodingDetectorUniversal by default because it imposes no external dependency.
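To illustrate the shape of a dependency-free detector backend like the one described above, here is a minimal sketch. It is not the code in the patch: `StubCharsetDetector`, `handleData`, `dataEnd`, and this `detectTextEncoding` signature are hypothetical names chosen for illustration, and the stub only recognizes byte order marks, whereas Universal Charset Detector relies on statistical analysis of the input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a streaming charset detector. It mimics only the
// "feed chunks, then ask for a result" shape; it recognizes byte order marks
// and nothing else.
class StubCharsetDetector {
public:
    // Feed one chunk of raw bytes. Only the first three bytes of the stream
    // are retained, because that is all BOM sniffing needs.
    void handleData(const char* data, size_t length)
    {
        assert(data || !length);
        for (size_t i = 0; i < length && m_prefix.size() < 3; ++i)
            m_prefix.push_back(data[i]);
    }

    // Signal end of input. Returns the detected charset name, or an empty
    // string when nothing was recognized.
    std::string dataEnd() const
    {
        if (m_prefix.size() >= 3 && !m_prefix.compare(0, 3, "\xEF\xBB\xBF"))
            return "UTF-8";
        if (m_prefix.size() >= 2 && !m_prefix.compare(0, 2, "\xFE\xFF"))
            return "UTF-16BE";
        if (m_prefix.size() >= 2 && !m_prefix.compare(0, 2, "\xFF\xFE"))
            return "UTF-16LE";
        return std::string();
    }

private:
    std::string m_prefix;
};

// Hypothetical port-facing entry point: returns true and fills detectedName
// when the backend recognized a charset, and returns false otherwise.
bool detectTextEncoding(const char* data, size_t length, std::string& detectedName)
{
    StubCharsetDetector detector;
    detector.handleData(data, length);
    detectedName = detector.dataEnd();
    return !detectedName.empty();
}
```

A port without ICU could compile a backend of this shape unconditionally, since it needs nothing beyond the standard library. A caller would pass the first bytes of a resource and fall back to a default encoding when the function returns false.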
Attachments
TextEncodingDetectorUniversal
(521.17 KB, patch)
2009-07-17 03:48 PDT
,
Kwang Yul Seo
mjs
: review-
Kwang Yul Seo
Comment 1
2009-07-17 03:48:07 PDT
Created attachment 32926: TextEncodingDetectorUniversal
No change to the build.
Maciej Stachowiak
Comment 2
2009-07-21 00:49:45 PDT
Comment on attachment 32926: TextEncodingDetectorUniversal
What's our plan for maintaining this code? Will we sync with upstream periodically or maintain it ourselves? Either way, putting all the code in one file seems like a bad idea because it will make maintenance harder. Please resubmit with the files split properly.
I have no comment on whether including this is a good idea; international text experts should probably chime in. Is there any information available on Universal Charset Detector, what it does, and how it works?
Kwang Yul Seo
Comment 3
2009-07-21 03:57:10 PDT
Okay. I will resubmit the files. Universal Charset Detector is a language/encoding detector. There is a good paper on it:
http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html
Universal Charset Detector is just another encoding detector, one that performs much better than the current ICU encoding detector. There was a discussion on the merits of an encoding detector:
https://bugs.webkit.org/show_bug.cgi?id=16482
I think syncing with upstream periodically is a good strategy here because the code is quite stable.