WebKit Bugzilla
New
Browse
Search+
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
115848
Make TextCodecICU not depend on TextEncoding
https://bugs.webkit.org/show_bug.cgi?id=115848
Summary
Make TextCodecICU not depend on TextEncoding
Alexey Proskuryakov
Reported
2013-05-08 22:11:05 PDT
TextEncoding is a WebCore concept that builds on TextCodec, which is a platform API wrapper, so using the former in TextCodecICU is confusing. It's used for two reasons: 1. Passing an encoding as TextEncoding guarantees that the encoding name has been already canonicalized. 2. We use TextEncoding::displayString() to tweak Asian encoders, so that they have desired behavior with backslash. I think that even though the string transformation is the same in both cases, using displayString for this is conceptually unsound.
Attachments
proposed patch
(5.85 KB, patch)
2013-05-08 22:14 PDT
,
Alexey Proskuryakov
darin
: review+
Details
Formatted Diff
Diff
View All
Add attachment
proposed patch, testcase, etc.
Alexey Proskuryakov
Comment 1
2013-05-08 22:14:16 PDT
Created
attachment 201135
[details]
proposed patch I'm looking into this because I plan to add an ICU codec name pointer to TextCodecICU for <
rdar://problem/13823864
>.
WebKit Commit Bot
Comment 2
2013-05-08 22:15:36 PDT
Attachment 201135
[details]
did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCore/platform/text/TextCodecICU.cpp', u'Source/WebCore/platform/text/TextCodecICU.h']" exit_code: 1 Source/WebCore/platform/text/TextCodecICU.cpp:339: Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons. [readability/comparison_to_zero] [5] Total errors found: 1 in 3 files If any of these errors are false positives, please file a bug against check-webkit-style.
Benjamin Poulain
Comment 3
2013-05-09 01:46:11 PDT
Comment on
attachment 201135
[details]
proposed patch View in context:
https://bugs.webkit.org/attachment.cgi?id=201135&action=review
> Source/WebCore/platform/text/TextCodecICU.cpp:62 > + // TextEncoding name is always atomic, so the string is never deleted.
I am a little confused by "atomic" here. What do you mean?
Alexey Proskuryakov
Comment 4
2013-05-09 08:45:22 PDT
Comment on
attachment 201135
[details]
proposed patch View in context:
https://bugs.webkit.org/attachment.cgi?id=201135&action=review
>> Source/WebCore/platform/text/TextCodecICU.cpp:62 >> + // TextEncoding name is always atomic, so the string is never deleted. > > I am a little confused by "atomic" here. What do you mean?
TextEncodingRegistry has a map for encodings, see atomicCanonicalTextEncodingName().
Darin Adler
Comment 5
2013-05-10 17:22:57 PDT
Comment on
attachment 201135
[details]
proposed patch View in context:
https://bugs.webkit.org/attachment.cgi?id=201135&action=review
> Source/WebCore/platform/text/TextCodecICU.cpp:32 > +#include "TextEncodingRegistry.h"
Why is this include needed?
>>> Source/WebCore/platform/text/TextCodecICU.cpp:62 >>> + // TextEncoding name is always atomic, so the string is never deleted. >> >> I am a little confused by "atomic" here. What do you mean? > > TextEncodingRegistry has a map for encodings, see atomicCanonicalTextEncodingName().
Maybe word more like this? // TextEncoding name is always one from atomicCanonicalTextEncodingName, guaranteed to never be deleted.
> Source/WebCore/platform/text/TextCodecICU.cpp:230 > - const char* name = m_encoding.name(); > - m_needsGBKFallbacks = name[0] == 'G' && name[1] == 'B' && name[2] == 'K' && !name[3]; > + m_needsGBKFallbacks = m_encodingName[0] == 'G' && m_encodingName[1] == 'B' && m_encodingName[2] == 'K' && !m_encodingName[3];
Seems to be a slight decrease in readability. The original author (me) thought that using a local made it more readable. It wasn’t performance optimization, but expression readability optimization.
> Source/WebCore/platform/text/TextCodecICU.cpp:437 > + if (shouldShowBackslashAsCurrencySymbolIn(m_encodingName)) {
Why not optimize further by searching for '\\' and not making a copy if there is no '\\' character?
> Source/WebCore/platform/text/TextCodecICU.cpp:439 > + copy.append(characters, length); > + copy.replace('\\', 0xA5);
This copies the characters twice. Instead, we could just use Vector<UChar> and modify the copy in place.
> Source/WebCore/platform/text/TextCodecICU.h:45 > + TextCodecICU(const char* encoding);
Maybe mark this explicit?
> Source/WebCore/platform/text/TextCodecICU.h:59 > + const char* m_encodingName;
Maybe even const char* const?
Alexey Proskuryakov
Comment 6
2013-05-10 22:45:20 PDT
>> +#include "TextEncodingRegistry.h"
>
> Why is this include needed?
We use shouldShowBackslashAsCurrencySymbolIn() that's declared in this header.
> Seems to be a slight decrease in readability. The original author (me) thought that using a local made it > more readable. It wasn’t performance optimization, but expression readability optimization.
For reference, this change was <
http://trac.webkit.org/changeset/21227/trunk/WebCore/platform/TextCodecICU.cpp
>. Before the change, the code would implicitly enumerate non-base codecs just to construct TextEncoding("GBK"). Interestingly, that's a historic example of why it's undesirable to use TextEncoding in TextCodecICU.cpp. I'm unsure why we can't just use strcmp, and in fact, I have a local patch that moves this code, and changes it to use strcmp. Made the other suggested changes, except for optimization ideas - this is just encode(), which is probably not worth any added risk. Committed <
http://trac.webkit.org/changeset/149924
>.
Darin Adler
Comment 7
2013-05-11 16:18:20 PDT
(In reply to
comment #6
)
> > Seems to be a slight decrease in readability. The original author (me) thought that using a local made it > > more readable. It wasn’t performance optimization, but expression readability optimization. > > For reference, this change was <
http://trac.webkit.org/changeset/21227/trunk/WebCore/platform/TextCodecICU.cpp
>. Before the change, the code would implicitly enumerate non-base codecs just to construct TextEncoding("GBK"). Interestingly, that's a historic example of why it's undesirable to use TextEncoding in TextCodecICU.cpp. > > I'm unsure why we can't just use strcmp
I think that when I wrote this I thought that the four explicit comparisons would be faster than a strcmp. Depends how aggressive compilers are at optimizing strcmp, particularly when a string constant is involved. And also whether code size or speed is more important. For readability, I think strcmp and memcmp are some of the worst functions. Remembering that "0 means equal" is confusing enough that we made an exception to our "== 0" rule for it, an exception that most people probably don’t realize it exists. But a comparison with "GBK" is obviously better, if only there was a clearer way than strcmp ;-)
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug