TextEncoding is a WebCore concept that builds on TextCodec, which is a platform API wrapper, so using the former in TextCodecICU is confusing. It's used for two reasons: 1. Passing an encoding as TextEncoding guarantees that the encoding name has been already canonicalized. 2. We use TextEncoding::displayString() to tweak Asian encoders, so that they have desired behavior with backslash. I think that even though the string transformation is the same in both cases, using displayString for this is conceptually unsound.
Created attachment 201135 [details] proposed patch I'm looking into this because I plan to add an ICU codec name pointer to TextCodecICU for <rdar://problem/13823864>.
Attachment 201135 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCore/platform/text/TextCodecICU.cpp', u'Source/WebCore/platform/text/TextCodecICU.h']" exit_code: 1 Source/WebCore/platform/text/TextCodecICU.cpp:339: Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons. [readability/comparison_to_zero] [5] Total errors found: 1 in 3 files If any of these errors are false positives, please file a bug against check-webkit-style.
Comment on attachment 201135 [details] proposed patch View in context: https://bugs.webkit.org/attachment.cgi?id=201135&action=review > Source/WebCore/platform/text/TextCodecICU.cpp:62 > + // TextEncoding name is always atomic, so the string is never deleted. I am a little confused by "atomic" here. What do you mean?
Comment on attachment 201135 [details] proposed patch View in context: https://bugs.webkit.org/attachment.cgi?id=201135&action=review >> Source/WebCore/platform/text/TextCodecICU.cpp:62 >> + // TextEncoding name is always atomic, so the string is never deleted. > > I am a little confused by "atomic" here. What do you mean? TextEncodingRegistry has a map for encodings, see atomicCanonicalTextEncodingName().
Comment on attachment 201135 [details] proposed patch View in context: https://bugs.webkit.org/attachment.cgi?id=201135&action=review > Source/WebCore/platform/text/TextCodecICU.cpp:32 > +#include "TextEncodingRegistry.h" Why is this include needed? >>> Source/WebCore/platform/text/TextCodecICU.cpp:62 >>> + // TextEncoding name is always atomic, so the string is never deleted. >> >> I am a little confused by "atomic" here. What do you mean? > > TextEncodingRegistry has a map for encodings, see atomicCanonicalTextEncodingName(). Maybe word more like this? // TextEncoding name is always one from atomicCanonicalTextEncodingName, guaranteed to never be deleted. > Source/WebCore/platform/text/TextCodecICU.cpp:230 > - const char* name = m_encoding.name(); > - m_needsGBKFallbacks = name[0] == 'G' && name[1] == 'B' && name[2] == 'K' && !name[3]; > + m_needsGBKFallbacks = m_encodingName[0] == 'G' && m_encodingName[1] == 'B' && m_encodingName[2] == 'K' && !m_encodingName[3]; Seems to be a slight decrease in readability. The original author (me) thought that using a local made it more readable. It wasn’t performance optimization, but expression readability optimization. > Source/WebCore/platform/text/TextCodecICU.cpp:437 > + if (shouldShowBackslashAsCurrencySymbolIn(m_encodingName)) { Why not optimize further by searching for '\\' and not making a copy if there is no '\\' character? > Source/WebCore/platform/text/TextCodecICU.cpp:439 > + copy.append(characters, length); > + copy.replace('\\', 0xA5); This copies the characters twice. Instead, we could just use Vector<UChar> and modify the copy in place. > Source/WebCore/platform/text/TextCodecICU.h:45 > + TextCodecICU(const char* encoding); Maybe mark this explicit? > Source/WebCore/platform/text/TextCodecICU.h:59 > + const char* m_encodingName; Maybe even const char* const?
>> +#include "TextEncodingRegistry.h" > > Why is this include needed? We use shouldShowBackslashAsCurrencySymbolIn() that's declared in this header. > Seems to be a slight decrease in readability. The original author (me) thought that using a local made it > more readable. It wasn’t performance optimization, but expression readability optimization. For reference, this change was <http://trac.webkit.org/changeset/21227/trunk/WebCore/platform/TextCodecICU.cpp>. Before the change, the code would implicitly enumerate non-base codecs just to construct TextEncoding("GBK"). Interestingly, that's a historic example of why it's undesirable to use TextEncoding in TextCodecICU.cpp. I'm unsure why we can't just use strcmp, and in fact, I have a local patch that moves this code, and changes it to use strcmp. Made the other suggested changes, except for optimization ideas - this is just encode(), which is probably not worth any added risk. Committed <http://trac.webkit.org/changeset/149924>.
(In reply to comment #6) > > Seems to be a slight decrease in readability. The original author (me) thought that using a local made it > > more readable. It wasn’t performance optimization, but expression readability optimization. > > For reference, this change was <http://trac.webkit.org/changeset/21227/trunk/WebCore/platform/TextCodecICU.cpp>. Before the change, the code would implicitly enumerate non-base codecs just to construct TextEncoding("GBK"). Interestingly, that's a historic example of why it's undesirable to use TextEncoding in TextCodecICU.cpp. > > I'm unsure why we can't just use strcmp I think that when I wrote this I thought that the four explicit comparisons would be faster than a strcmp. Depends how aggressive compilers are at optimizing strcmp, particularly when a string constant is involved. And also whether code size or speed is more important. For readability, I think strcmp and memcmp are some of the worst functions. Remembering that "0 means equal" is confusing enough that we made an exception to our "== 0" rule for it, an exception that most people probably don’t realize it exists. But a comparison with "GBK" is obviously better, if only there was a clearer way than strcmp ;-)