LayoutTests/editing/selection/find-yensign-backslash.html doesn't pass or does pass unexpectedly on non-Mac platforms because - Shift_JIS_X0213-2000 is available only with TextCodecMac, and TextCodecICU doesn't provide it. - TextEncoding::backslashAsCurrencySymbol() doesn't take care of a case that atomicCanonicalTextEncodingName() result is 0.
Created attachment 74205 [details] Patch
Comment on attachment 74205 [details] Patch Out of curiosity, does TextCodecMac have both Shift_JIS and Shift_JIS_X0213-2000?
Comment on attachment 74205 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=74205&action=review Note that there are other places where we only check for the longer name - e.g. TextEncoding::backslashAsCurrencySymbol(). > Out of curiosity, does TextCodecMac have both Shift_JIS and Shift_JIS_X0213-2000? Yes. The latter doesn't come from ICU, but from mac-encodings.txt. I'd really expect a reviewer to know or at least research the answer to such a question before giving r+. > WebCore/platform/text/TextEncoding.cpp:207 > + // Shift_JIS_X0213-2000 is not the same encoding as Shift_JIS on Mac. We > + // need to check both of them. This is a short line, it shouldn't be split into two. > WebCore/platform/text/TextEncoding.cpp:210 > + static const char* const shiftJis = atomicCanonicalTextEncodingName("Shift_JIS"); > + static const char* const shiftJis2000 = atomicCanonicalTextEncodingName("Shift_JIS_X0213-2000"); > + static const char* const eucJp = atomicCanonicalTextEncodingName("EUC-JP"); Coding style mistake - it's "Shift_JIS", not "Shift_Jis", so variables should be named accordingly. Ditto for EUC-JP. > WebCore/platform/text/TextEncoding.cpp:211 > + return ((shiftJis && m_name == shiftJis) || (shiftJis2000 && m_name == shiftJis2000) || (eucJp && m_name == eucJp)) ? 0x00A5 : '\\'; I'm not really sure how performance critical this function is, but it's certainly invoked a lot, so I wouldn't be adding unnecessary null checks.
(In reply to comment #3) > > WebCore/platform/text/TextEncoding.cpp:211 > > + return ((shiftJis && m_name == shiftJis) || (shiftJis2000 && m_name == shiftJis2000) || (eucJp && m_name == eucJp)) ? 0x00A5 : '\\'; > > I'm not really sure how performance critical this function is, but it's certainly invoked a lot, so I wouldn't be adding unnecessary null checks. I think this is a correctness check. If the m_name is NULL and shiftJis2000 is NULL, we'll return the wrong value. I suppose you could reduce the number of comparisons by just NULL checking m_name.
It were null checks for Shift_JIS and EUC-JP that caught my eye - every reasonable back-end should have those. Perhaps Shift_JIS_X0213-2000 could just go inside #if PLATFORM(MAC). My recollection is that it's there simply because it was supported in Safari prior to switching to ICU, and no one had the gut to remove it.
As this hack is for compatibilities with IE, I think we also want other encodings with which Japanese fonts will be used. Specifically, I'd add ISO-2022-JP and ISO-2022-JP-2. Could you add them in this chance? I think we can use a HashSet like TextEncoding::isJapanese.
(In reply to comment #4) > I suppose you could reduce the number of comparisons by just NULL checking m_name. It's a good idea.
(In reply to comment #6) > As this hack is for compatibilities with IE, I think we also want other encodings with which Japanese fonts will be used. Specifically, I'd add ISO-2022-JP and ISO-2022-JP-2. Could you add them in this chance? I think we can use a HashSet like TextEncoding::isJapanese. I tested some encoding names with IE. IE showed yensign for 0x5c in the following encoding names: EUC-JP Shift_JIS, Windows-31J, x-sjis Shift_JIS_X0213-2000 x-mac-japanese ISO-2022-JP IE showed a backslash: cp932 JIS_X0201 JIS_X0208-1983 JIS_X0208-1990 JIS_X0212-1990 JIS_C6226-1978 ISO-2022-JP-1 ISO-2022-JP-2 ISO-2022-JP-3
Created attachment 74356 [details] Patch 2
Created attachment 74357 [details] Patch 3
Comment on attachment 74357 [details] Patch 3 View in context: https://bugs.webkit.org/attachment.cgi?id=74357&action=review > WebCore/platform/text/TextEncoding.cpp:204 > + DEFINE_STATIC_LOCAL(HashSet<const char*>, set, ()); Can this code be run from a secondary thread?
Comment on attachment 74357 [details] Patch 3 View in context: https://bugs.webkit.org/attachment.cgi?id=74357&action=review >> WebCore/platform/text/TextEncoding.cpp:204 >> + DEFINE_STATIC_LOCAL(HashSet<const char*>, set, ()); > > Can this code be run from a secondary thread? This code is not MT-safe. Is TextEncoding class used by multiple threads? TextEncoding::isJapanese() has the same issue.
(In reply to comment #12) > >> WebCore/platform/text/TextEncoding.cpp:204 > >> + DEFINE_STATIC_LOCAL(HashSet<const char*>, set, ()); > > > > Can this code be run from a secondary thread? > > This code is not MT-safe. Is TextEncoding class used by multiple threads? > TextEncoding::isJapanese() has the same issue. If the main thread always creates a TextEncoding instance and calls TextEncoding::isJapanese() before a secondary thread is created, we have no thread-safety issue.
I remember that TextEncodingRegistry had multi-threading issues, but don't remember which exactly. I'm not sure about these particular TextCodec functions. At the very least, they need isMainThread() assertions. One direction to try would be checking svn blame for encodingRegistryMutex to see why it was added.
(In reply to comment #14) > One direction to try would be checking svn blame for encodingRegistryMutex to see why it was added. I checked Bug 22341, and found no detail of the reason :-) Anyway, my "Patch 3" is wrong. If no extended encodings are used before starting workers and then extended encodings are used in the workers, multiple threads can enter the HashSet initialization path. I tried AtomicallyInitializedStatic(), but it produced a build error about global destructor. So I decided to move these HashSets to TextEncodingRegistry.
Created attachment 74532 [details] Patch 4
> I checked Bug 22341, and found no detail of the reason :-) Well, it explains that we were getting there through KURL in workers.
Comment on attachment 74532 [details] Patch 4 + IE shows a yensign for 0x5c code point encoded in x-mac-japanese, + ISO-2022-JP, EUC-JP, Shift_JIS, Shift_JIS_X0213-2000, x-sjis, and + Windows-31J. We follow it. This explanation is misleading. IE chooses a default font based on encoding, and some Windows fonts have a yen sign in place of backslash. We need much more complicated logic because fonts on other platforms are different, and don't have this quirk.
Created attachment 74535 [details] Patch 5
Created attachment 74536 [details] Patch 6
(In reply to comment #18) > (From update of attachment 74532 [details]) > + IE shows a yensign for 0x5c code point encoded in x-mac-japanese, > + ISO-2022-JP, EUC-JP, Shift_JIS, Shift_JIS_X0213-2000, x-sjis, and > + Windows-31J. We follow it. > > This explanation is misleading. IE chooses a default font based on encoding, and some Windows fonts have a yen sign in place of backslash. We need much more complicated logic because fonts on other platforms are different, and don't have this quirk. Ok, I have updated ChangeLog.
Comment on attachment 74536 [details] Patch 6 View in context: https://bugs.webkit.org/attachment.cgi?id=74536&action=review > WebCore/ChangeLog:10 > + IE chooses a font which shows a yensign for 0x5c code point for a page > + encoded in x-mac-japanese, ISO-2022-JP, EUC-JP, Shift_JIS, Shift_JIS_X0213-2000, > + x-sjis, and Windows-31J. I thought IE didn't support Shift_JIS_X0213-2000. > WebCore/ChangeLog:16 > + Also, we move the HashSet initialization for isJapanese() and > + backslashAsCurrencySymbol() to TextEncodingRegistry.cpp because of > + ease of making them multi-thread safe. This is not a great reason - TextEncodingRegistry does not and should not care about backslashes. > WebCore/platform/text/TextEncoding.cpp:166 > bool TextEncoding::isJapanese() const This function is misplaced - all it is needed for is to make a decision on whether to invoke charset detection. So it should be in TextResourceDecoder (not something for this patch, of course).
(In reply to comment #22) > (From update of attachment 74536 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=74536&action=review > > > WebCore/ChangeLog:10 > > + IE chooses a font which shows a yensign for 0x5c code point for a page > > + encoded in x-mac-japanese, ISO-2022-JP, EUC-JP, Shift_JIS, Shift_JIS_X0213-2000, > > + x-sjis, and Windows-31J. > > I thought IE didn't support Shift_JIS_X0213-2000. I was surprised when I saw IE switched the default font for it. I guess IE uses prefix-matching for encoding names. > > WebCore/ChangeLog:16 > > + Also, we move the HashSet initialization for isJapanese() and > > + backslashAsCurrencySymbol() to TextEncodingRegistry.cpp because of > > + ease of making them multi-thread safe. > > This is not a great reason - TextEncodingRegistry does not and should not care about backslashes. However, the encoding names for backslashAsCurrencySymbol() depend on TextEncodingRegistry. If one added new Japanese encoding to TextEncodingRegistry, he/she might need to add it to nonBackslashEncodings. Anyway, I also feel the new code is not good. I'll add a FIXME comment.
Landed: http://trac.webkit.org/changeset/73566
http://trac.webkit.org/changeset/73566 might have broken GTK Linux 32-bit Release The following tests are not passing: fast/css/input-search-padding.html fast/css/text-input-with-webkit-border-radius.html fast/forms/box-shadow-override.html fast/forms/control-restrict-line-height.html fast/forms/input-appearance-height.html fast/forms/placeholder-pseudo-style.html fast/forms/placeholder-set-value.html fast/forms/search-cancel-button-style-sharing.html fast/forms/search-placeholder-value-changed.html fast/forms/search-rtl.html fast/forms/search-styled.html fast/forms/search-transformed.html fast/forms/search-vertical-alignment.html fast/forms/search-zoomed.html fast/forms/searchfield-heights.html