49714 – Yensign hack should work with Shift_JIS encoding

RESOLVED FIXED49714

Yensign hack should work with Shift_JIS encoding

https://bugs.webkit.org/show_bug.cgi?id=49714

Summary Yensign hack should work with Shift_JIS encoding

Kent Tamura

Reported 2010-11-17 22:47:35 PST

LayoutTests/editing/selection/find-yensign-backslash.html doesn't pass or does pass unexpectedly on non-Mac platforms because - Shift_JIS_X0213-2000 is available only with TextCodecMac, and TextCodecICU doesn't provide it. - TextEncoding::backslashAsCurrencySymbol() doesn't take care of a case that atomicCanonicalTextEncodingName() result is 0.

Attachments
Patch (5.67 KB, patch) 2010-11-17 22:52 PST, Kent Tamura	no flags	Details Formatted Diff Diff
Patch 2 (6.95 KB, patch) 2010-11-18 23:15 PST, Kent Tamura	no flags	Details Formatted Diff Diff
Patch 3 (6.81 KB, patch) 2010-11-18 23:36 PST, Kent Tamura	no flags	Details Formatted Diff Diff
Patch 4 (12.68 KB, patch) 2010-11-22 01:37 PST, Kent Tamura	no flags	Details Formatted Diff Diff
Patch 5 (12.71 KB, patch) 2010-11-22 02:55 PST, Kent Tamura	no flags	Details Formatted Diff Diff
Patch 6 (12.85 KB, patch) 2010-11-22 02:59 PST, Kent Tamura	ap: review+	Details Formatted Diff Diff
Show Obsolete (5) View All Add attachment proposed patch, testcase, etc.

Kent Tamura

Comment 1 2010-11-17 22:52:33 PST

Created attachment 74205 [details] Patch

Tony Chang

Comment 2 2010-11-18 09:37:59 PST

Comment on attachment 74205 [details] Patch Out of curiosity, does TextCodecMac have both Shift_JIS and Shift_JIS_X0213-2000?

Alexey Proskuryakov

Comment 3 2010-11-18 10:53:02 PST

Comment on attachment 74205 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=74205&action=review Note that there are other places where we only check for the longer name - e.g. TextEncoding::backslashAsCurrencySymbol(). > Out of curiosity, does TextCodecMac have both Shift_JIS and Shift_JIS_X0213-2000? Yes. The latter doesn't come from ICU, but from mac-encodings.txt. I'd really expect a reviewer to know or at least research the answer to such a question before giving r+. > WebCore/platform/text/TextEncoding.cpp:207 > + // Shift_JIS_X0213-2000 is not the same encoding as Shift_JIS on Mac. We > + // need to check both of them. This is a short line, it shouldn't be split into two. > WebCore/platform/text/TextEncoding.cpp:210 > + static const char* const shiftJis = atomicCanonicalTextEncodingName("Shift_JIS"); > + static const char* const shiftJis2000 = atomicCanonicalTextEncodingName("Shift_JIS_X0213-2000"); > + static const char* const eucJp = atomicCanonicalTextEncodingName("EUC-JP"); Coding style mistake - it's "Shift_JIS", not "Shift_Jis", so variables should be named accordingly. Ditto for EUC-JP. > WebCore/platform/text/TextEncoding.cpp:211 > + return ((shiftJis && m_name == shiftJis) || (shiftJis2000 && m_name == shiftJis2000) || (eucJp && m_name == eucJp)) ? 0x00A5 : '\\'; I'm not really sure how performance critical this function is, but it's certainly invoked a lot, so I wouldn't be adding unnecessary null checks.

Tony Chang

Comment 4 2010-11-18 11:13:05 PST

(In reply to comment #3) > > WebCore/platform/text/TextEncoding.cpp:211 > > + return ((shiftJis && m_name == shiftJis) || (shiftJis2000 && m_name == shiftJis2000) || (eucJp && m_name == eucJp)) ? 0x00A5 : '\\'; > > I'm not really sure how performance critical this function is, but it's certainly invoked a lot, so I wouldn't be adding unnecessary null checks. I think this is a correctness check. If the m_name is NULL and shiftJis2000 is NULL, we'll return the wrong value. I suppose you could reduce the number of comparisons by just NULL checking m_name.

Alexey Proskuryakov

Comment 5 2010-11-18 11:19:45 PST

It were null checks for Shift_JIS and EUC-JP that caught my eye - every reasonable back-end should have those. Perhaps Shift_JIS_X0213-2000 could just go inside #if PLATFORM(MAC). My recollection is that it's there simply because it was supported in Safari prior to switching to ICU, and no one had the gut to remove it.

Shinichiro Hamaji

Comment 6 2010-11-18 12:58:01 PST

As this hack is for compatibilities with IE, I think we also want other encodings with which Japanese fonts will be used. Specifically, I'd add ISO-2022-JP and ISO-2022-JP-2. Could you add them in this chance? I think we can use a HashSet like TextEncoding::isJapanese.

Kent Tamura

Comment 7 2010-11-18 22:41:16 PST

(In reply to comment #4) > I suppose you could reduce the number of comparisons by just NULL checking m_name. It's a good idea.

Kent Tamura

Comment 8 2010-11-18 22:49:34 PST

(In reply to comment #6) > As this hack is for compatibilities with IE, I think we also want other encodings with which Japanese fonts will be used. Specifically, I'd add ISO-2022-JP and ISO-2022-JP-2. Could you add them in this chance? I think we can use a HashSet like TextEncoding::isJapanese. I tested some encoding names with IE. IE showed yensign for 0x5c in the following encoding names: EUC-JP Shift_JIS, Windows-31J, x-sjis Shift_JIS_X0213-2000 x-mac-japanese ISO-2022-JP IE showed a backslash: cp932 JIS_X0201 JIS_X0208-1983 JIS_X0208-1990 JIS_X0212-1990 JIS_C6226-1978 ISO-2022-JP-1 ISO-2022-JP-2 ISO-2022-JP-3

Kent Tamura

Comment 9 2010-11-18 23:15:19 PST

Created attachment 74356 [details] Patch 2

Kent Tamura

Comment 10 2010-11-18 23:36:36 PST

Created attachment 74357 [details] Patch 3

Alexey Proskuryakov

Comment 11 2010-11-18 23:42:17 PST

Comment on attachment 74357 [details] Patch 3 View in context: https://bugs.webkit.org/attachment.cgi?id=74357&action=review > WebCore/platform/text/TextEncoding.cpp:204 > + DEFINE_STATIC_LOCAL(HashSet<const char*>, set, ()); Can this code be run from a secondary thread?

Kent Tamura

Comment 12 2010-11-18 23:47:19 PST

Comment on attachment 74357 [details] Patch 3 View in context: https://bugs.webkit.org/attachment.cgi?id=74357&action=review >> WebCore/platform/text/TextEncoding.cpp:204 >> + DEFINE_STATIC_LOCAL(HashSet<const char*>, set, ()); > > Can this code be run from a secondary thread? This code is not MT-safe. Is TextEncoding class used by multiple threads? TextEncoding::isJapanese() has the same issue.

Kent Tamura

Comment 13 2010-11-19 00:11:23 PST

(In reply to comment #12) > >> WebCore/platform/text/TextEncoding.cpp:204 > >> + DEFINE_STATIC_LOCAL(HashSet<const char*>, set, ()); > > > > Can this code be run from a secondary thread? > > This code is not MT-safe. Is TextEncoding class used by multiple threads? > TextEncoding::isJapanese() has the same issue. If the main thread always creates a TextEncoding instance and calls TextEncoding::isJapanese() before a secondary thread is created, we have no thread-safety issue.

Alexey Proskuryakov

Comment 14 2010-11-19 00:40:52 PST

I remember that TextEncodingRegistry had multi-threading issues, but don't remember which exactly. I'm not sure about these particular TextCodec functions. At the very least, they need isMainThread() assertions. One direction to try would be checking svn blame for encodingRegistryMutex to see why it was added.

Kent Tamura

Comment 15 2010-11-22 01:36:03 PST

(In reply to comment #14) > One direction to try would be checking svn blame for encodingRegistryMutex to see why it was added. I checked Bug 22341, and found no detail of the reason :-) Anyway, my "Patch 3" is wrong. If no extended encodings are used before starting workers and then extended encodings are used in the workers, multiple threads can enter the HashSet initialization path. I tried AtomicallyInitializedStatic(), but it produced a build error about global destructor. So I decided to move these HashSets to TextEncodingRegistry.

Kent Tamura

Comment 16 2010-11-22 01:37:01 PST

Created attachment 74532 [details] Patch 4

Alexey Proskuryakov

Comment 17 2010-11-22 01:42:47 PST

> I checked Bug 22341, and found no detail of the reason :-) Well, it explains that we were getting there through KURL in workers.

Alexey Proskuryakov

Comment 18 2010-11-22 01:49:23 PST

Comment on attachment 74532 [details] Patch 4 + IE shows a yensign for 0x5c code point encoded in x-mac-japanese, + ISO-2022-JP, EUC-JP, Shift_JIS, Shift_JIS_X0213-2000, x-sjis, and + Windows-31J. We follow it. This explanation is misleading. IE chooses a default font based on encoding, and some Windows fonts have a yen sign in place of backslash. We need much more complicated logic because fonts on other platforms are different, and don't have this quirk.

Kent Tamura

Comment 19 2010-11-22 02:55:40 PST

Created attachment 74535 [details] Patch 5

Kent Tamura

Comment 20 2010-11-22 02:59:08 PST

Created attachment 74536 [details] Patch 6

Kent Tamura

Comment 21 2010-11-22 02:59:45 PST

(In reply to comment #18) > (From update of attachment 74532 [details]) > + IE shows a yensign for 0x5c code point encoded in x-mac-japanese, > + ISO-2022-JP, EUC-JP, Shift_JIS, Shift_JIS_X0213-2000, x-sjis, and > + Windows-31J. We follow it. > > This explanation is misleading. IE chooses a default font based on encoding, and some Windows fonts have a yen sign in place of backslash. We need much more complicated logic because fonts on other platforms are different, and don't have this quirk. Ok, I have updated ChangeLog.

Alexey Proskuryakov

Comment 22 2010-12-06 20:55:28 PST

Comment on attachment 74536 [details] Patch 6 View in context: https://bugs.webkit.org/attachment.cgi?id=74536&action=review > WebCore/ChangeLog:10 > + IE chooses a font which shows a yensign for 0x5c code point for a page > + encoded in x-mac-japanese, ISO-2022-JP, EUC-JP, Shift_JIS, Shift_JIS_X0213-2000, > + x-sjis, and Windows-31J. I thought IE didn't support Shift_JIS_X0213-2000. > WebCore/ChangeLog:16 > + Also, we move the HashSet initialization for isJapanese() and > + backslashAsCurrencySymbol() to TextEncodingRegistry.cpp because of > + ease of making them multi-thread safe. This is not a great reason - TextEncodingRegistry does not and should not care about backslashes. > WebCore/platform/text/TextEncoding.cpp:166 > bool TextEncoding::isJapanese() const This function is misplaced - all it is needed for is to make a decision on whether to invoke charset detection. So it should be in TextResourceDecoder (not something for this patch, of course).

Kent Tamura

Comment 23 2010-12-08 16:05:27 PST

(In reply to comment #22) > (From update of attachment 74536 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=74536&action=review > > > WebCore/ChangeLog:10 > > + IE chooses a font which shows a yensign for 0x5c code point for a page > > + encoded in x-mac-japanese, ISO-2022-JP, EUC-JP, Shift_JIS, Shift_JIS_X0213-2000, > > + x-sjis, and Windows-31J. > > I thought IE didn't support Shift_JIS_X0213-2000. I was surprised when I saw IE switched the default font for it. I guess IE uses prefix-matching for encoding names. > > WebCore/ChangeLog:16 > > + Also, we move the HashSet initialization for isJapanese() and > > + backslashAsCurrencySymbol() to TextEncodingRegistry.cpp because of > > + ease of making them multi-thread safe. > > This is not a great reason - TextEncodingRegistry does not and should not care about backslashes. However, the encoding names for backslashAsCurrencySymbol() depend on TextEncodingRegistry. If one added new Japanese encoding to TextEncodingRegistry, he/she might need to add it to nonBackslashEncodings. Anyway, I also feel the new code is not good. I'll add a FIXME comment.

Kent Tamura

Comment 24 2010-12-08 16:55:18 PST

Landed: http://trac.webkit.org/changeset/73566

WebKit Review Bot

Comment 25 2010-12-09 00:34:46 PST

http://trac.webkit.org/changeset/73566 might have broken GTK Linux 32-bit Release The following tests are not passing: fast/css/input-search-padding.html fast/css/text-input-with-webkit-border-radius.html fast/forms/box-shadow-override.html fast/forms/control-restrict-line-height.html fast/forms/input-appearance-height.html fast/forms/placeholder-pseudo-style.html fast/forms/placeholder-set-value.html fast/forms/search-cancel-button-style-sharing.html fast/forms/search-placeholder-value-changed.html fast/forms/search-rtl.html fast/forms/search-styled.html fast/forms/search-transformed.html fast/forms/search-vertical-alignment.html fast/forms/search-zoomed.html fast/forms/searchfield-heights.html

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware PC

OS All

Product WebKit

Component HTML Editing

Assignee

Nobody

Reported

2010-11-17 22:47 PST

Modified

2010-12-09 00:34 PST History

CC List

6 users Show

URL

http://crbug.com/38653

Keywords

Depends on

Blocks