16548 – REGRESSION(r28810): Font style and sizes are weird for Japanese text

RESOLVED FIXED 16548

https://bugs.webkit.org/show_bug.cgi?id=16548

Louise

Reported 2007-12-20 19:22:54 PST

Font sizes look different for Japanese Hiragana compared to Chinese characters and alphabets. The title of the page on Safari's window has the same problem.

Reproducible: Always

Steps to Reproduce:
Go to the above URL (or any site with Japanese text)

Tested with Safari 3.0.4 (523.13)
Windows XP SP2

Attachments
Screenshot taken with Safari 3.0.4 (523.13) (45.33 KB, image/png) 2007-12-20 19:24 PST, Louise, no flags
Screenshot taken with Firefox (13.66 KB, image/png) 2007-12-20 19:24 PST, Louise, no flags
test code (not patch) corresponds to screenshots below (1.24 KB, text/plain) 2007-12-21 02:51 PST, 808caaa4.8ce9.9cd6c799e9f6, no flags
screenshots, result samples, with CP_ACP (16.30 KB, image/gif) 2007-12-21 02:52 PST, 808caaa4.8ce9.9cd6c799e9f6, no flags
screenshots, result samples, with CP=932(Japanese) (16.26 KB, image/gif) 2007-12-21 02:52 PST, 808caaa4.8ce9.9cd6c799e9f6, no flags
Use FontLink\SystemLink registry values to map fallback fonts (8.35 KB, patch) 2007-12-24 22:47 PST, mitz, no flags
Use FontLink\SystemLink registry values to map fallback fonts (10.40 KB, patch) 2008-01-02 08:25 PST, mitz, mitz: review-
Use FontLink\SystemLink registry values to map fallback fonts (9.15 KB, patch) 2008-01-03 17:37 PST, mitz, darin: review+

Louise

Comment 1 2007-12-20 19:24:03 PST

Created attachment 18023 [details] Screenshot taken with Safari 3.0.4 (523.13)

Louise

Comment 2 2007-12-20 19:24:23 PST

Created attachment 18024 [details] Screenshot taken with Firefox

Louise

Comment 3 2007-12-20 20:09:11 PST

This is a regression from r28810. Previous webkits didn't have this problem. The font style for Japanese text typed in Safari has changed from previous webkits (in text input, search box) and they are somewhat difficult to read now.

mitz

Comment 4 2007-12-20 20:20:08 PST

<rdar://problem/5659452>

mitz

Comment 5 2007-12-20 20:32:57 PST

Need to devise a better heuristic for picking fallback fonts for CJK characters that is consistent but doesn't favor the Chinese font everywhere.

808caaa4.8ce9.9cd6c799e9f6

Comment 6 2007-12-21 02:51:10 PST

Created attachment 18028 [details] test code (not patch) corresponds to screenshots below

808caaa4.8ce9.9cd6c799e9f6

Comment 7 2007-12-21 02:52:00 PST

Created attachment 18029 [details] screenshots, result samples, with CP_ACP

808caaa4.8ce9.9cd6c799e9f6

Comment 8 2007-12-21 02:52:53 PST

Created attachment 18030 [details] screenshots, result samples, with CP=932(Japanese)

808caaa4.8ce9.9cd6c799e9f6

Comment 9 2007-12-21 03:00:33 PST

In my environment:
- CodePageToCodePages(CP_ACP,...) returns E_FAIL, and acpCodePages remains zero.
- GetStrCodePages(<JapaneseUNICODEStr>,...) shows 0x1e0000 == Japanese | ChineseSimplified | LangKorean | ChineseTraditional

so finally codePage == simplifiedChineseCP, *even when* 932 (Japanese) is specified to CodePageToCodePages().

Screenshots from my environment are attached.
1st line is 'apple'.
2nd line is the hiragana form of 'apple'.
3rd line is ASCII art containing hankaku-hiragana.
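The 0x1e0000 mask reported above can be decoded by hand. The following is a minimal sketch, assuming the MLang code-page bits match the FS_* font-signature constants from wingdi.h (the values are copied into the sketch so it builds without the Windows SDK). The helper name cjkCodePagesIn and the label strings are hypothetical and are not part of WebKit or MLang.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Code-page bit flags, assumed to match the FS_* constants in wingdi.h.
struct CodePageFlag {
    std::uint32_t bit;
    const char* name;
};

static const CodePageFlag kCJKFlags[] = {
    { 0x00020000u, "Japanese (932)" },            // FS_JISJAPAN
    { 0x00040000u, "Simplified Chinese (936)" },  // FS_CHINESESIMP
    { 0x00080000u, "Korean (949)" },              // FS_WANSUNG
    { 0x00100000u, "Traditional Chinese (950)" }, // FS_CHINESETRAD
};

// Hypothetical helper: names of the CJK code pages whose bits are set in mask.
std::vector<std::string> cjkCodePagesIn(std::uint32_t mask)
{
    std::vector<std::string> names;
    for (const CodePageFlag& flag : kCJKFlags) {
        if (mask & flag.bit)
            names.push_back(flag.name);
    }
    return names;
}
```

Under that assumption, cjkCodePagesIn(0x1e0000) lists all four CJK code pages, which is why a caller that cannot narrow the mask has no basis for preferring the Japanese font.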

mitz

Comment 10 2007-12-21 07:36:19 PST

One limitation is that system fallback font selection in WebKit is done on a character-by-character basis. GetStrCodePages will always be passed a single character.

808caaa4.8ce9.9cd6c799e9f6

Comment 11 2007-12-21 20:20:35 PST

I think the meaning of the current implementation (getFontDataForCharacters() in WebCore/platform/graphics/win/FontCacheWin.cpp) is: if a UNICODE character CAN BE Simplified Chinese (CP936), it is treated as a Simplified Chinese character.

Currently a great many characters, not only Kanji but also alphanumeric characters, are detected as (Chinese | Japanese | Korean | more...) by mlang.dll and rendered with unfamiliar fonts (system fonts for Chinese, which are forcibly preinstalled on Japanese NT too, as specified by windows/inf/intl.inf).

I don't know yet how to solve this gently, with consideration for international use of WebKit. I'll think it over while doing housework today....

The quickest *temporary* hack is to exclude Japanese installations, which is simple to check:
    if (GetACP()==932){ /* is Japanese */}
so a hack like:
    if (/*TEMPHACK*/GetACP()!=932 && actualCodePages && ...
Some anonymous testers say this hack avoids the current problem.

808caaa4.8ce9.9cd6c799e9f6

Comment 12 2007-12-21 20:21:42 PST

/*
Correction to comment #9: not hankaku-hiragana, but hankaku-kana.
Yes, those examples were with strings. Checking single characters gave the same result.
---
D:\works>cptest A
acpCodePages: 0(80004005),actualCodePages: 1F01FF, cchActual: 1,finalCodePage: 936

D:\works>cptest <one-hiragana-char>
acpCodePages: 0(80004005),actualCodePages: E0000, cchActual: 1,finalCodePage: 936

D:\works>
---
*/

mitz

Comment 13 2007-12-21 22:45:42 PST

(In reply to comment #11)
> The quickest *temporary* hack is exclude Japanese installation, simply
> checkable:
> if (GetACP()==932){ /* is Japanese */}
> so hacks like:
> if (/*TEMPHACK*/GetACP()!=932 && actualCodePages && ...

I believe the above will still break if you are on a Japanese installation once you go to a Chinese page. We will then ask MLang to map from codepage 936 alone, so it will return a Chinese font, and subsequently it will return the same font for all characters, even those that are both in 932 and in 936. This inconsistent behavior and dependency on the order of operations is what r28810 was trying to prevent.

The system code page should probably factor into the font linking process, but hopefully there is a way for that to happen that is internal to MLang or another Windows API without having to query it explicitly in WebKit.

Perhaps hiding information from MLang is the way to go: when a character belongs to multiple code pages, if one of them is the system code page ask only about that code page. Then figure out what to do if none of the multiple code pages is the system code page (as is the case on an English installation).

I think Mac OS X uses font traits to pick the fallback font.
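The "hiding information from MLang" idea can be sketched as a small mask-narrowing step. This is only an illustration of the suggestion in this comment, not code from any patch on this bug. The function name codePagesToAskAbout is hypothetical, and returning the full mask when nothing overlaps is a placeholder for the case the comment leaves open.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper illustrating the idea in this comment.
// actualCodePages: bit mask of every code page that contains the character.
// systemCodePages: bit mask of the system code page (0 if it is unknown).
std::uint32_t codePagesToAskAbout(std::uint32_t actualCodePages,
                                  std::uint32_t systemCodePages)
{
    // If the character belongs to the system code page, ask only about that
    // code page so the answer does not depend on the other candidates.
    std::uint32_t shared = actualCodePages & systemCodePages;
    if (shared)
        return shared;

    // Open question from the comment: no candidate is the system code page
    // (for example on an English installation). This placeholder returns the
    // mask unchanged.
    return actualCodePages;
}
```

With a Japanese system mask of 0x20000 and a character mask of 0x1e0000, the sketch narrows the query to 0x20000. With an English system mask the CJK mask passes through unchanged, which is exactly the unresolved case.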

mitz

Comment 14 2007-12-23 13:32:18 PST

I have been looking at how IE behaves on different language versions of Windows. Using two UTF-8 encoded HTML files with no locale metadata and no style information, one containing text from MSN in Chinese and the other containing text from Google search results in Japanese, I have observed the following:

* On the English and Chinese (cn) installs
  - Chinese was rendered using a single "Chinese"-looking font (probably Simsun).
  - Japanese was rendered using a mixture of two fonts, the "Chinese" font for some characters and a "Japanese" font (probably MS PGothic) for others.
* On the Japanese install
  - Chinese was rendered using a mixture of two fonts, a "Chinese" one and a "Japanese" one.
  - Japanese was rendered using a single "Japanese" font.
* On the Chinese (zh_HK) install
  - Japanese looked like on the (cn) install.
  - Chinese used mostly the "Chinese" font but a few characters were rendered using a "Japanese"-looking font.

I have not tested it, but IE might perform better when it can infer the language from the encoding or metadata. Font fallback in WebKit is per-character and cannot be specific to a document, so I think ideas that involve context or metadata are not the right answer.

The use of code pages on Windows is what leads to the "mixed fonts" behavior. I think the whole notion of code page should be avoided in WebKit, just like on the Mac. The other thing that helps on the Mac is that font fallback tries to maintain font traits, so for example even though the google.co.jp style sheet does not specify any font family that has CJK characters on Leopard, since fallback is from a sans-serif font, the system hands back a "Japanese"-looking font.

As far as I could tell, Windows font fallback mechanisms do not try to match traits. However, it might still be possible to at least fall back on the appropriate font for the installed language by using the registry keys that GDI uses for its internal font fallback.

mitz

Comment 15 2007-12-24 22:47:08 PST

Created attachment 18104 [details] Use FontLink\SystemLink registry values to map fallback fonts

Not using code pages and MapFont. On an English XP, Japanese is rendered consistently in MS UI Gothic, but Chinese uses a mixture of MS UI Gothic and Simsun. I have not tested on other systems yet, but I expect Japanese Vista to behave the same, and Chinese Vista to use Simsun exclusively for both languages.
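To illustrate the kind of data a FontLink\SystemLink lookup has to scan, here is a portable sketch that parses a multi-string buffer without touching the registry. It assumes the values are REG_MULTI_SZ data (strings separated by a null character, with an extra null ending the list) and that each entry has the form "FILE,Face Name" with optional extra comma-separated fields. Both are assumptions about the data layout, not something stated in this bug, and splitMultiString and faceNameFromEntry are hypothetical names, not functions from the attached patch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Splits a multi-string buffer into its entries. The explicit length bound
// keeps the scan inside the buffer even if the final terminator is missing.
std::vector<std::wstring> splitMultiString(const wchar_t* data, std::size_t length)
{
    std::vector<std::wstring> entries;
    std::size_t i = 0;
    while (i < length && data[i] != L'\0') {
        std::size_t start = i;
        while (i < length && data[i] != L'\0')
            ++i;
        entries.emplace_back(data + start, i - start);
        if (i < length)
            ++i; // Skip the terminator of this entry.
    }
    return entries;
}

// Returns the second comma-separated field of an entry (assumed to be the
// face name), or an empty string if the entry has no comma.
std::wstring faceNameFromEntry(const std::wstring& entry)
{
    std::size_t comma = entry.find(L',');
    if (comma == std::wstring::npos)
        return std::wstring();
    std::size_t end = entry.find(L',', comma + 1);
    if (end == std::wstring::npos)
        return entry.substr(comma + 1);
    return entry.substr(comma + 1, end - comma - 1);
}
```

On Windows the buffer would come from a registry read. The sketch takes a pointer and a length instead so that the loop limit, the part that is easy to get wrong, is visible and testable on its own.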

808caaa4.8ce9.9cd6c799e9f6

Comment 16 2007-12-25 02:47:53 PST

Future extension: what about first trying a WebKit-local font link list (preferences in a plist, the registry, ...) before querying the SystemLink key, as a further implementation step? The SystemLink values cannot be modified by members of the 'Users' group.

mitz

Comment 17 2007-12-30 19:03:29 PST

(In reply to comment #15)
> I have not tested on other systems yet, but I expect Japanese Vista to
> behave the same, and Chinese Vista to use Simsun exclusively for both
> languages.

Confirmed.

mitz

Comment 18 2008-01-02 08:25:45 PST

Created attachment 18238 [details] Use FontLink\SystemLink registry values to map fallback fonts

Cleaned up and added a change log. Not sure this is the best/correct approach.

Darin Adler

Comment 19 2008-01-02 09:29:30 PST

Comment on attachment 18238 [details] Use FontLink\SystemLink registry values to map fallback fonts

This looks good. r=me

mitz

Comment 20 2008-01-02 14:38:21 PST

Comment on attachment 18238 [details] Use FontLink\SystemLink registry values to map fallback fonts

I found an error in the loop that scans the registry key value and a few layout test failures.

mitz

Comment 21 2008-01-03 13:42:02 PST

(In reply to comment #20)
> layout test failures

Some tests that were using Ahem were failing because FontCache::getFontDataForCharacters() returned 0 when the primary font was Ahem and the character was a zero width space. Here is a list of characters for which Uniscribe says that it will use Ahem even though Ahem does not have a glyph for them:

U+070F SYRIAC ABBREVIATION MAKR
U+180B MONGOLIAN FREE VARIATION SELECTOR ONE
U+180C MONGOLIAN FREE VARIATION SELECTOR TWO
U+180D MONGOLIAN FREE VARIATION SELECTOR THREE
U+180E MONGOLIAN VOWEL SEPARATOR
U+180F
U+2000 EN QUAD
U+2001 EM QUAD
U+2002 EN SPACE
U+2003 EM SPACE
U+2004 THREE-PER-EM SPACE
U+2005 FOUR-PER-EM SPACE
U+2006 SIX-PER-EM SPACE
U+2007 FIGURE SPACE
U+2008 PUNCTUATION SPACE
U+2009 THIN SPACE
U+200A HAIR SPACE
U+200B ZERO WIDTH SPACE
U+200C ZERO WIDTH NON-JOINER
U+200D ZERO WIDTH JOINER

mitz

Comment 22 2008-01-03 13:43:56 PST

U+070F SYRIAC ABBREVIATION MARK (fixed typo in case anyone ever searches Bugzilla for this string).

mitz

Comment 23 2008-01-03 14:02:23 PST

For U+070F, U+180B, U+180C, U+180D, U+180E, U+180F, the current code calls MapFont, and the call fails, so next it tries the Uniscribe-metafile method, which as mentioned above returns "Ahem", and that is what is returned to the caller. It seems like U+2000..U+200D are really the only characters that should be treated specially.

mitz

Comment 24 2008-01-03 17:37:31 PST

Created attachment 18258 [details] Use FontLink\SystemLink registry values to map fallback fonts

Corrected the loop limit and added a special case for characters in the range U+2000..U+200F.
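The range test behind such a special case is small enough to sketch. The thread does not show what the patch does with characters in this range, so the sketch below covers only the range check itself. The function name isInSpecialCasedRange is hypothetical.

```cpp
#include <cassert>

// Hypothetical predicate: true for code points in U+2000..U+200F, the range
// named in this comment. What the caller does with such characters is not
// shown in this thread and is deliberately left out.
bool isInSpecialCasedRange(char32_t character)
{
    return character >= 0x2000 && character <= 0x200F;
}
```

For example, U+200B ZERO WIDTH SPACE, the character that triggered the Ahem test failures, falls inside the range, while U+2010 does not.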

mitz

Comment 25 2008-01-03 17:41:50 PST

Comment on attachment 18258 [details] Use FontLink\SystemLink registry values to map fallback fonts

I think I do not need this:
+#include "CharacterNames.h"

Darin Adler

Comment 26 2008-01-03 17:47:26 PST

Comment on attachment 18258 [details] Use FontLink\SystemLink registry values to map fallback fonts

r=me

mitz

Comment 27 2008-01-03 18:06:44 PST

Landed in <http://trac.webkit.org/projects/webkit/changeset/29140>.

Status RESOLVED

Resolution FIXED

Priority P1

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware PC

OS Windows XP

Product WebKit

Component Text

Assignee

Nobody

Reported

2007-12-20 19:22 PST

Modified

2008-01-03 18:06 PST

CC List

2 users

URL

http://www.google.co.jp/search?q=safari&ie=UTF-8&hl=ja&lr=lang_ja

Keywords InRadar, Regression

Depends on

Blocks