WebKit Bugzilla
RESOLVED FIXED
16548
REGRESSION(r28810): Font style and sizes are weird for Japanese text
https://bugs.webkit.org/show_bug.cgi?id=16548
Louise
Reported
2007-12-20 19:22:54 PST
Font sizes look different for Japanese Hiragana compared to Chinese characters and alphabetic characters. The page title shown in Safari's window has the same problem.

Reproducible: Always

Steps to Reproduce:
Go to the above URL (or any site with Japanese text).

Tested with Safari 3.0.4 (523.13) on Windows XP SP2.
Attachments
Screenshot taken with Safari 3.0.4 (523.13) (45.33 KB, image/png), 2007-12-20 19:24 PST, Louise, no flags
Screenshot taken with Firefox (13.66 KB, image/png), 2007-12-20 19:24 PST, Louise, no flags
test code (not patch) corresponds to screenshots below (1.24 KB, text/plain), 2007-12-21 02:51 PST, 808caaa4.8ce9.9cd6c799e9f6, no flags
screenshots, result samples, with CP_ACP (16.30 KB, image/gif), 2007-12-21 02:52 PST, 808caaa4.8ce9.9cd6c799e9f6, no flags
screenshots, result samples, with CP=932(Japanese) (16.26 KB, image/gif), 2007-12-21 02:52 PST, 808caaa4.8ce9.9cd6c799e9f6, no flags
Use FontLink\SystemLink registry values to map fallback fonts (8.35 KB, patch), 2007-12-24 22:47 PST, mitz, no flags
Use FontLink\SystemLink registry values to map fallback fonts (10.40 KB, patch), 2008-01-02 08:25 PST, mitz, mitz: review-
Use FontLink\SystemLink registry values to map fallback fonts (9.15 KB, patch), 2008-01-03 17:37 PST, mitz, darin: review+
Louise
Comment 1
2007-12-20 19:24:03 PST
Created attachment 18023
Screenshot taken with Safari 3.0.4 (523.13)
Louise
Comment 2
2007-12-20 19:24:23 PST
Created attachment 18024
Screenshot taken with Firefox
Louise
Comment 3
2007-12-20 20:09:11 PST
This is a regression from r28810. Previous WebKit builds didn't have this problem. The font style for Japanese text typed in Safari (in text inputs and the search box) has also changed from previous builds, and the text is now somewhat difficult to read.
mitz
Comment 4
2007-12-20 20:20:08 PST
<rdar://problem/5659452>
mitz
Comment 5
2007-12-20 20:32:57 PST
Need to devise a better heuristic for picking fallback fonts for CJK characters that is consistent but doesn't favor the Chinese font everywhere.
808caaa4.8ce9.9cd6c799e9f6
Comment 6
2007-12-21 02:51:10 PST
Created attachment 18028
test code (not patch) corresponds to screenshots below
808caaa4.8ce9.9cd6c799e9f6
Comment 7
2007-12-21 02:52:00 PST
Created attachment 18029
screenshots, result samples, with CP_ACP
808caaa4.8ce9.9cd6c799e9f6
Comment 8
2007-12-21 02:52:53 PST
Created attachment 18030
screenshots, result samples, with CP=932(Japanese)
808caaa4.8ce9.9cd6c799e9f6
Comment 9
2007-12-21 03:00:33 PST
In my environment:
- CodePageToCodePages(CP_ACP,...) returns E_FAIL, and acpCodePages remains zero.
- GetStrCodePages(<JapaneseUNICODEStr>,...) reports 0x1e0000 == Japanese | ChineseSimplified | LangKorean | ChineseTraditional
so in the end codePage == simplifiedChineseCP, *even when* 932 (Japanese) is passed to CodePageToCodePages().

Screenshots from my environment are attached. The 1st line is 'apple'. The 2nd line is the hiragana form of 'apple'. The 3rd line is ASCII art that contains hankaku-hiragana.
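For readers decoding the bitmasks reported in this thread, here is a minimal sketch that names the CJK bits in such a value. It assumes the MLang code-page bits have the same values as the FS_* constants in wingdi.h, which is consistent with the 0x1e0000 breakdown given above. The constants are redeclared so the sketch compiles without Windows headers, and `cjkCodePagesIn` is a hypothetical helper name, not something from WebKit or MLang.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Code-page bit flags, assumed to match the FS_* constants in wingdi.h.
// Redeclared here so the sketch does not depend on Windows headers.
struct CodePageFlag {
    std::uint32_t bit;
    const char* name;
};

static const CodePageFlag kCjkFlags[] = {
    {0x00020000u, "Japanese"},           // FS_JISJAPAN, code page 932
    {0x00040000u, "ChineseSimplified"},  // FS_CHINESESIMP, code page 936
    {0x00080000u, "Korean"},             // FS_WANSUNG, code page 949
    {0x00100000u, "ChineseTraditional"}, // FS_CHINESETRAD, code page 950
};

// Hypothetical helper: returns the names of the CJK code pages whose
// bits are set in the given mask, in the order of the table above.
std::vector<std::string> cjkCodePagesIn(std::uint32_t mask)
{
    std::vector<std::string> names;
    for (const CodePageFlag& flag : kCjkFlags) {
        if (mask & flag.bit)
            names.push_back(flag.name);
    }
    return names;
}
```

Under that assumption, 0x1e0000 sets all four CJK bits, so a character with that mask is a candidate for any of the four code pages.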
mitz
Comment 10
2007-12-21 07:36:19 PST
One limitation is that system fallback font selection in WebKit is done on a character-by-character basis. GetStrCodePages will always be passed a single character.
808caaa4.8ce9.9cd6c799e9f6
Comment 11
2007-12-21 20:20:35 PST
I think the current implementation (getFontDataForCharacters() in WebCore/platform/graphics/win/FontCacheWin.cpp) means that any Unicode character that CAN BE Simplified Chinese (CP936) is treated as a Simplified Chinese character. Currently a great many characters, not only Kanji but also alphanumeric characters, are detected as (Chinese | Japanese | Korean | more...) by mlang.dll and rendered with unfamiliar fonts (the system fonts for Chinese, which are force-preinstalled on Japanese NT too, as specified by windows/inf/intl.inf).

I don't yet know how to solve this gently, with consideration for international use of WebKit. I'll think it over while doing housework today....

The quickest *temporary* hack is to exclude Japanese installations, which are simple to check for:
if (GetACP()==932){ /* is Japanese */}
so the hack looks like:
if (/*TEMPHACK*/GetACP()!=932 && actualCodePages && ...
Some anonymous testers say this hack avoids the current problem.
808caaa4.8ce9.9cd6c799e9f6
Comment 12
2007-12-21 20:21:42 PST
/* Re #9: that is hankaku-kana, not hankaku-hiragana.
Yes, those examples used strings. Checking single characters gave the same result.
---
D:\works>cptest A
acpCodePages: 0(80004005),actualCodePages: 1F01FF, cchActual: 1,finalCodePage: 936

D:\works>cptest <one-hiragana-char>
acpCodePages: 0(80004005),actualCodePages: E0000, cchActual: 1,finalCodePage: 936

D:\works>
---
*/
mitz
Comment 13
2007-12-21 22:45:42 PST
(In reply to comment #11)
> The quickest *temporary* hack is exclude Japanese installation, simply
> checkable:
> if (GetACP()==932){ /* is Japanese */}
> so hacks like:
> if (/*TEMPHACK*/GetACP()!=932 && actualCodePages && ...

I believe the above will still break if you are on a Japanese installation once you go to a Chinese page. We will then ask MLang to map from codepage 936 alone, so it will return a Chinese font, and subsequently it will return the same font for all characters, even those that are both in 932 and in 936. This inconsistent behavior and dependency on the order of operations is what r28810 was trying to prevent.

The system code page should probably factor into the font linking process, but hopefully there is a way for that to happen that is internal to MLang or another Windows API without having to query it explicitly in WebKit.

Perhaps hiding information from MLang is the way to go: when a character belongs to multiple code pages, if one of them is the system code page ask only about that code page. Then figure out what to do if none of the multiple code pages is the system code page (as is the case on an English installation).

I think Mac OS X uses font traits to pick the fallback font.
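The "hiding information from MLang" idea can be sketched as a small bit-mask filter. This only illustrates the proposal in this comment and is not code from any patch on this bug. `codePagesToQuery` and its parameters are hypothetical names, and the case where the system code page is not among the character's code pages is left as a pass-through because the comment leaves that question open.

```cpp
#include <cstdint>

// Hypothetical helper: decides which code-page bits to ask the font mapper about.
//   characterCodePages: bits for every code page that contains the character
//   systemCodePageBit:  the single bit for the system (ANSI) code page
std::uint32_t codePagesToQuery(std::uint32_t characterCodePages,
                               std::uint32_t systemCodePageBit)
{
    // If the system code page can represent the character, hide all other
    // candidates so the mapper only considers the system code page.
    if (characterCodePages & systemCodePageBit)
        return systemCodePageBit;

    // Open question in the comment above: no candidate matches the system
    // code page (e.g. CJK text on an English install). This sketch simply
    // passes the full set through unchanged as a placeholder.
    return characterCodePages;
}
```

For example, with the 0xE0000 mask reported for a hiragana character and a Japanese system bit of 0x20000 (assuming FS_*-style bit values), the helper returns 0x20000 alone, so the result no longer depends on which page was visited first.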
mitz
Comment 14
2007-12-23 13:32:18 PST
I have been looking at how IE behaves on different language versions of Windows. Using two UTF-8 encoded HTML files with no locale metadata and no style information, one containing text from MSN in Chinese and the other containing text from Google search results in Japanese, I have observed the following:

* On the English and Chinese (cn) installs
  - Chinese was rendered using a single "Chinese"-looking font (probably Simsun).
  - Japanese was rendered using a mixture of two fonts, the "Chinese" font for some characters and a "Japanese" font (probably MS PGothic) for others.
* On the Japanese install
  - Chinese was rendered using a mixture of two fonts, a "Chinese" one and a "Japanese" one.
  - Japanese was rendered using a single "Japanese" font.
* On the Chinese (zh_HK) install
  - Japanese looked like on the (cn) install.
  - Chinese used mostly the "Chinese" font but a few characters were rendered using a "Japanese"-looking font.

I have not tested it, but IE might perform better when it can infer the language from the encoding or metadata. Font fallback in WebKit is per-character and cannot be specific to a document, so I think ideas that involve context or metadata are not the right answer.

The use of code pages on Windows is what leads to the "mixed fonts" behavior. I think the whole notion of code page should be avoided in WebKit, just like on the Mac. The other thing that helps on the Mac is that font fallback tries to maintain font traits, so for example even though the google.co.jp style sheet does not specify any font family that has CJK characters on Leopard, since fallback is from a sans-serif font, the system hands back a "Japanese"-looking font.

As far as I could tell, Windows font fallback mechanisms do not try to match traits. However, it might still be possible to at least fall back on the appropriate font for the installed language by using the registry keys that GDI uses for its internal font fallback.
mitz
Comment 15
2007-12-24 22:47:08 PST
Created attachment 18104
Use FontLink\SystemLink registry values to map fallback fonts

Not using code pages and MapFont. On an English XP, Japanese is rendered consistently in MS UI Gothic, but Chinese uses a mixture of MS UI Gothic and Simsun. I have not tested on other systems yet, but I expect Japanese Vista to behave the same, and Chinese Vista to use Simsun exclusively for both languages.
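As a rough illustration of what reading a FontLink\SystemLink value involves, the sketch below splits REG_MULTI_SZ-style data into strings and extracts the face name from each entry. It is not the attached patch. It assumes entries look like "FILE.TTC,Face Name", optionally followed by further comma-separated fields, and it works on an in-memory buffer instead of calling the registry API. `splitMultiString` and `faceNameFromEntry` are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Splits REG_MULTI_SZ-style data (strings separated by L'\0', with an empty
// string marking the end of the list) into its component strings.
// Never reads at or beyond data[length].
std::vector<std::wstring> splitMultiString(const wchar_t* data, std::size_t length)
{
    std::vector<std::wstring> strings;
    std::size_t i = 0;
    while (i < length && data[i] != L'\0') {
        std::size_t start = i;
        while (i < length && data[i] != L'\0')
            ++i;
        strings.emplace_back(data + start, i - start);
        if (i < length)
            ++i; // Skip the terminator of the string just read.
    }
    return strings;
}

// Extracts the face name from an entry such as L"MSGOTHIC.TTC,MS UI Gothic".
// Returns an empty string when the entry has no face-name field.
std::wstring faceNameFromEntry(const std::wstring& entry)
{
    std::size_t firstComma = entry.find(L',');
    if (firstComma == std::wstring::npos)
        return std::wstring();
    std::size_t nextComma = entry.find(L',', firstComma + 1);
    if (nextComma == std::wstring::npos)
        return entry.substr(firstComma + 1);
    return entry.substr(firstComma + 1, nextComma - firstComma - 1);
}
```

The face names obtained this way would then be tried in order as fallback candidates for a character. Bounding every read by `length` matters because registry data is not guaranteed to be properly terminated.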
808caaa4.8ce9.9cd6c799e9f6
Comment 16
2007-12-25 02:47:53 PST
Future extension: as a further improvement, what about trying to fetch a WebKit-local font link list (prefs in a plist, the registry, ...) before querying the SystemLink key? These keys cannot be modified by users in the 'Users' group.
mitz
Comment 17
2007-12-30 19:03:29 PST
(In reply to comment #15)
> I have not tested on other systems yet, but I expect Japanese Vista to
> behave the same, and Chinese Vista to use Simsun exclusively for both
> languages.
Confirmed.
mitz
Comment 18
2008-01-02 08:25:45 PST
Created attachment 18238
Use FontLink\SystemLink registry values to map fallback fonts

Cleaned up and added a change log. Not sure this is the best/correct approach.
Darin Adler
Comment 19
2008-01-02 09:29:30 PST
Comment on attachment 18238
Use FontLink\SystemLink registry values to map fallback fonts

This looks good. r=me
mitz
Comment 20
2008-01-02 14:38:21 PST
Comment on attachment 18238
Use FontLink\SystemLink registry values to map fallback fonts

I found an error in the loop that scans the registry key value and a few layout test failures.
mitz
Comment 21
2008-01-03 13:42:02 PST
(In reply to comment #20)
> layout test failures

Some tests that were using Ahem were failing because FontCache::getFontDataForCharacters() returned 0 when the primary font was Ahem and the character was a zero width space. Here is a list of characters for which Uniscribe says that it will use Ahem even though Ahem does not have a glyph for them:

U+070F SYRIAC ABBREVIATION MAKR
U+180B MONGOLIAN FREE VARIATION SELECTOR ONE
U+180C MONGOLIAN FREE VARIATION SELECTOR TWO
U+180D MONGOLIAN FREE VARIATION SELECTOR THREE
U+180E MONGOLIAN VOWEL SEPARATOR
U+180F
U+2000 EN QUAD
U+2001 EM QUAD
U+2002 EN SPACE
U+2003 EM SPACE
U+2004 THREE-PER-EM SPACE
U+2005 FOUR-PER-EM SPACE
U+2006 SIX-PER-EM SPACE
U+2007 FIGURE SPACE
U+2008 PUNCTUATION SPACE
U+2009 THIN SPACE
U+200A HAIR SPACE
U+200B ZERO WIDTH SPACE
U+200C ZERO WIDTH NON-JOINER
U+200D ZERO WIDTH JOINER
mitz
Comment 22
2008-01-03 13:43:56 PST
U+070F SYRIAC ABBREVIATION MARK (fixed typo in case anyone ever searches Bugzilla for this string).
mitz
Comment 23
2008-01-03 14:02:23 PST
For U+070F, U+180B, U+180C, U+180D, U+180E, U+180F, the current code calls MapFont, and the call fails, so next it tries the Uniscribe-metafile method, which as mentioned above returns "Ahem", and that is what is returned to the caller. It seems like U+2000..U+200D are really the only characters that should be treated specially.
mitz
Comment 24
2008-01-03 17:37:31 PST
Created attachment 18258
Use FontLink\SystemLink registry values to map fallback fonts

Corrected the loop limit and added a special case for characters in the range U+2000..U+200F.
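The range check behind such a special case can be sketched as below. This is only the predicate. What the patch does with characters that match is not shown in this thread, and `isInSpecialCasedRange` is a hypothetical name.

```cpp
// Hypothetical predicate: true for code points in U+2000..U+200F, the range
// named in this comment (spaces, zero-width characters, directional marks).
bool isInSpecialCasedRange(char32_t character)
{
    return character >= 0x2000 && character <= 0x200F;
}
```

Of the characters listed in comment 21, U+2000 through U+200D fall inside this range, while U+070F and U+180B..U+180F do not.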
mitz
Comment 25
2008-01-03 17:41:50 PST
Comment on attachment 18258
Use FontLink\SystemLink registry values to map fallback fonts

I think I do not need this:
+#include "CharacterNames.h"
Darin Adler
Comment 26
2008-01-03 17:47:26 PST
Comment on attachment 18258
Use FontLink\SystemLink registry values to map fallback fonts

r=me
mitz
Comment 27
2008-01-03 18:06:44 PST
Landed in <http://trac.webkit.org/projects/webkit/changeset/29140>.