Bug 18085

Summary: -webkit-locale should be inferred from charset
Product: WebKit
Reporter: jasneet <jasneet>
Component: Layout and Rendering
Assignee: Jungshik Shin <jshin>
Status: NEW
Severity: Normal
CC: ap, brettw, eric, falken, hyatt, jasneet, jshin, karlcow, mario.bensi, mitz, nickshanks, phiw2, xfdbse
Priority: P2
Keywords: HasReduction, InRadar
Version: 525.x (Safari 3.1)   
Hardware: All   
OS: All   
URL: http://www.pconline.com.cn/
See Also: https://bugs.webkit.org/show_bug.cgi?id=17701
Bug Depends on: 20797    
Bug Blocks: 10874    
Attachments:
Description: screenshot; Flags: none
Description: reduction; Flags: none
Description: patch with stub implementations of getGenericFontForScript for mac,win,gtk,qt,wx; Flags: none

Description jasneet 2008-03-25 15:35:36 PDT
I Steps:
Go to 
http://www.pconline.com.cn/

II Issue:
The links in the center column navigation bar are not aligned correctly

III Conclusion:
Issue with the font-fallback of Chinese characters.

IV Other browsers:
IE7: partially ok (IE6: ok)
FF3: ok
Opera9.24: ok

V Nightly tested: 31238
Comment 1 jasneet 2008-03-25 15:36:17 PDT
Created attachment 20047
screenshot
Comment 2 jasneet 2008-03-25 15:36:45 PDT
Created attachment 20048
reduction
Comment 3 Jungshik Shin 2008-03-25 16:08:16 PDT
The layout is broken because the page does not specify a font. So Safari uses 'Times New Roman' (the standard font) to render 'ASCII range' characters, while it uses a Chinese font for Chinese characters.

Other browsers use a (simplified) Chinese font for all characters (as long as the font covers them) because the page is in GB2312. Here, GB2312 is regarded as an indirect (and not always correct) indicator of the language of a document. FF and IE select fonts as if 'lang=zh-CN' were specified in this case.

Times New Roman has wider glyphs for ASCII-range characters than a Chinese font and breaks the layout.  

Even if Safari took the charset as an indirect indicator of the language, there is not much it could do at the moment because Safari's font preferences are very limited. It does not have per-language *and* per-CSS-family preferences (it just has global per-CSS-family preferences).

See bug 10874 for this. ( http://bugs.webkit.org/show_bug.cgi?id=10874 ).
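
To illustrate the charset-as-language-hint behavior described above, here is a rough C++ sketch. It is not WebKit code: the function name inferLanguageFromCharset and the small table are assumptions made up for this example, and a real mapping would need to cover many more encodings.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <unordered_map>

// Hypothetical helper (not a WebKit API): guess a language tag from a
// document's charset label. Returns an empty string when the charset
// gives no hint about the language (for example UTF-8).
std::string inferLanguageFromCharset(std::string charset)
{
    // Charset labels are matched case-insensitively, so lower-case first.
    std::transform(charset.begin(), charset.end(), charset.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    // Illustrative subset only. The hint is indirect and can be wrong.
    static const std::unordered_map<std::string, std::string> table = {
        { "gb2312", "zh-CN" },
        { "big5", "zh-TW" },
        { "shift_jis", "ja" },
        { "euc-jp", "ja" },
        { "euc-kr", "ko" },
    };

    auto it = table.find(charset);
    return it == table.end() ? std::string() : it->second;
}
```

With such a helper, a GB2312 page with no lang attribute would be treated as if 'lang=zh-CN' had been specified, and font selection could then proceed per language.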






Comment 4 Jungshik Shin 2008-03-25 16:15:19 PDT
There are numerous Chinese web sites with no font specification or only a very minimal one (like 'sans-serif'), and many of them depend on the rather fragile assumption that a simplified Chinese font will be used for both ASCII-range characters and Chinese characters.

This depends on bug 10874, but I'm not empowered to set the dependency. 

jasneet, can you set it? 


Comment 5 Jungshik Shin 2008-07-24 20:41:08 PDT
I have now made a tentative patch to resolve this issue. With my initial patch, the issue at hand is resolved. However, with that patch, Safari's font selection UI becomes all but useless because the 'language/script'-specific font has to take precedence over the font selected in the font selection UI.

Safari's font-selection UI is too simple and does not allow per-script/language and per-CSS generic family based selections (which Firefox and, to a lesser extent, IE do allow). WebKit's Settings class is also not expressive enough to carry that information (compared with Gecko's nsIPref).

Because of that, I revised my patch (I'm going to attach it later this week) to make it add per-script/language and per-CSS generic family fonts AFTER the font selected via the UI (Settings::SansSerifFamily, etc). My patch is limited in another sense because it infers language/script from the charset only (NOT referring to lang/xml:lang, which is what bug 10874 is about). However, I think it can be improved that way later.

This does not solve the problem at hand, but at least it helps a lot with sites like http://www.asahi.com or other sites which don't specify a font at all or only minimally. (asahi.com may not be directly affected by this because, on non-Japanese/non-Korean/non-Chinese Windows, Safari currently tries a Japanese font before trying Chinese and Korean fonts.) However, there are tons of Chinese web sites which will be helped by this change.





Comment 6 Eric Seidel (no email) 2008-07-24 20:51:12 PDT
I'm not a big fan of complicating Safari's preferences dialog with language+css-font specific font choices.  But perhaps there is some nice UI that the Safari folks could come up with.

It will be interesting to see your patch regardless.
Comment 7 Dave Hyatt 2008-07-24 21:00:33 PDT
Prefs in WebKit could be expanded without exposing additional UI in Safari (allowing other WebKit clients to expose a more complicated UI if needed).  The key is really just to pick good defaults on each platform (as Gecko does).
Comment 8 Jungshik Shin 2008-09-07 21:51:28 PDT
Created attachment 23238
patch with stub implementations of getGenericFontForScript for mac,win,gtk,qt,wx

Sorry for the delay. This patch 

1. adds dominantScript to Document. Currently, it's derived from the charset of a document. This is a fallback. It should come from lang/xml:lang when either of them is specified. 

2. adds getGenericFontForScript to FontCache. It returns a font for a {script, css_generic} combination. Currently, it's a stub returning emptyAtom.
(When bug 10874 is fixed, WebPreference and/or Settings can be used in this method. If not, a port of WebKit can provide its own implementation.)

This method is used in 3 and 4 below. 

It can also be used for per-charset theme (as is done by Firefox and IE). That is, a font used for, say, button, is different depending on the char. encoding of a document and/or lang/xml:lang (in case of IE and Firefox). This patch does not include that. 


3. For each CSS generic family specified in font-family, 'per-script / per-generic' font is added in addition to 'per-generic (but across-scripts/languages)' font that is currently added.

4. When no font is specified at all or no CSS generic family is specified, 'per-script' (standard) font is added.
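
Steps 3 and 4 can be sketched roughly as below. This is a simplified stand-alone illustration, not the code in the attached patch: std::string stands in for WebKit's atomic strings, an empty string stands in for emptyAtom, and globalFontForGeneric, isGeneric, expandFamilies, the "standard" key and the font names are all hypothetical.

```cpp
#include <string>
#include <vector>

// Stub in the spirit of the patch: no per-script font is known yet, so an
// empty string (standing in for emptyAtom) is returned.
std::string getGenericFontForScript(const std::string& /*generic*/, const std::string& /*script*/)
{
    return std::string();
}

// Hypothetical stand-in for the global per-generic setting (the UI choice).
std::string globalFontForGeneric(const std::string& generic)
{
    if (generic == "serif") return "Times New Roman";
    if (generic == "sans-serif") return "Arial";
    if (generic == "monospace") return "Courier New";
    return std::string();
}

bool isGeneric(const std::string& family)
{
    return !globalFontForGeneric(family).empty();
}

// Expand a font-family list for a document whose dominant script is `script`.
std::vector<std::string> expandFamilies(const std::vector<std::string>& families,
                                        const std::string& script)
{
    std::vector<std::string> result;
    bool sawGeneric = false;
    for (const auto& family : families) {
        if (!isGeneric(family)) {
            result.push_back(family);
            continue;
        }
        sawGeneric = true;
        // Step 3: the global font first, then the per-script font for the
        // same generic family (skipped while the stub returns nothing).
        result.push_back(globalFontForGeneric(family));
        std::string perScript = getGenericFontForScript(family, script);
        if (!perScript.empty())
            result.push_back(perScript);
    }
    // Step 4: no generic family given, so add the per-script standard font.
    // "standard" is a made-up key for this sketch.
    if (!sawGeneric) {
        std::string standard = getGenericFontForScript("standard", script);
        if (!standard.empty())
            result.push_back(standard);
    }
    return result;
}
```

While getGenericFontForScript is a stub, the expanded list is the same as today's; once a port returns real fonts, they are appended after the globally selected font.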
Comment 9 mitz 2008-09-17 10:22:27 PDT
Jungshik, I do not understand why the additional synthetic generic families are needed, given that the FontDescription has a script field. Can you please explain?
Comment 10 Jungshik Shin 2008-09-17 22:05:53 PDT
(In reply to comment #9)
> Jungshik, I do not understand why the additional synthetic generic families are
> needed, given that the FontDescription has a script field. Can you please
> explain?

You meant FontDescription has a genericFamily field? Perhaps the name of the function is confusing; getFontForGenericFamilyAndScript may be better than getGenericFontForScript. What it does is get the font to use for a given script and a given generic family. For instance, for sans-serif and Hans, it'd be Simsun on Windows, and for monospace and Hans, it'd be NSimsun. Other examples are (all on Windows XP; Vista has better CJK fonts):

(serif, Latn) : Times New Roman
(sans-serif, Latn) : Arial
(monospace, Latn) : Courier New 
(serif, Hang)  : Batang     [0]
(sans-serif, Hang) : Gulim or Dotum
(monospace, Hang) : Gulimche , Dotumche or Batangche
(serif, Hant) : PMingLiu
(monospace, Hant) : MingLiu
(serif, Japn)   : MS P Mincho   [1]
(sans-serif, Japn) : MS P Gothic
(monospace, Japn) : MS Gothic or MS Batang.

Of course, eventually, all these need to be configurable (as in Firefox). 



[1] My patch does not use Japn (a new script code introduced in ICU 3.8 or later) but uses Hiragana because it is as discriminating as Japn.
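
The pairs listed above amount to a lookup keyed on (generic family, script). A minimal sketch of such a table follows, using the Windows XP names exactly as spelled in this comment and picking the first name where alternatives are listed. fontForGenericFamilyAndScript here is a hypothetical free function for illustration, not the method in the patch.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical lookup: returns the font for a {generic family, script}
// pair, or an empty string when the table has no entry for it.
std::string fontForGenericFamilyAndScript(const std::string& generic, const std::string& script)
{
    using Key = std::pair<std::string, std::string>;
    // Names as given in the comment for Windows XP; where the comment
    // lists alternatives, only the first one is used here.
    static const std::map<Key, std::string> table = {
        { { "serif", "Latn" }, "Times New Roman" },
        { { "sans-serif", "Latn" }, "Arial" },
        { { "monospace", "Latn" }, "Courier New" },
        { { "sans-serif", "Hans" }, "Simsun" },
        { { "monospace", "Hans" }, "NSimsun" },
        { { "serif", "Hang" }, "Batang" },
        { { "sans-serif", "Hang" }, "Gulim" },
        { { "monospace", "Hang" }, "Gulimche" },
        { { "serif", "Hant" }, "PMingLiu" },
        { { "monospace", "Hant" }, "MingLiu" },
        { { "serif", "Japn" }, "MS P Mincho" },
        { { "sans-serif", "Japn" }, "MS P Gothic" },
        { { "monospace", "Japn" }, "MS Gothic" },
    };
    auto it = table.find(Key(generic, script));
    return it == table.end() ? std::string() : it->second;
}
```

A hard-coded table like this would only provide defaults; as noted, the entries would eventually need to be configurable.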
Comment 11 mitz 2008-09-17 22:08:25 PDT
(In reply to comment #10)
> (In reply to comment #9)
> > Jungshik, I do not understand why the additional synthetic generic families are
> > needed, given that the FontDescription has a script field. Can you please
> > explain?
> 
> You meant FontDescription has a genericFamily field?

No, I meant the m_dominantScript member that you had added.
Comment 12 Dave Hyatt 2008-09-17 22:37:49 PDT
If FontDescription knows the script, then why do you need to append additional families at the style selector level?  Can't you just do the right thing down in the platform/ layer?

Comment 13 Jungshik Shin 2008-10-01 13:01:10 PDT
(In reply to comment #12)
> If FontDescription knows the script, then why do you need to append additional
> families at the style selector level?  Can't you just do the right thing down
> in the platform/ layer?

I think that's better if the goal is just to do what's implemented in the patch.
That is, replace a CSS generic family (e.g. 'serif') with the global generic font for that CSS generic family, followed by the per-generic/per-script font.

However, this change is kind of an intermediate step toward resolving bug 10874 and is rather ugly (and is still work in progress).

What I would eventually like to do is to use per-script font "substitution" for CSS generic families (serif, sans-serif, monospace, etc) rather than the global (across-language/across-script) generic-to-font family mapping.[1]

Given the following snippet, 
<span lang="ja" style="font-family: sans-serif;">blah blah</span>

sans-serif is currently replaced by the 'global' sans-serif font. My goal is to replace it with the 'sans-serif' font for Japanese. What the current patch does is in-between (replacing 'sans-serif' with the 'global' sans-serif followed by 'ja sans-serif'). Actually, 'lang' is not yet honored; as a poor man's lang, the patch resorts to inferring the language from the charset of a document (which does not always work, for obvious reasons).

Probably, it can be done more cleanly in fontDataForGenericFamily in CSSFontSelector.cpp with some changes in |settings|. Or, if |settings| remains as it is, it might be possible to do it using |getFontForScriptAndGeneric| (in the patch, it's |getGenericFontForScript|), which is platform-dependent (or port-dependent).
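
The difference between the current behavior, the patch, and the eventual goal for the snippet above can be sketched as follows. All names and the "lang/generic" key format are hypothetical and only illustrate the three orderings; no real WebKit settings API is implied. Falling back to the global font when no per-language font exists is also an assumption of this sketch.

```cpp
#include <map>
#include <string>
#include <vector>

enum class Strategy { GlobalOnly, GlobalThenPerLanguage, PerLanguage };

// Hypothetical settings: one global font per generic family, plus optional
// per-language overrides keyed by "lang/generic".
struct FontSettings {
    std::map<std::string, std::string> global;
    std::map<std::string, std::string> perLanguage;
};

// Returns the concrete families that a generic family expands to for `lang`.
std::vector<std::string> resolveGeneric(const FontSettings& settings,
                                        const std::string& generic,
                                        const std::string& lang,
                                        Strategy strategy)
{
    std::vector<std::string> families;
    auto globalIt = settings.global.find(generic);
    auto langIt = settings.perLanguage.find(lang + "/" + generic);
    bool hasGlobal = globalIt != settings.global.end();
    bool hasLang = langIt != settings.perLanguage.end();

    switch (strategy) {
    case Strategy::GlobalOnly: // current behavior
        if (hasGlobal) families.push_back(globalIt->second);
        break;
    case Strategy::GlobalThenPerLanguage: // what the patch does
        if (hasGlobal) families.push_back(globalIt->second);
        if (hasLang) families.push_back(langIt->second);
        break;
    case Strategy::PerLanguage: // the eventual goal
        if (hasLang) families.push_back(langIt->second);
        else if (hasGlobal) families.push_back(globalIt->second);
        break;
    }
    return families;
}
```

With a global sans-serif font and a 'ja' sans-serif override configured, the three strategies yield the global font alone, the global font followed by the Japanese one, and the Japanese font alone, respectively.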



BTW,  the first feedback from W3C I18N WG about Chrome was the lack of UI and preferences for font-selection per-script/unicode block (as found in IE and Firefox). They strongly believed that a browser needs that. 



Comment 14 Dave Hyatt 2008-10-01 13:23:49 PDT
Right, as I said before, having settings such that serif, sans-serif map to different fonts depending on language is fine.

I don't see any reason to take an intermediate step here though.  Let's design this fully from the start. 


Comment 15 Alexey Proskuryakov 2012-04-10 11:14:00 PDT
We have most of this working already, but there is still no encoding to language mapping. Re-titling accordingly.
Comment 16 Alexey Proskuryakov 2012-04-11 17:36:36 PDT
Looking at some sites that people complained about in the past, it appears that this enhancement would be very desirable. Many of those pages don't have lang attributes, but do have a telling charset like Shift-JIS or GB2312.
Comment 17 Alexey Proskuryakov 2012-04-11 17:38:37 PDT
<rdar://problem/11233034>