Bug 20542

Summary: Adding EOT Font Rendering capability
Product: WebKit    Reporter: Prunthaban <prunthaban>
Component: Layout and Rendering    Assignee: Nobody <webkit-unassigned>
Status: NEW
Severity: Enhancement    CC: alp, annam, ap, ddkilzer, eric, hyatt, ian, jshin, mike, mitz, mjs, nickshanks, phiw2, prunthaban, saravanannkl, webkit
Priority: P2
Version: 528+ (Nightly build)
Hardware: PC
OS: Windows XP
Attachments:
Description    Flags
A patch for supporting EOT rendering along GDI path    eric: review-
A patch to provide cross-platform EOT support    hyatt: review-
EOT Font file to be used for testing    hyatt: review-

Description Prunthaban 2008-08-27 04:54:48 PDT
This patch provides 'EOT' (Embedded OpenType) font rendering capability to WebKit along the "GDI only" path on Windows. A significant number of regional-language sites (in India and some other countries) use EOT fonts for native characters (they don't use Unicode; they use non-Unicode plain ASCII fonts whose glyphs render native characters instead of the standard ASCII ones). Without support for EOT fonts, these sites will not be 'readable'.
Comment 1 Prunthaban 2008-08-27 05:01:03 PDT
Created attachment 23022 [details]
A patch for supporting EOT rendering along GDI path
Comment 2 Eric Seidel (no email) 2008-08-27 06:20:08 PDT
Comment on attachment 23022 [details]
A patch for supporting EOT rendering along GDI path

It seems some files are missing?  Can this small amount of code really support EOT fonts?

This will not compile in release mode:
+    if(!m_isEOT || renderingMode != AlternateRenderingMode)
+        ASSERT(m_cgFont);

ASSERT disappears in release mode.
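
For what it's worth, one way to keep the intent visible in both build configurations is to fold the guard into the assertion itself (just a sketch based on the quoted lines, not code from the patch):

    ASSERT(m_cgFont || (m_isEOT && renderingMode == AlternateRenderingMode));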

Also, be sure to read:
http://webkit.org/coding/coding-style.html

There are lots of little style violations in this code.  A few which come to mind:

// comment
(generally we have a space after the // and before the comment)

if ( instead of if(

We don't use else after a return statement:
+    if(!m_isEOT)
+        return FontPlatformData(hfont, m_cgFont, size, bold, italic, renderingMode == AlternateRenderingMode);
+    else
+        return FontPlatformData(hfont, size, bold, italic, renderingMode == AlternateRenderingMode);	

{ goes on the next line after a function definition:

+void EOTStream::setInHeader(bool inHeader) {
+    m_inHeader = inHeader;
+}

Again, won't compile in Release mode (afaik):

+            if(!m_useGDI)
+                ASSERT(m_cgFont);


+bool isEOTFont(SharedBuffer* fontData) 

should have some minimum length check; otherwise it will crash if passed data shorter than sizeof(EOTPrefix).
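
Something along these lines would do (a sketch only; the magicNumber field name and the 0x504C value are my assumptions about the patch's EOTPrefix struct, based on the EOT documentation):

    bool isEOTFont(SharedBuffer* fontData)
    {
        // Reject anything too short to even hold the prefix; otherwise the
        // cast below would read past the end of the buffer.
        if (!fontData || fontData->size() < sizeof(EOTPrefix))
            return false;
        const EOTPrefix* prefix = reinterpret_cast<const EOTPrefix*>(fontData->data());
        return prefix->magicNumber == 0x504C;
    }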

According to our style guidelines, we don't use C-style casts in C++; this should be a reinterpret_cast<EOTPrefix*>:

+    EOTPrefix* prefix = (EOTPrefix*)(data);


No inner { } here:
+            if (localGlyphBuffer[i] == invalid_glyph) {
+                localGlyphBuffer[i] = 0;
+            }

r- for all the style violations.  I've CC'd hyatt and hixie, who are the two people you would need to convince that EOT support is a good thing for WebKit before we could accept such a patch.
Comment 3 Eric Seidel (no email) 2008-08-27 06:21:38 PDT
Comment on attachment 23022 [details]
A patch for supporting EOT rendering along GDI path

Oh, and *thanks* for the patch!  Congrats on your first WebKit patch.

One more thing is that we would require at least a few EOT test cases before we could land this.  I assume you could provide some test EOT fonts (from somewhere on the web) and a couple of HTML files that use them.  You can read more about creating layout tests here:
http://webkit.org/quality/testwriting.html
Comment 4 Dave Hyatt 2008-08-27 12:28:13 PDT
We have a cross-platform solution (TTF) that works on other platforms besides just Windows.  

Implementing EOT undermines this solution, and will encourage vendors to write code that would only work on Windows.

If you're going to add EOT support, then really add EOT support.   A patch that only works on one platform (and only with one kind of text rendering mode) does not cut it, and it creates a gigantic compatibility difference in the downloadable fonts feature between the Windows platform and the other platforms.
Comment 5 Ian 'Hixie' Hickson 2008-08-27 12:50:25 PDT
Hyatt is right, if we want to add EOT support to WebKit (and I understand that it might make sense to do that given Indic sites) then we absolutely must add it cross-platform. Just supporting one platform is bad.
Comment 6 Prunthaban 2008-08-29 00:12:54 PDT
I have started working on providing cross-platform support for EOT. I will be updating this bug as and when I create new patches.
Comment 7 Dave Hyatt 2008-08-29 11:09:55 PDT
Make sure you ifdef EOT support so that vendors are able to turn it off easily.

Comment 8 Jungshik Shin 2008-09-05 13:22:14 PDT
*** Bug 18668 has been marked as a duplicate of this bug. ***
Comment 9 Prunthaban 2008-10-15 05:07:30 PDT
Created attachment 24356 [details]
A patch to provide cross-platform EOT support

This patch will convert an EOT file into a TTF file and pass it to the TTF handling routine. Since TTF handling already has a cross-platform solution in WebKit, this patch should also work on all platforms.
This patch expects EOT_SUPPORT to be #defined. Otherwise, EOT files will be ignored.
I have not tested this patch on Mac. It was tested on Windows and works fine.
The attached test file expects broken.eot to be placed in the resources directory (I will attach that too). If you need some additional test cases, I can create them; I need to know what type of test cases are expected.
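
For reviewers, the rough shape of the conversion is below. This is a simplified sketch, not the patch itself: the EOTPrefix field names are illustrative, the real header is variable length, and compressed or XOR-obfuscated payloads need extra handling.

    PassRefPtr<SharedBuffer> extractTTFFromEOT(SharedBuffer* eotData)
    {
        if (!isEOTFont(eotData))
            return 0;
        const EOTPrefix* prefix = reinterpret_cast<const EOTPrefix*>(eotData->data());
        // Sanity-check the sizes declared in the header against the actual buffer.
        if (prefix->eotSize > eotData->size() || prefix->fontDataSize > prefix->eotSize)
            return 0;
        // The embedded sfnt (TTF) payload sits at the end of the EOT container.
        size_t fontDataOffset = prefix->eotSize - prefix->fontDataSize;
        return SharedBuffer::create(eotData->data() + fontDataOffset, prefix->fontDataSize);
    }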
Comment 10 Prunthaban 2008-10-15 05:08:45 PDT
Created attachment 24357 [details]
EOT Font file to be used for testing
Comment 11 Dave Hyatt 2008-10-15 11:26:34 PDT
I think I'd like to see the EOT stuff under its own subdirectory.  platform/graphics/eot/ would be fine.

You need to use ENABLE(EOT_SUPPORT) rather than EOT_SUPPORT.
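
(For reference, the guard then looks like this in the source, with a default for ENABLE_EOT_SUPPORT added in Platform.h; EOT_SUPPORT here is simply the name from the patch description:)

    #if ENABLE(EOT_SUPPORT)
    // EOT-specific code goes here.
    #endif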

I believe we use #ifndef NDEBUG for debugging code.

Not sure about the WINDOWS ifdef, but I think we use WIN_OS for that?

Are you respecting the restrictions that might be present in an EOT file (e.g., if the font says it can only be used with a certain URL etc.)?

Comment 12 Prunthaban 2008-10-15 21:00:30 PDT
I will make those modifications.

The Root String (URL validation) handling is not done yet. I will be adding it before this patch is committed (I need to figure out a way to create a test font without that restriction so that we can test it locally).
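
Roughly, the check I have in mind looks like this (a sketch only: the root strings would be parsed out of the EOT header's RootString entries, and the prefix-matching rule is my reading of the format, still to be confirmed against the documentation):

    static bool documentAllowedByRootStrings(const Vector<String>& rootStrings, const String& documentURL)
    {
        // An EOT file with no root strings carries no embedding restriction.
        if (rootStrings.isEmpty())
            return true;
        for (size_t i = 0; i < rootStrings.size(); ++i) {
            if (documentURL.startsWith(rootStrings[i]))
                return true;
        }
        return false;
    }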
Comment 13 mitz 2008-10-15 21:09:35 PDT
(In reply to comment #11)
> Are you respecting the restrictions that might be present in an EOT file (e.g.,
> if the font says it can only be used with a certain URL etc.)?

I think it is important to ensure that this conformance requirement does not get in users' way. For one, the Web Inspector should be able to show fonts regardless of the restriction. Non-browser WebKit clients may also want to do the same, so the restriction should probably be opt-in.
Comment 14 Ian 'Hixie' Hickson 2008-10-16 14:39:18 PDT
What's the benefit to the users of honouring those restrictions? It seems like unless we have a good user-driven reason to honour the restrictions, we should just not bother.
Comment 15 Srinivas Annam 2008-10-16 19:11:01 PDT
It is the UA that has the responsibility to enforce those restrictions. Otherwise, a web site operator can indiscriminately copy fonts from other web sites and make them available as their own.
Comment 16 Dave Hyatt 2008-10-17 00:27:05 PDT
Comment on attachment 24356 [details]
A patch to provide cross-platform EOT support

Minusing based on my comments.

One thing I should say is that this patch has absolutely no vendor support.  In fact, neither Safari nor Chrome is willing to add EOT support.

If there is no major vendor that is actually interested in shipping this code, then I don't think it should land in the WebKit tree.
Comment 17 Prunthaban 2008-10-17 00:38:01 PDT
Hyatt,
Chrome is interested in adding EOT support (many major Indian regional sites use EOT), and that is the major reason behind this patch. (At present I am adding the URL validation part to this patch.)
Can you please clarify this? If Chrome needs this patch, then I believe we should not have a problem landing it.
Comment 18 Dave Hyatt 2008-10-17 01:08:25 PDT
You say that this is wanted for Chrome, but when I brought this bug up today with Chrome team members they didn't seem to agree.

Regardless, I would strongly urge you not to add support for EOT to Chrome.

Adding EOT support has implications that go far beyond Web site compatibility.  Support of EOT undermines the current TTF support.

Mozilla, Opera and Apple are not going to add EOT support to their browsers.  If Chrome supports EOT, this will create a significant fork for Chrome, not only relative to shipping WebKit products but also to the other standards-compliant browsers.

I urge you to read up on the positions of Apple, Opera and Mozilla regarding why EOT support is not desired.
Comment 20 Nicholas Shanks 2008-10-17 10:18:55 PDT
Indic sites need to move to standard encodings. Lots of fonts with unique encodings have had ISCII or Unicode converters created for them by users. (e.g. http://www.baraha.com/help/baraha/font_convert.htm or http://crl.nmsu.edu/~mleisher/nai.html )

Please persuade any sites you go to not to use font-dependent encodings. Not only does that mean the user can't choose a font they are comfortable with, it also breaks indexing and searching.

If you choose ISCII rather than Unicode, serve the page with a Content-Type header containing one of the following charsets:

Codepage	Charset	Script
57002	x-iscii-de	Devanagari
57003	x-iscii-be	Bengali
57004	x-iscii-ta	Tamil
57005	x-iscii-te	Telugu
57006	x-iscii-as	Assamese
57007	x-iscii-or	Oriya
57008	x-iscii-ka	Kannada

When the user receives them, the browser will do the conversion from ISCII to Unicode, and render it with one of the user's installed fonts.
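
For example (taking the Tamil entry from the table; the others follow the same pattern):

    Content-Type: text/html; charset=x-iscii-ta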

Sites that use fonts with custom encodings could also, at present, just serve the .ttf form of the 8-bit font to FF/Safari/Opera:

src: url(font.eot); /* ignored or overridden by everyone except IE */
src: url(font.ttf) format(truetype); /* ignored by IE due to unrecognised format() specifier */


Also, be aware that serving a full Unicode-encoded OpenType font to a Mac user via @font-face won't work. Apple's OpenType support is just getting started, and will not handle Indic for the foreseeable future, so an AAT font is required. The user should already have one for his language (either shipped with the OS, downloaded, or bought).
Comment 21 Jungshik Shin 2008-10-17 11:20:59 PDT
(In reply to comment #20)
> Indic sites need to move to standard encodings. Lots of fonts with unique
> encodings have had ISCII or Unicode converters created for them by users. (e.g.
> http://www.baraha.com/help/baraha/font_convert.htm or
> http://crl.nmsu.edu/~mleisher/nai.html )

Ick. No ISCII, please. Switching to Unicode is long overdue. No doubt about it.

The reality is that they're slo........w in switching. In some Indian languages (like Malayalam), the switch has been accelerated recently, but in other languages, it appears that it's moving really slowly.  Sigh....

There are a lot of reasons to switch to Unicode. Even with EOT support in a browser, copy'n'paste, find-in-a-page, and history would not work.  And, obviously, the practice is not search-engine-friendly, either.

There is a Firefox extension that makes all of the above possible by translating each text node in a font-specific encoding (depending on the font used) to Unicode on the fly ( http://padma.mozdev.org/ ). There was a proposal to add that feature to Chrome (actually, WebKit) natively, but I haven't mentioned it here because it'd require rather invasive (and potentially perf-impacting) WebKit changes (as well as a few dozen font-specific conversion tables), and I was not sure whether it'd fly here. That's certainly an alternative to EOT support with a better user experience, but with a lot to think about, as mentioned earlier. I wonder what others think of this alternative.

Just thinking aloud further:
How about enabling EOT support for a pre-configured list of (font, website) pairs? The vast majority of them will be Indian web sites, with a small number of sites in Burmese, Khmer, etc. (however, the speakers of the latter languages have been rapidly embracing Unicode, so it's perhaps 100% Indian web sites).


Comment 22 Jungshik Shin 2008-10-17 11:30:51 PDT
(In reply to comment #20)

> Sites that use fonts with custom encodings could also, at present, just serve
> the .ttf form of the 8-bit font to FF/Safari/Opera:
> 
> src: url(font.eot); /* ignored or overridden by everyone except IE */
> src: url(font.ttf) format(truetype); /* ignored by IE due to unrecognised
> format() specifier */

Yeah, that is the path of least resistance for Indian web sites. If we cannot persuade them to switch to Unicode soon, at least we should ask them to do the above. Usually they offer the TTF for download (for other browser users), so there's no reason NOT to do the above. Needless to say, this does not solve find/search and copy'n'paste.

 
> Also, be aware that serving a full Unicode-encoded OpenType font to a Mac user
> via @font-face won't work. Apple's OpenType support is just getting started,
> and will not handle Indic for the foreseeable future, so an AAT font is
> required. The user should already have one for his language (either shipped
> with the OS, downloaded, or bought).


BTW, the current release version of Safari on Windows does not use OTF specified in font-face for complex scripts (Uniscribe codepath).  I haven't tried the trunk build, though. 

Comment 23 mitz 2008-10-17 11:54:27 PDT
(In reply to comment #22)

> BTW, the current release version of Safari on Windows does not use OTF
> specified in font-face for complex scripts (Uniscribe codepath).  I haven't
> tried the trunk build, though.

It should work in TOT.
Comment 24 Nicholas Shanks 2008-10-17 17:07:10 PDT
(In reply to comment #21)

> Ick. No ISCII, please.

There is nothing wrong in storing and serving the files in the various ISCII representations, just as there is nothing wrong in me serving GB 18030, ISO-8859-2, Windows 1256 or MacHebrew—because the browsers know how to convert them to their internal representation (which may not be Unicode anyway, especially low memory mobile devices in places like China and Japan).

> copy'n'paste, find-in-a-page, and history would not work

These features do not require Unicode, only that the encoding used is both specified by the server and understood by the client. In the case of WebKit, ICU is used to transcode from the served encoding to an internal representation (which happens to be Unicode).
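
(For the curious, the kind of ICU call involved looks roughly like this; it is not WebKit's actual codec layer, just an illustration, with "bytes"/"byteLength" standing in for the served document data:)

    UErrorCode status = U_ZERO_ERROR;
    UConverter* converter = ucnv_open("windows-1256", &status); // any encoding ICU knows
    UChar utf16[256];
    int32_t utf16Length = ucnv_toUChars(converter, utf16, 256, bytes, byteLength, &status);
    ucnv_close(converter);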

If it is easier to talk sites into using ISCII than it is to get them to use Unicode, then that's great, let's do that. Anything to get them weaned off font-specific encodings and onto standardised ones.
Comment 25 Maciej Stachowiak 2008-10-17 18:28:36 PDT
(In reply to comment #21)

> There is a Firefox extension that makes all the above possible by translating
> each textnode in a font-sepcific encoding (depending on the font used) to
> Unicode on-the-fly. ( http://padma.mozdev.org/ ) There was  a proposal to add
> that feature to Chrome (actually,  Webkit) natively, but I haven't  mentioned
> it here because it'd require rather invasive (and potentially perf-impacting)
> Webkit changes (as well as a few dozens of font-specific conversion tables) and
> I was not sure whether it'd fly here.  That's a certainly alternative to EOT
> support with a better user-experience, but with a lot to think about as
> mentioned earlier. I wonder what others think of this alternative. 

I'd love to hear more details about this proposal. Is it always perf-impacting, or only when a known oddly-encoded EOT font is used?

> Just thinking aloud further:
> How about enabling EOT supports for pre-configured list of (font, website)
> pairs? The vast majority of them will be Indian web sites with a small number
> of sites in Burmese,Khmer, etc (however, the speakers of the latter languages 
> have been rapidly embracing Unicode so it's perhaps 100% Indian web sites). 

That's also worth considering, though it sounds like translating on the fly to Unicode could lead to a better user experience, in all the ways you mention.

Comment 26 Jungshik Shin 2008-10-20 14:24:07 PDT
(In reply to comment #24)
> (In reply to comment #21)
> 
> > Ick. No ISCII, please.
> 
> There is nothing wrong in storing and serving the files in the various ISCII
> representations, 

You may not, but many people want to see *all* web pages encoded in UTF-8 or UTF-16.  I know legacy-encoded pages will keep being put up for a long time to come.  However, do we really want to introduce any 'new legacy encoding' ('new' in the sense that it hasn't been used on any meaningful number of sites) into 'the web sphere'?

> especially low memory mobile devices in places like China and Japan).
> 
> > copy'n'paste, find-in-a-page, and history would not work
> 
> These features do not require Unicode, 

Did I say that it requires Unicode? Of course not! Otherwise, all those non-Unicode-encoded pages would be in big trouble.

> If it is easier to talk sites into using ISCII than it is to get them to use
> Unicode, then that's great, lets do that.

I don't think it's any easier to persuade them to switch to ISCII than Unicode.  
Besides, note that Firefox does not support ISCII at all, although it's relatively easy to add support for it.  Chrome does not either (it's even easier to support ISCII in Chrome than in Firefox), but again I'd not do that unless it's really necessary.


Comment 27 Nicholas Shanks 2008-10-21 01:17:56 PDT
(in reply to comment #26)

> You may not, but many people want to see *all* the web pages encoded in UTF-8
> or UTF-16.

FWIW I encode all my pages in UTF-8 by default, though also offer older encodings for non-Latin languages, with content negotiation turned on so older UAs can request something they understand.
I *am* a great proponent of Unicode, and have been actively involved in the Indian Unicode promotional community for several years. It is the best answer. I'm just saying it's not the only answer.

> I know legacy-encoded pages will keep being put up for a long time
> to come.

Do you ever envisage a time when they will not? Even 200 years from now, I think people will still be using pre-Unicode encodings.

> However, do we really want to introduce any 'new legacy encoding'
> ('new' in the sense that they've not been used in any meaningful number of
> sites) into 'the web sphere' ?

There are massive numbers of government documents in India that are ISCII encoded. Allowing these to be put online without intermediary steps will do wonders for e-governance in the country. Yes, we browser makers could dictate terms to the Indian government and force them to re-encode everything, but it's much easier for us to just support the encodings they are already using. I don't think your "not on the web at the moment" argument really passes muster. Sorry.

> > If it is easier to talk sites into using ISCII than it is to get them to use
> > Unicode, then that's great, lets do that.
> 
> I don't think it's any easier to persuade them to switch to ISCII than Unicode.

Unicode is unwieldy, foreign and unfamiliar; ISCII is something they've already heard of: home-grown, cosy and warm. I do think it will be easier to get people to use this than Unicode, though it hasn't been something I have pushed for previously.
There is also lots of ISCII-supporting software available in India, and it is older (therefore more likely to be installed on infrequently refreshed computers) than Unicode-compatible software. Without doubt there are large swathes of regional government that cannot use Unicode because of lack of modern software.

> Firefox does not support ISCII
> Chrome does not either
> I'd not do that unless it's really necessary.

I think ease of transition is reason enough. Until it was suggested earlier in this thread, I hadn't thought of doing the font-specific encoding conversion inside the browser itself. This may be a good short-term solution, but it still hurts other UAs like search engines etc., and may add another obstacle in the way of getting these sites to switch over in the long term. I think it's okay, though, for minority browsers that do not implement EOT to do this.
Comment 28 Nicholas Shanks 2008-10-21 04:53:32 PDT
There seems to be a person/company/dunno offering a library for font-specific encoding conversion available at http://www.mukri.com/products/
Comment 29 Nicholas Shanks 2008-10-21 05:04:05 PDT
There's a big list of converters (both Windows apps and online, but biased towards Devanagari) available at:
http://hi.wikipedia.org/wiki/इंटरनेट_पर_हिन्दी_के_साधन#.E0.A4.AB.E0.A4.BC.E0.A5.89.E0.A4.A3.E0.A5.8D.E0.A4.9F_.E0.A4.AA.E0.A4.B0.E0.A4.BF.E0.A4.B5.E0.A4.B0.E0.A5.8D.E0.A4.A4.E0.A4.95_.28Font_Converters.29

There is also a Windows app called Parivartan, supplied by the Indian government, available for conversions:
http://ildc.gov.in/gistnew/Parivartan.zip

Perhaps someone who can read Hindi can go through these and check their licenses?
Comment 30 John Daggett 2009-03-19 01:56:20 PDT
One word of caution here: the MTX compression scheme used in conjunction with the EOT format is patented, and using it requires a license from the patent owner, Monotype.  The EOT format has been proposed as a W3C format, but until that proposal advances to the Recommendation phase, the submitted documentation does not imply a grant of license for its use.

Naturally, that doesn't apply to a Windows-only implementation that uses the t2embed library provided by Microsoft.
Comment 31 Ian 'Hixie' Hickson 2009-03-19 16:04:44 PDT
My understanding is that nobody actually wants to implement EOT itself anymore; if anything happens, it'll just be a kind of text transformation that converts the input text to the right Unicode characters, without actually downloading the font at all.