Summary: | Zero width and space characters are displayed incorrectly if not contained in a fallback font. | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Brett Wilson (Google) <brettw> | ||||||||||||
Component: | Platform | Assignee: | mitz | ||||||||||||
Status: | RESOLVED FIXED | ||||||||||||||
Severity: | Normal | CC: | ap, dglazkov, dimitri, hyatt, jshin, mitz, nickshanks | ||||||||||||
Priority: | P2 | ||||||||||||||
Version: | 528+ (Nightly build) | ||||||||||||||
Hardware: | PC | ||||||||||||||
OS: | Windows XP | ||||||||||||||
Attachments: |
|
Description
Brett Wilson (Google)
2008-07-30 17:23:55 PDT
Created attachment 22570 [details]
Test case
This patch shows a ZWSP and LTR characters. If you don't have Apple's fallback font or Arial Unicode MS on Windows, then these will be rendered as squares.
Created attachment 22591 [details]
Patch
This patch makes boxes for directional markers and nonbreaking spaces disappear when there is no fallback font that supports them. It hardcodes that the width of "zero width space" characters is 0. Platforms will also have to hardcode that the glyph for such characters is the space glyph. I've added this to the Windows code.
Mac doesn't seem to have this problem because the fonts are less crappy. If Mac wanted to implement it, it would be best in wkGetGlyphVectorFirstRecord which looks like isn't public.
I added a parameter to the glyph width getter so it gets the original character. I also use this for an optimization for Chinese characters which I hope to submit soon; it's very helpful for the width computation code to actually know what character we're talking about.
The changes in the Font.h file are to make the zero width space (and a few related chars) actually return true in the treatAsZeroWidthSpace function().
Comment on attachment 22591 [details]
Patch
Hyatt says that mitz is the man to review this.
(In reply to comment #2) > I added a parameter to the glyph width getter so it gets the original > character. I also use this for an optimization for Chinese characters which I > hope to submit soon; it's very helpful for the width computation code to > actually know what character we're talking about. I am not too happy with the extra work in widthForGlyph, and not particularly happy with breaking the separation between character space and glyph space (what are we going to pass down to that function when we have ligatures and other features that break the 1:1 character:glyph ratio?), and I can think of a different way to address this bug, so I would like to know more about your planned optimization for Chinese. With our CJK optimization, it looks like this: SimpleFontData::widthForGlyph(UChar32 c, Glyph glyph) const { bool is_CJK = IsCJKCodePoint(c); float width = is_CJK ? m_cjkGlyphWidth : m_glyphToWidthMap.widthForGlyph(glyph); #ifndef NDEBUG // Test our optimization that assuming all CGK glyphs have the same width if (is_CJK) { const float actual_width = platformWidthForGlyph(glyph); ASSERT((cGlyphWidthUnknown == width) || (actual_width == width)); } #endif if (c > ' ' && Font::treatAsZeroWidthSpace(c)) return 0.0f; if (width != cGlyphWidthUnknown) return width; width = platformWidthForGlyph(glyph); if (is_CJK) { m_cjkGlyphWidth = width; } else { m_glyphToWidthMap.setWidthForGlyph(glyph, width); } return width; } So basically, this caches the widths of the first CJK character seen in a font, and uses that for all subsequent ones. If I recall correctly, this, along with making the widths 16-bit fixed point, saves us ~150MB for a page cycler run with a lot of different CJK pages in it. There are a whole bunch of characters to fall into 'zero-width' category. It might be better to use | u_hasBinaryProperty(c, UCHAR_DEFAULT_IGNORABLE_CODE_POINT) | ( http://www.icu-project.org/apiref/icu4c/uchar_8h.html#25c5c820d4141e4099acc15ca83572a5 ). Firefox does not use the set (as it is) but defines its own (derived from the default ignorable). Some characters with default ignorable property cannot be ignored but as long as they're in 'simple script' code path, I think it's ok. mitz: Can you comment on the overall approach. I'll try to come up with better patch using Jungshik's suggestions, but I would like to know how best to integrate the new code. I think it would be best to keep character-to-glyph mapping separate from glyph-to-width mapping, and I think the goals of your changes could be achieved by creating separate FontData objects, such as one that has a zero-width invisible glyph (the underlying platform font can be any font with a space glyph; the width map changed to give it zero width) which could be used for characters that should render as zero-width space; and such as one that shares the same underlying platform font with another FontData, has a 'monospace' bit set, has no width map, which could be used for all CJK characters in the font. Comment on attachment 22591 [details]
Patch
r- to remove this from the queue while you consider my comments :)
... trying to figure out how this will work. So, in your view, these FontData instances be would be sub-classes of SimpleFontData (say, ZeroWidthFontData and CJKFontData), each offering a different widthForGlyph implementation? Then, in GlyphPage::fill, we supply either an instance of SimpleFontData (for most chars), ZeroWidthFontData (for ZWS and alike), or CJKFontData (for CJK chars). WDYT? I think subclassing and making widthForGlyph() virtual might be bad for performance, but if you have a patch that does that I can try and measure it. Created attachment 23174 [details]
Updated Patch
As discussed on #webkit w/mitz, here's an updated patch. There are opportunities for optimization, but I thought I'd first get the basics right.
Comment on attachment 23174 [details]
Updated Patch
+// static
Why is this here?
I think this patch tries to do too many things at once. This bug is related to the display of zero width characters, and thus, the patch should try and address only that issue. Please file another bug about the potential CJK optimization.
(In reply to comment #13) > (From update of attachment 23174 [details] [edit]) > +// static > Why is this here? it seems to be part of the Google C++ Style guidelines (thus what most Google engineers are familiar with). I don't see it listed in the external c++ style guidelines though. It will take a while for many of us to learn the differences between WebKit/Google style. :) (In reply to comment #14) > (In reply to comment #13) > > (From update of attachment 23174 [details] [edit] [edit]) > > +// static > > Why is this here? > > it seems to be part of the Google C++ Style guidelines (thus what most Google > engineers are familiar with). I don't see it listed in the external c++ style > guidelines though. It will take a while for many of us to learn the > differences between WebKit/Google style. :) Yes, I've been kinda sorta trying to merge both styles. I will split up CJK opt and resubmit. Created attachment 23200 [details]
Updated: ZWS fix only
Removed CJK optimization.
I am confused. From the initial description of the bug it sounded as if this is only a problem in the system fallback case, so I expected the substitution to happen only in the end of Font::glyphDataForCharacter(). What am I missing? The more I look at the bug description, and the more testing I do in Safari on Windows, the less I understand how this bug works, even in the system fallback case. The only exception is U+FFFC (OBJECT REPLACEMENT CHARACTER) which is not treatAsZeroWidthSpace(), even though it is ZWS-ed out in GlyphPageTreeNode::initializePage. In all other cases, I expect a missing glyph (square) to be rendered only if no font on the system has a glyph for ZWS. As far as I can tell, Times New Roman, plain ol' Arial, and other fonts on Windows have correct glyphs for ZWS. Well, I dug some more, using a pristine (font-wise) XP install and here's what happens for the ZWS chars for a standard Win font, using Times New Roman as an example. wkGetGlyphs returns 0 for the 0x200B char, so the glyph doesn't exist in the font, but FontCache::getFontDataForCharacters is able to scare up the glyph using font-linking (the glyph is borrowed from Lucida Sans Unicode). In the case of 0xFFFC, the same path is followed, except no glyph is found using font-linking, so we fall back into Uniscribe, drawing the string into a metafile. Unlucky for us, this succeeds with (woot!) "Times New Roman", which brings back the original font, which in turn draws a square in place of the character. So, the proposed patch, IMHO (I am not a fonts expert, please correct me if I am wrong) eliminates the need to dip into the IMLang linking business and avoids creating metafile objects for drawing the ORC char. I don't have a strong opinion about moving the SimpleFontData substitution logic to Font::glyphDataForCharacter. My thinking was to initialize the FontCache with the substituted objects to begin with, so that there's no additional checking while drawing each character. (In reply to comment #19) > In the case of 0xFFFC, the same path is followed, except no glyph is found > using font-linking [...] But we should never be asking for a glyph for U+FFFC. We should -- and we currently don't -- ask for a glyph for zero-width space instead. The reason we don't is that even though GlyphPageTreeNode::initializePage() is aware of this substitution, Font::treatAsZeroWidthSpace() is not, and the latter is what's used in the "system fallback" case near the end of glyphDataForCharacter(). Just adding U+FFFC to treatAsZeroWidthSpace() OR would make the problem go away, I think. Created attachment 23276 [details]
With just the 0xFFFC check
Shedding another pound of code after healthy mitzpettel's workout. This patch leaves finding ZWS glyphs up to IMLang font-linking, instead of using a separate FontData instance.
So, the only change is making Font:treatAsZeroWidthSpace(c) react to 0xFFFC properly.
Comment on attachment 23276 [details]
With just the 0xFFFC check
+ Adds an extra check for Object Replacement Character (0xFFFF) to address
Typo: should say FFFC.
Unfortunately, a layout test does not seem possible because our test configurations have ZWS mapped to a glyph even in non-fallback fonts, so assuming no existing layout tests break, r=me!
Committed <http://trac.webkit.org/changeset/36276>. |