Bug 136971 - Implement 'vhea', 'vmtx', and 'kern' tables in SVG -> OTF converter
Summary: Implement 'vhea', 'vmtx', and 'kern' tables in SVG -> OTF converter
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Myles C. Maxfield
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-09-19 18:57 PDT by Myles C. Maxfield
Modified: 2014-09-22 15:08 PDT (History)
6 users (show)

See Also:


Attachments
Patch (37.23 KB, patch)
2014-09-19 19:29 PDT, Myles C. Maxfield
darin: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Myles C. Maxfield 2014-09-19 18:57:31 PDT
Implement 'vhea', 'vmtx', and 'kern' tables in SVG -> OTF converter
Comment 1 Myles C. Maxfield 2014-09-19 19:29:49 PDT
Created attachment 238404 [details]
Patch
Comment 2 Darin Adler 2014-09-21 10:49:16 PDT
Comment on attachment 238404 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=238404&action=review

I’m not sure that all this clamping is right.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:101
> +            return (static_cast<uint32_t>(glyph1) << 16 | glyph2) < (static_cast<uint32_t>(other.glyph1) << 16 | other.glyph2);

The static_cast here are not needed unless we are compiling for a platform where “int” and “unsigned” are 16-bit, and that won’t be happening. But also, there’s no need to combine these glyphs that way. Lets just write this:

    return glyph1 < other.glyph1 || (glyph1 == other.glyph1 && glyph2 < other.glyph2);

Also, it would be cleaner to have this be a separate compareGlyphs function rather than having KerningData implement a < operator that doesn’t consider all of the data.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:142
> +    HashMap<Codepoint, Glyph> m_codepointToIndexMap; // FIXME: There might be many glyphs that map to a single codepoint

Missing period.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:338
> +        auto& missingGlyphAdvanceAttribute = m_missingGlyphElement->fastGetAttribute(SVGNames::horiz_adv_xAttr);
> +        int value = missingGlyphAdvanceAttribute.toInt(&ok);

No need for the local variable here. Just write this in one line.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:340
> +            averageAdvance = clampTo<int16_t, int>(value);

Should just be clampTo<int16_t>, it’s not helpful to specify the second time.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:547
> +    int16_t defaultVerticalOriginY = clampTo<int16_t>(m_fontElement.fastGetAttribute(SVGNames::vert_origin_yAttr).toInt(&ok));
> +    if (!ok && m_missingGlyphElement)

Do we really need to check "ok"? Is zero a valid value?

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:548
> +        defaultVerticalOriginY = clampTo<int16_t>(m_missingGlyphElement->fastGetAttribute(SVGNames::vert_origin_yAttr).toInt(&ok));

No reason to pass &ok here since we never look at it afterward.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:575
> +    write16(result, clampTo<int16_t>(-static_cast<long>(m_unitsPerEm / 2))); // Vertical typographic descender

The choice of the type "long" here is unusual. I could see using int or int32_t, but long doesn’t seem better than either of those two types.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:580
> +    write16(result, clampTo<int16_t, float>(m_advanceHeightMax));
> +    write16(result, clampTo<int16_t, float>(m_unitsPerEm - m_boundingBox.maxY())); // Minimum top side bearing
> +    write16(result, clampTo<int16_t, float>(m_boundingBox.y())); // Minimum bottom side bearing
> +    write16(result, clampTo<int16_t, float>(m_unitsPerEm - m_boundingBox.y())); // Y maximum extent

No need to specify ", float" on all of these clampTo function calls.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:593
> +    for (const auto& glyph : m_glyphs) {

Just auto& is fine.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:595
> +        write16(result, clampTo<uint16_t, float>(glyph.verticalAdvance));
> +        write16(result, clampTo<int16_t, float>(m_unitsPerEm - glyph.boundingBox.maxY())); // top side bearing

No need to specify ", float" on all of these clampTo function calls.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:601
> +    uint16_t roundedNumKerningPairs = roundDownToPowerOfTwo(kerningData.size());

I really don’t understand why rounding *down* is OK.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:603
> +    uint16_t searchRange = roundedNumKerningPairs * 6;
> +    uint16_t rangeShift = (kerningData.size() - roundedNumKerningPairs) * 6;

No idea where this magic number 6 comes from.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:606
> +    uint16_t entrySelector = 0;
> +    while (roundedNumKerningPairs >>= 1)
> +        ++entrySelector;

I think it would be nicer to have a helper function for this rather than writing out the loop in this fashion.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:616
> +    for (auto unicodeRange : unicodeRanges) {

Should be auto&, not auto.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:619
> +            if (codepoint > std::numeric_limits<Codepoint>::max())
> +                continue;

Why is this check needed? Is there some danger that the find call below will find something when passed in a huge number? It’s probably not a good idea to truncate a code point here, but I don’t understand how such bad values could get into UnicodeRanges in the first place. Also, 0x0000 and 0xFFFF will both cause trouble below in the has table code but this doesn’t disallow either of those.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:620
> +            auto indexIter = m_codepointToIndexMap.find(codepoint);

I think the word “iterator” would be better than word and a half “index iter”.

What prevents us from passing 0 or the deleted value (usually -1) here?

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:622
> +                glyphSet.add(indexIter->value);

This is a really inefficient data structure for this kind of thing. Representing every single value in a range as a separate entry in a HashSet is super-expensive for large ranges.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:630
> +        // FIXME: Only support BMP for now

Why? The code to handle any UTF-16 sequence is simple.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:631
> +        if (codepoint.length() == 1 && codepoint.length() == 1 && codepoint.at(0) <= std::numeric_limits<Codepoint>::max()) {

This says codepoint.length() twice.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:632
> +            auto indexIter = m_codepointToIndexMap.find(codepoint.at(0));

What prevents us from trying to use 0x0000 or 0xFFFF here?

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:639
> +void SVGToOTFFontConverter::addGlyphNames(const HashSet<String>& glyphNames, HashSet<uint16_t>& glyphSet) const

I’m not sure why we keep the glyph names in a set and then later put them into a map. It seems wasteful to use indexed data types for both. I suggest using just a Vector for the names in the kerningPair object.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:641
> +    for (auto glyphName : glyphNames) {

This should be auto&.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:668
> +            for (auto glyph1 : glyphSet1) {
> +                for (auto glyph2 : glyphSet2)

Should be auto& instead of auto.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:690
> +        size_t sizeOfKerningDataTable = 14 + 6 * kerningData.size();

A little strange to put this into a size_t and then write it out as a 16-bit value without any range checking. I think the code overall is not consistent about things like clamping and range checking.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:693
> +            sizeOfKerningDataTable = 0;

Seems like this should be 14.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:714
> +    // Table 2

This copy and paste code isn’t good. There should be a helper function rather than code that we just repeat twice.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:740
> +    // There are two competing specs for the 'kern' table: one that only Apple supports and one that everyone supports. Apple's font
> +    // parser claims to support both; however, it requires some padding bytes at the end if you use the second format. <rdar://problem/18401901>

I don’t think this should have all this commentary. Comment should say “Work around a bug in Apple's font parser by adding some padding bytes.” The commentary about two specs for 'kern' table doesn’t seem relevant here.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:964
> +    for (const auto& glyphElement : childrenOfType<SVGGlyphElement>(m_fontElement)) {

Should just be auto&, not const auto&.
Comment 3 Myles C. Maxfield 2014-09-22 14:52:37 PDT
Comment on attachment 238404 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=238404&action=review

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:101
>> +            return (static_cast<uint32_t>(glyph1) << 16 | glyph2) < (static_cast<uint32_t>(other.glyph1) << 16 | other.glyph2);
> 
> The static_cast here are not needed unless we are compiling for a platform where “int” and “unsigned” are 16-bit, and that won’t be happening. But also, there’s no need to combine these glyphs that way. Lets just write this:
> 
>     return glyph1 < other.glyph1 || (glyph1 == other.glyph1 && glyph2 < other.glyph2);
> 
> Also, it would be cleaner to have this be a separate compareGlyphs function rather than having KerningData implement a < operator that doesn’t consider all of the data.

I had no idea that shifting performed integral promotions of the left argument! I had to read the C++ spec to figure out what you meant by that first statement.

By the way, the original code was written like that because the spec says: "The left and right halves of the kerning pair make an unsigned 32-bit number, which is then used to order the kerning pairs numerically." Your way is more readable, though.

Done.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:142
>> +    HashMap<Codepoint, Glyph> m_codepointToIndexMap; // FIXME: There might be many glyphs that map to a single codepoint
> 
> Missing period.

Done.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:338
>> +        int value = missingGlyphAdvanceAttribute.toInt(&ok);
> 
> No need for the local variable here. Just write this in one line.

Too bad you ended up mentioning a lot of these things twice. In my defense, I wrote this before I read your patch.

Done.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:340
>> +            averageAdvance = clampTo<int16_t, int>(value);
> 
> Should just be clampTo<int16_t>, it’s not helpful to specify the second time.

Done.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:547
>> +    if (!ok && m_missingGlyphElement)
> 
> Do we really need to check "ok"? Is zero a valid value?

Yep.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:548
>> +        defaultVerticalOriginY = clampTo<int16_t>(m_missingGlyphElement->fastGetAttribute(SVGNames::vert_origin_yAttr).toInt(&ok));
> 
> No reason to pass &ok here since we never look at it afterward.

Done.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:575
>> +    write16(result, clampTo<int16_t>(-static_cast<long>(m_unitsPerEm / 2))); // Vertical typographic descender
> 
> The choice of the type "long" here is unusual. I could see using int or int32_t, but long doesn’t seem better than either of those two types.

I was using long for the case of std::numeric_limits<unsigned>::max() (which can't be negated properly) but the "/ 2" allows a int32_t work just fine, as you said. Now that I think about it, long wouldn't have worked on 32-bit machines...

Done.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:580
>> +    write16(result, clampTo<int16_t, float>(m_unitsPerEm - m_boundingBox.y())); // Y maximum extent
> 
> No need to specify ", float" on all of these clampTo function calls.

Done. I did this at all the call sites.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:593
>> +    for (const auto& glyph : m_glyphs) {
> 
> Just auto& is fine.

Done. I believe I just fixed this throughout the whole file.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:595
>> +        write16(result, clampTo<int16_t, float>(m_unitsPerEm - glyph.boundingBox.maxY())); // top side bearing
> 
> No need to specify ", float" on all of these clampTo function calls.

Done.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:601
>> +    uint16_t roundedNumKerningPairs = roundDownToPowerOfTwo(kerningData.size());
> 
> I really don’t understand why rounding *down* is OK.

These formulas come directly from the spec. I'll add some comments quoting bits of the spec.

Done.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:603
>> +    uint16_t rangeShift = (kerningData.size() - roundedNumKerningPairs) * 6;
> 
> No idea where this magic number 6 comes from.

Comments added. Done.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:606
>> +        ++entrySelector;
> 
> I think it would be nicer to have a helper function for this rather than writing out the loop in this fashion.

Done.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:616
>> +    for (auto unicodeRange : unicodeRanges) {
> 
> Should be auto&, not auto.

Done.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:619
>> +                continue;
> 
> Why is this check needed? Is there some danger that the find call below will find something when passed in a huge number? It’s probably not a good idea to truncate a code point here, but I don’t understand how such bad values could get into UnicodeRanges in the first place. Also, 0x0000 and 0xFFFF will both cause trouble below in the has table code but this doesn’t disallow either of those.

It's just that the key in m_codepointToIndexMap is a Codepoint, which is 16 bits, but UnicodeRange contains unsigned values. If the UnicodeRange contains 0x10001, I don't want the find() to look for 0x1.

UnicodeRanges get created from the "unicode" property in the SVG file's <glyph> element, so it definitely can contain values > 0xFFFF.

I'll guard against 0 and 0xFFFF as well.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:620
>> +            auto indexIter = m_codepointToIndexMap.find(codepoint);
> 
> I think the word “iterator” would be better than word and a half “index iter”.
> 
> What prevents us from passing 0 or the deleted value (usually -1) here?

Done.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:622
>> +                glyphSet.add(indexIter->value);
> 
> This is a really inefficient data structure for this kind of thing. Representing every single value in a range as a separate entry in a HashSet is super-expensive for large ranges.

You're right, and I knew that going in. My thought was that it didn't matter because I had to output glyph-by-glyph anyway and that I haven't seen ranges used in any of the fonts I've looked at. I think this is something that I can fix up in a subsequent patch. https://bugs.webkit.org/show_bug.cgi?id=137010

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:630
>> +        // FIXME: Only support BMP for now
> 
> Why? The code to handle any UTF-16 sequence is simple.

Done.

There are a few places in this file where non-BMP support needs to be added. This one is simple enough that I did it right now.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:631
>> +        if (codepoint.length() == 1 && codepoint.length() == 1 && codepoint.at(0) <= std::numeric_limits<Codepoint>::max()) {
> 
> This says codepoint.length() twice.

Well, that's embarrassing. "codepoint.length() == 1 && codepoint.length() == 1"

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:632
>> +            auto indexIter = m_codepointToIndexMap.find(codepoint.at(0));
> 
> What prevents us from trying to use 0x0000 or 0xFFFF here?

Done.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:639
>> +void SVGToOTFFontConverter::addGlyphNames(const HashSet<String>& glyphNames, HashSet<uint16_t>& glyphSet) const
> 
> I’m not sure why we keep the glyph names in a set and then later put them into a map. It seems wasteful to use indexed data types for both. I suggest using just a Vector for the names in the kerningPair object.

Seems reasonable to me, though that class was not added by this patch. I've made a bug regarding cleaning this up: https://bugs.webkit.org/show_bug.cgi?id=137011

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:641
>> +    for (auto glyphName : glyphNames) {
> 
> This should be auto&.

Done.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:668
>> +                for (auto glyph2 : glyphSet2)
> 
> Should be auto& instead of auto.

Done.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:690
>> +        size_t sizeOfKerningDataTable = 14 + 6 * kerningData.size();
> 
> A little strange to put this into a size_t and then write it out as a 16-bit value without any range checking. I think the code overall is not consistent about things like clamping and range checking.

Isn't that what line 691 is?

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:693
>> +            sizeOfKerningDataTable = 0;
> 
> Seems like this should be 14.

Good catch. Done.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:714
>> +    // Table 2
> 
> This copy and paste code isn’t good. There should be a helper function rather than code that we just repeat twice.

Done.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:740
>> +    // parser claims to support both; however, it requires some padding bytes at the end if you use the second format. <rdar://problem/18401901>
> 
> I don’t think this should have all this commentary. Comment should say “Work around a bug in Apple's font parser by adding some padding bytes.” The commentary about two specs for 'kern' table doesn’t seem relevant here.

Done.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:964
>> +    for (const auto& glyphElement : childrenOfType<SVGGlyphElement>(m_fontElement)) {
> 
> Should just be auto&, not const auto&.

Done.
Comment 4 Myles C. Maxfield 2014-09-22 15:08:44 PDT
http://trac.webkit.org/changeset/173852