Bug 136688 - Initial implementation of SVG to OTF font converter
Summary: Initial implementation of SVG to OTF font converter
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: SVG (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Myles C. Maxfield
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-09-09 16:50 PDT by Myles C. Maxfield
Modified: 2014-09-16 11:09 PDT (History)
14 users (show)

See Also:


Attachments
Patch (44.35 KB, patch)
2014-09-09 17:05 PDT, Myles C. Maxfield
darin: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Myles C. Maxfield 2014-09-09 16:50:42 PDT
Initial implementation of SVG to OTF font converter
Comment 1 Myles C. Maxfield 2014-09-09 17:05:45 PDT
Created attachment 237873 [details]
Patch
Comment 2 Philip Rogers 2014-09-09 21:54:31 PDT
Wow, this is amazing work!

I worry that SVG fonts may not be worth this complexity though. SVG fonts are not supported in IE or Firefox and they were recently removed from Chromium (https://groups.google.com/a/chromium.org/d/msg/blink-dev/pYbbUcYvlYY/LQvFvM8KZZEJ). SVG fonts were also moved out of SVG2 (http://lists.w3.org/Archives/Public/www-svg/2013Jan/0030.html) and into a module.
Comment 3 Tim Horton 2014-09-09 22:30:35 PDT
Walled garden content might sway the equation slightly in a different direction for us, resulting in this neat experiment.
Comment 4 Myles C. Maxfield 2014-09-10 08:24:34 PDT
After running the benchmark on ToT for 21 hours, this is now at least a 500x speedup
Comment 5 Darin Adler 2014-09-10 09:07:10 PDT
Comment on attachment 237873 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=237873&action=review

Seems OK. I spotted quite a few mistakes.

This mixes types: Using uint16 as well as uint16_t.

This replicates parsing with subtly different rules from the rules used in the actual SVG font parsing code. All the hardcoded strings here and such might not be the best way to do the parsing.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:40
> +static inline void writeUInt32(Vector<char>& vector, uint32_t value)

I would just call this write32; could overload for different types (unsigned vs. signed).

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:43
> +    uint32_t bigEndianValue = htonl(value);
> +    vector.append(reinterpret_cast_ptr<char*>(&bigEndianValue), sizeof(bigEndianValue));

I think it would be better to just write:

    vector.append(value >> 24);
    vector.append(value >> 16);
    vector.append(value >> 8);
    vector.append(value);

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:46
> +static inline void writeUInt16(Vector<char>& vector, uint16_t value)

I would just call this write16; could overload for different types (unsigned vs. signed).

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:49
> +    uint16_t bigEndianValue = htons(value);
> +    vector.append(reinterpret_cast_ptr<char*>(&bigEndianValue), sizeof(bigEndianValue));

I think it would be better to just write:

    vector.append(value >> 8);
    vector.append(value);

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:60
> +    // Use unsigned chars to avoid wraparound problems
> +    unsigned char* ptr = reinterpret_cast_ptr<unsigned char*>(vector.data() + location);
> +    *(ptr++) = static_cast<unsigned char>(value >> 24);
> +    *(ptr++) = static_cast<unsigned char>(value >> 16);
> +    *(ptr++) = static_cast<unsigned char>(value >> 8);
> +    *(ptr++) = static_cast<unsigned char>(value >> 0);

The use of unsigned char here isn’t necessary. The function would work equally well if we didn’t do the reinterpret_cast. Also no need to do auto-increments instead of indexing.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:100
> +    Vector<char> createCMAPTable();
> +    Vector<char> createHEADTable();
> +    Vector<char> createHHEATable();
> +    Vector<char> createHMTXTable();
> +    Vector<char> createMAXPTable();
> +    Vector<char> createNAMETable();
> +    Vector<char> createOS2Table();
> +    Vector<char> createPOSTTable();
> +    Vector<char> createCFFTable();
> +    Vector<char> createVORGTable();

Seems to me that creating each table in a separate vector requires excessive memory allocation. An efficient design would be to have functions that append to an existing vector, like the write helper functions above, or use some kind of streaming abstraction. We’d first leave room for the offsets, then lay down the tables and write out the offsets after the fact. Rather than building all the tables in separate buffers just so we can get their sizes. Should be pretty easy to write it that way instead.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:181
> +    uint16 unitsPerEm = m_fontFaceElement ? m_fontFaceElement->unitsPerEm() : 0;

Elsewhere you use types like uint16_t, but here you use uint16, a less portable alternative.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:213
> +    uint16 ascent = std::numeric_limits<int16_t>::max();
> +    uint16 descent = std::numeric_limits<int16_t>::max();

Strange to use an unsigned integer, but default to signed 16-bit maximum.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:285
> +    for (unsigned i = 0; i < m_fontFamily.length(); ++i)
> +        writeUInt16(result, m_fontFamily[i]);

Should use a modern for loop.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:295
> +    const auto& attribute = m_fontElement.getAttribute("horiz-adv-x");

auto& would be fine; const adds nothing here.

fastGetAttribute. Also good

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:299
> +    int value = attribute.toInt(&ok);
> +    if (ok)
> +        averageAdvance = value;

This will overflow if the int is too big for a 16-bit integer.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:372
> +    if (string.isEmpty())
> +        return true;

No need to optimize this case. Just remove this code.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:374
> +    if (!string.is8Bit())
> +        return false;

This doesn’t make sense. We should not be checking this. It’s possible for a 16-bit string to have all value characters but just happen to be stored in 16-bit format. Just remove this incorrect code.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:392
> +    String fontName = String();

"= String()" is an no-op. We should remove it.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:399
> +    ASSERT(fontName.is8Bit());

This is wrong. It’s not right to do this.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:404
> +    result.append(fontName.characters8(), fontName.length());

We should write a function that knowns how to append all the characters one at a time. It’s not right to use characters8. Even if the string has all Latin-1 characters it’s not guaranteed to be stored 8-bit.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:408
> +        const auto& potentialWeight = m_fontFaceElement->fastGetAttribute(SVGNames::font_weightAttr);

The const adds nothing here.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:471
> +        ASSERT(weight.is8Bit());

Same problem with incorrect assertion here.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:476
> +    result.append(fontName.characters8(), fontName.length());
> +    result.append(weight.characters8(), weight.length());

Same problem with incorrect use of characters8 here.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:492
> +    for (const auto& glyph : m_glyphs) {

I would omit the const here.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:496
> +    for (const auto& glyph : m_glyphs)

I would omit the const here.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:520
> +            const auto& attribute = m_glyphs[i].glyphElement->fastGetAttribute(SVGNames::vert_origin_yAttr);

const here does nothing

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:522
> +                if (int16_t verticalOriginY = atoi(reinterpret_cast_ptr<const char*>(attribute.characters8())))

Incorrect to use atoi here. Incorrect to check is8Bit and use characters8 here. Need to use something more like the toInt code path above.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:531
> +        writeUInt16(result, p.first);
> +        writeUInt16(result, p.second);

I suggest just appending these as we go, and then going back to patch in the size, rather than building up a Vector<pair> just so we can count the sizes.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:538
> +    int raw = number * pow(2, 16);

Math here will be done as double rather than flow, since pow returns a double. Is that what we want?

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:554
> +        // FIXME: We probably want the initial moveto to use horiz-origin-x and horiz-origin-y... unless we're vertical...

Not sure what the "..." mean here.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:696
> +    for (const auto& glyph : childrenOfType<SVGGlyphElement>(m_fontElement)) {

The const here adds nothing.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:697
> +        const auto& unicodeAttribute = glyph.getAttribute("unicode");

Should use fastGetAttribute. The const here adds nothing.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:701
> +            const auto& advanceAttribute = glyph.fastGetAttribute(SVGNames::horiz_adv_xAttr);

The const here adds nothing.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:723
> +    std::sort(m_glyphs.begin(), m_glyphs.end(), &compareGlyphData);

I suggest a lambda here rather than a static member function.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:727
> +        const auto& fontWeightAttribute = m_fontFaceElement->fastGetAttribute(SVGNames::font_weightAttr);

The const here isn’t helpful.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:729
> +        String(fontWeightAttribute).split(" ", split);

Instead of:

    String(fontWeightAttribute)

Should use:

    fontWeightAttribute.string()

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:730
> +        for (const auto& s : split) {

Please don't use a single character here.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:759
> +    const uint32_t* ptr = reinterpret_cast_ptr<const uint32_t*>(table.data());
> +    const uint32_t* const endPtr = reinterpret_cast_ptr<const uint32_t*>(table.data() + table.size());
> +    for (; ptr < endPtr; ++ptr)
> +        sum += *ptr;

This is endian-specific checksumming, and it’s not going to work right on both little-endian and big-endian systems. Since we go to the trouble of doing big-endian appending, it’s clear that the checksumming is well defined as either little or big-endian.

It’s easy to write code that does that correctly, buildin each 32-bit number out of four characters we extract with unsigned shifting, rather than doing the reinterpret_cast style. If big-endian it would be:

    for (size_t i = 0; i < table.size(); i += 4) {
        sum += (static_cast<unsigned char>(table[i]) << 24)
            | (static_cast<unsigned char>(table[i + 1]) << 16)
            | (static_cast<unsigned char>(table[i + 2]) << 8)
            | static_cast<unsigned char>(table[i + 3]);
    }

If little-endian then:

    for (size_t i = 0; i < table.size(); i += 4) {
        sum += (static_cast<unsigned char>(table[i + 3]) << 24)
            | (static_cast<unsigned char>(table[i + 2]) << 16)
            | (static_cast<unsigned char>(table[i + 1]) << 8)
            | static_cast<unsigned char>(table[i]);
    }

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:766
> +    uint32_t size = table.size();

I don’t think size is a good name for this; should give a different name.

> Source/WebCore/svg/SVGToOTFFontConversion.cpp:768
> +        table.append(static_cast<char>(0));

Why is this static_cast needed? It should compile fine without this.

> Source/WebCore/svg/SVGToOTFFontConversion.h:29
> +#include <WTF/Vector.h>

Needs to be <wtf/Vector.h> for case sensitive file systems.
Comment 6 Myles C. Maxfield 2014-09-10 19:29:45 PDT
Comment on attachment 237873 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=237873&action=review

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:40
>> +static inline void writeUInt32(Vector<char>& vector, uint32_t value)
> 
> I would just call this write32; could overload for different types (unsigned vs. signed).

Done.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:43
>> +    vector.append(reinterpret_cast_ptr<char*>(&bigEndianValue), sizeof(bigEndianValue));
> 
> I think it would be better to just write:
> 
>     vector.append(value >> 24);
>     vector.append(value >> 16);
>     vector.append(value >> 8);
>     vector.append(value);

Done.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:46
>> +static inline void writeUInt16(Vector<char>& vector, uint16_t value)
> 
> I would just call this write16; could overload for different types (unsigned vs. signed).

Done.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:49
>> +    vector.append(reinterpret_cast_ptr<char*>(&bigEndianValue), sizeof(bigEndianValue));
> 
> I think it would be better to just write:
> 
>     vector.append(value >> 8);
>     vector.append(value);

Done.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:60
>> +    *(ptr++) = static_cast<unsigned char>(value >> 0);
> 
> The use of unsigned char here isn’t necessary. The function would work equally well if we didn’t do the reinterpret_cast. Also no need to do auto-increments instead of indexing.

Done.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:100
>> +    Vector<char> createVORGTable();
> 
> Seems to me that creating each table in a separate vector requires excessive memory allocation. An efficient design would be to have functions that append to an existing vector, like the write helper functions above, or use some kind of streaming abstraction. We’d first leave room for the offsets, then lay down the tables and write out the offsets after the fact. Rather than building all the tables in separate buffers just so we can get their sizes. Should be pretty easy to write it that way instead.

I did it to simplify the code that writes the directory table entries, however, I think you're right. I'll do this tomorrow.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:181
>> +    uint16 unitsPerEm = m_fontFaceElement ? m_fontFaceElement->unitsPerEm() : 0;
> 
> Elsewhere you use types like uint16_t, but here you use uint16, a less portable alternative.

Done.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:213
>> +    uint16 descent = std::numeric_limits<int16_t>::max();
> 
> Strange to use an unsigned integer, but default to signed 16-bit maximum.

Typo; done.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:285
>> +        writeUInt16(result, m_fontFamily[i]);
> 
> Should use a modern for loop.

String doesn't support the modern for loop syntax (for accessing characters). Perhaps this support could be added in a later patch.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:295
>> +    const auto& attribute = m_fontElement.getAttribute("horiz-adv-x");
> 
> auto& would be fine; const adds nothing here.
> 
> fastGetAttribute. Also good

Done.

Done.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:299
>> +        averageAdvance = value;
> 
> This will overflow if the int is too big for a 16-bit integer.

Done.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:372
>> +        return true;
> 
> No need to optimize this case. Just remove this code.

Done.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:374
>> +        return false;
> 
> This doesn’t make sense. We should not be checking this. It’s possible for a 16-bit string to have all value characters but just happen to be stored in 16-bit format. Just remove this incorrect code.

Yeah, you're right. Done.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:392
>> +    String fontName = String();
> 
> "= String()" is an no-op. We should remove it.

Done.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:399
>> +    ASSERT(fontName.is8Bit());
> 
> This is wrong. It’s not right to do this.

Done.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:404
>> +    result.append(fontName.characters8(), fontName.length());
> 
> We should write a function that knowns how to append all the characters one at a time. It’s not right to use characters8. Even if the string has all Latin-1 characters it’s not guaranteed to be stored 8-bit.

Done.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:408
>> +        const auto& potentialWeight = m_fontFaceElement->fastGetAttribute(SVGNames::font_weightAttr);
> 
> The const adds nothing here.

Done.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:471
>> +        ASSERT(weight.is8Bit());
> 
> Same problem with incorrect assertion here.

Done.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:476
>> +    result.append(weight.characters8(), weight.length());
> 
> Same problem with incorrect use of characters8 here.

Done.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:492
>> +    for (const auto& glyph : m_glyphs) {
> 
> I would omit the const here.

Done.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:496
>> +    for (const auto& glyph : m_glyphs)
> 
> I would omit the const here.

Done.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:520
>> +            const auto& attribute = m_glyphs[i].glyphElement->fastGetAttribute(SVGNames::vert_origin_yAttr);
> 
> const here does nothing

Done.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:522
>> +                if (int16_t verticalOriginY = atoi(reinterpret_cast_ptr<const char*>(attribute.characters8())))
> 
> Incorrect to use atoi here. Incorrect to check is8Bit and use characters8 here. Need to use something more like the toInt code path above.

Done.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:538
>> +    int raw = number * pow(2, 16);
> 
> Math here will be done as double rather than flow, since pow returns a double. Is that what we want?

Nope. Done.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:554
>> +        // FIXME: We probably want the initial moveto to use horiz-origin-x and horiz-origin-y... unless we're vertical...
> 
> Not sure what the "..." mean here.

Removed.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:696
>> +    for (const auto& glyph : childrenOfType<SVGGlyphElement>(m_fontElement)) {
> 
> The const here adds nothing.

Done.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:697
>> +        const auto& unicodeAttribute = glyph.getAttribute("unicode");
> 
> Should use fastGetAttribute. The const here adds nothing.

Done.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:701
>> +            const auto& advanceAttribute = glyph.fastGetAttribute(SVGNames::horiz_adv_xAttr);
> 
> The const here adds nothing.

Done.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:723
>> +    std::sort(m_glyphs.begin(), m_glyphs.end(), &compareGlyphData);
> 
> I suggest a lambda here rather than a static member function.

Done.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:727
>> +        const auto& fontWeightAttribute = m_fontFaceElement->fastGetAttribute(SVGNames::font_weightAttr);
> 
> The const here isn’t helpful.

Done.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:729
>> +        String(fontWeightAttribute).split(" ", split);
> 
> Instead of:
> 
>     String(fontWeightAttribute)
> 
> Should use:
> 
>     fontWeightAttribute.string()

Done.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:730
>> +        for (const auto& s : split) {
> 
> Please don't use a single character here.

Done.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:759
>> +        sum += *ptr;
> 
> This is endian-specific checksumming, and it’s not going to work right on both little-endian and big-endian systems. Since we go to the trouble of doing big-endian appending, it’s clear that the checksumming is well defined as either little or big-endian.
> 
> It’s easy to write code that does that correctly, buildin each 32-bit number out of four characters we extract with unsigned shifting, rather than doing the reinterpret_cast style. If big-endian it would be:
> 
>     for (size_t i = 0; i < table.size(); i += 4) {
>         sum += (static_cast<unsigned char>(table[i]) << 24)
>             | (static_cast<unsigned char>(table[i + 1]) << 16)
>             | (static_cast<unsigned char>(table[i + 2]) << 8)
>             | static_cast<unsigned char>(table[i + 3]);
>     }
> 
> If little-endian then:
> 
>     for (size_t i = 0; i < table.size(); i += 4) {
>         sum += (static_cast<unsigned char>(table[i + 3]) << 24)
>             | (static_cast<unsigned char>(table[i + 2]) << 16)
>             | (static_cast<unsigned char>(table[i + 1]) << 8)
>             | static_cast<unsigned char>(table[i]);
>     }

Done.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:766
>> +    uint32_t size = table.size();
> 
> I don’t think size is a good name for this; should give a different name.

Done.

>> Source/WebCore/svg/SVGToOTFFontConversion.cpp:768
>> +        table.append(static_cast<char>(0));
> 
> Why is this static_cast needed? It should compile fine without this.

Done.

>> Source/WebCore/svg/SVGToOTFFontConversion.h:29
>> +#include <WTF/Vector.h>
> 
> Needs to be <wtf/Vector.h> for case sensitive file systems.

Done.
Comment 7 Myles C. Maxfield 2014-09-16 11:09:36 PDT
http://trac.webkit.org/changeset/173521