Bug 25068

Summary: [Chromium] Complex text support for Chromium Linux
Product: WebKit Reporter: Adam Langley <agl>
Component: Layout and RenderingAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: jshin, playmobil
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: PC   
OS: Linux   
Attachments:
Description Flags
patch
eric: review-
patch
none
/tmp/patch
levin: review-
patch eric: review-

Description Adam Langley 2009-04-06 18:12:12 PDT
This patch enables complex text support using the Harfbuzz library and Skia.
Comment 1 Adam Langley 2009-04-06 18:12:57 PDT
Created attachment 29298 [details]
patch
Comment 2 Eric Seidel (no email) 2009-05-14 22:12:30 PDT
Comment on attachment 29298 [details]
patch

Various style issues.

1.  "helper" is not a very discriptive name. :)  Maybe TextRunWalkerHarfbuzz  TexRunWalker TextRunIterator?  Since the comment seems to imply its job is to walk text runs. :)

Lots of foo_bar style violations.  fooBar in WK

It seems HarfbuzzHelper (renamed?) could have its own header (and cpp?) file.  It's big enough.

Sad that HB_ShaperItem can't use OwnPtr. :(

Style:
 264     bool NextRTLScriptRun()

 307     ssize_t m_iter;  // indexes the next word in |m_run|
We generally try to use full words like m_indexOfNextWord?

 310     unsigned m_width;  // width (in px) of the current script run
m_widthInPx;  m_pixelWidth;

It seems all these things which pertain to m_run should be grouped with it, no? (and least in ordering, if not also in structure)

I assume this is in no way temporary?
 321     const FontPlatformData& font = fontDataForCharacters(run.characters(), run.length())->fontDataForCharacter(' ')->platformData();

You probably want to grab off gc->platformContext() since you use it so much:
 323     SkCanvas* canvas = gc->platformContext()->canvas();

I've not seen other parts of the WebCore code use "const" for local variables:
 325     const bool fill = textMode & cTextFill;

Really?
377     // (Mac code ignores includePartialGlyphs, and they don't know what it's
 378     // supposed to do, so we just ignore it as well.)
ignored arguments generally cause warnings in the compiler, no?

What is "i" here?
 424             int i;
 425             if (harfbuzz.rtl()) {
Needs a better name.
Comment 3 Eric Seidel (no email) 2009-05-14 22:14:55 PDT
Comment on attachment 29298 [details]
patch

Bah,  I hit submit early.

All the places you use "harfbuzz" probably should hav ea differnet name.  We don't care that we're using "harfbuzz" more that we're iterating over the text runs or?

What is iter doing here?
 393         ssize_t iter = 0;
 394         while (iter < run.length()) {

Seems that too needs a better name.

r- for all the nits above.
Comment 4 Adam Langley 2009-06-16 12:16:15 PDT
Created attachment 31363 [details]
patch

Points not addressed in code:

> Really?
> 377     // (Mac code ignores includePartialGlyphs, and they don't know what
it's
> 378     // supposed to do, so we just ignore it as well.)
> ignored arguments generally cause warnings in the compiler, no?

This comment was taken from the Windows code. The ignored argument doesn't trigger a warning for us in either place.

> I assume this is in no way temporary?
>  321     const FontPlatformData& font = fontDataForCharacters(run.characters(), run.length())->fontDataForCharacter(' ')->platformData();

It seems to be the way to get what I need.


Cheers

AGL
Comment 5 Eric Seidel (no email) 2009-06-16 12:18:44 PDT
(In reply to comment #4)
> > I assume this is in no way temporary?
> >  321     const FontPlatformData& font = fontDataForCharacters(run.characters(), run.length())->fontDataForCharacter(' ')->platformData();
> 
> It seems to be the way to get what I need.

I meant that the value was not a temporary value.  You're grabbing a reference to something that is likely owned by one of those objects earlier in the line.  Why do we know it's still alive a couple lines later?
Comment 6 Adam Langley 2009-06-16 13:52:16 PDT
To be perfectly honest, there's some amount of hope, copying other ports and "look, valgrind says it's ok". On a less hand-wavy note, the root of that font data is the Font object, and a reference to it never leaves a member function of the Font object.
Comment 7 Adam Langley 2009-06-17 14:11:48 PDT
Created attachment 31430 [details]
/tmp/patch

Changes:

* Add createGlyphArrays and deleteGlyphArrays to remove some code duplication.
* Removed some enum cases where they weren't needed.
* BUG (typo): stroked complex text would crash.
* BUG (typo): LTR complex text could hit an assertion.
* BUG: highlighting an LTR complex text run wouldn't highlight the last character.'
Comment 8 David Levin 2009-06-23 17:47:01 PDT
Comment on attachment 31430 [details]
/tmp/patch

Many of my comments are style related and I'm doing an r- because there are quite a few to fix up.


Even though I have tried to verify the logic, it would be great if you got someone on the chromium team more familiar with text handling to look over the code.  Perhaps jshin or playmobil would be good choices.



> diff --git a/WebCore/ChangeLog b/WebCore/ChangeLog

> +2009-06-16  Adam Langley  <agl@google.com>
> +
> +        Reviewed by NOBODY (OOPS!).
> +
> +        Chromium: Add complex text support on Linux.

Add link to bug here.

> +        (WebCore::GetGlyphMetrics):
> +        (WebCore::GetFontMetric):
> +        (WebCore::):
This appears to be a prepare-Changelog goof, so just delete it (or fix it if you figure out what function this should have been).


> diff --git a/WebCore/platform/graphics/chromium/FontLinux.cpp b/WebCore/platform/graphics/chromium/FontLinux.cpp

> +extern HB_Error harfbuzzSkiaGetTable(void* voidface, const HB_Tag tag, HB_Byte* buffer, HB_UInt* len);

I'd omit the param name "tag" since it doesn't appear to add any info.


> +class TextRunWalker {
> +public:
> +    TextRunWalker(const TextRun& run, SkPaint* paint, unsigned startingX, const FontPlatformData* font)
> +        : m_run(run)
> +        , m_paint(paint)

Why does this store m_paint and then never use it?



> +        m_item.face = HB_NewFace(const_cast<FontPlatformData*>(font), harfbuzzSkiaGetTable);
> +        m_item.font = (HB_FontRec*) fastMalloc(sizeof(HB_FontRec));

Use c++ style casts instead of c style casts.  This place and several others.



> +
> +    void setBackwardsIteration(bool is_backwards)
is_backwards 
useCasingLikeThis :)
.


> +        return m_item.glyphs
> +            && m_item.attributes
> +            && m_item.advances
> +            && m_item.offsets
> +            && m_glyphs16
> +            && m_xpos;

This looks rather narrow to me. (You can got beyond 80 columns if desired but if you find this more readable for this item leave it as is.)


>
> +
> +            // We overflowed our arrays. Resize and retry.
> +            if (!expandGlyphArrays())
> +                return false;
> +        }
> +
> +        HB_Fixed adv_sum = 0;
Two problems with "adv_sum"
1. abbrvtn
2. theCasing


> +        for (unsigned i = 0; i < m_item.num_glyphs; ++i) {
> +            m_glyphs16[i] = m_item.glyphs[i];
> +            adv_sum += m_item.advances[i] >> 6;
> +        }

6 is scattered through this code.  (dividing by 64.)  I'm not sure why.
It would be good to create a static const int/unsigned for this which will help explain what this is by its name and get rid of the 6's all over the place.



> +    uint16_t* m_glyphs16;  // a vector of 16-bit glyph ids
One space before end of line comments.
Also use sentences (or at least a sentence like format). Start with capital and end with period.

> +    SkScalar* m_xpos;  // a vector of x positions for each glyph
> +    SkPaint* const m_paint;
> +    ssize_t m_indexOfNextScriptRun;  // indexes the script run in |m_run|
> +    const unsigned m_startingX;
> +    unsigned m_offsetX;
> +    unsigned m_pixelWidth;  // width (in px) of the current script run
> +    unsigned m_numCodePoints;  // code points in current script run
> +    unsigned m_maxGlyphs;
> +    bool m_iterateBackwards;
> +};
> +
> +void Font::drawComplexText(GraphicsContext* gc, const TextRun& run,
>                             const FloatPoint& point, int from, int to) const
>  {
> -    notImplemented();
> +    if (run.length() == 0)
Use
    if (run.length())

(avoid comparisons to 0)

> +      return;
> +
> +    const FontPlatformData& font = fontDataForCharacters(run.characters(), run.length())->fontDataForCharacter(' ')->platformData();
> +    PlatformGraphicsContext* pgc = gc->platformContext();

Avoid abbreviations "pgc". 
Maybe graphicsContext instead?



> +    bool stroke = (textMode & cTextStroke)
> +               && pgc->getStrokeStyle() != NoStroke
> +               && pgc->getStrokeThickness() > 0;
Why not put this all on one line?
(If you leave it like this, then the continuation lines should only be indented 4 spaces.)



> +
> +    TextRunWalker walker(run, stroke ? &strokePaint : &fillPaint, point.x(), &font);

It looks like fill and stroke may both be true?  Why is only one passed into to this class? 
(Actually I would just get rid of that parameter from TextRunWalker.)




> +                    return basePosition + j;
> +            }
> +
> +            ASSERT(false);
s/ ASSERT(false);/ASSERT_NOT_REACHED();/



>  
> +// Return the rectangle for selecting the given range of code-points in the
> +// TextRun.

This is an odd line break.  In WebKit code, you are certainly not limited to 80 columns

>  FloatRect Font::selectionRectForComplexText(const TextRun& run,
>                                              const IntPoint& point, int h,
"h" use unabbreviated names.


>
> +    // the position in question might be just after the text
Use sentence formatting.



> +++ b/WebCore/platform/graphics/chromium/HarfbuzzSkia.cpp

> + * Copyright (c) 2009, Google Inc. All rights reserved.

Change like this: "Copyright (C) 2009 Google Inc. All rights reserved."



> +#include <harfbuzz-shaper.h>

Previously you did #include "harfbuzz-shaper.h"

> +
> +static HB_Fixed SkiaScalarToHarfbuzzFixed(SkScalar v)

What does "v" represent?  (Please change the variable name to something better.)


> +static HB_Bool StringToGlyphs(HB_Font hbFont, const HB_UChar16* chars, hb_uint32 len, HB_Glyph* glyphs, hb_uint32* glyphsSize, HB_Bool isRTL)

Avoid abbreviations.  Change variable names to something like:
character
length


> +        memcpy(&value, reinterpret_cast<uint8_t*>(glyphs) + sizeof(uint16_t) * i, sizeof(uint16_t));

reinterpret_cast<uint8_t*>(glyphs) + sizeof(uint16_t) * i

This looks wrong, but I get it. What do you think about this instead?

reinterpret_cast<uint16_t*>(glyphs) + i



> +static void GlyphsToAdvances(HB_Font hbFont, const HB_Glyph* glyphs, hb_uint32 numGlyphs, HB_Fixed* advances, int flags)

flags appears unused.  Is that a bug?


> +        // We use a memcpy to avoid breaking strict aliasing rules.
> +        memcpy(&value, reinterpret_cast<uint8_t*>(advances) + sizeof(float) * i, sizeof(float));

Same comment about "reinterpret_cast<uint8_t*>(advances) + sizeof(float) * i" as before.




> +    font->setupPaint(&paint);
> +    paint.setTextEncoding(SkPaint::kUTF16_TextEncoding);
> +
> +    uint16_t* glyphs16 = (uint16_t*) fastMalloc(sizeof(uint16_t) * len);

Why the use of fastMalloc vs plain old new?
And if you switch to new, it would be good to adopt OwnPtr (from wtf/OwnPtr.h). You can think of this like a scoped_ptr.



> +    for (int i = 0; i < numGlyphs; ++i) {
> +        if (glyphs16[i] == 0) {

Avoid comparisons to 0.

> +            canRender = false;

> +
> +static HB_Error GetOutlinePoint(HB_Font hbFont, HB_Glyph glyph, int flags, hb_uint32 point, HB_Fixed* xpos, HB_Fixed* ypos, hb_uint32* nPoints)

s/xpos/xPos/
s/ypos/yPos/


What is nPoints?  It looks like an abbreviation.


> +    SkPath path;
> +    paint.getTextPath(&glyph16, sizeof(glyph16), 0, 0, &path);
> +    int numPoints = path.getPoints(NULL, 0);

Use 0 instead of NULL.


> +static HB_Fixed GetFontMetric(HB_Font hbFont, HB_FontMetric metric)
> +{
...
> +    switch (metric) {
> +    case HB_FontAscent:
> +        return SkiaScalarToHarfbuzzFixed(-skiaMetrics.fAscent);
> +    default:
> +        return 0;
> +    }

This looks odd to me.  Why a switch instead of an "if"? (since there is only one case statement)
Comment 9 Adam Langley 2009-06-24 14:13:19 PDT
Created attachment 31811 [details]
patch

On Tue, Jun 23, 2009 at 5:47 PM, <bugzilla-daemon@webkit.org> wrote:
> Even though I have tried to verify the logic, it would be great if you got
> someone on the chromium team more familiar with text handling to look over the
> code.  Perhaps jshin or playmobil would be good choices.

jshin declined to review it. At this point, having something is better than nothing. We're have security bugs which rely on our non-painting of complex text.

> Why does this store m_paint and then never use it?

It does use it, but the code got moved to the constructor. I've removed the member variable.

> 6 is scattered through this code.  (dividing by 64.)  I'm not sure why.
> It would be good to create a static const int/unsigned for this which will help
> explain what this is by its name and get rid of the 6's all over the place.

Have added a static function. I hope it doesn't introduce any signness bugs.

> Avoid abbreviations "pgc".
> Maybe graphicsContext instead?

I only did this because Eric asked. Changing it back then.

> (If you leave it like this, then the continuation lines should only be indented
> 4 spaces.)

The style guide has the operators lining up on the left hand side.

> reinterpret_cast<uint8_t*>(glyphs) + sizeof(uint16_t) * i
>
> This looks wrong, but I get it. What do you think about this instead?

Sorry, it is wrong. It should a char*, I thought that unsigned char* was special cased in the strict alias rules but, now that I check, it doesn't appear so.

>> +static void GlyphsToAdvances(HB_Font hbFont, const HB_Glyph* glyphs, hb_uint32 numGlyphs, HB_Fixed* advances, int flags)
>
> flags appears unused.  Is that a bug?

It's not used in the Harfbuzz reference code either.

> Why the use of fastMalloc vs plain old new?
> And if you switch to new, it would be good to adopt OwnPtr (from wtf/OwnPtr.h).
> You can think of this like a scoped_ptr.

I think I can use OwnArrayPtr.

>> +    SkPath path;
>> +    paint.getTextPath(&glyph16, sizeof(glyph16), 0, 0, &path);
>> +    int numPoints = path.getPoints(NULL, 0);
>
> Use 0 instead of NULL.

I know that style guide says this, but this is so manifestly wrong in this case that I'm going to leave it.

>> +    switch (metric) {
>> +    case HB_FontAscent:
>> +        return SkiaScalarToHarfbuzzFixed(-skiaMetrics.fAscent);
>> +    default:
>> +        return 0;
>> +    }
>
> This looks odd to me.  Why a switch instead of an "if"? (since there is only
> one case statement)

I'm going to leave it as a switch statement because it makes it clearer that we may need to support other values in the future.


AGL
Comment 10 David Levin 2009-06-25 12:45:54 PDT
I pinged jshin@chromium.org and he agreed to look this over.
Comment 11 Jungshik Shin 2009-06-25 12:47:51 PDT
David alerted me about this I didn't know about before (unless my memory is really faulty :-)). I'll take a look. 
Comment 12 Adam Langley 2009-06-25 13:12:53 PDT
jshik: I did email you ages ago. I think you even responded :) But thanks for taking a look. To be clear, there are known bugs in this code which will need to some work to fix, but this patch has been outstanding for months and I just want to land it at this point. Some complex text support is better than none at the moment.
Comment 13 Eric Seidel (no email) 2009-06-29 14:17:24 PDT
Comment on attachment 31811 [details]
patch

God I hate C.

Style:
 322     bool NextRTLScriptRun()
 352     bool NextLTRScriptRun()

Why copy/paste here?

 571         if (fromX == -1 && from < walker.numCodePoints()) {
 572             // |from| is within this script run. So we index the clusters log to
 573             // find which glyph this code-point contributed to and find its x
 574             // position.
 575             int glyph = walker.logClusters()[from];
 576             fromX = base + walker.xPositions()[glyph];
 577             fromAdvance = walker.advances()[glyph];
 578         } else
 579             from -= walker.numCodePoints();
 580 
 581         if (toX == -1 && to < walker.numCodePoints()) {
 582             int glyph = walker.logClusters()[to];
 583             toX = base + walker.xPositions()[glyph];
 584             toAdvance = walker.advances()[glyph];
 585         } else
 586             to -= walker.numCodePoints();

Again:

 593     // The position in question might be just after the text.
 594     const int rightEdge = base;
 595     if (fromX == -1 && !from)
 596         fromX = leftEdge;
 597     else if (walker.rtl())
 598        fromX += truncateFixedPointToInteger(fromAdvance);
 599 
 600     if (toX == -1 && !to)
 601         toX = rightEdge;
 602     else if (!walker.rtl())
 603         toX += truncateFixedPointToInteger(toAdvance);

Style:
 51 static HB_Fixed SkiaScalarToHarfbuzzFixed(SkScalar value)
 57 static HB_Bool StringToGlyphs(HB_Font hbFont, const HB_UChar16* characters, hb_uint32 length, HB_Glyph* glyphs, hb_uint32* glyphsSize, HB_Bool isRTL)
 79 static void GlyphsToAdvances(HB_Font hbFont, const HB_Glyph* glyphs, hb_uint32 numGlyphs, HB_Fixed* advances, int flags)
 105 static HB_Bool CanRender(HB_Font hbFont, const HB_UChar16* characters, hb_uint32 length)


Style:
197 HB_FontClass harfbuzzSkiaClass = {
 198   StringToGlyphs,
 199   GlyphsToAdvances,
 200   CanRender,
 201   GetOutlinePoint,
 202   GetGlyphMetrics,
 203   GetFontMetric,
 204 };

This would probably have been easier in smaller pieces?  This code is pretty ugly.  Mostly because it's C... and hard enough to stomach in small chunks.  Silly manual memory management.

I still don't really know what this all is.  But at some level it doesn't matter.  This is your port, and I don't need to be a gate keeper.  Please post a new patch with fixed style though.
Comment 14 Eric Seidel (no email) 2009-06-29 16:03:57 PDT
I reviewed this with Adam in person.  Adam will land the reviewed version soon.