Bug 156792

Summary: Create a MathMLLength struct to handle the parsing of MathML length
Product: WebKit Reporter: Frédéric Wang (:fredw) <fred.wang>
Component: MathMLAssignee: Frédéric Wang (:fredw) <fred.wang>
Status: RESOLVED FIXED    
Severity: Normal CC: alex, benjamin, bfulgham, cfleizach, commit-queue, darin, dbarton, esprehn+autocc, glenn, kondapallykalyan, rego
Priority: P2    
Version: WebKit Nightly Build   
Hardware: All   
OS: All   
See Also: https://bugs.webkit.org/show_bug.cgi?id=160111
Bug Depends on:    
Bug Blocks: 118900, 156536, 156537, 156795, 159620, 159622, 159624    
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Patch
none
Patch bfulgham: review+

Description Frédéric Wang (:fredw) 2016-04-20 04:44:03 PDT
This is a first step for bug 156536. For now, we will just move the MathML length parsing to a separate MathMLLength class and make parseMathMLLength call that function.

Later, MathMLLength should really be used as members of MathMLElement classes that are updated each time the attributes change (i.e. have similar role as SVGLength).
Comment 1 Frédéric Wang (:fredw) 2016-04-20 04:46:12 PDT
Created attachment 276817 [details]
Patch
Comment 2 Darin Adler 2016-04-20 09:08:05 PDT
Comment on attachment 276817 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=276817&action=review

I don’t think this is the right abstraction here; we should not be adding a header that defines a class for this. I am saying review+ because there is nothing wrong with this change, except there are tons of things I would do differently.

> Source/WebCore/mathml/MathMLLength.cpp:70
> +    : m_valueInSpecifiedUnits(0)
> +    , m_unit(MathMLLength::Unknown)

These values should be specified when defining the data members, not in the constructor.

> Source/WebCore/mathml/MathMLLength.cpp:79
> +    case MathMLLength::Unknown:

No need for the MathMLLength prefix inside a member of the class.

> Source/WebCore/mathml/MathMLLength.cpp:128
> +    const UChar firstChar = string[0];

No need for const here.

> Source/WebCore/mathml/MathMLLength.cpp:136
> +    const UChar secondChar = string[1];

No need for const here.

> Source/WebCore/mathml/MathMLLength.cpp:194
> +    String s = string.simplifyWhiteSpace();

I know this is existing code, but this is a bad idea. Should make this function work without the memory allocation of a new string just to turn runs of spaces into single spaces. Also, this function handles types of white space that should not be ignored, but the code should instead follow the HTML rule of what whitespace is from HTMLParserIdioms.h. Generally, this function should never be used in WebKit code.

> Source/WebCore/mathml/MathMLLength.cpp:205
> +    StringBuilder number;

Using a StringBuilder for this is an inefficient way to do it. A much more efficient way to do it is to use a Vector<LChar, xxx> where xxx is a size large enough to hold all normal numbers.

> Source/WebCore/mathml/MathMLLength.cpp:206
> +    String unit;

This should be a StringView, not a new String.

> Source/WebCore/mathml/MathMLLength.cpp:242
> +    if (type == MathMLLength::Unknown)
> +        return;

This if statement isn’t needed. The code already handles this case correctly even if we store a number in m_valueInSpecifiedUnits in this case and store the type. Thus the local variable isn’t needed.

> Source/WebCore/mathml/MathMLLength.h:23
> +#include <wtf/text/WTFString.h>

Should just include <wtf/Forward.h> instead of WTFString.h.

> Source/WebCore/mathml/MathMLLength.h:30
> +class MathMLLength {

We don’t gain anything by making this a class. Instead, the function should be moved into a separate file, but be left a function. If you want to have a class, that’s fine, but the class would be private and inside the .cpp file.

> Source/WebCore/mathml/MathMLLength.h:33
> +    MathMLLength(const String& valueAsString);

This should be marked explicit.

And in the long term should take a StringView rather than a String.

> Source/WebCore/mathml/MathMLLength.h:38
> +        Unknown,

I think the correct name for this would be ParseFailed.
Comment 3 Frédéric Wang (:fredw) 2016-04-20 09:43:07 PDT
Thanks for the review. Just to clarify the goal: The long term goal is to move the parsing of MathML attributes from the MathML renderer classes to the MathML element classes (bug 156536).

For MathML length attributes, the MathML element class will then store MathMLLength members (with their setValueAsString parsing function called when the attribute is set). Then the renderer classes can just call toUserUnits to quickly retrieve the length value. I just tried a first patch in bug 156795.

That seems closer to what SVG does with SVGLength but maybe that's overkill for MathML. What do you think?
Comment 4 Darin Adler 2016-04-21 09:14:19 PDT
If we want to add a type for storing parsed MathML lengths I suggest we not build parsing and value resolution into that same class.

It seems to me that there are three separate issues here:

1) A clean type for storing a MathML length including the value and the units. I would suggest a struct, but a class might be OK too. For clarity, I think type should probably not have a null value. We could use Optional when the length is optional. The default value could be some kind of valid zero length instead of a a null.

2) A parsing function that correctly parses a string into one of these lengths.

3) A rendering function that knows how to convert such a length into a LayoutUnit.

I would *not* build all three of these things into a single class. In fact, it seems that (1) and (2) should be part of DOM and (3) should be part of rendering.

I am not sure sure that storing the parsed values in the DOM is the right tradeoff, though. It means that we do parsing at the time an attribute is set, even if the attribute is set over and over again before we finally render. It might be more efficient to be lazy about the parsing if the result of the parsing is only exposed as a side effect of rendering.
Comment 5 Frédéric Wang (:fredw) 2016-04-21 09:33:37 PDT
(In reply to comment #4)
> If we want to add a type for storing parsed MathML lengths I suggest we not
> build parsing and value resolution into that same class.
> 
> It seems to me that there are three separate issues here:
> 
> 1) A clean type for storing a MathML length including the value and the
> units. I would suggest a struct, but a class might be OK too. For clarity, I
> think type should probably not have a null value. We could use Optional when
> the length is optional. The default value could be some kind of valid zero
> length instead of a a null.
> 
> 2) A parsing function that correctly parses a string into one of these
> lengths.
> 
> 3) A rendering function that knows how to convert such a length into a
> LayoutUnit.
> 
> I would *not* build all three of these things into a single class. In fact,
> it seems that (1) and (2) should be part of DOM and (3) should be part of
> rendering.

OK using a struct + two functions makes sense to me. I'm not attached to the class, I just tried to copy what SVG does but as I said that's probably overkill for MathML.

> I am not sure sure that storing the parsed values in the DOM is the right
> tradeoff, though. It means that we do parsing at the time an attribute is
> set, even if the attribute is set over and over again before we finally
> render. It might be more efficient to be lazy about the parsing if the
> result of the parsing is only exposed as a side effect of rendering.

So what do you suggest? IIUC (besides suggestions by other reviewers to do parsing in the DOM for consistency with the rest of the code), MathML renderers currently have to parse all its attributes very often e.g. when their style change or when one of its attribute change... which does not seem optimal.
Comment 6 Darin Adler 2016-04-21 09:46:41 PDT
We could cache the parsed results in the DOM, but compute them only when the rendering code asks for them. When the DOM attribute changes we would invalidate, but not recompute, the values.
Comment 7 Frédéric Wang (:fredw) 2016-04-21 09:56:15 PDT
(In reply to comment #6)
> We could cache the parsed results in the DOM, but compute them only when the
> rendering code asks for them. When the DOM attribute changes we would
> invalidate, but not recompute, the values.

So IIUC, by "parsed" you mean passed to setValueAsString and by "computed" you mean calculated with toUserUnits? If so, that was my intention.
Comment 8 Darin Adler 2016-04-21 13:34:52 PDT
Here’s what I mean, using your function names:

When the DOM attribute changes we would *not* call setValueAsString; we’d just invalidate the cached value if there was one. Later, when the the rendering code called the DOM to ask the the length, the DOM code would call setValueAsString and cache that value. Basically the "memoize" pattern.

The rendering code would call toUserUnits on the value it got from the DOM.
Comment 9 Frédéric Wang (:fredw) 2016-04-21 22:53:42 PDT
OK, now I see how to refactor the code to do this kind of lazy parsing. Thanks!
Comment 10 Frédéric Wang (:fredw) 2016-05-16 08:17:56 PDT
Created attachment 279018 [details]
Patch
Comment 11 Frédéric Wang (:fredw) 2016-05-17 04:08:18 PDT
Created attachment 279110 [details]
Patch
Comment 12 Frédéric Wang (:fredw) 2016-05-17 05:10:12 PDT
Created attachment 279112 [details]
Patch
Comment 13 Frédéric Wang (:fredw) 2016-07-06 06:26:51 PDT
@Darin: Since you did the review of the initial attempt, can you please take a look at the latest version of this patch?
Comment 14 Brent Fulgham 2016-07-07 12:38:28 PDT
Comment on attachment 279112 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=279112&action=review

I think this change looks good, but could be cleaner by partitioning some of the parsing logic into utility functions to reduce complexity and reduce the need for comments.

> Source/WebCore/mathml/MathMLElement.cpp:307
> +    if (string == "veryverythinmathspace")

Do these comparisons need to be case-sensitive?

> Source/WebCore/mathml/MathMLElement.cpp:342
> +        stringLength--;

I suggest the above be moved to a helper function called something like "skipLeadingAndTrailingWhitespace" that takes start and stringLength as references.

Also, could you just use string.stripWhiteSpace(isHTMLSpace) to do this work?

> Source/WebCore/mathml/MathMLElement.cpp:346
> +    // We consider the most typical case: a number followed by an optional unit.

Again, I think this would be more legible as a stand-alone function called "parseTypicalLengthStyle" that could also serve as this comment. It could return 'true' if we should return when it completes.

> Source/WebCore/mathml/MathMLElement.cpp:371
> +                lengthType = LengthType::Px;

It's a shame we can't share this code with the SVGLength class, or the logic in the CSSParser that does similar things.

> Source/WebCore/mathml/MathMLElement.h:64
> +        LengthType type = LengthType::ParsingFailed;

LengthType type { LengthType::ParsingFailed };
float value { 0 };
Comment 15 Frédéric Wang (:fredw) 2016-07-08 06:00:24 PDT
Comment on attachment 279112 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=279112&action=review

>> Source/WebCore/mathml/MathMLElement.cpp:307
>> +    if (string == "veryverythinmathspace")
> 
> Do these comparisons need to be case-sensitive?

Yes, MathML attribute values are case-sensitive. I'll add a comment.

>> Source/WebCore/mathml/MathMLElement.cpp:342
>> +        stringLength--;
> 
> I suggest the above be moved to a helper function called something like "skipLeadingAndTrailingWhitespace" that takes start and stringLength as references.
> 
> Also, could you just use string.stripWhiteSpace(isHTMLSpace) to do this work?

The old code used to rely on stripWhiteSpace, but Darin said above that we should try avoiding allocating new String and use StringView instead.

>> Source/WebCore/mathml/MathMLElement.cpp:371
>> +                lengthType = LengthType::Px;
> 
> It's a shame we can't share this code with the SVGLength class, or the logic in the CSSParser that does similar things.

Yes, indeed :-(
Comment 16 Frédéric Wang (:fredw) 2016-07-08 06:45:06 PDT
Created attachment 283147 [details]
Patch
Comment 17 Brent Fulgham 2016-07-11 13:40:24 PDT
Comment on attachment 283147 [details]
Patch

A very nice cleanup! Thanks for incorporating all of those comments. r=me.
Comment 18 Frédéric Wang (:fredw) 2016-07-11 21:23:39 PDT
Comment on attachment 283147 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=283147&action=review

> Source/WebCore/mathml/MathMLElement.h:61
> +    // Unitless values are intepreted as a multiple of a reference value.

interpreted
Comment 19 Frédéric Wang (:fredw) 2016-07-11 21:47:52 PDT
Committed r203106: <http://trac.webkit.org/changeset/203106>