Bug 99620

Summary: [MathML] Implement the MathML Operator Dictionary
Product: WebKit Reporter: Dave Barton <dbarton>
Component: MathMLAssignee: Frédéric Wang (:fredw) <fred.wang>
Severity: Normal CC: aboxhall, apinheiro, bfulgham, cfleizach, commit-queue, darin, dmazzoni, donggwan.kim, esprehn+autocc, fred.wang, glenn, jcraig, jdiggs, kondapallykalyan, mario, mrobinson, samuel_white
Priority: P2 Keywords: WebExposed
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
URL: http://www.w3.org/TR/MathML3/appendixc.html
Bug Depends on: 124841    
Bug Blocks: 84019, 124828, 124829, 124830, 124831, 99623, 115786, 115787, 118737, 118843, 122567, 124826, 124827, 124836, 124838, 126842    
Description Flags
WIP Patch
WIP Patch
WIP Patch
WIP Patch
script to generate the operator dictionary
script to generate the operator dictionary
WIP Patch
Patch V1
Patch V2
Patch V3
Patch V4 cfleizach: review+

Description Dave Barton 2012-10-17 11:21:22 PDT
The operator dictionary includes default spacing values (CSS margins?) around various infix, prefix, and postfix operators. See <http://www.w3.org/TR/MathML3/appendixc.html>.
Comment 1 Frédéric Wang (:fredw) 2013-11-19 08:25:27 PST
Created attachment 217299 [details]
WIP Patch
Comment 2 Frédéric Wang (:fredw) 2013-11-20 09:10:54 PST
Created attachment 217441 [details]
WIP Patch
Comment 3 Frédéric Wang (:fredw) 2013-11-20 09:23:21 PST
So I've attached a WIP patch for this bug.

I'd like to get feedback about how to create the Operator Dictionary. IIRC, there are ~1000 entries so I've used a hash table. I have an XSLT stylesheet to extract them from the big unicode.xml of http://www.w3.org/TR/xml-entity-names/ and I can easily convert into whatever formats.

I wonder if I should hardcode the table in some (probably separate) C++ file (using macros?) or in a text file that is parsed at init. Should I include the generation of the dictionary in the build system? What is the standard way to do such thing in WebKit?
Comment 4 Frédéric Wang (:fredw) 2013-11-21 08:48:59 PST
Created attachment 217574 [details]
WIP Patch
Comment 5 Darin Adler 2013-11-21 13:49:51 PST
I think a sorted constant array compiled into the binary, where we do lookups with a binary search, would use much less memory than a hash table, be much smaller than the giant set-up function, not require a special shell script or dependency on xslt, and still have quite good performance.
Comment 6 Frédéric Wang (:fredw) 2013-11-22 00:56:09 PST
Created attachment 217656 [details]
WIP Patch

Thanks Darin, I've rewritten the code using a hardcoded table and binary search.

My next issue is to find how to detect the operator form automatically (infix, prefix, postfix). At the moment, it seems to always be postfix when it is not explicitly specified in the MathML code. I'm guessing that when the <mo> renderer is being constructed, it is appended at the end of its parent <mrow> renderer and so does not have any next siblings yet, and so is "postfix". It looks like it will be time to refactor the code with updateFromElement and to make addChild/removeChild works with dynamic changes. I'll come back to this next week but any suggestion is welcome.
Comment 7 Frédéric Wang (:fredw) 2013-11-22 00:56:54 PST
Created attachment 217657 [details]
script to generate the operator dictionary

For the record, here is the script that generates the operator dictionary.
Comment 8 Frédéric Wang (:fredw) 2013-11-25 04:08:38 PST
Created attachment 217786 [details]
script to generate the operator dictionary
Comment 9 Frédéric Wang (:fredw) 2013-11-25 04:12:18 PST
Created attachment 217787 [details]
WIP Patch

Here is an updated patch.

I have tried to use updateFromElement on mrow-like containers. That seems to update the operator form correctly, but the CSS style seems to be lost at the update. Hence I'll open a new bug to do code refactoring for the token elements first and will come back to this bug later. I also suspect all the mrow-like elements do not behave correctly at the moment.

This patch also adds several new tests and references to follow-up bugs.
Comment 10 Frédéric Wang (:fredw) 2014-02-17 07:23:35 PST
I think I'll extract the changes and ignore the lspace/rspace support until I get feedback on bug 124838. So the op dict will only be used for the stretchy value and forms to start with. This means that many of the tests that relied on the spacing will be removed too.
Comment 11 Frédéric Wang (:fredw) 2014-02-19 01:38:59 PST
Created attachment 224610 [details]
Patch V1
Comment 12 WebKit Commit Bot 2014-02-19 01:40:10 PST
Attachment 224610 [details] did not pass style-queue:

ERROR: Source/WebCore/rendering/mathml/RenderMathMLFenced.h:50:  The parameter name "form" adds no information, so it should be removed.  [readability/parameter_name] [5]
ERROR: Source/WebCore/rendering/mathml/RenderMathMLFenced.h:50:  The parameter name "flag" adds no information, so it should be removed.  [readability/parameter_name] [5]
Total errors found: 2 in 15 files

If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 13 Frédéric Wang (:fredw) 2014-02-19 02:07:26 PST
Created attachment 224614 [details]
Patch V2
Comment 14 Frédéric Wang (:fredw) 2014-02-19 04:51:53 PST
Created attachment 224622 [details]
Patch V3
Comment 15 chris fleizach 2014-02-19 10:13:52 PST
Comment on attachment 224622 [details]
Patch V3

View in context: https://bugs.webkit.org/attachment.cgi?id=224622&action=review

looks pretty good overall to me. some minor comments inside

> Source/WebCore/rendering/mathml/RenderMathMLOperator.cpp:66
> +namespace MathMLOpDict {

I think it might be better to use the unabbreviated forms for op, so that this is

> Source/WebCore/rendering/mathml/RenderMathMLOperator.cpp:70
> +inline UChar ExtractChar(const Entry* aEntry) { return aEntry->character; }

parameter name for these methods should probably just be "entry" instead of aEntry, which isn't used so much in WebCore

> Source/WebCore/rendering/mathml/RenderMathMLOperator.cpp:1145
> +    if (element().getAttribute(name) == "true")

should be fastGetAttribute(). you might want to cache the value so you don't have to call it again for the false case

> Source/WebCore/rendering/mathml/RenderMathMLOperator.cpp:1152
> +void RenderMathMLOperator::setOperatorPropertiesFromOpDictEntry(const MathMLOpDict::Entry *entry)

* on wrong side

> Source/WebCore/rendering/mathml/RenderMathMLOperator.cpp:1179
> +        if (element().getAttribute(MathMLNames::formAttr) == "prefix")

fastGetAttribute and cache the result in a variable

> Source/WebCore/rendering/mathml/RenderMathMLOperator.cpp:1239
> +        String size = element().getAttribute(MathMLNames::minsizeAttr);


> Source/WebCore/rendering/mathml/RenderMathMLOperator.cpp:1241
> +        size = element().getAttribute(MathMLNames::maxsizeAttr);


> Source/WebCore/rendering/mathml/RenderMathMLOperator.h:119
> +    bool m_isFenceOp;

this should be isFenceOperator I think
Comment 16 Frédéric Wang (:fredw) 2014-02-19 11:16:17 PST
Created attachment 224657 [details]
Patch V4
Comment 17 chris fleizach 2014-02-19 11:28:32 PST
Comment on attachment 224657 [details]
Patch V4

Comment 18 Frédéric Wang (:fredw) 2014-02-20 00:23:43 PST
Committed r164418: <http://trac.webkit.org/changeset/164418>
Comment 19 Frédéric Wang (:fredw) 2014-03-10 12:25:49 PDT
Mass change: add WebExposed keyword to help MDN documentation.