Bug 78617

Summary: MathML internals - embellished operators, getBase() accessor functions
Product: WebKit Reporter: Dave Barton <dbarton@mathscribe.com>
Component: MathMLAssignee: Nobody <webkit-unassigned@lists.webkit.org>
Status: RESOLVED FIXED    
Severity: Normal CC: darin@apple.com, eric@webkit.org, fred.wang@free.fr, mitz@webkit.org, webkit.review.bot@gmail.com
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
Patch none

Description From 2012-02-14 10:20:52 PST
MathML internals - embellished operators, getBase() accessor functions
------- Comment #1 From 2012-02-14 10:31:17 PST -------
Created an attachment (id=126997) [details]
Patch
------- Comment #2 From 2012-02-14 11:12:40 PST -------
(From update of attachment 126997 [details])
View in context: https://bugs.webkit.org/attachment.cgi?id=126997&action=review

> Source/WebCore/rendering/mathml/RenderMathMLBlock.h:54
> +    virtual const RenderMathMLOperator* unembellishedOperator() const { return 0; }

I don't think a const pointer will help you much here.  Since you won't be able to retain it...  For RefCounted objects (like Node) we normally don't bother with const much.

> Source/WebCore/rendering/mathml/RenderMathMLSubSup.cpp:57
> +RenderBoxModelObject* RenderMathMLSubSup::getBase() const

Normaly we don't use "get" in function names.  Since base() here would convey the same meaning.

It's unclear to me what "base" is here?  Also, it looks like you used it irght, but just to be clear, RenderBoxModelObject is the base class for all CSS-box-model renderers, this is in contrast to SVG-model renderers (which although uses CSS, does not participate in the CSS box model), which use RenderSVGModelObject.  Sometimes folks want RenderObject (to mean any kind of renderer) and sometimes they want RenderBoxModelObject (to mean only CSS-box-model renderers).
------- Comment #3 From 2012-02-14 11:14:42 PST -------
I don't know the webkit code, but here are some comments.

Be sure to consider all the elements that can be embellished operators:
http://www.w3.org/TR/MathML/chapter3.html#id.3.2.5.7.3

Some reftests for embellished op (I think Webkit supports reftest?):
http://devel.mathjax.org/testing/testsuite/MathMLToDisplay/Topics/EmbellishedOp/

In Mozilla, instead of using functions, we store pointers on the MathML frames (pointing to the core frame). Actually, we store a structure with such a pointer and other info relevant to operator stretching. I don't know why that was done that way, but the stretchy data may depend on several elements/attributes in the MathML tree, so it may be expensive to have to compute them if they are used several times. I don't know what is the best for Webkit.
------- Comment #4 From 2012-02-14 12:26:52 PST -------
Translation: "frame" in mozilla is basically "renderer" in WebKit.
------- Comment #5 From 2012-02-14 19:25:00 PST -------
(From update of attachment 126997 [details])
Thanks for all the expert comments! My wife seems to think it's Valentine's Day so I'll say more tomorrow. :-) I'll submit a revised patch after that.
------- Comment #6 From 2012-02-15 01:21:07 PST -------
Attachment 126997 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/update-webkit']" exit_code: 9

Updating OpenSource
From git://git.webkit.org/WebKit
 + 2ce77d7...a9730c4 master     -> origin/master  (forced update)
First, rewinding head to replay your work on top of it...
Applying: [Mac][Win][WK2] Switch to RFC 6455 protocol for WebSockets
Using index info to reconstruct a base tree...
<stdin>:1578: trailing whitespace.

<stdin>:1647: trailing whitespace.

<stdin>:1657: trailing whitespace.

<stdin>:1672: trailing whitespace.
        return 0;        
<stdin>:1674: trailing whitespace.

warning: squelched 7 whitespace errors
warning: 12 lines add whitespace errors.
Falling back to patching base and 3-way merge...
warning: too many files (created: 168753 deleted: 3), skipping inexact rename detection
Auto-merging LayoutTests/ChangeLog
CONFLICT (content): Merge conflict in LayoutTests/ChangeLog
Auto-merging LayoutTests/platform/wk2/Skipped
Auto-merging Source/WebCore/ChangeLog
Auto-merging Source/WebCore/css/CSSCalculationValue.cpp
Auto-merging Source/WebCore/css/CSSCalculationValue.h
Auto-merging Source/WebCore/css/CSSParser.cpp
Auto-merging Source/WebKit/mac/ChangeLog
CONFLICT (content): Merge conflict in Source/WebKit/mac/ChangeLog
Auto-merging Source/WebKit2/ChangeLog
CONFLICT (content): Merge conflict in Source/WebKit2/ChangeLog
Auto-merging Tools/ChangeLog
CONFLICT (content): Merge conflict in Tools/ChangeLog
Failed to merge in the changes.
Patch failed at 0001 [Mac][Win][WK2] Switch to RFC 6455 protocol for WebSockets

When you have resolved this problem run "git rebase --continue".
If you would prefer to skip this patch, instead run "git rebase --skip".
To restore the original branch and stop rebasing run "git rebase --abort".

rebase refs/remotes/origin/master: command returned error: 1

Died at Tools/Scripts/update-webkit line 164.


If any of these errors are false positives, please file a bug against check-webkit-style.
------- Comment #7 From 2012-02-15 07:57:24 PST -------
FYI, here is how it is implemented in Mozilla. I hope that can help you to implement it in Webkit.

First we have a data structure to describe embellished operators:
http://mxr.mozilla.org/mozilla-central/source/layout/mathml/nsIMathMLFrame.h#239

This embellish data is initialized for each frame (renderer) from bottom to top using TransmitAutomaticData. For example they are initialized for the <mo>, based on various info (attributes, operator dictionary etc). An <msup> element takes the EmbellishData from its base (first child):

http://mxr.mozilla.org/mozilla-central/source/layout/mathml/nsMathMLmsupFrame.cpp#65

and similarly for the other cases described in the MathML REC:

http://www.w3.org/TR/MathML/chapter3.html#id.3.2.5.7.3

The non-trivial case is for mrow-like elements (mrow, mstyle, mphantom...). To do that, first we store on the frame a boolean property indicating whether the element is "space-like":

http://www.w3.org/TR/MathML/chapter3.html#id.3.2.7.4

which is also initialized from bottom to top. So for example mtext and mspace are initialized to be space-like per the MathML REC. Finally, for mrow-like elements the EmbellishData and the space-like property are initialized at the same time using TransmitAutomaticDataForMrowLikeElement:

http://mxr.mozilla.org/mozilla-central/source/layout/mathml/nsMathMLContainerFrame.cpp#1482
------- Comment #8 From 2012-02-15 10:27:55 PST -------
I'll remove the "const" and "get", and add comments to the code summarizing or referring to our discussions here.

"base" here omits a subscript and/or superscript, or an underscript and/or overscript. In legal MathML, the result will still be MathML, hence a RenderMathMLBlock or RenderInline, both of which derive from RenderBoxModelObject. We do need to be able to work with the base, e.g. compute its offsetHeight.

I'm not an SVG expert, but you raise a basic design question for MathML in WebKit. Currently RenderMathMLBlock, which is used by most MathML elements, derives from RenderBlock. We get a lot of horizontal and vertical formatting for free that way, hit-testing, probably a lot of other things(?). Also MathML elements are probably most like inline-block elements, which are implemented by RenderBlock. However, one might argue it'd be cleaner to not derive from RenderBlock, and reimplement layout() and other functionality for some base RenderMathML class. This effort is probably beyond the time I have available though.

Even more fundamental is the question of whether MathML objects obey the CSS box model, or whether they should. The MathML spec suggests that one ought to be able to style MathML with CSS in environments that support CSS, presumably including using the box model.

For now, I am indeed using a simplified definition of "embellished operator". I think it's best to simplify the current RenderMathML* classes before we add things to them. Also, I don't agree with larger definition, FWIW. Is there a reference explaining more fully the rationale for always treating an <mrow> of one argument, and without attributes, as a special case? I've written a MathML generator, jqMath, and it was actually simpler to not do this. Also checking specially for space-like elements adds complications and an unnecessary inconsistency, in my opinion. This requirement pre-dates CSS. Just as many presentational attributes in HTML were deprecated and replaced by CSS, I think many in MathML should be also, and the definition of "embellished operator" should be simplified as well. But having said all that, if the spec isn't changed, then I agree that WebKit MathML should eventually implement a mechanism similar to Mozilla's.

Thanks for the pointers to the tests. We definitely should include these tests, and/or tests from the w3c mathml committee, at some point.

This careful review from you experts is invaluable.
------- Comment #9 From 2012-02-15 13:18:18 PST -------
Created an attachment (id=127224) [details]
Patch
------- Comment #10 From 2012-02-17 10:38:38 PST -------
I asked Neil Soiffer, arguably the world's #1 authority on these matters (MathML committee major author for 15 years, Mathematica front end & IE MathPlayer implementer, authority for Frédéric Wang and others on w3c's mathml mailing list), about the rules for mrow and embellished operators in MathML and got this response:

``It's hard to remember back why all of the things that are in MathML are in there.  Some have good solid technical reasons and some were political compromises -- I hate mfenced, which is duplicates other things and isn't as powerful and causes problems for things like CSS because drawn items are buried in attributes.  Anyway...

mrow with one arg were added because authoring tool editors felt that it was more natural to always have mrows around subexpression, such as a numerator of a fraction.  Hand authoring, which was still popular (and remains popular by HTML developers to this day) pushed towards not always requiring the mrows.  That's also the reason for the implicit mrows in some places such as msqrt.  In the implementations I've done, I've typically "normalized" them away so the rendering code never sees it.  This normalization is either explicit (tree is changed) or virtual via a "getChild" method that checks if the child is an mrow with a single child, and if so, calls itself recursively.

The embellished operators rules are a bit complicated, but grew out of use cases.  I think (not 100% positive -- I'd need to check the code), that like mrow, embellished operators were virtualized in my code in the sense that there is a "isOperator" call that checks to see if something is a script (etc), and if so, calls itself on the base of the script.  In that way, they turned out not to be much of an exception to the layout logic -- any value you ask for of  an operator could be asked of an embellished operator without needing to know about details of the embellished operator.  I will admit that in my linebreaking code, I repeatedly forgot to call the higher level function and that led to many bugs, all of which were trivial to fix but still bugs nonetheless.  I do remember there being a lot of discussion about the embellished operator/mrow/space-like children rule, but it was too long ago to remember the rationale.  If needed, I could go through the list archives...''

So it sounds like Neil's probably gone the virtual function route for tracking embellished operators, instead of the Mozilla method of adding extra data. The extra data has the drawback of needing to be updated whenever the DOM changes. I'm not sure it really saves much time in practice.

I find all this fascinating. The whole <mrow> of one argument must be equivalent in all ways to the argument itself seems crazy to me. This would be like saying that <span> of one argument must behave like the argument alone in all ways, e.g. CSS child selector rules. This would be nuts, IMHO. To justify it by saying that the numerator argument to <mfrac> should always be an <mrow> in some editor is like saying everything inside a <p> tag should always be wrapped in a <span>. This just doesn't make any sense to me. Perhaps this is part of why MathML never caught on in 15 years. I understand that it's our job to implement all of the spec, not just the parts we like, but I also think eventually someone from WebKit should be on the w3c mathml committee and ask about deprecating some of these features (bugs?) from 15 years ago, when for instance CSS didn't even exist.

But the real point is we are a long way away from even worrying about things like this. WebKit doesn't yet have mmultiscripts, semantics, mstyle, mphantom, mpadded, or maction elements, or almost any kind of space-like element (and is arbitrary mtext really space-like?). Thus almost all the clauses about embellished operators that Frédéric Wang refers to don't yet apply to us. I have flagged "embellished operators" as a FIXME comment in the code in this patch, and I suppose we could file a bug for it also, but there are dozens of other things in MathML that we don't have yet either, and we don't have bug reports for them.

I just want to fix WebKit's stretching of operators, including ones with subscripts/superscripts/underscripts/overscripts, for example integral signs with limits. I'd like to do (most of) this in my next patch, eliminating the current nonOperatorHeight() and hasBase() functions entirely, which don't conform at all to either the MathML spec or standard mathematical typesetting.

WebKit's MathML implementation is currently minimal (a partial implementation), nonstandard, buggy, and perhaps insecure. In my opinion, we'll never fix all this if we insist on small patches, and then compare each one to everything in Mozilla and the MathML spec. Even if I or someone else was able to put in unlimited hours of programming for free, no one at webkit would review or commit the patches. The current system is difficult enough - break patches into tiny pieces, and then hope they will be reviewed within a few days each. Not only will it take years to implement MathML at this pace, it's just plain hard to program when taking breaks of a few days or longer between each tiny bit of programming.

I think we need to either review this patch + or - soon, or else admit that this development model isn't working. Do we need to wait for WebKit companies to decide whether MathML is worth paying a committer to read parts of the MathML spec and spend a few hours per week reviewing patches, to help sell devices to literally hundreds of millions of students, scientists, and engineers that will want to use MathML in WebKit in the next few years? I respectfully await your reply.
------- Comment #11 From 2012-02-17 13:00:39 PST -------
> So it sounds like Neil's probably gone the virtual function route for tracking embellished operators, instead of the Mozilla method of adding extra data. The extra data has the drawback of needing to be updated whenever the DOM changes. I'm not sure it really saves much time in practice.

I think the idea of the data is the following. If you have

<munder>
  <msub>
    <mo>→</mo>
    <mi>0</mi>
  </msub>
  <mtext>text</mtext>
</munder>

Each time you must redraw the arrow (for example when you modify the mtext, or zoom, or resize the window...), you can directly use the pointer instead of having to call a function to find it. Of course, this does not seem very helpful on this trivial example and one may argue that the embellished operator subtree is not really deep in general, which is certainly true. But the point is that we have actually more data to compute than simply the pointer to the core frame and the data may depend on many other elements/attributes so that's better to compute the whole thing once and let the frames communicate their infos to each other. So the drawback of the data is just a slight memory overhead but it is intended to make the whole thing faster. I don't claim that it is actually true in practice, I didn't make experiments to verify this assertion and didn't write the Mozilla code. Again, I don't know the Webkit code and just gave you a suggestion. Webkit folks know better than me which design is appropriate for their code.


> 
> I find all this fascinating. The whole <mrow> of one argument must be equivalent in all ways to the argument itself seems crazy to me. This would be like saying that <span> of one argument must behave like the argument alone in all ways, e.g. CSS child selector rules. This would be nuts, IMHO. To justify it by saying that the numerator argument to <mfrac> should always be an <mrow> in some editor is like saying everything inside a <p> tag should always be wrapped in a <span>. This just doesn't make any sense to me. Perhaps this is part of why MathML never caught on in 15 years. I understand that it's our job to implement all of the spec, not just the parts we like, but I also think eventually someone from WebKit should be on the w3c mathml committee and ask about deprecating some of these features (bugs?) from 15 years ago, when for instance CSS didn't even exist.
> 


Well, I think there are other deeper reasons for MathML not having been used as much as other web languages than this <mrow> property,  but that seems a bit off topic... And BTW this property seems quite reasonable to me actually, since <mrow> is only here to group elements. If you group a single element, I can't see any reason to change its properties (of course this grouping is useless, but you can't prevent people from writing/generating such code)...

> But the real point is we are a long way away from even worrying about things like this. WebKit doesn't yet have mmultiscripts, semantics, mstyle, mphantom, mpadded, or maction elements, or almost any kind of space-like element (and is arbitrary mtext really space-like?). Thus almost all the clauses about embellished operators that Frédéric Wang refers to don't yet apply to us. I have flagged "embellished operators" as a FIXME comment in the code in this patch, and I suppose we could file a bug for it also, but there are dozens of other things in MathML that we don't have yet either, and we don't have bug reports for them.
> 

I don't know which elements Webkit implements or not, I just wanted to make you aware of the embellished op details. Obviously, that makes sense to make embellished op work just for the elements that are already implemented in Webkit.


> I just want to fix WebKit's stretching of operators, including ones with subscripts/superscripts/underscripts/overscripts, for example integral signs with limits. I'd like to do (most of) this in my next patch, eliminating the current nonOperatorHeight() and hasBase() functions entirely, which don't conform at all to either the MathML spec or standard mathematical typesetting.
> 
> WebKit's MathML implementation is currently minimal (a partial implementation), nonstandard, buggy, and perhaps insecure. In my opinion, we'll never fix all this if we insist on small patches, and then compare each one to everything in Mozilla and the MathML spec. Even if I or someone else was able to put in unlimited hours of programming for free, no one at webkit would review or commit the patches. The current system is difficult enough - break patches into tiny pieces, and then hope they will be reviewed within a few days each. Not only will it take years to implement MathML at this pace, it's just plain hard to program when taking breaks of a few days or longer between each tiny bit of programming.
> 
> I think we need to either review this patch + or - soon, or else admit that this development model isn't working. Do we need to wait for WebKit companies to decide whether MathML is worth paying a committer to read parts of the MathML spec and spend a few hours per week reviewing patches, to help sell devices to literally hundreds of millions of students, scientists, and engineers that will want to use MathML in WebKit in the next few years? I respectfully await your reply.

I can't speak for Webkit, but I think the purpose of the review system is to ensure code quality (and that's already a work for Webkit people to do that...). Sometimes it is best not to go too fast and to do things carefully... Currently, code refactoring is needed in Mozilla MathML code because things were implemented in a very bad way (for example attributes added to the DOM to simulate style properties). Similarly, my comparison with the MathML REC and Mozilla code was intended to help you, not to delay your work on MathML. You'll have to find the balance between prioritizing the things that have to be implemented and to deliver a good MathML support for Webkit...

Also, there are tools to handle several patches at the same time (for example, Mercurial has an extension called "Mercurial Queues"). This is very convenient to break down the work in small pieces but still being able to modify a large portions of code simultaneously. You should definitely have a look to them.

Good luck for your MathML work in Webkit!
------- Comment #12 From 2012-02-17 15:33:22 PST -------
(From update of attachment 127224 [details])
Clearing flags on attachment: 127224

Committed r108136: <http://trac.webkit.org/changeset/108136>
------- Comment #13 From 2012-02-17 15:33:27 PST -------
All reviewed patches have been landed.  Closing bug.