Bug 15302 - Add very-basic CSS-based MathML support into WebKit
Status: RESOLVED WONTFIX
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc.
Version: 523.x (Safari 3)
Hardware: Mac OS X 10.4
Importance: P2 Normal
Assignee: Nobody
Depends on: 16002
Blocks: 3251
Reported: 2007-09-27 20:47 PDT by Eric Seidel (no email)
Modified: 2010-10-17 10:54 PDT

Attachments
WebCore patch (MOST of this file is the XHTML+MathML DTD) (481.36 KB, patch)
2007-09-27 20:49 PDT, Eric Seidel (no email)
WebKitTools support for building MathML enabled (and parsing entities from the DTD) (3.78 KB, patch)
2007-09-27 20:50 PDT, Eric Seidel (no email)
Slightly improved WebCore patch (again the size is *all* the DTD) (481.90 KB, patch)
2007-09-28 08:38 PDT, Eric Seidel (no email)
improved (large size due to DTD inclusion) (438.10 KB, patch)
2007-10-07 13:04 PDT, Eric Seidel (no email)
hyatt: review-
fixed improved patch to work on new sources (large size due to DTD inclusion) (437.34 KB, patch)
2008-04-05 07:33 PDT, Dmitriy Dzema
eric: review-

Description Eric Seidel (no email) 2007-09-27 20:47:49 PDT
Add very-basic CSS-based MathML support into WebKit

I used the stylesheet from:
http://www.w3.org/TR/mathml-for-css/

and basically just made the XMLTokenizer MathML-aware.  I also added an (incomplete) MathML entities hash table.  MathML has *lots* of entities, and my DTD -> Entity-hash script is not yet perfect, but it's good enough for many, many MathML files.
Comment 1 Eric Seidel (no email) 2007-09-27 20:49:43 PDT
Created attachment 16426
WebCore patch (MOST of this file is the XHTML+MathML DTD)
Comment 2 Eric Seidel (no email) 2007-09-27 20:50:36 PDT
Created attachment 16427
WebKitTools support for building MathML enabled (and parsing entities from the DTD)
Comment 3 Rob Buis 2007-09-27 23:32:46 PDT
Hi Eric,

(In reply to comment #0)
> Add very-basic CSS-based MathML support into WebKit
> 
> I used the stylesheet from:
> http://www.w3.org/TR/mathml-for-css/
> 
> and basically just made the XMLTokenizer MathML aware.  I also added an
> (incomplete) MathML entities hash table.  MathML has *lots* of entities, and my
> DTD -> Entity-hash script is not yet perfect, but it's good enough for many
> many mathml files.

I also did some MathML support locally some time ago, with a slightly different stylesheet IIRC. Anyway, I think we should talk! I assume this is meant for the feature branch?
Cheers,

Rob.
Comment 4 Eric Seidel (no email) 2007-09-28 05:47:56 PDT
(In reply to comment #3)

> I also did some MathML support locally some time ago, with a slightly different
> stylesheet IIRC. Anyway, I think we should talk! I assume this is meant for the
> feature branch?

This is intended for the feature branch (or its own separate branch), yes.

Please, feel free to try different style sheets.  This patch is by no means an authoritative way of doing MathML in WebKit.  I had just explained this method *again* on IRC (for probably now the 10th time), and figured, "Why the heck not?  It really is easy, I'll just do it."  I have no plans for bringing "complete" MathML support to WebKit, however, were this to land, I might fiddle around with it over time (especially were I to find a good -- and easy to use -- test suite, or if others were to play around with this half-support and file bugs, etc).

Hopefully that helps clear things up.  Again, I was just trying to give the MathML project a small kick in the pants, with the hope that actually having some "minimal" support would motivate others, or that even if this patch didn't land, I would never have to explain the "use CSS for MathML" method again, other than to just point at this bug. ;)
Comment 5 Rob Buis 2007-09-28 06:02:33 PDT
Hi Eric,

(In reply to comment #4)
> This is intended for the feature branch (or its own separate branch), yes.

Would be the only sane approach, but just wanted it cleared.

> Please, feel free to try different style sheets.  This patch is by no means an
> authoritative way of doing MathML in WebKit.  I had just explained this method
> *again* on IRC (for probably now the 10th time), and figured, "Why the heck
> not?  It really is easy, I'll just do it."  I have no plans for bringing
> "complete" MathML support to WebKit, however, were this to land, I might fiddle
> around with it over time (especially were I to find a good -- and easy to use
> -- test suite, or if others were to play around with this half-support and file
> bugs, etc).
> 
> Hopefully that helps clear things up.  Again, I was just trying to give the
> MathML project a small kick in the pants, with the hope that actually having
> some "minimal" support would motivate others, or that even if this patch didn't
> land, I would never have to explain the "use CSS for MathML" method again,
> other than to just point at this bug. ;)

Thanks for clearing that up. This patch could also be a good occasion to add some more information to the MathML project page on webkit.org, with a hint that help is needed. I'll have to study this stylesheet later; IIRC there was nothing there for 1.1, but I found an independent one. I would also like to know what changed between 1.1 and 2.0.
Cheers,

Rob.
Comment 6 Jacques Distler 2007-09-28 06:23:29 PDT
Is there a hope that this approach might be a start on real MathML support? Or is it envisioned as an alternative to real MathML support? Clearly, some things (like the DTD) are necessary, either way. But it would be nice if this were to kickstart some activity on the MathML project.
Comment 7 Eric Seidel (no email) 2007-09-28 06:31:11 PDT
(In reply to comment #6)
> Is there a hope that this approach might be a start on real MathML support? Or
> is it envisioned as an alternative to real MathML support? Clearly, some things
> (like the DTD) are necessary, either way. But it would be nice if this were to
> kickstart some activity on the MathML project.

Yes.  Kickstarting is the intention (as I said above).  But I think that work will need to come from new contributors, the current set is... well... rather overbooked. ;)
Comment 8 Eric Seidel (no email) 2007-09-28 08:38:23 PDT
Created attachment 16431
Slightly improved WebCore patch (again the size is *all* the DTD)
Comment 9 Eric Seidel (no email) 2007-09-28 08:44:30 PDT
So my opinions of MathML have gone down since yesterday.  I'm not sure it's really a very useful addition to WebKit.

http://en.wikipedia.org/wiki/MathML

gives you some idea how screwy MathML is.  It's incredibly verbose compared to LaTeX.  It's also not really XML.  Not only do MathML files not require a DTD or namespace, but there also appears to be this screwy "annotation" tag, which can contain non-xml content (probably not even in a CDATA tag).  A good example of this is the MathML test suite http://www.w3.org/Math/testsuite/ which has lots of mml files which are not valid XML.  Also, the MathML test suite seems even worse than the SVG test suite (at least in ease of use, not necessarily coverage... as I don't know MathML anyway).

It's of course totally possible to work around all these issues and build a working MathML implementation, but given how little MathML seems to be used on the Web, I'm not sure it's worth it.  (Also, I keep expecting someone to make an XSL stylesheet which knows how to render MathML perfectly using an SVG engine...)
Comment 10 Jacques Distler 2007-09-28 10:44:56 PDT
MathML is verbose. It's also not intended to be hand-authored.  Typically, it is machine-generated, often from LaTeX input. See, e.g.

   http://golem.ph.utexas.edu/~distler/blog/itex2MML.html

>It's also not really XML.  Not only do MathML files not require a DTD or namespace, ...


Huh? MathML most certainly IS an application of XML. XML doesn't require a DTD (and probably shouldn't). Of course, without a DTD, you can't use named entities (other than the standard 5). But one shouldn't be sending the 2000+ named entities defined by the MathML DTD over the wire, anyway.

What *is* strange about the MathML Spec is that it is really *two* specifications:

* Content MathML
    Conceived, among other things, as a data interchange format for symbolic manipulation programs.

* Presentational MathML
    Conceived as a format for displaying Math on the web.

No browser implements CMML. The focus is on PMML.

>but there also appears to be this screwy "annotation" tag, which can contain non-xml content (probably not even in a CDATA tag).

The <annotation> is a part of Content MathML, and its content-model is PCDATA. No browser implements it, nor (probably) should they.

>A good example of this is the MathML test suite http://www.w3.org/Math/testsuite/ which has lots of mml files which are not valid XML.

Really? If so, that's a bug. You should report it. (But, again, for present purposes, I would not give a hoot about the CMML part of the test suite).

>Also, I keep expecting someone to make an XSL stylesheet which knows how to render MathML perfectly using an SVG engine

That is, certainly, one approach to implementing MathML...

>given how little MathML seems to be used on the Web, I'm not sure it's worth it.

Both MathML and inline SVG require application/xhtml+xml, and so are essentially unused on the Web.

By comparison with SVG 1.1, Presentational MathML is a small, implementer-friendly little specification.

Comment 11 Darin Adler 2007-10-04 09:27:31 PDT
Comment on attachment 16427
WebKitTools support for building MathML enabled (and parsing entities from the DTD)

Do we really need to make a Ruby script? I'd prefer to not add another required scripting language.

Can we make generation of the gperf file be part of the build instead?
Comment 12 Eric Seidel (no email) 2007-10-07 00:45:10 PDT
(In reply to comment #11)
> (From update of attachment 16427)
> Do we really need to make a Ruby script? I'd prefer to not add another required
> scripting language.

I seem to remember you and I having several conversations in the past about the relative merits of Perl. :)  But yes, I can rewrite it in Perl to match the majority of our other scripts.  I had some other Ruby scripts lying around, so I just copied one of those as a starting point for this one.

> 
> Can we make generation of the gperf file be part of the build instead?

Yes, potentially.  That would be kinda cool.  Currently this script is incomplete.  It doesn't suck out quite all of MathML's (nearly endless) list of entities.  It misses all the really large ones where one entity is replaced by 5-6 chars :(
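
For illustration only, here is a minimal C++ sketch of the DTD-to-entity-table step, including the case this comment says the script misses (one entity expanding to several characters). It is not the Ruby or Perl script from the attachments. The helper name `parseEntityDeclarations`, the `EntityTable` type, and the assumption that each declaration has the form `<!ENTITY name "&#x...;...">` with hexadecimal character references are all hypothetical choices made for this example.

```cpp
#include <cstdint>
#include <map>
#include <regex>
#include <string>
#include <vector>

// Hypothetical type: maps an entity name to the code points it expands to.
using EntityTable = std::map<std::string, std::vector<std::uint32_t>>;

// Hypothetical helper: scans DTD text for <!ENTITY name "value"> declarations
// and collects every hexadecimal character reference found in each value.
EntityTable parseEntityDeclarations(const std::string& dtd)
{
    // One entity declaration: captures the name and the quoted replacement text.
    static const std::regex declaration(
        R"rx(<!ENTITY\s+([A-Za-z][A-Za-z0-9.]*)\s+"([^"]*)"\s*>)rx");
    // One hexadecimal character reference, e.g. &#x003B1;
    static const std::regex reference(R"rx(&#x([0-9A-Fa-f]+);)rx");

    EntityTable table;
    const std::sregex_iterator end;
    for (std::sregex_iterator it(dtd.begin(), dtd.end(), declaration); it != end; ++it) {
        const std::string name = (*it)[1].str();
        const std::string value = (*it)[2].str();

        // Walk all references in the value, so multi-character entities
        // are kept whole instead of being skipped.
        std::vector<std::uint32_t> codePoints;
        for (std::sregex_iterator ref(value.begin(), value.end(), reference); ref != end; ++ref)
            codePoints.push_back(static_cast<std::uint32_t>(std::stoul((*ref)[1].str(), nullptr, 16)));

        if (!codePoints.empty())
            table[name] = codePoints;
    }
    return table;
}
```

A generator along these lines could then write one line per table entry into a gperf input file. Declarations that use other value forms (decimal references, nested entity references) are ignored by this sketch.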
Comment 13 Jacques Distler 2007-10-07 01:02:59 PDT
>Currently this script is incomplete.  It doesn't suck out quite all of MathML's (nearly endless) list of entities.  It misses all the really large ones where one entity is replaced by 5-6 chars :(

See  MathML::Entities

   http://search.cpan.org/~distler/MathML-Entities/lib/MathML/Entities.pm

for, among other things, the full list of XHTML+MathML named entities as a Perl hash.

The same is available in Ruby here (scroll down):

     http://golem.ph.utexas.edu/~distler/code/instiki/svn/lib/sanitize.rb
Comment 14 Eric Seidel (no email) 2007-10-07 13:04:21 PDT
Created attachment 16576
improved (large size due to DTD inclusion)

I took Darin's suggestion and made the MathMLEntityNames.gperf file entirely autogenerated as part of the build process.  It took way too long... and I think I might still smell like perl.   But it's done. :)
Comment 15 Eric Seidel (no email) 2007-11-15 10:18:01 PST
If it would make it easier I could prepare a patch which was just the parser and the entities with an empty mathml.css.  Then we could land that and let others hack on the actual mathml.css implementation.
Comment 16 Eric Seidel (no email) 2007-11-15 10:19:05 PST
Btw, I understand that this is *far* from the top priority in WebKit at the moment, and I'm certainly not trying to make it so (at least for such a half-implementation).  I just figured the bug could use a little update-lovin after rotting for so long.  I should pull this patch down into my local git repo to keep it from rotting anyway.
Comment 17 Dave Hyatt 2007-11-15 11:10:12 PST
Comment on attachment 16576
improved (large size due to DTD inclusion)

I am worried that the MathML stylesheet is too big to be loaded on startup and kept in memory all the time.  We should possibly explore dynamic loading of the SVG and MathML user agent sheets when an element of that namespace is first encountered in an XML file.
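
As a rough illustration of the dynamic-loading idea (not WebCore code), the C++ sketch below defers parsing a per-namespace user agent sheet until the first element in that namespace shows up. The class name `LazyUserAgentSheets`, its methods, and the use of a plain string to stand in for a parsed stylesheet are all assumptions made for this example.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical registry: each namespace gets a loader that is run at most
// once, the first time an element of that namespace is encountered.
class LazyUserAgentSheets {
public:
    using Loader = std::function<std::string()>;

    void registerSheet(const std::string& namespaceURI, Loader loader)
    {
        m_loaders[namespaceURI] = std::move(loader);
    }

    // Call when an element is created. Returns true only if this call
    // actually loaded a sheet.
    bool ensureLoadedFor(const std::string& namespaceURI)
    {
        if (m_loaded.count(namespaceURI))
            return false; // Already loaded by an earlier element.
        auto it = m_loaders.find(namespaceURI);
        if (it == m_loaders.end())
            return false; // No sheet registered for this namespace.
        m_loaded[namespaceURI] = it->second();
        return true;
    }

    bool isLoaded(const std::string& namespaceURI) const
    {
        return m_loaded.count(namespaceURI) != 0;
    }

private:
    std::map<std::string, Loader> m_loaders;
    std::map<std::string, std::string> m_loaded; // String stands in for a parsed sheet.
};
```

With this shape, documents that never contain a MathML or SVG element never pay the parsing or memory cost of those sheets.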
Comment 18 Eric Seidel (no email) 2007-11-15 11:55:11 PST
Dynamic loading sounds like an interesting idea.... but kinda a separate bug.  I guess you're marking this as "blocked" by that bug.  Filed bug 16002 to cover hyatt's concern.
Comment 19 Dmitriy Dzema 2008-04-05 07:33:38 PDT
Created attachment 20352
fixed improved patch to work on new sources (large size due to DTD inclusion)
Comment 20 Dave Hyatt 2008-04-05 10:06:39 PDT
The CSS file has a lot of rules that are going to perform very badly.
Comment 21 Eric Seidel (no email) 2008-04-05 13:39:29 PDT
Comment on attachment 20352
fixed improved patch to work on new sources (large size due to DTD inclusion)

A couple comments:
1.  As a general rule, we don't commit commented-out code, or #if 0'd out code.  So:
 /*
     uncomment and add class MathMLElement to support dynamic CSS loading

 #if ENABLE(MATHML)
     virtual
 #endif
         bool isMathMLElement() const { return false; }
 */

shouldn't be committed.

Instead, I think that you should make that method non-virtual and have it return false in the non-enabled case and return namespaceURI() == mathMLNamespace; in the enabled case, even if you have to define mathMLNamespace as a function-local static AtomicString.  If we had actual MathML elements and MathMLNames.*, MathMLNames::mathMLNamespace would already have been defined by the make_names.pl autogeneration script.

+                if (document() && document()->isHTMLDocument()) // mathml.css is parsed before any document() exists
goes away once you add a real isMathMLElement function (which for now is non-virtual and just checks the namespaceURI)
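
A minimal, self-contained C++ sketch of the suggestion above, for illustration only. The `Element` class here is a stand-in for WebCore's, `std::string` replaces AtomicString, and the `ENABLE_MATHML` macro is an assumed substitute for the real `ENABLE(MATHML)` build flag.

```cpp
#include <string>
#include <utility>

#ifndef ENABLE_MATHML
#define ENABLE_MATHML 1 // Assumption: stands in for WebKit's ENABLE(MATHML) flag.
#endif

// Minimal stand-in for WebCore's Element, with only what the check needs.
class Element {
public:
    explicit Element(std::string namespaceURI)
        : m_namespaceURI(std::move(namespaceURI))
    {
    }

    const std::string& namespaceURI() const { return m_namespaceURI; }

    // Non-virtual: the answer comes from the namespace alone, so no
    // MathMLElement subclass is required yet.
    bool isMathMLElement() const
    {
#if ENABLE_MATHML
        // Function-local static, as suggested when MathMLNames.* does not exist.
        static const std::string mathMLNamespace = "http://www.w3.org/1998/Math/MathML";
        return namespaceURI() == mathMLNamespace;
#else
        return false;
#endif
    }

private:
    std::string m_namespaceURI;
};
```

Because the check depends only on the element's own namespace, it does not need a document() to exist, which is why the extra isHTMLDocument() guard would become unnecessary.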

Otherwise the patch looks good.  I think we need to decide if MathML development should go on a branch at the beginning or if it can go on trunk.  Also I think we should talk (over IRC or otherwise) about what the general plan for MathML is.  What the "end" condition is for the summer, etc.  I assume there is a MathML test suite or two that we could check in to test our progress?

Oh, and finally, I made a mistake by including the DTD in these patches.  I should have made a separate patch for the DTD, and then not included it in any future patch (that would make it easier to read all the smaller patches).  If either of us posts any more of these patches, we should probably not include the DTD.

r- for the commented out code (which could be replaced by real code). Before we can land a fixed version, we'll need to figure out if this is going on a branch or not.
Comment 22 Alex Milowski 2009-09-20 07:55:14 PDT
Some of the ideas in this patch have been applied by the patch 29158.  Specifically, the support for the MathML CSS stylesheet has been added by that patch.

The CSS in that patch is very different (much smaller) and specific to the rendering code that will be added in the future rather than the CSS from the "MathML for CSS" stylesheet from the W3C.

The changes to the parsing code are still valid and probably work, but they should be broken out into a separate, smaller patch now that the patch in 29158 has been applied.
Comment 23 Alex Milowski 2010-10-17 10:54:10 PDT
At this point, I believe we've taken what we can from this patch.  The support for MathML parsing has been and will continue to be integrated into HTML5.  Additional support for XHTML should be considered according to whatever comes out of the HTML5 XHTML serialization efforts (which are yet to be completely determined).

If you feel that this should remain open, please comment and let me know.