Bug 43949 - Add support for MathML entities
Summary: Add support for MathML entities
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: 528+ (Nightly build)
Hardware: Other OS X 10.5
: P2 Normal
Assignee: Nobody
URL:
Keywords:
: 43595 (view as bug list)
Depends on: 43595 44036
Blocks: mathjax
  Show dependency treegraph
 
Reported: 2010-08-12 22:26 PDT by Eric Seidel
Modified: 2012-04-16 12:51 PDT (History)
19 users (show)

See Also:


Attachments
Now with all build systems including HTMLEntitySearch.* (48.08 KB, patch)
2010-08-12 22:33 PDT, Eric Seidel
no flags Details | Formatted Diff | Diff
Now should compile for everyone except chromium and possibly tiger (48.83 KB, patch)
2010-08-13 12:26 PDT, Eric Seidel
no flags Details | Formatted Diff | Diff
Now should compile for everyone (50.77 KB, patch)
2010-08-13 14:57 PDT, Eric Seidel
no flags Details | Formatted Diff | Diff
Fixed mac build (50.71 KB, patch)
2010-08-13 16:01 PDT, Eric Seidel
no flags Details | Formatted Diff | Diff
Patch to fix cr-win builder (1.41 KB, patch)
2010-08-13 22:38 PDT, Eric Seidel
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Eric Seidel 2010-08-12 22:26:31 PDT
+++ This bug was initially created as a clone of Bug #43595 +++

Add support for MathML entities

The original bug got spammed, so making a new one.
Comment 1 Eric Seidel 2010-08-12 22:26:52 PDT
*** Bug 43595 has been marked as a duplicate of this bug. ***
Comment 2 Eric Seidel 2010-08-12 22:30:46 PDT
@mmentovai: My gyp changes in the previous patch were partial and wrong.  I just copied the gperf rule and modified it, so any typos you saw were also in the gperf rule. :)
Comment 3 Eric Seidel 2010-08-12 22:33:15 PDT
Created attachment 64303 [details]
Now with all build systems including HTMLEntitySearch.*
Comment 4 Eric Seidel 2010-08-12 23:00:37 PDT
Attachment 64303 [details] did not build on mac:
Build output: http://queues.webkit.org/results/3754101
Comment 5 Adam Barth 2010-08-13 09:24:46 PDT
Comment on attachment 64303 [details]
Now with all build systems including HTMLEntitySearch.*

WebKitTools/ChangeLog:12
 +                patch will not build!  OOPS!
Maybe remove this note?

WebCore/html/HTMLEntityParser.cpp:35
 +  #include "HTMLEntityNames.cpp"
How does this still compile?
Comment 6 Eric Seidel 2010-08-13 09:35:08 PDT
Bah.  You mean we need the old entity code for the xml code?!?
Comment 7 Eric Seidel 2010-08-13 12:26:52 PDT
Created attachment 64363 [details]
Now should compile for everyone except chromium and possibly tiger
Comment 8 Eric Seidel 2010-08-13 12:30:54 PDT
Hmm... my patches are having the same trouble on win-ews.  I think this is a svn-apply regression. :(
Comment 9 Eric Seidel 2010-08-13 12:33:57 PDT
Attachment 64363 [details] did not build on mac:
Build output: http://queues.webkit.org/results/3757134
Comment 10 Eric Seidel 2010-08-13 14:24:00 PDT
Odd.  It seems to build both debug and release on my machine.  Maybe a SL vs. Leopard difference.  Maybe I forgot to do a clean build locally?
Comment 11 Adam Barth 2010-08-13 14:38:12 PDT
I bet the generated file was left over.
Comment 12 Eric Seidel 2010-08-13 14:57:01 PDT
Created attachment 64377 [details]
Now should compile for everyone
Comment 13 Eric Seidel 2010-08-13 15:06:21 PDT
Attachment 64377 [details] did not build on mac:
Build output: http://queues.webkit.org/results/3712146
Comment 14 Eric Seidel 2010-08-13 15:12:44 PDT
Doing a clean build to see if I can figure out what's wrong with Mac.  If that doesn't fix it, I'll poke the ews bot.
Comment 15 Eric Seidel 2010-08-13 15:46:40 PDT
Clean build worked locally.  I'll poke the EWS bot...
Comment 16 Eric Seidel 2010-08-13 16:01:19 PDT
Created attachment 64384 [details]
Fixed mac build
Comment 17 Eric Seidel 2010-08-13 16:15:13 PDT
This looks ready to land.  We'll fix any Tiger/Chromium breakage after landing if necessary.
Comment 18 Eric Seidel 2010-08-13 19:01:13 PDT
Comment on attachment 64384 [details]
Fixed mac build

Since Adam wrote the patch, I guess I'm supposed to r+ it.

The code looks great.  Lets hope I did the build system integration right. :)
Comment 19 Eric Seidel 2010-08-13 20:18:39 PDT
Committed r65351: <http://trac.webkit.org/changeset/65351>
Comment 20 WebKit Review Bot 2010-08-13 20:26:33 PDT
http://trac.webkit.org/changeset/65351 might have broken Chromium Linux Release
Comment 21 Eric Seidel 2010-08-13 20:29:13 PDT
Committed r65352: <http://trac.webkit.org/changeset/65352>
Comment 22 Eric Seidel 2010-08-13 20:32:32 PDT
Committed r65353: <http://trac.webkit.org/changeset/65353>
Comment 23 WebKit Review Bot 2010-08-13 21:11:13 PDT
http://trac.webkit.org/changeset/65353 might have broken SnowLeopard Intel Release (Tests)
Comment 24 Eric Seidel 2010-08-13 22:38:22 PDT
Created attachment 64407 [details]
Patch to fix cr-win builder
Comment 25 Eric Seidel 2010-08-13 22:39:01 PDT
I would have landed this manually, except I have other stuff in my tree which would make me need a full rebuild. :p
Comment 26 WebKit Commit Bot 2010-08-13 23:45:05 PDT
Comment on attachment 64407 [details]
Patch to fix cr-win builder

Clearing flags on attachment: 64407

Committed r65359: <http://trac.webkit.org/changeset/65359>
Comment 27 WebKit Commit Bot 2010-08-13 23:45:12 PDT
All reviewed patches have been landed.  Closing bug.
Comment 28 Eric Seidel 2010-08-14 00:18:34 PDT
Committed r65360: <http://trac.webkit.org/changeset/65360>
Comment 29 WebKit Review Bot 2010-08-14 00:55:57 PDT
http://trac.webkit.org/changeset/65360 might have broken SnowLeopard Intel Release (Tests)
Comment 30 Mihai Parparita 2010-08-15 10:55:24 PDT
I'm still having trouble building Chromium/TestShell after http://trac.webkit.org/changeset/65351 (which removed HTMLEntityNames.gperf, even with http://trac.webkit.org/changeset/65353.  WebKit/chromium/src/WebEntities.cpp still includes HTMLEntityNames.cpp, and uses the Entity struct from it:

http://trac.webkit.org/browser/trunk/WebKit/chromium/src/WebEntities.cpp#L50
Comment 31 Eric Seidel 2010-08-15 11:02:40 PDT
Yup.  I could see how that would break.

We'll need to re-write WebEntities.cpp.  That code looks wrong.

Why would chromium even need this code?
Comment 32 Eric Seidel 2010-08-15 11:05:56 PDT
The easy fix is to restore the HTMLEntityNames.cpp generation code for chromium only.

The code as-written is wrong for HTML5.  But someone from chromium who understands this code will have to fix that.
Comment 33 Eric Seidel 2010-08-15 11:11:09 PDT
I suspect the WebEntities stuff is part of Chromiums broken hacks around markup.cpp.  iirc, Chromium does its own dom serialization outside of WebCore.  Which is super-wrong.  The correct fix is to fix markup.cpp to work as Chromium needs it to.

But someone who knows what this code is for should comment.  I'll see if I can sketch up a patch to fix the build break.
Comment 34 Mihai Parparita 2010-08-15 11:12:21 PDT
(In reply to comment #31)
> Why would chromium even need this code?

Looks like it was added for http://trac.webkit.org/changeset/52268
Comment 35 Adam Barth 2010-08-15 11:15:44 PDT
> The easy fix is to restore the HTMLEntityNames.cpp generation code for chromium only.

I don't think that's the right fix.  The minimal thing we should do is move this code over to using HTMLEntityTable.  A better fix would be to move this logic into WebCore, where it belongs, and make it correct.
Comment 36 Eric Seidel 2010-08-15 11:18:37 PDT
The logic appears to be trying to build a reverse lookup map for the entity table.

markup.cpp only escapes a few entities, not all of the HTML ones like Chromium seems to.  I'm not sure why they would want to escape all possible entities, that would not end up looking like the original source. :)

Honestly this code is just horribly wrong, and I think we should try not to touch it.  We should retore their build and let someone else fix the wrongness.
Comment 37 Adam Barth 2010-08-15 11:22:45 PDT
> The logic appears to be trying to build a reverse lookup map for the entity table.

Right, which is easy to do from HTMLEntityTable.

> Honestly this code is just horribly wrong, and I think we should try not to touch it.  We should retore their build and let someone else fix the wrongness.

It seems very bad to have two out-of-sync lists of all the entities.  Surely that's not a good idea.
Comment 38 Eric Seidel 2010-08-15 12:40:31 PDT
Yes, but this is clearly already an out-of-sync list.  Between markup.cpp and this code.  Why Chromium has chosen this arbitrary subset of HTML entities to replace when creating markup from a DOM is unclear.  Why they have an explicit blacklist instead of a whitelist is also unclear.  Moving them to the new list of entities will have a behavior change they may or may not want.  Again, completely unclear from this code what they want.

Either path forward is fine.  My gut would say just restore the old broken behavior to fix their compile.  We could also write new broken code for them.  But that sounds like more work which seems silly to do since this whole approach/architecture/code-location seems wrong. :)
Comment 39 Frédéric Wang (:fredw) 2011-06-06 07:45:03 PDT
I don't know if it is the right place to report the problem, but a unit test for MathJax detected the following problems with these Webkit-based browsers (on Windows 7):

Safari 5.0.5: &lang; and &rang; are not converted to the new values.
Chrome 11.0.696.71: &notinE; and &notindot; are converted to ¬inE; and ¬indot;

Maybe it's fixed in development versions, but I just wanted to inform Webkit's developers.
Comment 40 Alexey Proskuryakov 2011-06-06 14:12:45 PDT
Please report new issues in new bugs. I have a few quick comments however.

> Safari 5.0.5: &lang; and &rang; are not converted to the new values.

This was a reckless incompatible spec change. We can update to the new mapping once it's implemented in other browsers, but not just because of a spec change.

> Chrome 11.0.696.71: &notinE; and &notindot; are converted to ¬inE; and ¬indot;

This is something we need to add support for (please file a new bug). Note this is correct parsing for an engine that doesn't support these character references.
Comment 41 Frédéric Wang (:fredw) 2011-06-06 14:34:41 PDT
(In reply to comment #40)
> Please report new issues in new bugs. I have a few quick comments however.

I didn't know if this should be dealt in Webkit or if it specific to each browser. So I preferred to leave a comment here rather than opening duplicated or misclassified bugs.

> 
> > Safari 5.0.5: &lang; and &rang; are not converted to the new values.
> 
> This was a reckless incompatible spec change. We can update to the new mapping once it's implemented in other browsers, but not just because of a spec change.

It's already in stable versions of Chrome and Firefox, at least (I don't know for the other browsers).

> 
> > Chrome 11.0.696.71: &notinE; and &notindot; are converted to ¬inE; and ¬indot;
> 
> This is something we need to add support for (please file a new bug). Note this is correct parsing for an engine that doesn't support these character references.

OK. Again, should I report this in Webkit bugzilla or is it best to do that in a dedicated place for Chrome?
Comment 42 Adam Barth 2011-06-06 15:21:54 PDT
> OK. Again, should I report this in Webkit bugzilla or is it best to do that in a dedicated place for Chrome?

WebKit please.  Thanks!
Comment 43 Alexey Proskuryakov 2011-06-07 09:57:16 PDT
> It's already in stable versions of Chrome and Firefox, at least (I don't know for the other browsers).

If Chrome has updated, then the change is probably in mainline WebKit - I don't think that this is something they customize. Please test with a nightly build from http://nightly.webkit.org or with Safari 5.1 developer preview.
Comment 44 Frédéric Wang (:fredw) 2011-06-09 03:11:42 PDT
(In reply to comment #43)
> > It's already in stable versions of Chrome and Firefox, at least (I don't know for the other browsers).
> 
> If Chrome has updated, then the change is probably in mainline WebKit - I don't think that this is something they customize. Please test with a nightly build from http://nightly.webkit.org or with Safari 5.1 developer preview.

I just tested with a nightly build. The entities lang and rang are still mapped to the former values.
Comment 45 Adam Barth 2011-06-09 10:36:13 PDT
> I just tested with a nightly build. The entities lang and rang are still mapped to the former values.

The entities are stored here:

http://trac.webkit.org/browser/trunk/Source/WebCore/html/parser/HTMLEntityNames.in

They should be the same for all ports.
Comment 46 Alexey Proskuryakov 2011-06-09 10:43:32 PDT
In any case, please file a new bug (in this Bugzilla).
Comment 47 Frédéric Wang (:fredw) 2012-04-16 12:51:37 PDT
(In reply to comment #45)
> > I just tested with a nightly build. The entities lang and rang are still mapped to the former values.
> 
> The entities are stored here:
> 
> http://trac.webkit.org/browser/trunk/Source/WebCore/html/parser/HTMLEntityNames.in
> 
> They should be the same for all ports.

I just tested again with the latest release of Safari and lang and rang are now mapped to the new entities.

Thanks.