Bug 43949

Summary: Add support for MathML entities
Product: WebKit Reporter: Eric Seidel (no email) <eric>
Component: WebCore Misc.Assignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, ap, aroben, commit-queue, darin, dbates, dglazkov, eric, fishd, fred.wang, jamesr, japhet, jchaffraix, mark, sausset, tony, webkit-ews, webkit.review.bot, yaar
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Other   
OS: OS X 10.5   
Bug Depends on: 43595, 44036    
Bug Blocks: 84012    
Attachments:
Description Flags
Now with all build systems including HTMLEntitySearch.*
none
Now should compile for everyone except chromium and possibly tiger
none
Now should compile for everyone
none
Fixed mac build
none
Patch to fix cr-win builder none

Description Eric Seidel (no email) 2010-08-12 22:26:31 PDT
+++ This bug was initially created as a clone of Bug #43595 +++

Add support for MathML entities

The original bug got spammed, so making a new one.
Comment 1 Eric Seidel (no email) 2010-08-12 22:26:52 PDT
*** Bug 43595 has been marked as a duplicate of this bug. ***
Comment 2 Eric Seidel (no email) 2010-08-12 22:30:46 PDT
@mmentovai: My gyp changes in the previous patch were partial and wrong.  I just copied the gperf rule and modified it, so any typos you saw were also in the gperf rule. :)
Comment 3 Eric Seidel (no email) 2010-08-12 22:33:15 PDT
Created attachment 64303 [details]
Now with all build systems including HTMLEntitySearch.*
Comment 4 Eric Seidel (no email) 2010-08-12 23:00:37 PDT
Attachment 64303 [details] did not build on mac:
Build output: http://queues.webkit.org/results/3754101
Comment 5 Adam Barth 2010-08-13 09:24:46 PDT
Comment on attachment 64303 [details]
Now with all build systems including HTMLEntitySearch.*

WebKitTools/ChangeLog:12
 +                patch will not build!  OOPS!
Maybe remove this note?

WebCore/html/HTMLEntityParser.cpp:35
 +  #include "HTMLEntityNames.cpp"
How does this still compile?
Comment 6 Eric Seidel (no email) 2010-08-13 09:35:08 PDT
Bah.  You mean we need the old entity code for the xml code?!?
Comment 7 Eric Seidel (no email) 2010-08-13 12:26:52 PDT
Created attachment 64363 [details]
Now should compile for everyone except chromium and possibly tiger
Comment 8 Eric Seidel (no email) 2010-08-13 12:30:54 PDT
Hmm... my patches are having the same trouble on win-ews.  I think this is a svn-apply regression. :(
Comment 9 Eric Seidel (no email) 2010-08-13 12:33:57 PDT
Attachment 64363 [details] did not build on mac:
Build output: http://queues.webkit.org/results/3757134
Comment 10 Eric Seidel (no email) 2010-08-13 14:24:00 PDT
Odd.  It seems to build both debug and release on my machine.  Maybe a SL vs. Leopard difference.  Maybe I forgot to do a clean build locally?
Comment 11 Adam Barth 2010-08-13 14:38:12 PDT
I bet the generated file was left over.
Comment 12 Eric Seidel (no email) 2010-08-13 14:57:01 PDT
Created attachment 64377 [details]
Now should compile for everyone
Comment 13 Eric Seidel (no email) 2010-08-13 15:06:21 PDT
Attachment 64377 [details] did not build on mac:
Build output: http://queues.webkit.org/results/3712146
Comment 14 Eric Seidel (no email) 2010-08-13 15:12:44 PDT
Doing a clean build to see if I can figure out what's wrong with Mac.  If that doesn't fix it, I'll poke the ews bot.
Comment 15 Eric Seidel (no email) 2010-08-13 15:46:40 PDT
Clean build worked locally.  I'll poke the EWS bot...
Comment 16 Eric Seidel (no email) 2010-08-13 16:01:19 PDT
Created attachment 64384 [details]
Fixed mac build
Comment 17 Eric Seidel (no email) 2010-08-13 16:15:13 PDT
This looks ready to land.  We'll fix any Tiger/Chromium breakage after landing if necessary.
Comment 18 Eric Seidel (no email) 2010-08-13 19:01:13 PDT
Comment on attachment 64384 [details]
Fixed mac build

Since Adam wrote the patch, I guess I'm supposed to r+ it.

The code looks great.  Lets hope I did the build system integration right. :)
Comment 19 Eric Seidel (no email) 2010-08-13 20:18:39 PDT
Committed r65351: <http://trac.webkit.org/changeset/65351>
Comment 20 WebKit Review Bot 2010-08-13 20:26:33 PDT
http://trac.webkit.org/changeset/65351 might have broken Chromium Linux Release
Comment 21 Eric Seidel (no email) 2010-08-13 20:29:13 PDT
Committed r65352: <http://trac.webkit.org/changeset/65352>
Comment 22 Eric Seidel (no email) 2010-08-13 20:32:32 PDT
Committed r65353: <http://trac.webkit.org/changeset/65353>
Comment 23 WebKit Review Bot 2010-08-13 21:11:13 PDT
http://trac.webkit.org/changeset/65353 might have broken SnowLeopard Intel Release (Tests)
Comment 24 Eric Seidel (no email) 2010-08-13 22:38:22 PDT
Created attachment 64407 [details]
Patch to fix cr-win builder
Comment 25 Eric Seidel (no email) 2010-08-13 22:39:01 PDT
I would have landed this manually, except I have other stuff in my tree which would make me need a full rebuild. :p
Comment 26 WebKit Commit Bot 2010-08-13 23:45:05 PDT
Comment on attachment 64407 [details]
Patch to fix cr-win builder

Clearing flags on attachment: 64407

Committed r65359: <http://trac.webkit.org/changeset/65359>
Comment 27 WebKit Commit Bot 2010-08-13 23:45:12 PDT
All reviewed patches have been landed.  Closing bug.
Comment 28 Eric Seidel (no email) 2010-08-14 00:18:34 PDT
Committed r65360: <http://trac.webkit.org/changeset/65360>
Comment 29 WebKit Review Bot 2010-08-14 00:55:57 PDT
http://trac.webkit.org/changeset/65360 might have broken SnowLeopard Intel Release (Tests)
Comment 30 Mihai Parparita 2010-08-15 10:55:24 PDT
I'm still having trouble building Chromium/TestShell after http://trac.webkit.org/changeset/65351 (which removed HTMLEntityNames.gperf, even with http://trac.webkit.org/changeset/65353.  WebKit/chromium/src/WebEntities.cpp still includes HTMLEntityNames.cpp, and uses the Entity struct from it:

http://trac.webkit.org/browser/trunk/WebKit/chromium/src/WebEntities.cpp#L50
Comment 31 Eric Seidel (no email) 2010-08-15 11:02:40 PDT
Yup.  I could see how that would break.

We'll need to re-write WebEntities.cpp.  That code looks wrong.

Why would chromium even need this code?
Comment 32 Eric Seidel (no email) 2010-08-15 11:05:56 PDT
The easy fix is to restore the HTMLEntityNames.cpp generation code for chromium only.

The code as-written is wrong for HTML5.  But someone from chromium who understands this code will have to fix that.
Comment 33 Eric Seidel (no email) 2010-08-15 11:11:09 PDT
I suspect the WebEntities stuff is part of Chromiums broken hacks around markup.cpp.  iirc, Chromium does its own dom serialization outside of WebCore.  Which is super-wrong.  The correct fix is to fix markup.cpp to work as Chromium needs it to.

But someone who knows what this code is for should comment.  I'll see if I can sketch up a patch to fix the build break.
Comment 34 Mihai Parparita 2010-08-15 11:12:21 PDT
(In reply to comment #31)
> Why would chromium even need this code?

Looks like it was added for http://trac.webkit.org/changeset/52268
Comment 35 Adam Barth 2010-08-15 11:15:44 PDT
> The easy fix is to restore the HTMLEntityNames.cpp generation code for chromium only.

I don't think that's the right fix.  The minimal thing we should do is move this code over to using HTMLEntityTable.  A better fix would be to move this logic into WebCore, where it belongs, and make it correct.
Comment 36 Eric Seidel (no email) 2010-08-15 11:18:37 PDT
The logic appears to be trying to build a reverse lookup map for the entity table.

markup.cpp only escapes a few entities, not all of the HTML ones like Chromium seems to.  I'm not sure why they would want to escape all possible entities, that would not end up looking like the original source. :)

Honestly this code is just horribly wrong, and I think we should try not to touch it.  We should retore their build and let someone else fix the wrongness.
Comment 37 Adam Barth 2010-08-15 11:22:45 PDT
> The logic appears to be trying to build a reverse lookup map for the entity table.

Right, which is easy to do from HTMLEntityTable.

> Honestly this code is just horribly wrong, and I think we should try not to touch it.  We should retore their build and let someone else fix the wrongness.

It seems very bad to have two out-of-sync lists of all the entities.  Surely that's not a good idea.
Comment 38 Eric Seidel (no email) 2010-08-15 12:40:31 PDT
Yes, but this is clearly already an out-of-sync list.  Between markup.cpp and this code.  Why Chromium has chosen this arbitrary subset of HTML entities to replace when creating markup from a DOM is unclear.  Why they have an explicit blacklist instead of a whitelist is also unclear.  Moving them to the new list of entities will have a behavior change they may or may not want.  Again, completely unclear from this code what they want.

Either path forward is fine.  My gut would say just restore the old broken behavior to fix their compile.  We could also write new broken code for them.  But that sounds like more work which seems silly to do since this whole approach/architecture/code-location seems wrong. :)
Comment 39 Frédéric Wang (:fredw) 2011-06-06 07:45:03 PDT
I don't know if it is the right place to report the problem, but a unit test for MathJax detected the following problems with these Webkit-based browsers (on Windows 7):

Safari 5.0.5: &lang; and &rang; are not converted to the new values.
Chrome 11.0.696.71: &notinE; and &notindot; are converted to ¬inE; and ¬indot;

Maybe it's fixed in development versions, but I just wanted to inform Webkit's developers.
Comment 40 Alexey Proskuryakov 2011-06-06 14:12:45 PDT
Please report new issues in new bugs. I have a few quick comments however.

> Safari 5.0.5: &lang; and &rang; are not converted to the new values.

This was a reckless incompatible spec change. We can update to the new mapping once it's implemented in other browsers, but not just because of a spec change.

> Chrome 11.0.696.71: &notinE; and &notindot; are converted to ¬inE; and ¬indot;

This is something we need to add support for (please file a new bug). Note this is correct parsing for an engine that doesn't support these character references.
Comment 41 Frédéric Wang (:fredw) 2011-06-06 14:34:41 PDT
(In reply to comment #40)
> Please report new issues in new bugs. I have a few quick comments however.

I didn't know if this should be dealt in Webkit or if it specific to each browser. So I preferred to leave a comment here rather than opening duplicated or misclassified bugs.

> 
> > Safari 5.0.5: &lang; and &rang; are not converted to the new values.
> 
> This was a reckless incompatible spec change. We can update to the new mapping once it's implemented in other browsers, but not just because of a spec change.

It's already in stable versions of Chrome and Firefox, at least (I don't know for the other browsers).

> 
> > Chrome 11.0.696.71: &notinE; and &notindot; are converted to ¬inE; and ¬indot;
> 
> This is something we need to add support for (please file a new bug). Note this is correct parsing for an engine that doesn't support these character references.

OK. Again, should I report this in Webkit bugzilla or is it best to do that in a dedicated place for Chrome?
Comment 42 Adam Barth 2011-06-06 15:21:54 PDT
> OK. Again, should I report this in Webkit bugzilla or is it best to do that in a dedicated place for Chrome?

WebKit please.  Thanks!
Comment 43 Alexey Proskuryakov 2011-06-07 09:57:16 PDT
> It's already in stable versions of Chrome and Firefox, at least (I don't know for the other browsers).

If Chrome has updated, then the change is probably in mainline WebKit - I don't think that this is something they customize. Please test with a nightly build from http://nightly.webkit.org or with Safari 5.1 developer preview.
Comment 44 Frédéric Wang (:fredw) 2011-06-09 03:11:42 PDT
(In reply to comment #43)
> > It's already in stable versions of Chrome and Firefox, at least (I don't know for the other browsers).
> 
> If Chrome has updated, then the change is probably in mainline WebKit - I don't think that this is something they customize. Please test with a nightly build from http://nightly.webkit.org or with Safari 5.1 developer preview.

I just tested with a nightly build. The entities lang and rang are still mapped to the former values.
Comment 45 Adam Barth 2011-06-09 10:36:13 PDT
> I just tested with a nightly build. The entities lang and rang are still mapped to the former values.

The entities are stored here:

http://trac.webkit.org/browser/trunk/Source/WebCore/html/parser/HTMLEntityNames.in

They should be the same for all ports.
Comment 46 Alexey Proskuryakov 2011-06-09 10:43:32 PDT
In any case, please file a new bug (in this Bugzilla).
Comment 47 Frédéric Wang (:fredw) 2012-04-16 12:51:37 PDT
(In reply to comment #45)
> > I just tested with a nightly build. The entities lang and rang are still mapped to the former values.
> 
> The entities are stored here:
> 
> http://trac.webkit.org/browser/trunk/Source/WebCore/html/parser/HTMLEntityNames.in
> 
> They should be the same for all ports.

I just tested again with the latest release of Safari and lang and rang are now mapped to the new entities.

Thanks.