OVERVIEW DESCRIPTION: Some entities in documents parsed using WebKit's XML parser are not recognised. Examples of unrecognised entities include 'copy' and 'raquo'.

STEPS TO REPRODUCE: Load an XML page containing entities such as 'copy' and 'raquo'.

ACTUAL RESULTS: The entities are not displayed. The top of the page shows the following error message:

    This page contains the following errors:
    error on line 7 at column 56: Entity 'copy' not defined
    Below is a rendering of the page up to the first error.

EXPECTED RESULTS: The entities should show up and there should be no error message.

ADDITIONAL BUILDS: This bug can be reproduced in 412 and up. I believe I encountered this bug in 312 as well, but I'm not 100% sure.

ADDITIONAL INFORMATION: Darin said: "we broke it by moving from expat to libxml2".
Confirmed with ToT.
This one is mine.
This is fixed in TOT.
Oops.
We're only supposed to support the five predefined XML entities (http://www.w3.org/TR/REC-xml/#sec-predefined-ent). This XML file seems to depend on the HTML entity "copy", which, as far as I can tell, XML is not supposed to support (http://www.w3.org/TR/REC-html40/sgml/entities.html).
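To make the distinction concrete, here is a minimal sketch (not code from any attached patch) of a check that accepts only the five entities XML 1.0 predefines. The function name `isXMLPredefinedEntity` is hypothetical. Under this rule alone, "copy" and "raquo" would be rejected, which matches the error in the report.

```cpp
#include <cstring>

// Hypothetical helper: returns true only for the five entities that
// XML 1.0 predefines (lt, gt, amp, apos, quot). HTML entities such as
// "copy" or "raquo" are deliberately not in this set.
bool isXMLPredefinedEntity(const char* name)
{
    static const char* const predefined[] = { "lt", "gt", "amp", "apos", "quot" };
    for (const char* entity : predefined) {
        if (std::strcmp(name, entity) == 0)
            return true;
    }
    return false;
}
```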
*** Bug 3861 has been marked as a duplicate of this bug. ***
I was incorrect. XHTML documents require us to support HTML entities. So this is a valid bug: http://www.w3.org/TR/xhtml1/#h-A2
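Supporting XHTML therefore means looking entity names up in the HTML entity sets as well. The sketch below shows the shape of such a lookup, assuming a hypothetical table `kXHTMLEntities` and function `lookupXHTMLEntity`. It holds only a handful of entries for illustration; the full XHTML 1.0 entity sets are much larger.

```cpp
#include <cstring>

// Hypothetical subset of the XHTML 1.0 entity sets, mapping entity
// names to Unicode code points. Illustration only: a real table would
// contain every entity from the lat1, symbol and special sets.
struct EntityEntry {
    const char* name;
    unsigned short codePoint;
};

static const EntityEntry kXHTMLEntities[] = {
    { "nbsp",   0x00A0 },
    { "copy",   0x00A9 },
    { "laquo",  0x00AB },
    { "raquo",  0x00BB },
    { "eacute", 0x00E9 },
};

// Returns the code point for `name`, or 0 if the name is not in the table.
unsigned short lookupXHTMLEntity(const char* name)
{
    for (const EntityEntry& entry : kXHTMLEntities) {
        if (std::strcmp(entry.name, name) == 0)
            return entry.codePoint;
    }
    return 0;
}
```

A linear scan keeps the sketch short; with a table of a few hundred names, a sorted array with binary search or a perfect hash would be the more likely choice.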
Created attachment 4378: First pass at fixing this

OK, so I took a stab at fixing this tonight. Unfortunately, this patch also includes some other minor changes for XSLTProcessor support. Named entities still don't work correctly (I'm not sure why), but numerical entities work just fine. I'll hopefully get around to finishing this later this week, unless somebody beats me to it.
FYI, I've been using this *fabulous* set of test pages: http://www.w3.org/People/mimasa/test/xhtml/entities/
Created attachment 4550: Updated patch that handles named entities correctly

Named entities weren't working because the entity values were being passed to libxml as UTF-16, while libxml expects UTF-8. It's also worth noting that character references ("numerical entities") are handled internally by libxml, so getXHTMLEntity doesn't need to handle them.
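To illustrate the encoding issue, here is a hypothetical helper `encodeBMPAsUTF8` (not the patch's code) that converts one code point, such as U+00A9 for "copy", into the NUL-terminated UTF-8 bytes libxml works with. It assumes the value is a single Basic Multilingual Plane character and does not handle surrogate pairs; every entity in the XHTML 1.0 entity sets satisfies that assumption.

```cpp
#include <cstddef>

// Hypothetical helper: encodes one BMP code point (a single UTF-16 code
// unit that is not a surrogate) as NUL-terminated UTF-8. `out` must have
// room for 4 bytes: up to 3 UTF-8 bytes plus the terminating NUL.
// Returns the number of UTF-8 bytes written, excluding the NUL.
std::size_t encodeBMPAsUTF8(unsigned short c, unsigned char* out)
{
    std::size_t length;
    if (c < 0x80) {
        // U+0000..U+007F: one byte, identical to ASCII.
        out[0] = static_cast<unsigned char>(c);
        length = 1;
    } else if (c < 0x800) {
        // U+0080..U+07FF: two bytes, 110xxxxx 10xxxxxx.
        out[0] = static_cast<unsigned char>(0xC0 | (c >> 6));
        out[1] = static_cast<unsigned char>(0x80 | (c & 0x3F));
        length = 2;
    } else {
        // U+0800..U+FFFF: three bytes, 1110xxxx 10xxxxxx 10xxxxxx.
        out[0] = static_cast<unsigned char>(0xE0 | (c >> 12));
        out[1] = static_cast<unsigned char>(0x80 | ((c >> 6) & 0x3F));
        out[2] = static_cast<unsigned char>(0x80 | (c & 0x3F));
        length = 3;
    }
    out[length] = '\0';
    return length;
}
```

For example, "copy" (U+00A9) becomes the two bytes 0xC2 0xA9 and "raquo" (U+00BB) becomes 0xC2 0xBB. Passing the raw UTF-16 code unit instead would hand libxml bytes it cannot interpret as UTF-8.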
Comment on attachment 4550: Updated patch that handles named entities correctly

Patch looks great. You should add an ASSERT(value.length() < 5) to make sure we don't screw ourselves if the entity lookup code changes some day.
Created attachment 4556: Patch with assertion to ensure data will fit in static buffer
Comment on attachment 4556: Patch with assertion to ensure data will fit in static buffer

Looks great. r=me. Darin or mjs could look at it, but I don't think they're gonna have anything else to say.
Comment on attachment 4556: Patch with assertion to ensure data will fit in static buffer

Oops. Almost forgot. This needs a test case. I suggest just landing the W3C test page with text results.
As Darin pointed out, the code comment should be changed to read: "Using a global variable entity and marking it XML_INTERNAL_PREDEFINED_ENTITY is a hack to avoid malloc/free. However, using a global variable like this could cause trouble if libxml implementation details were to change."
Mark and I sorta left this one hanging...
Created attachment 5337: Updated patch including layout test + comment change
Comment on attachment 5337: Updated patch including layout test + comment change

I think getXHTMLEntity should use memcpy instead of a loop to copy the data.
Comment on attachment 5337: Updated patch including layout test + comment change

I agree with Darin; it should be changed to memcpy. I'll change it as I land. r=me.
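A minimal sketch of the copy step with both suggestions from this thread applied (the size ASSERT and memcpy), assuming a hypothetical 5-byte static buffer `gEntityValue` and helper `storeEntityValue`. The patch's actual code, which fills a global libxml entity, is not reproduced here; standard `assert` stands in for WebKit's ASSERT.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical static buffer for one entity's UTF-8 value, reused on every
// lookup so that no per-entity malloc/free is needed. Like the global
// entity described in the code comment, it is not reentrant: each call
// overwrites the previous value.
static char gEntityValue[5];

// Copies `length` bytes of UTF-8 entity text into the static buffer using
// memcpy rather than a byte-by-byte loop, then NUL-terminates it. The
// assertion catches a future change to the lookup code that would produce
// a value too large for the buffer.
const char* storeEntityValue(const char* utf8, std::size_t length)
{
    assert(length < sizeof(gEntityValue));
    std::memcpy(gEntityValue, utf8, length);
    gEntityValue[length] = '\0';
    return gEntityValue;
}
```

memcpy states the intent (copy exactly `length` bytes) more directly than a hand-written loop. The static buffer trades reentrancy for avoiding allocation, which is the same trade-off the revised code comment warns about.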
Comment on attachment 5337: Updated patch including layout test + comment change

Including "kentities.c" like this results in two copies of the HTML entities table. It would be much better to instead find a way to share a single copy of the table.
Created attachment 5358: Patch addressing darin's comments.
Created attachment 5359: now including kentities.h
Addressed darin's concerns. Landed.
TESTCASE: http://www.hixie.ch/tests/adhoc/xml/parsing/010.xml

You should get a well-formedness error on that page. If you see the word "fail", then the test has failed (and that's a regression).
Filed regression as bug 6290.