WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
4301
Support HTML entities on pages parsed as XHTML
https://bugs.webkit.org/show_bug.cgi?id=4301
Summary
Support HTML entities on pages parsed as XHTML
Denis Defreyne
Reported
2005-08-06 11:25:39 PDT
OVERVIEW DESCRIPTION: Some entities in documents parsed using WebKit's XML parser are not recognised. Examples of unrecognised entities include 'copy' and 'raquo'. STEPS TO REPRODUCE: Load an XML page containing entities such as 'copy' and 'raquo'. ACTUAL RESULTS: The entities are not displayed. The top of the page shows the following error message: This page contains the following errors: error on line 7 at column 56: Entity 'copy' not defined Below is a rendering of the page up to the first error. EXPECTED RESULTS: The entities should show up and there should be no error message. ADDITIONAL BUILDS: This bug can be reproduced in 412 and up. I believe I encountered this bug in 312 as well, but I'm not 100% sure. ADDITIONAL INFORMATION: Darin said: "we broke it by moving from expat to libxml2".
Attachments
First pass at fixing this
(7.22 KB, patch)
2005-10-17 00:53 PDT
,
Eric Seidel (no email)
no flags
Details
Formatted Diff
Diff
Updated patch that handles named entities correctly
(3.56 KB, patch)
2005-11-01 05:19 PST
,
Mark Rowe (bdash)
no flags
Details
Formatted Diff
Diff
Patch with assertion to ensure data will fit in static buffer
(3.63 KB, patch)
2005-11-01 16:36 PST
,
Mark Rowe (bdash)
eric
: review-
Details
Formatted Diff
Diff
Updated patch including layout test + comment change
(367.93 KB, patch)
2005-12-28 07:41 PST
,
Mark Rowe (bdash)
eric
: review+
Details
Formatted Diff
Diff
Patch addressing darin's comments.
(7.17 KB, patch)
2005-12-29 15:21 PST
,
Eric Seidel (no email)
no flags
Details
Formatted Diff
Diff
now including kentities.h
(9.00 KB, patch)
2005-12-29 15:27 PST
,
Eric Seidel (no email)
no flags
Details
Formatted Diff
Diff
Show Obsolete
(4)
View All
Add attachment
proposed patch, testcase, etc.
Mark Rowe (bdash)
Comment 1
2005-08-06 22:16:05 PDT
Confirmed with ToT.
Eric Seidel (no email)
Comment 2
2005-09-08 00:01:07 PDT
This one is mine.
Eric Seidel (no email)
Comment 3
2005-10-05 15:44:31 PDT
This is fixed in TOT.
Eric Seidel (no email)
Comment 4
2005-10-05 16:10:23 PDT
Oops.
Eric Seidel (no email)
Comment 5
2005-10-05 16:23:58 PDT
We're only supposed to support these entities:
http://www.w3.org/TR/REC-xml/#sec-predefined-ent
this xml file seems to depend on an html entity "copy", which xml as far as I can tell, is not supposed to support:
http://www.w3.org/TR/REC-html40/sgml/entities.html
Eric Seidel (no email)
Comment 6
2005-10-16 21:36:51 PDT
***
Bug 3861
has been marked as a duplicate of this bug. ***
Eric Seidel (no email)
Comment 7
2005-10-16 21:47:03 PDT
I was incorrect. XHTML documents require us to support HTML entities. So this is a valid bug:
http://www.w3.org/TR/xhtml1/#h-A2
Eric Seidel (no email)
Comment 8
2005-10-17 00:53:40 PDT
Created
attachment 4378
[details]
First pass at fixing this Ok, so I took a stab at fixing this tonight. Unfortunately there are some other minor changes for XSLTProcessor support included in this patch. Named entities still don't work correctly (I'm not sure why?), but numerical entities work just fine. I'll hopefully get around to finishing this later this week, unless somebody beats me to it.
Eric Seidel (no email)
Comment 9
2005-10-17 00:54:52 PDT
FYI I've been using this *fabulous* set of test pages:
http://www.w3.org/People/mimasa/test/xhtml/entities/
Mark Rowe (bdash)
Comment 10
2005-11-01 05:19:36 PST
Created
attachment 4550
[details]
Updated patch that handles named entities correctly The thing that was preventing named entities from working was that the values were being passed through as utf-16 while libxml was expecting utf-8. It's also worth noting that character references ("numerical entities") are handled internally by libxml so getXHTMLEntity doesn't need to handle them.
Eric Seidel (no email)
Comment 11
2005-11-01 11:37:21 PST
Comment on
attachment 4550
[details]
Updated patch that handles named entities correctly Patch looks great. You should add an ASSERT(value.length() < 5) to make sure we don't screw ourselves by changing the entity lookup code some day.
Mark Rowe (bdash)
Comment 12
2005-11-01 16:36:11 PST
Created
attachment 4556
[details]
Patch with assertion to ensure data will fit in static buffer
Eric Seidel (no email)
Comment 13
2005-11-01 16:43:02 PST
Comment on
attachment 4556
[details]
Patch with assertion to ensure data will fit in static buffer Looks great. r=me. Darin or mjs could look at it, but I don't think they're gonna have anything else to say.
Eric Seidel (no email)
Comment 14
2005-11-01 16:45:58 PST
Comment on
attachment 4556
[details]
Patch with assertion to ensure data will fit in static buffer Oops. Almost forgot. This needs a test case. I suggest just landing the w3c test page w/ text results.
Eric Seidel (no email)
Comment 15
2005-11-01 17:05:33 PST
As darin pointed out, the code comment should be changed: Using a global variable entity and marking it XML_INTERNAL_PREDEFINED_ENTITY is a hack to avoid malloc/free. However using a global variable like this could cause trouble if libxml implementation details were to change.
Eric Seidel (no email)
Comment 16
2005-11-30 01:37:54 PST
Mark and I sorta left this one hanging...
Mark Rowe (bdash)
Comment 17
2005-12-28 07:41:45 PST
Created
attachment 5337
[details]
Updated patch including layout test + comment change
Darin Adler
Comment 18
2005-12-28 10:40:45 PST
Comment on
attachment 5337
[details]
Updated patch including layout test + comment change I think getXHTMLEntity should use memcpy instead of a loop to copy the data.
Eric Seidel (no email)
Comment 19
2005-12-28 21:38:56 PST
Comment on
attachment 5337
[details]
Updated patch including layout test + comment change I agree with darin, it shoudl be changed to memcpy. I'll change it as I land. r=me.
Darin Adler
Comment 20
2005-12-29 09:02:21 PST
Comment on
attachment 5337
[details]
Updated patch including layout test + comment change Including "kentities.c" like this results in two copies of the HTML entities table. It would be much better to instead find a way to share a single copy of the table.
Eric Seidel (no email)
Comment 21
2005-12-29 15:21:37 PST
Created
attachment 5358
[details]
Patch addressing darin's comments.
Eric Seidel (no email)
Comment 22
2005-12-29 15:27:52 PST
Created
attachment 5359
[details]
now including kentities.h
Eric Seidel (no email)
Comment 23
2005-12-29 15:49:08 PST
Addressed darin's concerns. Landed.
Ian 'Hixie' Hickson
Comment 24
2005-12-29 16:53:36 PST
TESTCASE:
http://www.hixie.ch/tests/adhoc/xml/parsing/010.xml
You should get a well-formedness error on that page. If you see the word "fail" then the test has failed (and that's a regression).
Ian 'Hixie' Hickson
Comment 25
2005-12-29 16:56:55 PST
Filed regression as
bug 6290
.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug