Bug 20233 - Please expose an API to convert to/from Named Entities
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKit API
Version: 528+ (Nightly build)
Hardware: Mac OS X 10.5
Importance: P2 Enhancement
Assignee: Nobody
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2008-07-30 13:46 PDT by Dan Wood
Modified: 2013-04-17 22:23 PDT
Description Dan Wood 2008-07-30 13:46:32 PDT
There are a hundred or two "named entities" which are legitimate ways to encode characters in HTML. For instance, &euro; is a euro sign, &Ccedil; is a capital C with a cedilla, etc.

Many programs (e.g. Sandvox (my app) from Karelia, Coda from Panic, UnicodeChecker from earthlingsoft, Flow from Extendmac, and application(s) from Connected Flow and Toxic Software, just to name a few that I was able to find) need to convert named entities to the actual Unicode character values. Each of these applications has had to roll its own "homemade" solution to this, which usually means building up a big data structure.

Of course, WebKit needs to do this too, and it already has code to parse entities; see "HTMLEntityNames.gperf", for example.

It would be greatly beneficial to third-party applications if WebKit could expose an API to convert an entity into its Unicode equivalent. That way, applications could just use the function provided by the operating system (in WebKit) rather than having to invent their own bulky solution.

(If you don't think this is a great need, please note again the list of shipping software that has had a need to do this.) 

The data and the functionality are *already* in WebKit; this is just a request to expose this for use by other applications.
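
For illustration only, here is a minimal sketch of the kind of conversion being requested, in both directions mentioned in the summary. It is written in Python because its standard library already ships an entity table (html.entities); it is not WebKit code, and the function names decode_named_entities and encode_named_entities are hypothetical, not a proposed WebKit API. The table covers the HTML 4 named entities only.

```python
import re
from html.entities import codepoint2name, name2codepoint

# Matches a named entity reference such as "&euro;" (numeric forms are ignored).
_ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def decode_named_entities(text):
    """Replace known named entities with their Unicode characters.

    Hypothetical helper: unknown entity names are left untouched.
    """
    def _replace(match):
        codepoint = name2codepoint.get(match.group(1))
        return chr(codepoint) if codepoint is not None else match.group(0)

    return _ENTITY_RE.sub(_replace, text)


def encode_named_entities(text):
    """Replace every character that has an HTML 4 entity name with "&name;".

    Hypothetical helper: characters without a named entity pass through as-is.
    """
    parts = []
    for ch in text:
        name = codepoint2name.get(ord(ch))
        parts.append("&%s;" % name if name is not None else ch)
    return "".join(parts)
```

With this sketch, decode_named_entities("&euro;") yields the euro sign and encode_named_entities applied to a capital C with a cedilla yields "&Ccedil;". The point is only that each application currently has to carry such a table itself, whereas WebKit already contains the equivalent data.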
Comment 1 Mark Rowe (bdash) 2008-07-30 18:04:52 PDT
<rdar://problem/6114352>
Comment 2 Alexey Proskuryakov 2008-07-31 03:17:52 PDT
The bug summary also mentions converting from Unicode to named entities - are there any use cases for such an API?