Bug 20233

Summary: Please expose an API to convert to/from Named Entities
Product: WebKit
Reporter: Dan Wood <dwood>
Component: WebKit API
Assignee: Nobody <webkit-unassigned>
Status: NEW
Severity: Enhancement
CC: ap
Priority: P2
Keywords: InRadar
Version: 528+ (Nightly build)   
Hardware: Mac   
OS: OS X 10.5   

Description Dan Wood 2008-07-30 13:46:32 PDT
There are a couple of hundred "named entities" (HTML 4 defines 252) which are legitimate ways to encode characters in HTML.  For instance, &euro; is a euro sign, &Ccedil; is a capital C with a cedilla, etc.

Many programs need to convert named entities to the actual Unicode character values: Sandvox (my app) from Karelia, Coda from Panic, UnicodeChecker from earthlingsoft, Flow from Extendmac, and application(s) from Connected Flow and Toxic Software, just to name a few that I was able to find.  Each of these applications has had to roll its own "homemade" solution, which usually means building up a big data structure.

Of course, WebKit needs to do this too, and it already has code to parse entities; see "HTMLEntityNames.gperf" for example.

It would be greatly beneficial to third-party applications if WebKit could expose an API to convert from an entity into its corresponding Unicode equivalent.  That way, applications could just use the function provided by the Operating System (in WebKit) rather than having to invent their own bulky solution.

(If you don't think this is a great need, please note again the list of shipping software that has had a need to do this.) 

The data and the functionality are *already* in WebKit; this is just a request to expose them for use by other applications.
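
To illustrate the kind of conversion being requested, here is a minimal sketch in Python, which happens to ship an HTML 4 entity table in its standard library (html.entities). This is not WebKit's API and does not use WebKit's table; the function names entity_to_char and char_to_entity are hypothetical, chosen only for illustration. It covers both directions named in the summary.

```python
from html.entities import name2codepoint, codepoint2name


def entity_to_char(name):
    """Return the character for an entity name such as 'euro', or None if unknown.

    Hypothetical helper; the name is given without the leading '&' and trailing ';'.
    """
    codepoint = name2codepoint.get(name)
    return chr(codepoint) if codepoint is not None else None


def char_to_entity(char):
    """Return '&name;' for a character that has a named entity, else the character unchanged.

    Hypothetical helper; only the HTML 4 names in Python's table are known.
    """
    name = codepoint2name.get(ord(char))
    return f"&{name};" if name is not None else char
```

With an API of roughly this shape, an application would look up "euro" or "Ccedil" directly instead of shipping its own table of entity names.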
Comment 1 Mark Rowe (bdash) 2008-07-30 18:04:52 PDT
<rdar://problem/6114352>
Comment 2 Alexey Proskuryakov 2008-07-31 03:17:52 PDT
The bug summary also mentions converting from Unicode to named entities - are there any use cases for such an API?