RESOLVED DUPLICATE of bug 43949 43595
Add support for MathML entities
https://bugs.webkit.org/show_bug.cgi?id=43595
Adam Barth
Reported 2010-08-05 17:18:28 PDT
Add support for MathML entities
Attachments
work in progress (14.92 KB, patch)
2010-08-05 17:19 PDT, Adam Barth
no flags
Works, no build integration (32.01 KB, patch)
2010-08-05 20:50 PDT, Adam Barth
no flags
Patch (33.06 KB, patch)
2010-08-10 00:14 PDT, Adam Barth
no flags
Patch (33.13 KB, patch)
2010-08-10 00:45 PDT, Adam Barth
no flags
Patch (55.02 KB, patch)
2010-08-10 18:01 PDT, Adam Barth
no flags
compiles on mac, likely needs further tweaks (38.56 KB, patch)
2010-08-12 20:48 PDT, Eric Seidel (no email)
no flags
Should fix Qt/Gtk, gyp/chromium was beyond me (48.71 KB, patch)
2010-08-12 21:33 PDT, Eric Seidel (no email)
no flags
Another attempt to fix Qt (48.76 KB, patch)
2010-08-12 21:58 PDT, Eric Seidel (no email)
no flags
Now with all build systems including HTMLEntitySearch.* (49.40 KB, patch)
2010-08-12 22:17 PDT, Eric Seidel (no email)
no flags
Now with all build systems including HTMLEntitySearch.* (48.08 KB, patch)
2010-08-12 22:30 PDT, Eric Seidel (no email)
no flags
Adam Barth
Comment 1 2010-08-05 17:19:21 PDT
Created attachment 63667 [details] work in progress
Adam Barth
Comment 2 2010-08-05 20:39:59 PDT
*** Bug 42041 has been marked as a duplicate of this bug. ***
Adam Barth
Comment 3 2010-08-05 20:50:29 PDT
Created attachment 63684 [details] Works, no build integration
Eric Seidel (no email)
Comment 4 2010-08-06 10:50:02 PDT
Comment on attachment 63684 [details] Works, no build integration Do we have a way to perf test this?
Eric Seidel (no email)
Comment 5 2010-08-06 10:57:36 PDT
I think this is fantastic. But I think we should probably talk about this in person when we're in the office tomorrow.
Adam Barth
Comment 6 2010-08-06 11:03:52 PDT
We can certainly write a benchmark. The question is how to relate it to real-world content. At the least, we can run it through the html-parser benchmark.
Eric Seidel (no email)
Comment 7 2010-08-06 11:07:42 PDT
I was thinking we might just make an html page with all of the old html entities on it, maybe 100 times or something? And then run that through the html parser benchmark? Donno. Either way, I'm happy to help you get this landed tomorrow. It looks fantastic.
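A page along the lines suggested in comment 7 could be generated with a short script. The following is only a minimal sketch and is not part of any attached patch: `build_entity_page` and `SAMPLE_ENTITIES` are hypothetical names, and the sample is a handful of entries from the old HTMLEntityNames.gperf list rather than the full table.

```python
# Hypothetical helper, not from the patch: builds a stress page made of
# named character references for feeding to a parser benchmark.
# SAMPLE_ENTITIES is a small sample of names that appear in the old
# HTMLEntityNames.gperf list; a real run would use the full list.
SAMPLE_ENTITIES = ["amp", "lt", "gt", "quot", "nbsp", "copy",
                   "not", "notin", "lang", "rang"]


def build_entity_page(names, repeat=100):
    """Return an HTML document that contains every entity `repeat` times."""
    # One line holds each entity once, terminated by a semicolon.
    line = " ".join("&%s;" % name for name in names)
    # Repeat that line so the parser spends most of its time on entities.
    body = "\n".join(line for _ in range(repeat))
    return "<!DOCTYPE html>\n<html><body>\n%s\n</body></html>\n" % body
```

Writing the result of `build_entity_page(SAMPLE_ENTITIES)` to a file would give a page that could then be run through the html-parser benchmark mentioned above. How well such a page reflects real-world content remains the open question raised in comment 6.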
Adam Barth
Comment 8 2010-08-10 00:14:27 PDT
Adam Barth
Comment 9 2010-08-10 00:45:18 PDT
Adam Barth
Comment 10 2010-08-10 00:45:43 PDT
This version is faster than the old entity parser, as far as I can tell.
Eric Seidel (no email)
Comment 11 2010-08-10 00:55:19 PDT
Comment on attachment 63981 [details] Patch
WebKitTools/Scripts/create-html-entity-table:142
+ if (c >= 'A' && c <= 'Z')
Shouldn't these use our ascii helper methods?
WebCore/html/HTMLEntityTable.h:43
+ static const HTMLEntityTableEntry* start();
I'm confused by these names. You explained in person this was the global start/end. aka, the start/end of the empty string.
WebCore/html/HTMLEntityTable.h:46
+ static const HTMLEntityTableEntry* start(UChar);
Confused similarly by these names.
I'll have to review this for real when I'm less tired. But looks great. I'm happy to help you wire it up.
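The names questioned in comment 11 are easier to follow next to the search they support. The ChangeLog in the patch describes a sorted table that is narrowed by binary search as each input character arrives, while keeping track of the last complete entity seen. The sketch below illustrates that idea under stated assumptions. It is not the patch's C++ code: `EntitySearch` and `longest_entity` are hypothetical names, the table is a tiny hand-picked sample, entries are assumed to carry their trailing `;` where one is required, and the narrowing uses Python's `bisect` on whole prefixes rather than whatever per-character comparison the patch performs.

```python
import bisect

# Tiny illustrative table (assumption: the real table is generated from
# HTMLEntityNames.json and is far larger). Sorted by entity name.
ENTITIES = sorted([
    ("amp", 0x26), ("amp;", 0x26), ("gt", 0x3E), ("gt;", 0x3E),
    ("not", 0xAC), ("not;", 0xAC), ("notin;", 0x2209),
])
NAMES = [name for name, _ in ENTITIES]


class EntitySearch:
    """Incrementally narrows [start, end) to entries matching the prefix."""

    def __init__(self):
        self.prefix = ""
        self.start = 0           # "global" start: every entry matches ""
        self.end = len(NAMES)    # one past the last candidate
        self.last_match = None   # longest complete entity seen so far

    def is_entity_prefix(self):
        # True while at least one table entry starts with the prefix.
        return self.start < self.end

    def advance(self, ch):
        self.prefix += ch
        # First entry >= prefix, searched only inside the current range.
        lo = bisect.bisect_left(NAMES, self.prefix, self.start, self.end)
        # First entry past everything that starts with prefix (ASCII names).
        hi = bisect.bisect_left(NAMES, self.prefix + "\uffff", lo, self.end)
        self.start, self.end = lo, hi
        if self.start < self.end and NAMES[self.start] == self.prefix:
            self.last_match = ENTITIES[self.start]


def longest_entity(text):
    """Return (name, code point) of the longest entity at the start of text."""
    search = EntitySearch()
    for ch in text:
        search.advance(ch)
        if not search.is_entity_prefix():
            break
    return search.last_match
```

With this sample table, `longest_entity("notit; I tell you")` yields the `not` entry, while `longest_entity("notin; I tell you")` yields `notin;`. In this picture the no-argument `start()`/`end()` pair corresponds to the initial range for the empty prefix. A real caller would also have to rewind the input to the end of the last match when it has read too far, which this sketch leaves out.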
Adam Barth
Comment 12 2010-08-10 18:01:04 PDT
Adam Barth
Comment 13 2010-08-10 18:21:04 PDT
Comment on attachment 64060 [details] Patch Sorry. I got confused and asked git to slam two patches together.
Eric Seidel (no email)
Comment 14 2010-08-12 19:51:53 PDT
Comment on attachment 64060 [details] Patch This patch wrongly included some of your view-source patch (which is probably why it's failing to apply). I'm working to get a version of it applied to my local copy now...
Eric Seidel (no email)
Comment 15 2010-08-12 20:48:03 PDT
Created attachment 64295 [details] compiles on mac, likely needs further tweaks
Eric Seidel (no email)
Comment 16 2010-08-12 20:48:55 PDT
We're likely going to have to do some sys.path hacks to get around webkitpy initialization code on Tiger, since Tiger only has python 2.3. Otherwise this patch may be ready for landing. I'm not sure if all platforms use DerivedSources.make though.
Early Warning System Bot
Comment 17 2010-08-12 20:58:34 PDT
Eric Seidel (no email)
Comment 18 2010-08-12 21:33:46 PDT
Created attachment 64297 [details] Should fix Qt/Gtk, gyp/chromium was beyond me
Eric Seidel (no email)
Comment 19 2010-08-12 21:35:33 PDT
GYP seems to use file-extension-based rules, which don't fit well with using ".json" here. It also doesn't help that our input is HTMLEntityNames.json and our output is HTMLEntityTable.cpp. So I'm not sure how to fix the chromium case.
Early Warning System Bot
Comment 20 2010-08-12 21:46:33 PDT
Eric Seidel (no email)
Comment 21 2010-08-12 21:58:03 PDT
Created attachment 64298 [details] Another attempt to fix Qt
Early Warning System Bot
Comment 22 2010-08-12 22:04:46 PDT
Eric Seidel (no email)
Comment 23 2010-08-12 22:17:59 PDT
Created attachment 64299 [details] Now with all build systems including HTMLEntitySearch.*
Mark Mentovai
Comment 24 2010-08-12 22:18:51 PDT
Comment on attachment 64298 [details] Another attempt to fix Qt > diff --git a/LayoutTests/ChangeLog b/LayoutTests/ChangeLog > index 415a62f609c289e3f25e337f158de5db0139cf80..62ca60f2a3f8fb35c50b0caed3b68c0964cd153b 100644 > --- a/LayoutTests/ChangeLog > +++ b/LayoutTests/ChangeLog > @@ -1,3 +1,15 @@ > +2010-08-12 Adam Barth <abarth@webkit.org> > + > + Reviewed by NOBODY (OOPS!). > + > + Add support for MathML entities > + https://bugs.webkit.org/show_bug.cgi?id=43595 > + > + Test progression for proper entity support. > + > + * html5lib/runner-expected-html5.txt: > + * html5lib/runner-expected.txt: > + > 2010-08-12 Tony Chang <tony@chromium.org> > > Unreviewed, landing google-chrome linux 64 test results. > diff --git a/LayoutTests/html5lib/runner-expected-html5.txt b/LayoutTests/html5lib/runner-expected-html5.txt > index 2eb01b4cbf8aa762384ab5810f2f18f799a25dcf..84c72174fdce1be53d168da68266ed29234a984c 100644 > --- a/LayoutTests/html5lib/runner-expected-html5.txt > +++ b/LayoutTests/html5lib/runner-expected-html5.txt > @@ -118,92 +118,10 @@ resources/doctype01.dat: PASS > > resources/scriptdata01.dat: PASS > > -resources/html5test-com.dat: > -7 > -9 > -10 > -11 > - > -Test 7 of 24 in resources/html5test-com.dat failed. Input: > -&lang;&rang; > -Got: > -| <html> > -| <head> > -| <body> > -| "〈〉" > -Expected: > -| <html> > -| <head> > -| <body> > -| "⟨⟩" > - > -Test 9 of 24 in resources/html5test-com.dat failed. Input: > -&ImaginaryI; > -Got: > -| <html> > -| <head> > -| <body> > -| "&ImaginaryI;" > -Expected: > -| <html> > -| <head> > -| <body> > -| "ⅈ" > - > -Test 10 of 24 in resources/html5test-com.dat failed. Input: > -&Kopf; > -Got: > -| <html> > -| <head> > -| <body> > -| "&Kopf;" > -Expected: > -| <html> > -| <head> > -| <body> > -| "𝕂" > +resources/html5test-com.dat: PASS > > -Test 11 of 24 in resources/html5test-com.dat failed. 
Input: > -&notinva; > -Got: > -| <html> > -| <head> > -| <body> > -| "&notinva;" > -Expected: > -| <html> > -| <head> > -| <body> > -| "∉" > -resources/entities01.dat: > -2 > -5 > - > -Test 2 of 68 in resources/entities01.dat failed. Input: > -FOO&gtBAR > -Got: > -| <html> > -| <head> > -| <body> > -| "FOO&gtBAR" > -Expected: > -| <html> > -| <head> > -| <body> > -| "FOO>BAR" > +resources/entities01.dat: PASS > > -Test 5 of 68 in resources/entities01.dat failed. Input: > -I'm &notit; I tell you > -Got: > -| <html> > -| <head> > -| <body> > -| "I'm &notit; I tell you" > -Expected: > -| <html> > -| <head> > -| <body> > -| "I'm ¬it; I tell you" > resources/entities02.dat: PASS > > resources/comments01.dat: PASS > diff --git a/LayoutTests/html5lib/runner-expected.txt b/LayoutTests/html5lib/runner-expected.txt > index 8a8f7be38c66e31f7d5c91eaa89b5f838fa3a2a9..c9ae245480d49ff502f7030712c24b8c3055e765 100644 > --- a/LayoutTests/html5lib/runner-expected.txt > +++ b/LayoutTests/html5lib/runner-expected.txt > @@ -191,92 +191,10 @@ resources/doctype01.dat: PASS > > resources/scriptdata01.dat: PASS > > -resources/html5test-com.dat: > -7 > -9 > -10 > -11 > - > -Test 7 of 24 in resources/html5test-com.dat failed. Input: > -&lang;&rang; > -Got: > -| <html> > -| <head> > -| <body> > -| "〈〉" > -Expected: > -| <html> > -| <head> > -| <body> > -| "⟨⟩" > - > -Test 9 of 24 in resources/html5test-com.dat failed. Input: > -&ImaginaryI; > -Got: > -| <html> > -| <head> > -| <body> > -| "&ImaginaryI;" > -Expected: > -| <html> > -| <head> > -| <body> > -| "ⅈ" > - > -Test 10 of 24 in resources/html5test-com.dat failed. Input: > -&Kopf; > -Got: > -| <html> > -| <head> > -| <body> > -| "&Kopf;" > -Expected: > -| <html> > -| <head> > -| <body> > -| "𝕂" > +resources/html5test-com.dat: PASS > > -Test 11 of 24 in resources/html5test-com.dat failed. 
Input: > -&notinva; > -Got: > -| <html> > -| <head> > -| <body> > -| "&notinva;" > -Expected: > -| <html> > -| <head> > -| <body> > -| "∉" > -resources/entities01.dat: > -2 > -5 > +resources/entities01.dat: PASS > > -Test 2 of 68 in resources/entities01.dat failed. Input: > -FOO&gtBAR > -Got: > -| <html> > -| <head> > -| <body> > -| "FOO&gtBAR" > -Expected: > -| <html> > -| <head> > -| <body> > -| "FOO>BAR" > - > -Test 5 of 68 in resources/entities01.dat failed. Input: > -I'm &notit; I tell you > -Got: > -| <html> > -| <head> > -| <body> > -| "I'm &notit; I tell you" > -Expected: > -| <html> > -| <head> > -| <body> > -| "I'm ¬it; I tell you" > resources/entities02.dat: PASS > > resources/comments01.dat: PASS > diff --git a/WebCore/ChangeLog b/WebCore/ChangeLog > index 207608e8eefe0f63cb81dd95d0b899ee8bcc5087..eeb1763fbffacb4b8651111526de4f55d3ab2ed3 100644 > --- a/WebCore/ChangeLog > +++ b/WebCore/ChangeLog > @@ -3062,6 +3062,74 @@ > > Reviewed by Eric Seidel. > > + Add support for MathML entities > + https://bugs.webkit.org/show_bug.cgi?id=43595 > + > + Implementing the HTML5 entity parsing algorithm require refactoring how > + we search for entity names. Instead of using a perfect hash, we now > + use a sorted list. As we advance through the input, we walk down a > + binary search of the table looking for an entity. > + > + Using this data structure lets us keep track of whether the current > + string is a prefix of an existing entity, which we need for the > + algorithm. In a future patch, I plan to add some indicies to the > + table, which should let us narrow down the range of interesting entries > + more quickly. > + > + The one nasty piece of the algorithm is if we walk too far down the > + input and we need to back up to a previous match. In this patch, we > + accomplish this by rewinding the input and consuming a known number of > + characters to resync the source. 
> + > + * WebCore.xcodeproj/project.pbxproj: > + * html/HTMLEntityParser.cpp: > + (WebCore::consumeHTMLEntity): > + * html/HTMLEntitySearch.cpp: Added. > + (WebCore::): > + (WebCore::HTMLEntitySearch::HTMLEntitySearch): > + (WebCore::HTMLEntitySearch::compare): > + (WebCore::HTMLEntitySearch::findStart): > + (WebCore::HTMLEntitySearch::findEnd): > + (WebCore::HTMLEntitySearch::advance): > + * html/HTMLEntitySearch.h: Added. > + (WebCore::HTMLEntitySearch::isEntityPrefix): > + (WebCore::HTMLEntitySearch::currentValue): > + (WebCore::HTMLEntitySearch::lastMatch): > + (WebCore::HTMLEntitySearch::): > + (WebCore::HTMLEntitySearch::fail): > + * html/HTMLEntityTable.h: Added. > + (WebCore::HTMLEntityTableEntry::lastCharacter): > + > +2010-08-12 Adam Barth <abarth@webkit.org> > + > + Reviewed by NOBODY (OOPS!). > + > + Port view-source to new parser > + https://bugs.webkit.org/show_bug.cgi?id=43746 > + > + This patch has the basic functionality working in the old design of > + constructing the view-source view from the token stream. > + > + No new tests. (OOPS!) > + > + * html/HTMLDocumentParser.cpp: > + (WebCore::HTMLDocumentParser::pumpTokenizer): > + * html/HTMLViewSourceDocument.cpp: > + (WebCore::HTMLViewSourceDocument::createParser): > + (WebCore::HTMLViewSourceDocument::addViewSourceToken): > + (WebCore::HTMLViewSourceDocument::processCharacterToken): > + (WebCore::HTMLViewSourceDocument::processCommentToken): > + (WebCore::HTMLViewSourceDocument::processDoctypeToken): > + (WebCore::HTMLViewSourceDocument::processTagToken): > + * html/HTMLViewSourceDocument.h: > + * html/LegacyHTMLDocumentParser.cpp: > + (WebCore::LegacyHTMLDocumentParser::processToken): > + (WebCore::LegacyHTMLDocumentParser::processDoctypeToken): > + > +2010-08-09 Adam Barth <abarth@webkit.org> > + > + Reviewed by NOBODY (OOPS!). 
> + > Remove trailing whitespace in HTMLViewSourceDocument.cpp > https://bugs.webkit.org/show_bug.cgi?id=43741 > > diff --git a/WebCore/DerivedSources.make b/WebCore/DerivedSources.make > index 37c2f10d33f8dbd9238dbb4e245feec39b6c9edf..c5ddff9eab2bd7a10c2a5c474bda567a5a8c1b49 100644 > --- a/WebCore/DerivedSources.make > +++ b/WebCore/DerivedSources.make > @@ -505,7 +505,7 @@ all : \ > ColorData.cpp \ > DocTypeStrings.cpp \ > HTMLElementFactory.cpp \ > - HTMLEntityNames.cpp \ > + HTMLEntityTable.cpp \ > HTMLNames.cpp \ > WMLElementFactory.cpp \ > WMLNames.cpp \ > @@ -600,8 +600,8 @@ DocTypeStrings.cpp : html/DocTypeStrings.gperf $(WebCore)/make-hash-tools.pl > > # HTML entity names > > -HTMLEntityNames.cpp : html/HTMLEntityNames.gperf $(WebCore)/make-hash-tools.pl > - perl $(WebCore)/make-hash-tools.pl . $(WebCore)/html/HTMLEntityNames.gperf > +HTMLEntityTable.cpp : html/HTMLEntityNames.json $(WebCore)/../WebKitTools/Scripts/create-html-entity-table > + python $(WebCore)/../WebKitTools/Scripts/create-html-entity-table $(WebCore)/html/HTMLEntityNames.json > HTMLEntityTable.cpp > > # -------- > > diff --git a/WebCore/GNUmakefile.am b/WebCore/GNUmakefile.am > index bca2aae30fb9095f69f0fee1896b9ecbf7a47995..cd035fe80547dd72bee52b5cc353eb4dedd05238 100644 > --- a/WebCore/GNUmakefile.am > +++ b/WebCore/GNUmakefile.am > @@ -92,7 +92,7 @@ webcore_built_sources += \ > DerivedSources/WebCore/CSSValueKeywords.h \ > DerivedSources/WebCore/HTMLElementFactory.cpp \ > DerivedSources/WebCore/HTMLElementFactory.h \ > - DerivedSources/WebCore/HTMLEntityNames.cpp \ > + DerivedSources/WebCore/HTMLEntityTable.cpp \ > DerivedSources/WebCore/HTMLNames.cpp \ > DerivedSources/WebCore/HTMLNames.h \ > DerivedSources/WebCore/InspectorBackendDispatcher.cpp \ > @@ -4393,8 +4393,8 @@ DerivedSources/WebCore/DocTypeStrings.cpp : $(WebCore)/html/DocTypeStrings.gperf > $(PERL) $(WebCore)/make-hash-tools.pl $(GENSOURCES_WEBCORE) $(WebCore)/html/DocTypeStrings.gperf > > # HTML entity names > 
-DerivedSources/WebCore/HTMLEntityNames.cpp : $(WebCore)/html/HTMLEntityNames.gperf $(WebCore)/make-hash-tools.pl > - $(PERL) $(WebCore)/make-hash-tools.pl $(GENSOURCES_WEBCORE) $(WebCore)/html/HTMLEntityNames.gperf > +DerivedSources/WebCore/HTMLEntityTable.cpp : $(WebCore)/html/HTMLEntityNames.json $(WebCore)/../WebKitTools/Scripts/create-html-entity-table > + $(PYTHON) $(WebCore)/../WebKitTools/Scripts/create-html-entity-table $(WebCore)/html/HTMLEntityNames.json > $(GENSOURCES_WEBCORE)/HTMLEntityTable.cpp > > # color names > DerivedSources/WebCore/ColorData.cpp: $(WebCore)/platform/ColorData.gperf $(WebCore)/make-hash-tools.pl > diff --git a/WebCore/WebCore.gyp/WebCore.gyp b/WebCore/WebCore.gyp/WebCore.gyp > index a28ee5d061059abfa4ef91582f3f579ca80d2954..85fa5b84768e77bb60514aa620f5793cc78a47ac 100644 > --- a/WebCore/WebCore.gyp/WebCore.gyp > +++ b/WebCore/WebCore.gyp/WebCore.gyp > @@ -276,9 +276,10 @@ > > # gperf rule > '../html/DocTypeStrings.gperf', > - '../html/HTMLEntityNames.gperf', > '../platform/ColorData.gperf', > > + '../html/HTMLEntityNames.json', > + > # idl rules > '<@(bindings_idl_files)', > ], > @@ -609,6 +610,26 @@ > ], > 'process_outputs_as_sources': 0, > }, > + { > + 'rule_name': 'json', > + 'extension': 'json', > + # > + # json is fed into WebKitTools/Scripts//make-hash-tools.pl > + # > + 'outputs': [ > + '<(SHARED_INTERMEDIATE_DIR)/webkit/<(RULE_INPUT_ROOT).cpp', > + ], > + 'dependencies': [ > + '../make-hash-tools.pl', > + ], > + 'action': [ > + 'perl', > + '../make-hash-tools.pl', > + '<(SHARED_INTERMEDIATE_DIR)/webkit', > + '<(RULE_INPUT_PATH)', > + ], > + 'process_outputs_as_sources': 0, > + }, > # Rule to build generated JavaScript (V8) bindings from .idl source. 
> { > 'rule_name': 'binding', > diff --git a/WebCore/WebCore.pri b/WebCore/WebCore.pri > index b0effee7eb6ee2b1eb7eb4fb5de6699fe8a84ec8..137c3b03fe10cdbb76b0dca7679cb762101a4173 100644 > --- a/WebCore/WebCore.pri > +++ b/WebCore/WebCore.pri > @@ -29,7 +29,7 @@ XML_NAMES = $$PWD/xml/xmlattrs.in > > XMLNS_NAMES = $$PWD/xml/xmlnsattrs.in > > -ENTITIES_GPERF = $$PWD/html/HTMLEntityNames.gperf > +HTML_ENTITIES = $$PWD/html/HTMLEntityNames.json > > COLORDATA_GPERF = $$PWD/platform/ColorData.gperf > > @@ -590,12 +590,12 @@ xmlnames.commands = perl -I$$PWD/bindings/scripts $$xmlnames.wkScript --attrs $$ > addExtraCompiler(xmlnames) > > # GENERATOR 8-A: > -entities.output = $${WC_GENERATED_SOURCES_DIR}/HTMLEntityNames.cpp > -entities.input = ENTITIES_GPERF > -entities.wkScript = $$PWD/make-hash-tools.pl > -entities.commands = perl $$entities.wkScript $${WC_GENERATED_SOURCES_DIR} $$ENTITIES_GPERF > +entities.output = $${WC_GENERATED_SOURCES_DIR}/HTMLEntityTable.cpp > +entities.input = HTML_ENTITIES > +entities.wkScript = $$PWD/../WebKitTools/Scripts/create-html-entity-table > +entities.commands = python $$entities.wkScript $$HTML_ENTITIES > $${WC_GENERATED_SOURCES_DIR}/HTMLEntityTable.cpp > entities.clean = ${QMAKE_FILE_OUT} > -entities.depends = $$PWD/make-hash-tools.pl > +entities.depends = $$PWD/../WebKitTools/Scripts/create-html-entity-table > addExtraCompiler(entities) > > # GENERATOR 8-B: > diff --git a/WebCore/WebCore.vcproj/WebCore.vcproj b/WebCore/WebCore.vcproj/WebCore.vcproj > index 5ff42507b62ddc560a894a9fe2b228cf71c43bbc..2867512609f90cbb35a8c3b0ca3d406e7dbb9cd0 100644 > --- a/WebCore/WebCore.vcproj/WebCore.vcproj > +++ b/WebCore/WebCore.vcproj/WebCore.vcproj > @@ -40905,6 +40905,14 @@ > > > </File> > <File > + RelativePath="..\html\HTMLViewSourceParser.cpp" > + > > + </File> > + <File > + RelativePath="..\html\HTMLViewSourceParser.h" > + > > + </File> > + <File > RelativePath="..\html\ImageData.cpp" > > > </File> > diff --git 
a/WebCore/WebCore.xcodeproj/project.pbxproj b/WebCore/WebCore.xcodeproj/project.pbxproj > index f9ed1b431396be0c1fd87e107d651c172bf30243..750ed400a8defd414cea95cace22f1fb9f145246 100644 > --- a/WebCore/WebCore.xcodeproj/project.pbxproj > +++ b/WebCore/WebCore.xcodeproj/project.pbxproj > @@ -3181,6 +3181,9 @@ > A8A564A611DC0E59003AC2F0 /* HTMLFormattingElementList.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A8A564A411DC0E59003AC2F0 /* HTMLFormattingElementList.cpp */; }; > A8A909AC0CBCD6B50029B807 /* RenderSVGTransformableContainer.h in Headers */ = {isa = PBXBuildFile; fileRef = A8A909AA0CBCD6B50029B807 /* RenderSVGTransformableContainer.h */; }; > A8A909AD0CBCD6B50029B807 /* RenderSVGTransformableContainer.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A8A909AB0CBCD6B50029B807 /* RenderSVGTransformableContainer.cpp */; }; > + A8BC044E1214EB2A00B5F122 /* HTMLEntitySearch.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 970C4FDF1211266200C3D393 /* HTMLEntitySearch.cpp */; }; > + A8BC044F1214EB2B00B5F122 /* HTMLEntitySearch.h in Headers */ = {isa = PBXBuildFile; fileRef = 970C4FE01211266200C3D393 /* HTMLEntitySearch.h */; }; > + A8BC04921214F69600B5F122 /* HTMLEntityTable.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A8BC04911214F69600B5F122 /* HTMLEntityTable.cpp */; }; > A8BCFD05120A046100B5F122 /* SVGPathSeg.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A8BCFD04120A046100B5F122 /* SVGPathSeg.cpp */; }; > A8C2280E11D4A59700D5A7D3 /* DocumentParser.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A8C2280D11D4A59700D5A7D3 /* DocumentParser.cpp */; }; > A8C228A111D5722E00D5A7D3 /* DecodedDataDocumentParser.h in Headers */ = {isa = PBXBuildFile; fileRef = A8C2289F11D5722E00D5A7D3 /* DecodedDataDocumentParser.h */; }; > @@ -8476,6 +8479,10 @@ > 97059974107D975200A50A7C /* PolicyCallback.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = PolicyCallback.h; sourceTree = "<group>"; }; > 
97059975107D975200A50A7C /* PolicyChecker.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = PolicyChecker.cpp; sourceTree = "<group>"; }; > 97059976107D975200A50A7C /* PolicyChecker.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = PolicyChecker.h; sourceTree = "<group>"; }; > + 970C4FDF1211266200C3D393 /* HTMLEntitySearch.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = HTMLEntitySearch.cpp; sourceTree = "<group>"; }; > + 970C4FE01211266200C3D393 /* HTMLEntitySearch.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = HTMLEntitySearch.h; sourceTree = "<group>"; }; > + 970C4FE11211266200C3D393 /* HTMLEntityTable.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = HTMLEntityTable.cpp; sourceTree = "<group>"; }; > + 970C4FE21211266200C3D393 /* HTMLEntityTable.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = HTMLEntityTable.h; sourceTree = "<group>"; }; > 9719AEFF11D09F2C00D45831 /* HTMLInputStream.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = HTMLInputStream.h; sourceTree = "<group>"; }; > 9738899E116EA9DC00ADF313 /* DocumentWriter.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = DocumentWriter.cpp; sourceTree = "<group>"; }; > 9738899F116EA9DC00ADF313 /* DocumentWriter.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = DocumentWriter.h; sourceTree = "<group>"; }; > @@ -8861,6 +8868,7 @@ > A8A564A411DC0E59003AC2F0 /* HTMLFormattingElementList.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = HTMLFormattingElementList.cpp; sourceTree = "<group>"; }; > A8A909AA0CBCD6B50029B807 /* RenderSVGTransformableContainer.h */ 
= {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = RenderSVGTransformableContainer.h; sourceTree = "<group>"; }; > A8A909AB0CBCD6B50029B807 /* RenderSVGTransformableContainer.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = RenderSVGTransformableContainer.cpp; sourceTree = "<group>"; }; > + A8BC04911214F69600B5F122 /* HTMLEntityTable.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = HTMLEntityTable.cpp; path = /build/Debug/DerivedSources/WebCore/HTMLEntityTable.cpp; sourceTree = "<absolute>"; }; > A8BCFD04120A046100B5F122 /* SVGPathSeg.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = SVGPathSeg.cpp; sourceTree = "<group>"; }; > A8C2280D11D4A59700D5A7D3 /* DocumentParser.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = DocumentParser.cpp; sourceTree = "<group>"; }; > A8C2289F11D5722E00D5A7D3 /* DecodedDataDocumentParser.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = DecodedDataDocumentParser.h; sourceTree = "<group>"; }; > @@ -10941,7 +10949,6 @@ > E1FF57A50F01256B00891EBB /* ThreadGlobalData.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = ThreadGlobalData.cpp; sourceTree = "<group>"; }; > E406F3FA1198304D009D59D6 /* DocTypeStrings.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = DocTypeStrings.cpp; sourceTree = "<group>"; }; > E406F3FB1198307D009D59D6 /* ColorData.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = ColorData.cpp; sourceTree = "<group>"; }; > - E406F4021198329A009D59D6 /* HTMLEntityNames.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = HTMLEntityNames.cpp; sourceTree = 
"<group>"; }; > E415F10C0D9A05870033CE97 /* ElementTimeControl.idl */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = text; path = ElementTimeControl.idl; sourceTree = "<group>"; }; > E415F1680D9A165D0033CE97 /* DOMElementTimeControl.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = DOMElementTimeControl.h; sourceTree = "<group>"; }; > E415F1830D9A1A830033CE97 /* ElementTimeControl.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = ElementTimeControl.h; sourceTree = "<group>"; }; > @@ -12288,7 +12295,7 @@ > E406F3FA1198304D009D59D6 /* DocTypeStrings.cpp */, > A17C81200F2A5CF7005DAAEB /* HTMLElementFactory.cpp */, > A17C81210F2A5CF7005DAAEB /* HTMLElementFactory.h */, > - E406F4021198329A009D59D6 /* HTMLEntityNames.cpp */, > + A8BC04911214F69600B5F122 /* HTMLEntityTable.cpp */, > A8D06B380A265DCD005E7203 /* HTMLNames.cpp */, > A8D06B370A265DCD005E7203 /* HTMLNames.h */, > 938E65F609F0985D008A48EC /* JSHTMLElementWrapperFactory.cpp */, > @@ -13986,6 +13993,10 @@ > 859128790AB222EC00202265 /* HTMLEmbedElement.idl */, > 976E895E11C0CA3A00EA9CA9 /* HTMLEntityParser.cpp */, > 976E895F11C0CA3A00EA9CA9 /* HTMLEntityParser.h */, > + 970C4FDF1211266200C3D393 /* HTMLEntitySearch.cpp */, > + 970C4FE01211266200C3D393 /* HTMLEntitySearch.h */, > + 970C4FE11211266200C3D393 /* HTMLEntityTable.cpp */, > + 970C4FE21211266200C3D393 /* HTMLEntityTable.h */, > A81369B9097374F500D74463 /* HTMLFieldSetElement.cpp */, > A81369B8097374F500D74463 /* HTMLFieldSetElement.h */, > 1AE2A9F40A1CDA5700B42B25 /* HTMLFieldSetElement.idl */, > @@ -20156,6 +20167,7 @@ > 97DD4D870FDF4D6E00ECF9A4 /* XSSAuditor.h in Headers */, > CE172E011136E8CE0062A533 /* ZoomMode.h in Headers */, > 2EED57FE1214A9C2007656BB /* ThreadableBlobRegistry.h in Headers */, > + A8BC044F1214EB2B00B5F122 /* HTMLEntitySearch.h in Headers */, > ); > runOnlyForDeploymentPostprocessing = 0; > }; > @@ -22582,6 +22594,8 @@ > 
E1BE512D0CF6C512002EA959 /* XSLTUnicodeSort.cpp in Sources */, > 97DD4D860FDF4D6E00ECF9A4 /* XSSAuditor.cpp in Sources */, > 2EED57FD1214A9C2007656BB /* ThreadableBlobRegistry.cpp in Sources */, > + A8BC044E1214EB2A00B5F122 /* HTMLEntitySearch.cpp in Sources */, > + A8BC04921214F69600B5F122 /* HTMLEntityTable.cpp in Sources */, > ); > runOnlyForDeploymentPostprocessing = 0; > }; > diff --git a/WebCore/html/HTMLEntityNames.gperf b/WebCore/html/HTMLEntityNames.gperf > deleted file mode 100644 > index c665efea52401a202337cf2ea6bb8800bbcb754d..0000000000000000000000000000000000000000 > --- a/WebCore/html/HTMLEntityNames.gperf > +++ /dev/null > @@ -1,303 +0,0 @@ > -%{ > -/* > - Copyright (C) 1999 Lars Knoll (knoll@mpi-hd.mpg.de) > - Copyright (C) 2002, 2003, 2004, 2005 Apple Inc. All rights reserved. > - > - This library is free software; you can redistribute it and/or > - modify it under the terms of the GNU Library General Public > - License as published by the Free Software Foundation; either > - version 2 of the License, or (at your option) any later version. > - > - This library is distributed in the hope that it will be useful, > - but WITHOUT ANY WARRANTY; without even the implied warranty of > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > - Library General Public License for more details. > - > - You should have received a copy of the GNU Library General Public License > - along with this library; see the file COPYING.LIB. If not, write to > - the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, > - Boston, MA 02110-1301, USA. > - > - ---------------------------------------------------------------------------- > - > - HTMLEntityNames.gperf: input file to generate a hash table for entities > - HTMLEntityNames.cpp: DO NOT EDIT! 
generated by WebCore/make-hash-tools.pl > -*/ > -%} > -%struct-type > -struct Entity { > - const char *name; > - int code; > -}; > -%language=ANSI-C > -%readonly-tables > -%global-table > -%compare-strncmp > -%define lookup-function-name findEntity > -%define hash-function-name entity_hash_function > -%includes > -%enum > -%% > -AElig, 0x00c6 > -AMP, 38 > -Aacute, 0x00c1 > -Acirc, 0x00c2 > -Agrave, 0x00c0 > -Alpha, 0x0391 > -Aring, 0x00c5 > -Atilde, 0x00c3 > -Auml, 0x00c4 > -Beta, 0x0392 > -COPY, 0x00a9 > -Ccedil, 0x00c7 > -Chi, 0x03a7 > -Dagger, 0x2021 > -Delta, 0x0394 > -ETH, 0x00d0 > -Eacute, 0x00c9 > -Ecirc, 0x00ca > -Egrave, 0x00c8 > -Epsilon, 0x0395 > -Eta, 0x0397 > -Euml, 0x00cb > -GT, 62 > -Gamma, 0x0393 > -Iacute, 0x00cd > -Icirc, 0x00ce > -Igrave, 0x00cc > -Iota, 0x0399 > -Iuml, 0x00cf > -Kappa, 0x039a > -LT, 60 > -Lambda, 0x039b > -Mu, 0x039c > -Ntilde, 0x00d1 > -Nu, 0x039d > -OElig, 0x0152 > -Oacute, 0x00d3 > -Ocirc, 0x00d4 > -Ograve, 0x00d2 > -Omega, 0x03a9 > -Omicron, 0x039f > -Oslash, 0x00d8 > -Otilde, 0x00d5 > -Ouml, 0x00d6 > -Phi, 0x03a6 > -Pi, 0x03a0 > -Prime, 0x2033 > -Psi, 0x03a8 > -QUOT, 34 > -REG, 0x00ae > -Rho, 0x03a1 > -Scaron, 0x0160 > -Sigma, 0x03a3 > -THORN, 0x00de > -Tau, 0x03a4 > -Theta, 0x0398 > -Uacute, 0x00da > -Ucirc, 0x00db > -Ugrave, 0x00d9 > -Upsilon, 0x03a5 > -Uuml, 0x00dc > -Xi, 0x039e > -Yacute, 0x00dd > -Yuml, 0x0178 > -Zeta, 0x0396 > -aacute, 0x00e1 > -acirc, 0x00e2 > -acute, 0x00b4 > -aelig, 0x00e6 > -agrave, 0x00e0 > -alefsym, 0x2135 > -alpha, 0x03b1 > -amp, 38 > -and, 0x2227 > -ang, 0x2220 > -apos, 0x0027 > -aring, 0x00e5 > -asymp, 0x2248 > -atilde, 0x00e3 > -auml, 0x00e4 > -bdquo, 0x201e > -beta, 0x03b2 > -brvbar, 0x00a6 > -bull, 0x2022 > -cap, 0x2229 > -ccedil, 0x00e7 > -cedil, 0x00b8 > -cent, 0x00a2 > -chi, 0x03c7 > -circ, 0x02c6 > -clubs, 0x2663 > -cong, 0x2245 > -copy, 0x00a9 > -crarr, 0x21b5 > -cup, 0x222a > -curren, 0x00a4 > -dArr, 0x21d3 > -dagger, 0x2020 > -darr, 0x2193 > -deg, 0x00b0 > -delta, 0x03b4 > -diams, 
0x2666 > -divide, 0x00f7 > -eacute, 0x00e9 > -ecirc, 0x00ea > -egrave, 0x00e8 > -empty, 0x2205 > -emsp, 0x2003 > -ensp, 0x2002 > -epsilon, 0x03b5 > -equiv, 0x2261 > -eta, 0x03b7 > -eth, 0x00f0 > -euml, 0x00eb > -euro, 0x20ac > -exist, 0x2203 > -fnof, 0x0192 > -forall, 0x2200 > -frac12, 0x00bd > -frac14, 0x00bc > -frac34, 0x00be > -frasl, 0x2044 > -gamma, 0x03b3 > -ge, 0x2265 > -gt, 62 > -hArr, 0x21d4 > -harr, 0x2194 > -hearts, 0x2665 > -hellip, 0x2026 > -iacute, 0x00ed > -icirc, 0x00ee > -iexcl, 0x00a1 > -igrave, 0x00ec > -image, 0x2111 > -infin, 0x221e > -int, 0x222b > -iota, 0x03b9 > -iquest, 0x00bf > -isin, 0x2208 > -iuml, 0x00ef > -kappa, 0x03ba > -lArr, 0x21d0 > -lambda, 0x03bb > -lang, 0x3008 > -laquo, 0x00ab > -larr, 0x2190 > -lceil, 0x2308 > -ldquo, 0x201c > -le, 0x2264 > -lfloor, 0x230a > -lowast, 0x2217 > -loz, 0x25ca > -lrm, 0x200e > -lsaquo, 0x2039 > -lsquo, 0x2018 > -lt, 60 > -macr, 0x00af > -mdash, 0x2014 > -micro, 0x00b5 > -middot, 0x00b7 > -minus, 0x2212 > -mu, 0x03bc > -nabla, 0x2207 > -nbsp, 0x00a0 > -ndash, 0x2013 > -ne, 0x2260 > -ni, 0x220b > -not, 0x00ac > -notin, 0x2209 > -nsub, 0x2284 > -nsup, 0x2285 > -ntilde, 0x00f1 > -nu, 0x03bd > -oacute, 0x00f3 > -ocirc, 0x00f4 > -oelig, 0x0153 > -ograve, 0x00f2 > -oline, 0x203e > -omega, 0x03c9 > -omicron, 0x03bf > -oplus, 0x2295 > -or, 0x2228 > -ordf, 0x00aa > -ordm, 0x00ba > -oslash, 0x00f8 > -otilde, 0x00f5 > -otimes, 0x2297 > -ouml, 0x00f6 > -para, 0x00b6 > -part, 0x2202 > -percnt, 0x0025 > -permil, 0x2030 > -perp, 0x22a5 > -phi, 0x03c6 > -pi, 0x03c0 > -piv, 0x03d6 > -plusmn, 0x00b1 > -pound, 0x00a3 > -prime, 0x2032 > -prod, 0x220f > -prop, 0x221d > -psi, 0x03c8 > -quot, 34 > -rArr, 0x21d2 > -radic, 0x221a > -rang, 0x3009 > -raquo, 0x00bb > -rarr, 0x2192 > -rceil, 0x2309 > -rdquo, 0x201d > -real, 0x211c > -reg, 0x00ae > -rfloor, 0x230b > -rho, 0x03c1 > -rlm, 0x200f > -rsaquo, 0x203a > -rsquo, 0x2019 > -sbquo, 0x201a > -scaron, 0x0161 > -sdot, 0x22c5 > -sect, 0x00a7 > -shy, 0x00ad > -sigma, 0x03c3 > 
-sigmaf, 0x03c2 > -sim, 0x223c > -spades, 0x2660 > -sub, 0x2282 > -sube, 0x2286 > -sum, 0x2211 > -sup, 0x2283 > -sup1, 0x00b9 > -sup2, 0x00b2 > -sup3, 0x00b3 > -supe, 0x2287 > -supl, 0x00b9 > -szlig, 0x00df > -tau, 0x03c4 > -there4, 0x2234 > -theta, 0x03b8 > -thetasym, 0x03d1 > -thinsp, 0x2009 > -thorn, 0x00fe > -tilde, 0x02dc > -times, 0x00d7 > -trade, 0x2122 > -uArr, 0x21d1 > -uacute, 0x00fa > -uarr, 0x2191 > -ucirc, 0x00fb > -ugrave, 0x00f9 > -uml, 0x00a8 > -upsih, 0x03d2 > -upsilon, 0x03c5 > -uuml, 0x00fc > -weierp, 0x2118 > -xi, 0x03be > -yacute, 0x00fd > -yen, 0x00a5 > -yuml, 0x00ff > -zeta, 0x03b6 > -zwj, 0x200d > -zwnj, 0x200c > -%% > diff --git a/WebCore/html/HTMLEntityParser.cpp b/WebCore/html/HTMLEntityParser.cpp > index 6bec8190dc886cc3e69df71ed0405fa106c0be03..4822827cf43c548214a541da7f39668b88f09000 100644 > --- a/WebCore/html/HTMLEntityParser.cpp > +++ b/WebCore/html/HTMLEntityParser.cpp > @@ -28,6 +28,8 @@ > #include "config.h" > #include "HTMLEntityParser.h" > > +#include "HTMLEntitySearch.h" > +#include "HTMLEntityTable.h" > #include <wtf/Vector.h> > > #include "HTMLEntityNames.cpp" > @@ -102,7 +104,6 @@ unsigned consumeHTMLEntity(SegmentedString& source, bool& notEnoughCharacters, U > EntityState entityState = Initial; > unsigned result = 0; > Vector<UChar, 10> consumedCharacters; > - Vector<char, 10> entityName; > > while (!source.isEmpty()) { > UChar cc = *source; > @@ -166,7 +167,7 @@ unsigned consumeHTMLEntity(SegmentedString& source, bool& notEnoughCharacters, U > else if (cc == ';') { > source.advancePastNonNewline(); > return legalEntityFor(result); > - } else > + } else > return legalEntityFor(result); > break; > } > @@ -181,48 +182,48 @@ unsigned consumeHTMLEntity(SegmentedString& source, bool& notEnoughCharacters, U > break; > } > case Named: { > - // FIXME: This code is wrong. We need to find the longest matching entity. 
> - // The examples from the spec are: > - // I'm &notit; I tell you > - // I'm &notin; I tell you > - // In the first case, "&not" is the entity. In the second > - // case, "&notin;" is the entity. > - // FIXME: Our list of HTML entities is incomplete. > - // FIXME: The number 8 below is bogus. > - while (!source.isEmpty() && entityName.size() <= 8) { > + HTMLEntitySearch entitySearch; > + while (!source.isEmpty()) { > cc = *source; > - if (cc == ';') { > - const Entity* entity = findEntity(entityName.data(), entityName.size()); > - if (entity) { > - source.advanceAndASSERT(';'); > - return entity->code; > - } > - break; > - } > - if (!isAlphaNumeric(cc)) { > - const Entity* entity = findEntity(entityName.data(), entityName.size()); > - if (entity) { > - // HTML5 tells us to ignore this entity, for historical reasons, > - // if the lookhead character is '='. > - if (additionalAllowedCharacter && cc == '=') > - break; > - // Some entities require a terminating semicolon, whereas other > - // entities do not. The HTML5 spec has a giant list: > - // > - // http://www.whatwg.org/specs/web-apps/current-work/multipage/named-character-references.html#named-character-references > - // > - // However, the list seems to boil down to this branch: > - if (entity->code > 255) > - break; > - return entity->code; > - } > + entitySearch.advance(cc); > + if (!entitySearch.isEntityPrefix()) > break; > - } > - entityName.append(cc); > consumedCharacters.append(cc); > source.advanceAndASSERT(cc); > } > notEnoughCharacters = source.isEmpty(); > + if (notEnoughCharacters) { > + // We can't an entity because there might be a longer entity > + // that we could match if we had more data. 
> + unconsumeCharacters(source, consumedCharacters); > + return 0; > + } > + if (!entitySearch.lastMatch()) { > + ASSERT(!entitySearch.currentValue()); > + unconsumeCharacters(source, consumedCharacters); > + return 0; > + } > + if (entitySearch.lastMatch()->length != entitySearch.currentLength()) { > + // We've consumed too many characters. We need to walk the > + // source back to the point at which we had consumed an > + // actual entity. > + unconsumeCharacters(source, consumedCharacters); > + consumedCharacters.clear(); > + const int length = entitySearch.lastMatch()->length; > + const UChar* reference = entitySearch.lastMatch()->entity; > + for (int i = 0; i < length; ++i) { > + cc = *source; > + ASSERT_UNUSED(reference, cc == *reference++); > + consumedCharacters.append(cc); > + source.advanceAndASSERT(cc); > + ASSERT(!source.isEmpty()); > + } > + cc = *source; > + } > + if (entitySearch.lastMatch()->lastCharacter() == ';') > + return entitySearch.lastMatch()->value; > + if (!additionalAllowedCharacter || !(isAlphaNumeric(cc) || cc == '=')) > + return entitySearch.lastMatch()->value; > unconsumeCharacters(source, consumedCharacters); > return 0; > } > diff --git a/WebCore/html/HTMLEntitySearch.cpp b/WebCore/html/HTMLEntitySearch.cpp > new file mode 100644 > index 0000000000000000000000000000000000000000..c0526a3d6de7fb27ebb3c83ac39a1ea7a66ec169 > --- /dev/null > +++ b/WebCore/html/HTMLEntitySearch.cpp > @@ -0,0 +1,132 @@ > +/* > + * Copyright (C) 2010 Google, Inc. All Rights Reserved. > + * > + * Redistribution and use in source and binary forms, with or without > + * modification, are permitted provided that the following conditions > + * are met: > + * 1. Redistributions of source code must retain the above copyright > + * notice, this list of conditions and the following disclaimer. > + * 2. 
Redistributions in binary form must reproduce the above copyright > + * notice, this list of conditions and the following disclaimer in the > + * documentation and/or other materials provided with the distribution. > + * > + * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY > + * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE > + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR > + * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL APPLE INC. OR > + * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, > + * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, > + * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR > + * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY > + * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT > + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE > + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > + */ > + > +#include "config.h" > +#include "HTMLEntitySearch.h" > + > +#include "HTMLEntityTable.h" > + > +namespace WebCore { > + > +namespace { > + > +const HTMLEntityTableEntry* halfway(const HTMLEntityTableEntry* left, const HTMLEntityTableEntry* right) > +{ > + return &left[(right - left) / 2]; > +} > + > +} > + > +HTMLEntitySearch::HTMLEntitySearch() > + : m_currentLength(0) > + , m_currentValue(0) > + , m_lastMatch(0) > + , m_start(HTMLEntityTable::start()) > + , m_end(HTMLEntityTable::end()) > +{ > +} > + > +HTMLEntitySearch::CompareResult HTMLEntitySearch::compare(const HTMLEntityTableEntry* entry, UChar nextCharacter) const > +{ > + if (entry->length < m_currentLength + 1) > + return Before; > + UChar entryNextCharacter = entry->entity[m_currentLength]; > + if (entryNextCharacter == nextCharacter) > + return Prefix; > + return entryNextCharacter < nextCharacter ? 
Before : After; > +} > + > +const HTMLEntityTableEntry* HTMLEntitySearch::findStart(UChar nextCharacter) const > +{ > + const HTMLEntityTableEntry* left = m_start; > + const HTMLEntityTableEntry* right = m_end; > + if (left == right) > + return left; > + CompareResult result = compare(left, nextCharacter); > + if (result == Prefix) > + return left; > + if (result == After) > + return right; > + while (left + 1 < right) { > + const HTMLEntityTableEntry* probe = halfway(left, right); > + result = compare(probe, nextCharacter); > + if (result == Before) > + left = probe; > + else { > + ASSERT(result == After || result == Prefix); > + right = probe; > + } > + } > + ASSERT(left + 1 == right); > + return right; > +} > + > +const HTMLEntityTableEntry* HTMLEntitySearch::findEnd(UChar nextCharacter) const > +{ > + const HTMLEntityTableEntry* left = m_start; > + const HTMLEntityTableEntry* right = m_end; > + if (left == right) > + return right; > + CompareResult result = compare(right, nextCharacter); > + if (result == Prefix) > + return right; > + if (result == Before) > + return left; > + while (left + 1 < right) { > + const HTMLEntityTableEntry* probe = halfway(left, right); > + result = compare(probe, nextCharacter); > + if (result == After) > + right = probe; > + else { > + ASSERT(result == Before || result == Prefix); > + left = probe; > + } > + } > + ASSERT(left + 1 == right); > + return left; > +} > + > +void HTMLEntitySearch::advance(UChar nextCharacter) > +{ > + ASSERT(isEntityPrefix()); > + if (!m_currentLength) { > + m_start = HTMLEntityTable::start(nextCharacter); > + m_end = HTMLEntityTable::end(nextCharacter); > + } else { > + m_start = findStart(nextCharacter); > + m_end = findEnd(nextCharacter); > + if (m_start == m_end && compare(m_start, nextCharacter) != Prefix) > + return fail(); > + } > + ++m_currentLength; > + if (m_start->length != m_currentLength) { > + m_currentValue = 0; > + return; > + } > + m_lastMatch = m_start; > + m_currentValue = 
m_lastMatch->value; > +} > + > +} > diff --git a/WebCore/html/HTMLEntitySearch.h b/WebCore/html/HTMLEntitySearch.h > new file mode 100644 > index 0000000000000000000000000000000000000000..e57859d4ebccd2a12b03bde0f8cf1282394dbcc9 > --- /dev/null > +++ b/WebCore/html/HTMLEntitySearch.h > @@ -0,0 +1,75 @@ > +/* > + * Copyright (C) 2010 Google, Inc. All Rights Reserved. > + * > + * Redistribution and use in source and binary forms, with or without > + * modification, are permitted provided that the following conditions > + * are met: > + * 1. Redistributions of source code must retain the above copyright > + * notice, this list of conditions and the following disclaimer. > + * 2. Redistributions in binary form must reproduce the above copyright > + * notice, this list of conditions and the following disclaimer in the > + * documentation and/or other materials provided with the distribution. > + * > + * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY > + * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE > + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR > + * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL APPLE INC. OR > + * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, > + * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, > + * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR > + * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY > + * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT > + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE > + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
> + */ > + > +#ifndef HTMLEntitySearch_h > +#define HTMLEntitySearch_h > + > +#include "PlatformString.h" > + > +namespace WebCore { > + > +struct HTMLEntityTableEntry; > + > +class HTMLEntitySearch { > +public: > + HTMLEntitySearch(); > + > + void advance(UChar); > + > + bool isEntityPrefix() const { return !!m_start; } > + int currentValue() const { return m_currentValue; } > + int currentLength() const { return m_currentLength; } > + > + const HTMLEntityTableEntry* lastMatch() const { return m_lastMatch; } > + > +private: > + enum CompareResult { > + Before, > + Prefix, > + After, > + }; > + > + CompareResult compare(const HTMLEntityTableEntry*, UChar) const; > + const HTMLEntityTableEntry* findStart(UChar) const; > + const HTMLEntityTableEntry* findEnd(UChar) const; > + > + void fail() > + { > + m_currentValue = 0; > + m_start = 0; > + m_end = 0; > + } > + > + int m_currentLength; > + int m_currentValue; > + > + const HTMLEntityTableEntry* m_lastMatch; > + const HTMLEntityTableEntry* m_start; > + const HTMLEntityTableEntry* m_end; > +}; > + > +} > + > +#endif > diff --git a/WebCore/html/HTMLEntityTable.h b/WebCore/html/HTMLEntityTable.h > new file mode 100644 > index 0000000000000000000000000000000000000000..35a1afd3161f0dca34c52d301dca9c3959c72931 > --- /dev/null > +++ b/WebCore/html/HTMLEntityTable.h > @@ -0,0 +1,52 @@ > +/* > + * Copyright (C) 2010 Google, Inc. All Rights Reserved. > + * > + * Redistribution and use in source and binary forms, with or without > + * modification, are permitted provided that the following conditions > + * are met: > + * 1. Redistributions of source code must retain the above copyright > + * notice, this list of conditions and the following disclaimer. > + * 2. Redistributions in binary form must reproduce the above copyright > + * notice, this list of conditions and the following disclaimer in the > + * documentation and/or other materials provided with the distribution. > + * > + * THIS SOFTWARE IS PROVIDED BY APPLE INC. 
``AS IS'' AND ANY > + * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE > + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR > + * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL APPLE INC. OR > + * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, > + * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, > + * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR > + * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY > + * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT > + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE > + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > + */ > + > +#ifndef HTMLEntityTable_h > +#define HTMLEntityTable_h > + > +#include "PlatformString.h" > + > +namespace WebCore { > + > +struct HTMLEntityTableEntry { > + UChar lastCharacter() const { return entity[length - 1]; } > + > + const UChar* entity; > + int length; > + int value; > +}; > + > +class HTMLEntityTable { > +public: > + static const HTMLEntityTableEntry* start(); > + static const HTMLEntityTableEntry* end(); > + > + static const HTMLEntityTableEntry* start(UChar); > + static const HTMLEntityTableEntry* end(UChar); > +}; > + > +} > + > +#endif > diff --git a/WebCore/make-hash-tools.pl b/WebCore/make-hash-tools.pl > index 42cb6fd958f105ae6afbf6ede348ac35b6f3963e..8cc9952c1572b003381b0d8273cb1d665c30ffe8 100644 > --- a/WebCore/make-hash-tools.pl > +++ b/WebCore/make-hash-tools.pl > @@ -29,16 +29,6 @@ my $option = basename($ARGV[0],".gperf"); > > switch ($option) { > > -case "HTMLEntityNames" { > - > - my $htmlEntityNamesGenerated = "$outdir/HTMLEntityNames.cpp"; > - my $htmlEntityNamesGperf = $ARGV[0]; > - shift; > - > - system("gperf --key-positions=\"*\" -D -s 2 $htmlEntityNamesGperf > $htmlEntityNamesGenerated") == 0 || die "calling gperf failed: $?"; > - > -} # case "HTMLEntityNames" > - > case "DocTypeStrings" { > > 
my $docTypeStringsGenerated = "$outdir/DocTypeStrings.cpp"; > diff --git a/WebKitTools/ChangeLog b/WebKitTools/ChangeLog > index f8d1d88c2bd8ac93cf628d0623c58978274c1953..8521eabd7cba5659d24b4d3131714f780a047eed 100644 > --- a/WebKitTools/ChangeLog > +++ b/WebKitTools/ChangeLog > @@ -1,3 +1,18 @@ > +2010-08-12 Adam Barth <abarth@webkit.org> > + > + Reviewed by NOBODY (OOPS!). > + > + Add support for MathML entities > + https://bugs.webkit.org/show_bug.cgi?id=43595 > + > + A script for generating the C++ state data structure describing all the > + entities from a JSON description. > + > + Note: This script is not yet integrated with the build system, so this > + patch will not build! OOPS! > + > + * Scripts/create-html-entity-table: Added. > + > 2010-08-12 David Levin <levin@chromium.org> > > Build break fix. > diff --git a/WebKitTools/Scripts/create-html-entity-table b/WebKitTools/Scripts/create-html-entity-table > new file mode 100755 > index 0000000000000000000000000000000000000000..6513b29da78b3fb5eb80f845bbb3480330c320b5 > --- /dev/null > +++ b/WebKitTools/Scripts/create-html-entity-table > @@ -0,0 +1,173 @@ > +#!/usr/bin/env python > +# Copyright (c) 2010 Google Inc. All rights reserved. > +# > +# Redistribution and use in source and binary forms, with or without > +# modification, are permitted provided that the following conditions are > +# met: > +# > +# * Redistributions of source code must retain the above copyright > +# notice, this list of conditions and the following disclaimer. > +# * Redistributions in binary form must reproduce the above > +# copyright notice, this list of conditions and the following disclaimer > +# in the documentation and/or other materials provided with the > +# distribution. > +# * Neither the name of Google Inc. nor the names of its > +# contributors may be used to endorse or promote products derived from > +# this software without specific prior written permission. 
> +# > +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS > +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT > +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR > +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT > +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, > +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT > +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, > +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY > +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT > +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE > +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > + > +import os.path > +import string > +import sys > + > +import webkitpy.thirdparty.simplejson as simplejson > + > + > +def convert_entity_to_cpp_name(entity): > + postfix = "EntityName" > + if entity[-1] == ";": > + return "%sSemicolon%s" % (entity[:-1], postfix) > + return "%s%s" % (entity, postfix) > + > + > +def convert_entity_to_uchar_array(entity): > + return "{'%s'}" % "', '".join(entity) > + > + > +def convert_value_to_int(value): > + assert(value[0] == "U") > + assert(value[1] == "+") > + return "0x" + value[2:] > + > + > +def offset_table_entry(offset): > + return " &staticEntityTable[%s]," % offset > + > + > +program_name = os.path.basename(__file__) > +if len(sys.argv) < 2: > + print >> sys.stderr, "Usage: %s INPUT_FILE" % program_name > + exit(1) > + > +input_file = sys.argv[1] > +html_entity_names_file = open(input_file) > +entries = simplejson.load(html_entity_names_file) > +html_entity_names_file.close() > + > +entries = sorted(entries, key=lambda entry: entry['entity']) > +entity_count = len(entries) > + > +print """/* > + * Copyright (C) 2010 Google, Inc. All Rights Reserved. 
> + * > + * Redistribution and use in source and binary forms, with or without > + * modification, are permitted provided that the following conditions > + * are met: > + * 1. Redistributions of source code must retain the above copyright > + * notice, this list of conditions and the following disclaimer. > + * 2. Redistributions in binary form must reproduce the above copyright > + * notice, this list of conditions and the following disclaimer in the > + * documentation and/or other materials provided with the distribution. > + * > + * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY > + * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE > + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR > + * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL APPLE INC. OR > + * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, > + * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, > + * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR > + * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY > + * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT > + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE > + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > + */ > + > +// THIS FILE IS GENERATED BY WebKitTools/Scripts/create-html-entity-table > +// DO NOT EDIT (unless you are a ninja)! 
> +
> +#include "config.h"
> +#include "HTMLEntityTable.h"
> +
> +namespace WebCore {
> +
> +namespace {
> +"""
> +
> +for entry in entries:
> +    print "const UChar %sEntityName[] = %s;" % (
> +        convert_entity_to_cpp_name(entry["entity"]),
> +        convert_entity_to_uchar_array(entry["entity"]))
> +
> +print """
> +HTMLEntityTableEntry staticEntityTable[%s] = {""" % entity_count
> +
> +index = {}
> +offset = 0
> +for entry in entries:
> +    letter = entry["entity"][0]
> +    if not index.get(letter):
> +        index[letter] = offset
> +    print ' { %sEntityName, %s, %s },' % (
> +        convert_entity_to_cpp_name(entry["entity"]),
> +        len(entry["entity"]),
> +        convert_value_to_int(entry["value"]))
> +    offset += 1
> +
> +print """};
> +"""
> +
> +print "const HTMLEntityTableEntry* uppercaseOffset[] = {"
> +for letter in string.uppercase:
> +    print offset_table_entry(index[letter])
> +print offset_table_entry(index['a'])
> +print """};
> +
> +const HTMLEntityTableEntry* lowercaseOffset[] = {"""
> +for letter in string.lowercase:
> +    print offset_table_entry(index[letter])
> +print offset_table_entry(entity_count)
> +print """};
> +
> +}
> +
> +const HTMLEntityTableEntry* HTMLEntityTable::start(UChar c)
> +{
> +    if (c >= 'A' && c <= 'Z')
> +        return uppercaseOffset[c - 'A'];
> +    if (c >= 'a' && c <= 'z')
> +        return lowercaseOffset[c - 'a'];
> +    return 0;
> +}
> +
> +const HTMLEntityTableEntry* HTMLEntityTable::end(UChar c)
> +{
> +    if (c >= 'A' && c <= 'Z')
> +        return uppercaseOffset[c - 'A' + 1] - 1;
> +    if (c >= 'a' && c <= 'z')
> +        return lowercaseOffset[c - 'a' + 1] - 1;
> +    return 0;
> +}
> +
> +const HTMLEntityTableEntry* HTMLEntityTable::start()
> +{
> +    return &staticEntityTable[0];
> +}
> +
> +const HTMLEntityTableEntry* HTMLEntityTable::end()
> +{
> +    return &staticEntityTable[%s - 1];
> +}
> +
> +}
> +""" % entity_count

WebCore/WebCore.gyp/WebCore.gyp:622
+ 'dependencies': [
This should be 'inputs', not 'dependencies'.

WebCore/WebCore.gyp/WebCore.gyp:281
+ '../html/HTMLEntityNames.json',
Eric, you seemed to indicate something wasn’t working here?
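For readers skimming the quoted patch: the core behavioural change in HTMLEntityParser.cpp is that the parser now looks for the longest matching entity instead of stopping at the first terminator. Below is a minimal Python sketch of that idea, not the patch's actual C++. The four-entry table, the names `ENTITY_TABLE`, `NAMES` and `longest_entity_match`, and the use of `bisect` are all assumptions made for illustration; the real table is generated from HTMLEntityNames.json, and the sketch ignores the "not enough characters" and attribute-value cases that the patch handles.

```python
import bisect

# Hypothetical miniature table, sorted by name like the generated one.
# Entities that may appear without a trailing semicolon get two entries.
ENTITY_TABLE = sorted({
    "not": 0x00AC,
    "not;": 0x00AC,
    "notin;": 0x2209,
    "nu;": 0x03BD,
}.items())
NAMES = [name for name, _ in ENTITY_TABLE]


def longest_entity_match(text):
    """Return (name, value) for the longest entity that prefixes text, or None."""
    last_match = None
    lo, hi = 0, len(NAMES)
    for length in range(1, len(text) + 1):
        prefix = text[:length]
        # Narrow [lo, hi) to the table entries that start with prefix.
        lo = bisect.bisect_left(NAMES, prefix, lo, hi)
        hi = bisect.bisect_left(NAMES, prefix + "\uffff", lo, hi)
        if lo == hi:
            break  # nothing in the table starts with this prefix
        if NAMES[lo] == prefix:
            # An exact match sorts first in its range; remember it and keep going.
            last_match = ENTITY_TABLE[lo]
    return last_match


print(longest_entity_match("notit; I tell you"))
print(longest_entity_match("notin; I tell you"))
```

With the spec's two examples, the first call falls back to `not` after failing to extend the match past `noti`, while the second consumes all of `notin;`. Remembering the last exact match while the candidate range keeps shrinking is the same role `m_lastMatch` plays in the quoted `HTMLEntitySearch`.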
Mark Mentovai
Comment 25 2010-08-12 22:19:37 PDT
Sorry, I guess that’s the last time I try to use Bugzilla’s “review patch.”
Eric Seidel (no email)
Comment 26 2010-08-12 22:20:26 PDT
Bugzilla's review patch including all text by default is about the worst idea ever. I think I'll just create a new bug since this one is dead now.
Eric Seidel (no email)
Comment 27 2010-08-12 22:24:34 PDT
I've filed bug https://bugs.webkit.org/show_bug.cgi?id=43948 about the bad default for WebKit's "review patch" link. I'll close this bug out once I hear back from the EWSes and open a new one.
Eric Seidel (no email)
Comment 28 2010-08-12 22:26:52 PDT
*** This bug has been marked as a duplicate of bug 43949 ***
Eric Seidel (no email)
Comment 29 2010-08-12 22:30:53 PDT
Created attachment 64302 [details] Now with all build systems including HTMLEntitySearch.*