WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
74826
WebKit should support HTML entities that expand to more than one character
https://bugs.webkit.org/show_bug.cgi?id=74826
Summary
WebKit should support HTML entities that expand to more than one character
Adam Barth
Reported
2011-12-18 16:22:30 PST
WebKit should support HTML entities that expand to more than one character
Attachments
Patch
(23.03 KB, patch)
2011-12-18 16:30 PST
,
Adam Barth
no flags
Details
Formatted Diff
Diff
Patch for landing
(22.45 KB, patch)
2011-12-18 22:53 PST
,
Adam Barth
no flags
Details
Formatted Diff
Diff
Patch for landing
(22.40 KB, patch)
2011-12-19 09:13 PST
,
Adam Barth
no flags
Details
Formatted Diff
Diff
Show Obsolete
(2)
View All
Add attachment
proposed patch, testcase, etc.
Adam Barth
Comment 1
2011-12-18 16:30:15 PST
Created
attachment 119787
[details]
Patch
Darin Adler
Comment 2
2011-12-18 17:49:13 PST
Comment on
attachment 119787
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=119787&action=review
> Source/WebCore/html/parser/HTMLEntityParser.cpp:160 > + if (U16_LENGTH(entityValue) != 1 || search.mostRecentMatch()->rightValue) {
It’s fine to have one of these checks here at runtime. But the other should be an assertion. There’s no need to check both. Of course, you plan to delete this whole function eventually, I assume.
> Source/WebCore/html/parser/HTMLEntityParser.h:37 > +// FIXME: Move the XML parser to an entity decoding function that actually works!
The “that actually works” here is unnecessarily mysterious. You could say something specific like “that works for non-BMP characters”.
> Source/WebCore/html/parser/HTMLEntityTable.h:39 > + UChar32 leftValue; > + UChar32 rightValue;
Using left and right here is curious. I would have called these first and second or something that implies ordering rather than direction.
> Source/WebCore/xml/parser/XMLTreeBuilder.cpp:382 > + UChar32 entityValue = search.mostRecentMatch()->leftValue; > + // FIXME: We need to account for rightValue if any XML entities are longer > + // than one unicode character. > + ASSERT(!search.mostRecentMatch()->rightValue);
What prevents this assertion from firing? Is there a test case demonstrating that?
> Source/WebCore/xml/parser/XMLTreeBuilder.cpp:385 > if (entityValue <= 0xFFFF) > appendToText(reinterpret_cast<UChar*>(&entityValue), 1); > else {
You can see given the code above that this else is dead code. This code is in a strange state. And the reinterpret_cast to UChar* makes the code little-endian-specific. That is not good!
Adam Barth
Comment 3
2011-12-18 17:55:00 PST
Thanks for the review. I'll go through your comments carefully before landing. I can answer one of your questions quickly though: (In reply to
comment #2
)
> (From update of
attachment 119787
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=119787&action=review
> > > Source/WebCore/xml/parser/XMLTreeBuilder.cpp:382 > > + UChar32 entityValue = search.mostRecentMatch()->leftValue; > > + // FIXME: We need to account for rightValue if any XML entities are longer > > + // than one unicode character. > > + ASSERT(!search.mostRecentMatch()->rightValue); > > What prevents this assertion from firing? Is there a test case demonstrating that?
This code is part of ENABLE(NEWXML), which was an intern project this past summer. Currently it's impossible to execute this code without enabling NEWXML. It might be better to change this to an ASSERT_NOT_REACHED to remind whoever tries to enable this code in the future that this issue needs to be addressed (in addition to the other remaining issues with the new XML parser).
Adam Barth
Comment 4
2011-12-18 22:52:43 PST
> Of course, you plan to delete this whole function eventually, I assume.
Yes. The function is deeply broken and needs to be deleted.
> > Source/WebCore/html/parser/HTMLEntityParser.h:37 > > +// FIXME: Move the XML parser to an entity decoding function that actually works! > > The “that actually works” here is unnecessarily mysterious. You could say something specific like “that works for non-BMP characters”.
Fixed.
> > Source/WebCore/html/parser/HTMLEntityTable.h:39 > > + UChar32 leftValue; > > + UChar32 rightValue; > > Using left and right here is curious. I would have called these first and second or something that implies ordering rather than direction.
Done.
> > Source/WebCore/xml/parser/XMLTreeBuilder.cpp:385 > > if (entityValue <= 0xFFFF) > > appendToText(reinterpret_cast<UChar*>(&entityValue), 1); > > else { > > You can see given the code above that this else is dead code. This code is in a strange state. > > And the reinterpret_cast to UChar* makes the code little-endian-specific. That is not good!
I've added an ASSERT_NOT_REACHED() to this function to ensure that future developers working on bringing up the XML parser read these comments. I've also added your notes as comments.
Adam Barth
Comment 5
2011-12-18 22:53:59 PST
Created
attachment 119818
[details]
Patch for landing
WebKit Review Bot
Comment 6
2011-12-19 00:40:01 PST
Comment on
attachment 119818
[details]
Patch for landing Rejecting
attachment 119818
[details]
from commit-queue. Failed to run "['/mnt/git/webkit-commit-queue/Tools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '-..." exit_code: 2 Last 500 characters of output: git/webkit-commit-queue/Source/WebKit/chromium/ui --revision 114686 --non-interactive --force --accept theirs-conflict --ignore-externals' in '/mnt/git/webkit-commit-queue/Source/WebKit/chromium' 45>At revision 114686. ________ running '/usr/bin/python tools/clang/scripts/update.py --mac-only' in '/mnt/git/webkit-commit-queue/Source/WebKit/chromium' ________ running '/usr/bin/python gyp_webkit' in '/mnt/git/webkit-commit-queue/Source/WebKit/chromium' Updating webkit projects from gyp files... Full output:
http://queues.webkit.org/results/10930530
Adam Barth
Comment 7
2011-12-19 09:13:46 PST
Created
attachment 119869
[details]
Patch for landing
WebKit Review Bot
Comment 8
2011-12-19 10:30:40 PST
Comment on
attachment 119869
[details]
Patch for landing Clearing flags on attachment: 119869 Committed
r103246
: <
http://trac.webkit.org/changeset/103246
>
WebKit Review Bot
Comment 9
2011-12-19 10:30:45 PST
All reviewed patches have been landed. Closing bug.
Mathias Bynens
Comment 10
2012-05-24 23:32:06 PDT
With this change,
http://mathias.html5.org/tests/named-character-references/
only lists a single error:
> Æ doesn’t match Æ
(Even though the entry in Source/WebCore/html/parser/HTMLEntityNames.in looks alright:
http://trac.webkit.org/browser/trunk/Source/WebCore/html/parser/HTMLEntityNames.in#L2
) See
bug 87465
for that.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug