WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
Bug 44790
Move UTF16 LEAD/TRAIL logic into the HTMLEntityParser
https://bugs.webkit.org/show_bug.cgi?id=44790
Summary
Move UTF16 LEAD/TRAIL logic into the HTMLEntityParser
Adam Barth
Reported
2010-08-27 13:12:52 PDT
Move UTF16 LEAD/TRAIL logic into the HTMLEntityParser
Attachments
Patch
(13.73 KB, patch)
2010-08-27 13:14 PDT
,
Adam Barth
no flags
Details
Formatted Diff
Diff
Patch for landing
(14.52 KB, patch)
2010-08-29 23:12 PDT
,
Adam Barth
no flags
Details
Formatted Diff
Diff
Show Obsolete
(1)
View All
Add attachment
proposed patch, testcase, etc.
Adam Barth
Comment 1
2010-08-27 13:14:55 PDT
Created
attachment 65756
[details]
Patch
Eric Seidel (no email)
Comment 2
2010-08-27 14:34:34 PDT
Comment on
attachment 65756
[details]
Patch WebCore/html/HTMLEntityParser.cpp:65 + inline bool convertToCharacters(unsigned value, Vector<UChar>& decodedEntity) I'm confused why this returns a bool if it's always true. WebCore/html/HTMLEntityParser.h:34 + bool consumeHTMLEntity(SegmentedString&, Vector<UChar>& decodedEntity, bool& notEnoughCharacters, UChar additionalAllowedCharacter = '\0'); This is always at most 2 chars. Vector seems like overkill. WebCore/html/HTMLTokenizer.cpp:122 + Vector<UChar> decodedEntity; Inline capacity 2? Using a vector here feels strange. It makes sense in your possible future world if many-char entities, but not much for when we at most produce 2 utf16 code points.
Adam Barth
Comment 3
2010-08-27 14:46:46 PDT
> I'm confused why this returns a bool if it's always true.
Just to make the callsites look snazzy. We always want to return true after calling this function.
> This is always at most 2 chars. Vector seems like overkill.
Secretly I want to prepare for entities that expand to more than two characters.
> WebCore/html/HTMLTokenizer.cpp:122 > + Vector<UChar> decodedEntity; > Inline capacity 2?
Sure.
> Using a vector here feels strange. It makes sense in your possible future world if many-char entities, but not much for when we at most produce 2 utf16 code points.
Would you prefer UChar decodedEntity[2] ? Generally, I like Vector because it makes sure we have memory safety.
Darin Adler
Comment 4
2010-08-28 16:10:54 PDT
Comment on
attachment 65756
[details]
Patch
> + * html/HTMLEntityParser.cpp: > + (WebCore::): > + (WebCore::consumeHTMLEntity):
This list of function names is a bit strange. As you know, I am a big fan of per-function change log comments, but just removing the bogus function name might be enough. For some reason the new function's name is not here either.
> +inline bool convertToCharacters(unsigned value, Vector<UChar>& decodedEntity)
Functions defined and used within the same file should be marked static so they get internal linkage, unless this is inside an unnamed namespace. We have been using static rather than unnamed namespaces because tool support for unnamed namespaces was lacking, but perhaps you decided on the other path for this code? The argument name "value" here is strange. We have a type UChar32 that we often use instead of unsigned for such things, by the way. The name convertToCharacters is a bit unclear since this converts a Unicode character into a sequence of UTF-16 code units. It actually converts *from* a character. I think you should use a different name.
> + if (value < 0xFFFF) {
Why < here instead of <=? The old code used "< 0xFFFF" in one place and "> 0xFFFF" in another, which is inconsistent. ICU has the macro U_IS_BMP for this operation, which I think is slightly more explicit than doing the math directly.
> + Vector<UChar> decodedEntity; > + bool success = consumeHTMLEntity(source, decodedEntity, notEnoughCharacters);
I don't know how hot this code path is, but if it's hot enough, we might want to use Vector<UChar, 16> for this instead of Vector<UChar> to avoid a memory allocation each time. r=me
Adam Barth
Comment 5
2010-08-28 22:27:47 PDT
Thanks for the comments.
> > +inline bool convertToCharacters(unsigned value, Vector<UChar>& decodedEntity) > > Functions defined and used within the same file should be marked static so they get internal linkage, unless this is inside an unnamed namespace. We have been using static rather than unnamed namespaces because tool support for unnamed namespaces was lacking, but perhaps you decided on the other path for this code?
Ah, I was under the impression that inline implies static. I thought that's what one of the compiler gurus told me, but
http://gcc.gnu.org/onlinedocs/gcc/Inline.html
seems to imply that static is not redundant. In this case, the code is inside an anonymous namespace. I didn't know there were tooling issues with anonymous namespaces. Maybe those tooling issues are out of date since we use anonymous namespaces in a bunch of places?
Adam Barth
Comment 6
2010-08-29 23:11:12 PDT
Ok. I think I've addressed all your comments. Let's see if the bot likes the patch.
Adam Barth
Comment 7
2010-08-29 23:12:31 PDT
Created
attachment 65879
[details]
Patch for landing
WebKit Commit Bot
Comment 8
2010-08-29 23:31:36 PDT
Comment on
attachment 65879
[details]
Patch for landing Clearing flags on attachment: 65879 Committed
r66359
: <
http://trac.webkit.org/changeset/66359
>
WebKit Commit Bot
Comment 9
2010-08-29 23:31:41 PDT
All reviewed patches have been landed. Closing bug.
WebKit Review Bot
Comment 10
2010-08-29 23:37:12 PDT
http://trac.webkit.org/changeset/66359
might have broken Qt Linux Release minimal, Qt Linux ARMv5 Release, and Qt Linux ARMv7 Release The following changes are on the blame list:
http://trac.webkit.org/changeset/66360
http://trac.webkit.org/changeset/66359
Adam Barth
Comment 11
2010-08-29 23:55:59 PDT
(In reply to
comment #10
)
>
http://trac.webkit.org/changeset/66359
might have broken Qt Linux Release minimal, Qt Linux ARMv5 Release, and Qt Linux ARMv7 Release
Thanks Darin. I was working on a build fix, but yours is much nice.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug