WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
74784
Add support for 8 bits strings to Document::isValidName()
https://bugs.webkit.org/show_bug.cgi?id=74784
Summary
Add support for 8 bits strings to Document::isValidName()
Benjamin Poulain
Reported
2011-12-17 00:33:17 PST
We don't have to go through the slow case when we have a 8 bit string.
Attachments
Patch
(2.21 KB, patch)
2011-12-17 00:35 PST
,
Benjamin Poulain
no flags
Details
Formatted Diff
Diff
Patch
(10.34 KB, patch)
2011-12-18 22:23 PST
,
Benjamin Poulain
no flags
Details
Formatted Diff
Diff
Show Obsolete
(1)
View All
Add attachment
proposed patch, testcase, etc.
Benjamin Poulain
Comment 1
2011-12-17 00:35:08 PST
Created
attachment 119719
[details]
Patch
Andreas Kling
Comment 2
2011-12-17 09:28:45 PST
Comment on
attachment 119719
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=119719&action=review
Ahh, these 8-bit string hacks are just great! :)
> Source/WebCore/ChangeLog:9 > + The valid name has a fast path for ASCII, and a slow path > + taking into account Unicode characters.
... a slow path taking Unicode characters into account.
> Source/WebCore/ChangeLog:12 > + For 8 bits strings, we do not need to test again the slow path > + as it should never succeed if the fast path does not succed.
Since the slow path here is really the non-ASCII path, you could be more specific, e.g: "For 8-bit strings, we don't need to take the non-ASCII path as it could never succeed if the ASCII path didn't."
> Source/WebCore/dom/Document.cpp:3802 > + return isValidNameASCII(name.characters16(), length) || isValidNameNonASCII(name.characters16(), length);
We should cache the return value from characters16() to avoid doing the String::m_impl null check again.
Benjamin Poulain
Comment 3
2011-12-18 18:17:40 PST
> We should cache the return value from characters16() to avoid doing the String::m_impl null check again.
Good catch Baron Kling.
Benjamin Poulain
Comment 4
2011-12-18 18:17:53 PST
Committed
r103205
: <
http://trac.webkit.org/changeset/103205
>
Darin Adler
Comment 5
2011-12-18 18:43:52 PST
Comment on
attachment 119719
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=119719&action=review
>> Source/WebCore/ChangeLog:12 >> + as it should never succeed if the fast path does not succed. > > Since the slow path here is really the non-ASCII path, you could be more specific, e.g: > "For 8-bit strings, we don't need to take the non-ASCII path as it could never succeed if the ASCII path didn't."
Is that true? Are there no valid name characters in the 0x80-0xFF range that could be in 8-bit strings? That doesn’t sound right to me.
Benjamin Poulain
Comment 6
2011-12-18 18:49:28 PST
> > Since the slow path here is really the non-ASCII path, you could be more specific, e.g: > > "For 8-bit strings, we don't need to take the non-ASCII path as it could never succeed if the ASCII path didn't." > > Is that true? Are there no valid name characters in the 0x80-0xFF range that could be in 8-bit strings? That doesn’t sound right to me.
All I had seen in the nonASCII path is characters > 0xFF. Did I miss something?
Darin Adler
Comment 7
2011-12-18 18:54:03 PST
(In reply to
comment #6
)
> All I had seen in the nonASCII path is characters > 0xFF. Did I miss something?
Well, I went to read isValidNameStart and isValidNamePart and I immediately spotted these: - All letters are legal name starts. I believe there are many letters in the U+0080-U+00FF range, such as U+00C0. - U+00B7 is a legal name part. It’s right there as a constant in isValidNamePart. So I think there are many non-ASCII valid name characters that fit in 8-bit.
Benjamin Poulain
Comment 8
2011-12-18 18:56:51 PST
I checked the Unicode categories of the valid characters, and there are some in 0x80-0xFF. The function should probably be changed isValidNameASCII() -> isValidNameLatin1. Or if not possible, skip the shortcut altogether. I will revert this. There should be tests for those cases.
Darin Adler
Comment 9
2011-12-18 19:02:19 PST
(In reply to
comment #8
)
> The function should probably be changed isValidNameASCII() -> isValidNameLatin1.
That might be OK.
> Or if not possible, skip the shortcut altogether.
Why!? Valid ASCII names are far more common than other kind of name, so it’s great to have a fast case for that.
> There should be tests for those cases.
Absolutely. I agree!
Benjamin Poulain
Comment 10
2011-12-18 19:07:59 PST
> > Or if not possible, skip the shortcut altogether. > > Why!? Valid ASCII names are far more common than other kind of name, so it’s great to have a fast case for that.
I meant skipping the Unicode test if the string is 8 bits. The current fast path totally make sense, I have no intention to change that.
Benjamin Poulain
Comment 11
2011-12-18 19:16:08 PST
Reverted in
r103210
. Reopening to fix this again with tests.
Darin Adler
Comment 12
2011-12-18 19:19:58 PST
I suggest logic that uses characters() if the 8-bit string is not all ASCII. I think it’s OK to expand the string to 16-bit to check if it’s valid if the string has non-ASCII characters in it. I believe those cases are rare enough that we don’t need to optimize them.
Darin Adler
Comment 13
2011-12-18 19:57:37 PST
Here’s the code I suggest: UChar* characters; if (name.is8Bit()) { if (isValidNameASCII(name.characters8(), length)) return true; characters = name.characters(); } else { characters = name.characters16(); if (isValidNameASCII(characters, length)) return true; } return isValidNameNonASCII(characters, length); You like that?
Benjamin Poulain
Comment 14
2011-12-18 20:09:26 PST
(In reply to
comment #13
)
> Here’s the code I suggest: > > UChar* characters; > if (name.is8Bit()) { > if (isValidNameASCII(name.characters8(), length)) > return true; > characters = name.characters(); > } else { > characters = name.characters16(); > if (isValidNameASCII(characters, length)) > return true; > } > return isValidNameNonASCII(characters, length); > > You like that?
Sure, I think that is the right thing to do. I just want to add tests before updating and I am busy with a merge at the moment.
Benjamin Poulain
Comment 15
2011-12-18 22:23:11 PST
Created
attachment 119815
[details]
Patch
WebKit Review Bot
Comment 16
2011-12-19 00:11:40 PST
Comment on
attachment 119815
[details]
Patch Clearing flags on attachment: 119815 Committed
r103222
: <
http://trac.webkit.org/changeset/103222
>
WebKit Review Bot
Comment 17
2011-12-19 00:11:45 PST
All reviewed patches have been landed. Closing bug.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug