RESOLVED FIXED 74784
Add support for 8 bits strings to Document::isValidName()
https://bugs.webkit.org/show_bug.cgi?id=74784
Summary Add support for 8 bits strings to Document::isValidName()
Benjamin Poulain
Reported 2011-12-17 00:33:17 PST
We don't have to go through the slow case when we have a 8 bit string.
Attachments
Patch (2.21 KB, patch)
2011-12-17 00:35 PST, Benjamin Poulain
no flags
Patch (10.34 KB, patch)
2011-12-18 22:23 PST, Benjamin Poulain
no flags
Benjamin Poulain
Comment 1 2011-12-17 00:35:08 PST
Andreas Kling
Comment 2 2011-12-17 09:28:45 PST
Comment on attachment 119719 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=119719&action=review Ahh, these 8-bit string hacks are just great! :) > Source/WebCore/ChangeLog:9 > + The valid name has a fast path for ASCII, and a slow path > + taking into account Unicode characters. ... a slow path taking Unicode characters into account. > Source/WebCore/ChangeLog:12 > + For 8 bits strings, we do not need to test again the slow path > + as it should never succeed if the fast path does not succed. Since the slow path here is really the non-ASCII path, you could be more specific, e.g: "For 8-bit strings, we don't need to take the non-ASCII path as it could never succeed if the ASCII path didn't." > Source/WebCore/dom/Document.cpp:3802 > + return isValidNameASCII(name.characters16(), length) || isValidNameNonASCII(name.characters16(), length); We should cache the return value from characters16() to avoid doing the String::m_impl null check again.
Benjamin Poulain
Comment 3 2011-12-18 18:17:40 PST
> We should cache the return value from characters16() to avoid doing the String::m_impl null check again. Good catch Baron Kling.
Benjamin Poulain
Comment 4 2011-12-18 18:17:53 PST
Darin Adler
Comment 5 2011-12-18 18:43:52 PST
Comment on attachment 119719 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=119719&action=review >> Source/WebCore/ChangeLog:12 >> + as it should never succeed if the fast path does not succed. > > Since the slow path here is really the non-ASCII path, you could be more specific, e.g: > "For 8-bit strings, we don't need to take the non-ASCII path as it could never succeed if the ASCII path didn't." Is that true? Are there no valid name characters in the 0x80-0xFF range that could be in 8-bit strings? That doesn’t sound right to me.
Benjamin Poulain
Comment 6 2011-12-18 18:49:28 PST
> > Since the slow path here is really the non-ASCII path, you could be more specific, e.g: > > "For 8-bit strings, we don't need to take the non-ASCII path as it could never succeed if the ASCII path didn't." > > Is that true? Are there no valid name characters in the 0x80-0xFF range that could be in 8-bit strings? That doesn’t sound right to me. All I had seen in the nonASCII path is characters > 0xFF. Did I miss something?
Darin Adler
Comment 7 2011-12-18 18:54:03 PST
(In reply to comment #6) > All I had seen in the nonASCII path is characters > 0xFF. Did I miss something? Well, I went to read isValidNameStart and isValidNamePart and I immediately spotted these: - All letters are legal name starts. I believe there are many letters in the U+0080-U+00FF range, such as U+00C0. - U+00B7 is a legal name part. It’s right there as a constant in isValidNamePart. So I think there are many non-ASCII valid name characters that fit in 8-bit.
Benjamin Poulain
Comment 8 2011-12-18 18:56:51 PST
I checked the Unicode categories of the valid characters, and there are some in 0x80-0xFF. The function should probably be changed isValidNameASCII() -> isValidNameLatin1. Or if not possible, skip the shortcut altogether. I will revert this. There should be tests for those cases.
Darin Adler
Comment 9 2011-12-18 19:02:19 PST
(In reply to comment #8) > The function should probably be changed isValidNameASCII() -> isValidNameLatin1. That might be OK. > Or if not possible, skip the shortcut altogether. Why!? Valid ASCII names are far more common than other kind of name, so it’s great to have a fast case for that. > There should be tests for those cases. Absolutely. I agree!
Benjamin Poulain
Comment 10 2011-12-18 19:07:59 PST
> > Or if not possible, skip the shortcut altogether. > > Why!? Valid ASCII names are far more common than other kind of name, so it’s great to have a fast case for that. I meant skipping the Unicode test if the string is 8 bits. The current fast path totally make sense, I have no intention to change that.
Benjamin Poulain
Comment 11 2011-12-18 19:16:08 PST
Reverted in r103210. Reopening to fix this again with tests.
Darin Adler
Comment 12 2011-12-18 19:19:58 PST
I suggest logic that uses characters() if the 8-bit string is not all ASCII. I think it’s OK to expand the string to 16-bit to check if it’s valid if the string has non-ASCII characters in it. I believe those cases are rare enough that we don’t need to optimize them.
Darin Adler
Comment 13 2011-12-18 19:57:37 PST
Here’s the code I suggest: UChar* characters; if (name.is8Bit()) { if (isValidNameASCII(name.characters8(), length)) return true; characters = name.characters(); } else { characters = name.characters16(); if (isValidNameASCII(characters, length)) return true; } return isValidNameNonASCII(characters, length); You like that?
Benjamin Poulain
Comment 14 2011-12-18 20:09:26 PST
(In reply to comment #13) > Here’s the code I suggest: > > UChar* characters; > if (name.is8Bit()) { > if (isValidNameASCII(name.characters8(), length)) > return true; > characters = name.characters(); > } else { > characters = name.characters16(); > if (isValidNameASCII(characters, length)) > return true; > } > return isValidNameNonASCII(characters, length); > > You like that? Sure, I think that is the right thing to do. I just want to add tests before updating and I am busy with a merge at the moment.
Benjamin Poulain
Comment 15 2011-12-18 22:23:11 PST
WebKit Review Bot
Comment 16 2011-12-19 00:11:40 PST
Comment on attachment 119815 [details] Patch Clearing flags on attachment: 119815 Committed r103222: <http://trac.webkit.org/changeset/103222>
WebKit Review Bot
Comment 17 2011-12-19 00:11:45 PST
All reviewed patches have been landed. Closing bug.
Note You need to log in before you can comment on or make changes to this bug.