Bug 7761 - Tabs in class attribute not treated as whitespace
Summary: Tabs in class attribute not treated as whitespace
Status: VERIFIED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: DOM (show other bugs)
Version: 420+
Hardware: Mac OS X 10.4
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-03-13 22:29 PST by mitz
Modified: 2019-02-06 09:03 PST (History)
2 users (show)

See Also:


Attachments
test case (165 bytes, text/html)
2006-03-13 22:29 PST, mitz
no flags Details
Treat \t and \r as whitespace (7.99 KB, patch)
2006-03-14 15:29 PST, mitz
darin: review-
Details | Formatted Diff | Diff
Revised patch (8.44 KB, patch)
2006-03-15 10:01 PST, mitz
darin: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description mitz 2006-03-13 22:29:21 PST
Tab characters are considered part of the class name they precede/follow. See the test case, which renders correctly in Firefox.
Comment 1 mitz 2006-03-13 22:29:57 PST
Created attachment 7057 [details]
test case
Comment 2 mitz 2006-03-14 15:29:17 PST
Created attachment 7064 [details]
Treat \t and \r as whitespace
Comment 3 Darin Adler 2006-03-14 15:46:59 PST
Comment on attachment 7064 [details]
Treat \t and \r as whitespace

Looks great. We wanted to get this class off of QStringList anyway, so hooray for that!

OK, let me now be crazy and make premature optimization comments:

+        while (classAttr[sPos] && (classAttr[sPos] == ' ' || classAttr[sPos] == '\r' || classAttr[sPos] == '\n' || classAttr[sPos] == '\t'))
+            sPos++;

The code above is a little inefficient. First of all, it checks that the character is non-null before comparing it with three specific characters. The check for 0 is not needed because if it's ' ' or '\r' or '\n' or '\t', then it's definitely not 0.

Each call to the [] operator does a separate comparison with the length of the string so it would be nicer if we didn't have to index over and over again. If you made a little helper function then you could write:

    while (isClassWhitespace(classAttr[sPos]))
        sPos++;

which might be clearer anyway.

Checking for the null character to detect the end of the string is intrinsically inefficient for our Platform::String class, because it has a length stored, not a trailing null. So if you were to take classAttr.unicode() and classAttr.length() and work directly, you'd end up with something more efficient. Something like this:

    str = classAttr.unicode();

    while (sPos < length && isClassWhitespace(str[sPos]))
        ++sPos;
    if (sPos >= length)
        break;
    int ePos = sPos;
    while (ePos < length && !isClassWhitespace(str[ePos]))
        ++sPos;

But I can't tell whether this kind of tweaking makes the code better or worse.

ePos should not be declared outside the loop. Instead it should be declared right where it's used.

Do we want to optimize the common case where sPos is 0 and ePos is 1 so that we don't create a new atomic string for that case?

Not sure what to do about these nitpicks. The code is good as is. I'm going to mark this review-, but it's practically a review+.
Comment 4 Darin Adler 2006-03-14 15:47:31 PST
(In reply to comment #3)
> (From update of attachment 7064 [details] [edit])
> Do we want to optimize the common case where sPos is 0 and ePos is 1 so that we
> don't create a new atomic string for that case?

I meant where sPos is 0 and ePos is len.
Comment 5 mitz 2006-03-15 10:01:27 PST
Created attachment 7089 [details]
Revised patch

(In reply to comment #3)
> OK, let me now be crazy and make premature optimization comments:

Yay! I followed your suggestions in this version.

> Do we want to optimize the common case [...] so that we
> don't create a new atomic string for that case?

In that case, this patch uses AtomicString(classAttr) instead of AtomicString(str, length), but I almost convinced myself that there is no difference.

> Not sure what to do about these nitpicks. The code is good as is.

Not really... it was also checking each boundary character twice!
Comment 6 Darin Adler 2006-03-15 10:38:43 PST
(In reply to comment #5)
> In that case, this patch uses AtomicString(classAttr) instead of
> AtomicString(str, length), but I almost convinced myself that there is no
> difference.

You're right, there is no difference. To make a difference we'd have to change the code path so that the attribute stays an atomic string. Currently the attribute starts out an atomic string in the attribute, but then:

    1) AttrImpl::value() changes it into a plain string (should be changed to return AtomicString)
    2) calling parseClassAttribute would also change it into a plain string (should be changed to pass an AtomicString)

Also, lowercasing inside parseClassAttribute has to be done carefully; the code does seem to be doing that right already.

But if we took care of all of those, then we could have a fast path. I'm not certain it would matter, but it seems like it would be a win. But lets not require that be in this patch.
Comment 7 Darin Adler 2006-03-15 10:40:25 PST
Comment on attachment 7089 [details]
Revised patch

r=me
Comment 8 mitz 2006-03-19 11:22:37 PST
Verified in r13387 nightly.
Comment 9 Lucas Forschler 2019-02-06 09:03:35 PST
Mass moving XML DOM bugs to the "DOM" Component.