Bug 27032 - document.title does not replace or remove space characters
Summary: document.title does not replace or remove space characters
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: DOM (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Nobody
URL:
Keywords: EasyFix, HasReduction
Depends on:
Blocks:
 
Reported: 2009-07-07 08:36 PDT by hito
Modified: 2010-01-07 13:32 PST (History)
6 users (show)

See Also:


Attachments
Patch and test case for collapsing whitespace when using document.title (4.92 KB, patch)
2009-11-07 13:30 PST, Christian Sejersen
darin: review-
Details | Formatted Diff | Diff
Second version of patch (12.90 KB, patch)
2009-11-19 07:32 PST, Christian Sejersen
darin: review-
Details | Formatted Diff | Diff
Third attempt (11.46 KB, patch)
2009-12-08 13:10 PST, Christian Sejersen
darin: review-
Details | Formatted Diff | Diff
Fourth attempt (13.01 KB, patch)
2009-12-26 16:37 PST, Christian Sejersen
no flags Details | Formatted Diff | Diff
Updated "Fourth attempt" by keeping the patch more true to the original code (13.12 KB, patch)
2009-12-28 14:38 PST, Christian Sejersen
darin: review-
Details | Formatted Diff | Diff
Updated final(?) patch (11.62 KB, patch)
2010-01-06 05:56 PST, Christian Sejersen
eric: review-
commit-queue: commit-queue-
Details | Formatted Diff | Diff
Updated patch (11.52 KB, patch)
2010-01-07 00:15 PST, Christian Sejersen
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description hito 2009-07-07 08:36:44 PDT
According to current HTML5 draft.(http://dev.w3.org/html5/spec/Overview.html#document.title)
the value of the document.title attribute for following title element...

<title>
   foo    bar    
</title>

must be "foo bar".
But at Safari 4.0(530.17), the value is "\n    foo    bar    \n".
This is against HTML5 draft or any other major browser's implementation.
Comment 1 Alexey Proskuryakov 2009-07-10 22:09:12 PDT
Confirmed with ToT.
Comment 2 Christian Sejersen 2009-11-07 13:30:31 PST
Created attachment 42702 [details]
Patch and test case for collapsing whitespace when using document.title
Comment 3 Darin Adler 2009-11-07 14:24:42 PST
Comment on attachment 42702 [details]
Patch and test case for collapsing whitespace when using document.title

Thanks for taking this on!

This new work seems to be a subset of the work already done in the DocumentLoader function canonicalizedTitle. I don't think Document::title should have its own separate copy of the logic. Some of what canonicalizedTitle does may be specific to display -- I'm thinking specifically of displayBufferModifiedByEncoding -- but most of it should be shared in common with Document::title.

I think that rather than adding new code for this, instead you should be moving some of the code from DocumentLoader to Document.

A single title in the test case is really light coverage. I'd like to see a test case that sets all sorts of title strings to test the various edge cases and lots of characters other than just letters and the space character. The simplest way to write such a test would be to set and then get the document's title. I recommend a script-tests style test rather than writing your own mini test harness. See an example such as fast/Element/script-tests/element-traversal.js. The HTML driver file for the script test is made by typing make-script-test-wrappers.
Comment 4 Christian Sejersen 2009-11-10 06:13:06 PST
(In reply to comment #3)
> (From update of attachment 42702 [details])
> Thanks for taking this on!

No problem.

> 
> This new work seems to be a subset of the work already done in the
> DocumentLoader function canonicalizedTitle. I don't think Document::title
> should have its own separate copy of the logic. Some of what canonicalizedTitle
> does may be specific to display -- I'm thinking specifically of
> displayBufferModifiedByEncoding -- but most of it should be shared in common
> with Document::title.

It seems odd that the implementation in canonicalizedTitle is not using the already existing methods in String to strip whitespace (e.g. the ones I used in Document::title(). Also one of the comments in canonicalizedTitle is misleading as it states that it will replace backslashes with currency symbols. That doesn't happen until in displayBufferModifiedByEncoding.


> 
> I think that rather than adding new code for this, instead you should be moving
> some of the code from DocumentLoader to Document.

How about keeping the code I have for Document and then in DocumentLoader call displayBufferModifiedByEncoding and remove the whitespace removal code?

> 
> A single title in the test case is really light coverage. I'd like to see a
> test case that sets all sorts of title strings to test the various edge cases
> and lots of characters other than just letters and the space character. The
> simplest way to write such a test would be to set and then get the document's
> title. I recommend a script-tests style test rather than writing your own mini
> test harness. See an example such as
> fast/Element/script-tests/element-traversal.js. The HTML driver file for the
> script test is made by typing make-script-test-wrappers.

I agree and thanks for the pointer. I will change and update the tests with more coverage.
Comment 5 Christian Sejersen 2009-11-19 07:32:35 PST
Created attachment 43500 [details]
Second version of patch

This patch now unifies the functionality of stripping white space between Document and DocumentLoader. In the unified white apace function there is now also support for Unicode white space which also fixes using Unicode white space in the option element (updated test case there as well).
Comment 6 Darin Adler 2009-11-19 10:51:28 PST
Comment on attachment 43500 [details]
Second version of patch

Thanks for tackling this. This patch combines refactoring and at least three different behavior changes. We should ideally do these separately and each behavior change needs either a test or a justification about why it can't be tested.

> +	Unify the behaviour of stripping whitespace from the title of the 
> +	window and when using document.title. So in cases where whitespace had 
> +	to be removed isSpaceOrNewline am now taking into account Unicode separator 
> +	characters. 

Can't land patches with tabs. There are quite a few tabs in the patch.

> +        * dom/Document.cpp:
> +        (WebCore::canonicalizedTitle):
> +        (WebCore::Document::updateTitle):
> +        * dom/Document.h:
> +        (WebCore::Document::title):
> +        * loader/DocumentLoader.cpp:
> +        (WebCore::DocumentLoader::setTitle):
> +        * platform/text/StringImpl.h:
> +        (WebCore::isSpaceOrNewline):

It's better to have per-function comments. It seems I am one of the few people still doing this. You can win points with me by doing it, though!

> +/*
> + * Performs three operations:
> + *  1. Trim leading and trailing whitespace
> + *  2. Collapse internal whitespace.
> +*/

We use "//" comments almost all the time, not "/* */" except in rare cases.

I'm also not sure this is an important comment. Comments should usually state "why", not "what". The "what" part should be expressed in the function names and code itself.

> +static inline String canonicalizedTitle(const String& title)
> +{
> +    //ASSERT(!title.isEmpty());

Assertion should either be removed or added, not added in commented-out.

> +        
> +    String canonicalTitle = title.stripWhiteSpace();
> +    canonicalTitle = canonicalTitle.simplifyWhiteSpace();
> +        
> +    return canonicalTitle;
> +}

Is there a a good reason to not write this one one line?

    return title.stripWhiteSpace().simplifyWhiteSpace();

> +    m_canonicalizedTitle = canonicalizedTitle(m_title);

It seems to me that if we store the canonicalized title, then we should check if it changes, and avoid the setTitle call when the title has not changed.

> -    String title() const { return m_title; }
> +    String title() const { return m_canonicalizedTitle; }

This is a bad situation. We do not want a class where there is a data member named m_title and a function named title, and they are not the same thing.

We need to adjust the names for clarity. Do we even need an m_title data member at all? Maybe m_canonicalizedTitle should be the only thing we store. Who uses m_title?

Maybe m_title should stay and change purpose.

> -/*
> - * Performs four operations:
> - *  1. Convert backslashes to currency symbols
> - *  2. Convert control characters to spaces
> - *  3. Trim leading and trailing spaces
> - *  4. Collapse internal whitespace.
> - */

You eliminated conversion of of control characters to spaces. Why are you sure that's OK? I'd like to see a separate patch for that behavior change which does not combine that change with all the refactoring.

> -static inline String canonicalizedTitle(const String& title, Frame* frame)
> -{
> -    ASSERT(!title.isEmpty());
> -
> -    const UChar* characters = title.characters();
> -    unsigned length = title.length();
> -    unsigned i;
> -
> -    StringBuffer buffer(length);
> -    unsigned builderIndex = 0;
> -
> -    // Skip leading spaces and leading characters that would convert to spaces
> -    for (i = 0; i < length; ++i) {
> -        UChar c = characters[i];
> -        if (!(c <= 0x20 || c == 0x7F))
> -            break;
> -    }
> -
> -    if (i == length)
> -        return "";
> -
> -    // Replace control characters with spaces, and backslashes with currency symbols, and collapse whitespace.
> -    bool previousCharWasWS = false;
> -    for (; i < length; ++i) {
> -        UChar c = characters[i];
> -        if (c <= 0x20 || c == 0x7F || (WTF::Unicode::category(c) & (WTF::Unicode::Separator_Line | WTF::Unicode::Separator_Paragraph))) {
> -            if (previousCharWasWS)
> -                continue;
> -            buffer[builderIndex++] = ' ';
> -            previousCharWasWS = true;
> -        } else {
> -            buffer[builderIndex++] = c;
> -            previousCharWasWS = false;
> -        }
> -    }
> -
> -    // Strip trailing spaces
> -    while (builderIndex > 0) {
> -        --builderIndex;
> -        if (buffer[builderIndex] != ' ')
> -            break;
> -    }
> -
> -    if (!builderIndex && buffer[builderIndex] == ' ')
> -        return "";
> -
> -    buffer.shrink(builderIndex + 1);
> -    
> -    // Replace the backslashes with currency symbols if the encoding requires it.
> -    frame->document()->displayBufferModifiedByEncoding(buffer.characters(), buffer.length());
> -
> -    return String::adopt(buffer);
> -}

This is a hot code path, and one of the reasons the function was written the way it was is for performance. Making your change is OK if we know the performance has not changed. Have you found a way to determine this does not slow down performance?

> -    String trimmed = canonicalizedTitle(title, m_frame);
> -    if (!trimmed.isEmpty() && m_pageTitle != trimmed) {
> +    if (!title.isEmpty() && m_pageTitle != title) {
>          frameLoader()->willChangeTitle(this);
> -        m_pageTitle = trimmed;
> +        // Replace the backslashes with currency symbols if the encoding requires it.
> +        m_pageTitle = m_frame->document()->displayStringModifiedByEncoding(title);
>          frameLoader()->didChangeTitle(this);
>      }

This changes behavior for titles that would be empty if made canonical. Is that OK? Did you test behavior in other browsers? I'd prefer to have the behavior change in a separate patch from the refactoring for clarity, where the change is clearly justified and tested.

>  static inline bool isSpaceOrNewline(UChar c)
>  {
> -    // Use isASCIISpace() for basic Latin-1.
> -    // This will include newlines, which aren't included in Unicode DirWS.
> -    return c <= 0x7F ? WTF::isASCIISpace(c) : WTF::Unicode::direction(c) == WTF::Unicode::WhiteSpaceNeutral;
> +    // Use isASCIISpace() for ASCII
> +    if (c <= 0x7F)
> +        return WTF::isASCIISpace(c);
> +    
> +    // Unicode characters in the space, line or paragraph separator
> +    // category are treated as white space
> +    return ((WTF::Unicode::category(c) & 
> +             (WTF::Unicode::Separator_Space | 
> +              WTF::Unicode::Separator_Line | 
> +              WTF::Unicode::Separator_Paragraph)));
>  }

What's the motivation for this change? Is this about correctness? If so, where's the test case that shows the old behavior was incorrect? Is it about efficiency?

The formatting seems inferior to the existing to me and doesn't follow our rules about where to put boolean operators in multiple lines.

One way to make these more readable is to use "using namespace" inside the function.
Comment 7 Christian Sejersen 2009-11-19 12:39:24 PST
(In reply to comment #6)
> (From update of attachment 43500 [details])
> Thanks for tackling this. This patch combines refactoring and at least three
> different behavior changes. We should ideally do these separately and each
> behavior change needs either a test or a justification about why it can't be
> tested.

I am not sure I agree about the behavior changes as I have indicated below.

> 
> > +	Unify the behaviour of stripping whitespace from the title of the 
> > +	window and when using document.title. So in cases where whitespace had 
> > +	to be removed isSpaceOrNewline am now taking into account Unicode separator 
> > +	characters. 
> 
> Can't land patches with tabs. There are quite a few tabs in the patch.
> 

Oops, will fix that.

> > +        * dom/Document.cpp:
> > +        (WebCore::canonicalizedTitle):
> > +        (WebCore::Document::updateTitle):
> > +        * dom/Document.h:
> > +        (WebCore::Document::title):
> > +        * loader/DocumentLoader.cpp:
> > +        (WebCore::DocumentLoader::setTitle):
> > +        * platform/text/StringImpl.h:
> > +        (WebCore::isSpaceOrNewline):
> 
> It's better to have per-function comments. It seems I am one of the few people
> still doing this. You can win points with me by doing it, though!
> 

I am a bit behind on points so I better do it then :)

> > +/*
> > + * Performs three operations:
> > + *  1. Trim leading and trailing whitespace
> > + *  2. Collapse internal whitespace.
> > +*/
> 
> We use "//" comments almost all the time, not "/* */" except in rare cases.
> 
> I'm also not sure this is an important comment. Comments should usually state
> "why", not "what". The "what" part should be expressed in the function names
> and code itself.
>

I actually just edited the existing comment to reflect what the function actually does but I agree it isn't important, so I will remove it. 
 
> > +static inline String canonicalizedTitle(const String& title)
> > +{
> > +    //ASSERT(!title.isEmpty());
> 
> Assertion should either be removed or added, not added in commented-out.
> 

Doh, will remove the assertion.

> > +        
> > +    String canonicalTitle = title.stripWhiteSpace();
> > +    canonicalTitle = canonicalTitle.simplifyWhiteSpace();
> > +        
> > +    return canonicalTitle;
> > +}
> 
> Is there a a good reason to not write this one one line?
> 
>     return title.stripWhiteSpace().simplifyWhiteSpace();

Nope.

> 
> > +    m_canonicalizedTitle = canonicalizedTitle(m_title);
> 
> It seems to me that if we store the canonicalized title, then we should check
> if it changes, and avoid the setTitle call when the title has not changed.
> 

This already happens in Document.setTitle() with

    if (m_title == title)
        return;

> > -    String title() const { return m_title; }
> > +    String title() const { return m_canonicalizedTitle; }
> 
> This is a bad situation. We do not want a class where there is a data member
> named m_title and a function named title, and they are not the same thing.
> 
> We need to adjust the names for clarity. Do we even need an m_title data member
> at all? Maybe m_canonicalizedTitle should be the only thing we store. Who uses
> m_title?
> 
> Maybe m_title should stay and change purpose.
> 

I agree that this is not ideal, but this is to prevent the situation you mention above, to not change the title if it hasn't changed. Do you have an idea for another way of solving that situation?

> > -/*
> > - * Performs four operations:
> > - *  1. Convert backslashes to currency symbols
> > - *  2. Convert control characters to spaces
> > - *  3. Trim leading and trailing spaces
> > - *  4. Collapse internal whitespace.
> > - */
> 
> You eliminated conversion of of control characters to spaces. Why are you sure
> that's OK? I'd like to see a separate patch for that behavior change which does
> not combine that change with all the refactoring.
> 

Actually, isASCIISpace has the same check for control characters, so there isn't any behavior change.


> > -static inline String canonicalizedTitle(const String& title, Frame* frame)
> > -{
> > -    ASSERT(!title.isEmpty());
> > -
> > -    const UChar* characters = title.characters();
> > -    unsigned length = title.length();
> > -    unsigned i;
> > -
> > -    StringBuffer buffer(length);
> > -    unsigned builderIndex = 0;
> > -
> > -    // Skip leading spaces and leading characters that would convert to spaces
> > -    for (i = 0; i < length; ++i) {
> > -        UChar c = characters[i];
> > -        if (!(c <= 0x20 || c == 0x7F))
> > -            break;
> > -    }
> > -
> > -    if (i == length)
> > -        return "";
> > -
> > -    // Replace control characters with spaces, and backslashes with currency symbols, and collapse whitespace.
> > -    bool previousCharWasWS = false;
> > -    for (; i < length; ++i) {
> > -        UChar c = characters[i];
> > -        if (c <= 0x20 || c == 0x7F || (WTF::Unicode::category(c) & (WTF::Unicode::Separator_Line | WTF::Unicode::Separator_Paragraph))) {
> > -            if (previousCharWasWS)
> > -                continue;
> > -            buffer[builderIndex++] = ' ';
> > -            previousCharWasWS = true;
> > -        } else {
> > -            buffer[builderIndex++] = c;
> > -            previousCharWasWS = false;
> > -        }
> > -    }
> > -
> > -    // Strip trailing spaces
> > -    while (builderIndex > 0) {
> > -        --builderIndex;
> > -        if (buffer[builderIndex] != ' ')
> > -            break;
> > -    }
> > -
> > -    if (!builderIndex && buffer[builderIndex] == ' ')
> > -        return "";
> > -
> > -    buffer.shrink(builderIndex + 1);
> > -    
> > -    // Replace the backslashes with currency symbols if the encoding requires it.
> > -    frame->document()->displayBufferModifiedByEncoding(buffer.characters(), buffer.length());
> > -
> > -    return String::adopt(buffer);
> > -}
> 
> This is a hot code path, and one of the reasons the function was written the
> way it was is for performance. Making your change is OK if we know the
> performance has not changed. Have you found a way to determine this does not
> slow down performance?

I haven't run a profiler on this to determine the difference in performance, but there is a performance penalty just due to the fact that this in done in two functions and the string buffer is copied two times instead of once. I can revert the code back to the original if you think that is better?

Out of curiosity, why is this a hot code path and how can I find out if something is considered a hot code path? E.g. in this case canonicalizedTitle is only called once per document.

> 
> > -    String trimmed = canonicalizedTitle(title, m_frame);
> > -    if (!trimmed.isEmpty() && m_pageTitle != trimmed) {
> > +    if (!title.isEmpty() && m_pageTitle != title) {
> >          frameLoader()->willChangeTitle(this);
> > -        m_pageTitle = trimmed;
> > +        // Replace the backslashes with currency symbols if the encoding requires it.
> > +        m_pageTitle = m_frame->document()->displayStringModifiedByEncoding(title);
> >          frameLoader()->didChangeTitle(this);
> >      }
> 
> This changes behavior for titles that would be empty if made canonical. Is that
> OK? Did you test behavior in other browsers? I'd prefer to have the behavior
> change in a separate patch from the refactoring for clarity, where the change
> is clearly justified and tested.

The title passed to the DocumentLoader has already been canonicalized, so there is no change in behavior here.

> 
> >  static inline bool isSpaceOrNewline(UChar c)
> >  {
> > -    // Use isASCIISpace() for basic Latin-1.
> > -    // This will include newlines, which aren't included in Unicode DirWS.
> > -    return c <= 0x7F ? WTF::isASCIISpace(c) : WTF::Unicode::direction(c) == WTF::Unicode::WhiteSpaceNeutral;
> > +    // Use isASCIISpace() for ASCII
> > +    if (c <= 0x7F)
> > +        return WTF::isASCIISpace(c);
> > +    
> > +    // Unicode characters in the space, line or paragraph separator
> > +    // category are treated as white space
> > +    return ((WTF::Unicode::category(c) & 
> > +             (WTF::Unicode::Separator_Space | 
> > +              WTF::Unicode::Separator_Line | 
> > +              WTF::Unicode::Separator_Paragraph)));
> >  }
> 
> What's the motivation for this change? Is this about correctness? If so,
> where's the test case that shows the old behavior was incorrect? Is it about
> efficiency?

AFAICT the code: WTF::Unicode::direction(c) == WTF::Unicode::WhiteSpaceNeutral really doesn't do anything that has anything to do with determining whether the character is a space or newline. I believe the right implementation is to check for all the Unicode separator categories if the input character isn't ASCII. The test case fast/dom/Document/document-title-get.html I added the one I extended in fast/dom/HTMLOptionElement/option-text.html are testing this code. It will exhibit the wrong behavior in the old code. 

> 
> The formatting seems inferior to the existing to me and doesn't follow our
> rules about where to put boolean operators in multiple lines.
> 
> One way to make these more readable is to use "using namespace" inside the
> function.

OK, thanks for the review, I will upload a new patch.
Comment 8 Ms2ger (he/him; ⌚ UTC+1/+2) 2009-11-20 12:35:31 PST
Please note that the only characters that HTML5 allows to be collapsed are U+0020 SPACE, U+0009 CHARACTER TABULATION (tab), U+000A LINE FEED (LF), U+000C FORM FEED (FF), and U+000D CARRIAGE RETURN (CR).
Comment 9 Christian Sejersen 2009-12-08 13:10:09 PST
Created attachment 44480 [details]
Third attempt

In this patch I have taken into account comment #8 and have responded to the previous review in comment #7
Comment 10 WebKit Review Bot 2009-12-08 13:14:15 PST
style-queue ran check-webkit-style on attachment 44480 [details] without any errors.
Comment 11 Darin Adler 2009-12-11 14:13:01 PST
Comment on attachment 44480 [details]
Third attempt

Thanks for tackling this. I would love to see this fixed.

> +static inline String canonicalizedTitle(const String& title)
> +{
> +    if (title.isEmpty())
> +        return title;
> +        
> +    return title.stripWhiteSpace().simplifyWhiteSpace();
> +}

Why the early exit for empty string? I think stripWhiteSpace and simplifyWhiteSpace already are fast and correct for the empty string.

Maybe it's my fault? Did I suggest it?

>  void Document::updateTitle()
>  {
> +    m_canonicalizedTitle = canonicalizedTitle(m_title);
>      if (Frame* f = frame())
> -        f->loader()->setTitle(m_title);
> +        f->loader()->setTitle(m_canonicalizedTitle);
>  }

It seems to me this should only call setTitle if the new m_canonicalizedTitle is different from the old. I know that DocumentLoader::setTitle has a similar optimization, but I'd like to see it at this level too.

> -    String title() const { return m_title; }
> +    String title() const { return m_canonicalizedTitle; }

I'm sorry, I know you quibbled with my comment last time, but I have to reiterate. It is not acceptable to have a class with a title function and an m_title data member, but have those be different things. You need to rename either title or m_title.

> > You eliminated conversion of of control characters to spaces. Why are you sure
> > that's OK? I'd like to see a separate patch for that behavior change which does
> > not combine that change with all the refactoring.
> 
> Actually, isASCIISpace has the same check for control characters, so there
> isn't any behavior change.

isASCIISpace does not return true for control characters. The check in isASCIISpace is a performance optimization. The isASCIISpace function returns false for characters like U+0000 and U+0001.

>  static inline bool isSpaceOrNewline(UChar c)
>  {
> +    // According to HTML 5 space characters, are defined as U+0020 SPACE, 
> +    // U+0009 CHARACTER TABULATION (tab), U+000A LINE FEED (LF), 
> +    // U+000C FORM FEED (FF) and U+000D CARRIAGE RETURN (CR).
> +    return c <= 0x7F ? isASCIISpace(c) : false;
>  }

While it's nice to have that comment there, it's not what the code implements. isASCIISpace returns true for U+000B, and thus so does this function. I think it's not good to put the comment in there, implying that's what the function does.

The check for <= 0x7F is unnecessary. The isASCIISpace function already returns false for non-ASCII characters, so you don't need to redo that work.

I think the function should be named isHTMLSpace, and should be correctly implemented to return false for U+000B.

Also, since you are fixing this function, we need to write tests to show the bugs we're fixing by correcting the definition. We need to look at call sites and see which ones we're changing. There's a chance that for at least some of the call sites, the HTML space character definition is not what's wanted, and we need test cases to figure this out.
Comment 12 Christian Sejersen 2009-12-26 16:37:15 PST
Created attachment 45516 [details]
Fourth attempt

Simpler patch now, that moves the canonicalizedTitle function from DocumentLoader.cpp to Document.cpp and adds an extra member variable to Document (m_rawTitle) that stores the title passed in, while m_title has the canonicalizedTitle now.
Comment 13 WebKit Review Bot 2009-12-26 16:42:41 PST
style-queue ran check-webkit-style on attachment 45516 [details] without any errors.
Comment 14 Christian Sejersen 2009-12-28 14:38:11 PST
Created attachment 45574 [details]
Updated "Fourth attempt" by keeping the patch more true to the original code

An update to the 4th patch keeping the display buffer encoding code in the canonicalization of the title function.
Comment 15 WebKit Review Bot 2009-12-28 14:43:44 PST
style-queue ran check-webkit-style on attachment 45574 [details] without any errors.
Comment 16 Eric Seidel (no email) 2009-12-29 00:20:59 PST
Comment on attachment 45574 [details]
Updated "Fourth attempt" by keeping the patch more true to the original code

Be sure to mark your attachments as patches, with type text/plain.  Various patch uploading tools including "bugzilla-tool post-diff" or "bugzilla-tool post-commits" will handle that for you (see WebKitTools/Scripts/bugzilla-tool --help).
Comment 17 Christian Sejersen 2010-01-01 04:06:19 PST
(In reply to comment #16)
> (From update of attachment 45574 [details])
> Be sure to mark your attachments as patches, with type text/plain.  Various
> patch uploading tools including "bugzilla-tool post-diff" or "bugzilla-tool
> post-commits" will handle that for you (see WebKitTools/Scripts/bugzilla-tool
> --help).
Apologies, it was an oversight from my side. As you may see the previous 4 patches were marked as such. Thanks for correcting it.
Comment 18 Darin Adler 2010-01-04 09:05:28 PST
Comment on attachment 45574 [details]
Updated "Fourth attempt" by keeping the patch more true to the original code

> +static inline String canonicalizedTitle(Document* doc, const String& title)

We normally do not abbreviate names like "doc". Using "document" would be better.

> +}
> +
> +
>  void Document::updateTitle()

Other functions in this file have one blank line between them.

>  {
> +    m_title = canonicalizedTitle(this, m_rawTitle);
>      if (Frame* f = frame())
>          f->loader()->setTitle(m_title);
>  }

It would be good to call setTitle only if m_title is different than it was previously.

> +debug('Test with Unicode whitespace in the space separator category');
> +document.title = "\u0020\u0020one\u0020\u0020\u0020space\u0020\u0020";
> +shouldBeEqualToString("document.title","one space");

This test does not do what it says. The spaces are all "\u0020" which is just a normal space expressed with "\u" syntax. It would be good to have tests of other whitespace characters as the comment indicates.

> Index: LayoutTests/fast/dom/HTMLOptionElement/option-text-expected.txt
> ===================================================================
> --- LayoutTests/fast/dom/HTMLOptionElement/option-text-expected.txt	(revision 52602)
> +++ LayoutTests/fast/dom/HTMLOptionElement/option-text-expected.txt	(working copy)
> @@ -6,4 +6,5 @@ PASS: got "foo" as expected.
>  PASS: got "Â¥" as expected.
>  PASS: got "foo bar" as expected.
>  PASS: got "label" as expected.
> +PASS: got "one space" as expected.
>  
> Index: LayoutTests/fast/dom/HTMLOptionElement/option-text.html
> ===================================================================
> --- LayoutTests/fast/dom/HTMLOptionElement/option-text.html	(revision 52602)
> +++ LayoutTests/fast/dom/HTMLOptionElement/option-text.html	(working copy)
> @@ -20,6 +20,7 @@
>          <option id="o3">\</option>
>          <option id="o4">foo   bar</option>
>          <option id="o5" label=" label ">ignored</option>
> +	<option id="o6">&#x0020;one&#x0009;&#x000a;&#x000c;&#x000d;space&#x0020;</option>
>      </select>
>      <pre id="console"></pre>
>      <script>
> @@ -42,6 +43,7 @@
>          test("o3", "\u00a5");
>          test("o4", "foo bar");
>          test("o5", "label");
> +	test("o6", "one space");
>      </script>
>  </body>
>  </html>

This change to this test seems unrelated, and has tab characters in it. I think these tab characters are the reason this patch won't auto-apply.

I'm going to say review-, but *only* because of the option-text mistake. Otherwise seems good.

I think we could follow this up by eliminating m_rawTitle entirely. And perhaps make some of the other changes from earlier versions of your patch if there are good reasons to do so.
Comment 19 Christian Sejersen 2010-01-06 05:42:25 PST
(In reply to comment #18)
> (From update of attachment 45574 [details])
> > +static inline String canonicalizedTitle(Document* doc, const String& title)
> 
> We normally do not abbreviate names like "doc". Using "document" would be
> better.

Fixed.

> 
> > +}
> > +
> > +
> >  void Document::updateTitle()
> 
> Other functions in this file have one blank line between them.

Fixed.

> >  {
> > +    m_title = canonicalizedTitle(this, m_rawTitle);
> >      if (Frame* f = frame())
> >          f->loader()->setTitle(m_title);
> >  }
> 
> It would be good to call setTitle only if m_title is different than it was
> previously.

In setTitle there is the following check:

    if (m_rawTitle == title)
        return;

So updateTitle() isn't called unless they are different.

> > +debug('Test with Unicode whitespace in the space separator category');
> > +document.title = "\u0020\u0020one\u0020\u0020\u0020space\u0020\u0020";
> > +shouldBeEqualToString("document.title","one space");
> 
> This test does not do what it says. The spaces are all "\u0020" which is just a
> normal space expressed with "\u" syntax. It would be good to have tests of
> other whitespace characters as the comment indicates.

I took out this case as the only whitespace that we need to remove in the space separator category is "\u0020". I have added "\u0020" to the next test. 

> 
> > Index: LayoutTests/fast/dom/HTMLOptionElement/option-text-expected.txt
> > ===================================================================
> > --- LayoutTests/fast/dom/HTMLOptionElement/option-text-expected.txt	(revision 52602)
> > +++ LayoutTests/fast/dom/HTMLOptionElement/option-text-expected.txt	(working copy)
> > @@ -6,4 +6,5 @@ PASS: got "foo" as expected.
> >  PASS: got "Â¥" as expected.
> >  PASS: got "foo bar" as expected.
> >  PASS: got "label" as expected.
> > +PASS: got "one space" as expected.
> >  
> > Index: LayoutTests/fast/dom/HTMLOptionElement/option-text.html
> > ===================================================================
> > --- LayoutTests/fast/dom/HTMLOptionElement/option-text.html	(revision 52602)
> > +++ LayoutTests/fast/dom/HTMLOptionElement/option-text.html	(working copy)
> > @@ -20,6 +20,7 @@
> >          <option id="o3">\</option>
> >          <option id="o4">foo   bar</option>
> >          <option id="o5" label=" label ">ignored</option>
> > +	<option id="o6">&#x0020;one&#x0009;&#x000a;&#x000c;&#x000d;space&#x0020;</option>
> >      </select>
> >      <pre id="console"></pre>
> >      <script>
> > @@ -42,6 +43,7 @@
> >          test("o3", "\u00a5");
> >          test("o4", "foo bar");
> >          test("o5", "label");
> > +	test("o6", "one space");
> >      </script>
> >  </body>
> >  </html>
> 
> This change to this test seems unrelated, and has tab characters in it. I think
> these tab characters are the reason this patch won't auto-apply.

I reverted the changes.

> 
> I'm going to say review-, but *only* because of the option-text mistake.
> Otherwise seems good.

Thanks for your patience with me.

> 
> I think we could follow this up by eliminating m_rawTitle entirely. And perhaps
> make some of the other changes from earlier versions of your patch if there are
> good reasons to do so.

Sounds like a good idea. I will file a follow-up bug once this gets in.
Comment 20 Christian Sejersen 2010-01-06 05:56:52 PST
Created attachment 45961 [details]
Updated final(?) patch

A few minor changes according to comments in comment #18
Comment 21 WebKit Review Bot 2010-01-06 05:57:41 PST
style-queue ran check-webkit-style on attachment 45961 [details] without any errors.
Comment 22 Darin Adler 2010-01-06 08:18:37 PST
(In reply to comment #19)
> (In reply to comment #18)
> > >  {
> > > +    m_title = canonicalizedTitle(this, m_rawTitle);
> > >      if (Frame* f = frame())
> > >          f->loader()->setTitle(m_title);
> > >  }
> > 
> > It would be good to call setTitle only if m_title is different than it was
> > previously.
> 
> In setTitle there is the following check:
> 
>     if (m_rawTitle == title)
>         return;
> 
> So updateTitle() isn't called unless they are different.

Yes, my point was simply that two different raw titles could result in the same canonicalized title.
Comment 23 Darin Adler 2010-01-06 08:19:35 PST
Comment on attachment 45961 [details]
Updated final(?) patch

> +debug('Test with various length strings');

A title here, but no tests.

> +var successfullyParsed = true;
> \ No newline at end of file

Normally we put newlines at ends of files like this.

r=me -- we can fix up more in followups
Comment 24 WebKit Commit Bot 2010-01-06 20:23:28 PST
Comment on attachment 45961 [details]
Updated final(?) patch

Rejecting patch 45961 from commit-queue.

Failed to run "['WebKitTools/Scripts/run-webkit-tests', '--no-launch-safari', '--exit-after-n-failures=1', '--quiet']" exit_code: 1
Running build-dumprendertree
Running tests from /Users/eseidel/Projects/CommitQueue/LayoutTests
Testing 11881 test cases.
fast/dom/Document/document-title-get.html -> failed

Exiting early after 1 failures. 5673 tests run.
84.13s total testing time

5672 test cases (99%) succeeded
1 test case (<1%) had incorrect layout
1 test case (<1%) had stderr output

Full output: http://webkit-commit-queue.appspot.com/results/166084
Comment 25 Eric Seidel (no email) 2010-01-06 23:01:37 PST
Comment on attachment 45961 [details]
Updated final(?) patch

Failure seen on the commit-bot:
--- /tmp/layout-test-results/fast/dom/Document/document-title-get-expected.txt	2010-01-06 20:23:24.000000000 -0800
+++ /tmp/layout-test-results/fast/dom/Document/document-title-get-actual.txt	2010-01-06 20:23:24.000000000 -0800
@@ -11,8 +11,6 @@
 PASS document.title is ""
 Test with no whitespace
 PASS document.title is "nowhitespacetitle"
-Test with Unicode whitespace in the space separator category
-PASS document.title is "one space"
 Test with whitespace
 PASS document.title is "one space"
 Test with various whitespace lengths and fields

Looks like this patch is going to need one more round of update.  Marking r- since it would cause a test to fail as is.
Comment 26 Eric Seidel (no email) 2010-01-06 23:02:52 PST
Actually, it looks like the restuls are just out of date.  You could land this with Darin's review if you fixed the expected results.  Or you could simply re-post and we can r+/cq+ it again.  I can't remember if you're a committer or not.
Comment 27 Christian Sejersen 2010-01-07 00:15:08 PST
Created attachment 46028 [details]
Updated patch

Removed the obsolete expected result. Doh!
Comment 28 WebKit Review Bot 2010-01-07 00:53:04 PST
style-queue ran check-webkit-style on attachment 46028 [details] without any errors.
Comment 29 WebKit Commit Bot 2010-01-07 13:32:24 PST
Comment on attachment 46028 [details]
Updated patch

Clearing flags on attachment: 46028

Committed r52944: <http://trac.webkit.org/changeset/52944>
Comment 30 WebKit Commit Bot 2010-01-07 13:32:32 PST
All reviewed patches have been landed.  Closing bug.