36291 – placeholder text should be stripped from line breaks

RESOLVED FIXED 36291

placeholder text should be stripped from line breaks

https://bugs.webkit.org/show_bug.cgi?id=36291

Summary placeholder text should be stripped from line breaks

Mounir Lamouri

Reported 2010-03-18 04:23:32 PDT

According to the specs, the placeholder should be shown after having been stripped from line breaks [1]. At the moment, WebKit doesn't look to do that. A simple test: <input placeholder="first part&#10 &#13second part"/> You should see "first part second part" instead of "first part". [1] "User agents should present this hint to the user, after having stripped line breaks from it, when the element's value is the empty string and the control is not focused (e.g. by displaying it inside a blank unfocused control).", see $URL

Attachments
Proposed patch (31.52 KB, patch) 2010-05-05 02:59 PDT, Kent Tamura	no flags	Details Formatted Diff Diff
Proposed patch (rev.2) (32.47 KB, patch) 2010-05-07 09:59 PDT, Kent Tamura	no flags	Details Formatted Diff Diff
Proposed patch (rev.3) (32.47 KB, patch) 2010-05-07 10:03 PDT, Kent Tamura	no flags	Details Formatted Diff Diff
Proposed patch (rev.4) (32.18 KB, patch) 2010-05-10 06:20 PDT, Kent Tamura	no flags	Details Formatted Diff Diff
Show Obsolete (3) View All Add attachment proposed patch, testcase, etc.

Kent Tamura

Comment 1 2010-05-05 02:59:34 PDT

Created attachment 55102 [details] Proposed patch

Darin Adler

Comment 2 2010-05-06 18:20:51 PDT

Comment on attachment 55102 [details] Proposed patch > + const AtomicString& attributeValue = getAttribute(placeholderAttr); > + if (attributeValue.isEmpty()) > + return String(); > + // According to the HTML5 specification, we need to remove CR and LF from > + // the attribute value. > + Vector<UChar> stripped; > + unsigned length = attributeValue.length(); > + stripped.reserveCapacity(length); > + for (unsigned i = 0; i < length; ++i) { > + UChar character = attributeValue[i]; > + if (character == newlineCharacter || character == carriageReturn) > + continue; > + stripped.append(character); > + } > + return String(stripped.data(), stripped.size()); Since most strings don't have line breaks, it would be good to have a more efficient code path for the common case. A simple way to do that is to simply search for newline character and carriage returns first with the contains function and add an early return. Further, we can make this code faster by removing one allocation by using: Vector<UChar, 256> stripped; The storage will be allocated on the stack instead of the heap if it fits in 256 characters. Another possibility, if you decide not to do this and to stick with Vector<UChar>, is to use String::adopt instead, which uses the same pointer. The resulting string will use a bit more storage, but we avoid copying all the characters. > + && !strippedPlaceholder().isEmpty(); Seems wasteful to calculate the stripped placeholder just to check if it's empty. If this is at all hot I suggest using a separate function that does the far more efficient emptiness test, stopping on the first character that is not a newline or carriage return. > virtual ~HTMLTextFormControlElement(); > virtual void dispatchFocusEvent(); > virtual void dispatchBlurEvent(); > + String strippedPlaceholder() const; I think this function should be in its own paragraph, not in the same one as dispatchFocusEvent and dispatchBlurEvent. > const UChar blackSquare = 0x25A0; > const UChar bullet = 0x2022; > +const UChar carriageReturn = 0x000D; > const UChar ethiopicPrefaceColon = 0x1366; > const UChar hebrewPunctuationGeresh = 0x05F3; The characters in CharacterNames.h are supposed to use the names from the Unicode standard. So the name should be lineFeed, not newlineCharacter. The one you are adding has the right name. > + // node() should be an HTMLInputElement. WMLInputElement doesn't support placeholder. > + ASSERT(node()->isHTMLElement()); This is not the right assertion. You need to assert this is an HTMLInputElement since that's what you are casting to. ASSERT(node()->hasTagName(inputTag)); review- because I think you should do at least some of the things I mentioned above

Kent Tamura

Comment 3 2010-05-07 09:59:27 PDT

Created attachment 55390 [details] Proposed patch (rev.2)

Kent Tamura

Comment 4 2010-05-07 10:03:30 PDT

Created attachment 55391 [details] Proposed patch (rev.3)

Kent Tamura

Comment 5 2010-05-07 10:09:57 PDT

Thank you for reviewing! (In reply to comment #2) > (From update of attachment 55102 [details]) > > + const AtomicString& attributeValue = getAttribute(placeholderAttr); > > + if (attributeValue.isEmpty()) > > + return String(); > > + // According to the HTML5 specification, we need to remove CR and LF from > > + // the attribute value. > > + Vector<UChar> stripped; > > + unsigned length = attributeValue.length(); > > + stripped.reserveCapacity(length); > > + for (unsigned i = 0; i < length; ++i) { > > + UChar character = attributeValue[i]; > > + if (character == newlineCharacter || character == carriageReturn) > > + continue; > > + stripped.append(character); > > + } > > + return String(stripped.data(), stripped.size()); > > Since most strings don't have line breaks, it would be good to have a more > efficient code path for the common case. A simple way to do that is to simply > search for newline character and carriage returns first with the contains > function and add an early return. Ok, i added code to check CR/LF existence before copying. > Further, we can make this code faster by removing one allocation by using: > > Vector<UChar, 256> stripped; > > The storage will be allocated on the stack instead of the heap if it fits in > 256 characters. Another possibility, if you decide not to do this and to stick > with Vector<UChar>, is to use String::adopt instead, which uses the same > pointer. The resulting string will use a bit more storage, but we avoid copying > all the characters. I chose String::adopt(). > > + && !strippedPlaceholder().isEmpty(); > > Seems wasteful to calculate the stripped placeholder just to check if it's > empty. If this is at all hot I suggest using a separate function that does the > far more efficient emptiness test, stopping on the first character that is not > a newline or carriage return. Ok, I added new function; isPlaceholderEmpty(). > > virtual ~HTMLTextFormControlElement(); > > virtual void dispatchFocusEvent(); > > virtual void dispatchBlurEvent(); > > + String strippedPlaceholder() const; > > I think this function should be in its own paragraph, not in the same one as > dispatchFocusEvent and dispatchBlurEvent. Added an empty line. > > const UChar blackSquare = 0x25A0; > > const UChar bullet = 0x2022; > > +const UChar carriageReturn = 0x000D; > > const UChar ethiopicPrefaceColon = 0x1366; > > const UChar hebrewPunctuationGeresh = 0x05F3; > > The characters in CharacterNames.h are supposed to use the names from the > Unicode standard. So the name should be lineFeed, not newlineCharacter. The one > you are adding has the right name. No. The right name for U+000D is CARRIAGE RETURN according to the Unicode standard. LINE FEED is U+000A. Note that we already have a wrong name 'newlineCharacter' for U+000A in CharacterNames.h. > > + // node() should be an HTMLInputElement. WMLInputElement doesn't support placeholder. > > + ASSERT(node()->isHTMLElement()); > > This is not the right assertion. You need to assert this is an HTMLInputElement > since that's what you are casting to. No. - WMLInputElement also has inputTag. - HTMLInputElement represents <isindex> too, and we support placeholder for it (See LayoutTests/fast/forms/isindex-placeholder.html)

Kent Tamura

Comment 6 2010-05-07 10:18:38 PDT

(In reply to comment #5) > > > + ASSERT(node()->isHTMLElement()); > > > > This is not the right assertion. You need to assert this is an HTMLInputElement > > since that's what you are casting to. > > No. > - WMLInputElement also has inputTag. > - HTMLInputElement represents <isindex> too, and we support placeholder for it > (See LayoutTests/fast/forms/isindex-placeholder.html) I meant "No" for ASSERT(node()->hasTagName(inputTag)). I don't think ASSERT(hasTagName(inputTag) || hasTagName(isindexTag)) should be added to here because we assume isHTMLElement() == HTMLInputElement in RenderTextControlSingleLine.

Darin Adler

Comment 7 2010-05-07 18:10:23 PDT

(In reply to comment #6) > > - WMLInputElement also has inputTag. That's WMLNames::inputTag, which is not the same as HTMLNames::inputTag. > > - HTMLInputElement represents <isindex> too, and we support placeholder for it Good point. > we assume isHTMLElement() == HTMLInputElement in RenderTextControlSingleLine. OK. But generally speaking if we're about to typecast something, we want an assertion to state that the typecast is OK. And isHTMLElement() does not check that for HTMLInputElement*! ValidityState::typeMismatch is one of many places that uses hasTagName(inputTag) as a guard for a cast to HTMLInputElement*. It's almost certainly OK that site returns false for <isindex>, though, so it's probably OK. Maybe we need to put an isHTMLInputElement function somewhere so we can use it in this assertion. We might also want to audit all the call sites checking inputTag to see if any of them need to cover isindexTag too.

Darin Adler

Comment 8 2010-05-07 18:11:20 PDT

(In reply to comment #5) > > > const UChar blackSquare = 0x25A0; > > > const UChar bullet = 0x2022; > > > +const UChar carriageReturn = 0x000D; > > > const UChar ethiopicPrefaceColon = 0x1366; > > > const UChar hebrewPunctuationGeresh = 0x05F3; > > > > The characters in CharacterNames.h are supposed to use the names from the > > Unicode standard. So the name should be lineFeed, not newlineCharacter. The one > > you are adding has the right name. > > No. The right name for U+000D is CARRIAGE RETURN according to the Unicode > standard. > LINE FEED is U+000A. Note that we already have a wrong name 'newlineCharacter' > for U+000A in CharacterNames.h. Yes, my comment was about the existing wrong name, not about your new correct name.

Darin Adler

Comment 9 2010-05-07 18:21:31 PDT

Comment on attachment 55391 [details] Proposed patch (rev.3) I looked in HTML5 and could not find any text saying that we need to remove CR and LF from attribute values. What I saw was text telling HTML5 content authors they must not include CR and LF in their attribute values, but nothing that tells HTML5 browser implementations what to do. > +<head> > +<link rel="stylesheet" href="../../../../fast/js/resources/js-test-style.css" > +</head> This is strange. Why do you have an unterminated <link> element here? I think you should just leave out this <head> entirely. Do you need this style sheet at all? > +static bool isNewLineCharacter(UChar ch) { return ch == newlineCharacter || ch == carriageReturn; } > + > +static bool isNotNewLineCharacter(UChar ch) { return ch != newlineCharacter && ch != carriageReturn; } Why these particular function names? Are these two characters called "new line characters"? Is that terminology from the HTML5 specification? I'd prefer a different name given that one of these characters is called "newlineCharacter". The function name seems too close to that. > + if (attributeValue.string().find(isNewLineCharacter) == -1) > + return attributeValue; I'm sure it's much more efficient to call contains on a UChar twice than to call the version of find that takes a character-matching function once. I suggest using contains. > +bool HTMLTextFormControlElement::isPlaceholderEmpty() const > +{ > + const AtomicString& attributeValue = getAttribute(placeholderAttr); > + if (attributeValue.isEmpty()) > + return true; > + return attributeValue.string().find(isNotNewLineCharacter) == -1; > +} I don't think you need a special case for the empty string here. The find function is already quite fast for empty strings, and will give the correct result. > + // node() should be an HTMLInputElement. WMLInputElement doesn't support placeholder. The phrase "should be" is not strong enough. I would say "must be".

Kent Tamura

Comment 10 2010-05-10 06:20:11 PDT

Created attachment 55547 [details] Proposed patch (rev.4)

Kent Tamura

Comment 11 2010-05-10 06:25:51 PDT

Thank you for reviewing! (In reply to comment #9) > I looked in HTML5 and could not find any text saying that we need to remove CR and LF from attribute values. What I saw was text telling HTML5 content authors they must not include CR and LF in their attribute values, but nothing that tells HTML5 browser implementations what to do. Here it is: http://www.whatwg.org/specs/web-apps/current-work/multipage/common-input-element-attributes.html#the-placeholder-attribute | User agents should present this hint to the user, | after having stripped line breaks from it, when the | element's value is the empty string and the control | is not focused (e.g. by displaying it inside a blank | unfocused control). > > +<head> > > +<link rel="stylesheet" href="../../../../fast/js/resources/js-test-style.css" > > +</head> > > This is strange. Why do you have an unterminated <link> element here? I think you should just leave out this <head> entirely. Do you need this style sheet at all? You're right. This is not needed at all. Removed. > > +static bool isNewLineCharacter(UChar ch) { return ch == newlineCharacter || ch == carriageReturn; } > > + > > +static bool isNotNewLineCharacter(UChar ch) { return ch != newlineCharacter && ch != carriageReturn; } > > Why these particular function names? Are these two characters called "new line characters"? Is that terminology from the HTML5 specification? I'd prefer a different name given that one of these characters is called "newlineCharacter". The function name seems too close to that. My naming was wrong. The specification calls them "line breaks". So I should name them is(Not)LineBreak(). > > + if (attributeValue.string().find(isNewLineCharacter) == -1) > > + return attributeValue; > > I'm sure it's much more efficient to call contains on a UChar twice than to call the version of find that takes a character-matching function once. I suggest using contains. Ok, I changed it to two contains() calls. > > +bool HTMLTextFormControlElement::isPlaceholderEmpty() const > > +{ > > + const AtomicString& attributeValue = getAttribute(placeholderAttr); > > + if (attributeValue.isEmpty()) > > + return true; > > + return attributeValue.string().find(isNotNewLineCharacter) == -1; > > +} > > I don't think you need a special case for the empty string here. The find function is already quite fast for empty strings, and will give the correct result. Removed attributeValue.isEmpty(). > > + // node() should be an HTMLInputElement. WMLInputElement doesn't support placeholder. > > The phrase "should be" is not strong enough. I would say "must be". Changed to "must be".

Kent Tamura

Comment 12 2010-05-12 08:57:34 PDT

Comment on attachment 55547 [details] Proposed patch (rev.4) Clearing flags on attachment: 55547 Committed r59236: <http://trac.webkit.org/changeset/59236>

Kent Tamura

Comment 13 2010-05-12 08:57:44 PDT

All reviewed patches have been landed. Closing bug.

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware All

OS All

Product WebKit

Component Forms

Assignee

Nobody

Reported

2010-03-18 04:23 PDT

Modified

2010-05-12 08:57 PDT History

CC List

4 users Show

URL

http://dev.w3.org/html5/spec/forms.html#attr-input-placeholder

Keywords

Depends on

Blocks