Bug 148864 - The value sanitization algorithm for input[type=url] should strip whitespaces
Summary: The value sanitization algorithm for input[type=url] should strip whitespaces
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Forms (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Keith Rollin
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2015-09-04 18:24 PDT by Ryosuke Niwa
Modified: 2015-10-19 15:44 PDT (History)
5 users (show)

See Also:


Attachments
Patch (21.25 KB, patch)
2015-10-13 23:04 PDT, Keith Rollin
no flags Details | Formatted Diff | Diff
Patch (21.75 KB, patch)
2015-10-16 10:07 PDT, Keith Rollin
no flags Details | Formatted Diff | Diff
Patch (21.74 KB, patch)
2015-10-16 10:36 PDT, Keith Rollin
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Ryosuke Niwa 2015-09-04 18:24:47 PDT
The value sanitization algorithm for input type=url should strip leading and trailing whitespaces.
See https://html.spec.whatwg.org/multipage/forms.html#url-state-(type=url)

This bug was found by the newly added test:
LayoutTests/http/tests/w3c/html/semantics/forms/the-input-element/url.html
Comment 1 Radar WebKit Bug Importer 2015-09-04 18:26:01 PDT
<rdar://problem/22589358>
Comment 2 Keith Rollin 2015-10-13 21:32:05 PDT
For reference, from https://html.spec.whatwg.org/multipage/forms.html#url-state-(type=url) :

The value sanitization algorithm is as follows: Strip line breaks from the value, then strip leading and trailing whitespace from the value.
Comment 3 Keith Rollin 2015-10-13 23:03:02 PDT
Added URLInputType::sanitizeValue to strip leading and trailing spaces.
Comment 4 Keith Rollin 2015-10-13 23:04:29 PDT
Created attachment 263059 [details]
Patch
Comment 5 Chris Dumez 2015-10-14 09:48:21 PDT
Comment on attachment 263059 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=263059&action=review

> Source/WebCore/ChangeLog:8
> +

Please add a link to the spec and indicate the behavior of other major browsers.

> LayoutTests/imported/w3c/web-platform-tests/html/semantics/forms/the-input-element/type-change-state-expected.txt:87
> +FAIL change state from url to text assert_equals: input.value should be   foobar   after change of state expected "  foobar  " but got "foobar"

There are new failures here. I think we should handle these cases properly in this patch.
Comment 6 Keith Rollin 2015-10-14 10:34:13 PDT
> There are new failures here. I think we should handle these cases properly in this patch.

The failures are a confusing part of the test. They look like errors, but they seem to be part of capturing correct/expected results.

What's happening in the test is that there is a double-nested loop, each loop iterating over each possible type of input (hidden, email, url, number, color, etc.). The core of the loop sets the type of the input element to the type controlled by the outer loop, sets the input element's value to a test string ("   foobar   "), sets the type of the input element to the type controlled by the inner loop, and checks the input element's new value.

In order to check that the new value is correct, the test would need to encode the full matrix of initial-type -> new-type pairs, along with the expected new value for that pair (i.e., {"url/hidden/<result>", "url/text/<result>", "url/search/<result>", ... }). There are around 22 different types, meaning 484 pairs that would need to be captured in the test.

This seems onerous and error-prone to generate and maintain. Therefore (and, since I didn't originally write this test, I'm only inferring here), it seems that the creator of this test captured the expected results in the *-expected.txt file by allowing both PASS and FAIL lines. The PASS lines cover test pairs where the result can trivially be validated. The FAIL lines cover test pairs where the result cannot be trivially be determined. But since the FAIL line contains information on what was actually produced in the conversion from one type to another, if a test run continues to match that FAIL line, then the test is correct and passes. If we start failing in a different way, then something has changed that needs to be investigated.

That's what happened with the patch that sanitizes input[type=url].value. The resulting value is now different, leading to some FAIL tests to now fail in a different way, some PASS tests to now fail, and some FAIL tests to now PASS. I went through all the tests that now produce different results to verify that the new results are expected and that the *-expected.txt file contains the patterns we now want.

Please let me know if it's OK to keep this current approach, or if we should implement a new approach (e.g., writing some script to generate and maintain the full set of 484 expected results).
Comment 7 Chris Dumez 2015-10-16 09:27:17 PDT
Comment on attachment 263059 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=263059&action=review

r=me but please update the Change Log as suggested before landing.

>> LayoutTests/imported/w3c/web-platform-tests/html/semantics/forms/the-input-element/type-change-state-expected.txt:87
>> +FAIL change state from url to text assert_equals: input.value should be   foobar   after change of state expected "  foobar  " but got "foobar"
> 
> There are new failures here. I think we should handle these cases properly in this patch.

Ok, never mind this comment.
Comment 8 Keith Rollin 2015-10-16 10:07:33 PDT
Created attachment 263281 [details]
Patch
Comment 9 Chris Dumez 2015-10-16 10:09:27 PDT
Comment on attachment 263281 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=263281&action=review

> Source/WebCore/ChangeLog:7
> +        Reviewed by cdumez@apple.com.

email won't work.

> Source/WebCore/ChangeLog:12
> +        Chrome also has the same issue with url.html. Firefox passes.  All

extra space before 'All'.

> LayoutTests/imported/w3c/ChangeLog:7
> +        Reviewed by cdumez@apple.com.

ditto.
Comment 10 Keith Rollin 2015-10-16 10:36:06 PDT
Created attachment 263288 [details]
Patch
Comment 11 WebKit Commit Bot 2015-10-16 11:24:45 PDT
Comment on attachment 263288 [details]
Patch

Clearing flags on attachment: 263288

Committed r191188: <http://trac.webkit.org/changeset/191188>
Comment 12 WebKit Commit Bot 2015-10-16 11:24:50 PDT
All reviewed patches have been landed.  Closing bug.
Comment 13 Kent Tamura 2015-10-19 01:10:11 PDT
The landed patch looks wrong to me.
Please refer to https://code.google.com/p/chromium/issues/detail?id=446108#c10
Comment 14 Kent Tamura 2015-10-19 01:47:17 PDT
Comment on attachment 263288 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=263288&action=review

> Source/WebCore/html/URLInputType.cpp:69
> +    return stripLeadingAndTrailingHTMLSpaces(BaseTextInputType::sanitizeValue(proposedValue));

TextFieldInputType::sanitizeValue() is not called.  Now 'value' IDL property can have CR/LF wrongly.
Comment 15 Chris Dumez 2015-10-19 09:57:09 PDT
Comment on attachment 263288 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=263288&action=review

>> Source/WebCore/html/URLInputType.cpp:69
>> +    return stripLeadingAndTrailingHTMLSpaces(BaseTextInputType::sanitizeValue(proposedValue));
> 
> TextFieldInputType::sanitizeValue() is not called.  Now 'value' IDL property can have CR/LF wrongly.

I don't understand this comment. We're calling BaseTextInputType::sanitizeValue(), which ends up calling TextFieldInputType::sanitizeValue() (because BaseTextInputType is a subclass of TextFieldInputType and BaseTextInputType does not override sanitizeValue().
Comment 16 Kent Tamura 2015-10-19 15:44:56 PDT
Comment on attachment 263288 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=263288&action=review

>>> Source/WebCore/html/URLInputType.cpp:69
>>> +    return stripLeadingAndTrailingHTMLSpaces(BaseTextInputType::sanitizeValue(proposedValue));
>> 
>> TextFieldInputType::sanitizeValue() is not called.  Now 'value' IDL property can have CR/LF wrongly.
> 
> I don't understand this comment. We're calling BaseTextInputType::sanitizeValue(), which ends up calling TextFieldInputType::sanitizeValue() (because BaseTextInputType is a subclass of TextFieldInputType and BaseTextInputType does not override sanitizeValue().

My bad.  You're right.