Bug 160172 - Remove stripLeadingAndTrailingWhitespace from MathMLElement.cpp
Summary: Remove stripLeadingAndTrailingWhitespace from MathMLElement.cpp
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: DOM (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: David Kilzer (:ddkilzer)
URL:
Keywords: InRadar
Depends on:
Blocks: 160462
  Show dependency treegraph
 
Reported: 2016-07-25 14:57 PDT by David Kilzer (:ddkilzer)
Modified: 2019-02-22 15:46 PST (History)
14 users (show)

See Also:


Attachments
Patch v1 (7.60 KB, patch)
2016-07-25 16:33 PDT, David Kilzer (:ddkilzer)
no flags Details | Formatted Diff | Diff
Patch v2 (7.88 KB, patch)
2016-07-26 06:12 PDT, David Kilzer (:ddkilzer)
no flags Details | Formatted Diff | Diff
Patch v3 (8.35 KB, patch)
2016-07-26 08:00 PDT, David Kilzer (:ddkilzer)
no flags Details | Formatted Diff | Diff
Patch v4 (6.53 KB, patch)
2016-07-26 08:43 PDT, David Kilzer (:ddkilzer)
no flags Details | Formatted Diff | Diff
Patch v5 (13.67 KB, patch)
2016-08-07 06:30 PDT, David Kilzer (:ddkilzer)
no flags Details | Formatted Diff | Diff
Patch (2.35 KB, patch)
2019-02-22 00:47 PST, Rob Buis
no flags Details | Formatted Diff | Diff
Patch (4.36 KB, patch)
2019-02-22 02:59 PST, Rob Buis
no flags Details | Formatted Diff | Diff
Patch (4.41 KB, patch)
2019-02-22 03:39 PST, Rob Buis
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ews114 for mac-highsierra (2.30 MB, application/zip)
2019-02-22 05:26 PST, EWS Watchlist
no flags Details
Archive of layout-test-results from ews126 for ios-simulator-wk2 (2.47 MB, application/zip)
2019-02-22 05:41 PST, EWS Watchlist
no flags Details
Patch (5.09 KB, patch)
2019-02-22 05:50 PST, Rob Buis
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description David Kilzer (:ddkilzer) 2016-07-25 14:57:59 PDT
Based on Bug 160111 Comment #14, we should:
- Move skipLeadingAndTrailingWhitespace() into the HTMLParserIdioms class,
- Rename it to overload stripLeadingAndTrailingHTMLSpaces(), and
- Change the argument from const StringView& to StringView.

Eventually, we should remove the stripLeadingAndTrailingHTMLSpaces(const String&) method in favor of the StringView method.
Comment 1 David Kilzer (:ddkilzer) 2016-07-25 16:33:08 PDT
Created attachment 284544 [details]
Patch v1
Comment 2 David Kilzer (:ddkilzer) 2016-07-25 16:33:40 PDT
Comment on attachment 284544 [details]
Patch v1

Oops, not ready yet.
Comment 3 David Kilzer (:ddkilzer) 2016-07-26 06:12:54 PDT
Created attachment 284586 [details]
Patch v2
Comment 4 David Kilzer (:ddkilzer) 2016-07-26 08:00:18 PDT
Created attachment 284589 [details]
Patch v3
Comment 5 David Kilzer (:ddkilzer) 2016-07-26 08:43:13 PDT
Created attachment 284593 [details]
Patch v4
Comment 6 David Kilzer (:ddkilzer) 2016-07-26 08:46:13 PDT
(In reply to comment #5)
> Created attachment 284593 [details]
> Patch v4

This has the minimal amount of changes to move the method into HTMLParserIdioms.
Comment 7 Chris Dumez 2016-07-26 09:23:59 PDT
Comment on attachment 284593 [details]
Patch v4

View in context: https://bugs.webkit.org/attachment.cgi?id=284593&action=review

> Source/WebCore/html/parser/HTMLParserIdioms.cpp:80
> +    unsigned start = 0;

start could be declared later, right before the first while loop.

> Source/WebCore/html/parser/HTMLParserIdioms.cpp:84
> +        return string.isNull() ? string : StringView::empty();

Is this ternary reversed? Right now it says, if string is null, return null. If string is empty, return empty. Seems like a no-op.
Comment 8 Darin Adler 2016-07-26 10:50:33 PDT
Comment on attachment 284593 [details]
Patch v4

View in context: https://bugs.webkit.org/attachment.cgi?id=284593&action=review

Should probably make at least some small improvements before landing. Also should probably have unit tests for this function.

> Source/WebCore/html/parser/HTMLParserIdioms.cpp:78
> +StringView stripLeadingAndTrailingHTMLSpaces(StringView string)

Should consider writing this more like the existing String version. Separate loops for 8-bit and 16-bit characters are likely to result in better performance, hoisting the check for the string type out of the two loops. On the other hand, maybe it’s not a measurable performance difference.

> Source/WebCore/html/parser/HTMLParserIdioms.cpp:81
> +    unsigned stringLength = string.length();

This should just be named length. After the value is modified during the execution of the function it’s no longer the string length. It becomes the substring length, so the name stringLength does more harm than good.

>> Source/WebCore/html/parser/HTMLParserIdioms.cpp:84
>> +        return string.isNull() ? string : StringView::empty();
> 
> Is this ternary reversed? Right now it says, if string is null, return null. If string is empty, return empty. Seems like a no-op.

This special case for zero length strings can and should simply be removed. The loops below both handle zero length strings perfectly, as does the substring function.

> Source/WebCore/html/parser/HTMLParserIdioms.cpp:86
> +    while (stringLength > 0 && isHTMLSpace(string[start])) {

Should just be while (stringLength && ...), not > 0.

> Source/WebCore/html/parser/HTMLParserIdioms.cpp:88
> +        start++;
> +        stringLength--;

Pre-increment and pre-decrement are more our standard C++ style.

> Source/WebCore/html/parser/HTMLParserIdioms.cpp:91
> +    while (stringLength > 0 && isHTMLSpace(string[start + stringLength - 1]))

Should be just while (stringLength && ...), not > 0.

> Source/WebCore/html/parser/HTMLParserIdioms.cpp:92
> +        stringLength--;

Pre-decrement is more our standard C++ style.
Comment 9 Frédéric Wang (:fredw) 2016-08-03 02:46:53 PDT
@David: Are you still working on this?
Comment 10 David Kilzer (:ddkilzer) 2016-08-03 03:01:21 PDT
Yes, planning to do another patch to address Darin's comments and to add tests.

Got side-tracked with other work for the past couple of days.
Comment 11 Frédéric Wang (:fredw) 2016-08-03 04:01:17 PDT
(In reply to comment #10)
> Yes, planning to do another patch to address Darin's comments and to add
> tests.
> 
> Got side-tracked with other work for the past couple of days.

OK, I was asking because I need to make skipLeadingAndTrailingWhitespace accessible from another file in bug 160462, so just wanted to warn you that the two patches may conflict.
Comment 12 David Kilzer (:ddkilzer) 2016-08-05 14:36:15 PDT
Comment on attachment 284593 [details]
Patch v4

View in context: https://bugs.webkit.org/attachment.cgi?id=284593&action=review

>> Source/WebCore/html/parser/HTMLParserIdioms.cpp:78
>> +StringView stripLeadingAndTrailingHTMLSpaces(StringView string)
> 
> Should consider writing this more like the existing String version. Separate loops for 8-bit and 16-bit characters are likely to result in better performance, hoisting the check for the string type out of the two loops. On the other hand, maybe it’s not a measurable performance difference.

Okay, I'll make this change when removing:

String stripLeadingAndTrailingHTMLSpaces(const String& string)

and replacing it with:

StringView stripLeadingAndTrailingHTMLSpaces(StringView string)

>>> Source/WebCore/html/parser/HTMLParserIdioms.cpp:84
>>> +        return string.isNull() ? string : StringView::empty();
>> 
>> Is this ternary reversed? Right now it says, if string is null, return null. If string is empty, return empty. Seems like a no-op.
> 
> This special case for zero length strings can and should simply be removed. The loops below both handle zero length strings perfectly, as does the substring function.

I added this because  stripLeadingAndTrailingHTMLSpaces(const String& string) has this early return.

Seems like when I refactor to remove stripLeadingAndTrailingHTMLSpaces(const String& string), then I should bring this back.
Comment 13 David Kilzer (:ddkilzer) 2016-08-05 14:37:39 PDT
(In reply to comment #8)
> Comment on attachment 284593 [details]
> Patch v4
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=284593&action=review
> 
> Should probably make at least some small improvements before landing. Also
> should probably have unit tests for this function.

Going to write some test cases before landing (or maybe shortly afterwards in a separate patch).

(In reply to comment #11)
> (In reply to comment #10)
> > Yes, planning to do another patch to address Darin's comments and to add
> > tests.
> > 
> > Got side-tracked with other work for the past couple of days.
> 
> OK, I was asking because I need to make skipLeadingAndTrailingWhitespace
> accessible from another file in bug 160462, so just wanted to warn you that
> the two patches may conflict.

Thanks for the head's up!  I'll check to make sure this is working as expected before landing.
Comment 14 David Kilzer (:ddkilzer) 2016-08-07 06:30:55 PDT
Created attachment 285527 [details]
Patch v5
Comment 15 David Kilzer (:ddkilzer) 2016-08-07 06:35:39 PDT
Comment on attachment 284593 [details]
Patch v4

View in context: https://bugs.webkit.org/attachment.cgi?id=284593&action=review

>> Source/WebCore/html/parser/HTMLParserIdioms.cpp:80
>> +    unsigned start = 0;
> 
> start could be declared later, right before the first while loop.

With Patch v5 change, this is a moot point.

>>>> Source/WebCore/html/parser/HTMLParserIdioms.cpp:84
>>>> +        return string.isNull() ? string : StringView::empty();
>>> 
>>> Is this ternary reversed? Right now it says, if string is null, return null. If string is empty, return empty. Seems like a no-op.
>> 
>> This special case for zero length strings can and should simply be removed. The loops below both handle zero length strings perfectly, as does the substring function.
> 
> I added this because  stripLeadingAndTrailingHTMLSpaces(const String& string) has this early return.
> 
> Seems like when I refactor to remove stripLeadingAndTrailingHTMLSpaces(const String& string), then I should bring this back.

After writing API tests, it's obvious that stripLeadingAndTrailingHTMLSpaces() is always supposed to return a null String in the case of a zero-length string, never an empty String.

I suspect this is a micro-optimization since we must always check m_impl to see if it's nullptr first whenever comparing strings or getting their length or calling isNull() or isEmpty().
Comment 16 Darin Adler 2016-08-07 09:51:06 PDT
Comment on attachment 285527 [details]
Patch v5

View in context: https://bugs.webkit.org/attachment.cgi?id=285527&action=review

> Source/WebCore/html/parser/HTMLParserIdioms.cpp:91
> +    return length ? string.substring(start, length) : emptyAtom.string();

I don’t understand why we are doing a special case here for 0 length. If this special case is needed, it should be in StringView::substring, not here.

Unless this function has special requirements that StringView::substring does not.

> Source/WebCore/html/parser/HTMLParserIdioms.h:50
> +// FIXME: Replace overload using const String& with method using StringView.
>  WEBCORE_EXPORT String stripLeadingAndTrailingHTMLSpaces(const String&);

I don’t think this FIXME is exactly right.

If I have a String that probably does *not* have leading and trailing HTML spaces, then converting it to a StringView and back to a String will have to allocate a new String! On the other hand, if you are starting with a String and do not need to convert the result to a String for storage, just need to look at it, then it’s better to have the result be a StringView.

So there is value in having a version that produces a String that can reuse the existing string when there is nothing to strip. Not clear if overloading with the same name is the best design choice, since both the return type and the argument type matter.

Implying that we will want to remove this overload entirely is probably incorrect, though.
Comment 17 Darin Adler 2016-08-07 16:23:00 PDT
Comment on attachment 284593 [details]
Patch v4

View in context: https://bugs.webkit.org/attachment.cgi?id=284593&action=review

>>>>> Source/WebCore/html/parser/HTMLParserIdioms.cpp:84
>>>>> +        return string.isNull() ? string : StringView::empty();
>>>> 
>>>> Is this ternary reversed? Right now it says, if string is null, return null. If string is empty, return empty. Seems like a no-op.
>>> 
>>> This special case for zero length strings can and should simply be removed. The loops below both handle zero length strings perfectly, as does the substring function.
>> 
>> I added this because  stripLeadingAndTrailingHTMLSpaces(const String& string) has this early return.
>> 
>> Seems like when I refactor to remove stripLeadingAndTrailingHTMLSpaces(const String& string), then I should bring this back.
> 
> After writing API tests, it's obvious that stripLeadingAndTrailingHTMLSpaces() is always supposed to return a null String in the case of a zero-length string, never an empty String.
> 
> I suspect this is a micro-optimization since we must always check m_impl to see if it's nullptr first whenever comparing strings or getting their length or calling isNull() or isEmpty().

I don’t agree that stripLeadingAndTrailingHTMLSpaces should return a null string. How does writing API tests make this clear?

I don’t understand what you mean with the micro optimization comment. To me it does not seem like we should add branches to optimize the empty or null string cases.
Comment 18 David Kilzer (:ddkilzer) 2016-08-08 17:11:26 PDT
Comment on attachment 285527 [details]
Patch v5

View in context: https://bugs.webkit.org/attachment.cgi?id=285527&action=review

>> Source/WebCore/html/parser/HTMLParserIdioms.cpp:91
>> +    return length ? string.substring(start, length) : emptyAtom.string();
> 
> I don’t understand why we are doing a special case here for 0 length. If this special case is needed, it should be in StringView::substring, not here.
> 
> Unless this function has special requirements that StringView::substring does not.

In a separate comment, you (Darin) wrote:
> I don’t agree that stripLeadingAndTrailingHTMLSpaces should return a null string. How does writing API tests make this clear?

It made it clear insofar as that is the current behavior of stripLeadingAndTrailingHTMLSpaces(const String&).

I wanted to match the behavior of stripLeadingAndTrailingHTMLSpaces(const String&) when implementing stripLeadingAndTrailingHTMLSpaces(StringView) while writing the tests.

> I don’t understand what you mean with the micro optimization comment.

Calling String[View].isEmpty() on a null string checks to see if m_impl is nullptr, which is (slightly) faster than checking if String.isEmpty() on an empty string:

WTFString:
    bool isEmpty() const { return !m_impl || !m_impl->length(); }

I guess StringView just checks length, though, and skips the NULL check, which seems like WTFString could do as well!  (There's no perf win here, however minimal.)

StringView:
inline bool StringView::isEmpty() const
{
    return !length();
}

inline unsigned StringView::length() const
{
    return m_length;
}

Although now that you point it out, this is changing null strings into empty strings, so it's not even doing what I expected, even though the tests pass.

I need to go back and add more tests, probably for isNull() and isEmpty().

And you're saying that we should remove these special cases for stripLeadingAndTrailingHTMLSpaces(const String&) as well?

> To me it does not seem like we should add branches to optimize the empty or null string cases.

Okay, researching the origin of this code in stripLeadingAndTrailingHTMLSpaces(const String&), it was introduced by you in r68854 to make sure not to turn null strings into empty strings:

            * html/parser/HTMLParserIdioms.cpp:
            (WebCore::stripLeadingAndTrailingHTMLSpaces):
            Fixed bug where the function would turn the null string into the empty string.
            Fixed bug where the function would not strip all trailing spaces.

<https://trac.webkit.org/changeset/68854/trunk/WebCore/html/parser/HTMLParserIdioms.cpp>

I'm clearing the review flag and will work on a new patch.

>> Source/WebCore/html/parser/HTMLParserIdioms.h:50
>>  WEBCORE_EXPORT String stripLeadingAndTrailingHTMLSpaces(const String&);
> 
> I don’t think this FIXME is exactly right.
> 
> If I have a String that probably does *not* have leading and trailing HTML spaces, then converting it to a StringView and back to a String will have to allocate a new String! On the other hand, if you are starting with a String and do not need to convert the result to a String for storage, just need to look at it, then it’s better to have the result be a StringView.
> 
> So there is value in having a version that produces a String that can reuse the existing string when there is nothing to strip. Not clear if overloading with the same name is the best design choice, since both the return type and the argument type matter.
> 
> Implying that we will want to remove this overload entirely is probably incorrect, though.

Okay, I'll remove the FIXME before landing.  I think I read too much into your original comment about moving the method into HTMLParserIdioms.cpp.
Comment 19 Frédéric Wang (:fredw) 2016-08-23 06:40:10 PDT
(In reply to comment #13)
> (In reply to comment #11)
> > (In reply to comment #10)
> > > Yes, planning to do another patch to address Darin's comments and to add
> > > tests.
> > > 
> > > Got side-tracked with other work for the past couple of days.
> > 
> > OK, I was asking because I need to make skipLeadingAndTrailingWhitespace
> > accessible from another file in bug 160462, so just wanted to warn you that
> > the two patches may conflict.
> 
> Thanks for the head's up!  I'll check to make sure this is working as
> expected before landing.

Hi David, I landed the patch for bug 160462, so you'll have to rebase your patch here.
Comment 20 Rob Buis 2019-02-22 00:47:28 PST
Created attachment 362704 [details]
Patch
Comment 21 Rob Buis 2019-02-22 02:59:54 PST
Created attachment 362713 [details]
Patch
Comment 22 Frédéric Wang (:fredw) 2019-02-22 03:04:24 PST
Comment on attachment 362713 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=362713&action=review

> Source/WebCore/ChangeLog:9
> +        from HTTPParsers instead.

If should be something like "no new tests, already covered by MathML tests".
Comment 23 Rob Buis 2019-02-22 03:39:12 PST
Created attachment 362714 [details]
Patch
Comment 24 EWS Watchlist 2019-02-22 05:26:33 PST
Comment on attachment 362714 [details]
Patch

Attachment 362714 [details] did not pass mac-debug-ews (mac):
Output: https://webkit-queues.webkit.org/results/11245928

New failing tests:
imported/w3c/web-platform-tests/mathml/relations/css-styling/lengths-3.html
mathml/mpadded-crash.html
Comment 25 EWS Watchlist 2019-02-22 05:26:35 PST
Created attachment 362716 [details]
Archive of layout-test-results from ews114 for mac-highsierra

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews114  Port: mac-highsierra  Platform: Mac OS X 10.13.6
Comment 26 EWS Watchlist 2019-02-22 05:41:06 PST
Comment on attachment 362714 [details]
Patch

Attachment 362714 [details] did not pass ios-sim-ews (ios-simulator-wk2):
Output: https://webkit-queues.webkit.org/results/11245953

New failing tests:
fast/viewport/ios/device-width-viewport-after-changing-view-scale.html
Comment 27 EWS Watchlist 2019-02-22 05:41:08 PST
Created attachment 362718 [details]
Archive of layout-test-results from ews126 for ios-simulator-wk2

The attached test failures were seen while running run-webkit-tests on the ios-sim-ews.
Bot: ews126  Port: ios-simulator-wk2  Platform: Mac OS X 10.13.6
Comment 28 Rob Buis 2019-02-22 05:50:42 PST
Created attachment 362719 [details]
Patch
Comment 29 Frédéric Wang (:fredw) 2019-02-22 08:02:29 PST
Comment on attachment 362719 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=362719&action=review

> Source/WebCore/mathml/MathMLPresentationElement.cpp:298
> +    StringView stringView = string;

Probably the next step will be to see if we can use a StringView for parseMathMLLength
Comment 30 WebKit Commit Bot 2019-02-22 08:31:23 PST
Comment on attachment 362719 [details]
Patch

Clearing flags on attachment: 362719

Committed r241947: <https://trac.webkit.org/changeset/241947>
Comment 31 WebKit Commit Bot 2019-02-22 08:31:25 PST
All reviewed patches have been landed.  Closing bug.
Comment 32 Radar WebKit Bug Importer 2019-02-22 08:39:27 PST
<rdar://problem/48312326>
Comment 33 Darin Adler 2019-02-22 09:50:14 PST
I think we have both stripLeadingAndTrailingHTMLSpaces and stripLeadingAndTrailingHTTPSpaces, I am really surprised that this one uses the HTTP one, not the HTML one.

Otherwise, I love this!
Comment 34 David Kilzer (:ddkilzer) 2019-02-22 15:46:19 PST
Rob, thanks for finishing this patch!