Bug 180009 - Modernize some aspects of text codecs, eliminate WebKit use of strcasecmp
Summary: Modernize some aspects of text codecs, eliminate WebKit use of strcasecmp
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: WebKit Nightly Build
Hardware: All All
: P2 Normal
Assignee: Darin Adler
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-11-24 15:42 PST by Darin Adler
Modified: 2018-01-02 12:58 PST (History)
6 users (show)

See Also:


Attachments
Patch (163.38 KB, patch)
2017-11-24 17:02 PST, Darin Adler
no flags Details | Formatted Diff | Diff
Patch (164.95 KB, patch)
2017-11-24 17:58 PST, Darin Adler
no flags Details | Formatted Diff | Diff
Patch (158.32 KB, patch)
2017-11-27 21:49 PST, Alex Christensen
no flags Details | Formatted Diff | Diff
Patch (159.10 KB, patch)
2017-11-27 23:19 PST, Alex Christensen
no flags Details | Formatted Diff | Diff
Patch (196.08 KB, patch)
2017-12-02 11:57 PST, Darin Adler
no flags Details | Formatted Diff | Diff
Patch (195.30 KB, patch)
2017-12-02 12:10 PST, Darin Adler
no flags Details | Formatted Diff | Diff
Patch (195.50 KB, patch)
2017-12-02 14:29 PST, Darin Adler
achristensen: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Darin Adler 2017-11-24 15:42:19 PST
Modernize some aspects of text codecs, eliminate WebKit use of strcasecmp
Comment 1 Darin Adler 2017-11-24 17:02:08 PST Comment hidden (obsolete)
Comment 2 EWS Watchlist 2017-11-24 17:04:05 PST Comment hidden (obsolete)
Comment 3 Darin Adler 2017-11-24 17:58:40 PST Comment hidden (obsolete)
Comment 4 EWS Watchlist 2017-11-24 18:00:36 PST Comment hidden (obsolete)
Comment 5 Alex Christensen 2017-11-27 21:46:22 PST
Comment on attachment 327565 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=327565&action=review

> Source/WTF/wtf/text/LineEnding.cpp:57
> +            needFix = true;

You could either break after either one of these, or you could count the number of changes you will need to make to call reserveInitialCapacity and uncheckedAppend in the loop below.  The first condition would reduce the length by one and this could just replace the character in place.  You could even do both in place then adjust the size of the Vector but not the capacity to make 0 allocations in this function.

Maybe a FIXME would be appropriate for these because they're just performance optimizations and you're developing on a platform that doesn't even compile them.

> Source/WTF/wtf/text/LineEnding.cpp:131
> +    return result;

ASSERT(q == result.data() + result.size()) might be nice here.

> Source/WTF/wtf/text/StringCommon.h:659
> +    return length == strlen(b) && equalIgnoringASCIICase(a, b, length);

This reads each character twice when only one read is necessary.  A loop that checks for the null character before when it's expected might have better performance.

> Source/WTF/wtf/text/StringCommon.h:665
> +    return strlen(string) == length && equalLettersIgnoringASCIICase(string, lowercaseLetters, length);

ditto

> Source/WebCore/PAL/pal/text/UnencodableHandling.h:32
> +enum UnencodableHandling {

enum classes are all the new hotness because they don't pollute the namespace as much.  It's not the biggest deal for this one, but I think it's good practice to use enum classes unless there's a reason to use an enum.

> Source/WebCore/fileapi/BlobBuilder.cpp:59
> -    m_appendableData.append(static_cast<const char*>(arrayBufferView->baseAddress()), arrayBufferView->byteLength());
> +    m_appendableData.append(static_cast<const uint8_t*>(arrayBufferView->baseAddress()), arrayBufferView->byteLength());

I think these static casts are no longer necessary.

> Source/WebCore/platform/SharedBuffer.cpp:80
> +    return adoptRef(*new SharedBuffer { vector.data(), vector.size() });

Couldn't we do some kind of awful casting instead of copying the data unnecessarily?
Comment 6 Alex Christensen 2017-11-27 21:49:15 PST Comment hidden (obsolete)
Comment 7 Alex Christensen 2017-11-27 23:19:29 PST Comment hidden (obsolete)
Comment 8 Alex Christensen 2017-11-27 23:59:26 PST
Comment on attachment 327732 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=327732&action=review

Looks like some tests need fixing.

> Source/WebCore/platform/text/TextCodec.h:47
> +    virtual String decode(const char*, size_t length, bool flush, bool stopOnError, bool& sawError) = 0;

This could return a std::optional<String> instead of passing in a bool&
Comment 9 Alex Christensen 2017-11-28 10:42:13 PST
These tests need looking into.  Many of them look relevant:

Regressions: Unexpected text-only failures (19)
  compositing/iframes/compositing-for-scrollable-iframe.html [ Failure ]
  editing/pasteboard/data-transfer-item-list-add-file-multiple-times.html [ Failure ]
  editing/pasteboard/data-transfer-item-list-add-file-on-copy.html [ Failure ]
  editing/pasteboard/data-transfer-item-list-add-file-on-drag.html [ Failure ]
  editing/selection/4960137.html [ Failure ]
  editing/selection/drag-in-iframe.html [ Failure ]
  editing/selection/dump-as-markup.html [ Failure ]
  fast/animation/animation-element-removal.html [ Failure ]
  fast/css/acid2-pixel.html [ Failure ]
  fast/css/acid2.html [ Failure ]
  fast/dom/Document/readystate.html [ Failure ]
  fast/dom/Geolocation/clear-watch-invalid-id-crash.html [ Failure ]
  fast/dom/HTMLAnchorElement/anchor-download-unset.html [ Failure ]
  fast/dom/HTMLAnchorElement/anchor-nodownload.html [ Failure ]
  fast/dom/HTMLScriptElement/async-write.html [ Failure ]
  fast/dom/HTMLScriptElement/defer-inline-script.html [ Failure ]
  fast/dom/HTMLScriptElement/defer-write.html [ Failure ]
  fast/files/blob-constructor.html [ Failure ]
  fast/files/blob-slice-overflow.html [ Failure ]

Regressions: Unexpected image-only failures (11)
  compositing/visibility/frameset-visibility-hidden.html [ ImageOnlyFailure ]
  css3/blending/background-blend-mode-crossfade-image.html [ ImageOnlyFailure ]
  css3/blending/background-blend-mode-data-uri-svg-image.html [ ImageOnlyFailure ]
  css3/blending/background-blend-mode-tiled-layers.html [ ImageOnlyFailure ]
  css3/images/cross-fade-svg-with-opacity.html [ ImageOnlyFailure ]
  css3/shapes/shape-outside/shape-image/shape-image-002.html [ ImageOnlyFailure ]
  css3/shapes/shape-outside/shape-image/shape-image-005.html [ ImageOnlyFailure ]
  css3/shapes/shape-outside/shape-image/shape-image-020.html [ ImageOnlyFailure ]
  css3/shapes/shape-outside/shape-image/shape-image-023.html [ ImageOnlyFailure ]
  css3/viewport-percentage-lengths/vh-resize.html [ ImageOnlyFailure ]
  fast/backgrounds/background-image-large-float-intrinsic-ratio.html [ ImageOnlyFailure ]
Comment 10 Darin Adler 2017-11-30 09:19:30 PST
Comment on attachment 327732 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=327732&action=review

>> Source/WebCore/platform/text/TextCodec.h:47
>> +    virtual String decode(const char*, size_t length, bool flush, bool stopOnError, bool& sawError) = 0;
> 
> This could return a std::optional<String> instead of passing in a bool&

I agree and even started on this change locally. Then I realized it was getting to be too much change all in one patch, so I held off on that for now.
Comment 11 Darin Adler 2017-11-30 09:39:22 PST
Comment on attachment 327565 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=327565&action=review

Thanks for the review, Alex. I will figure out why the tests are failing and upload a new patch the next chance I get to work on this.

>> Source/WTF/wtf/text/LineEnding.cpp:57
>> +            needFix = true;
> 
> You could either break after either one of these, or you could count the number of changes you will need to make to call reserveInitialCapacity and uncheckedAppend in the loop below.  The first condition would reduce the length by one and this could just replace the character in place.  You could even do both in place then adjust the size of the Vector but not the capacity to make 0 allocations in this function.
> 
> Maybe a FIXME would be appropriate for these because they're just performance optimizations and you're developing on a platform that doesn't even compile them.

> You could even do both in place then adjust the size of the Vector

Sure, I will change this to modify the vector in place; then I won’t need a preflight at all. When I wrote the code I hadn’t yet changed the argument to be an rvalue reference; I did that later to optimize the "nothing changed" case and did not take that into account.

> you're developing on a platform that doesn't even compile them

That’s wrong. On Mac and all non-Windows platforms we compile both of these. On *Windows* we don’t compile normalizeLineEndingsToLF.

> they're just performance optimizations

No, these are not just performance optimizations. I fixed at least two or three small logic errors in the existing functions. I probably should add test cases for these functions and maybe I should break this part of the patch off since it’s not dependent on much else in the patch.

>> Source/WTF/wtf/text/LineEnding.cpp:131
>> +    return result;
> 
> ASSERT(q == result.data() + result.size()) might be nice here.

Sure, will add that.

>> Source/WTF/wtf/text/StringCommon.h:659
>> +    return length == strlen(b) && equalIgnoringASCIICase(a, b, length);
> 
> This reads each character twice when only one read is necessary.  A loop that checks for the null character before when it's expected might have better performance.

I agree that it might. There are other cases above, such as equalLettersIgnoringASCIICaseCommonWithoutLength and startsWithLettersIgnoringASCIICaseCommonWithoutLength that might also be faster this way.

I like the simplicity and clarity of the current version, and it would be easy for someone to write the more efficient one if they can prove it’s worth it.

>> Source/WebCore/PAL/pal/text/UnencodableHandling.h:32
>> +enum UnencodableHandling {
> 
> enum classes are all the new hotness because they don't pollute the namespace as much.  It's not the biggest deal for this one, but I think it's good practice to use enum classes unless there's a reason to use an enum.

I was moving something here, not changing it.

I would be happy to change it also, but doing that would mean that the patch starts to get even bigger.

>> Source/WebCore/fileapi/BlobBuilder.cpp:59
>> +    m_appendableData.append(static_cast<const uint8_t*>(arrayBufferView->baseAddress()), arrayBufferView->byteLength());
> 
> I think these static casts are no longer necessary.

The casts are necessary.

ArrayBufferView::baseAddress returns a const void*, but Vector<uint8_t>::append requires a const uint8_t.

The reason we can use static_cast instead of reinterpret_cast is that it’s void*, not another specific type.

>> Source/WebCore/platform/SharedBuffer.cpp:80
>> +    return adoptRef(*new SharedBuffer { vector.data(), vector.size() });
> 
> Couldn't we do some kind of awful casting instead of copying the data unnecessarily?

Perhaps. I did consider that. I would prefer not to do that. We can finish the job soon and get rid of the copying, and even if we don’t, the copying is not catastrophic.
Comment 12 Darin Adler 2017-12-02 11:29:57 PST
Comment on attachment 327732 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=327732&action=review

>>> Source/WebCore/platform/text/TextCodec.h:47
>>> +    virtual String decode(const char*, size_t length, bool flush, bool stopOnError, bool& sawError) = 0;
>> 
>> This could return a std::optional<String> instead of passing in a bool&
> 
> I agree and even started on this change locally. Then I realized it was getting to be too much change all in one patch, so I held off on that for now.

On reflection, I realize that would not work. This function returns *both* a String and whether we saw an error; the string contains what was decoded even if an error was encountered. So if we wanted to return both things cleanly as a return value, the type of the return value would need to be a std::pair, std::tuple, or struct. Not std::optional or std::expected.

Unless we changed the design after studying how callers use this. There’s a change that callers fall into two categories:

1) callers that ignore errors completely
2) callers that do  not pay attention to the decoded data in the return value if there was an error

If callers fall into those two categories, then we might be able to just rid of sawError and use a null string or std::nullopt to indicate there was an error. But it requires looking at all the callers.
Comment 13 Darin Adler 2017-12-02 11:57:26 PST Comment hidden (obsolete)
Comment 14 Darin Adler 2017-12-02 12:00:22 PST
OK, Alex, new patch has all the changes you asked for modulo my responses above; adds the missing line of code the absence of which was causing tests to fail, the shrink call in TextCodecUTF8::encode; and adds TestWebKitAPI tests for the LineEnding functions, including tests which would have failed with the old version with some bugs in edge cases.

So ready to review and land I think.
Comment 15 Darin Adler 2017-12-02 12:10:59 PST Comment hidden (obsolete)
Comment 16 Darin Adler 2017-12-02 12:11:23 PST
And now also fixed GTK and WPE builds.
Comment 17 Darin Adler 2017-12-02 14:25:45 PST
Tests don’t compile on Windows. Will fix that.
Comment 18 Darin Adler 2017-12-02 14:29:55 PST
Created attachment 328255 [details]
Patch
Comment 19 Darin Adler 2017-12-02 14:30:19 PST
New patch should fix tests compiling on Windows.
Comment 20 Darin Adler 2017-12-02 16:39:32 PST
Hooray, looks like EWS is all green now.
Comment 21 Darin Adler 2017-12-03 17:53:51 PST
Anyone willing to review?
Comment 22 Darin Adler 2017-12-06 20:06:59 PST
Committed r225618: <https://trac.webkit.org/changeset/225618>