RESOLVED FIXED 180009
Modernize some aspects of text codecs, eliminate WebKit use of strcasecmp
https://bugs.webkit.org/show_bug.cgi?id=180009
Summary Modernize some aspects of text codecs, eliminate WebKit use of strcasecmp
Darin Adler
Reported 2017-11-24 15:42:19 PST
Modernize some aspects of text codecs, eliminate WebKit use of strcasecmp
Attachments
Patch (163.38 KB, patch)
2017-11-24 17:02 PST, Darin Adler
no flags
Patch (164.95 KB, patch)
2017-11-24 17:58 PST, Darin Adler
no flags
Patch (158.32 KB, patch)
2017-11-27 21:49 PST, Alex Christensen
no flags
Patch (159.10 KB, patch)
2017-11-27 23:19 PST, Alex Christensen
no flags
Patch (196.08 KB, patch)
2017-12-02 11:57 PST, Darin Adler
no flags
Patch (195.30 KB, patch)
2017-12-02 12:10 PST, Darin Adler
no flags
Patch (195.50 KB, patch)
2017-12-02 14:29 PST, Darin Adler
achristensen: review+
Darin Adler
Comment 1 2017-11-24 17:02:08 PST Comment hidden (obsolete)
EWS Watchlist
Comment 2 2017-11-24 17:04:05 PST Comment hidden (obsolete)
Darin Adler
Comment 3 2017-11-24 17:58:40 PST Comment hidden (obsolete)
EWS Watchlist
Comment 4 2017-11-24 18:00:36 PST Comment hidden (obsolete)
Alex Christensen
Comment 5 2017-11-27 21:46:22 PST
Comment on attachment 327565 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=327565&action=review > Source/WTF/wtf/text/LineEnding.cpp:57 > + needFix = true; You could either break after either one of these, or you could count the number of changes you will need to make to call reserveInitialCapacity and uncheckedAppend in the loop below. The first condition would reduce the length by one and this could just replace the character in place. You could even do both in place then adjust the size of the Vector but not the capacity to make 0 allocations in this function. Maybe a FIXME would be appropriate for these because they're just performance optimizations and you're developing on a platform that doesn't even compile them. > Source/WTF/wtf/text/LineEnding.cpp:131 > + return result; ASSERT(q == result.data() + result.size()) might be nice here. > Source/WTF/wtf/text/StringCommon.h:659 > + return length == strlen(b) && equalIgnoringASCIICase(a, b, length); This reads each character twice when only one read is necessary. A loop that checks for the null character before when it's expected might have better performance. > Source/WTF/wtf/text/StringCommon.h:665 > + return strlen(string) == length && equalLettersIgnoringASCIICase(string, lowercaseLetters, length); ditto > Source/WebCore/PAL/pal/text/UnencodableHandling.h:32 > +enum UnencodableHandling { enum classes are all the new hotness because they don't pollute the namespace as much. It's not the biggest deal for this one, but I think it's good practice to use enum classes unless there's a reason to use an enum. > Source/WebCore/fileapi/BlobBuilder.cpp:59 > - m_appendableData.append(static_cast<const char*>(arrayBufferView->baseAddress()), arrayBufferView->byteLength()); > + m_appendableData.append(static_cast<const uint8_t*>(arrayBufferView->baseAddress()), arrayBufferView->byteLength()); I think these static casts are no longer necessary. > Source/WebCore/platform/SharedBuffer.cpp:80 > + return adoptRef(*new SharedBuffer { vector.data(), vector.size() }); Couldn't we do some kind of awful casting instead of copying the data unnecessarily?
Alex Christensen
Comment 6 2017-11-27 21:49:15 PST Comment hidden (obsolete)
Alex Christensen
Comment 7 2017-11-27 23:19:29 PST Comment hidden (obsolete)
Alex Christensen
Comment 8 2017-11-27 23:59:26 PST
Comment on attachment 327732 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=327732&action=review Looks like some tests need fixing. > Source/WebCore/platform/text/TextCodec.h:47 > + virtual String decode(const char*, size_t length, bool flush, bool stopOnError, bool& sawError) = 0; This could return a std::optional<String> instead of passing in a bool&
Alex Christensen
Comment 9 2017-11-28 10:42:13 PST
These tests need looking into. Many of them look relevant: Regressions: Unexpected text-only failures (19) compositing/iframes/compositing-for-scrollable-iframe.html [ Failure ] editing/pasteboard/data-transfer-item-list-add-file-multiple-times.html [ Failure ] editing/pasteboard/data-transfer-item-list-add-file-on-copy.html [ Failure ] editing/pasteboard/data-transfer-item-list-add-file-on-drag.html [ Failure ] editing/selection/4960137.html [ Failure ] editing/selection/drag-in-iframe.html [ Failure ] editing/selection/dump-as-markup.html [ Failure ] fast/animation/animation-element-removal.html [ Failure ] fast/css/acid2-pixel.html [ Failure ] fast/css/acid2.html [ Failure ] fast/dom/Document/readystate.html [ Failure ] fast/dom/Geolocation/clear-watch-invalid-id-crash.html [ Failure ] fast/dom/HTMLAnchorElement/anchor-download-unset.html [ Failure ] fast/dom/HTMLAnchorElement/anchor-nodownload.html [ Failure ] fast/dom/HTMLScriptElement/async-write.html [ Failure ] fast/dom/HTMLScriptElement/defer-inline-script.html [ Failure ] fast/dom/HTMLScriptElement/defer-write.html [ Failure ] fast/files/blob-constructor.html [ Failure ] fast/files/blob-slice-overflow.html [ Failure ] Regressions: Unexpected image-only failures (11) compositing/visibility/frameset-visibility-hidden.html [ ImageOnlyFailure ] css3/blending/background-blend-mode-crossfade-image.html [ ImageOnlyFailure ] css3/blending/background-blend-mode-data-uri-svg-image.html [ ImageOnlyFailure ] css3/blending/background-blend-mode-tiled-layers.html [ ImageOnlyFailure ] css3/images/cross-fade-svg-with-opacity.html [ ImageOnlyFailure ] css3/shapes/shape-outside/shape-image/shape-image-002.html [ ImageOnlyFailure ] css3/shapes/shape-outside/shape-image/shape-image-005.html [ ImageOnlyFailure ] css3/shapes/shape-outside/shape-image/shape-image-020.html [ ImageOnlyFailure ] css3/shapes/shape-outside/shape-image/shape-image-023.html [ ImageOnlyFailure ] css3/viewport-percentage-lengths/vh-resize.html [ ImageOnlyFailure ] fast/backgrounds/background-image-large-float-intrinsic-ratio.html [ ImageOnlyFailure ]
Darin Adler
Comment 10 2017-11-30 09:19:30 PST
Comment on attachment 327732 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=327732&action=review >> Source/WebCore/platform/text/TextCodec.h:47 >> + virtual String decode(const char*, size_t length, bool flush, bool stopOnError, bool& sawError) = 0; > > This could return a std::optional<String> instead of passing in a bool& I agree and even started on this change locally. Then I realized it was getting to be too much change all in one patch, so I held off on that for now.
Darin Adler
Comment 11 2017-11-30 09:39:22 PST
Comment on attachment 327565 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=327565&action=review Thanks for the review, Alex. I will figure out why the tests are failing and upload a new patch the next chance I get to work on this. >> Source/WTF/wtf/text/LineEnding.cpp:57 >> + needFix = true; > > You could either break after either one of these, or you could count the number of changes you will need to make to call reserveInitialCapacity and uncheckedAppend in the loop below. The first condition would reduce the length by one and this could just replace the character in place. You could even do both in place then adjust the size of the Vector but not the capacity to make 0 allocations in this function. > > Maybe a FIXME would be appropriate for these because they're just performance optimizations and you're developing on a platform that doesn't even compile them. > You could even do both in place then adjust the size of the Vector Sure, I will change this to modify the vector in place; then I won’t need a preflight at all. When I wrote the code I hadn’t yet changed the argument to be an rvalue reference; I did that later to optimize the "nothing changed" case and did not take that into account. > you're developing on a platform that doesn't even compile them That’s wrong. On Mac and all non-Windows platforms we compile both of these. On *Windows* we don’t compile normalizeLineEndingsToLF. > they're just performance optimizations No, these are not just performance optimizations. I fixed at least two or three small logic errors in the existing functions. I probably should add test cases for these functions and maybe I should break this part of the patch off since it’s not dependent on much else in the patch. >> Source/WTF/wtf/text/LineEnding.cpp:131 >> + return result; > > ASSERT(q == result.data() + result.size()) might be nice here. Sure, will add that. >> Source/WTF/wtf/text/StringCommon.h:659 >> + return length == strlen(b) && equalIgnoringASCIICase(a, b, length); > > This reads each character twice when only one read is necessary. A loop that checks for the null character before when it's expected might have better performance. I agree that it might. There are other cases above, such as equalLettersIgnoringASCIICaseCommonWithoutLength and startsWithLettersIgnoringASCIICaseCommonWithoutLength that might also be faster this way. I like the simplicity and clarity of the current version, and it would be easy for someone to write the more efficient one if they can prove it’s worth it. >> Source/WebCore/PAL/pal/text/UnencodableHandling.h:32 >> +enum UnencodableHandling { > > enum classes are all the new hotness because they don't pollute the namespace as much. It's not the biggest deal for this one, but I think it's good practice to use enum classes unless there's a reason to use an enum. I was moving something here, not changing it. I would be happy to change it also, but doing that would mean that the patch starts to get even bigger. >> Source/WebCore/fileapi/BlobBuilder.cpp:59 >> + m_appendableData.append(static_cast<const uint8_t*>(arrayBufferView->baseAddress()), arrayBufferView->byteLength()); > > I think these static casts are no longer necessary. The casts are necessary. ArrayBufferView::baseAddress returns a const void*, but Vector<uint8_t>::append requires a const uint8_t. The reason we can use static_cast instead of reinterpret_cast is that it’s void*, not another specific type. >> Source/WebCore/platform/SharedBuffer.cpp:80 >> + return adoptRef(*new SharedBuffer { vector.data(), vector.size() }); > > Couldn't we do some kind of awful casting instead of copying the data unnecessarily? Perhaps. I did consider that. I would prefer not to do that. We can finish the job soon and get rid of the copying, and even if we don’t, the copying is not catastrophic.
Darin Adler
Comment 12 2017-12-02 11:29:57 PST
Comment on attachment 327732 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=327732&action=review >>> Source/WebCore/platform/text/TextCodec.h:47 >>> + virtual String decode(const char*, size_t length, bool flush, bool stopOnError, bool& sawError) = 0; >> >> This could return a std::optional<String> instead of passing in a bool& > > I agree and even started on this change locally. Then I realized it was getting to be too much change all in one patch, so I held off on that for now. On reflection, I realize that would not work. This function returns *both* a String and whether we saw an error; the string contains what was decoded even if an error was encountered. So if we wanted to return both things cleanly as a return value, the type of the return value would need to be a std::pair, std::tuple, or struct. Not std::optional or std::expected. Unless we changed the design after studying how callers use this. There’s a change that callers fall into two categories: 1) callers that ignore errors completely 2) callers that do not pay attention to the decoded data in the return value if there was an error If callers fall into those two categories, then we might be able to just rid of sawError and use a null string or std::nullopt to indicate there was an error. But it requires looking at all the callers.
Darin Adler
Comment 13 2017-12-02 11:57:26 PST Comment hidden (obsolete)
Darin Adler
Comment 14 2017-12-02 12:00:22 PST
OK, Alex, new patch has all the changes you asked for modulo my responses above; adds the missing line of code the absence of which was causing tests to fail, the shrink call in TextCodecUTF8::encode; and adds TestWebKitAPI tests for the LineEnding functions, including tests which would have failed with the old version with some bugs in edge cases. So ready to review and land I think.
Darin Adler
Comment 15 2017-12-02 12:10:59 PST Comment hidden (obsolete)
Darin Adler
Comment 16 2017-12-02 12:11:23 PST
And now also fixed GTK and WPE builds.
Darin Adler
Comment 17 2017-12-02 14:25:45 PST
Tests don’t compile on Windows. Will fix that.
Darin Adler
Comment 18 2017-12-02 14:29:55 PST
Darin Adler
Comment 19 2017-12-02 14:30:19 PST
New patch should fix tests compiling on Windows.
Darin Adler
Comment 20 2017-12-02 16:39:32 PST
Hooray, looks like EWS is all green now.
Darin Adler
Comment 21 2017-12-03 17:53:51 PST
Anyone willing to review?
Darin Adler
Comment 22 2017-12-06 20:06:59 PST
Note You need to log in before you can comment on or make changes to this bug.