WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
180009
Modernize some aspects of text codecs, eliminate WebKit use of strcasecmp
https://bugs.webkit.org/show_bug.cgi?id=180009
Summary
Modernize some aspects of text codecs, eliminate WebKit use of strcasecmp
Darin Adler
Reported
2017-11-24 15:42:19 PST
Modernize some aspects of text codecs, eliminate WebKit use of strcasecmp
Attachments
Patch
(163.38 KB, patch)
2017-11-24 17:02 PST
,
Darin Adler
no flags
Details
Formatted Diff
Diff
Patch
(164.95 KB, patch)
2017-11-24 17:58 PST
,
Darin Adler
no flags
Details
Formatted Diff
Diff
Patch
(158.32 KB, patch)
2017-11-27 21:49 PST
,
Alex Christensen
no flags
Details
Formatted Diff
Diff
Patch
(159.10 KB, patch)
2017-11-27 23:19 PST
,
Alex Christensen
no flags
Details
Formatted Diff
Diff
Patch
(196.08 KB, patch)
2017-12-02 11:57 PST
,
Darin Adler
no flags
Details
Formatted Diff
Diff
Patch
(195.30 KB, patch)
2017-12-02 12:10 PST
,
Darin Adler
no flags
Details
Formatted Diff
Diff
Patch
(195.50 KB, patch)
2017-12-02 14:29 PST
,
Darin Adler
achristensen
: review+
Details
Formatted Diff
Diff
Show Obsolete
(6)
View All
Add attachment
proposed patch, testcase, etc.
Darin Adler
Comment 1
2017-11-24 17:02:08 PST
Comment hidden (obsolete)
Created
attachment 327562
[details]
Patch
EWS Watchlist
Comment 2
2017-11-24 17:04:05 PST
Comment hidden (obsolete)
Attachment 327562
[details]
did not pass style-queue: ERROR: Source/WebCore/platform/text/TextCodecUTF16.h:37: Should be indented on a separate line, with the colon or comma first on that line. [whitespace/indent] [4] Total errors found: 1 in 56 files If any of these errors are false positives, please file a bug against check-webkit-style.
Darin Adler
Comment 3
2017-11-24 17:58:40 PST
Comment hidden (obsolete)
Created
attachment 327565
[details]
Patch
EWS Watchlist
Comment 4
2017-11-24 18:00:36 PST
Comment hidden (obsolete)
Attachment 327565
[details]
did not pass style-queue: ERROR: Source/WebCore/platform/text/TextCodecUTF16.h:37: Should be indented on a separate line, with the colon or comma first on that line. [whitespace/indent] [4] Total errors found: 1 in 58 files If any of these errors are false positives, please file a bug against check-webkit-style.
Alex Christensen
Comment 5
2017-11-27 21:46:22 PST
Comment on
attachment 327565
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=327565&action=review
> Source/WTF/wtf/text/LineEnding.cpp:57 > + needFix = true;
You could either break after either one of these, or you could count the number of changes you will need to make to call reserveInitialCapacity and uncheckedAppend in the loop below. The first condition would reduce the length by one and this could just replace the character in place. You could even do both in place then adjust the size of the Vector but not the capacity to make 0 allocations in this function. Maybe a FIXME would be appropriate for these because they're just performance optimizations and you're developing on a platform that doesn't even compile them.
> Source/WTF/wtf/text/LineEnding.cpp:131 > + return result;
ASSERT(q == result.data() + result.size()) might be nice here.
> Source/WTF/wtf/text/StringCommon.h:659 > + return length == strlen(b) && equalIgnoringASCIICase(a, b, length);
This reads each character twice when only one read is necessary. A loop that checks for the null character before when it's expected might have better performance.
> Source/WTF/wtf/text/StringCommon.h:665 > + return strlen(string) == length && equalLettersIgnoringASCIICase(string, lowercaseLetters, length);
ditto
> Source/WebCore/PAL/pal/text/UnencodableHandling.h:32 > +enum UnencodableHandling {
enum classes are all the new hotness because they don't pollute the namespace as much. It's not the biggest deal for this one, but I think it's good practice to use enum classes unless there's a reason to use an enum.
> Source/WebCore/fileapi/BlobBuilder.cpp:59 > - m_appendableData.append(static_cast<const char*>(arrayBufferView->baseAddress()), arrayBufferView->byteLength()); > + m_appendableData.append(static_cast<const uint8_t*>(arrayBufferView->baseAddress()), arrayBufferView->byteLength());
I think these static casts are no longer necessary.
> Source/WebCore/platform/SharedBuffer.cpp:80 > + return adoptRef(*new SharedBuffer { vector.data(), vector.size() });
Couldn't we do some kind of awful casting instead of copying the data unnecessarily?
Alex Christensen
Comment 6
2017-11-27 21:49:15 PST
Comment hidden (obsolete)
Created
attachment 327726
[details]
Patch
Alex Christensen
Comment 7
2017-11-27 23:19:29 PST
Comment hidden (obsolete)
Created
attachment 327732
[details]
Patch
Alex Christensen
Comment 8
2017-11-27 23:59:26 PST
Comment on
attachment 327732
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=327732&action=review
Looks like some tests need fixing.
> Source/WebCore/platform/text/TextCodec.h:47 > + virtual String decode(const char*, size_t length, bool flush, bool stopOnError, bool& sawError) = 0;
This could return a std::optional<String> instead of passing in a bool&
Alex Christensen
Comment 9
2017-11-28 10:42:13 PST
These tests need looking into. Many of them look relevant: Regressions: Unexpected text-only failures (19) compositing/iframes/compositing-for-scrollable-iframe.html [ Failure ] editing/pasteboard/data-transfer-item-list-add-file-multiple-times.html [ Failure ] editing/pasteboard/data-transfer-item-list-add-file-on-copy.html [ Failure ] editing/pasteboard/data-transfer-item-list-add-file-on-drag.html [ Failure ] editing/selection/4960137.html [ Failure ] editing/selection/drag-in-iframe.html [ Failure ] editing/selection/dump-as-markup.html [ Failure ] fast/animation/animation-element-removal.html [ Failure ] fast/css/acid2-pixel.html [ Failure ] fast/css/acid2.html [ Failure ] fast/dom/Document/readystate.html [ Failure ] fast/dom/Geolocation/clear-watch-invalid-id-crash.html [ Failure ] fast/dom/HTMLAnchorElement/anchor-download-unset.html [ Failure ] fast/dom/HTMLAnchorElement/anchor-nodownload.html [ Failure ] fast/dom/HTMLScriptElement/async-write.html [ Failure ] fast/dom/HTMLScriptElement/defer-inline-script.html [ Failure ] fast/dom/HTMLScriptElement/defer-write.html [ Failure ] fast/files/blob-constructor.html [ Failure ] fast/files/blob-slice-overflow.html [ Failure ] Regressions: Unexpected image-only failures (11) compositing/visibility/frameset-visibility-hidden.html [ ImageOnlyFailure ] css3/blending/background-blend-mode-crossfade-image.html [ ImageOnlyFailure ] css3/blending/background-blend-mode-data-uri-svg-image.html [ ImageOnlyFailure ] css3/blending/background-blend-mode-tiled-layers.html [ ImageOnlyFailure ] css3/images/cross-fade-svg-with-opacity.html [ ImageOnlyFailure ] css3/shapes/shape-outside/shape-image/shape-image-002.html [ ImageOnlyFailure ] css3/shapes/shape-outside/shape-image/shape-image-005.html [ ImageOnlyFailure ] css3/shapes/shape-outside/shape-image/shape-image-020.html [ ImageOnlyFailure ] css3/shapes/shape-outside/shape-image/shape-image-023.html [ ImageOnlyFailure ] css3/viewport-percentage-lengths/vh-resize.html [ ImageOnlyFailure ] fast/backgrounds/background-image-large-float-intrinsic-ratio.html [ ImageOnlyFailure ]
Darin Adler
Comment 10
2017-11-30 09:19:30 PST
Comment on
attachment 327732
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=327732&action=review
>> Source/WebCore/platform/text/TextCodec.h:47 >> + virtual String decode(const char*, size_t length, bool flush, bool stopOnError, bool& sawError) = 0; > > This could return a std::optional<String> instead of passing in a bool&
I agree and even started on this change locally. Then I realized it was getting to be too much change all in one patch, so I held off on that for now.
Darin Adler
Comment 11
2017-11-30 09:39:22 PST
Comment on
attachment 327565
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=327565&action=review
Thanks for the review, Alex. I will figure out why the tests are failing and upload a new patch the next chance I get to work on this.
>> Source/WTF/wtf/text/LineEnding.cpp:57 >> + needFix = true; > > You could either break after either one of these, or you could count the number of changes you will need to make to call reserveInitialCapacity and uncheckedAppend in the loop below. The first condition would reduce the length by one and this could just replace the character in place. You could even do both in place then adjust the size of the Vector but not the capacity to make 0 allocations in this function. > > Maybe a FIXME would be appropriate for these because they're just performance optimizations and you're developing on a platform that doesn't even compile them.
> You could even do both in place then adjust the size of the Vector
Sure, I will change this to modify the vector in place; then I won’t need a preflight at all. When I wrote the code I hadn’t yet changed the argument to be an rvalue reference; I did that later to optimize the "nothing changed" case and did not take that into account.
> you're developing on a platform that doesn't even compile them
That’s wrong. On Mac and all non-Windows platforms we compile both of these. On *Windows* we don’t compile normalizeLineEndingsToLF.
> they're just performance optimizations
No, these are not just performance optimizations. I fixed at least two or three small logic errors in the existing functions. I probably should add test cases for these functions and maybe I should break this part of the patch off since it’s not dependent on much else in the patch.
>> Source/WTF/wtf/text/LineEnding.cpp:131 >> + return result; > > ASSERT(q == result.data() + result.size()) might be nice here.
Sure, will add that.
>> Source/WTF/wtf/text/StringCommon.h:659 >> + return length == strlen(b) && equalIgnoringASCIICase(a, b, length); > > This reads each character twice when only one read is necessary. A loop that checks for the null character before when it's expected might have better performance.
I agree that it might. There are other cases above, such as equalLettersIgnoringASCIICaseCommonWithoutLength and startsWithLettersIgnoringASCIICaseCommonWithoutLength that might also be faster this way. I like the simplicity and clarity of the current version, and it would be easy for someone to write the more efficient one if they can prove it’s worth it.
>> Source/WebCore/PAL/pal/text/UnencodableHandling.h:32 >> +enum UnencodableHandling { > > enum classes are all the new hotness because they don't pollute the namespace as much. It's not the biggest deal for this one, but I think it's good practice to use enum classes unless there's a reason to use an enum.
I was moving something here, not changing it. I would be happy to change it also, but doing that would mean that the patch starts to get even bigger.
>> Source/WebCore/fileapi/BlobBuilder.cpp:59 >> + m_appendableData.append(static_cast<const uint8_t*>(arrayBufferView->baseAddress()), arrayBufferView->byteLength()); > > I think these static casts are no longer necessary.
The casts are necessary. ArrayBufferView::baseAddress returns a const void*, but Vector<uint8_t>::append requires a const uint8_t. The reason we can use static_cast instead of reinterpret_cast is that it’s void*, not another specific type.
>> Source/WebCore/platform/SharedBuffer.cpp:80 >> + return adoptRef(*new SharedBuffer { vector.data(), vector.size() }); > > Couldn't we do some kind of awful casting instead of copying the data unnecessarily?
Perhaps. I did consider that. I would prefer not to do that. We can finish the job soon and get rid of the copying, and even if we don’t, the copying is not catastrophic.
Darin Adler
Comment 12
2017-12-02 11:29:57 PST
Comment on
attachment 327732
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=327732&action=review
>>> Source/WebCore/platform/text/TextCodec.h:47 >>> + virtual String decode(const char*, size_t length, bool flush, bool stopOnError, bool& sawError) = 0; >> >> This could return a std::optional<String> instead of passing in a bool& > > I agree and even started on this change locally. Then I realized it was getting to be too much change all in one patch, so I held off on that for now.
On reflection, I realize that would not work. This function returns *both* a String and whether we saw an error; the string contains what was decoded even if an error was encountered. So if we wanted to return both things cleanly as a return value, the type of the return value would need to be a std::pair, std::tuple, or struct. Not std::optional or std::expected. Unless we changed the design after studying how callers use this. There’s a change that callers fall into two categories: 1) callers that ignore errors completely 2) callers that do not pay attention to the decoded data in the return value if there was an error If callers fall into those two categories, then we might be able to just rid of sawError and use a null string or std::nullopt to indicate there was an error. But it requires looking at all the callers.
Darin Adler
Comment 13
2017-12-02 11:57:26 PST
Comment hidden (obsolete)
Created
attachment 328245
[details]
Patch
Darin Adler
Comment 14
2017-12-02 12:00:22 PST
OK, Alex, new patch has all the changes you asked for modulo my responses above; adds the missing line of code the absence of which was causing tests to fail, the shrink call in TextCodecUTF8::encode; and adds TestWebKitAPI tests for the LineEnding functions, including tests which would have failed with the old version with some bugs in edge cases. So ready to review and land I think.
Darin Adler
Comment 15
2017-12-02 12:10:59 PST
Comment hidden (obsolete)
Created
attachment 328247
[details]
Patch
Darin Adler
Comment 16
2017-12-02 12:11:23 PST
And now also fixed GTK and WPE builds.
Darin Adler
Comment 17
2017-12-02 14:25:45 PST
Tests don’t compile on Windows. Will fix that.
Darin Adler
Comment 18
2017-12-02 14:29:55 PST
Created
attachment 328255
[details]
Patch
Darin Adler
Comment 19
2017-12-02 14:30:19 PST
New patch should fix tests compiling on Windows.
Darin Adler
Comment 20
2017-12-02 16:39:32 PST
Hooray, looks like EWS is all green now.
Darin Adler
Comment 21
2017-12-03 17:53:51 PST
Anyone willing to review?
Darin Adler
Comment 22
2017-12-06 20:06:59 PST
Committed
r225618
: <
https://trac.webkit.org/changeset/225618
>
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug