Bug 163771

Summary: Implement TextDecoder and TextEncoder
Product: WebKit Reporter: Alex Christensen <achristensen>
Component: New BugsAssignee: Alex Christensen <achristensen>
Status: RESOLVED FIXED    
Severity: Normal CC: buildbot, commit-queue, fpizlo, gyuyoung.kim, jh718.park, keith_miller, rniwa, sam, webkit
Priority: P2    
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
Patch
none
Archive of layout-test-results from ews102 for mac-yosemite
none
Archive of layout-test-results from ews107 for mac-yosemite-wk2
none
Archive of layout-test-results from ews115 for mac-yosemite
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
sam: review+
source_changes
sam: review+, buildbot: commit-queue-
Archive of layout-test-results from ews103 for mac-yosemite
none
Archive of layout-test-results from ews104 for mac-yosemite-wk2
none
Archive of layout-test-results from ews115 for mac-yosemite
none
Archive of layout-test-results from ews125 for ios-simulator-wk2 none

Description Alex Christensen 2016-10-20 18:30:13 PDT
Implement TextDecoder and TextEncoder
Comment 1 Alex Christensen 2016-10-20 18:37:58 PDT
Created attachment 292305 [details]
Patch
Comment 2 Alex Christensen 2016-10-20 18:43:23 PDT
Created attachment 292306 [details]
Patch
Comment 3 Build Bot 2016-10-20 19:42:01 PDT
Comment on attachment 292306 [details]
Patch

Attachment 292306 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.webkit.org/results/2335040

New failing tests:
js/dom/global-constructors-attributes-dedicated-worker.html
Comment 4 Build Bot 2016-10-20 19:42:03 PDT
Created attachment 292313 [details]
Archive of layout-test-results from ews102 for mac-yosemite

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews102  Port: mac-yosemite  Platform: Mac OS X 10.10.5
Comment 5 Build Bot 2016-10-20 19:50:11 PDT
Comment on attachment 292306 [details]
Patch

Attachment 292306 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.webkit.org/results/2335083

New failing tests:
js/dom/global-constructors-attributes.html
js/dom/global-constructors-attributes-dedicated-worker.html
Comment 6 Build Bot 2016-10-20 19:50:13 PDT
Created attachment 292314 [details]
Archive of layout-test-results from ews107 for mac-yosemite-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews107  Port: mac-yosemite-wk2  Platform: Mac OS X 10.10.5
Comment 7 Build Bot 2016-10-20 19:53:00 PDT
Comment on attachment 292306 [details]
Patch

Attachment 292306 [details] did not pass mac-debug-ews (mac):
Output: http://webkit-queues.webkit.org/results/2335060

New failing tests:
js/dom/global-constructors-attributes-dedicated-worker.html
Comment 8 Build Bot 2016-10-20 19:53:02 PDT
Created attachment 292315 [details]
Archive of layout-test-results from ews115 for mac-yosemite

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews115  Port: mac-yosemite  Platform: Mac OS X 10.10.5
Comment 9 Alex Christensen 2016-10-20 20:05:03 PDT
Created attachment 292318 [details]
Patch
Comment 10 Alex Christensen 2016-10-20 20:31:46 PDT
Comment on attachment 292318 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=292318&action=review

> LayoutTests/imported/w3c/web-platform-tests/encoding/textdecoder-labels-expected.txt:3170
> +FAIL "cseuckr" => "euc-kr" assert_equals: label for encoding should match expected "euc-kr" but got "windows-949"

These failures can only be fixed by changing TextCodecICU.cpp which I don't want to touch in this patch.

> LayoutTests/imported/w3c/web-platform-tests/encoding/textdecoder-utf16-surrogates-expected.txt:5
> +FAIL utf-16le - lone surrogate trail assert_equals: expected "�" but got "í°Â"

These failures are related to the FIXME comment in TextCodecUTF16::decode which I don't want to touch in this patch.
Comment 11 Alex Christensen 2016-10-20 20:59:33 PDT
Created attachment 292319 [details]
Patch
Comment 12 Alex Christensen 2016-10-20 23:34:44 PDT
Created attachment 292323 [details]
Patch
Comment 13 Alex Christensen 2016-10-21 00:45:18 PDT
Could somebody test the efl build?  I don't know why it fails.
Comment 14 Alex Christensen 2016-10-21 09:48:48 PDT
Why are there linker failures with typed arrays?  This is super confusing
Comment 15 Sam Weinig 2016-10-21 10:42:59 PDT
Comment on attachment 292323 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=292323&action=review

> Source/WebCore/dom/TextDecoder.idl:39
> +[Constructor(optional DOMString label = "utf-8", optional TextDecoderOptions options),
> +Exposed=(Window,Worker),
> +ImplementationLacksVTable,
> +ConstructorMayThrowException]
> +interface TextDecoder {

This is weird formatting for extended attributes.  We usually do something like:

[
    Foo,
    Bar
] interface TextDecoder {

> Source/WebCore/dom/TextEncoder.idl:29
> +[Constructor,
> +Exposed=(Window,Worker),
> +ImplementationLacksVTable]
> +interface TextEncoder {

Formatting.
Comment 16 Darin Adler 2016-10-21 11:59:04 PDT
Comment on attachment 292323 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=292323&action=review

> Source/WebCore/dom/TextDecoder.cpp:39
> +static bool isEncodingWhitespace(UChar c)
> +{
> +    return c == 0x09
> +        || c == 0x0A
> +        || c == 0x0C
> +        || c == 0x0D
> +        || c == 0x20;
> +}

This function does the same thing as isHTMLSpace, but less efficiently. I suggest either using isHTMLSpace directly, or having this function be an inline that calls isHTMLSpace.

> Source/WebCore/dom/TextDecoder.cpp:41
> +ExceptionOr<Ref<TextDecoder>> TextDecoder::create(String label, Options options)

This function should take a const String&, not a String.

> Source/WebCore/dom/TextDecoder.cpp:43
> +    String strippedLabel = label.stripWhiteSpace(isEncodingWhitespace);

This should use the much more efficient stripLeadingAndTrailingHTMLSpaces function rather than the one where we pass a function in.

> Source/WebCore/dom/TextDecoder.cpp:46
> +    if (strippedLabel != utf8Label.data())
> +        return Exception { RangeError };

What is the point of this? I think this is just a slow way to check wither the label has any non-ASCII characters in it. We have faster ways to check that.

> Source/WebCore/dom/TextDecoder.cpp:48
> +    Ref<TextDecoder> decoder = adoptRef(*new TextDecoder(utf8Label, options));

Should use auto here, I think.

> Source/WebCore/dom/TextDecoder.cpp:54
> +TextDecoder::TextDecoder(CString label, Options options)

This should take const char*, not CString.

> Source/WebCore/dom/TextDecoder.cpp:93
> +    const UChar utf16BEBOM[2] = {0xFEFF, '\0'};
> +    return makeString(utf16BEBOM, decoded);

This is so expensive, requires copying the entire string to prepend this single character. Is there another way to accomplish this? Can we put this into the buffer beforehand or something?

> Source/WebCore/dom/TextDecoder.cpp:136
> +            m_hasDecoded = true;
> +            return WTFMove(result);

Should just remove these two lines of code. The function will do the same thing without them.

> Source/WebCore/dom/TextEncoder.cpp:38
> +RefPtr<Uint8Array> TextEncoder::encode(String&& input) const

This should return Ref, not RefPtr.

> Source/WebCore/dom/TextEncoder.cpp:41
> +        return Uint8Array::create(input.characters8(), input.length());

This is wrong if any characters are non-ASCII. The Latin-1 characters won’t be converted to UTF-8 and so the encoding will be done incorrectly. Need test cases to cover this.

> Source/WebCore/dom/TextEncoder.cpp:43
> +    CString utf8 = input.utf8();

Using this function means copying the UTF-8 bytes, rather than decoding them directly into a Uint8Array buffer. We have the functions that could do this more efficiently or we can refactor to make sure we don’t need to alllocate an extra buffer and I would like to consider doing that instead.

The utf8() function can fail, returning an empty string, when the input text is not valid UTF-16. Is it OK that we create an empty array in that case? Do we have test cases covering that?

> Source/WebCore/dom/TextEncoder.h:35
> +class TextEncoder : public RefCounted<TextEncoder> {

The class is named TextEncoder, but it always encodes as UTF-8?
Comment 17 Darin Adler 2016-10-21 11:59:48 PDT
Comment on attachment 292323 [details]
Patch

review- because of the Latin-1/UTF-8 mistake, which is a showstopper

Also, there seems to be insufficient test coverage in the tests we imported for this.
Comment 18 Alex Christensen 2016-11-16 17:33:08 PST
Created attachment 295003 [details]
Patch
Comment 19 Sam Weinig 2016-11-16 23:04:57 PST
Please make sure to update the features.json file with this change.
Comment 20 Sam Weinig 2016-11-16 23:05:36 PST
Also, might be worth splitting out the patch from the tests for review, as 2.4 MB is a bit big.
Comment 21 Alex Christensen 2016-11-17 10:29:56 PST
Created attachment 295053 [details]
Patch
Comment 22 Alex Christensen 2016-11-17 12:06:32 PST
Created attachment 295066 [details]
Patch
Comment 23 Alex Christensen 2016-11-17 15:05:25 PST
Created attachment 295092 [details]
source_changes
Comment 24 Sam Weinig 2016-11-17 15:24:21 PST
Comment on attachment 295092 [details]
source_changes

View in context: https://bugs.webkit.org/attachment.cgi?id=295092&action=review

> Source/WebCore/dom/TextEncoder.cpp:43
> +    // FIXME: We should not need to allocate a CString to encode into a Uint8Array.
> +    CString utf8 = input.utf8();
> +    return Uint8Array::create(reinterpret_cast<const uint8_t*>(utf8.data()), utf8.length());

While I agree with your comment, is there no way to have Uint8Array adopt the internal buffer of the c-string, which can relinquish control?  Copying here seems really unfortunate.

> Source/WebCore/dom/TextEncoder.h:39
> +    RefPtr<Uint8Array> encode(String&&) const;

This can return a Ref.

> Source/WebCore/dom/TextEncoder.h:41
> +    TextEncoder() { };

Can this be = default?

> Source/WebCore/dom/TextEncoder.idl:34
> +
> +    // FIXME: This should have [NewObject]
> +    Uint8Array encode(optional USVString input = "");

Why not add it? What happens?
Comment 25 Build Bot 2016-11-17 16:04:19 PST
Comment on attachment 295092 [details]
source_changes

Attachment 295092 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.webkit.org/results/2533704

New failing tests:
imported/w3c/web-platform-tests/encoding/textencoder-constructor-non-utf.html
imported/w3c/web-platform-tests/encoding/textdecoder-streaming.html
imported/w3c/web-platform-tests/encoding/textdecoder-fatal-single-byte.html
imported/w3c/web-platform-tests/encoding/api-replacement-encodings.html
imported/w3c/web-platform-tests/encoding/idlharness.html
imported/w3c/web-platform-tests/encoding/textencoder-utf16-surrogates.html
imported/w3c/web-platform-tests/encoding/api-basics.html
imported/w3c/web-platform-tests/encoding/iso-2022-jp-decoder.html
js/dom/global-constructors-attributes-dedicated-worker.html
imported/w3c/web-platform-tests/encoding/api-surrogates-utf8.html
imported/w3c/web-platform-tests/encoding/textdecoder-utf16-surrogates.html
imported/w3c/web-platform-tests/encoding/textdecoder-ignorebom.html
imported/w3c/web-platform-tests/encoding/textdecoder-fatal-streaming.html
imported/w3c/web-platform-tests/encoding/textdecoder-fatal.html
imported/w3c/web-platform-tests/encoding/textdecoder-labels.html
imported/w3c/web-platform-tests/encoding/api-invalid-label.html
imported/w3c/web-platform-tests/encoding/single-byte-decoder.html
imported/w3c/web-platform-tests/encoding/textdecoder-byte-order-marks.html
Comment 26 Build Bot 2016-11-17 16:04:24 PST
Created attachment 295102 [details]
Archive of layout-test-results from ews103 for mac-yosemite

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews103  Port: mac-yosemite  Platform: Mac OS X 10.10.5
Comment 27 Build Bot 2016-11-17 16:07:54 PST
Comment on attachment 295092 [details]
source_changes

Attachment 295092 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.webkit.org/results/2533712

New failing tests:
imported/w3c/web-platform-tests/encoding/textencoder-constructor-non-utf.html
imported/w3c/web-platform-tests/encoding/textdecoder-streaming.html
imported/w3c/web-platform-tests/encoding/textdecoder-fatal-single-byte.html
imported/w3c/web-platform-tests/encoding/api-replacement-encodings.html
imported/w3c/web-platform-tests/encoding/idlharness.html
imported/w3c/web-platform-tests/encoding/textencoder-utf16-surrogates.html
imported/w3c/web-platform-tests/encoding/api-basics.html
imported/w3c/web-platform-tests/encoding/iso-2022-jp-decoder.html
js/dom/global-constructors-attributes-dedicated-worker.html
imported/w3c/web-platform-tests/encoding/api-surrogates-utf8.html
imported/w3c/web-platform-tests/encoding/textdecoder-utf16-surrogates.html
imported/w3c/web-platform-tests/encoding/textdecoder-ignorebom.html
imported/w3c/web-platform-tests/encoding/textdecoder-fatal-streaming.html
imported/w3c/web-platform-tests/encoding/textdecoder-fatal.html
imported/w3c/web-platform-tests/encoding/textdecoder-labels.html
imported/w3c/web-platform-tests/encoding/api-invalid-label.html
imported/w3c/web-platform-tests/encoding/single-byte-decoder.html
imported/w3c/web-platform-tests/encoding/textdecoder-byte-order-marks.html
js/dom/global-constructors-attributes.html
Comment 28 Build Bot 2016-11-17 16:07:59 PST
Created attachment 295103 [details]
Archive of layout-test-results from ews104 for mac-yosemite-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews104  Port: mac-yosemite-wk2  Platform: Mac OS X 10.10.5
Comment 29 Build Bot 2016-11-17 16:20:29 PST
Comment on attachment 295092 [details]
source_changes

Attachment 295092 [details] did not pass mac-debug-ews (mac):
Output: http://webkit-queues.webkit.org/results/2533764

New failing tests:
imported/w3c/web-platform-tests/encoding/textencoder-constructor-non-utf.html
imported/w3c/web-platform-tests/encoding/textdecoder-streaming.html
imported/w3c/web-platform-tests/encoding/textdecoder-fatal-single-byte.html
imported/w3c/web-platform-tests/encoding/api-replacement-encodings.html
imported/w3c/web-platform-tests/encoding/idlharness.html
imported/w3c/web-platform-tests/encoding/textencoder-utf16-surrogates.html
imported/w3c/web-platform-tests/encoding/api-basics.html
imported/w3c/web-platform-tests/encoding/iso-2022-jp-decoder.html
js/dom/global-constructors-attributes-dedicated-worker.html
imported/w3c/web-platform-tests/encoding/api-surrogates-utf8.html
imported/w3c/web-platform-tests/encoding/textdecoder-utf16-surrogates.html
imported/w3c/web-platform-tests/encoding/textdecoder-ignorebom.html
imported/w3c/web-platform-tests/encoding/textdecoder-fatal-streaming.html
imported/w3c/web-platform-tests/encoding/textdecoder-fatal.html
imported/w3c/web-platform-tests/encoding/textdecoder-labels.html
imported/w3c/web-platform-tests/encoding/api-invalid-label.html
imported/w3c/web-platform-tests/encoding/single-byte-decoder.html
imported/w3c/web-platform-tests/encoding/textdecoder-byte-order-marks.html
Comment 30 Build Bot 2016-11-17 16:20:34 PST
Created attachment 295104 [details]
Archive of layout-test-results from ews115 for mac-yosemite

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews115  Port: mac-yosemite  Platform: Mac OS X 10.10.5
Comment 31 Build Bot 2016-11-17 16:27:07 PST
Comment on attachment 295092 [details]
source_changes

Attachment 295092 [details] did not pass ios-sim-ews (ios-simulator-wk2):
Output: http://webkit-queues.webkit.org/results/2533771

New failing tests:
imported/w3c/web-platform-tests/encoding/textencoder-constructor-non-utf.html
imported/w3c/web-platform-tests/encoding/textdecoder-streaming.html
imported/w3c/web-platform-tests/encoding/textdecoder-fatal-single-byte.html
imported/w3c/web-platform-tests/encoding/api-replacement-encodings.html
imported/w3c/web-platform-tests/encoding/idlharness.html
imported/w3c/web-platform-tests/encoding/textencoder-utf16-surrogates.html
imported/w3c/web-platform-tests/encoding/api-basics.html
imported/w3c/web-platform-tests/encoding/iso-2022-jp-decoder.html
js/dom/global-constructors-attributes-dedicated-worker.html
imported/w3c/web-platform-tests/encoding/api-surrogates-utf8.html
imported/w3c/web-platform-tests/encoding/textdecoder-utf16-surrogates.html
imported/w3c/web-platform-tests/encoding/textdecoder-ignorebom.html
imported/w3c/web-platform-tests/encoding/textdecoder-fatal-streaming.html
imported/w3c/web-platform-tests/encoding/textdecoder-fatal.html
imported/w3c/web-platform-tests/encoding/textdecoder-labels.html
imported/w3c/web-platform-tests/encoding/api-invalid-label.html
imported/w3c/web-platform-tests/encoding/single-byte-decoder.html
imported/w3c/web-platform-tests/encoding/textdecoder-byte-order-marks.html
Comment 32 Build Bot 2016-11-17 16:27:12 PST
Created attachment 295106 [details]
Archive of layout-test-results from ews125 for ios-simulator-wk2

The attached test failures were seen while running run-webkit-tests on the ios-sim-ews.
Bot: ews125  Port: ios-simulator-wk2  Platform: Mac OS X 10.11.6
Comment 33 Alex Christensen 2016-11-17 18:11:25 PST
http://trac.webkit.org/changeset/208872
Comment 34 Chris Rebert 2016-12-06 22:14:14 PST
*** Bug 160653 has been marked as a duplicate of this bug. ***