Bug 216219 - TextCodec refinements
Summary: TextCodec refinements
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Text (show other bugs)
Version: WebKit Nightly Build
Hardware: All All
: P2 Normal
Assignee: Darin Adler
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2020-09-05 14:44 PDT by Darin Adler
Modified: 2020-09-08 12:45 PDT (History)
16 users (show)

See Also:


Attachments
Patch (60.25 KB, patch)
2020-09-05 15:15 PDT, Darin Adler
no flags Details | Formatted Diff | Diff
Patch (62.84 KB, patch)
2020-09-05 15:30 PDT, Darin Adler
no flags Details | Formatted Diff | Diff
Patch (69.44 KB, patch)
2020-09-05 18:20 PDT, Darin Adler
no flags Details | Formatted Diff | Diff
Patch (69.92 KB, patch)
2020-09-05 22:06 PDT, Darin Adler
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Darin Adler 2020-09-05 14:44:42 PDT
TextCodec refinements
Comment 1 Darin Adler 2020-09-05 15:15:20 PDT Comment hidden (obsolete)
Comment 2 Darin Adler 2020-09-05 15:30:39 PDT Comment hidden (obsolete)
Comment 3 Darin Adler 2020-09-05 18:20:01 PDT Comment hidden (obsolete)
Comment 4 Darin Adler 2020-09-05 18:34:24 PDT
Earlier versions of the patch had the BOM logic exactly backwards! Should be all better now.
Comment 5 Darin Adler 2020-09-05 22:06:03 PDT
Created attachment 408115 [details]
Patch
Comment 6 Darin Adler 2020-09-05 22:16:39 PDT
Comment on attachment 408115 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=408115&action=review

> Source/WebCore/platform/text/TextCodecCJK.cpp:195
> +    auto range = std::equal_range(jis0212, jis0212 + std::size(jis0212), std::pair<uint16_t, UChar>(pointer, 0), [](const auto& a, const auto& b) {

Should use std::begin/end here instead of std::size.
Comment 7 Darin Adler 2020-09-06 10:30:50 PDT
All green EWS now, so I guess it’s time for review.
Comment 8 Sam Weinig 2020-09-06 11:13:03 PDT
Comment on attachment 408115 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=408115&action=review

> Source/WebCore/platform/text/TextCodecCJK.cpp:542
> +            static constexpr std::array<UChar32, 63> iso2022JPKatakana {

Annoying that you have to provide the 63 here, but it looks like that is necessary if you want to use UChar32 explicitly, otherwise, you could do 

`static constexpr std::array iso2022JPKatakana { 0x3002, 0x300C, 0x300D ... }

and it would deduce the size for you.

> Source/WebCore/platform/text/TextCodecCJK.cpp:690
> +        auto table = new EUCKREncodingIndex;
> +        for (size_t i = 0; i < std::size(eucKRDecodingIndex); i++)
>              (*table)[i] = { eucKRDecodingIndex[i].second, eucKRDecodingIndex[i].first };

One of these days I will get around to adding a zip iterator to make this a bit nicer.
Comment 9 Darin Adler 2020-09-06 11:56:40 PDT
Comment on attachment 408115 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=408115&action=review

>> Source/WebCore/platform/text/TextCodecCJK.cpp:542
>> +            static constexpr std::array<UChar32, 63> iso2022JPKatakana {
> 
> Annoying that you have to provide the 63 here, but it looks like that is necessary if you want to use UChar32 explicitly, otherwise, you could do 
> 
> `static constexpr std::array iso2022JPKatakana { 0x3002, 0x300C, 0x300D ... }
> 
> and it would deduce the size for you.

I suspect that instead we could use a standard C array instead of a std::array here. Or maybe we should write this:

    std::array<UChar32, (0xFF9F - 0xFF61) + 1> ...

Or this:

    static_assert(std::size(iso2022JPKatakana) == (0xFF9F - 0xFF61) + 1);
Comment 10 EWS 2020-09-06 12:10:16 PDT
Committed r266681: <https://trac.webkit.org/changeset/266681>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 408115 [details].
Comment 11 Radar WebKit Bug Importer 2020-09-06 12:11:15 PDT
<rdar://problem/68431646>