Bug 216253

Summary: Derive big5, jis0208 and jis0212 tables from ICU
Product: WebKit Reporter: Alex Christensen <achristensen>
Component: New BugsAssignee: Alex Christensen <achristensen>
Status: RESOLVED FIXED    
Severity: Normal CC: darin, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
Patch none

Description Alex Christensen 2020-09-07 15:06:02 PDT
Derive jis0208 and jis0212 tables from ICU
Comment 1 Alex Christensen 2020-09-07 15:07:46 PDT
Created attachment 408198 [details]
Patch
Comment 2 Darin Adler 2020-09-07 16:00:06 PDT
Comment on attachment 408198 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=408198&action=review

r=me assuming all the tests pass

> Source/WebCore/platform/text/EncodingTables.cpp:1064
> +        array = new std::array<std::pair<uint16_t, UChar>, 7724>();

No need for the parens here.

> Source/WebCore/platform/text/EncodingTables.cpp:1081
> +                ucnv_toUnicode(icuConverter.get(), &output, output + 1, &input, input + sizeof(icuInput), nullptr, true, &error);

Are these 8836 separate calls to ucnv_toUnicode fast enough for our purposes?

> Source/WebCore/platform/text/EncodingTables.cpp:1887
> +                ucnv_toUnicode(icuConverter.get(), &output, output + 1, &input, input + sizeof(icuInput), nullptr, true, &error);

Ditto.

> Source/WebCore/platform/text/EncodingTables.h:47
> -template<typename CollectionType> void sortByFirst(CollectionType&);
> +template<typename CollectionType> void stableSortByFirst(CollectionType&);

I suggest keeping both around because std::sort is more efficient than std::stable_sort when they keys are all unique.

We would use sortByFirst in any cases where the keys are unique, and it could even do the sortedFirstsAreUnique assertion. Removing the assertion would be an nice cleanup to the call sites.
Comment 3 Alex Christensen 2020-09-07 23:34:45 PDT
Comment on attachment 408198 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=408198&action=review

> Source/WebCore/platform/text/TextCodecCJK.cpp:632
> +        for (auto* pair = range.first; pair < range.second; pair++) {

It looks like I'll have to make pair auto instead of auto* because of MSVC's STL implementation.
Comment 4 Alex Christensen 2020-09-07 23:36:25 PDT
Created attachment 408214 [details]
Patch
Comment 5 Alex Christensen 2020-09-07 23:47:33 PDT
I measured these time overheads in the complete table generation process:

big5: 0.001075 seconds
jis0208: 0.000405 seconds
jis0212: 0.000417 seconds

I also measured a 146240 byte decrease in WebCore's binary size with this change, which I think is worth it.
Comment 6 EWS 2020-09-08 10:51:09 PDT
Committed r266729: <https://trac.webkit.org/changeset/266729>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 408214 [details].
Comment 7 Radar WebKit Bug Importer 2020-09-08 10:52:19 PDT
<rdar://problem/68517504>
Comment 8 Darin Adler 2020-09-08 10:53:11 PDT
Comment on attachment 408214 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=408214&action=review

> Source/WebCore/platform/text/TextCodecCJK.cpp:126
> +using JIS0208EncodeIndex = std::array<std::pair<UChar, uint16_t>, sizeof(jis0208()) / sizeof(jis0208()[0])>;

Why this sizeof math instead of std::size?

> Source/WebCore/platform/text/TextCodecCJK.cpp:648
> +using EUCKREncodingIndex = std::array<std::pair<UChar, uint16_t>, sizeof(eucKR()) / sizeof(eucKR()[0])>;

Why this sizeof math instead of std::size?
Comment 9 Darin Adler 2020-09-08 10:55:34 PDT
Comment on attachment 408214 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=408214&action=review

>> Source/WebCore/platform/text/TextCodecCJK.cpp:126
>> +using JIS0208EncodeIndex = std::array<std::pair<UChar, uint16_t>, sizeof(jis0208()) / sizeof(jis0208()[0])>;
> 
> Why this sizeof math instead of std::size?

There must be some way to get the size based on the type rather than doing this math. I guess std::size is not it.