Bug 216253 - Derive big5, jis0208 and jis0212 tables from ICU
Summary: Derive big5, jis0208 and jis0212 tables from ICU
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Alex Christensen
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2020-09-07 15:06 PDT by Alex Christensen
Modified: 2020-09-08 10:55 PDT (History)
2 users (show)

See Also:


Attachments
Patch (303.78 KB, patch)
2020-09-07 15:07 PDT, Alex Christensen
no flags Details | Formatted Diff | Diff
Patch (977.24 KB, patch)
2020-09-07 23:36 PDT, Alex Christensen
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Alex Christensen 2020-09-07 15:06:02 PDT
Derive jis0208 and jis0212 tables from ICU
Comment 1 Alex Christensen 2020-09-07 15:07:46 PDT
Created attachment 408198 [details]
Patch
Comment 2 Darin Adler 2020-09-07 16:00:06 PDT
Comment on attachment 408198 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=408198&action=review

r=me assuming all the tests pass

> Source/WebCore/platform/text/EncodingTables.cpp:1064
> +        array = new std::array<std::pair<uint16_t, UChar>, 7724>();

No need for the parens here.

> Source/WebCore/platform/text/EncodingTables.cpp:1081
> +                ucnv_toUnicode(icuConverter.get(), &output, output + 1, &input, input + sizeof(icuInput), nullptr, true, &error);

Are these 8836 separate calls to ucnv_toUnicode fast enough for our purposes?

> Source/WebCore/platform/text/EncodingTables.cpp:1887
> +                ucnv_toUnicode(icuConverter.get(), &output, output + 1, &input, input + sizeof(icuInput), nullptr, true, &error);

Ditto.

> Source/WebCore/platform/text/EncodingTables.h:47
> -template<typename CollectionType> void sortByFirst(CollectionType&);
> +template<typename CollectionType> void stableSortByFirst(CollectionType&);

I suggest keeping both around because std::sort is more efficient than std::stable_sort when they keys are all unique.

We would use sortByFirst in any cases where the keys are unique, and it could even do the sortedFirstsAreUnique assertion. Removing the assertion would be an nice cleanup to the call sites.
Comment 3 Alex Christensen 2020-09-07 23:34:45 PDT
Comment on attachment 408198 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=408198&action=review

> Source/WebCore/platform/text/TextCodecCJK.cpp:632
> +        for (auto* pair = range.first; pair < range.second; pair++) {

It looks like I'll have to make pair auto instead of auto* because of MSVC's STL implementation.
Comment 4 Alex Christensen 2020-09-07 23:36:25 PDT
Created attachment 408214 [details]
Patch
Comment 5 Alex Christensen 2020-09-07 23:47:33 PDT
I measured these time overheads in the complete table generation process:

big5: 0.001075 seconds
jis0208: 0.000405 seconds
jis0212: 0.000417 seconds

I also measured a 146240 byte decrease in WebCore's binary size with this change, which I think is worth it.
Comment 6 EWS 2020-09-08 10:51:09 PDT
Committed r266729: <https://trac.webkit.org/changeset/266729>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 408214 [details].
Comment 7 Radar WebKit Bug Importer 2020-09-08 10:52:19 PDT
<rdar://problem/68517504>
Comment 8 Darin Adler 2020-09-08 10:53:11 PDT
Comment on attachment 408214 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=408214&action=review

> Source/WebCore/platform/text/TextCodecCJK.cpp:126
> +using JIS0208EncodeIndex = std::array<std::pair<UChar, uint16_t>, sizeof(jis0208()) / sizeof(jis0208()[0])>;

Why this sizeof math instead of std::size?

> Source/WebCore/platform/text/TextCodecCJK.cpp:648
> +using EUCKREncodingIndex = std::array<std::pair<UChar, uint16_t>, sizeof(eucKR()) / sizeof(eucKR()[0])>;

Why this sizeof math instead of std::size?
Comment 9 Darin Adler 2020-09-08 10:55:34 PDT
Comment on attachment 408214 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=408214&action=review

>> Source/WebCore/platform/text/TextCodecCJK.cpp:126
>> +using JIS0208EncodeIndex = std::array<std::pair<UChar, uint16_t>, sizeof(jis0208()) / sizeof(jis0208()[0])>;
> 
> Why this sizeof math instead of std::size?

There must be some way to get the size based on the type rather than doing this math. I guess std::size is not it.