RESOLVED FIXED 216253
Derive big5, jis0208 and jis0212 tables from ICU
https://bugs.webkit.org/show_bug.cgi?id=216253
Summary Derive big5, jis0208 and jis0212 tables from ICU
Alex Christensen
Reported 2020-09-07 15:06:02 PDT
Derive jis0208 and jis0212 tables from ICU
Attachments
Patch (303.78 KB, patch)
2020-09-07 15:07 PDT, Alex Christensen
no flags
Patch (977.24 KB, patch)
2020-09-07 23:36 PDT, Alex Christensen
no flags
Alex Christensen
Comment 1 2020-09-07 15:07:46 PDT
Darin Adler
Comment 2 2020-09-07 16:00:06 PDT
Comment on attachment 408198 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=408198&action=review r=me assuming all the tests pass > Source/WebCore/platform/text/EncodingTables.cpp:1064 > + array = new std::array<std::pair<uint16_t, UChar>, 7724>(); No need for the parens here. > Source/WebCore/platform/text/EncodingTables.cpp:1081 > + ucnv_toUnicode(icuConverter.get(), &output, output + 1, &input, input + sizeof(icuInput), nullptr, true, &error); Are these 8836 separate calls to ucnv_toUnicode fast enough for our purposes? > Source/WebCore/platform/text/EncodingTables.cpp:1887 > + ucnv_toUnicode(icuConverter.get(), &output, output + 1, &input, input + sizeof(icuInput), nullptr, true, &error); Ditto. > Source/WebCore/platform/text/EncodingTables.h:47 > -template<typename CollectionType> void sortByFirst(CollectionType&); > +template<typename CollectionType> void stableSortByFirst(CollectionType&); I suggest keeping both around because std::sort is more efficient than std::stable_sort when they keys are all unique. We would use sortByFirst in any cases where the keys are unique, and it could even do the sortedFirstsAreUnique assertion. Removing the assertion would be an nice cleanup to the call sites.
Alex Christensen
Comment 3 2020-09-07 23:34:45 PDT
Comment on attachment 408198 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=408198&action=review > Source/WebCore/platform/text/TextCodecCJK.cpp:632 > + for (auto* pair = range.first; pair < range.second; pair++) { It looks like I'll have to make pair auto instead of auto* because of MSVC's STL implementation.
Alex Christensen
Comment 4 2020-09-07 23:36:25 PDT
Alex Christensen
Comment 5 2020-09-07 23:47:33 PDT
I measured these time overheads in the complete table generation process: big5: 0.001075 seconds jis0208: 0.000405 seconds jis0212: 0.000417 seconds I also measured a 146240 byte decrease in WebCore's binary size with this change, which I think is worth it.
EWS
Comment 6 2020-09-08 10:51:09 PDT
Committed r266729: <https://trac.webkit.org/changeset/266729> All reviewed patches have been landed. Closing bug and clearing flags on attachment 408214 [details].
Radar WebKit Bug Importer
Comment 7 2020-09-08 10:52:19 PDT
Darin Adler
Comment 8 2020-09-08 10:53:11 PDT
Comment on attachment 408214 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=408214&action=review > Source/WebCore/platform/text/TextCodecCJK.cpp:126 > +using JIS0208EncodeIndex = std::array<std::pair<UChar, uint16_t>, sizeof(jis0208()) / sizeof(jis0208()[0])>; Why this sizeof math instead of std::size? > Source/WebCore/platform/text/TextCodecCJK.cpp:648 > +using EUCKREncodingIndex = std::array<std::pair<UChar, uint16_t>, sizeof(eucKR()) / sizeof(eucKR()[0])>; Why this sizeof math instead of std::size?
Darin Adler
Comment 9 2020-09-08 10:55:34 PDT
Comment on attachment 408214 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=408214&action=review >> Source/WebCore/platform/text/TextCodecCJK.cpp:126 >> +using JIS0208EncodeIndex = std::array<std::pair<UChar, uint16_t>, sizeof(jis0208()) / sizeof(jis0208()[0])>; > > Why this sizeof math instead of std::size? There must be some way to get the size based on the type rather than doing this math. I guess std::size is not it.
Note You need to log in before you can comment on or make changes to this bug.