Bug 216253

Summary: Derive big5, jis0208 and jis0212 tables from ICU
Product: WebKit Reporter: Alex Christensen <achristensen>
Component: New BugsAssignee: Alex Christensen <achristensen>
Status: RESOLVED FIXED    
Severity: Normal CC: darin, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
Patch none

Alex Christensen
Reported 2020-09-07 15:06:02 PDT
Derive jis0208 and jis0212 tables from ICU
Attachments
Patch (303.78 KB, patch)
2020-09-07 15:07 PDT, Alex Christensen
no flags
Patch (977.24 KB, patch)
2020-09-07 23:36 PDT, Alex Christensen
no flags
Alex Christensen
Comment 1 2020-09-07 15:07:46 PDT
Darin Adler
Comment 2 2020-09-07 16:00:06 PDT
Comment on attachment 408198 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=408198&action=review r=me assuming all the tests pass > Source/WebCore/platform/text/EncodingTables.cpp:1064 > + array = new std::array<std::pair<uint16_t, UChar>, 7724>(); No need for the parens here. > Source/WebCore/platform/text/EncodingTables.cpp:1081 > + ucnv_toUnicode(icuConverter.get(), &output, output + 1, &input, input + sizeof(icuInput), nullptr, true, &error); Are these 8836 separate calls to ucnv_toUnicode fast enough for our purposes? > Source/WebCore/platform/text/EncodingTables.cpp:1887 > + ucnv_toUnicode(icuConverter.get(), &output, output + 1, &input, input + sizeof(icuInput), nullptr, true, &error); Ditto. > Source/WebCore/platform/text/EncodingTables.h:47 > -template<typename CollectionType> void sortByFirst(CollectionType&); > +template<typename CollectionType> void stableSortByFirst(CollectionType&); I suggest keeping both around because std::sort is more efficient than std::stable_sort when they keys are all unique. We would use sortByFirst in any cases where the keys are unique, and it could even do the sortedFirstsAreUnique assertion. Removing the assertion would be an nice cleanup to the call sites.
Alex Christensen
Comment 3 2020-09-07 23:34:45 PDT
Comment on attachment 408198 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=408198&action=review > Source/WebCore/platform/text/TextCodecCJK.cpp:632 > + for (auto* pair = range.first; pair < range.second; pair++) { It looks like I'll have to make pair auto instead of auto* because of MSVC's STL implementation.
Alex Christensen
Comment 4 2020-09-07 23:36:25 PDT
Alex Christensen
Comment 5 2020-09-07 23:47:33 PDT
I measured these time overheads in the complete table generation process: big5: 0.001075 seconds jis0208: 0.000405 seconds jis0212: 0.000417 seconds I also measured a 146240 byte decrease in WebCore's binary size with this change, which I think is worth it.
EWS
Comment 6 2020-09-08 10:51:09 PDT
Committed r266729: <https://trac.webkit.org/changeset/266729> All reviewed patches have been landed. Closing bug and clearing flags on attachment 408214 [details].
Radar WebKit Bug Importer
Comment 7 2020-09-08 10:52:19 PDT
Darin Adler
Comment 8 2020-09-08 10:53:11 PDT
Comment on attachment 408214 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=408214&action=review > Source/WebCore/platform/text/TextCodecCJK.cpp:126 > +using JIS0208EncodeIndex = std::array<std::pair<UChar, uint16_t>, sizeof(jis0208()) / sizeof(jis0208()[0])>; Why this sizeof math instead of std::size? > Source/WebCore/platform/text/TextCodecCJK.cpp:648 > +using EUCKREncodingIndex = std::array<std::pair<UChar, uint16_t>, sizeof(eucKR()) / sizeof(eucKR()[0])>; Why this sizeof math instead of std::size?
Darin Adler
Comment 9 2020-09-08 10:55:34 PDT
Comment on attachment 408214 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=408214&action=review >> Source/WebCore/platform/text/TextCodecCJK.cpp:126 >> +using JIS0208EncodeIndex = std::array<std::pair<UChar, uint16_t>, sizeof(jis0208()) / sizeof(jis0208()[0])>; > > Why this sizeof math instead of std::size? There must be some way to get the size based on the type rather than doing this math. I guess std::size is not it.
Note You need to log in before you can comment on or make changes to this bug.