RESOLVED FIXED 216016
Align Big5 decoding with spec, Chrome, and Firefox
https://bugs.webkit.org/show_bug.cgi?id=216016
Summary Align Big5 decoding with spec, Chrome, and Firefox
Alex Christensen
Reported 2020-08-31 16:10:36 PDT
Align Big5 decoding with spec, Chrome, and Firefox
Attachments
Patch (1.09 MB, patch)
2020-08-31 16:13 PDT, Alex Christensen
no flags
Patch (1.09 MB, patch)
2020-08-31 18:11 PDT, Alex Christensen
no flags
Patch (1.09 MB, patch)
2020-08-31 18:17 PDT, Alex Christensen
no flags
Patch (1.09 MB, patch)
2020-08-31 20:46 PDT, Alex Christensen
no flags
Alex Christensen
Comment 1 2020-08-31 16:13:57 PDT
Darin Adler
Comment 2 2020-08-31 16:42:07 PDT
Comment on attachment 407633 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=407633&action=review > Source/WebCore/platform/text/EncodingTables.h:32 > +constexpr size_t jis0208Size { 7724 }; This should not be needed. There’s std::begin, std::end, std::size, and range-based for loops that all work with standard C arrays without requiring a separate constant. > Source/WebCore/platform/text/EncodingTables.h:35 > +constexpr size_t big5EncodingMapSize { 14686 }; Ditto. > Source/WebCore/platform/text/EncodingTables.h:37 > +constexpr size_t big5DecodingExtrasSize { 3904 }; Ditto. > Source/WebCore/platform/text/TextCodecCJK.cpp:218 > + static std::array<std::pair<uint16_t, UChar32>, big5DecodingExtrasSize + big5EncodingMapSize>* table { nullptr }; > + static std::once_flag flag; > + std::call_once(flag, [&] { call_once should not be required. Should just be able to initialize it like this: static std::array<std::pair<uint16_t, UChar32>, big5DecodingExtrasSize + big5EncodingMapSize>* table = []{ But also, why is this on the heap? > Source/WebCore/platform/text/TextCodecCJK.cpp:228 > + for (size_t i = 0; i < big5DecodingExtrasSize; i++) > + (*table)[i] = big5DecodingExtras[i]; > + for (size_t i = 0; i < big5EncodingMapSize; i++) { > + (*table)[i + big5DecodingExtrasSize].first = big5EncodingMap[i].second; > + (*table)[i + big5DecodingExtrasSize].second = big5EncodingMap[i].first; > + } > + std::sort(table->begin(), table->end(), [] (auto& a, auto& b) { > + return a.first < b.first; > + }); Can we do this at compile time instead of runtime? > Source/WebCore/platform/text/TextCodecCJK.h:39 > + enum class Encoding : uint8_t { > + EUC_JP, > + Big5 > + }; > + > + explicit TextCodecCJK(Encoding); Does this need to be public? Maybe so makeUnique can see it? In concept this is private. > Source/WebCore/platform/text/TextCodecCJK.h:40 > + virtual ~TextCodecCJK() = default; I suggest just omitting this; compiler will generate exactly this without you having to write anything. > Source/WebCore/platform/text/TextCodecCJK.h:56 > + std::unique_ptr<TextCodec> m_icuCodec; Wait, what? Is this permanent for EUC-JP or just temporary?
Alex Christensen
Comment 3 2020-08-31 16:46:26 PDT
(In reply to Darin Adler from comment #2) > But also, why is this on the heap? I wanted the memory use to be 0 for non-Big5 pages. > > Source/WebCore/platform/text/TextCodecCJK.cpp:228 > > + for (size_t i = 0; i < big5DecodingExtrasSize; i++) > > + (*table)[i] = big5DecodingExtras[i]; > > + for (size_t i = 0; i < big5EncodingMapSize; i++) { > > + (*table)[i + big5DecodingExtrasSize].first = big5EncodingMap[i].second; > > + (*table)[i + big5DecodingExtrasSize].second = big5EncodingMap[i].first; > > + } > > + std::sort(table->begin(), table->end(), [] (auto& a, auto& b) { > > + return a.first < b.first; > > + }); > > Can we do this at compile time instead of runtime? That would increase the binary size of WebCore by over 100kb. > Wait, what? Is this permanent for EUC-JP or just temporary? euc-jp decoding is quite close to correct. We may be able to just use it as it is now.
Alex Christensen
Comment 4 2020-08-31 18:11:55 PDT
Alex Christensen
Comment 5 2020-08-31 18:17:39 PDT
Alex Christensen
Comment 6 2020-08-31 20:46:08 PDT
Comment on attachment 407644 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=407644&action=review > Source/WebCore/platform/text/TextCodecCJK.cpp:48 The case of the first alias needs to match the case of the name or it will assert because the assertion uses strcmp but the HashTable uses ascii-case-insensitive hashing. No problem. I'll match cases.
Alex Christensen
Comment 7 2020-08-31 20:46:49 PDT
EWS
Comment 8 2020-09-01 09:02:17 PDT
Committed r266397: <https://trac.webkit.org/changeset/266397> All reviewed patches have been landed. Closing bug and clearing flags on attachment 407654 [details].
Radar WebKit Bug Importer
Comment 9 2020-09-01 09:03:18 PDT
Anne van Kesteren
Comment 10 2022-09-27 06:44:37 PDT
*** Bug 179307 has been marked as a duplicate of this bug. ***
Note You need to log in before you can comment on or make changes to this bug.