Align Big5 decoding with spec, Chrome, and Firefox
Created attachment 407633 [details] Patch
Comment on attachment 407633 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=407633&action=review > Source/WebCore/platform/text/EncodingTables.h:32 > +constexpr size_t jis0208Size { 7724 }; This should not be needed. There’s std::begin, std::end, std::size, and range-based for loops that all work with standard C arrays without requiring a separate constant. > Source/WebCore/platform/text/EncodingTables.h:35 > +constexpr size_t big5EncodingMapSize { 14686 }; Ditto. > Source/WebCore/platform/text/EncodingTables.h:37 > +constexpr size_t big5DecodingExtrasSize { 3904 }; Ditto. > Source/WebCore/platform/text/TextCodecCJK.cpp:218 > + static std::array<std::pair<uint16_t, UChar32>, big5DecodingExtrasSize + big5EncodingMapSize>* table { nullptr }; > + static std::once_flag flag; > + std::call_once(flag, [&] { call_once should not be required. Should just be able to initialize it like this: static std::array<std::pair<uint16_t, UChar32>, big5DecodingExtrasSize + big5EncodingMapSize>* table = []{ But also, why is this on the heap? > Source/WebCore/platform/text/TextCodecCJK.cpp:228 > + for (size_t i = 0; i < big5DecodingExtrasSize; i++) > + (*table)[i] = big5DecodingExtras[i]; > + for (size_t i = 0; i < big5EncodingMapSize; i++) { > + (*table)[i + big5DecodingExtrasSize].first = big5EncodingMap[i].second; > + (*table)[i + big5DecodingExtrasSize].second = big5EncodingMap[i].first; > + } > + std::sort(table->begin(), table->end(), [] (auto& a, auto& b) { > + return a.first < b.first; > + }); Can we do this at compile time instead of runtime? > Source/WebCore/platform/text/TextCodecCJK.h:39 > + enum class Encoding : uint8_t { > + EUC_JP, > + Big5 > + }; > + > + explicit TextCodecCJK(Encoding); Does this need to be public? Maybe so makeUnique can see it? In concept this is private. > Source/WebCore/platform/text/TextCodecCJK.h:40 > + virtual ~TextCodecCJK() = default; I suggest just omitting this; compiler will generate exactly this without you having to write anything. > Source/WebCore/platform/text/TextCodecCJK.h:56 > + std::unique_ptr<TextCodec> m_icuCodec; Wait, what? Is this permanent for EUC-JP or just temporary?
(In reply to Darin Adler from comment #2) > But also, why is this on the heap? I wanted the memory use to be 0 for non-Big5 pages. > > Source/WebCore/platform/text/TextCodecCJK.cpp:228 > > + for (size_t i = 0; i < big5DecodingExtrasSize; i++) > > + (*table)[i] = big5DecodingExtras[i]; > > + for (size_t i = 0; i < big5EncodingMapSize; i++) { > > + (*table)[i + big5DecodingExtrasSize].first = big5EncodingMap[i].second; > > + (*table)[i + big5DecodingExtrasSize].second = big5EncodingMap[i].first; > > + } > > + std::sort(table->begin(), table->end(), [] (auto& a, auto& b) { > > + return a.first < b.first; > > + }); > > Can we do this at compile time instead of runtime? That would increase the binary size of WebCore by over 100kb. > Wait, what? Is this permanent for EUC-JP or just temporary? euc-jp decoding is quite close to correct. We may be able to just use it as it is now.
Created attachment 407643 [details] Patch
Created attachment 407644 [details] Patch
Comment on attachment 407644 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=407644&action=review > Source/WebCore/platform/text/TextCodecCJK.cpp:48 The case of the first alias needs to match the case of the name or it will assert because the assertion uses strcmp but the HashTable uses ascii-case-insensitive hashing. No problem. I'll match cases.
Created attachment 407654 [details] Patch
Committed r266397: <https://trac.webkit.org/changeset/266397> All reviewed patches have been landed. Closing bug and clearing flags on attachment 407654 [details].
<rdar://problem/68147204>
*** Bug 179307 has been marked as a duplicate of this bug. ***