RESOLVED FIXED216081
Align EUC-KR encoding with Chrome, Firefox, and specification
https://bugs.webkit.org/show_bug.cgi?id=216081
Summary Align EUC-KR encoding with Chrome, Firefox, and specification
Alex Christensen
Reported 2020-09-02 10:30:35 PDT
Align EUC-KR encoding with Chrome, Firefox, and specification
Attachments
Patch (429.79 KB, patch)
2020-09-02 10:32 PDT, Alex Christensen
no flags
Patch (431.56 KB, patch)
2020-09-02 14:10 PDT, Alex Christensen
no flags
Patch (431.11 KB, patch)
2020-09-02 14:26 PDT, Alex Christensen
darin: review+
Alex Christensen
Comment 1 2020-09-02 10:32:23 PDT
Alex Christensen
Comment 2 2020-09-02 14:10:44 PDT
Alex Christensen
Comment 3 2020-09-02 14:26:34 PDT
Alexey Proskuryakov
Comment 4 2020-09-02 18:28:45 PDT
Comment on attachment 407815 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=407815&action=review > Source/WebCore/ChangeLog:3 > + Align EUC-KR encoding with Chrome, Firefox, and specification Will we be using ICU codecs at all once you are done? :-O
Alex Christensen
Comment 5 2020-09-02 18:59:10 PDT
Yes. Most of them are compliant with modern standards, and https://bugs.webkit.org/show_bug.cgi?id=216094 shows evidence that they are making a nonzero amount of changes to comply with modern standards (windows-1255 changed to be "correct" on Big Sur and iOS 14), but I'm not going to just wait for them to move at the speed of browsers.
Darin Adler
Comment 6 2020-09-02 21:38:29 PDT
I think we might want to create an automated way to try ICU in the future so we it’s easy enough to know when to turn these back off and rely on ICU again. Worth thinking about making this very easy to notice and fix.
Darin Adler
Comment 7 2020-09-02 21:40:57 PDT
Comment on attachment 407815 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=407815&action=review > Source/WebCore/platform/text/TextCodecCJK.cpp:340 > + static auto* table = [] { > + auto table = new EUCKREncodingIndex(); > + for (size_t i = 0; i < WTF_ARRAY_LENGTH(eucKRDecodingIndex); i++) > + (*table)[i] = { eucKRDecodingIndex[i].second, eucKRDecodingIndex[i].first }; > + std::sort(table->begin(), table->end(), [] (auto& a, auto& b) { > + return a.first < b.first; > + }); > + return table; > + }(); > + return *table; Since we allocate these tables on the heap, can we figure out a way to make these smaller, some kind of simple compression scheme? Not about this particular one, but all the giant tables. > Source/WebCore/platform/text/TextCodecCJK.cpp:372 > + result.append(pointer / 190 + 0x81); > + result.append(pointer % 190 + 0x41); Where do these magic numbers come from? > Source/WebCore/platform/text/TextCodecCJK.cpp:385 > + uint16_t pointer = (lead - 0x81) * 190 + byte - 0x41; Where do these magic numbers come from?
Alex Christensen
Comment 8 2020-09-02 22:34:22 PDT
Comment on attachment 407815 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=407815&action=review >> Source/WebCore/platform/text/TextCodecCJK.cpp:340 >> + return *table; > > Since we allocate these tables on the heap, can we figure out a way to make these smaller, some kind of simple compression scheme? Not about this particular one, but all the giant tables. If we're using them then we want them in memory somewhere in this form. I think we should compress them then decompress them if needed to reduce the size of the WebCore dylib >> Source/WebCore/platform/text/TextCodecCJK.cpp:372 >> + result.append(pointer % 190 + 0x41); > > Where do these magic numbers come from? https://encoding.spec.whatwg.org/#euc-kr-encoder >> Source/WebCore/platform/text/TextCodecCJK.cpp:385 >> + uint16_t pointer = (lead - 0x81) * 190 + byte - 0x41; > > Where do these magic numbers come from? https://encoding.spec.whatwg.org/#euc-kr-decoder
Darin Adler
Comment 9 2020-09-03 00:10:50 PDT
Comment on attachment 407815 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=407815&action=review >>> Source/WebCore/platform/text/TextCodecCJK.cpp:340 >>> + return *table; >> >> Since we allocate these tables on the heap, can we figure out a way to make these smaller, some kind of simple compression scheme? Not about this particular one, but all the giant tables. > > If we're using them then we want them in memory somewhere in this form. I think we should compress them then decompress them if needed to reduce the size of the WebCore dylib Yes, that’s what was talking about. The tables in the binary, the ones we use to as input to a function to allocate the tables we actually use. I was suggesting that we compress those.
Alex Christensen
Comment 10 2020-09-03 09:05:13 PDT
fast/block/margin-collapse/103.html did not change with this patch, but some of the iOS failures (also not from this patch) are things I need to rebase.
Alex Christensen
Comment 11 2020-09-03 09:06:35 PDT
Radar WebKit Bug Importer
Comment 12 2020-09-03 09:07:18 PDT
Note You need to log in before you can comment on or make changes to this bug.