WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
216016
Align Big5 decoding with spec, Chrome, and Firefox
https://bugs.webkit.org/show_bug.cgi?id=216016
Summary
Align Big5 decoding with spec, Chrome, and Firefox
Alex Christensen
Reported
2020-08-31 16:10:36 PDT
Align Big5 decoding with spec, Chrome, and Firefox
Attachments
Patch
(1.09 MB, patch)
2020-08-31 16:13 PDT
,
Alex Christensen
no flags
Details
Formatted Diff
Diff
Patch
(1.09 MB, patch)
2020-08-31 18:11 PDT
,
Alex Christensen
no flags
Details
Formatted Diff
Diff
Patch
(1.09 MB, patch)
2020-08-31 18:17 PDT
,
Alex Christensen
no flags
Details
Formatted Diff
Diff
Patch
(1.09 MB, patch)
2020-08-31 20:46 PDT
,
Alex Christensen
no flags
Details
Formatted Diff
Diff
Show Obsolete
(3)
View All
Add attachment
proposed patch, testcase, etc.
Alex Christensen
Comment 1
2020-08-31 16:13:57 PDT
Created
attachment 407633
[details]
Patch
Darin Adler
Comment 2
2020-08-31 16:42:07 PDT
Comment on
attachment 407633
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=407633&action=review
> Source/WebCore/platform/text/EncodingTables.h:32 > +constexpr size_t jis0208Size { 7724 };
This should not be needed. There’s std::begin, std::end, std::size, and range-based for loops that all work with standard C arrays without requiring a separate constant.
> Source/WebCore/platform/text/EncodingTables.h:35 > +constexpr size_t big5EncodingMapSize { 14686 };
Ditto.
> Source/WebCore/platform/text/EncodingTables.h:37 > +constexpr size_t big5DecodingExtrasSize { 3904 };
Ditto.
> Source/WebCore/platform/text/TextCodecCJK.cpp:218 > + static std::array<std::pair<uint16_t, UChar32>, big5DecodingExtrasSize + big5EncodingMapSize>* table { nullptr }; > + static std::once_flag flag; > + std::call_once(flag, [&] {
call_once should not be required. Should just be able to initialize it like this: static std::array<std::pair<uint16_t, UChar32>, big5DecodingExtrasSize + big5EncodingMapSize>* table = []{ But also, why is this on the heap?
> Source/WebCore/platform/text/TextCodecCJK.cpp:228 > + for (size_t i = 0; i < big5DecodingExtrasSize; i++) > + (*table)[i] = big5DecodingExtras[i]; > + for (size_t i = 0; i < big5EncodingMapSize; i++) { > + (*table)[i + big5DecodingExtrasSize].first = big5EncodingMap[i].second; > + (*table)[i + big5DecodingExtrasSize].second = big5EncodingMap[i].first; > + } > + std::sort(table->begin(), table->end(), [] (auto& a, auto& b) { > + return a.first < b.first; > + });
Can we do this at compile time instead of runtime?
> Source/WebCore/platform/text/TextCodecCJK.h:39 > + enum class Encoding : uint8_t { > + EUC_JP, > + Big5 > + }; > + > + explicit TextCodecCJK(Encoding);
Does this need to be public? Maybe so makeUnique can see it? In concept this is private.
> Source/WebCore/platform/text/TextCodecCJK.h:40 > + virtual ~TextCodecCJK() = default;
I suggest just omitting this; compiler will generate exactly this without you having to write anything.
> Source/WebCore/platform/text/TextCodecCJK.h:56 > + std::unique_ptr<TextCodec> m_icuCodec;
Wait, what? Is this permanent for EUC-JP or just temporary?
Alex Christensen
Comment 3
2020-08-31 16:46:26 PDT
(In reply to Darin Adler from
comment #2
)
> But also, why is this on the heap?
I wanted the memory use to be 0 for non-Big5 pages.
> > Source/WebCore/platform/text/TextCodecCJK.cpp:228 > > + for (size_t i = 0; i < big5DecodingExtrasSize; i++) > > + (*table)[i] = big5DecodingExtras[i]; > > + for (size_t i = 0; i < big5EncodingMapSize; i++) { > > + (*table)[i + big5DecodingExtrasSize].first = big5EncodingMap[i].second; > > + (*table)[i + big5DecodingExtrasSize].second = big5EncodingMap[i].first; > > + } > > + std::sort(table->begin(), table->end(), [] (auto& a, auto& b) { > > + return a.first < b.first; > > + }); > > Can we do this at compile time instead of runtime?
That would increase the binary size of WebCore by over 100kb.
> Wait, what? Is this permanent for EUC-JP or just temporary?
euc-jp decoding is quite close to correct. We may be able to just use it as it is now.
Alex Christensen
Comment 4
2020-08-31 18:11:55 PDT
Created
attachment 407643
[details]
Patch
Alex Christensen
Comment 5
2020-08-31 18:17:39 PDT
Created
attachment 407644
[details]
Patch
Alex Christensen
Comment 6
2020-08-31 20:46:08 PDT
Comment on
attachment 407644
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=407644&action=review
> Source/WebCore/platform/text/TextCodecCJK.cpp:48
The case of the first alias needs to match the case of the name or it will assert because the assertion uses strcmp but the HashTable uses ascii-case-insensitive hashing. No problem. I'll match cases.
Alex Christensen
Comment 7
2020-08-31 20:46:49 PDT
Created
attachment 407654
[details]
Patch
EWS
Comment 8
2020-09-01 09:02:17 PDT
Committed
r266397
: <
https://trac.webkit.org/changeset/266397
> All reviewed patches have been landed. Closing bug and clearing flags on
attachment 407654
[details]
.
Radar WebKit Bug Importer
Comment 9
2020-09-01 09:03:18 PDT
<
rdar://problem/68147204
>
Anne van Kesteren
Comment 10
2022-09-27 06:44:37 PDT
***
Bug 179307
has been marked as a duplicate of this bug. ***
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug