Bug 216016 - Align Big5 decoding with spec, Chrome, and Firefox
Summary: Align Big5 decoding with spec, Chrome, and Firefox
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Alex Christensen
URL:
Keywords: InRadar
: 179307 (view as bug list)
Depends on:
Blocks:
 
Reported: 2020-08-31 16:10 PDT by Alex Christensen
Modified: 2022-09-27 06:44 PDT (History)
9 users (show)

See Also:


Attachments
Patch (1.09 MB, patch)
2020-08-31 16:13 PDT, Alex Christensen
no flags Details | Formatted Diff | Diff
Patch (1.09 MB, patch)
2020-08-31 18:11 PDT, Alex Christensen
no flags Details | Formatted Diff | Diff
Patch (1.09 MB, patch)
2020-08-31 18:17 PDT, Alex Christensen
no flags Details | Formatted Diff | Diff
Patch (1.09 MB, patch)
2020-08-31 20:46 PDT, Alex Christensen
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Alex Christensen 2020-08-31 16:10:36 PDT
Align Big5 decoding with spec, Chrome, and Firefox
Comment 1 Alex Christensen 2020-08-31 16:13:57 PDT
Created attachment 407633 [details]
Patch
Comment 2 Darin Adler 2020-08-31 16:42:07 PDT
Comment on attachment 407633 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=407633&action=review

> Source/WebCore/platform/text/EncodingTables.h:32
> +constexpr size_t jis0208Size { 7724 };

This should not be needed. There’s std::begin, std::end, std::size, and range-based for loops that all work with standard C arrays without requiring a separate constant.

> Source/WebCore/platform/text/EncodingTables.h:35
> +constexpr size_t big5EncodingMapSize { 14686 };

Ditto.

> Source/WebCore/platform/text/EncodingTables.h:37
> +constexpr size_t big5DecodingExtrasSize { 3904 };

Ditto.

> Source/WebCore/platform/text/TextCodecCJK.cpp:218
> +    static std::array<std::pair<uint16_t, UChar32>, big5DecodingExtrasSize + big5EncodingMapSize>* table { nullptr };
> +    static std::once_flag flag;
> +    std::call_once(flag, [&] {

call_once should not be required. Should just be able to initialize it like this:

    static std::array<std::pair<uint16_t, UChar32>, big5DecodingExtrasSize + big5EncodingMapSize>* table = []{

But also, why is this on the heap?

> Source/WebCore/platform/text/TextCodecCJK.cpp:228
> +        for (size_t i = 0; i < big5DecodingExtrasSize; i++)
> +            (*table)[i] = big5DecodingExtras[i];
> +        for (size_t i = 0; i < big5EncodingMapSize; i++) {
> +            (*table)[i + big5DecodingExtrasSize].first = big5EncodingMap[i].second;
> +            (*table)[i + big5DecodingExtrasSize].second = big5EncodingMap[i].first;
> +        }
> +        std::sort(table->begin(), table->end(), [] (auto& a, auto& b) {
> +            return a.first < b.first;
> +        });

Can we do this at compile time instead of runtime?

> Source/WebCore/platform/text/TextCodecCJK.h:39
> +    enum class Encoding : uint8_t {
> +        EUC_JP,
> +        Big5
> +    };
> +    
> +    explicit TextCodecCJK(Encoding);

Does this need to be public? Maybe so makeUnique can see it? In concept this is private.

> Source/WebCore/platform/text/TextCodecCJK.h:40
> +    virtual ~TextCodecCJK() = default;

I suggest just omitting this; compiler will generate exactly this without you having to write anything.

> Source/WebCore/platform/text/TextCodecCJK.h:56
> +    std::unique_ptr<TextCodec> m_icuCodec;

Wait, what? Is this permanent for EUC-JP or just temporary?
Comment 3 Alex Christensen 2020-08-31 16:46:26 PDT
(In reply to Darin Adler from comment #2)

> But also, why is this on the heap?
I wanted the memory use to be 0 for non-Big5 pages.

> > Source/WebCore/platform/text/TextCodecCJK.cpp:228
> > +        for (size_t i = 0; i < big5DecodingExtrasSize; i++)
> > +            (*table)[i] = big5DecodingExtras[i];
> > +        for (size_t i = 0; i < big5EncodingMapSize; i++) {
> > +            (*table)[i + big5DecodingExtrasSize].first = big5EncodingMap[i].second;
> > +            (*table)[i + big5DecodingExtrasSize].second = big5EncodingMap[i].first;
> > +        }
> > +        std::sort(table->begin(), table->end(), [] (auto& a, auto& b) {
> > +            return a.first < b.first;
> > +        });
> 
> Can we do this at compile time instead of runtime?
That would increase the binary size of WebCore by over 100kb.

> Wait, what? Is this permanent for EUC-JP or just temporary?
euc-jp decoding is quite close to correct.  We may be able to just use it as it is now.
Comment 4 Alex Christensen 2020-08-31 18:11:55 PDT
Created attachment 407643 [details]
Patch
Comment 5 Alex Christensen 2020-08-31 18:17:39 PDT
Created attachment 407644 [details]
Patch
Comment 6 Alex Christensen 2020-08-31 20:46:08 PDT
Comment on attachment 407644 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=407644&action=review

> Source/WebCore/platform/text/TextCodecCJK.cpp:48


The case of the first alias needs to match the case of the name or it will assert because the assertion uses strcmp but the HashTable uses ascii-case-insensitive hashing.  No problem.  I'll match cases.
Comment 7 Alex Christensen 2020-08-31 20:46:49 PDT
Created attachment 407654 [details]
Patch
Comment 8 EWS 2020-09-01 09:02:17 PDT
Committed r266397: <https://trac.webkit.org/changeset/266397>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 407654 [details].
Comment 9 Radar WebKit Bug Importer 2020-09-01 09:03:18 PDT
<rdar://problem/68147204>
Comment 10 Anne van Kesteren 2022-09-27 06:44:37 PDT
*** Bug 179307 has been marked as a duplicate of this bug. ***