WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
Bug 215970
Remove NFC normalization when submitting forms and encoding URL queries and fix EUC-JP encoding
https://bugs.webkit.org/show_bug.cgi?id=215970
Summary
Remove NFC normalization when submitting forms and encoding URL queries and f...
Alex Christensen
Reported
2020-08-28 21:32:01 PDT
Remove NFC normalization when submitting forms and encoding URL queries and fix EUC-JP encoding
Attachments
Patch
(2.34 MB, patch)
2020-08-28 21:54 PDT
,
Alex Christensen
no flags
Details
Formatted Diff
Diff
Patch
(2.35 MB, patch)
2020-08-28 22:50 PDT
,
Alex Christensen
darin
: review+
Details
Formatted Diff
Diff
patch
(3.15 MB, patch)
2020-08-29 17:46 PDT
,
Alex Christensen
no flags
Details
Formatted Diff
Diff
Show Obsolete
(2)
View All
Add attachment
proposed patch, testcase, etc.
Alex Christensen
Comment 1
2020-08-28 21:54:59 PDT
Created
attachment 407524
[details]
Patch
Alex Christensen
Comment 2
2020-08-28 22:50:34 PDT
Created
attachment 407530
[details]
Patch
Darin Adler
Comment 3
2020-08-29 12:15:44 PDT
Comment on
attachment 407530
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=407530&action=review
Looks like there are more tests to rebase.
> Source/WebCore/platform/text/ICUExtras.h:32 > +extern const UChar jis0208[15448];
It doesn’t make sense to me to call this "ICU" extras. This is our own encoder/decoder for a particular encoding, and not part of ICU. I suggest a different name for this file.
> Source/WebCore/platform/text/TextCodecICU.cpp:436 > + auto pointerRange = std::equal_range(reinterpret_cast<const std::pair<UChar, UChar>*>(jis0208), reinterpret_cast<const std::pair<UChar, UChar>*>(jis0208 + WTF_ARRAY_LENGTH(jis0208)), std::pair<UChar, UChar>(c, 0), [](const auto& a, const auto& b) {
Why are we casting an array of UChar into an array of pairs? Why not actually compile an array of pairs? That should work just as well, just with more braces in the source file, and without the reinterpret_cast.
> Source/WebCore/platform/text/TextCodecICU.cpp:454 > + CString decimal = makeString(static_cast<unsigned>(c)).utf8(); > + for (size_t i = 0; i < decimal.length(); i++) > + result.uncheckedAppend(decimal.data()[i]);
We have functions that will do this more efficiently, without allocating memory: uint8_t buffer[5]; writeIntegerToBuffer(character, buffer); result.append(buffer, lengthOfIntegerAsString(character)); Or some variation on this.
> Source/WebCore/platform/text/TextCodecICU.cpp:504 > + if (!strcmp(m_canonicalConverterName, "euc-jp-2007")) > + return eucJPEncode(string, unencodableHandler(handling));
The way our codec classes are designed, you would normally create and register a separate class derived TextCodec rather than adding a non-ICU implementation to the ICU one. Did you consider that? Why did you reject it?
> Source/WebCore/platform/text/TextEncoding.cpp:72 > +Vector<uint8_t> TextEncoding::encode(StringView string, UnencodableHandling handling, NFCNormalize nfcNormalize) const
I suggest naming the local variable just "normalize".
> Source/WebCore/platform/text/TextEncoding.cpp:82 > + auto normalizedString = normalizedNFC(string); > + return newTextCodec(*this)->encode(normalizedString.view, handling);
I suggest making this just a one-liner.
Alex Christensen
Comment 4
2020-08-29 15:49:29 PDT
(In reply to Darin Adler from
comment #3
)
> > Source/WebCore/platform/text/ICUExtras.h:32 > > +extern const UChar jis0208[15448]; > > It doesn’t make sense to me to call this "ICU" extras. This is our own > encoder/decoder for a particular encoding, and not part of ICU. I suggest a > different name for this file.
I called it ICU extras because I think these should be in ICU, and the ICU that chromium uses has this in it. I could call it "EncodingTables.{h, cpp}"
> > Source/WebCore/platform/text/TextCodecICU.cpp:436 > > + auto pointerRange = std::equal_range(reinterpret_cast<const std::pair<UChar, UChar>*>(jis0208), reinterpret_cast<const std::pair<UChar, UChar>*>(jis0208 + WTF_ARRAY_LENGTH(jis0208)), std::pair<UChar, UChar>(c, 0), [](const auto& a, const auto& b) { > > Why are we casting an array of UChar into an array of pairs? Why not > actually compile an array of pairs? That should work just as well, just with > more braces in the source file, and without the reinterpret_cast.
I was thinking that a const array of UChar would definitely pack as tight as possible in the binary to only increase by 30k. I wasn't so sure about a const array of std::pair<UChar, UChar>, but now that I think about it it should pack it just as tight, especially in a release build.
Darin Adler
Comment 5
2020-08-29 16:27:44 PDT
(In reply to Alex Christensen from
comment #4
)
> I could call it "EncodingTables.{h, cpp}"
I think that’s better.
> now that I think about it it > should pack it just as tight, especially in a release build.
Think of it this way: if it wasn’t going to pack as tight, then the reinterpret_cast wouldn’t work.
Alex Christensen
Comment 6
2020-08-29 17:26:27 PDT
(In reply to Darin Adler from
comment #3
)
> Looks like there are more tests to rebase.
Ah, yes. Changing the NFC normalization changed the encoding of the code points that would have been normalized, which added more PASSes outside of euc-jp. I'll rebase and make them all PASS soon.
Alex Christensen
Comment 7
2020-08-29 17:46:52 PDT
Created
attachment 407558
[details]
patch
Alex Christensen
Comment 8
2020-08-29 18:49:21 PDT
Comment on
attachment 407558
[details]
patch View in context:
https://bugs.webkit.org/attachment.cgi?id=407558&action=review
> Source/WebCore/platform/text/TextCodecICU.cpp:496 > + } > +}
I added ASSERT_NOT_REACHED(); return entityUnencodableHandler; here to make MSVC happy.
http://trac.webkit.org/r266330
Radar WebKit Bug Importer
Comment 9
2020-08-29 18:50:18 PDT
<
rdar://problem/68011503
>
Anne van Kesteren
Comment 10
2022-09-27 08:01:50 PDT
***
Bug 159891
has been marked as a duplicate of this bug. ***
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug