Bug 122795 - Make UTF-8 encoding of unpaired surrogates match Encoding standard
Summary: Make UTF-8 encoding of unpaired surrogates match Encoding standard
Status: RESOLVED CONFIGURATION CHANGED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Text
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords: BlinkMergeCandidate
Depends on:
Blocks:
 
Reported: 2013-10-14 17:51 PDT by Ryosuke Niwa
Modified: 2022-08-21 13:28 PDT
CC List: 10 users

See Also:


Description Ryosuke Niwa 2013-10-14 17:51:49 PDT
Consider merging https://chromium.googlesource.com/chromium/blink/+/109e9896a406aa3e76350a733bd030e8eeacc4c4

The Encoding standard says that unpaired UTF-16 surrogates in JS
strings should be converted into U+FFFD (replacement character)
during encode operations. This is (optionally) done already in
WTFString::utf8() but not handled in TextCodecUTF8.
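To make the requested behavior concrete, here is a minimal C++ sketch. `encodeUTF8WithReplacement` is a hypothetical standalone function, not WebKit's `TextCodecUTF8` or `WTFString::utf8()` code. It only illustrates the rule described above: a lead surrogate followed by a trail surrogate is combined into one supplementary code point, while any surrogate without a valid partner is emitted as U+FFFD (UTF-8 bytes EF BF BD).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not WebKit API): encodes UTF-16 code units as UTF-8,
// replacing every unpaired surrogate with U+FFFD REPLACEMENT CHARACTER.
std::string encodeUTF8WithReplacement(const std::u16string& input)
{
    std::string out;

    // Appends one Unicode scalar value as 1 to 4 UTF-8 bytes.
    auto appendCodePoint = [&out](char32_t cp) {
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    };

    for (std::size_t i = 0; i < input.size(); ++i) {
        char16_t unit = input[i];
        if (unit >= 0xD800 && unit <= 0xDBFF) {
            // Lead surrogate: valid only when immediately followed by a trail surrogate.
            if (i + 1 < input.size() && input[i + 1] >= 0xDC00 && input[i + 1] <= 0xDFFF) {
                char32_t cp = 0x10000
                    + ((static_cast<char32_t>(unit) - 0xD800) << 10)
                    + (static_cast<char32_t>(input[i + 1]) - 0xDC00);
                appendCodePoint(cp);
                ++i; // Consume the trail surrogate as well.
            } else {
                appendCodePoint(0xFFFD); // Unpaired lead surrogate.
            }
        } else if (unit >= 0xDC00 && unit <= 0xDFFF) {
            appendCodePoint(0xFFFD); // Trail surrogate with no preceding lead.
        } else {
            appendCodePoint(unit); // Ordinary BMP code point.
        }
    }
    return out;
}
```

For example, the sequence `0xDE00 0xD83D` (a trail surrogate followed by a lead surrogate, i.e. a reversed pair) yields two replacement characters, whereas `0xD83D 0xDE00` encodes as the single code point U+1F600.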
Comment 1 Ahmad Saleem 2022-08-21 00:35:23 PDT
I didn't find a good test in the Chromium patch, but this is where the patch would need to be applied:

Link - https://github.com/WebKit/WebKit/blob/4ddaf4f8c28e7795d0dae5f39fad1873a566067e/Source/WebCore/PAL/pal/text/TextCodecUTF8.cpp#L466

I don't know whether this is still needed. I'd appreciate it if someone else could comment. Thanks!
Comment 2 Alexey Proskuryakov 2022-08-21 13:28:01 PDT
WebKit passes all tests that were added with this Chromium commit.