Summary: | TextCodec should treat lone surrogates as the replacement character
---|---|---|---
Product: | WebKit | Reporter: | Andreu Botella <abotella>
Component: | WebCore Misc. | Assignee: | Nobody <webkit-unassigned>
Status: | NEW --- | CC: | achristensen, ahmad.saleem792, annevk, mmaxfield, webkit-bug-importer, ysuzuki
Severity: | Normal | Keywords: | InRadar
Priority: | P2
Version: | WebKit Nightly Build
Hardware: | Unspecified
OS: | Unspecified
Bug Depends on: | 254888
Bug Blocks: | 179303
Description
Andreu Botella
2022-01-17 17:46:23 PST
*** Firefox Nightly 109 ***
href attribute of link is: "?a\ud800b" (should be "?a\ud800b")
href property of link is: "https://bug-235307-attachments.webkit.org/attachment.cgi?a%EF%BF%BDb" (should end in "?a%26%2365533%3Bb")

*** Chrome Canary 110 ***
href attribute of link is: "?a\ud800b" (should be "?a\ud800b")
href property of link is: "https://bug-235307-attachments.webkit.org/attachment.cgi?a%EF%BF%BDb" (should end in "?a%26%2365533%3Bb")

*** Safari 16.1 ***
href attribute of link is: "?a\ud800b" (should be "?a\ud800b")
href property of link is: "https://bug-235307-attachments.webkit.org/attachment.cgi?a%EF%BF%BDb" (should end in "?a%26%2365533%3Bb")

Are all browsers matching, or am I testing it wrong?

JSFiddle: https://jsfiddle.net/b50n7e2s/ (same test, but taken from the Chrome / Blink bug).

That test doesn't seem to test windows-1252 (due to JSFiddle forcing UTF-8), but when actually testing windows-1252 all browsers seem to agree as well: https://github.com/web-platform-tests/wpt/pull/37250.

However,

1. Comment 0 also describes a problem on Windows that might still exist.
2. Code inspection shows that https://github.com/WebKit/WebKit/blob/5e81d33ff5c0150dbabbebbe2e96fb08ff4d6ad3/Source/WebCore/PAL/pal/text/TextCodecUTF8.cpp#L461-L472 does not do surrogate handling. (Also, if as comment 0 suggests this is somehow intentional, which I suspect it's not, it shouldn't be called UTF-8.)

Hmm, I'm no longer convinced there's a problem here. Especially since Windows is no longer targeted. Andreu, what do you think?

I was wrong about the non-UTF-8 encoders: https://github.com/web-platform-tests/wpt/pull/39324. I created bug 179303 to fix that. Keeping this open to find out if the UTF-8 issue is exposed somewhere.
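
For reference, here is a rough Python sketch of the two query results quoted in this thread ("a%EF%BF%BDb" for UTF-8 and "a%26%2365533%3Bb" for a non-UTF-8 encoder). It is not WebKit code. The function names are made up for illustration, Python's cp1252 codec stands in for windows-1252, and `urllib.parse.quote` with `safe=""` is only an approximation of the URL query percent-encode set (it escapes more characters, but gives the same output for this input).

```python
from urllib.parse import quote

REPLACEMENT_CHARACTER = "\ufffd"


def replace_lone_surrogates(text: str) -> str:
    """Map every surrogate code point (U+D800..U+DFFF) to U+FFFD.

    A Python str holds code points rather than UTF-16 code units, so any
    surrogate found here is necessarily unpaired.
    """
    return "".join(
        REPLACEMENT_CHARACTER if 0xD800 <= ord(ch) <= 0xDFFF else ch
        for ch in text
    )


def encode_query_utf8(text: str) -> str:
    """Hypothetical helper: UTF-8 encode, then percent-encode every byte
    outside the unreserved set."""
    scalar_text = replace_lone_surrogates(text)
    return quote(scalar_text.encode("utf-8"), safe="")


def encode_query_legacy(text: str, encoding: str = "cp1252") -> str:
    """Hypothetical helper for a non-UTF-8 encoder: code points the encoding
    cannot represent become decimal character references (&#N;), which are
    then percent-encoded along with everything else."""
    scalar_text = replace_lone_surrogates(text)
    encoded = scalar_text.encode(encoding, errors="xmlcharrefreplace")
    return quote(encoded, safe="")
```

With these helpers, `encode_query_utf8("a\ud800b")` gives "a%EF%BF%BDb" and `encode_query_legacy("a\ud800b")` gives "a%26%2365533%3Bb". The point is that the lone surrogate is replaced with U+FFFD before either encoder sees it, so the legacy path emits 65533 rather than the surrogate's own value.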