Bug 292925

Summary:	[WTR] Replace invalid UTF-8 bytes instead of crashing
Product:	WebKit	Reporter:	Alicia Boya García <aboya>
Component:	Tools / Tests	Assignee:	Alicia Boya García <aboya>
Status:	RESOLVED FIXED
Severity:	Normal	CC:	webkit-bug-importer
Priority:	P2	Keywords:	InRadar
Version:	WebKit Nightly Build
Hardware:	Unspecified
OS:	Unspecified

Alicia Boya García

Reported 2025-05-13 04:44:54 PDT

Currently, if invalid UTF-8 is printed to stderr, the test runner crashes with an error like this:

UnicodeDecodeError raised: 'utf-8' codec can't decode byte 0xd6 in position 179: invalid continuation byte

This is particularly a problem when running tests with environment variables used by various libraries for debugging, as any improper encoding will not only crash the run, but also leave you with very few clues about what caused it.

This patch makes the test runner code that reads stderr use errors="replace" when decoding UTF-8: any invalid UTF-8 sequences will be replaced by U+FFFD � REPLACEMENT CHARACTER.

This allows users to continue debugging in the presence of invalid UTF-8 in stderr logs. Any invalid UTF-8 sequences can still be located by searching for the replacement character, although the specific invalid bytes are lost.

Personally, I would prefer if stderr were collected as a bytestring so that the -stderr.txt file contained a byte-for-byte match of what the test runner emitted, but the refactor necessary to accomplish that is outside the scope of this patch.
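For illustration, the difference between strict decoding and errors="replace" can be sketched in plain Python. This is not the actual test runner code: the helper name decode_stderr and the sample bytes are assumptions made up for this example, and only the standard bytes.decode() call is used.

```python
# Sample bytes standing in for what a child process might write to stderr.
# 0xd6 starts a two-byte UTF-8 sequence, but the next byte (a space) is not
# a valid continuation byte, so strict decoding fails.
raw_stderr = b"debug: value=\xd6 done\n"


def decode_stderr(data: bytes) -> str:
    """Hypothetical helper: decode stderr bytes without raising on bad input.

    Invalid sequences become U+FFFD REPLACEMENT CHARACTER.
    """
    return data.decode("utf-8", errors="replace")


# Strict decoding (the default) raises UnicodeDecodeError on the bad byte.
try:
    raw_stderr.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# Lenient decoding keeps the rest of the log readable.
text = decode_stderr(raw_stderr)

# The damaged spot can still be located by searching for U+FFFD,
# although the original byte value (0xd6) is no longer recoverable.
bad_offsets = [i for i, ch in enumerate(text) if ch == "\ufffd"]
```

The trade-off matches the one described above: the log stays usable and the damaged positions are searchable, but the original invalid bytes are not preserved.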


Alicia Boya García

Comment 1 2025-05-13 04:45:47 PDT

Pull request: https://github.com/WebKit/WebKit/pull/45307

EWS

Comment 2 2025-05-19 17:57:03 PDT

Committed 295135@main (3d5b2ebc6280): <https://commits.webkit.org/295135@main> Reviewed commits have been landed. Closing PR #45307 and removing active labels.

Radar WebKit Bug Importer

Comment 3 2025-05-19 18:02:14 PDT

<rdar://problem/151656136>
