Bug 270063 - [WebDriver][socket] Titles containing multibyte characters cannot be retrieved correctly
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebDriver
Version: WebKit Nightly Build
Hardware: All All
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2024-02-25 20:07 PST by haruhisa.shin
Modified: 2024-06-06 00:09 PDT

See Also:


Description haruhisa.shin 2024-02-25 20:07:29 PST
This is a problem with the socket implementation of WebDriver.
I have confirmed it on the WinCairo and PlayStation ports.

If the document title contains multibyte characters, such as Japanese text or characters produced by entity references, the "Get Title" result is garbled.

For example:
<title>foobar&reg;</title>
<title>日本語</title>

The title is obtained through JavaScript's "document.title" property and is UTF-8 encoded in WebDriverService::sendResponse.
https://github.com/WebKit/WebKit/blob/main/Source/WebDriver/WebDriverService.cpp#L332

However, when the result is concatenated in HttpServer, StringBuilder.append decodes the characters with fromLatin1(), so each byte of a UTF-8 sequence is treated as a separate Latin-1 character.
This appears to be what garbles the multibyte characters.
https://github.com/WebKit/WebKit/blob/main/Source/WebDriver/socket/HTTPServerSocket.cpp#L131
https://github.com/WebKit/WebKit/blob/main/Source/WTF/wtf/text/StringBuilder.h#L233

I think String::fromUTF8() should be used to decode the UTF-8 data before the strings are concatenated.
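
The garbling described above can be reproduced outside WebKit. The following is a minimal, language-agnostic sketch in Python, not WebKit code: the helper names (decode_as_latin1, decode_as_utf8) are hypothetical and only stand in for the fromLatin1() and fromUTF8() paths mentioned in this report. It assumes the response body arrives as UTF-8 bytes and shows what happens when those bytes are decoded one byte per character.

```python
def decode_as_latin1(data: bytes) -> str:
    """Hypothetical stand-in for the Latin-1 path: one byte becomes one character."""
    return data.decode("latin-1")


def decode_as_utf8(data: bytes) -> str:
    """Hypothetical stand-in for the proposed UTF-8 path: multibyte sequences are combined."""
    return data.decode("utf-8")


# The two example titles from this report ("&reg;" resolves to U+00AE).
titles = ["foobar\u00ae", "日本語"]

for title in titles:
    utf8_bytes = title.encode("utf-8")       # what sendResponse is described as producing
    garbled = decode_as_latin1(utf8_bytes)   # the byte-per-character interpretation
    restored = decode_as_utf8(utf8_bytes)    # the interpretation the report proposes
    print(repr(title), "->", repr(garbled), "| restored OK:", restored == title)
```

With the Latin-1 interpretation, the two-byte UTF-8 sequence for U+00AE turns into two separate characters, and each three-byte Japanese character turns into three. Decoding the same bytes as UTF-8 restores the original title.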
Comment 1 haruhisa.shin 2024-02-25 22:29:07 PST
Pull request: https://github.com/WebKit/WebKit/pull/25085
Comment 2 Radar WebKit Bug Importer 2024-03-03 20:08:14 PST
<rdar://problem/123987149>
Comment 3 EWS 2024-06-06 00:09:40 PDT
Committed 279767@main (0a3175f4f36a): <https://commits.webkit.org/279767@main>

Reviewed commits have been landed. Closing PR #25085 and removing active labels.