Bug 37327 - String::format() does not support UTF-8 on all platforms, yet used with UTF-8 strings
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore
Version: 528+ (Nightly build)
Hardware: All All
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-04-09 06:01 PDT by Andrey Kosyakov
Modified: 2019-02-03 11:17 PST

Description Andrey Kosyakov 2010-04-09 06:01:44 PDT
String::format() builds its result by passing the narrow char buffer filled by vsnprintf() to StringImpl::create(). StringImpl::create() treats the input as ASCII and converts it to UChars by simply widening each byte to a 16-bit code unit, which mangles any UTF-8 sequences that vsnprintf() produces. Below is an incomplete list of calls where we pass UTF-8 data to String::format():

>find . -type d -name .svn -prune -o -type f | xargs grep String::format.*utf8

./inspector/InspectorController.cpp:    String message = String::format("Profile \"webkit-profile://%s/%s#%d\" finished.", CPUProfileType, encodeWithURLEscapeSequences(profile->title()).utf8().data(), profile->uid());
./inspector/InspectorController.cpp:    String message = String::format("Profile \"webkit-profile://%s/%s#0\" started.", CPUProfileType, encodeWithURLEscapeSequences(title).utf8().data());
./inspector/InspectorController.cpp:    String identifier = title + String::format("@%s:%d", sourceID.utf8().data(), lineNumber);
./inspector/InspectorController.cpp:    String message = String::format("%s: %d", title.utf8().data(), count);
./page/XSSAuditor.cpp:        String consoleMessage = String::format("Refused to load an object. URL found within request: \"%s\".\n", url.utf8().data());
./platform/graphics/cg/ImageBufferCG.cpp:    return String::format("data:%s;base64,%s", mimeType.utf8().data(), out.data());
./platform/graphics/qt/ImageBufferQt.cpp:    return String::format("data:%s;base64,%s", mimeType.utf8().data(), data.toBase64().data());

Note that some of the above may be harmless, because utf8() is called on a string that is expected to contain only ASCII characters.
I suggest we introduce a version of format() that uses String::fromUTF8() to produce the resulting wide string, and replace the calls above with calls to the UTF-8-aware version.
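
To make the difference concrete, here is a rough standalone sketch in standard C++. It is not WebKit code: formatBytes(), expandBytes(), decodeUTF8() and formatUTF8() are hypothetical names, std::u16string stands in for a UChar buffer, and the decoder is deliberately simplified (it does not reject overlong forms or encoded surrogates). expandBytes() mimics the byte-widening described above; formatUTF8() shows roughly what a UTF-8-aware format() could do instead, assuming vsnprintf() passes the bytes of %s arguments through unchanged.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: run vsnprintf() twice, first to measure, then to fill.
static std::string formatBytes(const char* format, va_list args)
{
    va_list copy;
    va_copy(copy, args);
    int length = std::vsnprintf(nullptr, 0, format, copy);
    va_end(copy);
    if (length <= 0)
        return std::string();
    std::vector<char> buffer(static_cast<std::size_t>(length) + 1);
    std::vsnprintf(buffer.data(), buffer.size(), format, args);
    return std::string(buffer.data(), static_cast<std::size_t>(length));
}

// Mimics the problematic path: every byte becomes one 16-bit code unit.
static std::u16string expandBytes(const std::string& bytes)
{
    std::u16string result;
    for (unsigned char byte : bytes)
        result.push_back(static_cast<char16_t>(byte));
    return result;
}

// Simplified UTF-8 to UTF-16 decoder; malformed input yields U+FFFD.
// Assumption: overlong forms and encoded surrogates are not rejected here.
static std::u16string decodeUTF8(const std::string& bytes)
{
    std::u16string result;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t codePoint = 0;
        std::size_t extra = 0;
        if (lead < 0x80) { codePoint = lead; extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { codePoint = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { codePoint = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { codePoint = lead & 0x07; extra = 3; }
        else { result.push_back(0xFFFD); ++i; continue; }

        if (extra > bytes.size() - i - 1) { // sequence cut off at end of input
            result.push_back(0xFFFD);
            break;
        }
        bool valid = true;
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) { valid = false; break; }
            codePoint = (codePoint << 6) | (c & 0x3F);
        }
        if (!valid || codePoint > 0x10FFFF) {
            result.push_back(0xFFFD);
            ++i; // resynchronize on the next byte
            continue;
        }
        i += extra + 1;
        if (codePoint >= 0x10000) { // encode as a surrogate pair
            codePoint -= 0x10000;
            result.push_back(static_cast<char16_t>(0xD800 + (codePoint >> 10)));
            result.push_back(static_cast<char16_t>(0xDC00 + (codePoint & 0x3FF)));
        } else {
            result.push_back(static_cast<char16_t>(codePoint));
        }
    }
    return result;
}

// Hypothetical UTF-8-aware format(): format to bytes, then decode as UTF-8.
static std::u16string formatUTF8(const char* format, ...)
{
    va_list args;
    va_start(args, format);
    std::string bytes = formatBytes(format, args);
    va_end(args);
    return decodeUTF8(bytes);
}
```

With the UTF-8 bytes of "café" as input, expandBytes() yields five code units ending in U+00C3 U+00A9, while decodeUTF8() yields the intended four ending in U+00E9.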
Comment 1 Ademar Reis 2010-10-27 14:02:56 PDT
The String::format() implementation is very platform-dependent, and solving this problem is quite tricky. See bug 18994 for a long discussion of the pitfalls involved, and bug 48463 for the Qt port specifically.
Comment 2 Ademar Reis 2010-10-27 14:22:46 PDT
I've just noticed that the plan is to eliminate String::format() completely. See bug 30342 :-)

I suggest closing this as WONTFIX.
Comment 3 Darin Adler 2019-02-03 11:17:30 PST
This is a portability problem.

It is a bit of a blind spot for many of the project leaders who work at Apple, because on Apple’s platforms the underlying C library calls pass bytes through without interpretation and so work fine with UTF-8 strings.

It’s possible we can fix the implementation for those other platforms; I’m not sure what the status of this is.

I suspect this won’t be completely resolved until we stop using C library functions that use format strings entirely (both with String::format and elsewhere), but it’s also true that many of the call sites use the utf8() function on strings that are processed in a way that already guarantees they have only ASCII characters, such as MIME types.
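
As a rough illustration of why such call sites are safe: if every byte of the formatted argument is below 0x80, widening bytes and decoding UTF-8 give the same result. The check below is a sketch in standard C++, not WebKit code, and isAllASCII() is a hypothetical name used only here.

```cpp
#include <string>

// Hypothetical audit helper: true when no byte has its high bit set,
// i.e. the data is pure ASCII and byte-widening cannot mangle it.
static bool isAllASCII(const std::string& bytes)
{
    for (unsigned char byte : bytes) {
        if (byte >= 0x80)
            return false;
    }
    return true;
}
```

A MIME type such as "image/png" passes this check; a page title or URL fragment containing non-ASCII characters would not.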

While we do want to get rid of String::format, it might take us a while.