<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://bugs.webkit.org/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4.1"
          urlbase="https://bugs.webkit.org/"
          
          maintainer="admin@webkit.org"
>

    <bug>
          <bug_id>37327</bug_id>
          
          <creation_ts>2010-04-09 06:01:44 -0700</creation_ts>
          <short_desc>String::format() does not support UTF-8 on all platforms, yet used with UTF-8 strings</short_desc>
          <delta_ts>2025-04-08 14:03:40 -0700</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WebKit</product>
          <component>JavaScriptCore</component>
          <version>528+ (Nightly build)</version>
          <rep_platform>All</rep_platform>
          <op_sys>All</op_sys>
          <bug_status>NEW</bug_status>
          <resolution></resolution>
          
          <see_also>https://bugs.webkit.org/show_bug.cgi?id=30342</see_also>
    
    <see_also>https://bugs.webkit.org/show_bug.cgi?id=48463</see_also>
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>Normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Andrey Kosyakov">caseq</reporter>
          <assigned_to name="Nobody">webkit-unassigned</assigned_to>
          <cc>ademar</cc>
    
    <cc>ap</cc>
    
    <cc>darin</cc>
    
    <cc>nikkamy</cc>
    
    <cc>pfeldman</cc>
          

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>210693</commentid>
    <comment_count>0</comment_count>
    <who name="Andrey Kosyakov">caseq</who>
    <bug_when>2010-04-09 06:01:44 -0700</bug_when>
    <thetext>String::format() builds its result by passing the narrow char buffer produced by vsnprintf() to StringImpl::create(). StringImpl::create() treats the input as ASCII and converts it to UChars by simply widening each byte to a UChar, which mangles any UTF-8 sequences in the vsnprintf() output. Below is an incomplete list of calls where we pass UTF-8 data to String::format():

&gt;find . -type d -name .svn -prune -o -type f | xargs grep String::format.*utf8

./inspector/InspectorController.cpp:    String message = String::format(&quot;Profile \&quot;webkit-profile://%s/%s#%d\&quot; finished.&quot;, CPUProfileType, encodeWithURLEscapeSequences(profile-&gt;title()).utf8().data(), profile-&gt;uid());
./inspector/InspectorController.cpp:    String message = String::format(&quot;Profile \&quot;webkit-profile://%s/%s#0\&quot; started.&quot;, CPUProfileType, encodeWithURLEscapeSequences(title).utf8().data());
./inspector/InspectorController.cpp:    String identifier = title + String::format(&quot;@%s:%d&quot;, sourceID.utf8().data(), lineNumber);
./inspector/InspectorController.cpp:    String message = String::format(&quot;%s: %d&quot;, title.utf8().data(), count);
./page/XSSAuditor.cpp:        String consoleMessage = String::format(&quot;Refused to load an object. URL found within request: \&quot;%s\&quot;.\n&quot;, url.utf8().data());
./platform/graphics/cg/ImageBufferCG.cpp:    return String::format(&quot;data:%s;base64,%s&quot;, mimeType.utf8().data(), out.data());
./platform/graphics/qt/ImageBufferQt.cpp:    return String::format(&quot;data:%s;base64,%s&quot;, mimeType.utf8().data(), data.toBase64().data());

Note that some of the above may be harmless, because utf8() is called on a string that is supposed to contain only ASCII characters.
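
To illustrate the mangling described above, here is a minimal Python sketch. It is not WebKit code: widen_bytes and decode_utf8 are hypothetical stand-ins for the byte widening done by StringImpl::create() and for a String::fromUTF8()-style conversion, and the byte string stands in for the buffer that vsnprintf() would fill.

```python
# Illustrative model only; these helpers are hypothetical, not WebKit APIs.

def widen_bytes(raw: bytes) -> str:
    """Model of byte widening: every input byte becomes one code unit."""
    return "".join(chr(b) for b in raw)


def decode_utf8(raw: bytes) -> str:
    """Model of a UTF-8-aware conversion of the same buffer."""
    return raw.decode("utf-8")


title = "caf\u00e9"                # one non-ASCII character (e with acute accent)
formatted = title.encode("utf-8")  # stands in for the vsnprintf() output buffer

mangled = widen_bytes(formatted)   # the two UTF-8 bytes become two separate characters
correct = decode_utf8(formatted)   # the two UTF-8 bytes become one character again

print(repr(mangled), len(mangled))
print(repr(correct), len(correct))

# Pure-ASCII input, such as a MIME type, survives both conversions unchanged.
mime = b"image/png"
print(widen_bytes(mime) == decode_utf8(mime))
```

In this model the widened string has five code units where the original title had four characters, while ASCII-only data comes out the same either way, which is why some of the call sites listed above may be harmless.
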
I suggest we introduce a version of format() that uses String::fromUTF8() to produce the resulting wide string, and replace the calls above with calls to the UTF-8-aware version.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>300536</commentid>
    <comment_count>1</comment_count>
    <who name="Ademar Reis">ademar</who>
    <bug_when>2010-10-27 14:02:56 -0700</bug_when>
    <thetext>The String::format() implementation is very platform-dependent, and solving this problem is quite tricky. See bug 18994 for a long discussion of the pitfalls involved, and bug 48463 for the Qt port specifically.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>300570</commentid>
    <comment_count>2</comment_count>
    <who name="Ademar Reis">ademar</who>
    <bug_when>2010-10-27 14:22:46 -0700</bug_when>
    <thetext>I&apos;ve just noticed that the plan is to eliminate String::format() completely. See bug 30342 :-)

I suggest closing this as WONTFIX.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1501641</commentid>
    <comment_count>3</comment_count>
    <who name="Darin Adler">darin</who>
    <bug_when>2019-02-03 11:17:30 -0800</bug_when>
    <thetext>This is a portability problem.

It is a bit of a blind spot for many of the project leaders who work at Apple, because on Apple’s platforms the underlying C library calls pass bytes through without interpretation and so work fine with UTF-8 strings.

It’s possible we can fix the implementation for those other platforms; I’m not sure what the status of this is.

I suspect this won’t be completely resolved until we stop using C library functions that use format strings entirely (both with String::format and elsewhere), but it’s also true that many of the call sites use the utf8() function on strings that are processed in a way that already guarantees they have only ASCII characters, such as MIME types.

While we do want to get rid of String::format, it might take us a while.</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>