Bug 192746 - Convert additional String::format clients to alternative approaches
Summary: Convert additional String::format clients to alternative approaches
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKit Misc. (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Darin Adler
URL:
Keywords: InRadar
Depends on:
Blocks: 30342
  Show dependency treegraph
 
Reported: 2018-12-16 13:46 PST by Darin Adler
Modified: 2019-02-03 11:19 PST (History)
12 users (show)

See Also:


Attachments
Patch (42.36 KB, patch)
2018-12-16 14:01 PST, Darin Adler
no flags Details | Formatted Diff | Diff
Patch (42.48 KB, patch)
2018-12-16 15:33 PST, Darin Adler
ap: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Darin Adler 2018-12-16 13:46:04 PST
Convert additional String::format clients to alternative approaches
Comment 1 Darin Adler 2018-12-16 14:01:11 PST
Created attachment 357420 [details]
Patch
Comment 2 Darin Adler 2018-12-16 15:33:17 PST
Created attachment 357422 [details]
Patch
Comment 3 Darin Adler 2018-12-19 18:45:00 PST
Anyone interested in reviewing?
Comment 4 Alexey Proskuryakov 2018-12-21 10:11:12 PST
Comment on attachment 357422 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=357422&action=review

Looks good. Some ranty nitpicks below.

> Source/WTF/wtf/JSONValues.cpp:449
> +inline bool appendDoubleQuotedStringEscapedCharacter(StringBuilder& builder, UChar character)

This function name doesn't quite parse for me. Maybe appendCharacterEscapingForDoubleQuotedStringIfNeeded?

> Source/WTF/wtf/JSONValues.cpp:489
> +        // FIXME: This is likely incorrect for surrogate pairs.

Seems worth elaborating on what the action for this FIXME is. "Likely" is too weak of a statement, as it's obviously never correct to serialize a code unit into two \u escape sequences.

But I'm not sure if it's observable, maybe callers guarantee that strings here don't contain surrogate pairs? It just feels unlikely that a bug of this magnitude could have survived so long in the era of Emoji if it were a real bug. In this case, I'm not sure what the action would be, maybe to rename the function so that it wouldn't be misused in the future.

FWIW, I played with JSON.stringify, and couldn't get two \u sequences out of it. But I didn't check if it takes this code path.

> Source/WTF/wtf/JSONValues.cpp:494
> +        builder.appendLiteral("\\u");
> +        builder.append(upperNibbleToASCIIHexDigit(codeUnit >> 8));
> +        builder.append(lowerNibbleToASCIIHexDigit(codeUnit >> 8));
> +        builder.append(upperNibbleToASCIIHexDigit(codeUnit));
> +        builder.append(lowerNibbleToASCIIHexDigit(codeUnit));

I never looked into StringBuilder performance before, but a brief glance over it doesn't fill me with confidence that appending characters one by one is performant. This class over-allocates the buffer, but it's still a non-inlined function call for every character.

Probably no worse than String::format though.

> Source/WebCore/Modules/indexeddb/IDBKeyData.cpp:367
> +        return makeString("<date> - ", FormattedNumber::fixedWidth(WTF::get<double>(m_value), 6));

Not new in this patch, but it's sad that we don't have a consistent story for the WTF namespace. Most of WTF code is designed to be used without an explicit namespace, and symbols are exported into global namespace right away. But that kind of thing wouldn't work with "get".

> Source/WebCore/PAL/pal/FileSizeFormatter.cpp:42
> +        return makeString(FormattedNumber::fixedWidth(size / 1000., 1), " KB"); // This should be lowercase k.

This lacks "FIXME". May be worth elaborating on what exactly the goal here is. Obviously, "kilo" prefix is lower case, but using upper case is quite traditional. Is there action for this FIXME to just make this change locally, or to switch the whole code base consistently?

> Source/WebCore/html/track/VTTCue.cpp:231
> -            String::format("translate(-%.2f%%, -%.2f%%)", position.first, position.second));
> +            makeString("translate(", FormattedNumber::fixedWidth(-position.first, 2), "%, ", FormattedNumber::fixedWidth(-position.second, 2), "%)"));

I think that negating is problem-free for floating point numbers that we can encounter here, but is it an improvement to do an arithmetic operation here before converting the number to string?

> Source/WebCore/page/cocoa/ResourceUsageOverlayCocoa.mm:434
> -        return String::format("%.1f kB", static_cast<double>(number) / 1024);
> +        return makeString(FormattedNumber::fixedWidth(number / 1024, 1), " kB");

FIXME: Should be "KiB" (and same for others)? Hate hate hate.

> Source/WebCore/page/scrolling/AxisScrollSnapOffsets.cpp:106
> +    return makeString("{{",

Hmm, why does StringBuilder need a separate appendLiteral function, but makeString gets an unadorned literal? Do we need ASCIILiteral here by any chance?

> Source/WebCore/platform/cocoa/KeyEventCocoa.mm:500
> +            return makeString("U+",

Is there any concern about non-BMP characters here?

> Source/WebKit/UIProcess/WebAuthentication/Cocoa/LocalAuthenticator.mm:475
> +    // FIXME: I don't think that formatting with %s and passing an NSObject from a dictionary will work.

Would you be willing to file a bug for this? I believe that this is new code, and fixing it should probably be tracked with more than a FIXME.

> Tools/WebKitTestRunner/TestController.cpp:2272
> +    if (auto port = WKSecurityOriginGetPort(origin)) {

This is one of the cases where I think that the type matters, and "auto" takes away from the reader. I want to know whether it's an unsigned short or something else when reading this.

> Tools/WebKitTestRunner/TestController.cpp:2274
> +        // FIXME: Use static_cast<int> to work around makeString's limitation when formatting numbers that are the same type as UChar.
> +        return makeString(protocol, "://", host, ':', static_cast<int>(port));

What is the FIXME action? It says "use", but the code already does that.

I'm guessing that it's about removing static_cast once there is a better way, but this seems like a strange place to motivate creating that better way.
Comment 5 Darin Adler 2018-12-21 13:54:27 PST
Comment on attachment 357422 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=357422&action=review

>> Source/WTF/wtf/JSONValues.cpp:449
>> +inline bool appendDoubleQuotedStringEscapedCharacter(StringBuilder& builder, UChar character)
> 
> This function name doesn't quite parse for me. Maybe appendCharacterEscapingForDoubleQuotedStringIfNeeded?

I don’t love the name you suggested. Let me think and see if I can come up with a name I love.

>> Source/WTF/wtf/JSONValues.cpp:489
>> +        // FIXME: This is likely incorrect for surrogate pairs.
> 
> Seems worth elaborating on what the action for this FIXME is. "Likely" is too weak of a statement, as it's obviously never correct to serialize a code unit into two \u escape sequences.
> 
> But I'm not sure if it's observable, maybe callers guarantee that strings here don't contain surrogate pairs? It just feels unlikely that a bug of this magnitude could have survived so long in the era of Emoji if it were a real bug. In this case, I'm not sure what the action would be, maybe to rename the function so that it wouldn't be misused in the future.
> 
> FWIW, I played with JSON.stringify, and couldn't get two \u sequences out of it. But I didn't check if it takes this code path.

I believe this code is currently used only for our web development tools, and nothing that is part of the web platform.

>> Source/WTF/wtf/JSONValues.cpp:494
>> +        builder.append(lowerNibbleToASCIIHexDigit(codeUnit));
> 
> I never looked into StringBuilder performance before, but a brief glance over it doesn't fill me with confidence that appending characters one by one is performant. This class over-allocates the buffer, but it's still a non-inlined function call for every character.
> 
> Probably no worse than String::format though.

Seems like something we can return to. We could make a more efficient overload of append that can take a list of characters; maybe initializer_list<char>.

>> Source/WebCore/PAL/pal/FileSizeFormatter.cpp:42
>> +        return makeString(FormattedNumber::fixedWidth(size / 1000., 1), " KB"); // This should be lowercase k.
> 
> This lacks "FIXME". May be worth elaborating on what exactly the goal here is. Obviously, "kilo" prefix is lower case, but using upper case is quite traditional. Is there action for this FIXME to just make this change locally, or to switch the whole code base consistently?

I know that it’s traditional, but I am suggesting we consider changing it, especially in a function designed for reuse. I wouldn’t call it a "FIXME". I would rather remove the comment entirely than add a FIXME.

I think I’ll remove the comment.

>> Source/WebCore/html/track/VTTCue.cpp:231
>> +            makeString("translate(", FormattedNumber::fixedWidth(-position.first, 2), "%, ", FormattedNumber::fixedWidth(-position.second, 2), "%)"));
> 
> I think that negating is problem-free for floating point numbers that we can encounter here, but is it an improvement to do an arithmetic operation here before converting the number to string?

Yes, I believe it is an improvement. For one thing, the old code depends on the number being non-negative. Prepending a "-" sign seems quite non-traditional and if more efficient, only extremely slightly so.

>> Source/WebCore/page/cocoa/ResourceUsageOverlayCocoa.mm:434
>> +        return makeString(FormattedNumber::fixedWidth(number / 1024, 1), " kB");
> 
> FIXME: Should be "KiB" (and same for others)? Hate hate hate.

Worth taking this up with whoever created this overlay. Simon perhaps?

>> Source/WebCore/page/scrolling/AxisScrollSnapOffsets.cpp:106
>> +    return makeString("{{",
> 
> Hmm, why does StringBuilder need a separate appendLiteral function, but makeString gets an unadorned literal? Do we need ASCIILiteral here by any chance?

Maybe some day; I don’t think should do it right now. I am not sure what the size/speed tradeoff is for passing the length rather than scanning for a null character.

>> Source/WebCore/platform/cocoa/KeyEventCocoa.mm:500
>> +            return makeString("U+",
> 
> Is there any concern about non-BMP characters here?

Might be worth researching that, but I won’t do that right now. At the moment this function takes a 16-bit integer, so it can’t be a non-BMP character, although it could be an unpaired surrogate.

>> Source/WebKit/UIProcess/WebAuthentication/Cocoa/LocalAuthenticator.mm:475
>> +    // FIXME: I don't think that formatting with %s and passing an NSObject from a dictionary will work.
> 
> Would you be willing to file a bug for this? I believe that this is new code, and fixing it should probably be tracked with more than a FIXME.

Sure.

>> Tools/WebKitTestRunner/TestController.cpp:2272
>> +    if (auto port = WKSecurityOriginGetPort(origin)) {
> 
> This is one of the cases where I think that the type matters, and "auto" takes away from the reader. I want to know whether it's an unsigned short or something else when reading this.

I guess I could just make it int, and then get rid of the static_cast<int> below.

>> Tools/WebKitTestRunner/TestController.cpp:2274
>> +        return makeString(protocol, "://", host, ':', static_cast<int>(port));
> 
> What is the FIXME action? It says "use", but the code already does that.
> 
> I'm guessing that it's about removing static_cast once there is a better way, but this seems like a strange place to motivate creating that better way.

I think the FIXME should probably be separate and say:

// FIXME: Consider changing the type to uint16_t once makeString can handle numbers that are the same type as UChar.
Comment 6 Darin Adler 2019-02-01 20:06:01 PST
Committed r240892: <https://trac.webkit.org/changeset/240892>
Comment 7 Radar WebKit Bug Importer 2019-02-01 20:06:27 PST
<rdar://problem/47758658>
Comment 8 Simon Fraser (smfr) 2019-02-02 11:20:53 PST
What's the impetus for these changes? Security? Performance? Code cleanup?
Comment 9 Darin Adler 2019-02-02 19:04:18 PST
Primarily security and its intersection with portability; hard to get the format specifiers right in all cases; there is type checking based on the format strings in some but not all of the modern compilers.

Also should give us gain in efficiency in most cases; we have seen in profiles that the formatting code is fairly costly and inefficient. But while makeString is quite efficient, in other cases the non-String::format replacement may not be written for maximum efficiency.

Some small style/maintainability benefit. There are a mix of techniques needed to correctly portably format types such as "uint64_t" that are different from the ones for "unsigned long long" and different in turn from the ones needed for "size_t". These aren’t all equally easy to read. With makeString, there is no need to learn any of these.

Also, some economy of implementation; we have and won’t be getting rid of our numeric formatting functions, and it would be good to not *also* depend on the ones in the underlying operating system.

Some costs, perhaps, too. Loss of a familiar idiom, primarily. And until we update logging macros to use a different approach, we will still have many of the costs above just for logging.

For a little background there are comments in bug 188191, bug 187955, and bug 192742.

Please do feel free to steer me in another direction if you think this isn’t a smart thing to be doing.
Comment 10 Darin Adler 2019-02-03 11:03:54 PST
Also found bug 30342, bug 37327, and bug 176035.
Comment 11 Darin Adler 2019-02-03 11:19:25 PST
I tried to gather a bit more information about this in bug 30342, relating the bugs better. The motivation that drove that oldest bug the most seems to be problems with UTF-8 processing in implementations of String::format on non-Apple platforms, which isn’t even one of the reasons I listed above.