Bug 96749

Summary: WTFString::show doesn't dump non-ASCII characters in a readable manner
Product: WebKit Reporter: Glenn Adams <glenn>
Component: Web Template FrameworkAssignee: Glenn Adams <glenn>
Status: RESOLVED FIXED    
Severity: Normal CC: ap, benjamin, darin, seikwon.kim, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 96873    
Bug Blocks:    
Attachments:
Description Flags
Patch
none
address comments in comment #4
none
fix release build
none
address comment #20 none

Description Glenn Adams 2012-09-14 03:45:40 PDT
To repeat: in debugger, invoke WTFString::show on a String that contains a non-ASCII character.  That non-ASCII character is dumped as '?' which isn't useful for debugging. It should dump a reasonable ASCII representation instead, e.g., using the \uXXXX syntax.
Comment 1 Glenn Adams 2012-09-14 03:53:35 PDT
Created attachment 164093 [details]
Patch
Comment 2 Glenn Adams 2012-09-14 04:54:20 PDT
Comment on attachment 164093 [details]
Patch

suggest benjaminp review
Comment 3 Alexey Proskuryakov 2012-09-14 10:19:24 PDT
Comment on attachment 164093 [details]
Patch

We have existing functionality for escaping strings in WebCore/platform/text/TextCodec.h and in window.escape function implementation. I suggest either reusing existing code, or at least keeping implementations close together.
Comment 4 Benjamin Poulain 2012-09-14 12:48:33 PDT
Comment on attachment 164093 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=164093&action=review

> Source/WTF/wtf/text/WTFString.cpp:1141
> -    buffer.resize(length + 1);
> -    for (unsigned i = 0; i < length; ++i) {
> +    unsigned bufferNeeds = 1;
> +    for (unsigned i = 0; i < length; i++) {
>          UChar ch = characters[i];
> -        buffer[i] = ch && (ch < 0x20 || ch > 0x7f) ? '?' : ch;
> +        bufferNeeds += !ch || ch < 0x20 || ch > 0x7f ? 6 : 1;
> +    }
> +
> +    buffer.resize(bufferNeeds);
> +    unsigned bufferUsed = 0;

The name bufferNeeds is not good enough, it poorly maps to the purpose of the variable.

This is debug code, it should be simple to read. Drop the first loop, use Vector::append() instead of handling the size manually. 

Why would you change ++i to i++?

> Source/WTF/wtf/text/WTFString.cpp:1143
> +        UChar ch = characters[i];

I would drop impl->characters() before and use (*impl)[i] here.

> Source/WTF/wtf/text/WTFString.cpp:1144
> +        if (!ch || ch < 0x20 || ch > 0x7f) {

We have isASCII() for that.

> Source/WTF/wtf/text/WTFString.cpp:1150
> +            for (unsigned k = 0; k < 4; k++, ch <<= 4) {
> +                static const char hexDigits[] = "0123456789ABCDEF";
> +                buffer[bufferUsed++] = hexDigits[(ch >> 12) & 0xF];
> +            }

We have functions for that in HexNumbers.h
Comment 5 Glenn Adams 2012-09-14 17:49:45 PDT
(In reply to comment #3)
> (From update of attachment 164093 [details])
> We have existing functionality for escaping strings in WebCore/platform/text/TextCodec.h and in window.escape function implementation. I suggest either reusing existing code, or at least keeping implementations close together.

understood, but perhaps WTF should not be dependent on WebCore? yes?
Comment 6 Benjamin Poulain 2012-09-14 18:05:50 PDT
> > We have existing functionality for escaping strings in WebCore/platform/text/TextCodec.h and in window.escape function implementation. I suggest either reusing existing code, or at least keeping implementations close together.
> 
> understood, but perhaps WTF should not be dependent on WebCore? yes?

That is not what Alexey suggested :)

WTF should not depend on WebCore, but it is common for features to move from JavaScriptCore and WebCore to WTF.
In this case, I am not entirely convinced it is necessary, but I haven't looked at TextCodec to see what can be done.
Comment 7 Glenn Adams 2012-09-14 19:18:01 PDT
Created attachment 164263 [details]
address comments in comment #4
Comment 8 Glenn Adams 2012-09-14 19:21:50 PDT
(In reply to comment #6)
> > > We have existing functionality for escaping strings in WebCore/platform/text/TextCodec.h and in window.escape function implementation. I suggest either reusing existing code, or at least keeping implementations close together.
> > 
> > understood, but perhaps WTF should not be dependent on WebCore? yes?
> 
> That is not what Alexey suggested :)
> 
> WTF should not depend on WebCore, but it is common for features to move from JavaScriptCore and WebCore to WTF.
> In this case, I am not entirely convinced it is necessary, but I haven't looked at TextCodec to see what can be done.

in this present case, my goal is simply to have WTFString::show do something more sensible with !isASCIIPrintable UChars; might I suggest another BZ be filed if there is a desire to do more (such as move other related WebCore functions into WTFString et al.)
Comment 9 Benjamin Poulain 2012-09-14 21:05:20 PDT
> in this present case, my goal is simply to have WTFString::show do something more sensible with !isASCIIPrintable UChars; might I suggest another BZ be filed if there is a desire to do more (such as move other related WebCore functions into WTFString et al.)

WebKit has a shared responsibility codebase. As such, if there is a better patch to be done, it should be done.
Letting someone else clean after you is not a great option...

I'll have a look at the patch.
Comment 10 Alexey Proskuryakov 2012-09-14 21:06:38 PDT
> In this case, I am not entirely convinced it is necessary, but I haven't looked at TextCodec to see what can be done.

Ditto. Having code like this in three places surely feels wrong though.
Comment 11 Benjamin Poulain 2012-09-14 21:11:47 PDT
Comment on attachment 164263 [details]
address comments in comment #4

Looks good.

I don't think there is enough advantages to share the code with TextCodec.
Comment 12 WebKit Review Bot 2012-09-14 21:38:53 PDT
Comment on attachment 164263 [details]
address comments in comment #4

Clearing flags on attachment: 164263

Committed r128682: <http://trac.webkit.org/changeset/128682>
Comment 13 WebKit Review Bot 2012-09-14 21:38:57 PDT
All reviewed patches have been landed.  Closing bug.
Comment 14 Glenn Adams 2012-09-14 23:31:50 PDT
(In reply to comment #9)
> > in this present case, my goal is simply to have WTFString::show do something more sensible with !isASCIIPrintable UChars; might I suggest another BZ be filed if there is a desire to do more (such as move other related WebCore functions into WTFString et al.)
> 
> WebKit has a shared responsibility codebase. As such, if there is a better patch to be done, it should be done.
> Letting someone else clean after you is not a great option...
> 
> I'll have a look at the patch.

Understood. I'd be happy to work on "a better patch" or an additional patch if you or Alexey believe it is worth pursuing, in which case please give me a few hints of what you would like to see done.

I agree one shouldn't leave around detritus to be cleaned up, and hope I'm not doing that here.

Thanks for landing this one!
Comment 15 Mark Rowe (bdash) 2012-09-15 00:01:30 PDT
This broke all of the Mac release builds, so I rolled it out in r128684.
Comment 17 Glenn Adams 2012-09-15 00:33:32 PDT
Created attachment 164277 [details]
fix release build
Comment 18 Glenn Adams 2012-09-15 01:08:58 PDT
Comment on attachment 164277 [details]
fix release build

mea culpa; looks like HexNumber.h can't be included without a template expansion that references hexDigitsForMode(); this patch update fixes the Mac builds problem by conditionalizing the inclusion
Comment 19 Benjamin Poulain 2012-09-15 01:22:06 PDT
Comment on attachment 164277 [details]
fix release build

I would prefer HexNumber.h modified so that does not happen again.
Comment 20 Darin Adler 2012-09-16 08:30:30 PDT
It’s not so great to dump in an ambiguous format. Probably need to change "\" into "\\" if we’re also going to use "\" as a metacharacter.
Comment 21 Glenn Adams 2012-09-16 21:18:21 PDT
(In reply to comment #20)
> It’s not so great to dump in an ambiguous format. Probably need to change "\" into "\\" if we’re also going to use "\" as a metacharacter.

Good point (even though this is primarily intended for human reading, where it is unlikely to be mistaken). I'll make this fix and r? after someone lands the depends-on patch at [1].

[1] https://bugs.webkit.org/attachment.cgi?id=164304&action=review
Comment 22 Glenn Adams 2012-09-17 23:59:26 PDT
Created attachment 164502 [details]
address comment #20
Comment 23 Glenn Adams 2012-09-18 01:58:56 PDT
Comment on attachment 164502 [details]
address comment #20

all comments addressed, please review and land
Comment 24 WebKit Review Bot 2012-09-18 10:57:10 PDT
Comment on attachment 164502 [details]
address comment #20

Clearing flags on attachment: 164502

Committed r128908: <http://trac.webkit.org/changeset/128908>
Comment 25 WebKit Review Bot 2012-09-18 10:57:14 PDT
All reviewed patches have been landed.  Closing bug.