Bug 31330 - Expose dtoa() for ECMA-262
Summary: Expose dtoa() for ECMA-262
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Web Template Framework (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks: 27451
  Show dependency treegraph
 
Reported: 2009-11-10 21:34 PST by Kent Tamura
Modified: 2009-11-18 19:14 PST (History)
2 users (show)

See Also:


Attachments
Proposed patch (rev.2) (8.43 KB, patch)
2009-11-10 21:36 PST, Kent Tamura
darin: review-
Details | Formatted Diff | Diff
Proposed patch (rev.3) (8.36 KB, patch)
2009-11-11 23:54 PST, Kent Tamura
darin: review-
Details | Formatted Diff | Diff
sunspider-compare-results output (3.74 KB, text/plain)
2009-11-11 23:55 PST, Kent Tamura
no flags Details
Proposed patch (rev.4) (11.39 KB, patch)
2009-11-16 20:29 PST, Kent Tamura
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Kent Tamura 2009-11-10 21:34:13 PST
Move the code in UString::from(double) to WTF, and expose it in order to be used by HTML5 Forms code.
Comment 1 Kent Tamura 2009-11-10 21:36:05 PST
Created attachment 42923 [details]
Proposed patch (rev.2)

The difference from the patch in Bug#27451 is "using WTF::ecma262dtoa;" in dtoa.h.
Comment 2 Oliver Hunt 2009-11-11 12:25:15 PST
Comment on attachment 42923 [details]
Proposed patch (rev.2)

Can you please verify that you haven't regressed SunSpider performance?
Comment 3 Darin Adler 2009-11-11 16:06:58 PST
Comment on attachment 42923 [details]
Proposed patch (rev.2)

>  UString UString::from(double d)
>  {
> +    return UString(ecma262dtoa(d));
>  }

The above change will slow down JavaScriptCore, because it makes an extra copy of the buffer. The old version translated directly from a char buffer to a UChar buffer. The new code instead goes from a char buffer to a UChar buffer and then copies the buffer again to put it into the UString. I don't see an obvious way to avoid this as long as the dtoa function returns a char buffer and the ecma262dtoa function returns a UChar buffer. But it's unacceptable to slow things down without doing performance tests to prove this is not a significant slowdown for JavaScript execution.

We should investigate other solutions that don't require allocating memory and copying the string an extra time.

If we can figure out a hard limit for the longest a double could be, then we could make the function put its result into a C-style array so there is no extra memory allocation.

> +static ALWAYS_INLINE void appendAscii(const char* src, unsigned size, Vector<UChar, 80>& dst)

Since ASCII is an acronym, we should always capitalize it as ASCII.

I’d like to see a typedef for the vector type here at the top of the file so the magic number 80 isn’t in three different places.
Comment 4 Kent Tamura 2009-11-11 20:11:34 PST
ecma262dtoa() uses Vector<> to allocate a buffer and it means using a heap allocation though the original code uses a local array.  This also can cause performance regression.

Definitely I need to change the interface of ecma262dtoa().
Comment 5 Kent Tamura 2009-11-11 23:54:10 PST
Created attachment 43034 [details]
Proposed patch (rev.3)

* Changed the function signature to "unsigned ecma262dtoa(double d, char* buf)"
* Avoid strlen()
* Avoid array index arithmetics
Comment 6 Kent Tamura 2009-11-11 23:55:29 PST
Created attachment 43035 [details]
sunspider-compare-results output
Comment 7 Darin Adler 2009-11-12 09:36:48 PST
Comment on attachment 43034 [details]
Proposed patch (rev.3)

>  UString UString::from(double d)
>  {
>      char buf[80];
> +    unsigned len = ecma262dtoa(d, buf);
> +    buf[len] = 0;
>      return UString(buf);
>  }

We could make this even more efficient by making a new UString constructor that takes a const char* and a length to avoid the unnecessary strlen. That's not strictly necessary, but sure would be nice.

>  #include <wtf/Assertions.h>
>  #include <wtf/FastMalloc.h>
> +//#include <wtf/MathExtras.h>
>  #include <wtf/Vector.h>
>  #include <wtf/Threading.h>

Why the added-but-commented-out include?

> +static ALWAYS_INLINE char* append(char* next, const char* src, unsigned size)
> +{
> +    for (unsigned i = 0; i < size; ++i)
> +        *next++ = static_cast<unsigned char>(*src++);
> +    return next;
> +}
> +
> +static ALWAYS_INLINE char* append(char* next, const char* src)
> +{
> +    while (*src)
> +        *next++ = static_cast<unsigned char>(*src++);
> +    return next;
> +}

I would have made the "next" argument be a char*& rather than using a return value.

Why the casts to unsigned char? Those are not at all helpful.

> +    // dtoa() for ECMA-262 'ToString Applied to the Number Type.'
> +    // The buffer pointed by buf must have at least 80 elements.

I'd like to see a constant with that value 80. It's not good to have a comment in one file and then the number typed in some other source file. Maybe something like this?

    typedef char dtoaECMA262Buffer[80];

> +    // The return value is the length of the resultant string in buf.
> +    // The resultant string isn't terminated by 0.
> +    unsigned ecma262dtoa(double d, char* buf);

I don't think it's great to name the function dtoa when it returns a length. Maybe the length could be an out argument instead. Or somehow we could name the function in a way that makes it clear its return value is the length of the string.

It might be better to give this a name with a nicer format.I'm not sure having ECMA262 in the name is good. The colloquial name for this is JavaScript, so maybe this super-long name would be OK, or maybe some shorter variant.

    doubleToStringInJavaScriptFormatReturningLength

The argument name "d" doesn't add anything here, so I'd suggest leaving it out. The argument name "buf" could instead be a word, "buffer".

It would be clearer if the buffer argument was declared as an 80-character array rather than a char*. Because of how C handles arrays, the two are probably equivalent at runtime, but declaring it that way is a clearer way to indicate it needs the 80 characters than just the comment.

review- mainly because of that stray commented out include, but please consider my other comments and suggestions
Comment 8 Kent Tamura 2009-11-16 20:29:36 PST
Created attachment 43340 [details]
Proposed patch (rev.4)

Thank you for reviewing.  I have updated the patch.

(In reply to comment #7)
> > +    unsigned len = ecma262dtoa(d, buf);
> > +    buf[len] = 0;
> >      return UString(buf);
> >  }
> 
> We could make this even more efficient by making a new UString constructor that
> takes a const char* and a length to avoid the unnecessary strlen. That's not
> strictly necessary, but sure would be nice.

I added a UString(const char*, unsigned) and createRep(ditto).

> > +//#include <wtf/MathExtras.h>
> >  #include <wtf/Vector.h>
> >  #include <wtf/Threading.h>
> 
> Why the added-but-commented-out include?

Oops, removed.

> > +static ALWAYS_INLINE char* append(char* next, const char* src, unsigned size)
> > +static ALWAYS_INLINE char* append(char* next, const char* src)
> 
> I would have made the "next" argument be a char*& rather than using a return
> value.

done.

> Why the casts to unsigned char? Those are not at all helpful.

Removed.  It was meaningful when the destination was UChar*.

> > +    // dtoa() for ECMA-262 'ToString Applied to the Number Type.'
> > +    // The buffer pointed by buf must have at least 80 elements.
> 
> I'd like to see a constant with that value 80. It's not good to have a comment
> in one file and then the number typed in some other source file. Maybe
> something like this?
> 
>     typedef char dtoaECMA262Buffer[80];

Introduced DtoaBuffer, which is for both of dtoa() and this.

> > +    // The return value is the length of the resultant string in buf.
> > +    // The resultant string isn't terminated by 0.
> > +    unsigned ecma262dtoa(double d, char* buf);
> 
> I don't think it's great to name the function dtoa when it returns a length.
> Maybe the length could be an out argument instead. Or somehow we could name the
> function in a way that makes it clear its return value is the length of the
> string.
> 
> It might be better to give this a name with a nicer format.I'm not sure having
> ECMA262 in the name is good. The colloquial name for this is JavaScript, so
> maybe this super-long name would be OK, or maybe some shorter variant.
> 
>     doubleToStringInJavaScriptFormatReturningLength
>
> The argument name "d" doesn't add anything here, so I'd suggest leaving it out.
> The argument name "buf" could instead be a word, "buffer".

I changed it to:
  void doubleToStringInJavaScriptFormat(double, DtoaBuffer buffer, unsigned* resultLength);
Comment 9 Darin Adler 2009-11-18 10:55:57 PST
Comment on attachment 43340 [details]
Proposed patch (rev.4)

> +    UChar* d;
> +    if (!allocChars(length).getValue(d))
> +        return &UString::Rep::null();
> +    else {
> +        for (unsigned i = 0; i < length; i++)
> +            d[i] = static_cast<unsigned char>(c[i]); // use unsigned char to zero-extend instead of sign-extend
> +        return UString::Rep::create(d, static_cast<int>(length));
> +    }
> +
> +}

Normally we do early return, not else after return. There's also an extra blank line here.

Is the static_cast<int> really needed?

I suggest marking this createRep function, which is used in only one place, inline.

>          UString();
> +        // Constructor for null-terminated ASCII string.
>          UString(const char*);
> +        // Constructor for non-null-terminated ASCII string.
> +        UString(const char*, unsigned length);
>          UString(const UChar*, int length);
>          UString(UChar*, int length, bool copy);

Given that the other constructors take int for length, why did you chose unsigned? If this was a new class, I would probably use size_t or perhaps unsigned, but typically we just keep new consistent with old, locally.

> +    // dtoa() for ECMA-262 'ToString Applied to the Number Type.'
> +    // The *resultLength will have the length of the resultant string in bufer.
> +    // The resultant string isn't terminated by 0.
> +    void doubleToStringInJavaScriptFormat(double, DtoaBuffer buffer, unsigned* resultLength);
>  } // namespace WTF

Probably would be slightly nicer to have a blank line before the closing brace. Could omit the name "buffer" since the type name already says that.

r=me
Comment 10 Kent Tamura 2009-11-18 19:06:23 PST
I'll land this manually with the following changes.

(In reply to comment #9)
> (From update of attachment 43340 [details])
> > +    UChar* d;
> > +    if (!allocChars(length).getValue(d))
> > +        return &UString::Rep::null();
> > +    else {
> > +        for (unsigned i = 0; i < length; i++)
> > +            d[i] = static_cast<unsigned char>(c[i]); // use unsigned char to zero-extend instead of sign-extend
> > +        return UString::Rep::create(d, static_cast<int>(length));
> > +    }
> > +
> > +}
> 
> Normally we do early return, not else after return. There's also an extra blank
> line here.

Fixed them.

> Is the static_cast<int> really needed?

Changed the unsigned parameter to int.  So it is not needed.

> I suggest marking this createRep function, which is used in only one place,
> inline.

Done.

> > +        UString(const char*, unsigned length);
> >          UString(const UChar*, int length);
> >          UString(UChar*, int length, bool copy);
> 
> Given that the other constructors take int for length, why did you chose
> unsigned? If this was a new class, I would probably use size_t or perhaps
> unsigned, but typically we just keep new consistent with old, locally.

Change it to int.

> > +    void doubleToStringInJavaScriptFormat(double, DtoaBuffer buffer, unsigned* resultLength);
> >  } // namespace WTF
> 
> Probably would be slightly nicer to have a blank line before the closing brace.

Done.

> Could omit the name "buffer" since the type name already says that.

Done.
Comment 11 Kent Tamura 2009-11-18 19:13:52 PST
Committed r51168: <http://trac.webkit.org/changeset/51168>