Bug 38914 - Start using ropes in String.prototype.replace
Summary: Start using ropes in String.prototype.replace
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: PC OS X 10.5
: P2 Normal
Assignee: Geoffrey Garen
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-05-11 09:57 PDT by Geoffrey Garen
Modified: 2010-05-11 13:52 PDT (History)
2 users (show)

See Also:


Attachments
patch (21.13 KB, patch)
2010-05-11 09:57 PDT, Geoffrey Garen
oliver: review+
Details | Formatted Diff | Diff
patch (2.08 KB, patch)
2010-05-11 13:37 PDT, Geoffrey Garen
ggaren: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Geoffrey Garen 2010-05-11 09:57:44 PDT
Created attachment 55714 [details]
patch

Patch coming.
Comment 1 Oliver Hunt 2010-05-11 10:08:26 PDT
Comment on attachment 55714 [details]
patch

r=me
Comment 2 Oliver Hunt 2010-05-11 10:09:03 PDT
Comment on attachment 55714 [details]
patch

> +        (JSC::JSString::replaceCharacter): Added a replaceCharacter function, which creates
> +        a rope for the resuling replacement.
resulting
Comment 3 Darin Adler 2010-05-11 10:19:51 PDT
Comment on attachment 55714 [details]
patch

Looks good.

> -            unsigned fiberCountMinusOne = rope->fiberCount() - 1;
> +            unsigned fiberCountMinusOne = rope->size() - 1;

I think that size is a confusing name for a Rope member function, since a rope has both fibers and characters and it's not clear whether size is the number of fibers or number of characters. Maybe this should be named stringCount instead of fiberCount, but I think size leaves room for confusion.

> +    if (!isRope()) {
> +        size_t matchPos = m_value.find(character);
> +        if (matchPos == notFound)
> +            return JSValue(this);
> +        size_t matchEnd = matchPos + 1;
> +        return jsString(exec, m_value.substr(0, matchPos),
> +                              replacement,
> +                              m_value.substr(matchEnd, m_value.size() - matchEnd));

You can omit the length from the second substr call.

We don't format like this, lining subsequent lines up with the comma in the first line. If you break the line I suggest indenting only once. But I think this now fits on one line:

           return jsString(exec, m_value.substr(0, matchPos), replacement, m_value.substr(matchPos + 1));

You should do it like that.

> +    UStringImpl* match = 0;
> +    size_t matchPos = notFound;

I think that "match" should be named "matchString" and "matchPos" should be named "matchPosition".

> +    for (RopeIterator it(m_other.m_fibers, m_fiberCount); it != end; ++it) {
> +        ++fiberCount;
> +        if (match)
> +            continue;
> +
> +        UStringImpl* impl = *it;

I would call this local variable "string" rather than "impl".

> +            builder.append(UString(impl));

I don't think the cast to UString here should be required.

> +        size_t matchEnd = matchPos + 1;
> +        builder.append(UString(impl).substr(0, matchPos));
> +        if (replacement.size())
> +            builder.append(replacement);
> +        builder.append(UString(impl).substr(matchEnd, impl->length() - matchEnd));

No need to specify a length here. This should just be:

    builder.append(UString(impl).substr(matchPos + 1));

It's also unfortunate that UStringImpl* doesn't offer a substr operation.

> +                RopeIterator() { }

Why is this needed?

> +                UString::Rep* operator*()

The new name for the UString::Rep type is UStringImpl. I don't think we want to use UString::Rep in new code. Ask Gavin to be sure. Eventually it will just be StringImpl, but since that's stll in the WebCore namespace I guess we can't call it that yet.

> +                bool operator!=(const RopeIterator& other) const
> +                {
> +                    return m_workQueue != other.m_workQueue;
> +                }

Strange to have != and not ==, but I guess you just coded what you needed.

> +                    if (!m_workQueue.size())
> +                        return;

Normally we use isEmpty for this type of thing.

> +    size_t matchEnd = matchPos + matchLen;
> +    int ovector[2] = { matchPos, matchEnd };
> +    return jsString(exec, source.substr(0, matchPos),
> +                          substituteBackreferences(replacementString, source, ovector, 0),
> +                          source.substr(matchEnd, source.size() - matchEnd));

    return jsString(exec, source.substr(0.matchPos), substituteBackreferences(replacementString, source, ovector, 0), source.substr(matchPos + matchLen));

No need for the size - end thing so no need for the local variable.

r=me
Comment 4 Mark Rowe (bdash) 2010-05-11 11:25:53 PDT
Can you please please please keep an eye out for newly-introduced leaks on the build bot when landing this?  String-related changes have been a frequent cause of large leaks in the past.
Comment 5 Geoffrey Garen 2010-05-11 11:41:19 PDT
> I think that size is a confusing name for a Rope member function, since a rope has both fibers and characters and it's not clear whether size is the number of fibers or number of characters. Maybe this should be named stringCount instead of fiberCount, but I think size leaves room for confusion.

OK. I'll go back to fiberCount. stringCount is not so good because it implies that you're only counting the fibers that are strings, and not the fibers that are ropes.

Ultimately, this data structure should probably just become a Vector, or a wrapper for a Vector with special append functions.

> > +    if (!isRope()) {
> > +        size_t matchPos = m_value.find(character);
> > +        if (matchPos == notFound)
> > +            return JSValue(this);
> > +        size_t matchEnd = matchPos + 1;
> > +        return jsString(exec, m_value.substr(0, matchPos),
> > +                              replacement,
> > +                              m_value.substr(matchEnd, m_value.size() - matchEnd));
> 
> You can omit the length from the second substr call.

Fixed.

> We don't format like this, lining subsequent lines up with the comma in the first line. If you break the line I suggest indenting only once. But I think this now fits on one line:
> 
>            return jsString(exec, m_value.substr(0, matchPos), replacement, m_value.substr(matchPos + 1));
> 
> You should do it like that.

Fixed.

This worked great in almost all cases. There's one case now, though, where it's pretty hard to tell which strings are being spliced together, since everything is on one long line.

> > +    UStringImpl* match = 0;
> > +    size_t matchPos = notFound;
> 
> I think that "match" should be named "matchString" and "matchPos" should be named "matchPosition".

Fixed.

> > +    for (RopeIterator it(m_other.m_fibers, m_fiberCount); it != end; ++it) {
> > +        ++fiberCount;
> > +        if (match)
> > +            continue;
> > +
> > +        UStringImpl* impl = *it;
> 
> I would call this local variable "string" rather than "impl".

Fixed.

> > +            builder.append(UString(impl));
> 
> I don't think the cast to UString here should be required.

UString only has a constructor for PassRefPtr<UStringImpl> -- I guess it's trying to discourage inefficiencies. 

> > +        size_t matchEnd = matchPos + 1;
> > +        builder.append(UString(impl).substr(0, matchPos));
> > +        if (replacement.size())
> > +            builder.append(replacement);
> > +        builder.append(UString(impl).substr(matchEnd, impl->length() - matchEnd));
> 
> No need to specify a length here. This should just be:
> 
>     builder.append(UString(impl).substr(matchPos + 1));

Fixed.

> It's also unfortunate that UStringImpl* doesn't offer a substr operation.

Yeah.

> > +                RopeIterator() { }
> 
> Why is this needed?

End iterator.

> > +                UString::Rep* operator*()
> 
> The new name for the UString::Rep type is UStringImpl. I don't think we want to use UString::Rep in new code. Ask Gavin to be sure. Eventually it will just be StringImpl, but since that's stll in the WebCore namespace I guess we can't call it that yet.

Fixed.

> > +                    if (!m_workQueue.size())
> > +                        return;
> 
> Normally we use isEmpty for this type of thing.

Fixed.

> > +    size_t matchEnd = matchPos + matchLen;
> > +    int ovector[2] = { matchPos, matchEnd };
> > +    return jsString(exec, source.substr(0, matchPos),
> > +                          substituteBackreferences(replacementString, source, ovector, 0),
> > +                          source.substr(matchEnd, source.size() - matchEnd));
> 
>     return jsString(exec, source.substr(0.matchPos), substituteBackreferences(replacementString, source, ovector, 0), source.substr(matchPos + matchLen));
> 
> No need for the size - end thing so no need for the local variable.

Fixed.
Comment 6 Geoffrey Garen 2010-05-11 11:42:05 PDT
I was hoping to use the EWS bots, like some folks suggested on webkit-dev, but it's been almost two hours with no input from them, so I guess I'll commit now.
Comment 7 Geoffrey Garen 2010-05-11 11:42:58 PDT
Committed revision 59161.
Comment 8 Geoffrey Garen 2010-05-11 13:37:09 PDT
Created attachment 55746 [details]
patch 

Fix 64bit.
Comment 9 Geoffrey Garen 2010-05-11 13:52:43 PDT
Comment on attachment 55746 [details]
patch 

r=gavin