Bug 98224 - MarkupAccumulator should optimally handle 8 bit Strings
Summary: MarkupAccumulator should optimally handle 8 bit Strings
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: HTML Editing (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Michael Saboff
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-10-02 18:34 PDT by Michael Saboff
Modified: 2012-10-09 12:55 PDT (History)
5 users (show)

See Also:


Attachments
Patch (7.68 KB, patch)
2012-10-02 18:54 PDT, Michael Saboff
no flags Details | Formatted Diff | Diff
Patch with some updates from reviewer's comments (7.79 KB, patch)
2012-10-09 08:50 PDT, Michael Saboff
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Michael Saboff 2012-10-02 18:34:36 PDT
Currently MarkupAccumulator::appendNodeValue() and the related function appendCharactersReplacingEntities() only handle UChar* string data.  These should handle 8 bit string data as well.
Comment 1 Michael Saboff 2012-10-02 18:54:21 PDT
Created attachment 166798 [details]
Patch
Comment 2 Eric Seidel (no email) 2012-10-08 16:12:18 PDT
Sorry, I meant rniwa, not abarth.
Comment 3 Ryosuke Niwa 2012-10-08 17:27:15 PDT
Comment on attachment 166798 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=166798&action=review

> Source/WebCore/editing/MarkupAccumulator.cpp:74
> +            for (size_t m = 0; m < WTF_ARRAY_LENGTH(entityMaps); ++m) {

Please don't use abbreviations like m.

> Source/WebCore/editing/MarkupAccumulator.cpp:98
> +        const UChar* text = source.characters16() + offset;
> +
> +        size_t positionAfterLastEntity = 0;
> +        for (size_t i = 0; i < length; ++i) {
> +            for (size_t m = 0; m < WTF_ARRAY_LENGTH(entityMaps); ++m) {
> +                if (text[i] == entityMaps[m].entity && entityMaps[m].mask & entityMask) {
> +                    result.append(text + positionAfterLastEntity, i - positionAfterLastEntity);
> +                    result.append(entityMaps[m].reference);
> +                    positionAfterLastEntity = i + 1;
> +                    break;
> +                }
>              }
>          }
> +        result.append(text + positionAfterLastEntity, length - positionAfterLastEntity);

It seems like we should be able to use a template to avoid code duplication here.

> Source/WebCore/editing/markup.cpp:247
> +        MarkupAccumulator::appendCharactersReplacingEntities(buffer, content, 0, content.length(), EntityMaskInPCDATA);

StyledMarkupAccumulator inherits from MarkupAccumulator so we shouldn't have to quality it.

> Source/WebCore/editing/markup.cpp:994
> -    appendCharactersReplacingEntities(markup, title.characters(), title.length(), EntityMaskInPCDATA);
> +    MarkupAccumulator::appendCharactersReplacingEntities(markup, title, 0, title.length(), EntityMaskInPCDATA);

Ditto.
Comment 4 Michael Saboff 2012-10-09 08:50:56 PDT
Created attachment 167764 [details]
Patch with some updates from reviewer's comments

(In reply to comment #3)
> (From update of attachment 166798 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=166798&action=review
> 
> > Source/WebCore/editing/MarkupAccumulator.cpp:74
> > +            for (size_t m = 0; m < WTF_ARRAY_LENGTH(entityMaps); ++m) {
> 
> Please don't use abbreviations like m.

This was from the original code, but I changed 'm' to 'entryIndex'.

> > Source/WebCore/editing/MarkupAccumulator.cpp:98
> > +        const UChar* text = source.characters16() + offset;
> > +
> > +        size_t positionAfterLastEntity = 0;
> > +        for (size_t i = 0; i < length; ++i) {
> > +            for (size_t m = 0; m < WTF_ARRAY_LENGTH(entityMaps); ++m) {
> > +                if (text[i] == entityMaps[m].entity && entityMaps[m].mask & entityMask) {
> > +                    result.append(text + positionAfterLastEntity, i - positionAfterLastEntity);
> > +                    result.append(entityMaps[m].reference);
> > +                    positionAfterLastEntity = i + 1;
> > +                    break;
> > +                }
> >              }
> >          }
> > +        result.append(text + positionAfterLastEntity, length - positionAfterLastEntity);
> 
> It seems like we should be able to use a template to avoid code duplication here.

I thought about that with the original patch.  Given there are static variables in the method that are used in the loops, either the statics would need to be duplicated in the template or a pointer to the array along with it's size needs to be passed into the templated loop function.  Both of these options seemed less desirable to having the two loops.  If you think either of these would be more readable, I could be convinced.

> > Source/WebCore/editing/markup.cpp:247
> > +        MarkupAccumulator::appendCharactersReplacingEntities(buffer, content, 0, content.length(), EntityMaskInPCDATA);
> 
> StyledMarkupAccumulator inherits from MarkupAccumulator so we shouldn't have to quality it.

Removed the class name here.

> > Source/WebCore/editing/markup.cpp:994
> > -    appendCharactersReplacingEntities(markup, title.characters(), title.length(), EntityMaskInPCDATA);
> > +    MarkupAccumulator::appendCharactersReplacingEntities(markup, title, 0, title.length(), EntityMaskInPCDATA);
> 
> Ditto.

Can't remove the class name here as this isn't a member function.
Comment 5 WebKit Review Bot 2012-10-09 12:26:52 PDT
Comment on attachment 167764 [details]
Patch with some updates from reviewer's comments

Clearing flags on attachment: 167764

Committed r130795: <http://trac.webkit.org/changeset/130795>
Comment 6 WebKit Review Bot 2012-10-09 12:26:55 PDT
All reviewed patches have been landed.  Closing bug.
Comment 7 Geoffrey Garen 2012-10-09 12:29:35 PDT
Comment on attachment 167764 [details]
Patch with some updates from reviewer's comments

View in context: https://bugs.webkit.org/attachment.cgi?id=167764&action=review

> Source/WebCore/editing/MarkupAccumulator.cpp:70
> +    if (source.is8Bit()) {
> +        const LChar* text = source.characters8() + offset;

Is there some advantage to writing this function out twice instead of using a function template?
Comment 8 Michael Saboff 2012-10-09 12:46:11 PDT
(In reply to comment #7)
> (From update of attachment 167764 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=167764&action=review
> 
> > Source/WebCore/editing/MarkupAccumulator.cpp:70
> > +    if (source.is8Bit()) {
> > +        const LChar* text = source.characters8() + offset;
> 
> Is there some advantage to writing this function out twice instead of using a function template?

See the response in comment #4 to Ryosuke's comment.  Templating this function is complicated by the function static variables.
Comment 9 Ryosuke Niwa 2012-10-09 12:55:40 PDT
(In reply to comment #7)
> (From update of attachment 167764 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=167764&action=review
> 
> > Source/WebCore/editing/MarkupAccumulator.cpp:70
> > +    if (source.is8Bit()) {
> > +        const LChar* text = source.characters8() + offset;
> 
> Is there some advantage to writing this function out twice instead of using a function template?

Static variables :( Let us know if you have an idea on how to share more code here.