Bug 74063 - Improve charactersAreAllASCII() to compare multiple characters at a time
Summary: Improve charactersAreAllASCII() to compare multiple characters at a time
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Web Template Framework (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Benjamin Poulain
URL:
Keywords: EasyFix
Depends on:
Blocks:
 
Reported: 2011-12-08 00:14 PST by Benjamin Poulain
Modified: 2012-01-05 01:39 PST (History)
4 users (show)

See Also:


Attachments
Patch (2.71 KB, patch)
2011-12-08 00:42 PST, Benjamin Poulain
no flags Details | Formatted Diff | Diff
Patch (18.96 KB, patch)
2011-12-12 01:39 PST, Benjamin Poulain
no flags Details | Formatted Diff | Diff
Patch (19.01 KB, patch)
2011-12-12 13:54 PST, Benjamin Poulain
no flags Details | Formatted Diff | Diff
Patch (17.93 KB, patch)
2012-01-03 17:58 PST, Benjamin Poulain
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Benjamin Poulain 2011-12-08 00:14:04 PST
We could load a word size of character at every iteration...
Memory access would be more effective, and we would need less iterations in the loop to go over the string.
Comment 1 Benjamin Poulain 2011-12-08 00:42:27 PST
Created attachment 118344 [details]
Patch
Comment 2 Benjamin Poulain 2011-12-08 00:47:15 PST
This is more for fun than anything.

I measured the gain of the function itself, it is a bit more than the expected 4x, 8x, 16x. I did not really look into it but I guess that is because of the reduced number of memory access.

The function get a lot more complicated than the original, so I am not sure that is worth it for the project.
Comment 3 Early Warning System Bot 2011-12-08 02:32:56 PST
Comment on attachment 118344 [details]
Patch

Attachment 118344 [details] did not pass qt-ews (qt):
Output: http://queues.webkit.org/results/10797174
Comment 4 Darin Adler 2011-12-08 10:35:08 PST
Comment on attachment 118344 [details]
Patch

The functions in TextCodecASCIIFastPath.h provide a more readable way to do this type of optimization for single-byte characters, and ideally I’d like to see this share code with that. Maybe we can move those functions into WTF?
Comment 5 Benjamin Poulain 2011-12-08 10:48:30 PST
> The functions in TextCodecASCIIFastPath.h provide a more readable way to do this type of optimization for single-byte characters, and ideally I’d like to see this share code with that. Maybe we can move those functions into WTF?

I was unaware the mask were available in TextCodecASCIIFastPath.h. I can have a look at moving the file to WTF.
Comment 6 Benjamin Poulain 2011-12-12 01:39:20 PST
Created attachment 118751 [details]
Patch
Comment 7 Early Warning System Bot 2011-12-12 01:57:01 PST
Comment on attachment 118751 [details]
Patch

Attachment 118751 [details] did not pass qt-ews (qt):
Output: http://queues.webkit.org/results/10836058
Comment 8 Benjamin Poulain 2011-12-12 13:54:54 PST
Created attachment 118838 [details]
Patch
Comment 9 Benjamin Poulain 2011-12-19 02:30:48 PST
Now that ASCIIType.h is no longer bloated, what do you think about merging the ASCIIFastPathHelper.h in this patch in ASCIIType?
Comment 10 Darin Adler 2011-12-19 11:31:38 PST
(In reply to comment #9)
> Now that ASCIIType.h is no longer bloated, what do you think about merging the ASCIIFastPathHelper.h in this patch in ASCIIType?

I don’t think this belongs in <wtf/ASCIICType.h>. The point of <ASCIICType.h> is to offer functions like the ones in <ctype.h>, but ones that work with ASCII. It should not have other functions in it. In fact, I believe some of the functions creeping in there do not belong there.

I do think its’s good to have some functions for working with ASCII all in one header. It could even be named <wtf/ASCII.h>. I don’t like the word “helper” in a header name. Headers all help you!
Comment 11 Darin Adler 2011-12-19 11:38:07 PST
Comment on attachment 118838 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=118838&action=review

Did you measure the performance impact of this? The code is faster for long strings but I suspect it’s slower for short ones.

> Source/JavaScriptCore/wtf/text/ASCIIFastPathHelper.h:4
> + * Copyright (C) 2011 Benjamin Poulain <benjamin@webkit.org>

If you are doing this work as an Apple employee you should not add your own copyright to the file. If you did the work when not an Apple employee that might be OK.

> Source/JavaScriptCore/wtf/text/ASCIIFastPathHelper.h:46
> +
> +

We normally don’t use multiple blank lines to paragraph. We use just one.

If you want a larger separator you’d normally use a comment.

> Source/JavaScriptCore/wtf/text/ASCIIFastPathHelper.h:47
> +template<size_t size, typename CharType> struct NonASCIIMask;

I’d prefer that “character type” not be abbreviated to CharType. How about CharacterType? I would like that in ASCIICType.h too.

> Source/JavaScriptCore/wtf/text/ASCIIFastPathHelper.h:69
> +// Note: this function assume the input is likely all ASCII, and
> +// do not leave early if it is not the case.

The word “this” should be capitalized.

The word “do” should be “does”.

> Source/JavaScriptCore/wtf/text/ASCIIFastPathHelper.h:80
> +    // Prologue: align the input.
> +    while (!isAlignedToMachineWord(characters) && (characters != end)) {
> +        allCharBits |= *characters;
> +        ++characters;
> +    }

We may need to find a way to incorporate a fast path for buffers that are known to be aligned to a machine word that omits this first loop. For example, I suspect all string buffers for StringImpl have such a guarantee.

I don’t think the parentheses are characters != end are needed.

> Source/JavaScriptCore/wtf/text/ASCIIFastPathHelper.h:83
> +    const CharType *wordEnd = alignToMachineWord(end);

Misplaced * here.
Comment 12 Benjamin Poulain 2011-12-19 11:53:01 PST
ASCIIFastPath.h it is then :)

> > Source/JavaScriptCore/wtf/text/ASCIIFastPathHelper.h:4
> > + * Copyright (C) 2011 Benjamin Poulain <benjamin@webkit.org>
> 
> If you are doing this work as an Apple employee you should not add your own copyright to the file. If you did the work when not an Apple employee that might be OK.

I use to have the distinction between work done...at work, and what I do on my weekends at home. I'll remove this.
Comment 13 Benjamin Poulain 2012-01-03 17:58:53 PST
Created attachment 121030 [details]
Patch
Comment 14 Andreas Kling 2012-01-03 18:01:50 PST
Comment on attachment 121030 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=121030&action=review

> Source/JavaScriptCore/wtf/text/ASCIIFastPath.h:2
> + * Copyright (C) 2011 Apple Inc. All rights reserved.

Needs more 2012!
Comment 15 Benjamin Poulain 2012-01-03 18:03:41 PST
All the bots are red, I'll wait a bit before landing this.
Comment 16 WebKit Review Bot 2012-01-05 01:39:24 PST
Comment on attachment 121030 [details]
Patch

Clearing flags on attachment: 121030

Committed r104127: <http://trac.webkit.org/changeset/104127>
Comment 17 WebKit Review Bot 2012-01-05 01:39:30 PST
All reviewed patches have been landed.  Closing bug.