Bug 45085 - [WTFURL] Add a mechanism for classifying types of characters
Summary: [WTFURL] Add a mechanism for classifying types of characters
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: 528+ (Nightly build)
Hardware: Other OS X 10.5
: P2 Normal
Assignee: Adam Barth
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-09-01 19:51 PDT by Adam Barth
Modified: 2010-10-13 10:21 PDT (History)
1 user (show)

See Also:


Attachments
Patch (16.70 KB, patch)
2010-09-01 19:53 PDT, Adam Barth
mjs: review+
mjs: commit-queue-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Adam Barth 2010-09-01 19:51:55 PDT
[WTFURL] Add a mechanism for classifying types of characters
Comment 1 Adam Barth 2010-09-01 19:53:47 PDT
Created attachment 66315 [details]
Patch
Comment 2 Maciej Stachowiak 2010-10-13 00:32:20 PDT
Comment on attachment 66315 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=66315&action=review

r=me If the above comments are addressed.

> JavaScriptCore/wtf/url/src/URLCharacterTypes.cpp:37
> +    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0x00 - 0x0f
> +    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0x10 - 0x1f
> +    0, // 0x20  ' ' (escape spaces in queries)

Not sure about this, but maybe instead of 0 for no flags, there should be a symbolic constant like InvalidCharacter.

> JavaScriptCore/wtf/url/src/URLCharacterTypes.h:42
> +    // Bits that identify different character types. These types identify different
> +    // bits that are set for each 8-bit character in the characterTypeTable.

This comment seems redundant. The enum and its values are fairly self-documenting.

> JavaScriptCore/wtf/url/src/URLCharacterTypes.h:50
> +    enum CharTypes {
> +      QueryCharacter = 1,
> +      UserInfoCharacter = 2,
> +      IPv4Character = 4,
> +      HexCharacter = 8,
> +      DecimalCharacter = 16,
> +      OctalCharacter = 32,
> +    };

WebKit convention is usually to use a shifted 1 as the value for enums that are meant to be used with masking (1 << 0, 1 << 1, 1 << 2, etc).

Might also want to look at how the similar character classes are documented in KURL.cpp.

(On a personal note, I find it surprising that both URL parsers have a character flags table but have completely different classes!)

> JavaScriptCore/wtf/url/src/URLCharacterTypes.h:58
> +    // This table contains the flags in CharTypes for each 8-bit character.
> +    // Some canonicalization functions have their own specialized lookup table.
> +    // For those with simple requirements, we have collected the flags in one
> +    // place so there are fewer lookup tables to load into the CPU cache.
> +    //
> +    // Using an unsigned char type has a small but measurable performance benefit
> +    // over using a 32-bit number.

This comment seems overly verbose and the second part in particular is needless detail, unless there's a reason to believe someone would accidentally change the "unsigned char" to "unsigned". If some of these details do need to be documented, then perhaps the comment should go in the implementation file, since this is a private static data member.
Comment 3 Adam Barth 2010-10-13 10:21:24 PDT
Committed r69669: <http://trac.webkit.org/changeset/69669>