45088 – [WTFURL] Add URLQueryCanonicalizer

RESOLVED FIXED 45088

[WTFURL] Add URLQueryCanonicalizer

https://bugs.webkit.org/show_bug.cgi?id=45088

Summary [WTFURL] Add URLQueryCanonicalizer

Adam Barth

Reported 2010-09-01 22:43:37 PDT

[WTFURL] Add URLQueryCanonicalizer

Attachments
Patch (16.36 KB, patch) 2010-09-01 22:45 PDT, Adam Barth	no flags	Details Formatted Diff Diff
Patch (13.51 KB, patch) 2010-10-13 10:38 PDT, Adam Barth	no flags	Details Formatted Diff Diff
Show Obsolete (1) View All Add attachment proposed patch, testcase, etc.

Adam Barth

Comment 1 2010-09-01 22:45:37 PDT

Created attachment 66334 [details] Patch

Maciej Stachowiak

Comment 2 2010-10-13 00:50:32 PDT

Comment on attachment 66334 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=66334&action=review r- because at least the all-capts typenames definitely need to be fixed > JavaScriptCore/wtf/url/src/URLEscape.h:37 > +extern const char hexCharacterTable[0x10]; It would be more conventional to say [16] instead of [0x10]/ > JavaScriptCore/wtf/url/src/URLEscape.h:42 > +// Write a single character, escaped, to the output. This always escapes: it > +// does no checking that thee character requires escaping. > +// Escaping makes sense only 8 bit chars, so code works in all cases of > +// input parameters (8/16bit). Comment seems excessive - the function name is reasonably descriptive as it is. > JavaScriptCore/wtf/url/src/URLEscape.h:44 > +template<typename INCHAR, typename OUTCHAR> > +inline void appendEscapedCharacter(INCHAR ch, URLBuffer<OUTCHAR>& buffer) - All-caps typenames are not standard WebKit style. - Consider naming the function appendURLEscapedCharacter. Given only the context of the WTF namespace, it might not be clear what kind of escaping is intended, but "URL escaped" or "percent escaped" would be pretty unambiguous. > JavaScriptCore/wtf/url/src/URLQueryCanonicalizer.h:69 > +// Query canonicalization in IE > +// ---------------------------- > +// IE is very permissive for query parameters specified in links on the page > +// (in contrast to links that it constructs itself based on form data). It does > +// not unescape any character. It does not reject any escape sequence (be they > +// invalid like "%2y" or freaky like %00). > +// > +// IE only escapes spaces and nothing else. Embedded NULLs, tabs (0x09), > +// LF (0x0a), and CR (0x0d) are removed (this probably happens at an earlier > +// layer since they are removed from all portions of the URL). All other > +// characters are passed unmodified. Invalid UTF-16 sequences are preserved as > +// well, with each character in the input being converted to UTF-8. It is the > +// server's job to make sense of this invalid query. > +// > +// Invalid multibyte sequences (for example, invalid UTF-8 on a UTF-8 page) > +// are converted to the invalid character and sent as unescaped UTF-8 (0xef, > +// 0xbf, 0xbd). This may not be canonicalization, the parser may generate these > +// strings before the URL handler ever sees them. > +// > +// Our query canonicalization > +// -------------------------- > +// We escape all non-ASCII characters and control characters, like Firefox. > +// This is more conformant to the URL spec, and there do not seem to be many > +// problems relating to Firefox's behavior. > +// > +// Like IE, we will never unescape (although the application may want to try > +// unescaping to present the user with a more understandable URL). We will > +// replace all invalid sequences (including invalid UTF-16 sequences, which IE > +// doesn't) with the "invalid character," and we will escape it. Does this comment really need to be essay-length? Perhaps it could be pared down to the key non-obvious information. In particular, anything that someone needing to read or modify this code is likely to need to know. I'm not sure a detailed spec of IE's behavior, which is not in fact implemented here, is likely to be all that relevant. > JavaScriptCore/wtf/url/src/URLQueryCanonicalizer.h:71 > +template<typename CHAR, typename OUTCHAR, void convertCharset(const CHAR*, int length, URLBuffer<char>&)> Please don't use all-caps for template parameters. Also, why CHAR/OUTCHAR here instead of INCHAR/OUTCHAR as in the code above? > JavaScriptCore/wtf/url/src/URLQueryCanonicalizer.h:74 > + static void DoCanonicalizeQuery(const CHAR* spec, const URLComponent& query, URLBuffer<OUTCHAR>& buffer, URLComponent& resultQuery) Member function names should not start with a capital letter in WebKit style. Also "do" seems redundant - why not just "canonicalizeQuery"? > JavaScriptCore/wtf/url/src/URLQueryCanonicalizer.h:92 > + // FIXME: This should be an unsigned comparison. Does this FIXME really need fixing? If so, please do. Tentatively, it seems like it might be broken when CHAR is "char" and the platform has signed char. > JavaScriptCore/wtf/url/src/URLQueryCanonicalizer.h:102 > + // Appends the given string to the output, escaping characters that do not > + // match the given |type| in SharedCharTypes. This version will accept 8 or 16 > + // bit characters, but assumes that they have only 7-bit values. It also assumes > + // that all UTF-8 values are correct, so doesn't bother checking Comment is too verbose. I can see how it's useful to document the preconditions of the function, but that would be even better done with assertions than with comments. From reading the source, it's not clear how SharedCharTypes gets involved. SharedCharTypes does not appear elsewhere in this patch. Is that part of the comment accurate?

Adam Barth

Comment 3 2010-10-13 00:56:49 PDT

Thanks for the review. I'll be more aggressive about removing comments in the future. :) I'm sad the stylebot and I both missed the capitalization issues.

Adam Barth

Comment 4 2010-10-13 10:38:52 PDT

Created attachment 70625 [details] Patch

Maciej Stachowiak

Comment 5 2010-10-13 12:48:54 PDT

Comment on attachment 70625 [details] Patch r=me

Adam Barth

Comment 6 2010-10-13 12:54:40 PDT

thanks

Adam Barth

Comment 7 2010-10-13 13:02:08 PDT

Comment on attachment 70625 [details] Patch Clearing flags on attachment: 70625 Committed r69686: <http://trac.webkit.org/changeset/69686>

Adam Barth

Comment 8 2010-10-13 13:02:13 PDT

All reviewed patches have been landed. Closing bug.

WebKit Review Bot

Comment 9 2010-10-13 14:06:18 PDT

http://trac.webkit.org/changeset/69686 might have broken Chromium Mac Release

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware Other

OS OS X 10.5

Product WebKit

Component New Bugs

Assignee

Adam Barth

Reported

2010-09-01 22:43 PDT

Modified

2010-10-13 14:06 PDT History

CC List

5 users Show

URL

Keywords

Depends on

Blocks