Bug 59834 - WebCore should have QuotedPrintable encoding capabilities
Summary: WebCore should have QuotedPrintable encoding capabilities
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: 528+ (Nightly build)
Hardware: PC OS X 10.5
: P2 Normal
Assignee: Jay Civelli
URL:
Keywords:
Depends on:
Blocks: 7168
  Show dependency treegraph
 
Reported: 2011-04-29 14:50 PDT by Jay Civelli
Modified: 2011-05-05 20:38 PDT (History)
6 users (show)

See Also:


Attachments
Patch (17.89 KB, patch)
2011-05-02 09:39 PDT, Jay Civelli
no flags Details | Formatted Diff | Diff
Patch (17.92 KB, patch)
2011-05-02 09:48 PDT, Jay Civelli
no flags Details | Formatted Diff | Diff
Patch (17.82 KB, patch)
2011-05-02 17:33 PDT, Jay Civelli
no flags Details | Formatted Diff | Diff
Patch (17.12 KB, patch)
2011-05-03 14:54 PDT, Jay Civelli
no flags Details | Formatted Diff | Diff
Patch (17.56 KB, patch)
2011-05-05 09:52 PDT, Jay Civelli
no flags Details | Formatted Diff | Diff
Patch (20.55 KB, patch)
2011-05-05 15:57 PDT, Jay Civelli
no flags Details | Formatted Diff | Diff
Patch (20.09 KB, patch)
2011-05-05 16:00 PDT, Jay Civelli
no flags Details | Formatted Diff | Diff
Patch (20.01 KB, patch)
2011-05-05 16:34 PDT, Jay Civelli
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Jay Civelli 2011-04-29 14:50:17 PDT
WebCore should have QuotedPrintable encoding capabilities, that is required for MHTML support.
Comment 1 Jay Civelli 2011-05-02 09:39:11 PDT
Created attachment 91930 [details]
Patch
Comment 2 Jay Civelli 2011-05-02 09:48:02 PDT
Created attachment 91931 [details]
Patch
Comment 3 Alexey Proskuryakov 2011-05-02 10:59:07 PDT
Comment on attachment 91931 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=91931&action=review

As Adam mentioned in bug 7168, this should be discussed on webkit-dev first. It is not obvious to me if there is any value in supporting MHTML now.

> Source/WebCore/platform/text/QuotedPrintable.cpp:34
> +#include <wtf/text/CString.h>

CString isn't used in this file, as far as I can see.

> Source/WebCore/platform/text/QuotedPrintable.cpp:39
> +static const int maxCharPerLine = 76;

Plz don't abbr.

> Source/WebCore/platform/text/QuotedPrintable.cpp:45
> +static int hexDigitValue(char c)

Why not use the existing toASCIIHexValue()? I see that you want a check instead of assertion, but it's still not good to add copy/pasted code, and returning a failure in a magic value is not right.

Also, this should be marked inline.

> Source/WebCore/platform/text/QuotedPrintable.cpp:55
> +// Returns the length in characters of the EOL found at index, 0 if there isn't one.
> +static int isEOL(const char* input, unsigned index, unsigned length)

This function name is obviously wrong, it doesn't even return a boolean like an "is" function should.

> Source/WebCore/platform/text/QuotedPrintable.cpp:79
> +String quotedPrintableEncode(const char* input, unsigned length)

I can see how it is necessary to decode this atrocity to support reading MHTML files created by other browsers, but can we avoid encoding like that? Are any other encoding forms supported in MHTML?

There are many lengths in this function, so I think that the argument needs a better name.

The types seem wrong. Ultimately, the source in Unicode, and encoding output is ASCII, but types are exact opposite of that (see the question about decoding routine though).

> Source/WebCore/platform/text/QuotedPrintable.cpp:82
> +    // The number of characters in the current line.Ve

Garbage at the end.

> Source/WebCore/platform/text/QuotedPrintable.cpp:83
> +    int charCount = 0;

As a count, this should be unsigned.

> Source/WebCore/platform/text/QuotedPrintable.cpp:88
> +        // Whether this character can be inserted without encoding.
> +        bool asIs = false;

You could say the same in the variable name.

> Source/WebCore/platform/text/QuotedPrintable.cpp:96
> +        if (!asIs && (c == '\t' || c == ' ') && !lastChar && !isEOL(input, i + 1, length))
> +            asIs = true;

Is it really that only ASCII characters are encoded as is? It seems like a practical problem that non-ASCII text would be exploded in size.

> Source/WebCore/platform/text/QuotedPrintable.cpp:102
> +            if (eolLen > 0) {
> +            result.append(eol);

Bad indentation.

> Source/WebCore/platform/text/QuotedPrintable.cpp:114
> +            result.append('=');
> +        result.append(eol);
> +            charCount = 0;

Bad indentation.

> Source/WebCore/platform/text/QuotedPrintable.cpp:141
> +bool quotedPrintableDecode(const char* data, unsigned len, Vector<char>& out)

Please don't abbreviate; again, there are many different lengths here, and a better name is needed for the argument.

How does MHTML encode non-ASCII characters - is it actually a post-processing step after creating a byte stream?

> Source/WebCore/platform/text/QuotedPrintable.cpp:148
> +    for (unsigned i = 0; i < len; i++) {

Style nit: we prefer prefix increment and decrement when it doesn't matter.

> Source/WebCore/platform/text/QuotedPrintable.cpp:158
> +            // Unfinished = sequence, append as is.
> +            success = false;

Are we really interested in a failure like this? Is there anything the caller can reasonably do, other than ignore the returned boolean value?

> Source/WebCore/platform/text/QuotedPrintable.cpp:173
> +            success = false;

Ditto.

> Source/WebCore/platform/text/QuotedPrintable.cpp:181
> +        char r = static_cast<char>(((i1 << 4) & 0xF0) | (i2 & 0x0F));

Please don't abbreviate.
Comment 4 Jay Civelli 2011-05-02 17:33:34 PDT
Comment on attachment 91931 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=91931&action=review

>> Source/WebCore/platform/text/QuotedPrintable.cpp:34
>> +#include <wtf/text/CString.h>
> 
> CString isn't used in this file, as far as I can see.

It's needed ny line 71 (String.ascii() returns a CString).

>> Source/WebCore/platform/text/QuotedPrintable.cpp:39
>> +static const int maxCharPerLine = 76;
> 
> Plz don't abbr.

K

>> Source/WebCore/platform/text/QuotedPrintable.cpp:45
>> +static int hexDigitValue(char c)
> 
> Why not use the existing toASCIIHexValue()? I see that you want a check instead of assertion, but it's still not good to add copy/pasted code, and returning a failure in a magic value is not right.
> 
> Also, this should be marked inline.

Removed that function in favor of toASCIIHexValue().

>> Source/WebCore/platform/text/QuotedPrintable.cpp:55
>> +static int isEOL(const char* input, unsigned index, unsigned length)
> 
> This function name is obviously wrong, it doesn't even return a boolean like an "is" function should.

Renamed eolLengthAtIndex.

>> Source/WebCore/platform/text/QuotedPrintable.cpp:79
>> +String quotedPrintableEncode(const char* input, unsigned length)
> 
> I can see how it is necessary to decode this atrocity to support reading MHTML files created by other browsers, but can we avoid encoding like that? Are any other encoding forms supported in MHTML?
> 
> There are many lengths in this function, so I think that the argument needs a better name.
> 
> The types seem wrong. Ultimately, the source in Unicode, and encoding output is ASCII, but types are exact opposite of that (see the question about decoding routine though).

There are fortunately more encoding available in the MHTML specs, including a binary type that let's you include a resource as is. Some initial experimentations seemed to indicate that IE does not like it though. We'll have to worry about it when I get to the CL where I build the actual MHTML file. At which point I may end up removing the encoding part in this file.
Changed the length arg name.
I took inspiration from Base64.h for the APIs and different types.

>> Source/WebCore/platform/text/QuotedPrintable.cpp:82
>> +    // The number of characters in the current line.Ve
> 
> Garbage at the end.

Garbage removed.

>> Source/WebCore/platform/text/QuotedPrintable.cpp:83
>> +    int charCount = 0;
> 
> As a count, this should be unsigned.

Done.

>> Source/WebCore/platform/text/QuotedPrintable.cpp:88
>> +        bool asIs = false;
> 
> You could say the same in the variable name.

Done.

>> Source/WebCore/platform/text/QuotedPrintable.cpp:96
>> +            asIs = true;
> 
> Is it really that only ASCII characters are encoded as is? It seems like a practical problem that non-ASCII text would be exploded in size.

Yes, that's per the specs http://tools.ietf.org/html/rfc2045#section-6.7
We might be able to use a different encoding mechanism for MHTML generation (see comment above).

>> Source/WebCore/platform/text/QuotedPrintable.cpp:102
>> +            result.append(eol);
> 
> Bad indentation.

Fixed.

>> Source/WebCore/platform/text/QuotedPrintable.cpp:114
>> +            charCount = 0;
> 
> Bad indentation.

Fixed.

>> Source/WebCore/platform/text/QuotedPrintable.cpp:141
>> +bool quotedPrintableDecode(const char* data, unsigned len, Vector<char>& out)
> 
> Please don't abbreviate; again, there are many different lengths here, and a better name is needed for the argument.
> 
> How does MHTML encode non-ASCII characters - is it actually a post-processing step after creating a byte stream?

Changed parameter name.
The MIME type header contains the charset that should be used for the decoded bytes.

>> Source/WebCore/platform/text/QuotedPrintable.cpp:148
>> +    for (unsigned i = 0; i < len; i++) {
> 
> Style nit: we prefer prefix increment and decrement when it doesn't matter.

Done.

>> Source/WebCore/platform/text/QuotedPrintable.cpp:158
>> +            success = false;
> 
> Are we really interested in a failure like this? Is there anything the caller can reasonably do, other than ignore the returned boolean value?

Yeah, I was trying to get inspiration from the Base64 code, but since I do not have a policy parameter for failure the returned value is not that helpful.
Removed.

>> Source/WebCore/platform/text/QuotedPrintable.cpp:181
>> +        char r = static_cast<char>(((i1 << 4) & 0xF0) | (i2 & 0x0F));
> 
> Please don't abbreviate.

OK.
Comment 5 Jay Civelli 2011-05-02 17:33:52 PDT
Created attachment 92016 [details]
Patch
Comment 6 Alexey Proskuryakov 2011-05-02 21:12:51 PDT
> It's needed ny line 71 (String.ascii() returns a CString).

I see, missed that one (I didn't closely review encoding functions in general). Ugh, so this function loses information without saying a word :-(

> I took inspiration from Base64.h for the APIs and different types.

I don't immediately have a suggestion, but using String for 8-bit data streams seems unfortunate.
Comment 7 Alexey Proskuryakov 2011-05-02 21:31:35 PDT
> Ugh, so this function loses information without saying a word :-(

In particular, it replaces line feeds and tabs with question marks.
Comment 8 Jay Civelli 2011-05-03 14:53:29 PDT
You are right, it's a pretty bad API. I removed the String methods.
Comment 9 Jay Civelli 2011-05-03 14:54:24 PDT
Created attachment 92138 [details]
Patch
Comment 10 Adam Barth 2011-05-04 17:10:04 PDT
Comment on attachment 92138 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=92138&action=review

This looks good.  I just have an epic number of naming nits.  For some reason WebKit is supper excited about name variables and functions with names that explain their meaning.  The general thought process is that names are more valuable than comments.  If you find yourself writing a comment to explain what a variable or function means, then you should try deleting that comment and changing the name to contain the information you should have put in the comment.

> Source/WebCore/platform/text/QuotedPrintable.cpp:39
> +static const int lineMaximumSize = 76;

lineMaximumSize => maximumLineLength

> Source/WebCore/platform/text/QuotedPrintable.cpp:43
> +static const char eol[] = "\r\n";

crlfLineEnding ?

> Source/WebCore/platform/text/QuotedPrintable.cpp:46
> +// Returns the length in characters of the EOL found at index, 0 if there isn't one.
> +static int eolLengthAtIndex(const char* input, unsigned index, unsigned length)

I'd remove the comment and change the name to something that explains what the function does:

lengthOfLineEndingAtIndex

Also, unsigned should probably be size_t, right?

> Source/WebCore/platform/text/QuotedPrintable.cpp:47
> +{

Can you add an assert that index < length ?

> Source/WebCore/platform/text/QuotedPrintable.cpp:65
> +String quotedPrintableEncode(const char* input, unsigned length)

unsigned => size_t (that's what in.size() returns, right?)

> Source/WebCore/platform/text/QuotedPrintable.cpp:69
> +    // The number of characters in the current line.
> +    unsigned charCount = 0;

I'd remove the comment and rename charCount -> currentLineLength

> Source/WebCore/platform/text/QuotedPrintable.cpp:71
> +        bool lastChar = (i == length - 1);

lastChar => isLastCharacter.  (lastChar would be the last character itself)

> Source/WebCore/platform/text/QuotedPrintable.cpp:84
> +            int eolLen = eolLengthAtIndex(input, i, length);

eolLen  => lengthOfLineEnding

> Source/WebCore/platform/text/QuotedPrintable.cpp:94
> +        unsigned minCharsNeeded = requiresEncoding ? lineMaximumSize - 4 : lineMaximumSize - 2;

It would be helpful to name these constants.  Also minCharsNeeded should be renamed to something that doesn't use abbreviations.  I'm not really sure what it does, so it's hard for me to recommend a name.

> Source/WebCore/platform/text/QuotedPrintable.cpp:105
> +            result.append(hexTable[static_cast<int>((c >> 4) & 0xF)]);
> +            result.append(hexTable[static_cast<int>(c & 0x0F)]);

I think we already have code that does this bit munging.  Maybe toASCIIHexValue in ASCIIType.h?

> Source/WebCore/platform/text/QuotedPrintable.cpp:120
> +void quotedPrintableDecode(const char* data, unsigned dataLength, Vector<char>& out)

Shouldn't this take a string as input?  It seems odd that it's not the opposite of encode.

> Source/WebCore/platform/text/QuotedPrintable.cpp:126
> +    for (unsigned i = 0; i < dataLength; ++i) {

unsigned => size_t

> Source/WebCore/platform/text/QuotedPrintable.cpp:127
> +        char character1 = data[i];

character1 => currentCharacter

> Source/WebCore/platform/text/QuotedPrintable.cpp:129
> +            // Regular character, append as is.

No need for this comment.

> Source/WebCore/platform/text/QuotedPrintable.cpp:140
> +        char character2 = data[++i];
> +        char character3 = data[++i];

character2 => upperCharacter
character3 => lowerCharacter

Those are sort of lame name suggestions, but something with words is better than using numerals.

> Source/WebCore/platform/text/QuotedPrintable.cpp:142
> +            // Soft line break, ignored.

I'd remove this comment as well.

> Source/WebCore/platform/text/QuotedPrintable.cpp:156
> +        char decodedCharacter = static_cast<char>(((i1 << 4) & 0xF0) | (i2 & 0x0F));

We must have code that does this assembly already, in which case you can call it without creating these local variables with tricky names.
Comment 11 Jay Civelli 2011-05-05 09:47:04 PDT
Comment on attachment 92138 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=92138&action=review

>> Source/WebCore/platform/text/QuotedPrintable.cpp:39
>> +static const int lineMaximumSize = 76;
> 
> lineMaximumSize => maximumLineLength

Done.

>> Source/WebCore/platform/text/QuotedPrintable.cpp:43
>> +static const char eol[] = "\r\n";
> 
> crlfLineEnding ?

OK.

>> Source/WebCore/platform/text/QuotedPrintable.cpp:46
>> +static int eolLengthAtIndex(const char* input, unsigned index, unsigned length)
> 
> I'd remove the comment and change the name to something that explains what the function does:
> 
> lengthOfLineEndingAtIndex
> 
> Also, unsigned should probably be size_t, right?

Done.

>> Source/WebCore/platform/text/QuotedPrintable.cpp:47
>> +{
> 
> Can you add an assert that index < length ?

OK.

>> Source/WebCore/platform/text/QuotedPrintable.cpp:65
>> +String quotedPrintableEncode(const char* input, unsigned length)
> 
> unsigned => size_t (that's what in.size() returns, right?)

Done.

>> Source/WebCore/platform/text/QuotedPrintable.cpp:69
>> +    unsigned charCount = 0;
> 
> I'd remove the comment and rename charCount -> currentLineLength

Done.

>> Source/WebCore/platform/text/QuotedPrintable.cpp:71
>> +        bool lastChar = (i == length - 1);
> 
> lastChar => isLastCharacter.  (lastChar would be the last character itself)

Done.

>> Source/WebCore/platform/text/QuotedPrintable.cpp:84
>> +            int eolLen = eolLengthAtIndex(input, i, length);
> 
> eolLen  => lengthOfLineEnding

Done.

>> Source/WebCore/platform/text/QuotedPrintable.cpp:94
>> +        unsigned minCharsNeeded = requiresEncoding ? lineMaximumSize - 4 : lineMaximumSize - 2;
> 
> It would be helpful to name these constants.  Also minCharsNeeded should be renamed to something that doesn't use abbreviations.  I'm not really sure what it does, so it's hard for me to recommend a name.

Named constants and fixed local var name.

>> Source/WebCore/platform/text/QuotedPrintable.cpp:105
>> +            result.append(hexTable[static_cast<int>(c & 0x0F)]);
> 
> I think we already have code that does this bit munging.  Maybe toASCIIHexValue in ASCIIType.h?

I believe i'd still have to do the bit work with ASCIIHexValue and then still use my hexTable. Surprisingly I could not find any method in that file or any other file that does that.

>> Source/WebCore/platform/text/QuotedPrintable.cpp:120
>> +void quotedPrintableDecode(const char* data, unsigned dataLength, Vector<char>& out)
> 
> Shouldn't this take a string as input?  It seems odd that it's not the opposite of encode.

I used to have one such API in my previous patch (I took inspiration from the Base64 API), but as Aleksey pointed out, it needs to call String::ascii() on it. At which point any non ASCII character is silently transformed to a '?'
So it felt like a bad API. If you feel the symmetry is important between the encode/decode, I could resurrect the method that takes a String and make the decode return false if not all characters are ASCII?

>> Source/WebCore/platform/text/QuotedPrintable.cpp:126
>> +    for (unsigned i = 0; i < dataLength; ++i) {
> 
> unsigned => size_t

Done.

>> Source/WebCore/platform/text/QuotedPrintable.cpp:127
>> +        char character1 = data[i];
> 
> character1 => currentCharacter

OK.

>> Source/WebCore/platform/text/QuotedPrintable.cpp:129
>> +            // Regular character, append as is.
> 
> No need for this comment.

Removed.

>> Source/WebCore/platform/text/QuotedPrintable.cpp:140
>> +        char character3 = data[++i];
> 
> character2 => upperCharacter
> character3 => lowerCharacter
> 
> Those are sort of lame name suggestions, but something with words is better than using numerals.

Done.

>> Source/WebCore/platform/text/QuotedPrintable.cpp:142
>> +            // Soft line break, ignored.
> 
> I'd remove this comment as well.

Removed.

>> Source/WebCore/platform/text/QuotedPrintable.cpp:156
>> +        char decodedCharacter = static_cast<char>(((i1 << 4) & 0xF0) | (i2 & 0x0F));
> 
> We must have code that does this assembly already, in which case you can call it without creating these local variables with tricky names.

Grepped the code, no luck.
Improved names.
Comment 12 Jay Civelli 2011-05-05 09:52:33 PDT
Created attachment 92429 [details]
Patch
Comment 13 Alexey Proskuryakov 2011-05-05 10:15:31 PDT
Comment on attachment 92429 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=92429&action=review

Looks good to me. Adam, would you like to take another look?

> Source/WebCore/platform/text/QuotedPrintable.cpp:47
> +static size_t lengthOfLineEndingAtIndex(const char* input, size_t length, size_t index)

Maybe length -> lineLength?

Functions like this always make me wonder what should happen with malformed line endings (LFCR? CRLFCR?). Not sure if it's something to be worried about at this stage though.

> Source/WebCore/platform/text/QuotedPrintable.cpp:55
> +            return 1; // Single CR (Commodore and Old Macs).

I don't think that mentioning Commodore will be of any help to anyone reading this code. Also, "Classic Mac OS" is a more common term than "Old".

> Source/WebCore/platform/text/QuotedPrintable.cpp:62
> +String quotedPrintableEncode(const Vector<char>& in)

I'm still not sure of types here. The result is a 7-bit data buffer, do we really need a String?

> Source/WebCore/platform/text/QuotedPrintable.cpp:86
> +            if (lengthOfLineEnding > 0) {

No need to compare to zero, just "if (lengthOfLineEnding)" would be it per WebKit coding style.

> Source/WebCore/platform/text/QuotedPrintable.cpp:106
> +            result.append(hexTable[static_cast<int>((currentCharacter >> 4) & 0xF)]);
> +            result.append(hexTable[static_cast<int>(currentCharacter & 0xF)]);

Why "int"? This is an index into a table.

> Source/WebCore/platform/text/QuotedPrintable.cpp:142
> +        if (upperCharacter == '\r' && lowerCharacter == '\n')
> +            continue;

What does IE do with malformed sources (e.g. a single CR as soft line ending)?

> Source/WebCore/platform/text/QuotedPrintable.cpp:154
> +        char decodedCharacter = static_cast<char>(((upperValue << 4) & 0xF0) | (lowerValue & 0x0F));

I'm not sure if this masking is helpful. toASCIIHexValue returns a sane value already (although its implementation and return type look strange to me).
Comment 14 Adam Barth 2011-05-05 10:33:18 PDT
Comment on attachment 92429 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=92429&action=review

This patch is in good shape.  Please address Alexey's comments and consider moving the hex bit mashing into WTF before landing.  I'm happy to take a look at an updated patch, but you can also use your judgement if you feel like you've addressed all the comments from this iteration.

> Source/WebCore/platform/text/QuotedPrintable.cpp:96
> +        if (!isLastCharacter && currentLineLength > maximumLineLengthToFitCharacter) {

The !isLastCharacter branch confuses me slightly here.  Does that mean we can exceed the maximumLineLength on the last character of the buffer?  For example, what if we currently have 76 characters in the line and we have an last character that requires encoding.  That would seem to push us up to 79 characters.  Maybe that's ok because we're aiming for < 80 ?

> Source/WebCore/platform/text/QuotedPrintable.cpp:105
> +            result.append(hexTable[static_cast<int>((currentCharacter >> 4) & 0xF)]);

If these functions don't exist in ASCIIType.h, maybe it would make senes to add them there?  They seem more general purpose than this file.

>> Source/WebCore/platform/text/QuotedPrintable.cpp:154
>> +        int upperValue = toASCIIHexValue(upperCharacter);
>> +        int lowerValue = toASCIIHexValue(lowerCharacter);
>> +        char decodedCharacter = static_cast<char>(((upperValue << 4) & 0xF0) | (lowerValue & 0x0F));
> 
> I'm not sure if this masking is helpful. toASCIIHexValue returns a sane value already (although its implementation and return type look strange to me).

Same with this function.
Comment 15 Darin Adler 2011-05-05 10:40:25 PDT
(In reply to comment #13)
> toASCIIHexValue returns a sane value already (although its implementation and return type look strange to me)

It’s implementation may look a bit strange because of how it uses "& 0xF" to handle both lowercase and uppercase hex without adding a branch. There may be a more elegant way to do it.

What return type would you prefer to int? Maybe signed or unsigned char? Maybe something else?
Comment 16 Alexey Proskuryakov 2011-05-05 10:59:16 PDT
> It’s implementation may look a bit strange because of how it uses "& 0xF" to handle both lowercase and uppercase hex without adding a branch. There may be a more elegant way to do it.

I think that it's fine as is. The feeling came from spending too much effort figuring out whether precedence was right, so a few more parentheses would have made it easier to me.

> What return type would you prefer to int? Maybe signed or unsigned char? Maybe something else?

Perhaps uint8_t? Character types seem wrong because it's not a character that is returned, and unsigned int would be frightening to convert to signed int where needed.
Comment 17 Jay Civelli 2011-05-05 15:56:46 PDT
Comment on attachment 92429 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=92429&action=review

>> Source/WebCore/platform/text/QuotedPrintable.cpp:47
>> +static size_t lengthOfLineEndingAtIndex(const char* input, size_t length, size_t index)
> 
> Maybe length -> lineLength?
> 
> Functions like this always make me wonder what should happen with malformed line endings (LFCR? CRLFCR?). Not sure if it's something to be worried about at this stage though.

It's actually the input length, renamed.
I am not too worried with malformed line endings, worst case scenario they would create more empty lines in the encoded result.

>> Source/WebCore/platform/text/QuotedPrintable.cpp:55
>> +            return 1; // Single CR (Commodore and Old Macs).
> 
> I don't think that mentioning Commodore will be of any help to anyone reading this code. Also, "Classic Mac OS" is a more common term than "Old".

I thought mentioning Commodore in my code would make it look totally more old-school :-)
I removed the Commodore and renamed to "Classic Mac OS".

>> Source/WebCore/platform/text/QuotedPrintable.cpp:62
>> +String quotedPrintableEncode(const Vector<char>& in)
> 
> I'm still not sure of types here. The result is a 7-bit data buffer, do we really need a String?

I was initially trying to match the Base64 API, buts it is not a good API, so I should probably not.
I changed the API to take a Vector<char>& out instead.

>> Source/WebCore/platform/text/QuotedPrintable.cpp:86
>> +            if (lengthOfLineEnding > 0) {
> 
> No need to compare to zero, just "if (lengthOfLineEnding)" would be it per WebKit coding style.

Done.

>> Source/WebCore/platform/text/QuotedPrintable.cpp:96
>> +        if (!isLastCharacter && currentLineLength > maximumLineLengthToFitCharacter) {
> 
> The !isLastCharacter branch confuses me slightly here.  Does that mean we can exceed the maximumLineLength on the last character of the buffer?  For example, what if we currently have 76 characters in the line and we have an last character that requires encoding.  That would seem to push us up to 79 characters.  Maybe that's ok because we're aiming for < 80 ?

Ah! Good catch. I fixed the code (and added a test case to my unit-tests, I'll land that in a different CL).

>> Source/WebCore/platform/text/QuotedPrintable.cpp:105
>> +            result.append(hexTable[static_cast<int>((currentCharacter >> 4) & 0xF)]);
> 
> If these functions don't exist in ASCIIType.h, maybe it would make senes to add them there?  They seem more general purpose than this file.

OK, I moved this method to ASCIICType.h.

>> Source/WebCore/platform/text/QuotedPrintable.cpp:106
>> +            result.append(hexTable[static_cast<int>(currentCharacter & 0xF)]);
> 
> Why "int"? This is an index into a table.

Method has been moved to ASCIICType.h

>> Source/WebCore/platform/text/QuotedPrintable.cpp:142
>> +            continue;
> 
> What does IE do with malformed sources (e.g. a single CR as soft line ending)?

Unfortunately, my IE does not want to load MHTML files anymore for some reason  so I cannot test :-(

>>> Source/WebCore/platform/text/QuotedPrintable.cpp:154
>>> +        char decodedCharacter = static_cast<char>(((upperValue << 4) & 0xF0) | (lowerValue & 0x0F));
>> 
>> I'm not sure if this masking is helpful. toASCIIHexValue returns a sane value already (although its implementation and return type look strange to me).
> 
> Same with this function.

Moved the method to ASCIICType,h, removed the superfluous masking.
Comment 18 Jay Civelli 2011-05-05 15:57:25 PDT
Created attachment 92486 [details]
Patch
Comment 19 Jay Civelli 2011-05-05 16:00:26 PDT
Created attachment 92488 [details]
Patch
Comment 20 Alexey Proskuryakov 2011-05-05 16:18:25 PDT
> Unfortunately, my IE does not want to load MHTML files anymore for some reason  so I cannot test :-(

Ugh. I hope they didn't disable it permanently?

We'll need a lot of testing for upcoming patches.
Comment 21 Jay Civelli 2011-05-05 16:24:20 PDT
(In reply to comment #20)
> > Unfortunately, my IE does not want to load MHTML files anymore for some reason  so I cannot test :-(
> 
> Ugh. I hope they didn't disable it permanently?
> 
> We'll need a lot of testing for upcoming patches.

It is still enabled (the option to generate the MHTML is there), but somehow my IE here at work, chokes when loading the MHTML it itself generated. It used to work in the exact same IE, I might have screwed up my IE somehow (not clear how). I'll try on my home PC tonight.
Comment 22 Jay Civelli 2011-05-05 16:34:07 PDT
Created attachment 92493 [details]
Patch
Comment 23 WebKit Review Bot 2011-05-05 16:36:36 PDT
Attachment 92493 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1

Source/JavaScriptCore/wtf/ASCIICType.h:149:  More than one command on the same line  [whitespace/newline] [4]
Source/JavaScriptCore/wtf/ASCIICType.h:157:  More than one command on the same line  [whitespace/newline] [4]
Source/JavaScriptCore/wtf/ASCIICType.h:158:  More than one command on the same line  [whitespace/newline] [4]
Total errors found: 3 in 11 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 24 Adam Barth 2011-05-05 17:02:47 PDT
Comment on attachment 92493 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=92493&action=review

Lets rock and roll.

> Source/WebCore/platform/text/QuotedPrintable.cpp:97
> +        if (currentLineLength + lengthOfEncodedCharacter > maximumLineLength) {

Note: No integer overflow problem here because these values are all bounded.
Comment 25 WebKit Commit Bot 2011-05-05 18:38:12 PDT
Comment on attachment 92493 [details]
Patch

Clearing flags on attachment: 92493

Committed r85909: <http://trac.webkit.org/changeset/85909>
Comment 26 WebKit Commit Bot 2011-05-05 18:38:17 PDT
All reviewed patches have been landed.  Closing bug.
Comment 27 WebKit Commit Bot 2011-05-05 20:38:02 PDT
The commit-queue encountered the following flaky tests while processing attachment 92493 [details]:

java/lc3/JavaObject/JavaObjectToByte-006.html bug 60333 (author: ap@webkit.org)
The commit-queue is continuing to process your patch.