Bug 37913 - MD5 is required for WebSocket new protocol implementation
Summary: MD5 is required for WebSocket new protocol implementation
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Fumitoshi Ukai
URL:
Keywords:
Depends on:
Blocks: 35572
  Show dependency treegraph
 
Reported: 2010-04-20 23:59 PDT by Fumitoshi Ukai
Modified: 2010-04-23 12:16 PDT (History)
5 users (show)

See Also:


Attachments
Patch (20.25 KB, patch)
2010-04-21 04:18 PDT, Fumitoshi Ukai
no flags Details | Formatted Diff | Diff
Patch (23.69 KB, patch)
2010-04-21 23:08 PDT, Fumitoshi Ukai
abarth: review+
abarth: commit-queue-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Fumitoshi Ukai 2010-04-20 23:59:36 PDT
WebSocket new protocol described in http://www.whatwg.org/specs/web-socket-protocol/ requires MD5 implementation.
Comment 1 Fumitoshi Ukai 2010-04-21 04:18:19 PDT
Created attachment 53939 [details]
Patch
Comment 2 Mark Rowe (bdash) 2010-04-21 04:48:03 PDT
Comment on attachment 53939 [details]
Patch

> diff --git a/JavaScriptCore/GNUmakefile.am b/JavaScriptCore/GNUmakefile.am
> index df0181e..4cbfb29 100644
> --- a/JavaScriptCore/GNUmakefile.am
> +++ b/JavaScriptCore/GNUmakefile.am
> @@ -244,6 +244,8 @@ javascriptcore_sources += \
>  	JavaScriptCore/wtf/ListHashSet.h \
>  	JavaScriptCore/wtf/ListRefPtr.h \
>  	JavaScriptCore/wtf/Locker.h \
> +        JavaScriptCore/wtf/MD5.cpp \
> +        JavaScriptCore/wtf/MD5.h \

This indentation looks wrong.


> diff --git a/JavaScriptCore/wtf/MD5.cpp b/JavaScriptCore/wtf/MD5.cpp
> new file mode 100644
> index 0000000..5ad44c0
> --- /dev/null
> +++ b/JavaScriptCore/wtf/MD5.cpp
> @@ -0,0 +1,232 @@
> +// The original file was copied from sqlite, and was in the public domain.
> +// Modifications Copyright 2006 Google Inc. All Rights Reserved

This suggests that the original code was in the public domain, but modifications are copyright Google.  Shouldn’t this mention which license Google is releasing their modifications under?

> +#include <wtf/OwnPtr.h>

This doesn’t seem to be used.
Comment 3 Darin Adler 2010-04-21 09:19:06 PDT
Comment on attachment 53939 [details]
Patch

>  	JavaScriptCore/wtf/Locker.h \
> +        JavaScriptCore/wtf/MD5.cpp \
> +        JavaScriptCore/wtf/MD5.h \
>  	JavaScriptCore/wtf/MainThread.cpp \

Looks like existing entries here use tabs and your new entry uses spaces.

> +static void byteReverse(uint8_t *buf, unsigned longs)

The "*" should be next to the type, uint8_t.

The name here is not great. The words "byte reverse" are not a verb phrase. I think reverseBytes would be better.

> +{
> +    uint32_t t;
> +    do {
> +        t = static_cast<uint32_t>(static_cast<unsigned>(buf[3]) << 8 | buf[2]) << 16 | (static_cast<unsigned>(buf[1]) << 8 | buf[0]);

The casts to unsigned here are not helpful. The code will work fine without them, because the operation sizes are based on the type "unsigned" not the type "uint8_t" due how the C language works.

> +        *(reinterpret_cast<uint32_t *>(buf)) = t;

There is an extra set of parentheses here that is to needed.

This won't work on all WebKit platforms -- some CPUs require alignment. See StringHash.h for an example of this.

> +        buf += 4;
> +    } while (--longs);

Since this will malfunction if passed 0 for "longs" there should be an assertion that longs is non-zero.

> +}
> +// The four core functions - F1 is optimized somewhat
> +

This comment is about the code below, so it should have a blank line before it and possibly no blank line below it.

> +// #define F1(x, y, z) (x & y | ~x & z)
> +#define F1(x, y, z) (z ^ (x & (y ^ z)))

It is unclear to have the commented out F1 as well as the real F1. The comment above saying "F1 is optimized somewhat" does not make things clear enough. I think there's a better way to write the comment and say explicitly what's going on.

> +#define F2(x, y, z) F1(z, x, y)
> +#define F3(x, y, z) (x ^ y ^ z)
> +#define F4(x, y, z) (y ^ (x | ~z))
> +
> +// This is the central step in the MD5 algorithm.
> +#define MD5STEP(f, w, x, y, z, data, s) \
> +    (w += f(x, y, z) + data,  w = w << s | w >> (32 - s),  w += x)

Extra spaces here after the commas.

> +    register uint32_t a, b, c, d;
> +
> +    a = buf[0];
> +    b = buf[1];
> +    c = buf[2];
> +    d = buf[3];

The use of "register" here is quaint and outdated. No compiler we use responds to that keyword.

The variables should be declared when they are initialized, not on a separate line.

> +    uint32_t t;
> +
> +    // Update bitcount
> +
> +    t = m_bits[0];

THe variable t should be initialized on the same line it's declared on.

> +    if ((m_bits[0] = t + (static_cast<uint32_t>(length) << 3)) < t)

Is the static_cast here needed?

Can this code be written with the assignment outside the if statement instead?

> +        m_bits[1]++; // Carry from low to high
> +    m_bits[1] += static_cast<uint32_t>(length >> 29);

Is the static_cast here needed? Does it silence a warning?

> +        MD5Transform(m_buf, reinterpret_cast<uint32_t *>(m_in));

The "*" goes next to the type.

This won't work on all WebKit platforms -- some CPUs require alignment. See StringHash.h for an example of this.

> +        MD5Transform(m_buf, reinterpret_cast<uint32_t *>(m_in));

The "*" goes next to the type.

This won't work on all WebKit platforms -- some CPUs require alignment. See StringHash.h for an example of this.

> +Vector<uint8_t, 16> MD5::checksum()
> +{
> +    unsigned count;
> +    uint8_t* p;
> +
> +    // Compute number of bytes mod 64
> +    count = (m_bits[0] >> 3) & 0x3F;
> +
> +    // Set the first char of padding to 0x80.  This is safe since there is
> +    // always at least one byte free
> +    p = m_in + count;

THe variables count and p should be initialized on the same line they are declared on.

> +        // Two lots of padding:  Pad the first block to 64 bytes

I don't understand the term "lots of padding". What is a "lot"?

> +        MD5Transform(m_buf, reinterpret_cast<uint32_t *>(m_in));

The "*" goes next to the type.

This won't work on all WebKit platforms -- some CPUs require alignment. See StringHash.h for an example of this.

> +    // Append length in bits and transform
> +    (reinterpret_cast<uint32_t *>(m_in))[14] = m_bits[0];
> +    (reinterpret_cast<uint32_t *>(m_in))[15] = m_bits[1];
> +
> +    MD5Transform(m_buf, reinterpret_cast<uint32_t *>(m_in));

The "*" goes next to the type.

This won't work on all WebKit platforms -- some CPUs require alignment. See StringHash.h for an example of this.

> +    byteReverse(reinterpret_cast<uint8_t *>(m_buf), 4);

The "*" goes next to the type.

Does this work on both little-endian and big-endian platforms?

> +#include <wtf/OwnPtr.h>
> +#include <wtf/Vector.h>

There is no reason to include the OwnPtr.h header here.

> +    ~MD5() { }

Please omit this destructor. It has no effect and matches what the compiler would generate if you didn't write anything.

> +    void addBytes(const Vector<uint8_t>& input)
> +    {
> +        addBytes(input.data(), input.size());
> +    }
> +    void addBytes(const uint8_t* input, size_t length);
> +    Vector<uint8_t, 16> checksum();

I think you need some sort of comment to make clear that calling checksum has a side effect of resetting the state of the object. And the checksum function should go in a separate paragraph.

I know that often we have classes with no testing. But it's critical we have a unit test for this class somewhere. If nothing else, people porting will need to know that this compiled and works properly. It could be as simple as having a debug only function that runs the unit test in debug builds the first time the MD5 constructor is called.

review- primarily because of the alignment issues
Comment 4 Fumitoshi Ukai 2010-04-21 23:08:45 PDT
Created attachment 54026 [details]
Patch
Comment 5 Fumitoshi Ukai 2010-04-21 23:12:36 PDT
Thanks for review.

(In reply to comment #2)
> (From update of attachment 53939 [details])
> > diff --git a/JavaScriptCore/GNUmakefile.am b/JavaScriptCore/GNUmakefile.am
> > index df0181e..4cbfb29 100644
> > --- a/JavaScriptCore/GNUmakefile.am
> > +++ b/JavaScriptCore/GNUmakefile.am
> > @@ -244,6 +244,8 @@ javascriptcore_sources += \
> >  	JavaScriptCore/wtf/ListHashSet.h \
> >  	JavaScriptCore/wtf/ListRefPtr.h \
> >  	JavaScriptCore/wtf/Locker.h \
> > +        JavaScriptCore/wtf/MD5.cpp \
> > +        JavaScriptCore/wtf/MD5.h \
> 
> This indentation looks wrong.

Fixed.

> 
> 
> > diff --git a/JavaScriptCore/wtf/MD5.cpp b/JavaScriptCore/wtf/MD5.cpp
> > new file mode 100644
> > index 0000000..5ad44c0
> > --- /dev/null
> > +++ b/JavaScriptCore/wtf/MD5.cpp
> > @@ -0,0 +1,232 @@
> > +// The original file was copied from sqlite, and was in the public domain.
> > +// Modifications Copyright 2006 Google Inc. All Rights Reserved
> 
> This suggests that the original code was in the public domain, but
> modifications are copyright Google.  Shouldn’t this mention which license
> Google is releasing their modifications under?

Add license terms.

> 
> > +#include <wtf/OwnPtr.h>
> 
> This doesn’t seem to be used.

Oops. Removed.
Comment 6 Fumitoshi Ukai 2010-04-21 23:22:47 PDT
Thanks for review.

(In reply to comment #3)
> (From update of attachment 53939 [details])
> >  	JavaScriptCore/wtf/Locker.h \
> > +        JavaScriptCore/wtf/MD5.cpp \
> > +        JavaScriptCore/wtf/MD5.h \
> >  	JavaScriptCore/wtf/MainThread.cpp \
> 
> Looks like existing entries here use tabs and your new entry uses spaces.

Fixed.

> 
> > +static void byteReverse(uint8_t *buf, unsigned longs)
> 
> The "*" should be next to the type, uint8_t.

Fixed.

> 
> The name here is not great. The words "byte reverse" are not a verb phrase. I
> think reverseBytes would be better.

Ok. renamed to reverseBytes.

> 
> > +{
> > +    uint32_t t;
> > +    do {
> > +        t = static_cast<uint32_t>(static_cast<unsigned>(buf[3]) << 8 | buf[2]) << 16 | (static_cast<unsigned>(buf[1]) << 8 | buf[0]);
> 
> The casts to unsigned here are not helpful. The code will work fine without
> them, because the operation sizes are based on the type "unsigned" not the type
> "uint8_t" due how the C language works.

Ok. original code had (unsigned) cast here, but I agree we don't need it.
Removed casts to unsigned.

> 
> > +        *(reinterpret_cast<uint32_t *>(buf)) = t;
> 
> There is an extra set of parentheses here that is to needed.

Fixed.
 
> This won't work on all WebKit platforms -- some CPUs require alignment. See
> StringHash.h for an example of this.

I believe buf is always aligned to uint32_t natural boundary here.
I added assertion to check alignement.  Is it ok?

> 
> > +        buf += 4;
> > +    } while (--longs);
> 
> Since this will malfunction if passed 0 for "longs" there should be an
> assertion that longs is non-zero.

Add ASSERT(longs > 0) at the beginning of reverseBytes().

> 
> > +}
> > +// The four core functions - F1 is optimized somewhat
> > +
> 
> This comment is about the code below, so it should have a blank line before it
> and possibly no blank line below it.
> 
> > +// #define F1(x, y, z) (x & y | ~x & z)
> > +#define F1(x, y, z) (z ^ (x & (y ^ z)))
> 
> It is unclear to have the commented out F1 as well as the real F1. The comment
> above saying "F1 is optimized somewhat" does not make things clear enough. I
> think there's a better way to write the comment and say explicitly what's going
> on.

Updated.

> 
> > +#define F2(x, y, z) F1(z, x, y)
> > +#define F3(x, y, z) (x ^ y ^ z)
> > +#define F4(x, y, z) (y ^ (x | ~z))
> > +
> > +// This is the central step in the MD5 algorithm.
> > +#define MD5STEP(f, w, x, y, z, data, s) \
> > +    (w += f(x, y, z) + data,  w = w << s | w >> (32 - s),  w += x)
> 
> Extra spaces here after the commas.

Fixed.

> 
> > +    register uint32_t a, b, c, d;
> > +
> > +    a = buf[0];
> > +    b = buf[1];
> > +    c = buf[2];
> > +    d = buf[3];
> 
> The use of "register" here is quaint and outdated. No compiler we use responds
> to that keyword.
> 
> The variables should be declared when they are initialized, not on a separate
> line.

Fixed.

> 
> > +    uint32_t t;
> > +
> > +    // Update bitcount
> > +
> > +    t = m_bits[0];
> 
> THe variable t should be initialized on the same line it's declared on.

Fixed.

> 
> > +    if ((m_bits[0] = t + (static_cast<uint32_t>(length) << 3)) < t)
> 
> Is the static_cast here needed?

I don't think we need casts here.  Removed.

> 
> Can this code be written with the assignment outside the if statement instead?

Ok. refactored.

> 
> > +        m_bits[1]++; // Carry from low to high
> > +    m_bits[1] += static_cast<uint32_t>(length >> 29);
> 
> Is the static_cast here needed? Does it silence a warning?

Maybe, unnecessary.

> 
> > +        MD5Transform(m_buf, reinterpret_cast<uint32_t *>(m_in));
> 
> The "*" goes next to the type.

Fixed.
 
> This won't work on all WebKit platforms -- some CPUs require alignment. See
> StringHash.h for an example of this.

I think m_in is always aligned to uint32_t natural boundary.
I added alignement check of m_in in MD5 constructor.  Is it ok?

> 
> > +        MD5Transform(m_buf, reinterpret_cast<uint32_t *>(m_in));
> 
> The "*" goes next to the type.

Fixed.

> 
> This won't work on all WebKit platforms -- some CPUs require alignment. See
> StringHash.h for an example of this.
> 
> > +Vector<uint8_t, 16> MD5::checksum()
> > +{
> > +    unsigned count;
> > +    uint8_t* p;
> > +
> > +    // Compute number of bytes mod 64
> > +    count = (m_bits[0] >> 3) & 0x3F;
> > +
> > +    // Set the first char of padding to 0x80.  This is safe since there is
> > +    // always at least one byte free
> > +    p = m_in + count;
> 
> THe variables count and p should be initialized on the same line they are
> declared on.

Fixed.

> 
> > +        // Two lots of padding:  Pad the first block to 64 bytes
> 
> I don't understand the term "lots of padding". What is a "lot"?
> 
> > +        MD5Transform(m_buf, reinterpret_cast<uint32_t *>(m_in));
> 
> The "*" goes next to the type.

Fixed.

> 
> This won't work on all WebKit platforms -- some CPUs require alignment. See
> StringHash.h for an example of this.
> 
> > +    // Append length in bits and transform
> > +    (reinterpret_cast<uint32_t *>(m_in))[14] = m_bits[0];
> > +    (reinterpret_cast<uint32_t *>(m_in))[15] = m_bits[1];
> > +
> > +    MD5Transform(m_buf, reinterpret_cast<uint32_t *>(m_in));
> 
> The "*" goes next to the type.

Fixed.

> 
> This won't work on all WebKit platforms -- some CPUs require alignment. See
> StringHash.h for an example of this.
> 
> > +    byteReverse(reinterpret_cast<uint8_t *>(m_buf), 4);
> 
> The "*" goes next to the type.

Fixed.

> 
> Does this work on both little-endian and big-endian platforms?

I think so, because it comes from sqlite and I believe sqlite works on both little-endian and big-endian platforms.

> 
> > +#include <wtf/OwnPtr.h>
> > +#include <wtf/Vector.h>
> 
> There is no reason to include the OwnPtr.h header here.

Fixed.

> 
> > +    ~MD5() { }
> 
> Please omit this destructor. It has no effect and matches what the compiler
> would generate if you didn't write anything.

Removed.

> 
> > +    void addBytes(const Vector<uint8_t>& input)
> > +    {
> > +        addBytes(input.data(), input.size());
> > +    }
> > +    void addBytes(const uint8_t* input, size_t length);
> > +    Vector<uint8_t, 16> checksum();
> 
> I think you need some sort of comment to make clear that calling checksum has a
> side effect of resetting the state of the object. And the checksum function
> should go in a separate paragraph.

Ok.  Added comments and separete paragraph for checksum().

> 
> I know that often we have classes with no testing. But it's critical we have a
> unit test for this class somewhere. If nothing else, people porting will need
> to know that this compiled and works properly. It could be as simple as having
> a debug only function that runs the unit test in debug builds the first time
> the MD5 constructor is called.

Added.

> 
> review- primarily because of the alignment issues
Comment 7 Adam Barth 2010-04-22 12:43:22 PDT
Comment on attachment 54026 [details]
Patch

JavaScriptCore/wtf/MD5.cpp:74
 +        snprintf(buf, 3, "%02x", digest.at(i));
Why the mix of two space and four space indent?

JavaScriptCore/wtf/MD5.h:31
 +  #ifndef MD5_h
I thought we had a WTF in the header guards in WTF.
Comment 8 Darin Adler 2010-04-22 17:32:34 PDT
Comment on attachment 54026 [details]
Patch

JavaScriptCore/wtf/MD5.cpp:61
 +  #define testMD5() /* do nothing */
Could use an inline function instead of a macro for this.

JavaScriptCore/wtf/MD5.cpp:112
 +  // #define F1(x, y, z) (x & y | ~x & z)
This line should be removed.
Comment 9 Fumitoshi Ukai 2010-04-22 18:08:50 PDT
(In reply to comment #7)
> (From update of attachment 54026 [details])
> JavaScriptCore/wtf/MD5.cpp:74
>  +        snprintf(buf, 3, "%02x", digest.at(i));
> Why the mix of two space and four space indent?

Fixed.

> 
> JavaScriptCore/wtf/MD5.h:31
>  +  #ifndef MD5_h
> I thought we had a WTF in the header guards in WTF.

Ok. but check-webkit-style reports
JavaScriptCore/wtf/MD5.h:31:  #ifndef header guard has wrong style, please use: MD5_h  [build/header_guard] [5]
Comment 10 Fumitoshi Ukai 2010-04-22 18:09:32 PDT
(In reply to comment #8)
> (From update of attachment 54026 [details])
> JavaScriptCore/wtf/MD5.cpp:61
>  +  #define testMD5() /* do nothing */
> Could use an inline function instead of a macro for this.

Fixed.

> 
> JavaScriptCore/wtf/MD5.cpp:112
>  +  // #define F1(x, y, z) (x & y | ~x & z)
> This line should be removed.

Ok. Removed.
Comment 11 Fumitoshi Ukai 2010-04-22 18:13:59 PDT
Committed r58136: <http://trac.webkit.org/changeset/58136>
Comment 12 WebKit Review Bot 2010-04-22 18:26:17 PDT
http://trac.webkit.org/changeset/58136 might have broken Chromium Mac Release