Bug 22949 - Refactor HTMLFormElement code
: Refactor HTMLFormElement code
Status: RESOLVED FIXED
: WebKit
XML
: 528+ (Nightly build)
: Macintosh Mac OS X 10.5
: P2 Normal
Assigned To:
:
:
:
: 20393 22638
  Show dependency treegraph
 
Reported: 2008-12-20 17:14 PST by
Modified: 2008-12-21 17:46 PST (History)


Attachments
Initial patch (50.38 KB, patch)
2008-12-20 17:19 PST, Nikolas Zimmermann
no flags Review Patch | Details | Formatted Diff | Diff
Updated patch (52.34 KB, patch)
2008-12-20 19:39 PST, Nikolas Zimmermann
darin: review-
Review Patch | Details | Formatted Diff | Diff
Updated patch v2 (51.59 KB, patch)
2008-12-21 15:33 PST, Nikolas Zimmermann
staikos: review+
Review Patch | Details | Formatted Diff | Diff


Note

You need to log in before you can comment on or make changes to this bug.


Description From 2008-12-20 17:14:12 PST
HTMLFormElement contains useful code regarding GET/POST to url handling, multi-part form handling, boundary string generation etc.

This should be abstracted in a helper class usable by both HTMLFormElement & WMLGoElement - to share the whole fragile form-building logic in a central place.
------- Comment #1 From 2008-12-20 17:19:55 PST -------
Created an attachment (id=26173) [details]
Initial patch
------- Comment #2 From 2008-12-20 18:48:50 PST -------
(From update of attachment 26173 [details])
Clearing review flag for now. As suggested by Holger, a better approach would be to create a FormDataBuilder class, which encapsulates the Vector<char> instead of having to pass it to several functions, as it is in this patch. Uploading a revised version soon.
------- Comment #3 From 2008-12-20 19:39:43 PST -------
Created an attachment (id=26179) [details]
Updated patch
------- Comment #4 From 2008-12-20 20:02:03 PST -------
(From update of attachment 26179 [details])
Okay, I've realized memory is not freed immediately with the current patch.
The simple fix is to move "m_formDataBuilder.clear()" from the beginning of the createFormData() to the end.

Note: the memory is not leaked, it's just freed too late - upon HTMLFormElement destruction, or if another createFormData() call is invoked. This is not a good idea, the m_encodedData / m_multiPartData fields of FormDataBuilder, aren't needed anymore after processing, as FormData:appendString() memcpy's the char ptr, in it's internal data buffer.

> +PassRefPtr<FormData> HTMLFormElement::createFormData(const CString& boundary)
>  {
> +    m_formDataBuilder.clear();

Move the call

>      result->appendData(encodedData.data(), encodedData.size());
-> here.
>      return result;
>  }
------- Comment #5 From 2008-12-21 09:54:14 PST -------
(From update of attachment 26179 [details])
I think the basic idea on this patch is OK, but making the data used to build form data into a member of HTMLFormElement is a step backward. Your troubles with the lifetime of the contents of m_encodedData and m_multiPartData demonstrate this. It's just not good to take local data and turn it into a data member.

There are plenty of ways around this. One is to pass the vectors in to all the relevant functions. The other is to make this two classes instead of one.

> + * Copyright (C) 2008 Nikolas Zimmermann <nikolas.zimmermann@torchmobile.com>

When you're grabbing a bunch of existing code and refactoring it, you should not assume that the original copyright holders are relinquishing copyright on all the code. You need to list the appropriate original authors here too.

> +static inline void appendString(Vector<char>& buffer, const char string)
> +{
> +    buffer.append(string);
> +}

It's strange to call this function "appendString" since it appends a character, not a string. The "string" in "appendString" is what's being appended, not what you're appending to. Maybe we should just name the helper function "append" instead of "appendString". Also, the type of the argument should be "char", not "const char".

> +    static const char s_hexDigits[17] = "0123456789ABCDEF";
> +
> +    // Same safe characters as Netscape for compatibility.
> +    static const char s_safeCharacters[] = "-._*";

I don't think these s_ prefixes add anything; since these are so local I think it reads great to just use normal names. I also think the code read well when the definitions were right where these things were used.

> +    // The RFC 2046 spec says the AlphaNumeric characters plus the following characters

The word "alphanumeric" does not have a capital "n" nor should it be capitalized.

> +    // are legal for boundaries:  '()+_,-./:=?
> +    // However the following characters, though legal, cause some sites to fail:
> +    // (),./:=
> +    // http://bugs.webkit.org/show_bug.cgi?id=13352
> +    static const char s_alphaNumericEncodingMap[64] =
> +    {
> +      0x41, 0x42, 0x43, 0x44, 0x45, 0x46, 0x47, 0x48,

Braces go on the same line as the definition.

Indentation is 4 spaces, not 2 spaces.

> +    bool isPostMethod() const { return m_isPostMethod; }
> +    void setIsPostMethod(bool value) { m_isPostMethod = value; }
> +
> +    bool isMultiPartForm() const { return m_isMultiPartForm; }
> +    void setIsMultiPartForm(bool value) { m_isMultiPartForm = value; }
> +
> +    String encodingType() const { return m_encodingType; }
> +    void setEncodingType(const String& value) { m_encodingType = value; }
> +
> +    String acceptCharset() const { return m_acceptCharset; }
> +    void setAcceptCharset(const String& value) { m_acceptCharset = value; }
> +
> +    void parseEncodingType(const String&);
> +    void parseMethodType(const String&);
> +
> +    TextEncoding dataEncoding(Document*) const;
> +
> +    // Helper functions used by HTMLFormElement/WMLGoElement for multi-part form data
> +    void beginMultiPartHeader(Vector<char>&, const CString& boundary, const CString& name) const;
> +    void addBoundaryToMultiPartHeader(Vector<char>& header, const CString& boundary, bool isLastBoundary = false) const;
> +    void addFileNameToMultiPartHeader(Vector<char>&, const TextEncoding& encoding, const String& fileName) const;
> +    void addContentTypeToMultiPartHeader(Vector<char>&, const CString& mimeType) const;
> +    void finishMultiPartHeader(Vector<char>&) const;
> +
> +    // Functions used by HTMLFormElement/WMLGoElement for non multi-part form data
> +    void addKeyValuePairAsFormData(Vector<char>&, const CString& key, const CString& value) const;
> +
> +    // Encode the input string as HTML form compatible data
> +    void encodeAsFormData(Vector<char>&, const CString&) const;
> +
> +    // Returns a 0-terminated C string in the vector
> +    void generateUniqueBoundaryString(Vector<char>&) const;

Can any of these functions be private? When you make a new class with so many public functions and so few private ones, I worry that the architecture isn't right.

There's no need for the name "encoding" for the const TextEncoding& argument.

Do we use the spelling "fileName" or "filename" in WebKit? It looks like you preferred "fileName", treating it as two separate words, and renamed many things from "fileName" to "filename" but I see "filename" in all sorts of other places in the WebKit code; maybe we should standardize on that.

I originally preferred "fileName", but now I'm leaning toward "filename".

I'm going to say review- because I don't think it's good to add two Vector members to HTMLFormElement that used to be local variables.
------- Comment #6 From 2008-12-21 10:40:31 PST -------
(In reply to comment #5)
> (From update of attachment 26179 [details] [review])
> I think the basic idea on this patch is OK, but making the data used to build
> form data into a member of HTMLFormElement is a step backward. Your troubles
> with the lifetime of the contents of m_encodedData and m_multiPartData
> demonstrate this. It's just not good to take local data and turn it into a data
> member.

Hey Darin,

okay, this is really stupid. This was exactly my previous approach - as seen in "Initial patch". Holger convinced me this was not the right way to go, and I updated the patch....

Would you prefer a solution, as it's done in the 'Initial patch'?

Greetings,
Niko
------- Comment #7 From 2008-12-21 11:09:50 PST -------
(In reply to comment #6)
> okay, this is really stupid. This was exactly my previous approach - as seen in
> "Initial patch". Holger convinced me this was not the right way to go, and I
> updated the patch....

Sorry, I don't mean to make you go in circles like this.

> Would you prefer a solution, as it's done in the 'Initial patch'?

Yes, I do prefer that aspect of the initial patch.

Holger might be right that it's easier to read when the code has fewer arguments. That could be addressed by making those arguments be a class that groups them together or some other similar simplification.

But having state in the DOM tree all the time that has to do only with the encoding process both makes the DOM tree bigger and makes bugs possible that are otherwise impossible. It's better to make bugs due to leftover state completely impossible, by leaving that state out.

Sorry I haven't been on IRC with you and Holger so we could all discuss it together.
------- Comment #8 From 2008-12-21 12:11:13 PST -------
Good evening,

> Sorry, I don't mean to make you go in circles like this.
No problem, you couldn't know it - there are no comments in the bug report that indicate Holgers influence :-)

> Yes, I do prefer that aspect of the initial patch.
> 
> Holger might be right that it's easier to read when the code has fewer
> arguments. That could be addressed by making those arguments be a class that
> groups them together or some other similar simplification.
Okay, I readded the "Vector<char>&" as first parameter to the affected functions, and made them static.

> But having state in the DOM tree all the time that has to do only with the
> encoding process both makes the DOM tree bigger and makes bugs possible that
> are otherwise impossible. It's better to make bugs due to leftover state
> completely impossible, by leaving that state out.
Yes, you're definately right. It's better to combine the both patch variations, so after all it wasn't the worst idea to code both :-)

> 
> Sorry I haven't been on IRC with you and Holger so we could all discuss it
> together.
No worries, a new patch will be uploaded later.
------- Comment #9 From 2008-12-21 15:33:25 PST -------
Created an attachment (id=26190) [details]
Updated patch v2
------- Comment #10 From 2008-12-21 17:42:41 PST -------
r+ with comment about incorrect copyright notices
------- Comment #11 From 2008-12-21 17:46:37 PST -------
Thanks for the review.

As Darins issues appear to be solved - landed in r39430.
Darin, in case you still see issues, please comment and I'll revert and/or fix them.