Bug 22949 - Refactor HTMLFormElement code
: Refactor HTMLFormElement code
Status: RESOLVED FIXED
Product: WebKit
Classification: Unclassified
Component: XML
: 528+ (Nightly build)
: Macintosh Mac OS X 10.5
: P2 Normal
Assigned To: Nikolas Zimmermann
:
Depends on:
Blocks: 20393 22638
  Show dependency treegraph
 
Reported: 2008-12-20 17:14 PST by Nikolas Zimmermann
Modified: 2008-12-21 17:46 PST (History)
1 user (show)

See Also:


Attachments
Initial patch (50.38 KB, patch)
2008-12-20 17:19 PST, Nikolas Zimmermann
no flags Details | Formatted Diff | Diff
Updated patch (52.34 KB, patch)
2008-12-20 19:39 PST, Nikolas Zimmermann
darin: review-
Details | Formatted Diff | Diff
Updated patch v2 (51.59 KB, patch)
2008-12-21 15:33 PST, Nikolas Zimmermann
staikos: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Nikolas Zimmermann 2008-12-20 17:14:12 PST
HTMLFormElement contains useful code regarding GET/POST to url handling, multi-part form handling, boundary string generation etc.

This should be abstracted in a helper class usable by both HTMLFormElement & WMLGoElement - to share the whole fragile form-building logic in a central place.
Comment 1 Nikolas Zimmermann 2008-12-20 17:19:55 PST
Created attachment 26173 [details]
Initial patch
Comment 2 Nikolas Zimmermann 2008-12-20 18:48:50 PST
Comment on attachment 26173 [details]
Initial patch

Clearing review flag for now. As suggested by Holger, a better approach would be to create a FormDataBuilder class, which encapsulates the Vector<char> instead of having to pass it to several functions, as it is in this patch. Uploading a revised version soon.
Comment 3 Nikolas Zimmermann 2008-12-20 19:39:43 PST
Created attachment 26179 [details]
Updated patch
Comment 4 Nikolas Zimmermann 2008-12-20 20:02:03 PST
Comment on attachment 26179 [details]
Updated patch

Okay, I've realized memory is not freed immediately with the current patch.
The simple fix is to move "m_formDataBuilder.clear()" from the beginning of the createFormData() to the end.

Note: the memory is not leaked, it's just freed too late - upon HTMLFormElement destruction, or if another createFormData() call is invoked. This is not a good idea, the m_encodedData / m_multiPartData fields of FormDataBuilder, aren't needed anymore after processing, as FormData:appendString() memcpy's the char ptr, in it's internal data buffer.

> +PassRefPtr<FormData> HTMLFormElement::createFormData(const CString& boundary)
>  {
> +    m_formDataBuilder.clear();

Move the call

>      result->appendData(encodedData.data(), encodedData.size());
-> here.
>      return result;
>  }
Comment 5 Darin Adler 2008-12-21 09:54:14 PST
Comment on attachment 26179 [details]
Updated patch

I think the basic idea on this patch is OK, but making the data used to build form data into a member of HTMLFormElement is a step backward. Your troubles with the lifetime of the contents of m_encodedData and m_multiPartData demonstrate this. It's just not good to take local data and turn it into a data member.

There are plenty of ways around this. One is to pass the vectors in to all the relevant functions. The other is to make this two classes instead of one.

> + * Copyright (C) 2008 Nikolas Zimmermann <nikolas.zimmermann@torchmobile.com>

When you're grabbing a bunch of existing code and refactoring it, you should not assume that the original copyright holders are relinquishing copyright on all the code. You need to list the appropriate original authors here too.

> +static inline void appendString(Vector<char>& buffer, const char string)
> +{
> +    buffer.append(string);
> +}

It's strange to call this function "appendString" since it appends a character, not a string. The "string" in "appendString" is what's being appended, not what you're appending to. Maybe we should just name the helper function "append" instead of "appendString". Also, the type of the argument should be "char", not "const char".

> +    static const char s_hexDigits[17] = "0123456789ABCDEF";
> +
> +    // Same safe characters as Netscape for compatibility.
> +    static const char s_safeCharacters[] = "-._*";

I don't think these s_ prefixes add anything; since these are so local I think it reads great to just use normal names. I also think the code read well when the definitions were right where these things were used.

> +    // The RFC 2046 spec says the AlphaNumeric characters plus the following characters

The word "alphanumeric" does not have a capital "n" nor should it be capitalized.

> +    // are legal for boundaries:  '()+_,-./:=?
> +    // However the following characters, though legal, cause some sites to fail:
> +    // (),./:=
> +    // http://bugs.webkit.org/show_bug.cgi?id=13352
> +    static const char s_alphaNumericEncodingMap[64] =
> +    {
> +      0x41, 0x42, 0x43, 0x44, 0x45, 0x46, 0x47, 0x48,

Braces go on the same line as the definition.

Indentation is 4 spaces, not 2 spaces.

> +    bool isPostMethod() const { return m_isPostMethod; }
> +    void setIsPostMethod(bool value) { m_isPostMethod = value; }
> +
> +    bool isMultiPartForm() const { return m_isMultiPartForm; }
> +    void setIsMultiPartForm(bool value) { m_isMultiPartForm = value; }
> +
> +    String encodingType() const { return m_encodingType; }
> +    void setEncodingType(const String& value) { m_encodingType = value; }
> +
> +    String acceptCharset() const { return m_acceptCharset; }
> +    void setAcceptCharset(const String& value) { m_acceptCharset = value; }
> +
> +    void parseEncodingType(const String&);
> +    void parseMethodType(const String&);
> +
> +    TextEncoding dataEncoding(Document*) const;
> +
> +    // Helper functions used by HTMLFormElement/WMLGoElement for multi-part form data
> +    void beginMultiPartHeader(Vector<char>&, const CString& boundary, const CString& name) const;
> +    void addBoundaryToMultiPartHeader(Vector<char>& header, const CString& boundary, bool isLastBoundary = false) const;
> +    void addFileNameToMultiPartHeader(Vector<char>&, const TextEncoding& encoding, const String& fileName) const;
> +    void addContentTypeToMultiPartHeader(Vector<char>&, const CString& mimeType) const;
> +    void finishMultiPartHeader(Vector<char>&) const;
> +
> +    // Functions used by HTMLFormElement/WMLGoElement for non multi-part form data
> +    void addKeyValuePairAsFormData(Vector<char>&, const CString& key, const CString& value) const;
> +
> +    // Encode the input string as HTML form compatible data
> +    void encodeAsFormData(Vector<char>&, const CString&) const;
> +
> +    // Returns a 0-terminated C string in the vector
> +    void generateUniqueBoundaryString(Vector<char>&) const;

Can any of these functions be private? When you make a new class with so many public functions and so few private ones, I worry that the architecture isn't right.

There's no need for the name "encoding" for the const TextEncoding& argument.

Do we use the spelling "fileName" or "filename" in WebKit? It looks like you preferred "fileName", treating it as two separate words, and renamed many things from "fileName" to "filename" but I see "filename" in all sorts of other places in the WebKit code; maybe we should standardize on that.

I originally preferred "fileName", but now I'm leaning toward "filename".

I'm going to say review- because I don't think it's good to add two Vector members to HTMLFormElement that used to be local variables.
Comment 6 Nikolas Zimmermann 2008-12-21 10:40:31 PST
(In reply to comment #5)
> (From update of attachment 26179 [details] [review])
> I think the basic idea on this patch is OK, but making the data used to build
> form data into a member of HTMLFormElement is a step backward. Your troubles
> with the lifetime of the contents of m_encodedData and m_multiPartData
> demonstrate this. It's just not good to take local data and turn it into a data
> member.

Hey Darin,

okay, this is really stupid. This was exactly my previous approach - as seen in "Initial patch". Holger convinced me this was not the right way to go, and I updated the patch....

Would you prefer a solution, as it's done in the 'Initial patch'?

Greetings,
Niko
Comment 7 Darin Adler 2008-12-21 11:09:50 PST
(In reply to comment #6)
> okay, this is really stupid. This was exactly my previous approach - as seen in
> "Initial patch". Holger convinced me this was not the right way to go, and I
> updated the patch....

Sorry, I don't mean to make you go in circles like this.

> Would you prefer a solution, as it's done in the 'Initial patch'?

Yes, I do prefer that aspect of the initial patch.

Holger might be right that it's easier to read when the code has fewer arguments. That could be addressed by making those arguments be a class that groups them together or some other similar simplification.

But having state in the DOM tree all the time that has to do only with the encoding process both makes the DOM tree bigger and makes bugs possible that are otherwise impossible. It's better to make bugs due to leftover state completely impossible, by leaving that state out.

Sorry I haven't been on IRC with you and Holger so we could all discuss it together.
Comment 8 Nikolas Zimmermann 2008-12-21 12:11:13 PST
Good evening,
 
> Sorry, I don't mean to make you go in circles like this.
No problem, you couldn't know it - there are no comments in the bug report that indicate Holgers influence :-)
 
> Yes, I do prefer that aspect of the initial patch.
> 
> Holger might be right that it's easier to read when the code has fewer
> arguments. That could be addressed by making those arguments be a class that
> groups them together or some other similar simplification.
Okay, I readded the "Vector<char>&" as first parameter to the affected functions, and made them static.
 
> But having state in the DOM tree all the time that has to do only with the
> encoding process both makes the DOM tree bigger and makes bugs possible that
> are otherwise impossible. It's better to make bugs due to leftover state
> completely impossible, by leaving that state out.
Yes, you're definately right. It's better to combine the both patch variations, so after all it wasn't the worst idea to code both :-)

> 
> Sorry I haven't been on IRC with you and Holger so we could all discuss it
> together.
No worries, a new patch will be uploaded later.
Comment 9 Nikolas Zimmermann 2008-12-21 15:33:25 PST
Created attachment 26190 [details]
Updated patch v2
Comment 10 George Staikos 2008-12-21 17:42:41 PST
r+ with comment about incorrect copyright notices
Comment 11 Nikolas Zimmermann 2008-12-21 17:46:37 PST
Thanks for the review.

As Darins issues appear to be solved - landed in r39430.
Darin, in case you still see issues, please comment and I'll revert and/or fix them.