Bug 228724

Summary: Stop tracking form control elements in FormController
Product: WebKit Reporter: Cameron McCormack (:heycam) <heycam>
Component: FormsAssignee: Cameron McCormack (:heycam) <heycam>
Status: RESOLVED FIXED    
Severity: Normal CC: achristensen, cdumez, changseok, cmarcelo, darin, esprehn+autocc, ews-watchlist, gyuyoung.kim, kangil.han, mifenton, rniwa, webkit-bug-importer, wenson_hsieh, ysuzuki
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Patch none

Description Cameron McCormack (:heycam) 2021-08-02 16:41:39 PDT
The FormController::m_formElementsWithState ListHashSet accesses show up in Speedometer profiles.  Form control elements are only ever associated with a single FormController, so instead we can place them in their own linked list and avoid having the HashSet.
Comment 1 Radar WebKit Bug Importer 2021-08-02 16:41:51 PDT
<rdar://problem/81435095>
Comment 2 Cameron McCormack (:heycam) 2021-08-02 17:17:17 PDT
Created attachment 434807 [details]
Patch
Comment 3 Cameron McCormack (:heycam) 2021-08-02 17:43:31 PDT
(Debug build errors just need some "." / "->" replacement and probably moving that debug-only function to a member function so the friendship works.)
Comment 4 Alex Christensen 2021-08-02 17:47:57 PDT
Comment on attachment 434807 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=434807&action=review

I'm not very familiar with this code.  How do we know that a HTMLFormControlElementWithState can't be in the list of. more than one FormController at the same time?
Adding raw pointers like this has to be perfect and future proof so nobody comes along later and adds code that introduces use-after-free bugs.  Is there a way we could instead do something like add T* m_prev and m_next to DoublyLinkedListNode instead?

> Source/WebCore/html/FormController.cpp:351
> +        element->setNext(0);

WebKit style prefers nullptr instead of 0.

> Source/WebCore/html/FormController.cpp:527
> +    control.setNext(0);

ditto

> Source/WebCore/html/HTMLFormControlElementWithState.h:59
> +    HTMLFormControlElementWithState* m_prev;

These should have an initializer list { nullptr } so we don't accidentally use uninitialized memory.
Comment 5 Cameron McCormack (:heycam) 2021-08-02 18:02:08 PDT
(In reply to Alex Christensen from comment #4)
> I'm not very familiar with this code.  How do we know that a
> HTMLFormControlElementWithState can't be in the list of. more than one
> FormController at the same time?

A Document has a unique FormController that doesn't change, and we only associate and de-associate an HTMLFormControlElementWithState with the FormController when the element is inserted into and removed from a document, and the element can only be inserted into one document at a time.

Existing assertions don't check that an HTMLFormControlElementWithState is inserted into only one FormController, but the assertions in the patch here should.

> Adding raw pointers like this has to be perfect and future proof so nobody
> comes along later and adds code that introduces use-after-free bugs.  Is
> there a way we could instead do something like add T* m_prev and m_next to
> DoublyLinkedListNode instead?

It looks like the purpose of DoublyLinkedListNode is to give the consumer freedom about where to put those fields, but for our case it makes no difference if the fields are in this base class.  So I could add a DoublyLinkedListNodeStorage or something that inherits from DoublyLinkedListNode and has the fields.

In terms of the other manual manipulation of the raw pointers, it would be better if the clearing of the links is done by code in DoublyLinkedList.h.  And even better if the refcounting were done by some version of DoublyLinkedList too.

> > Source/WebCore/html/FormController.cpp:351
> > +        element->setNext(0);
> 
> WebKit style prefers nullptr instead of 0.

(I copied that style from DoublyLinkedList.h; I will clean that up elsewhere.)

> These should have an initializer list { nullptr } so we don't accidentally
> use uninitialized memory.

Should've done this, yes.
Comment 6 Darin Adler 2021-08-02 18:03:12 PDT
Comment on attachment 434807 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=434807&action=review

I’m tempted to write review+ since I don’t see any significant problems.

> Source/WebCore/ChangeLog:19
> +        DoublyLinkedList doesn't clear out prev/next pointers when removing
> +        elements, so we do that manually from FormController.

Do you know why? Are other clients relying on that?

> Source/WebCore/ChangeLog:24
> +        Because we store the list links in the HTMLFormElementWithState objects,
> +        and to avoid wasteful refcount traffic (if we made those links RefPtrs),
> +        we must manually ref and deref the objects when adding and removing from
> +        the list.

Should we add a template derived from DoublyLinkedList so the add/remove with ref/deref is done generically?

I’m not thrilled with the DoublyLinkedListNode design where it requires a m_prev and m_next data member and uses friend and such. I’d like to research why it was designed like this at some point.

> Source/WebCore/ChangeLog:26
> +        As a followup, it would be nice if DoublyLinkedList supported iterators.

Would be easy to add!

> Source/WebCore/html/FormController.h:69
> +    size_t m_formElementsWithStateCount { 0 };

I’m not absolutely sure that it’s important to store this count. If we are already going to iterate the list to build a vector of strings, does not seem prohibitively expensive to iterate it a second time to pre-allocate the appropriately sized vector.
Comment 7 Cameron McCormack (:heycam) 2021-08-02 18:19:59 PDT
(In reply to Cameron McCormack (:heycam) from comment #5)
> (In reply to Alex Christensen from comment #4)
> > These should have an initializer list { nullptr } so we don't accidentally
> > use uninitialized memory.
> 
> Should've done this, yes.

Ah the DoublyLinkedListNode constructor already does this.
Comment 8 Cameron McCormack (:heycam) 2021-08-02 18:24:31 PDT
(In reply to Darin Adler from comment #6)
> > Source/WebCore/ChangeLog:19
> > +        DoublyLinkedList doesn't clear out prev/next pointers when removing
> > +        elements, so we do that manually from FormController.
> 
> Do you know why? Are other clients relying on that?

The other clients are JSC GC and memory management.  Yusuke, is it important to avoid clearing out the next/prev links in the JSC uses of DoublyLinkedList?

> > Source/WebCore/ChangeLog:24
> > +        Because we store the list links in the HTMLFormElementWithState objects,
> > +        and to avoid wasteful refcount traffic (if we made those links RefPtrs),
> > +        we must manually ref and deref the objects when adding and removing from
> > +        the list.
> 
> Should we add a template derived from DoublyLinkedList so the add/remove
> with ref/deref is done generically?

I will try and see if it feels right.

> I’m not thrilled with the DoublyLinkedListNode design where it requires a
> m_prev and m_next data member and uses friend and such. I’d like to research
> why it was designed like this at some point.

Assume it's to avoid separate allocations for the linked list nodes, but maybe it's possible to still avoid that with a different design.

> > Source/WebCore/html/FormController.h:69
> > +    size_t m_formElementsWithStateCount { 0 };
> 
> I’m not absolutely sure that it’s important to store this count. If we are
> already going to iterate the list to build a vector of strings, does not
> seem prohibitively expensive to iterate it a second time to pre-allocate the
> appropriately sized vector.

Probably true.
Comment 9 Yusuke Suzuki 2021-08-02 18:40:17 PDT
Comment on attachment 434807 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=434807&action=review

>> Source/WebCore/ChangeLog:19
>> +        elements, so we do that manually from FormController.
> 
> Do you know why? Are other clients relying on that?

I don't think there is a specific reason, possibly just because it was not necessary.
Comment 10 Yusuke Suzuki 2021-08-02 18:44:23 PDT
Comment on attachment 434807 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=434807&action=review

>>> Source/WebCore/ChangeLog:19
>>> +        elements, so we do that manually from FormController.
>> 
>> Do you know why? Are other clients relying on that?
> 
> I don't think there is a specific reason, possibly just because it was not necessary.

But when changing it, can you ensure that all the clients are not accessing `next()` / `prev()` to retrieve next / prev elements after the removal?
Like, if there is a code like this, it will be broken.

while (!m_list.isEmpty()) {
    Node* node = m_list.removeHead();
    Node* next = node->next();
    Node::destroy(node);
    node = next;
}
Comment 11 Ryosuke Niwa 2021-08-03 00:56:13 PDT
Comment on attachment 434807 [details]
Patch

So this will make an existing bug a security bug. I'll let you know the details privately.
Comment 12 Cameron McCormack (:heycam) 2021-08-04 20:15:42 PDT
Ryosuke separately pointed out that it's likely not necessary to keep FormController::m_formElementsWithState up to date throughout the document's life, and that it might be sufficient to traverse the document to those elements at the time we need to save the form data.  I tried that, and got similar Speedometer speed gains and no adverse effect on PLT5.
Comment 13 Cameron McCormack (:heycam) 2021-08-04 20:26:22 PDT
(In reply to Cameron McCormack (:heycam) from comment #12)
> Ryosuke separately pointed out that it's likely not necessary to keep
> FormController::m_formElementsWithState up to date throughout the document's
> life, and that it might be sufficient to traverse the document to those
> elements at the time we need to save the form data.  I tried that, and got
> similar Speedometer speed gains and no adverse effect on PLT5.

(Which means I'll abandon plans to improve DoublyLinkedList etc.)
Comment 14 Cameron McCormack (:heycam) 2021-08-04 20:26:43 PDT
Created attachment 434962 [details]
Patch
Comment 15 Darin Adler 2021-08-04 21:00:08 PDT
Comment on attachment 434962 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=434962&action=review

Nice solution. A few thoughts, mostly on existing code rather than the new code.

> Source/WebKitLegacy/mac/ChangeLog:9
> +        formElementsCharacterCount is unused and can be removed.

Hooray!

> Source/WebCore/html/FormController.cpp:354
> +std::unique_ptr<FormController::SavedFormStateMap> FormController::createSavedFormStateMap(const Vector<const HTMLFormControlElementWithState*>& controlList)

Maybe not urgent and required for this fix, but noticed all these things:

1) This could be changed to just return SavedFormStateMap since a HashMap is already just a pointer and just a null pointer if the map is empty. Wrapping it in a unique_ptr makes it use slightly more memory, and makes things slightly slower with no benefit that I am aware  of. At the time this code was originally written that may have been necessary since we didn’t have sufficient move semantics for removing that sort of value, but likely it is sufficient now.

2) Similarly, the values in the map could be SavedFormState instead of unique_ptr<SavedFormState>. The issues aren’t identical since SavedFormState has two members, and thus the map would use a bit more memory for empty hash table slots, but I think it’s similarly unnecessary and the code would be tighter without the null check and makeUnique.

3) We could also consider using a lambda instead of a separate function for this, or just inlining it in the function below, since it’s only used in one place.

4) Further, it’s unclear why this creates a map. While it’s true that we will move this into a map after restoring, here when saving, all we do is iterate the map, and all the hashing does is unnecessarily randomize the order of things and possibly discard form elements with equal keys (can that even happen?). I suggest we consider changing it to a Vector. We’ll still make the map in formStatesFromStateVector.

5) If we do items (3) and (4) I think it will become clear that we can just build the vector of strings ("stateVector") as we go without making another vector as an intermediate state.

This refactoring would simplify things quite a bit.

> Source/WebCore/html/FormController.cpp:382
> +    std::unique_ptr<SavedFormStateMap> stateMap = createSavedFormStateMap(controls);

Should use auto here. Except that if we do what I suggest above, this code will be gone.

> Source/WebCore/html/FormController.cpp:384
> +    stateVector.reserveInitialCapacity(controls.size() * 4);

Ah, the mysterious 4! Only the most expert programmers can understand why this should be 4 and not 3 or 5, and luckily we didn’t leave any clues around in the form of comments to help the others.

Also, I think we want a shrinkToFit after making stateVector.

> Source/WebCore/html/HTMLFormControlElementWithState.cpp:52
> +    m_insertionIndex = 0;

Why do this? I believe it would have no effect if we removed this line of code. Maybe it’s aesthetically appealing to not leave this old value lying around?

I suppose it makes the assertion more valuable.
Comment 16 Cameron McCormack (:heycam) 2021-08-04 21:04:39 PDT
(In reply to Darin Adler from comment #15)
> > Source/WebCore/html/HTMLFormControlElementWithState.cpp:52
> > +    m_insertionIndex = 0;
> 
> Why do this? I believe it would have no effect if we removed this line of
> code. Maybe it’s aesthetically appealing to not leave this old value lying
> around?
> 
> I suppose it makes the assertion more valuable.

Right.  I could wrap that in an `#if ASSERT_ENABLED` but it's cheap.
Comment 17 Alex Christensen 2021-08-05 10:10:44 PDT
Comment on attachment 434962 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=434962&action=review

> Source/WebCore/html/FormController.cpp:375
> +        controls.append(&control);

Since these are never null, you could use a Vector<std::reference_wrapper<const HTMLFormControlElementWithState>> to indicate such.
Comment 18 Cameron McCormack (:heycam) 2021-08-05 16:07:06 PDT
Created attachment 435036 [details]
Patch
Comment 19 Darin Adler 2021-08-05 17:36:25 PDT
Comment on attachment 435036 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=435036&action=review

> Source/WebCore/html/FormController.cpp:382
> +    std::unique_ptr<SavedFormStateMap> stateMap = createSavedFormStateMap(controls);

auto

> Source/WebCore/html/FormController.h:56
> -    typedef ListHashSet<RefPtr<HTMLFormControlElementWithState>> FormElementListHashSet;
> +    typedef Vector<std::reference_wrapper<const HTMLFormControlElementWithState>> FormControlVector;

Typically in new code we are using "using" instead of "typedef".
Comment 20 Cameron McCormack (:heycam) 2021-08-05 19:22:32 PDT
Created attachment 435048 [details]
Patch
Comment 21 Cameron McCormack (:heycam) 2021-08-05 19:24:49 PDT
Filed bug 228854 to capture the earlier assorted FormController.cpp cleanup that Darin suggested.
Comment 22 EWS 2021-08-05 20:10:46 PDT
Committed r280718 (240309@main): <https://commits.webkit.org/240309@main>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 435048 [details].