Bug 23346

Summary: Restore form control values to a wrong form
Product: WebKit Reporter: Oleksandr Yevstafyev <oyevstaf>
Component: FormsAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: ap, beidson, benjie, bill, brunner, darin, futurama, ghollins, michael, oyevstaf, priyajeet.hora, service, sullivan, tkent
Priority: P1    
Version: 525.x (Safari 3.2)   
Hardware: PC   
OS: All   
Bug Depends on: 26241, 36762, 53312, 88272, 88685, 88768, 89409, 89412, 89623, 89628, 89847, 89950, 89962, 90259, 90964, 91050, 91209, 91804    
Bug Blocks:    
Attachments:
Description Flags
HTML files showing the bug
none
Proposed patch beidson: review-

Description Oleksandr Yevstafyev 2009-01-15 03:14:42 PST
Sometimes wrong form will be submitted if we play with Back button prior to sumbit.

Steps to reproduce:

1. Go to http://www.williams-sonoma.com, select a product from menu (for example, http://www.williams-sonoma.com/products/sku6463004/index.cfm?pkey=ccookware%2Dsets&ckey=cookware%2Dsets).

2. Click "Write a Review" link, enter a email address, click "CONTINUE", "Write a Review" page displays.

3. Enter nickname, specify rating value and whether you recommend the product to a friend, enter review headline and review text.

4. Click "PREVIEW", "Please Review Your Feedback" page displays.

5. Click browser's "Back" button twice, each time cancel re-submitting the form as suggested by browser. Clicking once also may result in the error at step 6, but clicking twice increases the probability.

6. Click "Edit". At this point wrong form is submitted sometimes. "Wrong" means that actual HTTP parameters sent to server are:

sp = S0
$ImageURLSubmit$0 = Preview
Form0 = $ValidHidden,$ValidHidden$0,sessionParams,inputRating,inputBuyAgain,inputDisplayName,inputTitle,inputReviewText,inputLocation,contextData,contextData$0,contextData$1,contextData$2,contextData$3,netPromoterScore,netPromoterComment,$ImageURLSubmit$0
service = direct/1/review-1031/$BazaarvoiceForm
$ImageURLSubmit$0.y = 12
$ImageURLSubmit$0.x = 12
returnURL = http://www.williams-sonoma.com/products/sku6463004/index.cfm?pkey=ccookware%2Dsets&ckey=cookware%2Dsets

while expected (and actually rendered within the form) ones are:

sp = S0
$ImageURLSubmit$0 = Edit
Form0 = $ImageURLSubmit,$ImageURLSubmit$0,$ImageURLSubmit$1
service = direct/1/preview-1031/$BazaarvoiceForm
$ImageURLSubmit$0.y = 10
$ImageURLSubmit$0.x = 31
returnURL = http://www.williams-sonoma.com/products/sku6463004/index.cfm?pkey=ccookware%2Dsets&ckey=cookware%2Dsets

Note the difference between actual and expected values of "$ImageURLSubmit$0" and "Form0" parameters. In fact, it looks like the form we filled in at step 3 is (partially) submitted instead of the form we see displayed before step 6.

Note, the error may not happen for first time. In this case, repeat steps 3-6 more times (if there is no error at step 6, you will be brought back to step 3 by the application).
Comment 1 OPAL Group Development 2009-03-06 08:30:18 PST
This same problem is occurring in our system, exactly as described by the original bug author, only 100% reliably.  Only affects Safari/Chrome, and no other browsers.

Detail:
* We output multiple forms onto a page in a system that responds with a 302 redirect in response to a form submission.  So, in effect, the page submits to itself, but then redirects to a results page.
* The "visible" form that the interface renders is actually the 5th form in the HTML page.  The first 4 are stub forms with no interface elements, used for Javascript operations elsewhere on the page.

Steps to reproduce:
1) User submits the visible form, and is returned to a menu page.
2) User clicks on the same link again to re-load the form.
3) Rather than filling out the form, user clicks Back in the browser.

* PROBLEM #1:  In this case, the browser does NOT display the previous page.  Rather it displays the page with the form on it.  Possibly related to the 302 redirects, but cannot confirm this.

4) User then hits the Submit button on the page again.

* PROBLEM #2:  The browser does not submit the form that the Submit button is attached to.  Rather, the browser submits the _first_ form on the page, which is the hidden javascript-sensitive form.

We've confirmed this behavior simply by re-ordering the forms as output to the page.  Whatever form appears first in the HTML is the form that will be submitted in response to clicking ANY submit button on the page, immediately following the browser Back.

Will attempt to distill this down to a simple test case that we'll post.  The actual system that it's in is both private-credentialed and overly complex to use as a test case.  We can reproduce this faithfully 100% of the time in our Production system, however.  Hopefully, we'll be able to provide a simplified example to add to the original submitter's.

Comment 2 OPAL Group Development 2009-03-06 09:45:06 PST
Well, I can replicate HALF of the problem in a test case.  Posted an example here:
  http://penguin.theopalgroup.com/~jkilmer/bugs/

Unfortunately, I can only replicate the first part of the bad behavior.  The erroneous Back button behavior is accurately displayed by this example.  Unfortunately, the core problem (submitting the wrong form) doesn't occur in this simplified example.  Will continue trying to isolate the cause of that more serious problem, but you should at least be able to see the first part using this example.
Comment 3 OPAL Group Development 2009-03-06 09:56:22 PST
So weird!!

Actually, the example in Comment #2 _can_ be used to recreate the entire problem.  It just requires MULTIPLE back button-clicks in this case, while our production system breaks with only a single back click.  Use the following steps:

0) Go to: http://penguin.theopalgroup.com/~jkilmer/bugs/
1) Click the "CLICK HERE TO BEGIN" link
2) Click the "Load form: HERE" link
3) Click Submit button
4) Click the "Load form: HERE" link
5) Click Browser BACK button
6) Click Submit button
7) Click Browser BACK button
8) Click Browser BACK button (second time)
9) Click the "Load form: HERE" link

WHAM!
Comment 4 benjie 2010-03-16 08:02:45 PDT
This shows up for me in Safari/Chrome as well, on our production and testing sites.

Here is the scenario:

Page A: user is shown a form M, with hidden form field named X whose value is
M_X, and a submit form field named S whose value is M_S. There is also a form N, with hidden form field named X whose value is N_X, and a submit form field named S whose value is N_S

User POSTS to go to page B.

User hits back button to go back to Page A.

In form M, the value for X is now N_X, not M_X as it should be, and the value for S is N_S, not M_S.

I believe this scenario is caused by the same bug that caused the problems described in other comments related to this bug.
Comment 5 benjie 2010-03-16 09:43:24 PDT
Created attachment 50802 [details]
HTML files showing the bug
Comment 6 benjie 2010-03-16 09:49:45 PDT
I've added some more examples. 

It's very easy to reproduce, at least some form of the bug.

First, untar the bug.tgz file

Scenario 1

1. In your Safari or Chrome, load x_bug.html
2. Don't submit anything, just click on the Next Page link
3. Click Back button
4. You will see that the first form now has field values from the second form

The interesting thing is that if you remove the "autocomplete=off" from x_bug.html, the problem goes away.

Scenario 2

1. cp x0_bug.html z.html
2. In your Safari or Chrome, load z.html
3. Click on "First Form Submit" button
4. cp x1_bug.html z.html
5. In browser, click on Back button
6. You will see that the new page in the browser, the page source and inspect element return different results. The page suppose to show form control from 2nd form from x1_bug.html, but instead it got old values from x0_bug.html.

Scenario 2 happens a lot on our website: we have a form, people submit the form, redirected to some other page, hours later the user's login session expires, user hits back button, and got a slightly different page, but the forms on the page are all messed up.
Comment 7 benjie 2010-03-16 14:07:02 PDT
I figured out what's causing the bug.

Current WebKit based browsers (as of 3/16/2010), e.g. Safari and Chrome, exhibit the following bugs. Perhaps someone can take a look. Thanks.


Bug 1: If a page A has multiple form elements F1 and F2, and the first (in order of appearance in HTML) form, F1, has autocomplete set to "off" (i.e. <form ... autocomplete="off">), but F2 has autocomplete set to "on" (default behavior), then after navigating away from page A, and then hitting browser back button to go back to page A, F1 and F2 may be auto-completed incorrectly. In particular, if F1 and F2 both have input elements with the same name and type, say N and T (i.e. <input name="N" type="T" ...>), then when navigating back to page A using the back button, F1.N's value will be autocompleted with F2.N's value. 

Bug 2: First, browser hits page A, and server returns an HTML page with form elements F1 and F2 (both forms have autocomplete set to on). Then, user navigates away from page A, and subsequently returns to page A using the browser back button. On the second visits to page A, WebKit issues another request for A to the server (this differs from FireFox's behavior, where on back button no addition request is issued to server). If the server returns a different HTML page (e.g. because user session has logged out), with form elements F3 and F4 that are different from F1 and F2, but consisting of input elements with the same name and type, then F3 and F4 will be autocompleted with F1 and F2 input element values, even for input element type hidden and submit.
 

WORK AROUND

Bug 1: never use autocomplete="off" unless you have this set for ALL the forms on the same HTML page.

Bug 2: case specific, no good generic solution. We found an acceptable work around by including hidden forms to make sure two versions of page A have similar forms; first version has F1, F2, F3, and second one has F1, F2', and F3, where F2' is a hidden version of F2. If we don't include F2', then the second version of page A is F1, and F3, and F3 will be auto-completed with F2's element values, even for hidden and submit elements in F3.
 

ANALYSIS of WebKit CODE
 
These two bugs occur in the same part of the code, but can probably be considered as two separate bugs. The code are in WebCore sub-directory of the WebKit code tree.
 

Bug 1: in Document::formElementsState, input elements that have autocomplete turned ON (checked via HTMLInputElement::saveFormControlState), have their states saved in a vector. However, in HTMLFormControlElementWithState::finishParsingChildren, every form element, regardless if autocomplete is ON or OFF, restores state from the aforementioned vector. This results in bug 1.

Bug 1 Fix: this should be a fairly straight-forward fix - finishParsingChildren should not restore state if element has autocomplete turned OFF. 

Disclaimer: I don't develop on Mac. I only use it and we develop a website. I just browse the WebKit code today. Hence, I have not created or tested a patch.
 

Bug 2. This is much more complex.
 
I assume that in a design decision unrelated to autocomplete, WebKit is designed to re-fetch page A if user is using the back button to go back in history to page A.

(I'd be interested in hearing about this too)
 
Fundamentally, WebKit makes the incorrect assumption that the second fetch of page A results in the same HTML, or at least the same set of forms, as the first fetch. If this is not the case, then the autocomplete logic no longer produces the correct/expected behavior.
 
When WebKit saves state for a page, it calls Document::formElementsState, which simply creates a map of pairs, and puts each input element's name+type and value pair into the map. If two input elements in two separate forms have the same name and type, both the values are saved.
 
For example, say page A has forms F1 and F2, and F1 has input elements with names a1 and a2, with types t1 and t2, with values v1 and v2 respectively. F2 has input elements with names a3 and a2, with types t1 and t2, and values v3 and v4, respectively. WebKit saves the state of this page as (in JSON notiation)
 
{  "a1,t1" : [ v1 ], "a2,t2" : [ v2, v4 ], "a3,t1" : [ v3 ]  }
 
If user revisits page A using the browser back button, WebKit will try to autocomplete forms on the new version of page A, fetched from the server, using the above state. If the new version of page A has exactly the same forms as the last, then everything works. If not, then WebKit produces incorrect behavior. For example, assume the second time page A is fetched, server returns just one form F3, and F3 has input elements with names a4 and a2, with types t1 and t2, then F3's a2 element will be populated with v2, saved from the previous page.
 
(Note: actual logic of storing state and recovering state, as used in the code, are slightly different, but the idea is the same)
 
This problem manifests itself on websites when user sessions may expire, and after a session expires, hitting page A may produce slightly different HTML. E.g. may give you a "please login" form, or may give you roughly the same content, but in place of a search user data form on top, a login form appears. In these cases, the visible text input element, hidden input element, and submit input elements may all have their values changed by WebKit.
 
Bug 2 Fix: This is difficult, because WebKit re-fetches page A when user uses the back button. If new version of page A is different from the old version, WebKit cannot easily match a form's state from old version of the page to some form, if it even exists, on the new version. You can't realistically require all forms to have the same DOM id, and even if you do, that's still not entirely correct since DOM ids are required to be unique within a HTML page, but not need to be unique across separate HTML Documents.
 
The only fix I can think of is this: when you save the state from the first time you visit page A, take a MD5 or SHA1 hash of the page, and store that with the input element state. When you go back to page A, only restore state if the MD5 or SHA1 hash is the same.
Comment 8 Kent Tamura 2010-03-28 04:25:53 PDT
*** Bug 31413 has been marked as a duplicate of this bug. ***
Comment 9 Kent Tamura 2010-03-28 04:41:38 PDT
Created attachment 51857 [details]
Proposed patch
Comment 10 Kent Tamura 2010-03-28 04:43:50 PDT
(In reply to comment #9)
> Created an attachment (id=51857) [details]
> Proposed patch

I confirmed this patch resolved both scenarios of Comment #6.
Comment 11 benjie 2010-03-29 07:30:23 PDT
Thanks Kent for the patch.

While the patch solves bug 1 reported in comment 7, and it probably solves some of the manifestations of bug 2 from comment 7 as well, it does not address the root of the problem and leaves some scenarios unsolved.

Bug 2 reported in comment 7 is caused by WebKit re-fetching the HTML content of a URL when user hits the back button to reach the said URL, loading up a different HTML content than before, but using the saved form state to re-populate the new, changed HTML content.

The patch attempts to get around this problem by checking the "action" attribute of the form. This helps, but it does not solve this scenario:

1) First time fetching ULR X, the content of X contains a form with action A, with a hidden input element H, with value V_H, and text input element T, with value V_T, and another hidden input element J, with value V_J.

2) User moves from X to page Y, the hit back button to go back to X

3) On second fetching of URL X, due to some other state change (e.g. user "session" expired), the content of X now contains a form with action A, and text input element T, with no value pre-set, and another hidden input element J, with value pre-set to V_J'.

Note that the value of J has changed from V_J to V_J'.

4) WebKit will repopulate the new content of X, with old form values. Based on the logic in the code and the patch, it will change the form value to 
   T  --> V_T  (using old value)
   J --> V_J (using old value)

This in fact is the wrong behavior, because the server may have returned a pre-set value V_J' that meant to take the user to a different action on form submit. 

Yes, this does assume that the CGI target -- the form action, stays the same and the CGI demultiplexes task based on a CGI variable. Whether you agree with this style of design or not, it's something allowed by the CGI architecture.

Okay, suppose we put in a simple fix. Let's say that we DO NOT restore form values for "hidden" and "submit" fields. Then that'd solve the above scenario, right?

Not quite. While it means the hidden fields and submit fields are populated correctly, and that's a big plus, the form values will still be populated wrong. For example, consider the following scenario

1) First time fetching ULR X, the content of X contains a form F1 with action A, with a hidden input element H, with value V_H, and text input element T, with no pre-set value, and text input element R, w/o pre-set value, and another hidden input element J, with value V_J.

2) User enters values V_T and V_R for fields T and R. User moves from X to page Y, the hit back button to go back to X

3) On second fetching of URL X, due to some other state change (e.g. user "session" expired), the content of X now contains a new form F0, appearing before F1, with action A, and text input element T, with no value pre-set, and another hidden input element K, with value pre-set to V_K. Then form F1, un-changed from step 1, shows up in the page.

4) WebKit will repopulate the new content of X, with old form values. Based on the logic in the code and the patch, it will change the form value to 
  
  F0:
   T  --> V_T  (using old value)
   K --> V_K (not changing this value since it's a hidden element)

  F1:
   T --> ??? (no value to pre-fill)
   R --> V_R (using old value)
   J --> V_J (not changing since it's a hidden field)

So this is better, in that the forms probably will head to the right controller on the server, but the form values T and R in F1, on the new page, are not filled consistently. Plus, this may break the case when JS changes hidden field values.

The point is, in a form, you have an action A, and a set of CGI fields, and WebKit should either fill in values for ALL of the fields, or NONE of the fields. AND, it should only fill in values if the form has not changed. 

Given that WebKit has decided to refetch URL X when user hits back button, you cannot guarantee that form has not changed, and the server has not changed its intent, unless you compare either 1) the entire HTML content of URL X, or 2) entire HTML content of the form, prior to user/JS actions. E.g. if you take the SHA1 hash of the form HTML content, and use that as your identifier, instead of form action, then you'd solve this problem.

At the least, I propose you also patch the code to NOT fill in values for hidden and submit elements.
Comment 12 benjie 2010-03-29 07:36:25 PDT
An addition to the last paragraph of comment 11. I propose that, short of using the content-hash approach to determine if HTML has changed, you patch the code to

1. Not fill in hidden and submit form elements, if the value you want to fill is different from the pre-set value from the HTML, AND

2. Never fill in value for a field of a form, no matter what type of the field is, if there is a hidden or submit form element in the form that failed condition 1.

This is still just a stop-gap measure. The real fix is either NOT reloading page on back button, or determine if HTML for the form has changed.
Comment 13 Kent Tamura 2010-03-29 07:59:16 PDT
Yeah, I understand my patch doesn't fix the whole of this issue. But I expect the patch solves many problems.

I already have another patch to disable restoring values for button, hidden, image, reset, and submit.  They are not user-editable, so we don't need to restore values for them.

Hashing the whole HTML document is not acceptable because loading an HTML document might take long time.  It might be possible to hash each of FORM elements by DOMHASH-like algorithm.
Comment 14 benjie 2010-03-29 08:11:50 PDT
Kent, 

That's great (what you mentioned in comment 13). What would be nice is to make sure the 2nd part of comment 12 is also true, which is: if WebKit determined that a hidden field of a form should not have its value filled in, because the value fetched from the HTML is different than the value saved, then WebKit should also NOT fill in any other text field in the form.

With that, if on refetch there is an additional form, or a changed form, the rest of the forms on the page can still be re-filled correctly.

Thanks
Comment 15 Brady Eidson 2010-03-29 09:01:54 PDT
Guys,

I can't even review this patch on its merits of what it does for this bug, because it's not clear *precisely* what this bug represents.

It'd be a lot easier to follow all of this if we had separate bugs for each *specific* issue Benjie describes. Especially since he likes to be very verbose about all of the issues at once, making it hard to digest.  ;)

This bug can be an umbrella for the broken out issues.
Comment 16 benjie 2010-03-29 09:31:58 PDT
Sorry, I can be clearer.

There are two bugs.

1) autocomplete forms should not consume saved state -- this one is solved by Kent's patch

2) WebKit re-fills values for forms that have been changed by the server -- this one is not solved by Kent's patch, although the patch improves the behavior and eliminates a set of scenarios.

Comment 11 through 14 are for bug number 2.
Comment 17 Brady Eidson 2010-03-29 09:38:20 PDT
(In reply to comment #16)
> Sorry, I can be clearer.
> 
> There are two bugs.
> 
> 1) autocomplete forms should not consume saved state -- this one is solved by
> Kent's patch

Great.

> 2) WebKit re-fills values for forms that have been changed by the server --
> this one is not solved by Kent's patch, although the patch improves the
> behavior and eliminates a set of scenarios.

Please file a second bug for this (and mention it here for posterity).
Comment 18 OPAL Group Development 2010-03-29 09:45:20 PDT
As one of the posters (in Comment #3) that hit this bug a year ago, we're still following it as we had to put workarounds in place in our own product.

Something to note on #2 above:  I believe this bug is going to be greatly complicated by Javascript.  While it breaks simply because of the form load with cached values, if Hidden fields are manipulated by JS during page load, things go even screwier.

In our live application, such fields can end up with the correct value, the cached value, or occasionally JS "undefined", with no rhyme or reason as to when which occurs.  I've not been able to isolate that in a simple test case, so I have nothing to add other than the suggestion that the new bug be aware that JS onload events seem to complicate this underlying problem exponentially.
Comment 19 benjie 2010-03-29 10:00:30 PDT
I created bug 36762, to keep track of the FIRST bug, regarding autocomplete forms should not consume saved state. Because most of the discussions here are for the 2nd bug, I think this particular thread/bug number should be about the 2nd bug.

Regarding comment 18 from OPAL: to make sure JS changing hidden/submit field values are properly handled (i.e. they are also restored to the post JS value), I think the best approach would be to get a hash of the form DOM, and use that as an unique identifier for the form. In that case, on re-fetch of a URL, if the form DOM has not changed at all, then its values can be re-filled in. Otherwise, don't re-fill it.
Comment 20 Brady Eidson 2010-03-29 10:31:36 PDT
Comment on attachment 51857 [details]
Proposed patch

Curious about this issue, I applied the patch.

I might be missing something here, but this test functions the same in a webkit nightly as it does with the patch applied.

Layout tests are supposed to fail before a patch, and pass with it.  What is the actual change in behavior here?
Comment 21 Brady Eidson 2010-03-29 11:48:00 PDT
(In reply to comment #20)
> (From update of attachment 51857 [details])
> Curious about this issue, I applied the patch.
> 
> I might be missing something here, but this test functions the same in a webkit
> nightly as it does with the patch applied.
> 
> Layout tests are supposed to fail before a patch, and pass with it.  What is
> the actual change in behavior here?

I realize now that Safari's forms autofill was stepping in and clouding my results - DRT sees the before and after.  I'll do a proper review now.
Comment 22 Kent Tamura 2010-03-30 22:28:33 PDT
*** Bug 20778 has been marked as a duplicate of this bug. ***
Comment 23 Kent Tamura 2010-04-02 12:18:09 PDT
* autocomplete=off issue and
* type=hidden issue
has been resolved by r57012 and r57013.  The remaining is the form identification issue.
Comment 24 Kent Tamura 2010-04-02 13:19:36 PDT
I'm planing to implement the "form hash" idea.

- Form control states are saved when a user leaves the page.
- Form control states are restored when the page is parsed.
So, If we introduce the form hash, and if the form DOM is changed after parsing and before leaving (e.g. another control is added to the form by JavaScript), control states won't be restored.

Is it acceptable?
Comment 25 Beat 2010-04-02 13:31:53 PDT
First of all, thanks for fixing the related bug.

Back to your question:

No, don't think form hashes will work in our case:

In our web-applications (used on a large number of servers), we use hidden fields combined with cookie states and javascript states to efficiently protect forms against spambots.

So typically, the same form re-displayed will have different hidden field values, and auto-fill-in restore from saved state is useful when hitting back button. But if you do DOM hash of form, it will be different. Same for the captcha image where applicable.

So implementing hashing the DOM of FORM will break the extremely useful form restoration on the back button.

That's if I understand right what you mean by that.

But I might have understood wrongly what you meant by form hash...
Comment 26 benjie 2010-04-02 17:28:29 PDT
Beat: I believe Kent is talking about taking the hash of the form DOM.

The problem with NOT taking the hash of the form DOM, is that there is no other good way for the browser to know that the form has not changed. Since WebKit's behavior is on back button, it reloads the page, if the form has changed, then restoring form values will give you the wrong behaviors.

Right now, I can't come up with any better solution. To me, the acceptable solutions are: 1) use some kind of hash thing to see if the form has changed, or 2) no restore of values unless the page is loaded from cache, and not from server. If the form's DOM changes and there are different input elements, restoring values will result in a bogus form. 

This may be outside the scope of this particular discussion: but is there a good reason why WebKit refetches page on back button? I figure there must be.
Comment 27 Darin Adler 2010-04-02 17:31:58 PDT
(In reply to comment #26)
> This may be outside the scope of this particular discussion: but is there a
> good reason why WebKit refetches page on back button? I figure there must be.

 All browsers refetch pages on back button in some circumstances. They don’t hold all the pages in the entire back/forward list memory. There is an optimization for the back button in some browsers, originally created in Mac Internet Explorer, and in Safari for a long time, and recently introduced in Firefox, where some pages are cached to make the back command even faster.
Comment 28 Kent Tamura 2010-04-03 07:37:30 PDT
(In reply to comment #25)
> No, don't think form hashes will work in our case:

My current plan is that the form hash is a hash value of the following information:

* form/@action, form/@method, form/@enctype
* for each of form controls
  - @type
  - @name

and it doesn't contain other attributes of <input> including @value, text nodes, non-control elements.
So, the following A and B will be equivalent, but C won't be equivalent with others.
Does this make sense?

A:
<form action="foo.cgi" id=form1>
 <input type=hidden value=session12345>
 Username: <input type=text name=user>
 Password: <input type=password name=password>
</form>

B:
<form action="foo.cgi" id=form2>
 <input type=hidden value=session98765>
 Your name: <input type=text name=user>
 PIN: <input type=password name=password>
</form>

C:
<form action="foo.cgi" id=form1>
 <input type=hidden value=session4567>
 Username: <input type=text name=user>
 Password: <input type=password name=password>
 <input type=checkbox name=remember>Remember me.
</form>
Comment 29 Beat 2010-04-03 10:32:31 PDT
In our forms, we vary names of security-related hidden inputs as well, not only value, to make things particularly difficult for spambots...keeping things in sync with server-side sessions. I know that popular CMS systems like joomla do that too.

I guess if the form hash ignores all values and completely all hidden inputs, but concentrates just on names and type of visible input elements it can make sense. It's anyway only the visible input elements (except password fields and now except hidden fields as well) which now get restored.

Obviously, cases where a form has optional inputs which are there or not depending on state will not be restored properly with such a form hash...don't have a solution here.

The form fields restore feature of webkit on back button is awesome and has been a lifesaver in many cases for me. It's actually one of the top 3 features making it my favorite browser.
Comment 30 benjie 2010-04-03 11:37:44 PDT
Kent, 

Thanks for the update on your plan.

There is a problem with what you described, not including @value. Any CGI infrastructure where the action depends on the value of a hidden CGI variable, will be broken when user hits back button.

Consider this: form A on page X, when user is logged in, has CGI variable named action, value set to "add". User goes from X to next page, user session times out, hits back button, lands on page X, but form A is replaced by server with form B: with CGI variable named action, value set to "login".  But WebKit would replace the value from "login" to "add".

If you are not going to take the hash of the entire form, you should at least include hidden values in the hash. 

I wonder if there's a different way to consider this problem. We've been struggling with what's the best solution if the page is reloaded. No matter what we are guessing what the server's intention is.

Perhaps there should be two separate behaviors. First, if when user hits the back button, WebKit does NOT reload the page, but instead loads the page from cache, and also does not fire off JS again, then you can auto-fill forms using exactly the same technique as implemented in the existing code base.

Second, if when user hits the back button, WebKit does RELOAD from server, then after loading the page, any JS hooks will fire off as well. In this case, we shouldn't worry about things like hidden values and type=submit input element values. In fact, in this case, perhaps the auto-fill should ONLY handle values that users can type. So in this case, what Kent proposed would work, if augmented with not changing values of any input element that the user cannot enter via keyboard.

Does that make sense at all?
Comment 31 benjie 2010-04-03 11:41:15 PDT
The reason I broke up the solution into two separate cases is to handle weather Javascript is fired off again on back button. If JS is NOT fired off because page is loaded from cache, then you have to restore hidden and submit type values. Because those can be changed by JS. If JS is fired because page is loaded from server, then only user entered values should be restored, IMHO.
Comment 32 benjie 2010-04-03 11:45:08 PDT
I apologize, this is the last comment -- I want to clarify comment 30, option 2: what I think would work is Kent's solution, but hash has to include hidden CGI names AND values, and basically name and values of all input elements that users cannot enter text for. So if hidden field name OR value changed, then autofill should NOT occur.
Comment 33 Darin Adler 2010-04-03 23:32:58 PDT
Why are we inventing this form hash technique? No other browser has it, that I know of. Can we figure out what the behavior of other browsers is, please? We will have to construct tests carefully to avoid being confused by the back/forward caching feature. We need to know the behavior when a back/forward cache is not involved.
Comment 34 benjie 2010-04-04 06:36:17 PDT
Darin,

Good point. May be we don't need the hash, if we don't refill values when user hits back button, and the page is reloaded from server instead from cache.

This is the behavior on Firefox. If on back button, the page loaded from cache, values are filled in, and there's no danger of form confusion. If the page is loaded from the server, then FireFox does not fill in values.
Comment 35 Kent Tamura 2010-04-12 06:45:46 PDT
(In reply to comment #33)
> Why are we inventing this form hash technique? No other browser has it, that I
> know of. Can we figure out what the behavior of other browsers is, please? We
> will have to construct tests carefully to avoid being confused by the
> back/forward caching feature. We need to know the behavior when a back/forward
> cache is not involved.

I investigated Firefox.
Summary: Firefox has a similar issue.  But it rarely make a problem because Firefox has page cache.

I couldn't find a way to avoid page cache of Firefox, so confirmed the issue with another way.
1. Change a startup configuration to "When Firefox starts: Show my windows and tabs from last time"
2. Open a local HTML file with two forms, and fill some values
3. Close Firefox
4. Edit the local HTML.  e.g. exchange the location of two forms
5. Start Firefox again

Then, we can see incorrectly filled forms.

I looked the source code of Firefox.  It identifies target controls by id attribute or simple XPath expressions with name attribute value and ordinal number.

(In reply to comment #34)
> Good point. May be we don't need the hash, if we don't refill values when user
> hits back button, and the page is reloaded from server instead from cache.

Chromium has no page cache at this moment.  I'm afraid removing this refill feature makes users unhappy.
Comment 36 benjie 2010-04-12 07:14:51 PDT
Kent, 

Thanks for the investigative work.

Is it possible then to change the goal of the autofill system? Currently the goal is to restore the entire state of the form elements, including elements users cannot see. What if the autofill's goal is to aid users such that they don't have to re-type data. In that case, we can fulfill the goal by just restoring user enterable text, and not change hidden element values.

Then it's up to the user to figure out what's right and wrong; at least we won't change any hidden or submit element values, in such a way that changes the server's intention and confuses the user.
Comment 37 Alexey Proskuryakov 2010-04-12 08:02:47 PDT
> But it rarely make a problem because Firefox has page cache.

Why do you keep saying this? Safari also has a page cache, and as Darin explained above, we had it even before Firefox did.

> I couldn't find a way to avoid page cache of Firefox

Please see <https://developer.mozilla.org/En/Using_Firefox_1.5_caching>.
Comment 38 benjie 2010-04-12 08:59:20 PDT
Alexey,

It doesn't matter Safari has a page cache or not, but rather, is page cache used when user hits back button, to go back to a page with a form.

In FF, for my test pages (and it seems like for Kent's as well), more often than not, hitting back button to go back to a page with a form results in FF loading that page from page cache. On Safari, the page is reloaded from the server.

When it's reloaded from the server, we get the bug.
Comment 39 Brady Eidson 2010-04-12 09:03:02 PDT
(In reply to comment #38)
> Alexey,
> 
> It doesn't matter Safari has a page cache or not, but rather, is page cache
> used when user hits back button, to go back to a page with a form.
> 
> In FF, for my test pages (and it seems like for Kent's as well), more often
> than not, hitting back button to go back to a page with a form results in FF
> loading that page from page cache. On Safari, the page is reloaded from the
> server.
> 
> When it's reloaded from the server, we get the bug.

I think Alexey's point is that:
1 - There's well known, published ways to avoid Firefox's Page Cache
2 - In the wild (not with specialized test cases), many form-heavy sites *DO* avoid the page cache in every major browser with the simple step of having an unload handler.

So, yes - this is a real-world problem in Firefox as well.
Comment 40 Alexey Proskuryakov 2010-04-12 09:18:58 PDT
The code that decides whether a frame can be cached is in FrameLoader::canCachePage(). Forms do not prevent caching, as far as I can tell.
Comment 41 Brady Eidson 2010-04-12 09:51:13 PDT
(In reply to comment #40)
> The code that decides whether a frame can be cached is in
> FrameLoader::canCachePage(). Forms do not prevent caching, as far as I can
> tell.

Correct, we *definitely* cache pages with forms.  Looking at the attached test case, I don't see anything that would make it not page-cache worthy.

For the purposes of making "autocomplete=off" be a security feature, we also happen to clear certain form values, even in the page cache case.  But I don't *THINK* we do any restoring.
Comment 42 Kent Tamura 2010-04-24 09:44:55 PDT
(In reply to comment #37)
> > I couldn't find a way to avoid page cache of Firefox
> 
> Please see <https://developer.mozilla.org/En/Using_Firefox_1.5_caching>.

Thank you for the information.
I tested Firefox behavior without the page cache, and confirmed that the behavior was the same as Comment #35, Firefox filled values to a wrong form.


Internet Explorer 8:
* Without the form completion feature
 When IE fetches a page from a server by the Back button, form values are not restored.
* With the form completion feature
 If a refetched page is different from the original page, IE doesn't restore form values. I guess IE checks a digest value or something of the HTML.
Comment 43 Galen Hollins 2011-01-31 09:23:27 PST
I've seen this problem as well in my application.

Here is a simple test case to reproduce in Safari/Chrome.

http://jsfiddle.net/rwaldron/UvmDv/7/ 

I am concerned that this bug has an importance of "Normal".  This bug has caused data corruption in one of my apps (via the back, then resubmit pattern), and therefore I think this problem should be classified as "Major".   If it was merely a view-layer problem, things would be different, but this bug actually changes form fields, that in many cases may get resubmitted.

Thanks,
Galen
Comment 44 Galen Hollins 2011-01-31 09:24:21 PST
*** Bug 53312 has been marked as a duplicate of this bug. ***
Comment 45 Kent Tamura 2011-04-24 13:19:16 PDT
I have thought how to resolve this issue for one year.  My current thought is below:


What is this feature?
  The purpose of the feature is to restore user-input values without on-memory back-forward cache.
   -> So, we should not restore values for non user-editable controls such as elements with no renderer, visibility:none style, read-only/disabled elements.
   -> We should restore for controls with autocomplete=off because users can edit them.
   -> WebKit should fire 'change' events when values are restored because restoring is a kind of user-input.
 
What's wrong with the current implementation?
  * Values are restored to unexpected controls.
    This often happens for dynamically generated pages.
    -> We should check page identity more strictly.  Currently we have queues for type-name pairs.  We should have stricter data structure such as a list of type-name.

  * Values are restored to dynamically-generated controls.
    If the restore value queue is not empty and a dynamically-generated control accidentally match to a type-name pair in the queue, the control has the restored value.
    -> We should restore values only once, and purge unused saved values just after that.
        IMO, we should restore values just after document's 'load' event because the 'load' event is often used to build a page.  If we change so, we won't restore values for controls added after the 'load' event. 


Comments?
Comment 46 Darth 2011-04-24 13:36:58 PDT
> WebKit should fire 'change' events when values are restored because restoring is a kind of user-input.

What if the change event caused a page refresh or navigation?
Say I have a form with some drop down to choose some item, and some particular item on choose causes a page navigation since its a new workflow, wont' pressing back on the browser cause an issue?

Another question, will there be any difference in what is done on https vs https pages? What about pages declaring no-cache / no-store etc? thanks
Comment 47 Kent Tamura 2011-04-24 14:45:31 PDT
(In reply to comment #46)
> > WebKit should fire 'change' events when values are restored because restoring is a kind of user-input.
> 
> What if the change event caused a page refresh or navigation?
> Say I have a form with some drop down to choose some item, and some particular item on choose causes a page navigation since its a new workflow, wont' pressing back on the browser cause an issue?

Good point.
We should dispatch the change event anyway because updating page structure by checking/unchecking checkboxes or radio buttons is a typical usage.  So we would disable navigation during restoring control values.

> Another question, will there be any difference in what is done on https vs https pages? What about pages declaring no-cache / no-store etc? thanks

I don't think they matter.  I have never seen complaints with them.
Comment 48 Darth 2012-06-07 14:23:51 PDT
Another issue recently I encountered, but seems more of a regression in Chrome at least as Safari seems consistently broken.

http://code.google.com/p/chromium/issues/detail?id=131645
Comment 49 Darth 2012-06-07 15:12:46 PDT
(In reply to comment #48)
> Another issue recently I encountered, but seems more of a regression in Chrome at least as Safari seems consistently broken.
> 
> http://code.google.com/p/chromium/issues/detail?id=131645

I found a Safari 5.1.3 machine, and issue doesnt happen on that, so my prior statement suggesting all Safari's show this issue is incorrect. Safari 5.1.5 does show the issue
Comment 50 Brady Eidson 2012-06-27 08:41:27 PDT
(In reply to comment #47)
> (In reply to comment #46)
> > Another question, will there be any difference in what is done on https vs https pages? What about pages declaring no-cache / no-store etc? thanks
> 
> I don't think they matter.  I have never seen complaints with them.

I'm not quite sure what you're saying.

Maybe what you're saying is "I've never seen WebKit users complain about those sites behaving differently," in which case I have nothing to add.

But if you're saying "I have never seen those sites complain about form restoring, so we'll treat them the same as any other site," then I must take pause.

They do matter.  The code is in there for a reason.  We'd much rather *not* see major U.S. and International banks disable working in WebKit browsers, for example.
Comment 51 Kent Tamura 2012-07-23 23:17:02 PDT
(In reply to comment #50)
> > > Another question, will there be any difference in what is done on https vs https pages? What about pages declaring no-cache / no-store etc? thanks
> > 
> > I don't think they matter.  I have never seen complaints with them.
> 
> I'm not quite sure what you're saying.

I meant we didn't need to change the current WebKit behavior.
Comment 52 Kent Tamura 2012-07-23 23:24:23 PDT
I made a major improvement of the feature recently, and I believe the restore feature is not bad now.  I'm closing this bug. If someone still have a problem, please file another bug and cc to me.

(In reply to comment #45)
> What is this feature?
>   The purpose of the feature is to restore user-input values without on-memory back-forward cache.
>    -> So, we should not restore values for non user-editable controls such as elements with no renderer, visibility:none style, read-only/disabled elements.

I didn't change this behavior. We saves almost all of form control values, including type=hidden.

>    -> We should restore for controls with autocomplete=off because users can edit them.

I don't change this behavior yet.

>    -> WebKit should fire 'change' events when values are restored because restoring is a kind of user-input.

I didn't do it. Because we restore form control state only during document parsing, and I'm not sure dispatching event would make sense in such case.

> What's wrong with the current implementation?
>   * Values are restored to unexpected controls.
>     This often happens for dynamically generated pages.
>     -> We should check page identity more strictly.  Currently we have queues for type-name pairs.  We should have stricter data structure such as a list of type-name.

Done.  We identify forms with form structure.

>   * Values are restored to dynamically-generated controls.

When we switched to HTML5 parser, this behavior was gone, and no one requested to fix it.
Comment 53 Alexey Proskuryakov 2012-09-13 11:11:06 PDT
*** Bug 96589 has been marked as a duplicate of this bug. ***