Bug 16122

Summary: When posting to these boards, all Safari users have "webkitformboundary-gibberish" appended to their name and message
Product: WebKit Reporter: Sam Shallenberger <sam>
Component: FormsAssignee: Nobody <webkit-unassigned>
Status: NEW ---    
Severity: Normal CC: ap, billmonk2, sisirie, webkit
Priority: P2    
Version: 523.x (Safari 3)   
Hardware: Macintosh   
OS: OS X 10.5   
URL: http://www.subguns.com/boards/sword.cgi
Attachments:
Description Flags
Proposed patch for bug ID 16122 ap: review-

Description Sam Shallenberger 2007-11-24 16:01:38 PST
This is a non-authenticated, threaded discussion board.  When a Safari user posts, their name shows up as "Name - WebKitFormBoundary23oulskdfjao832u4239023420" or similar, and their message text as "message text - WebKitFormBoundary23498s9dfs9f" or similar.

To test, go to a message near the bottom of the board and reply to a post using a "test" message.  The messages at the bottom of the board roll of quickly.
Comment 1 David Kilzer (:ddkilzer) 2007-11-24 16:12:18 PST
Probably related to or a duplicate of Bug 16082.

Comment 2 David Kilzer (:ddkilzer) 2007-11-24 16:16:10 PST
This happens when previewing a post as well.

What bulletin board software is this site using?

Comment 3 Sam Shallenberger 2007-11-24 16:22:03 PST
Board is using WebBBS 5.12 (UNCONFIRMED, but as reported in site footer)


(In reply to comment #2)
> This happens when previewing a post as well.
> 
> What bulletin board software is this site using?
> 
Comment 4 David Kilzer (:ddkilzer) 2007-11-24 16:30:13 PST
This is not related to Bug 16082 as there is no redirect involved.

This multipart/form-data post looks fine to me.  I think the sword.cgi script is decoding the form data incorrectly.  See RFC 2388 <http://www.faqs.org/rfcs/rfc2388.html> for details on how to properly decode it.

Changing to an evangelism bug.  Confirmed with a local debug build of WebKit r27997 with Safari 3.0.4 (523.12) on Mac OS X 10.4.11 (8S165).

POST /boards/sword.cgi?post HTTP/1.1
Accept-Language: en
Accept-Encoding: gzip, deflate
Referer: http://www.subguns.com/boards/sword.cgi?read=930341
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/525.1+ (KHTML, like Gecko) Version/3.0.4 Safari/523.12
Content-Type: multipart/form-data; boundary=----WebKitFormBoundaryXtw3auxW8P5V6R70
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Content-Length: 962
Connection: keep-alive
Host: www.subguns.com

------WebKitFormBoundaryXtw3auxW8P5V6R70
Content-Disposition: form-data; name="followup"

930341
------WebKitFormBoundaryXtw3auxW8P5V6R70
Content-Disposition: form-data; name="name"

joe webkit
------WebKitFormBoundaryXtw3auxW8P5V6R70
Content-Disposition: form-data; name="email"

webkit-test@hotmail.com
------WebKitFormBoundaryXtw3auxW8P5V6R70
Content-Disposition: form-data; name="subject"

Re: Translation inside------WebKitFormBoundary1+WF
------WebKitFormBoundaryXtw3auxW8P5V6R70
Content-Disposition: form-data; name="body"

This is a test message.


------WebKitFormBoundaryXtw3auxW8P5V6R70
Content-Disposition: form-data; name="url"

http://bugs.webkit.org/show_bug.cgi?id=16122
------WebKitFormBoundaryXtw3auxW8P5V6R70
Content-Disposition: form-data; name="url_title"


------WebKitFormBoundaryXtw3auxW8P5V6R70
Content-Disposition: form-data; name="Preview"

Preview Message
------WebKitFormBoundaryXtw3auxW8P5V6R70--
Comment 5 David Kilzer (:ddkilzer) 2007-11-24 16:36:32 PST
(In reply to comment #3)
> Board is using WebBBS 5.12 (UNCONFIRMED, but as reported in site footer)

http://awsd.com/scripts/webbbs/

I tried finding other sites on the web that use this software to test the behavior, but couldn't find an open forum after three Google search page results.

Comment 6 David Kilzer (:ddkilzer) 2007-11-24 16:44:35 PST
This seems to work with the WebBBS support forum itself.

http://awsd.com/cgi-bin/webbbs.pl/noframes/read/100948

Perhaps the www.subguns.com web server needs to upgrade its LWP::Simple module.

Comment 7 David Kilzer (:ddkilzer) 2007-12-09 15:05:29 PST
*** Bug 16368 has been marked as a duplicate of this bug. ***
Comment 8 closerlook 2008-02-23 10:59:33 PST
>>I tried finding other sites on the web that use this software to test the
>>behavior, but couldn't find an open forum after three Google search page
>>results.

Here is one (running WebBBS 5.12) which exhibits the WebkitFormBoundary problem.

http://www.misterguitar.us/cgi-bin/chetboard.pl

Comment 9 Brooks 2008-08-17 22:56:43 PDT
http://wwwboard.modelcarkits.com/ is another example of an open board using 5.12 that exhibits this bug.
Comment 10 Bill Monk 2009-03-24 16:16:50 PDT
There are about a dozen affected boards hosted here:

http://www.comicscommunity.com/
Comment 11 Bill Monk 2009-03-24 16:34:46 PDT
The issue is not really WebKit's, though WebKit has worked aroud similar issues. Below is a repeatable explanation for why this issue occurs. A patch will be submitted shortly.

When WebBBS 5.12 parses a POST, it uses a perl regex to locate form boundaries, then uses the found boundary text itself as a regex to decide whether to skip boudary text or accept it as user text. I'm sure you can guess where this is going...

In /webkit/WebCore/platform/network/FormDataBuilder.cpp, ::generateUniqueBoundaryString() creates random strings from the set of alphanumeric characters plus the '+' character. A comment there mentions that a few other legal characters (according to RFC 2046) have been omitted from because they gmail and possibly other sites have problems with them. <http://bugs.webkit.org/show_bug.cgi?id=13352> and <rdar://problem/5252577>

Thus WebKit generates boundary strings which legally, though randomly, may contain '+' characters.

Next, looking at the latest WebBBS perl files (last updated 2002) at
http://awsd.com/download/webbbs/webbbs_files.zip

in the file webbbs_post.pl, the following code occurs at line 28:

 sub Parse_Post {
	if ($ENV{'CONTENT_TYPE'} =~ /boundary=(\"?([^\";,]+)\"?)*/) { $boundary = $1; }
	binmode STDIN;
	read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
	@buffer = split(/\r\n/,$buffer);
	foreach $line (@buffer) {
		if ($line =~ /$boundary/) { $Current = ""; next; }

...

}

The first regex tries to identify the random portions of form boundaries as "one or more characters which are not a quote symbol, a semicolon, or a comma." This is not the same thing as the set of legal characters identified by RFC 2046. Still, it seems to work more or less reliably as-is. 

The second regex, in this code

	foreach $line (@buffer) {
		if ($line =~ /$boundary/) { $Current = ""; next; }

then uses the boundary itself as a regex, to decide whether to ignore the boundary. I'm certain you see where this is going...

Given that WebKit can randomly emit boundaries containing '+' characters, say a given boundary is "a+b". The first regex finds it. The second regex then asks "does the string "a+b" exactly match the pattern "one or more 'a' characters followed by one 'b' character?" This of course fails. Thus the boundary text is processed as if it were user input, causing "------WebKitFormBoundary" (followed by the random boundary characters, always containing a '+') to randomly appear in WebBBBS user input.

This ca be tested by going to the sites mentioned in previous comments,
http://wwwboard.modelcarkits.com/
http://www.misterguitar.us/cgi-bin/chetboard.pl

Scroll to bottom of page, click "Post New Message", enter anything for name, subject, and message body, then click the "Preview Message" button.
If "------WebKitFormBoundary...." does not appear in various fields, WebKit's randomly-generated form boundary text did not contain a '+' character and so did not trigger the WebBBS regex escaping issue. Continue clicking the "Preview Message" button until the boundary text appears - it may take a few tries. When it does, note that the "gibberish" or "garbage characters" reported by users following the "WebKitFormBoundary" contains one or more '+' characters.

This bug also can trigger WebBBS' rudimentary "bad langauge filter" - a file of "udesirable phrases" which can optionally be supplied by the board's admin. If such a file is in use, it is not uncommon for the random boundary text to contain character sequences which are close enough to the target words to trigger the filter, causing the board to reject the post and ask the user to rephrase their submission. When this occurs, no amount of editing will clear the restriction since the "problem word" is in the boundary text (most likely interspersed with other characters and not immediately recognizable) and not in the user's submission at all. In the course of investigating this issue, I noticed some quite surprising "near-misses" being emitted.

A WebBBS fix (which works here, running it under OS X WebSharing) is to escape any '+' characters in found boundary text before using the text as a regex:

	if ($ENV{'CONTENT_TYPE'} =~ /boundary=(\"?([^\";,]+)\"?)*/) 	
		{ $boundary = $1; }
	$boundary =~ s/\+/\\+/g;


However, while the WebBBS site appears to be active, WebBBS itself has not seen an update since 2002; it may effectively be abandonware. I see no obvious place to submit patches there.

A WebKit workaround is to do what's been done before: in generateUniqueBoundaryString(), omit the legal '+' from the array of characters used to create the random boundary strings. 

In the array below, the next-to-last value (0x2B) would be replaced by some other character in the array, as was done with the last value (which now duplicates the first, due to http://bugs.webkit.org/show_bug.cgi?id=13352 and/or <rdar://problem/5252577>):

    static const char alphaNumericEncodingMap[64] = {
        0x41, 0x42, 0x43, 0x44, 0x45, 0x46, 0x47, 0x48,
        0x49, 0x4A, 0x4B, 0x4C, 0x4D, 0x4E, 0x4F, 0x50,
        0x51, 0x52, 0x53, 0x54, 0x55, 0x56, 0x57, 0x58,
        0x59, 0x5A, 0x61, 0x62, 0x63, 0x64, 0x65, 0x66,
        0x67, 0x68, 0x69, 0x6A, 0x6B, 0x6C, 0x6D, 0x6E,
        0x6F, 0x70, 0x71, 0x72, 0x73, 0x74, 0x75, 0x76,
        0x77, 0x78, 0x79, 0x7A, 0x30, 0x31, 0x32, 0x33,
        0x34, 0x35, 0x36, 0x37, 0x38, 0x39, 0x2B, 0x41
};

Pehaps 'a' is a decent choice, for no reason other than that it's near the middle of the array, halfway between the other two repeated characters.

In general, the legal boundary characters which are also regex metacharacters seem likely to have unexpected effects in the wild.
The + character is the last such character remaining in the array.

Note that even patching both WebBBS and WebKit will not immediately eliminate all reports of "------WebKitFormBoundary...." appearing on these sites. Safari users, for instance, who have experienced the issue now likely have form boundary text (always containing a '+') in their forms AutoFill, which gets auto-re-entered into WebBBS forms even when WebKit's current boundary text does not contain a '+' and doesn't actually trigger the WebBBS bug! This makes it rather annoying and difficult to get rid of on active boards.

Once either WebBBS is fixed or WebKit has a workaround in place, WebBBS admins may wish to have their users to edit their BBS' URLs out of Safari->Preferences->AutoFill->other forms. But until one patch or the other is in place, this won't help - WebKit's legal, randomly-generated '+' characters will continue to trigger the issue in WebBBS.

The patch above has been built and tested on all reported problem sites. It appears to successfully work around the issue, and will be submitted shortly. 
Comment 12 Bill Monk 2009-03-25 22:24:14 PDT
Another effect of the issue is that it triggers admin moderation when the "------WebKitFormBoundary" text gets into email fields. This causes posts to be not appear right away, causing users to complain to admin about the delays. 

This board has begun appending an automated advisory to such posts, which reads:

"Post delayed due to Safari browser glitch. Mac users, please use Firefox (or anything but Safari). "

For an example, see http://www.misterguitar.us/cgi-bin/chetboard.pl?read=92909




Comment 13 Alexey Proskuryakov 2009-03-25 22:32:10 PDT
Yes, it seems clear that we should add a workaround.

> A patch will be submitted shortly.

Please do!
Comment 14 Bill Monk 2009-03-25 22:54:03 PDT
An additional confirmation that the issue is due to + characters in form boundary strings:

1:  google for WebKitFormBoundary. In every case I have examined (of, there are thousands) where it appears as unintended user input, it contains at least one + character. 

(Note that google hits where people *intentionally* refer to WebKitFormBoundary, i.e. "What the #$@! is this WebKitFormBoundary in my post?!?" don't necessarily contain +.  People don't always quote the random trailing characters, referring to them as "garbage" or "gibberish", and assuming they are incidental or accidental and that "WebKitFormBoundary" is the problem - when actually it is the *intentionally random characters* which contain the problem, the +  character).

2: Firefox (along with Explorer) are reported not to trigger the issue. Examining Firefox's source, its boundary strings are created using dashes and digits only.
Comment 15 Bill Monk 2009-03-26 00:56:23 PDT
Created attachment 28964 [details]
Proposed patch for bug ID 16122

Patch submitted.
Comment 16 Bill Monk 2009-03-26 01:36:11 PDT
To test, running an unpatched WebKit go to any of 

http://www.comicscommunity.com/boards/brereton/

http://wwwboard.modelcarkits.com/

http://www.misterguitar.us/cgi-bin/chetboard.pl


Click any existing post

Fill in these fields:

Name: any text
Anti-Spam (if field is present):5
Email: leave blank or any text
Subject: any text
Message: any text

At bottom of page, click "Preview Message" repeatedly.

Without the patch, "------WebKitFormBoundary<random chars>+<random chars>" etc will eventually appear in multiple fields.

Continue clicking. More occurrences will appear at random, if the boundary text contains a '+'.
Note the text in, say, the Subject field, and delete it. Retype first character of the original subject. Popup menu appears with many choices containing "------WebKitFormBoundary" variants.


Now switch to patched WebKit. 
In Safari->Preferences->AutoFill->other forms, remove any entries for the site being tested.
Clear and reenter all fields
Repeatedly click Preview Message. 
Confirm that "------WebKitFormBoundary..." does not appear even after many attempts.

To test on the server side, it is necessary to set up a WebBBS board using the scripts at http://awsd.com/download/webbbs/webbbs_files.zip
This is rather a hassle to set up but runs fine on any Mac under Apache in OS X.
In the file webbbs_post.pl, in Parse_Post, set up some logging to confirm that '+' characters are no longer appearing in form boundaries sent by WebKit.
The location of the specific WebBBS bug is in earlier comments above - boundary text is used as a regex, without escaping metacharacters, causing
the script to incorrectly indentify boundaries as user text when a + appears in a boundary.

While individual admins may be amenable to patching their installs of WebBBS, the base distribution has not been updated since 2002 so it seems likely this bug will continue to propagate for as long as the perl source remains available.




Comment 17 Bill Monk 2009-04-10 17:33:29 PDT
Admin of http://www.misterguitar.us/cgi-bin/chetboard.pl has patched that board with the perl fix mentioned here, so that URL is no longer useful for testing.
Comment 18 Bill Monk 2009-04-10 17:44:11 PDT
I tested some of the boards mentioned here using latest versions (as of today's date) of

Safari for Windows
Google Chrome 

as well as two versions of Mobile Safari (both on the device and in the Simulator).

All exhibited the issue.

Also tested with Mobile Safari using latest developer beta. I believe NDA would preclude posting actual results, but it should be safe to say that they were unsurprising.
Comment 19 Bill Monk 2009-04-10 17:52:22 PDT
(In reply to comment #13)
> Yes, it seems clear that we should add a workaround.
> 
> > A patch will be submitted shortly.
> 
> Please do!
> 

I think the patch probably went unnoticed in the stream of other bugs up for review. A couple weeks have passed, so wondering if it'd be helpful to re-submit? Not sure of protocol here... 

The patch is literally a 1-character change, and fixes a fairly visible issue, and since it was fairly time-consuming to ferret out, would love to see it get reviewed.
Comment 20 Alexey Proskuryakov 2009-04-10 23:02:37 PDT
Comment on attachment 28964 [details]
Proposed patch for bug ID 16122

> I think the patch probably went unnoticed

Looks like that, sorry!

+        WARNING: NO TEST CASES ADDED OR CHANGED

This line is just a reminder to the person making the patch, please remove it, or replace with a short explanation of why a test is not possible, or better yet, add a test. I realize that the behavior is random, but it's acceptable to have tests that are not 100% reliable.

It's not often that I have a chance to say this, but I think that the ChangeLog has a bit too much information - for a full discussion, one can always refer to this bug.

This patch has a lot of tab characters, please replace them with spaces.

+        // Note: it may be better to not restore the / character, based on the regex escaping issues
+        // documented in https://bugs.webkit.org/show_bug.cgi?id=16122// 

Yes, please remove the FIXME, and change this whole comment to just explain the behavior.

The substance of the patch looks good.