Bug 13045 - REGRESSION: Blackboard CourseWare Error with Nightlies after Mar 8
Summary: REGRESSION: Blackboard CourseWare Error with Nightlies after Mar 8
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Forms (show other bugs)
Version: 523.x (Safari 3)
Hardware: Macintosh PowerPC OS X 10.4
: P1 Major
Assignee: Nobody
URL: http://moreheadstate.blackboard.com/w...
Keywords: Regression
Depends on:
Blocks:
 
Reported: 2007-03-12 05:19 PDT by Robert Royar
Modified: 2007-03-13 07:47 PDT (History)
3 users (show)

See Also:


Attachments
tcpdump of form interaction with build 20107 (broken) (60.43 KB, text/plain)
2007-03-12 10:39 PDT, Robert Royar
no flags Details
tcpdump log for form interaction (OK) build 20051 (65.27 KB, text/plain)
2007-03-12 10:41 PDT, Robert Royar
no flags Details
Webarchive of form page. This is the (broken) 20107 build (290.00 KB, application/x-webarchive)
2007-03-12 10:44 PDT, Robert Royar
no flags Details
Webarchive using (OK) webkit build 20051 (260.77 KB, application/x-webarchive)
2007-03-12 10:46 PDT, Robert Royar
no flags Details
Patch v1 (1.29 KB, patch)
2007-03-12 15:53 PDT, David Kilzer (:ddkilzer)
darin: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Robert Royar 2007-03-12 05:19:30 PDT
Version from last working nightly disk-image VERSION file: 20051 uploaded 03:51 US EST 08 March 2007.
Browser: Safari using run script

For Blackboard forms that contain the ability to upload an attachment, Blackboard reports an error on submitting the form (regardless of whether there is an attachment attached).   This appeared to be a javascript error (because it occurs when the script tied to the submit button is activated); however, turning on the jscript console  and using debugging does not lead to any report from the kit or Safari itself.

Here is what Blackboard's error reports:
 Invalid data: no boundary string found
 Blackboard error

I realize that Bb is a closed system and that it is likely none of the developers has access to it.  I can provide source listings to developers of a problem page using both the nightly that works and a later build that produces the error.
Comment 1 David Kilzer (:ddkilzer) 2007-03-12 07:20:07 PDT
This is almost certainly a regression from r20074:

http://trac.webkit.org/projects/webkit/changeset/20074

Robert, great job tracking down which WebKit nightly (r20051) last worked!  Since this is a server-side error, the best information you could provide is "tcpdump" output of the exact same form (with the exact same data) being submitted to the server from a WebKit nightly that works, and from one that doesn't.

Comment 2 David Kilzer (:ddkilzer) 2007-03-12 07:23:15 PDT
Actually, if you are able to upload the source of the original HTML page being submitted, that would help as well.  Saving it as a webarchive (from a WebKit nightly, please, since there have been some fixes to webarchive output since Safari was released in Tiger) and attaching it to this bug should be sufficient!

Comment 3 David Kilzer (:ddkilzer) 2007-03-12 07:56:24 PDT
Section 5.1 of RFC 1521 suggests that the character sequence "=_" appear in the boundary string since it will never appear in a quoted-printable body:

   Since the hyphen character ("-") is represented as itself in the
   Quoted-Printable encoding, care must be taken, when encapsulating a
   quoted-printable encoded body in a multipart entity, to ensure that
   the encapsulation boundary does not appear anywhere in the encoded
   body.  (A good strategy is to choose a boundary that includes a
   character sequence such as "=_" which can never appear in a quoted-
   printable body.  See the definition of multipart messages later in
   this document.)

Comment 4 Darin Adler 2007-03-12 10:08:58 PDT
I suspect that adding =_ will not be sufficient. More likely we need to disallow certain Base64 characters from being included in the boundary as well.
Comment 5 Robert Royar 2007-03-12 10:39:25 PDT
Created attachment 13595 [details]
tcpdump of form interaction with build 20107 (broken)

as requested by ddkilzer@webkit.org
Comment 6 Robert Royar 2007-03-12 10:41:34 PDT
Created attachment 13596 [details]
tcpdump log for form interaction (OK) build 20051

as requested by ddkilzer@webkit.org
Comment 7 David Kilzer (:ddkilzer) 2007-03-12 10:42:08 PDT
(In reply to comment #4)
> I suspect that adding =_ will not be sufficient. More likely we need to
> disallow certain Base64 characters from being included in the boundary as well.

Sorry, I did not mean to imply that adding "=_" would fix the issue, but I thought it would be a Good Thing(tm) to do while we're fixing the main cause of the bug (which is still unknown at this point).
Comment 8 Robert Royar 2007-03-12 10:44:39 PDT
Created attachment 13597 [details]
Webarchive of form page.  This is the (broken) 20107 build

Webarchive requested by ddkilzer@webkit.org
Comment 9 Robert Royar 2007-03-12 10:46:38 PDT
Created attachment 13598 [details]
Webarchive using (OK) webkit build 20051

file requested by ddkilzer@webkit.org
Comment 10 Darin Adler 2007-03-12 10:46:54 PDT
(In reply to comment #7)
> Sorry, I did not mean to imply that adding "=_" would fix the issue, but I
> thought it would be a Good Thing(tm) to do while we're fixing the main cause of
> the bug (which is still unknown at this point).

Good point.
Comment 11 David Kilzer (:ddkilzer) 2007-03-12 10:48:54 PDT
(In reply to comment #5)
> Created an attachment (id=13595) [edit]
> tcpdump of form interaction with build 20107 (broken)
> as requested by ddkilzer@webkit.org

(In reply to comment #6)
> Created an attachment (id=13596) [edit]
> tcpdump log for form interaction (OK) build 20051
> as requested by ddkilzer@webkit.org

Sorry, Robert, I should have been more specific about the format of the dump.  I wanted the actual packets being sent to the web server so that I could review the format of the HTML form data sent to the server.  (This requires the "-w outputfilename" flag to be used with tcpdump.)

If you think there may be non-public information in the packet captures, please email the files to adele at apple dot com and ddkilzer at webkit dot org.  Thanks!
Comment 12 David Kilzer (:ddkilzer) 2007-03-12 12:48:48 PDT
Robert, does submitting a form fail every time you try it with r20107, or just sometimes?

(In reply to comment #4)
> I suspect that adding =_ will not be sufficient. More likely we need to
> disallow certain Base64 characters from being included in the boundary as well.

As Darin alluded to in this comment, the server-side form parser may have issues with some characters in the multipart/form boundary string, even though I believe the new code is within spec (RFC 1521, Section 7.2.1; http://www.faqs.org/rfcs/rfc1521.html):

   The only mandatory parameter for the multipart Content-Type is the
   boundary parameter, which consists of 1 to 70 characters from a set
   of characters known to be very robust through email gateways, and NOT
   ending with white space.  (If a boundary appears to end with white
   space, the white space must be presumed to have been added by a
   gateway, and must be deleted.)  It is formally specified by the
   following BNF:

   boundary := 0*69<bchars> bcharsnospace

   bchars := bcharsnospace / " "

   bcharsnospace :=    DIGIT / ALPHA / "'" / "(" / ")" / "+" /"_"
                 / "," / "-" / "." / "/" / ":" / "=" / "?"

Note that Base64 encoding is a subset of bcharsnospace.

Comment 13 David Kilzer (:ddkilzer) 2007-03-12 15:15:01 PDT
(In reply to comment #12)
> Robert, does submitting a form fail every time you try it with r20107, or just
> sometimes?

It's probably going to fail every time.  Looking at a tcpdump of the form submission provided by Robert via private email, the Content-Type header looks like this:

Content-Type: multipart/form-data; boundary=----WebKit-form-boundary-KQmgzhzOA3IPUXpI

The multipart/form-data looks fine as well (I abbreviated the actual data to the first and last form items sent):

------WebKit-form-boundary-KQmgzhzOA3IPUXpI
Content-Disposition: form-data; name="action"

MODIFY
------WebKit-form-boundary-KQmgzhzOA3IPUXpI
Content-Disposition: form-data; name="restrict_end_date_am"

1
------WebKit-form-boundary-KQmgzhzOA3IPUXpI--

Within the body of the response, however, is a Java stack trace within an HTML comment that starts like this:

java.io.IOException: Invalid data: no boundary string found.
	at blackboard.platform.filesystem.FileSystemServiceImpl.processUpload(FileSystemServiceImpl.java:1104)
	at blackboard.web.assignments.instructor.proc_005fedit_005fassignment_jsp._jspService(proc_005fedit_005fassignment_jsp.java:113)
	at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:94)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
[...]

Unfortunately, it appears that Blackboard has rolled their own form processing code (as opposed to using a robust open source library such as commons-fileupload from the Jakarta project), which means that they will most likely need to fix their implementation.  (If I had to guess, they probably don't expect dash characters to appear after non-dash characters in the boundary string.)

The only suggestion I have to try is to double-quote the boundary string in the Content-Type header per RFC 2046, Section 5.1.1 (http://www.faqs.org/rfcs/rfc2046.html) in case Blackboard's implementation handles that case:

   WARNING TO IMPLEMENTORS:  The grammar for parameters on the Content-
   type field is such that it is often necessary to enclose the boundary
   parameter values in quotes on the Content-type line.  This is not
   always necessary, but never hurts. Implementors should be sure to
   study the grammar carefully in order to avoid producing invalid
   Content-type fields.  Thus, a typical "multipart" Content-Type header
   field might look like this:

     Content-Type: multipart/mixed; boundary=gc0p4Jq0M2Yt08j34c0p

   But the following is not valid:

     Content-Type: multipart/mixed; boundary=gc0pJq0M:08jU534c0p

   (because of the colon) and must instead be represented as

     Content-Type: multipart/mixed; boundary="gc0pJq0M:08jU534c0p"

Comment 14 David Kilzer (:ddkilzer) 2007-03-12 15:22:53 PDT
(In reply to comment #13)
> The only suggestion I have to try is to double-quote the boundary string in the
> Content-Type header per RFC 2046, Section 5.1.1
> (http://www.faqs.org/rfcs/rfc2046.html) in case Blackboard's implementation
> handles that case:

Hopefully doing this won't cause other server-side code to stop working properly!

For comparison, this is what Firefox 2.0.0.2 sends:

Content-Type: multipart/form-data; boundary=---------------------------168072824752491622650073

This is what Opera 9.10 sends:

Content-Type: multipart/form-data; boundary=----------7aLL9V73idje3BWjvvB1u8

And Mac IE 5.2.3 sends:

Content-type: multipart/form-data; boundary=---------------------------168071508944249

I don't have MSIE 6 or MSIE 7 available to test them.

Comment 15 David Kilzer (:ddkilzer) 2007-03-12 15:34:37 PDT
(In reply to comment #13)
> The only suggestion I have to try is to double-quote the boundary string in the
> Content-Type header per RFC 2046, Section 5.1.1
> (http://www.faqs.org/rfcs/rfc2046.html) in case Blackboard's implementation
> handles that case:

Or we could remove the dashes, for example:

Content-Type: multipart/form-data;
boundary=----WebKitKQmgzhzOA3IPUXpI

Content-Type: multipart/form-data;
boundary=----WebKitFormBoundaryKQmgzhzOA3IPUXpI
Comment 16 David Kilzer (:ddkilzer) 2007-03-12 15:53:29 PDT
Created attachment 13603 [details]
Patch v1

(In reply to comment #15)
> Or we could remove the dashes, for example:

A patch that removes the dashes from the boundary prefix.  It's a speculative fix, but will match more closely the behavior of Firefox 2.0.0.2, Opera 9.10 and Mac IE 5.2.3.

Someone should verify what MSIE 6 and/or MSIE 7 do as well.
Comment 17 Darin Adler 2007-03-12 17:47:12 PDT
Comment on attachment 13603 [details]
Patch v1

r=me
Comment 18 David Kilzer (:ddkilzer) 2007-03-12 21:10:08 PDT
Committed revision 20136.

Robert, please try a WebKit nightly on or after r20136 to see if that fixes the issue for you.  Thanks!

Comment 19 David Kilzer (:ddkilzer) 2007-03-12 21:32:21 PDT
(In reply to comment #14)
> I don't have MSIE 6 or MSIE 7 available to test them.

MSIE 7.0.5730.11 sends a boundary ending like this:

-----------------------------7d72602201fc--

Comment 20 Robert Royar 2007-03-13 03:33:43 PDT
(In reply to comment #18)
> Committed revision 20136.
> 
> Robert, please try a WebKit nightly on or after r20136 to see if that fixes the
> issue for you.  Thanks!
> 

I tied r20136, and it works, thank you.
Comment 21 David Kilzer (:ddkilzer) 2007-03-13 03:51:24 PDT
(In reply to comment #20)
> I tied r20136, and it works, thank you.

Thanks, Robert!  If you're feeling generous, I'd recommend filing a bug report with Blackboard, having them pay special attention to Comment #12 and Comment #13 in this bug.  Those comments should contain enough information to fix their form boundary parsing bug.

Comment 22 Robert Royar 2007-03-13 07:38:50 PDT
(In reply to comment #21)
> (In reply to comment #20)
> > I tied r20136, and it works, thank you.
> 
> Thanks, Robert!  If you're feeling generous, I'd recommend filing a bug report
> with Blackboard, having them pay special attention to Comment #12 and Comment
> #13 in this bug.  Those comments should contain enough information to fix their
> form boundary parsing bug.

I am feeling generous.  It's in my best interest since over 50 % of my teaching load is often mediated via Blackboard.  I do not believe that Bb has a bug-reporting mechanism for users.  I reported the information about Webkit and Bb to our local admin to pass along.  I hope that gets through.

I believe that someone in the developer group might begin actively monitoring cases where Bb (and WebCT) break with the Webkit.  When I tried first to track down the eror, I found most of the error reports by searching for Safari and Blackboard rather than Webkit and Blackboard, but I think the ones I found were really the engine rather than the front end--in the way folks refer to Firefox but not Gecko when they see browser bugs.  

Anyway, finding links such as http://west.wwu.edu/atus/blackboard/browser.shtml might help developers track down bugs that never get reported.  So many of the schools take the approach that users should switch browsers.  For what it's worth, I am no fan of Blackboard, neither are most of my students.

Comment 23 David Kilzer (:ddkilzer) 2007-03-13 07:47:31 PDT
So to summarize some of my suggestions in previous comments:

- I did not add "=_" since none of the other mainstream browsers did this.

- I did not add quotes around the boundary string in the Content-Type header since none of the other browsers did this (NOTE: MSIE 6/7 headers have not been tested, although there is no need for them to do this given their generated boundary string).

- I did not add a number to the beginning of the non-dash characters even though all the other mainstream browsers appear to do this in the small sampling taken (including the original string for KHTML: 0xKhTmLbOuNdArY).  Hopefully no programmers implementing a subset of the real specification noticed this and wrote their parsing routines to expect it.  The spec certainly doesn't require it.