Bug 60620

Summary: [Windows] REGRESSION(r65868): Tables from Excel are pasted as plain text
Product: WebKit Reporter: Ryosuke Niwa <rniwa>
Component: DOM Assignee: Nobody <webkit-unassigned>
Status: NEW ---    
Severity: Major CC: abarth, aestes, ap, aroben, dcheng, enrica, eric, komoroske, tony
Priority: P1 Keywords: InRadar
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 60644, 62112    
Bug Blocks: 41115    
Attachments:
Description Flags
HTML content to be pasted in step 2 none
test for bisection none

Description Ryosuke Niwa 2011-05-11 05:05:28 PDT
On WebKit TOT, copying and pasting table cells from Microsoft Excel results in the cells being pasted as plain text.  This happens because we strip tr, td, and other table elements when parsing the following HTML fragment (which has no enclosing table element):

 <col width=64 span=2 style='width:48pt'>
 <tr height=20 style='height:15.0pt'>
  <td height=20 class=xl65 width=64 style='height:15.0pt;width:48pt'>hello</span></td>
  <td class=xl65 width=64 style='border-left:none;width:48pt'>world</td>
 </tr>
 <tr height=20 style='height:15.0pt'>
  <td height=20 class=xl65 style='height:15.0pt;border-top:none'>webkit</td>
  <td class=xl65 style='border-top:none;border-left:none'>&nbsp;</td>
 </tr>

http://crbug.com/19360
Comment 1 Ryosuke Niwa 2011-05-11 05:35:07 PDT
Unfortunately, the nightlies in this range crash on startup, so I cannot test the behavior.  But I suspect that http://trac.webkit.org/changeset/65868 is the cause.
Comment 2 Ryosuke Niwa 2011-05-11 05:36:41 PDT
To reproduce this bug, insert the markup from the description (comment #0) with execCommand('insertHTML'); WebKit should insert the tr's and td's.
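
A minimal sketch of that reproduction, assuming the caret is inside a contenteditable element. The helper name `insertExcelFragment` and its `doc` parameter are hypothetical, introduced only so the call can be exercised with a stand-in document; the markup is a shortened version of the fragment in the description, not the full attachment.

```javascript
// Shortened table fragment: a row and cells with no enclosing <table>.
const excelFragment = [
  "<col width=64 span=2 style='width:48pt'>",
  "<tr height=20 style='height:15.0pt'>",
  "  <td height=20 class=xl65 style='height:15.0pt;border-top:none'>webkit</td>",
  "  <td class=xl65 style='border-top:none;border-left:none'>&nbsp;</td>",
  "</tr>",
].join("\n");

// Hypothetical helper: inserts `markup` at the current selection of `doc`
// and returns whatever execCommand returns.
function insertExcelFragment(doc, markup) {
  return doc.execCommand("insertHTML", false, markup);
}

// In a browser, run this with the caret inside a contenteditable element.
if (typeof document !== "undefined") {
  insertExcelFragment(document, excelFragment);
}
```

After the call, inspecting the editable region's DOM should show whether the tr and td elements survived or only their text content was inserted.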
Comment 3 Ryosuke Niwa 2011-05-11 05:39:04 PDT
Created attachment 93111
HTML content to be pasted in step 2

To reproduce the original bug, you must follow the lengthy steps below:

Reproduction steps:
1. Download Windows Clipboard Viewer from http://www.peterbuettner.de/develop/tools/clipview/
2. Launch the program and copy & paste the attached content
3. Type "HTML Format" (without the quotation marks) into the box to the right of "Push in Clip"
4. Press "Push in Clip"
5. Open http://www.mozilla.org/editor/midasdemo/ in the WebKit Windows port
6. Paste into the contenteditable region of the page

Expected result:
"hello", "world", and "WebKit" are in table cells

Actual result:
"hello world WebKit" is pasted as plain text.
Comment 4 Adam Roben (:aroben) 2011-05-11 06:11:03 PDT
<rdar://problem/9420024>
Comment 5 Ryosuke Niwa 2011-05-11 06:24:24 PDT
Created attachment 93113
test for bisection
Comment 6 Adam Roben (:aroben) 2011-05-11 07:22:12 PDT
Bisection indicates that r65868 is to blame, as suspected.
Comment 7 Adam Roben (:aroben) 2011-05-11 07:22:38 PDT
r65868 is "Use new HTML5 TreeBuilder for fragment parsing" (bug 44475).
Comment 8 Eric Seidel (no email) 2011-05-11 10:37:53 PDT
What does firefox do?  Should we not be using fragment parsing for copy/paste?
Comment 9 Ryosuke Niwa 2011-05-11 10:44:00 PDT
(In reply to comment #8)
> What does firefox do?  Should we not be using fragment parsing for copy/paste?

They seem to have a special parsing algorithm just to deal with CF HTML :(
Comment 10 Tony Chang 2011-05-11 11:00:15 PDT
(In reply to comment #9)
> (In reply to comment #8)
> > What does firefox do?  Should we not be using fragment parsing for copy/paste?
> 
> They seem to have a special parsing algorithm just to deal with CF HTML :(

Ryosuke and I were looking at the Moz code yesterday.  He's saying there's a parsing algorithm that parses CF_HTML to generate HTML, which I assume gets passed to their HTML5 parser.  WebKit has similar code for parsing CF_HTML into HTML, but it's very simplistic.
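
To make the CF_HTML-to-HTML step concrete, here is a rough sketch of pulling the fragment out of a CF_HTML payload. It is neither WebKit's nor Mozilla's code. It assumes the layout documented by Microsoft for the "HTML Format" clipboard type: an ASCII header whose StartFragment and EndFragment fields are byte offsets into the UTF-8 data. The function names `readOffset`, `extractFragment`, and `buildCfHtml` are hypothetical.

```javascript
// Reads a numeric header field such as "StartFragment:0000000123".
function readOffset(headerText, name) {
  const match = new RegExp("^" + name + ":(\\d+)", "m").exec(headerText);
  if (!match) throw new Error("missing CF_HTML header field: " + name);
  return parseInt(match[1], 10);
}

// Returns the HTML between the StartFragment and EndFragment byte offsets.
// `cfHtmlBytes` is a Uint8Array holding the raw clipboard data.
function extractFragment(cfHtmlBytes) {
  const decoder = new TextDecoder("utf-8");
  const text = decoder.decode(cfHtmlBytes); // header is ASCII, so the regexes are safe
  const start = readOffset(text, "StartFragment");
  const end = readOffset(text, "EndFragment");
  return decoder.decode(cfHtmlBytes.subarray(start, end));
}

// Builds a CF_HTML payload around `fragment`, for experimenting without Excel.
function buildCfHtml(fragment) {
  const encoder = new TextEncoder();
  const pad = (n) => String(n).padStart(10, "0"); // fixed width keeps header length stable
  const prefix = "<html><body><!--StartFragment-->";
  const suffix = "<!--EndFragment--></body></html>";
  const header = (startHtml, endHtml, startFragment, endFragment) =>
    "Version:0.9\r\n" +
    "StartHTML:" + pad(startHtml) + "\r\n" +
    "EndHTML:" + pad(endHtml) + "\r\n" +
    "StartFragment:" + pad(startFragment) + "\r\n" +
    "EndFragment:" + pad(endFragment) + "\r\n";
  const startHtml = encoder.encode(header(0, 0, 0, 0)).length;
  const startFragment = startHtml + encoder.encode(prefix).length;
  const endFragment = startFragment + encoder.encode(fragment).length;
  const endHtml = endFragment + encoder.encode(suffix).length;
  return encoder.encode(
    header(startHtml, endHtml, startFragment, endFragment) + prefix + fragment + suffix
  );
}
```

Note that for Excel data the extracted fragment starts with col and tr elements and has no table wrapper, which is exactly the input that the fragment parser mishandles in this bug.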
Comment 11 Ryosuke Niwa 2011-11-25 15:54:13 PST
This has been fixed on Chromium Windows.