Bug 101120 - Copying & pasting tables from Excel results in verbose markup
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: HTML Editing
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2012-11-02 16:10 PDT by Ryosuke Niwa
Modified: 2012-11-02 17:03 PDT
Description Ryosuke Niwa 2012-11-02 16:10:08 PDT
Copying a table from an HTML document generated by Excel and pasting it into a contenteditable region (e.g. Gmail) results in very verbose markup because we end up inlining all styles.

e.g.

<style>
table {
    mso-displayed-decimal-separator:"\.";
    mso-displayed-thousand-separator:"\,";
}
td {
    mso-style-parent:style0;
    padding:0px;
    mso-ignore:padding;
    color:black;
    font-size:12.0pt;
    font-weight:400;
    font-style:normal;
    text-decoration:none;
    font-family:Candara, sans-serif;
    mso-font-charset:0;
    mso-number-format:General;
    text-align:general;
    vertical-align:bottom;
    border:none;
    mso-background-source:auto;
    mso-pattern:auto;
    mso-protection:locked visible;
    white-space:nowrap;
    mso-rotate:0;
}
</style>
<table border=0 cellpadding=0 cellspacing=0 width=916 style='border-collapse:
 collapse;table-layout:fixed;width:916pt'>
 <tr height=3 style='mso-height-source:userset;height:3.0pt'>
  <td height=3 width=6 style='height:3.0pt;width:6pt'></td>
 </tr>
</table>

The <td> above can end up being pasted as:

<td height="3" width="6" style="padding: 0px; font-size: 12pt; font-family: Candara, sans-serif; vertical-align: bottom; border: none; white-space: nowrap; height: 3pt; width: 6pt; "></td>

<rdar://problem/7044154>
Comment 1 Ryosuke Niwa 2012-11-02 16:12:10 PDT
We can do better by removing redundant declarations such as padding: 0px and border: none, since they match the default styles. We can also remove the width and height content attributes, since the same values are already specified in CSS:

<td style="font-size: 12pt; font-family: Candara, sans-serif; vertical-align: bottom; white-space: nowrap; height: 3pt; width: 6pt;"></td>

We can also strip the whitespace after ':', ';', and ',' to get:

<td style="font-size:12pt;font-family:Candara,sans-serif;vertical-align:bottom;white-space:nowrap;height:3pt;width:6pt;"></td>
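
The two reductions described in this comment (dropping declarations that match a default, then stripping optional whitespace) could be sketched roughly as follows. This is only an illustration in Python, not WebKit code: the function name minify_inline_style and the defaults mapping are hypothetical, and the naive split on ';' assumes no declaration value contains a quoted ';' or ':'.

```python
import re

def minify_inline_style(style, defaults):
    """Drop declarations equal to a known default and strip optional whitespace.

    `defaults` maps property names to default values (an assumption of this
    sketch; in the bug these would come from the style resolver).
    """
    kept = []
    for declaration in style.split(";"):
        if ":" not in declaration:
            continue  # skip the empty piece after the final ';'
        name, value = declaration.split(":", 1)
        name = name.strip().lower()
        # Remove whitespace around commas only; spaces inside values stay.
        value = re.sub(r"\s*,\s*", ",", value.strip())
        if defaults.get(name) == value:
            continue  # redundant: same as the default
        kept.append(f"{name}:{value}")
    return "".join(part + ";" for part in kept)
```

Passing the style attribute of the pasted <td> from the description, with padding: 0px and border: none treated as defaults, yields the compact style string shown above. Removing the width and height content attributes is a separate step that this sketch does not cover.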
Comment 2 Ryosuke Niwa 2012-11-02 16:21:40 PDT
To remove properties that match the default style (i.e. redundant), we need some way of knowing their default values. Simon suggested that we might be able to extend StyleSelector to give us a style ignoring inline styles.

Antti, do you think this is feasible?
Comment 3 Antti Koivisto 2012-11-02 17:03:32 PDT
(In reply to comment #2)
> To remove properties that match the default style (i.e. redundant), we need some way of knowing their default values. Simon suggested that we might be able to extend StyleSelector to give us a style ignoring inline styles.
> 
> Antti, do you think this is feasible?

Should be easy. Inline style is included here:

http://trac.webkit.org/browser/trunk/Source/WebCore/css/StyleResolver.cpp?rev=133324#L950

Just needs a new RuleMatchingBehavior.
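
The idea discussed in comments 2 and 3 (resolve the style once without inline declarations, then treat any inline declaration that matches that baseline as redundant) can be modelled with a toy cascade. The Python below is a hedged sketch only: MatchMode, resolve_style, and redundant_inline_properties are hypothetical names, styles are plain dictionaries, and nothing here reflects the actual StyleResolver API or how a new RuleMatchingBehavior would be implemented.

```python
from enum import Enum

class MatchMode(Enum):
    # Hypothetical stand-in for a rule-matching mode; not a WebKit type.
    ALL_RULES = 1
    EXCLUDE_INLINE = 2

def resolve_style(stylesheet_rules, inline_style, mode):
    """Toy cascade: later rule sets override earlier ones; inline style wins last."""
    resolved = {}
    for declarations in stylesheet_rules:
        resolved.update(declarations)
    if mode is MatchMode.ALL_RULES:
        resolved.update(inline_style)
    return resolved

def redundant_inline_properties(stylesheet_rules, inline_style):
    """Return inline property names whose values equal the style without inline rules."""
    baseline = resolve_style(stylesheet_rules, inline_style, MatchMode.EXCLUDE_INLINE)
    return sorted(
        name for name, value in inline_style.items()
        if baseline.get(name) == value
    )
```

For example, if the baseline already gives padding: 0px and border: none, an inline style that repeats those two declarations alongside font-size: 12pt would report only padding and border as removable.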