RESOLVED FIXED Bug 40044
resolve urls in text/html clipboard data
https://bugs.webkit.org/show_bug.cgi?id=40044
Tony Chang
Reported 2010-06-01 23:53:17 PDT
Attachments
Patch (30.37 KB, patch)
2010-06-01 23:55 PDT, Tony Chang
no flags
Patch (30.31 KB, patch)
2010-06-13 18:14 PDT, Tony Chang
ojan: review+
Tony Chang
Comment 1 2010-06-01 23:55:59 PDT
Tony Chang
Comment 2 2010-06-02 00:05:38 PDT
For example, try copying text with links from Wikipedia and pasting it into http://www.mozilla.org/editor/midasdemo/ . The pasted content should have absolute URLs that point to Wikipedia, even though Wikipedia uses relative URLs.

This already works on the Windows ports because the CF_HTML clipboard format includes the base URL. It also works in Safari because (I think) the WebArchive format keeps the base URL as well. It is currently broken on Linux variants and on Chromium Mac, which don't use WebArchive.

The fix is to expand the URLs during the copy operation. I didn't expand file:/// URLs because there might be privacy issues with expanding them. This doesn't match the current Mac implementation, which does expand URLs copied from a file:/// source. I can revert that part since it's somewhat orthogonal.

The other tricky bit about the Mac implementation is that it only resolves URLs if the page you're pasting into has a different URL from the page you copied from. I don't think we can do that with text/html data because it doesn't keep track of the base URL. This makes testing a bit tricky (see the http test case).

The Chromium bug report for this is http://crbug.com/28960
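A minimal sketch of the copy-time expansion described above, assuming a JavaScript setting with the standard WHATWG `URL` constructor. This is not WebKit's C++ serializer: it works on a single attribute value (an `href` or `src` string) instead of rewriting markup, and the function name `resolveForTextHtml` is hypothetical. As in the comment, nothing is expanded when the source page itself is a `file:` URL.

```javascript
// Hypothetical helper (not WebKit code): expand one href/src attribute value
// against the copy source's base URL before writing text/html to the clipboard.
function resolveForTextHtml(value, baseURL) {
  const base = new URL(baseURL);
  // Leave everything alone when copying from a file:/// page (privacy).
  if (base.protocol === "file:") {
    return value;
  }
  try {
    // new URL(value, base) performs standard relative-reference resolution.
    return new URL(value, base).href;
  } catch (e) {
    // Values that cannot be parsed as URLs are kept as they were.
    return value;
  }
}

// Usage: a relative Wikipedia link becomes absolute; a file: source is untouched.
console.log(resolveForTextHtml("/wiki/WebKit", "https://en.wikipedia.org/wiki/Safari"));
console.log(resolveForTextHtml("notes.html", "file:///home/user/page.html"));
```

In a real serializer this would be applied to each link-bearing attribute while generating the text/html markup; formats that carry the base URL (CF_HTML, WebArchive) would not need it.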
Tony Chang
Comment 3 2010-06-09 22:25:57 PDT
Ojan or Jianli, could one of you review this?
Ojan Vafai
Comment 4 2010-06-10 18:25:11 PDT
Comment on attachment 57626: Patch

LayoutTests/editing/pasteboard/copy-resolves-urls.html:16
+ s.setBaseAndExtent(test, 0, test, 4);
Nit: a somewhat more readable way to do this: s.selectAllChildren(test);

LayoutTests/http/tests/misc/copy-resolves-urls.html:16
+ s.setBaseAndExtent(test, 0, test, 4);
Ditto.

LayoutTests/ChangeLog:11
+ * editing/pasteboard/paste-noscript.html: Updated to no longer throw a JS exception so the results are the same
Nice!

Here are the behaviors I saw with a quick test in different browsers:
1. IE: always keeps relative URLs relative.
2. Safari 5.0 Mac / Chrome Windows / FF Windows: depends on where you copy from. If you copy from within a contentEditable, relative URLs stay relative; otherwise they are resolved.
3. FF Mac: always resolves URLs.
I wouldn't mind someone double-checking to make sure I didn't get that wrong.

> The other tricky bit about the Mac impl is that it only resolves URLs if the page you're pasting into is a different URL from where you copied from. I don't think we can do that with text/html data because it doesn't keep track of the base URL. This makes testing a bit tricky (see the http test case).

Are you sure about this? I didn't see this in Safari 5.0 Mac: copy-pasting from one contentEditable to another contentEditable kept links relative even when they were on different pages.

Anyway, behavior (2) seems optimal. I think we should do that. Basically, s/AbsoluteURLs/AbsoluteNonEditableURLs/. Otherwise, the code changes look good to me.

Also, FWIW, I like the idea of never resolving file URLs.
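Behavior (2) amounts to a three-way policy for whether copied markup gets absolute URLs. The sketch below models it in JavaScript, assuming the caller already knows whether the selection came from editable content. The `AbsoluteURLs` and `AbsoluteNonEditableURLs` names mirror the review comment; the `DontResolve` mode, the `copiedFromEditable` flag, and the function name are hypothetical, and this is not the patch's C++ code.

```javascript
// Hypothetical URL-resolution modes for serializing a copied selection.
const ResolveMode = Object.freeze({
  DontResolve: "DontResolve",
  AbsoluteURLs: "AbsoluteURLs",                       // always resolve
  AbsoluteNonEditableURLs: "AbsoluteNonEditableURLs", // proposed behavior (2)
});

// Decide whether relative URLs in the copied markup should be made absolute.
function shouldResolveURLs(mode, copiedFromEditable, sourceURL) {
  // Never resolve when the source page is a local file.
  if (new URL(sourceURL).protocol === "file:") {
    return false;
  }
  switch (mode) {
    case ResolveMode.AbsoluteURLs:
      return true;
    case ResolveMode.AbsoluteNonEditableURLs:
      // Copying out of a contentEditable keeps links relative; copying
      // ordinary page content (e.g. Wikipedia) resolves them.
      return !copiedFromEditable;
    default:
      return false;
  }
}
```

Under this policy, copying from Wikipedia into a mail composer yields absolute links, while moving content between two contentEditable regions leaves them relative.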
Tony Chang
Comment 5 2010-06-10 23:05:44 PDT
(In reply to comment #4)
> Here are the behaviors I saw with a quick test in different browsers:
> 1. IE: always keeps relative URLs relative

Hmm, I always get absolute URLs. See the test URL added to the bug. If I copy out of the first contenteditable field, I get an absolute URL no matter where I paste it (the same contenteditable, a second contenteditable, or a contenteditable in an iframe). I wonder if it matters that I'm testing over remote desktop?

> 2. Safari 5.0 Mac / Chrome Windows / FF Windows: Depends on where you copy from. If you copy from within a contentEditable, relative URLs stay relative, otherwise they are resolved.

When I run the same test in Safari 4 Mac, I get relative URLs in the first two contenteditables, but absolute URLs in the iframe. It doesn't seem to matter whether I copy from within a contenteditable or not. Also, the existing test editing/pasteboard/paste-noscript.html doesn't copy from a contenteditable, and it gets a relative URL.

In Chrome Win, I get the same behavior as Safari 4 Mac (absolute in the iframe).

In Firefox Win (3.6), I get relative URLs in the test case, but if I copy from Wikipedia, I get full URLs. If I load http://ponderer.org/tests/urls.html and http://www.aypwip.org/tests/urls.html (which happen to be the same file on different domains) and copy from one to the other, I get full URLs. I guess Firefox is matching on domain or something. This was also over remote desktop.

> 3. FF Mac: Always resolve URLs.

I get the same behavior here as with FF Win; it seems to be domain-based.

We must be doing something different, because we're getting very different results.
Julie Parent
Comment 6 2010-06-11 17:26:52 PDT
One reason you may have been getting different results is that it can depend on how you get/set the hrefs. For example, in IE, if you set an href using innerHTML, it is always made absolute, even if you set it to a relative URL.

I just tested this in all major browsers on Windows and Mac. Here are my results.

Tested on:
  Win: FF 3.6.3, Chrome 6.0.427.0, Safari 5, IE 8
  Mac: Chrome 6.0.427.0, Safari 4.0.5, FF 3.6.3

Within a contentEditable region:
  everyone: relative (but FF adds ../ to the start if the href starts with a /)

Within a designMode iframe, OR contentEditable iframe to contentEditable iframe (same domain or different domain):
  absolute: FF
  relative: Chrome, Safari
  relative, with about: added to the start: IE

Non-editable to contentEditable iframe (same domain or not):
  absolute: FF, Chrome Win, Safari
  relative: IE, Chrome Mac

Non-editable to contentEditable div in the same window:
  everyone: relative (but FF adds ../ to the start if the href starts with a /)
Ojan Vafai
Comment 7 2010-06-11 18:43:30 PDT
(In reply to comment #6)
Thanks for getting all that data, Julie! data++

I think it's very clear that we want copying from a non-editable page (e.g. Wikipedia) to resolve relative URLs when pasted (e.g. into Gmail). Given that, there are a few options. Discussion on IRC landed on resolving URLs only for clipboard formats that don't have a concept of a base. Tony, does that seem reasonable to you?

[6:26pm] othermaciej: that is some weird behavior
[6:26pm] ojan: yeah, my intuition is that we should always resolve urls on copy
[6:26pm] ojan: it makes links work by default.
[6:26pm] ojan: if a site specifically needs relative URLs, then they can catch the paste event and make the URLs relative again.
[6:27pm] othermaciej: ojan: I think keeping them relative and carrying a base is better, then the program doing the pasting can decide what is best
[6:27pm] othermaciej: ojan: of course, when the paste recipient is also WebKit, the buck stops with us
[6:28pm] ojan: othermaciej: do OS clipboards have a concept of a base?
[6:29pm] ojan: othermaciej: i guess...it's not clear to me how we would implement having a base unless that's already a standard part of OS clipboards
[6:31pm] othermaciej: ojan: we could keep things relative for clipboard formats that support carrying a base, and resolve to absolute otherwise, I suppose
[6:32pm] othermaciej: ojan: that's really based on the format I think, not the OS
[6:32pm] othermaciej: ojan: though in principle, any clipboard that can carry HTML can also carry a base, via a <base> element
[6:32pm] othermaciej: ojan: but I'm not sure if paste recipients would DTRT in that case
[6:35pm] ojan: othermaciej: keeping things relative only for clipboard formats that support carrying a base makes sense to me. i didn't realize that any clipboard formats did. i can't imagine <base> would work for HTML as the paste would then affect all links on the page if the paste handler didn't handle it correctly.
[6:36pm] othermaciej: ojan: according to the bug comments, it seems like at least WebArchive and some Windows format do
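To illustrate what "a clipboard format that supports carrying a base" looks like: the Windows CF_HTML format starts with a text header of byte offsets and can include an optional `SourceURL` field, so the fragment itself can keep relative links. The JavaScript sketch below (Node, using `Buffer` for UTF-8 byte lengths) builds such a payload. The header fields follow the commonly documented CF_HTML layout; the function name `buildCFHTML` and the fixed 10-digit zero padding of the offsets are assumptions of this sketch, not WebKit's implementation.

```javascript
// Hypothetical CF_HTML payload builder. Offsets are byte offsets into the
// UTF-8 encoded payload, zero-padded to a fixed width so the header length
// does not depend on the values written into it.
function buildCFHTML(fragment, sourceURL) {
  const pad = (n) => String(n).padStart(10, "0");
  const headerFor = (startHTML, endHTML, startFrag, endFrag) =>
    "Version:0.9\r\n" +
    `StartHTML:${pad(startHTML)}\r\n` +
    `EndHTML:${pad(endHTML)}\r\n` +
    `StartFragment:${pad(startFrag)}\r\n` +
    `EndFragment:${pad(endFrag)}\r\n` +
    // SourceURL carries the base, so relative links in the fragment can stay relative.
    `SourceURL:${sourceURL}\r\n`;

  const prefix = "<html><body><!--StartFragment-->";
  const suffix = "<!--EndFragment--></body></html>";

  // Measure the header with dummy values; the fixed padding keeps its length stable.
  const startHTML = Buffer.byteLength(headerFor(0, 0, 0, 0), "utf8");
  const startFragment = startHTML + Buffer.byteLength(prefix, "utf8");
  const endFragment = startFragment + Buffer.byteLength(fragment, "utf8");
  const endHTML = endFragment + Buffer.byteLength(suffix, "utf8");

  return headerFor(startHTML, endHTML, startFragment, endFragment) + prefix + fragment + suffix;
}

const payload = buildCFHTML('<a href="/wiki/WebKit">WebKit</a>', "https://en.wikipedia.org/wiki/Safari");
console.log(payload);
```

The fragment's relative `/wiki/WebKit` link is left as-is; a paste recipient that honors `SourceURL` can resolve it, whereas plain text/html has no equivalent field.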
Tony Chang
Comment 8 2010-06-13 17:19:43 PDT
(In reply to comment #7)
> (In reply to comment #6)
> I think it's very clear that we want copying from a non-editable page (e.g. wikipedia) to resolve relative URLs when pasted (e.g. into gmail). Given that, there's a few options. Discussion on IRC landed on resolving URLs only for clipboard formats that don't have a concept of a base. Tony, does that seem reasonable to you?

This sounds like exactly what the current patch is doing :) It only changes 'text/html', which is a format that doesn't carry the base. It doesn't change CF_HTML or WebArchive, both of which carry a base URL.

To restate: we should always resolve URLs for 'text/html', no matter what the copy source is, right? That's what this patch does. Note that this does make testing hard, because simply doing a copy/paste doesn't tell you which format is being used.

I will re-upload the patch with the selectAllChildren fix.

I'm still getting slightly different data than what you and Julie are seeing (when I copy from Wikipedia and paste into a contenteditable in IE, I get full URLs), but I guess it doesn't matter for this patch (I assume it has to do with RDP and the clipboard helper thingy).
Tony Chang
Comment 9 2010-06-13 18:14:42 PDT
Tony Chang
Comment 10 2010-06-15 01:35:20 PDT