Bug 11154

Summary: Soft hyphenation and the clipboard
Product: WebKit    Reporter: Sylvan Migdal <sylvan>
Component: Text    Assignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED
Severity: Major    CC: abarth, ap, eric, mitz, simon, webkit-ews, webkit.review.bot
Priority: P3    
Version: 419.x   
Hardware: Mac   
OS: OS X 10.4   
Attachments:
Description                                                        Flags
test case                                                          none
Improve the handling of soft hyphens in Copy and Find operations   none
Improve the handling of soft hyphens in Copy and Find operations   darin: review+

Description Sylvan Migdal 2006-10-04 10:32:21 PDT
There is a subtle but important inconsistency in the implementation of the soft hyphen character. Although the hyphen seems to be rendered (or not rendered) correctly on the page itself, when text featuring a soft hyphen is copied to the clipboard (or delivered through the OS X Services menu), it becomes highly erratic in the way it is displayed.

Sometimes all soft hyphens are rendered as spaces. Sometimes they are rendered as ordinary, visible hyphen characters. Sometimes they are invisible. Sometimes, these different behaviors are mixed, depending on whether or not each individual soft hyphen was rendered on the page it was copied from. This mix of behaviors appears to differ not only from application to application, but even within a single application, depending on how or where it is pasted.

As I understand it, the soft hyphen ought simply to be omitted when copying text to the clipboard, since the clipboard does not (and should not) preserve the text wrapping status of the text within the original application.
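A minimal sketch of the reporter's proposal, assuming the selection is available as a UTF-16 string just before it is written to the clipboard. The helper name `stripSoftHyphens` is hypothetical, not a WebKit function; it simply drops every U+00AD SOFT HYPHEN from the plain-text representation.

```cpp
#include <string>

// U+00AD SOFT HYPHEN
constexpr char16_t kSoftHyphen = 0x00AD;

// Hypothetical helper: returns a copy of `text` with every soft hyphen removed,
// so the pasted text carries no trace of the original page's line breaking.
std::u16string stripSoftHyphens(const std::u16string& text) {
    std::u16string result;
    result.reserve(text.size());
    for (char16_t ch : text) {
        if (ch != kSoftHyphen)
            result.push_back(ch);
    }
    return result;
}
```

This is the "omit" policy only; the thread goes on to argue for the opposite choice of preserving U+00AD.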
Comment 1 Alexey Proskuryakov 2006-10-04 21:33:57 PDT
(In reply to comment #0)
> Sometimes all soft hyphens are rendered as spaces. Sometimes they are rendered
> as ordinary, visible hyphen characters. Sometimes they are invisible.
> Sometimes, these different behaviors are mixed, depending on whether or not
> each individual soft hyphen was rendered on the page it was copied from.

Can you please provide detailed steps to reproduce for each of these scenarios?
Comment 2 Sylvan Migdal 2006-10-05 08:33:37 PDT
> Can you please provide detailed steps to reproduce for each of these scenarios?

1. When copying/pasting into a web form or the URL bar within Safari: rendered soft hyphens become hyphens, unrendered soft hyphens become spaces.

2. When copying/pasting into the Google toolbar within Safari: rendered soft hyphens disappear, unrendered soft hyphens become spaces.

3. When copying/pasting into a BBEdit document: all soft hyphens become hyphens.

4. When delivering into a BBEdit document via Services/BBEdit/New Window with Selection: same behavior as (1).

5. When copying/pasting into the body of a Mail message: soft hyphens remain as soft hyphens, even if the message is set as plain text.

6. When copying/pasting into one of the header fields of a Mail message: same behavior as (2).

7. When copying/pasting (or delivering via Services) into a TextEdit document: same behavior as (1).

8. When copying/pasting into the search field of a Finder or iTunes window: same behavior as (3).

9. When copying/pasting into a Terminal window: unrendered soft hyphens become spaces, rendered soft hyphens become the sequence: \302\255

I could possibly find more variations, but that's all I've tried.
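The `\302\255` sequence in item 9 is the two-byte UTF-8 encoding of U+00AD (0xC2 0xAD) written as octal escapes, so the character itself reached Terminal intact. A small sketch that reproduces the notation; `encodeUtf8` and `toOctalEscapes` are illustrative names, and the encoder deliberately handles only BMP code points without surrogates.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Encode one BMP code point (U+0000..U+FFFF, excluding surrogates) as UTF-8.
std::vector<std::uint8_t> encodeUtf8(char32_t cp) {
    std::vector<std::uint8_t> out;
    if (cp < 0x80) {
        out.push_back(static_cast<std::uint8_t>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<std::uint8_t>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<std::uint8_t>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
    }
    return out;
}

// Write each byte as a backslash followed by three octal digits.
std::string toOctalEscapes(const std::vector<std::uint8_t>& bytes) {
    std::string s;
    char buf[5]; // backslash + 3 digits + terminating NUL
    for (std::uint8_t b : bytes) {
        std::snprintf(buf, sizeof buf, "\\%03o", static_cast<unsigned>(b));
        s += buf;
    }
    return s;
}
```

For example, `toOctalEscapes(encodeUtf8(0x00AD))` yields the same `\302\255` seen in the report.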
Comment 3 Alexey Proskuryakov 2006-10-05 09:22:58 PDT
Confirmed. I think this is a major issue, upgrading severity.

I wouldn't necessarily agree with the suggested fix, however - my expectation is that soft hyphens should be preserved when copying (as U+00AD SOFT HYPHEN), because one would want the pasted text to be hyphenated correctly, too.
Comment 4 Alexey Proskuryakov 2006-10-05 09:25:28 PDT
Created attachment 10929 [details]
test case
Comment 5 Sylvan Migdal 2006-10-05 10:02:34 PDT
(In reply to comment #3)
> I wouldn't necessarily agree with the suggested fix, however - my expectation
> is that soft hyphens should be preserved when copying (as U+00AD SOFT HYPHEN),
> because one would want the pasted text to be hyphenated correctly, too.

I guess you're right - that would probably be the most correct behavior. My thinking was that if applications that users are likely to paste the text into choke on the character, it could result in a disruption to the text far out of proportion to the mere loss of discretionary hyphens. I don't know how widespread a problem that's likely to be, but it's possible that it would be more "user-friendly" to simply omit the character, which, on a web site, is likely intended purely for the particular constraints of the layout it appears in, and isn't critically important outside of that context.

Anyway, I leave it to those who are actually webkit developers to decide.
Comment 6 Alexey Proskuryakov 2009-06-30 02:33:45 PDT
See also: bug 26774 (most likely, it has the same underlying reason).
Comment 7 mitz 2010-09-28 11:05:57 PDT
Created attachment 69072 [details]
Improve the handling of soft hyphens in Copy and Find operations
Comment 8 Alexey Proskuryakov 2010-09-28 11:14:28 PDT
Comment on attachment 69072 [details]
Improve the handling of soft hyphens in Copy and Find operations

View in context: https://bugs.webkit.org/attachment.cgi?id=69072&action=review

> WebCore/ChangeLog:20
> +        boxes. Changed font code to render the soft hpyhen character as a zero width space, so that

Will double clicking to select a word work? As well as DOMSelection methods to extend a selection with word/line granularity?
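A simplified sketch of the concern behind this question, assuming double-click selects the maximal run of word characters around the clicked offset. `wordAt` and its ASCII-only notion of a word character are hypothetical simplifications, not WebKit's word-breaking code. The soft hyphen is treated as part of the surrounding word, consistent with UAX #29, which classifies U+00AD as a Format character that does not break words; if zero-width rendering made it behave like a space, `hy\u00ADphen` would select as two words.

```cpp
#include <cstddef>
#include <string>
#include <utility>

constexpr char16_t kSoftHyphen = 0x00AD;

// Simplification: only ASCII letters and digits count as word characters.
bool isWordChar(char16_t ch) {
    return (ch >= u'a' && ch <= u'z') || (ch >= u'A' && ch <= u'Z') ||
           (ch >= u'0' && ch <= u'9');
}

// A soft hyphen never splits a word; it is absorbed into the word around it.
bool continuesWord(char16_t ch) {
    return isWordChar(ch) || ch == kSoftHyphen;
}

// Returns the half-open range [start, end) of the word containing `offset`,
// or an empty range at `offset` if it is not inside a word.
std::pair<std::size_t, std::size_t> wordAt(const std::u16string& text, std::size_t offset) {
    if (offset >= text.size() || !continuesWord(text[offset]))
        return {offset, offset};
    std::size_t start = offset;
    while (start > 0 && continuesWord(text[start - 1]))
        --start;
    std::size_t end = offset + 1;
    while (end < text.size() && continuesWord(text[end]))
        ++end;
    return {start, end};
}
```

Clicking on either half of `hy\u00ADphen word` selects offsets 0 through 7, the whole hyphenated word.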
Comment 9 Early Warning System Bot 2010-09-28 11:24:35 PDT
Attachment 69072 [details] did not build on qt:
Build output: http://queues.webkit.org/results/3995166
Comment 10 mitz 2010-09-28 11:31:48 PDT
Created attachment 69076 [details]
Improve the handling of soft hyphens in Copy and Find operations

Added parentheses to silence a compiler warning.
Comment 11 Darin Adler 2010-09-28 11:49:53 PDT
Comment on attachment 69076 [details]
Improve the handling of soft hyphens in Copy and Find operations

View in context: https://bugs.webkit.org/attachment.cgi?id=69076&action=review

Would be nice to add some test cases for things Alexey mentioned (double clicking to select a word, granularity functions) at some point.

> WebCore/ChangeLog:34
> +        (WebCore::foldQuoteMarkOrSoftHyphen): Renamed foldQuoteMark() to this and added folding of
> +        soft hpyhen to 0.

Tpyo.

> WebCore/editing/TextIterator.cpp:1549
> +        case softHyphen:
> +            return 0;

I think we might need a comment, not just change log, about why folding a soft hyphen to a U+0000 is helpful. It’s much more obvious why folding a quote mark to the ASCII equivalent is helpful, which is why that has no “why” comment.
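The review does not spell out why U+0000 is the folding target; one plausible reading is that it marks the soft hyphen as ignorable for matching. The sketch below makes that explicit under stated assumptions: `foldForSearch`, `foldedForSearch` and `containsIgnoringSoftHyphens` are hypothetical names, not the actual `foldQuoteMarkOrSoftHyphen` in TextIterator.cpp, and the U+0000 placeholders are simply dropped before a plain substring search.

```cpp
#include <string>

constexpr char16_t kSoftHyphen = 0x00AD;

// Curly quotes fold to their ASCII equivalents; the soft hyphen folds to U+0000.
char16_t foldForSearch(char16_t ch) {
    switch (ch) {
    case 0x2018: // LEFT SINGLE QUOTATION MARK
    case 0x2019: // RIGHT SINGLE QUOTATION MARK
        return u'\'';
    case 0x201C: // LEFT DOUBLE QUOTATION MARK
    case 0x201D: // RIGHT DOUBLE QUOTATION MARK
        return u'"';
    case kSoftHyphen:
        return 0;
    default:
        return ch;
    }
}

// Fold every character and drop the U+0000 placeholders, so a soft hyphen
// in the page text cannot prevent a match.
std::u16string foldedForSearch(const std::u16string& text) {
    std::u16string out;
    out.reserve(text.size());
    for (char16_t ch : text) {
        char16_t folded = foldForSearch(ch);
        if (folded != 0)
            out.push_back(folded);
    }
    return out;
}

bool containsIgnoringSoftHyphens(const std::u16string& haystack, const std::u16string& needle) {
    return foldedForSearch(haystack).find(foldedForSearch(needle)) != std::u16string::npos;
}
```

Dropping characters loses the mapping back to offsets in the original text, which a real Find needs in order to highlight the match; keeping the U+0000 in place and having the matcher skip it avoids that problem.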
Comment 12 mitz 2010-09-28 12:40:49 PDT
Thanks for the review. I added comments and fixed two instances of that typo.

Fixed in <http://trac.webkit.org/changeset/68551>.
Comment 13 WebKit Review Bot 2010-09-28 13:04:27 PDT
http://trac.webkit.org/changeset/68551 might have broken Qt Linux Release
Comment 14 mitz 2010-09-28 13:15:15 PDT
(In reply to comment #13)
> http://trac.webkit.org/changeset/68551 might have broken Qt Linux Release

Added Qt-specific results in <http://trac.webkit.org/changeset/68553>.
Comment 15 mitz 2010-09-28 13:20:30 PDT
Updated expected result in <http://trac.webkit.org/changeset/68555>.