RESOLVED MOVED 242822
Em dash should not be separated from preceding word
https://bugs.webkit.org/show_bug.cgi?id=242822
Brad Andalman
Reported 2022-07-15 15:12:08 PDT
Created attachment 460938
HTML that shows incorrect word wrap for a word followed by an em dash
When an em dash immediately follows a word, and that em dash can't fit on a line, then both the preceding word and the em dash should be moved to the next line. This works for hyphens, en dashes, and figure dashes, but does not work for em dashes. Both Safari and Chrome exhibit this bug. Firefox, however, behaves correctly.
Attachments
HTML that shows incorrect word wrap for a word followed by an em dash (1.33 KB, text/html)
2022-07-15 15:12 PDT, Brad Andalman
no flags
Screenshot of Safari, Chrome, and Firefox (601.96 KB, image/png)
2022-07-15 15:14 PDT, Brad Andalman
no flags
Test case (381 bytes, text/html)
2022-07-15 16:43 PDT, zalan
no flags
Apple Books showing em dash and quotation mark on its own line (653.75 KB, image/png)
2022-07-15 18:53 PDT, Brad Andalman
no flags
Brad Andalman
Comment 1 2022-07-15 15:14:44 PDT
Created attachment 460939
Screenshot of Safari, Chrome, and Firefox
Screenshot of Safari, Chrome, and Firefox rendering the HTML in the first attachment. Safari and Chrome both exhibit the bug. Firefox, on the right, behaves correctly.
zalan
Comment 2 2022-07-15 16:43:50 PDT
Created attachment 460941
Test case
Apparently ubrk_following() returns position 2 for XX[em dash]XX and position 3 for XX[figure dash]XX, so we find a soft wrap opportunity between XX and [em dash]. (Not sure how Firefox resolves this; we rely strictly on ICU here.)
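For readers following along, here is a minimal sketch of why the two dashes differ. It is a toy model of a few UAX #14 pair rules, not ICU's implementation and not WebKit code. It assumes the line-break classes from the Unicode data (U+2014 EM DASH is B2, while U+2010, U+2012 and U+2013 are BA) and treats every other character as alphabetic (AL), which is a simplification. The function names are hypothetical.

```python
# Toy model of the UAX #14 rules relevant to this bug. NOT ICU's algorithm.
# Assumption: any character not listed below is treated as class AL.

def line_break_class(ch):
    """Return a simplified UAX #14 line-break class for one character."""
    if ch == "\u2014":                 # EM DASH
        return "B2"                    # break opportunity before and after
    if ch in "\u2010\u2012\u2013":     # HYPHEN, FIGURE DASH, EN DASH
        return "BA"                    # break opportunity after only
    return "AL"


def break_opportunities(text):
    """Return the offsets at which a soft wrap opportunity exists."""
    positions = []
    for i in range(1, len(text)):
        before = line_break_class(text[i - 1])
        after = line_break_class(text[i])
        if after == "BA":                       # LB21: no break before BA
            continue
        if before == "B2" and after == "B2":    # LB17: no break between B2 B2
            continue
        if before == "AL" and after == "AL":    # LB28: no break inside a word
            continue
        positions.append(i)                     # LB31: break everywhere else
    return positions


print(break_opportunities("XX\u2014XX"))  # em dash: [2, 3]
print(break_opportunities("XX\u2012XX"))  # figure dash: [3]
```

The first opportunity in each list matches the positions reported above for ubrk_following(): offset 2 (before the em dash) versus offset 3 (after the figure dash).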
Alexey Proskuryakov
Comment 3 2022-07-15 18:06:11 PDT
This looks like correct behavior per UAX #14. It also matches TextEdit.
Brad Andalman
Comment 4 2022-07-15 18:52:35 PDT
UAX#14 does assert that "Line breaks can occur before and after an EM DASH." It also claims that the only use for an EM DASH is to "set off parenthetical text." That is only one of the ways that an EM DASH can be used, however. The Chicago Manual of Style, for instance, enumerates EIGHT different, valid uses for an EM DASH.
In entry 6.87 of the 17th edition, the Chicago Manual of Style mentions that an EM DASH should be used for "sudden breaks or interruptions." One of the examples it uses is as follows:
"Well, I don't know," I began tentatively. "I thought I might—"
"Might what?" she demanded.
If that trailing EM DASH followed by a quotation mark were to end on its own line, it would look terrible. This is easy to make happen on a simple web page, as in my original attachment, but it is easily seen in Apple Books as well. (I'll attach a screenshot of The Invisible Man that illustrates this.)
The Chicago Manual of Style also addresses the problem of line breaks directly (in 6.90): "In printed publications, line breaks should generally be made after an em dash but not before, in the manner of hyphens. In the case of a closing quotation mark (or any other mark of punctuation) immediately following the dash, however, the quotation mark and dash MUST NOT BE BROKEN AT THE END OF A LINE" [emphasis mine].
Brad Andalman
Comment 5 2022-07-15 18:53:24 PDT
Created attachment 460950
Apple Books showing em dash and quotation mark on its own line
Alexey Proskuryakov
Comment 6 2022-07-16 13:34:57 PDT
An author can implement the desired behavior with a zero width joiner (e.g. "sir‍—" for the attached test), among other ways. While the CSS spec is not fully prescriptive on exactly following UAX #14, it does reference it as the baseline. So WebKit is not wrong here, and given that Chrome behaves in the same way, keeping our current behavior is best for compatibility. https://drafts.csswg.org/css-text/#soft-wrap-opportunity
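A rough sketch of how an author might apply that workaround to existing text before it reaches the page. This is an illustration only: the helper name glue_em_dash is hypothetical, it handles only an em dash directly after a word character, and whether the joiner actually suppresses the break depends on the engine's line breaker.

```python
import re

ZWJ = "\u200d"      # ZERO WIDTH JOINER, as suggested in the comment above
EM_DASH = "\u2014"


def glue_em_dash(text):
    """Insert a ZWJ between a word character and an em dash that follows it.

    Em dashes preceded by a space or other non-word character are left alone.
    """
    return re.sub(r"(?<=\w)" + EM_DASH, ZWJ + EM_DASH, text)


glued = glue_em_dash("I thought I might\u2014")
print(ascii(glued))  # shows the escaped ZWJ just before the em dash
```

Because the ZWJ itself is not a word character, running the helper twice does not insert a second joiner.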
Myles C. Maxfield
Comment 7 2022-07-16 21:05:07 PDT
WebKit treats ICU as the source-of-truth for line breaking behavior. If you want this to be fixed, I recommend reporting this to the ICU project instead at https://unicode-org.atlassian.net/jira/software/c/projects/ICU/issues/?filter=allissues
Myles C. Maxfield
Comment 8 2022-07-16 21:06:00 PDT
> If that trailing EM DASH followed by a quotation mark were to end on its own line, it would look terrible.
I agree, but this needs to be fixed in ICU, not WebKit.
Brad Andalman
Comment 9 2022-07-18 10:20:44 PDT
Brad Andalman
Comment 10 2022-07-18 10:27:28 PDT
Thanks for helping me find the right place to report this!
Brent Fulgham
Comment 11 2022-07-18 11:54:36 PDT
Reclassifying as MOVED (as the bug is in the ICU component). The bug is not INVALID.
Myles C. Maxfield
Comment 12 2022-07-18 12:14:53 PDT
Thank you for refiling!
Brad Andalman
Comment 13 2022-07-19 10:56:11 PDT
I was informed that filing with the ICU wasn't correct, so I refiled it as an error against UAX#14. My comments have been added to PRI #446 for feedback: https://www.unicode.org/review/pri446/ Once again, thanks to everyone for helping me submit this to the right venue. I truly appreciate it!
zalan
Comment 14 2022-07-19 11:00:41 PDT
(In reply to Brad Andalman from comment #13)
> I was informed that filing with the ICU wasn't correct, so I refiled it as
> an error against UAX#14. My comments have been added to PRI #446 for
> feedback:
> https://www.unicode.org/review/pri446/
>
> Once again, thanks to everyone for helping me submit this to the right
> venue. I truly appreciate it!
Thank you for filing it! When the fix comes through both WebKit and Chrome will progress!
Myles C. Maxfield
Comment 15 2022-09-07 21:59:55 PDT
*** Bug 21677 has been marked as a duplicate of this bug. ***
Karl Dubost
Comment 16 2022-09-08 08:36:48 PDT