WebKit Bugzilla
NEW
262497
Inconsistency in line-breaking due to special-casing of ASCII
https://bugs.webkit.org/show_bug.cgi?id=262497
Jonathan Kew
Reported
2023-10-02 12:26:52 PDT
To reproduce:
(1) Load the example:
data:text/html;charset=utf-8,<p style="width:0">écouter/parler</p> <p style="width:0">parler/écouter</p>
in Safari or Safari Tech Preview.
(2) Observe the rendering.

Expected result: The line-breaking behavior of the two test paragraphs should be the same: either both of them should have a break after the slash, or neither of them.

Actual result: In the case "écouter/parler", no break occurs. In "parler/écouter", Safari breaks the line after the slash.
Darin Adler
Comment 1
2023-10-03 18:01:14 PDT
Not only do we want to be consistent with ourselves, we’d also like to be consistent with the CSS specification and other web browsers.
zalan
Comment 2
2023-10-03 20:14:34 PDT
nextBreakablePosition tells inline layout there's a soft wrap opportunity between '/' and 'é', and we happily break the content right there (given the 0 horizontal constraint).
zalan
Comment 3
2023-10-03 20:16:20 PDT
Chrome (119.0.6045.6 (Official Build) canary (arm64)) finds a breakable position too between '/' and 'é'.
zalan
Comment 4
2023-10-03 20:21:02 PDT
we generally consult ICU to find such soft wrap opportunities (gonna fire up the debugger to confirm it)
zalan
Comment 5
2023-10-04 09:44:48 PDT
According to https://www.unicode.org/reports/tr14/#SY, the SOLIDUS symbol allows line breaking _after_ it (except in certain cases).

parler/écouter <- due to the character 'é' we consult ICU, and ICU tells us there's a soft wrap opportunity _after_ the SOLIDUS.
écouter/parler <- we consider it a simple case (the characters are all within a certain range, see lineBreakTable in BreakLines.cpp), so we end up not consulting ICU and go with _no_ soft wrap opportunity after the SOLIDUS.

I think Blink has the same logic/map and that's why they produce the same result. (I am sure Myles has some insight here.)
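The fast-path/slow-path split described in Comment 5 can be illustrated with a toy model. This is a minimal sketch, not WebKit's code: `fastPathAllowsBreak`, `slowPathAllowsBreak`, `canBreakBetween`, and `breaksAfterSolidus` are hypothetical names, the "table" is reduced to "break only after a space", the slow path is a hand-written approximation of the UAX #14 SY behavior rather than a call into ICU, and the assumption that the path is chosen from the two characters adjacent to the candidate position is a simplification.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: true for code points in the ASCII range.
bool isAscii(char32_t c) { return c < 0x80; }

// Toy stand-in for the ASCII table lookup (assumption: only a space
// creates a soft wrap opportunity, so there is none after '/').
bool fastPathAllowsBreak(char32_t before, char32_t after)
{
    assert(isAscii(before) && isAscii(after));
    return before == U' ' && after != U' ';
}

// Toy stand-in for the ICU / UAX #14 breaker (assumption: SY allows a
// break after it unless a digit or a space follows).
bool slowPathAllowsBreak(char32_t before, char32_t after)
{
    if (before == U' ')
        return after != U' ';
    if (before == U'/')
        return after != U' ' && !(after >= U'0' && after <= U'9');
    return false;
}

// Picks the fast path only when both neighbors are ASCII.
bool canBreakBetween(char32_t before, char32_t after)
{
    if (isAscii(before) && isAscii(after))
        return fastPathAllowsBreak(before, after);
    return slowPathAllowsBreak(before, after);
}

// Reports whether the model allows a break right after the first '/'.
bool breaksAfterSolidus(const std::u32string& text)
{
    for (std::size_t i = 0; i + 1 < text.size(); ++i) {
        if (text[i] == U'/')
            return canBreakBetween(text[i], text[i + 1]);
    }
    return false;
}
```

In this model, `breaksAfterSolidus(U"parler/\u00E9couter")` takes the slow path and reports a break, while `breaksAfterSolidus(U"\u00E9couter/parler")` takes the fast path and reports none, which mirrors the inconsistency in the report.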
Darin Adler
Comment 6
2023-10-04 10:52:37 PDT
To fix this, it seems our BreakLines.cpp needs "line break after solidus" logic added to its simple-case subset.
Jonathan Kew
Comment 7
2023-10-04 13:39:46 PDT
Well, treatment of solidus is not the only discrepancy between the ICU behavior and your ASCII-only table. E.g.
data:text/html;charset=utf-8,<p%20style="width:0">a!a%20å!å

I suspect that *not* breaking between / and a following letter may be the more web-compatible option; note that this is historically what ~all browsers have done for plain English content, at least.

An alternative fix for the inconsistency would be to tailor the line-breaking class of the solidus (and probably a few other characters; exclamation appears to be one example) when using the ICU / UAX#14-based breaker, so as to get the (perhaps desired) non-breaking result, regardless of whether the letters involved are strictly ASCII or not.

My point is not necessarily that this particular break position should or shouldn't be allowed (UAX#14 says yes by default, but allows tailoring); my point is that the behavior should be consistent for all (Latin, at least) letters.
Darin Adler
Comment 8
2023-10-04 18:27:56 PDT
Helpful to point out that it might be important to retain the old behavior for practical web compatibility.

It’s a complication that in some cases we don’t want the behavior that comes from the UAX#14 breaker. This needs to be specified, presumably in CSS, and not decided by a discussion in a WebKit bug.

Obviously, we want our fast case code to match our slow case. I’m not sure saying "my point is" really helps.

To fix this bug and test thoroughly, someone needs a list of the known inconsistencies. I’m sorry: I thought you were pointing out the one you knew about, and didn’t realize that you expected someone else to research what others there were.

I think we should consider configuring our debug builds to always run both the fast and the slow break finding logic and log any inconsistencies, or maybe crash on inconsistency, so we can more easily spot this mistake over time. But we’ll still need to build test cases for all the ones we currently get wrong. I think we should contribute those test cases to WPT.
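The debug-build cross-check suggested in Comment 8 could look roughly like the sketch below. It is only an illustration under stated assumptions: `BreakPredicate`, `BreakMismatch`, `findBreakMismatches`, and `assertPathsAgree` are hypothetical names, and both break finders are modeled as predicates over a pair of adjacent code points, which is simpler than the real interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical signature: may the line break between `before` and `after`?
using BreakPredicate = std::function<bool(char32_t before, char32_t after)>;

// One position where the two break finders disagree.
struct BreakMismatch {
    std::size_t offset; // index of the character that would start the new line
    bool fastResult;
    bool slowResult;
};

// Runs both predicates over every adjacent pair and collects disagreements.
std::vector<BreakMismatch> findBreakMismatches(const std::u32string& text,
    const BreakPredicate& fastPath, const BreakPredicate& slowPath)
{
    std::vector<BreakMismatch> mismatches;
    for (std::size_t i = 0; i + 1 < text.size(); ++i) {
        bool fast = fastPath(text[i], text[i + 1]);
        bool slow = slowPath(text[i], text[i + 1]);
        if (fast != slow)
            mismatches.push_back({ i + 1, fast, slow });
    }
    return mismatches;
}

// "Crash on inconsistency" variant: the check compiles away when NDEBUG
// is defined, so it only fires in debug-style builds.
void assertPathsAgree(const std::u32string& text,
    const BreakPredicate& fastPath, const BreakPredicate& slowPath)
{
    assert(findBreakMismatches(text, fastPath, slowPath).empty());
}
```

Returning the mismatches rather than only asserting keeps the "log any inconsistencies" option open, and the collected offsets could also seed the list of test cases the comment asks for.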
zalan
Comment 9
2023-10-04 19:21:22 PDT
This is what CSS Text Module (lvl4) (https://www.w3.org/TR/css-text-4/#line-break-details) says about breaking around punctuation marks:

"UAs that allow wrapping at punctuation other than word separators in writing systems that use them should prioritize breakpoints. (For example, if breaks after slashes are given a lower priority than spaces, the sequence “check /etc” will never break between the "/" and the "e".) As long as care is taken to avoid such awkward breaks, allowing breaks at appropriate punctuation other than word separators is recommended, as it results in more even-looking margins, particularly in narrow measures"

and https://www.unicode.org/reports/tr14/#SY also says:

"As a side effect, some common abbreviations such as “w/o” or “A/S”, which normally would not be broken, acquire a line break opportunity. The recommendation in this case is for the layout system not to utilize a line break opportunity allowed by SY unless the distance between it and the next line break opportunity exceeds an implementation-defined minimal distance."
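The UAX #14 recommendation quoted above (ignore an SY opportunity unless the next opportunity is far enough away) can be sketched as a filter over a list of break offsets. This is a hedged illustration only: `BreakOpportunity`, `filterSolidusBreaks`, and the `minDistance` parameter are hypothetical, distances are counted in code units rather than measured widths, and treating the end of the text as the "next opportunity" after the last break is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A candidate break before the character at `offset`.
struct BreakOpportunity {
    std::size_t offset;
    bool fromSolidus; // true when the opportunity exists only because of SY
};

// Keeps an SY-derived opportunity only when the distance to the next
// opportunity (or to the end of the text) exceeds `minDistance`.
// Assumes `opportunities` is sorted by ascending offset.
std::vector<std::size_t> filterSolidusBreaks(
    const std::vector<BreakOpportunity>& opportunities,
    std::size_t textLength, std::size_t minDistance)
{
    std::vector<std::size_t> kept;
    for (std::size_t i = 0; i < opportunities.size(); ++i) {
        const BreakOpportunity& current = opportunities[i];
        std::size_t next = i + 1 < opportunities.size()
            ? opportunities[i + 1].offset
            : textLength;
        assert(current.offset <= next && next <= textLength);
        if (current.fromSolidus && next - current.offset <= minDistance)
            continue; // too close to the next opportunity: skip this break
        kept.push_back(current.offset);
    }
    return kept;
}
```

With a `minDistance` of 3, the opportunity after the slash in "w/o it" (offset 2, next opportunity at offset 4) is dropped, while the one in "parler/écouter" (offset 7, end of text at 14) is kept.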
Radar WebKit Bug Importer
Comment 10
2023-10-09 12:27:14 PDT
<rdar://problem/116694174>