Bug 166485 - CSS hyphens: auto should not work if lang="" is not declared
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: CSS
Version: Safari Technology Preview
Hardware: Unspecified Unspecified
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-12-26 15:13 PST by Simon Pieters
Modified: 2017-01-08 13:51 PST
CC List: 3 users

See Also:


Description Simon Pieters 2016-12-26 15:13:01 PST
Equivalent Chromium bug: https://bugs.chromium.org/p/chromium/issues/detail?id=676270

WebKit hyphenates text with 'hyphens: auto' when no language is declared. Firefox does not.

MDN says:

> Hyphenation rules are language-specific. In HTML, the language is determined by the lang attribute, and browsers will hyphenate only if this attribute is present and if an appropriate hyphenation dictionary is available.

Spec says:

> Correct automatic hyphenation requires a hyphenation resource appropriate to the language of the text being broken. The UA is therefore only required to automatically hyphenate text for which the content language is known and for which it has an appropriate hyphenation resource.
>
> Authors should correctly tag their content’s language (e.g. using the HTML lang attribute) in order to obtain correct automatic hyphenation. UAs may refuse to automatically hyphenate untagged content regardless of the hyphens property value.

https://drafts.csswg.org/css-text-3/#valdef-hyphens-auto

Now the spec doesn't forbid it, but I think the intent is that UAs should not hyphenate untagged content.

I don't know whether WebKit falls back to the system language when no language is declared, uses language-agnostic rules, or does something else (it does not seem to auto-detect English in my simple test). If hyphenation of untagged content should be automatic, then applying language detection seems more reliable than using the system language. But I think for now we should just disable it and tell Web developers to specify lang="" correctly.


Test case/demo:
http://software.hixie.ch/utilities/js/live-dom-viewer/saved/4761

<!DOCTYPE html>
<style> div { border:solid; width:150px; -webkit-hyphens:auto; hyphens:auto; } </style>
No lang
<div>Long words like implementation, initialization, realization, and hyphenation.</div>
lang=en-US
<div lang=en-US>Long words like implementation, initialization, realization, and hyphenation.</div>
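
The distinction the test case exercises (untagged versus tagged content) can be sketched in script. This is only an illustration of the proposed policy, not how WebKit or any other engine implements it. `declaredLanguage`, `mayAutoHyphenate`, and the `hasDictionaryFor` callback are hypothetical names; the sketch only looks at the `lang` attribute and ignores `xml:lang` and any document-level fallback such as a Content-Language header.

```javascript
// Hypothetical helper, not a browser or WebKit API.
// Walks from the node up through its ancestors and returns the first
// declared lang value, or null when no language is declared.
function declaredLanguage(node) {
  for (let el = node; el; el = el.parentElement) {
    const lang = el.getAttribute("lang");
    if (lang !== null) {
      // lang="" explicitly marks the language as unknown.
      return lang === "" ? null : lang;
    }
  }
  return null;
}

// Sketch of the policy proposed in this bug: hyphenate automatically
// only when a language is declared AND a hyphenation resource exists.
// hasDictionaryFor is an assumed callback supplied by the caller.
function mayAutoHyphenate(node, hasDictionaryFor) {
  const lang = declaredLanguage(node);
  return lang !== null && hasDictionaryFor(lang);
}
```

Under this policy the first div in the test case would not be hyphenated, while the second would be whenever an en-US hyphenation resource is available.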
Comment 1 Alexey Proskuryakov 2016-12-28 10:09:59 PST
This makes sense in principle, as system language doesn't necessarily match content language. But there are quite a few features that default to system language. Off the top of my head: default fonts and font fallback; quotes; spellchecker language; even default character encoding.

I think that preventing hyphenation when a language is not explicitly specified would be inconsistent and confusing.
Comment 2 Simon Pieters 2016-12-30 02:33:32 PST
I think those other defaults can also be problematic, especially for users who travel and use someone else's computer (or a public computer), or users who visit sites in different languages. I think there have also been some experiments in Gecko to move away from using the system language for at least the character encoding fallback.

WebKit prevents hyphenation for lang="unknownasdfasdf", which is already inconsistent with the other features you mention.

Maybe we should take this discussion to the CSSWG?
Comment 3 Myles C. Maxfield 2017-01-05 13:08:25 PST
I agree with Alexey. Removing hyphenation which used to "work" would be viewed by our users as a regression.
Comment 4 Simon Pieters 2017-01-08 13:51:04 PST
Spec issue opened: https://github.com/w3c/csswg-drafts/issues/869