Bug 76701

Summary: Use content-language from http-equiv to set document locale and font
Product: WebKit Reporter: Matt Falkenhagen <falken>
Component: Layout and RenderingAssignee: Matt Falkenhagen <falken>
Status: RESOLVED FIXED    
Severity: Normal CC: ap, bashi, darin, ian, jshin, macpherson, tkent, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Bug Depends on:    
Bug Blocks: 10874, 119055    
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Patch
none
Patch none

Description Matt Falkenhagen 2012-01-20 05:14:41 PST
We should map the HTTP content-language header to -webkit-locale for use in font selection, similar what bug 67586 did for HTML lang/xml:lang.
Comment 1 Matt Falkenhagen 2012-01-20 05:41:01 PST
Created attachment 123296 [details]
Patch
Comment 2 Matt Falkenhagen 2012-01-20 05:47:50 PST
I still need to add the pixel layout test to most platforms' Skipped file, but I was wondering how this approach looks. It seems inefficient to call recalcStyle in Document::setContentLanguage, but it's needed because styleForDocument is called before the content language is available.
Comment 3 Alexey Proskuryakov 2012-01-20 09:13:03 PST
Comment on attachment 123296 [details]
Patch

Could you add an actual http test with Content-Language header?

Also, would be nice to test subtags, as these use different formats in different specs (en_US, en-US etc.).
Comment 4 Kenichi Ishibashi 2012-01-22 18:21:14 PST
Comment on attachment 123296 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=123296&action=review

> LayoutTests/fast/text/content-language-mapped-to-webkit-locale.html:6
> +<script src="../js/resources/js-test-pre.js"></script>

IIRC, js-test-*.js is deprecated so it might be better not to use.

> LayoutTests/fast/text/international/content-language-font-selection.html:14
> +This is in Ahem font.

Is it possible to use reftest-style expectation here?
Comment 5 Kent Tamura 2012-01-22 18:31:24 PST
Comment on attachment 123296 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=123296&action=review

>> LayoutTests/fast/text/content-language-mapped-to-webkit-locale.html:6
>> +<script src="../js/resources/js-test-pre.js"></script>
> 
> IIRC, js-test-*.js is deprecated so it might be better not to use.

@bashi, you seem confused with script-tests deprecation outside of fast/js/.  Using js-test-*.js is nice.

> LayoutTests/fast/text/content-language-mapped-to-webkit-locale.html:15
> +function getLangOfNode(n) {
> +    e = document.getElementById(n);

We had better follow C++ style rule for naming.  The function name should be languageOfNode().
http://www.webkit.org/coding/coding-style.html#names-setter-getter

The argument name 'n' is not appropriate.  It should be 'id'.

'e' should be a local variable;  'var element'.
Or you can just remove the local variable and fold this expression in the next expression; getComputedStyle(document.getElementById(...

> LayoutTests/fast/text/content-language-mapped-to-webkit-locale.html:19
> +shouldBeEqualToString("getLangOfNode('x1')", "zh");
> +shouldBeEqualToString("getLangOfNode('m1')", "ar");

You should add more test cases as Alexey wrote.
Comment 6 Kenichi Ishibashi 2012-01-22 18:37:43 PST
(In reply to comment #5)
> @bashi, you seem confused with script-tests deprecation outside of fast/js/.  Using js-test-*.js is nice.

Thank you for a correction. @falken, please ignore my previous comment.
Comment 7 Matt Falkenhagen 2012-01-24 03:53:38 PST
Created attachment 123711 [details]
Patch
Comment 8 Matt Falkenhagen 2012-01-24 03:56:26 PST
Thanks for the review comments! I've added a new patch.

Regarding the HTTP Content-Language header, it looks like WebKit doesn't parse it at all. So I would like to tackle that in a separate patch if possible (bug 76892), and just use the content-language set by http-equiv for this patch.
Comment 9 Kenichi Ishibashi 2012-01-24 04:21:10 PST
Comment on attachment 123711 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=123711&action=review

Does this patch add port specific behavior? If not, you need not update Skipped files.

> LayoutTests/fast/text/international/content-language-font-selection.html:6
> +if (window.layoutTestController) {

nit: braces are not needed here (same as above).
Comment 10 Alexey Proskuryakov 2012-01-24 09:33:51 PST
Comment on attachment 123711 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=123711&action=review

> Source/WebCore/ChangeLog:3
> +        Use content-language to set document locale and font

Does this work in Firefox and/or IE?

> Source/WebCore/css/CSSStyleSelector.cpp:1238
> +    if (!document->contentLanguage().isEmpty())
> +        documentStyle->setLocale(document->contentLanguage());

What guarantees correct precedence of these hints? In tests, I see that a lang attribute on a node takes precedence. Are there any locale hints already implemented in WebKit for which precedence needs to be tested?

> Source/WebCore/css/CSSStyleSelector.cpp:1272
> -        const AtomicString& stdfont = settings->standardFontFamily();
> +        const AtomicString& stdfont = settings->standardFontFamily(fontDescription.script());

It's not a new name, but "stdfont" is of course not in accordance to WebKit coding style.

> Source/WebCore/dom/Document.cpp:1096
> +void Document::setContentLanguage(const String& language)

I think that this name should make it clear that we're parsing it in accordance to pragma rules. "This pragma is not exactly equivalent to the HTTP Content-Language header. [HTTP]"

HTML5 itself calls this "pragma set default language".

> Source/WebCore/dom/Document.cpp:1102
> +    size_t pos = strippedLanguage.find(' ');

Please check for all HTML whitespace characters, not just U+0020. I don't think that simplifyWhiteSpace() gets you exactly that, not to mention that it's unnecessarily slow to modify the string.

> Source/WebCore/dom/Document.cpp:1107
> +        recalcStyle(Force);

I don't know if it's OK to make this call here. We don't do that often in Document.cpp.

Could you please check with a layout&rendering expert?

> Source/WebCore/dom/Document.cpp:2715
> +        if (contentLanguage().isEmpty())
> +            setContentLanguage(content);

It may be a mistake in the spec that an empty or invalid Content-Language can be overwritten with a later pragma. Why the special case?

> Source/WebCore/dom/Document.cpp:2716
> +    } else if (equalIgnoringCase(equiv, "x-dns-prefetch-control"))

Unrelated to this patch, but I wonder if any other pragmas need to be respected only once. It can be important to not let attacker controlled content override these values.

> LayoutTests/fast/text/content-language-multiple-tags.html:4
> +<meta http-equiv="content-language" content="zh-CN"/>

These are all HTML files - why do you put the slash in the end? It's just ignored by the parser.

> LayoutTests/fast/text/content-language-multiple-tags.html:7
> +<link rel="stylesheet" href="../js/resources/js-test-style.css" />

Ditto.

> LayoutTests/fast/text/international/content-language-font-selection.html:4
> +<meta http-equiv="content-language" content="zh-tw"/>

It looks like all tests have a lower case "content-language" string. Please test that it's case insensitive.
Comment 11 Matt Falkenhagen 2012-01-25 22:08:58 PST
Comment on attachment 123711 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=123711&action=review

Thanks for the review, ap and bashi. I'll upload a new patch.

@bashi Some of the tests use overridePreference to set a per-script font setting, but this is supported by Chromium's DumpRenderTree only (bug 71110). So I added those tests to the Skipped files of the other platforms.

>> Source/WebCore/ChangeLog:3
>> +        Use content-language to set document locale and font
> 
> Does this work in Firefox and/or IE?

I don't know if Firefox and IE have something analogous to -webkit-locale, but they do use http-equiv content-language for matching the CSS :lang selector. (WebKit does also.) Firefox uses it for font selection, but surprisingly IE doesn't seem to.

>> Source/WebCore/css/CSSStyleSelector.cpp:1238
>> +        documentStyle->setLocale(document->contentLanguage());
> 
> What guarantees correct precedence of these hints? In tests, I see that a lang attribute on a node takes precedence. Are there any locale hints already implemented in WebKit for which precedence needs to be tested?

As far as I can tell, the only other hint used for setting locale is the -webkit-locale case in CSSStyleSelector::applyProperty. In the future we might use character encoding and application locale as locale hints, so we'd need those precedence tests then.

>> Source/WebCore/dom/Document.cpp:2715
>> +            setContentLanguage(content);
> 
> It may be a mistake in the spec that an empty or invalid Content-Language can be overwritten with a later pragma. Why the special case?

Hm, I don't know. I do notice that http-equiv Refresh also has the special case.

>> LayoutTests/fast/text/international/content-language-font-selection.html:4
>> +<meta http-equiv="content-language" content="zh-tw"/>
> 
> It looks like all tests have a lower case "content-language" string. Please test that it's case insensitive.

I'll add a layout test for this.
Comment 12 Matt Falkenhagen 2012-01-25 22:27:30 PST
Created attachment 124065 [details]
Patch
Comment 13 Alexey Proskuryakov 2012-01-25 22:31:12 PST
> I don't know if Firefox and IE have something analogous to -webkit-locale, but they do use http-equiv content-language for matching the CSS :lang selector. (WebKit does also.) 

OK. I suppose tests for that were added with an earlier patch?

This patch does two things - it fixes Content-Language parsing, and actually does what its title says.

Perhaps it would be better to test subtle aspects of Content-Language parsing with a test that actually works in other browsers. That way, we could confirm that HTML5 actually matches their behavior.
Comment 14 Matt Falkenhagen 2012-01-25 22:50:26 PST
(In reply to comment #13)
> > I don't know if Firefox and IE have something analogous to -webkit-locale, but they do use http-equiv content-language for matching the CSS :lang selector. (WebKit does also.) 
> 
> OK. I suppose tests for that were added with an earlier patch?

Yes, that functionality actually wasn't added recently but a few years ago: bug 9454 and http://trac.webkit.org/changeset/29336

> 
> This patch does two things - it fixes Content-Language parsing, and actually does what its title says.
> 
> Perhaps it would be better to test subtle aspects of Content-Language parsing with a test that actually works in other browsers. That way, we could confirm that HTML5 actually matches their behavior.

That's true. I'll test the other browsers.
Comment 15 Matt Falkenhagen 2012-01-26 01:07:07 PST
(In reply to comment #14)
> (In reply to comment #13)
> > Perhaps it would be better to test subtle aspects of Content-Language parsing with a test that actually works in other browsers. That way, we could confirm that HTML5 actually matches their behavior.
> 
> That's true. I'll test the other browsers.

I tested Firefox and IE. Basically, they don't follow the HTML5 spec:
- In case of multiple Content-Language pragmas, the final one wins, not the first.
- Firefox ignores leading and trailing whitespace but it's not the spec's algorithm of taking the first sequence of non-whitespace characters. It's unclear what IE does but it's not as per the spec and doesn't appear to be a verbatim match either.
- Firefox handles a comma-separated list of languages, so a :lang selector for any language in the list is matched.

Given that, perhaps it's not obvious what behavior we want for WebKit. So, I propose for this patch to keep the parsing as it is and just do what it says in the bug title. I'll upload a new patch.
Comment 16 Matt Falkenhagen 2012-01-26 02:49:02 PST
Created attachment 124085 [details]
Patch
Comment 17 Matt Falkenhagen 2012-01-26 02:55:30 PST
Darin, can you please review this patch and particularly the call to recalcStyle in Document::setContentLanguage? It's needed because styleForDocument is called before the content language is available.
Comment 18 Alexey Proskuryakov 2012-01-26 09:08:22 PST
> Given that, perhaps it's not obvious what behavior we want for WebKit. So, I propose for this patch to keep the parsing as it is and just do what it says in the bug title. I'll upload a new patch.

Makes sense to me. Would you be willing to e-mail WHATWG or W3C HTML list for clarification, and for possible spec changes?
Comment 19 Darin Adler 2012-01-26 09:24:31 PST
Comment on attachment 124085 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=124085&action=review

> Source/WebCore/dom/Document.cpp:1099
> +    m_contentLanguage = language;
> +    recalcStyle(Force);

Since recalcStyle is a heavyweight operation, there are a few things I suggest doing:

1) If m_contentLanguage is already equal to the new value for language, return without doing any further .

2) Your comment in the bug explained why this call to recalcStyle is needed. The code needs a similar comment.

3) Investigate the idiom further. This is almost certainly wrong. Synchronously calculating style happens in a few places, but none of them make clear why the normal model does not apply. The normal model is that style is calculated as part of the painting or hit testing process or as part of an operation that asks about style. Changes invalidate style, but don’t synchronously recalculate it. For example, Document::setVisuallyOrdered does not call recalcStyle(Force). Document::setDesignMode calls scheduleForcedStyleRecalc rather than actually calling recalcStyle. Changing preferences that affect default fonts calls styleSelectorChanged(DeferRecalcStyle).
Comment 20 Matt Falkenhagen 2012-01-30 02:36:18 PST
(In reply to comment #18)
> > Given that, perhaps it's not obvious what behavior we want for WebKit. So, I propose for this patch to keep the parsing as it is and just do what it says in the bug title. I'll upload a new patch.
> 
> Makes sense to me. Would you be willing to e-mail WHATWG or W3C HTML list for clarification, and for possible spec changes?

Sure, I can look into this.
Comment 21 Matt Falkenhagen 2012-01-30 03:50:52 PST
Comment on attachment 124085 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=124085&action=review

>> Source/WebCore/dom/Document.cpp:1099
>> +    recalcStyle(Force);
> 
> Since recalcStyle is a heavyweight operation, there are a few things I suggest doing:
> 
> 1) If m_contentLanguage is already equal to the new value for language, return without doing any further .
> 
> 2) Your comment in the bug explained why this call to recalcStyle is needed. The code needs a similar comment.
> 
> 3) Investigate the idiom further. This is almost certainly wrong. Synchronously calculating style happens in a few places, but none of them make clear why the normal model does not apply. The normal model is that style is calculated as part of the painting or hit testing process or as part of an operation that asks about style. Changes invalidate style, but don’t synchronously recalculate it. For example, Document::setVisuallyOrdered does not call recalcStyle(Force). Document::setDesignMode calls scheduleForcedStyleRecalc rather than actually calling recalcStyle. Changing preferences that affect default fonts calls styleSelectorChanged(DeferRecalcStyle).

Thanks for the suggestions. I investigated further and I think calling styleSelectorChanged is the best way. Oddly, not changing setContentLanguage at all works when testing manually; styleSelectorChanged is called sometime after content language is set, resulting in a call to styleForDocument, but when running the layout tests, one test (content-language-with-subtags.html) kept failing as flaky. Calling styleSelectorChanged(DeferRecalcStyle) seems to stop the flakiness and also doesn't result in an extra call to styleForDocument when testing manually, as recalcStyle(Force) does.
Comment 22 Matt Falkenhagen 2012-01-30 03:55:50 PST
Created attachment 124519 [details]
Patch
Comment 23 Matt Falkenhagen 2012-02-01 20:46:48 PST
Darin, could you please review the latest patch? Thanks!
Comment 24 Darin Adler 2012-02-02 12:21:52 PST
Comment on attachment 124519 [details]
Patch

I don’t see any test cases for setting the language to something bad, like the empty string, or a string with just whitespace in it. I think we need to cover that.

Also, I don’t see anything here about handling dynamic changes to meta elements. Is that allowed? What is it supposed to do? Also, what about when it’s later in the document, not right at the start?
Comment 25 Matt Falkenhagen 2012-02-02 18:31:36 PST
Thanks!

> I don’t see any test cases for setting the language to something bad, like the empty string, or a string with just whitespace in it. I think we need to cover that.

True, I had those test cases but removed them after realizing it wasn't clear how we wanted to parse them (comment #15). I'll see if I can add test cases to show it at least does something sane for now.

> 
> Also, I don’t see anything here about handling dynamic changes to meta elements. Is that allowed? What is it supposed to do? Also, what about when it’s later in the document, not right at the start?

Thanks for the ideas, I'll look into this.
Comment 26 Matt Falkenhagen 2012-02-02 22:40:04 PST
Comment on attachment 124519 [details]
Patch

I started looking into it, but now think I'd like to commit the patch as is and work on improving the parsing and handling dynamic changes in a subsequent patch. This patch doesn't change the parsing we already had for content-language and is just using it to set -webkit-locale and the font (comment #15).

I'll open a new bug for that work.
Comment 27 WebKit Review Bot 2012-02-02 23:30:36 PST
Comment on attachment 124519 [details]
Patch

Clearing flags on attachment: 124519

Committed r106632: <http://trac.webkit.org/changeset/106632>
Comment 28 WebKit Review Bot 2012-02-02 23:30:43 PST
All reviewed patches have been landed.  Closing bug.
Comment 29 Matt Falkenhagen 2012-02-03 01:45:21 PST
Created bug 77724 for subtle content-language parsing behavior.
Comment 30 Ian 'Hixie' Hickson 2012-08-27 12:37:07 PDT
(In reply to comment #15)
> - In case of multiple Content-Language pragmas, the final one wins, not the first.

I've updated the spec to match this.


> - Firefox ignores leading and trailing whitespace but it's not the spec's algorithm of taking the first sequence of non-whitespace characters. It's unclear what IE does but it's not as per the spec and doesn't appear to be a verbatim match either.

IE and WebKit both just treat the value verbatim.


> - Firefox handles a comma-separated list of languages, so a :lang selector for any language in the list is matched.

Haven't changed the spec for this as Firefox is the only one who supports multiple languages and it doesn't really make any sense or match any specs (not even the HTTP analogue).


Please feel free to file bugs on the spec if you want further changes.