Bug 67586

Summary: Map 'lang' and xml:lang attributes to '-webkit-locale' CSS property for use with font fallback and text-transform
Product: WebKit Reporter: Jungshik Shin <jshin>
Component: Layout and RenderingAssignee: Jungshik Shin <jshin>
Status: RESOLVED FIXED    
Severity: Normal CC: ap, darin, dglazkov, eric, falken, macpherson, mitz, nickshanks, simon.fraser, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Bug Depends on:    
Bug Blocks: 10874, 21312    
Attachments:
Description Flags
patch
none
patch updated
webkit.review.bot: commit-queue-
same as the previous
none
same as the previous with bug # in layout test updated to this bug
none
rebase
none
revised patch
none
incorporated review comments none

Description Jungshik Shin 2011-09-04 23:13:50 PDT
Spun off bug 16801 where I tried to solve the problem in the summary line while improving the performance of 'lang/xml:lang' retrieval (for pseudo-lang selector) at the same time. I came across a problem there and this issue is more urgent. So, I'm fileing this bug separately. 

By fixing this bug separately, we can speed up fixing bug 10874.
Comment 1 Jungshik Shin 2011-09-04 23:52:34 PDT
Created attachment 106311 [details]
patch
Comment 2 WebKit Review Bot 2011-09-04 23:54:30 PDT
Attachment 106311 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'LayoutTests/ChangeLog', u'LayoutTests/fast..." exit_code: 1

LayoutTests/ChangeLog:8:  Line contains tab character.  [whitespace/tab] [5]
LayoutTests/ChangeLog:9:  Line contains tab character.  [whitespace/tab] [5]
LayoutTests/ChangeLog:10:  Line contains tab character.  [whitespace/tab] [5]
LayoutTests/ChangeLog:11:  Line contains tab character.  [whitespace/tab] [5]
Source/WebCore/ChangeLog:8:  Line contains tab character.  [whitespace/tab] [5]
Source/WebCore/ChangeLog:9:  Line contains tab character.  [whitespace/tab] [5]
Source/WebCore/ChangeLog:10:  Line contains tab character.  [whitespace/tab] [5]
Source/WebCore/ChangeLog:11:  Line contains tab character.  [whitespace/tab] [5]
Source/WebCore/html/HTMLElement.cpp:205:  Should have a space between // and comment  [whitespace/comments] [4]
Total errors found: 9 in 5 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 3 Alexey Proskuryakov 2011-09-05 00:43:54 PDT
Did you have a chance to talk to a CSS expert about why the proper patch didn't work?
Comment 4 Jungshik Shin 2011-09-06 10:15:53 PDT
Created attachment 106437 [details]
patch updated
Comment 5 WebKit Review Bot 2011-09-06 10:17:29 PDT
Attachment 106437 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/update-webkit', '--chromium']" exit_code: 2

Last 3072 characters of output:
 media
    url: http://src.chromium.org/svn/trunk/src/media@99393
    parsed_url: http://src.chromium.org/svn/trunk/src/media@99393
    should_process: True
    processed: True
    requirements: set(['./'])
  
    name: build
    url: http://src.chromium.org/svn/trunk/src/build@99393
    parsed_url: http://src.chromium.org/svn/trunk/src/build@99393
    should_process: True
    requirements: set(['./'])
  
    name: net
    url: http://src.chromium.org/svn/trunk/src/net@99393
    should_process: True
    requirements: set(['./'])
  
    name: tools/data_pack
    url: http://src.chromium.org/svn/trunk/src/tools/data_pack@99393
    should_process: True
    requirements: set(['./'])
  
    name: third_party/ffmpeg
    url: From('chromium_deps', 'src/third_party/ffmpeg')
    should_process: True
    requirements: set(['third_party', 'chromium_deps', './'])
  
    name: tools/generate_stubs
    url: http://src.chromium.org/svn/trunk/src/tools/generate_stubs@99393
    should_process: True
    requirements: set(['./'])
  
    name: third_party/libjpeg_turbo
    url: From('chromium_deps', 'src/third_party/libjpeg_turbo')
    should_process: True
    requirements: set(['third_party', 'chromium_deps', './'])
  
    name: third_party/v8-i18n
    url: From('chromium_deps', 'src/third_party/v8-i18n')
    should_process: True
    requirements: set(['third_party', 'chromium_deps', './'])
  
    name: tools/grit
    url: http://src.chromium.org/svn/trunk/src/tools/grit@99393
    should_process: True
    requirements: set(['./'])
  
    name: base
    url: http://src.chromium.org/svn/trunk/src/base@99393
    should_process: True
    requirements: set(['./'])
  
    name: sql
    url: http://src.chromium.org/svn/trunk/src/sql@99393
    should_process: True
    requirements: set(['./'])
  
    name: v8
    url: From('chromium_deps', 'src/v8')
    should_process: True
    requirements: set(['chromium_deps', './'])
  
    name: testing/gtest
    url: From('chromium_deps', 'src/testing/gtest')
    should_process: True
    requirements: set(['testing', 'chromium_deps', './'])
  
    name: third_party/libwebp
    url: http://src.chromium.org/svn/trunk/src/third_party/libwebp@99393
    should_process: True
    requirements: set(['third_party', './'])
  
    name: googleurl
    url: From('chromium_deps', 'src/googleurl')
    should_process: True
    requirements: set(['chromium_deps', './'])
  
    name: third_party/skia/include
    url: From('chromium_deps', 'src/third_party/skia/include')
    should_process: True
    requirements: set(['third_party', 'chromium_deps', './'])
  
    name: third_party/ots
    url: From('chromium_deps', 'src/third_party/ots')
    should_process: True
    requirements: set(['third_party', 'chromium_deps', './'])
  
    name: third_party/snappy/src
    url: From('chromium_deps', 'src/third_party/snappy/src')
    should_process: True
    requirements: set(['third_party', 'chromium_deps', './'])

Died at Tools/Scripts/update-webkit-chromium line 80.
No such file or directory at Tools/Scripts/update-webkit line 104.


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 6 WebKit Review Bot 2011-09-06 10:20:46 PDT
Comment on attachment 106437 [details]
patch updated

Attachment 106437 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/9601038
Comment 7 WebKit Review Bot 2011-09-06 10:24:16 PDT
Comment on attachment 106437 [details]
patch updated

Attachment 106437 [details] did not pass cr-mac-ews (chromium):
Output: http://queues.webkit.org/results/9592912
Comment 8 Jungshik Shin 2011-09-07 14:50:21 PDT
Created attachment 106645 [details]
same as the previous 

cr-* bots were broken at the time of testing yesterday. Trying again.
Comment 9 Jungshik Shin 2011-09-08 10:16:28 PDT
Created attachment 106756 [details]
same as the previous with bug # in layout test updated to this bug

cr-linux-EWS is failing with or without this patch. Other EWS bots are returning green. This patch is the same as the previous except for the bug number in the layout test is updated along with the description of the bug/test in the layout test (html and expected result). 

So, I'm asking for review.
Comment 10 Alexey Proskuryakov 2011-09-08 10:58:03 PDT
My question still stands:

> Did you have a chance to talk to a CSS expert about why the proper patch didn't work?
Comment 11 Jungshik Shin 2011-09-08 15:24:06 PDT
Ooops. I missed your comment #3. 

By proper patch, do you mean my patch attached to bug 16801? 

 I don't think my patch here is less proper than the patch there. The patch there does more things than this patch. I narrowed the scope of the work to mapping lang/xml:lang to webkit-locale.  This alone is useful for utilizing 'lang/xml:lang' in font fallback because that directly references webkit-locale instead of calling 'computeInheritedLanguage'.  

'computeInheritedLanguage' is called to deal with 'pseudo-lang' selector. And, bug 16801 is about speeding up that function. Anyway, I'll ask CSS experts (mitz or dhyatt ?) about that, but I think this patch can be reviewed independently of that. 

Hmm, perhaps when dealing with pseudo-lang selector, we might be able to get the computed value of '-webkit-locale' directly as well. Then, we don't have to worry about speeding up computeInheritedLanguage. To do that, we have to map 'content-language' to 'webkit-locale' in another place. 

Anyway, that can be done in bug 16801 and I believe the patch here is useful / necessary by itself.
Comment 12 Jungshik Shin 2011-11-14 16:47:54 PST
mitz, can you review this patch?  Thank you !!
Comment 13 Matt Falkenhagen 2011-12-07 23:09:34 PST
Created attachment 118334 [details]
rebase
Comment 14 Matt Falkenhagen 2011-12-07 23:14:48 PST
I just rebased as the previous patch was a few months old, but no changes were needed.

Could someone please review?

This is needed to allow font selection based on language (see comments on bug 10874, particularly Alexey's comment #22).
Comment 15 Darin Adler 2011-12-08 10:24:41 PST
Comment on attachment 118334 [details]
rebase

View in context: https://bugs.webkit.org/attachment.cgi?id=118334&action=review

> Source/WebCore/html/HTMLElement.cpp:197
> +        // FIXME: this does not work when xml:lang is present without lang.

We use sentence style and so should capitalize the word "this".

Why should we write the code knowing it does not handle xml:lang properly? It's trivial to write code that covers xml:lang too, so lets do that instead of landing it wrong with the FIXME. I believe we just need to put a case for XMLNames::langAttr into this parseMappedAttribute function.

> Source/WebCore/html/HTMLElement.cpp:203
> +        if (hasAttribute(XMLNames::langAttr))
> +            value = getAttribute(XMLNames::langAttr);
> +        else
> +            value = attr->value();

This is the wrong idiom to use.

- First, we should call fastGetAttribute and not getAttribute.
- Second, we should check for presence of the attribute by checking the value for null, not by a separate call to hasAttribute.
- Third, this code is called only when HTMLNames::langAttr is changed. We should have code called when XMLNames::langAttr is changed; this entire function should do nothing at all if an XMLNames::langAttr is present. (See comment above about covering xml:lang instead of doing a FIXME).

> Source/WebCore/html/HTMLElement.cpp:205
> +        if (value.length()) {

Better style is to use isEmpty.

> Source/WebCore/html/HTMLElement.cpp:211
> +            // Have to enclose with a pair of quotation marks to get the
> +            // locale id treated as a string instead of as a CSS keyword.
> +            DEFINE_STATIC_LOCAL(String, doubleQuoteChar, ("\""));
> +            value.insert(doubleQuoteChar, 0);
> +            value.append(doubleQuoteChar);
> +            addCSSProperty(attr, CSSPropertyWebkitLocale, value);

I am extremely surprised that quoting is the correct way to do this or only way to do so. Please double check there is no better solution. If we do need to quote there are a few things wrong:

1) This is unnecessarily inefficient, allocating two intermediate strings. It's more efficient to just write "\"" + value + "\"", since it will do both insertions at once.
2) This does not handle strings with quote marks in them correctly.
3) There is a function named quoteCSSString in CSSParser.h that we could probably use for this that would do it correctly.
Comment 16 Matt Falkenhagen 2011-12-15 02:31:53 PST
(In reply to comment #15)

Thanks for the review!!

> Why should we write the code knowing it does not handle xml:lang properly? It's trivial to write code that covers xml:lang too, so lets do that instead of landing it wrong with the FIXME. I believe we just need to put a case for XMLNames::langAttr into this parseMappedAttribute function.

I think there may be a bug preventing this from working.

If I try:
    if (attr->name().matches(XMLNames::langAttr) { ... }

in this function, it fails to match "xml:lang" of, e.g.,  <div xml:lang='en'>. 

The reason is that QualifiedName::matches is:
    return m_impl == other.m_impl || (localName() == other.localName() && namespaceURI() == other.namespaceURI());

In this case, "xml:lang" has localName="xml:lang" and namespaceURI="", and XMLNames::langAttr has localName="lang" and namespaceURI="http://www.w3.org/XML/1998/namespace"

It seems the xml:lang attribute wasn't created correctly, or we need some additional logic so it can be matched with XMLNames::langAttr.  Or am I missing something here?
Comment 17 Alexey Proskuryakov 2011-12-15 08:50:08 PST
Were you testing HTML by chance? I don't think that we want xml:lang work in HTML documents, just XML ones.
Comment 18 Darin Adler 2011-12-15 08:54:11 PST
(In reply to comment #16)
> > Why should we write the code knowing it does not handle xml:lang properly? It's trivial to write code that covers xml:lang too, so lets do that instead of landing it wrong with the FIXME. I believe we just need to put a case for XMLNames::langAttr into this parseMappedAttribute function.
> 
> I think there may be a bug preventing this from working.
> 
> If I try:
>     if (attr->name().matches(XMLNames::langAttr) { ... }
> 
> in this function, it fails to match "xml:lang" of, e.g.,  <div xml:lang='en'>.

It looks like the problem you are seeing is that the HTML parser is *not* supposed to parse the xml:lang attribute as the one from the XML namespace when parsing an HTML token. You can see this in the HTML specification section about the construction. The step that adds the appropriate namespace is called “adjust foreign attributes” and is not triggered unless MathML or SVG is involved.

So that means that <div xml:lang='en'> does not have an XML lang attribute. Even though it looks like it does. According to the HTML standard.

Before we start writing code, we need to make sure we know what we want to implement. Maybe the HTML standard is wrong? Maybe you need a test case involving an SVG or MathML element instead?
Comment 19 Matt Falkenhagen 2011-12-16 03:14:19 PST
Created attachment 119591 [details]
revised patch
Comment 20 Matt Falkenhagen 2011-12-16 03:16:22 PST
I've uploaded a revised patch.

Regarding xml:lang, I was indeed testing on HTML instead of XML. I hadn't realized it was not supposed to work on HTML. When I test on Jungshik's xhtml test case, it works.

Regarding quoting the value, I don't see another way to do it. I looked at other mapped attributes but couldn't find any with this issue, and quoting seems to be what mitz@ recommended in https://bugs.webkit.org/show_bug.cgi?id=16801#c10
Comment 21 Darin Adler 2011-12-16 10:18:33 PST
Comment on attachment 119591 [details]
revised patch

View in context: https://bugs.webkit.org/attachment.cgi?id=119591&action=review

> Source/WebCore/html/HTMLElement.cpp:173
> +void HTMLElement::mapLanguageAttributeToLocale(Attribute* attr)

In new code like this, please use words rather than abbreviations. So “attribute” rather than “attr”.

> Source/WebCore/html/HTMLElement.cpp:176
> +    String value = attr->value();

The local variable should be const AtomicString& instead of String to avoid unnecessary reference count churn.

> Source/WebCore/html/HTMLElement.cpp:179
> +        // Have to quote so the locale id is treated as a string instead of as a
> +        // CSS keyword.

Should keep this comment all on one line.

> Source/WebCore/html/HTMLElement.cpp:182
> +        // Empty 'lang' should be treated as 'auto'.

Is it really correct to special case both the lack of an attribute (value of null) and the empty string (value is empty), but not, say, a string that is entirely whitespace, other invalid strings, or the word "auto"? I’d like to see tests covering these cases and also see specification wording that makes it clear that an empty string should work that way.

> Source/WebCore/html/HTMLElement.cpp:213
> +    } else if (attr->name().matches(XMLNames::langAttr)) {
> +        mapLanguageAttributeToLocale(attr);

Is using matches instead of == here correct behavior? Should an attribute named xml:lang that is not actually in the XML namespace be respected? I don’t want to land this patch without determining this. I see no test cases covering this since the only test case is XHTML. And no information about why it’s correct to ignore the namespace, citations of say, other browsers’ behavior or a reference to relevant text in a standard or specification.

Also, this code will not correctly handle the case where xml:lang is being removed and we already have a lang attribute. We need a test case that covers that.
Comment 22 Eric Seidel (no email) 2011-12-19 08:55:42 PST
Comment on attachment 119591 [details]
revised patch

View in context: https://bugs.webkit.org/attachment.cgi?id=119591&action=review

>> Source/WebCore/html/HTMLElement.cpp:182
>> +        // Empty 'lang' should be treated as 'auto'.
> 
> Is it really correct to special case both the lack of an attribute (value of null) and the empty string (value is empty), but not, say, a string that is entirely whitespace, other invalid strings, or the word "auto"? I’d like to see tests covering these cases and also see specification wording that makes it clear that an empty string should work that way.

The rules for xml:lang in HTML are pretty nutty it seems:
http://www.whatwg.org/specs/web-apps/current-work/#attr-xml-lang

it's "supported", but only if there is also a matching lang= attribute on the same node... which seems to me to be a backwards way of saying xml:lang is not supported in HTML.
Comment 23 Darin Adler 2011-12-19 11:51:12 PST
(In reply to comment #22)
> The rules for xml:lang in HTML are pretty nutty it seems:
> http://www.whatwg.org/specs/web-apps/current-work/#attr-xml-lang
> 
> it's "supported", but only if there is also a matching lang= attribute on the same node... which seems to me to be a backwards way of saying xml:lang is not supported in HTML.

Eric, I think what’s confusing you is that most of those are rules for authors, not web engines.

Turns out the specification is clear. It says: ‘The attribute in no namespace with no prefix and with the literal localname "xml:lang" has no effect on language processing.’

That means this patch is incorrectly. Only the actual xml:lang attribute in the real XML namespace should have an effect, so specifying xml:lang in an HTML document should not have an effect.
Comment 24 Darin Adler 2011-12-19 11:52:03 PST
Comment on attachment 119591 [details]
revised patch

View in context: https://bugs.webkit.org/attachment.cgi?id=119591&action=review

> Source/WebCore/html/HTMLElement.cpp:175
> +    ASSERT(attr && (attr->name() == langAttr || attr->name().matches(XMLNames::langAttr)));

Having re-read the specification, it’s clear now that this needs to be attr->name() == XMLNames::langAttr, not matches.

>> Source/WebCore/html/HTMLElement.cpp:213
>> +    } else if (attr->name().matches(XMLNames::langAttr)) {
>> +        mapLanguageAttributeToLocale(attr);
> 
> Is using matches instead of == here correct behavior? Should an attribute named xml:lang that is not actually in the XML namespace be respected? I don’t want to land this patch without determining this. I see no test cases covering this since the only test case is XHTML. And no information about why it’s correct to ignore the namespace, citations of say, other browsers’ behavior or a reference to relevant text in a standard or specification.
> 
> Also, this code will not correctly handle the case where xml:lang is being removed and we already have a lang attribute. We need a test case that covers that.

Having re-read the specification, it’s clear now that this needs to be attr->name() == XMLNames::langAttr, not matches.
Comment 25 Matt Falkenhagen 2011-12-20 02:07:08 PST
Comment on attachment 119591 [details]
revised patch

View in context: https://bugs.webkit.org/attachment.cgi?id=119591&action=review

Darin, thank you for the thorough review and comments. I'll upload a revised patch.

>>> Source/WebCore/html/HTMLElement.cpp:182
>>> +        // Empty 'lang' should be treated as 'auto'.
>> 
>> Is it really correct to special case both the lack of an attribute (value of null) and the empty string (value is empty), but not, say, a string that is entirely whitespace, other invalid strings, or the word "auto"? I’d like to see tests covering these cases and also see specification wording that makes it clear that an empty string should work that way.
> 
> The rules for xml:lang in HTML are pretty nutty it seems:
> http://www.whatwg.org/specs/web-apps/current-work/#attr-xml-lang
> 
> it's "supported", but only if there is also a matching lang= attribute on the same node... which seems to me to be a backwards way of saying xml:lang is not supported in HTML.

The standard says for HTML lang, "Its value must be a valid BCP 47 language tag, or the empty string. Setting the attribute to the empty string indicates that the primary language is unknown." The XML standard says the same for xml:lang. So I think the empty string the only thing we want to special case.

I'll upload a new patch with the test cases you mentioned, although I'm not sure what behavior we want for some of them. I thought values should generally be mapped verbatim, but when I tested, invalid strings get mapped single-quoted, e.g., lang="  " becomes "'  '" in getComputedStyle().webkitLocale, and "][;][[)" becomes "'][;][[)'". I notice the same quoting happens if I set --webkit-locale to such a value directly in HTML. I suspect it is some thing to do with what are allowable CSS strings; if that's true it seems like reasonable behavior. If I look at the values in CSSStyleSelector::applyProperty, they don't have the single-quotes.

Unrecognized language tags like "xyzzy" get mapped to "xyzzy" which is what we want; the specification says such tags "must be treated as an unknown language having the given language tag, distinct from all other languages."

lang="auto" and lang="" both get mapped to the string "auto" in JavaScript, although the values are distinct in CSSStyleSelector::applyProperty (one is the string "auto" and the other is CSSValueAuto). So in the current layout test that just looks at computed style in JavaScript, they would have the same expectation. A better test may be to do something like set per-script fonts for script "auto" and show those are used for lang="auto" and not for lang="", but our per-script font settings don't support this since they are restricted to known script codes. I'm not sure we can test this properly.

>>> Source/WebCore/html/HTMLElement.cpp:213
>>> +        mapLanguageAttributeToLocale(attr);
>> 
>> Is using matches instead of == here correct behavior? Should an attribute named xml:lang that is not actually in the XML namespace be respected? I don’t want to land this patch without determining this. I see no test cases covering this since the only test case is XHTML. And no information about why it’s correct to ignore the namespace, citations of say, other browsers’ behavior or a reference to relevant text in a standard or specification.
>> 
>> Also, this code will not correctly handle the case where xml:lang is being removed and we already have a lang attribute. We need a test case that covers that.
> 
> Having re-read the specification, it’s clear now that this needs to be attr->name() == XMLNames::langAttr, not matches.

Maybe I'm misunderstanding. matches does look at the namespace:

bool matches(const QualifiedName& other) const { return m_impl == other.m_impl || (localName() == other.localName() && namespaceURI() == other.namespaceURI()); }

Using matches, xml:lang is ignored in HTML since the namespace differs from that of XMLNames::lang. But if I use ==, xml:lang is not recognized even in XHTML. It is because xml:lang has prefix "xml" and XMLNames::lang has no prefix, although they have the same localname and namespace.

I'm not sure what is meant by "the case where xml:lang is being removed and we already have a lang attribute". If it's something like:

<div lang="ar">
  <div xml:lang="">

I believe the correct behavior is to map to CSSValueAuto as normal. The spec says "If these attributes are omitted from an element, then the language of this element is the same as the language of its parent element, if any." which seems to mean lang information should be inherited only if both lang and xml:lang are not present. If that's correct, the code is doing the correct thing. I'll also add a test case for this.
Comment 26 Matt Falkenhagen 2011-12-20 02:10:41 PST
Created attachment 120000 [details]
incorporated review comments
Comment 27 Darin Adler 2011-12-20 11:45:53 PST
Comment on attachment 120000 [details]
incorporated review comments

View in context: https://bugs.webkit.org/attachment.cgi?id=120000&action=review

> Source/WebCore/html/HTMLElement.cpp:211
> +    } else if (attr->name().matches(XMLNames::langAttr)) {

Which of the test cases fails if you change this from matches to ==? I need to know that to be sure this code is correct.
Comment 28 Matt Falkenhagen 2011-12-20 19:19:42 PST
Comment on attachment 120000 [details]
incorporated review comments

View in context: https://bugs.webkit.org/attachment.cgi?id=120000&action=review

>> Source/WebCore/html/HTMLElement.cpp:211
>> +    } else if (attr->name().matches(XMLNames::langAttr)) {
> 
> Which of the test cases fails if you change this from matches to ==? I need to know that to be sure this code is correct.

In lang-mapped-to-webkit-locale.xhtml, all of the test cases with an "xml:lang" (and without a "lang" with the same value) fail. For example:

<div xml:lang="fr" id="n1">
FAIL getLangOfNode('n1') should be fr. Was auto.

Full list of failures:
FAIL getLangOfNode('x1') should be ja. Was auto.
FAIL getLangOfNode('x2') should be ja. Was auto.
FAIL getLangOfNode('x3') should be ja. Was auto.
FAIL getLangOfNode('n1') should be fr. Was auto.
FAIL getLangOfNode('n2') should be fr. Was auto.
FAIL getLangOfNode('p1') should be ja. Was auto.
FAIL getLangOfNode('q3') should be auto. Was ja.
FAIL getLangOfNode('q4') should be ar. Was ja.
FAIL getLangOfNode('q5') should be auto. Was ja.
Comment 29 Darin Adler 2011-12-21 11:30:24 PST
(In reply to comment #28)
> In lang-mapped-to-webkit-locale.xhtml, all of the test cases with an "xml:lang" (and without a "lang" with the same value) fail. For example:
> 
> <div xml:lang="fr" id="n1">
> FAIL getLangOfNode('n1') should be fr. Was auto.
> 
> Full list of failures:
> FAIL getLangOfNode('x1') should be ja. Was auto.
> FAIL getLangOfNode('x2') should be ja. Was auto.
> FAIL getLangOfNode('x3') should be ja. Was auto.
> FAIL getLangOfNode('n1') should be fr. Was auto.
> FAIL getLangOfNode('n2') should be fr. Was auto.
> FAIL getLangOfNode('p1') should be ja. Was auto.
> FAIL getLangOfNode('q3') should be auto. Was ja.
> FAIL getLangOfNode('q4') should be ar. Was ja.
> FAIL getLangOfNode('q5') should be auto. Was ja.

The question now is whether that test is correct. Does the test properly set up the xml prefix to correspond to the XML namespace? One way to check is to find another browser that’s known to implement this correctly. Another is to read the specification and find the place where it explains why it’s supposed to already be set up that way without doing anything explicit.

I’ll try to spend some researching this myself later in case no one else has time to.
Comment 30 Jungshik Shin 2011-12-21 12:07:33 PST
Darin, thank you for the review and getting this move forward. I missed your review comments earlier, but thanks again for the review. 
 
(In reply to comment #29)
> (In reply to comment #28)
> > In lang-mapped-to-webkit-locale.xhtml, all of the test cases with an "xml:lang" (and without a "lang" with the same value) fail. For example:
> > 
> > <div xml:lang="fr" id="n1">
> > FAIL getLangOfNode('n1') should be fr. Was auto.
> > 
> > Full list of failures:

The following 3 failures are from this line. It's about xml:lang taking a higher priority than lang. 

<div xml:lang="ja" lang="en" id="x1"><div id="x2"><div id="x3"></div></div></div>

> > FAIL getLangOfNode('x1') should be ja. Was auto.
> > FAIL getLangOfNode('x2') should be ja. Was auto.
> > FAIL getLangOfNode('x3') should be ja. Was auto.

The following two are about 'xml:lang' in absence of 'lang' : 
<div xml:lang="fr" id="n1"><div id="n2"><div lang="sv" id="n3"></div></div></div>

> > FAIL getLangOfNode('n1') should be fr. Was auto.
> > FAIL getLangOfNode('n2') should be fr. Was auto.

The following is the same as before: 
<div xml:lang="ja" id="p1"><div lang="" id="p2"></div></div>

> > FAIL getLangOfNode('p1') should be ja. Was auto.

The following 3 also failed when 'xml:lang' is present with 'lang' absent: 

<div lang="ja" id="q1">
  <div lang="" id="q2"></div>
 <div xml:lang="" id="q3"></div>
 <div xml:lang="ar" id="q4"><div xml:lang="" id="q5"></div></div>

> > FAIL getLangOfNode('q3') should be auto. Was ja.
> > FAIL getLangOfNode('q4') should be ar. Was ja.
> > FAIL getLangOfNode('q5') should be auto. Was ja.


> The question now is whether that test is correct. Does the test properly set up the xml prefix to correspond to the XML namespace? One way to check is to find another browser that’s known to implement this correctly. 

I'll try that in Firefox ('known to implement correctly' is not 100% certain). Firefox does implement lang/xml:lang-based font selection. 

Reading http://www.w3.org/International/questions/qa-css-lang.en.php (not a spec) about xml:lang in CSS led me to think that we might need some more tweaking here. 


> Another is to read the specification and find the place where it explains why it’s supposed to already be set up that way without doing anything explicit.
> 
> I’ll try to spend some researching this myself later in case no one else has time to.

Thank you for that.
Comment 31 Darin Adler 2011-12-21 22:10:22 PST
The bug causing trouble for the xml namespace and prefix in xml:lang lies in the XML parser. It does not properly implement the XML namespaces specification. I filed bug 75066, describing what needs to be fixed.
Comment 32 Darin Adler 2011-12-21 22:11:36 PST
Comment on attachment 120000 [details]
incorporated review comments

View in context: https://bugs.webkit.org/attachment.cgi?id=120000&action=review

>>> Source/WebCore/html/HTMLElement.cpp:211
>>> +    } else if (attr->name().matches(XMLNames::langAttr)) {
>> 
>> Which of the test cases fails if you change this from matches to ==? I need to know that to be sure this code is correct.
> 
> In lang-mapped-to-webkit-locale.xhtml, all of the test cases with an "xml:lang" (and without a "lang" with the same value) fail. For example:
> 
> <div xml:lang="fr" id="n1">
> FAIL getLangOfNode('n1') should be fr. Was auto.
> 
> Full list of failures:
> FAIL getLangOfNode('x1') should be ja. Was auto.
> FAIL getLangOfNode('x2') should be ja. Was auto.
> FAIL getLangOfNode('x3') should be ja. Was auto.
> FAIL getLangOfNode('n1') should be fr. Was auto.
> FAIL getLangOfNode('n2') should be fr. Was auto.
> FAIL getLangOfNode('p1') should be ja. Was auto.
> FAIL getLangOfNode('q3') should be auto. Was ja.
> FAIL getLangOfNode('q4') should be ar. Was ja.
> FAIL getLangOfNode('q5') should be auto. Was ja.

The use of matches here instead of == is indeed wrong or at least unneeded, and since == is more efficient we should not use matches.

The use of matches is a workaround for bug 75066, which should be fixed instead.
Comment 33 Darin Adler 2011-12-21 23:28:01 PST
Comment on attachment 120000 [details]
incorporated review comments

View in context: https://bugs.webkit.org/attachment.cgi?id=120000&action=review

>>>> Source/WebCore/html/HTMLElement.cpp:211
>>>> +    } else if (attr->name().matches(XMLNames::langAttr)) {
>>> 
>>> Which of the test cases fails if you change this from matches to ==? I need to know that to be sure this code is correct.
>> 
>> In lang-mapped-to-webkit-locale.xhtml, all of the test cases with an "xml:lang" (and without a "lang" with the same value) fail. For example:
>> 
>> <div xml:lang="fr" id="n1">
>> FAIL getLangOfNode('n1') should be fr. Was auto.
>> 
>> Full list of failures:
>> FAIL getLangOfNode('x1') should be ja. Was auto.
>> FAIL getLangOfNode('x2') should be ja. Was auto.
>> FAIL getLangOfNode('x3') should be ja. Was auto.
>> FAIL getLangOfNode('n1') should be fr. Was auto.
>> FAIL getLangOfNode('n2') should be fr. Was auto.
>> FAIL getLangOfNode('p1') should be ja. Was auto.
>> FAIL getLangOfNode('q3') should be auto. Was ja.
>> FAIL getLangOfNode('q4') should be ar. Was ja.
>> FAIL getLangOfNode('q5') should be auto. Was ja.
> 
> The use of matches here instead of == is indeed wrong or at least unneeded, and since == is more efficient we should not use matches.
> 
> The use of matches is a workaround for bug 75066, which should be fixed instead.

I was wrong again. The use of matches is correct. We could use "==" because of the special rules about the xml prefix, but we don’t need to do that optimization.
Comment 34 Alexey Proskuryakov 2011-12-22 10:14:51 PST
Comment on attachment 120000 [details]
incorporated review comments

View in context: https://bugs.webkit.org/attachment.cgi?id=120000&action=review

The patch doesn't have any tests for dynamic attribute changes, especially for attributes that cannot be inserted by the parser. As commented below, behavior in that case is likely wrong.

It would be also good to investigate what happens in svg-in-html parsing mode.

> Source/WebCore/html/HTMLElement.cpp:213
>      } else if (attr->name() == langAttr) {

This looks like it would match an attribute created with document.createAttributeNS("foobar", "fb:lang").
Comment 35 Darin Adler 2011-12-22 10:26:31 PST
Comment on attachment 120000 [details]
incorporated review comments

View in context: https://bugs.webkit.org/attachment.cgi?id=120000&action=review

>> Source/WebCore/html/HTMLElement.cpp:213
>>      } else if (attr->name() == langAttr) {
> 
> This looks like it would match an attribute created with document.createAttributeNS("foobar", "fb:lang").

No, it wouldn’t. attr->name() returns a QualifiedName and the == operator compares all three things: the lack of a prefix, the local name "lang" and the lack of a namespace URI.
Comment 36 Darin Adler 2011-12-22 10:27:08 PST
(In reply to comment #34)
> The patch doesn't have any tests for dynamic attribute changes, especially for attributes that cannot be inserted by the parser.

That’s a good point. We do need more tests to cover those. That’s the hard part of this.
Comment 37 WebKit Review Bot 2011-12-22 23:29:48 PST
Comment on attachment 120000 [details]
incorporated review comments

Clearing flags on attachment: 120000

Committed r103608: <http://trac.webkit.org/changeset/103608>
Comment 38 WebKit Review Bot 2011-12-22 23:29:55 PST
All reviewed patches have been landed.  Closing bug.
Comment 39 Darin Adler 2012-01-17 17:26:48 PST
As described in bug 76492, this made setting a lang attribute about 5X slower than it was before, even if locale is not used at all in the document’s styling.