The :lang() pseudo-selector does not inherit the content language from its parent element or the HTTP Content-Language header. KHTML already has code for both of these inheritance paths. rdar://problem/3611451
Created attachment 12107 [details] Proposed patch Test implements simple inheritance checking of LANG. The check for HTTP content-language has not been added.
Created attachment 12108 [details] Test of inheriting LANG
Created attachment 12109 [details] Another LANG inheritance test
Comment on attachment 12107 [details] Proposed patch Nice to see this finally getting ported over. I think it would be appropriate to add a comment/FIXME or two about the missing features (ie, no support for HTTP content-language or xml:lang (though as you explained that has not been defined)). We also usually put the patch, the tests (and their expected results) and the ChangeLogs together in one patch to be easily landed. We have a prepare-ChangeLog script to make the ChangeLog and svn-create-patch script to create the patch. Only one semi-color required here. + Node *n = e->parent();; I also have a few style nits. We put the * on the otherside + Node *n = e->parent();; Single line if statements don't get curly-braces. + if (n->isElementNode()) { + value = static_cast<Element*>(n)->getAttribute(langAttr); + } r- for now.
Created attachment 12133 [details] Patch Patch with comments and test-case
Comment on attachment 12133 [details] Patch I'm a bit concerned about the performance of this approach, since you'll end up crawling up a lot of nodes. We also need to do something about style sharing. :lang is going to get confused by it right now.
Comment on attachment 12133 [details] Patch >+<html > No need for the extra space. >+ p:lang(x) { background:lime } Please use green here, it is used throughout the layout tests. The testcase seems fine, be sure to include expected results(.txt,.checksum and .png). You can get them by running run-webkit-tests and run-webkit-tests -p. >+ // but the it has not been clarified how lang and xml:lang coexists You probably didnt mean to write "the" there. >+ Node *n = e->parent(); * needs to right next to Node (see style guidelines). >+ /* >+ else >+ if (n->isDocumentNode()) { >+ value = static_cast<Document*>(n)->contentLanguage(); >+ */ AFAIK code in comments should not be included in patches. Also the code is not 100% according to style guidelines. >+2006-12-31 Allan Sandfeld Jensen,,, <set EMAIL_ADDRESS environment variable> Make sure this line is correct, you need to give your mail, and I see no need for the ,,, So coding wise this patch seems fine (except some style problems mentioned). As soon as you fix the points above I am fine with r+ing it.
Some more testcases: http://tc.labs.opera.com/selectors/lang/
Created attachment 12390 [details] quick and dirty fix for rob's comments :-)
Comment on attachment 12390 [details] quick and dirty fix for rob's comments :-) >+ value = static_cast<Element*>(n)->getAttribute(langAttr); >+ // FIXME: Document needs to provide the MIME content-language >+ >+ >+ >+ >+ >+ n = n->parent(); You don't want the empty lines there. This patch is much better. However it needs test results included before it can be okayed. Cheers, Rob.
Also, HTML5 says how to work with lang and xml:lang. You always take xml:lang. (Note that xml:lang can occur in an HTML DOM as well.)
Created attachment 15931 [details] New patch New patch. This implements parsing of http-equiv content-language. Still doesn't read HTTP headers though.
Hyatt: The performance is a really a design decision. I've used the same model when implementing the nth-selectors. The idea is to not to let these rarely-used features bleed code-wise or performance-wise over anything else. If :lang selectors become more common you can always improve the performance for a general memory-hit. Rob: run-webkit-tests doesn't work here. You have to make the test-results yourself. Hmm. Seems I missed one thing. I'll make a new patch with green results..
Created attachment 15932 [details] Patch New patch now with green test-results.
Comment on attachment 15932 [details] Patch This looks really good. + if (isHTMLDocument()) Why restrict this to HTML documents? In the past we've found we need these document-level functions for any documents with HTML content, even ones that aren't HTML documents. Does Gecko have this function for non-HTML documents? + this->setContentLanguage(content); Why an explicit "this->" here? + String m_contentLanguage; New fields ought to be private, rather than protected, unless there's some need a derived class needs access. We need to scale back our use of protected to the minimum. + // but the it has not been clarified how lang and xml:lang coexists Need to take out the word "the" here. + Node* n = e->parent(); + while (n && value.isEmpty()) { [...] + + n = n->parent(); + } I think this would read even better as a for loop rather than a while loop. I also think we have too much code here in-line. I'd like to see it put in a helper function. There are some features we're planning in the future that will require figuring out what the language is for a Node, so it would be nice to have this loop and logic somewhere in the DOM outside the style system. Maybe even: AtomicString Node::language() I think I'm going to say r=me despite these quibbles. But please consider revising the patch.
Allan, it would be great if you could revise the patch to address Darin's comments so that it can easily be landed.
The biggest use for lang is probably in the engine itself, see bug 10874. It looks like future hot code.
This should presumably land on the feature branch and not on the main trunk.
Created attachment 16473 [details] patch with darin's suggested fixes I applied darin's suggested fixes to the patch. Personally I think this code will need to be re-written to store the language off of RenderStyle. It's more efficient to cascade the language that way, instead of an ancestor crawl. I'm not sure that that issue in any way prevents this from landing. It just kinda prevents using :lang in the engine. ;)
Comment on attachment 16473 [details] patch with darin's suggested fixes Changes to this file are missing from the patch: + * css/CSSStyleSelector.cpp:
Created attachment 16482 [details] even better patch, now including test case results and xml:lang support
Comment on attachment 16482 [details] even better patch, now including test case results and xml:lang support I think we should fix the perf issue here and hang the language off RenderStyle. We have time to fix this.
I tried, and had some trouble implementing this as a RenderStyle-based hack. Need to talk with Dave again (or just land the existing working code).
I do not think the ancestor crawl should be landed as is. The problem is that often the kinds of Web pages that use this feature use generic :lang rules that aren't scoped to anything. You will then end up crawling up the entire ancestor chain for every single unique style in your document. This *will* hurt performance. I also disagree with the idea of doing DOM-based inheritance. I don't think it's necessary, and I think the RenderStyle can hold the current language in this case. Maybe you could elaborate on what problem you were having.
I don't think it is serious performance problem. It is less expensive than a descent selector where the first rule has matched. On sites like slashdot.org they have 100 rules like ".class li" that matches most elements on the page, and this is obviously not a big problem. I made a survey of webpages last year, to figure out if it would be worthwhile to speed-up descent selectors, and it turned out webpages in general are very flat. The only issue that should be considered is the likelyhood of anyone combining a descent and language selector in which case the runtime goes up to square DOM-tree height. I consider it unlikely. To move the language to RenderStyle means you will need a system for dirtying and recalculating language subtrees whenever a node changes LANG. I really really don't think it is worth it, unless you have a strong need for a constant-time language result in text-layout.
I discussed this with hyatt over IRC, and he identified some issues in trying to implement this based on RenderStyle. His primary issue was cascade order, as if we hang this off of the elements inline style declaration that will be applied *after* the UA stylesheet (which might at some later point contain rules which depend on :lang working correctly). Hyatt said he would noodle on the problem and get back to me.
Comment on attachment 16482 [details] even better patch, now including test case results and xml:lang support I'm going to flag this as review? again, because I don't think the performance problem is critical and doing lang inheritence more efficiently but still correctly in the face of dynamic changes seems hard. In particular, what if you have <div><span><i> nested, and the <i> inherits its lang from the div, and you add a lang attribute to the span. Does the span need to update the RenderStyle for all its children? I think we should land this now and optimize later because DOM trees tend to be shallow and many more common types of selectors already need to look at ancestors.
Comment on attachment 15932 [details] Patch clearing review flag so this doesn't show up in the commit queue
dhyatt: i think a single :lang rule (with no qualifiers) will really hurt perf [5:39pm] dhyatt: every element will crawl up the whole tree [5:39pm] dhyatt: i don't think it's good enough as is [5:39pm] dhyatt: existing pages that use this commonly have orphaned :lang rules [5:39pm] dhyatt: and it's used as a css detection hack i think also [5:39pm] MacDome: dhyatt: meaning? [5:39pm] dhyatt: meaning that existing pages will slow down [5:39pm] MacDome: (by orphan'd rules) [5:39pm] dhyatt: if we land this [5:39pm] dhyatt: global :lang rules [5:40pm] dhyatt: orphaned is the wrong term to use i guess [5:40pm] MacDome: like *:lang { } [5:40pm] dhyatt: right [5:40pm] MacDome: ick [5:40pm] dhyatt: *:lang will be lethal [5:40pm] dhyatt: with this patch [5:40pm] dhyatt: but safari2/3 ignore it [5:40pm] dhyatt: so it doesn't matter [5:40pm] dhyatt: and ffx honors it and has an efficient impl [5:40pm] dhyatt: when it comes to css selectors you just can't check in a slow impl imo [5:41pm] dhyatt: they're too easily abused
Lethal is a strong word, ".class *" is a semi-common combination and it doesn't destroy performance. The current patch has the exact same performance as "[lang] *".
Test 33 in Acid3 hits this bug.
The test number for Acid3 has changed: Test 34: FAIL (expected 1, got: 0 - descendants inheriting lang=en-GB should be matched by :lang(en)) This patch is still just waiting for mjs & hyatt to debate it out. We've not come up with any alternative approach yet. Certainly hanging it off the RenderStyle does not work, as noted by hyatt himself in comments related to this bug. What could move us closer to landing this? Would creating a worst-case perf test case help?
I guess one possibility for a perf-improvement would be to cache the computed :lang value on the renderer during the reverse-walk up the tree for the :lang computation. We'd just have to know how to invalidate it (and the whole subtree) every time a lang= value changed.
I am not sure it is worthwhile to do the low-grained invalidation of a selector-cache. I have a selector-cache implementation for KHTML which caches Descent selectors, it can easily cache :lang selectors as well. The short-cut I decided to do to make the cache really simple and mem-safe was to only use it on recalcStyle(Forced) runs. These runs happens every time a new style-element is loaded or added to the document, and everytime a top-selector in a Descent selector changes (eg. [attr] div, should create a run when attr changes). Such a cache will effectively give you and O(1) run-time on these ancestor-crawls. For the language though, it doesn't solve the fast-access need if you are going to use them in font-selection.
"Certainly hanging it off the RenderStyle does not work, as noted by hyatt himself in comments related to this bug." I'm not sure how you reached that conclusion Eric. As this is exactly what Firefox does, obviously such an approach can work.
(In reply to comment #35) My comment was a restatement of: http://bugs.webkit.org/show_bug.cgi?id=9454#c26 I bet we can dig up the IRC logs if necessary. You had expressed doubts towards a correct implementation of :lang via RenderStyle, which I was summarizing in comment #26. Right now we have a patch which works, and some performance concerns related to that patch brought up by a senior engineer in whom I place a lot of trust (that's you). In order to move forward with this patch, either need to alleviate your performance concerns (with a test case?), or move to a different model (hopefully with some guidance from said senior engineer). I'd be interested to hear your suggested course of action. How would you like to see this resolved hyatt? You'd suggest a RenderStyle based solution? Again, I understand this is not a mission critical bug. I appreciate you spending cycles discussing it.
It is mission-critical for some people: WebKit users who need to handle multiple languages.
Ok, let's just land it and file a followup bug to improve the performance.
Comment on attachment 16482 [details] even better patch, now including test case results and xml:lang support r=me
Landed as r29336, bug 16801 was filed to track any perf concerns regarding :lang.