Bug 3233

Summary: CSS2: Web Kit does not support the :lang pseudo-class
Product: WebKit Reporter: Dave Hyatt <hyatt>
Component: CSSAssignee: Nicholas Shanks <nickshanks>
Status: RESOLVED FIXED    
Severity: Normal CC: ap, foxden, nickshanks, webkit
Priority: P2 Keywords: InRadar
Version: 412   
Hardware: All   
OS: All   
Bug Depends on:    
Bug Blocks: 3234, 9454    
Attachments:
Description Flags
:lang() support
none
patch rev 1.1
none
patch rev 1.2
none
patch rev 2.0 alpha 1
eric: review-
patch rev 1.3
none
Patch rev 1.4 hyatt: review+

Description Dave Hyatt 2005-06-01 14:44:13 PDT
* STEPS TO REPRODUCE
1. 10.3.3, Safari 1.2.1
2. Go to http://www.w3.org/International/tests/sec-css-lang-1.html

* RESULTS
Safari fails the test.

* NOTES
There are questionable results on some of the other CSS international tests, but no outright failures:
http://www.w3.org/International/tests/sec-css-lang-2.html
http://www.w3.org/International/tests/sec-css-lang-3.html
Comment 1 Dave Hyatt 2005-06-01 14:44:51 PDT
Apple Bug: <a href="rdar://3611451">rdar://3611451</a>
Comment 3 Nicholas Shanks 2005-06-14 16:44:09 PDT
Created attachment 2342 [details]
patch rev 1.1

DOMStrings changed to AtomicStrings, QStrings left unchanged due to use of
QString::startsWith() which has no AtomicString/DOMString equivalent. Added
initialization/deletion to constructor and destructor.
Comment 4 Dave Hyatt 2005-06-14 16:57:33 PDT
delete is for objects allocated via new.  You don't need it.  You can also use const AtomicString& to hold 
the result from getAttribute.
Comment 5 Nicholas Shanks 2005-06-15 07:35:08 PDT
Created attachment 2361 [details]
patch rev 1.2
Comment 6 Nicholas Shanks 2005-06-15 08:59:08 PDT
This patch works to a fashion but isn't very good. Basically it emulates *:[lang^="…"] which is not what we 
want - language is not inherited and :lang(e) will match both lang="en" and lang="es" attributes. It doesn't 
support xml:lang="…" either. I am working on a new patch to implement Darin's suggestion of adding 
language to the RenderStyle.
Comment 7 Ben Gertzfield 2005-06-15 10:42:21 PDT
Here's another test case on Wikipedia that uses lang="zh", lang="zh-cn", lang="zh-tw", lang="ja", 
lang="ko" to illustrate glyph variants when browsers correctly switch fonts based on the lang tag:

http://en.wikipedia.org/wiki/Han_unification#Check_your_browser

In Mozilla Firefox 1.0.3 on Mac OS X 10.4.1, each line of the table in that section is rendered differently.  

In Safari 2.0 on OS X 10.4.1, each line is rendered identically, depending on your language setting.  By 
default, the Japanese style of glyph is used for every line of the table; if your language is set to simplified 
Chinese, you get the simplified Chinese glyphs in every line of the table, etc.
Comment 8 Nicholas Shanks 2005-06-16 09:49:53 PDT
Created attachment 2388 [details]
patch rev 2.0 alpha 1

alpha patch, not tested (and doesn't function well). Posted here for
comments/feedback. The main part of this is adding the lang attribute to the
RenderStyle for he element, so that it gets inherited by the element's
children.

This version also fixes the problem with :lang(e) matching both en and es by
stealing the implementation for the [attr|="val"] hyphenated attribute selector
instead of using QString::startsWith(). Might now be possible to eliminate the
QStrings, not sure yet.
Comment 9 Ben Gertzfield 2005-06-17 20:03:35 PDT
Well, the 2.0 alpha 1 patch is definitely alpha. :)  I did my first compile of WebKit, did run-safari, and went 
to this bug's page.  I got a non-styled page, and when I clicked on the link to http://www.w3.org/
International/tests/sec-css-lang-1.html Safari crashed.

I may use rev 1.2 as a base to try to implement the font hints.
Comment 10 Allan Sandfeld Jensen 2005-11-27 05:52:05 PST
Since the patch is partially based on my patch for KHTML, I should note that 
it has been updated since to handle the problem with "-" and with case.  
There is also a decision to make with case-insensitive matching since KHTML 
now follows the spec but Mozilla does not. 
Comment 11 Eric Seidel (no email) 2005-12-28 02:00:29 PST
I was doing similar work the other day for the nth-child() and other nth-* selectors (which are also 
recorded in bugzill somewhere).  The parser obviously needs to be augmented to handle these other types 
of selector functions.
Comment 12 Eric Seidel (no email) 2005-12-28 02:06:59 PST
Comment on attachment 2388 [details]
patch rev 2.0 alpha 1

In general, I think this patch is good.  A couple comments:

1. I think that your additional "string-arg" member variable for CSSSelector
shoudl either be part of a union with simpleSelector, or should be part of a
subclass of CSSSelector.  Either way, we should only pay 4bytes for
simpleSelector and string_arg (bad name, btw) not 8, like we currently are.

Second, I think that NOTFUNCTION can just be replaced by "not(" in the actual
grammar if you like.

No need to add any //kdDebug lines, we're removing those as we go.

When we go about adding nth-* selectors, I don't think we'll keep the string
value in CSSSelector (we might).  We'll probably have bison do the actual an+b
parsing for us, and store a,b on CSSSelector (or more likely a subclass) as
floats perhaps.
Comment 13 Eric Seidel (no email) 2005-12-28 02:07:43 PST
Nick are you still intersted in finishing this one off?  It's certainly very close to being done!
Comment 14 Nicholas Shanks 2006-02-21 13:47:07 PST
Eric: no, sorry I have been off working on my own things for six months. If you want to make the changes you suggested yourself and check the patch in, assign it to yourself, I probably won't get on to it in a while.
Comment 15 Nicholas Shanks 2006-06-04 11:42:34 PDT
Replacing the NOTFUNCTION token results in this:

    | ':' 'n' 'o' 't' '(' maybe_space simple_selector ')' {
        CSSParser* p = static_cast<CSSParser*>(parser);
        $$ = p->createFloatingSelector();
        $$->match = CSSSelector::PseudoClass;
        $$->simpleSelector = p->sinkFloatingSelector($7);
        $$->value = atomicString("not(");
    }

and means that it can't be used for anything else one day (CSS4?). I suggest leaving the token in the lexicon but calling it SIMPLEFUNCTION instead.
Comment 16 Nicholas Shanks 2006-06-04 11:47:15 PDT
OK, scratch that, I didn't think it through :-)
The tokenizer cannot distinguish between my concepts of simple and not simple functions.
Comment 17 Nicholas Shanks 2006-06-05 14:55:21 PDT
Created attachment 8719 [details]
patch rev 1.3

This is an update of 1.2 for the current tree and file layout. It doesn't make any changes to the layout engine and adds four bytes to each CSSSelector, but compiles and runs and doesn't screw anything up :-). Will update with more sophisticated patch soon.
Comment 18 Nicholas Shanks 2006-06-15 08:49:52 PDT
Created attachment 8856 [details]
Patch rev 1.4

I have tried to address Eric's first concern, I really have, but due to the fact that AtomicString (and DeprecatedString) both have constructors and destructors, they are not permitted within a union. Everything else is either fixed or open to suggestion and later adjustment (like the variable name).

I am submitting this as the final patch offering for this bug. It really needs to be committed and into the tree as has been here for 12 months now.

Note that this patch is less ambitious than I first wanted: it does not attempt to inherit the language from parent elements nor the HTTP headers, as that was causing instability. I will open a separate bug to track that.
Comment 19 Eric Seidel (no email) 2006-06-15 09:44:59 PDT
Seems fine.  Hyatt should peek at this one before it's landed.
Comment 20 Dave Hyatt 2006-06-15 11:24:12 PDT
Comment on attachment 8856 [details]
Patch rev 1.4

r=me.  Make sure we have bugs covering the remaining issues with :lang.
Comment 21 Darin Adler 2006-06-15 19:25:19 PDT
Landed as r14879.