Bug 30015 - text-transform:capitalize is failing in CSS2.1 test suite
Summary: text-transform:capitalize is failing in CSS2.1 test suite
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: CSS (show other bugs)
Version: 528+ (Nightly build)
Hardware: PC OS X 10.5
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-10-02 04:57 PDT by Shinichiro Hamaji
Modified: 2012-03-02 15:44 PST (History)
9 users (show)

See Also:


Attachments
Patch v0 (16.53 KB, patch)
2009-10-02 04:58 PDT, Shinichiro Hamaji
mjs: review-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Shinichiro Hamaji 2009-10-02 04:57:20 PDT
(Though this is duplicate of Bug 23267, I'd like to open separate bug for this test case as the original bug has actually 3 bugs.)

Let me explain why t1605-c545-txttrans-00-b-ag (http://www.w3.org/Style/CSS/Test/CSS2.1/current/html4/t1605-c545-txttrans-00-b-ag.htm) is failing. This test is exercising CSS' text-transform property. This test is failing for text-transform:capitalize. For the following texts,

Xx xx x. (x.x. XX)
Pp pp p. (p.p. PP)
Éé éé é. (é.é. ÉÉ)

the test suite is expecting

Xx Xx X. (x.x. XX)
Pp Pp P. (p.p. PP)
Éé Éé É. (é.é. ÉÉ)

but WebKit is capitalizing lower characters in parens like

Xx Xx X. (X.X. XX)
Pp Pp P. (P.P. PP)
Éé Éé É. (É.É. ÉÉ)

For example, this behavior converts "e.g.," into "E.G.,", which may look weird. You can see this example in another CSS test suite, LayoutTests/css1/text_properties/text_transform.html (http://www.w3.org/Style/CSS/Test/CSS1/current/sec545.htm).

The CSS1 test suite is saying "There Should Be A Capital Letter After A Non-Breaking Space (&Nbsp;).". I guess it may mean a character should be capitalized if and only if it comes after whitespaces. The CSS2.1 test suite is expecting this behavior as well.

Now, our implementation of capitalization uses ICU's word breaking, which is kind of "smart". For example, it converts from "hi" to "Hi", from foo's to Foo's, from layout-tests to Layout-Tests, and from e.g., to E.G., .

If we use the approach the CSS test suite may be expecting (capitalize characters iff. it comes after whitespaces), the results will be "hi", Foo's, Layout-tests, and E.g., specifically.

I'm not sure what is the best behavior. It seems that we intentionally chose the current behavior in Bug 4171. If we decide not to fix this, please just make this bug WONTFIX. Otherwise, my patch will fix this.

FYI, I summarized results for some texts using 4 browsers.

http://spreadsheets.google.com/pub?key=tGCsFWV1bwyYanB9y4Ri9IQ&output=html

Bug 3406 is also related to this bug.
Comment 1 Shinichiro Hamaji 2009-10-02 04:58:02 PDT
Created attachment 40511 [details]
Patch v0
Comment 2 Shinichiro Hamaji 2009-10-02 04:58:52 PDT
(In reply to comment #1)
> Created an attachment (id=40511) [details]
> Patch v0

If we decide to make the CSS2.1 test suite pass, I'll refine this patch a bit.
Comment 3 Darin Adler 2009-10-02 09:51:14 PDT
Comment on attachment 40511 [details]
Patch v0

Is this approach really correct? Shouldn't this be fixed inside the text break function instead?
Comment 4 Shinichiro Hamaji 2009-10-02 10:17:49 PDT
(In reply to comment #3)
> (From update of attachment 40511 [details])
> Is this approach really correct? Shouldn't this be fixed inside the text break
> function instead?

It seems that the word break iterator is used by selection in editing, too. I guess changing word break iterator causes regressions for selection. For example, if you double click 'o' in "(foo bar)", current implementation selects "foo". If we change wordBreakIterator, "(foo" would be selected. But, it's just my guess. I didn't confirm if my guess is true. I'll check it. Or, I'm happy if someone who knows editing well gives comments.
Comment 5 Darin Adler 2009-10-02 11:12:15 PDT
I think we really need some input from CSS experts. We have here the ICU library’s concept of word breaks, and then the CSS concept of what words are capitalization-wise. We also are not considering the language.

It’s challenging to change this without being sure what the desired behavior is.

I suppose the CSS test suite makes it clear we’re wrong to be using the ICU concept?
Comment 6 Shinichiro Hamaji 2009-10-06 05:12:30 PDT
Yeah, I agree. I want advices from CSS experts. Thanks Darin for adding dhyatt into the CC lists.

By the way, it seems that a microsoft person is saying that the result should be "(É.é. ÉÉ)" so IE and Firefox are OK because of the word breaking rule in Unicode. I'm not sure Unicode's rule should be applied to this style though.

http://lists.w3.org/Archives/Public/public-css-testsuite/2009Feb/0040.html

http://www.unicode.org/reports/tr29/tr29-13.html#Word_Boundaries
Comment 7 Eric Seidel (no email) 2009-11-26 21:18:51 PST
Any thoughts Ian or Hixie?
Comment 8 Maciej Stachowiak 2009-12-28 01:24:30 PST
Comment on attachment 40511 [details]
Patch v0

I don't think the test suite behavior is either correct or required by the spec. The test's expectation is just wrong. If we match any other browser, it should be Firefox. I don't think the Opera behavior gives good results. r- to either reconsider this or try a different approach. Input from experts still welcome.
Comment 9 Shinichiro Hamaji 2010-01-20 01:56:33 PST
(In reply to comment #8)
> (From update of attachment 40511 [details])
> I don't think the test suite behavior is either correct or required by the
> spec. The test's expectation is just wrong. If we match any other browser, it
> should be Firefox. I don't think the Opera behavior gives good results. r- to
> either reconsider this or try a different approach. Input from experts still
> welcome.

Thanks for the comment and sorry for the latency. I played with Firefox and found the logic is not as simple as I expected. For example, no capitalization occurs for "あfoo" but "、foo" becomes "、Foo". I think I have too few knowledges around this to fix this issue correctly. I give up to fix this issue for now :(
Comment 10 Adam Klein 2012-03-02 15:44:10 PST
Rebaselined a bunch of Chromium results, including t1605-c545-txttrans-00-b-ag, in http://trac.webkit.org/changeset/109625. This may be a "bad" baseline, but decided it was better to have it checked in than to keep this test marked as "failing" for years.