WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
21312
text-transform: lowercase is not lang-dependent (Turkish languages : tr,az)
https://bugs.webkit.org/show_bug.cgi?id=21312
Summary
text-transform: lowercase is not lang-dependent (Turkish languages : tr,az)
Jungshik Shin
Reported
2008-10-02 10:50:50 PDT
1. Go to the URL above and scroll down the Turkish section Expected: U+0049 should be transformed to U+0131 (lowercase letter dotless I). U+0130 (latin capital letter dotted I) should be transformed to U+0069 (lowercase I = i). Actual: U+0049 and U+0130 are transformed to U+0069 and <U+0069, U+0307> ('i' followed by a combining mark dot above). This is a subset of the problem (not honoring 'lang'/'xml:lang' in CSS).
Attachments
tentative 1st step toward a fix
(2.60 KB, patch)
2009-03-06 09:18 PST
,
Jungshik Shin
darin
: review-
Details
Formatted Diff
Diff
pilot patch (to work with/test the patch for bug 16801)
(2.74 KB, patch)
2011-07-19 15:58 PDT
,
Jungshik Shin
no flags
Details
Formatted Diff
Diff
Patch
(14.15 KB, patch)
2013-10-04 22:05 PDT
,
Darin Adler
no flags
Details
Formatted Diff
Diff
Patch
(26.49 KB, patch)
2013-10-04 22:54 PDT
,
Darin Adler
rniwa
: review+
Details
Formatted Diff
Diff
Show Obsolete
(3)
View All
Add attachment
proposed patch, testcase, etc.
Alexey Proskuryakov
Comment 1
2008-10-02 11:05:48 PDT
<
rdar://problem/3611341
>
Jungshik Shin
Comment 2
2009-03-06 09:18:30 PST
Created
attachment 28363
[details]
tentative 1st step toward a fix I realized that Eric added the code language 'calcuation' for pseudo-lang selector support (so my previous comment about the lack of lang-dependent behavior is wrong). To refer to the value of lang in text transform (RenderText), I moved the code to Node. The second part will be to make upper/lower/capitalize lang-dependent (using ICU for ports that use it and by other means elsewhere) I also want to use this for
bug 10874
(and
bug 18085
) as well. Before going further, I'd like to know whether I'm on the right track. So, it'd be nice if any of you can take a quick look. Thanks !
Jungshik Shin
Comment 3
2009-03-09 10:41:01 PDT
Comment on
attachment 28363
[details]
tentative 1st step toward a fix I don't expect r+ for this patch (this patch even has '#if 0-enclosed code'), but I like to get some (potentially harsh) feedback to move things forward because the patch is also necessary for lang-based font selection bugs (mentioned earlier).
Jungshik Shin
Comment 4
2009-03-09 10:49:24 PDT
Comment on
attachment 28363
[details]
tentative 1st step toward a fix I realized that
bug 16801
deals with this issue and uploaded this path there.
Darin Adler
Comment 5
2009-03-09 10:52:26 PDT
Comment on
attachment 28363
[details]
tentative 1st step toward a fix There are 3 things done here: 1) Move the language-determining code for the style system from CSSStyleSelector into a member function in the Node class. 2) Adding a cache in every Node object to store that language code. 3) Changes to the logic of the language-determining code. I can't easily comment on much of (3), because the patch doesn't show the change from the old to the new due to the use of #if 0. There are some issues with (1): A) The style system only needs the language for elements, not other nodes, I would have the code in Element, not Node. But we won't want to dedicate 4 bytes in every Element to caching the language for that element. See below. In general, we need to keep the number of function members in these basic classes to a minimum, so it's unfortunate that the code is moving here. B) Breaking the language-computation out into a separate function is fine. That could be done as a first step of refactoring inside the CSSStyleSelector class. I don't think the function needs to be a member of the Node classes. The biggest problems are with (2): C) There's no way we can make all nodes 4 bytes bigger just to make it faster to retrieve the value of the lang attribute, so that aspect of this patch is unacceptable. There are techniques we could use, such as putting the cached language into the rare data or using an external hash table that could keep the cost down for typical elements. D) There's no code to invalidate the cached language when attributes change or when the DOM tree structure changes. For example, if you store the value of an attribute such as langAttr, you then need code to deal with the case where that attribute changes. This becomes even more complex when you cache the value of the lang attribute of a parent. Then that parent's attribute changes and somehow the cached value needs to be invalidated for all the descendants of that parent. The cost of a cache is the invalidation cost, and since there's no invalidation here it's unclear we could afford it. One issue with (3): E) It's generally much better to use iteration to walk through the parent list than to use recursion; we don't want an algorithm that uses machine stack proportional to the depth of the DOM tree if we can avoid it.
Jungshik Shin
Comment 6
2009-03-09 12:01:12 PDT
Thanks a lot for your very extensive review comment. In
bug 16801 comment 3
, Dave mentioned that lang should be in the style. Your comment on RareData led me to find that ElementRareData has access to computedStyle. So, if lang becomes available in the style, perhaps, we can just access it via ElementRareData. Then, I think there's no need to add it to ElementRareData (let alone Element or Node).
Avram Lyon
Comment 7
2010-03-18 23:10:04 PDT
Before this lands, please note that there are other affected languages. The Latin scripts for Crimean Tatar (crh), Volga Tatar (tt), and Bashkir (ba) all use the Turkish/Azerbaijani-style i/İ and ı/I pairings. All three have both Cyrillic and Latin scripts, and only the latter is affected, so perhaps this would require the use of script variants (e.g., tt-Latn), but Azerbaijani also has a well-represented Cyrillic script. I would like to see the new text-transform behavior apply to tt, crh, and ba as well.
Jungshik Shin
Comment 8
2011-06-24 10:49:02 PDT
Ironically, Element.cpp has computeInheritedLanguage almost identical to my patch here with the same shortcomings :-) It's added by isherman on 2010-09-02. (
http://trac.webkit.org/changeset/66647
) Anyway, I'll try to use it for text-transform. I'm also looking into the possibility of mapping 'lang' to '-webkit-locale' in HTMLElement.cpp (the same way HTML attribute dir is mapped to CSS property direction. I don't understand why we introduced '-webkit-locale' when we have 'lang' to determine what locale to use for hyphenation, line breaking, text-transform, font selection, etc. However, in retrospect, introducing '-webkit-locale' seems to be a brilliant move because that way, Darin's comment in this bug is almost automatically addressed. Perhaps, I'll do that in
bug 16801
.
Ilya Sherman
Comment 9
2011-06-24 14:28:03 PDT
(In reply to
comment #8
)
> Ironically, Element.cpp has computeInheritedLanguage almost identical to my patch here with the same shortcomings :-) It's added by isherman on 2010-09-02. (
http://trac.webkit.org/changeset/66647
)
Does my patch really have the same issues (I assume you're referring to
comment #5
)? It does not add any caching, so it looks to me like it satisfies Darin's concerns. If not, though, we should probably fix it :-)
Jungshik Shin
Comment 10
2011-06-24 14:57:34 PDT
(In reply to
comment #9
)
> (In reply to
comment #8
) > > Ironically, Element.cpp has computeInheritedLanguage almost identical to my patch here with the same shortcomings :-) It's added by isherman on 2010-09-02. (
http://trac.webkit.org/changeset/66647
) > > Does my patch really have the same issues (I assume you're referring to
comment #5
)? It does not add any caching, so it looks to me like it satisfies Darin's concerns. If not, though, we should probably fix it :-)
Sorry I missed two differences (recursion vs iteration and caching). I'll try to map 'lang' (HTML attribute) to '-webkit-locale' (CSS property), which may speed things up for font-fallback handing patch (in
bug 10874
).
Jungshik Shin
Comment 11
2011-07-19 15:58:11 PDT
Created
attachment 101403
[details]
pilot patch (to work with/test the patch for
bug 16801
) This is not meant to be for review or anything. It's just to test the patch for
bug 16801
. When
bug 16801
is fixed, perhaps I have to add a version of String::makeUpper() and String::makeLower() that accept a locale as an input. With the patch for
bug 16801
and this patch, it works correctly for Turkish tests at
http://www.w3.org/International/tests/tests-html-css/list-text-transform
Darin Adler
Comment 12
2013-10-04 22:05:47 PDT
Created
attachment 213441
[details]
Patch
Ryosuke Niwa
Comment 13
2013-10-04 22:20:41 PDT
Comment on
attachment 213441
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=213441&action=review
> Source/WTF/wtf/text/StringImpl.cpp:570 > +#if !USE(ICU_UNICODE) > + UNUSED_PARAM(localeIdentifier); > + return lower(); > +#else
Can we move this stuff into wtf/unicode? We can just leave the non-ICU version fallback to the regular lower().
> Source/WTF/wtf/text/StringImpl.cpp:572 > + if (!(localeIdentifier == "tr" || localeIdentifier == "az"))
It seems somewhat fragile to hard-code these two languages.
> Source/WTF/wtf/text/StringImpl.cpp:587 > + u_strToLower(data16, realLength, source16, length, "tr", &status);
The documentation says &status cannot indicate a failure. So maybe we need to reset the value here? But we're probably not getting the right real length if there was a failure anyway so I'm not sure if that's what we're supposed to do.
Ryosuke Niwa
Comment 14
2013-10-04 22:25:27 PDT
Comment on
attachment 213441
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=213441&action=review
> Source/WTF/wtf/text/StringImpl.cpp:584 > + if (U_SUCCESS(status) && realLength == length)
Is it correct to hard-code "tr" here? Perhaps this deserves a comment.
EFL EWS Bot
Comment 15
2013-10-04 22:29:21 PDT
Comment on
attachment 213441
[details]
Patch
Attachment 213441
[details]
did not pass efl-ews (efl): Output:
http://webkit-queues.appspot.com/results/3334026
mitz
Comment 16
2013-10-04 22:38:46 PDT
Comment on
attachment 213441
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=213441&action=review
>> Source/WTF/wtf/text/StringImpl.cpp:572 >> + if (!(localeIdentifier == "tr" || localeIdentifier == "az")) > > It seems somewhat fragile to hard-code these two languages.
Is it sufficient to check for these two strings? Shouldn’t localeIdentifier values such as "tr-TR" or even “tR-Tr” be handled too?
Darin Adler
Comment 17
2013-10-04 22:54:13 PDT
Created
attachment 213442
[details]
Patch
Darin Adler
Comment 18
2013-10-04 22:56:05 PDT
(In reply to
comment #16
)
> Shouldn’t localeIdentifier values such as "tr-TR" or even “tR-Tr” be handled too?
Possibly. If so, then we need test coverage for that. And if true, then it seems wrong for us to use the type AtomicString for localeIdentifier values in RenderStyle, since equality comparison is an inappropriate way to interpret them.
Ryosuke Niwa
Comment 19
2013-10-04 23:02:26 PDT
Comment on
attachment 213442
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=213442&action=review
> Source/WTF/wtf/text/StringImpl.cpp:577 > + if (!(localeIdentifier == "tr" || localeIdentifier == "az")) > + return lower();
We should do case-insensitive comparison according to BCP47 format as specified in
http://www.whatwg.org/specs/web-apps/current-work/multipage/elements.html#the-lang-and-xml:lang-attributes
We can add then add test cases for TR-TR, etc...
Darin Adler
Comment 20
2013-10-04 23:37:10 PDT
Committed
r156948
: <
http://trac.webkit.org/changeset/156948
>
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug