Created attachment 445674
Test case

When the start of a line contains a consonant cluster that is rendered as a conjunct (rather than with a visible virama), ::first-letter should highlight the whole cluster. Modern Tamil usually has only two of these conjuncts, but one of them can be created in two ways, making a total of 3 clusters to test.

This doesn't work well if segmentation relies on Unicode grapheme clusters, since a conjunct with two consonants is parsed as two grapheme clusters: the first ends after the virama, and the second starts with the second consonant and includes any following vowel-signs or other combining characters. For these situations the segmentation algorithm needs to be tailored so that it recognises the whole consonant cluster, plus any attached vowel-signs or combining characters, as a single unit.

This is a particular issue for Tamil, since all other clusters are typically decomposed and show the virama.

Tests & results:

Interactive test: When ::first-letter is applied to Tamil the browser will select the KSHA and SHRI conjuncts as a single unit
https://github.com/w3c/line_paragraph_tests/issues/72

Gecko produces the expected result. WebKit and Blink select only the first consonant+pulli.
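To make the segmentation problem above concrete, here is a minimal Python sketch. It is not how any engine implements ::first-letter, and it is not a UAX29 implementation: the untailored path only approximates grapheme clustering as "one base character plus any following combining marks". The function name `first_unit` and the `CONJUNCT_PREFIXES` list are hypothetical, and the code point sequences for KSHA and the two SHRI spellings are my assumption about what the test case exercises.

```python
import unicodedata

# Assumed code point sequences for the Tamil conjuncts discussed above.
CONJUNCT_PREFIXES = (
    "\u0B95\u0BCD\u0BB7",        # KA + virama + SSA      -> KSHA
    "\u0BB8\u0BCD\u0BB0\u0BC0",  # SA + virama + RA + II  -> SHRI
    "\u0BB6\u0BCD\u0BB0\u0BC0",  # SHA + virama + RA + II -> SHRI (alternate)
)


def is_combining(ch):
    # Nonspacing (Mn) and spacing (Mc) marks cover the Tamil virama
    # and the Tamil vowel signs.
    return unicodedata.category(ch) in ("Mn", "Mc")


def first_unit(text, tailored=True):
    """Return the leading unit of text that a first-letter style would cover."""
    if not text:
        return ""
    end = 1  # untailored: start from a single base character
    if tailored:
        # Tailoring: treat a known conjunct sequence as one base unit.
        for seq in CONJUNCT_PREFIXES:
            if text.startswith(seq):
                end = len(seq)
                break
    # Attach any following vowel signs or other combining marks.
    while end < len(text) and is_combining(text[end]):
        end += 1
    return text[:end]


# Usage: compare the two behaviours on a string that starts with KSHA.
word = "\u0B95\u0BCD\u0BB7\u0BAE"
print(first_unit(word, tailored=False))
print(first_unit(word, tailored=True))
```

Only the listed conjuncts are joined here; any other consonant + virama + consonant sequence still breaks after the virama, which matches the point that other Tamil clusters show the virama.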
I wonder which Unicode algorithm is the basis for implementing the correct behavior here. We don’t want to come up with something novel, but I understand that to get this right we need to go beyond "grapheme cluster".
For example, is "extended grapheme cluster" enough?
FWIW, following https://drafts.csswg.org/css-pseudo/#first-letter-pseudo it looks like we'd need to devise something that matches platform behavior:

> A UA must use the extended grapheme cluster (not legacy grapheme cluster), as defined in UAX29, as the basis for its typographic character unit. However, the UA should tailor the definitions as required by typographic tradition since the default rules are not always appropriate or ideal—and is expected to tailor them differently depending on the operation as needed.

Maybe it can be the same as character selection.
I’m not sure if our platform has any concept of initial letter… Maybe I should talk to the Pages engineers.
It does have the concept of "shift-right-arrow to select one character", which is what Alexey was referring to.
<rdar://problem/86255152>