<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://bugs.webkit.org/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4.1"
          urlbase="https://bugs.webkit.org/"
          
          maintainer="admin@webkit.org"
>

    <bug>
          <bug_id>233748</bug_id>
          
          <creation_ts>2021-12-01 23:06:02 -0800</creation_ts>
          <short_desc>Tamil conjuncts are not selected as a single unit when styling initials</short_desc>
          <delta_ts>2021-12-08 23:06:17 -0800</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WebKit</product>
          <component>Layout and Rendering</component>
          <version>WebKit Nightly Build</version>
          <rep_platform>Unspecified</rep_platform>
          <op_sys>Unspecified</op_sys>
          <bug_status>NEW</bug_status>
          <resolution></resolution>
          
          <see_also>https://bugs.webkit.org/show_bug.cgi?id=228992</see_also>
    
    <see_also>https://bugs.webkit.org/show_bug.cgi?id=179815</see_also>
    
    <see_also>https://bugs.webkit.org/show_bug.cgi?id=127519</see_also>
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords>InRadar</keywords>
          <priority>P2</priority>
          <bug_severity>Normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Fuqiao Xue">xfq.free</reporter>
          <assigned_to name="Nobody">webkit-unassigned</assigned_to>
          <cc>ap</cc>
    
    <cc>bfulgham</cc>
    
    <cc>darin</cc>
    
    <cc>mmaxfield</cc>
    
    <cc>simon.fraser</cc>
    
    <cc>webkit-bug-importer</cc>
    
    <cc>zalan</cc>
          

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>1819580</commentid>
    <comment_count>0</comment_count>
      <attachid>445674</attachid>
    <who name="Fuqiao Xue">xfq.free</who>
    <bug_when>2021-12-01 23:06:02 -0800</bug_when>
    <thetext>Created attachment 445674
Test case

When the start of a line contains a consonant cluster that uses a conjunct (rather than a visible virama), ::first-letter should highlight the whole cluster. Modern Tamil usually has only two of these conjuncts; however, one of them can be created in two ways (making a total of 3 clusters to test).

This doesn&apos;t work well if segmentation relies on Unicode grapheme clusters, since a conjunct with two consonants will be parsed as two grapheme clusters (the first ending after the virama, and the second starting with the second consonant and including any following vowel-signs or other combining characters).

For these situations it is necessary to tailor the segmentation algorithm, so that it recognises the whole consonant cluster plus any attached vowel-signs or combining characters as a single unit. This is a particular issue for Tamil, since all other clusters are typically decomposed and show the virama.
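
To make the difference concrete, here is a rough Python sketch of the default split and of one possible tailoring. It is an illustration only, not WebKit or ICU code. The function names are hypothetical, the base-plus-marks loop is a simplified stand-in for UAX29 extended grapheme clusters, and the list of conjunct pairs (including the code points chosen for the second SHRI spelling) is an assumption.

```python
import unicodedata

VIRAMA = "\u0bcd"  # TAMIL SIGN VIRAMA (pulli)

# Assumption: the only consonant pairs that join into a conjunct in modern Tamil.
CONJUNCT_PAIRS = {
    ("\u0b95", "\u0bb7"),  # KA + SSA, the KSHA conjunct
    ("\u0bb6", "\u0bb0"),  # SHA + RA, the SHRI conjunct
    ("\u0bb8", "\u0bb0"),  # SA + RA, the other spelling of SHRI
}


def is_mark(ch):
    """True for combining marks (general categories Mn, Mc, Me)."""
    return unicodedata.category(ch).startswith("M")


def default_clusters(text):
    """Simplified grapheme clusters: a base character plus the marks after it."""
    clusters = []
    for ch in text:
        if clusters and is_mark(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def tailored_clusters(text):
    """Join a consonant+virama cluster to the next cluster for listed conjuncts."""
    merged = []
    for cluster in default_clusters(text):
        prev = merged[-1] if merged else ""
        joins = (
            len(prev) == 2
            and prev[1] == VIRAMA
            and (prev[0], cluster[0]) in CONJUNCT_PAIRS
        )
        if joins:
            merged[-1] = prev + cluster
        else:
            merged.append(cluster)
    return merged


def first_letter_unit(text):
    """Hypothetical helper: the unit ::first-letter would style under this tailoring."""
    units = tailored_clusters(text)
    return units[0] if units else ""
```

With this sketch, default_clusters splits KSHA (U+0B95 U+0BCD U+0BB7) into two units, while first_letter_unit returns all three code points. A sequence such as KA + virama + KA is left as two units, because that pair is not in the conjunct list.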

Tests &amp; results:

Interactive test: when ::first-letter is applied to Tamil, the browser will select the KSHA and SHRI conjuncts as a single unit
https://github.com/w3c/line_paragraph_tests/issues/72

Gecko produces the expected result. WebKit and Blink only select the first consonant+pulli.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1820182</commentid>
    <comment_count>1</comment_count>
    <who name="Darin Adler">darin</who>
    <bug_when>2021-12-03 09:41:09 -0800</bug_when>
    <thetext>I wonder which Unicode algorithm is the basis for implementing the correct behavior here. We don’t want to come up with something novel, but I understand that to get this right we need to go beyond &quot;grapheme cluster&quot;.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1820193</commentid>
    <comment_count>2</comment_count>
    <who name="Darin Adler">darin</who>
    <bug_when>2021-12-03 10:04:43 -0800</bug_when>
    <thetext>For example, is &quot;extended grapheme cluster&quot; enough?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1820391</commentid>
    <comment_count>3</comment_count>
    <who name="Alexey Proskuryakov">ap</who>
    <bug_when>2021-12-03 17:37:49 -0800</bug_when>
    <thetext>FWIW, following https://drafts.csswg.org/css-pseudo/#first-letter-pseudo it looks like we&apos;d need to devise something that matches platform behavior:

&gt; A UA must use the extended grapheme cluster (not legacy grapheme cluster), as defined in UAX29, as the basis for its typographic character unit. However, the UA should tailor the definitions as required by typographic tradition since the default rules are not always appropriate or ideal—and is expected to tailor them differently depending on the operation as needed.

Maybe it can be the same as character selection.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1820396</commentid>
    <comment_count>4</comment_count>
    <who name="Myles C. Maxfield">mmaxfield</who>
    <bug_when>2021-12-03 18:11:47 -0800</bug_when>
    <thetext>I’m not sure if our platform has any concept of initial letter… Maybe I should talk to the Pages engineers.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1820825</commentid>
    <comment_count>5</comment_count>
    <who name="Darin Adler">darin</who>
    <bug_when>2021-12-06 17:00:38 -0800</bug_when>
    <thetext>It does have the concept of &quot;shift-right-arrow to select one character&quot;, which is what Alexey was referring to.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1822011</commentid>
    <comment_count>6</comment_count>
    <who name="Radar WebKit Bug Importer">webkit-bug-importer</who>
    <bug_when>2021-12-08 23:06:17 -0800</bug_when>
    <thetext>&lt;rdar://problem/86255152&gt;</thetext>
  </long_desc>
      
          <attachment
              isobsolete="0"
              ispatch="0"
              isprivate="0"
          >
            <attachid>445674</attachid>
            <date>2021-12-01 23:06:02 -0800</date>
            <delta_ts>2021-12-01 23:06:02 -0800</delta_ts>
            <desc>Test case</desc>
            <filename>index.html</filename>
            <type>text/html</type>
            <size>429</size>
            <attacher name="Fuqiao Xue">xfq.free</attacher>
            
              <data encoding="base64">PCFET0NUWVBFIGh0bWw+CjxodG1sIGxhbmc9ImVuIj4KCjxoZWFkPgogIDxtZXRhIGNoYXJzZXQ9
InV0Zi04Ij4KICA8dGl0bGU+TXkgdGVzdCBwYWdlPC90aXRsZT4KICA8c3R5bGU+CiAgICAuZmly
c3RMZXR0ZXJUZXN0IHsKICAgICAgY29sb3I6IGdyZXk7CiAgICB9CgogICAgLmZpcnN0TGV0dGVy
VGVzdDo6Zmlyc3QtbGV0dGVyIHsKICAgICAgY29sb3I6IGJsdWU7CiAgICAgIGZvbnQtc2l6ZTog
MTUwJTsKICAgIH0KICA8L3N0eWxlPgo8L2hlYWQ+Cgo8Ym9keSBzdHlsZT0iZm9udC1mYW1pbHk6
IE5vdG8gU2FucyBUYW1pbDsiPgoKICA8cCBjbGFzcz0iZmlyc3RMZXR0ZXJUZXN0Ij4KICAgIOCu
leCvjeCutwogIDwvcD4KCiAgPHAgY2xhc3M9ImZpcnN0TGV0dGVyVGVzdCI+CiAgICDgrrbgr43g
rrDgr4AKICA8L3A+Cgo8L2JvZHk+Cgo8L2h0bWw+
</data>

          </attachment>
      

    </bug>

</bugzilla>