<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://bugs.webkit.org/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4.1"
          urlbase="https://bugs.webkit.org/"
          
          maintainer="admin@webkit.org"
>

    <bug>
          <bug_id>234030</bug_id>
          
          <creation_ts>2021-12-08 12:55:22 -0800</creation_ts>
          <short_desc>TextCodecUTF8 can skip characters after an invalid sequence near EOF</short_desc>
          <delta_ts>2021-12-09 09:50:33 -0800</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WebKit</product>
          <component>Page Loading</component>
          <version>WebKit Nightly Build</version>
          <rep_platform>Unspecified</rep_platform>
          <op_sys>Unspecified</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>DUPLICATE</resolution>
          <dup_id>233921</dup_id>
          <see_also>https://bugs.webkit.org/show_bug.cgi?id=233921</see_also>
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>Normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Andreu Botella">andreu</reporter>
          <assigned_to name="Nobody">webkit-unassigned</assigned_to>
          <cc>achristensen</cc>
    
    <cc>andreu</cc>
    
    <cc>beidson</cc>
    
    <cc>darin</cc>
          

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>1821736</commentid>
    <comment_count>0</comment_count>
      <attachid>446414</attachid>
    <who name="Andreu Botella">andreu</who>
    <bug_when>2021-12-08 12:55:22 -0800</bug_when>
    <thetext>Created attachment 446414
Sample to show that this bug affects page loading.

WPT tests: https://wpt.fyi/results/encoding/textdecoder-eof.any.html?label=experimental&amp;label=master&amp;aligned (also tests for bug 233921).

When the TextCodecUTF8 decoder encounters a non-ASCII lead byte, it buffers input until it has as many bytes as that lead byte calls for, and only then processes them. If the stream is flushed before that point, the decoder assumes the buffered bytes form a single truncated sequence, discards all of them, and emits one replacement character. That assumption doesn&apos;t always hold, so characters other than the replacement character can be skipped:

// &quot;�A&quot; in Firefox and Chromium 98, and according to the spec.
// &quot;��A&quot; in earlier versions of Chromium.
// &quot;�&quot; in WebKit.
new TextDecoder().decode(new Uint8Array([0xF0, 0x9F, 0x41]));

It can also cause fewer replacement characters to be emitted than the spec requires:

// &quot;��A&quot; in Firefox, Chrome, and according to the spec.
// &quot;�&quot; in WebKit.
new TextDecoder().decode(new Uint8Array([0xF0, 0x80, 0x41]));
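
For comparison, here is a minimal sketch of the UTF-8 decoder steps in the WHATWG Encoding Standard, applied to one complete buffer with no streaming. It is based on my reading of the spec and is not the TextCodecUTF8 code; the name decodeUtf8Sketch is made up for illustration. The point it shows: a byte outside the expected continuation range ends the pending sequence with one U+FFFD and is then decoded again on its own, and only a sequence still pending at the real end of input collapses into a single U+FFFD.

```javascript
// Hypothetical helper, not a WebKit or platform API.
// Decodes a complete byte buffer following the WHATWG UTF-8 decoder steps.
function decodeUtf8Sketch(bytes) {
  const inRange = (value, low, high) => (value >= low ? high >= value : false);
  let out = "";
  let codePoint = 0, bytesSeen = 0, bytesNeeded = 0;
  let lower = 0x80, upper = 0xBF; // allowed range for the next continuation byte
  let i = 0;
  while (bytes.length > i) {
    const byte = bytes[i];
    if (bytesNeeded === 0) {
      i += 1;
      if (inRange(byte, 0x00, 0x7F)) {
        out += String.fromCharCode(byte);
      } else if (inRange(byte, 0xC2, 0xDF)) {
        bytesNeeded = 1;
        codePoint = byte - 0xC0; // keeps the low 5 bits of the lead byte
      } else if (inRange(byte, 0xE0, 0xEF)) {
        if (byte === 0xE0) lower = 0xA0; // rejects overlong forms
        if (byte === 0xED) upper = 0x9F; // rejects surrogates
        bytesNeeded = 2;
        codePoint = byte - 0xE0; // keeps the low 4 bits
      } else if (inRange(byte, 0xF0, 0xF4)) {
        if (byte === 0xF0) lower = 0x90; // rejects overlong forms
        if (byte === 0xF4) upper = 0x8F; // rejects values above U+10FFFF
        bytesNeeded = 3;
        codePoint = byte - 0xF0; // keeps the low 3 bits
      } else {
        out += "\uFFFD"; // stray continuation byte or invalid lead byte
      }
      continue;
    }
    if (!inRange(byte, lower, upper)) {
      // The pending sequence is invalid: emit one U+FFFD and do NOT
      // consume this byte, so it is decoded again as a fresh start.
      codePoint = 0; bytesSeen = 0; bytesNeeded = 0;
      lower = 0x80; upper = 0xBF;
      out += "\uFFFD";
      continue;
    }
    i += 1;
    lower = 0x80; upper = 0xBF;
    codePoint = codePoint * 64 + (byte - 0x80); // append 6 payload bits
    bytesSeen += 1;
    if (bytesSeen === bytesNeeded) {
      out += String.fromCodePoint(codePoint);
      codePoint = 0; bytesSeen = 0; bytesNeeded = 0;
    }
  }
  // End of input: a sequence that is still pending here is truly truncated.
  if (bytesNeeded !== 0) out += "\uFFFD";
  return out;
}
```

With this sketch, decodeUtf8Sketch([0xF0, 0x9F, 0x41]) returns U+FFFD followed by "A", and decodeUtf8Sketch([0xF0, 0x80, 0x41]) returns two U+FFFD followed by "A", matching the spec results quoted above. The truncated input [0xF0, 0x9F] yields a single U+FFFD.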

This bug also affects page loading, as the attached sample shows.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1822164</commentid>
    <comment_count>1</comment_count>
    <who name="Alex Christensen">achristensen</who>
    <bug_when>2021-12-09 09:50:17 -0800</bug_when>
    <thetext>

*** This bug has been marked as a duplicate of bug 233921 ***</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1822166</commentid>
    <comment_count>2</comment_count>
    <who name="Alex Christensen">achristensen</who>
    <bug_when>2021-12-09 09:50:33 -0800</bug_when>
    <thetext>This will be fixed by the same change as bug 233921.</thetext>
  </long_desc>
      
          <attachment
              isobsolete="0"
              ispatch="0"
              isprivate="0"
          >
            <attachid>446414</attachid>
            <date>2021-12-08 12:55:22 -0800</date>
            <delta_ts>2021-12-08 12:55:22 -0800</delta_ts>
            <desc>Sample to show that this bug affects page loading.</desc>
            <filename>utf8-eof-skipped-char-test.html</filename>
            <type>text/html</type>
            <size>50</size>
            <attacher name="Andreu Botella">andreu</attacher>
            
              <data encoding="base64">PCFET0NUWVBFIGh0bWw+CjxtZXRhIGNoYXJzZXQ9IlVURi04IiAvPgpUZXN0OiDwn0E=
</data>

          </attachment>
      

    </bug>

</bugzilla>