Bug 179416

Summary: Remove support for iOS-only softbank-sjis encoding if possible
Product: WebKit Reporter: Maciej Stachowiak <mjs>
Component: TextAssignee: Maciej Stachowiak <mjs>
Status: RESOLVED FIXED    
Severity: Normal CC: ap, buildbot, commit-queue, darin, mmaxfield, rniwa, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: Safari Technology Preview   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on:    
Bug Blocks: 179303    
Attachments:
Description Flags
Patch
none
Patch, doesn't yet address request to avoid depending on parsing details
none
Archive of layout-test-results from ews103 for mac-elcapitan
none
Archive of layout-test-results from ews107 for mac-elcapitan-wk2
none
Archive of layout-test-results from ews126 for ios-simulator-wk2
none
Archive of layout-test-results from ews117 for mac-elcapitan
none
Patch with test that works, and done the way Alexey suggested
none
Patch with further tweaks to the regression test
none
Patch for landing none

Description Maciej Stachowiak 2017-11-08 00:02:52 PST
Remove support for iOS-only softbank-sjis encoding if possible. I'm not entirely sure how to check. There are apparently also docomo-sjis and kddi-sjis, but we don't support those in WebKit. See bug #19309 for more info.
Comment 1 Maciej Stachowiak 2017-11-08 00:03:51 PST
I meant bug #179309
Comment 2 Maciej Stachowiak 2017-11-08 00:44:28 PST
By my reading of the code, it seems like the softbank-sjis text codec can't possibly actually work, which might be evidence that it's actually not needed.

The code looks like this:

#if PLATFORM(IOS)
    // See comment above in registerEncodingNames().
    int32_t i = 0;
    for (const char* alias = textCodecMacAliases[i]; alias; alias = textCodecMacAliases[++i])
        registrar(alias, create, 0);
#endif

But that will result in a TextCodecICU with a null m_canonicalConverterName, which will therefore always fail to create the ICU converter, because it will pass null to ucnv_open.
Comment 3 Darin Adler 2017-11-08 09:09:08 PST
(In reply to Maciej Stachowiak from comment #2)
> But that will result in a TextCodecICU with a null m_canonicalConverterName,
> which will therefore always fail to create the ICU converter, because it
> will pass null to ucnv_open.

I think you are right that it doesn’t work.

From my reading of the code, it won’t fail, but also won’t do anything useful. From the ucnv_open documentation:

"If this parameter is NULL, the default converter will be used."

So it will create ICU's "default converter", whatever that is; reading the documentation doesn’t really make it all that clear to me what it is, although it can be changed with ucnv_setDefaultName.

And also, it seems that TextCodecICU::createICUConverter will crash with a null dereference calling strcmp, if this is tried again after the converter is cached.

So, in summary, should almost certainly remove, highly like it does not do anything useful!
Comment 4 Alexey Proskuryakov 2017-11-08 09:33:13 PST
Indeed, <meta charset="softbank-sjis"> doesn't work! May be good to also check if it cannot be specified using WebKit API or SPI by a client.
Comment 5 Darin Adler 2017-11-08 09:38:50 PST
I can’t see any way it would work given the way the code is written.
Comment 6 Maciej Stachowiak 2017-11-08 11:24:48 PST
(In reply to Alexey Proskuryakov from comment #4)
> Indeed, <meta charset="softbank-sjis"> doesn't work! May be good to also
> check if it cannot be specified using WebKit API or SPI by a client.

Do you know how I could make a test case for this? I'm not sure what the softbank-sjis encoding is supposed to look like.
Comment 7 Maciej Stachowiak 2017-11-08 17:03:05 PST
Created attachment 326408 [details]
Patch
Comment 8 Alexey Proskuryakov 2017-11-08 17:51:22 PST
Comment on attachment 326408 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=326408&action=review

This patch only has LayoutTests changes, nothing in WebCore.

> LayoutTests/fast/encoding/charset-softbank-sjis.html:5
> +    <meta charset=utf-8>

Pretty sure that default encoding is well defined in DRT/WKTR, so I'd omit this line. I feel like we are unnecessarily relying on parser details here.

The small downside is that the test will fail locally. One way to correct for that is to check that document.charset is not softbank-sjis (or doesn't include "jis", for some resilience). Or you could print the actual charset and have text explaining that the test passes as long as it's not softbank-sjis.

> LayoutTests/fast/encoding/charset-softbank-sjis.html:9
> +    if (window.testRunner)
> +        testRunner.dumpAsText();

js-test-pre.js enables dumpAsText.

I recommend using js-test.js in new tests instead, as is saves one line from boilerplate.
Comment 9 Maciej Stachowiak 2017-11-08 18:29:16 PST
(In reply to Alexey Proskuryakov from comment #8)
> Comment on attachment 326408 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=326408&action=review
> 
> This patch only has LayoutTests changes, nothing in WebCore.

Oops, I'll upload a new one with the actual code change.

> 
> > LayoutTests/fast/encoding/charset-softbank-sjis.html:5
> > +    <meta charset=utf-8>
> 
> Pretty sure that default encoding is well defined in DRT/WKTR, so I'd omit
> this line. I feel like we are unnecessarily relying on parser details here.
> 
> The small downside is that the test will fail locally. One way to correct
> for that is to check that document.charset is not softbank-sjis (or doesn't
> include "jis", for some resilience). Or you could print the actual charset
> and have text explaining that the test passes as long as it's not
> softbank-sjis.

This was the only way I could figure out to get the test to behave consistently between DRT/WKTR and when opened in Safari. DTR/WKTR defaults to windows-1252, while Safari defaults to utf-8.

I could make a test for encoding name that doesn't depend on the exact encoding chosen as you suggest. But I am not sure how I could do the specific character test, where I check that a UTF-8 e with acute accent is decoded as such. Can you think of a way to do that that's less weird than depending on HTML rules for parsing <meta charset>?

> 
> > LayoutTests/fast/encoding/charset-softbank-sjis.html:9
> > +    if (window.testRunner)
> > +        testRunner.dumpAsText();
> 
> js-test-pre.js enables dumpAsText.
> 
> I recommend using js-test.js in new tests instead, as is saves one line from
> boilerplate.

Will do.
Comment 10 Maciej Stachowiak 2017-11-08 18:33:15 PST
Created attachment 326415 [details]
Patch, doesn't yet address request to avoid depending on parsing details
Comment 11 Alexey Proskuryakov 2017-11-08 18:53:13 PST
> But I am not sure how I could do the specific character test, where I check that a UTF-8 e with acute accent is decoded as such.

What is the reason why you want to decode a character? I think that checking document.charset is a good test for this fix.
Comment 12 Maciej Stachowiak 2017-11-08 19:05:45 PST
(In reply to Alexey Proskuryakov from comment #11)
> > But I am not sure how I could do the specific character test, where I check that a UTF-8 e with acute accent is decoded as such.
> 
> What is the reason why you want to decode a character? I think that checking
> document.charset is a good test for this fix.

Some of our other encoding tests do that, and it seems worthwhile to check that the document decodes as expected in addition to checking the exposed encoding name.
Comment 13 Build Bot 2017-11-08 19:28:56 PST
Comment on attachment 326415 [details]
Patch, doesn't yet address request to avoid depending on parsing details

Attachment 326415 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.webkit.org/results/5157050

New failing tests:
fast/encoding/charset-softbank-sjis.html
Comment 14 Build Bot 2017-11-08 19:28:57 PST
Created attachment 326416 [details]
Archive of layout-test-results from ews103 for mac-elcapitan

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews103  Port: mac-elcapitan  Platform: Mac OS X 10.11.6
Comment 15 Build Bot 2017-11-08 19:32:42 PST
Comment on attachment 326415 [details]
Patch, doesn't yet address request to avoid depending on parsing details

Attachment 326415 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.webkit.org/results/5157002

New failing tests:
fast/encoding/charset-softbank-sjis.html
Comment 16 Build Bot 2017-11-08 19:32:44 PST
Created attachment 326417 [details]
Archive of layout-test-results from ews107 for mac-elcapitan-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews107  Port: mac-elcapitan-wk2  Platform: Mac OS X 10.11.6
Comment 17 Alexey Proskuryakov 2017-11-08 19:48:31 PST
I personally think that it's enough to check the exposed name, but if you want to test end to end, the workaround is to test in a subframe - the encoding of the top frame is inherited as default for the child frame, as long as they are same origin.
Comment 18 Build Bot 2017-11-08 20:01:36 PST
Comment on attachment 326415 [details]
Patch, doesn't yet address request to avoid depending on parsing details

Attachment 326415 [details] did not pass ios-sim-ews (ios-simulator-wk2):
Output: http://webkit-queues.webkit.org/results/5157104

New failing tests:
fast/encoding/charset-softbank-sjis.html
Comment 19 Build Bot 2017-11-08 20:01:37 PST
Created attachment 326421 [details]
Archive of layout-test-results from ews126 for ios-simulator-wk2

The attached test failures were seen while running run-webkit-tests on the ios-sim-ews.
Bot: ews126  Port: ios-simulator-wk2  Platform: Mac OS X 10.12.6
Comment 20 Build Bot 2017-11-08 20:07:17 PST
Comment on attachment 326415 [details]
Patch, doesn't yet address request to avoid depending on parsing details

Attachment 326415 [details] did not pass mac-debug-ews (mac):
Output: http://webkit-queues.webkit.org/results/5157100

New failing tests:
fast/encoding/charset-softbank-sjis.html
Comment 21 Build Bot 2017-11-08 20:07:19 PST
Created attachment 326423 [details]
Archive of layout-test-results from ews117 for mac-elcapitan

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews117  Port: mac-elcapitan  Platform: Mac OS X 10.11.6
Comment 22 Maciej Stachowiak 2017-11-08 23:02:35 PST
I accidentally took out the test for encoding name. I'll upload a new patch with that fixed, plus Alexey's suggestion to use an iframe for consistent encoding results.
Comment 23 Maciej Stachowiak 2017-11-08 23:22:50 PST
Created attachment 326431 [details]
Patch with test that works, and done the way Alexey suggested
Comment 24 Alexey Proskuryakov 2017-11-08 23:51:16 PST
Comment on attachment 326431 [details]
Patch with test that works, and done the way Alexey suggested

View in context: https://bugs.webkit.org/attachment.cgi?id=326431&action=review

> LayoutTests/fast/encoding/charset-softbank-sjis-expected.txt:5
> +TEST COMPLETE
> +Test PASSED if this renders as an accented e: âéâ.

This is not right.

> LayoutTests/fast/encoding/resources/softbank-sjis-iframe.html:5
> +<script src="../../../resources/js-test.js"></script>

I think that this should be in the main frame. Otherwise, it will call notifyDone once the subframe is done loading, which may be before the main frame is done.

> LayoutTests/fast/encoding/resources/softbank-sjis-iframe.html:11
> +Test PASSED if this renders as an accented e: âéâ.

I think that it would work better if this were at the top of <body> (and I'd make it an explicit element). But also, why not put it into a script as a shouldBe? I think that this should do it:

shouldBe("'\xe9'", "'é'");
Comment 25 Maciej Stachowiak 2017-11-09 00:05:19 PST
Created attachment 326433 [details]
Patch with further tweaks to the regression test
Comment 26 Alexey Proskuryakov 2017-11-09 09:40:10 PST
Comment on attachment 326433 [details]
Patch with further tweaks to the regression test

View in context: https://bugs.webkit.org/attachment.cgi?id=326433&action=review

Great!

> Source/WebCore/ChangeLog:9
> +        (WebCore::TextCodecICU::registerEncodingNames): Remove support for softbanj-sjis

"softbank"
Comment 27 Maciej Stachowiak 2017-11-09 10:05:28 PST
Created attachment 326461 [details]
Patch for landing
Comment 28 WebKit Commit Bot 2017-11-09 10:37:13 PST
Comment on attachment 326461 [details]
Patch for landing

Clearing flags on attachment: 326461

Committed r224636: <https://trac.webkit.org/changeset/224636>
Comment 29 WebKit Commit Bot 2017-11-09 10:37:15 PST
All reviewed patches have been landed.  Closing bug.
Comment 30 Radar WebKit Bug Importer 2017-11-15 09:39:56 PST
<rdar://problem/35562183>