Bug 137756 - WKWebView: JavaScript fails to load, apparently due to decoding error
Summary: WKWebView: JavaScript fails to load, apparently due to decoding error
Status: RESOLVED INVALID
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKit2
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-10-15 15:17 PDT by Nolan Lawson
Modified: 2014-10-20 12:39 PDT

See Also:


Attachments
testcase as HTML/JS (55.55 KB, application/x-gzip)
2014-10-15 15:17 PDT, Nolan Lawson
screenshot showing the index.js interpreted with CJK characters (150.31 KB, image/png)
2014-10-15 15:17 PDT, Nolan Lawson
Minimal iOS repro case (15.14 KB, application/zip)
2014-10-20 09:01 PDT, Morten Heiberg

Description Nolan Lawson 2014-10-15 15:17:12 PDT
Created attachment 239900 [details]
testcase as HTML/JS

Somehow it seems possible to completely fail to load a remote JavaScript file. We can apparently trigger it by loading a large string into memory (or maybe it's that the string contains non-ASCII characters - haven't figured it out yet.)

In any case, we have a working test case that reproduces the error. Open the attached index.html and index.js files in a WKWebView, and notice that index.js is not loaded. If you open the Safari Web Inspector, you'll see that the file is apparently not interpreted in the correct encoding, because it's full of CJK characters (see screenshot). This occurs regardless of whether the file is local or served by a remote web server.

If the provided HTML/JS files are not enough to reproduce the bug, then we will try to put together a minimal iOS app along with the HTML/JS to reproduce.

Live test case: http://bl.ocks.org/nolanlawson/d5a2d5f6e73bc6aaf300
Comment 1 Nolan Lawson 2014-10-15 15:17:57 PDT
Created attachment 239901 [details]
screenshot showing the index.js interpreted with CJK characters
Comment 2 Nolan Lawson 2014-10-15 15:18:58 PDT
BTW this is on an iPhone 6 (both device and simulator) running iOS 8.0 (12A365).
Comment 3 Nolan Lawson 2014-10-15 16:45:06 PDT
We found a workaround: if you load the large script from an external JS file, then everything's fine. Here's a demo of the workaround: http://bl.ocks.org/nolanlawson/49622757cc96e1c4c066
Comment 4 Alexey Proskuryakov 2014-10-16 00:14:59 PDT
What happens here is that when you call -[WKWebView loadHTMLString:baseURL:], the string is a UTF-16 one, and thus the document has UTF-16 encoding (which is the right thing to happen, because transport encoding takes precedence over meta charset). And then the JS resource inherits an encoding from the main document.

To fix this, you need to specify the subresource encoding explicitly. The best way to do this is via the Content-Type response header field.
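
A minimal sketch of the mis-decoding described above, written in Python for brevity. It only models the byte-level effect and is not WebKit code; the function name `misdecode_as_utf16` and the sample script are made up for illustration, and little-endian UTF-16 is an assumption.

```python
def misdecode_as_utf16(text: str) -> str:
    """Encode text as UTF-8, then decode the bytes as UTF-16 (little-endian).

    Hypothetical helper: it imitates a script that is stored as UTF-8 but is
    decoded with a UTF-16 encoding inherited from the main document.
    """
    utf8_bytes = text.encode("utf-8")
    # errors="replace" keeps the demo from raising on odd byte counts.
    return utf8_bytes.decode("utf-16-le", errors="replace")


# An ASCII-only script: one byte per character in UTF-8.
source = "var greeting = 'hi';"
garbled = misdecode_as_utf16(source)

# Every two ASCII bytes collapse into a single UTF-16 code unit.
print(len(source), len(garbled))

# 'v' (0x76) and 'a' (0x61) pair up as 0x6176, which lies in the
# CJK Unified Ideographs block (U+4E00 to U+9FFF).
print(hex(ord(garbled[0])))
```

Pairs of lowercase ASCII letters always land in the CJK block under this pairing, which is consistent with the screenshot showing the script as CJK text.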
Comment 5 Nolan Lawson 2014-10-16 14:12:40 PDT
Interesting. So do you know why the workaround fixes it? If the index.html is already UTF-16, then shouldn't the index.js also be misinterpreted in that case?

Otherwise it seems like the behavior differs based on the size of the index.html, or maybe the size of the inline script in the index.html.
Comment 6 Alexey Proskuryakov 2014-10-16 14:30:20 PDT
Not sure - I do not have any easy way to reproduce and debug. There are many differences between the original case and the one with the workaround, notably you switched from <script src=...> to dynamically creating a script element.

Perhaps we have a quirk that dynamically created elements do not inherit encoding from the main document? Seems worth reducing the differences between the two tests to figure out what's going on.
Comment 7 Nolan Lawson 2014-10-16 14:47:46 PDT
Good point. I went ahead and reduced the difference between the two examples, so that the *only* difference is that the script in the <head> is loaded inline vs. remotely. I'm still able to reproduce the bug and work around it in the same way as before.

New repro: http://bl.ocks.org/nolanlawson/a37af15ebd1ce20ee347
New workaround: http://bl.ocks.org/nolanlawson/d28d625a3c21ed450e06

Note that the only difference is the replacement.js, which is inline when the bug occurs.
Comment 8 Alexey Proskuryakov 2014-10-16 15:43:18 PDT
Interesting. One thing about the large script is that it has an apostrophe character in it, which makes it non-ASCII.

It is actually somewhat accidental that documents loaded via -loadHTMLString get UTF-16 encoding. The string goes through several conversions, and if it happens to be Latin-1 in the end, then the document gets Latin-1 encoding, and otherwise it gets UTF-16 encoding. But it is unpredictable, and actually depends on OS version and other factors.

We should stop exposing this internal processing detail, so please don't rely on it as a workaround. The right workaround is to have encoding explicitly specified by subresources.
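
The Latin-1 versus UTF-16 outcome described above can be modelled roughly as follows. This is a simplified sketch under stated assumptions, not the actual conversion path, which the comment says is unpredictable; `pick_document_encoding` is a hypothetical name, and the rule "Latin-1 if every character fits, else UTF-16" is taken from the description only.

```python
def pick_document_encoding(html: str) -> str:
    """Hypothetical model: Latin-1 if every character fits, otherwise UTF-16."""
    try:
        # ISO-8859-1 (Latin-1) covers U+0000 through U+00FF only.
        html.encode("latin-1")
        return "latin-1"
    except UnicodeEncodeError:
        return "utf-16"


# ASCII apostrophe (U+0027): the whole string fits in Latin-1.
ascii_page = "<script>var s = \"don't\";</script>"

# Typographic apostrophe (U+2019): outside Latin-1, so the model picks UTF-16.
smart_page = "<script>var s = \"don\u2019t\";</script>"

print(pick_document_encoding(ascii_page))
print(pick_document_encoding(smart_page))
```

Under this model a single U+2019 character anywhere in the HTML string is enough to flip the document, and any subresource that inherits its encoding, from Latin-1 to UTF-16.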
Comment 9 Nolan Lawson 2014-10-20 08:19:04 PDT
Specifying the subresource encoding is a fine workaround when the developer has control over the web server. But in our case we want to use `loadHTMLString` to load a dynamic `index.html` page and then have subresources accessed as file:// URLs. So in this case we have no control over the headers.

In Android the roughly equivalent `loadDataWithBaseURL` method [1] gives the option to specify the encoding. Is this something that could be exposed in the WK framework? If not, a typical hybrid app with local resources that wants to use `loadHTMLString` would need to also run a local web server on the device, which seems pretty wasteful.

[1]: https://developer.android.com/reference/android/webkit/WebView.html#loadDataWithBaseURL%28java.lang.String%2C%20java.lang.String%2C%20java.lang.String%2C%20java.lang.String%2C%20java.lang.String%29
Comment 10 Morten Heiberg 2014-10-20 09:01:13 PDT
It looks like the apostrophe that Alexey spotted in the large script is exactly the reason why this happens.

I am attaching a small iOS project which reliably reproduces this issue. 

The large script from Nolan's original index.html has been removed and replaced with a single string variable containing the U+2019 "smart quote" apostrophe, reducing the test case to 16 lines.

The ViewController.m file instantiates a WKWebView and a LOAD_THROUGH_STRING macro toggles loading the HTML either through WKWebView's loadHTMLString:baseURL: or loadRequest: method.

The UTF-16 JS encoding issue is seen when using loadHTMLString:baseURL: but not when using loadRequest:

Currently it seems that the only two workarounds for this are:

1: Embed a web server in the application and serve local assets over HTTP, controlling the encoding using HTTP headers.
2: Write all dynamically generated HTML strings to the file system (which might not even work long term if I read Alexey's comment correctly).

Neither seems like a good solution. In light of this I would like to ask for the "INVALID" resolution to be reconsidered.
Comment 11 Morten Heiberg 2014-10-20 09:01:57 PDT
Created attachment 240127 [details]
Minimal iOS repro case
Comment 12 Alexey Proskuryakov 2014-10-20 10:02:59 PDT
> Specifying the subresource encoding is a fine workaround when the developer has control over the web server. But in our case we want to use `loadHTMLString` to load a dynamic `index.html` page and then have subresources accessed as file:// URLs.

It sounds like there is a bigger problem if this is what you are doing - subresources won't load at all. You said that you observed the problem on a device, which is very surprising - sandboxing should have prevented loading any local files.

> So in this case we have no control over the headers.

A charset can also be specified in other ways:
- at the start of any text subresource, you can have a UTF-8 BOM (EF BB BF), which will make the resource interpreted as UTF-8;
- at the start of a CSS subresource, you can have a @charset rule;
- XSL stylesheets can have the charset declared in a regular XML way (<?xml version="1.0" encoding="UTF-8" ?>);
- in HTML that loads subresources, you can have charset attributes (<script src="myscript.js" charset="UTF-8">).
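
As an illustration of the first option, the sketch below prepends a UTF-8 BOM to a script's bytes and shows how a decoder that honours the BOM recovers the text even when the inherited encoding is UTF-16. It is written in Python and is only a simplified model; `add_utf8_bom` and `decode_subresource` are hypothetical helpers, not WebKit or iOS APIs.

```python
import codecs


def add_utf8_bom(data: bytes) -> bytes:
    """Prefix UTF-8 bytes with the BOM (EF BB BF) unless it is already there."""
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data


def decode_subresource(data: bytes, inherited_encoding: str) -> str:
    """Simplified model: a UTF-8 BOM wins, otherwise use the inherited encoding."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    return data.decode(inherited_encoding, errors="replace")


# A script containing U+2019, saved as UTF-8 without a BOM.
script = "var quote = '\u2019';"
raw = script.encode("utf-8")

# Without a BOM the bytes are decoded with the inherited UTF-16 and garbled.
without_bom = decode_subresource(raw, "utf-16-le")

# With a BOM the same bytes are read as UTF-8 and survive intact.
with_bom = decode_subresource(add_utf8_bom(raw), "utf-16-le")

print(without_bom == script, with_bom == script)
```

In a build step, the same idea amounts to rewriting each bundled .js or .css file once with `add_utf8_bom` before it is packaged with the app.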

> Is this something that could be exposed in the WK framework?

UIWebView has always had a -loadData method, and it does seem like WKWebView needs one, too.

- (void)loadData:(NSData *)data MIMEType:(NSString *)type textEncodingName:(NSString *)name baseURL:(NSURL *)url;

> In light of this I would like to ask for the "INVALID" resolution to be reconsidered.

In any case, please let's not reopen this bug. It has come a long way since the original decoding mystery, and precisely because of this, it would be very difficult for others to read.

My suggestion would be to file a bug with Apple via http://bugreport.apple.com, asking for an equivalent of UIWebView's -loadData.
Comment 13 Morten Heiberg 2014-10-20 11:49:34 PDT
Just wanted to clarify:

> sandboxing should have prevented loading any local files

It does, unless you stage your files to NSTemporaryDirectory. If you look in the iOS repro project I added you'll see I'm doing exactly that to get around the sandboxing issue.
Comment 14 Alexey Proskuryakov 2014-10-20 12:39:31 PDT
> I'm doing exactly that to get around the sandboxing issue.

This is a smart workaround, but it's also very risky in that it may very well break in future releases.