Bug 72944 - [Windows, WinCairo] Load from non-ASCII file system path fails
Summary: [Windows, WinCairo] Load from non-ASCII file system path fails
Status: ASSIGNED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKit Misc.
Version: 528+ (Nightly build)
Hardware: PC Windows 7
Importance: P2 Normal
Assignee: Brent Fulgham
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-11-22 05:19 PST by Heiner Wolf
Modified: 2023-12-14 10:35 PST
CC List: 5 users

See Also:



Description Heiner Wolf 2011-11-22 05:19:15 PST
My app, which uses WebKit-Cairo, fails to load from a file:// URL that has non-ASCII characters in the path.

I verified that this is a WebKit-Cairo problem by modifying WebKit/Tools/WinLauncher/WinLauncher to load from a file path. It works with ASCII-only paths, but the load fails when I change one ASCII character to a German umlaut (ü) or a Japanese Katakana (ﾎ). I assume non-ASCII paths work with the Apple libraries, which is why I point to the Cairo port.

Using this URL:
  L"file://C/ProjDir/test/aﾎü/NonAsciiPath.html"
as input to
  IWebMutableURLRequest::initWithURL(BSTR)

triggers IWebResourceLoadDelegate::didFailLoadingWithError() with
  localizedDescription=Couldn't read a file:// file
  failingURL=file://C/ProjDir/test/a%EF%BE%8E%C3%BC/NonAsciiPath.html

The Windows wide-character string
  L"aﾎü"
is encoded in UTF-8 as
  "a\xef\xbe\x8e\xc3\xbc"
which percent-encodes to
  a%EF%BE%8E%C3%BC
So the failingURL is essentially correct: it is the percent-encoded UTF-8 form of the input.
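
To make the conversion above concrete, here is a minimal C sketch (the function names are hypothetical, not WebKit's code) that turns Unicode code points into UTF-8 and percent-encodes every non-ASCII byte. It assumes the input is already decoded into code points; a real Windows WCHAR string is UTF-16, so surrogate pairs would have to be combined first, and a full URL encoder would also escape reserved ASCII characters such as spaces. The bytes EF BE 8E are the UTF-8 form of U+FF8E, the halfwidth katakana ﾎ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encode one Unicode code point as UTF-8 into out.
 * Returns the byte count (1..4), or 0 for a surrogate or out-of-range value. */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0; /* surrogates are not valid code points on their own */
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical helper: UTF-8-encode the code points and write every byte
 * >= 0x80 as %XX (upper-case hex). ASCII bytes are copied unchanged, which
 * is a simplification. Returns 0 on success, -1 on invalid input or if dst
 * (including the terminating NUL) is too small. */
static int percent_encode_utf8(const uint32_t *cps, size_t n,
                               char *dst, size_t dst_size)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t pos = 0;

    assert(dst != NULL && (cps != NULL || n == 0));
    for (size_t i = 0; i < n; i++) {
        unsigned char buf[4];
        size_t len = utf8_encode(cps[i], buf);
        if (len == 0)
            return -1;
        for (size_t j = 0; j < len; j++) {
            if (buf[j] < 0x80) {
                if (pos + 1 >= dst_size)
                    return -1;
                dst[pos++] = (char)buf[j];
            } else {
                if (pos + 3 >= dst_size)
                    return -1;
                dst[pos++] = '%';
                dst[pos++] = hex[buf[j] >> 4];
                dst[pos++] = hex[buf[j] & 0x0F];
            }
        }
    }
    if (pos >= dst_size)
        return -1;
    dst[pos] = '\0';
    return 0;
}
```

Called with the code points {0x61, 0xFF8E, 0xFC}, i.e. "aﾎü", this produces the path segment seen in failingURL, a%EF%BE%8E%C3%BC.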

I observe that WebKit converts the WCHAR URL from IWebMutableURLRequest::initWithURL(BSTR) to UTF-8, percent-encodes it, and feeds it to libcurl [Windows, WinCairo]. I _assume_ that libcurl then tries to fopen() the percent-encoded path, which fails. For file:// URLs it would need the reverse conversion: decode the percent-encoding back to UTF-8, convert the UTF-8 to WCHAR, and then call _wopen().
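
The reverse conversion described above could look roughly like the following portable C sketch, assuming the path has already been split out of the URL. Both helper names are hypothetical and this is not libcurl's code. On Windows the second step is what MultiByteToWideChar(CP_UTF8, ...) does; it is written out by hand here only so the sketch runs on any platform. The output is UTF-16 code units, which is what a WCHAR buffer holds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Step 1 (hypothetical helper): undo the percent-encoding. Writes the raw
 * (UTF-8) bytes to out and returns their count, or -1 on a malformed
 * escape or if out is too small. */
static long percent_decode(const char *in, unsigned char *out, size_t out_size)
{
    size_t pos = 0;

    assert(in != NULL && out != NULL);
    for (size_t i = 0; in[i] != '\0'; i++) {
        unsigned char byte;
        if (in[i] == '%') {
            int hi = hex_value(in[i + 1]);
            /* Only look at in[i + 2] if in[i + 1] was a hex digit (not NUL). */
            int lo = hi < 0 ? -1 : hex_value(in[i + 2]);
            if (hi < 0 || lo < 0)
                return -1;
            byte = (unsigned char)(hi * 16 + lo);
            i += 2;
        } else {
            byte = (unsigned char)in[i];
        }
        if (pos >= out_size)
            return -1;
        out[pos++] = byte;
    }
    return (long)pos;
}

/* Step 2 (hypothetical helper): decode UTF-8 into UTF-16 code units.
 * Rejects truncated sequences, overlong forms and surrogate code points.
 * Returns the number of units written, or -1 on error. */
static long utf8_to_utf16(const unsigned char *s, size_t n,
                          uint16_t *out, size_t out_size)
{
    static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t i = 0, pos = 0;

    while (i < n) {
        unsigned char b = s[i];
        uint32_t cp;
        size_t len;
        if (b < 0x80)                { cp = b;        len = 1; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; len = 2; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; len = 3; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; len = 4; }
        else return -1;
        if (i + len > n)
            return -1;
        for (size_t k = 1; k < len; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return -1;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if (cp < min_cp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return -1;
        if (cp < 0x10000) {
            if (pos >= out_size)
                return -1;
            out[pos++] = (uint16_t)cp;
        } else { /* needs a surrogate pair */
            if (pos + 2 > out_size)
                return -1;
            cp -= 0x10000;
            out[pos++] = (uint16_t)(0xD800 | (cp >> 10));
            out[pos++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
        i += len;
    }
    return (long)pos;
}
```

Decoding a%EF%BE%8E%C3%BC yields the six UTF-8 bytes shown earlier and then the three UTF-16 units 0x0061, 0xFF8E, 0x00FC, which is the form _wopen() expects.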

If curl uses the ANSI (narrow-character) Windows API, as I suspect from reading
  https://github.com/bagder/curl/blob/7248439fece7d0f32cc3d52c253b3960a66ad2b3/lib/file.c#L344
then we are doomed. It should call _wopen() after converting the path with ::MultiByteToWideChar(), or with _curl_win32_UTF8_to_wchar() for that matter.
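
A sketch of the suggested open call, assuming the caller already holds the decoded UTF-8 path. open_utf8_readonly() is a hypothetical helper, not a curl function. The Windows branch uses the documented MultiByteToWideChar() and _wopen() calls; the POSIX branch simply passes the UTF-8 bytes to open(), since file names there are byte strings.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#ifdef _WIN32
#include <io.h>
#include <windows.h>
#else
#include <unistd.h>
#endif

/* Hypothetical helper: open a file whose path is given as UTF-8.
 * Returns a file descriptor, or -1 on failure. */
static int open_utf8_readonly(const char *utf8_path)
{
    assert(utf8_path != NULL);
#ifdef _WIN32
    /* First call: ask for the required buffer size in WCHARs (includes NUL). */
    int wlen = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                   utf8_path, -1, NULL, 0);
    if (wlen <= 0)
        return -1;
    wchar_t *wpath = malloc((size_t)wlen * sizeof *wpath);
    if (wpath == NULL)
        return -1;
    /* Second call: perform the UTF-8 -> UTF-16 conversion. */
    if (MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                            utf8_path, -1, wpath, wlen) != wlen) {
        free(wpath);
        return -1;
    }
    /* Wide-character CRT call; a narrow _open() would interpret the bytes
     * in the ANSI code page by default. */
    int fd = _wopen(wpath, _O_RDONLY | _O_BINARY);
    free(wpath);
    return fd;
#else
    /* POSIX file names are byte strings, so the UTF-8 bytes pass through. */
    return open(utf8_path, O_RDONLY);
#endif
}
```

On Windows the returned descriptor is closed with _close(), elsewhere with close(). Passing MB_ERR_INVALID_CHARS makes the conversion fail on malformed UTF-8 instead of substituting replacement characters.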

Maybe someone who knows the inner workings better can comment before I dig deeper.