Bug 12643

Summary: NPN_Status is using latin-1 encoding for the message instead of UTF-8
Product: WebKit Reporter: Alexey Proskuryakov <ap>
Component: Plug-insAssignee: Alexey Proskuryakov <ap>
Severity: Normal CC: darin
Priority: P2    
Version: 420+   
Hardware: Mac   
OS: OS X 10.4   
Description Flags
proposed patch
mjs: review+
fix the fix darin: review+

Description Alexey Proskuryakov 2007-02-06 21:49:22 PST
I believe the convention is that char* pointers are UTF-8.
Comment 1 Alexey Proskuryakov 2007-02-06 21:52:41 PST
Created attachment 12996 [details]
proposed patch
Comment 2 Geoffrey Garen 2007-02-06 22:08:58 PST
Comments in the file indicate that we use Latin1 on purpose to handle invalid input (since there's no such thing as invalid Latin1).

Darin's check-in comment (from svn) was:

Use Windows Latin-1 rather than MacRoman for messages. I guess we should test later to find out which is used by MacIE and conform to that, but for now this seems like a better default.

Have you done any testing to see what other browsers support?

I'd like Darin to review this, but I think it should be a - unless we're diverging from other browsers here.
Comment 3 Alexey Proskuryakov 2007-02-06 22:17:37 PST
Mozilla uses UTF-8: <http://lxr.mozilla.org/mozilla1.8/source/layout/generic/nsObjectFrame.cpp#2547>
Comment 4 Maciej Stachowiak 2007-02-06 22:45:42 PST
Comment on attachment 12996 [details]
proposed patch

Comment 5 Geoffrey Garen 2007-02-07 06:45:27 PST
In that case, should we convert all the calls from Latin1?
Comment 6 Alexey Proskuryakov 2007-02-07 09:15:57 PST
That's a fair question! While investigating this bug, I looked for other places in WebBaseNetscapePluginView.mm that handled C strings as Latin-1, and found GetURL/PostURL. However, URLs are formally ASCII anyway, so the behavior for non-ASCII ones may be more tricky, and I didn't investigate it yet.
Comment 7 Alexey Proskuryakov 2007-02-10 06:44:36 PST
Darin, Geoff has suggested asking you to look at this patch before landing it - could you please take a look?
Comment 8 Darin Adler 2007-02-12 10:39:53 PST
I did look and talked to Geoff. I don't have anything definitive to say here except that:

    1) we should not break existing plug-ins, even ones that have assumptions about our strange behavior
    2) we should act like other browsers do, to make plug-ins work as well as possible
    3) we should behave in a way that "makes sense" and works with any characters
    4) if we are encoding or decoding in a way that can have failures, we need to carefully define what we do in those failure cases -- converting arbitrary bytes interpreted as Latin-1 to UTF-16 is an example of something that can never fail, as is encoding valid UTF-16 to UTF-8 bytes, where as converting to Latin-1 or decoding UTF-8 are examples of things that can fail

But I assume you already know all of that.
Comment 9 Alexey Proskuryakov 2007-02-13 11:44:03 PST
Committed revision 19608.

I believe that in this particular case an outright failure to set status text is better than displaying garbage that may result from processing it as Latin-1, thus landing as is.
Comment 10 Alexey Proskuryakov 2007-02-13 21:28:27 PST
Created attachment 13160 [details]
fix the fix

Darin has pointed out that I forgot the most obvious problem: CFRelease will crash when given a null parameter. So, it's still necessary to explicitly check for conversion failure.
Comment 11 Darin Adler 2007-02-14 08:57:16 PST
Comment on attachment 13160 [details]
fix the fix

Comment 12 Alexey Proskuryakov 2007-02-14 10:12:09 PST
Committed revision 19625.