Bug 12643

Summary:

NPN_Status is using latin-1 encoding for the message instead of UTF-8

Product:

WebKit

Reporter:

Alexey Proskuryakov <ap>

Component:

Plug-ins

Assignee:

Alexey Proskuryakov <ap>

Status:

RESOLVED FIXED

Severity:

Normal

CC:

darin

Priority:

Version:

420+

Hardware:

Mac

OS:

OS X 10.4

Attachments:

Description	Flags
proposed patch	mjs: review+
fix the fix	darin: review+

Alexey Proskuryakov

Reported 2007-02-06 21:49:22 PST

I believe the convention is that char* pointers are UTF-8.

Attachments
proposed patch (1.34 KB, patch) 2007-02-06 21:52 PST, Alexey Proskuryakov	mjs: review+	Details Formatted Diff Diff
fix the fix (1.28 KB, patch) 2007-02-13 21:28 PST, Alexey Proskuryakov	darin: review+	Details Formatted Diff Diff
View All Add attachment proposed patch, testcase, etc.

Alexey Proskuryakov

Comment 1 2007-02-06 21:52:41 PST

Created attachment 12996 [details] proposed patch

Geoffrey Garen

Comment 2 2007-02-06 22:08:58 PST

Comments in the file indicate that we use Latin1 on purpose to handle invalid input (since there's no such thing as invalid Latin1). Darin's check-in comment (from svn) was: Use Windows Latin-1 rather than MacRoman for messages. I guess we should test later to find out which is used by MacIE and conform to that, but for now this seems like a better default. Have you done any testing to see what other browsers support? I'd like Darin to review this, but I think it should be a - unless we're diverging from other browsers here.

Alexey Proskuryakov

Comment 3 2007-02-06 22:17:37 PST

Mozilla uses UTF-8: <http://lxr.mozilla.org/mozilla1.8/source/layout/generic/nsObjectFrame.cpp#2547>

Maciej Stachowiak

Comment 4 2007-02-06 22:45:42 PST

Comment on attachment 12996 [details] proposed patch r=me

Geoffrey Garen

Comment 5 2007-02-07 06:45:27 PST

In that case, should we convert all the calls from Latin1?

Alexey Proskuryakov

Comment 6 2007-02-07 09:15:57 PST

That's a fair question! While investigating this bug, I looked for other places in WebBaseNetscapePluginView.mm that handled C strings as Latin-1, and found GetURL/PostURL. However, URLs are formally ASCII anyway, so the behavior for non-ASCII ones may be more tricky, and I didn't investigate it yet.

Alexey Proskuryakov

Comment 7 2007-02-10 06:44:36 PST

Darin, Geoff has suggested asking you to look at this patch before landing it - could you please take a look?

Darin Adler

Comment 8 2007-02-12 10:39:53 PST

I did look and talked to Geoff. I don't have anything definitive to say here except that: 1) we should not break existing plug-ins, even ones that have assumptions about our strange behavior 2) we should act like other browsers do, to make plug-ins work as well as possible 3) we should behave in a way that "makes sense" and works with any characters 4) if we are encoding or decoding in a way that can have failures, we need to carefully define what we do in those failure cases -- converting arbitrary bytes interpreted as Latin-1 to UTF-16 is an example of something that can never fail, as is encoding valid UTF-16 to UTF-8 bytes, where as converting to Latin-1 or decoding UTF-8 are examples of things that can fail But I assume you already know all of that.

Alexey Proskuryakov

Comment 9 2007-02-13 11:44:03 PST

Committed revision 19608. I believe that in this particular case an outright failure to set status text is better than displaying garbage that may result from processing it as Latin-1, thus landing as is.

Alexey Proskuryakov

Comment 10 2007-02-13 21:28:27 PST

Created attachment 13160 [details] fix the fix Darin has pointed out that I forgot the most obvious problem: CFRelease will crash when given a null parameter. So, it's still necessary to explicitly check for conversion failure.

Darin Adler

Comment 11 2007-02-14 08:57:16 PST

Comment on attachment 13160 [details] fix the fix r=me

Alexey Proskuryakov

Comment 12 2007-02-14 10:12:09 PST

Committed revision 19625.

Note You need to log in before you can comment on or make changes to this bug.