Bug 47896 - [GTK] Use GCharsetConverter instead of g_iconv in TextCodecGtk
Summary: [GTK] Use GCharsetConverter instead of g_iconv in TextCodecGtk
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKitGTK (show other bugs)
Version: 528+ (Nightly build)
Hardware: PC Linux
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-10-19 03:44 PDT by Carlos Garcia Campos
Modified: 2010-10-21 12:33 PDT (History)
1 user (show)

See Also:


Attachments
Patch to use GCharsetConverter (11.65 KB, patch)
2010-10-19 03:54 PDT, Carlos Garcia Campos
mrobinson: review-
Details | Formatted Diff | Diff
Updated patch according to review (11.77 KB, patch)
2010-10-20 10:51 PDT, Carlos Garcia Campos
mrobinson: review-
Details | Formatted Diff | Diff
Another update according to review (11.98 KB, patch)
2010-10-20 23:50 PDT, Carlos Garcia Campos
mrobinson: review+
commit-queue: commit-queue-
Details | Formatted Diff | Diff
Update patch for current git master (11.87 KB, patch)
2010-10-21 10:18 PDT, Carlos Garcia Campos
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Carlos Garcia Campos 2010-10-19 03:44:57 PDT
It makes error handling easier
Comment 1 Carlos Garcia Campos 2010-10-19 03:54:06 PDT
Created attachment 71150 [details]
Patch to use GCharsetConverter

It fixes these tests when building with glib unicode:

fast/encoding/invalid-multi-byte-over-consumption.html
fast/encoding/invalid-xml.html
fast/encoding/japanese-encoding-mix.html
Comment 2 Martin Robinson 2010-10-20 09:39:04 PDT
Comment on attachment 71150 [details]
Patch to use GCharsetConverter

View in context: https://bugs.webkit.org/attachment.cgi?id=71150&action=review

Great patch! Needs a few fixes, but I like where this is headed.

> WebCore/platform/text/gtk/TextCodecGtk.cpp:322
> +    , m_iconvDecoder(0)
> +    , m_iconvEncoder(0)

No need to initialize these. The default constructor sets the RefPtr to 0.

> WebCore/platform/text/gtk/TextCodecGtk.cpp:359
> +    gsize bytesRead, bytesWritten;
> +    const gchar *input = bytes;

bytesRead and bytesWritten should be initialized and on separate lines. Should be const gchar* input = bytes.

> WebCore/platform/text/gtk/TextCodecGtk.cpp:362
> +    int flags = !length ?  G_CONVERTER_INPUT_AT_END : G_CONVERTER_NO_FLAGS;

There's an extra space after ?

> WebCore/platform/text/gtk/TextCodecGtk.cpp:392
> +        GOwnPtr<GError> err;
> +        GConverterResult res;
> +
> +        res = g_converter_convert(G_CONVERTER(m_iconvDecoder.get()),
> +                                  input,
> +                                  inputLength,
> +                                  buffer,
> +                                  sizeof(buffer),
> +                                  (GConverterFlags)flags,
> +                                  &bytesRead,
> +                                  &bytesWritten,
> +                                  &err.outPtr());

Please change "err" to "error".

No need to define and declare res separately. In general, I like this style of indenting, but in this case I think that it makes the method harder to read, because it is so long. Perhaps it could be like this?

GConverterResult res = g_converter_convert(G_CONVERTER(m_iconvDecoder.get()), input, inputLength
                                           buffer, sizeof(buffer), ...

I think you should also use a static_cast for flags.

> WebCore/platform/text/gtk/TextCodecGtk.cpp:399
> +            if (g_error_matches(err.get(), G_IO_ERROR, G_IO_ERROR_PARTIAL_INPUT)) {
> +                memcpy(m_bufferedBytes, input, inputLength);
> +                m_numBufferedBytes = inputLength;

You might want to also leave a comment here about why this might happen and what the strategy is for dealing with it, since it's not obvious from an initial inspection.

> WebCore/platform/text/gtk/TextCodecGtk.cpp:440
> +    gsize bytesRead, bytesWritten;
> +    const gchar *input = reinterpret_cast<const char*>(characters);

bytesRead and bytesWritten should be initialized and on separate lines. Careful with the * again.

> WebCore/platform/text/gtk/TextCodecGtk.cpp:444
>      GOwnPtr<GError> err;

Probably a good idea to get rid of this abbreviation now.

> WebCore/platform/text/gtk/TextCodecGtk.cpp:458
> +        res = g_converter_convert(G_CONVERTER(m_iconvEncoder.get()),
> +                                  input,
> +                                  inputLength,
> +                                  buffer,
> +                                  sizeof(buffer),
> +                                  G_CONVERTER_INPUT_AT_END,
> +                                  &bytesRead,
> +                                  &bytesWritten,
> +                                  &err.outPtr());

Same comment here as above.
Comment 3 Carlos Garcia Campos 2010-10-20 10:51:42 PDT
Created attachment 71308 [details]
Updated patch according to review
Comment 4 Martin Robinson 2010-10-20 20:50:06 PDT
Comment on attachment 71308 [details]
Updated patch according to review

View in context: https://bugs.webkit.org/attachment.cgi?id=71308&action=review

I have a couple more nits, but it's very close.

> WebCore/platform/text/gtk/TextCodecGtk.cpp:348
> +        ASSERT(m_iconvDecoder);
> +        if (!m_iconvDecoder) {

Sorry, I missed this before. Either we should remove the ASSERT here or not handle the failure case.

> WebCore/platform/text/gtk/TextCodecGtk.cpp:393
> +                // there is not enough input to fully determine what the conversion should produce
> +                // save it to a buffer to prepend it to the next input

Nice comment, but it should be a full sentence with punctuation, etc.

> WebCore/platform/text/gtk/TextCodecGtk.cpp:399
> +            } else if (g_error_matches(error.get(), G_IO_ERROR, G_IO_ERROR_NO_SPACE)) {
> +                bufferWasFull = true;
> +            } else if (g_error_matches(error.get(), G_IO_ERROR, G_IO_ERROR_INVALID_DATA)) {

WebKit style dictates that one-line blocks omit their surrounding curly braces (as much as I dislike it).

> WebCore/platform/text/gtk/TextCodecGtk.cpp:410
> +                m_numBufferedBytes = 0; // reset state for subsequent calls to decode

This comment should also be in sentence form.
Comment 5 Carlos Garcia Campos 2010-10-20 23:50:39 PDT
Created attachment 71394 [details]
Another update according to review
Comment 6 Martin Robinson 2010-10-21 00:22:44 PDT
Comment on attachment 71394 [details]
Another update according to review

Great. Thanks for the fixes. Hopefully we can continue to improve this code as we go along.
Comment 7 WebKit Commit Bot 2010-10-21 04:04:48 PDT
Comment on attachment 71394 [details]
Another update according to review

Rejecting patch 71394 from commit-queue.

Failed to run "['./WebKitTools/Scripts/webkit-patch', '--status-host=queues.webkit.org', 'apply-attachment', '--force-clean', '--non-interactive', '--quiet', 71394]" exit_code: 2
Cleaning working directory
Updating working directory
Logging in as commit-queue@webkit.org...
Fetching: https://bugs.webkit.org/attachment.cgi?id=71394&action=edit
Fetching: https://bugs.webkit.org/show_bug.cgi?id=47896&ctype=xml
Processing 1 patch from 1 bug.
Processing patch 71394 from bug 47896.
Failed to run "[u'/Projects/CommitQueue/WebKitTools/Scripts/svn-apply', u'--reviewer', u'Martin Robinson', u'--force']" exit_code: 1

Full output: http://queues.webkit.org/results/4705008
Comment 8 Carlos Garcia Campos 2010-10-21 10:18:41 PDT
Created attachment 71454 [details]
Update patch for current git master
Comment 9 WebKit Commit Bot 2010-10-21 12:33:21 PDT
Comment on attachment 71454 [details]
Update patch for current git master

Clearing flags on attachment: 71454

Committed r70257: <http://trac.webkit.org/changeset/70257>
Comment 10 WebKit Commit Bot 2010-10-21 12:33:27 PDT
All reviewed patches have been landed.  Closing bug.