Bug 94202 - [GTK] Bad utf8 data is being passed to enchant_dict_check
Summary: [GTK] Bad utf8 data is being passed to enchant_dict_check
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKitGTK
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords: Gtk
Depends on:
Blocks:
 
Reported: 2012-08-16 00:54 PDT by Mario Sanchez Prada
Modified: 2012-08-16 10:33 PDT

Attachments
Patch proposal (2.42 KB, patch)
2012-08-16 00:59 PDT, Mario Sanchez Prada

Description Mario Sanchez Prada 2012-08-16 00:54:57 PDT
Today I observed the following error on the bots when running certain layout tests, such as this one:

  23:47:55.622 4977 worker/22 editing/selection/move-by-word-visually-single-space-inline-element.html output stderr lines:
  23:47:55.622 4977   enchant_dict_check: assertion `g_utf8_validate(word, len, NULL)' failed
  23:47:55.622 4977   enchant_dict_check: assertion `g_utf8_validate(word, len, NULL)' failed
  23:47:55.623 4977   enchant_dict_check: assertion `g_utf8_validate(word, len, NULL)' failed
  23:47:55.623 4977   enchant_dict_check: assertion `g_utf8_validate(word, len, NULL)' failed
  [...] << repeats some more times >>

I briefly investigated the issue, and it seems to be easily fixable with this change:

  --- a/Source/WebCore/platform/text/gtk/TextCheckerEnchant.cpp
  +++ b/Source/WebCore/platform/text/gtk/TextCheckerEnchant.cpp
  @@ -115,7 +115,7 @@ void TextCheckerEnchant::checkSpellingOfString(const String& string, int& misspe
               g_utf8_strncpy(word.get(), cstart, wordLength);
   
               for (; dictIter != m_enchantDictionaries.end(); ++dictIter) {
  -                if (enchant_dict_check(*dictIter, word.get(), wordLength)) {
  +                if (enchant_dict_check(*dictIter, word.get(), bytes)) {
                       misspellingLocation = start;
                       misspellingLength = wordLength;
                   } else {
  
The explanation is that the 'len' parameter of enchant_dict_check expects a number of bytes, not a number of UTF-8 characters, so the call fails in cases like this:

  word: דעפ => total characters: 3 / total bytes: 6

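Thus a call to enchant_dict_check with 3 as the length fails: the first 3 bytes end in the middle of the second character, so they are not valid UTF-8 and the g_utf8_validate assertion triggers.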
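
To make the character/byte mismatch concrete, here is a minimal standalone sketch. It uses only the C++ standard library, so it does not call GLib or Enchant. The helper names utf8CharacterCount and endsOnCharacterBoundary are hypothetical, invented for this illustration, and are not part of WebKit, GLib or Enchant. The boundary check is much weaker than g_utf8_validate and assumes the full string is well-formed UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts UTF-8 characters (code points) by skipping
// continuation bytes, which always have the bit pattern 10xxxxxx.
// Assumes the input is well-formed UTF-8.
std::size_t utf8CharacterCount(const std::string& s)
{
    std::size_t count = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80)
            ++count;
    }
    return count;
}

// Hypothetical helper: returns true if the first `len` bytes of `s` stop on a
// character boundary, i.e. they do not cut a multi-byte sequence in half.
// Assumes `s` as a whole is well-formed UTF-8.
bool endsOnCharacterBoundary(const std::string& s, std::size_t len)
{
    if (len >= s.size())
        return len == s.size();
    // If the byte right after the prefix is a continuation byte, the prefix
    // ends inside a multi-byte character.
    return (static_cast<unsigned char>(s[len]) & 0xC0) != 0x80;
}
```

For the word above, written as the bytes "\xD7\x93\xD7\xA2\xD7\xA4", utf8CharacterCount gives 3 while the string size is 6. A prefix of length 3 (the character count) does not end on a character boundary, whereas a prefix of length 6 (the byte count) does, which matches the assertion failures in the log.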
Comment 1 Mario Sanchez Prada 2012-08-16 00:59:10 PDT
Created attachment 158741 [details]
Patch proposal

Here comes the patch
Comment 2 Mario Sanchez Prada 2012-08-16 10:12:48 PDT
Comment on attachment 158741 [details]
Patch proposal

I am lazy
Comment 3 WebKit Review Bot 2012-08-16 10:33:46 PDT
Comment on attachment 158741 [details]
Patch proposal

Clearing flags on attachment: 158741

Committed r125791: <http://trac.webkit.org/changeset/125791>
Comment 4 WebKit Review Bot 2012-08-16 10:33:49 PDT
All reviewed patches have been landed.  Closing bug.