Bug 90269 - [GTK] Add a new and reusable enchant-based spellchecker in WebCore
Summary: [GTK] Add a new and reusable enchant-based spellchecker in WebCore
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKitGTK (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Mario Sanchez Prada
URL:
Keywords: Gtk
Depends on: 92655 92656 92773
Blocks: 90268
  Show dependency treegraph
 
Reported: 2012-06-29 04:08 PDT by Mario Sanchez Prada
Modified: 2012-08-03 01:04 PDT (History)
4 users (show)

See Also:


Attachments
Patch proposal (24.29 KB, patch)
2012-06-29 05:35 PDT, Mario Sanchez Prada
no flags Details | Formatted Diff | Diff
Patch proposal (23.88 KB, patch)
2012-07-18 10:12 PDT, Mario Sanchez Prada
no flags Details | Formatted Diff | Diff
Patch proposal (26.16 KB, patch)
2012-07-20 08:47 PDT, Mario Sanchez Prada
no flags Details | Formatted Diff | Diff
Patch proposal (24.68 KB, patch)
2012-07-31 05:30 PDT, Mario Sanchez Prada
mrobinson: review+
Details | Formatted Diff | Diff
Patch proposal (25.89 KB, patch)
2012-08-02 06:52 PDT, Mario Sanchez Prada
mrobinson: review+
mrobinson: commit-queue-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Mario Sanchez Prada 2012-06-29 04:08:59 PDT
The idea is to have a class in WebCore that we can use both for WebKit and WebKit2, which would take care of dealing with libenchant to implement a spellchecker using that library.

This new class would allow simplifying code in Source/WebKit/gtk/webkit/webkitspellcheckerenchant.cpp and would ease the implementation of the new spell checker API for WebKit2GTK (see bug 90268)
Comment 1 Mario Sanchez Prada 2012-06-29 05:35:51 PDT
Created attachment 150155 [details]
Patch proposal

Attaching proposal of new TextCheckerEnchant class under WebCore/platform/text/gtk
Comment 2 Martin Robinson 2012-06-29 09:01:47 PDT
Comment on attachment 150155 [details]
Patch proposal

View in context: https://bugs.webkit.org/attachment.cgi?id=150155&action=review

Looks good, though I think we should make the interface more closely match WebCore style.

> Source/WebCore/platform/text/gtk/TextCheckerEnchant.cpp:57
> +void TextCheckerEnchant::ignoreWord(const char* word)
> +{
> +    for (Vector<EnchantDict*>::const_iterator iter = m_enchantDicts.begin(); iter != m_enchantDicts.end(); ++iter)
> +        enchant_dict_add_to_session(*iter, word, -1);
> +}
> +
> +void TextCheckerEnchant::learnWord(const char* word)
> +{

These should probably take Strings, which encapsulate encoding information better.

> Source/WebCore/platform/text/gtk/TextCheckerEnchant.cpp:62
> +void TextCheckerEnchant::checkSpellingOfString(const char* string, int* misspellingLocation, int* misspellingLength)

It think it makes sense to rename this to be more precise. checkSpellingOfString could be getLocationAndLengthOfFirstMisspellilng. This aligns with the WebKit style of using the word "get" for functions and methods that have output arguments.

> Source/WebCore/platform/text/gtk/TextCheckerEnchant.cpp:86
> +            while (attrs.get()[end].is_word_end < 1)
> +                end++;

You're missing the fix that I recently made to the WebKit1 enchant code to handle contractions.

> Source/WebCore/platform/text/gtk/TextCheckerEnchant.cpp:114
> +char** TextCheckerEnchant::getGuessesForWord(const char* word)

Since this is a WebCore interface, a Vector<String> makes more sense here.

> Source/WebCore/platform/text/gtk/TextCheckerEnchant.cpp:125
> +        if (numberOfSuggestions > 0) {

This looks like it should be an early continue.

> Source/WebCore/platform/text/gtk/TextCheckerEnchant.cpp:127
> +            if (numberOfSuggestions > 10)
> +                numberOfSuggestions = 10;

10 should probably be a constant like:
static const int maximumNumberOfSuggestions = 10;

> Source/WebCore/platform/text/gtk/TextCheckerEnchant.cpp:141
> +void TextCheckerEnchant::updateSpellCheckingLanguages(const char* languages)

It makes sense for this to take a String instead of a const char*.

> Source/WebCore/platform/text/gtk/TextCheckerEnchant.cpp:146
> +        char** langs = g_strsplit(languages, ",", -1);

You could use String.split here and avoid having to free the list of languages.

> Source/WebCore/platform/text/gtk/TextCheckerEnchant.cpp:179
> +#endif /* ENABLE(SPELLCHECK) */

Nit: You should use a C++ comment here.
Comment 3 Mario Sanchez Prada 2012-06-29 15:06:06 PDT
(In reply to comment #2)
> (From update of attachment 150155 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=150155&action=review
> 
> Looks good, though I think we should make the interface more closely match WebCore style.

Thanks for the quick review, Martin. As you can guess the patch is mostly about moving code down to WebCore, but I agree with the points you made and will work on that after coming from holidays.

Actually I shouldn't be checking mail now already... just couldn't help it :-)

Thanks,
Mario
Comment 4 Mario Sanchez Prada 2012-07-18 07:23:42 PDT
Comment on attachment 150155 [details]
Patch proposal

View in context: https://bugs.webkit.org/attachment.cgi?id=150155&action=review

Now working on this. See some comments in the meanwhile...

>> Source/WebCore/platform/text/gtk/TextCheckerEnchant.cpp:57
>> +{
> 
> These should probably take Strings, which encapsulate encoding information better.

Makes sense. I'll change that.

>> Source/WebCore/platform/text/gtk/TextCheckerEnchant.cpp:62
>> +void TextCheckerEnchant::checkSpellingOfString(const char* string, int* misspellingLocation, int* misspellingLength)
> 
> It think it makes sense to rename this to be more precise. checkSpellingOfString could be getLocationAndLengthOfFirstMisspellilng. This aligns with the WebKit style of using the word "get" for functions and methods that have output arguments.

Even if I agree with you in that something like that could be a better name as a general thought, I don't think we should do it in this case since it would be confusing, as checkSpellingOfString() is the expected name according to what is defined in WebCore/platform/text/TextCheckerClient.h:

  class TextCheckerClient {
  public:
      virtual ~TextCheckerClient() {}

      [...]
      virtual void learnWord(const String&) = 0;
      virtual void checkSpellingOfString(const UChar*, int length, int* misspellingLocation, int* misspellingLength) = 0;
      virtual String getAutoCorrectSuggestionForMisspelledWord(const String& misspelledWord) = 0;
      [...]
};

So, I think we should leave this name as is, to reflect that its related to that.

>> Source/WebCore/platform/text/gtk/TextCheckerEnchant.cpp:86
>> +                end++;
> 
> You're missing the fix that I recently made to the WebKit1 enchant code to handle contractions.

I guess you mean SVN r121360. I'll fix that in an upcoming patch.

>> Source/WebCore/platform/text/gtk/TextCheckerEnchant.cpp:114
>> +char** TextCheckerEnchant::getGuessesForWord(const char* word)
> 
> Since this is a WebCore interface, a Vector<String> makes more sense here.

Makes sense. I'll change that.

>> Source/WebCore/platform/text/gtk/TextCheckerEnchant.cpp:125
>> +        if (numberOfSuggestions > 0) {
> 
> This looks like it should be an early continue.

Ok.

>> Source/WebCore/platform/text/gtk/TextCheckerEnchant.cpp:127
>> +                numberOfSuggestions = 10;
> 
> 10 should probably be a constant like:
> static const int maximumNumberOfSuggestions = 10;

Just moved code from WebKit1, but I agree with this change too. I'll fix it.

>> Source/WebCore/platform/text/gtk/TextCheckerEnchant.cpp:141
>> +void TextCheckerEnchant::updateSpellCheckingLanguages(const char* languages)
> 
> It makes sense for this to take a String instead of a const char*.

Ok.

>> Source/WebCore/platform/text/gtk/TextCheckerEnchant.cpp:146
>> +        char** langs = g_strsplit(languages, ",", -1);
> 
> You could use String.split here and avoid having to free the list of languages.

Ok.

>> Source/WebCore/platform/text/gtk/TextCheckerEnchant.cpp:179
>> +#endif /* ENABLE(SPELLCHECK) */
> 
> Nit: You should use a C++ comment here.

Ok.
Comment 5 Mario Sanchez Prada 2012-07-18 10:12:14 PDT
Created attachment 153034 [details]
Patch proposal

New patch addressing issues pointed out by Martin
Comment 6 Mario Sanchez Prada 2012-07-20 08:47:41 PDT
Created attachment 153515 [details]
Patch proposal

I found today an issue in the previous patch in TextCheckerEnchant::~TextCheckerEnchant(), so this new patch basically fixes that.

Thus, this is the one to be reviewed. Sorry for the noise.
Comment 7 Martin Robinson 2012-07-28 04:49:56 PDT
Comment on attachment 153515 [details]
Patch proposal

View in context: https://bugs.webkit.org/attachment.cgi?id=153515&action=review

> Source/WebCore/platform/text/gtk/TextCheckerEnchant.cpp:30
> +static const size_t maximumNumberOfSuggestions = 10;

Since this is only used in one place, feel free to simply move this to right above where you use it.

> Source/WebCore/platform/text/gtk/TextCheckerEnchant.cpp:34
> +    Vector<CString>* dicts = static_cast<Vector<CString>*>(data);

dicts -> dictionaries

> Source/WebCore/platform/text/gtk/TextCheckerEnchant.h:35
> +class TextCheckerEnchant : public RefCounted<TextCheckerEnchant> {
> +public:
> +    static PassRefPtr<TextCheckerEnchant> create() { return adoptRef(new TextCheckerEnchant); }

This shouldn't be shared so there's no reason to make it RefCounted.

> Source/WebKit/gtk/webkit/webkitspellcheckerenchant.cpp:42
> +    RefPtr<TextCheckerEnchant> textCheckerEnchant;

This can just be TextCheckerEnchant textCheckerEnchant.
Comment 8 Mario Sanchez Prada 2012-07-28 11:28:47 PDT
Committed r123966: <http://trac.webkit.org/changeset/123966>
Comment 9 WebKit Review Bot 2012-07-30 10:06:25 PDT
Re-opened since this is blocked by 92655
Comment 10 Mario Sanchez Prada 2012-07-31 05:30:04 PDT
Created attachment 155504 [details]
Patch proposal

There was an issue using the new TextCheckerEnchant class from WebKitSpellCheckerEnchant, where we should use String::fromUTF8() instead of plainly using the String (const char*) constructor.

This new patch fixes that.
Comment 11 WebKit Review Bot 2012-07-31 05:44:12 PDT
Comment on attachment 153515 [details]
Patch proposal

Cleared Martin Robinson's review+ from obsolete attachment 153515 [details] so that this bug does not appear in http://webkit.org/pending-commit.
Comment 12 Martin Robinson 2012-07-31 08:25:46 PDT
Comment on attachment 155504 [details]
Patch proposal

Okay. Let's give it a shot. Sorry that I didn't see this issue in my previous reviews.
Comment 13 Mario Sanchez Prada 2012-07-31 08:51:52 PDT
Committed r124207: <http://trac.webkit.org/changeset/124207>
Comment 14 WebKit Review Bot 2012-07-31 11:08:30 PDT
Re-opened since this is blocked by 92773
Comment 15 Mario Sanchez Prada 2012-08-02 06:52:44 PDT
Created attachment 156071 [details]
Patch proposal

Strike 3.

So it seems the problem happened due to wrong usage of CString::length(). I was assuming it was returning the actual number of utf8 characters and it was actually returning the number of elements in CString's internal vector holding the data:

    size_t length = string.utf8().length();

...and this was causing trouble when special characters were present in the layout test (as it was the case for those failing tests)

The right version of this code (in TextEnchantChecker::checkSpellingOfString()) would be like this:

    size_t length = string.length();

This new patch fixed that.
Comment 16 Martin Robinson 2012-08-02 10:33:46 PDT
Comment on attachment 156071 [details]
Patch proposal

View in context: https://bugs.webkit.org/attachment.cgi?id=156071&action=review

Looks good, but I have one tiny suggestion before landing.

> Source/WebCore/platform/text/gtk/TextCheckerEnchant.cpp:165
> +            GOwnPtr<gchar> currentLanguage(g_strdup(iter->utf8().data()));

This can just be:

CString currentLanguage = iter->utf8();

and you could avoid one memory allocation.
Comment 17 Mario Sanchez Prada 2012-08-03 01:04:14 PDT
Committed r124578: <http://trac.webkit.org/changeset/124578>