Bug 94320 - Share WebKit-Gtk's Enchant implementation with others WebKit ports
Summary: Share WebKit-Gtk's Enchant implementation with others WebKit ports
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: 528+ (Nightly build)
Hardware: All Linux
: P2 Normal
Assignee: Grzegorz Czajkowski
URL:
Keywords:
Depends on:
Blocks: 91854
  Show dependency treegraph
 
Reported: 2012-08-17 02:29 PDT by Grzegorz Czajkowski
Modified: 2012-08-24 07:01 PDT (History)
5 users (show)

See Also:


Attachments
proposed patch (12.27 KB, patch)
2012-08-21 07:55 PDT, Grzegorz Czajkowski
no flags Details | Formatted Diff | Diff
updated patch (12.36 KB, patch)
2012-08-22 03:50 PDT, Grzegorz Czajkowski
no flags Details | Formatted Diff | Diff
patch v3 (12.34 KB, patch)
2012-08-23 00:27 PDT, Grzegorz Czajkowski
no flags Details | Formatted Diff | Diff
passing cstring to enchant (12.26 KB, patch)
2012-08-23 00:42 PDT, Grzegorz Czajkowski
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Grzegorz Czajkowski 2012-08-17 02:29:29 PDT
WebKit-Gtk ensures spellchecking with the Enchant library support, mostly implemented at https://bugs.webkit.org/show_bug.cgi?id=90269.
This code is used in both WebKit-Gtk.
The WebKit-Efl's implemention of spellchecking is based on Enchant too. It doesn't have sense to duplicate implementation from text/gtk/TextCheckerEnchant* files for WebKit's Efl.

To share WebKit-Gtk's implemnetation with others WebKit, this patch proposes:
1. Move TextCheckerEnchant{h.cpp} files form WebCore/platform/text/gtk/ to WebCore/platform/text/enchant/.
2. Limit Glib's calls as possible for example, GOwnPtr -> WTF::OwnPtr etc.
Comment 1 Grzegorz Czajkowski 2012-08-21 07:55:49 PDT
Created attachment 159689 [details]
proposed patch
Comment 2 Carlos Garcia Campos 2012-08-21 09:53:53 PDT
Comment on attachment 159689 [details]
proposed patch

Since the GTK+ port already uses ICU (at least by default) maybe we could use the same code based on ICU and avoid the #ifdefs? And for the language maybe we could use WebCore::defaultLanguage() or any other method in Language.h. What you guys think?
Comment 3 Martin Robinson 2012-08-21 10:01:33 PDT
I think #ifdefs should be avoided here if possible and it seems like it's very possible.
Comment 4 Mario Sanchez Prada 2012-08-21 10:37:47 PDT
Comment on attachment 159689 [details]
proposed patch

View in context: https://bugs.webkit.org/attachment.cgi?id=159689&action=review

I agree with Martin and Carlos's comments. Additionally, you're missing the fix for bug 94202 here, thus re-introducing the issue with this patch.

See my comment below...

> Source/WebCore/platform/text/enchant/TextCheckerEnchant.cpp:137
> +                if (enchant_dict_check(*dictIter, word.utf8().data(), wordLength)) {

You are probably missing the fix for bug 94202, committed last week, which is about not calling this function passing wordLength because it will fail for strings with special UTF8 characters, such as "דעפ", for instance. You should pass 'bytes' instead which is what enchant_dict_check actually expects.

See bug 94202 for more details.

> Source/WebCore/platform/text/enchant/TextCheckerEnchant.cpp:203
> +            // No dictionaries selected, we get first from the list.

"we get first from the list" -> "we get the first one from the list"
Comment 5 Grzegorz Czajkowski 2012-08-22 03:50:14 PDT
Created attachment 159895 [details]
updated patch
Comment 6 Grzegorz Czajkowski 2012-08-22 04:03:04 PDT
(In reply to comment #2)
> (From update of attachment 159689 [details])
> Since the GTK+ port already uses ICU (at least by default) maybe we could use the same code based on ICU and avoid the #ifdefs?

Ok, I used ICU instead of Pango at the patch so any Platform specif code doesn't exist.

> And for the language maybe we could use WebCore::defaultLanguage() or any other method in Language.h. What you guys think?

Good idea.
For WebKit-Gtk the method returns "en-US". There is no problem for the Enchant backed, its 'normalize' function is called to properly find the desired dictionary.
Comment 7 Grzegorz Czajkowski 2012-08-22 04:14:52 PDT
Comment on attachment 159689 [details]
proposed patch

View in context: https://bugs.webkit.org/attachment.cgi?id=159689&action=review

>> Source/WebCore/platform/text/enchant/TextCheckerEnchant.cpp:137
>> +                if (enchant_dict_check(*dictIter, word.utf8().data(), wordLength)) {
> 
> You are probably missing the fix for bug 94202, committed last week, which is about not calling this function passing wordLength because it will fail for strings with special UTF8 characters, such as "דעפ", for instance. You should pass 'bytes' instead which is what enchant_dict_check actually expects.
> 
> See bug 94202 for more details.

You're right.
This bug shows up that the String::fromUTF8() also requires the number of bytes instead of number of characters.
To properly compute the number of bytes from utf8 string I had to use glib's API - g_utf8_offset_to_pointer() as it was done before. Unfortunately it makes TextCheckerEnchant implementation glib independent. Of course for Gtk, Efl it is no problem - we already use it.

>> Source/WebCore/platform/text/enchant/TextCheckerEnchant.cpp:203
>> +            // No dictionaries selected, we get first from the list.
> 
> "we get first from the list" -> "we get the first one from the list"

Fixed thanks.
Comment 8 Mario Sanchez Prada 2012-08-22 06:56:35 PDT
Comment on attachment 159895 [details]
updated patch

View in context: https://bugs.webkit.org/attachment.cgi?id=159895&action=review

Thanks for the update. However, I'm afraid you will need to provide yet another one mainly because a new patch has recently (today!) been committed that affects the base your patch :(. See bug 94683 for more details.

Also, I commented a couple of things on this patch too. Hope you find them interesting / useful...

> Source/WebCore/platform/text/enchant/TextCheckerEnchant.cpp:27
> +#include <glib.h> // For utf8 support.

I don't think this comment is actually needed (also, it's not very common to have inline comments, even less for includes, AFAIK)

> Source/WebCore/platform/text/enchant/TextCheckerEnchant.cpp:101
> +            const char* cstart = cString + start;

Shouldn't you use g_utf8_offset_to_pointer() here too?

> Source/WebCore/platform/text/enchant/TextCheckerEnchant.cpp:106
> +                if (enchant_dict_check(*dictIter, word.utf8().data(), numberOfBytes)) {

I'm not completely sure, but I think it's not safe to call enchant_dict_check passing directly word.utf8().data() as parameter, since the CString returned by .utf8() will be disposed early and you'll have a gchar* (as returned by .data()) pointing to invalid memory.

Thus, it's probably safer to do something like this in line 103:

  CString word = String::fromUTF8(cstart, numberOfBytes).utf8();

... and then just call to word.data() from that point on where you need it (lines 106 and 130 only, I think)

> Source/WebCore/platform/text/enchant/TextCheckerEnchant.cpp:130
> +        char** suggestions = enchant_dict_suggest(*iter, word.utf8().data(), -1, &numberOfSuggestions);

This line would also benefit of the CString declaration I mentioned before.
Comment 9 Martin Robinson 2012-08-22 08:22:10 PDT
Comment on attachment 159895 [details]
updated patch

View in context: https://bugs.webkit.org/attachment.cgi?id=159895&action=review

>> Source/WebCore/platform/text/enchant/TextCheckerEnchant.cpp:106
>> +                if (enchant_dict_check(*dictIter, word.utf8().data(), numberOfBytes)) {
> 
> I'm not completely sure, but I think it's not safe to call enchant_dict_check passing directly word.utf8().data() as parameter, since the CString returned by .utf8() will be disposed early and you'll have a gchar* (as returned by .data()) pointing to invalid memory.
> 
> Thus, it's probably safer to do something like this in line 103:
> 
>   CString word = String::fromUTF8(cstart, numberOfBytes).utf8();
> 
> ... and then just call to word.data() from that point on where you need it (lines 106 and 130 only, I think)

It's safe to use .data(), but not to store it and use it on a later line.
Comment 10 Grzegorz Czajkowski 2012-08-23 00:10:46 PDT
(In reply to comment #8)
> (From update of attachment 159895 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=159895&action=review
> 
> Thanks for the update. However, I'm afraid you will need to provide yet another one mainly because a new patch has recently (today!) been committed that affects the base your patch :(. See bug 94683 for more details.

Thanks for the information. This change is good for us too :) Our internal implementation also bases on vector to pass/get languages. Frankly speaking I wanted to change in the another patch:)

> > Source/WebCore/platform/text/enchant/TextCheckerEnchant.cpp:27
> > +#include <glib.h> // For utf8 support.
> 
> I don't think this comment is actually needed (also, it's not very common to have inline comments, even less for includes, AFAIK)

You're right. I removed this useless comment.

> > Source/WebCore/platform/text/enchant/TextCheckerEnchant.cpp:101
> > +            const char* cstart = cString + start;
> 
> Shouldn't you use g_utf8_offset_to_pointer() here too?

Ok, I wanted to limit GLib's invocation as possible. The beginning of the word can be determined based on raw pointers - it doesn't matter whether string contains UTF8 data or not. I couldn't find alternative way how to determine the number of bytes of the string that is passes in UTF8 (only GLib API works) we can restore origin code. We have to use GLib anyway :)

> > Source/WebCore/platform/text/enchant/TextCheckerEnchant.cpp:106
> > +                if (enchant_dict_check(*dictIter, word.utf8().data(), numberOfBytes)) {
> 
> I'm not completely sure, but I think it's not safe to call enchant_dict_check passing directly word.utf8().data() as parameter, since the CString returned by .utf8() will be disposed early and you'll have a gchar* (as returned by .data()) pointing to invalid memory.
> 
> Thus, it's probably safer to do something like this in line 103:
> 
>   CString word = String::fromUTF8(cstart, numberOfBytes).utf8();
> 
> ... and then just call to word.data() from that point on where you need it (lines 106 and 130 only, I think)

I understand your idea. I tested this patch and I didn't notice any issues with that. As Martin said using word().utf8().data() is safety. Anyway 'ignoreWord', 'learnWord', 'getGuessesForWord' also use word.utf8().data() directly.

> 
> > Source/WebCore/platform/text/enchant/TextCheckerEnchant.cpp:130
> > +        char** suggestions = enchant_dict_suggest(*iter, word.utf8().data(), -1, &numberOfSuggestions);
> 
> This line would also benefit of the CString declaration I mentioned before.

Here only gchar -> char is being changed. So I leave word.utf8().data() as it is.
Comment 11 Carlos Garcia Campos 2012-08-23 00:18:25 PDT
Comment on attachment 159895 [details]
updated patch

View in context: https://bugs.webkit.org/attachment.cgi?id=159895&action=review

> Source/WebCore/platform/text/enchant/TextCheckerEnchant.cpp:103
> +            String word = String::fromUTF8(cstart, numberOfBytes);

Why converting it to String here? the string is then converted to utf8 before it's passed to enchant, so I think it would be better to use a CString in this case and pass word.data() to enchant, or even use cstart directly since the number of bytes is passed to enchant too.
Comment 12 Grzegorz Czajkowski 2012-08-23 00:27:30 PDT
Created attachment 160101 [details]
patch v3
Comment 13 Grzegorz Czajkowski 2012-08-23 00:42:45 PDT
Created attachment 160104 [details]
passing cstring to enchant

Thanks Carlos for suggestion - applied.
Comment 14 Carlos Garcia Campos 2012-08-24 06:05:35 PDT
Comment on attachment 160104 [details]
passing cstring to enchant

Looks good, thanks!
Comment 15 Grzegorz Czajkowski 2012-08-24 06:16:42 PDT
(In reply to comment #14)
> (From update of attachment 160104 [details])
> Looks good, thanks!

Thanks!
This patch changes base code of spelling for WebKit-Gtk. I will set cq+ if unit and layout tests pass.
Comment 16 Grzegorz Czajkowski 2012-08-24 06:38:02 PDT
(In reply to comment #15)
> (In reply to comment #14)
> > (From update of attachment 160104 [details] [details])
> > Looks good, thanks!
> 
> Thanks!
> This patch changes base code of spelling for WebKit-Gtk. I will set cq+ if unit and layout tests pass.

Unit test result:
TestWebKitWebContext which tests spelling passes.

TEST: ./Tools/gtk/../Scripts/../../WebKitBuild/Release/Programs/WebKit2APITests/TestWebKitWebContext... (pid=12321)
PASS: ./Tools/gtk/../Scripts/../../WebKitBuild/Release/Programs/WebKit2APITests/TestWebKitWebContext

Layout test result:
./Tools/Scripts/run-webkit-tests --gtk editing/spelling

Found 27 tests; running 25, skipping 2.
Running 1 DumpRenderTree over 1 shard.

[1/25] editing/spelling/context-menu-suggestions.html failed
[2/25] editing/spelling/design-mode-spellcheck-off.html passed
[3/25] editing/spelling/grammar-edit-word.html failed
[4/25] editing/spelling/grammar-markers.html passed
[5/25] editing/spelling/grammar.html failed
[6/25] editing/spelling/inline_spelling_markers.html passed
[7/25] editing/spelling/spellcheck-api-crash.html passed
[8/25] editing/spelling/spellcheck-async-mutation.html failed
[9/25] editing/spelling/spellcheck-async-remove-frame.html passed
[10/25] editing/spelling/spellcheck-async.html failed
[11/25] editing/spelling/spellcheck-attribute.html passed
[12/25] editing/spelling/spellcheck-input-search-crash.html passed
[13/25] editing/spelling/spellcheck-paste-disabled.html passed
[14/25] editing/spelling/spellcheck-paste.html passed
[15/25] editing/spelling/spellcheck-queue.html failed
[16/25] editing/spelling/spellcheck-sequencenum.html failed
[17/25] editing/spelling/spelling-attribute-at-child.html passed
[18/25] editing/spelling/spelling-attribute-change.html passed
[19/25] editing/spelling/spelling-backspace-between-lines.html passed
[20/25] editing/spelling/spelling-hasspellingmarker.html passed
[21/25] editing/spelling/spelling-insert-html.html passed
[22/25] editing/spelling/spelling-linebreak.html passed
[23/25] editing/spelling/spelling-marker-description.html failed
[24/25] editing/spelling/spelling-unified-emulation.html failed
[25/25] editing/spelling/spelling.html passed

All 25 tests ran as expected.
Comment 17 Grzegorz Czajkowski 2012-08-24 06:43:37 PDT
Comment on attachment 160104 [details]
passing cstring to enchant

No regression, cq+.
Comment 18 WebKit Review Bot 2012-08-24 07:01:38 PDT
Comment on attachment 160104 [details]
passing cstring to enchant

Clearing flags on attachment: 160104

Committed r126583: <http://trac.webkit.org/changeset/126583>
Comment 19 WebKit Review Bot 2012-08-24 07:01:43 PDT
All reviewed patches have been landed.  Closing bug.