94320 – Share WebKit-Gtk's Enchant implementation with others WebKit ports

RESOLVED FIXED94320

Share WebKit-Gtk's Enchant implementation with others WebKit ports

https://bugs.webkit.org/show_bug.cgi?id=94320

Summary Share WebKit-Gtk's Enchant implementation with others WebKit ports

Grzegorz Czajkowski

Reported 2012-08-17 02:29:29 PDT

WebKit-Gtk ensures spellchecking with the Enchant library support, mostly implemented at https://bugs.webkit.org/show_bug.cgi?id=90269. This code is used in both WebKit-Gtk. The WebKit-Efl's implemention of spellchecking is based on Enchant too. It doesn't have sense to duplicate implementation from text/gtk/TextCheckerEnchant* files for WebKit's Efl. To share WebKit-Gtk's implemnetation with others WebKit, this patch proposes: 1. Move TextCheckerEnchant{h.cpp} files form WebCore/platform/text/gtk/ to WebCore/platform/text/enchant/. 2. Limit Glib's calls as possible for example, GOwnPtr -> WTF::OwnPtr etc.

Attachments
proposed patch (12.27 KB, patch) 2012-08-21 07:55 PDT, Grzegorz Czajkowski	no flags	Details Formatted Diff Diff
updated patch (12.36 KB, patch) 2012-08-22 03:50 PDT, Grzegorz Czajkowski	no flags	Details Formatted Diff Diff
patch v3 (12.34 KB, patch) 2012-08-23 00:27 PDT, Grzegorz Czajkowski	no flags	Details Formatted Diff Diff
passing cstring to enchant (12.26 KB, patch) 2012-08-23 00:42 PDT, Grzegorz Czajkowski	no flags	Details Formatted Diff Diff
Show Obsolete (3) View All Add attachment proposed patch, testcase, etc.

Grzegorz Czajkowski

Comment 1 2012-08-21 07:55:49 PDT

Created attachment 159689 [details] proposed patch

Carlos Garcia Campos

Comment 2 2012-08-21 09:53:53 PDT

Comment on attachment 159689 [details] proposed patch Since the GTK+ port already uses ICU (at least by default) maybe we could use the same code based on ICU and avoid the #ifdefs? And for the language maybe we could use WebCore::defaultLanguage() or any other method in Language.h. What you guys think?

Martin Robinson

Comment 3 2012-08-21 10:01:33 PDT

I think #ifdefs should be avoided here if possible and it seems like it's very possible.

Mario Sanchez Prada

Comment 4 2012-08-21 10:37:47 PDT

Comment on attachment 159689 [details] proposed patch View in context: https://bugs.webkit.org/attachment.cgi?id=159689&action=review I agree with Martin and Carlos's comments. Additionally, you're missing the fix for bug 94202 here, thus re-introducing the issue with this patch. See my comment below... > Source/WebCore/platform/text/enchant/TextCheckerEnchant.cpp:137 > + if (enchant_dict_check(*dictIter, word.utf8().data(), wordLength)) { You are probably missing the fix for bug 94202, committed last week, which is about not calling this function passing wordLength because it will fail for strings with special UTF8 characters, such as "דעפ", for instance. You should pass 'bytes' instead which is what enchant_dict_check actually expects. See bug 94202 for more details. > Source/WebCore/platform/text/enchant/TextCheckerEnchant.cpp:203 > + // No dictionaries selected, we get first from the list. "we get first from the list" -> "we get the first one from the list"

Grzegorz Czajkowski

Comment 5 2012-08-22 03:50:14 PDT

Created attachment 159895 [details] updated patch

Grzegorz Czajkowski

Comment 6 2012-08-22 04:03:04 PDT

(In reply to comment #2) > (From update of attachment 159689 [details]) > Since the GTK+ port already uses ICU (at least by default) maybe we could use the same code based on ICU and avoid the #ifdefs? Ok, I used ICU instead of Pango at the patch so any Platform specif code doesn't exist. > And for the language maybe we could use WebCore::defaultLanguage() or any other method in Language.h. What you guys think? Good idea. For WebKit-Gtk the method returns "en-US". There is no problem for the Enchant backed, its 'normalize' function is called to properly find the desired dictionary.

Grzegorz Czajkowski

Comment 7 2012-08-22 04:14:52 PDT

Comment on attachment 159689 [details] proposed patch View in context: https://bugs.webkit.org/attachment.cgi?id=159689&action=review >> Source/WebCore/platform/text/enchant/TextCheckerEnchant.cpp:137 >> + if (enchant_dict_check(*dictIter, word.utf8().data(), wordLength)) { > > You are probably missing the fix for bug 94202, committed last week, which is about not calling this function passing wordLength because it will fail for strings with special UTF8 characters, such as "דעפ", for instance. You should pass 'bytes' instead which is what enchant_dict_check actually expects. > > See bug 94202 for more details. You're right. This bug shows up that the String::fromUTF8() also requires the number of bytes instead of number of characters. To properly compute the number of bytes from utf8 string I had to use glib's API - g_utf8_offset_to_pointer() as it was done before. Unfortunately it makes TextCheckerEnchant implementation glib independent. Of course for Gtk, Efl it is no problem - we already use it. >> Source/WebCore/platform/text/enchant/TextCheckerEnchant.cpp:203 >> + // No dictionaries selected, we get first from the list. > > "we get first from the list" -> "we get the first one from the list" Fixed thanks.

Mario Sanchez Prada

Comment 8 2012-08-22 06:56:35 PDT

Comment on attachment 159895 [details] updated patch View in context: https://bugs.webkit.org/attachment.cgi?id=159895&action=review Thanks for the update. However, I'm afraid you will need to provide yet another one mainly because a new patch has recently (today!) been committed that affects the base your patch :(. See bug 94683 for more details. Also, I commented a couple of things on this patch too. Hope you find them interesting / useful... > Source/WebCore/platform/text/enchant/TextCheckerEnchant.cpp:27 > +#include <glib.h> // For utf8 support. I don't think this comment is actually needed (also, it's not very common to have inline comments, even less for includes, AFAIK) > Source/WebCore/platform/text/enchant/TextCheckerEnchant.cpp:101 > + const char* cstart = cString + start; Shouldn't you use g_utf8_offset_to_pointer() here too? > Source/WebCore/platform/text/enchant/TextCheckerEnchant.cpp:106 > + if (enchant_dict_check(*dictIter, word.utf8().data(), numberOfBytes)) { I'm not completely sure, but I think it's not safe to call enchant_dict_check passing directly word.utf8().data() as parameter, since the CString returned by .utf8() will be disposed early and you'll have a gchar* (as returned by .data()) pointing to invalid memory. Thus, it's probably safer to do something like this in line 103: CString word = String::fromUTF8(cstart, numberOfBytes).utf8(); ... and then just call to word.data() from that point on where you need it (lines 106 and 130 only, I think) > Source/WebCore/platform/text/enchant/TextCheckerEnchant.cpp:130 > + char** suggestions = enchant_dict_suggest(*iter, word.utf8().data(), -1, &numberOfSuggestions); This line would also benefit of the CString declaration I mentioned before.

Martin Robinson

Comment 9 2012-08-22 08:22:10 PDT

Comment on attachment 159895 [details] updated patch View in context: https://bugs.webkit.org/attachment.cgi?id=159895&action=review >> Source/WebCore/platform/text/enchant/TextCheckerEnchant.cpp:106 >> + if (enchant_dict_check(*dictIter, word.utf8().data(), numberOfBytes)) { > > I'm not completely sure, but I think it's not safe to call enchant_dict_check passing directly word.utf8().data() as parameter, since the CString returned by .utf8() will be disposed early and you'll have a gchar* (as returned by .data()) pointing to invalid memory. > > Thus, it's probably safer to do something like this in line 103: > > CString word = String::fromUTF8(cstart, numberOfBytes).utf8(); > > ... and then just call to word.data() from that point on where you need it (lines 106 and 130 only, I think) It's safe to use .data(), but not to store it and use it on a later line.

Grzegorz Czajkowski

Comment 10 2012-08-23 00:10:46 PDT

(In reply to comment #8) > (From update of attachment 159895 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=159895&action=review > > Thanks for the update. However, I'm afraid you will need to provide yet another one mainly because a new patch has recently (today!) been committed that affects the base your patch :(. See bug 94683 for more details. Thanks for the information. This change is good for us too :) Our internal implementation also bases on vector to pass/get languages. Frankly speaking I wanted to change in the another patch:) > > Source/WebCore/platform/text/enchant/TextCheckerEnchant.cpp:27 > > +#include <glib.h> // For utf8 support. > > I don't think this comment is actually needed (also, it's not very common to have inline comments, even less for includes, AFAIK) You're right. I removed this useless comment. > > Source/WebCore/platform/text/enchant/TextCheckerEnchant.cpp:101 > > + const char* cstart = cString + start; > > Shouldn't you use g_utf8_offset_to_pointer() here too? Ok, I wanted to limit GLib's invocation as possible. The beginning of the word can be determined based on raw pointers - it doesn't matter whether string contains UTF8 data or not. I couldn't find alternative way how to determine the number of bytes of the string that is passes in UTF8 (only GLib API works) we can restore origin code. We have to use GLib anyway :) > > Source/WebCore/platform/text/enchant/TextCheckerEnchant.cpp:106 > > + if (enchant_dict_check(*dictIter, word.utf8().data(), numberOfBytes)) { > > I'm not completely sure, but I think it's not safe to call enchant_dict_check passing directly word.utf8().data() as parameter, since the CString returned by .utf8() will be disposed early and you'll have a gchar* (as returned by .data()) pointing to invalid memory. > > Thus, it's probably safer to do something like this in line 103: > > CString word = String::fromUTF8(cstart, numberOfBytes).utf8(); > > ... and then just call to word.data() from that point on where you need it (lines 106 and 130 only, I think) I understand your idea. I tested this patch and I didn't notice any issues with that. As Martin said using word().utf8().data() is safety. Anyway 'ignoreWord', 'learnWord', 'getGuessesForWord' also use word.utf8().data() directly. > > > Source/WebCore/platform/text/enchant/TextCheckerEnchant.cpp:130 > > + char** suggestions = enchant_dict_suggest(*iter, word.utf8().data(), -1, &numberOfSuggestions); > > This line would also benefit of the CString declaration I mentioned before. Here only gchar -> char is being changed. So I leave word.utf8().data() as it is.

Carlos Garcia Campos

Comment 11 2012-08-23 00:18:25 PDT

Comment on attachment 159895 [details] updated patch View in context: https://bugs.webkit.org/attachment.cgi?id=159895&action=review > Source/WebCore/platform/text/enchant/TextCheckerEnchant.cpp:103 > + String word = String::fromUTF8(cstart, numberOfBytes); Why converting it to String here? the string is then converted to utf8 before it's passed to enchant, so I think it would be better to use a CString in this case and pass word.data() to enchant, or even use cstart directly since the number of bytes is passed to enchant too.

Grzegorz Czajkowski

Comment 12 2012-08-23 00:27:30 PDT

Created attachment 160101 [details] patch v3

Grzegorz Czajkowski

Comment 13 2012-08-23 00:42:45 PDT

Created attachment 160104 [details] passing cstring to enchant Thanks Carlos for suggestion - applied.

Carlos Garcia Campos

Comment 14 2012-08-24 06:05:35 PDT

Comment on attachment 160104 [details] passing cstring to enchant Looks good, thanks!

Grzegorz Czajkowski

Comment 15 2012-08-24 06:16:42 PDT

(In reply to comment #14) > (From update of attachment 160104 [details]) > Looks good, thanks! Thanks! This patch changes base code of spelling for WebKit-Gtk. I will set cq+ if unit and layout tests pass.

Grzegorz Czajkowski

Comment 16 2012-08-24 06:38:02 PDT

(In reply to comment #15) > (In reply to comment #14) > > (From update of attachment 160104 [details] [details]) > > Looks good, thanks! > > Thanks! > This patch changes base code of spelling for WebKit-Gtk. I will set cq+ if unit and layout tests pass. Unit test result: TestWebKitWebContext which tests spelling passes. TEST: ./Tools/gtk/../Scripts/../../WebKitBuild/Release/Programs/WebKit2APITests/TestWebKitWebContext... (pid=12321) PASS: ./Tools/gtk/../Scripts/../../WebKitBuild/Release/Programs/WebKit2APITests/TestWebKitWebContext Layout test result: ./Tools/Scripts/run-webkit-tests --gtk editing/spelling Found 27 tests; running 25, skipping 2. Running 1 DumpRenderTree over 1 shard. [1/25] editing/spelling/context-menu-suggestions.html failed [2/25] editing/spelling/design-mode-spellcheck-off.html passed [3/25] editing/spelling/grammar-edit-word.html failed [4/25] editing/spelling/grammar-markers.html passed [5/25] editing/spelling/grammar.html failed [6/25] editing/spelling/inline_spelling_markers.html passed [7/25] editing/spelling/spellcheck-api-crash.html passed [8/25] editing/spelling/spellcheck-async-mutation.html failed [9/25] editing/spelling/spellcheck-async-remove-frame.html passed [10/25] editing/spelling/spellcheck-async.html failed [11/25] editing/spelling/spellcheck-attribute.html passed [12/25] editing/spelling/spellcheck-input-search-crash.html passed [13/25] editing/spelling/spellcheck-paste-disabled.html passed [14/25] editing/spelling/spellcheck-paste.html passed [15/25] editing/spelling/spellcheck-queue.html failed [16/25] editing/spelling/spellcheck-sequencenum.html failed [17/25] editing/spelling/spelling-attribute-at-child.html passed [18/25] editing/spelling/spelling-attribute-change.html passed [19/25] editing/spelling/spelling-backspace-between-lines.html passed [20/25] editing/spelling/spelling-hasspellingmarker.html passed [21/25] editing/spelling/spelling-insert-html.html passed [22/25] editing/spelling/spelling-linebreak.html passed [23/25] editing/spelling/spelling-marker-description.html failed [24/25] editing/spelling/spelling-unified-emulation.html failed [25/25] editing/spelling/spelling.html passed All 25 tests ran as expected.

Grzegorz Czajkowski

Comment 17 2012-08-24 06:43:37 PDT

Comment on attachment 160104 [details] passing cstring to enchant No regression, cq+.

WebKit Review Bot

Comment 18 2012-08-24 07:01:38 PDT

Comment on attachment 160104 [details] passing cstring to enchant Clearing flags on attachment: 160104 Committed r126583: <http://trac.webkit.org/changeset/126583>

WebKit Review Bot

Comment 19 2012-08-24 07:01:43 PDT

All reviewed patches have been landed. Closing bug.

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware All

OS Linux

Product WebKit

Component WebCore Misc.

Assignee

Grzegorz Czajkowski

Reported

2012-08-17 02:29 PDT

Modified

2012-08-24 07:01 PDT History

CC List

5 users Show

URL

Keywords

Depends on

Blocks

91854

Dependencies

tree graph