WebKit-Gtk ensures spellchecking with the Enchant library support, mostly implemented at https://bugs.webkit.org/show_bug.cgi?id=90269. This code is used in both WebKit-Gtk. The WebKit-Efl's implemention of spellchecking is based on Enchant too. It doesn't have sense to duplicate implementation from text/gtk/TextCheckerEnchant* files for WebKit's Efl. To share WebKit-Gtk's implemnetation with others WebKit, this patch proposes: 1. Move TextCheckerEnchant{h.cpp} files form WebCore/platform/text/gtk/ to WebCore/platform/text/enchant/. 2. Limit Glib's calls as possible for example, GOwnPtr -> WTF::OwnPtr etc.
Created attachment 159689 [details] proposed patch
Comment on attachment 159689 [details] proposed patch Since the GTK+ port already uses ICU (at least by default) maybe we could use the same code based on ICU and avoid the #ifdefs? And for the language maybe we could use WebCore::defaultLanguage() or any other method in Language.h. What you guys think?
I think #ifdefs should be avoided here if possible and it seems like it's very possible.
Comment on attachment 159689 [details] proposed patch View in context: https://bugs.webkit.org/attachment.cgi?id=159689&action=review I agree with Martin and Carlos's comments. Additionally, you're missing the fix for bug 94202 here, thus re-introducing the issue with this patch. See my comment below... > Source/WebCore/platform/text/enchant/TextCheckerEnchant.cpp:137 > + if (enchant_dict_check(*dictIter, word.utf8().data(), wordLength)) { You are probably missing the fix for bug 94202, committed last week, which is about not calling this function passing wordLength because it will fail for strings with special UTF8 characters, such as "דעפ", for instance. You should pass 'bytes' instead which is what enchant_dict_check actually expects. See bug 94202 for more details. > Source/WebCore/platform/text/enchant/TextCheckerEnchant.cpp:203 > + // No dictionaries selected, we get first from the list. "we get first from the list" -> "we get the first one from the list"
Created attachment 159895 [details] updated patch
(In reply to comment #2) > (From update of attachment 159689 [details]) > Since the GTK+ port already uses ICU (at least by default) maybe we could use the same code based on ICU and avoid the #ifdefs? Ok, I used ICU instead of Pango at the patch so any Platform specif code doesn't exist. > And for the language maybe we could use WebCore::defaultLanguage() or any other method in Language.h. What you guys think? Good idea. For WebKit-Gtk the method returns "en-US". There is no problem for the Enchant backed, its 'normalize' function is called to properly find the desired dictionary.
Comment on attachment 159689 [details] proposed patch View in context: https://bugs.webkit.org/attachment.cgi?id=159689&action=review >> Source/WebCore/platform/text/enchant/TextCheckerEnchant.cpp:137 >> + if (enchant_dict_check(*dictIter, word.utf8().data(), wordLength)) { > > You are probably missing the fix for bug 94202, committed last week, which is about not calling this function passing wordLength because it will fail for strings with special UTF8 characters, such as "דעפ", for instance. You should pass 'bytes' instead which is what enchant_dict_check actually expects. > > See bug 94202 for more details. You're right. This bug shows up that the String::fromUTF8() also requires the number of bytes instead of number of characters. To properly compute the number of bytes from utf8 string I had to use glib's API - g_utf8_offset_to_pointer() as it was done before. Unfortunately it makes TextCheckerEnchant implementation glib independent. Of course for Gtk, Efl it is no problem - we already use it. >> Source/WebCore/platform/text/enchant/TextCheckerEnchant.cpp:203 >> + // No dictionaries selected, we get first from the list. > > "we get first from the list" -> "we get the first one from the list" Fixed thanks.
Comment on attachment 159895 [details] updated patch View in context: https://bugs.webkit.org/attachment.cgi?id=159895&action=review Thanks for the update. However, I'm afraid you will need to provide yet another one mainly because a new patch has recently (today!) been committed that affects the base your patch :(. See bug 94683 for more details. Also, I commented a couple of things on this patch too. Hope you find them interesting / useful... > Source/WebCore/platform/text/enchant/TextCheckerEnchant.cpp:27 > +#include <glib.h> // For utf8 support. I don't think this comment is actually needed (also, it's not very common to have inline comments, even less for includes, AFAIK) > Source/WebCore/platform/text/enchant/TextCheckerEnchant.cpp:101 > + const char* cstart = cString + start; Shouldn't you use g_utf8_offset_to_pointer() here too? > Source/WebCore/platform/text/enchant/TextCheckerEnchant.cpp:106 > + if (enchant_dict_check(*dictIter, word.utf8().data(), numberOfBytes)) { I'm not completely sure, but I think it's not safe to call enchant_dict_check passing directly word.utf8().data() as parameter, since the CString returned by .utf8() will be disposed early and you'll have a gchar* (as returned by .data()) pointing to invalid memory. Thus, it's probably safer to do something like this in line 103: CString word = String::fromUTF8(cstart, numberOfBytes).utf8(); ... and then just call to word.data() from that point on where you need it (lines 106 and 130 only, I think) > Source/WebCore/platform/text/enchant/TextCheckerEnchant.cpp:130 > + char** suggestions = enchant_dict_suggest(*iter, word.utf8().data(), -1, &numberOfSuggestions); This line would also benefit of the CString declaration I mentioned before.
Comment on attachment 159895 [details] updated patch View in context: https://bugs.webkit.org/attachment.cgi?id=159895&action=review >> Source/WebCore/platform/text/enchant/TextCheckerEnchant.cpp:106 >> + if (enchant_dict_check(*dictIter, word.utf8().data(), numberOfBytes)) { > > I'm not completely sure, but I think it's not safe to call enchant_dict_check passing directly word.utf8().data() as parameter, since the CString returned by .utf8() will be disposed early and you'll have a gchar* (as returned by .data()) pointing to invalid memory. > > Thus, it's probably safer to do something like this in line 103: > > CString word = String::fromUTF8(cstart, numberOfBytes).utf8(); > > ... and then just call to word.data() from that point on where you need it (lines 106 and 130 only, I think) It's safe to use .data(), but not to store it and use it on a later line.
(In reply to comment #8) > (From update of attachment 159895 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=159895&action=review > > Thanks for the update. However, I'm afraid you will need to provide yet another one mainly because a new patch has recently (today!) been committed that affects the base your patch :(. See bug 94683 for more details. Thanks for the information. This change is good for us too :) Our internal implementation also bases on vector to pass/get languages. Frankly speaking I wanted to change in the another patch:) > > Source/WebCore/platform/text/enchant/TextCheckerEnchant.cpp:27 > > +#include <glib.h> // For utf8 support. > > I don't think this comment is actually needed (also, it's not very common to have inline comments, even less for includes, AFAIK) You're right. I removed this useless comment. > > Source/WebCore/platform/text/enchant/TextCheckerEnchant.cpp:101 > > + const char* cstart = cString + start; > > Shouldn't you use g_utf8_offset_to_pointer() here too? Ok, I wanted to limit GLib's invocation as possible. The beginning of the word can be determined based on raw pointers - it doesn't matter whether string contains UTF8 data or not. I couldn't find alternative way how to determine the number of bytes of the string that is passes in UTF8 (only GLib API works) we can restore origin code. We have to use GLib anyway :) > > Source/WebCore/platform/text/enchant/TextCheckerEnchant.cpp:106 > > + if (enchant_dict_check(*dictIter, word.utf8().data(), numberOfBytes)) { > > I'm not completely sure, but I think it's not safe to call enchant_dict_check passing directly word.utf8().data() as parameter, since the CString returned by .utf8() will be disposed early and you'll have a gchar* (as returned by .data()) pointing to invalid memory. > > Thus, it's probably safer to do something like this in line 103: > > CString word = String::fromUTF8(cstart, numberOfBytes).utf8(); > > ... and then just call to word.data() from that point on where you need it (lines 106 and 130 only, I think) I understand your idea. I tested this patch and I didn't notice any issues with that. As Martin said using word().utf8().data() is safety. Anyway 'ignoreWord', 'learnWord', 'getGuessesForWord' also use word.utf8().data() directly. > > > Source/WebCore/platform/text/enchant/TextCheckerEnchant.cpp:130 > > + char** suggestions = enchant_dict_suggest(*iter, word.utf8().data(), -1, &numberOfSuggestions); > > This line would also benefit of the CString declaration I mentioned before. Here only gchar -> char is being changed. So I leave word.utf8().data() as it is.
Comment on attachment 159895 [details] updated patch View in context: https://bugs.webkit.org/attachment.cgi?id=159895&action=review > Source/WebCore/platform/text/enchant/TextCheckerEnchant.cpp:103 > + String word = String::fromUTF8(cstart, numberOfBytes); Why converting it to String here? the string is then converted to utf8 before it's passed to enchant, so I think it would be better to use a CString in this case and pass word.data() to enchant, or even use cstart directly since the number of bytes is passed to enchant too.
Created attachment 160101 [details] patch v3
Created attachment 160104 [details] passing cstring to enchant Thanks Carlos for suggestion - applied.
Comment on attachment 160104 [details] passing cstring to enchant Looks good, thanks!
(In reply to comment #14) > (From update of attachment 160104 [details]) > Looks good, thanks! Thanks! This patch changes base code of spelling for WebKit-Gtk. I will set cq+ if unit and layout tests pass.
(In reply to comment #15) > (In reply to comment #14) > > (From update of attachment 160104 [details] [details]) > > Looks good, thanks! > > Thanks! > This patch changes base code of spelling for WebKit-Gtk. I will set cq+ if unit and layout tests pass. Unit test result: TestWebKitWebContext which tests spelling passes. TEST: ./Tools/gtk/../Scripts/../../WebKitBuild/Release/Programs/WebKit2APITests/TestWebKitWebContext... (pid=12321) PASS: ./Tools/gtk/../Scripts/../../WebKitBuild/Release/Programs/WebKit2APITests/TestWebKitWebContext Layout test result: ./Tools/Scripts/run-webkit-tests --gtk editing/spelling Found 27 tests; running 25, skipping 2. Running 1 DumpRenderTree over 1 shard. [1/25] editing/spelling/context-menu-suggestions.html failed [2/25] editing/spelling/design-mode-spellcheck-off.html passed [3/25] editing/spelling/grammar-edit-word.html failed [4/25] editing/spelling/grammar-markers.html passed [5/25] editing/spelling/grammar.html failed [6/25] editing/spelling/inline_spelling_markers.html passed [7/25] editing/spelling/spellcheck-api-crash.html passed [8/25] editing/spelling/spellcheck-async-mutation.html failed [9/25] editing/spelling/spellcheck-async-remove-frame.html passed [10/25] editing/spelling/spellcheck-async.html failed [11/25] editing/spelling/spellcheck-attribute.html passed [12/25] editing/spelling/spellcheck-input-search-crash.html passed [13/25] editing/spelling/spellcheck-paste-disabled.html passed [14/25] editing/spelling/spellcheck-paste.html passed [15/25] editing/spelling/spellcheck-queue.html failed [16/25] editing/spelling/spellcheck-sequencenum.html failed [17/25] editing/spelling/spelling-attribute-at-child.html passed [18/25] editing/spelling/spelling-attribute-change.html passed [19/25] editing/spelling/spelling-backspace-between-lines.html passed [20/25] editing/spelling/spelling-hasspellingmarker.html passed [21/25] editing/spelling/spelling-insert-html.html passed [22/25] editing/spelling/spelling-linebreak.html passed [23/25] editing/spelling/spelling-marker-description.html failed [24/25] editing/spelling/spelling-unified-emulation.html failed [25/25] editing/spelling/spelling.html passed All 25 tests ran as expected.
Comment on attachment 160104 [details] passing cstring to enchant No regression, cq+.
Comment on attachment 160104 [details] passing cstring to enchant Clearing flags on attachment: 160104 Committed r126583: <http://trac.webkit.org/changeset/126583>
All reviewed patches have been landed. Closing bug.