Bug 149056 - Spellchecker rejects word when adding a period or colon character if there is no trailing space before the next word
Summary: Spellchecker rejects word when adding a period or colon character if there is...
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKitGTK (show other bugs)
Version: Other
Hardware: PC Linux
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks: 153291
  Show dependency treegraph
 
Reported: 2015-09-10 20:59 PDT by Michael Catanzaro
Modified: 2016-03-10 06:38 PST (History)
10 users (show)

See Also:


Attachments
Check every parts of words containing periods or colons. (4.41 KB, patch)
2016-02-12 00:57 PST, Adrien Plazas
mrobinson: review-
Details | Formatted Diff | Diff
Screenshot of spellchecking before and after the patch (28.53 KB, image/png)
2016-02-12 01:02 PST, Adrien Plazas
no flags Details
Expand newAdjacentwords selection from its boundaries (3.28 KB, patch)
2016-03-03 04:52 PST, Adrien Plazas
mcatanzaro: review-
buildbot: commit-queue-
Details | Formatted Diff | Diff
Archive of layout-test-results from ews115 for mac-yosemite (553.94 KB, application/zip)
2016-03-03 05:35 PST, Build Bot
no flags Details
Expand newAdjacentwords selection from its boundaries (3.60 KB, patch)
2016-03-03 07:45 PST, Adrien Plazas
darin: review-
darin: commit-queue-
Details | Formatted Diff | Diff
Expand newAdjacentwords selection from its boundaries (8.58 KB, patch)
2016-03-08 02:54 PST, Adrien Plazas
darin: review-
buildbot: commit-queue-
Details | Formatted Diff | Diff
Archive of layout-test-results from ews100 for mac-yosemite (762.00 KB, application/zip)
2016-03-08 03:41 PST, Build Bot
no flags Details
Archive of layout-test-results from ews107 for mac-yosemite-wk2 (809.81 KB, application/zip)
2016-03-08 03:43 PST, Build Bot
no flags Details
Archive of layout-test-results from ews112 for mac-yosemite (830.45 KB, application/zip)
2016-03-08 03:56 PST, Build Bot
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Michael Catanzaro 2015-09-10 20:59:31 PDT
Say you have a text entry containing some text:

This is my text!

Then say you want to revise that sentence by adding more text before it. I will use the ^ to indicate the character that the cursor is positioned immediately IN FRONT OF. You position the cursor before your existing text:

This is my text!
^

then you start typing:

This is some more textThis is my text!
                      ^

So far so good: nothing appears as misspelled. Then you type the period:

This is some more text.This is my text!
                       ^

And now the word "text" is improperly highlighted as misspelled, and stays that way after you add the space. It's quite interesting that this problem does not occur for other punctuation like exclamation points or commas, only for periods.
Comment 1 Joanmarie Diggs 2016-01-20 15:48:07 PST
This issue has some interesting implications for screen reader users who navigate into the text. Orca uses the accessible text attributes to identify the presence of the red squiggly and speak "misspelled" when that attribute is present.

(Note: At the moment there are some other, accessibility-related bugs which are interfering with the aforementioned functionality. But when those are addressed, it would be nice if arrowing into the error didn't make the error indication vanish. So that Orca could make this presentation.)
Comment 2 Joanmarie Diggs 2016-01-20 15:55:47 PST
(In reply to comment #1)

> [Text belonging in bug 153290]

Sorry for the noise!
Comment 3 Adrien Plazas 2016-02-11 00:42:22 PST
It does the same with the column character too.

I couldn't find any other punctuation characters than the period and the column doing so, despite trying some "exotic" ones like «»¿¡…§ amongs many others.
Comment 4 Adrien Plazas 2016-02-11 06:17:02 PST
Not splitting on colons makes sense as it is used in some languages like Swedish and Finnish: https://en.wikipedia.org/wiki/Colon_(punctuation)#Word-medial_separator
Comment 5 Adrien Plazas 2016-02-11 06:20:19 PST
Not splitting on periods makes sense as they are used in American English for acronyms: https://en.wikipedia.org/wiki/Full_stop#Acronyms_and_initialisms

Hence I suppose it's not a bug?
Comment 6 Adrien Plazas 2016-02-12 00:57:51 PST
Created attachment 271148 [details]
Check every parts of words containing periods or colons.
Comment 7 WebKit Commit Bot 2016-02-12 01:00:10 PST
Attachment 271148 [details] did not pass style-queue:


ERROR: Source/WebCore/ChangeLog:8:  You should remove the 'No new tests' and either add and list tests, or explain why no new tests were possible.  [changelog/nonewtests] [5]
Total errors found: 1 in 3 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 8 Adrien Plazas 2016-02-12 01:02:40 PST
Created attachment 271149 [details]
Screenshot of spellchecking before and after the patch
Comment 9 Martin Robinson 2016-02-12 13:53:49 PST
Comment on attachment 271148 [details]
Check every parts of words containing periods or colons.

View in context: https://bugs.webkit.org/attachment.cgi?id=271148&action=review

> Source/WebCore/ChangeLog:8
> +        No new tests (OOPS!).

I think this change deserves a test.

> Source/WebCore/ChangeLog:17
> +        - or Swedish and Finnish words containing a colon.

I think we need to understand why this is a problem for Enchant and not for the Mac port. If it is a problem for the Mac port, we likely need to do this fix at a higher level. We should compare the behavior of the GTK+ port to Safari.

> Source/WebCore/platform/text/enchant/TextCheckerEnchant.cpp:-102
>      int start = textBreakFirst(iter);
>      for (int end = textBreakNext(iter); end != TextBreakDone; end = textBreakNext(iter)) {
>          if (isWordTextBreak(iter)) {
> -            checkSpellingOfWord(utf8String, start, end, misspellingLocation, misspellingLength);

Why isn't wordBreakIterator breaking on the punctuation already? I think that understanding the context of the problem is important to reviewing the patch.
Comment 10 Adrien Plazas 2016-02-17 00:59:01 PST
(In reply to comment #9)
> Why isn't wordBreakIterator breaking on the punctuation already? I think
> that understanding the context of the problem is important to reviewing the
> patch.

wordBreakIterator seems to be pretty smart actually: not breaking on every punctuation mark seems to make sense in some languages as the punctuation marks can be considered part of the word itself (kind of as a word construction mechanism). Breaking such words would make them either not words or not the same words, hence not breaking the words seems to be the right thing to do.

That being said, this creates bugs when checking the spelling of sentences with a poor syntax (like ones with no space after a period or a colon), but this is a spell checking bug, not a word breaking one.
Comment 11 Michael Catanzaro 2016-02-17 11:19:09 PST
(In reply to comment #10)
> wordBreakIterator seems to be pretty smart actually: not breaking on every
> punctuation mark seems to make sense in some languages as the punctuation
> marks can be considered part of the word itself (kind of as a word
> construction mechanism).

It doesn't seem smart to me, if it's not actually able to detect word breaks properly. A punctuation mark might occur inside a word in some languages, but if it's followed by a space, then surely it is always a word break character? I haven't looked at this closely, but I suspect we are somehow misusing the ICU API, as I'd like to think it's smart enough to handle this. At least, detecting word breaks properly is a standard feature of GtkTextIter, so I would expect ICU to be able to do it as well. If we can't figure it out, might need to look at what GtkTextIter is doing.

> That being said, this creates bugs when checking the spelling of sentences
> with a poor syntax (like ones with no space after a period or a colon), but
> this is a spell checking bug, not a word breaking one.

In that case, I would expect the spellchecker to flag the word foo.bar as misspelled.
Comment 12 Michael Catanzaro 2016-02-17 11:24:30 PST
Here is the ICU documentation: http://userguide.icu-project.org/boundaryanalysis

Note that each of the four period characters is clearly shown to be a word boundary character in the example "Your balance is $1234.56... I think."

It says it follows this standard for determining word boundaries: http://www.unicode.org/reports/tr29/#Word_Boundaries

That standard includes a grammar used to determine whether the period (U+002 FULL STOP) as well as other punctuation characters constitutes a word boundary or not, depending on the context of its use. (E.g. in 1234.56 it's not a word boundary.) We of course don't want to implement that in WebKit; I suspect ICU has already done so.
Comment 13 Adrien Plazas 2016-02-24 04:42:55 PST
(In reply to comment #12)
> Here is the ICU documentation:
> http://userguide.icu-project.org/boundaryanalysis
> 
> Note that each of the four period characters is clearly shown to be a word
> boundary character in the example "Your balance is $1234.56... I think."

Only the three trailing ones are but not the one in the middle of the "word".

Also in http://www.unicode.org/reports/tr29/#Word_Boundaries:
"The goal of matching user perceptions cannot always be met exactly because the text alone does not always contain enough information to unambiguously decide boundaries. For example, the period (U+002E FULL STOP) is used ambiguously, sometimes for end-of-sentence purposes, sometimes for abbreviations, and sometimes for numbers. In most cases, however, programmatic text boundaries can match user perceptions quite closely, although sometimes the best that can be done is not to surprise the user."

They can't always decide when to break because of ambiguity and try to be conservative which is the right thing to do from their POV, letting thir users to choose what to do when there is ambiguity.

I don't see how to fix the spell checking without splitting the words into smaller pieces and maybe they have stronger rules to split than "words", but otherwise I don't think it's far fetched to split the words again ourselves.
Comment 14 Michael Catanzaro 2016-02-25 20:04:24 PST
I think the problem is in WebCore, it shouldn't be requesting we check the spelling of the word ("text.This" from comment #0) while the cursor is still in it.
Comment 15 Adrien Plazas 2016-03-01 07:43:43 PST
After investigation, the problem seems to be in Editor::editorUIUpdateTimerFired.

Editor::editorUIUpdateTimerFired checks the spelling and grammar and the results of the checking are correct, but later in the method in "if (!textChecker() || textChecker()->shouldEraseMarkersAfterChangeSelection(TextCheckingTypeSpelling))" parts of the markers are erased.

I'll keep investigating why this happens.
Comment 16 Adrien Plazas 2016-03-03 04:52:41 PST
Created attachment 272752 [details]
Expand newAdjacentwords selection from its boundaries
Comment 17 Build Bot 2016-03-03 05:35:20 PST
Comment on attachment 272752 [details]
Expand newAdjacentwords selection from its boundaries

Attachment 272752 [details] did not pass mac-debug-ews (mac):
Output: http://webkit-queues.webkit.org/results/915558

Number of test failures exceeded the failure limit.
Comment 18 Build Bot 2016-03-03 05:35:23 PST
Created attachment 272754 [details]
Archive of layout-test-results from ews115 for mac-yosemite

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews115  Port: mac-yosemite  Platform: Mac OS X 10.10.5
Comment 19 Michael Catanzaro 2016-03-03 06:05:21 PST
Comment on attachment 272752 [details]
Expand newAdjacentwords selection from its boundaries

View in context: https://bugs.webkit.org/attachment.cgi?id=272752&action=review

Looks like it breaks 26230 layout tests on mac-debug. You can dig around in the archive posted by the bot to see what's wrong:

ASSERTION FAILED: anchorType() == PositionIsOffsetInAnchor
/Volumes/Data/EWS/WebKit/Source/WebCore/dom/Position.h(109) : int WebCore::Position::offsetInContainerNode() const
1   0x10d21d160 WTFCrash
2   0x111c195dc WebCore::Position::offsetInContainerNode() const
3   0x1123cf745 WebCore::Editor::editorUIUpdateTimerFired()
4   0x1123ced9d WebCore::Editor::updateEditorUINowIfScheduled()
5   0x1123ce7cf WebCore::Editor::appliedEditing(WTF::PassRefPtr<WebCore::CompositeEditCommand>)
6   0x111ef9b48 WebCore::CompositeEditCommand::apply()
7   0x111ef9a11 WebCore::applyCommand(WTF::PassRefPtr<WebCore::CompositeEditCommand>)
8   0x1123cb922 WebCore::Editor::deleteSelectionWithSmartDelete(bool, WebCore::EditAction)
9   0x1123d0fe7 WebCore::Editor::performCutOrCopy(WebCore::Editor::EditorActionSpecifier)
10  0x1123d0991 WebCore::Editor::cut()
11  0x1123e45e1 WebCore::executeCut(WebCore::Frame&, WebCore::Event*, WebCore::EditorCommandSource, WTF::String const&)
12  0x1123e3947 WebCore::Editor::Command::execute(WTF::String const&, WebCore::Event*) const
13  0x11223be59 WebCore::Document::execCommand(WTF::String const&, bool, WTF::String const&)
14  0x112c60c43 WebCore::jsDocumentPrototypeFunctionExecCommand(JSC::ExecState*)
15  0x58b086001028
16  0x10ce5829b llint_entry
17  0x10ce5829b llint_entry
18  0x10ce517ae vmEntryToJavaScript
19  0x10cc724da JSC::JITCode::execute(JSC::VM*, JSC::ProtoCallFrame*)
20  0x10cc0c1f1 JSC::Interpreter::executeCall(JSC::ExecState*, JSC::JSObject*, JSC::CallType, JSC::CallData const&, JSC::JSValue, JSC::ArgList const&)
21  0x10c52c9fe JSC::call(JSC::ExecState*, JSC::JSValue, JSC::CallType, JSC::CallData const&, JSC::JSValue, JSC::ArgList const&)
22  0x10c52ca63 JSC::call(JSC::ExecState*, JSC::JSValue, JSC::CallType, JSC::CallData const&, JSC::JSValue, JSC::ArgList const&, WTF::NakedPtr<JSC::Exception>&)
23  0x10c52cc5b JSC::profiledCall(JSC::ExecState*, JSC::ProfilingReason, JSC::JSValue, JSC::CallType, JSC::CallData const&, JSC::JSValue, JSC::ArgList const&, WTF::NakedPtr<JSC::Exception>&)
24  0x112bb78db WebCore::JSMainThreadExecState::profiledCall(JSC::ExecState*, JSC::ProfilingReason, JSC::JSValue, JSC::CallType, JSC::CallData const&, JSC::JSValue, JSC::ArgList const&, WTF::NakedPtr<JSC::Exception>&)
25  0x112d6e81e WebCore::JSEventListener::handleEvent(WebCore::ScriptExecutionContext*, WebCore::Event*)
26  0x112455467 WebCore::EventTarget::fireEventListeners(WebCore::Event&, WebCore::EventTargetData*, WTF::Vector<WebCore::RegisteredEventListener, 1ul, WTF::CrashOnOverflow, 16ul>&)
27  0x112454c98 WebCore::EventTarget::fireEventListeners(WebCore::Event&)
28  0x11237ba60 WebCore::DOMWindow::dispatchEvent(WebCore::Event&, WebCore::EventTarget*)
29  0x1123834e2 WebCore::DOMWindow::dispatchLoadEvent()
30  0x112231a5d WebCore::Document::dispatchWindowLoadEvent()
31  0x11222de3e WebCore::Document::implicitClose()

Are you using a debug build? (You should be. ;)

> Source/WebCore/editing/Editor.cpp:3411
> +            int newAdjacentWordsLength = newAdjacentWords.end().offsetInContainerNode() - newAdjacentWords.start().offsetInContainerNode();

Nit: it's better to use auto in new code, so you don't have to look up the exact type, and in case the return type changes to something other than int in the future
Comment 20 Adrien Plazas 2016-03-03 07:45:03 PST
Created attachment 272757 [details]
Expand newAdjacentwords selection from its boundaries
Comment 21 Darin Adler 2016-03-03 08:53:20 PST
Comment on attachment 272757 [details]
Expand newAdjacentwords selection from its boundaries

View in context: https://bugs.webkit.org/attachment.cgi?id=272757&action=review

A bug fix like this needs a regression test to help make clear exactly what it changes and fixes. Any reason we don’t have one here?

> Source/WebCore/ChangeLog:3
> +        [GTK] Spellchecker rejects word when adding a period or colon character if there is no trailing space before the next word

This is not a GTK-specific patch and should not have the GTK prefix on it.

> Source/WebCore/editing/Editor.cpp:3407
> +            // being conservative allow to fix that and doesn't seem to disturb any other behavior.

"allow to fix that" is not the right phrase here

"fixes that" would be correct grammar

> Source/WebCore/editing/VisibleSelection.h:88
> +    int getLength() const { return end().computeOffsetInContainerNode() - start().computeOffsetInContainerNode(); }

This naming is not consistent with WebKit coding style. We don’t use get in the name of functions like this.

This code is wrong. It will give an incorrect value any time the start and end positions are not in the same container node, or when they are in the same container node, but it is not a text node.
Comment 22 Darin Adler 2016-03-03 08:53:54 PST
Thanks for tackling this. Still needs work but possibly on the right track.
Comment 23 Darin Adler 2016-03-03 08:55:41 PST
Comment on attachment 272757 [details]
Expand newAdjacentwords selection from its boundaries

View in context: https://bugs.webkit.org/attachment.cgi?id=272757&action=review

>> Source/WebCore/editing/VisibleSelection.h:88
>> +    int getLength() const { return end().computeOffsetInContainerNode() - start().computeOffsetInContainerNode(); }
> 
> This naming is not consistent with WebKit coding style. We don’t use get in the name of functions like this.
> 
> This code is wrong. It will give an incorrect value any time the start and end positions are not in the same container node, or when they are in the same container node, but it is not a text node.

One way to make cases that cover that would be spelling check in words when the text of the word crosses multiple elements, such as <span> elements.
Comment 24 Michael Catanzaro 2016-03-03 09:53:25 PST
Look in LayoutTests/editing/spelling for how to write spellchecker tests.
Comment 25 Michael Catanzaro 2016-03-03 10:02:49 PST
By the way, we weren't expecting this task to be nearly so complicated when we assigned it to you Adrian, you've gotten quite deep into WebCore already, when I had been expecting some simple thing to fix in TextCheckerEnchant. Learning how to write a layout test is the next big step!

If you don't understand anything about Darin's comments, please ask him for clarification; he understands this code much better than I do.
Comment 26 Adrien Plazas 2016-03-08 02:54:16 PST
Created attachment 273284 [details]
Expand newAdjacentwords selection from its boundaries

The definition of VisibleSelection::length() haven't been fixed yet.

In this version of the patch, I fixed the comment and added a test that fails without this patch and passes with it.
Comment 27 Build Bot 2016-03-08 03:41:28 PST
Comment on attachment 273284 [details]
Expand newAdjacentwords selection from its boundaries

Attachment 273284 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.webkit.org/results/941138

New failing tests:
editing/spelling/spelling-words-with-punctuation.html
Comment 28 Build Bot 2016-03-08 03:41:32 PST
Created attachment 273286 [details]
Archive of layout-test-results from ews100 for mac-yosemite

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews100  Port: mac-yosemite  Platform: Mac OS X 10.10.5
Comment 29 Build Bot 2016-03-08 03:43:37 PST
Comment on attachment 273284 [details]
Expand newAdjacentwords selection from its boundaries

Attachment 273284 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.webkit.org/results/941140

New failing tests:
editing/spelling/spelling-words-with-punctuation.html
Comment 30 Build Bot 2016-03-08 03:43:40 PST
Created attachment 273287 [details]
Archive of layout-test-results from ews107 for mac-yosemite-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews107  Port: mac-yosemite-wk2  Platform: Mac OS X 10.10.5
Comment 31 Build Bot 2016-03-08 03:56:14 PST
Comment on attachment 273284 [details]
Expand newAdjacentwords selection from its boundaries

Attachment 273284 [details] did not pass mac-debug-ews (mac):
Output: http://webkit-queues.webkit.org/results/941146

New failing tests:
editing/spelling/spelling-words-with-punctuation.html
Comment 32 Build Bot 2016-03-08 03:56:18 PST
Created attachment 273288 [details]
Archive of layout-test-results from ews112 for mac-yosemite

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews112  Port: mac-yosemite  Platform: Mac OS X 10.10.5
Comment 33 Adrien Plazas 2016-03-09 02:43:38 PST
All tests using initSpellTest() from editing/spelling/resources/utils.js fail on Mac.

I suspect that either one these two lines form initSpellTest() don't do its job as expect:
    destination.focus();
    document.execCommand("InsertText", false, testText);
or that the timed out test is actually executed before the text have been inserted:
    shouldBecomeDifferent('internals.markerCountForNode(destination.childNodes[0], "spelling")', '0', ...);

This test always fail on Mac and prevent the tests from starting (they time out early).

It doesn't seem that the problem is in the test added by the patch and I currently have no idea how to make these tests work on Mac.
Comment 34 Michael Catanzaro 2016-03-09 07:18:22 PST
(In reply to comment #33)
> It doesn't seem that the problem is in the test added by the patch and I
> currently have no idea how to make these tests work on Mac.

Adrien, these tests pass on the bots without your patch. :)
Comment 35 Adrien Plazas 2016-03-10 02:30:23 PST
Are you sure ? I rebased on master, reverted the changes to Source/WebCore/editing/Editor.cpp and kept LayoutTests/editing/spelling/spelling-words-with-punctuation.html.

Then I ran:
Tools/Scripts/run-webkit-tests `git grep initSpellTest LayoutTests/editing/spelling/ | cut -d: -f1 | uniq | grep "\.html"`

On Linux GTK+:
- editing/spelling/context-menu-suggestions-multiword-selection.html timed out
- LayoutTests/editing/spelling/spelling-words-with-punctuation.html failed (as expected, the changes in Editor.cpp will make it pass)
- the 6 other tests passed

On Mac, all the 8 tests timed out.
Comment 36 Michael Catanzaro 2016-03-10 06:38:29 PST
The Mac tests are all passing on the bots without your patch, else the EWS would not have turned red. ;)