WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
NEW
149056
Spellchecker rejects word when adding a period or colon character if there is no trailing space before the next word
https://bugs.webkit.org/show_bug.cgi?id=149056
Summary
Spellchecker rejects word when adding a period or colon character if there is...
Michael Catanzaro
Reported
2015-09-10 20:59:31 PDT
Say you have a text entry containing some text: This is my text! Then say you want to revise that sentence by adding more text before it. I will use the ^ to indicate the character that the cursor is positioned immediately IN FRONT OF. You position the cursor before your existing text: This is my text! ^ then you start typing: This is some more textThis is my text! ^ So far so good: nothing appears as misspelled. Then you type the period: This is some more text.This is my text! ^ And now the word "text" is improperly highlighted as misspelled, and stays that way after you add the space. It's quite interesting that this problem does not occur for other punctuation like exclamation points or commas, only for periods.
Attachments
Check every parts of words containing periods or colons.
(4.41 KB, patch)
2016-02-12 00:57 PST
,
Adrien Plazas
mrobinson
: review-
Details
Formatted Diff
Diff
Screenshot of spellchecking before and after the patch
(28.53 KB, image/png)
2016-02-12 01:02 PST
,
Adrien Plazas
no flags
Details
Expand newAdjacentwords selection from its boundaries
(3.28 KB, patch)
2016-03-03 04:52 PST
,
Adrien Plazas
mcatanzaro
: review-
buildbot
: commit-queue-
Details
Formatted Diff
Diff
Archive of layout-test-results from ews115 for mac-yosemite
(553.94 KB, application/zip)
2016-03-03 05:35 PST
,
Build Bot
no flags
Details
Expand newAdjacentwords selection from its boundaries
(3.60 KB, patch)
2016-03-03 07:45 PST
,
Adrien Plazas
darin
: review-
darin
: commit-queue-
Details
Formatted Diff
Diff
Expand newAdjacentwords selection from its boundaries
(8.58 KB, patch)
2016-03-08 02:54 PST
,
Adrien Plazas
darin
: review-
buildbot
: commit-queue-
Details
Formatted Diff
Diff
Archive of layout-test-results from ews100 for mac-yosemite
(762.00 KB, application/zip)
2016-03-08 03:41 PST
,
Build Bot
no flags
Details
Archive of layout-test-results from ews107 for mac-yosemite-wk2
(809.81 KB, application/zip)
2016-03-08 03:43 PST
,
Build Bot
no flags
Details
Archive of layout-test-results from ews112 for mac-yosemite
(830.45 KB, application/zip)
2016-03-08 03:56 PST
,
Build Bot
no flags
Details
Show Obsolete
(4)
View All
Add attachment
proposed patch, testcase, etc.
Joanmarie Diggs
Comment 1
2016-01-20 15:48:07 PST
This issue has some interesting implications for screen reader users who navigate into the text. Orca uses the accessible text attributes to identify the presence of the red squiggly and speak "misspelled" when that attribute is present. (Note: At the moment there are some other, accessibility-related bugs which are interfering with the aforementioned functionality. But when those are addressed, it would be nice if arrowing into the error didn't make the error indication vanish. So that Orca could make this presentation.)
Joanmarie Diggs
Comment 2
2016-01-20 15:55:47 PST
(In reply to
comment #1
)
> [Text belonging in
bug 153290
]
Sorry for the noise!
Adrien Plazas
Comment 3
2016-02-11 00:42:22 PST
It does the same with the column character too. I couldn't find any other punctuation characters than the period and the column doing so, despite trying some "exotic" ones like «»¿¡…§ amongs many others.
Adrien Plazas
Comment 4
2016-02-11 06:17:02 PST
Not splitting on colons makes sense as it is used in some languages like Swedish and Finnish:
https://en.wikipedia.org/wiki/Colon_(punctuation)#Word-medial_separator
Adrien Plazas
Comment 5
2016-02-11 06:20:19 PST
Not splitting on periods makes sense as they are used in American English for acronyms:
https://en.wikipedia.org/wiki/Full_stop#Acronyms_and_initialisms
Hence I suppose it's not a bug?
Adrien Plazas
Comment 6
2016-02-12 00:57:51 PST
Created
attachment 271148
[details]
Check every parts of words containing periods or colons.
WebKit Commit Bot
Comment 7
2016-02-12 01:00:10 PST
Attachment 271148
[details]
did not pass style-queue: ERROR: Source/WebCore/ChangeLog:8: You should remove the 'No new tests' and either add and list tests, or explain why no new tests were possible. [changelog/nonewtests] [5] Total errors found: 1 in 3 files If any of these errors are false positives, please file a bug against check-webkit-style.
Adrien Plazas
Comment 8
2016-02-12 01:02:40 PST
Created
attachment 271149
[details]
Screenshot of spellchecking before and after the patch
Martin Robinson
Comment 9
2016-02-12 13:53:49 PST
Comment on
attachment 271148
[details]
Check every parts of words containing periods or colons. View in context:
https://bugs.webkit.org/attachment.cgi?id=271148&action=review
> Source/WebCore/ChangeLog:8 > + No new tests (OOPS!).
I think this change deserves a test.
> Source/WebCore/ChangeLog:17 > + - or Swedish and Finnish words containing a colon.
I think we need to understand why this is a problem for Enchant and not for the Mac port. If it is a problem for the Mac port, we likely need to do this fix at a higher level. We should compare the behavior of the GTK+ port to Safari.
> Source/WebCore/platform/text/enchant/TextCheckerEnchant.cpp:-102 > int start = textBreakFirst(iter); > for (int end = textBreakNext(iter); end != TextBreakDone; end = textBreakNext(iter)) { > if (isWordTextBreak(iter)) { > - checkSpellingOfWord(utf8String, start, end, misspellingLocation, misspellingLength);
Why isn't wordBreakIterator breaking on the punctuation already? I think that understanding the context of the problem is important to reviewing the patch.
Adrien Plazas
Comment 10
2016-02-17 00:59:01 PST
(In reply to
comment #9
)
> Why isn't wordBreakIterator breaking on the punctuation already? I think > that understanding the context of the problem is important to reviewing the > patch.
wordBreakIterator seems to be pretty smart actually: not breaking on every punctuation mark seems to make sense in some languages as the punctuation marks can be considered part of the word itself (kind of as a word construction mechanism). Breaking such words would make them either not words or not the same words, hence not breaking the words seems to be the right thing to do. That being said, this creates bugs when checking the spelling of sentences with a poor syntax (like ones with no space after a period or a colon), but this is a spell checking bug, not a word breaking one.
Michael Catanzaro
Comment 11
2016-02-17 11:19:09 PST
(In reply to
comment #10
)
> wordBreakIterator seems to be pretty smart actually: not breaking on every > punctuation mark seems to make sense in some languages as the punctuation > marks can be considered part of the word itself (kind of as a word > construction mechanism).
It doesn't seem smart to me, if it's not actually able to detect word breaks properly. A punctuation mark might occur inside a word in some languages, but if it's followed by a space, then surely it is always a word break character? I haven't looked at this closely, but I suspect we are somehow misusing the ICU API, as I'd like to think it's smart enough to handle this. At least, detecting word breaks properly is a standard feature of GtkTextIter, so I would expect ICU to be able to do it as well. If we can't figure it out, might need to look at what GtkTextIter is doing.
> That being said, this creates bugs when checking the spelling of sentences > with a poor syntax (like ones with no space after a period or a colon), but > this is a spell checking bug, not a word breaking one.
In that case, I would expect the spellchecker to flag the word foo.bar as misspelled.
Michael Catanzaro
Comment 12
2016-02-17 11:24:30 PST
Here is the ICU documentation:
http://userguide.icu-project.org/boundaryanalysis
Note that each of the four period characters is clearly shown to be a word boundary character in the example "Your balance is $1234.56... I think." It says it follows this standard for determining word boundaries:
http://www.unicode.org/reports/tr29/#Word_Boundaries
That standard includes a grammar used to determine whether the period (U+002 FULL STOP) as well as other punctuation characters constitutes a word boundary or not, depending on the context of its use. (E.g. in 1234.56 it's not a word boundary.) We of course don't want to implement that in WebKit; I suspect ICU has already done so.
Adrien Plazas
Comment 13
2016-02-24 04:42:55 PST
(In reply to
comment #12
)
> Here is the ICU documentation: >
http://userguide.icu-project.org/boundaryanalysis
> > Note that each of the four period characters is clearly shown to be a word > boundary character in the example "Your balance is $1234.56... I think."
Only the three trailing ones are but not the one in the middle of the "word". Also in
http://www.unicode.org/reports/tr29/#Word_Boundaries
: "The goal of matching user perceptions cannot always be met exactly because the text alone does not always contain enough information to unambiguously decide boundaries. For example, the period (U+002E FULL STOP) is used ambiguously, sometimes for end-of-sentence purposes, sometimes for abbreviations, and sometimes for numbers. In most cases, however, programmatic text boundaries can match user perceptions quite closely, although sometimes the best that can be done is not to surprise the user." They can't always decide when to break because of ambiguity and try to be conservative which is the right thing to do from their POV, letting thir users to choose what to do when there is ambiguity. I don't see how to fix the spell checking without splitting the words into smaller pieces and maybe they have stronger rules to split than "words", but otherwise I don't think it's far fetched to split the words again ourselves.
Michael Catanzaro
Comment 14
2016-02-25 20:04:24 PST
I think the problem is in WebCore, it shouldn't be requesting we check the spelling of the word ("text.This" from
comment #0
) while the cursor is still in it.
Adrien Plazas
Comment 15
2016-03-01 07:43:43 PST
After investigation, the problem seems to be in Editor::editorUIUpdateTimerFired. Editor::editorUIUpdateTimerFired checks the spelling and grammar and the results of the checking are correct, but later in the method in "if (!textChecker() || textChecker()->shouldEraseMarkersAfterChangeSelection(TextCheckingTypeSpelling))" parts of the markers are erased. I'll keep investigating why this happens.
Adrien Plazas
Comment 16
2016-03-03 04:52:41 PST
Created
attachment 272752
[details]
Expand newAdjacentwords selection from its boundaries
Build Bot
Comment 17
2016-03-03 05:35:20 PST
Comment on
attachment 272752
[details]
Expand newAdjacentwords selection from its boundaries
Attachment 272752
[details]
did not pass mac-debug-ews (mac): Output:
http://webkit-queues.webkit.org/results/915558
Number of test failures exceeded the failure limit.
Build Bot
Comment 18
2016-03-03 05:35:23 PST
Created
attachment 272754
[details]
Archive of layout-test-results from ews115 for mac-yosemite The attached test failures were seen while running run-webkit-tests on the mac-debug-ews. Bot: ews115 Port: mac-yosemite Platform: Mac OS X 10.10.5
Michael Catanzaro
Comment 19
2016-03-03 06:05:21 PST
Comment on
attachment 272752
[details]
Expand newAdjacentwords selection from its boundaries View in context:
https://bugs.webkit.org/attachment.cgi?id=272752&action=review
Looks like it breaks 26230 layout tests on mac-debug. You can dig around in the archive posted by the bot to see what's wrong: ASSERTION FAILED: anchorType() == PositionIsOffsetInAnchor /Volumes/Data/EWS/WebKit/Source/WebCore/dom/Position.h(109) : int WebCore::Position::offsetInContainerNode() const 1 0x10d21d160 WTFCrash 2 0x111c195dc WebCore::Position::offsetInContainerNode() const 3 0x1123cf745 WebCore::Editor::editorUIUpdateTimerFired() 4 0x1123ced9d WebCore::Editor::updateEditorUINowIfScheduled() 5 0x1123ce7cf WebCore::Editor::appliedEditing(WTF::PassRefPtr<WebCore::CompositeEditCommand>) 6 0x111ef9b48 WebCore::CompositeEditCommand::apply() 7 0x111ef9a11 WebCore::applyCommand(WTF::PassRefPtr<WebCore::CompositeEditCommand>) 8 0x1123cb922 WebCore::Editor::deleteSelectionWithSmartDelete(bool, WebCore::EditAction) 9 0x1123d0fe7 WebCore::Editor::performCutOrCopy(WebCore::Editor::EditorActionSpecifier) 10 0x1123d0991 WebCore::Editor::cut() 11 0x1123e45e1 WebCore::executeCut(WebCore::Frame&, WebCore::Event*, WebCore::EditorCommandSource, WTF::String const&) 12 0x1123e3947 WebCore::Editor::Command::execute(WTF::String const&, WebCore::Event*) const 13 0x11223be59 WebCore::Document::execCommand(WTF::String const&, bool, WTF::String const&) 14 0x112c60c43 WebCore::jsDocumentPrototypeFunctionExecCommand(JSC::ExecState*) 15 0x58b086001028 16 0x10ce5829b llint_entry 17 0x10ce5829b llint_entry 18 0x10ce517ae vmEntryToJavaScript 19 0x10cc724da JSC::JITCode::execute(JSC::VM*, JSC::ProtoCallFrame*) 20 0x10cc0c1f1 JSC::Interpreter::executeCall(JSC::ExecState*, JSC::JSObject*, JSC::CallType, JSC::CallData const&, JSC::JSValue, JSC::ArgList const&) 21 0x10c52c9fe JSC::call(JSC::ExecState*, JSC::JSValue, JSC::CallType, JSC::CallData const&, JSC::JSValue, JSC::ArgList const&) 22 0x10c52ca63 JSC::call(JSC::ExecState*, JSC::JSValue, JSC::CallType, JSC::CallData const&, JSC::JSValue, JSC::ArgList const&, WTF::NakedPtr<JSC::Exception>&) 23 0x10c52cc5b JSC::profiledCall(JSC::ExecState*, JSC::ProfilingReason, JSC::JSValue, JSC::CallType, JSC::CallData const&, JSC::JSValue, JSC::ArgList const&, WTF::NakedPtr<JSC::Exception>&) 24 0x112bb78db WebCore::JSMainThreadExecState::profiledCall(JSC::ExecState*, JSC::ProfilingReason, JSC::JSValue, JSC::CallType, JSC::CallData const&, JSC::JSValue, JSC::ArgList const&, WTF::NakedPtr<JSC::Exception>&) 25 0x112d6e81e WebCore::JSEventListener::handleEvent(WebCore::ScriptExecutionContext*, WebCore::Event*) 26 0x112455467 WebCore::EventTarget::fireEventListeners(WebCore::Event&, WebCore::EventTargetData*, WTF::Vector<WebCore::RegisteredEventListener, 1ul, WTF::CrashOnOverflow, 16ul>&) 27 0x112454c98 WebCore::EventTarget::fireEventListeners(WebCore::Event&) 28 0x11237ba60 WebCore::DOMWindow::dispatchEvent(WebCore::Event&, WebCore::EventTarget*) 29 0x1123834e2 WebCore::DOMWindow::dispatchLoadEvent() 30 0x112231a5d WebCore::Document::dispatchWindowLoadEvent() 31 0x11222de3e WebCore::Document::implicitClose() Are you using a debug build? (You should be. ;)
> Source/WebCore/editing/Editor.cpp:3411 > + int newAdjacentWordsLength = newAdjacentWords.end().offsetInContainerNode() - newAdjacentWords.start().offsetInContainerNode();
Nit: it's better to use auto in new code, so you don't have to look up the exact type, and in case the return type changes to something other than int in the future
Adrien Plazas
Comment 20
2016-03-03 07:45:03 PST
Created
attachment 272757
[details]
Expand newAdjacentwords selection from its boundaries
Darin Adler
Comment 21
2016-03-03 08:53:20 PST
Comment on
attachment 272757
[details]
Expand newAdjacentwords selection from its boundaries View in context:
https://bugs.webkit.org/attachment.cgi?id=272757&action=review
A bug fix like this needs a regression test to help make clear exactly what it changes and fixes. Any reason we don’t have one here?
> Source/WebCore/ChangeLog:3 > + [GTK] Spellchecker rejects word when adding a period or colon character if there is no trailing space before the next word
This is not a GTK-specific patch and should not have the GTK prefix on it.
> Source/WebCore/editing/Editor.cpp:3407 > + // being conservative allow to fix that and doesn't seem to disturb any other behavior.
"allow to fix that" is not the right phrase here "fixes that" would be correct grammar
> Source/WebCore/editing/VisibleSelection.h:88 > + int getLength() const { return end().computeOffsetInContainerNode() - start().computeOffsetInContainerNode(); }
This naming is not consistent with WebKit coding style. We don’t use get in the name of functions like this. This code is wrong. It will give an incorrect value any time the start and end positions are not in the same container node, or when they are in the same container node, but it is not a text node.
Darin Adler
Comment 22
2016-03-03 08:53:54 PST
Thanks for tackling this. Still needs work but possibly on the right track.
Darin Adler
Comment 23
2016-03-03 08:55:41 PST
Comment on
attachment 272757
[details]
Expand newAdjacentwords selection from its boundaries View in context:
https://bugs.webkit.org/attachment.cgi?id=272757&action=review
>> Source/WebCore/editing/VisibleSelection.h:88 >> + int getLength() const { return end().computeOffsetInContainerNode() - start().computeOffsetInContainerNode(); } > > This naming is not consistent with WebKit coding style. We don’t use get in the name of functions like this. > > This code is wrong. It will give an incorrect value any time the start and end positions are not in the same container node, or when they are in the same container node, but it is not a text node.
One way to make cases that cover that would be spelling check in words when the text of the word crosses multiple elements, such as <span> elements.
Michael Catanzaro
Comment 24
2016-03-03 09:53:25 PST
Look in LayoutTests/editing/spelling for how to write spellchecker tests.
Michael Catanzaro
Comment 25
2016-03-03 10:02:49 PST
By the way, we weren't expecting this task to be nearly so complicated when we assigned it to you Adrian, you've gotten quite deep into WebCore already, when I had been expecting some simple thing to fix in TextCheckerEnchant. Learning how to write a layout test is the next big step! If you don't understand anything about Darin's comments, please ask him for clarification; he understands this code much better than I do.
Adrien Plazas
Comment 26
2016-03-08 02:54:16 PST
Created
attachment 273284
[details]
Expand newAdjacentwords selection from its boundaries The definition of VisibleSelection::length() haven't been fixed yet. In this version of the patch, I fixed the comment and added a test that fails without this patch and passes with it.
Build Bot
Comment 27
2016-03-08 03:41:28 PST
Comment on
attachment 273284
[details]
Expand newAdjacentwords selection from its boundaries
Attachment 273284
[details]
did not pass mac-ews (mac): Output:
http://webkit-queues.webkit.org/results/941138
New failing tests: editing/spelling/spelling-words-with-punctuation.html
Build Bot
Comment 28
2016-03-08 03:41:32 PST
Created
attachment 273286
[details]
Archive of layout-test-results from ews100 for mac-yosemite The attached test failures were seen while running run-webkit-tests on the mac-ews. Bot: ews100 Port: mac-yosemite Platform: Mac OS X 10.10.5
Build Bot
Comment 29
2016-03-08 03:43:37 PST
Comment on
attachment 273284
[details]
Expand newAdjacentwords selection from its boundaries
Attachment 273284
[details]
did not pass mac-wk2-ews (mac-wk2): Output:
http://webkit-queues.webkit.org/results/941140
New failing tests: editing/spelling/spelling-words-with-punctuation.html
Build Bot
Comment 30
2016-03-08 03:43:40 PST
Created
attachment 273287
[details]
Archive of layout-test-results from ews107 for mac-yosemite-wk2 The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews. Bot: ews107 Port: mac-yosemite-wk2 Platform: Mac OS X 10.10.5
Build Bot
Comment 31
2016-03-08 03:56:14 PST
Comment on
attachment 273284
[details]
Expand newAdjacentwords selection from its boundaries
Attachment 273284
[details]
did not pass mac-debug-ews (mac): Output:
http://webkit-queues.webkit.org/results/941146
New failing tests: editing/spelling/spelling-words-with-punctuation.html
Build Bot
Comment 32
2016-03-08 03:56:18 PST
Created
attachment 273288
[details]
Archive of layout-test-results from ews112 for mac-yosemite The attached test failures were seen while running run-webkit-tests on the mac-debug-ews. Bot: ews112 Port: mac-yosemite Platform: Mac OS X 10.10.5
Adrien Plazas
Comment 33
2016-03-09 02:43:38 PST
All tests using initSpellTest() from editing/spelling/resources/utils.js fail on Mac. I suspect that either one these two lines form initSpellTest() don't do its job as expect: destination.focus(); document.execCommand("InsertText", false, testText); or that the timed out test is actually executed before the text have been inserted: shouldBecomeDifferent('internals.markerCountForNode(destination.childNodes[0], "spelling")', '0', ...); This test always fail on Mac and prevent the tests from starting (they time out early). It doesn't seem that the problem is in the test added by the patch and I currently have no idea how to make these tests work on Mac.
Michael Catanzaro
Comment 34
2016-03-09 07:18:22 PST
(In reply to
comment #33
)
> It doesn't seem that the problem is in the test added by the patch and I > currently have no idea how to make these tests work on Mac.
Adrien, these tests pass on the bots without your patch. :)
Adrien Plazas
Comment 35
2016-03-10 02:30:23 PST
Are you sure ? I rebased on master, reverted the changes to Source/WebCore/editing/Editor.cpp and kept LayoutTests/editing/spelling/spelling-words-with-punctuation.html. Then I ran: Tools/Scripts/run-webkit-tests `git grep initSpellTest LayoutTests/editing/spelling/ | cut -d: -f1 | uniq | grep "\.html"` On Linux GTK+: - editing/spelling/context-menu-suggestions-multiword-selection.html timed out - LayoutTests/editing/spelling/spelling-words-with-punctuation.html failed (as expected, the changes in Editor.cpp will make it pass) - the 6 other tests passed On Mac, all the 8 tests timed out.
Michael Catanzaro
Comment 36
2016-03-10 06:38:29 PST
The Mac tests are all passing on the bots without your patch, else the EWS would not have turned red. ;)
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug