Bug 155908 - Spell checking should not use DOM types
Summary: Spell checking should not use DOM types
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: HTML Editing (show other bugs)
Version: WebKit Local Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-03-25 16:44 PDT by Beth Dakin
Modified: 2016-03-30 08:35 PDT (History)
3 users (show)

See Also:


Attachments
Patch (29.71 KB, patch)
2016-03-25 16:49 PDT, Beth Dakin
no flags Details | Formatted Diff | Diff
Patch (30.03 KB, patch)
2016-03-25 17:18 PDT, Beth Dakin
no flags Details | Formatted Diff | Diff
Patch (31.78 KB, patch)
2016-03-28 11:51 PDT, Beth Dakin
no flags Details | Formatted Diff | Diff
Patch (31.61 KB, patch)
2016-03-29 12:31 PDT, Beth Dakin
darin: review-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Beth Dakin 2016-03-25 16:44:47 PDT
Spell checking should not use DOM types

This is a follow-on from https://bugs.webkit.org/show_bug.cgi?id=155532

As Darin pointed out in some of the comments in that bug, the patch to resolve that issue started using DOM typed in spell checking code, but generally we have avoided doing that.
Comment 1 Beth Dakin 2016-03-25 16:49:51 PDT
Created attachment 274954 [details]
Patch
Comment 2 Beth Dakin 2016-03-25 17:18:15 PDT
Created attachment 274958 [details]
Patch
Comment 3 Beth Dakin 2016-03-28 11:51:13 PDT
Created attachment 275033 [details]
Patch

This iteration should fix the Windows build.
Comment 4 Beth Dakin 2016-03-29 12:31:50 PDT
Created attachment 275118 [details]
Patch
Comment 5 Darin Adler 2016-03-30 08:35:53 PDT
Comment on attachment 275118 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=275118&action=review

It’s excellent to do this work to eliminate the layering violation. I like the direction we are going here, but I think we need more refinement to make the code sufficiently clear.

I think that many of the more-obscure call sites aren’t doing this quite right. And even the ones that are right are unnecessarily fragile, computing the text and the offset separately and counting on the two matching without a clear enough guarantee.

To me, the insertion point offset seems to go hand in hand with the text to be checked; to make it likely we get this right I would want a struct that combines a StringView with an Optional<unsigned> that represents a paragraph with insertion point information. That struct could help us organize the code better because it would encourage us to have functions that compute both at the same time. And it would make it much clearer what the offset is relative to; it’s an offset within the StringView.

Given the idiom is a bit repetitive, we might consider a slightly different helper function. Every call site seems to be starting with the frame or a document or node and working its way to the insertion point. Part of that, though, is because we have moved the code so it’s not unified with the code extracting the text. If it was unified, then more likely we would have a paragraph range to pass in since we’d be using that same range to extract the text. So it’s unclear exactly which helper function we’d want until we do that work. I think we should refactor and figure out once that is done exactly which helper function is useful. I am guessing we probably won’t decide to put it in the TextIterator class itself, but not sure. It mostly depends on what happens as we move the extraction of this offset so that it’s unified with the extraction of the text.

> Source/WebCore/accessibility/AccessibilityObject.cpp:427
> +        checkTextOfParagraph(*textChecker, stringValue(), TextCheckingTypeSpelling, results, TextIterator::insertionPointOffsetInParagraph(frame->selection().selection()));

I don’t understand what guarantees that stringValue() is the text of an entire paragraph. It seems to me we want the insertion point offset relative to the start of stringValue(), not relative to the start of what the TextIterator thinks is a paragraph.

> Source/WebCore/accessibility/mac/WebAccessibilityObjectWrapperMac.mm:1161
> +        checkTextOfParagraph(*checker, text, TextCheckingTypeSpelling, results, TextIterator::insertionPointOffsetInParagraph(node->document().frame()->selection().selection()));

I don’t understand what guarantees that “text” is the text of an entire paragraph. It seems to me we want the insertion point offset relative to the start of the text in "text", not relative to the start of what the TextIterator thinks is a paragraph.

> Source/WebCore/editing/AlternativeTextController.cpp:356
> +        textChecker()->getGuessesForWord(m_alternativeTextInfo.originalText, paragraphText, TextIterator::insertionPointOffsetInParagraph(m_frame.selection().selection()), suggestions);

What guarantees that TextCheckingParagraph gets the exact same paragraph that TextIterator::insertionPointOffsetInParagraph gets? It seems like they both do similar work, but I don’t see how it’s guaranteed they will match.

> Source/WebCore/editing/Editor.cpp:2131
> +        textChecker()->getGuessesForWord(word, String(), TextIterator::insertionPointOffsetInParagraph(m_frame.selection().selection()), guesses);

How is it helpful to have the insertion point relative to the start of the paragraph in this case, since we are not passing in the text of the paragraph? The text checker is only getting a single word, so what is the reference point for the insertion point we are passing. I think that in this case we either don’t need to pass the insertion point information, or we need to restructure the code so we can get a correct value for it.

> Source/WebCore/editing/Editor.cpp:2405
> +    checkTextOfParagraph(*textChecker(), paragraphToCheck.text(), resolveTextCheckingTypeMask(textCheckingOptions), results, TextIterator::insertionPointOffsetInParagraph(m_frame.selection().selection()));

Same question as above about whether TextCheckingParagraph is guaranteed to be dealing with the same paragraph as TextIterator::insertionPointOffsetInParagraph.

I think the cure is to use the same range to extract the text and to compute the insertion point offset within the range. That probably means moving the code into the TextCheckingParagraph class in this particular case.

> Source/WebCore/editing/SpellChecker.cpp:185
> +    client()->requestCheckingOfString(m_processingRequest, TextIterator::insertionPointOffsetInParagraph(m_frame.selection().selection()));

Seems more natural that this would be captured at the time m_processingRequest was created. The text will be captured at that point. It seems a bit peculiar to have the offset be captured now instead of at the same time the text was captured. Perhaps there is some benefit to this different timing, but I suspect it just makes the code unnecessarily fragile.

> Source/WebCore/editing/TextCheckingHelper.cpp:352
> +                    insertionPointOffset = TextIterator::insertionPointOffsetInParagraph(frame->selection().selection());

Seems safer if we actually use paragraphRange here instead of depending on TextIterator::insertionPointOffsetInParagraph finding the same paragraph.

> Source/WebCore/editing/TextCheckingHelper.cpp:587
> +        insertionPointOffset = TextIterator::insertionPointOffsetInParagraph(frame->selection().selection());

Ditto.

> Source/WebCore/editing/TextCheckingHelper.cpp:594
> +            m_client->textChecker()->getGuessesForWord(misspelledWord, String(), insertionPointOffset, guesses);

With no context to go with it, unclear how the insertionPointOffset will be valuable here.

> Source/WebCore/editing/TextCheckingHelper.h:106
> +void checkTextOfParagraph(TextCheckerClient&, StringView, TextCheckingTypeMask, Vector<TextCheckingResult>&, int insertionPointOffset);

Since I presume the insertion point offset "goes with" the StringView, I would expect them to be consecutive arguments, and I would also expect that we’d use unsigned since that’s the type for offsets within a StringView.

We might also consider whether we want to use Optional<unsigned> rather than passing a pre-selected value such as 0 when there is no insertion point.

> Source/WebCore/platform/text/TextCheckerClient.h:52
> +    virtual Vector<TextCheckingResult> checkTextOfParagraph(StringView, TextCheckingTypeMask checkingTypes, int insertionPointOffset) = 0;

Same though that the insertion point offset should be "next to" the StringView that it’s relative to.

> Source/WebCore/platform/text/TextCheckerClient.h:58
> +    virtual void getGuessesForWord(const String& word, const String& context, int insertionPointOffset, Vector<String>& guesses) = 0;

Unclear whether the offset is relative to the language identification context or relative to the word. Also would be nice in future to use StringView for these rather than const String&.

> Source/WebCore/platform/text/TextCheckerClient.h:59
> +    virtual void requestCheckingOfString(PassRefPtr<TextCheckingRequest>, int insertionPointOffset) = 0;

It’s unclear to me what this insertion point offset is relative to. I think this should move into TextCheckingRequest rather than being passed separately.