Bug 24419 - WordAwareIterator not working in WebCore/editing/Editor.cpp
Summary: WordAwareIterator not working in WebCore/editing/Editor.cpp
Status: RESOLVED INVALID
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: 528+ (Nightly build)
Hardware: PC Linux
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-03-05 22:26 PST by Diego Escalante Urrelo
Modified: 2009-04-06 13:10 PDT (History)
2 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Diego Escalante Urrelo 2009-03-05 22:26:41 PST
So in WebCore/editing/Editor.cpp there's the code calling checkSpellingOfString of the current backend, while implementing the one for GTK+ I realized that this code (and I might be wrong) should be handling me the string analyzed /word by word/ and not as a whole.
Example:
- wrong:
            client->checkSpellingOfString("helol john", len, &misspellingLocation, &misspellingLength);
- right:
            client->checkSpellingOfString("helol", len, &misspellingLocation, &misspellingLength);
(second iteration)
            client->checkSpellingOfString("john", len, &misspellingLocation, &misspellingLength);

I base my theory in the name of the text iterator (WordAwareIterator?) and that there's a while() loop for calling checkSpellingOfString with an .advance() method and a check for .atEnd(). If this was not supposed to split by word, then why the while()? why .advance() the iterator?.

Based on all that, I suggest that this is a bug in the text iterator and that it should actually be giving me words and not the full phrase.

static String findFirstMisspellingInRange(EditorClient* client, Range* searchRange, int& firstMisspellingOffset, bool markAll)
{
    ASSERT_ARG(client, client);
    ASSERT_ARG(searchRange, searchRange);
    
    WordAwareIterator it(searchRange);
    firstMisspellingOffset = 0;
    
    String firstMisspelling;
    int currentChunkOffset = 0;

    while (!it.atEnd()) {
        const UChar* chars = it.characters();
        int len = it.length();
        
        // Skip some work for one-space-char hunks
        if (!(len == 1 && chars[0] == ' ')) {
            
            int misspellingLocation = -1;
            int misspellingLength = 0;
            client->checkSpellingOfString(chars, len, &misspellingLocation, &misspellingLength);
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

            // 5490627 shows that there was some code path here where the String constructor below crashes.
            // We don't know exactly what combination of bad input caused this, so we're making this much
            // more robust against bad input on release builds.
            ASSERT(misspellingLength >= 0);
            ASSERT(misspellingLocation >= -1);
            ASSERT(misspellingLength == 0 || misspellingLocation >= 0);
            ASSERT(misspellingLocation < len);
            ASSERT(misspellingLength <= len);
            ASSERT(misspellingLocation + misspellingLength <= len);
            
            if (misspellingLocation >= 0 && misspellingLength > 0 && misspellingLocation < len && misspellingLength <= len && misspellingLocation + misspellingLength <= len) {
                
                // Remember first-encountered misspelling and its offset
                if (!firstMisspelling) {
                    firstMisspellingOffset = currentChunkOffset + misspellingLocation;
                    firstMisspelling = String(chars + misspellingLocation, misspellingLength);
                }
                
                // Mark this instance if we're marking all instances. Otherwise bail out because we found the first one.
                if (!markAll)
                    break;
                
                // Compute range of misspelled word
                RefPtr<Range> misspellingRange = TextIterator::subrange(searchRange, currentChunkOffset + misspellingLocation, misspellingLength);
                
                // Store marker for misspelled word
                ExceptionCode ec = 0;
                misspellingRange->startContainer(ec)->document()->addMarker(misspellingRange.get(), DocumentMarker::Spelling);
                ASSERT(ec == 0);
            }
        }
        
        currentChunkOffset += len;
        it.advance();
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    }
    
    return firstMisspelling;
}
Comment 1 Mark Rowe (bdash) 2009-03-06 00:06:01 PST
It's not quite clear to me what you think is wrong.  it.characters() returns a pointer into a given buffer.  it.length() returns the number of characters in the current word.  The buffer that it.characters() points in to may be substantially longer than the value returned by it.length(), which I suspect is what is throwing you off, but that is because the underlying buffer is probably the one associated with a DOM node or similar and allocating a new buffer per word would be slooooow.

Am I missing something here?
Comment 2 Gustavo Noronha (kov) 2009-03-12 07:24:23 PDT
I think this is working correctly. What happens is the patch in https://bugs.webkit.org/show_bug.cgi?id=15616 doesn't return the correct values for misspellingLocation and misspellingLength.
Comment 3 Gustavo Noronha (kov) 2009-04-06 13:10:20 PDT
I think we have pretty much confirmed that this is working correctly, by now. I'll close the bug and let Diego reopen it if he disagrees. =)