So in WebCore/editing/Editor.cpp there's the code calling checkSpellingOfString of the current backend, while implementing the one for GTK+ I realized that this code (and I might be wrong) should be handling me the string analyzed /word by word/ and not as a whole. Example: - wrong: client->checkSpellingOfString("helol john", len, &misspellingLocation, &misspellingLength); - right: client->checkSpellingOfString("helol", len, &misspellingLocation, &misspellingLength); (second iteration) client->checkSpellingOfString("john", len, &misspellingLocation, &misspellingLength); I base my theory in the name of the text iterator (WordAwareIterator?) and that there's a while() loop for calling checkSpellingOfString with an .advance() method and a check for .atEnd(). If this was not supposed to split by word, then why the while()? why .advance() the iterator?. Based on all that, I suggest that this is a bug in the text iterator and that it should actually be giving me words and not the full phrase. static String findFirstMisspellingInRange(EditorClient* client, Range* searchRange, int& firstMisspellingOffset, bool markAll) { ASSERT_ARG(client, client); ASSERT_ARG(searchRange, searchRange); WordAwareIterator it(searchRange); firstMisspellingOffset = 0; String firstMisspelling; int currentChunkOffset = 0; while (!it.atEnd()) { const UChar* chars = it.characters(); int len = it.length(); // Skip some work for one-space-char hunks if (!(len == 1 && chars[0] == ' ')) { int misspellingLocation = -1; int misspellingLength = 0; client->checkSpellingOfString(chars, len, &misspellingLocation, &misspellingLength); ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ // 5490627 shows that there was some code path here where the String constructor below crashes. // We don't know exactly what combination of bad input caused this, so we're making this much // more robust against bad input on release builds. ASSERT(misspellingLength >= 0); ASSERT(misspellingLocation >= -1); ASSERT(misspellingLength == 0 || misspellingLocation >= 0); ASSERT(misspellingLocation < len); ASSERT(misspellingLength <= len); ASSERT(misspellingLocation + misspellingLength <= len); if (misspellingLocation >= 0 && misspellingLength > 0 && misspellingLocation < len && misspellingLength <= len && misspellingLocation + misspellingLength <= len) { // Remember first-encountered misspelling and its offset if (!firstMisspelling) { firstMisspellingOffset = currentChunkOffset + misspellingLocation; firstMisspelling = String(chars + misspellingLocation, misspellingLength); } // Mark this instance if we're marking all instances. Otherwise bail out because we found the first one. if (!markAll) break; // Compute range of misspelled word RefPtr<Range> misspellingRange = TextIterator::subrange(searchRange, currentChunkOffset + misspellingLocation, misspellingLength); // Store marker for misspelled word ExceptionCode ec = 0; misspellingRange->startContainer(ec)->document()->addMarker(misspellingRange.get(), DocumentMarker::Spelling); ASSERT(ec == 0); } } currentChunkOffset += len; it.advance(); ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ } return firstMisspelling; }
It's not quite clear to me what you think is wrong. it.characters() returns a pointer into a given buffer. it.length() returns the number of characters in the current word. The buffer that it.characters() points in to may be substantially longer than the value returned by it.length(), which I suspect is what is throwing you off, but that is because the underlying buffer is probably the one associated with a DOM node or similar and allocating a new buffer per word would be slooooow. Am I missing something here?
I think this is working correctly. What happens is the patch in https://bugs.webkit.org/show_bug.cgi?id=15616 doesn't return the correct values for misspellingLocation and misspellingLength.
I think we have pretty much confirmed that this is working correctly, by now. I'll close the bug and let Diego reopen it if he disagrees. =)