Bug 50245

Summary:

Add word-prefix search options to the text search without ICU

Product:

WebKit

Reporter:

Benjamin Poulain <benjamin>

Component:

Text

Assignee:

Nobody <webkit-unassigned>

Status:

NEW

Severity:

Normal

CC:

cshu, max.hong.shen, tonikitoo

Priority:

Keywords:

Qt, QtTriaged

Version:

528+ (Nightly build)

Hardware:

All

OS:

All

Bug Depends on:

50038, 60786

Bug Blocks:

Attachments:

Description	Flags
word-prefix search for no-ICU code path (v1.0)	darin: review-

Benjamin Poulain

Reported 2010-11-30 10:24:25 PST

The patch r72887 (https://bugs.webkit.org/show_bug.cgi?id=50038 ) adds options to Editor::findString() but does not implement the changes to the non-ICU version of SearchBuffer. The test editing/text-iterator/findString.html fails on any case using the new options.

Attachments
word-prefix search for no-ICU code path (v1.0) (8.29 KB, patch) 2011-05-26 10:39 PDT, Yi Shen	darin: review-	Details Formatted Diff Diff
View All Add attachment proposed patch, testcase, etc.

Yi Shen

Comment 1 2011-05-26 10:39:05 PDT

Created attachment 95002 [details] word-prefix search for no-ICU code path (v1.0) For Qt, this patch makes the LayoutTests/editing/text-iterator/findString.html pass all the tests except the one for the Thai words. var thaiWords = [ "\u0e01\u0e23", "\u0e1b\u0e39\u0e40\u0e25", "\u0e01\u0e0a", "\u0e01\u0e0a\u0e01\u0e23", // thaiWords[2] + thaiWords[0] "\u0e01\u0e23\u0e01\u0e0a", // thaiWords[0] + thaiWords[2] "\u0e1a\u0e07\u0e01\u0e0a", // ends with thaiWords[2] ]; testFindString(thaiWords.join(""), thaiWords[0], ["AtWordStarts"], [[0, 2], [12, 14], []]); // Failed here After traced down the problem in SearchBuffer::isWordStartMatch() function, I found the issue is that findNextWordFromIndex() doesn't return a correct word position for Thai words, which is implemented in TextBoundariesQt.cpp by using QTextBoundaryFinder. In other word, this is a Qt issue and I will fire a Qt bug for it.

Chang Shu

Comment 2 2011-05-26 11:12:21 PDT

Comment on attachment 95002 [details] word-prefix search for no-ICU code path (v1.0) View in context: https://bugs.webkit.org/attachment.cgi?id=95002&action=review I understand you didn't unskip findString.html because of the Thai char issue. But it will be good to show some tests improvement after your fix. Maybe we can split findString.html and put Thai tests in a separate test? Overall, it's a good patch. Thanks. > Source/WebCore/editing/TextIterator.cpp:106 > + bool isWordStartMatch(size_t, size_t length) const; What do you think we move all the common apis to the common place? > Source/WebCore/editing/TextIterator.cpp:2189 > + same thing here. We can move identical code to the common place. > Source/WebCore/editing/TextIterator.cpp:2236 > + memcpy(m_context.data(), m_context.data() + 1, (m_context.size() - 1) * sizeof(UChar)); memcpy to overlapped place may cause unexpected results?

Chang Shu

Comment 3 2011-05-26 11:24:23 PDT

> > Source/WebCore/editing/TextIterator.cpp:2236 > > + memcpy(m_context.data(), m_context.data() + 1, (m_context.size() - 1) * sizeof(UChar)); > > memcpy to overlapped place may cause unexpected results? I think memmove is a better way to do it.

Darin Adler

Comment 4 2011-06-17 11:01:27 PDT

Comment on attachment 95002 [details] word-prefix search for no-ICU code path (v1.0) View in context: https://bugs.webkit.org/attachment.cgi?id=95002&action=review Need to at least fix the memcpy problem. But I think that also using a circular buffer instead of a memmove each time is the way to go. It may be tricky to get a circular buffer to work with functions such as U16_GET, though, so not sure it will be easy. >> Source/WebCore/editing/TextIterator.cpp:2189 >> + > > same thing here. We can move identical code to the common place. We don’t want to copy and paste this code, yes, so we need a version of the patch that does not involve a second copy. >> Source/WebCore/editing/TextIterator.cpp:2236 >> + memcpy(m_context.data(), m_context.data() + 1, (m_context.size() - 1) * sizeof(UChar)); > > memcpy to overlapped place may cause unexpected results? That’s right. You can’t use memcpy for this; it has to be memmove instead. And doing a memmove every time we add a character is really inefficient. I suggest considering a circular buffer instead.

Yi Shen

Comment 5 2011-06-17 13:38:08 PDT

I agree that memmove is a better way, but I think memcpy is fine in this case since the source is after the destination,right? e.g, str = "0123456789" memcpy(str+2, str+1, 3); returns: 0111356789 memcpy(str+1, str+2, 3); returns: 0234456789 Anyway, I will change it. (In reply to comment #3) > > > Source/WebCore/editing/TextIterator.cpp:2236 > > > + memcpy(m_context.data(), m_context.data() + 1, (m_context.size() - 1) * sizeof(UChar)); > > > > memcpy to overlapped place may cause unexpected results? > > I think memmove is a better way to do it.

Yi Shen

Comment 6 2011-06-17 13:38:49 PDT

Thanks both for the reviewing :) I will update the patch based on your comments. (In reply to comment #4) > (From update of attachment 95002 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=95002&action=review > > Need to at least fix the memcpy problem. But I think that also using a circular buffer instead of a memmove each time is the way to go. It may be tricky to get a circular buffer to work with functions such as U16_GET, though, so not sure it will be easy. > > >> Source/WebCore/editing/TextIterator.cpp:2189 > >> + > > > > same thing here. We can move identical code to the common place. > > We don’t want to copy and paste this code, yes, so we need a version of the patch that does not involve a second copy. > > >> Source/WebCore/editing/TextIterator.cpp:2236 > >> + memcpy(m_context.data(), m_context.data() + 1, (m_context.size() - 1) * sizeof(UChar)); > > > > memcpy to overlapped place may cause unexpected results? > > That’s right. You can’t use memcpy for this; it has to be memmove instead. > > And doing a memmove every time we add a character is really inefficient. I suggest considering a circular buffer instead.

Benjamin Poulain

Comment 7 2011-06-17 15:31:54 PDT

(In reply to comment #5) > I agree that memmove is a better way, but I think memcpy is fine in this case since the source is after the destination,right? > > e.g, > str = "0123456789" > memcpy(str+2, str+1, 3); > returns: 0111356789 > > memcpy(str+1, str+2, 3); > returns: 0234456789 This is invalid, and will fail on some platform, some libc or even depending on the alignment. The source and destination of memcpy cannot overlap (see http://linuxcertif.com/man/3/memcpy/en/ ).

Yi Shen

Comment 8 2011-06-22 07:54:29 PDT

(In reply to comment #7) > (In reply to comment #5) > > I agree that memmove is a better way, but I think memcpy is fine in this case since the source is after the destination,right? > > > > e.g, > > str = "0123456789" > > memcpy(str+2, str+1, 3); > > returns: 0111356789 > > > > memcpy(str+1, str+2, 3); > > returns: 0234456789 > > This is invalid, and will fail on some platform, some libc or even depending on the alignment. > The source and destination of memcpy cannot overlap (see http://linuxcertif.com/man/3/memcpy/en/ ). Agree, will fix it :)

Note You need to log in before you can comment on or make changes to this bug.