Bug 50245

Summary: Add word-prefix search options to the text search without ICU
Product: WebKit Reporter: Benjamin Poulain <benjamin>
Component: TextAssignee: Nobody <webkit-unassigned>
Status: NEW    
Severity: Normal CC: cshu, max.hong.shen, tonikitoo
Priority: P2 Keywords: Qt, QtTriaged
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Bug Depends on: 50038, 60786    
Bug Blocks:    
Attachments:
Description Flags
word-prefix search for no-ICU code path (v1.0) darin: review-

Benjamin Poulain
Reported 2010-11-30 10:24:25 PST
The patch r72887 (https://bugs.webkit.org/show_bug.cgi?id=50038 ) adds options to Editor::findString() but does not implement the changes to the non-ICU version of SearchBuffer. The test editing/text-iterator/findString.html fails on any case using the new options.
Attachments
word-prefix search for no-ICU code path (v1.0) (8.29 KB, patch)
2011-05-26 10:39 PDT, Yi Shen
darin: review-
Yi Shen
Comment 1 2011-05-26 10:39:05 PDT
Created attachment 95002 [details] word-prefix search for no-ICU code path (v1.0) For Qt, this patch makes the LayoutTests/editing/text-iterator/findString.html pass all the tests except the one for the Thai words. var thaiWords = [ "\u0e01\u0e23", "\u0e1b\u0e39\u0e40\u0e25", "\u0e01\u0e0a", "\u0e01\u0e0a\u0e01\u0e23", // thaiWords[2] + thaiWords[0] "\u0e01\u0e23\u0e01\u0e0a", // thaiWords[0] + thaiWords[2] "\u0e1a\u0e07\u0e01\u0e0a", // ends with thaiWords[2] ]; testFindString(thaiWords.join(""), thaiWords[0], ["AtWordStarts"], [[0, 2], [12, 14], []]); // Failed here After traced down the problem in SearchBuffer::isWordStartMatch() function, I found the issue is that findNextWordFromIndex() doesn't return a correct word position for Thai words, which is implemented in TextBoundariesQt.cpp by using QTextBoundaryFinder. In other word, this is a Qt issue and I will fire a Qt bug for it.
Chang Shu
Comment 2 2011-05-26 11:12:21 PDT
Comment on attachment 95002 [details] word-prefix search for no-ICU code path (v1.0) View in context: https://bugs.webkit.org/attachment.cgi?id=95002&action=review I understand you didn't unskip findString.html because of the Thai char issue. But it will be good to show some tests improvement after your fix. Maybe we can split findString.html and put Thai tests in a separate test? Overall, it's a good patch. Thanks. > Source/WebCore/editing/TextIterator.cpp:106 > + bool isWordStartMatch(size_t, size_t length) const; What do you think we move all the common apis to the common place? > Source/WebCore/editing/TextIterator.cpp:2189 > + same thing here. We can move identical code to the common place. > Source/WebCore/editing/TextIterator.cpp:2236 > + memcpy(m_context.data(), m_context.data() + 1, (m_context.size() - 1) * sizeof(UChar)); memcpy to overlapped place may cause unexpected results?
Chang Shu
Comment 3 2011-05-26 11:24:23 PDT
> > Source/WebCore/editing/TextIterator.cpp:2236 > > + memcpy(m_context.data(), m_context.data() + 1, (m_context.size() - 1) * sizeof(UChar)); > > memcpy to overlapped place may cause unexpected results? I think memmove is a better way to do it.
Darin Adler
Comment 4 2011-06-17 11:01:27 PDT
Comment on attachment 95002 [details] word-prefix search for no-ICU code path (v1.0) View in context: https://bugs.webkit.org/attachment.cgi?id=95002&action=review Need to at least fix the memcpy problem. But I think that also using a circular buffer instead of a memmove each time is the way to go. It may be tricky to get a circular buffer to work with functions such as U16_GET, though, so not sure it will be easy. >> Source/WebCore/editing/TextIterator.cpp:2189 >> + > > same thing here. We can move identical code to the common place. We don’t want to copy and paste this code, yes, so we need a version of the patch that does not involve a second copy. >> Source/WebCore/editing/TextIterator.cpp:2236 >> + memcpy(m_context.data(), m_context.data() + 1, (m_context.size() - 1) * sizeof(UChar)); > > memcpy to overlapped place may cause unexpected results? That’s right. You can’t use memcpy for this; it has to be memmove instead. And doing a memmove every time we add a character is really inefficient. I suggest considering a circular buffer instead.
Yi Shen
Comment 5 2011-06-17 13:38:08 PDT
I agree that memmove is a better way, but I think memcpy is fine in this case since the source is after the destination,right? e.g, str = "0123456789" memcpy(str+2, str+1, 3); returns: 0111356789 memcpy(str+1, str+2, 3); returns: 0234456789 Anyway, I will change it. (In reply to comment #3) > > > Source/WebCore/editing/TextIterator.cpp:2236 > > > + memcpy(m_context.data(), m_context.data() + 1, (m_context.size() - 1) * sizeof(UChar)); > > > > memcpy to overlapped place may cause unexpected results? > > I think memmove is a better way to do it.
Yi Shen
Comment 6 2011-06-17 13:38:49 PDT
Thanks both for the reviewing :) I will update the patch based on your comments. (In reply to comment #4) > (From update of attachment 95002 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=95002&action=review > > Need to at least fix the memcpy problem. But I think that also using a circular buffer instead of a memmove each time is the way to go. It may be tricky to get a circular buffer to work with functions such as U16_GET, though, so not sure it will be easy. > > >> Source/WebCore/editing/TextIterator.cpp:2189 > >> + > > > > same thing here. We can move identical code to the common place. > > We don’t want to copy and paste this code, yes, so we need a version of the patch that does not involve a second copy. > > >> Source/WebCore/editing/TextIterator.cpp:2236 > >> + memcpy(m_context.data(), m_context.data() + 1, (m_context.size() - 1) * sizeof(UChar)); > > > > memcpy to overlapped place may cause unexpected results? > > That’s right. You can’t use memcpy for this; it has to be memmove instead. > > And doing a memmove every time we add a character is really inefficient. I suggest considering a circular buffer instead.
Benjamin Poulain
Comment 7 2011-06-17 15:31:54 PDT
(In reply to comment #5) > I agree that memmove is a better way, but I think memcpy is fine in this case since the source is after the destination,right? > > e.g, > str = "0123456789" > memcpy(str+2, str+1, 3); > returns: 0111356789 > > memcpy(str+1, str+2, 3); > returns: 0234456789 This is invalid, and will fail on some platform, some libc or even depending on the alignment. The source and destination of memcpy cannot overlap (see http://linuxcertif.com/man/3/memcpy/en/ ).
Yi Shen
Comment 8 2011-06-22 07:54:29 PDT
(In reply to comment #7) > (In reply to comment #5) > > I agree that memmove is a better way, but I think memcpy is fine in this case since the source is after the destination,right? > > > > e.g, > > str = "0123456789" > > memcpy(str+2, str+1, 3); > > returns: 0111356789 > > > > memcpy(str+1, str+2, 3); > > returns: 0234456789 > > This is invalid, and will fail on some platform, some libc or even depending on the alignment. > The source and destination of memcpy cannot overlap (see http://linuxcertif.com/man/3/memcpy/en/ ). Agree, will fix it :)
Note You need to log in before you can comment on or make changes to this bug.