WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
105692
Element boundaries prevent Japanese line break opportunities
https://bugs.webkit.org/show_bug.cgi?id=105692
Summary
Element boundaries prevent Japanese line break opportunities
Koji Ishii
Reported
2012-12-23 09:33:49 PST
Created
attachment 180625
[details]
test case Line break opportunities before and after <span> or any inline elements are prohibited. EXPECTED: <span> or any inline elements should not prevent line break opportunities.
Attachments
test case
(675 bytes, text/html)
2012-12-23 09:33 PST
,
Koji Ishii
no flags
Details
test case result
(5.66 KB, image/png)
2013-01-13 23:05 PST
,
Koji Ishii
no flags
Details
test case expected
(8.49 KB, image/png)
2013-01-13 23:06 PST
,
Koji Ishii
no flags
Details
Patch
(7.21 KB, patch)
2013-03-11 19:14 PDT
,
Glenn Adams
webkit.review.bot
: commit-queue-
Details
Formatted Diff
Diff
Patch
(24.03 KB, patch)
2013-03-15 20:24 PDT
,
Glenn Adams
no flags
Details
Formatted Diff
Diff
Patch
(42.90 KB, patch)
2013-03-18 22:12 PDT
,
Glenn Adams
no flags
Details
Formatted Diff
Diff
Patch
(43.13 KB, patch)
2013-03-18 22:37 PDT
,
Glenn Adams
no flags
Details
Formatted Diff
Diff
Patch
(71.00 KB, patch)
2013-03-19 14:38 PDT
,
Glenn Adams
no flags
Details
Formatted Diff
Diff
Line layout performance results (en, jp) for attachment 193924.
(46.50 KB, text/html)
2013-03-19 14:44 PDT
,
Glenn Adams
no flags
Details
Parser (html5-full-render) performance results for attachment 193924.
(32.53 KB, text/html)
2013-03-19 14:47 PDT
,
Glenn Adams
no flags
Details
Patch
(73.12 KB, patch)
2013-03-20 19:43 PDT
,
Glenn Adams
no flags
Details
Formatted Diff
Diff
Patch
(263.95 KB, patch)
2013-03-21 18:29 PDT
,
Glenn Adams
no flags
Details
Formatted Diff
Diff
Archive of layout-test-results from webkit-ews-02
(1.16 MB, application/zip)
2013-03-22 08:50 PDT
,
Build Bot
no flags
Details
Patch
(79.70 KB, patch)
2013-03-26 19:02 PDT
,
Glenn Adams
glenn
: review-
webkit.review.bot
: commit-queue-
Details
Formatted Diff
Diff
Archive of layout-test-results from gce-cr-linux-06 for chromium-linux-x86_64
(1.76 MB, application/zip)
2013-03-26 21:18 PDT
,
WebKit Review Bot
no flags
Details
Archive of layout-test-results from webkit-ews-16 for mac-mountainlion-wk2
(1.15 MB, application/zip)
2013-03-26 22:12 PDT
,
Build Bot
no flags
Details
Archive of layout-test-results from gce-cr-linux-03 for chromium-linux-x86_64
(1.89 MB, application/zip)
2013-03-26 22:20 PDT
,
WebKit Review Bot
no flags
Details
Patch
(625.42 KB, patch)
2013-03-27 23:22 PDT
,
Glenn Adams
no flags
Details
Formatted Diff
Diff
Patch
(226.94 KB, patch)
2013-04-02 20:17 PDT
,
Glenn Adams
no flags
Details
Formatted Diff
Diff
Patch
(226.99 KB, patch)
2013-04-02 20:36 PDT
,
Glenn Adams
no flags
Details
Formatted Diff
Diff
Patch
(583.64 KB, patch)
2013-04-02 22:00 PDT
,
Glenn Adams
no flags
Details
Formatted Diff
Diff
Show Obsolete
(17)
View All
Add attachment
proposed patch, testcase, etc.
mitz
Comment 1
2012-12-31 19:09:56 PST
Looks like a duplicate of
bug 17427
.
Koji Ishii
Comment 2
2013-01-13 23:05:45 PST
Created
attachment 182512
[details]
test case result
Koji Ishii
Comment 3
2013-01-13 23:06:07 PST
Created
attachment 182513
[details]
test case expected
Lu jing
Comment 4
2013-01-22 05:13:03 PST
I work for fix this bug, could you assign it to me?
Glenn Adams
Comment 5
2013-01-22 18:13:44 PST
(In reply to
comment #4
)
> I work for fix this bug, could you assign it to me?
Hi Lu Jing, I have already assigned it to myself and started work on this bug, which I believe may be dependent on
bug 89235
. If you would like to help, please email me directly to discuss. Thanks, Glenn
Lu jing
Comment 6
2013-01-23 16:32:02 PST
(In reply to
comment #5
)
> (In reply to
comment #4
) > > I work for fix this bug, could you assign it to me? > > Hi Lu Jing, I have already assigned it to myself and started work on this bug, which I believe may be dependent on
bug 89235
. If you would like to help, please email me directly to discuss. Thanks, Glenn
Got it. Thanks.
Koji Ishii
Comment 7
2013-02-17 06:43:35 PST
I don't think this bug depends on
bug 89235
. 89325 is about changing iterator while this bug is about how to call iterators. Let's not try to create unnecessary dependencies as it will slow things down. Lu, can you please keep working on this if you can?
Glenn Adams
Comment 8
2013-02-17 08:48:45 PST
(In reply to
comment #7
)
> I don't think this bug depends on
bug 89235
. 89325 is about changing iterator while this bug is about how to call iterators. Let's not try to create unnecessary dependencies as it will slow things down.
agreed; however, it either depends on or is a duplicate of 17427 as pointed out by mitz in
comment #1
Lu jing
Comment 9
2013-02-17 11:33:31 PST
(In reply to
comment #7
)
> I don't think this bug depends on
bug 89235
. 89325 is about changing iterator while this bug is about how to call iterators. Let's not try to create unnecessary dependencies as it will slow things down. > > Lu, can you please keep working on this if you can?
Ok, I've read all about 17427. I'm not sure I can fix but I will keep working on this.
Koji Ishii
Comment 10
2013-02-21 06:32:36 PST
If 17427 can fix this too, since nobody seems to be working on 17427, you can work on 17427 and then resolve this as dup, if you want to.
Glenn Adams
Comment 11
2013-02-21 06:59:10 PST
(In reply to
comment #10
)
> If 17427 can fix this too, since nobody seems to be working on 17427, you can work on 17427 and then resolve this as dup, if you want to.
that's my plan
Glenn Adams
Comment 12
2013-02-25 09:28:53 PST
verified that patch
https://bugs.webkit.org/attachment.cgi?id=190055
on
bug 17427
fixes the problem shown by the attached test case
Glenn Adams
Comment 13
2013-03-11 19:14:41 PDT
Created
attachment 192625
[details]
Patch
WebKit Review Bot
Comment 14
2013-03-11 22:14:59 PDT
Comment on
attachment 192625
[details]
Patch
Attachment 192625
[details]
did not pass chromium-ews (chromium-xvfb): Output:
http://webkit-commit-queue.appspot.com/results/17013978
New failing tests: http/tests/canvas/philip/tests/security.pattern.canvas.strokeStyle.html css1/cascade/important.html css1/box_properties/acid_test.html http/tests/canvas/philip/tests/security.pattern.cross.html css1/box_properties/border_bottom.html css1/basic/containment.html http/tests/canvas/philip/tests/security.pattern.image.strokeStyle.html accessibility/radio-button-title-label.html css1/box_properties/border_bottom_width_inline.html http/tests/canvas/philip/tests/security.reset.html http/tests/canvas/philip/tests/security.pattern.canvas.timing.html css1/classification/list_style_position.html accessibility/label-for-control-hittest.html canvas/philip/tests/2d.drawImage.negativeSourceHeight2.html css1/basic/contextual_selectors.html canvas/philip/tests/2d.composite.operation.foobar.html css1/classification/display.html css1/color_and_background/background.html css1/classification/list_style.html http/tests/canvas/philip/tests/security.drawImage.image.html css1/color_and_background/background_color.html http/tests/canvas/philip/tests/security.pattern.image.fillStyle.html css1/box_properties/border_bottom_inline.html http/tests/canvas/philip/tests/security.drawImage.canvas.html css1/color_and_background/background_attachment.html http/tests/canvas/philip/tests/security.pattern.canvas.fillStyle.html http/tests/canvas/philip/tests/security.pattern.create.html canvas/philip/tests/2d.drawImage.negativeSourceHeightAndWidth.html http/tests/cache/subresource-failover-to-network.html canvas/philip/tests/2d.drawImage.negativeSourceHeight.html
Glenn Adams
Comment 15
2013-03-11 23:58:32 PDT
Comment on
attachment 192625
[details]
Patch causes functional regression
Glenn Adams
Comment 16
2013-03-15 20:24:06 PDT
Created
attachment 193419
[details]
Patch
Glenn Adams
Comment 17
2013-03-15 20:33:47 PDT
Comment on
attachment 193419
[details]
Patch Support use of prior context in non-ASCII path without burden of memory allocation. This required changing to a custom ICU text provider for UChar strings and will also require changes to the current custom text provider for LChar strings. This patch is not yet ready for review: (1) uTextLatin1* (existing LChar text provider) need to be updated to use prior context; (2) need to handle surrogate pairs in prior context as well as at boundary of prior and primary contexts; (3) need functional and performance regression testing; This patch does, however, fix the reported test reduction, as also indicated by the new LayoutTest included in this patch.
Glenn Adams
Comment 18
2013-03-15 21:10:34 PDT
(In reply to
comment #17
)
> (From update of
attachment 193419
[details]
) > Support use of prior context in non-ASCII path without burden of memory allocation. This required changing to a custom ICU text provider for UChar strings and will also require changes to the current custom text provider for LChar strings. This patch is not yet ready for review: > > (1) uTextLatin1* (existing LChar text provider) need to be updated to use prior context; > (2) need to handle surrogate pairs in prior context as well as at boundary of prior and primary contexts; > (3) need functional and performance regression testing; > > This patch does, however, fix the reported test reduction, as also indicated by the new LayoutTest included in this patch.
Adding msaboff since he implemented original uTextLatin1* text provider. I expect to ask him to take a quick look at this work once it gets to an r? stage.
Glenn Adams
Comment 19
2013-03-18 22:12:42 PDT
Created
attachment 193728
[details]
Patch
Early Warning System Bot
Comment 20
2013-03-18 22:18:46 PDT
Comment on
attachment 193728
[details]
Patch
Attachment 193728
[details]
did not pass qt-ews (qt): Output:
http://webkit-commit-queue.appspot.com/results/17160598
Early Warning System Bot
Comment 21
2013-03-18 22:20:37 PDT
Comment on
attachment 193728
[details]
Patch
Attachment 193728
[details]
did not pass qt-wk2-ews (qt): Output:
http://webkit-commit-queue.appspot.com/results/17234336
WebKit Review Bot
Comment 22
2013-03-18 22:30:40 PDT
Comment on
attachment 193728
[details]
Patch
Attachment 193728
[details]
did not pass chromium-ews (chromium-xvfb): Output:
http://webkit-commit-queue.appspot.com/results/17176607
WebKit Review Bot
Comment 23
2013-03-18 22:31:46 PDT
Comment on
attachment 193728
[details]
Patch
Attachment 193728
[details]
did not pass cr-linux-debug-ews (chromium-xvfb): Output:
http://webkit-commit-queue.appspot.com/results/17211445
Glenn Adams
Comment 24
2013-03-18 22:37:49 PDT
Created
attachment 193732
[details]
Patch
Build Bot
Comment 25
2013-03-19 00:51:36 PDT
Comment on
attachment 193732
[details]
Patch
Attachment 193732
[details]
did not pass mac-ews (mac): Output:
http://webkit-commit-queue.appspot.com/results/17183548
Build Bot
Comment 26
2013-03-19 00:59:25 PDT
Comment on
attachment 193732
[details]
Patch
Attachment 193732
[details]
did not pass mac-wk2-ews (mac-wk2): Output:
http://webkit-commit-queue.appspot.com/results/17074957
Glenn Adams
Comment 27
2013-03-19 14:38:57 PDT
Created
attachment 193924
[details]
Patch
Glenn Adams
Comment 28
2013-03-19 14:44:55 PDT
Created
attachment 193927
[details]
Line layout performance results (en, jp) for
attachment 193924
[details]
. Approximately 4% improvement for Japanese. Slight improvement for English.
Glenn Adams
Comment 29
2013-03-19 14:47:52 PDT
Created
attachment 193929
[details]
Parser (html5-full-render) performance results for
attachment 193924
[details]
. No measurable regression. Performed on MBP Retina MacOSX 10.8.3.
Glenn Adams
Comment 30
2013-03-19 15:22:08 PDT
(In reply to
comment #29
)
> Created an attachment (id=193929) [details] > Parser (html5-full-render) performance results for
attachment 193924
[details]
. > > No measurable regression. Performed on MBP Retina MacOSX 10.8.3.
I should have said no clearly significant regression within the indicated variance.
Darin Adler
Comment 31
2013-03-20 10:46:43 PDT
Comment on
attachment 193924
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=193924&action=review
Thanks for tackling this. A lot of the new code here isn’t following WebKit coding style and it should. I’m concerned about the level of test coverage for this significant amount code. I am pretty sure I spotted an infinite loop in the code, not sure why it was not a problem in practice.
> Source/WebCore/ChangeLog:28 > + (WebCore):
It’s best to remove bogus lines like this that the script adds to change log entries.
> Source/WebCore/platform/text/TextBreakIterator.h:57 > + const int TextBreakPriorContextCapacity = 2;
I think this constant would be better as a private member of LazyLineBreakIterator rather exposed at the top level as a public constant. It has nothing to do with the interface to the iterator.
> Source/WebCore/platform/text/TextBreakIterator.h:83 > + UChar lastCharacter() const { return m_priorContext[1]; }
Do we need an assertion about m_priorContextLength here?
> Source/WebCore/platform/text/TextBreakIterator.h:85 > + const UChar* getPriorContext(int& priorContextLength)
This is used only within the class, so I suggest it be public. I’m also not sure it needs to be a function. Having this logic in the get function might be good.
> Source/WebCore/platform/text/TextBreakIterator.h:95 > + m_priorContextLength = last ? (secondToLast ? 2 : 1) : 0;
This seems a bit peculiar. We are storing a length, but it’s not additional data; it’s just an indicator of which characters are zero. m_priorContextLength should not be stored if it can be computed.
> Source/WebCore/platform/text/TextBreakIterator.h:103 > + void resetPriorContext()
Renaming this to be more abstract isn’t all that useful. We still are storing just two characters. It’s not all that helpful to pretend to be more general in our naming.
> Source/WebCore/platform/text/TextBreakIterator.h:110 > - TextBreakIterator* get() > + TextBreakIterator* get(int& priorContextLength)
This interface is confusing enough that I think we’ll need a comment.
> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:71 > +static inline int64_t uTextWithPriorContextGetNativeLength(UText* ut) > +{ > + return ut->a + ut->b; > +}
Nothing here seems to be specific to “a UText with prior context” so the function name seems peculiar. A shorter name would likely make the code easier to read. Further, including the word “Get” in this name does not make clear “this is the inlined one you want to call”. It’s just a nearly identical name to the other function and it seems likely we’d call the wrong one. Further, I don’t think that adding “with prior context” to the names of all the functions is necessary. It’s really hard to parse those long names. There must be a more concise way to put things. WebKit coding style is to use words rather than abbreviations. I suggest the word “text” rather than the letters “ut” as the name of the object pointer in all these functions.
> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:77 > +// Generic native length function, works with both Latin1 and UC text providers. > +static int64_t uTextWithPriorContextNativeLength(UText* ut) > +{ > + return uTextWithPriorContextGetNativeLength(ut); > +}
Here the long name is perhaps justified, since this function exists to be put in the method table below.
> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:80 > -static UText* uTextLatin1Clone(UText* destination, const UText* source, UBool deep, UErrorCode* status) > +// Generic clone function, works with both Latin1 and UC text providers. > +static UText * uTextWithPriorContextClone(UText* dest, const UText* src, UBool deep, UErrorCode* status)
Why the formatting change to add a space before the "*" here? That’s not normal for our project. Why the abbreviation of destination and source? In WebKit we prefer words to abbreviations.
> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:89 > + void* extraNew = dest->pExtra;
What does “extra new” mean? Unclear name.
> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:93 > + int sizeToCopy = src->sizeOfStruct; > + if (sizeToCopy > dest->sizeOfStruct) > + sizeToCopy = dest->sizeOfStruct;
Would read more clearly with std::min instead of an if statement.
> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:98 > + if (extraSize) > + memcpy(dest->pExtra, src->pExtra, extraSize);
The if here is unneeded and I suggest omitting it.
> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:103 > +static int32_t uTextWithPriorContextExtract(UText* ut, int64_t start, int64_t limit, UChar* dest, int32_t destCapacity, UErrorCode* pErrorCode)
The use of a p prefix to mean a pointer is not a WebKit standard style element, so should not be done. The name “ut” does not seem clear; a word would be better than a pile of letters and more consistent with WebKit coding style. Please don’t use the abbreviation dest; use the word destination instead.
> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:128 > + int32_t copied = 0;
It’s normally best to name local variables with noun phrases rather than fragments. The word “copied” sounds like it could be a boolean and it would be clearer to call it charactersCopied or something along those lines.
> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:129 > + while (copied < length) {
This loop does not seem to increment “copied”, so I’d expect it to be an infinite loop. How did you test?
> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:131 > + if (!ut->pFuncs->access(ut, start, TRUE)) > + ASSERT_NOT_REACHED();
Is continuing to run an ignoring the failure the most helpful behavior here for release builds?
> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:133 > + int32_t ucAvailable = ut->chunkLength - ut->chunkOffset; > + int32_t ucToCopy = ucAvailable < length ? ucAvailable : length;
The “uc” prefix on these local variable names is not the usual WebKit style. It’s also best to name local variables with noun phrases rather than fragments. “to copy” is not really so great.
> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:134 > + StringImpl::copyChars(&dest[copied], ut->chunkContents + ut->chunkOffset, static_cast<unsigned>(ucToCopy));
Why the static_cast here? Normally we can pass an int32_t to an unsigned without an explicit cast.
> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:146 > +// Generic map offset to native index function, works with both Latin1 and UC text providers.
The comments like this one before each function don’t say much. I’d omit them. It’s true that these functions are used twice, for both Latin-1 and Unicode, but they’re not “generic” in any additional interesting sense, nor is it helpful to repeat the function name in prose in a comment. Also, I am not comfortable with the abbreviation “uc” for “Unicode” and I’d prefer we keep the use of that to a minimum.
> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:157 > + ASSERT(nativeIndex - ut->chunkNativeStart < INT32_MAX);
What guarantees this is true? We can assert it only if we know it will be true.
> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:238 > + ASSERT(nativeIndex - ut->chunkNativeStart < INT32_MAX);
Same question as elsewhere. What guarantees this?
> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:250 > + ASSERT(nativeIndex - ut->chunkNativeStart < INT32_MAX);
Same question as elsewhere. What guarantees this?
> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:291 > +static const struct UTextFuncs uTextLatin1WithPriorContextFuncs = > +{
In WebKit coding style the brace would go on the previous line.
> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:347 > + ASSERT_UNUSED(forward, forward ? nativeIndex >= ut->b && nativeIndex < ut->a + ut->b : nativeIndex > ut->b && nativeIndex <= ut->a + ut->b);
We don’t normally do assertions with && in them. I suggest breaking this out into two assertions, one about the start and one about the end.
> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:350 > + ASSERT(ut->chunkNativeLimit - ut->chunkNativeStart < INT32_MAX);
What guarantees this? We can’t just assert because we hope it’s true.
> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:353 > + ASSERT(nativeIndex - ut->chunkNativeStart < INT32_MAX);
What guarantees this? We can’t just assert because we hope it’s true.
> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:372 > + ASSERT(nativeIndex < INT32_MAX);
What guarantees this? We can’t just assert because we hope it’s true.
> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:486 > + UText uTextUCWithPriorContextLocal = UTEXT_INITIALIZER;
This long name does not seem helpful. Why does the phrase “with prior context” need to appear in the variable name? Why not just call this textLocalStorage or something like that?
> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:488 > + UErrorCode uTextOpenStatus = U_ZERO_ERROR;
Why not just call this openStatus?
> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:489 > + UText* uTextUCWithPriorContext = UTextOpenUTF16(&uTextUCWithPriorContextLocal, string, length, priorContext, priorContextLength, &uTextOpenStatus);
Why not just call this text?
Glenn Adams
Comment 32
2013-03-20 11:02:08 PDT
(In reply to
comment #31
)
> (From update of
attachment 193924
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=193924&action=review
> > Thanks for tackling this. A lot of the new code here isn’t following WebKit coding style and it should.
Thanks for the quick review! I'll address these points and post a new patch ASAP.
Michael Saboff
Comment 33
2013-03-20 12:33:13 PDT
Comment on
attachment 193924
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=193924&action=review
> Source/WebCore/ChangeLog:9 > + in non-ASCII path through nextBreakablePossition(). This enables
typo - nextBreakablePosition()
> Source/WebCore/ChangeLog:68 > + (WebCore::uTextWithPriorContextGetNativeLength): > + (WebCore): > + (WebCore::uTextWithPriorContextNativeLength): > + (WebCore::uTextWithPriorContextClone): > + (WebCore::uTextWithPriorContextExtract): > + (WebCore::uTextWithPriorContextMapOffsetToNative): > + (WebCore::uTextWithPriorContextMapNativeIndexToUTF16): > + (WebCore::uTextWithPriorContextClose): > + (WebCore::uTextGetContext): > + (WebCore::uTextLatin1GetCurrentContext): > + (WebCore::uTextLatin1MoveInPrimaryContext): > + (WebCore::uTextLatin1SwitchToPrimaryContext): > + (WebCore::uTextLatin1MoveInPriorContext): > + (WebCore::uTextLatin1SwitchToPriorContext): > + (WebCore::uTextInChunkOrOutOfRange): > + (WebCore::uTextLatin1WithPriorContextAccess): > + (WebCore::uTextWithPriorContextInit): > + (WebCore::UTextOpenLatin1): > + (WebCore::uTextUTF16GetCurrentContext): > + (WebCore::uTextUTF16MoveInPrimaryContext): > + (WebCore::uTextUTF16SwitchToPrimaryContext): > + (WebCore::uTextUTF16MoveInPriorContext): > + (WebCore::uTextUTF16SwitchToPriorContext): > + (WebCore::uTextUTF16WithPriorContextAccess):
I don't think adding WithPriorContext to the various function names adds much. Some of these functions are shared between the Latin1 and UTF16 UText processing and then there are the two Access functions.
> Source/WebCore/platform/text/TextBreakIterator.h:92 > {
Shouldn't we ASSERT here that we have at least two characters of space in m_priorContext?
> Source/WebCore/platform/text/TextBreakIterator.h:98 > {
Shouldn't we ASSERT here that we have at least two characters of space in m_priorContext?
> Source/WebCore/platform/text/TextBreakIterator.h:104 > + {
Shouldn't we ASSERT here that we have at least two characters of context to reset, or iterate through from 0..TextBreakPriorContextCapacity-1?
> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:92 > + if (sizeToCopy > dest->sizeOfStruct)
When would src->sizeOfStruct != dest->sizeOfStruct?
> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:94 > + memcpy(dest, src, sizeToCopy);
I prefer the element by element initialization, since there are special cases to handle in the destination. For example, the addition of a new pointer in struct UText would end up being a problem with the memcpy.
> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:459 > + UTextWithBuffer uTextLatin1WithPriorContextLocal; > + uTextLatin1WithPriorContextLocal.ut = emptyUText; > + uTextLatin1WithPriorContextLocal.ut.extraSize = sizeof(uTextLatin1WithPriorContextLocal.buffer); > + uTextLatin1WithPriorContextLocal.ut.pExtra = uTextLatin1WithPriorContextLocal.buffer;
Same comment about adding WithPriorContext. I don't think it is needed.
Glenn Adams
Comment 34
2013-03-20 16:56:41 PDT
These comments apply to an update to the current patch I will be submitting presently. (In reply to
comment #31
)
> (From update of
attachment 193924
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=193924&action=review
> > Thanks for tackling this. A lot of the new code here isn’t following WebKit coding style and it should.
Fixed (see details below). Most of the style diffs were from using the ICU text provider code examples (and their style).
> > I’m concerned about the level of test coverage for this significant amount code. I am pretty sure I spotted an infinite loop in the code, not sure why it was not a problem in practice.
Apparently never called, but see further comments below.
> > > Source/WebCore/ChangeLog:28 > > + (WebCore): > > It’s best to remove bogus lines like this that the script adds to change log entries.
Fixed.
> > > Source/WebCore/platform/text/TextBreakIterator.h:57 > > + const int TextBreakPriorContextCapacity = 2; > > I think this constant would be better as a private member of LazyLineBreakIterator rather exposed at the top level as a public constant. It has nothing to do with the interface to the iterator.
Fixed.
> > > Source/WebCore/platform/text/TextBreakIterator.h:83 > > + UChar lastCharacter() const { return m_priorContext[1]; } > > Do we need an assertion about m_priorContextLength here?
Added assert against priorContextCapacity.
> > > Source/WebCore/platform/text/TextBreakIterator.h:85 > > + const UChar* getPriorContext(int& priorContextLength) > > This is used only within the class, so I suggest it be public. I’m also not sure it needs to be a function. Having this logic in the get function might be good.
Removed, incorporating logic into get().
> > > Source/WebCore/platform/text/TextBreakIterator.h:95 > > + m_priorContextLength = last ? (secondToLast ? 2 : 1) : 0; > > This seems a bit peculiar. We are storing a length, but it’s not additional data; it’s just an indicator of which characters are zero. m_priorContextLength should not be stored if it can be computed.
Replaced with private compute function.
> > > Source/WebCore/platform/text/TextBreakIterator.h:103 > > + void resetPriorContext() > > Renaming this to be more abstract isn’t all that useful. We still are storing just two characters. It’s not all that helpful to pretend to be more general in our naming.
The present storage of only two characters matches the current ASCII path usage and fixes the CJK cases; however, I anticipate having to expand prior context to store up to an entire grapheme cluster (or more) in order to properly handle Indic and certain SE Asian scripts. It may eventually require storage of a prior word in order to handle dictionary based word breaking in Thai (and similar writing systems).
> > > Source/WebCore/platform/text/TextBreakIterator.h:110 > > - TextBreakIterator* get() > > + TextBreakIterator* get(int& priorContextLength) > > This interface is confusing enough that I think we’ll need a comment.
Added comment.
> > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:71 > > +static inline int64_t uTextWithPriorContextGetNativeLength(UText* ut) > > +{ > > + return ut->a + ut->b; > > +} > > Nothing here seems to be specific to “a UText with prior context” so the function name seems peculiar. A shorter name would likely make the code easier to read. Further, including the word “Get” in this name does not make clear “this is the inlined one you want to call”. It’s just a nearly identical name to the other function and it seems likely we’d call the wrong one.
Change "uTextWithPriorContext" to "text" throughout. I originally made the length getter into two parts, an inline part and a non-inline part, because I couldn't recall if the existing compilers would expand an inline as non-inline if a function pointer were taken of it (as is the case here).
> > Further, I don’t think that adding “with prior context” to the names of all the functions is necessary. It’s really hard to parse those long names. There must be a more concise way to put things.
Fixed.
> WebKit coding style is to use words rather than abbreviations. I suggest the word “text” rather than the letters “ut” as the name of the object pointer in all these functions.
Changed to "text". The "ut" usage was based on the usage found in the ICU library implementation of text providers.
> > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:77 > > +// Generic native length function, works with both Latin1 and UC text providers. > > +static int64_t uTextWithPriorContextNativeLength(UText* ut) > > +{ > > + return uTextWithPriorContextGetNativeLength(ut); > > +} > > Here the long name is perhaps justified, since this function exists to be put in the method table below. > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:80 > > -static UText* uTextLatin1Clone(UText* destination, const UText* source, UBool deep, UErrorCode* status) > > +// Generic clone function, works with both Latin1 and UC text providers. > > +static UText * uTextWithPriorContextClone(UText* dest, const UText* src, UBool deep, UErrorCode* status) > > Why the formatting change to add a space before the "*" here? That’s not normal for our project.
Fixed. It was a typo. For some reason check-webkit-style didn't catch it.
> > Why the abbreviation of destination and source? In WebKit we prefer words to abbreviations.
Fixed. This usage was from the ICU implementation examples of text providers.
> > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:89 > > + void* extraNew = dest->pExtra; > > What does “extra new” mean? Unclear name.
This name corresponds to the ICU field UText.pExtra. See ICU utext.h for further information.
> > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:93 > > + int sizeToCopy = src->sizeOfStruct; > > + if (sizeToCopy > dest->sizeOfStruct) > > + sizeToCopy = dest->sizeOfStruct; > > Would read more clearly with std::min instead of an if statement.
Fixed.
> > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:98 > > + if (extraSize) > > + memcpy(dest->pExtra, src->pExtra, extraSize); > > The if here is unneeded and I suggest omitting it.
Removed.
> > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:103 > > +static int32_t uTextWithPriorContextExtract(UText* ut, int64_t start, int64_t limit, UChar* dest, int32_t destCapacity, UErrorCode* pErrorCode) > > The use of a p prefix to mean a pointer is not a WebKit standard style element, so should not be done.
Fixed. This came from ICU text provider usage.
> > The name “ut” does not seem clear; a word would be better than a pile of letters and more consistent with WebKit coding style.
Changed to "text". I had been following ICU text provider code example usage here.
> > Please don’t use the abbreviation dest; use the word destination instead.
Fixed. I had been following ICU text provider code example usage here.
> > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:128 > > + int32_t copied = 0; > > It’s normally best to name local variables with noun phrases rather than fragments. The word “copied” sounds like it could be a boolean and it would be clearer to call it charactersCopied or something along those lines.
I had originally called it charsCopied, but that seemed too verbose and a misnomer since it is not characters but character coding units (i.e., UTF-16 coding units) that are being copied. For example, a surrogate pair is two coding units but one character! Since I already use the prefix "uc" to refer to such unicode code (units), perhaps I'll change here to "ucCopied", which goes along with "ucToCopy". In any case, i've removed this code and this method now asserts or returns an error code if invoked. See more below.
> > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:129 > > + while (copied < length) { > > This loop does not seem to increment “copied”, so I’d expect it to be an infinite loop. How did you test?
Good catch. I had originally incremented in an earlier fragment but it got chopped out by mistake. It appears that utext_extract() isn't being called by ICU's rbbi (rule based break iterator) implementation even though it appears to be defined as a mandatory method in UTextFuncs. Perhaps the best way to handle this would be to replace the body of the method with ASSERT_NOT_REACHED() and return U_UNSUPPORTED_ERROR code for release build, then see if we hit this in any of the test bots? There does appear to be some internal ICU tests aimed at this method, but I haven't attempted to run them. WDYT?
> > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:131 > > + if (!ut->pFuncs->access(ut, start, TRUE)) > > + ASSERT_NOT_REACHED(); > > Is continuing to run an ignoring the failure the most helpful behavior here for release builds?
This method now always asserts or returns error code.
> > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:133 > > + int32_t ucAvailable = ut->chunkLength - ut->chunkOffset; > > + int32_t ucToCopy = ucAvailable < length ? ucAvailable : length; > > The “uc” prefix on these local variable names is not the usual WebKit style. It’s also best to name local variables with noun phrases rather than fragments. “to copy” is not really so great.
Code removed. This method now always asserts or returns error code.
> > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:134 > > + StringImpl::copyChars(&dest[copied], ut->chunkContents + ut->chunkOffset, static_cast<unsigned>(ucToCopy)); > > Why the static_cast here? Normally we can pass an int32_t to an unsigned without an explicit cast.
Code removed. This method now always asserts or returns error code.
> > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:146 > > +// Generic map offset to native index function, works with both Latin1 and UC text providers. > > The comments like this one before each function don’t say much. I’d omit them. It’s true that these functions are used twice, for both Latin-1 and Unicode, but they’re not “generic” in any additional interesting sense, nor is it helpful to repeat the function name in prose in a comment.
Removed.
> > Also, I am not comfortable with the abbreviation “uc” for “Unicode” and I’d prefer we keep the use of that to a minimum.
Removed.
> > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:157 > > + ASSERT(nativeIndex - ut->chunkNativeStart < INT32_MAX); > > What guarantees this is true? We can assert it only if we know it will be true.
Fixed. Use 0 if offset exceeds 32 bits. Note also that utext.h says: "Behavior is undefined if the native index does not fall within the current chunk." So returning zero will at least be a fixed, defined behavior.
> > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:238 > > + ASSERT(nativeIndex - ut->chunkNativeStart < INT32_MAX); > > Same question as elsewhere. What guarantees this?
Fixed. Use 0 if offset exceeds 32 bits.
> > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:250 > > + ASSERT(nativeIndex - ut->chunkNativeStart < INT32_MAX); > > Same question as elsewhere. What guarantees this?
Fixed. Use 0 if offset exceeds 32 bits.
> > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:291 > > +static const struct UTextFuncs uTextLatin1WithPriorContextFuncs = > > +{ > > In WebKit coding style the brace would go on the previous line.
Fixed.
> > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:347 > > + ASSERT_UNUSED(forward, forward ? nativeIndex >= ut->b && nativeIndex < ut->a + ut->b : nativeIndex > ut->b && nativeIndex <= ut->a + ut->b); > > We don’t normally do assertions with && in them. I suggest breaking this out into two assertions, one about the start and one about the end.
Fixed.
> > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:350 > > + ASSERT(ut->chunkNativeLimit - ut->chunkNativeStart < INT32_MAX); > > What guarantees this? We can’t just assert because we hope it’s true.
Fixed. Use 0 if offset exceeds 32 bits.
> > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:353 > > + ASSERT(nativeIndex - ut->chunkNativeStart < INT32_MAX); > > What guarantees this? We can’t just assert because we hope it’s true.
Fixed. Use 0 if offset exceeds 32 bits.
> > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:372 > > + ASSERT(nativeIndex < INT32_MAX); > > What guarantees this? We can’t just assert because we hope it’s true.
Fixed. Use 0 if offset exceeds 32 bits.
> > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:486 > > + UText uTextUCWithPriorContextLocal = UTEXT_INITIALIZER; > > This long name does not seem helpful. Why does the phrase “with prior context” need to appear in the variable name? Why not just call this textLocalStorage or something like that?
Fixed. Changed to "textLocal".
> > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:488 > > + UErrorCode uTextOpenStatus = U_ZERO_ERROR; > > Why not just call this openStatus?
Fixed. This was previously based on ICU text provider code usage.
> > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:489 > > + UText* uTextUCWithPriorContext = UTextOpenUTF16(&uTextUCWithPriorContextLocal, string, length, priorContext, priorContextLength, &uTextOpenStatus); > > Why not just call this text?
Fixed.
Glenn Adams
Comment 35
2013-03-20 17:31:45 PDT
Thanks for the quick review! The following comments apply to an update to the current patch I will be submitting presently. (In reply to
comment #33
)
> (From update of
attachment 193924
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=193924&action=review
> > > Source/WebCore/ChangeLog:9 > > + in non-ASCII path through nextBreakablePossition(). This enables > > typo - nextBreakablePosition()
Fixed.
> > > Source/WebCore/ChangeLog:68 > > + (WebCore::uTextWithPriorContextGetNativeLength): > > + (WebCore): > > + (WebCore::uTextWithPriorContextNativeLength): > > + (WebCore::uTextWithPriorContextClone): > > + (WebCore::uTextWithPriorContextExtract): > > + (WebCore::uTextWithPriorContextMapOffsetToNative): > > + (WebCore::uTextWithPriorContextMapNativeIndexToUTF16): > > + (WebCore::uTextWithPriorContextClose): > > + (WebCore::uTextGetContext): > > + (WebCore::uTextLatin1GetCurrentContext): > > + (WebCore::uTextLatin1MoveInPrimaryContext): > > + (WebCore::uTextLatin1SwitchToPrimaryContext): > > + (WebCore::uTextLatin1MoveInPriorContext): > > + (WebCore::uTextLatin1SwitchToPriorContext): > > + (WebCore::uTextInChunkOrOutOfRange): > > + (WebCore::uTextLatin1WithPriorContextAccess): > > + (WebCore::uTextWithPriorContextInit): > > + (WebCore::UTextOpenLatin1): > > + (WebCore::uTextUTF16GetCurrentContext): > > + (WebCore::uTextUTF16MoveInPrimaryContext): > > + (WebCore::uTextUTF16SwitchToPrimaryContext): > > + (WebCore::uTextUTF16MoveInPriorContext): > > + (WebCore::uTextUTF16SwitchToPriorContext): > > + (WebCore::uTextUTF16WithPriorContextAccess): > > I don't think adding WithPriorContext to the various function names adds much. Some of these functions are shared between the Latin1 and UTF16 UText processing and then there are the two Access functions.
Fixed: s/uTextWithPriorContext/text/ && s/uText/text/
> > > Source/WebCore/platform/text/TextBreakIterator.h:92 > > { > > Shouldn't we ASSERT here that we have at least two characters of space in m_priorContext?
Fixed.
> > > Source/WebCore/platform/text/TextBreakIterator.h:98 > > { > > Shouldn't we ASSERT here that we have at least two characters of space in m_priorContext?
Fixed.
> > > Source/WebCore/platform/text/TextBreakIterator.h:104 > > + { > > Shouldn't we ASSERT here that we have at least two characters of context to reset, or iterate through from 0..TextBreakPriorContextCapacity-1?
Fixed.
> > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:92 > > + if (sizeToCopy > dest->sizeOfStruct) > > When would src->sizeOfStruct != dest->sizeOfStruct?
I modeled this code on that found in ICU utext.cpp:shallowTextClone(), which hasn't changed between version 46 and the current version 50. I believe the intent of this code was to handle clones from a source to which fields had been added to a destination where some of those added fields were not present. I suspect it does not occur in our particular usage of text, but it doesn't hurt to keep IMO.
> > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:94 > > + memcpy(dest, src, sizeToCopy); > > I prefer the element by element initialization, since there are special cases to handle in the destination. For example, the addition of a new pointer in struct UText would end up being a problem with the memcpy.
Again, I pretty much used the ICU shallowTextClone() as a model here. In the prior implementation of uTextLatin1Clone, there were handling of fields that were specific to the Latin1 text provider usage. Here, I chose to use a generic approach as indicated in utext.cpp. In this (next) updated patch, I've added code to perform fixups on pointers in the destination text structure that originally pointed into either the source text structure's extra buffer or somewhere else in the source text structure.
> > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:459 > > + UTextWithBuffer uTextLatin1WithPriorContextLocal; > > + uTextLatin1WithPriorContextLocal.ut = emptyUText; > > + uTextLatin1WithPriorContextLocal.ut.extraSize = sizeof(uTextLatin1WithPriorContextLocal.buffer); > > + uTextLatin1WithPriorContextLocal.ut.pExtra = uTextLatin1WithPriorContextLocal.buffer; > > Same comment about adding WithPriorContext. I don't think it is needed.
Fixed.
Glenn Adams
Comment 36
2013-03-20 19:43:22 PDT
Created
attachment 194170
[details]
Patch Address comments from darin and msaboff.
WebKit Review Bot
Comment 37
2013-03-21 09:31:10 PDT
Comment on
attachment 194170
[details]
Patch
Attachment 194170
[details]
did not pass cr-linux-debug-ews (chromium-xvfb): Output:
http://webkit-commit-queue.appspot.com/results/17132595
Glenn Adams
Comment 38
2013-03-21 18:29:17 PDT
Created
attachment 194409
[details]
Patch Address non-ICU path; replace skipped expectations with rebaselines or requests for same.
Build Bot
Comment 39
2013-03-22 08:50:17 PDT
Comment on
attachment 194409
[details]
Patch
Attachment 194409
[details]
did not pass mac-ews (mac): Output:
http://webkit-commit-queue.appspot.com/results/17286170
New failing tests: fast/sub-pixel/table-rtl-padding.html
Build Bot
Comment 40
2013-03-22 08:50:21 PDT
Created
attachment 194558
[details]
Archive of layout-test-results from webkit-ews-02 The attached test failures were seen while running run-webkit-tests on the mac-ews. Bot: webkit-ews-02 Port: <class 'webkitpy.common.config.ports.MacPort'> Platform: Mac OS X 10.8.2
Glenn Adams
Comment 41
2013-03-26 19:02:19 PDT
Created
attachment 195207
[details]
Patch Address regression in fast/sub-pixel/table-rtl-padding; expect fails on mac/chromium for rebaselining.
WebKit Review Bot
Comment 42
2013-03-26 21:18:28 PDT
Comment on
attachment 195207
[details]
Patch
Attachment 195207
[details]
did not pass chromium-ews (chromium-xvfb): Output:
http://webkit-commit-queue.appspot.com/results/17237572
New failing tests: fast/text/international/003.html fast/text/international/text-combine-image-test.html fast/writing-mode/Kusa-Makura-background-canvas.html
WebKit Review Bot
Comment 43
2013-03-26 21:18:34 PDT
Created
attachment 195218
[details]
Archive of layout-test-results from gce-cr-linux-06 for chromium-linux-x86_64 The attached test failures were seen while running run-webkit-tests on the chromium-ews. Bot: gce-cr-linux-06 Port: chromium-linux-x86_64 Platform: Linux-3.3.8-gcg-201212281604-x86_64-with-GCEL-10.04-gcel_10.04
Build Bot
Comment 44
2013-03-26 22:12:08 PDT
Comment on
attachment 195207
[details]
Patch
Attachment 195207
[details]
did not pass mac-wk2-ews (mac-wk2): Output:
http://webkit-commit-queue.appspot.com/results/17286470
New failing tests: fast/text/international/003.html fast/text/international/text-combine-image-test.html fast/writing-mode/Kusa-Makura-background-canvas.html
Build Bot
Comment 45
2013-03-26 22:12:13 PDT
Created
attachment 195221
[details]
Archive of layout-test-results from webkit-ews-16 for mac-mountainlion-wk2 The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews. Bot: webkit-ews-16 Port: mac-mountainlion-wk2 Platform: Mac OS X 10.8.2
WebKit Review Bot
Comment 46
2013-03-26 22:20:40 PDT
Comment on
attachment 195207
[details]
Patch
Attachment 195207
[details]
did not pass chromium-ews (chromium-xvfb): Output:
http://webkit-commit-queue.appspot.com/results/17251671
New failing tests: fast/text/international/003.html fast/text/international/text-combine-image-test.html fast/writing-mode/Kusa-Makura-background-canvas.html
WebKit Review Bot
Comment 47
2013-03-26 22:20:47 PDT
Created
attachment 195222
[details]
Archive of layout-test-results from gce-cr-linux-03 for chromium-linux-x86_64 The attached test failures were seen while running run-webkit-tests on the chromium-ews. Bot: gce-cr-linux-03 Port: chromium-linux-x86_64 Platform: Linux-3.3.8-gcg-201212281604-x86_64-with-GCEL-10.04-gcel_10.04
Glenn Adams
Comment 48
2013-03-27 22:37:21 PDT
Comment on
attachment 195207
[details]
Patch marking as r- in order to terminate apparent endless retry loop on mac-ews test
Glenn Adams
Comment 49
2013-03-27 23:22:56 PDT
Created
attachment 195488
[details]
Patch
Darin Adler
Comment 50
2013-04-02 10:35:07 PDT
Comment on
attachment 195488
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=195488&action=review
Generally looks great. review- because of the problem in acquireLineBreakIterator.
> Source/WebCore/ChangeLog:37 > + (LazyLineBreakIterator):
Ideally, you should remove bogus lines like this that are added because of bugs in the prepare-ChangeLog script. Even better, you could get someone to fix that script, or do it yourself!
> Source/WebCore/platform/text/TextBreakIterator.h:82 > + unsigned getPriorContextLength() const
This function should be named priorContextLength in WebKit coding style. We reserve the word “get” for functions that return results in some other way.
> Source/WebCore/platform/text/TextBreakIterator.h:85 > + ASSERT(sizeof(m_priorContext) / sizeof(m_priorContext[0]) >= 2);
These should be a COMPILE_ASSERT rather than just an ASSERT. WTF has a macro for array length called WTF_ARRAY_LENGTH that should be used instead of writing out the math with two separate sizeofs. This should be asserting that it’s == 2, not just >= 2, since this function’s implementation assumes there are exactly 2 elements. Generally speaking, all the functions seem specific to a 2-element, array, so the assertions don’t seem quite right. For example, this one looks only at two characters, and others assume that the last character is at index 1 and the second to last at index 0. So all these functions are assuming 2 characters of context, not >= 2 or >= 1.
> Source/WebCore/platform/text/TextBreakIterator.h:149 > + // N.B. prior context is explicitly not reset here since it needs to be accumulated and retained > + // across multiple inline nodes, each of which cause this reset() to be invoked in order > + // to update the primary context string.
This comment makes it sound like we should consider renaming the reset() function for clarity about what it does and does not reset.
> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:87 > + const void* pointerSource = static_cast<const void*>(source);
It’s strange to first cast these to void* and then to char*. I suggest a reinterpret_cast directly to char*. There’s no problem comparing a void* to a char*, and the number of casts in this function might be cut down considerably.
> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:129 > + ASSERT(errorCode);
Seems a bit excessive to assert this. We don’t have to assert every pointer as non-null.
> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:130 > + ASSERT_NOT_REACHED();
Might be worth commenting why this is correct.
> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:145 > + return offset < INT32_MAX ? static_cast<int32_t>(offset) : 0;
I didn’t know that INT32_MAX existed, portably. Normally I’d use numeric_limits<int32_t>::max(). Why is it OK to return 0 if the offset is too big to fit into 32 bits? A comment should say why; it’s not obvious.
> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:187 > + text->chunkLength = length < INT32_MAX ? static_cast<int32_t>(length) : 0;
Same comments as above.
> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:209 > + text->chunkOffset = nativeIndex < INT32_MAX ? static_cast<int32_t>(nativeIndex) : 0;
Same thing again.
> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:314 > + UText* text = utext_setup(&utWithBuffer->text, sizeof(utWithBuffer->buffer), status); > + if (U_SUCCESS(*status)) > + textInit(text, &textLatin1Funcs, string, length, priorContext, priorContextLength); > + return text;
Is utext_setup guaranteed to return 0 if the status does not indicate success? WebKit style is normally early return for errors, so we’d write: if (U_FAILURE(*status)) { ASSERT(!text); return 0; } And continue with the call to textInit, rather than putting the normal case inside an if statement.
> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:419 > + if (U_SUCCESS(*status)) > + textInit(text, &textUTF16Funcs, string, length, priorContext, priorContextLength); > + return text;
Same comment as above.
> Source/WebCore/platform/text/wchar/TextBreakIteratorWchar.cpp:263 > + Vector<UChar> utf16string(priorContextLength + length);
This isn’t a good name for the local variable. There’s an argument called string that is also UTF-16, so calling this utf16string does nothing to make clear why it’s different. Also, the capitalization does not match WebKit coding style, which requires a capital S. I would call this stringWithPrependedContext or some other sort name that makes clear what the difference is.
> Source/WebCore/platform/text/wchar/TextBreakIteratorWchar.cpp:268 > + return acquireLineBreakIterator(utf16string.data(), utf16string.size(), locale, 0, 0);
Having the function call itself like this doesn’t seem optimal to me. Is it OK to have a line break iterator that outlives the data it’s pointed to? When the local vector, utf16string, is deleted, why will the line break iterator still work properly? I think this is a major problem!
Darin Adler
Comment 51
2013-04-02 10:36:05 PDT
Comment on
attachment 195488
[details]
Patch I think it would be a good idea to break off the renaming and reformatting from this patch in TextBreakIterator.h and land it first, to make the real patch a bit smaller. Not required, but worthwhile.
Glenn Adams
Comment 52
2013-04-02 13:32:33 PDT
(In reply to
comment #50
)
> (From update of
attachment 195488
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=195488&action=review
> > Source/WebCore/platform/text/wchar/TextBreakIteratorWchar.cpp:263 > > + Vector<UChar> utf16string(priorContextLength + length); > > This isn’t a good name for the local variable. There’s an argument called string that is also UTF-16, so calling this utf16string does nothing to make clear why it’s different. Also, the capitalization does not match WebKit coding style, which requires a capital S. I would call this stringWithPrependedContext or some other sort name that makes clear what the difference is.
I copied this usage from existing code in platform/text/TextBreakIterator.cpp: TextBreakIterator* acquireLineBreakIterator(const LChar* string, int length, const AtomicString& locale) { Vector<UChar> utf16string(length); for (int i = 0; i < length; ++i) utf16string[i] = string[i]; return acquireLineBreakIterator(utf16string.data(), length, locale); }
> > > Source/WebCore/platform/text/wchar/TextBreakIteratorWchar.cpp:268 > > + return acquireLineBreakIterator(utf16string.data(), utf16string.size(), locale, 0, 0); > > Having the function call itself like this doesn’t seem optimal to me. > > Is it OK to have a line break iterator that outlives the data it’s pointed to? When the local vector, utf16string, is deleted, why will the line break iterator still work properly? I think this is a major problem!
Interestingly, that is also exactly what was done by the existing code I cite above. I wonder if it is essential that I attempt to fix up this !USE(ICU_UNICODE) code path, particularly since I don't have such a platform to build/test on. WinCE perhaps? Would it be better to simply not attempt to fix up this code and to let those who depend on this code path to effect a fix? Or I could simply add the new parameters (so that a build succeeds) and add an ASSERT if priorContextLength is non-zero thus forcing a debug time testing failure? What would you suggest?
Glenn Adams
Comment 53
2013-04-02 15:44:15 PDT
(In reply to
comment #51
)
> (From update of
attachment 195488
[details]
) > I think it would be a good idea to break off the renaming and reformatting from this patch in TextBreakIterator.h and land it first, to make the real patch a bit smaller. Not required, but worthwhile.
Done:
https://bugs.webkit.org/show_bug.cgi?id=113823
Also, I separated out the new performance test:
https://bugs.webkit.org/show_bug.cgi?id=113811
Glenn Adams
Comment 54
2013-04-02 20:17:38 PDT
Created
attachment 196271
[details]
Patch
EFL EWS Bot
Comment 55
2013-04-02 20:22:44 PDT
Comment on
attachment 196271
[details]
Patch
Attachment 196271
[details]
did not pass efl-ews (efl): Output:
http://webkit-commit-queue.appspot.com/results/17388178
WebKit Review Bot
Comment 56
2013-04-02 20:30:53 PDT
Comment on
attachment 196271
[details]
Patch
Attachment 196271
[details]
did not pass chromium-ews (chromium-xvfb): Output:
http://webkit-commit-queue.appspot.com/results/17323758
WebKit Review Bot
Comment 57
2013-04-02 20:32:28 PDT
Comment on
attachment 196271
[details]
Patch
Attachment 196271
[details]
did not pass cr-linux-debug-ews (chromium-xvfb): Output:
http://webkit-commit-queue.appspot.com/results/17246817
Glenn Adams
Comment 58
2013-04-02 20:36:52 PDT
Created
attachment 196273
[details]
Patch
Glenn Adams
Comment 59
2013-04-02 22:00:40 PDT
Created
attachment 196276
[details]
Patch
Glenn Adams
Comment 60
2013-04-02 22:56:19 PDT
Comment on
attachment 196276
[details]
Patch Addressed
comment #50
issues.
Glenn Adams
Comment 61
2013-04-02 23:05:12 PDT
(In reply to
comment #50
)
> (From update of
attachment 195488
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=195488&action=review
Addressed in
https://bugs.webkit.org/attachment.cgi?id=196276
> > Source/WebCore/ChangeLog:37 > > + (LazyLineBreakIterator): > > Ideally, you should remove bogus lines like this that are added because of bugs in the prepare-ChangeLog script.
Fixed.
> > Source/WebCore/platform/text/TextBreakIterator.h:82 > > + unsigned getPriorContextLength() const > This function should be named priorContextLength in WebKit coding style. We reserve the word “get” for functions that return results in some other way.
Fixed in
https://bugs.webkit.org/show_bug.cgi?id=113823
.
> > > Source/WebCore/platform/text/TextBreakIterator.h:85 > > + ASSERT(sizeof(m_priorContext) / sizeof(m_priorContext[0]) >= 2); > > These should be a COMPILE_ASSERT rather than just an ASSERT. > > WTF has a macro for array length called WTF_ARRAY_LENGTH that should be used instead of writing out the math with two separate sizeofs. > > This should be asserting that it’s == 2, not just >= 2, since this function’s implementation assumes there are exactly 2 elements. Generally speaking, all the functions seem specific to a 2-element, array, so the assertions don’t seem quite right. For example, this one looks only at two characters, and others assume that the last character is at index 1 and the second to last at index 0. So all these functions are assuming 2 characters of context, not >= 2 or >= 1.
Fixed in
https://bugs.webkit.org/show_bug.cgi?id=113823
.
> > > Source/WebCore/platform/text/TextBreakIterator.h:149 > > + // N.B. prior context is explicitly not reset here since it needs to be accumulated and retained > > + // across multiple inline nodes, each of which cause this reset() to be invoked in order > > + // to update the primary context string. > > This comment makes it sound like we should consider renaming the reset() function for clarity about what it does and does not reset.
Fixed in
https://bugs.webkit.org/show_bug.cgi?id=113823
.
> > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:87 > > + const void* pointerSource = static_cast<const void*>(source); > > It’s strange to first cast these to void* and then to char*. I suggest a reinterpret_cast directly to char*. There’s no problem comparing a void* to a char*, and the number of casts in this function might be cut down considerably.
Fixed.
> > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:129 > > + ASSERT(errorCode); > > Seems a bit excessive to assert this. We don’t have to assert every pointer as non-null.
Fixed.
> > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:130 > > + ASSERT_NOT_REACHED(); > > Might be worth commenting why this is correct.
Added comment.
> > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:145 > > + return offset < INT32_MAX ? static_cast<int32_t>(offset) : 0; > > I didn’t know that INT32_MAX existed, portably. Normally I’d use numeric_limits<int32_t>::max(). > > Why is it OK to return 0 if the offset is too big to fit into 32 bits? A comment should say why; it’s not obvious.
Fixed. Use max() and added comments.
> > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:187 > > + text->chunkLength = length < INT32_MAX ? static_cast<int32_t>(length) : 0; > > Same comments as above. >
Fixed. Use max() and added comments.
> > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:209 > > + text->chunkOffset = nativeIndex < INT32_MAX ? static_cast<int32_t>(nativeIndex) : 0; > > Same thing again. >
Fixed. Use max() and added comments.
> > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:314 > > + UText* text = utext_setup(&utWithBuffer->text, sizeof(utWithBuffer->buffer), status); > > + if (U_SUCCESS(*status)) > > + textInit(text, &textLatin1Funcs, string, length, priorContext, priorContextLength); > > + return text; > > Is utext_setup guaranteed to return 0 if the status does not indicate success? WebKit style is normally early return for errors, so we’d write: > > if (U_FAILURE(*status)) { > ASSERT(!text); > return 0; > } > > And continue with the call to textInit, rather than putting the normal case inside an if statement. >
Fixed.
> > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:419 > > + if (U_SUCCESS(*status)) > > + textInit(text, &textUTF16Funcs, string, length, priorContext, priorContextLength); > > + return text; > > Same comment as above. >
Fixed.
> > Source/WebCore/platform/text/wchar/TextBreakIteratorWchar.cpp:263 > > + Vector<UChar> utf16string(priorContextLength + length); > > This isn’t a good name for the local variable. There’s an argument called string that is also UTF-16, so calling this utf16string does nothing to make clear why it’s different. Also, the capitalization does not match WebKit coding style, which requires a capital S. I would call this stringWithPrependedContext or some other sort name that makes clear what the difference is.
> Removed code. See also
comment #52
.
> > Source/WebCore/platform/text/wchar/TextBreakIteratorWchar.cpp:268 > > + return acquireLineBreakIterator(utf16string.data(), utf16string.size(), locale, 0, 0); > > Having the function call itself like this doesn’t seem optimal to me.
No longer applicable, but FYI, modern compilers convert this to iteration by tail call optimization. [It's a programming paradigm I often used in Lisp back in the "good ole days" :)]
> > Is it OK to have a line break iterator that outlives the data it’s pointed to? When the local vector, utf16string, is deleted, why will the line break iterator still work properly? I think this is a major problem!
Replaced code with asserts since I really can't build/test this anyway. See also
comment #52
. If you suggest another approach, let me know.
Glenn Adams
Comment 62
2013-04-03 02:28:10 PDT
(In reply to
comment #61
)
> (In reply to
comment #50
) > > > Source/WebCore/platform/text/TextBreakIterator.h:82 > > > + unsigned getPriorContextLength() const > > This function should be named priorContextLength in WebKit coding style. We reserve the word “get” for functions that return results in some other way. > > Fixed in
https://bugs.webkit.org/show_bug.cgi?id=113823
.
Correction. Fixed (in current patch to this bug).
WebKit Review Bot
Comment 63
2013-04-03 14:52:18 PDT
Comment on
attachment 196276
[details]
Patch Clearing flags on attachment: 196276 Committed
r147588
: <
http://trac.webkit.org/changeset/147588
>
WebKit Review Bot
Comment 64
2013-04-03 14:52:28 PDT
All reviewed patches have been landed. Closing bug.
Benjamin Poulain
Comment 65
2013-04-03 16:03:23 PDT
This caused 3 failures on the bots. Can you have a look? (rebaseline needed?)
Glenn Adams
Comment 66
2013-04-03 16:06:07 PDT
(In reply to
comment #65
)
> This caused 3 failures on the bots. Can you have a look? (rebaseline needed?)
Those were expected as I marked in the ChangeLog: Expect rebaselines will be needed due to different line breaks: fast/text/international/003.html fast/text/international/text-combine-image-test-expected.html fast/text/writing-mode/Kusa-Makura-background-canvas.html I'm waiting for bots to catch up so I can rebaseline with garden-o-magic. This will be my first attempt to use it.
Simon Fraser (smfr)
Comment 67
2013-04-03 22:25:04 PDT
Are these diffs expected?
http://build.webkit.org/results/Apple%20MountainLion%20Release%20WK1%20(Tests)/r147588%20(8648)/retries/fast/text/international/003-diffs.html
The new results look rather different.
Glenn Adams
Comment 68
2013-04-04 00:45:23 PDT
(In reply to
comment #67
)
> Are these diffs expected? >
http://build.webkit.org/results/Apple%20MountainLion%20Release%20WK1%20(Tests)/r147588%20(8648)/retries/fast/text/international/003-diffs.html
> > The new results look rather different.
Yes, they were. WIth this patch, there are new line break opportunities that were not previously considered, particularly at element boundaries. The test file you cite and two others have been rebaselined to accommodate the different line breaks. I just committed the third of three rebaseline sets: [1]
http://trac.webkit.org/changeset/147599
[Chromium] [2]
http://trac.webkit.org/changeset/147601
[GTK] [3]
http://trac.webkit.org/changeset/147613
[Mac] if I missed any, please let me know or go ahead with a rebaseline.
mitz
Comment 69
2013-04-16 21:26:10 PDT
(In reply to
comment #63
)
> (From update of
attachment 196276
[details]
) > Clearing flags on attachment: 196276 > > Committed
r147588
: <
http://trac.webkit.org/changeset/147588
>
This change caused
bug 114721
.
Mark Salisbury
Comment 70
2013-06-06 14:27:17 PDT
(In reply to
comment #52
)
> I wonder if it is essential that I attempt to fix up this !USE(ICU_UNICODE) code path, particularly since I don't have such a platform to build/test on. WinCE perhaps? Would it be better to simply not attempt to fix up this code and to let those who depend on this code path to effect a fix?
I'm planning to submit a patch for this. I've opened
https://bugs.webkit.org/show_bug.cgi?id=117320
.
Eric Seidel (no email)
Comment 71
2013-06-24 17:17:07 PDT
FYI, Blink also took this patch and found it to be a 4% regression in PerformanceTests/Layout/hindi-line-layout.html
https://code.google.com/p/chromium/issues/detail?id=252050
We're going to keep the patch, but it's also sad to make are already slow complex text path slower. :(
http://stackoverflow.com/questions/16554205/google-chrome-text-rendering-for-non-latin-languages-is-too-slow
Antoine Quint
Comment 72
2013-10-23 06:38:17 PDT
(In reply to
comment #63
)
> (From update of
attachment 196276
[details]
) > Clearing flags on attachment: 196276 > > Committed
r147588
: <
http://trac.webkit.org/changeset/147588
>
This change apparently caused a regression, see
https://bugs.webkit.org/show_bug.cgi?id=123204
.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug