105692 – Element boundaries prevent Japanese line break opportunities

Koji Ishii

Reported 2012-12-23 09:33:49 PST

Created attachment 180625 [details] test case Line break opportunities before and after <span> or any inline elements are prohibited. EXPECTED: <span> or any inline elements should not prevent line break opportunities.

mitz

Comment 1 2012-12-31 19:09:56 PST

Looks like a duplicate of bug 17427.

Koji Ishii

Comment 2 2013-01-13 23:05:45 PST

Created attachment 182512 [details] test case result

Koji Ishii

Comment 3 2013-01-13 23:06:07 PST

Created attachment 182513 [details] test case expected

Lu jing

Comment 4 2013-01-22 05:13:03 PST

I work for fix this bug, could you assign it to me?

Glenn Adams

Comment 5 2013-01-22 18:13:44 PST

(In reply to comment #4) > I work for fix this bug, could you assign it to me? Hi Lu Jing, I have already assigned it to myself and started work on this bug, which I believe may be dependent on bug 89235. If you would like to help, please email me directly to discuss. Thanks, Glenn

Lu jing

Comment 6 2013-01-23 16:32:02 PST

(In reply to comment #5) > (In reply to comment #4) > > I work for fix this bug, could you assign it to me? > > Hi Lu Jing, I have already assigned it to myself and started work on this bug, which I believe may be dependent on bug 89235. If you would like to help, please email me directly to discuss. Thanks, Glenn Got it. Thanks.

Koji Ishii

Comment 7 2013-02-17 06:43:35 PST

I don't think this bug depends on bug 89235. 89325 is about changing iterator while this bug is about how to call iterators. Let's not try to create unnecessary dependencies as it will slow things down. Lu, can you please keep working on this if you can?

Glenn Adams

Comment 8 2013-02-17 08:48:45 PST

(In reply to comment #7) > I don't think this bug depends on bug 89235. 89325 is about changing iterator while this bug is about how to call iterators. Let's not try to create unnecessary dependencies as it will slow things down. agreed; however, it either depends on or is a duplicate of 17427 as pointed out by mitz in comment #1

Lu jing

Comment 9 2013-02-17 11:33:31 PST

(In reply to comment #7) > I don't think this bug depends on bug 89235. 89325 is about changing iterator while this bug is about how to call iterators. Let's not try to create unnecessary dependencies as it will slow things down. > > Lu, can you please keep working on this if you can? Ok, I've read all about 17427. I'm not sure I can fix but I will keep working on this.

Koji Ishii

Comment 10 2013-02-21 06:32:36 PST

If 17427 can fix this too, since nobody seems to be working on 17427, you can work on 17427 and then resolve this as dup, if you want to.

Glenn Adams

Comment 11 2013-02-21 06:59:10 PST

(In reply to comment #10) > If 17427 can fix this too, since nobody seems to be working on 17427, you can work on 17427 and then resolve this as dup, if you want to. that's my plan

Glenn Adams

Comment 12 2013-02-25 09:28:53 PST

verified that patch https://bugs.webkit.org/attachment.cgi?id=190055 on bug 17427 fixes the problem shown by the attached test case

Glenn Adams

Comment 13 2013-03-11 19:14:41 PDT

Created attachment 192625 [details] Patch

WebKit Review Bot

Comment 14 2013-03-11 22:14:59 PDT

Comment on attachment 192625 [details] Patch Attachment 192625 [details] did not pass chromium-ews (chromium-xvfb): Output: http://webkit-commit-queue.appspot.com/results/17013978 New failing tests: http/tests/canvas/philip/tests/security.pattern.canvas.strokeStyle.html css1/cascade/important.html css1/box_properties/acid_test.html http/tests/canvas/philip/tests/security.pattern.cross.html css1/box_properties/border_bottom.html css1/basic/containment.html http/tests/canvas/philip/tests/security.pattern.image.strokeStyle.html accessibility/radio-button-title-label.html css1/box_properties/border_bottom_width_inline.html http/tests/canvas/philip/tests/security.reset.html http/tests/canvas/philip/tests/security.pattern.canvas.timing.html css1/classification/list_style_position.html accessibility/label-for-control-hittest.html canvas/philip/tests/2d.drawImage.negativeSourceHeight2.html css1/basic/contextual_selectors.html canvas/philip/tests/2d.composite.operation.foobar.html css1/classification/display.html css1/color_and_background/background.html css1/classification/list_style.html http/tests/canvas/philip/tests/security.drawImage.image.html css1/color_and_background/background_color.html http/tests/canvas/philip/tests/security.pattern.image.fillStyle.html css1/box_properties/border_bottom_inline.html http/tests/canvas/philip/tests/security.drawImage.canvas.html css1/color_and_background/background_attachment.html http/tests/canvas/philip/tests/security.pattern.canvas.fillStyle.html http/tests/canvas/philip/tests/security.pattern.create.html canvas/philip/tests/2d.drawImage.negativeSourceHeightAndWidth.html http/tests/cache/subresource-failover-to-network.html canvas/philip/tests/2d.drawImage.negativeSourceHeight.html

Glenn Adams

Comment 15 2013-03-11 23:58:32 PDT

Comment on attachment 192625 [details] Patch causes functional regression

Glenn Adams

Comment 16 2013-03-15 20:24:06 PDT

Created attachment 193419 [details] Patch

Glenn Adams

Comment 17 2013-03-15 20:33:47 PDT

Comment on attachment 193419 [details] Patch Support use of prior context in non-ASCII path without burden of memory allocation. This required changing to a custom ICU text provider for UChar strings and will also require changes to the current custom text provider for LChar strings. This patch is not yet ready for review: (1) uTextLatin1* (existing LChar text provider) need to be updated to use prior context; (2) need to handle surrogate pairs in prior context as well as at boundary of prior and primary contexts; (3) need functional and performance regression testing; This patch does, however, fix the reported test reduction, as also indicated by the new LayoutTest included in this patch.

Glenn Adams

Comment 18 2013-03-15 21:10:34 PDT

(In reply to comment #17) > (From update of attachment 193419 [details]) > Support use of prior context in non-ASCII path without burden of memory allocation. This required changing to a custom ICU text provider for UChar strings and will also require changes to the current custom text provider for LChar strings. This patch is not yet ready for review: > > (1) uTextLatin1* (existing LChar text provider) need to be updated to use prior context; > (2) need to handle surrogate pairs in prior context as well as at boundary of prior and primary contexts; > (3) need functional and performance regression testing; > > This patch does, however, fix the reported test reduction, as also indicated by the new LayoutTest included in this patch. Adding msaboff since he implemented original uTextLatin1* text provider. I expect to ask him to take a quick look at this work once it gets to an r? stage.

Glenn Adams

Comment 19 2013-03-18 22:12:42 PDT

Created attachment 193728 [details] Patch

Early Warning System Bot

Comment 20 2013-03-18 22:18:46 PDT

Comment on attachment 193728 [details] Patch Attachment 193728 [details] did not pass qt-ews (qt): Output: http://webkit-commit-queue.appspot.com/results/17160598

Early Warning System Bot

Comment 21 2013-03-18 22:20:37 PDT

Comment on attachment 193728 [details] Patch Attachment 193728 [details] did not pass qt-wk2-ews (qt): Output: http://webkit-commit-queue.appspot.com/results/17234336

WebKit Review Bot

Comment 22 2013-03-18 22:30:40 PDT

Comment on attachment 193728 [details] Patch Attachment 193728 [details] did not pass chromium-ews (chromium-xvfb): Output: http://webkit-commit-queue.appspot.com/results/17176607

WebKit Review Bot

Comment 23 2013-03-18 22:31:46 PDT

Comment on attachment 193728 [details] Patch Attachment 193728 [details] did not pass cr-linux-debug-ews (chromium-xvfb): Output: http://webkit-commit-queue.appspot.com/results/17211445

Glenn Adams

Comment 24 2013-03-18 22:37:49 PDT

Created attachment 193732 [details] Patch

Build Bot

Comment 25 2013-03-19 00:51:36 PDT

Comment on attachment 193732 [details] Patch Attachment 193732 [details] did not pass mac-ews (mac): Output: http://webkit-commit-queue.appspot.com/results/17183548

Build Bot

Comment 26 2013-03-19 00:59:25 PDT

Comment on attachment 193732 [details] Patch Attachment 193732 [details] did not pass mac-wk2-ews (mac-wk2): Output: http://webkit-commit-queue.appspot.com/results/17074957

Glenn Adams

Comment 27 2013-03-19 14:38:57 PDT

Created attachment 193924 [details] Patch

Glenn Adams

Comment 28 2013-03-19 14:44:55 PDT

Created attachment 193927 [details] Line layout performance results (en, jp) for attachment 193924 [details]. Approximately 4% improvement for Japanese. Slight improvement for English.

Glenn Adams

Comment 29 2013-03-19 14:47:52 PDT

Created attachment 193929 [details] Parser (html5-full-render) performance results for attachment 193924 [details]. No measurable regression. Performed on MBP Retina MacOSX 10.8.3.

Glenn Adams

Comment 30 2013-03-19 15:22:08 PDT

(In reply to comment #29) > Created an attachment (id=193929) [details] > Parser (html5-full-render) performance results for attachment 193924 [details]. > > No measurable regression. Performed on MBP Retina MacOSX 10.8.3. I should have said no clearly significant regression within the indicated variance.

Darin Adler

Comment 31 2013-03-20 10:46:43 PDT

Comment on attachment 193924 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=193924&action=review Thanks for tackling this. A lot of the new code here isn’t following WebKit coding style and it should. I’m concerned about the level of test coverage for this significant amount code. I am pretty sure I spotted an infinite loop in the code, not sure why it was not a problem in practice. > Source/WebCore/ChangeLog:28 > + (WebCore): It’s best to remove bogus lines like this that the script adds to change log entries. > Source/WebCore/platform/text/TextBreakIterator.h:57 > + const int TextBreakPriorContextCapacity = 2; I think this constant would be better as a private member of LazyLineBreakIterator rather exposed at the top level as a public constant. It has nothing to do with the interface to the iterator. > Source/WebCore/platform/text/TextBreakIterator.h:83 > + UChar lastCharacter() const { return m_priorContext[1]; } Do we need an assertion about m_priorContextLength here? > Source/WebCore/platform/text/TextBreakIterator.h:85 > + const UChar* getPriorContext(int& priorContextLength) This is used only within the class, so I suggest it be public. I’m also not sure it needs to be a function. Having this logic in the get function might be good. > Source/WebCore/platform/text/TextBreakIterator.h:95 > + m_priorContextLength = last ? (secondToLast ? 2 : 1) : 0; This seems a bit peculiar. We are storing a length, but it’s not additional data; it’s just an indicator of which characters are zero. m_priorContextLength should not be stored if it can be computed. > Source/WebCore/platform/text/TextBreakIterator.h:103 > + void resetPriorContext() Renaming this to be more abstract isn’t all that useful. We still are storing just two characters. It’s not all that helpful to pretend to be more general in our naming. > Source/WebCore/platform/text/TextBreakIterator.h:110 > - TextBreakIterator* get() > + TextBreakIterator* get(int& priorContextLength) This interface is confusing enough that I think we’ll need a comment. > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:71 > +static inline int64_t uTextWithPriorContextGetNativeLength(UText* ut) > +{ > + return ut->a + ut->b; > +} Nothing here seems to be specific to “a UText with prior context” so the function name seems peculiar. A shorter name would likely make the code easier to read. Further, including the word “Get” in this name does not make clear “this is the inlined one you want to call”. It’s just a nearly identical name to the other function and it seems likely we’d call the wrong one. Further, I don’t think that adding “with prior context” to the names of all the functions is necessary. It’s really hard to parse those long names. There must be a more concise way to put things. WebKit coding style is to use words rather than abbreviations. I suggest the word “text” rather than the letters “ut” as the name of the object pointer in all these functions. > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:77 > +// Generic native length function, works with both Latin1 and UC text providers. > +static int64_t uTextWithPriorContextNativeLength(UText* ut) > +{ > + return uTextWithPriorContextGetNativeLength(ut); > +} Here the long name is perhaps justified, since this function exists to be put in the method table below. > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:80 > -static UText* uTextLatin1Clone(UText* destination, const UText* source, UBool deep, UErrorCode* status) > +// Generic clone function, works with both Latin1 and UC text providers. > +static UText * uTextWithPriorContextClone(UText* dest, const UText* src, UBool deep, UErrorCode* status) Why the formatting change to add a space before the "*" here? That’s not normal for our project. Why the abbreviation of destination and source? In WebKit we prefer words to abbreviations. > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:89 > + void* extraNew = dest->pExtra; What does “extra new” mean? Unclear name. > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:93 > + int sizeToCopy = src->sizeOfStruct; > + if (sizeToCopy > dest->sizeOfStruct) > + sizeToCopy = dest->sizeOfStruct; Would read more clearly with std::min instead of an if statement. > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:98 > + if (extraSize) > + memcpy(dest->pExtra, src->pExtra, extraSize); The if here is unneeded and I suggest omitting it. > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:103 > +static int32_t uTextWithPriorContextExtract(UText* ut, int64_t start, int64_t limit, UChar* dest, int32_t destCapacity, UErrorCode* pErrorCode) The use of a p prefix to mean a pointer is not a WebKit standard style element, so should not be done. The name “ut” does not seem clear; a word would be better than a pile of letters and more consistent with WebKit coding style. Please don’t use the abbreviation dest; use the word destination instead. > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:128 > + int32_t copied = 0; It’s normally best to name local variables with noun phrases rather than fragments. The word “copied” sounds like it could be a boolean and it would be clearer to call it charactersCopied or something along those lines. > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:129 > + while (copied < length) { This loop does not seem to increment “copied”, so I’d expect it to be an infinite loop. How did you test? > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:131 > + if (!ut->pFuncs->access(ut, start, TRUE)) > + ASSERT_NOT_REACHED(); Is continuing to run an ignoring the failure the most helpful behavior here for release builds? > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:133 > + int32_t ucAvailable = ut->chunkLength - ut->chunkOffset; > + int32_t ucToCopy = ucAvailable < length ? ucAvailable : length; The “uc” prefix on these local variable names is not the usual WebKit style. It’s also best to name local variables with noun phrases rather than fragments. “to copy” is not really so great. > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:134 > + StringImpl::copyChars(&dest[copied], ut->chunkContents + ut->chunkOffset, static_cast<unsigned>(ucToCopy)); Why the static_cast here? Normally we can pass an int32_t to an unsigned without an explicit cast. > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:146 > +// Generic map offset to native index function, works with both Latin1 and UC text providers. The comments like this one before each function don’t say much. I’d omit them. It’s true that these functions are used twice, for both Latin-1 and Unicode, but they’re not “generic” in any additional interesting sense, nor is it helpful to repeat the function name in prose in a comment. Also, I am not comfortable with the abbreviation “uc” for “Unicode” and I’d prefer we keep the use of that to a minimum. > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:157 > + ASSERT(nativeIndex - ut->chunkNativeStart < INT32_MAX); What guarantees this is true? We can assert it only if we know it will be true. > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:238 > + ASSERT(nativeIndex - ut->chunkNativeStart < INT32_MAX); Same question as elsewhere. What guarantees this? > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:250 > + ASSERT(nativeIndex - ut->chunkNativeStart < INT32_MAX); Same question as elsewhere. What guarantees this? > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:291 > +static const struct UTextFuncs uTextLatin1WithPriorContextFuncs = > +{ In WebKit coding style the brace would go on the previous line. > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:347 > + ASSERT_UNUSED(forward, forward ? nativeIndex >= ut->b && nativeIndex < ut->a + ut->b : nativeIndex > ut->b && nativeIndex <= ut->a + ut->b); We don’t normally do assertions with && in them. I suggest breaking this out into two assertions, one about the start and one about the end. > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:350 > + ASSERT(ut->chunkNativeLimit - ut->chunkNativeStart < INT32_MAX); What guarantees this? We can’t just assert because we hope it’s true. > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:353 > + ASSERT(nativeIndex - ut->chunkNativeStart < INT32_MAX); What guarantees this? We can’t just assert because we hope it’s true. > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:372 > + ASSERT(nativeIndex < INT32_MAX); What guarantees this? We can’t just assert because we hope it’s true. > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:486 > + UText uTextUCWithPriorContextLocal = UTEXT_INITIALIZER; This long name does not seem helpful. Why does the phrase “with prior context” need to appear in the variable name? Why not just call this textLocalStorage or something like that? > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:488 > + UErrorCode uTextOpenStatus = U_ZERO_ERROR; Why not just call this openStatus? > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:489 > + UText* uTextUCWithPriorContext = UTextOpenUTF16(&uTextUCWithPriorContextLocal, string, length, priorContext, priorContextLength, &uTextOpenStatus); Why not just call this text?

Glenn Adams

Comment 32 2013-03-20 11:02:08 PDT

(In reply to comment #31) > (From update of attachment 193924 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=193924&action=review > > Thanks for tackling this. A lot of the new code here isn’t following WebKit coding style and it should. Thanks for the quick review! I'll address these points and post a new patch ASAP.

Michael Saboff

Comment 33 2013-03-20 12:33:13 PDT

Comment on attachment 193924 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=193924&action=review > Source/WebCore/ChangeLog:9 > + in non-ASCII path through nextBreakablePossition(). This enables typo - nextBreakablePosition() > Source/WebCore/ChangeLog:68 > + (WebCore::uTextWithPriorContextGetNativeLength): > + (WebCore): > + (WebCore::uTextWithPriorContextNativeLength): > + (WebCore::uTextWithPriorContextClone): > + (WebCore::uTextWithPriorContextExtract): > + (WebCore::uTextWithPriorContextMapOffsetToNative): > + (WebCore::uTextWithPriorContextMapNativeIndexToUTF16): > + (WebCore::uTextWithPriorContextClose): > + (WebCore::uTextGetContext): > + (WebCore::uTextLatin1GetCurrentContext): > + (WebCore::uTextLatin1MoveInPrimaryContext): > + (WebCore::uTextLatin1SwitchToPrimaryContext): > + (WebCore::uTextLatin1MoveInPriorContext): > + (WebCore::uTextLatin1SwitchToPriorContext): > + (WebCore::uTextInChunkOrOutOfRange): > + (WebCore::uTextLatin1WithPriorContextAccess): > + (WebCore::uTextWithPriorContextInit): > + (WebCore::UTextOpenLatin1): > + (WebCore::uTextUTF16GetCurrentContext): > + (WebCore::uTextUTF16MoveInPrimaryContext): > + (WebCore::uTextUTF16SwitchToPrimaryContext): > + (WebCore::uTextUTF16MoveInPriorContext): > + (WebCore::uTextUTF16SwitchToPriorContext): > + (WebCore::uTextUTF16WithPriorContextAccess): I don't think adding WithPriorContext to the various function names adds much. Some of these functions are shared between the Latin1 and UTF16 UText processing and then there are the two Access functions. > Source/WebCore/platform/text/TextBreakIterator.h:92 > { Shouldn't we ASSERT here that we have at least two characters of space in m_priorContext? > Source/WebCore/platform/text/TextBreakIterator.h:98 > { Shouldn't we ASSERT here that we have at least two characters of space in m_priorContext? > Source/WebCore/platform/text/TextBreakIterator.h:104 > + { Shouldn't we ASSERT here that we have at least two characters of context to reset, or iterate through from 0..TextBreakPriorContextCapacity-1? > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:92 > + if (sizeToCopy > dest->sizeOfStruct) When would src->sizeOfStruct != dest->sizeOfStruct? > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:94 > + memcpy(dest, src, sizeToCopy); I prefer the element by element initialization, since there are special cases to handle in the destination. For example, the addition of a new pointer in struct UText would end up being a problem with the memcpy. > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:459 > + UTextWithBuffer uTextLatin1WithPriorContextLocal; > + uTextLatin1WithPriorContextLocal.ut = emptyUText; > + uTextLatin1WithPriorContextLocal.ut.extraSize = sizeof(uTextLatin1WithPriorContextLocal.buffer); > + uTextLatin1WithPriorContextLocal.ut.pExtra = uTextLatin1WithPriorContextLocal.buffer; Same comment about adding WithPriorContext. I don't think it is needed.

Glenn Adams

Comment 34 2013-03-20 16:56:41 PDT

These comments apply to an update to the current patch I will be submitting presently. (In reply to comment #31) > (From update of attachment 193924 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=193924&action=review > > Thanks for tackling this. A lot of the new code here isn’t following WebKit coding style and it should. Fixed (see details below). Most of the style diffs were from using the ICU text provider code examples (and their style). > > I’m concerned about the level of test coverage for this significant amount code. I am pretty sure I spotted an infinite loop in the code, not sure why it was not a problem in practice. Apparently never called, but see further comments below. > > > Source/WebCore/ChangeLog:28 > > + (WebCore): > > It’s best to remove bogus lines like this that the script adds to change log entries. Fixed. > > > Source/WebCore/platform/text/TextBreakIterator.h:57 > > + const int TextBreakPriorContextCapacity = 2; > > I think this constant would be better as a private member of LazyLineBreakIterator rather exposed at the top level as a public constant. It has nothing to do with the interface to the iterator. Fixed. > > > Source/WebCore/platform/text/TextBreakIterator.h:83 > > + UChar lastCharacter() const { return m_priorContext[1]; } > > Do we need an assertion about m_priorContextLength here? Added assert against priorContextCapacity. > > > Source/WebCore/platform/text/TextBreakIterator.h:85 > > + const UChar* getPriorContext(int& priorContextLength) > > This is used only within the class, so I suggest it be public. I’m also not sure it needs to be a function. Having this logic in the get function might be good. Removed, incorporating logic into get(). > > > Source/WebCore/platform/text/TextBreakIterator.h:95 > > + m_priorContextLength = last ? (secondToLast ? 2 : 1) : 0; > > This seems a bit peculiar. We are storing a length, but it’s not additional data; it’s just an indicator of which characters are zero. m_priorContextLength should not be stored if it can be computed. Replaced with private compute function. > > > Source/WebCore/platform/text/TextBreakIterator.h:103 > > + void resetPriorContext() > > Renaming this to be more abstract isn’t all that useful. We still are storing just two characters. It’s not all that helpful to pretend to be more general in our naming. The present storage of only two characters matches the current ASCII path usage and fixes the CJK cases; however, I anticipate having to expand prior context to store up to an entire grapheme cluster (or more) in order to properly handle Indic and certain SE Asian scripts. It may eventually require storage of a prior word in order to handle dictionary based word breaking in Thai (and similar writing systems). > > > Source/WebCore/platform/text/TextBreakIterator.h:110 > > - TextBreakIterator* get() > > + TextBreakIterator* get(int& priorContextLength) > > This interface is confusing enough that I think we’ll need a comment. Added comment. > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:71 > > +static inline int64_t uTextWithPriorContextGetNativeLength(UText* ut) > > +{ > > + return ut->a + ut->b; > > +} > > Nothing here seems to be specific to “a UText with prior context” so the function name seems peculiar. A shorter name would likely make the code easier to read. Further, including the word “Get” in this name does not make clear “this is the inlined one you want to call”. It’s just a nearly identical name to the other function and it seems likely we’d call the wrong one. Change "uTextWithPriorContext" to "text" throughout. I originally made the length getter into two parts, an inline part and a non-inline part, because I couldn't recall if the existing compilers would expand an inline as non-inline if a function pointer were taken of it (as is the case here). > > Further, I don’t think that adding “with prior context” to the names of all the functions is necessary. It’s really hard to parse those long names. There must be a more concise way to put things. Fixed. > WebKit coding style is to use words rather than abbreviations. I suggest the word “text” rather than the letters “ut” as the name of the object pointer in all these functions. Changed to "text". The "ut" usage was based on the usage found in the ICU library implementation of text providers. > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:77 > > +// Generic native length function, works with both Latin1 and UC text providers. > > +static int64_t uTextWithPriorContextNativeLength(UText* ut) > > +{ > > + return uTextWithPriorContextGetNativeLength(ut); > > +} > > Here the long name is perhaps justified, since this function exists to be put in the method table below. > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:80 > > -static UText* uTextLatin1Clone(UText* destination, const UText* source, UBool deep, UErrorCode* status) > > +// Generic clone function, works with both Latin1 and UC text providers. > > +static UText * uTextWithPriorContextClone(UText* dest, const UText* src, UBool deep, UErrorCode* status) > > Why the formatting change to add a space before the "*" here? That’s not normal for our project. Fixed. It was a typo. For some reason check-webkit-style didn't catch it. > > Why the abbreviation of destination and source? In WebKit we prefer words to abbreviations. Fixed. This usage was from the ICU implementation examples of text providers. > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:89 > > + void* extraNew = dest->pExtra; > > What does “extra new” mean? Unclear name. This name corresponds to the ICU field UText.pExtra. See ICU utext.h for further information. > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:93 > > + int sizeToCopy = src->sizeOfStruct; > > + if (sizeToCopy > dest->sizeOfStruct) > > + sizeToCopy = dest->sizeOfStruct; > > Would read more clearly with std::min instead of an if statement. Fixed. > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:98 > > + if (extraSize) > > + memcpy(dest->pExtra, src->pExtra, extraSize); > > The if here is unneeded and I suggest omitting it. Removed. > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:103 > > +static int32_t uTextWithPriorContextExtract(UText* ut, int64_t start, int64_t limit, UChar* dest, int32_t destCapacity, UErrorCode* pErrorCode) > > The use of a p prefix to mean a pointer is not a WebKit standard style element, so should not be done. Fixed. This came from ICU text provider usage. > > The name “ut” does not seem clear; a word would be better than a pile of letters and more consistent with WebKit coding style. Changed to "text". I had been following ICU text provider code example usage here. > > Please don’t use the abbreviation dest; use the word destination instead. Fixed. I had been following ICU text provider code example usage here. > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:128 > > + int32_t copied = 0; > > It’s normally best to name local variables with noun phrases rather than fragments. The word “copied” sounds like it could be a boolean and it would be clearer to call it charactersCopied or something along those lines. I had originally called it charsCopied, but that seemed too verbose and a misnomer since it is not characters but character coding units (i.e., UTF-16 coding units) that are being copied. For example, a surrogate pair is two coding units but one character! Since I already use the prefix "uc" to refer to such unicode code (units), perhaps I'll change here to "ucCopied", which goes along with "ucToCopy". In any case, i've removed this code and this method now asserts or returns an error code if invoked. See more below. > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:129 > > + while (copied < length) { > > This loop does not seem to increment “copied”, so I’d expect it to be an infinite loop. How did you test? Good catch. I had originally incremented in an earlier fragment but it got chopped out by mistake. It appears that utext_extract() isn't being called by ICU's rbbi (rule based break iterator) implementation even though it appears to be defined as a mandatory method in UTextFuncs. Perhaps the best way to handle this would be to replace the body of the method with ASSERT_NOT_REACHED() and return U_UNSUPPORTED_ERROR code for release build, then see if we hit this in any of the test bots? There does appear to be some internal ICU tests aimed at this method, but I haven't attempted to run them. WDYT? > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:131 > > + if (!ut->pFuncs->access(ut, start, TRUE)) > > + ASSERT_NOT_REACHED(); > > Is continuing to run an ignoring the failure the most helpful behavior here for release builds? This method now always asserts or returns error code. > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:133 > > + int32_t ucAvailable = ut->chunkLength - ut->chunkOffset; > > + int32_t ucToCopy = ucAvailable < length ? ucAvailable : length; > > The “uc” prefix on these local variable names is not the usual WebKit style. It’s also best to name local variables with noun phrases rather than fragments. “to copy” is not really so great. Code removed. This method now always asserts or returns error code. > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:134 > > + StringImpl::copyChars(&dest[copied], ut->chunkContents + ut->chunkOffset, static_cast<unsigned>(ucToCopy)); > > Why the static_cast here? Normally we can pass an int32_t to an unsigned without an explicit cast. Code removed. This method now always asserts or returns error code. > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:146 > > +// Generic map offset to native index function, works with both Latin1 and UC text providers. > > The comments like this one before each function don’t say much. I’d omit them. It’s true that these functions are used twice, for both Latin-1 and Unicode, but they’re not “generic” in any additional interesting sense, nor is it helpful to repeat the function name in prose in a comment. Removed. > > Also, I am not comfortable with the abbreviation “uc” for “Unicode” and I’d prefer we keep the use of that to a minimum. Removed. > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:157 > > + ASSERT(nativeIndex - ut->chunkNativeStart < INT32_MAX); > > What guarantees this is true? We can assert it only if we know it will be true. Fixed. Use 0 if offset exceeds 32 bits. Note also that utext.h says: "Behavior is undefined if the native index does not fall within the current chunk." So returning zero will at least be a fixed, defined behavior. > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:238 > > + ASSERT(nativeIndex - ut->chunkNativeStart < INT32_MAX); > > Same question as elsewhere. What guarantees this? Fixed. Use 0 if offset exceeds 32 bits. > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:250 > > + ASSERT(nativeIndex - ut->chunkNativeStart < INT32_MAX); > > Same question as elsewhere. What guarantees this? Fixed. Use 0 if offset exceeds 32 bits. > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:291 > > +static const struct UTextFuncs uTextLatin1WithPriorContextFuncs = > > +{ > > In WebKit coding style the brace would go on the previous line. Fixed. > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:347 > > + ASSERT_UNUSED(forward, forward ? nativeIndex >= ut->b && nativeIndex < ut->a + ut->b : nativeIndex > ut->b && nativeIndex <= ut->a + ut->b); > > We don’t normally do assertions with && in them. I suggest breaking this out into two assertions, one about the start and one about the end. Fixed. > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:350 > > + ASSERT(ut->chunkNativeLimit - ut->chunkNativeStart < INT32_MAX); > > What guarantees this? We can’t just assert because we hope it’s true. Fixed. Use 0 if offset exceeds 32 bits. > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:353 > > + ASSERT(nativeIndex - ut->chunkNativeStart < INT32_MAX); > > What guarantees this? We can’t just assert because we hope it’s true. Fixed. Use 0 if offset exceeds 32 bits. > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:372 > > + ASSERT(nativeIndex < INT32_MAX); > > What guarantees this? We can’t just assert because we hope it’s true. Fixed. Use 0 if offset exceeds 32 bits. > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:486 > > + UText uTextUCWithPriorContextLocal = UTEXT_INITIALIZER; > > This long name does not seem helpful. Why does the phrase “with prior context” need to appear in the variable name? Why not just call this textLocalStorage or something like that? Fixed. Changed to "textLocal". > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:488 > > + UErrorCode uTextOpenStatus = U_ZERO_ERROR; > > Why not just call this openStatus? Fixed. This was previously based on ICU text provider code usage. > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:489 > > + UText* uTextUCWithPriorContext = UTextOpenUTF16(&uTextUCWithPriorContextLocal, string, length, priorContext, priorContextLength, &uTextOpenStatus); > > Why not just call this text? Fixed.

Glenn Adams

Comment 35 2013-03-20 17:31:45 PDT

Thanks for the quick review! The following comments apply to an update to the current patch I will be submitting presently. (In reply to comment #33) > (From update of attachment 193924 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=193924&action=review > > > Source/WebCore/ChangeLog:9 > > + in non-ASCII path through nextBreakablePossition(). This enables > > typo - nextBreakablePosition() Fixed. > > > Source/WebCore/ChangeLog:68 > > + (WebCore::uTextWithPriorContextGetNativeLength): > > + (WebCore): > > + (WebCore::uTextWithPriorContextNativeLength): > > + (WebCore::uTextWithPriorContextClone): > > + (WebCore::uTextWithPriorContextExtract): > > + (WebCore::uTextWithPriorContextMapOffsetToNative): > > + (WebCore::uTextWithPriorContextMapNativeIndexToUTF16): > > + (WebCore::uTextWithPriorContextClose): > > + (WebCore::uTextGetContext): > > + (WebCore::uTextLatin1GetCurrentContext): > > + (WebCore::uTextLatin1MoveInPrimaryContext): > > + (WebCore::uTextLatin1SwitchToPrimaryContext): > > + (WebCore::uTextLatin1MoveInPriorContext): > > + (WebCore::uTextLatin1SwitchToPriorContext): > > + (WebCore::uTextInChunkOrOutOfRange): > > + (WebCore::uTextLatin1WithPriorContextAccess): > > + (WebCore::uTextWithPriorContextInit): > > + (WebCore::UTextOpenLatin1): > > + (WebCore::uTextUTF16GetCurrentContext): > > + (WebCore::uTextUTF16MoveInPrimaryContext): > > + (WebCore::uTextUTF16SwitchToPrimaryContext): > > + (WebCore::uTextUTF16MoveInPriorContext): > > + (WebCore::uTextUTF16SwitchToPriorContext): > > + (WebCore::uTextUTF16WithPriorContextAccess): > > I don't think adding WithPriorContext to the various function names adds much. Some of these functions are shared between the Latin1 and UTF16 UText processing and then there are the two Access functions. Fixed: s/uTextWithPriorContext/text/ && s/uText/text/ > > > Source/WebCore/platform/text/TextBreakIterator.h:92 > > { > > Shouldn't we ASSERT here that we have at least two characters of space in m_priorContext? Fixed. > > > Source/WebCore/platform/text/TextBreakIterator.h:98 > > { > > Shouldn't we ASSERT here that we have at least two characters of space in m_priorContext? Fixed. > > > Source/WebCore/platform/text/TextBreakIterator.h:104 > > + { > > Shouldn't we ASSERT here that we have at least two characters of context to reset, or iterate through from 0..TextBreakPriorContextCapacity-1? Fixed. > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:92 > > + if (sizeToCopy > dest->sizeOfStruct) > > When would src->sizeOfStruct != dest->sizeOfStruct? I modeled this code on that found in ICU utext.cpp:shallowTextClone(), which hasn't changed between version 46 and the current version 50. I believe the intent of this code was to handle clones from a source to which fields had been added to a destination where some of those added fields were not present. I suspect it does not occur in our particular usage of text, but it doesn't hurt to keep IMO. > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:94 > > + memcpy(dest, src, sizeToCopy); > > I prefer the element by element initialization, since there are special cases to handle in the destination. For example, the addition of a new pointer in struct UText would end up being a problem with the memcpy. Again, I pretty much used the ICU shallowTextClone() as a model here. In the prior implementation of uTextLatin1Clone, there were handling of fields that were specific to the Latin1 text provider usage. Here, I chose to use a generic approach as indicated in utext.cpp. In this (next) updated patch, I've added code to perform fixups on pointers in the destination text structure that originally pointed into either the source text structure's extra buffer or somewhere else in the source text structure. > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:459 > > + UTextWithBuffer uTextLatin1WithPriorContextLocal; > > + uTextLatin1WithPriorContextLocal.ut = emptyUText; > > + uTextLatin1WithPriorContextLocal.ut.extraSize = sizeof(uTextLatin1WithPriorContextLocal.buffer); > > + uTextLatin1WithPriorContextLocal.ut.pExtra = uTextLatin1WithPriorContextLocal.buffer; > > Same comment about adding WithPriorContext. I don't think it is needed. Fixed.

Glenn Adams

Comment 36 2013-03-20 19:43:22 PDT

Created attachment 194170 [details] Patch Address comments from darin and msaboff.

WebKit Review Bot

Comment 37 2013-03-21 09:31:10 PDT

Comment on attachment 194170 [details] Patch Attachment 194170 [details] did not pass cr-linux-debug-ews (chromium-xvfb): Output: http://webkit-commit-queue.appspot.com/results/17132595

Glenn Adams

Comment 38 2013-03-21 18:29:17 PDT

Created attachment 194409 [details] Patch Address non-ICU path; replace skipped expectations with rebaselines or requests for same.

Build Bot

Comment 39 2013-03-22 08:50:17 PDT

Comment on attachment 194409 [details] Patch Attachment 194409 [details] did not pass mac-ews (mac): Output: http://webkit-commit-queue.appspot.com/results/17286170 New failing tests: fast/sub-pixel/table-rtl-padding.html

Build Bot

Comment 40 2013-03-22 08:50:21 PDT

Created attachment 194558 [details] Archive of layout-test-results from webkit-ews-02 The attached test failures were seen while running run-webkit-tests on the mac-ews. Bot: webkit-ews-02 Port: <class 'webkitpy.common.config.ports.MacPort'> Platform: Mac OS X 10.8.2

Glenn Adams

Comment 41 2013-03-26 19:02:19 PDT

Created attachment 195207 [details] Patch Address regression in fast/sub-pixel/table-rtl-padding; expect fails on mac/chromium for rebaselining.

WebKit Review Bot

Comment 42 2013-03-26 21:18:28 PDT

Comment on attachment 195207 [details] Patch Attachment 195207 [details] did not pass chromium-ews (chromium-xvfb): Output: http://webkit-commit-queue.appspot.com/results/17237572 New failing tests: fast/text/international/003.html fast/text/international/text-combine-image-test.html fast/writing-mode/Kusa-Makura-background-canvas.html

WebKit Review Bot

Comment 43 2013-03-26 21:18:34 PDT

Created attachment 195218 [details] Archive of layout-test-results from gce-cr-linux-06 for chromium-linux-x86_64 The attached test failures were seen while running run-webkit-tests on the chromium-ews. Bot: gce-cr-linux-06 Port: chromium-linux-x86_64 Platform: Linux-3.3.8-gcg-201212281604-x86_64-with-GCEL-10.04-gcel_10.04

Build Bot

Comment 44 2013-03-26 22:12:08 PDT

Comment on attachment 195207 [details] Patch Attachment 195207 [details] did not pass mac-wk2-ews (mac-wk2): Output: http://webkit-commit-queue.appspot.com/results/17286470 New failing tests: fast/text/international/003.html fast/text/international/text-combine-image-test.html fast/writing-mode/Kusa-Makura-background-canvas.html

Build Bot

Comment 45 2013-03-26 22:12:13 PDT

Created attachment 195221 [details] Archive of layout-test-results from webkit-ews-16 for mac-mountainlion-wk2 The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews. Bot: webkit-ews-16 Port: mac-mountainlion-wk2 Platform: Mac OS X 10.8.2

WebKit Review Bot

Comment 46 2013-03-26 22:20:40 PDT

Comment on attachment 195207 [details] Patch Attachment 195207 [details] did not pass chromium-ews (chromium-xvfb): Output: http://webkit-commit-queue.appspot.com/results/17251671 New failing tests: fast/text/international/003.html fast/text/international/text-combine-image-test.html fast/writing-mode/Kusa-Makura-background-canvas.html

WebKit Review Bot

Comment 47 2013-03-26 22:20:47 PDT

Created attachment 195222 [details] Archive of layout-test-results from gce-cr-linux-03 for chromium-linux-x86_64 The attached test failures were seen while running run-webkit-tests on the chromium-ews. Bot: gce-cr-linux-03 Port: chromium-linux-x86_64 Platform: Linux-3.3.8-gcg-201212281604-x86_64-with-GCEL-10.04-gcel_10.04

Glenn Adams

Comment 48 2013-03-27 22:37:21 PDT

Comment on attachment 195207 [details] Patch marking as r- in order to terminate apparent endless retry loop on mac-ews test

Glenn Adams

Comment 49 2013-03-27 23:22:56 PDT

Created attachment 195488 [details] Patch

Darin Adler

Comment 50 2013-04-02 10:35:07 PDT

Comment on attachment 195488 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=195488&action=review Generally looks great. review- because of the problem in acquireLineBreakIterator. > Source/WebCore/ChangeLog:37 > + (LazyLineBreakIterator): Ideally, you should remove bogus lines like this that are added because of bugs in the prepare-ChangeLog script. Even better, you could get someone to fix that script, or do it yourself! > Source/WebCore/platform/text/TextBreakIterator.h:82 > + unsigned getPriorContextLength() const This function should be named priorContextLength in WebKit coding style. We reserve the word “get” for functions that return results in some other way. > Source/WebCore/platform/text/TextBreakIterator.h:85 > + ASSERT(sizeof(m_priorContext) / sizeof(m_priorContext[0]) >= 2); These should be a COMPILE_ASSERT rather than just an ASSERT. WTF has a macro for array length called WTF_ARRAY_LENGTH that should be used instead of writing out the math with two separate sizeofs. This should be asserting that it’s == 2, not just >= 2, since this function’s implementation assumes there are exactly 2 elements. Generally speaking, all the functions seem specific to a 2-element, array, so the assertions don’t seem quite right. For example, this one looks only at two characters, and others assume that the last character is at index 1 and the second to last at index 0. So all these functions are assuming 2 characters of context, not >= 2 or >= 1. > Source/WebCore/platform/text/TextBreakIterator.h:149 > + // N.B. prior context is explicitly not reset here since it needs to be accumulated and retained > + // across multiple inline nodes, each of which cause this reset() to be invoked in order > + // to update the primary context string. This comment makes it sound like we should consider renaming the reset() function for clarity about what it does and does not reset. > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:87 > + const void* pointerSource = static_cast<const void*>(source); It’s strange to first cast these to void* and then to char*. I suggest a reinterpret_cast directly to char*. There’s no problem comparing a void* to a char*, and the number of casts in this function might be cut down considerably. > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:129 > + ASSERT(errorCode); Seems a bit excessive to assert this. We don’t have to assert every pointer as non-null. > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:130 > + ASSERT_NOT_REACHED(); Might be worth commenting why this is correct. > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:145 > + return offset < INT32_MAX ? static_cast<int32_t>(offset) : 0; I didn’t know that INT32_MAX existed, portably. Normally I’d use numeric_limits<int32_t>::max(). Why is it OK to return 0 if the offset is too big to fit into 32 bits? A comment should say why; it’s not obvious. > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:187 > + text->chunkLength = length < INT32_MAX ? static_cast<int32_t>(length) : 0; Same comments as above. > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:209 > + text->chunkOffset = nativeIndex < INT32_MAX ? static_cast<int32_t>(nativeIndex) : 0; Same thing again. > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:314 > + UText* text = utext_setup(&utWithBuffer->text, sizeof(utWithBuffer->buffer), status); > + if (U_SUCCESS(*status)) > + textInit(text, &textLatin1Funcs, string, length, priorContext, priorContextLength); > + return text; Is utext_setup guaranteed to return 0 if the status does not indicate success? WebKit style is normally early return for errors, so we’d write: if (U_FAILURE(*status)) { ASSERT(!text); return 0; } And continue with the call to textInit, rather than putting the normal case inside an if statement. > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:419 > + if (U_SUCCESS(*status)) > + textInit(text, &textUTF16Funcs, string, length, priorContext, priorContextLength); > + return text; Same comment as above. > Source/WebCore/platform/text/wchar/TextBreakIteratorWchar.cpp:263 > + Vector<UChar> utf16string(priorContextLength + length); This isn’t a good name for the local variable. There’s an argument called string that is also UTF-16, so calling this utf16string does nothing to make clear why it’s different. Also, the capitalization does not match WebKit coding style, which requires a capital S. I would call this stringWithPrependedContext or some other sort name that makes clear what the difference is. > Source/WebCore/platform/text/wchar/TextBreakIteratorWchar.cpp:268 > + return acquireLineBreakIterator(utf16string.data(), utf16string.size(), locale, 0, 0); Having the function call itself like this doesn’t seem optimal to me. Is it OK to have a line break iterator that outlives the data it’s pointed to? When the local vector, utf16string, is deleted, why will the line break iterator still work properly? I think this is a major problem!

Darin Adler

Comment 51 2013-04-02 10:36:05 PDT

Comment on attachment 195488 [details] Patch I think it would be a good idea to break off the renaming and reformatting from this patch in TextBreakIterator.h and land it first, to make the real patch a bit smaller. Not required, but worthwhile.

Glenn Adams

Comment 52 2013-04-02 13:32:33 PDT

(In reply to comment #50) > (From update of attachment 195488 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=195488&action=review > > Source/WebCore/platform/text/wchar/TextBreakIteratorWchar.cpp:263 > > + Vector<UChar> utf16string(priorContextLength + length); > > This isn’t a good name for the local variable. There’s an argument called string that is also UTF-16, so calling this utf16string does nothing to make clear why it’s different. Also, the capitalization does not match WebKit coding style, which requires a capital S. I would call this stringWithPrependedContext or some other sort name that makes clear what the difference is. I copied this usage from existing code in platform/text/TextBreakIterator.cpp: TextBreakIterator* acquireLineBreakIterator(const LChar* string, int length, const AtomicString& locale) { Vector<UChar> utf16string(length); for (int i = 0; i < length; ++i) utf16string[i] = string[i]; return acquireLineBreakIterator(utf16string.data(), length, locale); } > > > Source/WebCore/platform/text/wchar/TextBreakIteratorWchar.cpp:268 > > + return acquireLineBreakIterator(utf16string.data(), utf16string.size(), locale, 0, 0); > > Having the function call itself like this doesn’t seem optimal to me. > > Is it OK to have a line break iterator that outlives the data it’s pointed to? When the local vector, utf16string, is deleted, why will the line break iterator still work properly? I think this is a major problem! Interestingly, that is also exactly what was done by the existing code I cite above. I wonder if it is essential that I attempt to fix up this !USE(ICU_UNICODE) code path, particularly since I don't have such a platform to build/test on. WinCE perhaps? Would it be better to simply not attempt to fix up this code and to let those who depend on this code path to effect a fix? Or I could simply add the new parameters (so that a build succeeds) and add an ASSERT if priorContextLength is non-zero thus forcing a debug time testing failure? What would you suggest?

Glenn Adams

Comment 53 2013-04-02 15:44:15 PDT

(In reply to comment #51) > (From update of attachment 195488 [details]) > I think it would be a good idea to break off the renaming and reformatting from this patch in TextBreakIterator.h and land it first, to make the real patch a bit smaller. Not required, but worthwhile. Done: https://bugs.webkit.org/show_bug.cgi?id=113823 Also, I separated out the new performance test: https://bugs.webkit.org/show_bug.cgi?id=113811

Glenn Adams

Comment 54 2013-04-02 20:17:38 PDT

Created attachment 196271 [details] Patch

EFL EWS Bot

Comment 55 2013-04-02 20:22:44 PDT

Comment on attachment 196271 [details] Patch Attachment 196271 [details] did not pass efl-ews (efl): Output: http://webkit-commit-queue.appspot.com/results/17388178

WebKit Review Bot

Comment 56 2013-04-02 20:30:53 PDT

Comment on attachment 196271 [details] Patch Attachment 196271 [details] did not pass chromium-ews (chromium-xvfb): Output: http://webkit-commit-queue.appspot.com/results/17323758

WebKit Review Bot

Comment 57 2013-04-02 20:32:28 PDT

Comment on attachment 196271 [details] Patch Attachment 196271 [details] did not pass cr-linux-debug-ews (chromium-xvfb): Output: http://webkit-commit-queue.appspot.com/results/17246817

Glenn Adams

Comment 58 2013-04-02 20:36:52 PDT

Created attachment 196273 [details] Patch

Glenn Adams

Comment 59 2013-04-02 22:00:40 PDT

Created attachment 196276 [details] Patch

Glenn Adams

Comment 60 2013-04-02 22:56:19 PDT

Comment on attachment 196276 [details] Patch Addressed comment #50 issues.

Glenn Adams

Comment 61 2013-04-02 23:05:12 PDT

(In reply to comment #50) > (From update of attachment 195488 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=195488&action=review Addressed in https://bugs.webkit.org/attachment.cgi?id=196276 > > Source/WebCore/ChangeLog:37 > > + (LazyLineBreakIterator): > > Ideally, you should remove bogus lines like this that are added because of bugs in the prepare-ChangeLog script. Fixed. > > Source/WebCore/platform/text/TextBreakIterator.h:82 > > + unsigned getPriorContextLength() const > This function should be named priorContextLength in WebKit coding style. We reserve the word “get” for functions that return results in some other way. Fixed in https://bugs.webkit.org/show_bug.cgi?id=113823. > > > Source/WebCore/platform/text/TextBreakIterator.h:85 > > + ASSERT(sizeof(m_priorContext) / sizeof(m_priorContext[0]) >= 2); > > These should be a COMPILE_ASSERT rather than just an ASSERT. > > WTF has a macro for array length called WTF_ARRAY_LENGTH that should be used instead of writing out the math with two separate sizeofs. > > This should be asserting that it’s == 2, not just >= 2, since this function’s implementation assumes there are exactly 2 elements. Generally speaking, all the functions seem specific to a 2-element, array, so the assertions don’t seem quite right. For example, this one looks only at two characters, and others assume that the last character is at index 1 and the second to last at index 0. So all these functions are assuming 2 characters of context, not >= 2 or >= 1. Fixed in https://bugs.webkit.org/show_bug.cgi?id=113823. > > > Source/WebCore/platform/text/TextBreakIterator.h:149 > > + // N.B. prior context is explicitly not reset here since it needs to be accumulated and retained > > + // across multiple inline nodes, each of which cause this reset() to be invoked in order > > + // to update the primary context string. > > This comment makes it sound like we should consider renaming the reset() function for clarity about what it does and does not reset. Fixed in https://bugs.webkit.org/show_bug.cgi?id=113823. > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:87 > > + const void* pointerSource = static_cast<const void*>(source); > > It’s strange to first cast these to void* and then to char*. I suggest a reinterpret_cast directly to char*. There’s no problem comparing a void* to a char*, and the number of casts in this function might be cut down considerably. Fixed. > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:129 > > + ASSERT(errorCode); > > Seems a bit excessive to assert this. We don’t have to assert every pointer as non-null. Fixed. > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:130 > > + ASSERT_NOT_REACHED(); > > Might be worth commenting why this is correct. Added comment. > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:145 > > + return offset < INT32_MAX ? static_cast<int32_t>(offset) : 0; > > I didn’t know that INT32_MAX existed, portably. Normally I’d use numeric_limits<int32_t>::max(). > > Why is it OK to return 0 if the offset is too big to fit into 32 bits? A comment should say why; it’s not obvious. Fixed. Use max() and added comments. > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:187 > > + text->chunkLength = length < INT32_MAX ? static_cast<int32_t>(length) : 0; > > Same comments as above. > Fixed. Use max() and added comments. > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:209 > > + text->chunkOffset = nativeIndex < INT32_MAX ? static_cast<int32_t>(nativeIndex) : 0; > > Same thing again. > Fixed. Use max() and added comments. > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:314 > > + UText* text = utext_setup(&utWithBuffer->text, sizeof(utWithBuffer->buffer), status); > > + if (U_SUCCESS(*status)) > > + textInit(text, &textLatin1Funcs, string, length, priorContext, priorContextLength); > > + return text; > > Is utext_setup guaranteed to return 0 if the status does not indicate success? WebKit style is normally early return for errors, so we’d write: > > if (U_FAILURE(*status)) { > ASSERT(!text); > return 0; > } > > And continue with the call to textInit, rather than putting the normal case inside an if statement. > Fixed. > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:419 > > + if (U_SUCCESS(*status)) > > + textInit(text, &textUTF16Funcs, string, length, priorContext, priorContextLength); > > + return text; > > Same comment as above. > Fixed. > > Source/WebCore/platform/text/wchar/TextBreakIteratorWchar.cpp:263 > > + Vector<UChar> utf16string(priorContextLength + length); > > This isn’t a good name for the local variable. There’s an argument called string that is also UTF-16, so calling this utf16string does nothing to make clear why it’s different. Also, the capitalization does not match WebKit coding style, which requires a capital S. I would call this stringWithPrependedContext or some other sort name that makes clear what the difference is. > Removed code. See also comment #52. > > Source/WebCore/platform/text/wchar/TextBreakIteratorWchar.cpp:268 > > + return acquireLineBreakIterator(utf16string.data(), utf16string.size(), locale, 0, 0); > > Having the function call itself like this doesn’t seem optimal to me. No longer applicable, but FYI, modern compilers convert this to iteration by tail call optimization. [It's a programming paradigm I often used in Lisp back in the "good ole days" :)] > > Is it OK to have a line break iterator that outlives the data it’s pointed to? When the local vector, utf16string, is deleted, why will the line break iterator still work properly? I think this is a major problem! Replaced code with asserts since I really can't build/test this anyway. See also comment #52. If you suggest another approach, let me know.

Glenn Adams

Comment 62 2013-04-03 02:28:10 PDT

(In reply to comment #61) > (In reply to comment #50) > > > Source/WebCore/platform/text/TextBreakIterator.h:82 > > > + unsigned getPriorContextLength() const > > This function should be named priorContextLength in WebKit coding style. We reserve the word “get” for functions that return results in some other way. > > Fixed in https://bugs.webkit.org/show_bug.cgi?id=113823. Correction. Fixed (in current patch to this bug).

WebKit Review Bot

Comment 63 2013-04-03 14:52:18 PDT

Comment on attachment 196276 [details] Patch Clearing flags on attachment: 196276 Committed r147588: <http://trac.webkit.org/changeset/147588>

WebKit Review Bot

Comment 64 2013-04-03 14:52:28 PDT

All reviewed patches have been landed. Closing bug.

Benjamin Poulain

Comment 65 2013-04-03 16:03:23 PDT

This caused 3 failures on the bots. Can you have a look? (rebaseline needed?)

Glenn Adams

Comment 66 2013-04-03 16:06:07 PDT

(In reply to comment #65) > This caused 3 failures on the bots. Can you have a look? (rebaseline needed?) Those were expected as I marked in the ChangeLog: Expect rebaselines will be needed due to different line breaks: fast/text/international/003.html fast/text/international/text-combine-image-test-expected.html fast/text/writing-mode/Kusa-Makura-background-canvas.html I'm waiting for bots to catch up so I can rebaseline with garden-o-magic. This will be my first attempt to use it.

Simon Fraser (smfr)

Comment 67 2013-04-03 22:25:04 PDT

Are these diffs expected? http://build.webkit.org/results/Apple%20MountainLion%20Release%20WK1%20(Tests)/r147588%20(8648)/retries/fast/text/international/003-diffs.html The new results look rather different.

Glenn Adams

Comment 68 2013-04-04 00:45:23 PDT

(In reply to comment #67) > Are these diffs expected? > http://build.webkit.org/results/Apple%20MountainLion%20Release%20WK1%20(Tests)/r147588%20(8648)/retries/fast/text/international/003-diffs.html > > The new results look rather different. Yes, they were. WIth this patch, there are new line break opportunities that were not previously considered, particularly at element boundaries. The test file you cite and two others have been rebaselined to accommodate the different line breaks. I just committed the third of three rebaseline sets: [1] http://trac.webkit.org/changeset/147599 [Chromium] [2] http://trac.webkit.org/changeset/147601 [GTK] [3] http://trac.webkit.org/changeset/147613 [Mac] if I missed any, please let me know or go ahead with a rebaseline.

mitz

Comment 69 2013-04-16 21:26:10 PDT

(In reply to comment #63) > (From update of attachment 196276 [details]) > Clearing flags on attachment: 196276 > > Committed r147588: <http://trac.webkit.org/changeset/147588> This change caused bug 114721.

Mark Salisbury

Comment 70 2013-06-06 14:27:17 PDT

(In reply to comment #52) > I wonder if it is essential that I attempt to fix up this !USE(ICU_UNICODE) code path, particularly since I don't have such a platform to build/test on. WinCE perhaps? Would it be better to simply not attempt to fix up this code and to let those who depend on this code path to effect a fix? I'm planning to submit a patch for this. I've opened https://bugs.webkit.org/show_bug.cgi?id=117320.

Eric Seidel (no email)

Comment 71 2013-06-24 17:17:07 PDT

FYI, Blink also took this patch and found it to be a 4% regression in PerformanceTests/Layout/hindi-line-layout.html https://code.google.com/p/chromium/issues/detail?id=252050 We're going to keep the patch, but it's also sad to make are already slow complex text path slower. :( http://stackoverflow.com/questions/16554205/google-chrome-text-rendering-for-non-latin-languages-is-too-slow

Antoine Quint

Comment 72 2013-10-23 06:38:17 PDT

(In reply to comment #63) > (From update of attachment 196276 [details]) > Clearing flags on attachment: 196276 > > Committed r147588: <http://trac.webkit.org/changeset/147588> This change apparently caused a regression, see https://bugs.webkit.org/show_bug.cgi?id=123204.

Attachments
test case (675 bytes, text/html) 2012-12-23 09:33 PST, Koji Ishii	no flags	Details
test case result (5.66 KB, image/png) 2013-01-13 23:05 PST, Koji Ishii	no flags	Details
test case expected (8.49 KB, image/png) 2013-01-13 23:06 PST, Koji Ishii	no flags	Details
Patch (7.21 KB, patch) 2013-03-11 19:14 PDT, Glenn Adams	webkit.review.bot: commit-queue-	Details Formatted Diff Diff
Patch (24.03 KB, patch) 2013-03-15 20:24 PDT, Glenn Adams	no flags	Details Formatted Diff Diff
Patch (42.90 KB, patch) 2013-03-18 22:12 PDT, Glenn Adams	no flags	Details Formatted Diff Diff
Patch (43.13 KB, patch) 2013-03-18 22:37 PDT, Glenn Adams	no flags	Details Formatted Diff Diff
Patch (71.00 KB, patch) 2013-03-19 14:38 PDT, Glenn Adams	no flags	Details Formatted Diff Diff
Line layout performance results (en, jp) for attachment 193924. (46.50 KB, text/html) 2013-03-19 14:44 PDT, Glenn Adams	no flags	Details
Parser (html5-full-render) performance results for attachment 193924. (32.53 KB, text/html) 2013-03-19 14:47 PDT, Glenn Adams	no flags	Details
Patch (73.12 KB, patch) 2013-03-20 19:43 PDT, Glenn Adams	no flags	Details Formatted Diff Diff
Patch (263.95 KB, patch) 2013-03-21 18:29 PDT, Glenn Adams	no flags	Details Formatted Diff Diff
Archive of layout-test-results from webkit-ews-02 (1.16 MB, application/zip) 2013-03-22 08:50 PDT, Build Bot	no flags	Details
Patch (79.70 KB, patch) 2013-03-26 19:02 PDT, Glenn Adams	glenn: review- webkit.review.bot: commit-queue-	Details Formatted Diff Diff
Archive of layout-test-results from gce-cr-linux-06 for chromium-linux-x86_64 (1.76 MB, application/zip) 2013-03-26 21:18 PDT, WebKit Review Bot	no flags	Details
Archive of layout-test-results from webkit-ews-16 for mac-mountainlion-wk2 (1.15 MB, application/zip) 2013-03-26 22:12 PDT, Build Bot	no flags	Details
Archive of layout-test-results from gce-cr-linux-03 for chromium-linux-x86_64 (1.89 MB, application/zip) 2013-03-26 22:20 PDT, WebKit Review Bot	no flags	Details
Patch (625.42 KB, patch) 2013-03-27 23:22 PDT, Glenn Adams	no flags	Details Formatted Diff Diff
Patch (226.94 KB, patch) 2013-04-02 20:17 PDT, Glenn Adams	no flags	Details Formatted Diff Diff
Patch (226.99 KB, patch) 2013-04-02 20:36 PDT, Glenn Adams	no flags	Details Formatted Diff Diff
Patch (583.64 KB, patch) 2013-04-02 22:00 PDT, Glenn Adams	no flags	Details Formatted Diff Diff
Show Obsolete (17) View All Add attachment proposed patch, testcase, etc.

benjamin
buildbot
darin
dglazkov
dujid0
enrica
eric
esprehn+autocc
glenn
graouts
gyuyoung.kim
j2.nete
leviw
mark.salisbury
mitz
msaboff
ojan.autocc
rakuco
rniwa
simon.fraser
syoichi
webkit-ews
webkit.review.bot