http://dev.w3.org/csswg/css3-text/#line-break has introduced the following values for line-break. ‘auto’ The UA determines the set of line-breaking restrictions to use, and it may vary the restrictions based on the length of the line; e.g., use a less restrictive set of line-break rules for short lines. ‘loose’ Breaks text using the least restrictive set of line-breaking rules. Typically used for short lines, such as in newspapers. ‘normal’ Breaks text using the most common set of line-breaking rules. ‘strict’ Breaks text using the most stringent set of line-breaking rules. Depending on the value specified, line breaking behavior has to be different. For ports that use ICU, one possibility is to create separate line breaking iterators instances with different tailorings for each value. The other possibility (ports that can easily change ICU itself such as Chromium) is to add tailored line breaking rule files to ICU ( line_ja_strict for strict mode in Japanese, line_ja_loose for loose mode in Japanese, etc) and pick one by changing the locale name. See also http://unicode.org/cldr/trac/ticket/4931 (CLDR feature request to parameterize line breaking rules).
Another related question is which of three values has to be the default. IE and Firefox use 'normal' as the default. The current Safari and Chrome behavior (based on ICU 4.x) is roughly 'strict', I believe.
Regarding the default selection, allow me to add some background just in case this helps discussions. Traditional documents usually use "strict" even today, and since JLREQ[1]'s goal is to document traditional typesetting, it recommends "strict" as default. However, casual documents tend to prefer "normal" today. MS Word defaults to similar setting as "normal," IE and Firefox as well. There was a discussion in Japan to change JLREQ's default to "normal," but we concluded that JLREQ should follow tradition given its goal. Making normal as default has another benefit for small screen because strict rule will cause more justification. Less justifications are usually considered better. I prefer making normal as default for Web for reasons above, and allowing author to select would be great for e-book publishers. [1] http://www.w3.org/TR/jlreq/
A feature request at ICU is here: http://bugs.icu-project.org/trac/ticket/9379 Request to add Japanese linebreak tailoring selectable as variations
if someone else is already is working this, let me know and we can coordinate; i'm planning to implement natively as a first pass, i.e., without making use of ICU, in order to obtain behavioral consistency across ports
(In reply to comment #4) > i'm planning to implement natively as a first pass, i.e., without making use of ICU, in order to obtain behavioral consistency across ports scratch that last; now that i've looked things over, i expect i can use ICU but supply a custom rule set
(In reply to comment #5) > (In reply to comment #4) > > i'm planning to implement natively as a first pass, i.e., without making use of ICU, in order to obtain behavioral consistency across ports > > scratch that last; now that i've looked things over, i expect i can use ICU but supply a custom rule set How are you trying to "use ICU but supply a custom rule set"? Are you going to wrap ICU with additional code, or to change ICU itself? Could you please share the plan before you got too much code written? Jungshik Shin manages ICU in Chromium, so I'd like to make sure he's ok with your plan. If we modify ICU, how to upstream the changes to ICU is an issue. If we don't upstream, we need to consider how we'd keep merging the local changes to future ICU. Some platforms use ICU in OS, so that'd be another concern. If you're wrapping ICU line breaking APIs, we don't have to worry about upstreaming changes to ICU. It aligns to some requests to add @line-break-rule to allow authors fully control break-before/after characters in future levels of CSS Text. The risk here is we may need a bit more code than modifying ICU.
(In reply to comment #6) > (In reply to comment #5) > > (In reply to comment #4) > > > i'm planning to implement natively as a first pass, i.e., without making use of ICU, in order to obtain behavioral consistency across ports > > > > scratch that last; now that i've looked things over, i expect i can use ICU but supply a custom rule set > > How are you trying to "use ICU but supply a custom rule set"? Are you going to wrap ICU with additional code, or to change ICU itself? Could you please share the plan before you got too much code written? Jungshik Shin manages ICU in Chromium, so I'd like to make sure he's ok with your plan. By using ubrk_openRules instead of ubrk_open (from LineBreakIteratorPool::take); at present, ICU does not admit any customization parameters other than the locale string itself when constructing a BreakIterator. Since there is no locale intrinsic mechanism for specifying break iterator variants (without introducing a new locale string keyword, e.g., @break), that leaves us with having to use ubrk_openRules and specify the rule set from the outside, e.g., in a fashion similar to cursorBreakIterator (cf. TextBreakIteratorICU.cpp). Thus I'm leaning towards using ubrk_openRules as an initial approach pending further consideration of support in ICU itself (see below). > > If we modify ICU, how to upstream the changes to ICU is an issue. If we don't upstream, we need to consider how we'd keep merging the local changes to future ICU. Some platforms use ICU in OS, so that'd be another concern. My thinking for subsequent migration of this to ICU is to add an @break keyword to the ICU supported locale, e.g., "ja@break=loose", etc. This would load a different set of rules than currently found in icu/source/data/brkitr/line_ja.txt, e.g., loading instead from icu/source/data/brkitr/line_ja_loose.txt. > > If you're wrapping ICU line breaking APIs, we don't have to worry about upstreaming changes to ICU. It aligns to some requests to add @line-break-rule to allow authors fully control break-before/after characters in future levels of CSS Text. The risk here is we may need a bit more code than modifying ICU. Presuming that the semantics of the current CSS3 line-break values can be effectively encoded using the ICU break rules syntax (which I am now becoming familiar with), then using ubrk_openRules should minimize the additional wrapper code needed, reducing it to a number of data sets containing the necessary rules. Unfortunately, it doesn't look like there is an easy way to have one set of rules inherit from another set of rules, which means violating DRY principles. I notice, e.g., that line_ja.txt copies the rules from line.txt modifying as needed. We will need to end up with something like the following sets: line_loose.txt <= copies from line.txt with mods line_normal.txt <= copies from line.txt (or is same as line.txt?) line_strict.txt <= copies from line.txt with mods line_ja_loose.txt <= copies from line_ja.txt with mods line_ja_normal.txt <= copies from line_ja.txt with mods line_ja_strict.txt <= copies from line_ja.txt with mods (this may be same as current line_ja.txt) line_zh.txt <= copies from line.txt with mods line_zh_loose.txt <= copies from line_zh.txt with mods line_zh_normal.txt <= copies from line_zh.txt (with mods?) line_zh_strict.txt <= copies from line_zh.txt with mods Comments?
I've created two wiki pages that attempt to describe where I am going with resolving this bug. * https://trac.webkit.org/wiki/LineBreaking * https://trac.webkit.org/wiki/LineBreakingCSS3Mapping Please review and comment.
Looks good to me. You might know but please note that each port uses different versions of ICU. Chromium has its own copy in the source tree. I don't know how much ICU changes its APIs across versions, but I wish your attention here.
(In reply to comment #9) > Looks good to me. You might know but please note that each port uses different versions of ICU. Chromium has its own copy in the source tree. I don't know how much ICU changes its APIs across versions, but I wish your attention here. thanks for that heads up
this bug is expected to change the values of the -webkit-line-break property; in particular, the following changes are expected: (1) add 'auto', 'loose', and 'strict' values (as defined in CSS3 Text); (2) change value 'normal' to mean as defined by CSS3 Text; (3) change initial (default) value to 'auto' this means that authors expecting the default value returned to be 'normal' will now get 'auto', and the meaning of 'normal' will (very slightly) change from the previous use of normal; before the change, 'normal' translated to the use of ICU's default line breaking rules, possibly determined by locale (only 'ja' and 'fi' locales have a different set of rules in ICU 49); after the change, 'normal' will mean the same thing, plus the rules of CSS3 Text that apply to 'normal' will apply as well; see the following links for further details: * https://trac.webkit.org/wiki/LineBreaking * https://trac.webkit.org/wiki/LineBreakingCSS3Mapping note that no patch has yet been submitted or applied, so the above information is forward looking, and may change before this bug is closed
Created attachment 162188 [details] Patch
Created attachment 162213 [details] Patch
Has this feature been announced on webkit-dev? I saw an e-mail discussing testing strategy, but not a discussion of the feature itself.
Comment on attachment 162213 [details] Patch Attachment 162213 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/13755805 New failing tests: fast/css3-line-break/line-break-loose-inseparables-ko.html fast/css3-line-break/line-break-loose-centered-zh.html fast/css3-line-break/line-break-auto-sound-marks-en.html fast/css3-line-break/line-break-auto-hyphens-ja.html fast/css3-line-break/line-break-auto-sound-marks-ko.html fast/css3-line-break/line-break-loose-iteration-marks-ja.html fast/css3-line-break/line-break-auto-half-kana-en.html fast/css3-line-break/line-break-loose-prefixes-ja.html fast/css3-line-break/line-break-loose-iteration-marks-zh.html fast/css3-line-break/line-break-auto-hyphens-ko.html fast/css3-line-break/line-break-loose-hyphens-ko.html fast/css3-line-break/line-break-loose-postfixes-ja.html fast/css3-line-break/line-break-loose-iteration-marks-ko.html fast/css3-line-break/line-break-loose-centered-ko.html fast/canvas/webgl/shader-precision-format.html fast/css3-line-break/line-break-loose-prefixes-ko.html fast/css3-line-break/line-break-loose-centered-ja.html fast/css3-line-break/line-break-loose-inseparables-en.html fast/css3-line-break/line-break-loose-hyphens-ja.html fast/css3-line-break/line-break-auto-hyphens-zh.html fast/css3-line-break/line-break-auto-sound-marks-zh.html fast/css3-line-break/line-break-loose-hyphens-zh.html fast/css3-line-break/line-break-loose-postfixes-zh.html fast/css3-line-break/line-break-loose-postfixes-ko.html fast/css3-line-break/line-break-loose-iteration-marks-en.html fast/css3-line-break/line-break-auto-sound-marks-ja.html fast/css3-line-break/line-break-loose-inseparables-ja.html fast/css3-line-break/line-break-loose-inseparables-zh.html
Thank you for the patch. (In reply to comment #5) > i expect i can use ICU but supply a custom rule set That's my approach #1 in comment #0. It works without for all ports that use ICU regardless of ICU versions used by a particular port (as long as Unicode properties you use in your rules are well-defined in the version of ICU used by a port). Not having dependency on the version of ICU (and CLDR used by ICU). Then, we don't have to wait for an ICU bug [1] and a CLDR bug [2] to be fixed. OTOH, the size of webkit binary will be slightly larger. And, we have to make sure that Unicode properties referred to in your rules are all available in ICU versions used by webkit ports. Because it takes time to fix ICU and CLDR issues and for a fixed version to be shipped with Mac OS/iOS (ports like Chrome can apply its own patches to ICU easily, but that's not the case for other ICU ports), your current approach is the only viable way to make all ICU-based webkit ports behave consistently. A slight variation would be to add #ifdef (either port-dependency or ICU-version dependency) and use custom rules for ports-using-older-icu and use ICU's upcoming api [1] for ports-using-newer-icu. [1] ttp://bugs.icu-project.org/trac/ticket/9379 [2] http://unicode.org/cldr/trac/ticket/4931
rewriting my previous comment: There can be two code paths (if #ifdef'd codes for different ports in this part of Webkit source is acceptable) : 1. Use Adam's patch for webkit-port-using-old ICU 2a Use a new upcoming ICU API (yet-to-be-implemented) for webkit-port-using-new ICU 2b Append '@css3lb={loose,strict}' to a locale name when creating a linebreak iterator for webkit ports using an ICU with separate line breaking rules for CSS3 loose and strict. No webkit port using ICU has this, but we can add them easily to Chrome. Because an ICU API to parameterize line breaking rules not just with a locale but also with CSS3 line breaking mode is not yet implemented anywhere, we can also consider taking approach 2b in the meantime for ports like Chromium that can easily add additional line breaking rules to its ICU. 2b is the second approach I proposed in the original bug report.
(In reply to comment #17) > rewriting my previous comment: > > There can be two code paths (if #ifdef'd codes for different ports in this part of Webkit source is acceptable) : We should avoid any option that requires #ifdef code if there is a viable alternative. > > 1. Use Adam's patch for webkit-port-using-old ICU This is the only viable option until (and if) ICU itself is updated to directly support all of the CSS3 LB semantics. As a preliminary step, I am using a locale with keyword in preparation for such eventuality. I used the following keywords (see patched LineBreakIteratorPool::take): @break=loose @break=normal @break=strict I would suggest a general keyword name like 'break' rather than 'css3lb'. Though, we could define css3 specific values for it, such as 'css3loose', 'css3normal', 'css3strict'. We do need an explicit 'normal' (or equivalent) since its semantics (as specified by CSS3 Text) does not match the current (builtin) ICU rules. > 2a Use a new upcoming ICU API (yet-to-be-implemented) for webkit-port-using-new ICU We can migrate to that over time when it becomes available. But that shouldn't prevent us from starting with the current approach (using WK copy of rules, parameterized in WK). > 2b Append '@css3lb={loose,strict}' to a locale name when creating a linebreak iterator for webkit ports using an ICU with separate line breaking rules for CSS3 loose and strict. No webkit port using ICU has this, but we can add them easily to Chrome. See above. > Because an ICU API to parameterize line breaking rules not just with a locale but also with CSS3 line breaking mode is not yet implemented anywhere, we can also consider taking approach 2b in the meantime for ports like Chromium that can easily add additional line breaking rules to its ICU. 2b is the second approach I proposed in the original bug report. I think we should use approach (1) for now. I don't see any need for a Chrome specific modification if its current ICU implementation supports the approach of (1). To be clear, the short term approach should be (1), while the long term approach can be (2a) a future version of released ICU that supports a @break keyword and the necessary semantics, with these changes backported into Chromes' flavor of ICU.
Created attachment 162373 [details] Patch
Please do answer the question in comment 14.
(In reply to comment #14) > Has this feature been announced on webkit-dev? I saw an e-mail discussing testing strategy, but not a discussion of the feature itself. http://mac-os-forge.2317878.n4.nabble.com/adding-support-for-css3-line-break-property-to-webkit-line-break-td192326.html
Created attachment 162404 [details] Patch
Approach looks pretty good to me. Can you talk briefly about what the difference is between auto and normal, since that is the only compatibility concern I see here?
@Glenn, I'm fine with custom-rules hardcoded into Webkit source code as it's done in your patch. It's less work and less cluttering (no #ifdef's) in Webkit tree with negligible code size increase. The only concern is that your patch is still ICU-version dependency because [:LineBreak = Conditional_Japanese_Starter:] was relatively recently introduced and not all ICU versions used by webkit-ports support that. +static const char* uax14AssignmentsBefore = + "$CJ = [:LineBreak = Conditional_Japanese_Starter:];"; To make your patch ICU-version-independent, you have to specify the set explicitly instead of relying on LB=C_J_S. Other sets used also have some ICU-version dependeny, but I think we just have to live with those differences unless we want to hard-code Unicode line breaking and other character properties in webkit source files (which I don't think we do).
Created attachment 162646 [details] Patch
(In reply to comment #24) > The only concern is that your patch is still ICU-version dependency because [:LineBreak = Conditional_Japanese_Starter:] was relatively recently introduced and not all ICU versions used by webkit-ports support that. Thanks for that pointer, as that is probably what has been causing fails on cr-linux EWS, which I see uses ICU46 (patched). I've removed reference to C_J_S and fully enumerated $CJ. Just re-submitted patch and will see if that fixes it. [I guess i'm just going to have to install and start building/testing cr-linux; guessing is not the best debugging technique for resolving port-specific fails.]
Created attachment 162654 [details] Patch
(In reply to comment #23) > Approach looks pretty good to me. Can you talk briefly about what the difference is between auto and normal, since that is the only compatibility concern I see here? From the perspective of the prior -webkit-line-break behavior, 'normal' effectively mapped to ICU's default LB rules, modulo locale specific tailoring, of which only a few locale specific rule sets were available in ICU: * default (i.e., no locale or locale with no rule set) * 'ja' locale * 'fi' locale [These are the 3 LB rule sets found in ICU50, which none of WK's current platforms use yet. Some are using ICU49, cr-linux uses a locally modified version of ICU46, etc.] Given different platforms provide different versions of ICU (in general) and different locale specific LB rule sets, the behavior of 'normal' has been somewhat variable. In contrast, CSS3 Text 'normal' prescribes specific LB rules for certain CJK related character subsets. Some of these apply if locale is CJK and some apply in non-CJK cases. For other characters, CSS3 Text is relatively silent on LB behavior, merely deferring to the default line break rules defined in [1], of which "soft break opportunity" applies in most cases. So, 'normal' in the pre-CSS3 case and in the CSS3 case differ primarily in these CJK related characters. The definition of 'auto' in CSS3 Text is non-specific, and leaves it to UA dependent behavior except for a small number of mandatory rules defined in [1]. What I've done is to map 'auto' to the same behavior of the *old* 'normal' when locale is unspecified or is not CJK; in the case of 'auto' and locale is CJK, then I map to the *new* (i.e., CSS3 Text) 'normal'. This approach seems to provide the best backward compatibility for the *old* default 'normal' while migrating towards the *new* 'normal' in CJK behavior in the case of a defaulted 'auto' value. More details can be found at [2]. [1] http://dev.w3.org/csswg/css3-text/#line-break-details [2] https://trac.webkit.org/wiki/LineBreakingCSS3Mapping
Created attachment 162663 [details] Patch
break_lines.cpp is incredibly hot code. You should be sure to run PerformanceTests/Layout/line-breaking.html to make sure you're not regressing ascii-fast-path line breaking.
(In reply to comment #30) > break_lines.cpp is incredibly hot code. You should be sure to run PerformanceTests/Layout/line-breaking.html to make sure you're not regressing ascii-fast-path line breaking. I'm getting the following results from line-layout.html (perftest) on revision 127813, using a MBP (Retina, Mid 2012), release configuration: Without Patch: avg 92.15439425716251 runs/s median 92.10526315789474 runs/s stdev 0.2407525458781455 runs/s min 91.62303664921465 runs/s max 92.5925925925926 runs/s With Patch: avg 93.33876206192781 runs/s median 93.28671328671328 runs/s stdev 0.2669316373954801 runs/s min 92.96148738379814 runs/s max 94.00705052878966 runs/s So the patch is showing a slight performance increase, though I'm at a loss to explain it.
Comment on attachment 162663 [details] Patch Attachment 162663 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/13776847 New failing tests: fast/css3-line-break/line-break-loose-inseparables-ko.html fast/css3-line-break/line-break-loose-centered-zh.html fast/css3-line-break/line-break-auto-sound-marks-en.html fast/css3-line-break/line-break-auto-hyphens-ja.html fast/css3-line-break/line-break-auto-sound-marks-ko.html fast/css3-line-break/line-break-loose-iteration-marks-ja.html fast/css3-line-break/line-break-auto-half-kana-en.html fast/css3-line-break/line-break-loose-prefixes-ja.html fast/css3-line-break/line-break-loose-iteration-marks-zh.html fast/css3-line-break/line-break-auto-hyphens-ko.html fast/css3-line-break/line-break-loose-sound-marks-en.html fast/css3-line-break/line-break-loose-hyphens-ko.html fast/css3-line-break/line-break-loose-postfixes-ja.html fast/css3-line-break/line-break-loose-iteration-marks-ko.html fast/css3-line-break/line-break-loose-centered-ko.html fast/canvas/webgl/shader-precision-format.html fast/css3-line-break/line-break-loose-prefixes-ko.html fast/css3-line-break/line-break-loose-centered-ja.html fast/css3-line-break/line-break-loose-inseparables-en.html fast/css3-line-break/line-break-loose-hyphens-ja.html fast/css3-line-break/line-break-auto-hyphens-zh.html fast/css3-line-break/line-break-auto-sound-marks-zh.html fast/css3-line-break/line-break-loose-hyphens-zh.html fast/css3-line-break/line-break-loose-prefixes-zh.html fast/css3-line-break/line-break-loose-postfixes-zh.html fast/css3-line-break/line-break-loose-postfixes-ko.html fast/css3-line-break/line-break-loose-iteration-marks-en.html fast/css3-line-break/line-break-auto-sound-marks-ja.html fast/css3-line-break/line-break-loose-inseparables-ja.html fast/css3-line-break/line-break-loose-inseparables-zh.html
> Without Patch: > > avg 92.15439425716251 runs/s > median 92.10526315789474 runs/s > stdev 0.2407525458781455 runs/s > min 91.62303664921465 runs/s > max 92.5925925925926 runs/s > > With Patch: > > avg 93.33876206192781 runs/s > median 93.28671328671328 runs/s > stdev 0.2669316373954801 runs/s > min 92.96148738379814 runs/s > max 94.00705052878966 runs/s > > So the patch is showing a slight performance increase, though I'm at a loss to explain it. Did you just that just once? or are those results the median? If it is one run, you cannot trust those results. Run the test a couple of time without the patch and verify the results are stable. If they are not, find what is causing the variation on your machine. Keep all the results, and run the test again a few times with the patch. The result of the very first run is also particularly unreliable, even on a stable setup. /me wishes there was an automated way to run WebCore tests.
Do you mean like Tools/Scripts/run-perf-tests? Or were you looking for a full solution like autovicki (old safari tool)?
(In reply to comment #34) > Do you mean like Tools/Scripts/run-perf-tests? Or were you looking for a full solution like autovicki (old safari tool)? I mean something to mitigate the issues described above. Maybe I should modify run-perf-tests to do 1) one warmup launch 2) run the tests a dozen of time 3) report the median results (and tell me if the variance is reasonable). For some reason, on my setup, I have more reliable results with tests like Sunspider than when I use run-perf-tests. With run-perf-tests, I often run tests 3-4 times to make sure what I measure is correct. Am I the only one with that issue? :-D
Glenn's results already include standard deviation.
I would encourage you to file bugs against run-perf-tests and cc rniwa. I suspect he's eager to hear your feedback (and help you hack on it yourself if you so desire).
(In reply to comment #33) > > So the patch is showing a slight performance increase, though I'm at a loss to explain it. > > Did you just that just once? or are those results the median? I elided the individual test run results above, of which there were 20, with the initial one ignored. Just load PerformanceTests/Layout/line-layout.html to repeat.
(In reply to comment #36) > Glenn's results already include standard deviation. As reported by the test. Which is the one I was explaining has been unreliable in my experience.
Created attachment 163306 [details] Patch
(In reply to comment #26) > (In reply to comment #24) > > The only concern is that your patch is still ICU-version dependency because [:LineBreak = Conditional_Japanese_Starter:] was relatively recently introduced and not all ICU versions used by webkit-ports support that. Turns out that :LineBreak = Hebrew_Letter: also doesn't work on ICU46 (cr-linux). I've handled this by using the older style definition of [[:Hebrew:] & [:Letter:]]. There are also a couple of minor differences in the handling of small kana and prolonged sound marks between the default ICU49 and ICU46 non-CJK locale behaviors that affects 'auto' behavior with non-CJK @lang. I've addressed this by adding two reftest expected variants under platform/chromium-linux. The latest patch attachment 163306 [details] contains these fixes, as well as a reworking of the break_lines changes to use template parameters to remove possible performance effects from special casing of loose mode behavior.
Comment on attachment 163306 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=163306&action=review > Source/WebCore/platform/text/LineBreakIteratorPoolICU.h:61 > + else { > + StringBuilder sb; > + sb.append(locale); > + sb.appendLiteral("@break="); > + switch (mode) { Do we have any perf testing of this path? Building this string may turn out to be expensive in this possibly hot code path. It's not clear to me how often we make TextBreakIterators. > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:93 > + if (!locale.isEmpty()) { nit: I would have written this as an early return. If (locale.isEmpty()) return false; > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:96 > + String language = String::createUninitialized(languageBufferLength, reinterpret_cast<LChar*&>(languageBuffer)); I'm confused. How can this work? > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:101 > + language.truncate(languageLength); > + language.makeLower(); More mallocs, are going to likely make this path expensive...
Comment on attachment 163306 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=163306&action=review > Source/WebCore/rendering/break_lines.cpp:150 > -template<bool treatNoBreakSpaceAsBreak> > +template<bool treatNoBreakSpaceAsBreak, bool isLooseMode> Wait a minute, if the previous change improved performance as you claim, why did you change it to use a template? Making the generated asm bigger is not good. Also two bools is a bad idea here. You should transform that to two enums.
(In reply to comment #42) > (From update of attachment 163306 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=163306&action=review > > > Source/WebCore/platform/text/LineBreakIteratorPoolICU.h:61 > > + else { > > + StringBuilder sb; > > + sb.append(locale); > > + sb.appendLiteral("@break="); > > + switch (mode) { > > Do we have any perf testing of this path? Building this string may turn out to be expensive in this possibly hot code path. It's not clear to me how often we make TextBreakIterators. This path is taken only if author specifies an explicit line break mode of 'loose', 'normal', or 'strict'. If author specifies 'auto' or the default (initial) value 'auto' applies, then the non-allocating path if (mode == LineBreakIteratorModeUAX14) localeWithOptionalBreakKeyword = locale; is taken. If author does specify loose|normal|strict, then the string building path is taken on the first get() on LazyLineBreakIterator after a reset() of the same, which occurs once for each RenderText object enumerated by RenderBlock::LineBreaker::nextLineBreak (provided the text differs from the current text set into the LazyLineBreakIterator). I agree that the string building path could be further optimized for this latter case. I chose to take the simpler approach at this juncture, which effectively means modifying the (AtomicString) key used by the LineBreakIteratorPool.m_vendedIterators hash map: HashMap<TextBreakIterator*, AtomicString> m_vendedIterators; Since the allocating path is taken only when author uses a non-default line break mode, it seemed an acceptable choice at this time. [An optimization could be a subsequent bug/patch.] > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:93 > > + if (!locale.isEmpty()) { > > nit: I would have written this as an early return. If (locale.isEmpty()) return false; will fix in a new patch > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:96 > > + String language = String::createUninitialized(languageBufferLength, reinterpret_cast<LChar*&>(languageBuffer)); > > I'm confused. How can this work? > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:101 > > + language.truncate(languageLength); > > + language.makeLower(); > > More mallocs, are going to likely make this path expensive... will rewrite this code in a new patch
Comment on attachment 163306 [details] Patch pending updated patch
(In reply to comment #43) > (From update of attachment 163306 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=163306&action=review > > > Source/WebCore/rendering/break_lines.cpp:150 > > -template<bool treatNoBreakSpaceAsBreak> > > +template<bool treatNoBreakSpaceAsBreak, bool isLooseMode> > > Wait a minute, if the previous change improved performance as you claim, why did you change it to use a template? After further analysis of my performance tests, I concluded there was insufficient statistical evidence of an improvement or regression. However, analyzing the code itself showed that the insertion of two references to a boolean (isLooseMode) in the inner loop could have a negative effect. Based on your recent change in [1] to templatize nextBreakablePosition with a boolean treatNoBreakSpaceAsSpace parameter, I chose to take the same approach with isLooseMode. [1] http://trac.webkit.org/changeset/127974 > Making the generated asm bigger is not good. Do you have a suggested alternative approach that avoids use of isLooseMode in the inner loop of nextBreakablePosition? > Also two bools is a bad idea here. You should transform that to two enums. I'm not sure I understand you. Are you suggesting using (1) two new enum types each with two values that correspond to the true and false conditions or (2) a single new type that enumerates all four possibilities (of the two unrelated parameters)?
> > Making the generated asm bigger is not good. > > Do you have a suggested alternative approach that avoids use of isLooseMode in the inner loop of nextBreakablePosition? No, I don't have a better idea :( It is likely the safest solution. > > Also two bools is a bad idea here. You should transform that to two enums. > > I'm not sure I understand you. Are you suggesting using (1) two new enum types each with two values that correspond to the true and false conditions or (2) a single new type that enumerates all four possibilities (of the two unrelated parameters)? 1! :) It will be easier to read. E.g.: nextBreakablePosition<IncludeBreakOnNBSP, LooseBreakMode>(lazyBreakIterator, pos);
Created attachment 163806 [details] Patch
Created attachment 163814 [details] Patch
Comment on attachment 163814 [details] Patch This updated patch addresses the comments from Eric (comment #42) and Benjamin (comment #43); in addition, additional performance improvements were implemented to ensure no regression in the default case of use of default line break mode.
Created attachment 163844 [details] performance test results and comparison This attachment contains the results of performing run-perf-tests on PerformanceTests/Layout/line-layout.html unpatched revision #128140 and patched version (using attachment #163814 [details]). In addition, the results of running run-perf-tests on the new performance tests contained in the patch are provided. As can be seen, there is no regression in the default case (no use of -webkit-line-break), and very little or no difference even when -webkit-line-break is used when @lang is 'en'. In the case of explicit use of -webkit-line-break and @lang of 'ja', some slight performance drop is apparent, which is to be expected.
Comment on attachment 163814 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=163814&action=review > Source/WebCore/platform/text/LineBreakIteratorPoolICU.h:60 > + if (mode == LineBreakIteratorModeUAX14) > + localeWithOptionalBreakKeyword = locale; > + else { > + StringBuilder sb; > + sb.append(locale); > + sb.appendLiteral("@break="); sb is a poor name for a variable. (and following lines): Indent. > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:113 > + if (c1 == 'c' && c2 == 'h') > + return true; > + if (c1 == 'j' && (c2 == 'a' || (c2 == 'p'))) > + return true; > + if (c1 == 'k' && c2 == 'o') > + return true; > + if (c1 == 'z' && c2 == 'h') > + return true; > + if (c1 == 'C' && c2 == 'H') > + return true; > + if (c1 == 'J' && (c2 == 'A' || (c2 == 'P'))) > + return true; > + if (c1 == 'K' && c2 == 'O') > + return true; > + if (c1 == 'Z' && c2 == 'H') > + return true; If this is a hotspot, you should aggregate by c2: if (c2 == 'h') { // test c and z } if (c2 == 'H') { // test C and Z } > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:125 > + const String& s(locale.string()); > + size_t sLen = s.length(); > + if (s.is8Bit()) > + return isCJK<LChar>(s.characters8(), sLen); > + return isCJK<UChar>(s.characters16(), sLen); "s" is a poor variable name. You don't need to create s at all, AtomicString now has characters8() and characters16(). > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:738 > + StringBuilder sb; "sb". > Source/WebCore/rendering/break_lines.cpp:175 > + // don't use ASCII shortcut (shouldBreakAfter) if loose mode Comment style: Comments are sentences in WebKit. It is the kind of comment that could be advantageously turned into an inline function. > Source/WebCore/rendering/break_lines.cpp:180 > + // always use line break iterator if loose mode > + if ((looseBehavior == LooseMode) || needsLineBreakIterator<nbspBehavior>(ch) || needsLineBreakIterator<nbspBehavior>(lastCh)) { Ditto.
Created attachment 164031 [details] Patch
Comment on attachment 164031 [details] Patch address comments in comment #52; benjaminp suggests that mitz perform final review
Comment on attachment 164031 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=164031&action=review > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:92 > +static bool isCJK(const T* s, size_t len) Why are you treating 'ch' and 'jp' as if they're 'chn' and 'jpn'? I don't think we have to accept ISO 639 3-letter codes. Even if we do, just checking the 1st two letter does not work because there are ISO 639 3-letter codes starting with 'ch' for languages other than Chinese? See http://www.loc.gov/standards/iso639-2/php/code_list.php. Here's the list. cha ch Chamorro chamorro chb Chibcha chibcha che ce Chechen tchétchène chg Chagatai djaghataï chi (B) zho (T) zh Chinese chinois chk Chuukese chuuk chm Mari mari chn Chinook jargon chinook, jargon cho Choctaw choctaw chp Chipewyan; Dene Suline chipewyan chr Cherokee cherokee chu cu Church Slavic; Old Slavonic; Church Slavonic; Old Bulgarian; Old Church Slavonic slavon d'église; vieux slave; slavon liturgique; vieux bulgare chv cv Chuvash tchouvache chy Cheyenne cheyenne Even worse is that 'ch' (ISO 639 2-letter code) is Chamorro. Given all these, I think we just have to support the shortest ISO-639 codes for CJK ('zh', 'ja', 'ko'). That's also what BCP 47 says. (http://tools.ietf.org/html/bcp47 ) language = 2*3ALPHA ; shortest ISO 639 code ["-" extlang] ; sometimes followed by ; extended language subtags / 4ALPHA ; or reserved for future use / 5*8ALPHA ; or registered language subtag
Comment on attachment 164031 [details] Patch pending fix for comment #55
Created attachment 164255 [details] Patch
Comment on attachment 164255 [details] Patch address comment #55
Comment on attachment 164255 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=164255&action=review > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:103 > + if (!c3 || c3 == '-') { I was about to ask you to do this (after my comment #55), but you beat me to that. In addition, I'd like to suggest that you make this a bit more lenient by allowing '_' (underscore) and '@' to support cases like 'ja_JP', 'zh@foo_bar'
Comment on attachment 164255 [details] Patch pending fix for comment #60
Created attachment 164656 [details] address comment #60
Comment on attachment 164656 [details] address comment #60 all comments addressed; please review and land
Thank you for addressing my comments about locale check. It looks good to me. It'd be great if hyatt or mitz can review+ this patch. BTW, Chrome is losing some changes it has made locally in its version of ICU to better handle Hebrew letters and some CJK punctuation marks (upstream bugs against CLDR were filed a while ago, but they're not yet resolved at UTC ). Now that we're including the full source of line breaking rules in webkit source, I'll file a follow-up bug to integrate those changes to webkit.
Sorry for the late comment, but EPUB 3 CSS Profile[1] defines "-epub-line-break" as an alias to the line-break property. Could you please add this one? [1] http://epub-revision.googlecode.com/svn/trunk/build/30/spec/epub30-contentdocs.html#sec-css-profile
(In reply to comment #65) > Sorry for the late comment, but EPUB 3 CSS Profile[1] defines "-epub-line-break" as an alias to the line-break property. Could you please add this one? > > [1] http://epub-revision.googlecode.com/svn/trunk/build/30/spec/epub30-contentdocs.html#sec-css-profile SInce this is really orthogonal to defining the CSS3 line-break semantics, I would prefer *not* to do this in this patch, but in a follow-on patch once this one has been applied. Is that acceptable?
(In reply to comment #66) > (In reply to comment #65) > > Sorry for the late comment, but EPUB 3 CSS Profile[1] defines "-epub-line-break" as an alias to the line-break property. Could you please add this one? > > > > [1] http://epub-revision.googlecode.com/svn/trunk/build/30/spec/epub30-contentdocs.html#sec-css-profile > > SInce this is really orthogonal to defining the CSS3 line-break semantics, I would prefer *not* to do this in this patch, but in a follow-on patch once this one has been applied. Is that acceptable? That's reasonable, thank you for the comment.
Created attachment 168709 [details] Patch
Created attachment 168884 [details] Patch
Created attachment 168899 [details] Patch
Comment on attachment 168899 [details] Patch update from recent changes to HEAD to ensure patch succeeds; move new layout tests from fast/css3-line-break to css3/line-break due to number of tests producing relatively slow overall test time
Created attachment 171129 [details] Patch
Comment on attachment 171129 [details] Patch update from recent changes to HEAD to ensure patch succeeds
Created attachment 171259 [details] Patch
Could we land the PerformanceTest changes first? Those are non-controversial and would reduce the size of the change. :) I've read through your patch once. I'll need to go through it a couple more times. Overall it looks fine. The big question in my head is performance, since break_lines.* is ridiculously hot code.
Comment on attachment 171259 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=171259&action=review The code change overall looks fine. There are some style nits which I mentioned above. The test cases seem to be needlessly repetitive (a little bit of JavaScript could probably have generated most of these). I believe you that you didn't make things slower, and I trust that you got the ICU line breaking rules correct (it's difficult for me to verify those). I'll give this another look tomorrow. I'm pretty much ready to pull the trigger here. Mitz has had a long time to comment but we should probably give him another 24 hours. Lets talk about this again tomrorow. If you'd like to post a patch fixing some of the nits mentioned that woudl be nice. Also, if you could explain the reinterpret_cast scariness, that would also be nice. :) > Source/WebCore/platform/text/TextBreakIterator.h:49 > + TextBreakIterator* acquireLineBreakIterator(const LChar*, int length, const AtomicString& locale, LineBreakIteratorMode, bool cjk); > + TextBreakIterator* acquireLineBreakIterator(const UChar*, int length, const AtomicString& locale, LineBreakIteratorMode, bool cjk); WebKit style is probably to write isCJK for a bool (although Enums are almost always preferred). > Source/WebCore/platform/text/TextBreakIterator.h:67 > + bool isCJKLocale(const AtomicString& locale); The arg name doesn't add anything. > Source/WebCore/platform/text/TextBreakIterator.h:73 > + , m_cjk(false) isCJK. > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:283 > + UBreakIterator* ubrkIter = reinterpret_cast<UBreakIterator*>(iterator); Um. Bad idea. TextBreakIterator should expose a function or something... Or can this be a static_cast? > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:293 > - return reinterpret_cast<TextBreakIterator*>(iterator); > + return iterator; I see, this was already doing this. Still a bad idea. Is UBreakIterator a void* or something? > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:362 > + ubrkIter = ubrk_open(UBRK_LINE, localeIsEmpty ? currentTextBreakLocaleID() : locale.string().utf8().data(), 0, 0, &openStatus); locale.string().utf8().data()? I assume that ubrk_open doesn't hold onto that string? (since it will die at the end of this line). > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:600 > +static const char* uax14AssignmentsBefore = > + // explicitly enumerate $CJ since ICU versions prior to 49 don't support :LineBreak=Conditional_Japanese_Starter: > + "$CJ = [" > + "\\u3041\\u3043\\u3045\\u3047\\u3049\\u3063\\u3083\\u3085\\u3087\\u308E\\u3095\\u3096\\u30A1\\u30A3\\u30A5\\u30A7" > + "\\u30A9\\u30C3\\u30E3\\u30E5\\u30E7\\u30EE\\u30F5\\u30F6\\u30FC" > + "\\u31F0\\u31F1\\u31F2\\u31F3\\u31F4\\u31F5\\u31F6\\u31F7\\u31F8\\u31F9\\u31FA\\u31FB\\u31FC\\u31FD\\u31FE\\u31FF" > + "\\uFF67\\uFF68\\uFF69\\uFF6A\\uFF6B\\uFF6C\\uFF6D\\uFF6E\\uFF6F\\uFF70" > + "];"; Should we use #defines to key this off of the ICU version instead? If for no other reason than to allow us to remove the old code once all ports are to 49+? > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:680 > + "$HL = [[:Hebrew:] & [:Letter:]];" Do we need to key this off the ICU version (as you mentioned in the bug?) Is there any advantage to the syntax you originally tried? > Source/WebCore/rendering/RenderBlockLineLayout.cpp:2557 > + bool looseMode = false; Again, we generally prefix bools with "is" or possibly "in" or "use" in this case (at least that is my understanding of webkit style). > Source/WebCore/rendering/RenderText.cpp:944 > + default: > + return LineBreakIteratorModeUAX14; We often avoid using default: so that the compiler will tell us when we add a value to the enum and forget to add it to the switch statements. > Source/WebCore/rendering/break_lines.cpp:52 > +enum LooseBehavior { > + NonLooseMode, > + LooseMode, > +}; Yes, exactly. We use enums like this instead of bools, mostly to make callsites more readable. :) > Source/WebCore/rendering/style/RenderStyleConstants.h:214 > -enum EKHTMLLineBreak { > - LBNORMAL, AFTER_WHITE_SPACE > +enum LineBreak { > + LineBreakAuto, LineBreakLoose, LineBreakNormal, LineBreakStrict, LineBreakAfterWhiteSpace This change could have been split off and done first. It probably would have made the diff slightly smaller. You even could have plumbed in the new values before adding support for them in order to reduce the size of the diff. > Source/WebCore/rendering/style/StyleRareInheritedData.h:88 > - unsigned khtmlLineBreak : 1; // EKHTMLLineBreak > + unsigned lineBreak : 3; // LineBreak Do we have any constraints on size of this object? Presumably not, it looks rather huge to begin with. :) > LayoutTests/css3/line-break/line-break-auto-centered-ja.html:36 > + <div style="font-family:'Lucida Grande'; font-size:16pt; text-decoration:underline; width:4.1em;" lang="ja"> Even some CSS could have made this test much smaller. :)
Comment on attachment 171259 [details] Patch preparing a sub-division of this patch, so canceling this r?
Created attachment 171613 [details] Patch
Comment on attachment 171613 [details] Patch Attachment 171613 [details] did not pass qt-ews (qt): Output: http://queues.webkit.org/results/14625857
Comment on attachment 171613 [details] Patch Attachment 171613 [details] did not pass qt-wk2-ews (qt): Output: http://queues.webkit.org/results/14625858
Comment on attachment 171613 [details] Patch Attachment 171613 [details] did not pass cr-android-ews (chromium-android): Output: http://queues.webkit.org/results/14688019
Comment on attachment 171613 [details] Patch Attachment 171613 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/14677113
Comment on attachment 171613 [details] Patch Attachment 171613 [details] did not pass win-ews (win): Output: http://queues.webkit.org/results/14661208
Created attachment 171628 [details] Patch
Comment on attachment 171628 [details] Patch address comment 76 issues
(In reply to comment #76) > (From update of attachment 171259 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=171259&action=review > > Also, if you could explain the reinterpret_cast scariness, that would also be nice. :) see below > > > Source/WebCore/platform/text/TextBreakIterator.h:49 > > + TextBreakIterator* acquireLineBreakIterator(const LChar*, int length, const AtomicString& locale, LineBreakIteratorMode, bool cjk); > > + TextBreakIterator* acquireLineBreakIterator(const UChar*, int length, const AtomicString& locale, LineBreakIteratorMode, bool cjk); > > WebKit style is probably to write isCJK for a bool (although Enums are almost always preferred). done > > > Source/WebCore/platform/text/TextBreakIterator.h:67 > > + bool isCJKLocale(const AtomicString& locale); > > The arg name doesn't add anything. removed > > > Source/WebCore/platform/text/TextBreakIterator.h:73 > > + , m_cjk(false) > > isCJK. done > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:283 > > + UBreakIterator* ubrkIter = reinterpret_cast<UBreakIterator*>(iterator); > > Um. Bad idea. TextBreakIterator should expose a function or something... Or can this be a static_cast? > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:293 > > - return reinterpret_cast<TextBreakIterator*>(iterator); > > + return iterator; > > I see, this was already doing this. Still a bad idea. Is UBreakIterator a void* or something? as you notice, i didn't introduce this, just refactored a bit; basically we have the opaque type typedef struct UBreakIterator UBreakIterator; defined in either ustring.h or ubrk.h, depending on which ICU include files are used. then we have an opaque WK type class TextBreakIterator; defined in TextBreakIterator.h, but never expanded elsewhere in the case of platforms that use ICU; for platforms that don't use ICU, TextBreakIterator as well as acquireLineBreakIterator() et al, return a different opaque type or return a port specific type, in which case a different acquireLineBreakIterator() is supplied; as for using static_cast<>, i'm not sure if that is feasible here > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:362 > > + ubrkIter = ubrk_open(UBRK_LINE, localeIsEmpty ? currentTextBreakLocaleID() : locale.string().utf8().data(), 0, 0, &openStatus); > > locale.string().utf8().data()? I assume that ubrk_open doesn't hold onto that string? (since it will die at the end of this line). correct, the reference isn't held > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:600 > > +static const char* uax14AssignmentsBefore = > > + // explicitly enumerate $CJ since ICU versions prior to 49 don't support :LineBreak=Conditional_Japanese_Starter: > > + "$CJ = [" > > + "\\u3041\\u3043\\u3045\\u3047\\u3049\\u3063\\u3083\\u3085\\u3087\\u308E\\u3095\\u3096\\u30A1\\u30A3\\u30A5\\u30A7" > > + "\\u30A9\\u30C3\\u30E3\\u30E5\\u30E7\\u30EE\\u30F5\\u30F6\\u30FC" > > + "\\u31F0\\u31F1\\u31F2\\u31F3\\u31F4\\u31F5\\u31F6\\u31F7\\u31F8\\u31F9\\u31FA\\u31FB\\u31FC\\u31FD\\u31FE\\u31FF" > > + "\\uFF67\\uFF68\\uFF69\\uFF6A\\uFF6B\\uFF6C\\uFF6D\\uFF6E\\uFF6F\\uFF70" > > + "];"; > > Should we use #defines to key this off of the ICU version instead? If for no other reason than to allow us to remove the old code once all ports are to 49+? done > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:680 > > + "$HL = [[:Hebrew:] & [:Letter:]];" > > Do we need to key this off the ICU version (as you mentioned in the bug?) Is there any advantage to the syntax you originally tried? done; i use the "syntax [I] originally tried" if the ICU version is 49 or later > > > Source/WebCore/rendering/RenderBlockLineLayout.cpp:2557 > > + bool looseMode = false; > > Again, we generally prefix bools with "is" or possibly "in" or "use" in this case (at least that is my understanding of webkit style). done > > > Source/WebCore/rendering/RenderText.cpp:944 > > + default: > > + return LineBreakIteratorModeUAX14; > > We often avoid using default: so that the compiler will tell us when we add a value to the enum and forget to add it to the switch statements. removed default, but added catch all return at end of function; otherwise, some platforms failed to compile, complaining that some return path did not return a value > > > Source/WebCore/rendering/break_lines.cpp:52 > > +enum LooseBehavior { > > + NonLooseMode, > > + LooseMode, > > +}; > > Yes, exactly. We use enums like this instead of bools, mostly to make callsites more readable. :) > > > Source/WebCore/rendering/style/RenderStyleConstants.h:214 > > -enum EKHTMLLineBreak { > > - LBNORMAL, AFTER_WHITE_SPACE > > +enum LineBreak { > > + LineBreakAuto, LineBreakLoose, LineBreakNormal, LineBreakStrict, LineBreakAfterWhiteSpace > > This change could have been split off and done first. It probably would have made the diff slightly smaller. You even could have plumbed in the new values before adding support for them in order to reduce the size of the diff. done and landed (bug 100739) > > > Source/WebCore/rendering/style/StyleRareInheritedData.h:88 > > - unsigned khtmlLineBreak : 1; // EKHTMLLineBreak > > + unsigned lineBreak : 3; // LineBreak > > Do we have any constraints on size of this object? Presumably not, it looks rather huge to begin with. :) don't know of any; it seems to accrete new fields regularly > > > LayoutTests/css3/line-break/line-break-auto-centered-ja.html:36 > > + <div style="font-family:'Lucida Grande'; font-size:16pt; text-decoration:underline; width:4.1em;" lang="ja"> > > Even some CSS could have made this test much smaller. :) greatly reduce test size by using CSS and combining en|ja|ko|zh tests, left with 32 new tests instead of 128
Comment on attachment 171628 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=171628&action=review I briefly looked over this patch and have some minor nits. > Source/WebCore/platform/text/LineBreakIteratorPoolICU.h:73 > + switch (mode) { > + case LineBreakIteratorModeUAX14Loose: > + localeWithKeyword.appendLiteral("loose"); > + break; > + case LineBreakIteratorModeUAX14Normal: > + localeWithKeyword.appendLiteral("normal"); > + break; > + case LineBreakIteratorModeUAX14Strict: > + localeWithKeyword.appendLiteral("strict"); > + break; > + default: > + ASSERT_NOT_REACHED(); > + } This code is error prone. Instead of using a runtime assertion to catch an unexpected mode we should take advantage of the compiler to warn us when a switch block doesn't have a case statement for each enum value by omitting the default statement. To do this, we should extract this code into a static function. > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:325 > +static bool isCJKLocale(const T* s, size_t len) Nit: len => length By <http://www.webkit.org/coding/coding-style.html#names-full-words>, we prefer full words for variable names over abbreviations unless such abbreviations are more canonical. Changing "len" to "length" would also be consistent with the argument of the same name in acquireLineBreakIterator(), sentenceBreakIterator(), and among other text iterator functions. > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:347 > + size_t sLen = locale.length(); I take it that the 's' in "sLen" stands for string. Regardless, I would call this variable length. > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:357 > + UNUSED_PARAM(mode); I take it you plan to make use of this parameter in a follow up patch? > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:360 > + bool localeIsEmpty = locale.isEmpty(); Nit: localeIsEmpty => isLocaleEmpty By <http://www.webkit.org/coding/coding-style.html#names-bool>. > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:370 > + // locale comes from a web page and it can be invalid, leading ICU > + // to fail, in which case we fall back to the default locale (with default rules) Please make this comment a proper sentence that begins with a capital letter and ends with a period. This will make the comment conform to <http://www.webkit.org/coding/coding-style.html#comments-sentences>. > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:668 > +#define uax14AssignmentsCustomStrictNonCJK uax14AssignmentsCustomStrictCJK > +#define uax14AssignmentsCustomDefaultCJK uax14AssignmentsCustomNormalCJK > +#define uax14AssignmentsCustomDefaultNonCJK uax14AssignmentsCustomStrictNonCJK See my remark in mapLineIteratorModeToRules() on how we can remove these defines. At the very least, we should #undef these defines at the end of this file. > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:981 > + if (mode == LineBreakIteratorModeUAX14Loose) > + rulesBuilder.append(isCJK ? uax14AssignmentsCustomLooseCJK : uax14AssignmentsCustomLooseNonCJK); > + else if (mode == LineBreakIteratorModeUAX14Normal) > + rulesBuilder.append(isCJK ? uax14AssignmentsCustomNormalCJK : uax14AssignmentsCustomNormalNonCJK); > + else if (mode == LineBreakIteratorModeUAX14Strict) > + rulesBuilder.append(isCJK ? uax14AssignmentsCustomStrictCJK : uax14AssignmentsCustomStrictNonCJK); > + else > + rulesBuilder.append(isCJK ? uax14AssignmentsCustomDefaultCJK : uax14AssignmentsCustomDefaultNonCJK); I would use a switch block here with a default statement instead of if/else if/else statements as it will make this code less error prone because the compiler can ensure that we handle every enum value in enum LineBreakIteratorMode. Currently, LineBreakIteratorModeUAX14 is the only mode that is covered by the else statement. Moreover, using a switch block will allow us to remove macro defines uax14AssignmentsCustomDefaultCJK, uax14AssignmentsCustomDefaultNonCJK, and uax14AssignmentsCustomStrictNonCJK.
(In reply to comment #87) > (From update of attachment 171628 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=171628&action=review > > I briefly looked over this patch and have some minor nits. uploading updated patch shortly with corrections as described below > > Source/WebCore/platform/text/LineBreakIteratorPoolICU.h:73 > > + switch (mode) { > > + case LineBreakIteratorModeUAX14Loose: > > + localeWithKeyword.appendLiteral("loose"); > > + break; > > + case LineBreakIteratorModeUAX14Normal: > > + localeWithKeyword.appendLiteral("normal"); > > + break; > > + case LineBreakIteratorModeUAX14Strict: > > + localeWithKeyword.appendLiteral("strict"); > > + break; > > + default: > > + ASSERT_NOT_REACHED(); > > + } > > This code is error prone. Instead of using a runtime assertion to catch an unexpected mode we should take advantage of the compiler to warn us when a switch block doesn't have a case statement for each enum value by omitting the default statement. To do this, we should extract this code into a static function. fixed, though I'm not sure the relation between fully populating the switch and using a distinct function > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:325 > > +static bool isCJKLocale(const T* s, size_t len) > > Nit: len => length > > By <http://www.webkit.org/coding/coding-style.html#names-full-words>, we prefer full words for variable names over abbreviations unless such abbreviations are more canonical. Changing "len" to "length" would also be consistent with the argument of the same name in acquireLineBreakIterator(), sentenceBreakIterator(), and among other text iterator functions. fixed > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:347 > > + size_t sLen = locale.length(); > > I take it that the 's' in "sLen" stands for string. Regardless, I would call this variable length. fixed > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:357 > > + UNUSED_PARAM(mode); > > I take it you plan to make use of this parameter in a follow up patch? removed; i'd neglected to remove it after using mode > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:360 > > + bool localeIsEmpty = locale.isEmpty(); > > Nit: localeIsEmpty => isLocaleEmpty fixed > By <http://www.webkit.org/coding/coding-style.html#names-bool>. > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:370 > > + // locale comes from a web page and it can be invalid, leading ICU > > + // to fail, in which case we fall back to the default locale (with default rules) > > Please make this comment a proper sentence that begins with a capital letter and ends with a period. This will make the comment conform to <http://www.webkit.org/coding/coding-style.html#comments-sentences>. i merely copied that comment from existing code without modifying, but i've fixed as you suggest > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:668 > > +#define uax14AssignmentsCustomStrictNonCJK uax14AssignmentsCustomStrictCJK > > +#define uax14AssignmentsCustomDefaultCJK uax14AssignmentsCustomNormalCJK > > +#define uax14AssignmentsCustomDefaultNonCJK uax14AssignmentsCustomStrictNonCJK > > See my remark in mapLineIteratorModeToRules() on how we can remove these defines. if you insist, i will remove, but i think removing will reduce understandability of code: the #define names are semantically significant, and are derived from a coincidental mapping that happens to apply from [1] [1] http://trac.webkit.org/wiki/LineBreakingCSS3Mapping if they are removed, the [code] reader will be left wondering about this mapping, e.g., why StrictNonCJK happens to be using StrictCJK > At the very least, we should #undef these defines at the end of this file. If this were an include file, then yes, but it doesn't seem necessary for a compilation unit (or if it does, then I'm not following your reasoning). > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:981 > > + if (mode == LineBreakIteratorModeUAX14Loose) > > + rulesBuilder.append(isCJK ? uax14AssignmentsCustomLooseCJK : uax14AssignmentsCustomLooseNonCJK); > > + else if (mode == LineBreakIteratorModeUAX14Normal) > > + rulesBuilder.append(isCJK ? uax14AssignmentsCustomNormalCJK : uax14AssignmentsCustomNormalNonCJK); > > + else if (mode == LineBreakIteratorModeUAX14Strict) > > + rulesBuilder.append(isCJK ? uax14AssignmentsCustomStrictCJK : uax14AssignmentsCustomStrictNonCJK); > > + else > > + rulesBuilder.append(isCJK ? uax14AssignmentsCustomDefaultCJK : uax14AssignmentsCustomDefaultNonCJK); > > I would use a switch block here with a default statement instead of if/else if/else statements as it will make this code less error prone because the compiler can ensure that we handle every enum value in enum LineBreakIteratorMode. Currently, LineBreakIteratorModeUAX14 is the only mode that is covered by the else statement. Moreover, using a switch block will allow us to remove macro defines uax14AssignmentsCustomDefaultCJK, uax14AssignmentsCustomDefaultNonCJK, and uax14AssignmentsCustomStrictNonCJK. i've changed to a switch statement, but i don't think we want to remove the macro defines (see above) is this acceptable?
Comment on attachment 171628 [details] Patch cancel r? pending fix for comment 87 issues
Created attachment 171924 [details] Patch
Created attachment 171927 [details] Patch
Comment on attachment 171927 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=171927&action=review > Source/WebCore/ChangeLog:53 > + Add static function to construct ICU locale argument (also used as pool key) with additional > + break keyword. It's unclear to me why we need to encode the break information in the "locale" string at all? I assume this is used as the key to some hash? > Source/WebCore/platform/text/TextBreakIterator.h:51 > + TextBreakIterator* openLineBreakIterator(const AtomicString& locale, LineBreakIteratorMode, bool isCJK); This is no longer just "locale" it seems. > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:267 > - UBreakIterator* iterator = LineBreakIteratorPool::sharedPool().take(locale); > + TextBreakIterator* iterator = LineBreakIteratorPool::sharedPool().take(locale, mode, isCJK); I see. This is why we ad the break part to the string. > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:361 > + if ((mode == LineBreakIteratorModeUAX14) && !isCJK) > + ubrkIter = ubrk_open(UBRK_LINE, isLocaleEmpty ? currentTextBreakLocaleID() : locale.string().utf8().data(), 0, 0, &openStatus); Do other text-breaking backends function with &break= strings like ICU does? It seems encoding that information in the locale string outside of this file may be a layering violation? > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:373 > + // Locale comes from a web page and it can be invalid, leading ICU > + // to fail, in which case we fall back to the default locale (with default rules). > + if (!isLocaleEmpty && U_FAILURE(openStatus)) { > + openStatus = U_ZERO_ERROR; > + ubrkIter = ubrk_open(UBRK_LINE, currentTextBreakLocaleID(), 0, 0, &openStatus); > + } Woh, so webpages can get this behavior indirectly by specifying locale="en&break=strict" already (or do we correctly sanitize the locale string before getting in here?) > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:992 > + rules = rulesBuilder.toString(); Odd. Why pass by reference?
(In reply to comment #88) > [...] > > > Source/WebCore/platform/text/LineBreakIteratorPoolICU.h:73 > > > + switch (mode) { > > > + case LineBreakIteratorModeUAX14Loose: > > > + localeWithKeyword.appendLiteral("loose"); > > > + break; > > > + case LineBreakIteratorModeUAX14Normal: > > > + localeWithKeyword.appendLiteral("normal"); > > > + break; > > > + ... > > > [...] > fixed, though I'm not sure the relation between fully populating the switch and using a distinct function Disregard my remark about the distinct function as it isn't necessary. > > [...] > > See my remark in mapLineIteratorModeToRules() on how we can remove these defines. > > if you insist, i will remove, but i think removing will reduce understandability of code: the #define names are semantically significant, and are derived from a coincidental mapping that happens to apply from [1] > > [1] http://trac.webkit.org/wiki/LineBreakingCSS3Mapping > > if they are removed, the [code] reader will be left wondering about this mapping, e.g., why StrictNonCJK happens to be using StrictCJK > OK. I'll trust your judgement with regards to the #defines. > > > At the very least, we should #undef these defines at the end of this file. > > If this were an include file, then yes, but it doesn't seem necessary for a compilation unit (or if it does, then I'm not following your reasoning). You're right. It's unnecessary to #undef the defines since they are in a .cpp file. > > > > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:981 > > > + if (mode == LineBreakIteratorModeUAX14Loose) > > > + rulesBuilder.append(isCJK ? uax14AssignmentsCustomLooseCJK : uax14AssignmentsCustomLooseNonCJK); > > > + else if (mode == LineBreakIteratorModeUAX14Normal) > > > + rulesBuilder.append(isCJK ? uax14AssignmentsCustomNormalCJK : uax14AssignmentsCustomNormalNonCJK); > > > ... > > > >[...] Moreover, using a switch block will allow us to remove macro defines uax14AssignmentsCustomDefaultCJK, uax14AssignmentsCustomDefaultNonCJK, and uax14AssignmentsCustomStrictNonCJK. > > i've changed to a switch statement, but i don't think we want to remove the macro defines (see above) > > is this acceptable? Yes, I'll trust your judgement with regards to the #defines.
(In reply to comment #92) > (From update of attachment 171927 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=171927&action=review > > > Source/WebCore/ChangeLog:53 > > + Add static function to construct ICU locale argument (also used as pool key) with additional > > + break keyword. > > It's unclear to me why we need to encode the break information in the "locale" string at all? I assume this is used as the key to some hash? it is used both as part of the line pool hash and also as a (future) argument to ICU, which uses keyword/value pairs as locale params [1] [1] http://icu-project.org/apiref/icu4c/classLocale.html#a2e847ff3c5df01b077293a30347bac11 > > > Source/WebCore/platform/text/TextBreakIterator.h:51 > > + TextBreakIterator* openLineBreakIterator(const AtomicString& locale, LineBreakIteratorMode, bool isCJK); > > This is no longer just "locale" it seems. from ICU perspective, the locale string [may] contain such optional keyword/value pairs > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:267 > > - UBreakIterator* iterator = LineBreakIteratorPool::sharedPool().take(locale); > > + TextBreakIterator* iterator = LineBreakIteratorPool::sharedPool().take(locale, mode, isCJK); > > I see. This is why we ad the break part to the string. yep, but also in future, ICU may make use of this keyword value to parameterize the line break iterator, in which case we can remove our custom rules and let ICU take over with its (at that future time) default (but parameterized) rules [n.b. i've been talking with some folks working on ICU about doing this, and its something they may take up] in any case, i needed to distinguish the entries in the pool cache by the break param, so this served both purposes > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:361 > > + if ((mode == LineBreakIteratorModeUAX14) && !isCJK) > > + ubrkIter = ubrk_open(UBRK_LINE, isLocaleEmpty ? currentTextBreakLocaleID() : locale.string().utf8().data(), 0, 0, &openStatus); > > Do other text-breaking backends function with &break= strings like ICU does? according to [2], it would seem there is such use outside ICU, though not necessarily the same syntax [2] http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html > It seems encoding that information in the locale string outside of this file may be a layering violation? probably, but that's how the ICU interface works > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:373 > > + // Locale comes from a web page and it can be invalid, leading ICU > > + // to fail, in which case we fall back to the default locale (with default rules). > > + if (!isLocaleEmpty && U_FAILURE(openStatus)) { > > + openStatus = U_ZERO_ERROR; > > + ubrkIter = ubrk_open(UBRK_LINE, currentTextBreakLocaleID(), 0, 0, &openStatus); > > + } > > Woh, so webpages can get this behavior indirectly by specifying locale="en&break=strict" already (or do we correctly sanitize the locale string before getting in here?) this is unchanged code from prior to this patch (from pre-patch LineBreakIteratorPoolICU.h), which uses the value returned by currentTextBreakLocaleID() as a back-up, which, if that value happened to be returned with such a keyword, would be semantically significant to ICU; i haven't traced back that code to see if it sanitizes the value syntax > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:992 > > + rules = rulesBuilder.toString(); > > Odd. Why pass by reference? given the length of the rules string, i didn't want to perform an unnecessary copy, presuming that would have occurred if I had returned String, i.e., if I had done that and then had: String rules = mapLineIteratorModeToRules(...) if there is a convenient way to avoid a copy, then I could change this if you think it needed
Comment on attachment 171927 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=171927&action=review >>> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:992 >>> + rules = rulesBuilder.toString(); >> >> Odd. Why pass by reference? > > given the length of the rules string, i didn't want to perform an unnecessary copy, presuming that would have occurred if I had returned String, i.e., if I had done that and then had: > > String rules = mapLineIteratorModeToRules(...) > > if there is a convenient way to avoid a copy, then I could change this if you think it needed Returning a String by value does a little extra ref-churn, but that's all: http://trac.webkit.org/browser/trunk/Source/WTF/wtf/text/WTFString.h#L144
Comment on attachment 171927 [details] Patch address comment 87 and comment 92 nits
Comment on attachment 171927 [details] Patch should be ready to go now
Comment on attachment 171927 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=171927&action=review My job here is not really to tell you if this works right or not (you're more of a line-breaking expert than I). My job is to tell you if this makes sense in WebKit's system. Overall this seems fine. There are a couple design nits, which I've pointed out in this and previous reviews. Many of which could be fixed either in lead-up or follow-up bugs to the main change. I don't know what level of interest in you have in long-term maintenance of this code. :) > Source/WebCore/platform/text/LineBreakIteratorPoolICU.h:56 > + localeWithKeyword.appendLiteral("@break="); Since we're just using this as a hash key it feels a bit odd to hijack a string for this use. A class would make more sense, but that is a slightly more complicated change, as you'd need to both define the class (with a String/mode pair) and then define how to hash it. I suspect std::pair<String, LineBreakIteratorMode> might actually do what you want though. > Source/WebCore/platform/text/TextBreakIterator.h:123 > + LineBreakIteratorMode m_mode; > + bool m_isCJK; I don't know how tight memory is on these iterators? Presumably we only have a couple per page? > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:326 > +static bool isCJKLocale(const T* s, size_t length) > +{ I'm surprised we don't already have this code somewhere else in WebCore.
(In reply to comment #98) > (From update of attachment 171927 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=171927&action=review > > My job here is not really to tell you if this works right or not (you're more of a line-breaking expert than I). My job is to tell you if this makes sense in WebKit's system. Overall this seems fine. There are a couple design nits, which I've pointed out in this and previous reviews. Many of which could be fixed either in lead-up or follow-up bugs to the main change. I don't know what level of interest in you have in long-term maintenance of this code. :) My plan on WK contribution is to continue work on it, and perhaps reach reviewer status eventually. Beyond this patch, I plan to work on filling out ruby functionality and other CT bugs, optimizations, etc. > > > Source/WebCore/platform/text/LineBreakIteratorPoolICU.h:56 > > + localeWithKeyword.appendLiteral("@break="); > > Since we're just using this as a hash key it feels a bit odd to hijack a string for this use. A class would make more sense, but that is a slightly more complicated change, as you'd need to both define the class (with a String/mode pair) and then define how to hash it. I suspect std::pair<String, LineBreakIteratorMode> might actually do what you want though. Yes, I thought of that as well when starting this, but thought it was more complex than needed to get this feature off the ground. It could certainly be done in as a later optimization. > > > Source/WebCore/platform/text/TextBreakIterator.h:123 > > + LineBreakIteratorMode m_mode; > > + bool m_isCJK; > > I don't know how tight memory is on these iterators? Presumably we only have a couple per page? That's my assumption, though I haven't instrumented it specifically. > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:326 > > +static bool isCJKLocale(const T* s, size_t length) > > +{ > > I'm surprised we don't already have this code somewhere else in WebCore. I didn't look very far to see if this was available elsewhere, but I needed a very tight and performative imll in this case to prevent introducing a slowdown in break_lines. Overall, I think this is ready to land at this juncture. Unless you disagree, I would appreciate a r/cq+, so I can move to subsequent improvements in this area.
Comment on attachment 171927 [details] Patch I agree that it's time to land this and follow-up with other patches as necessary. I've filed a couple follow-up bugs and related them to this one. They can be either fixed or closed at a later juncture.
Thank you for your perseverance. I look forward to discussing this and other patches with you in person next time you're in SF.
Comment on attachment 171927 [details] Patch Clearing flags on attachment: 171927 Committed r133375: <http://trac.webkit.org/changeset/133375>
All reviewed patches have been landed. Closing bug.
Re-opened since this is blocked by bug 101138
Unfortunately, this breaks three tests: http://test-results.appspot.com/dashboards/flakiness_dashboard.html#showExpectations=true&tests=css3%2Fline-break%2Fline-break-auto-half-kana.html http://test-results.appspot.com/dashboards/flakiness_dashboard.html#showExpectations=true&tests=css3%2Fline-break%2Fline-break-normal-sound-marks.html http://test-results.appspot.com/dashboards/flakiness_dashboard.html#showExpectations=true&tests=css3%2Fline-break%2Fline-break-auto-sound-marks.html Rolling out for now.
Glenn: To be clear, the path forward here is just for you to look at the failures and to explain what's right/wrong and for us to fix the location of the -expected files to match, and then we can reland.
Created attachment 172397 [details] Patch
Comment on attachment 172397 [details] Patch Retarget chromium-linux reftest expectations to chromium (generic), but use default reftests with chromium-android. This should address chromium fails, but may see some Qt fails, for which I've opened https://bugs.webkit.org/show_bug.cgi?id=101250. If Qt fails, then should skip css3/line-break, pending fixes for bug 101250.
Comment on attachment 172397 [details] Patch Thanks.
Comment on attachment 172397 [details] Patch Clearing flags on attachment: 172397 Committed r133529: <http://trac.webkit.org/changeset/133529>
This made performance worse on chromium's intl2 page cycler by about 10% http://build.chromium.org/f/chromium/perf/chromium-rel-win7-webkit/intl1/report.html?history=150&rev=-1 Notice that performance regressed with http://trac.webkit.org/log/?verbose=on&rev=133379&stop_rev=133369, improved with this patch was reverted in http://trac.webkit.org/log/?verbose=on&rev=133386&stop_rev=133385, then regressed again in http://trac.webkit.org/log/?verbose=on&rev=133531&stop_rev=133523
Is that a test that Glenn can run himself?
(In reply to comment #112) > This made performance worse on chromium's intl2 page cycler by about 10% > > http://build.chromium.org/f/chromium/perf/chromium-rel-win7-webkit/intl1/report.html?history=150&rev=-1 > > Notice that performance regressed with http://trac.webkit.org/log/?verbose=on&rev=133379&stop_rev=133369, improved with this patch was reverted in http://trac.webkit.org/log/?verbose=on&rev=133386&stop_rev=133385, then regressed again in http://trac.webkit.org/log/?verbose=on&rev=133531&stop_rev=133523 is this perf regress chromium platform specific? e.g., cr-win specific? cr-linux? i'm set up to build/test on Ubuntu 12.04 64-bit, so i'll update and attempt to track down this test
It seems to be most pronounced on cr-win - here's a cr-linux graph for comparison: http://build.chromium.org/f/chromium/perf/linux-release-webkit-latest/intl2/report.html?history=150&rev=-1. The results overall are noisier, but there doesn't appear to be room for a 10% regression. Some parts of the line breaking code are platform-specific so that could explain it - we also use a different toolchain (msvs vs gcc/clang on linux/mac respectively).
presumably intl2 doesn't use any of the new -webkit-line-break modes [though it's worth checking of it explicitly uses 'normal' as a value in a CJK context]; more likely, this is a compile/optimizer issue, since this patch does introduce a third template parameter on a very hot function (WebCore::nextBreakablePosition), which I was very careful to check for regressions on mac ports; the hot loop on the above function should devolve to pre-patch behavior (and performance) when the looseBehavior template parameter is not LooseMode; however, if the compiler isn't optimizing this test, and is doing a compare on looseBehavior, then that could be the culprit i'd need to look at the disassembly of nextBreakBehavior on cr-win to see what the compiler was doing; i did careful perf testing of this on mac-* ports, and proved to my satisfaction that no perf regression occurred
(In reply to comment #116) > presumably intl2 doesn't use any of the new -webkit-line-break modes [though it's worth checking of it explicitly uses 'normal' as a value in a CJK context]; The string "line-break" does not appear in the test data at all. > > more likely, this is a compile/optimizer issue, since this patch does introduce a third template parameter on a very hot function (WebCore::nextBreakablePosition), which I was very careful to check for regressions on mac ports; > > the hot loop on the above function should devolve to pre-patch behavior (and performance) when the looseBehavior template parameter is not LooseMode; however, if the compiler isn't optimizing this test, and is doing a compare on looseBehavior, then that could be the culprit > > i'd need to look at the disassembly of nextBreakBehavior on cr-win to see what the compiler was doing; i did careful perf testing of this on mac-* ports, and proved to my satisfaction that no perf regression occurred The mac and linux perf bots appear to agree with your hypothesis. The chromium windows build instructions are here: http://www.chromium.org/developers/how-tos/build-instructions-windows or I'm sure we could find someone here who's already set up to build in this configuration who can help diagnose this issue (+cc a few who might be interested)
Or if you don't/can't set up a build environment yourself, in a few days this patch will hit the Google Chrome Canary and you can disassemble the binary yourself. We provide debug symbols here: http://www.chromium.org/developers/how-tos/debugging#TOC-Symbol-server
It looks like this patch may have caused a 15% regression in memory usage on the Chromium page cyclers. http://build.chromium.org/f/chromium/perf/chromium-rel-win7-webkit/intl2/report.html?rev=166210&graph=ws_single_peak_r&history=30 Regression range: http://trac.webkit.org/log/?verbose=on&rev=133531&stop_rev=133523 Can we come up with a way to support this that uses less memory? In the meantime, can we rollout this patch?
Thank you Ojan and James. Yes, we should roll this out. We can't regress a PLT by 10% and memory by 15%, at least not w/o understanding the tradeoff better.
One of the tests was also flaky (noted in bug 101368). I think to land this again, we need to run the memory tests locally, and consider splitting this test into smaller pieces.
Good new and bad. Turns out I was wrong about this patch causing the memory regression. On the other hand, it does seem to be the culprit for the performance regression James pointed out.
(In reply to comment #122) > Good new and bad. Turns out I was wrong about this patch causing the memory regression. On the other hand, it does seem to be the culprit for the performance regression James pointed out. Sigh. I'm doing too many things at once. This was the cause of the memory regression as well. I just misread the dashboard.
(In reply to comment #123) > (In reply to comment #122) > > Good new and bad. Turns out I was wrong about this patch causing the memory regression. On the other hand, it does seem to be the culprit for the performance regression James pointed out. > > Sigh. I'm doing too many things at once. This was the cause of the memory regression as well. I just misread the dashboard. Double sigh. Really doing too many things at once. It was *not* this patch the caused the memory regression, it was http://trac.webkit.org/changeset/133526.
Created attachment 189854 [details] Patch
Comment on attachment 189854 [details] Patch Attachment 189854 [details] did not pass qt-ews (qt): Output: http://queues.webkit.org/results/16722162
Comment on attachment 189854 [details] Patch Attachment 189854 [details] did not pass qt-wk2-ews (qt): Output: http://queues.webkit.org/results/16719285
Comment on attachment 189854 [details] Patch Attachment 189854 [details] did not pass efl-ews (efl): Output: http://queues.webkit.org/results/16718327
Created attachment 189897 [details] Patch
Comment on attachment 189897 [details] Patch Ready to re-land after: (1) subdividing patch to reduce size (see bug 109954) (2) reworking nextBreakPosition<> (break_lines.cpp) to avoid WIN optimization bug (3) incrementally enabling new tests on port by port basis as verification proceeds
Comment on attachment 189897 [details] Patch This LGTM. My main concern is possible perf regressions. We'll need to be careful to check the bots again in 24 hours to make sure this didn't cause a blip. Some of this is extremely hot code.
Comment on attachment 189897 [details] Patch Clearing flags on attachment: 189897 Committed r144019: <http://trac.webkit.org/changeset/144019>
8% performance regression on chromium-win7 intl2 page cycler noted at http://build.chromium.org/f/chromium/perf/chromium-rel-win7-webkit/intl1/report.html?history=40&rev=184623&graph=times and bug opened at https://bugs.webkit.org/show_bug.cgi?id=110872 with dependency added here rather than reverting this patch, i'd like to fix the regression in a follow on patch that addresses the win specific issue
Re-opened since this is blocked by bug 110892
Comment on attachment 189897 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=189897&action=review > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:384 > + UBreakIterator* ubrkIter = reinterpret_cast<UBreakIterator*>(iterator); Why reinterpret rather than static_cast here? Also you may want to try keeping openLineBreakIterator and especially closeLineBreakIterator (as it's a trivial forwarding function all you're doing here is adding call overhead) inline.
8-10% perf. regression on html5-full-render across the board: https://perf.webkit.org/#mode=charts&chartList=%5B%5B%22Chromium%20Mac%22%2C%22Parser%2Fhtml5-full-render%3ATime%22%5D%2C%5B%22Mac%20MountainLion%22%2C%22Parser%2Fhtml5-full-render%3ATime%22%5D%2C%5B%22Mac%20Lion%22%2C%22Parser%2Fhtml5-full-render%3ATime%22%5D%2C%5B%22Chromium%20Win%22%2C%22Parser%2Fhtml5-full-render%3ATime%22%5D%2C%5B%22Qt%205.0%22%2C%22Parser%2Fhtml5-full-render%3ATime%22%5D%2C%5B%22Qt%205.0%20WebKit%202%22%2C%22Parser%2Fhtml5-full-render%3ATime%22%5D%2C%5B%22EFL%22%2C%22Parser%2Fhtml5-full-render%3ATime%22%5D%2C%5B%22Chromium%20Linux%22%2C%22Parser%2Fhtml5-full-render%3ATime%22%5D%5D&zoom=%5B1361785056834.4497%2C1361894029801.6208%5D This is quite possibly the worst perf. regression I've ever seen on this test.
(In reply to comment #137) > 8-10% perf. regression on html5-full-render across the board: > https://perf.webkit.org/#mode=charts&chartList=%5B%5B%22Chromium%20Mac%22%2C%22Parser%2Fhtml5-full-render%3ATime%22%5D%2C%5B%22Mac%20MountainLion%22%2C%22Parser%2Fhtml5-full-render%3ATime%22%5D%2C%5B%22Mac%20Lion%22%2C%22Parser%2Fhtml5-full-render%3ATime%22%5D%2C%5B%22Chromium%20Win%22%2C%22Parser%2Fhtml5-full-render%3ATime%22%5D%2C%5B%22Qt%205.0%22%2C%22Parser%2Fhtml5-full-render%3ATime%22%5D%2C%5B%22Qt%205.0%20WebKit%202%22%2C%22Parser%2Fhtml5-full-render%3ATime%22%5D%2C%5B%22EFL%22%2C%22Parser%2Fhtml5-full-render%3ATime%22%5D%2C%5B%22Chromium%20Linux%22%2C%22Parser%2Fhtml5-full-render%3ATime%22%5D%5D&zoom=%5B1361785056834.4497%2C1361894029801.6208%5D > > This is quite possibly the worst perf. regression I've ever seen on this test. Thanks, i'll wear that as a badge of honor. :)
(In reply to comment #138) > Thanks, i'll wear that as a badge of honor. :) You're not a true WebKitten until you've introduced a major regression :)
(In reply to comment #136) > (From update of attachment 189897 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=189897&action=review > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:384 > > + UBreakIterator* ubrkIter = reinterpret_cast<UBreakIterator*>(iterator); > > Why reinterpret rather than static_cast here? it was already there, from https://trac.webkit.org/browser/trunk/Source/WebCore/platform/text/TextBreakIteratorICU.cpp?rev=89592 just moved it > Also you may want to try keeping openLineBreakIterator and especially closeLineBreakIterator (as it's a trivial forwarding function all you're doing here is adding call overhead) inline. good idea
Given that this regression showed in html5-full-render, this should be trivial to debug locally. PLT performance is a pain, but run-perf-tests is easy. try: run-perf-tests Parser/html5-full-render You can even use: run-perf-tests Parser/html5-full-render --profile to have it automatically run whatever your system's default sampling profiler is. See run-perf-tests for options.
<rdar://problem/17843913>
Created attachment 240377 [details] Rebased
Created attachment 240519 [details] Patch
Created attachment 240560 [details] Patch
Created attachment 242061 [details] Patch
Comment on attachment 242061 [details] Patch r=me, make sure perf is fine.
http://trac.webkit.org/changeset/176473