Bug 10228

Summary: CSS3: Support the hyphenate property
Product: WebKit Reporter: Eric Seidel (no email) <eric>
Component: CSSAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: ddkilzer, dglazkov, hclam, hyatt, jamesr, Justin, kangax, mitz, nickshanks, peter, simon.fraser, swang, tkent, webkit.review.bot
Priority: P4 Keywords: InRadar
Version: 420+   
Hardware: Mac   
OS: OS X 10.4   
URL: http://www.w3.org/TR/css3-text/#hyphenate
Attachments:
Description Flags
Add CSS Hyphenate tag support.
none
Add CSS hyphenate tag support -- latest patch with layout tests.
hyatt: review-
[WIP] Support hyphens and hyphenate-character
none
[WIP] Support hyphens and hyphenate-character
none
Implement the 'hyphens' and 'hyphenate-character' properties
darin: review-
Implement the 'hyphens' and 'hyphenate-character' properties
none
Implement the 'hyphens' and 'hyphenate-character' properties
darin: review+
Speedup
ddkilzer: review+
Potential speedup
none
Performance regression chart none

Description Eric Seidel (no email) 2006-08-02 21:56:54 PDT
I imagine with the help of iconv, WebKit could support CSS3 hypehnate relatively easily.
Comment 1 Dave Hyatt 2006-08-03 03:32:18 PDT
CSS3 Text is still in a state of extreme flux right now though, so I'd be reluctant to implement any of it yet.
Comment 2 kangax 2010-03-28 13:36:39 PDT
Amazing how after almost 4 years "the hyphenation feature is very much up-in-the-air" — quoting relevant CSS3 section (<http://www.w3.org/TR/css3-text/#hyphenate>).
Comment 3 Nicholas Shanks 2010-03-29 07:24:17 PDT
It's still "up‐in‐the‐air" because Hixie is not sure how to spec it and is waiting for a rendering engine that prides itself on good typography to take the lead. I wonder who he might look towards for that inspiration?
Comment 4 kangax 2010-03-30 09:47:35 PDT
It looks like previous version of CSS3 Text module defined few hyphenation-related properties (http://www.w3.org/TR/2007/WD-css3-gcpm-20070205/#hyphenation):

hyphenate: none | auto
hyphenate-dictionary: auto | <uri>
hyphenate-before: <integer>
hyphenate-after: <integer>
hyphenate-lines: none | <integer>
hyphenate-character: string

There was also a related discussion on WHATWG mailing list (http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2007-January/009018.html)
Comment 5 sw 2010-05-25 10:16:07 PDT
I'd like to implement the CSS tag hyphenate
---------
hyphenate: none | auto
--------------

Will the patch get accepted by WebKit?

Thanks,
Comment 6 sw 2010-06-11 14:08:19 PDT
Created attachment 58505 [details]
Add CSS Hyphenate tag support.

Not sure how to test this patch.  Please advise.  Since this patch is a limited change to add CSS Hyphenate tag parsing support.
Comment 7 sw 2010-06-15 14:48:41 PDT
Created attachment 58822 [details]
Add CSS hyphenate tag support -- latest patch with layout tests.
Comment 8 WebKit Review Bot 2010-06-15 14:50:25 PDT
Attachment 58822 [details] did not pass style-queue:

Failed to run "['WebKitTools/Scripts/check-webkit-style', '--no-squash']" exit_code: 1
WebCore/css/CSSPrimitiveValueMappings.h:1639:  A case label should not be indented, but line up with its switch statement.  [whitespace/indent] [4]
WebCore/css/CSSPrimitiveValueMappings.h:1651:  A case label should not be indented, but line up with its switch statement.  [whitespace/indent] [4]
WebCore/rendering/style/RenderStyle.h:176:  Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side.  [whitespace/operators] [4]
WebCore/rendering/style/RenderStyle.h:200:  _hyphenate is incorrectly named. Don't use underscores in your identifier names.  [readability/naming] [4]
WebCore/rendering/style/RenderStyle.cpp:441:  Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side.  [whitespace/operators] [4]
Total errors found: 5 in 12 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 9 Simon Fraser (smfr) 2010-06-15 14:51:40 PDT
hyphenate is not a tag, it's a property.
Comment 10 Dave Hyatt 2010-06-15 14:54:37 PDT
Comment on attachment 58822 [details]
Add CSS hyphenate tag support -- latest patch with layout tests.

No vendor prefix.  This is CSS3 WD remember.  You can't add an unprefixed property.
Comment 11 Dave Hyatt 2010-06-15 14:55:01 PDT
Also not sure you're looking at current syntax:

http://www.w3.org/TR/2010/WD-css3-gcpm-20100608/
Comment 12 mitz 2010-06-15 14:57:34 PDT
I actually have a patch in progress that adds -webkit-hyphens and -webkit-hyphenate-character from the aforelinked spec. I will clean it up and post it soon.
Comment 13 sw 2010-06-15 15:19:39 PDT
Thanks all for the review.  For the style, I followed the existing code.

Anyway, I'll wait for mitz's patch then.  My patch has a layout test also, if it can be of any use.

Thanks all.
Comment 14 mitz 2010-06-15 17:58:45 PDT
Created attachment 58842 [details]
[WIP] Support hyphens and hyphenate-character

I’d like rendering experts to take a look at the rendering changes. I’ve chosen to not use a separate box for the hyphen, as this greatly simplifies selection, hit-testing, editing etc., and also allows for negative kerning between the last character and the hyphen. I’ve also decided not to subclass InlineTextBox for this purpose.

Still missing:
1. Support for hyphens: none;
2. Support for platforms that can’t hyphenate
3. Better locale selection, and locale-dependent hyphen character for 'auto'
4. Tests
Comment 15 sw 2010-06-16 10:48:11 PDT
hi, Mitz,

Seems you didn't set the review flag to "?", right?
Comment 16 mitz 2010-06-16 10:49:15 PDT
(In reply to comment #15)
> hi, Mitz,
> 
> Seems you didn't set the review flag to "?", right?

Right, the patch is incomplete, but I would like Hyatt’s and others’ feedback as I keep working on it.
Comment 17 mitz 2010-06-16 14:34:25 PDT
I got some feedback from Hyatt saying that the approach is fine (not creating extra line boxes) and suggesting better handling of computed style and some refactoring of the InlineTextBox code that’s repeated three times. I am going to continue my work based on that.
Comment 18 mitz 2010-06-16 14:35:02 PDT
<rdar://problem/7592435>
Comment 19 mitz 2010-06-17 17:51:55 PDT
Created attachment 59053 [details]
[WIP] Support hyphens and hyphenate-character

Cleaned up and added "hyphens: none" support
Comment 20 sw 2010-06-18 11:40:17 PDT
hi, Mitz,

Just wondering whether your hyphenation logic considers the case where a word can have multiple hyphenation points.  Seems from your logic, you only consider lastHyphenPosition.  For example, for word "ex-per-i-ment-al", do you just consider the first hypen between ex and per and other following also?

Thanks,
Comment 21 mitz 2010-06-18 11:50:43 PDT
(In reply to comment #20)
> hi, Mitz,
> 
> Just wondering whether your hyphenation logic considers the case where a word can have multiple hyphenation points.  Seems from your logic, you only consider lastHyphenPosition.  For example, for word "ex-per-i-ment-al", do you just consider the first hypen between ex and per and other following also?
> 
> Thanks,

It will consider the last possible hyphenation location: see how prefixLength is initially computed by measuring (or rather hit-testing), then passed as a parameter to lastHyphenLocation(), which will return the last hyphen location at or before it.
Comment 22 mitz 2010-06-19 22:28:09 PDT
Created attachment 59196 [details]
Implement the 'hyphens' and 'hyphenate-character' properties
Comment 23 WebKit Review Bot 2010-06-19 22:29:37 PDT
Attachment 59196 [details] did not pass style-queue:

Failed to run "['WebKitTools/Scripts/check-webkit-style', '--no-squash']" exit_code: 1
WebCore/css/CSSPrimitiveValueMappings.h:2165:  A case label should not be indented, but line up with its switch statement.  [whitespace/indent] [4]
WebCore/css/CSSPrimitiveValueMappings.h:2180:  A case label should not be indented, but line up with its switch statement.  [whitespace/indent] [4]
WebCore/rendering/style/RenderStyle.cpp:389:  Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side.  [whitespace/operators] [4]
WebCore/rendering/style/RenderStyle.cpp:390:  Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side.  [whitespace/operators] [4]
WebCore/rendering/RootInlineBox.cpp:120:  Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side.  [whitespace/operators] [4]
Total errors found: 5 in 47 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 24 WebKit Review Bot 2010-06-19 23:40:43 PDT
Attachment 59196 [details] did not build on chromium:
Build output: http://webkit-commit-queue.appspot.com/results/3333438
Comment 25 Darin Adler 2010-06-20 12:48:55 PDT
Comment on attachment 59196 [details]
Implement the 'hyphens' and 'hyphenate-character' properties

> +            if (style->hyphenateCharacter() == nullAtom)
> +                return CSSPrimitiveValue::createIdentifier(CSSValueAuto);

I see the use of "== nullAtom" earlier in the file. But I'm surprised we do that instead of calling the isNull function.

> +size_t lastHyphenLocation(const UChar*, size_t length, size_t maxLocation);

To me it's not obvious what the relationship between maxLocation and length is.

> +    static CFLocaleRef locale = 0;
> +    if (!locale) {
> +        RetainPtr<CFStringRef> localeIdentifier(AdoptCF, CFStringCreateWithBytesNoCopy(kCFAllocatorDefault, currentSearchLocaleID(), strlen(currentSearchLocaleID()), kCFStringEncodingASCII, False, kCFAllocatorNull));
> +        locale = CFLocaleCreate(kCFAllocatorDefault, localeIdentifier.get());
> +    }

I prefer initializing with a create function called in the initializer to a separate if statement like this one.

> +    static bool checkedLocale = false;
> +    static bool localeIsEnglish = false;
> +    if (!checkedLocale)
> +        localeIsEnglish = !strcmp("en", currentSearchLocaleID());
> +    if (!localeIsEnglish)
> +        return 0;

Why not just this?

    static bool localeIsEnglish = !strcmp("en", currentSearchLocaleID());

Seems better to me than the two booleans.

> Index: WebCore/rendering/InlineTextBox.cpp
> ===================================================================
> --- WebCore/rendering/InlineTextBox.cpp	(revision 61502)
> +++ WebCore/rendering/InlineTextBox.cpp	(working copy)
> @@ -1,7 +1,7 @@
>  /*
>   * (C) 1999 Lars Knoll (knoll@kde.org)
>   * (C) 2000 Dirk Mueller (mueller@kde.org)
> - * Copyright (C) 2004, 2005, 2006, 2007, 2008 Apple Inc. All rights reserved.
> + * Copyright (C) 2004, 2005, 2006, 2007, 2008, 2009, 2010 Apple Inc. All rights reserved.
>   *
>   * This library is free software; you can redistribute it and/or
>   * modify it under the terms of the GNU Library General Public
> @@ -103,6 +103,16 @@ RenderObject::SelectionState InlineTextB
>      return state;
>  }
>  
> +static void adjustCharactersAndLengthForHyphen(Vector<UChar, 256> charactersWithHyphen, RenderStyle* style, const UChar*& characters, int& length)

Did you mean to pass a reference to the Vector here?

> -    InlineIterator findNextLineBreak(InlineBidiResolver&, bool firstLine, bool& isLineEmpty, bool& previousLineBrokeCleanly, EClear* clear = 0);
> +    InlineIterator findNextLineBreak(InlineBidiResolver&, bool firstLine, bool& isLineEmpty, bool& previousLineBrokeCleanly, bool& hyphenated, EClear* clear = 0);

The argument name "clear" here seems unneeded.

I think you might want a typedef for the repeated Vector<UChar, 256> type.

The use of "const Hyphens h" also seems wrong. Just "Hyphens h" will do.

The line breaking code looks right to me. I think the test cases are the important bit here.

Looks like EWS builds are failing on this patch. I’ll say review- based on the EWS failures and the lack of the "&" in the Vector argument.
Comment 26 mitz 2010-06-20 16:09:50 PDT
Created attachment 59219 [details]
Implement the 'hyphens' and 'hyphenate-character' properties

(In reply to comment #25)
> (From update of attachment 59196 [details])
> > +            if (style->hyphenateCharacter() == nullAtom)
> > +                return CSSPrimitiveValue::createIdentifier(CSSValueAuto);
> 
> I see the use of "== nullAtom" earlier in the file. But I'm surprised we do that instead of calling the isNull function.

Changed to use isNull().

> > +size_t lastHyphenLocation(const UChar*, size_t length, size_t maxLocation);
> 
> To me it's not obvious what the relationship between maxLocation and length is.

I considered appending BeforeIndex to the function name and renaming the last parameter, but settled on just renaming maxLocation to beforeIndex.

> > +    static CFLocaleRef locale = 0;
> > +    if (!locale) {
> > +        RetainPtr<CFStringRef> localeIdentifier(AdoptCF, CFStringCreateWithBytesNoCopy(kCFAllocatorDefault, currentSearchLocaleID(), strlen(currentSearchLocaleID()), kCFStringEncodingASCII, False, kCFAllocatorNull));
> > +        locale = CFLocaleCreate(kCFAllocatorDefault, localeIdentifier.get());
> > +    }
> 
> I prefer initializing with a create function called in the initializer to a separate if statement like this one.

Done.

> Why not just this?
> 
>     static bool localeIsEnglish = !strcmp("en", currentSearchLocaleID());
> 
> Seems better to me than the two booleans.

Changed (when I wrote this I thought that checking if the locale is English was going to require multiple statements).

> > +static void adjustCharactersAndLengthForHyphen(Vector<UChar, 256> charactersWithHyphen, RenderStyle* style, const UChar*& characters, int& length)
> 
> Did you mean to pass a reference to the Vector here?

Yes.

> > -    InlineIterator findNextLineBreak(InlineBidiResolver&, bool firstLine, bool& isLineEmpty, bool& previousLineBrokeCleanly, EClear* clear = 0);
> > +    InlineIterator findNextLineBreak(InlineBidiResolver&, bool firstLine, bool& isLineEmpty, bool& previousLineBrokeCleanly, bool& hyphenated, EClear* clear = 0);
> 
> The argument name "clear" here seems unneeded.

Removed.

> I think you might want a typedef for the repeated Vector<UChar, 256> type.

Done, although I couldn’t come up with a good name.

> The use of "const Hyphens h" also seems wrong. Just "Hyphens h" will do.

Fixed.

> Looks like EWS builds are failing on this patch. I’ll say review- based on the EWS failures and the lack of the "&" in the Vector argument.

I am not sure what’s causing the failures. Windows appears to be failing to merge the vcproj changes, perhaps due to different line endings. Mac might be failing because I didn’t include an updated WebKitSystemInterface binary. As for Chromium, I don’t know what to do besides adding Hyphenation.cpp to the gypi file.
Comment 27 mitz 2010-06-20 16:11:06 PDT
Created attachment 59220 [details]
Implement the 'hyphens' and 'hyphenate-character' properties
Comment 28 WebKit Review Bot 2010-06-20 16:13:39 PDT
Attachment 59220 [details] did not pass style-queue:

Failed to run "['WebKitTools/Scripts/check-webkit-style', '--no-squash']" exit_code: 1
WebCore/rendering/style/RenderStyle.cpp:389:  Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side.  [whitespace/operators] [4]
WebCore/rendering/style/RenderStyle.cpp:390:  Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side.  [whitespace/operators] [4]
Total errors found: 2 in 47 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 29 WebKit Review Bot 2010-06-20 17:22:50 PDT
Attachment 59220 [details] did not build on chromium:
Build output: http://webkit-commit-queue.appspot.com/results/3290488
Comment 30 mitz 2010-06-21 09:59:17 PDT
Fixed in <http://trac.webkit.org/changeset/61548>.
Comment 31 James Robinson 2010-06-21 19:19:05 PDT
http://trac.webkit.org/changeset/61548/ appears to have regressed performance on one of the Chromium page cyclers by ~10%.  The page cycler consists of lots of pages with different sorts of international content and probably stress the line breaking algorithm.  I would not expect these pages to be using -webkit-hyphen so this is probably unintentional.

Do you know what the issue is likely to be?
Comment 32 mitz 2010-06-21 23:32:16 PDT
(In reply to comment #31)
> http://trac.webkit.org/changeset/61548/ appears to have regressed performance on one of the Chromium page cyclers by ~10%.  The page cycler consists of lots of pages with different sorts of international content and probably stress the line breaking algorithm.  I would not expect these pages to be using -webkit-hyphen so this is probably unintentional.
> 
> Do you know what the issue is likely to be?

I am looking into this now.
Comment 33 mitz 2010-06-21 23:47:30 PDT
r61552 measures as <1.03x slower than r61547 in my tests. Not surprisingly, rolling out just the RenderBlock changes recovers the performance (so the slowness does appear to be in line layout, not drawing, nor style, WKSI etc.).
Comment 34 mitz 2010-06-22 00:33:59 PDT
Created attachment 59347 [details]
Speedup
Comment 35 David Kilzer (:ddkilzer) 2010-06-22 00:46:56 PDT
Comment on attachment 59347 [details]
Speedup

r=me
Comment 36 mitz 2010-06-22 00:49:36 PDT
Created attachment 59349 [details]
Potential speedup
Comment 37 Kent Tamura 2010-06-22 02:57:42 PDT
Created attachment 59357 [details]
Performance regression chart
Comment 38 Kent Tamura 2010-06-22 03:08:07 PDT
(In reply to comment #37)
> Created an attachment (id=59357) [details]
> Performance regression chart

This chart shows rendering time difference between r61547 (just before hyphenate change) and r61565 (hyphenate change + build fix).  The following sites have visible regressions.

www.jagran.com
www.startimes2.com
www.mthai.com
sgkalesh.blogspot.com
sh3bwah.com
news.google.com
manager.co.th
isna.ir
arabicnews.google.com

Note: we use local copies of these sites for performance regression.  So the site as of now might reproduce the regression.
Comment 39 Kent Tamura 2010-06-22 03:09:23 PDT
(In reply to comment #38)
> Note: we use local copies of these sites for performance regression.  So the site as of now might reproduce the regression.

might not reproduce ...
Comment 40 mitz 2010-06-22 10:28:07 PDT
Quick update on performance: I was surprised to see that often there’s higher impact on pages in languages that use the complex text code path. Then I was surprised to find indications in profiles that more time is spent measuring text in r61552 than in r61547. Next I am going to try to confirm that text-measuring is invoked more often or in a way that makes it slower, which was not supposed to happen.
Comment 41 James Robinson 2010-06-22 12:08:26 PDT
Awesome.  Since you can reproduce locally, I think we should revert this patch while you work on the perf issue.  That way our perf bots will be able to catch any other regressions that might occur in the meantime.  Any objections?
Comment 42 mitz 2010-06-22 12:24:13 PDT
(In reply to comment #41)
> Awesome.  Since you can reproduce locally, I think we should revert this patch while you work on the perf issue.  That way our perf bots will be able to catch any other regressions that might occur in the meantime.  Any objections?

I object. I am surprised that posting progress updates which show that I am actively working to resolve the performance issues somehow meant that the changes should be reverted while I am investigating.
Comment 43 James Robinson 2010-06-22 12:37:27 PDT
I don't want to hinder your work in any way and intend no offense.  Since you are able to reproduce the issue locally, however, I see no reason that it's easier to work on the issue with the patch in trunk or outside of it.  Given that my preference would be for rolling it out for now so that the perf bots have a better chance of noticing any regressions that might be introduced by changes elsewhere in the codebase.  If it is easier for you to work on the regression with the patch still in trunk then that is plenty of reason to leave it in.

Historically it's been extremely difficult to track down perf regressions if they are not caught very quickly.
Comment 44 mitz 2010-06-22 14:19:25 PDT
Filed bug 41012.