Bug 48610 - [Chromium] Implement hyphenation for Chromium
Summary: [Chromium] Implement hyphenation for Chromium
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Layout and Rendering (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Roland Steiner
URL:
Keywords: GoogleBug, LayoutTestFailure, WebExposed
Depends on:
Blocks:
 
Reported: 2010-10-28 20:06 PDT by Roland Steiner
Modified: 2012-10-08 20:55 PDT (History)
20 users (show)

See Also:


Attachments
Add stub methods. (8.89 KB, patch)
2012-02-10 02:08 PST, Hironori Bono
fishd: review-
Details | Formatted Diff | Diff
Applied comments and moved the stub functions to Platform. (6.37 KB, patch)
2012-06-27 23:36 PDT, Hironori Bono
no flags Details | Formatted Diff | Diff
Fixed a style error (6.37 KB, patch)
2012-06-27 23:46 PDT, Hironori Bono
eric: review+
Details | Formatted Diff | Diff
A pagecycler-test result (3.15 KB, text/plain)
2012-08-01 18:41 PDT, Hironori Bono
no flags Details
Fixed a conflict in WebCore.gypi (6.41 KB, patch)
2012-08-02 02:17 PDT, Hironori Bono
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Roland Steiner 2010-10-28 20:06:06 PDT
Implement hyphenation for all Chromium platforms.

See http://crbug.com/47083 for the Chromium side.
Comment 1 Hironori Bono 2012-02-10 02:08:35 PST
Created attachment 126480 [details]
Add stub methods.

Greetings,

To move this issue forward, I have added a couple of stub methods so we can implement them in Chromium. (Its Chromium-side change is <http://codereview.chromium.org/9380002/>.)

Regards,

Hironori Bono
Comment 2 WebKit Review Bot 2012-02-10 02:11:42 PST
Please wait for approval from fishd@chromium.org before submitting because this patch contains changes to the Chromium public API.
Comment 3 Tony Chang 2012-02-10 10:33:18 PST
Comment on attachment 126480 [details]
Add stub methods.

View in context: https://bugs.webkit.org/attachment.cgi?id=126480&action=review

LGTM, but please wait for fishd to take a look.

> Source/WebCore/ChangeLog:8
> +        PlatformSupport so WebCore can call these methods. It also adds these methods
> +        methods to WebKitPlatformSupport so Chromium can implement them. (This change

Nit: methods methods.
Comment 4 Darin Fisher (:fishd, Google) 2012-02-11 11:01:27 PST
Comment on attachment 126480 [details]
Add stub methods.

View in context: https://bugs.webkit.org/attachment.cgi?id=126480&action=review

> Source/WebCore/platform/chromium/PlatformSupport.h:192
> +    static size_t lastHyphenLocation(const UChar* characters, size_t length, size_t index, const String& locale);

nit: this function could probably benefit from having a verb, like "compute"... computeLastHyphenLocation.  what is the meaning of the "index" parameter?

> Source/WebCore/platform/text/chromium/Hyphenation.cpp:39
> +size_t lastHyphenLocation(const UChar* characters, size_t length, size_t beforeIndex, const AtomicString& localeIdentifier)

it seems like this should also be named computeLastHyphenLocation

> Source/WebKit/chromium/public/platform/WebKitPlatformSupport.h:168
> +    virtual bool canHyphenate(const WebKit::WebString& locale) { return false; }

nit: no need for WebKit:: prefix

> Source/WebKit/chromium/public/platform/WebKitPlatformSupport.h:172
> +

nit: two new lines above section headers

> Source/WebKit/chromium/src/PlatformSupport.cpp:539
> +bool PlatformSupport::canHyphenate(const String& locale)

Why isn't the hyphenation engine directly callable by WebKit?  WebKit uses ICU directly.  How is this API implemented on the Chromium side?
Comment 5 Tony Chang 2012-02-13 10:05:18 PST
Comment on attachment 126480 [details]
Add stub methods.

View in context: https://bugs.webkit.org/attachment.cgi?id=126480&action=review

>> Source/WebKit/chromium/src/PlatformSupport.cpp:539
>> +bool PlatformSupport::canHyphenate(const String& locale)
> 
> Why isn't the hyphenation engine directly callable by WebKit?  WebKit uses ICU directly.  How is this API implemented on the Chromium side?

It's not yet implemented on the Chromium side.  ICU doesn't have hyphenation tables.  The current plan is to use the hunspell hyphenation library (currently in third_party, but not hooked up yet).
Comment 6 Darin Fisher (:fishd, Google) 2012-02-13 10:57:16 PST
(In reply to comment #5)
> It's not yet implemented on the Chromium side.  ICU doesn't have hyphenation tables.  The current plan is to use the hunspell hyphenation library (currently in third_party, but not hooked up yet).

I see.  I guess we probably don't want to start using hunspell directly from WebKit.

It seems like there is an opportunity to unify spell-checking, grammar-checking and hyphenation support behind some common interface, but perhaps that's well outside the scope of this bug.  Just food for thought.

The current approach LGTM.  Just consider adding some kind of verb prefix :)
Comment 7 Hironori Bono 2012-02-13 19:07:46 PST
Greetings Tony and Darin,

Many thanks for your review and comments. (I was out-of-office yesterday to go to a dentist.)
I would like to add some backgrounds behind this change. (I should have added them before uploading this change.) When I implement this change, I thought three constraints:
1. Layout Tests should work consistently on all platforms;
2. Mac Chrome should emulate Sarari, i.e. Mac Chrome should use hyphenation functions of Cocoa, and;
3. Win Chrome and Linux one should use the hyphenation library of hunspell.
To compare hyphenation results of hunspell and Cocoa, sometimes hunspell works better and the other times Cocoa works better as listed in the following examples. (In general, hunspell tends to add hyphens where we should not add ones. Cocoa tends not to add hyphens where we should add ones.)
* Con-cu-pis-cent (Merriam-Webster)
 - hunspell outputs "con-cu-pis-cent".
 - Cocoa outputs "concupiscent".
* Com-fort-ably (Merriam-Webster)
 - hunspell outputs "com-fort-ab-ly".
 - Cocoa outputs "com-fort-ably".
The current layout-test results expects on the hyphenation library of Cocoa and I notice we need rebaselines (at least) on Win when we use the hyphenation library of hunspell. 
* DumpRenderTree - Mock <http://codereview.chromium.org/9380002/>.
* Mac Chrome - Cocoa.
* Win Chrome and Linux one - hunspell.
I would like to hear this idea makes sense to you before moving this issue forward. Would it be possible to give me your thoughts?

Regards,

Hironori Bono
Comment 8 Tony Chang 2012-02-14 10:53:28 PST
I see, I didn't know that you wanted to use the Cocoa.  My understanding is that hyphenation is an OSX 10.7+ feature, so on Leopard and SL, we'd probably still have to use the hunspell library.

My suggestion is to start with just the hunspell library on all platforms and OS versions and we can add Cocoa specific hooks for OSX 10.7+ in the future.

I don't think it's terribly important that we match the results of layout test exactly.  Having a separate baseline for Chromium is fine.
Comment 9 Roland Steiner 2012-02-14 18:12:44 PST
One other issue with hyphenation in general is that the way WebKit implements it (lastHyphenationlocation, et al) does not work across all languages. E.g. in German old orthography and Hungarian there are cases where the surface form of the word changes before and/or after the hyphen. E.g.:

Schiffahrt -> Schiff-fahrt
Bäcker -> Bäk-ker

Hunspell can do this, but the underlying rendering code would need to change for this.
Comment 10 Tony Chang 2012-02-15 10:09:53 PST
(In reply to comment #9)
> One other issue with hyphenation in general is that the way WebKit implements it (lastHyphenationlocation, et al) does not work across all languages. E.g. in German old orthography and Hungarian there are cases where the surface form of the word changes before and/or after the hyphen. E.g.:
> 
> Schiffahrt -> Schiff-fahrt
> Bäcker -> Bäk-ker
> 
> Hunspell can do this, but the underlying rendering code would need to change for this.

Yeah, I would initially not implement hyphenation for those languages.  Let's implement the easy stuff first (and what Safari already implements), then we can refactor to support more languages.
Comment 11 Mojca Miklavec 2012-06-22 15:02:23 PDT
I would like to bring your attention to this document:
    http://xml.resource.org/cgi-bin/xml2rfc-dev.cgi?mode=html&url=https://raw.github.com/PanderMusubi/hyphenation-definitions/master/hyphenation-definitions.xml

It is a recent attempt (under development and possibly still looking for some basic funding) to prepare more extensive & complete support for hyphenation, in particular covering exceptions in languages like German where the word changes when it is hyphenated. This means that libhyphen might be replaced with a more modern implementation in not too distant future. You might want to have support earlier than that, but it might make sense to be aware of this project, and possibly join it.

News might be posted to this mailing list: http://tug.org/mailman/listinfo/tex-hyphen, but only major announcements.

I'm one of maintainers of collection of hyphenation patterns for TeX. The same database is also used for Mozilla. If you have any feature requests regarding collection of hyphenation patterns, I'm willing to help. On the other hand I might be grateful for advices about licences.
Comment 12 Hironori Bono 2012-06-27 23:36:17 PDT
Created attachment 149882 [details]
Applied comments and moved the stub functions to Platform.

Greetings,

Thanks for your comments and sorry for my slow response. I have renamed lastHyphenLocation to computeLastHyphenLocation, and moved the stub functions to WebCore/Platform. (It seems we have been moving stub function there.) Would it be possible to review the updated one?

Regards,

Hironori Bono
Comment 13 WebKit Review Bot 2012-06-27 23:38:53 PDT
Please wait for approval from abarth@webkit.org, dglazkov@chromium.org, fishd@chromium.org, jamesr@chromium.org or tkent@chromium.org before submitting, as this patch contains changes to the Chromium public API. See also https://trac.webkit.org/wiki/ChromiumWebKitAPI.
Comment 14 WebKit Review Bot 2012-06-27 23:39:20 PDT
Attachment 149882 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/Platform/ChangeLog', u'Source/Platf..." exit_code: 1
Source/WebCore/platform/text/chromium/Hyphenation.cpp:27:  You should add a blank line after implementation file's own header.  [build/include_order] [4]
Total errors found: 1 in 5 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 15 Hironori Bono 2012-06-27 23:46:32 PDT
Created attachment 149885 [details]
Fixed a style error

Greetings,

Sorry for this style error. I forgot running webkit-check-style.
I have uploaded another change that fixes this style error.

Regards,

Hironori Bono

(In reply to comment #14)
> Attachment 149882 [details] did not pass style-queue:
> 
> Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/Platform/ChangeLog', u'Source/Platf..." exit_code: 1
> Source/WebCore/platform/text/chromium/Hyphenation.cpp:27:  You should add a blank line after implementation file's own header.  [build/include_order] [4]
> Total errors found: 1 in 5 files
> 
> 
> If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 16 Ryosuke Niwa 2012-07-19 16:41:13 PDT
Please make sure not to regress line layout performance.
Comment 17 Eric Seidel (no email) 2012-08-01 16:11:06 PDT
Comment on attachment 149885 [details]
Fixed a style error

I wonder how much of a perf impact this will be.  If hyphenation is always on, this could be a huge slowdown for some pages.
Comment 18 Eric Seidel (no email) 2012-08-01 16:13:14 PDT
Please either be aware of possible perf impact after landing, or better yet update the bug with some information about expected perf effect before landing.

I assume hyphenation is opt-in?  Or are we always applying it to pages?
Comment 19 Hironori Bono 2012-08-01 18:41:35 PDT
Created attachment 155945 [details]
A pagecycler-test result

Greetings Niwa-san and Eric,

Thanks for your comments and sorry for my slow response.
As written in the spec <http://www.w3.org/TR/css3-text/#hyphens>, the initial value of the hyphens property is 'manual', i.e. hyphenation is (and should be) disabled by default. In fact, when running run_webkit_tests of Apple WebKit, only a couple of layout tests ('hyphenate-character.html' and 'hyphens.html') call the lastHyphenLocation function.
By the way, I have written a PageCycler test to measure the rendering performance with or without enabling hyphenation and run it on my Linux PC. In brief, enabling hyphenation takes 4% more time and 2% more (~1.36MB) renderer memory. (The attached file is the raw output of this experiment.)

Regards,

Hironori Bono
Comment 20 Eric Seidel (no email) 2012-08-01 18:42:50 PDT
So long as it's off by default, I don't see this affecting any perf.  Thanks for all the extra information!
Comment 21 Hironori Bono 2012-08-02 02:17:29 PDT
Created attachment 156015 [details]
Fixed a conflict in WebCore.gypi

Greetings,

I would like to upload another change since the previous one causes a conflict with the current WebCore.gypi.

Regards,

Hironori Bono
Comment 22 Eric Seidel (no email) 2012-08-02 02:21:38 PDT
Comment on attachment 156015 [details]
Fixed a conflict in WebCore.gypi

OK.
Comment 23 WebKit Review Bot 2012-08-02 03:14:23 PDT
Comment on attachment 156015 [details]
Fixed a conflict in WebCore.gypi

Clearing flags on attachment: 156015

Committed r124434: <http://trac.webkit.org/changeset/124434>
Comment 24 Ryosuke Niwa 2012-09-13 11:03:45 PDT
Has this bug been fixed by http://trac.webkit.org/changeset/124434? Or is it still open?
Comment 25 Hironori Bono 2012-09-13 19:26:57 PDT
Greetings Niwa-san,

In terms of DumpRenderTree, no. We need another Chromium change that makes DumpRenderTree load a hyphenation dictionary to fix the layout tests listed in my r124434. (Also do we need some rebaselined results after landing the Chromium change.) Nevertheless, it is OK to close this issue because we do not need any more code changes.
In terms of Chromium, no in practice. Chromium can hyphenate words only when we install a hyphenation dictionary manually by ourselves. (We need some changes that uploads the dictionary to a server and downloads it.)

Regards,

Hironori Bono

(In reply to comment #24)
> Has this bug been fixed by http://trac.webkit.org/changeset/124434? Or is it still open?
Comment 26 Hironori Bono 2012-10-08 20:55:21 PDT
Niwa-san,

I like to close this bug because I have landed the Chromium-side change and DumpRenderTree can now hyphenate words. (Some hyphenation tests need rebaselined results, though.)

Regards,

Hironori Bono

(In reply to comment #25)
> Greetings Niwa-san,
> 
> In terms of DumpRenderTree, no. We need another Chromium change that makes DumpRenderTree load a hyphenation dictionary to fix the layout tests listed in my r124434. (Also do we need some rebaselined results after landing the Chromium change.) Nevertheless, it is OK to close this issue because we do not need any more code changes.
> In terms of Chromium, no in practice. Chromium can hyphenate words only when we install a hyphenation dictionary manually by ourselves. (We need some changes that uploads the dictionary to a server and downloads it.)
> 
> Regards,
> 
> Hironori Bono
> 
> (In reply to comment #24)
> > Has this bug been fixed by http://trac.webkit.org/changeset/124434? Or is it still open?