Bug 63209 - Make line breaking obey the -webkit-locale property
Summary: Make line breaking obey the -webkit-locale property
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Text (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: mitz
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2011-06-22 17:51 PDT by mitz
Modified: 2011-06-23 11:03 PDT (History)
6 users (show)

See Also:


Attachments
Patch (37.92 KB, patch)
2011-06-22 17:59 PDT, mitz
no flags Details | Formatted Diff | Diff
Patch (42.35 KB, patch)
2011-06-22 20:13 PDT, mitz
no flags Details | Formatted Diff | Diff
Patch (42.38 KB, patch)
2011-06-22 20:23 PDT, mitz
no flags Details | Formatted Diff | Diff
Patch (41.12 KB, patch)
2011-06-23 10:36 PDT, mitz
ap: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description mitz 2011-06-22 17:51:19 PDT
Make line breaking obey the -webkit-locale property
Comment 1 mitz 2011-06-22 17:59:28 PDT
Created attachment 98275 [details]
Patch
Comment 2 mitz 2011-06-22 18:00:17 PDT
<rdar://problem/8948904>
Comment 3 WebKit Review Bot 2011-06-22 18:05:59 PDT
Comment on attachment 98275 [details]
Patch

Attachment 98275 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/8925806
Comment 4 Early Warning System Bot 2011-06-22 18:13:16 PDT
Comment on attachment 98275 [details]
Patch

Attachment 98275 [details] did not pass qt-ews (qt):
Output: http://queues.webkit.org/results/8925808
Comment 5 Martin Robinson 2011-06-22 18:23:47 PDT
Comment on attachment 98275 [details]
Patch

Attachment 98275 [details] did not pass gtk-ews (gtk):
Output: http://queues.webkit.org/results/8919875
Comment 6 WebKit Review Bot 2011-06-22 19:37:23 PDT
Comment on attachment 98275 [details]
Patch

Attachment 98275 [details] did not pass cr-mac-ews (chromium):
Output: http://queues.webkit.org/results/8924831
Comment 7 mitz 2011-06-22 20:13:42 PDT
Created attachment 98305 [details]
Patch
Comment 8 WebKit Review Bot 2011-06-22 20:18:57 PDT
Comment on attachment 98305 [details]
Patch

Attachment 98305 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/8918914
Comment 9 Martin Robinson 2011-06-22 20:22:28 PDT
Comment on attachment 98305 [details]
Patch

Attachment 98305 [details] did not pass gtk-ews (gtk):
Output: http://queues.webkit.org/results/8923865
Comment 10 mitz 2011-06-22 20:23:12 PDT
Created attachment 98309 [details]
Patch
Comment 11 Alexey Proskuryakov 2011-06-23 09:30:08 PDT
Comment on attachment 98309 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=98309&action=review

> Source/WebCore/platform/text/TextBreakIterator.h:48
> +    void releaseLineBreakIterator(TextBreakIterator*, const AtomicString& locale = AtomicString());

I can see why this was easier to implement, but passing locale to the release function is quite surprising and misleading. Perhaps it would be better to maintain a reverse map of acquired TextBreakIterators to their locales?

> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:78
> +    static LineBreakIteratorPool& sharedPool()
> +    {
> +        DEFINE_STATIC_LOCAL(LineBreakIteratorPool, pool, ());

Perhaps ASSERT(isMainThread())?

> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:93
> +        return ubrk_open(UBRK_LINE, locale.isEmpty() ? currentTextBreakLocaleID() : locale.string().ascii().data(), 0, 0, &openStatus);

I would at least ASSERT the status.

> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:123
> +    if (U_FAILURE(setTextStatus))
> +        return 0;

I'm not sure what this error means. Is this a completely unexpected condition that we should ASSERT against?
Comment 12 Darin Adler 2011-06-23 09:35:58 PDT
Comment on attachment 98309 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=98309&action=review

> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:93
> +        return ubrk_open(UBRK_LINE, locale.isEmpty() ? currentTextBreakLocaleID() : locale.string().ascii().data(), 0, 0, &openStatus);

Since empty string and null string are distinct, they end up with two separate entries in the pool, but both map to the current locale.

If we mapped empty locale strings to the current text break locale before looking for matches in the pool, then it could use an existing iterator with the same locale, explicitly specified. Also, doing it that way would eliminate a dependency on the fact that the current locale ID won't change in this function. I know that's currently how currentTextBreakLocaleID() behaves, but it might be nice to not have the pool machinery depend on it.

Is ascii() better than utf8() or latin1() here? I worry slightly that folding non-ASCII characters to "?" could make unwanted collisions or aliasing. Maybe we should assert that the locale string is all ASCII.

> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:101
> +        if (m_pool.size() == capacity) {
> +            ubrk_close(m_pool[0].second);
> +            m_pool.remove(0);
> +        }

This is acting like a deque rather than a vector. But I suppose given the small fixed size a vector works. Another way to do it would be to use a fixed size array as a circular buffer.
Comment 13 mitz 2011-06-23 10:36:40 PDT
Created attachment 98368 [details]
Patch
Comment 14 mitz 2011-06-23 10:48:09 PDT
(In reply to comment #12)

The version I just posted addresses some of Alexey’s comments, but I didn’t see Darin’s comments before posting it.

> (From update of attachment 98309 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=98309&action=review
> 
> > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:93
> > +        return ubrk_open(UBRK_LINE, locale.isEmpty() ? currentTextBreakLocaleID() : locale.string().ascii().data(), 0, 0, &openStatus);
> 
> Since empty string and null string are distinct, they end up with two separate entries in the pool, but both map to the current locale.
> 
> If we mapped empty locale strings to the current text break locale before looking for matches in the pool, then it could use an existing iterator with the same locale, explicitly specified. Also, doing it that way would eliminate a dependency on the fact that the current locale ID won't change in this function. I know that's currently how currentTextBreakLocaleID() behaves, but it might be nice to not have the pool machinery depend on it.

This occurred to me. Doing the mapping first would have required a round trip through AtomicString, and while that’s not a big deal, I decided not to bother with this improvement.

> Is ascii() better than utf8() or latin1() here? I worry slightly that folding non-ASCII characters to "?" could make unwanted collisions or aliasing. Maybe we should assert that the locale string is all ASCII.

I think utf8() is better. I am going to change to utf8() (if I remember).

> 
> > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:101
> > +        if (m_pool.size() == capacity) {
> > +            ubrk_close(m_pool[0].second);
> > +            m_pool.remove(0);
> > +        }
> 
> This is acting like a deque rather than a vector. But I suppose given the small fixed size a vector works. Another way to do it would be to use a fixed size array as a circular buffer.

I need to be able to remove entries from anywhere in the vector, so it is not quite a deque. I preferred to keep the code simple (IMO), given the small fixed size.
Comment 15 Alexey Proskuryakov 2011-06-23 10:54:44 PDT
Comment on attachment 98368 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=98368&action=review

I'm going to say r+, although not sharing cached iterators for empty -webkit-locale and for one that happens to match user locale seems unfortunate (as well as empty vs null).

> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:122
> +private:
> +    LineBreakIteratorPool() { }

Should the class also be noncopyable?
Comment 16 mitz 2011-06-23 11:03:16 PDT
Fixed in <http://trac.webkit.org/changeset/89592>.