RESOLVED FIXED 63209
Make line breaking obey the -webkit-locale property
https://bugs.webkit.org/show_bug.cgi?id=63209
Summary Make line breaking obey the -webkit-locale property
mitz
Reported 2011-06-22 17:51:19 PDT
Make line breaking obey the -webkit-locale property
Attachments
Patch (37.92 KB, patch)
2011-06-22 17:59 PDT, mitz
no flags
Patch (42.35 KB, patch)
2011-06-22 20:13 PDT, mitz
no flags
Patch (42.38 KB, patch)
2011-06-22 20:23 PDT, mitz
no flags
Patch (41.12 KB, patch)
2011-06-23 10:36 PDT, mitz
ap: review+
mitz
Comment 1 2011-06-22 17:59:28 PDT
mitz
Comment 2 2011-06-22 18:00:17 PDT
WebKit Review Bot
Comment 3 2011-06-22 18:05:59 PDT
Comment on attachment 98275 [details] Patch Attachment 98275 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/8925806
Early Warning System Bot
Comment 4 2011-06-22 18:13:16 PDT
Martin Robinson
Comment 5 2011-06-22 18:23:47 PDT
WebKit Review Bot
Comment 6 2011-06-22 19:37:23 PDT
Comment on attachment 98275 [details] Patch Attachment 98275 [details] did not pass cr-mac-ews (chromium): Output: http://queues.webkit.org/results/8924831
mitz
Comment 7 2011-06-22 20:13:42 PDT
WebKit Review Bot
Comment 8 2011-06-22 20:18:57 PDT
Comment on attachment 98305 [details] Patch Attachment 98305 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/8918914
Martin Robinson
Comment 9 2011-06-22 20:22:28 PDT
mitz
Comment 10 2011-06-22 20:23:12 PDT
Alexey Proskuryakov
Comment 11 2011-06-23 09:30:08 PDT
Comment on attachment 98309 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=98309&action=review > Source/WebCore/platform/text/TextBreakIterator.h:48 > + void releaseLineBreakIterator(TextBreakIterator*, const AtomicString& locale = AtomicString()); I can see why this was easier to implement, but passing locale to the release function is quite surprising and misleading. Perhaps it would be better to maintain a reverse map of acquired TextBreakIterators to their locales? > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:78 > + static LineBreakIteratorPool& sharedPool() > + { > + DEFINE_STATIC_LOCAL(LineBreakIteratorPool, pool, ()); Perhaps ASSERT(isMainThread())? > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:93 > + return ubrk_open(UBRK_LINE, locale.isEmpty() ? currentTextBreakLocaleID() : locale.string().ascii().data(), 0, 0, &openStatus); I would at least ASSERT the status. > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:123 > + if (U_FAILURE(setTextStatus)) > + return 0; I'm not sure what this error means. Is this a completely unexpected condition that we should ASSERT against?
Darin Adler
Comment 12 2011-06-23 09:35:58 PDT
Comment on attachment 98309 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=98309&action=review > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:93 > + return ubrk_open(UBRK_LINE, locale.isEmpty() ? currentTextBreakLocaleID() : locale.string().ascii().data(), 0, 0, &openStatus); Since empty string and null string are distinct, they end up with two separate entries in the pool, but both map to the current locale. If we mapped empty locale strings to the current text break locale before looking for matches in the pool, then it could use an existing iterator with the same locale, explicitly specified. Also, doing it that way would eliminate a dependency on the fact that the current locale ID won't change in this function. I know that's currently how currentTextBreakLocaleID() behaves, but it might be nice to not have the pool machinery depend on it. Is ascii() better than utf8() or latin1() here? I worry slightly that folding non-ASCII characters to "?" could make unwanted collisions or aliasing. Maybe we should assert that the locale string is all ASCII. > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:101 > + if (m_pool.size() == capacity) { > + ubrk_close(m_pool[0].second); > + m_pool.remove(0); > + } This is acting like a deque rather than a vector. But I suppose given the small fixed size a vector works. Another way to do it would be to use a fixed size array as a circular buffer.
mitz
Comment 13 2011-06-23 10:36:40 PDT
mitz
Comment 14 2011-06-23 10:48:09 PDT
(In reply to comment #12) The version I just posted addresses some of Alexey’s comments, but I didn’t see Darin’s comments before posting it. > (From update of attachment 98309 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=98309&action=review > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:93 > > + return ubrk_open(UBRK_LINE, locale.isEmpty() ? currentTextBreakLocaleID() : locale.string().ascii().data(), 0, 0, &openStatus); > > Since empty string and null string are distinct, they end up with two separate entries in the pool, but both map to the current locale. > > If we mapped empty locale strings to the current text break locale before looking for matches in the pool, then it could use an existing iterator with the same locale, explicitly specified. Also, doing it that way would eliminate a dependency on the fact that the current locale ID won't change in this function. I know that's currently how currentTextBreakLocaleID() behaves, but it might be nice to not have the pool machinery depend on it. This occurred to me. Doing the mapping first would have required a round trip through AtomicString, and while that’s not a big deal, I decided not to bother with this improvement. > Is ascii() better than utf8() or latin1() here? I worry slightly that folding non-ASCII characters to "?" could make unwanted collisions or aliasing. Maybe we should assert that the locale string is all ASCII. I think utf8() is better. I am going to change to utf8() (if I remember). > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:101 > > + if (m_pool.size() == capacity) { > > + ubrk_close(m_pool[0].second); > > + m_pool.remove(0); > > + } > > This is acting like a deque rather than a vector. But I suppose given the small fixed size a vector works. Another way to do it would be to use a fixed size array as a circular buffer. I need to be able to remove entries from anywhere in the vector, so it is not quite a deque. I preferred to keep the code simple (IMO), given the small fixed size.
Alexey Proskuryakov
Comment 15 2011-06-23 10:54:44 PDT
Comment on attachment 98368 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=98368&action=review I'm going to say r+, although not sharing cached iterators for empty -webkit-locale and for one that happens to match user locale seems unfortunate (as well as empty vs null). > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:122 > +private: > + LineBreakIteratorPool() { } Should the class also be noncopyable?
mitz
Comment 16 2011-06-23 11:03:16 PDT
Note You need to log in before you can comment on or make changes to this bug.