Make line breaking obey the -webkit-locale property
Created attachment 98275 [details] Patch
<rdar://problem/8948904>
Comment on attachment 98275 [details] Patch Attachment 98275 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/8925806
Comment on attachment 98275 [details] Patch Attachment 98275 [details] did not pass qt-ews (qt): Output: http://queues.webkit.org/results/8925808
Comment on attachment 98275 [details] Patch Attachment 98275 [details] did not pass gtk-ews (gtk): Output: http://queues.webkit.org/results/8919875
Comment on attachment 98275 [details] Patch Attachment 98275 [details] did not pass cr-mac-ews (chromium): Output: http://queues.webkit.org/results/8924831
Created attachment 98305 [details] Patch
Comment on attachment 98305 [details] Patch Attachment 98305 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/8918914
Comment on attachment 98305 [details] Patch Attachment 98305 [details] did not pass gtk-ews (gtk): Output: http://queues.webkit.org/results/8923865
Created attachment 98309 [details] Patch
Comment on attachment 98309 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=98309&action=review > Source/WebCore/platform/text/TextBreakIterator.h:48 > + void releaseLineBreakIterator(TextBreakIterator*, const AtomicString& locale = AtomicString()); I can see why this was easier to implement, but passing locale to the release function is quite surprising and misleading. Perhaps it would be better to maintain a reverse map of acquired TextBreakIterators to their locales? > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:78 > + static LineBreakIteratorPool& sharedPool() > + { > + DEFINE_STATIC_LOCAL(LineBreakIteratorPool, pool, ()); Perhaps ASSERT(isMainThread())? > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:93 > + return ubrk_open(UBRK_LINE, locale.isEmpty() ? currentTextBreakLocaleID() : locale.string().ascii().data(), 0, 0, &openStatus); I would at least ASSERT the status. > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:123 > + if (U_FAILURE(setTextStatus)) > + return 0; I'm not sure what this error means. Is this a completely unexpected condition that we should ASSERT against?
Comment on attachment 98309 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=98309&action=review > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:93 > + return ubrk_open(UBRK_LINE, locale.isEmpty() ? currentTextBreakLocaleID() : locale.string().ascii().data(), 0, 0, &openStatus); Since empty string and null string are distinct, they end up with two separate entries in the pool, but both map to the current locale. If we mapped empty locale strings to the current text break locale before looking for matches in the pool, then it could use an existing iterator with the same locale, explicitly specified. Also, doing it that way would eliminate a dependency on the fact that the current locale ID won't change in this function. I know that's currently how currentTextBreakLocaleID() behaves, but it might be nice to not have the pool machinery depend on it. Is ascii() better than utf8() or latin1() here? I worry slightly that folding non-ASCII characters to "?" could make unwanted collisions or aliasing. Maybe we should assert that the locale string is all ASCII. > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:101 > + if (m_pool.size() == capacity) { > + ubrk_close(m_pool[0].second); > + m_pool.remove(0); > + } This is acting like a deque rather than a vector. But I suppose given the small fixed size a vector works. Another way to do it would be to use a fixed size array as a circular buffer.
Created attachment 98368 [details] Patch
(In reply to comment #12) The version I just posted addresses some of Alexey’s comments, but I didn’t see Darin’s comments before posting it. > (From update of attachment 98309 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=98309&action=review > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:93 > > + return ubrk_open(UBRK_LINE, locale.isEmpty() ? currentTextBreakLocaleID() : locale.string().ascii().data(), 0, 0, &openStatus); > > Since empty string and null string are distinct, they end up with two separate entries in the pool, but both map to the current locale. > > If we mapped empty locale strings to the current text break locale before looking for matches in the pool, then it could use an existing iterator with the same locale, explicitly specified. Also, doing it that way would eliminate a dependency on the fact that the current locale ID won't change in this function. I know that's currently how currentTextBreakLocaleID() behaves, but it might be nice to not have the pool machinery depend on it. This occurred to me. Doing the mapping first would have required a round trip through AtomicString, and while that’s not a big deal, I decided not to bother with this improvement. > Is ascii() better than utf8() or latin1() here? I worry slightly that folding non-ASCII characters to "?" could make unwanted collisions or aliasing. Maybe we should assert that the locale string is all ASCII. I think utf8() is better. I am going to change to utf8() (if I remember). > > > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:101 > > + if (m_pool.size() == capacity) { > > + ubrk_close(m_pool[0].second); > > + m_pool.remove(0); > > + } > > This is acting like a deque rather than a vector. But I suppose given the small fixed size a vector works. Another way to do it would be to use a fixed size array as a circular buffer. I need to be able to remove entries from anywhere in the vector, so it is not quite a deque. I preferred to keep the code simple (IMO), given the small fixed size.
Comment on attachment 98368 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=98368&action=review I'm going to say r+, although not sharing cached iterators for empty -webkit-locale and for one that happens to match user locale seems unfortunate (as well as empty vs null). > Source/WebCore/platform/text/TextBreakIteratorICU.cpp:122 > +private: > + LineBreakIteratorPool() { } Should the class also be noncopyable?
Fixed in <http://trac.webkit.org/changeset/89592>.