Bug 98711 - DOM/textarea-edit.html spends 35% of time in numGraphemeClusters
Summary: DOM/textarea-edit.html spends 35% of time in numGraphemeClusters
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Forms (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Ryosuke Niwa
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-10-08 19:39 PDT by Ryosuke Niwa
Modified: 2012-10-18 00:13 PDT (History)
10 users (show)

See Also:


Attachments
Mitigates the bug (2.41 KB, patch)
2012-10-08 22:11 PDT, Ryosuke Niwa
no flags Details | Formatted Diff | Diff
Fixes the bug (4.13 KB, patch)
2012-10-09 11:34 PDT, Ryosuke Niwa
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Ryosuke Niwa 2012-10-08 19:39:17 PDT
It appears that the bottleneck is numGraphemeClusters. It's very slow :(

Running Time	Self		Symbol Name
17515.0ms   36.0%	17515.0	 	icu::RuleBasedBreakIterator::handleNext(icu::RBBIStateTable const*)
17460.0ms   35.9%	0.0	 	 icu::RuleBasedBreakIterator::next()
17031.0ms   35.0%	0.0	 	  WebCore::numGraphemeClusters(WTF::String const&)
8731.0ms   17.9%	0.0	 	   WebCore::HTMLTextAreaElement::tooLong() const
8731.0ms   17.9%	0.0	 	    non-virtual thunk to WebCore::HTMLTextAreaElement::tooLong() const
8300.0ms   17.0%	0.0	 	   WebCore::HTMLTextAreaElement::handleBeforeTextInsertedEvent(WebCore::BeforeTextInsertedEvent*) const
8300.0ms   17.0%	0.0	 	    WebCore::HTMLTextAreaElement::defaultEventHandler(WebCore::Event*)
Comment 1 Ryosuke Niwa 2012-10-08 22:11:49 PDT
Created attachment 167690 [details]
Mitigates the bug
Comment 2 Ryosuke Niwa 2012-10-08 22:15:53 PDT
Comment on attachment 167690 [details]
Mitigates the bug

View in context: https://bugs.webkit.org/attachment.cgi?id=167690&action=review

> Source/WebCore/ChangeLog:8
> +        Since numGraphemeClusters(text) is always smaller than text.length(),

Ugh... this isn't true because of LF -> CRLF conversion :(
Comment 3 Ryosuke Niwa 2012-10-08 23:17:26 PDT
Comment on attachment 167690 [details]
Mitigates the bug

View in context: https://bugs.webkit.org/attachment.cgi?id=167690&action=review

> Source/WebCore/html/HTMLTextAreaElement.cpp:284
> +    if (currentValue.length() + event->text().length() < unsignedMaxLength)

I guess I can multiply this number by 2... so hacky :(
Comment 4 Kent Tamura 2012-10-08 23:23:51 PDT
Comment on attachment 167690 [details]
Mitigates the bug

View in context: https://bugs.webkit.org/attachment.cgi?id=167690&action=review

>> Source/WebCore/html/HTMLTextAreaElement.cpp:284
>> +    if (currentValue.length() + event->text().length() < unsignedMaxLength)
> 
> I guess I can multiply this number by 2... so hacky :(

Why don't you introduce countNumberOfLF() or something?

  if (currentValue.length() + countNumberOfLF(currentValue) + event->text().length() + countNumberOfLF(event->text()) < ...
Comment 5 Darin Adler 2012-10-09 09:53:41 PDT
Comment on attachment 167690 [details]
Mitigates the bug

View in context: https://bugs.webkit.org/attachment.cgi?id=167690&action=review

>>> Source/WebCore/html/HTMLTextAreaElement.cpp:284
>>> +    if (currentValue.length() + event->text().length() < unsignedMaxLength)
>> 
>> I guess I can multiply this number by 2... so hacky :(
> 
> Why don't you introduce countNumberOfLF() or something?
> 
>   if (currentValue.length() + countNumberOfLF(currentValue) + event->text().length() + countNumberOfLF(event->text()) < ...

We can have two early exits. One where you can tell it fits without even counting LFs. Another where you count LFs, and then the grapheme cluster counting full version.
Comment 6 Darin Adler 2012-10-09 10:06:45 PDT
It also seems really unfortunate that the numGraphemeClusters function is so slow. I wonder if we could get ICU to speed it up, or do something in our ICU interface to speed up the many common cases. For example, grapheme clusters are == characters for all ASCII characters.
Comment 7 Ryosuke Niwa 2012-10-09 10:10:06 PDT
(In reply to comment #6)
> It also seems really unfortunate that the numGraphemeClusters function is so slow. I wonder if we could get ICU to speed it up, or do something in our ICU interface to speed up the many common cases. For example, grapheme clusters are == characters for all ASCII characters.

Not sure special-casing ASCII characters is such a good idea. It'll certainly make it faster for English but it won't improve other languages.
Comment 8 Ryosuke Niwa 2012-10-09 11:01:19 PDT
My original approach improved the score to ~0.5, and new approach (adding the number of LFs) improves the score to ~.43 whereas the original score is ~0.3. At least this approach removes either caller from the profiler so let's make this change for now.
Comment 9 Ryosuke Niwa 2012-10-09 11:34:12 PDT
Created attachment 167801 [details]
Fixes the bug
Comment 10 Kent Tamura 2012-10-09 14:55:04 PDT
Comment on attachment 167801 [details]
Fixes the bug

ok
Comment 11 WebKit Review Bot 2012-10-09 15:19:12 PDT
Comment on attachment 167801 [details]
Fixes the bug

Clearing flags on attachment: 167801

Committed r130818: <http://trac.webkit.org/changeset/130818>
Comment 12 WebKit Review Bot 2012-10-09 15:19:17 PDT
All reviewed patches have been landed.  Closing bug.