Bug 84939 - Optimize ASCII to UTF16 conversion using SSE2.
Summary: Optimize ASCII to UTF16 conversion using SSE2.
Status: RESOLVED WONTFIX
Alias: None
Product: WebKit
Classification: Unclassified
Component: Text (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-04-26 03:32 PDT by Dongseong Hwang
Modified: 2012-04-27 08:49 PDT (History)
16 users (show)

See Also:


Attachments
patch v.1 (8.27 KB, patch)
2012-04-26 03:39 PDT, Dongseong Hwang
mrobinson: review-
webkit.review.bot: commit-queue-
Details | Formatted Diff | Diff
Archive of layout-test-results from ec2-cr-linux-01 (5.87 MB, application/zip)
2012-04-26 05:18 PDT, WebKit Review Bot
no flags Details
patch v.2 (11.71 KB, patch)
2012-04-26 18:55 PDT, Dongseong Hwang
jamesr: review-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Dongseong Hwang 2012-04-26 03:32:20 PDT
Mozilla optimized same thing in the following patch.
https://bugzilla.mozilla.org/show_bug.cgi?id=506430
Comment 1 Dongseong Hwang 2012-04-26 03:39:39 PDT
Created attachment 138967 [details]
patch v.1
Comment 2 Dongseong Hwang 2012-04-26 03:45:47 PDT
I did test html5 spec page.
http://dev.w3.org/html5/spec/single-page.html

There is about 20% improvement.

I tested 5 times before and after.
Loading time is about 24s.

Test Machine : Intel® Xeon(R) CPU X5650 @ 2.67GHz × 6 
Test WebKit : Qt WebKit Upstream

BEFORE
:4.617188
:4.814941
:4.692627
:4.772949
:4.892578
Avg. 4.7580566 ms


AFTER
:3.817139
:3.701904
:3.840820
:3.384277
:3.834961
Avg. 3.7158202 ms
Comment 3 Dongseong Hwang 2012-04-26 04:08:22 PDT
This patch is obvious where I brought the code.
https://bugzilla.mozilla.org/show_bug.cgi?id=585538
Comment 4 WebKit Review Bot 2012-04-26 05:17:59 PDT
Comment on attachment 138967 [details]
patch v.1

Attachment 138967 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/12527873

New failing tests:
http/tests/inspector/extensions-headers.html
accessibility/aria-disabled.html
http/tests/incremental/frame-focus-before-load.html
http/tests/inspector/console-xhr-logging.html
http/tests/inspector/console-xhr-logging-async.html
http/tests/filesystem/workers/resolve-url-sync.html
http/tests/inspector/console-cd-completions.html
http/tests/incremental/slow-utf8-html.pl
http/tests/filesystem/workers/resolve-url.html
http/tests/incremental/slow-utf8-css.html
http/tests/inspector/extensions-network-redirect.html
http/tests/inspector/extensions-ignore-cache.html
http/tests/inspector/console-cd.html
http/tests/incremental/slow-utf8-text.pl
http/tests/inspector/compiler-script-mapping.html
http/tests/inspector/compiler-source-mapping-debug.html
http/tests/inspector/console-cross-origin-iframe-logging.html
http/tests/inspector/console-websocket-error.html
http/tests/inspector/console-resource-errors.html
http/tests/inspector/change-iframe-src.html
Comment 5 WebKit Review Bot 2012-04-26 05:18:07 PDT
Created attachment 138980 [details]
Archive of layout-test-results from ec2-cr-linux-01

The attached test failures were seen while running run-webkit-tests on the chromium-ews.
Bot: ec2-cr-linux-01  Port: <class 'webkitpy.common.config.ports.ChromiumXVFBPort'>  Platform: Linux-2.6.35-28-virtual-x86_64-with-Ubuntu-10.10-maverick
Comment 6 Martin Robinson 2012-04-26 08:15:54 PDT
Comment on attachment 138967 [details]
patch v.1

If you copy the code you also need to preserve the copyright.
Comment 7 Alexey Proskuryakov 2012-04-26 09:23:35 PDT
If I'm interpreting your numbers correctly, it's a 0.004% improvement in page load time on a large page (1 millisecond shaved off 24 seconds total time). I'm very much unsure if this is worth having a hundred lines of processor specific code for.

Mozilla seemed to have a measurable improvement on startup time (as they parse tons of text files for XUL), so it made more sense for them. Is there some other metric that shows a practical benefit for WebKit?
Comment 8 James Robinson 2012-04-26 10:52:59 PDT
(In reply to comment #2)
> I did test html5 spec page.
> http://dev.w3.org/html5/spec/single-page.html
> 
> There is about 20% improvement.
> 
> I tested 5 times before and after.
> Loading time is about 24s.
> 
> Test Machine : Intel® Xeon(R) CPU X5650 @ 2.67GHz × 6 
> Test WebKit : Qt WebKit Upstream
> 
> BEFORE
> :4.617188
> :4.814941
> :4.692627
> :4.772949
> :4.892578
> Avg. 4.7580566 ms
> 
> 
> AFTER
> :3.817139
> :3.701904
> :3.840820
> :3.384277
> :3.834961
> Avg. 3.7158202 ms

Something here doesn't add up - are you mixing up seconds and milliseconds here?  The sum of the BEFORE times do add up to 24.
Comment 9 Pratik Solanki 2012-04-26 11:02:17 PDT
(In reply to comment #8)
> > I tested 5 times before and after.
> > Loading time is about 24s.

Did you mean 24ms here instead of 24s?
Comment 10 Dongseong Hwang 2012-04-26 18:10:47 PDT
I meant 24s because html5 spec site is very big.
Comment 11 James Robinson 2012-04-26 18:11:43 PDT
Then what's changing from 4ms to 3ms?
Comment 12 Dongseong Hwang 2012-04-26 18:17:59 PDT
James : Yes

I'm sorry for confusing.

Total loading time is 24s.
And I put timing data in how long only decoding took.
That is avg 4ms.

I agree on Proskuryakov's opinion, but I wanted the opinions like those.
Comment 13 Dongseong Hwang 2012-04-26 18:55:41 PDT
Created attachment 139118 [details]
patch v.2

I append MPL term.
Follow WebKit Code Style, e.g. c++ style cast instead of c style cast.
Fix bug because of errata line 106.
Comment 14 Eric Seidel (no email) 2012-04-26 23:30:16 PDT
Is a correct understanding to state that you're proposing adding 12k of code to win 1ms on a 24s load time?
Comment 15 James Robinson 2012-04-26 23:34:36 PDT
Comment on attachment 139118 [details]
patch v.2

View in context: https://bugs.webkit.org/attachment.cgi?id=139118&action=review

> Source/WebCore/platform/text/TextCodecASCIIFastPath.h:5
> + * Version: MPL 1.1/GPL 2.0/LGPL 2.1

we can't accept code under MPL or GPL into the WebKit project.  Please review the following text (shown when you attach a bug via bugzilla's web interface, although we should make it more prominent):

WebKit Contribution Terms:

Hello and thank you for contributing a patch. Here is our licensing policy and terms for contributing code to the WebKit project.

If you are sending in a patch to existing WebKit code, you agree by clicking below that your changes are licensed under the existing license terms of the file you are modifying (i.e., BSD license or GNU Lesser General Public License v.2.1, LGPL v. 2.1). Please also add your copyright (name and year) to the relevant files for changes that are more than 10 lines of code.
If you are sending in a new file for inclusion in WebKit (no code copied from another source), the preferred license is BSD, but LGPL 2.1 is an option as well. Please include your copyright (name and year) and license preference (BSD or LGPL 2.1). By clicking below you agree that your file is licensed under either the BSD license or LGPL 2.1, as indicated in your file.
If you aren't the author of the patch, you agree to include the original copyright notices and licensing terms with it, to the extent that they exist. If there wasn't a copyright notice or license, please make a note of it. Generally we can only take in patches that are BSD- or LGPL-licensed in order to maintain license compatibility within the project.
Comment 16 Eric Seidel (no email) 2012-04-26 23:45:32 PDT
We have other MPL code in the repo:
http://trac.webkit.org/browser/trunk/Source/WTF/wtf/DateMath.h#L1

I don't know the details, but this is certainly not the first time we've imported code from mozilla.
Comment 17 Alexey Proskuryakov 2012-04-26 23:55:34 PDT
We just strip the other two licenses, only keeping LGPL.

Anyway, I think that cost/benefit ratio is such that we don't want this regardless of license.
Comment 18 Dongseong Hwang 2012-04-27 00:10:13 PDT
Eric Seidel : Yes, right. I worried that this patch is meaningless due to that.

James Robinson, Alexey Proskuryakov : I agree that cost/benefit ratio is low.
  I thought it maybe ok as I saw following files licensed by MPL 1.1.

./WebCore/html/canvas/CheckedInt.h
./WebCore/plugins/nptypes.h
./WebCore/plugins/gtk/xembed.h
./WebCore/plugins/gtk/gtk2xtbin.h
./WebCore/plugins/npapi.h
./WebCore/platform/image-decoders/gif/GIFImageReader.cpp
./WebCore/platform/image-decoders/gif/GIFImageReader.h
./WebCore/platform/gtk/CursorGtk.h
./WebCore/platform/text/TextCodecASCIIFastPath.h
./WTF/wtf/DateMath.h
./WebKit/chromium/src/WebPasswordFormUtils.cpp
./JavaScriptCore/runtime/DateConversion.h
./JavaScriptCore/runtime/JSDateMath.h

I thought this patch was difficult to commit because of low improvement and license issue. I should explain about my concern in detail, but I skipped. I'm sorry.

I am testing on Android device using NEON.
I will summit neon patch for the one that is curious about the result.
I will be OK if this and future patch will be rejected. :)
Comment 19 James Robinson 2012-04-27 00:15:53 PDT
OK, that sounds good.  I don't want to discourage you from looking for optimization opportunities - we always are interested in making WebKit faster! - but we do need to balance the benefit vs the cost to maintain and understand the code.
Comment 20 Dongseong Hwang 2012-04-27 05:03:21 PDT
I will report some test results.

Although I said loading time of html5 spec page is 24s, when I downloaded html5 spec page and tested, loading time is 10s.
ASCII to UTF16 Decoding time is about 4ms, so "ASCII to UTF16 Decoding portion of loading time" is about 0.04%.
It is because ASCII to UTF16 Decoding is very cheap, that converts just 'xxyyzz' to '00xx00yy00zz'.

SSE2 increased about 20% performance, because SSE2 has some instructions for interleaving.

But when I implemented it using NEON, loading speed decreased from 25.7ms to  44.5ms in an android device. Whole loading time is about 44s.
NEON does not have proper interleaving instruction, so I had to copy a register to a register redundantly.

I think it was mistake to try to improve ASCII to UTF16 Decoding because it is cheap.


In addition, I tested a Korean site for checking real UTF8 to UTF16 decoding time.
http://navercast.naver.com/contents.nhn?contents_id=7865
Loading time is about 240ms, and decoding time is 0.42ms. The ratio is 0.175%.

The result is that webkit-builtin-text-decoding is too fast to optimize.
Comment 21 Alexey Proskuryakov 2012-04-27 08:49:24 PDT
Thank you for the investigation! Closing per its results and apparent consensus.