Bug 20097 - [Qt] 20% Sunspider slow-down
Summary: [Qt] 20% Sunspider slow-down
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: PC Linux
: P2 Normal
Assignee: Nobody
URL:
Keywords: Qt
Depends on:
Blocks:
 
Reported: 2008-07-18 11:19 PDT by Zoltan Herczeg
Modified: 2009-04-20 13:30 PDT (History)
12 users (show)

See Also:


Attachments
Detailed sunspider results (.tgz) (3.81 KB, application/octet-stream)
2008-07-18 11:25 PDT, Zoltan Herczeg
no flags Details
compare r35202 and r35203 (3.64 KB, text/plain)
2008-07-19 00:06 PDT, Zoltan Herczeg
no flags Details
bubbleSort.js (1.79 KB, text/javascript)
2008-07-20 11:01 PDT, David Kilzer (:ddkilzer)
no flags Details
Add missing NDEBUG (941 bytes, patch)
2008-07-21 08:22 PDT, Gabor Loki
hausmann: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Zoltan Herczeg 2008-07-18 11:19:05 PDT
I noticed a major slow-down for sunspider between r35202-r35208

Repository :
Root: http://svn.webkit.org/repository/webkit
UUID: 268f45cc-cd09-0410-ab3c-d52691b4dbfc

System:
Linux, Qt-4.3.4, gcc-4.1.2 (20061115)

CPU:
AMD opteron, Dual-core 2.2 GHz (dedicated for benchmark measurement)

Revision:
35033: 3168.0ms +/- 0.8%
35123: 3172.0ms +/- 0.7%
35170: 3197.8ms +/- 0.9%
35190: 3191.5ms +/- 0.6%
35202: 3193.7ms +/- 0.7%
<<< HERE ~20% slow >>>
35208: 3782.8ms +/- 0.6%
35231: 3779.2ms +/- 0.6%
Comment 1 Zoltan Herczeg 2008-07-18 11:25:25 PDT
Created attachment 22366 [details]
Detailed sunspider results (.tgz)
Comment 2 Alexey Proskuryakov 2008-07-18 11:34:56 PDT
The only JavaScriptCore change in this range is <http://trac.webkit.org/projects/webkit/changeset/35203>.
Comment 3 David Kilzer (:ddkilzer) 2008-07-18 13:13:03 PDT
Could you attach all of the additional tests you're running with SunSpider?
Comment 4 Cameron Zwarich (cpst) 2008-07-18 14:19:46 PDT
I don't see a negative impact on performance from r35203 on my machine, a 2.16 GHz MacBook Pro with Apple GCC 4.0.
Comment 5 Mark Rowe (bdash) 2008-07-18 14:31:22 PDT
I see r35203 as being around 2% faster than r35202 on my Mac with GCC 4.0.  If I force JavaScriptCore to build with GCC 4.2, I see r35203 as being around 2% slower than r35202.

This regression may be specific to either your compiler version or perhaps even to the manner in which the Qt port builds JavaScriptCore.
Comment 6 Geoffrey Garen 2008-07-18 14:36:17 PDT
Can you verify that the ALWAYS_INLINE macro works in your port?
Comment 7 Zoltan Herczeg 2008-07-18 15:06:45 PDT
(In reply to comment #6)
> Can you verify that the ALWAYS_INLINE macro works in your port?

Yes, both ALWAYS_INLINE and NEVER_INLINE works (I checked an assembly dump)

Maybe it was not clear, but our extended benchmark set is separated from Sunspider. Those tests are not finalized and may changed in the near future. However, if you are still interested, I can attach them.

Is it possible, that the AMD cpu, which has a slower fpu than the intel counterpart, is responsible for the slow? Maybe I should try out my binaries on other computers.
Comment 8 Zoltan Herczeg 2008-07-18 15:31:41 PDT
I simply copied my binaries to a P4 2.40 GHz (jsc-35202, jsc-35208 and libQtCore.so.4 because it was missing from that system), and it seems the results are the same as on the other archtecture. 

By the way, the size is also increased:
jsc-35202 - 1820153
jsc-35208 - 1878409
Comment 9 David Kilzer (:ddkilzer) 2008-07-18 18:02:35 PDT
(In reply to comment #7)
> Maybe it was not clear, but our extended benchmark set is separated from
> Sunspider. Those tests are not finalized and may changed in the near future.
> However, if you are still interested, I can attach them.

Please attach them.  The only way to reproduce the results on other platforms is to use the same tests.

You should also try the original SunSpider tests to see how they perform on your platform, e.g., if you see a slowdown or a speedup.
Comment 10 Zoltan Herczeg 2008-07-19 00:04:52 PDT
I am sorry, it seems it was a bad thing to mention the other benchmark set. Naturally, the results I sent you are measured by the original sunsipder tests without any kind of tweaks.

In the meantime, I have compiled the 35203 revision as well. Now I can enclose the comparistation between r35202 and r35203.

The worst case: 
access-fannkuch: *1.67x as slow*

The reason why I mentioned bubble sort, because it was more than 70% slower, and that is bigger than access-fannkuch.

The file sizes as also interesting:
jsc-35123: 1814130 bytes
jsc-35170: 1820097 bytes
jsc-35190: 1820143 bytes
jsc-35202: 1820153 bytes
<< BIG JUMP >>
jsc-35203: 1878409 bytes
jsc-35208: 1878409 bytes (you mentionad there was no changes here)
jsc-35231: 1862421 bytes (little better)
Comment 11 Zoltan Herczeg 2008-07-19 00:06:02 PDT
Created attachment 22382 [details]
compare r35202 and r35203
Comment 12 Mark Rowe (bdash) 2008-07-19 00:18:38 PDT
It may be worth testing a different compiler version or a different port to see what result you get.  The GTK port should compile and run on Linux, and since it uses a different build system it may give different performance figures.  it would be interesting to see whether it also shows the same regression.
Comment 13 Zoltan Herczeg 2008-07-19 05:55:28 PDT
I downloaded WebKit to my home computer:
Core 2 Duo, Ubuntu-Linux, GTK port, gcc-4.2.1

sunspider r35202 -  2614.9ms +/- 0.2%
sunspider r35203 -  2601.3ms +/- 0.2%

It seems a little bit faster!

To my shock, the jsc binaries are only 63% of the QT port:
r35202: GTK: 1147693 bytes, QT: 1820153 bytes
r35203: GTK: 1168479 bytes, QT: 1878409 bytes

Do you have any idea? Is the QT port causes the difference?
Comment 14 Mark Rowe (bdash) 2008-07-19 22:32:41 PDT
I would look into the compiler flags that are used when building JavaScriptCore.  Is the Qt build system building JavaScriptCore with "-fstrict-aliasing"?  Are they both using "-O3"?  
Comment 15 David Kilzer (:ddkilzer) 2008-07-20 11:01:17 PDT
Created attachment 22392 [details]
bubbleSort.js

Additional bubbleSort.js test metioned in Zoltan's email to webkit-dev.
Comment 16 Zoltan Herczeg 2008-07-20 14:24:26 PDT
I have successfully compiled the QT port on my home computer:

QT-port-35202:
  sunspider: 2665.1ms +/- 0.3%
  binary size: 1841368

QT-port-35203:
  sunspider: 3238.2ms +/- 0.2%
  binary size: 1904808

As you can see, this is exactly the same thing what we saw before! So this thing is QT port specific.

These are the CFLAGS for the QT-port:
g++ [...] -pipe -Wreturn-type -fno-strict-aliasing -O2
    -fvisibility=hidden -fvisibility-inlines-hidden [...]

I will try to change these to -O3 and -f-strict-aliasing, and will send you the results.
Comment 17 Zoltan Herczeg 2008-07-21 01:59:09 PDT
It seems the QT port applies the same CXXFLAGS to all projects. We only need -O3 and -fstrict-aliasing flags for JavaScriptCore. Is there a simple way to do that, or do I need to write a new "GENERATOR" rule in JavaScriptCore.pri? Any other idea (perhaps change the global CXXFLAGS)?
Comment 18 David Kilzer (:ddkilzer) 2008-07-21 02:26:23 PDT
Copying some Qt folks regarding Comment #17.

Comment 19 Simon Hausmann 2008-07-21 03:54:56 PDT
(In reply to comment #17)
> It seems the QT port applies the same CXXFLAGS to all projects. We only need
> -O3 and -fstrict-aliasing flags for JavaScriptCore. Is there a simple way to do
> that, or do I need to write a new "GENERATOR" rule in JavaScriptCore.pri? Any
> other idea (perhaps change the global CXXFLAGS)?

With the way JavaScriptCore is built into the same library the CXXFLAGS are the same for the entire QtWebKit library. But for testing purposes you could try something along the lines of

QMAKE_CXXFLAGS_RELEASE ~= s/-O2/-O3 -fstrict-aliasing/

in JavaScriptCore.pri or WebCore.pro.

Comment 20 Mark Rowe (bdash) 2008-07-21 04:26:30 PDT
It is almost certainly unsafe for WebCore to be built with strict aliasing enabled. 
Comment 21 Zoltan Herczeg 2008-07-21 05:06:48 PDT
> QMAKE_CXXFLAGS_RELEASE ~= s/-O2/-O3 -fstrict-aliasing/

I did something similar, just to see what happend with jsc. The resulted binary was a bit bigger, and a bit faster than the version with -O2, but the difference is small.

I looked at the resulting binaries, and there are strange strings in QT-r35203 (-O2 and -O3 as well). They seems kinda debug info :

[...]void WTF::HashTable<Key, Value, Extractor, HashFunctions, Traits, KeyTraits>::checkKey(const T&) [with T = KJS::UString::Rep*, HashTranslator = WTF::RefPtrHashMapRawKeyTranslator<KJS::UString::Rep*, std::pair<WTF::RefPtr<KJS::UString::Rep>, KJS::SymbolTableEntry>, WTF::PairHash[...]

These lines are neither in QT-r35202 nor in GTK binaries. Is it possible that some recently triggered macros made these lines?

Comment 22 Zoltan Herczeg 2008-07-21 07:20:35 PDT
We belive the jsc always compiles in debug mode for QT (and the asserts in Registers.h slows down the sunspider). It seems WebKit/JavaScriptCore/kjs/jsc.pro does not have 'release' mode.
Comment 23 Simon Hausmann 2008-07-21 07:28:14 PDT
(In reply to comment #22)
> We belive the jsc always compiles in debug mode for QT (and the asserts in
> Registers.h slows down the sunspider). It seems
> WebKit/JavaScriptCore/kjs/jsc.pro does not have 'release' mode.

If you build with build-webkit --release for example then jsc is built in release mode.

If you run just qmake in JavaScriptCore/kjs then qmake will pick whatever Qt was built with by default, which would normally be a release build, unless you configure your version of Qt with --debug.
Comment 24 Gabor Loki 2008-07-21 08:22:56 PDT
Created attachment 22404 [details]
Add missing NDEBUG

The problem was that the jsc.pro was not contain any release specific rule. So there was no chance to pass the NDEBUG define to the compiler.

The patch contains the similar rule which is in WebKit.pri file.

Note: I have to include USE_SYSTEM_MALLOC define in case of release, because there was linking problem (WebKit.pri uses USE_SYSTEM_MALLOC in case of debug also).
Comment 25 Mark Rowe (bdash) 2008-07-21 08:27:17 PDT
Does the patch address the performance issue?
Comment 26 Gabor Loki 2008-07-21 12:01:28 PDT
(In reply to comment #25)
> Does the patch address the performance issue?

Of course. Here is the SunSpider result:
** TOTAL **: 1.37x as fast 2897.3ms +/- 1.9%  2107.5ms +/- 2.6% significant
Comment 27 Zoltan Herczeg 2008-07-21 23:43:05 PDT
Here is the summarization of the above bug in one sentence:

In the QT jsc build, the jsc.pro qmake file never defines the NDEBUG symbol, so the releases contains all asserts and other debug related codes, which heavily slows down all measurements.
Comment 28 Simon Hausmann 2008-07-22 00:27:45 PDT
Comment on attachment 22404 [details]
Add missing NDEBUG

Patch looks good to me
Comment 29 David Kilzer (:ddkilzer) 2008-07-23 08:00:33 PDT
$ git svn dcommit
Committing to http://svn.webkit.org/repository/webkit/trunk ...
        M       JavaScriptCore/ChangeLog
        M       JavaScriptCore/kjs/jsc.pro
Committed r35294