I noticed a major slow-down for sunspider between r35202-r35208
Linux, Qt-4.3.4, gcc-4.1.2 (20061115)
AMD opteron, Dual-core 2.2 GHz (dedicated for benchmark measurement)
35033: 3168.0ms +/- 0.8%
35123: 3172.0ms +/- 0.7%
35170: 3197.8ms +/- 0.9%
35190: 3191.5ms +/- 0.6%
35202: 3193.7ms +/- 0.7%
<<< HERE ~20% slow >>>
35208: 3782.8ms +/- 0.6%
35231: 3779.2ms +/- 0.6%
Created attachment 22366 [details]
Detailed sunspider results (.tgz)
Could you attach all of the additional tests you're running with SunSpider?
I don't see a negative impact on performance from r35203 on my machine, a 2.16 GHz MacBook Pro with Apple GCC 4.0.
Can you verify that the ALWAYS_INLINE macro works in your port?
(In reply to comment #6)
> Can you verify that the ALWAYS_INLINE macro works in your port?
Yes, both ALWAYS_INLINE and NEVER_INLINE works (I checked an assembly dump)
Maybe it was not clear, but our extended benchmark set is separated from Sunspider. Those tests are not finalized and may changed in the near future. However, if you are still interested, I can attach them.
Is it possible, that the AMD cpu, which has a slower fpu than the intel counterpart, is responsible for the slow? Maybe I should try out my binaries on other computers.
I simply copied my binaries to a P4 2.40 GHz (jsc-35202, jsc-35208 and libQtCore.so.4 because it was missing from that system), and it seems the results are the same as on the other archtecture.
By the way, the size is also increased:
jsc-35202 - 1820153
jsc-35208 - 1878409
(In reply to comment #7)
> Maybe it was not clear, but our extended benchmark set is separated from
> Sunspider. Those tests are not finalized and may changed in the near future.
> However, if you are still interested, I can attach them.
Please attach them. The only way to reproduce the results on other platforms is to use the same tests.
You should also try the original SunSpider tests to see how they perform on your platform, e.g., if you see a slowdown or a speedup.
I am sorry, it seems it was a bad thing to mention the other benchmark set. Naturally, the results I sent you are measured by the original sunsipder tests without any kind of tweaks.
In the meantime, I have compiled the 35203 revision as well. Now I can enclose the comparistation between r35202 and r35203.
The worst case:
access-fannkuch: *1.67x as slow*
The reason why I mentioned bubble sort, because it was more than 70% slower, and that is bigger than access-fannkuch.
The file sizes as also interesting:
jsc-35123: 1814130 bytes
jsc-35170: 1820097 bytes
jsc-35190: 1820143 bytes
jsc-35202: 1820153 bytes
<< BIG JUMP >>
jsc-35203: 1878409 bytes
jsc-35208: 1878409 bytes (you mentionad there was no changes here)
jsc-35231: 1862421 bytes (little better)
Created attachment 22382 [details]
compare r35202 and r35203
It may be worth testing a different compiler version or a different port to see what result you get. The GTK port should compile and run on Linux, and since it uses a different build system it may give different performance figures. it would be interesting to see whether it also shows the same regression.
I downloaded WebKit to my home computer:
Core 2 Duo, Ubuntu-Linux, GTK port, gcc-4.2.1
sunspider r35202 - 2614.9ms +/- 0.2%
sunspider r35203 - 2601.3ms +/- 0.2%
It seems a little bit faster!
To my shock, the jsc binaries are only 63% of the QT port:
r35202: GTK: 1147693 bytes, QT: 1820153 bytes
r35203: GTK: 1168479 bytes, QT: 1878409 bytes
Do you have any idea? Is the QT port causes the difference?
Created attachment 22392 [details]
Additional bubbleSort.js test metioned in Zoltan's email to webkit-dev.
I have successfully compiled the QT port on my home computer:
sunspider: 2665.1ms +/- 0.3%
binary size: 1841368
sunspider: 3238.2ms +/- 0.2%
binary size: 1904808
As you can see, this is exactly the same thing what we saw before! So this thing is QT port specific.
These are the CFLAGS for the QT-port:
g++ [...] -pipe -Wreturn-type -fno-strict-aliasing -O2
-fvisibility=hidden -fvisibility-inlines-hidden [...]
I will try to change these to -O3 and -f-strict-aliasing, and will send you the results.
Copying some Qt folks regarding Comment #17.
(In reply to comment #17)
> It seems the QT port applies the same CXXFLAGS to all projects. We only need
> other idea (perhaps change the global CXXFLAGS)?
QMAKE_CXXFLAGS_RELEASE ~= s/-O2/-O3 -fstrict-aliasing/
It is almost certainly unsafe for WebCore to be built with strict aliasing enabled.
> QMAKE_CXXFLAGS_RELEASE ~= s/-O2/-O3 -fstrict-aliasing/
I did something similar, just to see what happend with jsc. The resulted binary was a bit bigger, and a bit faster than the version with -O2, but the difference is small.
I looked at the resulting binaries, and there are strange strings in QT-r35203 (-O2 and -O3 as well). They seems kinda debug info :
[...]void WTF::HashTable<Key, Value, Extractor, HashFunctions, Traits, KeyTraits>::checkKey(const T&) [with T = KJS::UString::Rep*, HashTranslator = WTF::RefPtrHashMapRawKeyTranslator<KJS::UString::Rep*, std::pair<WTF::RefPtr<KJS::UString::Rep>, KJS::SymbolTableEntry>, WTF::PairHash[...]
These lines are neither in QT-r35202 nor in GTK binaries. Is it possible that some recently triggered macros made these lines?
(In reply to comment #22)
> We belive the jsc always compiles in debug mode for QT (and the asserts in
> Registers.h slows down the sunspider). It seems
If you build with build-webkit --release for example then jsc is built in release mode.
Created attachment 22404 [details]
Add missing NDEBUG
The problem was that the jsc.pro was not contain any release specific rule. So there was no chance to pass the NDEBUG define to the compiler.
The patch contains the similar rule which is in WebKit.pri file.
Note: I have to include USE_SYSTEM_MALLOC define in case of release, because there was linking problem (WebKit.pri uses USE_SYSTEM_MALLOC in case of debug also).
Does the patch address the performance issue?
(In reply to comment #25)
> Does the patch address the performance issue?
Of course. Here is the SunSpider result:
** TOTAL **: 1.37x as fast 2897.3ms +/- 1.9% 2107.5ms +/- 2.6% significant
Here is the summarization of the above bug in one sentence:
In the QT jsc build, the jsc.pro qmake file never defines the NDEBUG symbol, so the releases contains all asserts and other debug related codes, which heavily slows down all measurements.
Comment on attachment 22404 [details]
Add missing NDEBUG
Patch looks good to me
$ git svn dcommit
Committing to http://svn.webkit.org/repository/webkit/trunk ...