On the "index" part of this test (see bug URL), WebKit is much slower than other browsers. On my machine, TOT is even somewhat slower than shipping Safari (~6 seconds vs. ~7 seconds), but I'm hesitant to make the bug P1 because of this. Almost all of the time is spent calling nextSibling(), which is a single memory access instruction, so the test is memory-bound, and it's probably semi-random changes in memory layout that cause this. while (n && pos < index) { n = n->nextSibling(); pos++; } Looking at the assembly, I noticed that gcc 3.3 and gcc 4.0.1 generate slightly different PowerPC code for this loop. That doesn't appear to affect performance much - I made a reduced test, and gcc 4 code was not slower.
Created attachment 14161 [details] easy way out Just add support for iterating backwards. On my machine, results on the "Index" test look OK now (~7000 ms TOT, ~100 ms TOT + this patch, ~120 ms Opera, ~1300 ms Firefox). KHTML also has support for iterating backwards, and according to Maks Orlovich, it was added to fix a real site, forums.tweakers.nl. I've also added support for iteration from lastChild. I still think that there might be a better data structure for a list of child nodes in ContainerNode, and that would speed up other operations, too. Obviously, performance of data structures is always a trade-off, so it would probably take someone with access to PLT to play with it.
Comment on attachment 14161 [details] easy way out + unsigned dist = (unsigned)abs((long)index - (long)m_caches->lastItemOffset); Since the parameter to abs is an int, the cast here should be to int, not long. But also no casts are needed. If you do unsigned - unsigned, then pass to int, and then extract it into unsigned it will do the right thing with no casts at all. Otherwise looks fine. r=me
Ah, I got confused by std::abs from cstdlib. Hoping MSVC won't be confused similarly :-). Committed revision 21090.