It should use the latest versions of those benchmarks via their official harnesses (so SunSpider 1.0, Kraken 1.1, and V8 v6). That is all.
How long does it take to run those benchmarks? FWIW, we already spend ~1 hour running the entire set of perf tests, so ideally adding these wouldn't, for example, double the cycle time. Alternatively, we can disable some of the existing tests if tracking these benchmark scores would be more useful.