Android: Benchmarking the benchmarks

A lot has been written recently about the practice of boosting benchmark scores on Android. Much of what has been going on is, without doubt, cheating that's not worth the trouble. Now that it has been exposed, let's hope it stops, but I wouldn't hold my breath.

Some observers of this unsavory behavior have concluded that we have reached the end of the line for performance optimizations on Android devices. Not so. There are still plenty of generic and platform-specific optimizations in Android that improve the real user experience, if you know what you are doing. Such improvements are possible primarily because Android applications typically run on top of a very complex stack of libraries, virtual machines, and just-in-time compilers.

There are actually plenty of Android platform and product companies that do not engage in pointless benchmark-rigging practices. We know because they contractually bind us not to do benchmark “specials” for them.

Any optimization that is invoked only when a specific benchmark is run, and that is not accessible through normal operation of the device, is a cheat. Optimizations that improve benchmark performance are perfectly acceptable if they are available to, and useful in, the general operation of the device.

Benchmarks exist not just to allow comparisons of one device with another, but also as metrics of the overall user experience. There is a true virtuous circle at work here if everyone plays the game.

Benchmarks must represent the general characteristics of real user activity, so that genuine optimizations that improve their scores will inevitably also deliver real user-experience benefits. Benchmark developers must simulate different categories of real user activity. Moreover, to keep everyone honest, they should ensure that the internal profile of the benchmark is pretty flat and realistic, with runtime not concentrated in a handful of hot loops.

The floating-point test in the recently updated AnTuTu benchmark has changed from a hot-spot perspective. Version 4 has a much tighter loop kernel: about 80 percent of the runtime is spent in loops comprising only 18 instructions in total (see graph below). That makes it more open to abuse by overly specific optimizations that might not benefit the real user experience.
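The "80 percent of runtime in 18 instructions" figure is a hot-spot concentration measure. As a minimal sketch, assuming you already have per-instruction sample counts from a sampling profiler, it can be computed as below. The dictionary format and the function name `instructions_covering` are hypothetical, and the sample numbers are invented for illustration rather than measured from AnTuTu.

```python
from typing import Mapping


def instructions_covering(samples: Mapping[int, int], target_pct: int = 80) -> int:
    """Smallest number of instructions whose samples reach target_pct% of the total.

    `samples` maps a (hypothetical) instruction address to its profiler sample count.
    """
    total = sum(samples.values())
    if total == 0:
        return 0
    covered = 0
    # Walk instructions from hottest to coldest, accumulating sample counts.
    for n, count in enumerate(sorted(samples.values(), reverse=True), start=1):
        covered += count
        # Integer comparison avoids floating-point rounding at the boundary.
        if covered * 100 >= target_pct * total:
            return n
    # Only reached if target_pct > 100.
    return len(samples)


# Hypothetical profiles keyed by instruction address (made-up numbers).
# Tight kernel: 18 hot instructions hold 81% of 1000 samples.
tight = {addr: 45 for addr in range(18)}
tight.update({addr: 1 for addr in range(18, 208)})

# Flat profile: 1000 instructions, one sample each.
flat = {addr: 1 for addr in range(1000)}

print(instructions_covering(tight))  # 18
print(instructions_covering(flat))   # 800
```

A small result relative to the size of the code signals a tight kernel that a narrowly targeted optimization could exploit. A large one indicates the kind of flatter profile that is harder to game without also helping real workloads.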

Some observers suggest that benchmarks should evolve to be more difficult to boost, and that analysts should get smarter about detecting cheating. Certainly these things will happen, but I think this would be a fairly negative outcome.

Instead, I think we should focus on making that virtuous circle work: analyze and promote benchmarks that encourage generic optimization as the true measures of an Android platform’s performance, and then implement those real optimizations to provide genuine differentiation. We can be better, can’t we?

This blog post has also been published on EE Times.
