Designing for performance with software optimization
Optimizing an application should not be the final step in project development; it should be an ongoing design philosophy that begins in the planning stages and continues through completion.
By thinking about optimal algorithms and data structure designs early on, greater performance benefits can be realized with less work by the time the product matures.
But getting into this mindset requires being aware of everything from the computational complexity of algorithms, to data structure design and implementation, to the proper use of parallelization.
Designing for Performance
Software optimization can occur at any time during application development, but the earlier you start, the better. Typically, the later in a development cycle you start optimizing, the narrower in scope the optimizations tend to be, focused on algorithm implementations and the choice of instructions, such as adding a few SIMD instructions here and there, instead of broad, high-level optimizations.
High-level optimizations include the basic software architecture, key data structures and buffers, algorithms, memory access patterns, and of course parallelism. It is in these high-level optimizations that huge performance improvements can be made for seemingly little effort.
Unless you are writing a short application, it takes far more effort to make significant performance improvements at the local level than at the high or foundation level.
That old cliché about laying a solid foundation is alive and well in software optimization. Before any code is written, think about performance and about what foundation-laying work can be done to ensure a high-performance application.
Designing the performance foundation focuses on the selection and evaluation of the critical algorithms and how those algorithms store data and move it throughout the application.
The goal is to find an algorithm and general architecture that can be easily threaded, scales well to multiple processors, is easy on memory bandwidth, uses the processors' caches efficiently, stores data in formats that are SIMD friendly, isn't limited by some of the slower instructions such as the transcendental functions (e.g. sine, cosine, logarithms, and so forth), and does not bottleneck on an off-motherboard system component like the network or hard disk.
That might sound like a tough, maybe even impossible, job, but remember: any extra time spent improving the foundation means less time anguishing over how to make the application faster later, when time is fleeting and all you really have left is time to optimize a function or two at the local level.
Good algorithms and data layouts are the foundations of a fast application, and apt use of them can open up many more performance opportunities later on, perhaps even avoiding the need for anything more than the compiler.