Optimizing Applications on the Cray X1™ System - S-2315-50
Chapter 1. Overview
The flowchart in Figure 1-1 provides a high-level overview of the optimization process:
1. Measure: Use the tools and methods described in Chapter 2 to measure the current performance of the code as it executes and identify the most significant problem area.
2. Evaluate: Study the problem area to determine the best method of addressing it.
3. Apply: Use one of the techniques described in this guide to effect the improvement. Recompile as needed.
4. Test: Check your answers after each code modification to avoid regression, then measure the performance of the modified code to determine whether any improvement has in fact taken place.
5. Repeat: Repeat steps 1 through 4 until you are satisfied with the results.
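The measure and test steps above can be sketched in a few lines. The following is a minimal illustration in Python, not Cray tooling: the names `time_call`, `baseline_sum`, `candidate_sum`, and `evaluate_change` are hypothetical, and plain wall-clock timing stands in for the measurement tools described in Chapter 2.

```python
import time


def time_call(func, data, repeats=5):
    """Run func(data) several times; return (result, best wall-clock seconds)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(data)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return result, best


def baseline_sum(values):
    """Hypothetical baseline routine: sum of squares with an explicit loop."""
    total = 0
    for v in values:
        total += v * v
    return total


def candidate_sum(values):
    """Hypothetical modified routine that must produce the same answer."""
    return sum(v * v for v in values)


def evaluate_change(baseline, candidate, data):
    """Check the answer first, then compare timings (the Test step)."""
    expected, t_base = time_call(baseline, data)
    actual, t_cand = time_call(candidate, data)
    correct = actual == expected
    return {
        "correct": correct,
        # Keep the change only if it is both correct and measurably faster.
        "keep": correct and t_cand < t_base,
        "baseline_s": t_base,
        "candidate_s": t_cand,
    }


report = evaluate_change(baseline_sum, candidate_sum, list(range(100_000)))
print(report)
```

The sketch uses integer data so that the answers can be compared exactly; floating-point results would normally be compared within a tolerance. A change that gives a wrong answer is rejected regardless of its timing.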
Optimization is an iterative process that requires repeated recompilation and retesting and has no fixed end point. Nor is performance improvement guaranteed: Cray compilers produce highly optimized code by default, and manual intervention may actually produce code that runs more slowly.
Therefore, keep the following principles in mind as you work through the optimization process:
To establish a baseline, always start by measuring the debugged code with the default optimizations.
To save testing and performance measurement time, consider using small sample data sets that exercise all the code within your program.
Note: Small sample data sets may behave differently from large data sets because of cache effects. Tuning with a small sample data set generally saves time and is easier to work with, but the resulting optimization may not be optimal for large “real life” data sets. For large data sets, cache optimization can be critical.
Know your programs and concentrate on the optimizations that offer the best net gain. A slightly inefficient subroutine that is executed thousands of times is a better candidate for optimization than an extremely inefficient subroutine that is executed only once.
When possible, optimize program units on an individual basis. An optimization technique that works well for one program unit—for example, aggressive inlining when compiling the object file—may be counterproductive for the next.
Remember, the more aggressive the compiler optimization, the longer the compile time.
Whenever possible, vectorize your code. On Cray X1 systems, vectorized code always performs better than scalar code.
Whenever possible, seek asynchronicity. Asynchronous I/O processes always perform better than synchronous processes; asynchronous parallel processes always perform better than tightly synchronized processes.
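As a rough illustration of the net-gain principle above, the time saved by an optimization can be estimated as the routine's total time (calls multiplied by time per call) minus that total divided by the expected speedup. The sketch below uses Python for the arithmetic; the function name `net_gain` and all call counts, timings, and speedup factors are invented for illustration and are not measurements from a Cray X1 system.

```python
def net_gain(calls, seconds_per_call, speedup):
    """Estimate seconds saved if every call becomes `speedup` times faster."""
    before = calls * seconds_per_call
    after = before / speedup
    return before - after


# Hypothetical routine A: slightly inefficient, but called 10,000 times.
# A modest 1.25x speedup saves roughly 4 seconds in total.
frequent_saving = net_gain(calls=10_000, seconds_per_call=0.002, speedup=1.25)

# Hypothetical routine B: very inefficient, but called only once.
# Even a 10x speedup saves less than half a second.
one_time_saving = net_gain(calls=1, seconds_per_call=0.5, speedup=10.0)

print(f"routine A saves about {frequent_saving:.2f} s")
print(f"routine B saves about {one_time_saving:.2f} s")
```

With these assumed figures, the frequently called routine is the better candidate even though its per-call inefficiency is small. In practice the inputs should come from profiling data rather than estimates.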
In general, you will make the most efficient use of machine time and realize the best performance gains by addressing optimization in this order:
1. Improve memory usage.
2. Simplify I/O demands.
3. Optimize single-processor performance.
4. Optimize multiple-processor performance.