| Optimizing Application Code on UNICOS Systems | ||
|---|---|---|
| | Appendix D. Vector Processing in CF90 Programs | |
This section discusses the differences between scalar and vector processing.
The following loop adds each element of array KK to the corresponding element of array LL and stores the results in array JJ.
DO 10 I=1,3
JJ(I) = KK(I) + LL(I)
10 CONTINUE
With scalar processing, this loop performs the following operations (if it is not unrolled):
Read one element of Fortran array KK.
Read one element of LL.
Add the elements.
Write the result to the Fortran array JJ.
Increment the loop index by 1.
Repeat the preceding sequence for each succeeding array element until the loop index equals its limit.
With vector processing, the preceding loop performs the following vector operations:
Load a series of elements from array KK to a vector register and a series of elements from array LL to another vector register (these operations occur simultaneously except for instruction issue time).
Add the corresponding elements from the two vector registers and send the results to another vector register, representing array JJ.
Store the register used for array JJ to memory.
This sequence is repeated if the arrays have more elements than the maximum number of elements that can be processed in a single vector operation.
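The repetition described above can be sketched as a loop over fixed-size sections of the arrays. This is only an illustrative sketch, not the code the compiler generates: the parameter `maxvl` is a hypothetical maximum vector length chosen to keep the example small, and the array contents are made up. Each pass of the outer loop stands in for one load, add, and store sequence.

```fortran
program strip_mine_sketch
  implicit none
  ! maxvl is a HYPOTHETICAL maximum vector length used for illustration;
  ! the real limit depends on the hardware.
  integer, parameter :: maxvl = 4
  integer, parameter :: nelem = 10
  integer :: jj(nelem), kk(nelem), ll(nelem)
  integer :: i, first, last

  ! Made-up input data.
  do i = 1, nelem
     kk(i) = i
     ll(i) = 10 * i
  end do

  ! Each pass handles at most maxvl elements, mimicking one
  ! load / add / store sequence on vector registers.
  do first = 1, nelem, maxvl
     last = min(first + maxvl - 1, nelem)
     jj(first:last) = kk(first:last) + ll(first:last)
     print *, 'pass covers elements', first, 'to', last
  end do

  print *, 'JJ =', jj
end program strip_mine_sketch
```

With these assumed sizes, the ten elements are handled in three passes (elements 1 to 4, 5 to 8, and 9 to 10), and the last pass is shorter than the others.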
Vector processing changes the order in which operations are performed on individual array elements in any loop that includes two separate vectorized operations. For example, the following loop performs two separate additions on arrays.
Example:
DO 10 I=1,3
L(I) = J(I) + K(I)
N(I) = L(I) + M(I)
10 CONTINUE
With scalar processing, the two statements within this loop are each executed three times, with the two operations alternating: L(I) is calculated before N(I) in each iteration. The new value of L(I) is used to calculate the value of N(I). This order of operations is shown in Table D-1.
Table D-1. Scalar processing order and results
| Event | Operation | Values |
|---|---|---|
| 1 | L(1)=J(1)+K(1) | 7 = 2 + 5 |
| 2 | N(1)=L(1)+M(1) | 11 = 7 + 4 |
| 3 | L(2)=J(2)+K(2) | -1 = (-4) + 3 |
| 4 | N(2)=L(2)+M(2) | 5 = (-1) + 6 |
| 5 | L(3)=J(3)+K(3) | 15 = 7 + 8 |
| 6 | N(3)=L(3)+M(3) | 13 = 15 + (-2) |
With vector processing, however, the first line within the loop processes all elements of the array before the second line is executed. The order of operations performed on individual array elements is shown in Table D-2. Notice that this order differs from that shown for scalar processing in Table D-1.
Table D-2. Vector processing order and results
| Event | Operation | Values |
|---|---|---|
| 1 | L(1)=J(1)+K(1) | 7 = 2 + 5 |
| 2 | L(2)=J(2)+K(2) | -1 = (-4) + 3 |
| 3 | L(3)=J(3)+K(3) | 15 = 7 + 8 |
| 4 | N(1)=L(1)+M(1) | 11 = 7 + 4 |
| 5 | N(2)=L(2)+M(2) | 5 = (-1) + 6 |
| 6 | N(3)=L(3)+M(3) | 13 = 15 + (-2) |
As shown in Table D-1 and Table D-2, the results for each array element are the same in scalar and vector processing. This equivalence is a fundamental requirement of vector processing.
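The equivalence can be checked with a small program that evaluates the loop in both orders, using the input values from the tables. This is a sketch only: the names `ls`, `ns`, `lv`, and `nv` are introduced here for illustration, and Fortran 90 whole-array assignments stand in for the vector operations.

```fortran
program order_equivalence
  implicit none
  integer, parameter :: nelem = 3
  integer :: j(nelem), k(nelem), m(nelem)
  integer :: ls(nelem), ns(nelem)   ! L and N computed in scalar order
  integer :: lv(nelem), nv(nelem)   ! L and N computed in vector order
  integer :: i

  ! Input values taken from Table D-1 and Table D-2.
  j = (/ 2, -4, 7 /)
  k = (/ 5, 3, 8 /)
  m = (/ 4, 6, -2 /)

  ! Scalar order: the two statements alternate within each iteration.
  do i = 1, nelem
     ls(i) = j(i) + k(i)
     ns(i) = ls(i) + m(i)
  end do

  ! Vector order: the first statement finishes for all elements
  ! before the second statement starts.
  lv = j + k
  nv = lv + m

  print *, 'L =', lv
  print *, 'N =', nv
  if (all(ls == lv) .and. all(ns == nv)) then
     print *, 'scalar and vector orders agree'
  else
     print *, 'scalar and vector orders differ'
  end if
end program order_equivalence
```

Both orders produce L = 7, -1, 15 and N = 11, 5, 13, matching the Values columns of the two tables.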
In the preceding code example, the values calculated on the first line of the loop are used in the operation on the second line. This later use of a result does not change the results when the order of operations is changed. In other loops, however, a later use of a calculated value can produce different final results when the order of operations is changed, and it therefore inhibits vectorization; this is known as a data dependence.
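One way to see such a dependence is with a loop in which each iteration uses the value stored by the previous iteration. The loop and data below are hypothetical and do not come from this manual. The program evaluates the loop once in scalar order and once in the order a naive vector evaluation would use, where all operands are read before any result is stored. A Fortran 90 array assignment models this, because its right-hand side is fully evaluated before the assignment.

```fortran
program dependence_sketch
  implicit none
  integer, parameter :: nelem = 4
  integer :: a_scalar(nelem), a_vector(nelem), b(nelem)
  integer :: i

  ! Hypothetical data chosen for illustration.
  a_scalar = (/ 1, 2, 3, 4 /)
  a_vector = a_scalar
  b = (/ 10, 10, 10, 10 /)

  ! Scalar order: iteration i uses the value of A(i-1) that was
  ! just computed by iteration i-1.
  do i = 2, nelem
     a_scalar(i) = a_scalar(i-1) + b(i)
  end do

  ! Vector-like order: all old values of A(1:nelem-1) are read
  ! before any new value is stored.
  a_vector(2:nelem) = a_vector(1:nelem-1) + b(2:nelem)

  print *, 'scalar order:', a_scalar
  print *, 'vector order:', a_vector
  if (any(a_scalar /= a_vector)) then
     print *, 'results differ: the loop carries a data dependence'
  end if
end program dependence_sketch
```

With this data, scalar order yields 1, 11, 21, 31, while the vector-like order yields 1, 11, 12, 13. Because the two orders disagree, a loop of this form cannot be vectorized in this straightforward way.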