2.20. -O opt[,opt] ...

The -O opt option specifies optimization features. You can specify more than one -O option, with accompanying arguments, on the command line. If specifying more than one argument to -O, separate the individual arguments with commas and do not include intervening spaces.

Note: The -e o option or the ftnlx command displays all the optimization options the compiler uses at compile time.

The -O 0, -O 1, -O 2, and -O 3 options allow you to specify a general level of optimization that includes vectorization, scalar optimization, inlining, and streaming. Generally, as the -O level increases, compilation time increases and execution time decreases.

The -O 1, -O 2, and -O 3 specifications do not directly correspond to the numeric optimization levels for scalar optimization, vectorization, inlining, and streaming. For example, specifying -O 3 does not necessarily enable scalar3 and vector3. Cray reserves the right to alter the specific optimizations performed at these levels from release to release.

The other optimization options, such as -O aggress and -O recurrence, control pattern matching, zero incrementing, and several other optimization features. Some of these features can also be controlled through compiler directives.

Figure 2-1 shows the relationships between some of the -O opt values.

Figure 2-1. Optimization Values

2.20.1. -O n

The -O n option performs general optimization at these levels: 0 (none), 1 (conservative), 2 (moderate, default), and 3 (aggressive).

2.20.2. -O aggress, -O noaggress

The -O aggress option causes the compiler to treat a program unit (for example, a subroutine or a function) as a single optimization region. Doing so can improve the optimization of large program units by raising the limits for internal tables, which increases opportunities for optimization. This option increases compile time and size.

The default is -O noaggress.

2.20.3. -O clonen

The Cray Fortran Compiler supports the following command line options to control cloning:

Cloning is the attempt to duplicate a procedure under certain conditions and replace dummy arguments with associated constant actual arguments throughout the cloned procedure. The compiler will attempt to clone a procedure when a call site contains actual arguments that are scalar integer and/or scalar logical constants. When the constants are exposed to the optimizer, it can generate more efficient code.

Note: Do not specify the -O inlinefrom= option when using the cloning option.

The cloning option works in conjunction with the -O inlinen option.

The compiler will first attempt to inline a call site. If inlining the call site fails, the compiler will attempt to clone the procedure. Cloning is attempted when inlining fails for any of these reasons:

When a clone is made, dummy arguments that have scalar integer and/or scalar logical constant actual arguments associated with them are replaced with the constant value throughout the routine. The following example shows cloning in action:

PROGRAM TEST
INTEGER I
LOGICAL L

L = .FALSE.

DO J = 1,10
   CALL SAM(4, L)    ! Call site with a constant
ENDDO

CALL SAM(3, .TRUE.)  ! Call site with constants

END

SUBROUTINE SAM(I, L)
INTEGER I
LOGICAL L

IF (L) THEN
   PRINT *, I
ENDIF
END

When the previous program is compiled with the -O clone1 and -O inline2 options, the compiler produces the following program:

PROGRAM TEST
INTEGER I 
LOGICAL L 

L = .FALSE.  

DO J = 1,10 
   CALL SAM(4, L)       ! This call was inlined because it is in the
ENDDO                   ! body of a DO loop

CALL SAM@1(3, .TRUE.)   ! This is a call to a clone of SAM.
END  

! Original Subroutine
SUBROUTINE SAM(I, L)
INTEGER I 
LOGICAL L 

IF (L) THEN
   PRINT *, I 
ENDIF 
END  

! Cloned subroutine
SUBROUTINE SAM@1(I, L) 
INTEGER I 
LOGICAL L 

IF (.TRUE.) THEN         ! The optimizer will eliminate this IF test
   PRINT *, 3
ENDIF 
END

2.20.4. -O fpn

The -O fpn option offers finer control over floating-point optimizations than the -O [no]ieeeconform option. The n argument controls the level of allowable optimization; 0 gives the compiler minimum freedom to optimize floating-point operations, while 3 gives it maximum freedom. The higher the level, the less closely the floating-point operations conform to the IEEE standard.

This option is useful for code that uses unstable algorithms but is still optimizable. It is also useful for applications that want aggressive floating-point optimizations that go beyond what the Fortran standard allows.

The -O [no]ieeeconform and -O fp options can be specified on the same compiler command line, but the compiler uses only the rightmost option. When this happens, or when multiple -O fp options are specified, the compiler issues a message to that effect.

Table 2-1 compares the various optimization levels of the -O fp option (levels 2 and 3 are usually the same). The table lists some of the optimizations performed; the compiler may perform other optimizations not listed.

Table 2-1. Floating-point Optimization Levels

| Optimization Type | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| Inline selected mathematical library functions | N/A | N/A | N/A | Accuracy is slightly reduced. |
| Complex division accuracy and calculation speed | Accurate and slower | Accurate and slower | Less accurate (less precision) and faster | Less accurate (less precision) and faster |
| Exponentiation rewrite [1] | None | Fast | Maximum performance | Maximum performance |
| Strength reduction | Fast | Fast | Aggressive | Aggressive |
| Rewrite division as reciprocal equivalent [2] | None | None | Yes | Yes |
| Safety | Maximum | Moderate | Moderate | Low |
| Optimizations | Same effect as -O ieeeconform. The -O fp0 option causes your program's executable code to conform more closely to the IEEE floating-point standard than the default mode. When specified, many identity optimizations are disabled, executable code is slower than at higher floating-point optimization levels, and a scaled complex divide mechanism is enabled that increases the range of complex values that can be handled without producing an underflow. | Performs various, generally safe, non-conforming IEEE optimizations, such as folding A == A to .TRUE., where A is a floating-point object. | Includes optimizations of -O fp1. | Includes optimizations of -O fp1. Equivalent to the -O noieeeconform option. |
| When to use | The -O fp0 option should never be used, except when your code pushes the limits of IEEE accuracy or requires strong IEEE standard conformance. | The -O fp1 option should never be used, except when your code pushes the limits of IEEE accuracy or requires strong IEEE standard conformance. | | The -O fp3 option should be used when performance is more critical than the level of IEEE standard conformance provided by -O fp2. |

The default is -O fp2.
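The two rewrites described in footnotes [1] and [2] can be illustrated in source form. The following is only a sketch: the program name, variable names, and sample values are hypothetical, and the compiler applies such rewrites internally rather than to your source text.

```fortran
PROGRAM FP_REWRITES
  ! Hypothetical illustration of the rewrites in footnotes [1] and [2].
  REAL :: X, Y, P1, P2, Q1, Q2

  X = 3.0
  Y = 7.0

  P1 = X**3            ! Exponentiation as written in the source
  P2 = X * X * X       ! Algebraically equivalent series of multiplications

  Q1 = X / Y           ! Division as written in the source
  Q2 = X * (1.0 / Y)   ! Reciprocal form; the rounding may differ from Q1

  PRINT *, P1, P2
  PRINT *, Q1, Q2
END PROGRAM FP_REWRITES
```

Comparing the printed pairs on your own data is one way to judge whether a higher -O fp level is acceptable for a given computation.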

2.20.5. -O ieeeconform, -O noieeeconform

The -O ieeeconform option causes your program's executable code to conform more closely to the IEEE floating-point standard than the default mode. When specified, many identity optimizations are disabled, executable code is slower, and a scaled complex divide mechanism is enabled that increases the range of complex values that can be handled without producing an underflow.

The -O noieeeconform option causes the compiler to optimize expressions such as X.NE.X to false and X/X to 1, where X is a floating-point value. With -O noieeeconform in effect, these and other similar arithmetic identity optimizations are performed.

The default is -O noieeeconform.
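A common case where the identity optimizations matter is a NaN test written as a self-comparison. The following sketch assumes that the invalid-operation exception is not trapped, so that 0.0/0.0 yields a NaN at run time; the program and variable names are hypothetical.

```fortran
PROGRAM NANCHECK
  ! Hypothetical sketch: a NaN test that depends on IEEE comparison rules.
  REAL :: X, ZERO

  ZERO = 0.0
  X = ZERO / ZERO      ! Assumed to produce a NaN rather than trap

  ! Under IEEE rules a NaN compares unequal to itself, so this test
  ! detects it. If the compiler folds X .NE. X to false, the first
  ! branch can never be taken.
  IF (X .NE. X) THEN
     PRINT *, 'X is a NaN'
  ELSE
     PRINT *, 'X is not a NaN'
  END IF
END PROGRAM NANCHECK
```

Code that relies on tests of this kind is a candidate for -O ieeeconform.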

2.20.6. -O fusionn

The -O fusionn option globally controls loop fusion and changes the assertiveness of the FUSION directive. Loop fusion can improve the performance of loops and in rare cases degrade performance.

The n argument allows you to turn loop fusion on or off and determine where fusion should occur. It also affects the assertiveness of the FUSION directive. Use one of these values for n:

0 

No fusion (ignore all FUSION directives and do not attempt to fuse other loops)

1 

Attempt to fuse loops that are marked by the FUSION directive.

2 (default) 

Attempt to fuse all loops (includes array syntax implied loops), except those marked with the NOFUSION directive.

For more information about loop fusion, see Optimizing Applications on the Cray X1 System.
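As a rough illustration of what loop fusion does, the sketch below shows two adjacent loops over the same iteration space and a hand-written fused equivalent. The subroutine names are hypothetical, and the fused form only approximates what the compiler may generate.

```fortran
SUBROUTINE FUSE_DEMO(A, B, C, N)
  ! Two adjacent loops with identical bounds: candidates for fusion.
  INTEGER N, I
  REAL A(N), B(N), C(N)

  DO I = 1, N
     A(I) = B(I) + 1.0
  END DO

  DO I = 1, N
     C(I) = A(I) * 2.0
  END DO
END SUBROUTINE FUSE_DEMO

SUBROUTINE FUSE_DEMO_FUSED(A, B, C, N)
  ! Hand-fused equivalent: one pass over the data, and A(I) can be
  ! reused while it is still in a register.
  INTEGER N, I
  REAL A(N), B(N), C(N)

  DO I = 1, N
     A(I) = B(I) + 1.0
     C(I) = A(I) * 2.0
  END DO
END SUBROUTINE FUSE_DEMO_FUSED
```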

2.20.7. -O gen_private_callee

The -O gen_private_callee option is used when compiling source files containing subprograms which will be called from streamed regions, whether those streamed regions are created by Cray streaming directives (CSDs), or by the use of the SSP_PRIVATE directive to cause autostreaming.

Refer to Section 4.4 for information about CSDs or to Section 4.3.2 for information about the SSP_PRIVATE directive.

2.20.8. -O infinitevl, -O noinfinitevl

The -O infinitevl option assumes that the safe vector length is infinite for IVDEP directives without the SAFEVL clause. The -O noinfinitevl option assumes the safe vector length is the maximum vector length supported by the target for IVDEP directives without the SAFEVL or INFINITEVL clause.

Refer to Section 4.2.2 for more information about the INFINITEVL and SAFEVL clauses.

The default is -O infinitevl.

2.20.9. -O inlinen and -O inlinefrom=source[:source] ...

Inlining is the process of replacing a user procedure call with the procedure definition itself. This saves subprogram call overhead and may allow better optimization of the inlined code. If all calls within a loop are inlined, the loop becomes a candidate for vectorization or streaming. The Cray Fortran Compiler supports the following command line options for controlling inlining:

The following conditions inhibit inlining:

These inlining modes are invoked when various combinations of -O inline and/or -O inlinefrom= exist:

2.20.10. -O modinline, -O nomodinline

The -O modinline option prepares module procedures so they can be inlined by directing the compiler to create templates for module procedures encountered in a module. These templates are attached to file.o or modulename.mod. The files that contain these inlinable templates can be saved and used later to inline call sites within a program being compiled.

When -e m is in effect, module information is stored in modulename.mod. The compiler writes a modulename.mod file for each module; modulename is created by taking the name of the module and, if necessary, converting it to uppercase.

The process of inlining module procedures requires only that file.o or modulename.mod be available during compilation through the typical module processing mechanism. The USE statement makes the templates available to the inliner.

When -O modinline is specified, the MODINLINE and NOMODINLINE directives are recognized. Using the -O modinline option increases the size of file.o. The default is -O nomodinline.

To ensure that file.o is not removed, specify this option in conjunction with the -c option. For information on the -c option, see Section 2.4.

Note: This option cannot be specified in conjunction with the -O inlinefrom=source or -O inlinen options.

2.20.11. -O msgs, -O nomsgs

The -O msgs option causes the compiler to write optimization messages to stderr. These messages include VECTOR, SCALAR, INLINE and STREAM messages.

When the -O msgs option is in effect, you may request that a listing be produced so that you can see the optimization messages in the listing. For information on obtaining listings, see Section 2.23.

The default is -O nomsgs.

2.20.12. -O msp

The -O msp option causes the compiler to generate code and to select the appropriate libraries to create an executable that runs on one or more multistreaming processors (MSPs). This is called MSP mode. Any code, including Cray distributed memory models, can use MSP mode.

Executables compiled for MSP mode can contain object files compiled with SSP or MSP mode. That is, SSP and MSP object files can be specified during the load step as follows:


ftn -O msp -c ...           !Produce MSP object files
ftn -O ssp -c ...           !Produce SSP object files
ftn sspA.o sspB.o msp.o ... !Link MSP and SSP object files 
                            !to create an executable to run on MSPs

Note: Code explicitly compiled with the -O stream0 option can be linked with object files compiled with SSP or MSP mode. You can use this option to create a universal library that can be used in SSP or MSP mode.

For more information about SSP and MSP mode, refer to the Optimizing Applications on the Cray X1 System manual.

This option is on by default.

2.20.13. -O negmsgs, -O nonegmsgs

The -O negmsgs option causes the compiler to generate messages to stderr that indicate why optimizations such as vectorization or streaming did not occur in a given instance.

The -O negmsgs option enables the -O msgs option. The -rm option enables the -O negmsgs option.

The default is -O nonegmsgs.

2.20.14. -O nointerchange

The -O nointerchange option inhibits the compiler's attempts to interchange loops. Interchanging loops by having the compiler replace an inner loop with an outer loop can increase performance. The compiler performs this optimization by default.

Specifying the -O nointerchange option is equivalent to specifying a NOINTERCHANGE directive prior to every loop. To disable loop interchange on individual loops, use the NOINTERCHANGE directive. For more information on the NOINTERCHANGE directive, see Section 4.6.1.
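The effect of loop interchange can be sketched by hand. In the hypothetical example below, the first subroutine has its inner loop running over the second subscript, and the second subroutine shows the interchanged nest, in which the inner loop walks memory with unit stride because Fortran stores arrays in column-major order. The names are illustrative only.

```fortran
SUBROUTINE SCALE_AS_WRITTEN(A, N, M, S)
  INTEGER N, M, I, J
  REAL A(N, M), S

  ! Inner loop varies J: consecutive iterations are N elements apart.
  DO I = 1, N
     DO J = 1, M
        A(I, J) = S * A(I, J)
     END DO
  END DO
END SUBROUTINE SCALE_AS_WRITTEN

SUBROUTINE SCALE_INTERCHANGED(A, N, M, S)
  INTEGER N, M, I, J
  REAL A(N, M), S

  ! Loops interchanged: the inner loop now varies I and is stride-1.
  DO J = 1, M
     DO I = 1, N
        A(I, J) = S * A(I, J)
     END DO
  END DO
END SUBROUTINE SCALE_INTERCHANGED
```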

2.20.15. -O overindex, -O nooverindex

The -O nooverindex option declares that there are no array subscripts which index a dimension of an array that are outside the declared bounds of that dimension. Short loop code generation occurs when the extent does not exceed the maximum vector length of the machine.

Specifying -O overindex declares that the program contains code that makes array references with subscripts that exceed the defined extents. This prevents the compiler from performing the short loop optimizations described in the preceding paragraph.

Overindexing is nonstandard, but it compiles correctly as long as data dependencies are not hidden from the compiler. This technique collapses loops; that is, it replaces a loop nest with a single loop. An example of this practice is as follows:

DIMENSION A(20, 20)
DO I = 1, N
   A(I, 1) = 0.0
END DO

Assuming that N equals 400 in the previous example, the compiler can generate more efficient code than a doubly nested loop. However, incorrect results can occur in this case if -O nooverindex is in effect.
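For comparison, the standard-conforming loop nest that the overindexed loop stands in for might look like the following sketch. The program name is hypothetical; the equivalence relies on the array being stored contiguously in column-major order, so that A(21, 1) occupies the same storage as A(1, 2).

```fortran
PROGRAM OVERINDEX_EQUIV
  ! Doubly nested loop that touches the same 400 elements as the
  ! single overindexed loop with N = 400.
  REAL A(20, 20)
  INTEGER I, J

  DO J = 1, 20
     DO I = 1, 20
        A(I, J) = 0.0
     END DO
  END DO

  PRINT *, MAXVAL(ABS(A))
END PROGRAM OVERINDEX_EQUIV
```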

You do not need to specify -O overindex if the overindexed array is a Cray pointee, has been equivalenced, or if the extent of the overindexed dimension is declared to be 1 or *. In addition, the -O overindex option is enabled automatically for the following extension code, where the number of subscripts in an array reference is less than the declared number:

DIMENSION A(20, 20)
DO I = 1, N
   A(I) = 0.0  ! 1-dimension reference;
               ! 2-dimension array
END DO

Note: The -O overindex option is used by the compiler for detection of short loops and subsequent code scheduling. This allows manual overindexing as described in this section, but it may have a negative performance effect because of fewer recognized short loops and more restrictive code scheduling. In addition, the compiler continues to assume, by default, a standard-conforming user program that does not overindex when doing dependency analysis for other loop nest optimizations.

The default is -O nooverindex.

2.20.16. -O pattern, -O nopattern

The -O pattern option enables pattern matching for library substitution. The pattern matching feature searches your code for specific code patterns and replaces them with calls to scientific library routines. The scientific library used is libsci.a. These routines are highly optimized.

The -O pattern option is enabled only for optimization levels -O 2, -O vector2 or higher; there is no way to force pattern matching for lower levels.

Only PE-private data is supported.

Specifying -O nopattern disables pattern matching and causes the compiler to ignore the PATTERN and NOPATTERN directives. For information on the PATTERN and NOPATTERN directives, see Section 4.2.4.

The default is -O pattern.

2.20.17. -O recurrence, -O norecurrence

The -O recurrence option enables vectorization for all reduction loops that return different results from the scalar version, due to a reassociation of operations. A reduction loop is a loop that contains at least one statement that reduces an array to a scalar value by doing a cumulative operation on many of the array elements. This involves including the result of the previous iteration in the expression of the current iteration.

The default is -O recurrence. This feature is also available through compiler directives; for more information, see Section 4.2.7.
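The reassociation involved in vectorizing a reduction can be sketched in source form. Below, the first function is a plain sum reduction; the second accumulates four partial sums and combines them at the end, which changes the order of the additions and therefore possibly the rounded result. The function names and the width of 4 are illustrative assumptions, not a description of the code the compiler generates.

```fortran
REAL FUNCTION SUM_SCALAR(A, N)
  ! Reduction loop: each iteration uses the result of the previous one.
  INTEGER N, I
  REAL A(N)

  SUM_SCALAR = 0.0
  DO I = 1, N
     SUM_SCALAR = SUM_SCALAR + A(I)
  END DO
END FUNCTION SUM_SCALAR

REAL FUNCTION SUM_REASSOC(A, N)
  ! Same reduction with the additions reassociated into four
  ! independent partial sums (a stand-in for vector lanes).
  INTEGER N, I, K, M
  REAL A(N), P(4)

  P = 0.0
  M = N - MOD(N, 4)            ! Largest multiple of 4 not exceeding N
  DO I = 1, M, 4
     DO K = 1, 4
        P(K) = P(K) + A(I + K - 1)
     END DO
  END DO

  SUM_REASSOC = (P(1) + P(2)) + (P(3) + P(4))
  DO I = M + 1, N              ! Remainder iterations
     SUM_REASSOC = SUM_REASSOC + A(I)
  END DO
END FUNCTION SUM_REASSOC
```

If the two versions must agree bit for bit in your application, consider -O norecurrence.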

2.20.18. -O scalarn

The -O scalarn option specifies these levels of scalar optimization:

2.20.19. -O shortcircuitn

The -O shortcircuitn option specifies various levels of short circuit evaluation. Short circuit evaluation is an optimization in which the compiler analyzes all or part of a logical expression based on the results of a preliminary analysis. When short circuiting is enabled, the compiler attempts short circuit evaluation of logical expressions that are used in IF statement scalar logical expressions. This evaluation is performed on the .AND. operator and the .OR. operator.

Example 1: Assume the following logical expression:

operand1 .AND. operand2

The operand2 need not be evaluated if operand1 is false because in that case, the entire expression evaluates to false. Likewise, if operand2 is false, operand1 need not be evaluated.

Example 2: Assume the following logical expression:

operand1 .OR. operand2

The operand2 need not be evaluated if operand1 is true because in that case, the entire expression evaluates to true. Likewise, if operand2 is true, operand1 need not be evaluated.
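Because either operand may be skipped, code should not depend on a function in a logical expression being called. The following sketch uses a hypothetical function, EXPENSIVE, that prints a message as a side effect; whether the message appears depends on how the expression is evaluated.

```fortran
PROGRAM SHORTCIRCUIT_DEMO
  ! Hypothetical sketch: the call to EXPENSIVE may or may not occur.
  LOGICAL, EXTERNAL :: EXPENSIVE
  INTEGER :: N

  N = 0
  ! The left operand is false, so the result is false regardless of
  ! the right operand, and EXPENSIVE need not be evaluated.
  IF (N .GT. 0 .AND. EXPENSIVE(N)) THEN
     PRINT *, 'both operands are true'
  END IF
END PROGRAM SHORTCIRCUIT_DEMO

LOGICAL FUNCTION EXPENSIVE(N)
  INTEGER N

  PRINT *, 'EXPENSIVE was evaluated'   ! Side effect: do not rely on it
  EXPENSIVE = MOD(N, 2) .EQ. 0
END FUNCTION EXPENSIVE
```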

The compiler performs short circuit evaluation in a variety of ways, based on the following command line options:

2.20.20. (Deferred implementation) -O ssp

The -O ssp option causes the compiler to compile the source code and select the appropriate libraries to create an executable that runs on one single-streaming processor (SSP mode). Any code, including Cray distributed memory models, can use SSP mode.

Executables compiled for SSP mode can contain only object files compiled in SSP mode. When loading object files separately from the compile step, the SSP mode must be specified during the load step as this example shows:

ftn -O ssp -c ...  !Produce SSP object files
ftn -O ssp sspA.o sspB.o  ... !Link SSP object files 
                              !to create an executable to run on a single SSP

Since SSP mode does not use streaming, the compiler automatically specifies the -O stream0 option. This option also causes the compiler to ignore CSDs.

Note: Code explicitly compiled with the -O stream0 option can be linked with object files compiled with SSP or MSP mode. You can use this option to create a universal library that can be used in SSP or MSP mode.

For more information about SSP and MSP mode, refer to the Optimizing Applications on the Cray X1 System manual.

This option is off by default.

2.20.21. -O streamn

The -O streamn option controls the level of multistreaming optimization when multistreaming is enabled. The levels range from no multistreaming optimization (-O stream0) to aggressive multistreaming optimization (-O stream3). Generally, vectorized applications that execute on a one-processor system can expect to execute up to four times faster on a processor with multistreaming enabled.

At the default streaming level, -O stream2, the four processors SSP0, SSP1, SSP2, and SSP3 may be used by the code generated by the Fortran compiler. Automatic streaming can be turned off by using the -O stream0 option. This does not mean that SSP1, SSP2, and SSP3 are not used during execution. These processors can still be used at times by the library routines called by the generated code. At times, the library routines may park (suspend) the SSP1, SSP2, and SSP3 processors. These SSPs are not available for other executables while code compiled with the stream0 option enabled is executing.

The MSP optimization levels assume that certain scalar and vectorization optimization levels are also specified. If incompatible optimization levels are specified, the compiler adjusts the optimization levels used and issues a message. The various MSP optimization levels and their compatibilities with other optimizations are as follows:

For information about MSP directives, see Section 4.3. For information on optimizing with MSP, see Optimizing Applications on the Cray X1 System. For more information about the effects the streaming option has on BMM operators, refer to the bmm(3i) man page.

2.20.22. -O task0, (Deferred implementation) -O task1

The -O task0 option disables tasking. Characteristics include low compile time and size. OpenMP directives are ignored.

The -O task0 option is compatible with all vectorization and scalar optimization levels.

The -O task1 option specifies user tasking, so OpenMP directives are recognized.

Characteristics include low compile time and size. No level for scalar optimization is enabled automatically.

The -O task1 option is compatible with all vectorization and scalar optimization levels.

The default is -O task0.

2.20.23. (Deferred implementation) -O threshold, -O nothreshold

The -O threshold option generates a run time threshold test to determine whether there is sufficient work in a loop nest before multistreaming is attempted. Multistreaming must be enabled for this option to take effect. Multistreaming is enabled when the -O n option is specified and n is greater than or equal to 1.

The default is -O threshold.

2.20.24. -O unrolln

The -O unrolln option globally controls loop unrolling and changes the assertiveness of the UNROLL directive. By default, the compiler attempts to unroll all loops, unless the NOUNROLL directive is specified for a loop. Generally, unrolling loops increases single processor performance at the cost of increased compile time and code size.

The n argument allows you to turn loop unrolling on or off and determine where unrolling should occur. It also affects the assertiveness of the UNROLL directive. Use one of these values for n:

0 

No unrolling (ignore all UNROLL directives and do not attempt to unroll other loops)

1 

Attempt to unroll loops that are marked by the UNROLL directive. That is, the compiler will unroll the loop if there is proof that the loop will benefit by unrolling.

2 (default) 

Attempt to unroll all loops (includes array syntax implied loops), except those marked with the NOUNROLL directive.

For more information about unrolling loops, see Optimizing Applications on the Cray X1 System.
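A hand-unrolled loop gives a rough picture of the transformation. In the sketch below, the unroll factor of 4 and the subroutine names are assumptions chosen for illustration; the compiler selects its own factor and handles the remainder iterations itself.

```fortran
SUBROUTINE SCALE_ROLLED(A, B, S, N)
  INTEGER N, I
  REAL A(N), B(N), S

  DO I = 1, N
     A(I) = S * B(I)
  END DO
END SUBROUTINE SCALE_ROLLED

SUBROUTINE SCALE_UNROLLED(A, B, S, N)
  ! Same computation unrolled by 4, with a cleanup loop for the
  ! iterations left over when N is not a multiple of 4.
  INTEGER N, I, M
  REAL A(N), B(N), S

  M = N - MOD(N, 4)
  DO I = 1, M, 4
     A(I)     = S * B(I)
     A(I + 1) = S * B(I + 1)
     A(I + 2) = S * B(I + 2)
     A(I + 3) = S * B(I + 3)
  END DO

  DO I = M + 1, N
     A(I) = S * B(I)
  END DO
END SUBROUTINE SCALE_UNROLLED
```

The unrolled body is larger, which is the source of the increased code size noted above.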

2.20.25. -O vectorn

The -O vectorn option specifies these levels of vectorization:

2.20.26. -O vsearch, -O novsearch

The -O vsearch option vectorizes search loops. -O novsearch disables vectorization of search loops. A search loop is one that can be exited by means of a GO TO statement or EXIT statement.

The -O vsearch option is the default when -O vector2 or -O vector3 is enabled. -O novsearch is the default when -O vector0 or -O vector1 is enabled.

This feature is also available through compiler directives; for more information, see Section 4.2.13.
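A typical search loop, in the sense used here, might look like the following sketch. The function and argument names are hypothetical.

```fortran
INTEGER FUNCTION FIND_FIRST(A, N, KEY)
  ! Returns the index of the first element equal to KEY, or 0 if
  ! no element matches.
  INTEGER N, I
  REAL A(N), KEY

  FIND_FIRST = 0
  DO I = 1, N
     IF (A(I) .EQ. KEY) THEN
        FIND_FIRST = I
        EXIT                 ! Early exit makes this a search loop
     END IF
  END DO
END FUNCTION FIND_FIRST
```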

2.20.27. -O zeroinc, -O nozeroinc

The -O zeroinc option causes the compiler to assume that constant increment variables (CIVs) can be incremented by zero. A CIV is a variable that is incremented only by a loop invariant value. For example, in a loop that contains the statement J = J + K, where K is loop invariant and can be equal to zero, J is a CIV. -O zeroinc can cause less strength reduction to occur in loops that have variable increments.

The default is -O nozeroinc, which means that you must prevent zero incrementing.
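The following sketch shows why a zero increment matters. The subroutine and argument names are hypothetical, and the caller is assumed to pass values that keep J within the bounds of A.

```fortran
SUBROUTINE CIV_DEMO(A, N, J0, K)
  ! J is a CIV: it changes only by the loop invariant value K.
  INTEGER N, J0, K, I, J
  REAL A(*)

  J = J0
  DO I = 1, N
     A(J) = A(J) + 1.0
     J = J + K             ! If K is 0, every iteration updates A(J0)
  END DO
END SUBROUTINE CIV_DEMO
```

When K is nonzero, each iteration touches a different element. When K is zero, all iterations update the same element, so the iterations are no longer independent; if your code can call such a routine with K equal to zero, specify -O zeroinc.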

Footnotes

[1]

Rewriting values raised to a constant power into an algebraically equivalent series of multiplications.

[2]

For example, x/y is transformed to x * 1.0/y.