Cray Fortran Compiler™ Commands and Directives Reference Manual - S-3901-50
Chapter 2. Invoking the Cray Fortran Compiler
The -O opt option specifies optimization features. You can specify more than one -O option, with accompanying arguments, on the command line. If specifying more than one argument to -O, separate the individual arguments with commas and do not include intervening spaces.
Note: The -e o option or the ftnlx command displays all the optimization options the compiler uses at compile time.
The -O 0, -O 1, -O 2, and -O 3 options allow you to specify a general level of optimization that includes vectorization, scalar optimization, inlining, and streaming. Generally, as the -O level increases, compilation time increases and execution time decreases.
The -O 1, -O 2, and -O 3 specifications do not directly correspond to the numeric optimization levels for scalar optimization, vectorization, inlining, and streaming. For example, specifying -O 3 does not necessarily enable scalar3 and vector3. Cray reserves the right to alter the specific optimizations performed at these levels from release to release.
The other optimization options, such as -O aggress and -O recurrence, control pattern matching, zero incrementing, and several other optimization features. Some of these features can also be controlled through compiler directives.
Figure 2-1 shows the relationships between some of the -O opt values.
The -O n option performs general optimization at these levels: 0 (none), 1 (conservative), 2 (moderate, default), and 3 (aggressive).
The -O 0 option inhibits optimization including inlining. This option's characteristics include low compile time, small compile size, and no global scalar optimization.
Most array syntax statements are vectorized, but all other vectorizations are disabled.
The -O 1 option specifies conservative optimization. This option's characteristics include moderate compile time and size, global scalar optimizations, and no loop nest restructuring. Results may differ from the results obtained when -O 0 is specified because of operator reassociation. No optimizations will be performed that might create false exceptions.
On UNICOS/mp systems, only array syntax statements and inner loops are vectorized and multistreamed, and the system does not perform some vector or multistream reductions. User tasking is enabled, so !$OMP directives are recognized.
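The effect of operator reassociation on results can be reproduced by hand, without any compiler option. The following is a minimal sketch only: the module and function names are hypothetical, default REAL is assumed to be IEEE single precision, and the sketch does not show which reassociations the compiler actually performs.

```fortran
module reassoc_demo
  implicit none
contains
  ! Sum evaluated as written, left to right: (a + b) + c
  pure function sum_left(a, b, c) result(s)
    real, intent(in) :: a, b, c
    real :: s
    s = (a + b) + c
  end function sum_left

  ! Same operands, reassociated: a + (b + c)
  pure function sum_right(a, b, c) result(s)
    real, intent(in) :: a, b, c
    real :: s
    s = a + (b + c)
  end function sum_right
end module reassoc_demo
```

With a = 1.0E8, b = -1.0E8, and c = 1.0, sum_left returns 1.0 and sum_right returns 0.0 under the stated assumption, because 1.0 is lost to rounding when it is added to -1.0E8 first.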
The -O 2 option specifies moderate optimization. This option's characteristics include moderate compile time and size, global scalar optimizations, pattern matching, and loop nest restructuring.
On UNICOS/mp systems, results may differ from results obtained when -O 1 is specified because of vector or multistreamed reductions. The -O 2 option enables automatic vectorization and multistreaming of array syntax and entire loop nests.
This is the default level of optimization.
The -O 3 option specifies aggressive optimization. This option's characteristics include a potentially larger compile size, longer compile time, global scalar optimizations, possible loop nest restructuring, and pattern matching. The optimizations performed might create false exceptions in rare instances.
On UNICOS/mp systems, results may differ from results obtained when -O 1 is specified because of vector or multistreaming reductions.
The -O aggress option causes the compiler to treat a program unit (for example, a subroutine or a function) as a single optimization region. Doing so can improve the optimization of large program units by raising the limits for internal tables, which increases opportunities for optimization. This option increases compile time and size.
The default is -O noaggress.
The Cray Fortran Compiler supports the following command line options to control cloning:
-Oclone0, disable cloning (default)
-Oclone1, enable cloning
Cloning is the attempt to duplicate a procedure under certain conditions and replace dummy arguments with associated constant actual arguments throughout the cloned procedure. The compiler will attempt to clone a procedure when a call site contains actual arguments that are scalar integer and/or scalar logical constants. When the constants are exposed to the optimizer, it can generate more efficient code.
Note: Do not specify the -O inlinefrom= option when using the cloning option.
The cloning option works in conjunction with the -Oinlinen option.
The compiler will first attempt to inline a call site. If inlining the call site fails, the compiler will attempt to clone the procedure. Cloning is attempted when inlining fails for any of these reasons:
The procedure does not fit the criteria of the selected inlining level.
The routine to inline is too large to expand in place.
A NOINLINE directive is in effect.
When a clone is made, dummy arguments that have scalar integer and/or scalar logical constant actual arguments associated with them are replaced with the constant value throughout the routine. The following example shows cloning in action:
PROGRAM TEST
INTEGER I
LOGICAL L
L = .FALSE.
DO J = 1,10
   CALL SAM(4, L)        ! Call site with a constant
ENDDO
CALL SAM(3, .TRUE.)      ! Call site with constants
END

SUBROUTINE SAM(I, L)
INTEGER I
LOGICAL L
IF (L) THEN
   PRINT *, I
ENDIF
END
When the previous program is compiled with the -O clone1 and -Oinline2 options, the compiler produces the following program:
PROGRAM TEST
INTEGER I
LOGICAL L
L = .FALSE.
DO J = 1,10
   CALL SAM(4, L)        ! This call was inlined because it is in the
ENDDO                    ! body of a DO loop
CALL SAM@1(3, .TRUE.)    ! This is a call to a clone of SAM.
END

! Original subroutine
SUBROUTINE SAM(I, L)
INTEGER I
LOGICAL L
IF (L) THEN
   PRINT *, I
ENDIF
END

! Cloned subroutine
SUBROUTINE SAM@1(I, L)
INTEGER I
LOGICAL L
IF (.TRUE.) THEN         ! The optimizer will eliminate this IF test
   PRINT *, 3
ENDIF
END
The -O fpn option offers finer control over floating-point optimizations than the -O [no]ieeeconform option. The n argument controls the level of allowable optimization; 0 gives the compiler minimum freedom to optimize floating-point operations, while 3 gives it maximum freedom. The higher the level, the less closely floating-point operations conform to the IEEE standard.
This option is useful for code that uses unstable algorithms but is otherwise optimizable. It is also useful for applications that need aggressive floating-point optimizations that go beyond what the Fortran standard allows.
The -O [no]ieeeconform and -O fpn options can be specified on the same compiler command line, but the compiler uses only the rightmost option. In that case, or when multiple -O fpn options are specified, the compiler issues a message to that effect.
Table 2-1 compares the various optimization levels of the -O fp option (levels 2 and 3 are usually the same). The table lists some of the optimizations performed; the compiler may perform other optimizations not listed.
Table 2-1. Floating-point Optimization Levels
| Optimization Type | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| Inline selected mathematical library functions | N/A | N/A | N/A | Accuracy is slightly reduced. |
| Complex division accuracy and calculation speed | Accurate and slower | Accurate and slower | Less accurate (less precision) and faster | Less accurate (less precision) and faster |
| Exponentiation rewrite [1] | None | Fast | Maximum performance | Maximum performance |
| | Fast | Fast | Aggressive | Aggressive |
| Rewrite division as reciprocal equivalent [2] | None | None | Yes | Yes |
| Safety | Maximum | Moderate | Moderate | Low |
| Optimizations | Same effect as -O ieeeconform. The -O fp0 option causes your program's executable code to conform more closely to the IEEE floating-point standard than the default mode. When specified, many identity optimizations are disabled, executable code is slower than at higher floating-point optimization levels, and a scaled complex divide mechanism is enabled that increases the range of complex values that can be handled without producing an underflow. | Performs various, generally safe, non-conforming IEEE optimizations, such as folding A == A to .TRUE., where A is a floating-point object. | Includes optimizations of -O fp1. | Includes optimizations of -O fp1. Equivalent to the -O noieeeconform option. |
| When to use | The -O fp0 option should be used only when your code pushes the limits of IEEE accuracy or requires strong IEEE standard conformance. | The -O fp1 option should be used only when your code pushes the limits of IEEE accuracy or requires strong IEEE standard conformance. | | The -O fp3 option should be used when performance is more critical than the level of IEEE standard conformance provided by -O fp2. |
The default is -O fp2.
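To see why rewriting a division as a reciprocal multiplication can change results, the two forms can be written out explicitly. This is a sketch under stated assumptions: the module, kind parameter, and function names are hypothetical, double precision is assumed to be IEEE binary64, and the code imitates the rewrite by hand rather than showing what the compiler generates.

```fortran
module recip_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)   ! assumed IEEE double precision
contains
  ! Ordinary division: one rounding.
  pure function div_direct(x, y) result(q)
    real(dp), intent(in) :: x, y
    real(dp) :: q
    q = x / y
  end function div_direct

  ! Division written as multiplication by a reciprocal:
  ! the reciprocal is rounded, then the product is rounded again.
  pure function div_recip(x, y) result(q)
    real(dp), intent(in) :: x, y
    real(dp) :: q
    q = x * (1.0_dp / y)
  end function div_recip
end module recip_demo
```

For x = y = 49.0, div_direct returns exactly 1.0, while div_recip returns a value slightly below 1.0 on IEEE double precision hardware. For operands such as 8.0 and 2.0, whose reciprocal is exact, both forms agree.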
The -O ieeeconform option causes your program's executable code to conform more closely to the IEEE floating-point standard than the default mode. When specified, many identity optimizations are disabled, executable code is slower, and a scaled complex divide mechanism is enabled that increases the range of complex values that can be handled without producing an underflow.
The -O noieeeconform option causes the compiler to optimize expressions such as X.NE.X to false and X/X to 1, where X is a floating-point value. With -O noieeeconform in effect, these and other similar arithmetic identity optimizations are performed.
The default is -O noieeeconform.
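Code that relies on IEEE comparison semantics is the kind most likely to be affected by these identity optimizations. The sketch below shows a common idiom that tests for a NaN by comparing a value with itself. The module and function names are hypothetical, and the example assumes a processor that supports IEEE NaNs through the standard IEEE_ARITHMETIC intrinsic module.

```fortran
module nan_compare_demo
  use, intrinsic :: ieee_arithmetic, only: ieee_value, ieee_quiet_nan
  implicit none
contains
  ! Returns .TRUE. only for a NaN under IEEE rules, because a NaN
  ! compares unequal to every value, including itself.
  logical function differs_from_self(x)
    real, intent(in) :: x
    differs_from_self = (x /= x)
  end function differs_from_self

  ! Produces a quiet NaN for testing.
  real function make_nan()
    make_nan = ieee_value(1.0, ieee_quiet_nan)
  end function make_nan
end module nan_compare_demo
```

If the comparison X.NE.X is folded to false, as described above for -O noieeeconform, a check written this way can no longer detect a NaN. Code that depends on such checks is a candidate for -O ieeeconform.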
The -O fusionn option globally controls loop fusion and changes the assertiveness of the FUSION directive. Loop fusion can improve the performance of loops and in rare cases degrade performance.
The n argument allows you to turn loop fusion on or off and determine where fusion should occur. It also affects the assertiveness of the FUSION directive. Use one of these values for n:
For more information about loop fusion, see Optimizing Applications on the Cray X1 System.
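As an illustration of what loop fusion means at the source level, the following sketch shows two adjacent loops over the same range and a hand-fused equivalent. The module and subroutine names are hypothetical, and the code only imitates the transformation; it does not show the compiler's own criteria for fusing loops.

```fortran
module fusion_demo
  implicit none
contains
  ! Two separate loops over the same index range.
  subroutine separate_loops(n, a, b, c)
    integer, intent(in) :: n
    real, intent(in) :: a(n)
    real, intent(out) :: b(n), c(n)
    integer :: i
    do i = 1, n
      b(i) = 2.0 * a(i)
    end do
    do i = 1, n
      c(i) = b(i) + 1.0
    end do
  end subroutine separate_loops

  ! Hand-fused form: one pass, and b(i) is reused while it is still at hand.
  subroutine fused_loop(n, a, b, c)
    integer, intent(in) :: n
    real, intent(in) :: a(n)
    real, intent(out) :: b(n), c(n)
    integer :: i
    do i = 1, n
      b(i) = 2.0 * a(i)
      c(i) = b(i) + 1.0
    end do
  end subroutine fused_loop
end module fusion_demo
```

Both subroutines produce identical arrays; the fused form has less loop overhead and fewer passes over memory.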
The -O gen_private_callee option is used when compiling source files containing subprograms which will be called from streamed regions, whether those streamed regions are created by Cray streaming directives (CSDs), or by the use of the SSP_PRIVATE directive to cause autostreaming.
Refer to Section 4.4 for information about CSDs or to Section 4.3.2 for information about the SSP_PRIVATE directive.
The -O infinitevl option assumes that the safe vector length is infinite for IVDEP directives without the SAFEVL clause. The -O noinfinitevl option assumes the safe vector length is the maximum vector length supported by the target for IVDEP directives without the SAFEVL or INFINITEVL clause.
Refer to Section 4.2.2 for more information about the INFINITEVL and SAFEVL clauses.
The default is -O infinitevl.
Inlining is the process of replacing a user procedure call with the procedure definition itself. This saves subprogram call overhead and may allow better optimization of the inlined code. If all calls within a loop are inlined, the loop becomes a candidate for vectorization or streaming. The Cray Fortran Compiler supports the following command line options for controlling inlining:
-O inline0, -O inline1, -O inline2, -O inline3, -O inline4, -O inline5
-O inlinefrom=source[:source] ...
The following conditions inhibit inlining:
Dummy argument types and kind type parameter values in the called procedure that differ from corresponding actual argument types and kind type parameter values.
The number of dummy arguments differs from the number of actual arguments.
A call site that is within the range of a NOINLINE directive.
A procedure being called is specified on an INLINENEVER directive.
A constant actual argument that has a corresponding dummy argument that is defined in the procedure by assignment or by a READ statement.
The called routine is declared RECURSIVE.
A dummy argument of a host procedure is referenced in an internal procedure of the host procedure. If this condition exists, the host is not inlined.
The compiler determines that the routine is too big to inline. This is determined by an internal limit of the text size of the routine. You can override this limit by inserting an INLINEALWAYS directive. For information on the INLINEALWAYS directive, see Section 4.5.3.
The procedure being called contains any of these items:
A LOC of a variable declared in a common block
Calls to the NUMARG intrinsic procedure
Calls to the PRESENT intrinsic procedure
ASSIGN statements
Alternate RETURN statements
Dummy procedures
Dummy arguments declared with the OPTIONAL attribute
Fortran pointers in static storage (COMMON, MODULE, DATA, or SAVE)
Dummy arguments that are Cray pointers
The following inlining modes are available, depending on which combination of the -O inlinen and -O inlinefrom= options is specified:
Automatic inlining is invoked with the -O inlinen option on the command line. Routines that are potential targets for inline expansion include all the routines within the input file to the compilation. The higher the value of n, the more aggressive the inlining. Table 2-2 explains the inlining levels in more detail.
Table 2-2. Automatic Inlining Specifications
Level | Description |
|---|---|
0 | No inlining. All inlining disabled. All inlining compiler directives are ignored. |
1 | Directive inlining. Inlining is attempted for call sites and routines that are under the control of a compiler directive. See Chapter 4 for more information on the inlining directives. |
2 | DO loop inlining. Inlining is attempted at level 1 plus inlining is attempted for call sites that exist within DO loops. |
3 | Leaf node inlining. Inlining is attempted at level 1 plus inlining is attempted on leaf node routines, which are routines that do not call other user routines. A leaf node routine can call intrinsic procedures and/or library routines. Default. |
4 | Inlining is attempted at levels 1, 2, and 3, plus inlining is attempted on call sites that contain scalar constant actual arguments. |
5 | Aggressive inlining. Inlining is attempted for every call site encountered. |
Explicit inlining is invoked by the -O inlinefrom=source[:source] ... option. This option lets you explicitly state which routines can be considered for inline expansion. The source arguments identify each file or directory that contains the routines that can be inlined. Whenever a call is encountered in the input program to a routine that exists in source, inlining is attempted for that call site.
Note that blanks are not allowed on either side of the equal sign.
All inlining directives are recognized with explicit inlining. For information on inlining directives, see Chapter 4.
Note that the routines in source are not actually loaded with the final program. They are simply templates for the inliner. To have a routine contained in source loaded with the program, you must include it in an input file to the compilation.
The following list describes objects that can be specified in the source argument.
Combined inlining is invoked by specifying both the -O inlinen and -O inlinefrom= options on the command line. This inlining mode looks only in source for potential targets for expansion, while applying the level of inlining heuristics selected by the -O inlinen option.
The -O modinline option prepares module procedures so they can be inlined by directing the compiler to create templates for module procedures encountered in a module. These templates are attached to file.o or modulename.mod. The files that contain these inlinable templates can be saved and used later to inline call sites within a program being compiled.
When -e m is in effect, module information is stored in modulename.mod. The compiler writes a modulename.mod file for each module; modulename is created by taking the name of the module and, if necessary, converting it to uppercase.
The process of inlining module procedures requires only that file.o or modulename.mod be available during compilation through the typical module processing mechanism. The USE statement makes the templates available to the inliner.
When -O modinline is specified, the MODINLINE and NOMODINLINE directives are recognized. Using the -O modinline option increases the size of file.o. The default is -O nomodinline.
To ensure that file.o is not removed, specify this option in conjunction with the -c option. For information on the -c option, see Section 2.4.
Note: This option cannot be specified in conjunction with the -O inlinefrom=source or -O inlinen options.
The -O msgs option causes the compiler to write optimization messages to stderr. These messages include VECTOR, SCALAR, INLINE and STREAM messages.
When the -O msgs option is in effect, you may request that a listing be produced so that you can see the optimization messages in the listing. For information on obtaining listings, see Section 2.23.
The default is -O nomsgs.
The -O msp option causes the compiler to generate code and to select the appropriate libraries to create an executable that runs on one or more multistreaming processors (MSPs). This is called MSP mode. Any code, including Cray distributed memory models, can use MSP mode.
Executables compiled for MSP mode can contain object files compiled with SSP or MSP mode. That is, SSP and MSP object files can be specified during the load step as follows:
ftn -O msp -c ... !Produce MSP object files
ftn -O ssp -c ... !Produce SSP object files
ftn sspA.o sspB.o msp.o ... !Link MSP and SSP object files
!to create an executable to run on MSPs
Note: Code explicitly compiled with the -O stream0 option can be linked with object files compiled with SSP or MSP mode. You can use this option to create a universal library that can be used in SSP or MSP mode.
For more information about SSP and MSP mode, refer to the Optimizing Applications on the Cray X1 System manual.
This option is on by default.
The -O negmsgs option causes the compiler to generate messages to stderr that indicate why optimizations such as vectorization or streaming did not occur in a given instance.
The -O negmsgs option enables the -O msgs option. The -rm option enables the -O negmsgs option.
The default is -O nonegmsgs.
The -O nointerchange option inhibits the compiler's attempts to interchange loops. Interchanging loops by having the compiler replace an inner loop with an outer loop can increase performance. The compiler performs this optimization by default.
Specifying the -O nointerchange option is equivalent to specifying a NOINTERCHANGE directive prior to every loop. To disable loop interchange on individual loops, use the NOINTERCHANGE directive. For more information on the NOINTERCHANGE directive, see Section 4.6.1.
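A source-level picture of loop interchange may help here. In the sketch below, the first subroutine has its inner loop over the second subscript, and the second is the interchanged form. The names are hypothetical, and the code shows the transformation done by hand, not the compiler's decision process.

```fortran
module interchange_demo
  implicit none
contains
  ! Inner loop runs over the second subscript, so consecutive
  ! iterations touch elements that are n apart in memory.
  subroutine scale_strided(n, m, a)
    integer, intent(in) :: n, m
    real, intent(inout) :: a(n, m)
    integer :: i, j
    do i = 1, n
      do j = 1, m
        a(i, j) = 2.0 * a(i, j)
      end do
    end do
  end subroutine scale_strided

  ! Interchanged form: the inner loop runs over the first subscript,
  ! which is contiguous in Fortran's column-major storage.
  subroutine scale_contiguous(n, m, a)
    integer, intent(in) :: n, m
    real, intent(inout) :: a(n, m)
    integer :: i, j
    do j = 1, m
      do i = 1, n
        a(i, j) = 2.0 * a(i, j)
      end do
    end do
  end subroutine scale_contiguous
end module interchange_demo
```

Both versions compute the same result because the iterations are independent; only the memory access order differs.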
The -O nooverindex option declares that there are no array subscripts which index a dimension of an array that are outside the declared bounds of that dimension. Short loop code generation occurs when the extent does not exceed the maximum vector length of the machine.
Specifying -O overindex declares that the program contains code that makes array references with subscripts that exceed the defined extents. This prevents the compiler from performing the short loop optimizations described in the preceding paragraph.
Overindexing is nonstandard, but it compiles correctly as long as data dependencies are not hidden from the compiler. This technique collapses loops; that is, it replaces a loop nest with a single loop. An example of this practice is as follows:
DIMENSION A(20, 20)
DO I = 1, N
   A(I, 1) = 0.0
END DO
Assuming that N equals 400 in the previous example, the compiler can generate more efficient code than a doubly nested loop. However, incorrect results can occur in this case if -O nooverindex is in effect.
You do not need to specify -O overindex if the overindexed array is a Cray pointee, has been equivalenced, or if the extent of the overindexed dimension is declared to be 1 or *. In addition, the -O overindex option is enabled automatically for the following extension code, where the number of subscripts in an array reference is less than the declared number:
DIMENSION A(20, 20)
DO I = 1, N
   A(I) = 0.0    ! 1-dimension reference; 2-dimension array
END DO
Note: The -O overindex option is used by the compiler for detection of short loops and subsequent code scheduling. This allows manual overindexing as described in this section, but it may have a negative performance effect because of fewer recognized short loops and more restrictive code scheduling. In addition, the compiler continues to assume, by default, a standard-conforming user program that does not overindex when doing dependency analysis for other loop nest optimizations.
The default is -O nooverindex.
The -O pattern option enables pattern matching for library substitution. The pattern matching feature searches your code for specific code patterns and replaces them with calls to scientific library routines. The scientific library used is libsci.a. These routines are highly optimized.
The -O pattern option is enabled only for optimization levels -O 2, -O vector2 or higher; there is no way to force pattern matching for lower levels.
Only PE-private data is supported.
Specifying -O nopattern disables pattern matching and causes the compiler to ignore the PATTERN and NOPATTERN directives. For information on the PATTERN and NOPATTERN directives, see Section 4.2.4.
The default is -O pattern.
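This section does not list the code patterns that the compiler recognizes. Purely as an illustration of the idea, the sketch below pairs a hand-written matrix multiply loop nest with an equivalent single call. Treating matrix multiplication as a recognized pattern is an assumption, the standard MATMUL intrinsic stands in for a scientific library routine, and all names are hypothetical.

```fortran
module pattern_demo
  implicit none
contains
  ! A loop nest of the kind a pattern matcher might recognize (assumption).
  subroutine matmul_loops(n, a, b, c)
    integer, intent(in) :: n
    real, intent(in) :: a(n, n), b(n, n)
    real, intent(out) :: c(n, n)
    integer :: i, j, k
    c = 0.0
    do j = 1, n
      do k = 1, n
        do i = 1, n
          c(i, j) = c(i, j) + a(i, k) * b(k, j)
        end do
      end do
    end do
  end subroutine matmul_loops

  ! The same computation expressed as one call.
  subroutine matmul_call(n, a, b, c)
    integer, intent(in) :: n
    real, intent(in) :: a(n, n), b(n, n)
    real, intent(out) :: c(n, n)
    c = matmul(a, b)
  end subroutine matmul_call
end module pattern_demo
```

A library routine may accumulate sums in a different order than the loop nest, so results for general floating-point data can differ slightly between the two forms.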
The -O recurrence option enables vectorization for all reduction loops that return different results from the scalar version, due to a reassociation of operations. A reduction loop is a loop that contains at least one statement that reduces an array to a scalar value by doing a cumulative operation on many of the array elements. This involves including the result of the previous iteration in the expression of the current iteration.
The default is -O recurrence. This feature is also available through compiler directives; for more information, see Section 4.2.7.
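The reason a vectorized reduction can return a different result from the scalar version can be shown by accumulating partial sums by hand. This is a sketch only: the names are hypothetical, the choice of four partial sums is an arbitrary assumption that does not reflect the hardware vector length, and default REAL is assumed to be IEEE single precision.

```fortran
module reduction_demo
  implicit none
contains
  ! Scalar reduction: elements are added strictly in order.
  pure function sum_sequential(n, x) result(s)
    integer, intent(in) :: n
    real, intent(in) :: x(n)
    real :: s
    integer :: i
    s = 0.0
    do i = 1, n
      s = s + x(i)
    end do
  end function sum_sequential

  ! Reassociated reduction: four interleaved partial sums are
  ! accumulated independently and combined at the end.
  pure function sum_partial(n, x) result(s)
    integer, intent(in) :: n
    real, intent(in) :: x(n)
    real :: s, p(4)
    integer :: i, k
    p = 0.0
    do i = 1, n
      k = mod(i - 1, 4) + 1
      p(k) = p(k) + x(i)
    end do
    s = (p(1) + p(2)) + (p(3) + p(4))
  end function sum_partial
end module reduction_demo
```

For the input (1.0E8, 1.0, -1.0E8, 1.0), sum_sequential returns 1.0 and sum_partial returns 0.0 under the stated assumptions. Neither equals the exact sum of 2.0, which shows that the two orderings simply round differently.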
The -O scalarn option specifies these levels of scalar optimization:
scalar0 disables scalar optimization. Characteristics include low compile time and size.
The -O scalar0 option is compatible with -O task0 or -O task1 and with -O vector0.
scalar1 specifies conservative scalar optimization. Characteristics include moderate compile time and size. Results can differ from the results obtained when -O scalar0 is specified because of operator reassociation. No optimizations are performed that could create false exceptions.
The -O scalar1 option is compatible with -O vector0 or -O vector1, with -O task0 or -O task1, and with -O stream0 or -O stream1.
scalar2 specifies moderate scalar optimization. Characteristics include moderate compile time and size. Results can differ slightly from the results obtained when -O scalar1 is specified because of possible loop nest restructuring. Generally, no optimizations are done that could create false exceptions.
The -O scalar2 option is compatible with all vectorization, multistreaming, and tasking levels.
This is the default scalar optimization level.
scalar3 specifies aggressive scalar optimization. Characteristics include potentially greater compile time and size. Results can differ from the results obtained when -O scalar1 is specified because of possible loop nest restructuring.
The optimization techniques used can create false exceptions in rare instances. Analysis that determines whether a variable is used before it is defined is enabled at this level. The -O scalar3 optimization level is never enabled automatically, even when -O 3 is specified. This scalar optimization level must be requested specifically on the command line.
The -O scalar3 option is compatible with all tasking and vectorization levels.
The -O shortcircuitn option specifies various levels of short circuit evaluation. Short circuit evaluation is an optimization in which the compiler evaluates all or part of a logical expression based on the results of a preliminary analysis. When short circuiting is enabled, the compiler attempts short circuit evaluation of logical expressions that are used in IF statement scalar logical expressions. This evaluation is performed on the .AND. operator and the .OR. operator.
Example 1: Assume the following logical expression:
operand1 .AND. operand2
The operand2 need not be evaluated if operand1 is false because in that case, the entire expression evaluates to false. Likewise, if operand2 is false, operand1 need not be evaluated.
Example 2: Assume the following logical expression:
operand1 .OR. operand2
The operand2 need not be evaluated if operand1 is true because in that case, the entire expression evaluates to true. Likewise, if operand2 is true, operand1 need not be evaluated.
The compiler performs short circuit evaluation in a variety of ways, based on the following command line options:
-O shortcircuit0 disables short circuiting of IF and ELSEIF statement logical conditions.
-O shortcircuit1 specifies short circuiting of IF and ELSEIF logical conditions only when a PRESENT, ALLOCATED, or ASSOCIATED intrinsic procedure is in the condition.
The short circuiting is performed left to right. In other words, the left operand is evaluated first, and if it determines the value of the operation, the right operand is not evaluated. The following code segment shows how this option could be used:
SUBROUTINE SUB(A)
INTEGER,OPTIONAL::A
IF (PRESENT(A) .AND. A==0) THEN
...
The expression A==0 must not be evaluated if A is not PRESENT. The short circuiting performed when -O shortcircuit1 is in effect causes the evaluation of PRESENT(A) first. If that is false, A==0 is not evaluated. If -O shortcircuit1 is in effect, the preceding example is equivalent to the following:
SUBROUTINE SUB(A)
INTEGER,OPTIONAL::A
IF (PRESENT(A)) THEN
   IF (A==0) THEN
...
-O shortcircuit2 specifies short circuiting of IF and ELSEIF logical conditions, and it is done left to right. All .AND. and .OR. operators in these expressions are evaluated in this way. The left operand is evaluated, and if it determines the result of the operation, the right operand is not evaluated.
-O shortcircuit3 specifies short circuiting of IF and ELSEIF logical conditions. It is an attempt to avoid making function calls. When this option is in effect, the left and right operands to .AND. and .OR. operators are examined to determine if one or the other contains function calls. If either operand has functions, short circuit evaluation is performed. The operand that has fewer calls is evaluated first, and if it determines the result of the operation, the remaining operand is not evaluated. If both operands have no calls, then no short circuiting is done. For the following example, the right operand of .OR. is evaluated first. If A==0 then ifunc() is not called:
IF (ifunc() == 0 .OR. A==0) THEN
...
-O shortcircuit3 is the default.
The -O ssp option causes the compiler to compile the source code and select the appropriate libraries to create an executable that runs on one single-streaming processor (SSP mode). Any code, including Cray distributed memory models, can use SSP mode.
Executables compiled for SSP mode can contain only object files compiled in SSP mode. When loading object files separately from the compile step, the SSP mode must be specified during the load step as this example shows:
ftn -O ssp -c ... !Produce SSP object files
ftn -O ssp sspA.o sspB.o ... !Link SSP object files
!to create an executable to run on a single SSP
Since SSP mode does not use streaming, the compiler automatically specifies the -O stream0 option. This option also causes the compiler to ignore CSDs.
Note: Code explicitly compiled with the -O stream0 option can be linked with object files compiled with SSP or MSP mode. You can use this option to create a universal library that can be used in SSP or MSP mode.
For more information about SSP and MSP mode, refer to the Optimizing Applications on the Cray X1 System manual.
This option is off by default.
The -O streamn option controls the level of multistreaming optimization when multistreaming is enabled. The levels range from no multistreaming optimization (-O stream0) to aggressive multistreaming optimization (-O stream3). Generally, vectorized applications that execute on a one-processor system can expect to execute up to four times faster on a processor with multistreaming enabled.
At the default streaming level, -O stream2, the four processors SSP0, SSP1, SSP2, and SSP3 may be used by the code generated by the Fortran compiler. Automatic streaming can be turned off by using the -O stream0 option. This does not mean that SSP1, SSP2, and SSP3 are not used during execution. These processors can still be used at times by the library routines called by the generated code. At times, the library routines may park (suspend) the SSP1, SSP2, and SSP3 processors. These SSPs are not available for other executables while code compiled with the stream0 option enabled is executing.
The MSP optimization levels assume that certain scalar and vectorization optimization levels are also specified. If incompatible optimization levels are specified, the compiler adjusts the optimization levels used and issues a message. The various MSP optimization levels and their compatibilities with other optimizations are as follows:
-O stream0 inhibits automatic MSP optimizations. No MSP directives are recognized.
The -O stream0 option is compatible with all vectorization and scalar optimization levels.
-O stream1 specifies limited MSP optimization.
The compiler recognizes MSP directives. Automatic MSP optimization is limited to inner vectorized loops and some bit-matrix multiplication (BMM) operations. MSP operations performed generate the same results that would be obtained from scalar optimizations; for example, no floating-point reductions are performed.
The -O stream1 option is compatible with -O scalar1, -O scalar2, -O scalar3, -O vector1, -O vector2, and -O vector3.
-O stream2 specifies safe MSP optimization. The compiler recognizes MSP directives. The compiler automatically performs MSP optimizations on loop nests and appropriate BMM operations.
The -O stream2 option is compatible with -O scalar2, -O scalar3, -O vector2, and -O vector3.
Default.
-O stream3 specifies aggressive MSP optimization on all code including appropriate BMM operations. The compiler recognizes MSP directives.
The -O stream3 option is compatible with -O scalar2, -O scalar3, -O vector2, and -O vector3.
For information about MSP directives, see Section 4.3. For information on optimizing with MSP, see Optimizing Applications on the Cray X1 System. For more information about the effects the streaming option has on BMM operators, refer to the bmm(3i) man page.
The -O task0 option disables tasking. Characteristics include low compile time and size. OpenMP directives are ignored.
The -O task0 option is compatible with all vectorization and scalar optimization levels.
The -O task1 option specifies user tasking, so OpenMP directives are recognized.
Characteristics include low compile time and size. No level for scalar optimization is enabled automatically.
The -O task1 option is compatible with all vectorization and scalar optimization levels.
The default is -O task0.
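For reference, a loop marked with an OpenMP directive looks like the following sketch. The module and subroutine names are hypothetical. The directive lines are ordinary comments to a compiler that ignores OpenMP directives, so the subroutine computes the same result either way.

```fortran
module omp_demo
  implicit none
contains
  ! y = y + alpha * x, with the loop marked for user tasking.
  subroutine axpy_omp(n, alpha, x, y)
    integer, intent(in) :: n
    real, intent(in) :: alpha, x(n)
    real, intent(inout) :: y(n)
    integer :: i
    !$OMP PARALLEL DO
    do i = 1, n
      y(i) = y(i) + alpha * x(i)
    end do
    !$OMP END PARALLEL DO
  end subroutine axpy_omp
end module omp_demo
```

Each iteration writes a distinct element of y, so the iterations are independent and the loop is safe to run in parallel.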
The -O threshold option generates a run time threshold test to determine whether there is sufficient work in a loop nest before multistreaming is attempted. Multistreaming must be enabled for this option to take effect. Multistreaming is enabled when the -O n option is specified and n is greater than or equal to 1.
The default is -O threshold.
The -O unrolln option globally controls loop unrolling and changes the assertiveness of the UNROLL directive. By default, the compiler attempts to unroll all loops, unless the NOUNROLL directive is specified for a loop. Generally, unrolling loops increases single processor performance at the cost of increased compile time and code size.
The n argument allows you to turn loop unrolling on or off and determine where unrolling should occur. It also affects the assertiveness of the UNROLL directive. Use one of these values for n:
For more information about unrolling loops, see Optimizing Applications on the Cray X1 System.
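At the source level, unrolling replaces several iterations with one longer loop body. The sketch below unrolls a simple loop by hand. The names are hypothetical, and the unroll factor of 4 is an arbitrary choice for illustration, not the factor the compiler would select.

```fortran
module unroll_demo
  implicit none
contains
  ! Original (rolled) loop.
  subroutine scale_rolled(n, a, s)
    integer, intent(in) :: n
    real, intent(inout) :: a(n)
    real, intent(in) :: s
    integer :: i
    do i = 1, n
      a(i) = s * a(i)
    end do
  end subroutine scale_rolled

  ! Unrolled by 4, with a cleanup loop for the leftover iterations.
  subroutine scale_unrolled(n, a, s)
    integer, intent(in) :: n
    real, intent(inout) :: a(n)
    real, intent(in) :: s
    integer :: i, m
    m = n - mod(n, 4)          ! largest multiple of 4 not exceeding n
    do i = 1, m, 4
      a(i)     = s * a(i)
      a(i + 1) = s * a(i + 1)
      a(i + 2) = s * a(i + 2)
      a(i + 3) = s * a(i + 3)
    end do
    do i = m + 1, n            ! remainder iterations
      a(i) = s * a(i)
    end do
  end subroutine scale_unrolled
end module unroll_demo
```

The unrolled version executes fewer loop-control operations per element, at the cost of a larger loop body and the extra cleanup loop.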
The -O vectorn option specifies these levels of vectorization:
-O vector0 specifies very conservative vectorization. Characteristics include low compile time and small compile size.
The -O vector0 option is compatible with all scalar optimization levels and with task0 or task1. Vector code is generated for most array syntax statements but not for user-coded loops.
-O vector1 specifies conservative vectorization. Characteristics include moderate compile time and size. No loop nests are restructured. Only inner loops are vectorized. Not all vector reductions are performed, so results do not differ from results obtained when -O vector0 is specified. No vectorizations that might create false exceptions are performed.
The -O vector1 option is compatible with -O task0 or -O task1 and with -O scalar1, -O scalar2, -O scalar3, or -O stream1.
-O vector2 specifies moderate vectorization. Characteristics include moderate compile time and size. Loop nests are restructured. Results can differ slightly from results obtained when -O vector1 is specified because of vector reductions.
The -O vector2 option is compatible with -O scalar2 or -O scalar3 and with -O task0, -O task1, -O stream0, -O stream1, and -O stream2.
This is the default vectorization level.
-O vector3 specifies aggressive vectorization. Characteristics include potentially high compile time and size. Loop nests are restructured. Results can differ slightly from results obtained when -O vector1 is specified because of vector reductions. Vectorizations that might create false exceptions in rare cases may be performed.
The -O vector3 option is compatible with -O scalar2, -O scalar3, -O stream2, and -O stream3 and with all tasking levels.
The -O vsearch option vectorizes search loops. -O novsearch disables vectorization of search loops. A search loop is one that can be exited by means of a GO TO statement or EXIT statement.
The -O vsearch option is the default when -O vector2 or -O vector3 is enabled. -O novsearch is the default when -O vector0 or -O vector1 is enabled.
This feature is also available through compiler directives; for more information, see Section 4.2.13.
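A typical search loop, in the sense described above, is sketched below: the loop is left through an EXIT statement as soon as a match is found. The module, function, and argument names are hypothetical.

```fortran
module search_demo
  implicit none
contains
  ! Returns the index of the first element equal to key, or 0 if none.
  pure function first_index_of(n, x, key) result(idx)
    integer, intent(in) :: n
    integer, intent(in) :: x(n), key
    integer :: idx
    integer :: i
    idx = 0
    do i = 1, n
      if (x(i) == key) then
        idx = i
        exit                 ! early exit makes this a search loop
      end if
    end do
  end function first_index_of
end module search_demo
```

Because the trip count is not known until the match is found, a loop of this shape needs special handling to be vectorized, which is what the -O vsearch option enables.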
The -O zeroinc option causes the compiler to assume that constant increment variables (CIVs) can be incremented by zero. A CIV is a variable that is incremented only by a loop invariant value. For example, in a loop that contains the statement J = J + K, where K is loop invariant and can be equal to zero, J is a CIV. Specifying -O zeroinc can cause less strength reduction to occur in loops that have variable increments.
The default is -O nozeroinc, which means that you must prevent zero incrementing.
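The following sketch shows why a zero increment matters. J is a CIV that is incremented by the loop invariant K. The module, subroutine, and argument names are hypothetical.

```fortran
module zeroinc_demo
  implicit none
contains
  ! Stores 1.0, 2.0, ..., n into a, stepping the subscript j by k.
  subroutine fill_with_step(n, k, a, last)
    integer, intent(in) :: n, k
    real, intent(inout) :: a(:)
    integer, intent(out) :: last
    integer :: i, j
    j = 1
    do i = 1, n
      a(j) = real(i)
      j = j + k              ! J is a CIV; K is loop invariant
    end do
    last = j
  end subroutine fill_with_step
end module zeroinc_demo
```

With k = 1, each iteration writes a different element, so the iterations are independent. With k = 0, every iteration writes a(1), and only the value from the last iteration survives. An optimizer that assumes K is nonzero could reorder or overlap the stores, which is why code that allows K to be zero needs -O zeroinc.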
[1] Rewriting values raised to a constant power into an algebraically equivalent series of multiplications.
[2] For example, x/y is transformed to x * 1.0/y.