Cray Fortran Compiler™ Commands and Directives Reference Manual - S-3901-50
A node that is used to run user applications. Application nodes are best suited for executing parallel applications and are managed by the strong application placement scheduling and gang scheduling mechanism psched. See also OS node; support node.
A portion of an array; a subobject of an array.
An association permits an entity to be referenced by different names in a scoping unit or by the same or different names in different scoping units. Several kinds of associations exist. The principal kinds of association are pointer association, argument association, host association, use association, and storage association.
Not interruptible. An atomic operation occurs as a logical unit.
An obstacle within a program that provides a mechanism for synchronizing tasks. When a task encounters a barrier, it must wait until all specified tasks reach the barrier.
1. An event initiated by software that prevents cooperating tasks from continuing to issue new program instructions until all of the tasks have reached the same point in the program. 2. A feature that uses a barrier to synchronize the processors within a partition. All processors must reach the barrier before they can continue the program.
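As a minimal sketch of the barrier behavior described above, the following Python example uses the standard `threading.Barrier` class. The function name `run_tasks` and the task count are hypothetical and chosen only for illustration; this is not how psched or the Cray hardware barrier is implemented.

```python
import threading

def run_tasks(num_tasks=4):
    """Start num_tasks threads that all synchronize at one barrier."""
    barrier = threading.Barrier(num_tasks)
    lock = threading.Lock()
    arrived = []    # task ids recorded before the barrier
    observed = []   # how many arrivals each task sees after the barrier

    def task(task_id):
        with lock:
            arrived.append(task_id)
        barrier.wait()  # block until every task has reached this point
        with lock:
            observed.append(len(arrived))

    threads = [threading.Thread(target=task, args=(i,)) for i in range(num_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return observed
```

Because no task passes `barrier.wait()` until all tasks have arrived, every task observes the full arrival count afterward.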
A section of a program that does not cross any conditional branches, loop boundaries, or other transfers of control. There is a single entry point and a single exit point. Many compiler optimizations occur within basic blocks.
The way in which one component in a resource specification is related to another component.
An optimization that involves changing the iteration order of loops that access large arrays so that groups of array elements are processed as many times as possible while they reside in cache.
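The iteration-order change can be sketched with a blocked matrix multiply. This Python example only illustrates the reordering; the function name `matmul_blocked` and the block size are assumptions, and an interpreted language will not show the cache benefit that a compiled loop nest would.

```python
def matmul_blocked(a, b, block=2):
    """Multiply matrices a (n x k) and b (k x m), visiting them block by block."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    # The three outer loops select a block; the three inner loops
    # reuse the elements of that block before moving on.
    for ii in range(0, n, block):
        for jj in range(0, m, block):
            for kk in range(0, k, block):
                for i in range(ii, min(ii + block, n)):
                    for j in range(jj, min(jj + block, m)):
                        total = c[i][j]
                        for p in range(kk, min(kk + block, k)):
                            total += a[i][p] * b[p][j]
                        c[i][j] = total
    return c
```

The result is identical to the unblocked triple loop; only the order in which elements are touched differs.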
A memory reference to a data object already in cache. Such references are closer and faster than references to data objects in main memory.
In software publications, a hazard statement that highlights information that readers must know to avoid serious but recoverable errors. (In hardware and training publications, a hazard statement that indicates a potentially hazardous situation that, if not avoided, can result in system damage, data corruption, or both.)
A constant increment variable (CIV) is a variable that is incremented only by a loop-invariant value. For example, in a loop with index J that contains the statement J = J + K, where K is loop invariant and can be equal to 0, J is a CIV.
A syntactic extension to Fortran that offers a method for programming data passing; a data object that is identically allocated on each image and can be directly referenced syntactically by any other image.
An area of memory, or block, that can be referenced by any program unit. In Fortran, a named common block has a name specified in a Fortran COMMON or TASKCOMMON statement, along with specified names of variables or arrays stored in the block. A blank common block, sometimes referred to as blank common, is declared in the same way but without a name.
A programming method that rearranges or eliminates sections of a program during compilation to achieve higher performance.
A sequence of statements in Fortran that starts with a SELECT CASE, DO, IF, or WHERE statement and ends with the corresponding terminal statement.
See Cray pointer.
A variable whose value is the address of another entity, which is called a pointee. The Cray pointer type statement declares both the pointer and its pointee. The Cray pointee does not have an address until the value of the Cray pointer is defined; the pointee is stored starting at the location specified by the pointer.
A server in the Cray X1 system that runs the Programming Environment software.
Cray's documentation system for accessing and searching Cray books, man pages, and glossary terms in HTML and/or PDF format from a web browser. CrayDoc runs on any UNIX-based or Linux-based operating system.
The primary high-level tool for identifying opportunities for optimization on the Cray X1 system. CrayPat allows you to perform profiling, sampling, and tracing experiments on an instrumented application and to analyze the results of those experiments; no recompilation is needed to produce the instrumented program. In addition, the CrayPat tool provides access to all hardware performance counters.
Transferring data from one object to another; useful for programming single-program-multiple-data (SPMD) parallel computation. Its chief advantage over message passing is lower latency for data transfers, which leads to better scalability of parallel applications. Data passing can be achieved by using SHMEM library routines or by using co-arrays. Co-arrays offer some advantages over SHMEM, however.
A situation in which two or more processes are unable to proceed because each is waiting for one of the others to do something. A common example is a program communicating with a server: the program may find itself waiting for output from the server before sending anything more to it, while the server is similarly waiting for more input from the controlling program before outputting anything.
The label used to introduce information about a feature that will not be implemented until a later release.
1. Memory in which each processor has a separate share of the total memory. 2. Memory that is physically distributed among several modules.
A variable that stores a string of characters for use by your shell and the processes that execute under the shell. Some environment variables are predefined by the shell, and others are defined by an application or user. Shell-level environment variables let you specify the search path that the shell uses to locate executable files, the shell prompt, and many other characteristics of the operation of your shell. Most environment variables are described in the ENVIRONMENT VARIABLES section of the man page for the affected command.
A symbolic source-level debugger designed for debugging the multiple processes of parallel Fortran, C, or C++ programs.
A basic compiler optimization that converts operations on constants to simpler forms at compile time.
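A minimal sketch of the idea, assuming a toy expression format that is not taken from the compiler: an expression is a number, a variable name, or a tuple `(op, left, right)`. The names `fold` and `OPS` are hypothetical.

```python
import operator

# Operators the toy folder understands (an illustrative subset).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold(expr):
    """Replace every subexpression whose operands are all constants by its value."""
    if not isinstance(expr, tuple):
        return expr  # a constant or a variable name: nothing to fold
    op, left, right = expr
    left, right = fold(left), fold(right)
    if isinstance(left, (int, float)) and isinstance(right, (int, float)):
        return OPS[op](left, right)  # both operands known: compute now
    return (op, left, right)         # otherwise keep the operation
```

For instance, `fold(("*", ("+", 2, 3), "x"))` simplifies the constant part and keeps the multiplication by the variable `x`.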
ISO/IEC 1539-1:1997; the standard adopted by the American National Standards Institute (ANSI) and the International Organization for Standardization (ISO).
A rule, such as the ordering of an ordered list or heap, that applies throughout the life of a data structure or procedure. Each change to the data structure must maintain the correctness of the invariant.
1. The time required for a requested disk sector to be positioned under the head on a disk. It is usually stated as average latency, which is the time of one-half revolution. 2. The period of time that starts when a processor requests data and ends when the data is available. 3. The time it takes for a packet to cross a network connection, from sender to receiver. 4. The period of time that a frame is held by a network device before it is forwarded.
To create a binary executable file (an executable) from a binary relocatable object file (the object). This process adds library subprograms to the object and resolves the external references among subprograms. Executable files and the libraries and data they access are loaded into memory during the load step. Links are created among modules that must access each other. The command that performs a load is called a link-edit loader, or simply a loader.
A generic term for the system software product that loads a compiled or assembled program into memory and prepares it for execution.
Any communication or storage conflict between iterations of a loop. A communication conflict occurs when one iteration writes to a memory location that is read by another iteration. A storage conflict occurs when two iterations write to the same memory location. The order of the memory operations indicates the nature of the conflict: write-read (flow conflict), read-write (anti conflict), or write-write (output conflict).
Read-write and write-write conflicts are often resolvable by data privatization (that is, assigning distinct storage locations to objects that may be accessed by conflicting iterations). Write-read conflicts are inherent to the computation and require explicit synchronization for correct parallel execution of the loop that encloses the conflicts.
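One way to see a write-read (flow) conflict is to run the iterations of a loop in a different order and compare results. In this sketch, the function names and the `order` parameter are hypothetical; `prefix_sum` has a flow conflict because iteration `i` reads what iteration `i - 1` wrote, while `scale` has no conflict between iterations.

```python
def prefix_sum(values, order):
    """Each iteration reads the element written by the previous iteration."""
    a = list(values)
    for i in order:
        if i > 0:
            a[i] = a[i - 1] + a[i]  # write-read conflict across iterations
    return a

def scale(values, order):
    """Each iteration touches only its own element."""
    a = list(values)
    for i in order:
        a[i] = 2 * a[i]
    return a

data = [1, 2, 3, 4]
forward = list(range(len(data)))
backward = forward[::-1]
```

Here `prefix_sum(data, forward)` and `prefix_sum(data, backward)` differ, so the iterations cannot be reordered or run in parallel without synchronization, whereas `scale` gives the same answer in either order.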
An optimization that combines loop interchange and loop fusion to convert a loop nest into a single loop, with an iteration count that is the product of the iteration counts of the original loops.
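A minimal sketch of the resulting single loop, assuming a two-deep nest over an `n` by `m` array. The function names are hypothetical, and the index arithmetic with `divmod` is one way to recover the original indices, not necessarily what the compiler generates.

```python
def fill_nested(n, m):
    """Original loop nest: n * m iterations spread over two loops."""
    out = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            out[i][j] = 10 * i + j
    return out

def fill_collapsed(n, m):
    """Collapsed form: one loop whose trip count is the product n * m."""
    out = [[0] * m for _ in range(n)]
    for idx in range(n * m):
        i, j = divmod(idx, m)  # recover the original pair of indices
        out[i][j] = 10 * i + j
    return out
```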
An optimization that takes the bodies of loops with identical iteration counts and fuses them into a single loop with the same iteration count.
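As an illustration, the two loops in `separate` below have the same iteration count, so their bodies can share one loop in `fused`. Both function names are hypothetical.

```python
def separate(a, b):
    """Two loops with identical iteration counts."""
    n = len(a)
    sums, products = [0] * n, [0] * n
    for i in range(n):
        sums[i] = a[i] + b[i]
    for i in range(n):
        products[i] = a[i] * b[i]
    return sums, products

def fused(a, b):
    """The same work in a single loop with the same iteration count."""
    n = len(a)
    sums, products = [0] * n, [0] * n
    for i in range(n):
        sums[i] = a[i] + b[i]
        products[i] = a[i] * b[i]
    return sums, products
```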
An optimization that changes the order of loops within a loop nest, to achieve stride minimization or eliminate data dependencies.
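The sketch below shows the same loop nest before and after interchange. The function names are hypothetical. Note that Python lists of lists are stored row by row, so the stride-friendly order here has `i` outermost, whereas Fortran arrays are stored column by column and would favor the opposite order.

```python
def scale_j_outer(a, factor):
    """Before interchange: the inner loop jumps from row to row."""
    rows, cols = len(a), len(a[0])
    out = [row[:] for row in a]
    for j in range(cols):
        for i in range(rows):
            out[i][j] = factor * a[i][j]
    return out

def scale_i_outer(a, factor):
    """After interchange: the inner loop walks along a single row."""
    rows, cols = len(a), len(a[0])
    out = [row[:] for row in a]
    for i in range(rows):
        for j in range(cols):
            out[i][j] = factor * a[i][j]
    return out
```

The interchange is legal here because no iteration depends on another, so both orders produce the same result.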
A value that does not change between iterations of a loop.
An optimization that increases the step of a loop and duplicates the expressions within a loop to reflect the increase in the step. This can improve instruction scheduling and reduce memory access time.
A programming method in which explicit messages (containing data) are sent between tasks. Cray has implemented the message-passing programming method through the Message Passing Interface and the shared memory (SHMEM) routines.
A Cray product that consists of the Message Passing Interface and shared distributed memory (SHMEM) data-passing routines.
A metafile that defines information specific to an application or collection of applications. (This term is not related to the module statement of the Fortran language; it is related to setting up the Cray X1 system environment.) For example, to define the paths, command names, and other environment variables to use the Cray Programming Environment, you use the module file PrgEnv, which contains the base information needed for application compilations. The module file mpt sets a number of environment variables needed for message passing and data passing application development.
A package on the Cray X1 system that allows you to dynamically modify your user environment by using module files. (This term is not related to the module statement of the Fortran language; it is related to setting up the Cray X1 system environment.) The user interface to this package is the module command, which provides a number of capabilities to the user, including loading a module file, unloading a module file, listing which module files are loaded, determining which module files are available, and others.
Processing where an SSP operates as part of a multistreaming processor to execute parts of a parallel program that have been assigned to it.
The packaging that contains a multistreaming processor (MSP) and resides on a node module assembly. The MCM contains four processor chips (P-chips), four cache chips (E-chips), and I/O connections (two I-chips).
A basic programmable computational unit of a Cray X1 system. Each MSP is analogous to a traditional processor and is composed of four single-streaming processors (SSPs) and E-cache that is shared by the SSPs. See also node; SSP; MSP mode; SSP mode.
The configurable scalable building block for a Cray X1 mainframe. The actual hardware contents of a node are housed in four multichip modules (MCMs). This is the conceptual or software configuration view of a hardware unit called a node module. Physically, all nodes are the same; software controls how a node is used, such as for an OS node, application node, or support node. See also application node; MCM; MSP; node module; OS node; SSP; support node.
The physical node in a Cray X1 system. See node.
An industry-standard, portable model for shared memory parallel programming.
The node that provides kernel-level services, such as system calls, to all support nodes and application nodes. See also application node; node; support node.
The nonstandard practice of referencing an array with a subscript not contained between the declared lower and upper bounds of the corresponding dimension for that array. This practice sometimes, but not always, leads to referencing a storage location outside of the entire array.
The unit of memory addressable through the Translation Lookaside Buffer (TLB). On a Cray X1 system, the base page size is 65,536 bytes, but larger page sizes (up to 4,294,967,296 bytes) are also available.
Recognizing a common code pattern and replacing it with a call to a functionally equivalent library routine.
The appearance of the procedure name, operator symbol, or assignment symbol in an executable program that requires execution of the procedure.
The number of dimensions in a Fortran array. Rank is declared when the array is declared and cannot change.
The process of transforming an expression according to certain reduction rules. The most important forms are beta reduction (application of a lambda abstraction to one or more argument expressions) and delta reduction (application of a mathematical function to the required number of arguments). An evaluation strategy (or reduction strategy) determines which part of an expression to reduce first. There are many such strategies. Also called contraction.
A loop that contains at least one statement that reduces an array to a scalar value by doing a cumulative operation on many of the array elements. This involves including the result of the previous iteration in the expression of the current iteration.
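A minimal sketch of such a loop, together with one common way to restructure it. The function names and the `lanes` parameter are hypothetical; the partial-sum form is shown only as an illustration of how the dependence on the previous iteration can be relaxed, not as the compiler's actual code.

```python
def sum_reduction(values):
    """Each iteration folds one element into the running scalar result."""
    total = 0
    for v in values:
        total = total + v  # uses the result of the previous iteration
    return total

def sum_partial(values, lanes=4):
    """Accumulate independent partial sums, then combine them at the end."""
    partial = [0] * lanes
    for i, v in enumerate(values):
        partial[i % lanes] += v
    total = 0
    for p in partial:
        total += p
    return total
```

With integer data the two forms agree exactly; with floating-point data, reordering the additions can change rounding.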
The ability to increase the resources of a system and keep the work accomplished proportional.
1. A nonvectorized, single numeric value that represents one aspect of a physical quantity and may be represented on a scale as a point. This term often refers to a floating-point or integer computation that is not vectorized; more generally, it also refers to logical and conditional (jump) computation. 2. In Fortran, a single object of any intrinsic or derived type.
A form of fine-grain serial processing whereby iterative operations are performed sequentially on the elements of an array, with each iteration producing one result.
Part of a program in which a name has a fixed meaning. A program unit or subprogram generally defines a scoping unit. Type definitions and procedure interface bodies also constitute scoping units. Scoping units do not overlap, although one scoping unit may contain another in the sense that it surrounds it. If a scoping unit contains another scoping unit, the outer scoping unit is referred to as the host scoping unit of the inner scoping unit.
An array-processing loop used to perform a table lookup or to find exceptional values within an array.
A library of optimized functions and subroutines that take advantage of shared memory to move data between the memories of processors. The routines can either be used by themselves or in conjunction with another programming style such as Message Passing Interface.
A loop that is vectorized but that has been determined by the compiler to have trips less than or equal to the maximum vector length. In this case, the compiler deletes the branch back to the top of the loop. If the shortloop directive is used or the trip count is constant, the top test for number of trips is deleted. A shortloop is more efficient than a conventional loop.
The result of modifying shared data or performing I/O by concurrent streams without the use of an appropriate synchronization mechanism. Modifying shared data (where multiple streams write to the same location or write/read the same location) without appropriate synchronization can cause unreliable data and race conditions. Performing I/O without appropriate synchronization can cause an I/O deadlock. Shared data, in this context, occurs when any object may be referenced by two or more single-streaming processors. This includes globally visible objects (for example, COMMON and MODULE data), statically allocated objects (SAVE, C static), dummy arguments that refer to SHARED data, and objects in the SHARED heap.
A basic programmable computational unit of a Cray X1 system. See also node; MSP; MSP mode; SSP mode.
An instruction is speculative, or branch speculative, when it is issued after a predicted branch, and before the branch prediction is known to be correct. Speculative instructions may be killed if the branch prediction is incorrect.
Processing where an SSP executes complete programs. See also MSP mode.
Replacing expressions by algebraic equivalents that require less time to compute. For example, a multiplication in a DO loop is converted by strength reduction into a series of additions, because addition requires less time to compute than multiplication. Similar transformations are performed with exponentiations and other operators.
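The multiplication-to-addition case can be sketched as follows. The function names and the `stride` parameter are hypothetical and chosen for illustration.

```python
def offsets_multiply(n, stride):
    """Original form: one multiplication per iteration."""
    out = []
    for i in range(n):
        out.append(i * stride)
    return out

def offsets_add(n, stride):
    """Strength-reduced form: the product is maintained by repeated addition."""
    out = []
    offset = 0
    for _ in range(n):
        out.append(offset)
        offset += stride  # replaces the multiplication i * stride
    return out
```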
The relationship between the layout of an array's elements in memory and the order in which those elements are accessed. A stride of 1 means that memory-adjacent array elements are accessed on successive iterations of an array-processing loop.
A single-processor optimization technique in which arrays, and the program loops that reference them, are split into optimally sized blocks termed strips. The original loop is transformed into two nested loops. The inner loop references all data elements within a single strip, and the outer loop selects the strip to be addressed in the inner loop. This technique is often performed by the compiler to maximize the usage of cache memory or as part of vector code generation.
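The two-loop structure described above can be sketched like this. The function name and the `strip` length are hypothetical; on real hardware the strip length would be tied to the cache size or vector length.

```python
def add_strip_mined(a, b, strip=4):
    """Element-wise add with the original loop split into strips."""
    n = len(a)
    out = [0] * n
    for start in range(0, n, strip):       # outer loop selects a strip
        stop = min(start + strip, n)       # the last strip may be shorter
        for i in range(start, stop):       # inner loop covers one strip
            out[i] = a[i] + b[i]
    return out
```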
The node that is used to run serial commands, such as shells, editors, and other user commands (ls, for example). See also application node; OS node; node.
An acronym that represents attributes for argument association. It represents the data type, kind type parameter, and rank of the argument.
The operating system for Cray X1 systems.
A single-processing-element optimization technique in which the statements within a loop are copied. For example, if a loop has two statements, unrolling might copy those statements four times, resulting in eight statements. The loop control variable would be incremented for each copy, and the stride through the array would also be increased by the number of copies. This technique is often performed directly by the compiler, and the number of copies is usually between two and four.
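A minimal sketch of unrolling by four, assuming a dot product as the loop body. The function names are hypothetical, and the cleanup loop for leftover iterations is one common way to handle trip counts that are not a multiple of the unroll factor.

```python
def dot_rolled(a, b):
    """Original loop: one multiply-add per iteration."""
    total = 0
    for i in range(len(a)):
        total += a[i] * b[i]
    return total

def dot_unrolled(a, b):
    """Body copied four times; the loop index advances by four."""
    n = len(a)
    main = n - n % 4  # largest multiple of 4 not exceeding n
    total = 0
    for i in range(0, main, 4):
        total += a[i] * b[i]
        total += a[i + 1] * b[i + 1]
        total += a[i + 2] * b[i + 2]
        total += a[i + 3] * b[i + 3]
    for i in range(main, n):  # cleanup loop for the remaining elements
        total += a[i] * b[i]
    return total
```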
An array, or a subset of an array, on which a computer operates. When arithmetic, logical, or memory operations are applied to vectors, it is referred to as vector processing.
The number of elements in a vector.
A form of processing that uses one instruction for the simultaneous performance of iterative operations on elements in sets of ordered data.
A source code loop that is processed with hardware vector registers.
In software publications, a hazard statement that highlights information that readers must know to avoid irrecoverable errors or system damage. (In hardware and training publications, a hazard statement that indicates a potentially hazardous situation that, if not avoided, could result in death or serious injury.)