Optimizing Applications on the Cray X1™ System - S-2315-50
Index
A
arrays, private
in multistreaming,
Multistreaming
assign command
accessing layers provided by FFIO libraries,
Does the Code Use Formatted I/O?
bypassing system cache,
Does the Code Use Direct Access I/O?
COS-blocked format,
Does the Code Use Large, Sequential, Unformatted I/O Requests?
for library buffer size,
Does the Code Use Small, Sequential, Unformatted I/O Requests?
,
Does the Code Use Direct Access I/O?
setting buffer size,
Does the Code Use Formatted I/O?
specifying file format,
Does the Code Use Asynchronous I/O Requests?
specifying library buffer size,
Does the Code Use Large, Sequential, Unformatted I/O Requests?
specifying MR layer,
Memory-Resident (MR) Files
to avoid cache,
Does the Code Use Asynchronous I/O Requests?
to convert to asynchronous I/O,
Does the Code Use Large, Sequential, Unformatted I/O Requests?
to invoke FFIO,
Memory-Resident (MR) Files
assign,
Identifying I/O Intensive Code
asynchronous I/O
converting to,
Does the Code Use Large, Sequential, Unformatted I/O Requests?
optimizing,
Does the Code Use Asynchronous I/O Requests?
B
bandwidth,
Is the Program Memory Bound?
,
Latency and Bandwidth Issues
Blocking,
Does the Code Use Large, Sequential, Unformatted I/O Requests?
,
Loop Blocking
,
Cache Pollution Control
Buffers
translation lookaside,
Is the Program Memory Bound?
C
C/C++
loopmark listing,
Is the Program Processor Bound?
Cache
avoiding,
Does the Code Use Asynchronous I/O Requests?
bandwidth,
Bandwidth
coherence,
Cache Coherency and Consistency
consistency,
Cache Coherency and Consistency
D-cache,
D-cache and I-cache
E-cache,
E-cache
I-cache,
D-cache and I-cache
instruction,
Cache
line size,
Cache Line Size
locality,
Cache Localities and Strides
overview,
Overview
pollution control,
Cache Pollution Control
scalar data,
Cache
stride,
Cache Localities and Strides
using no_cache_alloc,
Optimizing Memory-bound Code
viewing counters,
Optimizing Memory-bound Code
Call stack overflow,
Sampling/Asynchronous Experiments
Call stack sample,
Sampling/Asynchronous Experiments
compiler options,
Using More Aggressive Compiler Options
computed-safe vector length,
Computed-safe Vector Length Loops
cpu_time,
Common Timing Tools
Cray Streaming Directives,
Factors that Enhance Vectorization and Multistreaming
CrayPat,
CrayPat
pat_report,
Generating a Report
pat_build,
Instrumenting a Program
environment variables,
Setting the Environment and Running the Experiment
module,
Loading the CrayPat Module
sampling experiments,
Sampling/Asynchronous Experiments
tracing experiments,
Tracing Experiments
D
data edit descriptors,
Does the Code Use Formatted I/O?
data items in I/O list
minimizing,
Does the Code Use Formatted I/O?
Debugging
prerequisites,
Overview
using CrayPat,
Sampling and Tracing Experiments
,
Customizing a Report
direct access I/O
definition,
Does the Code Use Direct Access I/O?
directives,
Using More Aggressive Compiler Options
double buffering,
Does the Code Use Large, Sequential, Unformatted I/O Requests?
E
edit descriptors
repeated,
Does the Code Use Formatted I/O?
Examples
mprofil report,
Is the Program I/O Bound?
,
Is the Program Processor Bound?
pat_hwpc report,
Is the Program Memory Bound?
direct access I/O,
Does the Code Use Direct Access I/O?
expanded samp_pc_time report,
Is the Program Processor Bound?
F
FFIO
cache tuning,
Does the Code Use Asynchronous I/O Requests?
libraries,
Does the Code Use Large, Sequential, Unformatted I/O Requests?
using,
Identifying I/O Intensive Code
file format
unblocked for asynchronous I/O,
Does the Code Use Asynchronous I/O Requests?
unbuffered and unblocked,
Does the Code Use Large, Sequential, Unformatted I/O Requests?
files
memory resident,
Memory-Resident (MR) Files
finding the source file,
Is the Program I/O Bound?
formatted I/O
increasing library buffer sizes,
Does the Code Use Formatted I/O?
optimizing,
Does the Code Use Formatted I/O?
reducing amount of,
Does the Code Use Formatted I/O?
increasing efficiency,
Does the Code Use Formatted I/O?
Fortran
IVDEP directive,
Vector Update Loops
loopmark listing,
Is the Program Processor Bound?
G
gather,
Vector Update Loops
H
hardware counters
viewing with pat_hwpc,
Is the Program Memory Bound?
Heap overflow,
Sampling/Asynchronous Experiments
Heap sample,
Sampling/Asynchronous Experiments
I
I/O-bound code
optimizing,
Optimizing I/O-bound Code
I/O
optimizing unformatted,
Does the Code Use Large, Sequential, Unformatted I/O Requests?
,
Does the Code Use Small, Sequential, Unformatted I/O Requests?
tracing,
Identifying I/O Intensive Code
unbuffered, unblocked format,
Does the Code Use Large, Sequential, Unformatted I/O Requests?
using layered I/O,
Identifying I/O Intensive Code
,
I/O
asynchronous,
Does the Code Use Asynchronous I/O Requests?
asynchronous conversion,
Does the Code Use Large, Sequential, Unformatted I/O Requests?
changing to unformatted,
Does the Code Use Formatted I/O?
increasing request size,
Does the Code Use Small, Sequential, Unformatted I/O Requests?
large requests,
Does the Code Use Large, Sequential, Unformatted I/O Requests?
minimizing number of data list items,
Does the Code Use Formatted I/O?
optimization principles,
Optimizing I/O-bound Code
optimizing for small requests,
Does the Code Use Small, Sequential, Unformatted I/O Requests?
optimizing formatted,
Does the Code Use Formatted I/O?
IEEE
processor support,
Vector Processors
L
latency,
Is the Program Memory Bound?
,
Latency and Bandwidth Issues
layered I/O,
Identifying I/O Intensive Code
layer
memory resident,
Does the Code Use Small, Sequential, Unformatted I/O Requests?
library buffer sizes,
Does the Code Use Large, Sequential, Unformatted I/O Requests?
for formatted I/O,
Does the Code Use Formatted I/O?
for unformatted I/O,
Does the Code Use Small, Sequential, Unformatted I/O Requests?
loopmark listing
C/C++,
Is the Program Processor Bound?
Fortran,
Is the Program Processor Bound?
Loop
blocking,
Loop Blocking
collapse,
Loop Collapse
computed-safe vector length,
Computed-safe Vector Length Loops
conditionally vectorized,
Conditionally Vectorized Loops
dividing iterations among processors,
Multistreaming
fully vectorized,
Fully Vectorized Loops
fusion,
Loop Fusion
interchange,
Loop Interchange
outer-loop vectorization,
Outer-loop Vectorization
partially vectorized,
Partially Vectorized Loops
partitioned,
Streamed and Partitioned Loops
reduction,
Reduction Loop
short loops,
Short Loops
streamed,
Streamed and Partitioned Loops
streaming without partitioning,
Streaming without Partitioning
unrolling,
Loop Unrolling
vector update,
Vector Update Loops
M
Memory bound
defined,
Is the Program Memory Bound?
memory-resident
files,
Memory-Resident (MR) Files
layer,
Does the Code Use Small, Sequential, Unformatted I/O Requests?
Memory
bandwidth,
Bandwidth
cache,
Overview
cache line size,
Cache Line Size
cache pollution control,
Cache Pollution Control
changing instruction table size,
Optimizing Memory-bound Code
changing page size,
Optimizing Memory-bound Code
D-cache,
D-cache and I-cache
distributed,
Distributed Shared Memory
E-cache,
E-cache
I-cache,
D-cache and I-cache
other pages,
Optimizing Memory-bound Code
segment alignment,
Optimizing Memory-bound Code
text pages,
Optimizing Memory-bound Code
translation lookaside buffer,
Is the Program Memory Bound?
using efficient storage for,
Memory-Resident (MR) Files
mflop count,
Is the Program Processor Bound?
module,
Loading the CrayPat Module
mprofil,
Sampling/Asynchronous Experiments
MR layer,
Does the Code Use Small, Sequential, Unformatted I/O Requests?
MSP,
System Overview
,
Multistreaming
Multistreaming
and vectorization,
Factors that Enhance Vectorization and Multistreaming
factors inhibiting,
Factors that Inhibit Multistreaming
loopmark listing,
Multistreaming
nested loops with,
Factors that Inhibit Multistreaming
,
Multistreaming
on Cray X1 systems,
Multistreaming
private arrays,
Multistreaming
processor (MSP),
System Overview
,
Multistreaming
streaming and partitioning,
Streamed and Partitioned Loops
streaming without partitioning,
Streaming without Partitioning
types of codes optimized,
Multistreaming
N
nested loops,
Factors that Inhibit Multistreaming
multistreaming,
Multistreaming
network optimization,
I/O
node modules,
System Overview
Node
application,
System Overview
cache,
Cache Coherency and Consistency
flavor,
System Overview
operating system,
System Overview
support,
System Overview
O
Optimization
I/O,
I/O
prerequisites,
Overview
process flow,
Optimization Flowchart
optimizing code
computed safe vector length,
Computed-safe Vector Length Loops
conditionally vectorized,
Conditionally Vectorized Loops
I/O bound,
Optimizing I/O-bound Code
loop blocking,
Loop Blocking
loop collapse,
Loop Collapse
loop fusion,
Loop Fusion
loop interchange,
Loop Interchange
loop unrolling,
Loop Unrolling
outer-loop vectorization,
Outer-loop Vectorization
partially vectorized loops,
Partially Vectorized Loops
reduction loop,
Reduction Loop
short loop,
Short Loops
streaming,
Streamed and Partitioned Loops
vectorization inhibitors,
Factors that Inhibit Vectorization
P
parallel programming models,
System Overview
pattern matching
loopmark listing,
Pattern-Matching
pat_build,
Instrumenting a Program
instrumenting a program,
Is the Program I/O Bound?
pat_hwpc
viewing cache counters,
Optimizing Memory-bound Code
example report,
Is the Program Memory Bound?
pat_report,
Generating a Report
pat,
CrayPat
Performance counter overflow,
Sampling/Asynchronous Experiments
Performance counter sample,
Sampling/Asynchronous Experiments
PRINT statements,
Does the Code Use Formatted I/O?
processor
defined,
System Overview
profil,
Sampling/Asynchronous Experiments
R
RAID striping,
Optimizing I/O-bound Code
READ statements,
Does the Code Use Formatted I/O?
records
using longer,
Does the Code Use Formatted I/O?
reduction loop,
Reduction Loop
registers,
Vector Processors
Resource usage overflow,
Sampling/Asynchronous Experiments
Resource usage sample,
Sampling/Asynchronous Experiments
rtc,
Common Timing Tools
RTT,
Optimizing Memory-bound Code
S
safe vector length,
Computed-safe Vector Length Loops
sampling experiments,
Sampling/Asynchronous Experiments
samp_cs_ovfl,
Sampling/Asynchronous Experiments
samp_cs_time,
Sampling/Asynchronous Experiments
samp_heap_ovfl,
Sampling/Asynchronous Experiments
samp_heap_time,
Sampling/Asynchronous Experiments
samp_pc_ovfl,
Sampling/Asynchronous Experiments
samp_pc_time,
Sampling/Asynchronous Experiments
samp_ru_ovfl,
Sampling/Asynchronous Experiments
samp_ru_time,
Sampling/Asynchronous Experiments
scalar processing,
Vectorization
Scalar
data cache,
Cache
processor,
System Overview
registers,
Vector Processors
scatter,
Vector Update Loops
scratch files,
Minimizing System Calls
secondr,
Common Timing Tools
sequential access,
Does the Code Use Large, Sequential, Unformatted I/O Requests?
sequential I/O,
Does the Code Use Small, Sequential, Unformatted I/O Requests?
optimizing,
Does the Code Use Large, Sequential, Unformatted I/O Requests?
single-streaming processors,
System Overview
SSP,
System Overview
mode availability,
System Overview
single-streaming processor,
Multistreaming
storage devices
using optimal,
Using an Optimal Storage Device
summation loop,
Reduction Loop
superscalar processor,
System Overview
synchronization reduction,
Does the Code Use Asynchronous I/O Requests?
system calls, minimizing,
Minimizing System Calls
T
TLB,
Is the Program Memory Bound?
,
Optimizing Memory-bound Code
Tools
cpu_time,
Common Timing Tools
pat,
CrayPat
rtc,
Common Timing Tools
secondr,
Common Timing Tools
timef,
Common Timing Tools
timex,
Common Timing Tools
pat_build,
Instrumenting a Program
CrayPat,
CrayPat
module,
Loading the CrayPat Module
tracing experiments,
Tracing Experiments
U
unblocked file format
for asynchronous I/O,
Does the Code Use Asynchronous I/O Requests?
unblocked I/O format,
Does the Code Use Large, Sequential, Unformatted I/O Requests?
unbuffered I/O format,
Does the Code Use Large, Sequential, Unformatted I/O Requests?
unformatted I/O
changing to,
Does the Code Use Formatted I/O?
optimizing,
Does the Code Use Large, Sequential, Unformatted I/O Requests?
,
Does the Code Use Small, Sequential, Unformatted I/O Requests?
V
vector processing,
Vectorization
Vectorization
and multistreaming,
Factors that Enhance Vectorization and Multistreaming
compiler directives,
Using More Aggressive Compiler Options
compiler options,
Using More Aggressive Compiler Options
computed-safe vector length,
Computed-safe Vector Length Loops
conditionally vectorized loops,
Conditionally Vectorized Loops
defined,
Vectorization
factors inhibiting,
Factors that Inhibit Vectorization
fully vectorized loops,
Fully Vectorized Loops
loopmark listing,
Vectorization
outer loop,
Outer-loop Vectorization
partially vectorized loops,
Partially Vectorized Loops
reduction loop,
Reduction Loop
short loops,
Short Loops
types of,
Vectorization
vector update loops,
Vector Update Loops
Vector
collapse,
Reduction Loop
instruction set,
Vector Processors
processor,
System Overview
registers,
Vector Processors
W
workload balancing,
Does the Code Use Asynchronous I/O Requests?
WRITE statements,
Does the Code Use Formatted I/O?