| Optimizing Applications on the Cray X1™ System - S-2315-50 | ||
|---|---|---|
| | Appendix D. Cache | |
Cray X1 systems provide enough bandwidth from the E-cache to deliver one 64-bit operand per processor per clock. Provided the other operands are already in registers, this bandwidth sustains peak processor performance for the multiply-add pairs that occur in kernels such as matrix multiply. The peak bandwidth from memory is half that from the E-cache.
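
To make the operand counting concrete, the following is a minimal C sketch of a matrix-multiply strip, not code from this manual. The function name `matmul_strip`, the row-major layout, and the strip length `VL` are assumptions chosen for illustration. The accumulators in `acc` stand in for a vector register, and `aik` is held in a scalar, so each multiply-add in the inner loop needs only one new 64-bit operand, read with stride 1 from `b`.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed strip length; stands in for the contents of one vector register. */
enum { VL = 64 };

/*
 * Hypothetical kernel: updates C[i][j0 .. j0+VL-1] of an n x n row-major
 * product C += A * B. The partial sums stay in acc[] for the whole k loop,
 * and A[i][k] is loaded once per strip, so the inner loop streams exactly
 * one operand (an element of B) per multiply-add.
 */
void matmul_strip(size_t n, const double *a, const double *b, double *c,
                  size_t i, size_t j0)
{
    assert(i < n && j0 + VL <= n);

    double acc[VL];
    for (size_t j = 0; j < VL; ++j)
        acc[j] = c[i * n + j0 + j];          /* load the strip of C once */

    for (size_t k = 0; k < n; ++k) {
        const double aik = a[i * n + k];     /* scalar operand, kept in a register */
        const double *brow = &b[k * n + j0]; /* stride-1 stream from B */
        for (size_t j = 0; j < VL; ++j)
            acc[j] += aik * brow[j];         /* one multiply-add per operand loaded */
    }

    for (size_t j = 0; j < VL; ++j)
        c[i * n + j0 + j] = acc[j];          /* store the strip of C once */
}
```

Whether a compiler actually keeps `acc` in registers depends on the target and the optimization level; the sketch only shows the access pattern that the bandwidth figure above assumes.
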
Bandwidth is maximized by stride-1 references to data residing in E-cache. The realized bandwidth is lower for non-stride-1 references and for data outside the E-cache. The loss is most pronounced for strides that are increasing powers of two, because such references reuse the same data paths from the processor to memory.
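
The difference between these access patterns can be sketched in C as follows. This is an illustration under assumed names (`row_sum`, `column_sum`, `is_power_of_two`) and an assumed row-major layout, not code from this manual, and it does not measure bandwidth. Summing a row touches memory with stride 1, while summing a column touches it with a stride equal to the leading dimension `ld`; when `ld` is a power of two, the column walk is the unfavorable case described above.

```c
#include <assert.h>
#include <stddef.h>

/* Stride-1 access: consecutive elements of one row of a row-major matrix. */
double row_sum(const double *m, size_t cols, size_t ld, size_t row)
{
    assert(cols <= ld);
    double sum = 0.0;
    for (size_t j = 0; j < cols; ++j)
        sum += m[row * ld + j];
    return sum;
}

/* Strided access: successive elements of one column are ld elements apart. */
double column_sum(const double *m, size_t rows, size_t ld, size_t col)
{
    assert(col < ld);
    double sum = 0.0;
    for (size_t i = 0; i < rows; ++i)
        sum += m[i * ld + col];
    return sum;
}

/* Returns 1 if the stride is a power of two, 0 otherwise (including 0). */
int is_power_of_two(size_t stride)
{
    return stride != 0 && (stride & (stride - 1)) == 0;
}
```

A caller could use `is_power_of_two(ld)` as a quick check on a leading dimension before choosing which loop order to vectorize; what to do about a power-of-two stride is a separate tuning decision that this sketch does not address.
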