D.6. Bandwidth

Cray X1 systems provide enough bandwidth from the E-cache to deliver one 64-bit operand per processor per clock. Provided the other operands are already in registers, this bandwidth is sufficient to sustain peak processor performance on the multiply-add pairs found in kernels such as matrix multiply. Peak bandwidth from memory is half that from E-cache, or one 64-bit operand per processor every two clocks.
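The arithmetic above can be turned into a small roofline-style bound. The sketch below is illustrative only. It assumes, as the paragraph implies but does not state outright, that peak arithmetic corresponds to one multiply-add pair per clock, and it works in normalized operands-per-clock units rather than real clock rates. The names `fraction_of_peak`, `OPERANDS_PER_CLOCK`, and `PEAK_MADDS_PER_CLOCK` are hypothetical.

```python
# Hypothetical roofline-style model of the bandwidth figures in the text.
# ASSUMPTION (not stated in the source): peak arithmetic consumes exactly
# one multiply-add pair per clock per processor.
PEAK_MADDS_PER_CLOCK = 1.0

# 64-bit operands deliverable per processor per clock, from the text:
# E-cache supplies one per clock; memory supplies half of that.
OPERANDS_PER_CLOCK = {"ecache": 1.0, "memory": 0.5}


def fraction_of_peak(loads_per_madd: float, source: str) -> float:
    """Upper bound on the fraction of peak multiply-add throughput.

    loads_per_madd: operands that must be fetched (i.e. are not already
    held in registers) for each multiply-add pair.
    source: "ecache" or "memory".
    """
    if loads_per_madd <= 0:
        return 1.0  # all operands in registers: bandwidth is not the limit
    demand = loads_per_madd * PEAK_MADDS_PER_CLOCK  # operands/clock at peak
    supply = OPERANDS_PER_CLOCK[source]
    return min(1.0, supply / demand)


if __name__ == "__main__":
    # Matrix-multiply inner loop: one operand fetched per multiply-add,
    # the others reused from registers.
    print(fraction_of_peak(1, "ecache"))
    print(fraction_of_peak(1, "memory"))
    # Dot product of two streamed vectors: two operands per multiply-add.
    print(fraction_of_peak(2, "ecache"))
```

Under these assumptions a matrix-multiply loop that fetches one operand per multiply-add can run at full peak from E-cache but at no more than half of peak from memory, while a kernel that must fetch two operands per multiply-add is limited to half of peak even from E-cache.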

Bandwidth is highest for stride-1 references to data residing in E-cache. Realized bandwidth is lower for non-stride-1 references and for data outside E-cache, and it is lower still for strides that are increasing powers of two, because such strides repeatedly reuse the same data paths from the processor to memory.
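Why power-of-two strides hurt can be seen with a toy interleaving model. This is a sketch under loudly stated assumptions: the number of data paths (`NUM_PATHS = 16`) and the word-interleaved mapping in `path_of` are hypothetical and are not the actual Cray X1 address mapping. In such a model a stride `s` touches only `NUM_PATHS // gcd(s, NUM_PATHS)` distinct paths, so each doubling of a power-of-two stride halves the number of paths in use.

```python
from math import gcd

# ASSUMPTION: 16 data paths with consecutive 64-bit words assigned to
# consecutive paths. Both are illustrative, not the real X1 layout.
NUM_PATHS = 16


def path_of(word_index: int) -> int:
    """Hypothetical mapping from a 64-bit word index to a data path."""
    return word_index % NUM_PATHS


def distinct_paths(stride: int, n_refs: int = 1024) -> int:
    """Count distinct paths touched by n_refs references at this stride."""
    return len({path_of(i * stride) for i in range(n_refs)})


def predicted_paths(stride: int) -> int:
    """Closed form for the same count under the interleaving assumption."""
    return NUM_PATHS // gcd(stride, NUM_PATHS)


if __name__ == "__main__":
    for stride in (1, 2, 3, 4, 8, 16, 17):
        used = distinct_paths(stride)
        print(f"stride {stride:2d}: {used:2d} of {NUM_PATHS} paths in use")
```

The model captures only path reuse: strides 2, 4, 8, and 16 confine traffic to 8, 4, 2, and 1 paths respectively, while an odd stride still spreads across all of them. It ignores the other effects that, as the text notes, make any non-stride-1 access slower than stride 1.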