D.3. Cache Localities and Strides

There are two kinds of cache locality: temporal and spatial.

Spatial locality occurs when a reference to an array element, for example A(i), causes a whole cache line to be read in, so the element needed in the next iteration, A(i+1), is likely to be in the cache already.
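To make this concrete, the following C sketch counts how many distinct cache lines are brought in when n elements of an array are read at a given stride. The function name lines_touched, the 64-byte line size used in the usage note, and the assumption that the array starts on a cache-line boundary are illustrative choices, not properties of any particular machine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts the distinct cache lines touched when n
   elements are read at `stride` (in elements), assuming the array starts
   on a cache-line boundary. elem_size and line_size are in bytes. */
size_t lines_touched(size_t n, size_t stride, size_t elem_size, size_t line_size)
{
    assert(elem_size > 0 && line_size > 0);
    size_t count = 0;
    size_t last_line = (size_t)-1;  /* sentinel: no line touched yet */
    for (size_t k = 0; k < n; k++) {
        size_t byte_offset = k * stride * elem_size;
        size_t line = byte_offset / line_size;
        /* Offsets only increase, so a new line index means a new line. */
        if (line != last_line) {
            count++;
            last_line = line;
        }
    }
    return count;
}
```

With 8-byte double elements and an assumed 64-byte line, lines_touched(1024, 1, 8, 64) returns 128, because each line supplies eight consecutive elements. lines_touched(1024, 8, 8, 64) returns 1024: every reference lands on a new line and only one word of each line is used.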

Temporal locality occurs when an array element (again, A(i)) or a scalar variable is referenced and then referenced again a short time later, while it may still be in the cache.
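A three-point stencil is a compact illustration of temporal reuse. In this hypothetical C function (the name stencil3 is only for illustration), each element of a is read in three consecutive iterations, so after its first load it is likely still in the cache, or possibly held in a register by the compiler, when it is needed again.

```c
#include <stddef.h>

/* Three-point sum: b[i] = a[i-1] + a[i] + a[i+1] for 1 <= i <= n-2.
   a[i+1] is read first as the right neighbor (iteration i), then as the
   middle element (iteration i+1), then as the left neighbor (iteration
   i+2): three uses within a short span of the loop.
   b[0] and b[n-1] are left unchanged. */
void stencil3(const double *a, double *b, size_t n)
{
    for (size_t i = 1; i + 1 < n; i++) {
        b[i] = a[i - 1] + a[i] + a[i + 1];
    }
}
```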

Temporal locality often occurs naturally in programs: data is set to some value and then used in a calculation or stored to memory. It is most efficient to keep such data in a register for that whole interval, but it is also common for it to reside in the cache. However, if the data is evicted from the cache before its next use, performance suffers. The compilers try to structure loops, and code in general, to maximize register and cache reuse.
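Loop fusion is one transformation that can increase temporal reuse. The sketch below, with hypothetical function names, contrasts two separate passes over an array with a single fused pass. It illustrates the idea only and does not describe what any particular compiler does.

```c
#include <stddef.h>

/* Two separate passes: for a large array, a[0] may already have been
   evicted from the cache by the time the second loop reads it again. */
void sum_and_sumsq_two_pass(const double *a, size_t n,
                            double *sum, double *sumsq)
{
    double s = 0.0, sq = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    for (size_t i = 0; i < n; i++)
        sq += a[i] * a[i];
    *sum = s;
    *sumsq = sq;
}

/* Fused loop: each a[i] is loaded once and used twice while it is in a
   register; the accumulators s and sq can also stay in registers. */
void sum_and_sumsq_fused(const double *a, size_t n,
                         double *sum, double *sumsq)
{
    double s = 0.0, sq = 0.0;
    for (size_t i = 0; i < n; i++) {
        double x = a[i];
        s += x;
        sq += x * x;
    }
    *sum = s;
    *sumsq = sq;
}
```

Both versions add the terms in the same order, so they return identical results; only the pattern of memory traffic differs.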

Spatial locality is maximized by using all the data that is loaded into the cache before it is evicted. The best way to do this is to minimize the memory reference stride; stride-1 is by far the best. Stride-1 access not only uses every word loaded into the cache, but also makes uniform use of the memory hierarchy, including the data paths from the processors to the E-cache and all the way out to local and remote memory. Repeatedly reusing the same data path causes performance loss because each subsequent request must wait until that resource is free.
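The classic stride example is traversing a two-dimensional array. C stores arrays in row-major order, so in the sketch below (the size N and the function names are illustrative) the first function walks memory with stride 1 and the second with stride N. Fortran arrays, such as the A(i) used above, are stored in column-major order, so in Fortran it is the first subscript that should vary fastest in the inner loop.

```c
#include <stddef.h>

#define N 512  /* illustrative size, chosen arbitrarily */

/* Row-major layout: m[i][j] and m[i][j+1] are adjacent in memory, so
   varying j in the inner loop gives stride-1 access. */
double sum_stride1(double m[N][N])
{
    double s = 0.0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            s += m[i][j];   /* consecutive addresses */
    return s;
}

/* Inner loop varies the row index, so successive references are
   N * sizeof(double) bytes apart (stride-N). */
double sum_strideN(double m[N][N])
{
    double s = 0.0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            s += m[i][j];   /* jumps a full row each time */
    return s;
}
```

Both functions compute the same sum up to floating-point rounding, since the addition order differs. On cache-based systems the stride-1 version is generally expected to be faster for large N, though the actual difference depends on the machine.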