2.2. Registers

The Cray X1 processor has the following set of registers:

Table 2-1. Cray X1 Registers

Register      Description
a0,...,a63    64-bit address registers
s0,...,s63    64-bit scalar registers
v0,...,v31    32-bit or 64-bit vector registers
vl            Vector length register
m0,...,m7     Mask registers
vc            Vector carry register
bmm           64x64 bit matrix multiply register
c0,...,c63    Control registers

In addition to these registers, processor state information is contained in:

Table 2-2. Processor State Information

Syntax         Description
(no syntax)    Program counter
(no syntax)    Performance counters

In this manual, the term register refers to a specific register, such as a23. The term register type refers to the letter or letters that identify a set of registers, such as a for address registers. The term register designator refers to the number that selects a register within that set, such as 23. CAL accepts either uppercase or lowercase for the register type, so a23 and A23 are treated the same.
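The terminology above can be illustrated with a small parser. This is a minimal sketch in Python, not part of CAL: the function name parse_register and the error handling are hypothetical, and the designator ranges are taken from Table 2-1.

```python
import re

# Register types from Table 2-1 that take a designator, with their counts.
_INDEXED = {"a": 64, "s": 64, "v": 32, "m": 8, "c": 64}
# Register types that name a single register and take no designator.
_SINGLE = {"vl", "vc", "bmm"}

def parse_register(name):
    """Split a register name into (register type, register designator)."""
    lowered = name.lower()  # a23 and A23 are treated the same
    if lowered in _SINGLE:
        return lowered, None
    match = re.fullmatch(r"([a-z]+)(\d+)", lowered)
    if match is None or match.group(1) not in _INDEXED:
        raise ValueError(f"not a register: {name!r}")
    reg_type, designator = match.group(1), int(match.group(2))
    if designator >= _INDEXED[reg_type]:
        raise ValueError(f"designator out of range: {name!r}")
    return reg_type, designator
```

With this sketch, parse_register("a23") and parse_register("A23") both return the register type "a" and the register designator 23.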

2.2.1. Address and Scalar Registers

There are 64 address and 64 scalar registers, each 64 bits in width. They constitute computational way stations between memory and functional units in the processor for serial regions of the program. They also serve vectorized code with addresses, strides, and scalar values.

Registers a0 and s0 always read as zero and cannot be modified. Instructions that would write to them discard their results and may ignore exceptions.
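The behavior of the zero registers can be modeled as follows. This is an illustrative Python sketch only; the class name RegisterFile and its methods are hypothetical and do not correspond to any CAL construct.

```python
MASK64 = (1 << 64) - 1

class RegisterFile:
    """Toy model of one 64-entry register set (a or s)."""

    def __init__(self):
        self.regs = [0] * 64

    def write(self, designator, value):
        if designator == 0:
            return  # a write to register 0 is discarded
        self.regs[designator] = value & MASK64

    def read(self, designator):
        return self.regs[designator]  # register 0 always reads as zero
```

In this model, a write to designator 0 has no effect, so a later read of designator 0 still returns zero.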

Both the address and scalar registers are general-purpose and support the same memory reference instructions, immediate loads, integer functions, and conditional branches. However, address registers must be used for:

And scalar registers must be used for:

Unlike some other instruction set architectures, the Cray X1 system does not mandate a strict data typing rule. Integer operations may be performed on floating-point data and vice versa.

When 32-bit data enter a 64-bit address or scalar register, they are always right-justified and sign-filled, even if the value is notionally unsigned. 32-bit operations on 64-bit registers always ignore the upper bits and are guaranteed to return a sign-extended result.
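The sign-fill rule can be sketched in Python as follows. The helper names sign_extend_32 and add32 are hypothetical; add32 stands in for any 32-bit operation on 64-bit registers and is not a specific CAL instruction.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def sign_extend_32(value):
    """Return the 64-bit register image of the low 32 bits of value."""
    low = value & MASK32
    if low & 0x80000000:            # bit 31 set: fill the upper 32 bits with ones
        return low | (MASK64 ^ MASK32)
    return low                      # bit 31 clear: the upper 32 bits are zero

def add32(x, y):
    """32-bit add on 64-bit registers: upper input bits are ignored."""
    return sign_extend_32((x & MASK32) + (y & MASK32))
```

Note that the notionally unsigned value 0xFFFFFFFF is held as 0xFFFFFFFFFFFFFFFF, which is why code that treats such a register as a 64-bit unsigned quantity must first clear the upper bits.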

A single instruction may not use both register designators n and n+32 (equivalently, n and n-32) as operands when the two registers are of the same register type.
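A checker for this designator restriction might look like the following Python sketch. The function name and the representation of operands as (register type, designator) pairs are assumptions made for illustration.

```python
def violates_pairing_rule(operands):
    """Return True if two operands of the same register type have
    designators that differ by exactly 32.

    operands is a list of (register type, designator) pairs,
    for example [("a", 1), ("a", 33)].
    """
    seen = set(operands)
    # Checking n + 32 for every operand also covers the n - 32 case.
    return any((reg_type, n + 32) in seen for reg_type, n in seen)
```

For example, a1 together with a33 is rejected, while a1 together with s33 is accepted because the register types differ.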

2.2.2. Vector Registers

There are 32 vector registers, each capable of holding 64 32-bit elements or 64 64-bit elements. They constitute computational way stations between the memory and the functional units of the processor for parallel regions of the program.

32-bit and 64-bit data are stored differently in vector registers. In particular, 32-bit data are not sign-extended as they are in the address and scalar registers.

A vector register holding the result of a 32-bit load or operation is defined for use only as an operand of a 32-bit store or operation. Similarly, a vector of 64-bit data can be used only in 64-bit instructions. Explicit conversion operations must be used to change the width of the data in the elements of vector registers.

This rule permits a hardware implementation to pack 32-bit vector elements in non-obvious ways that can vary from one generation to the next depending on pipe width. Packing permits 32-bit vector operations to execute at double speed.

When a vector register is written by a memory load or vector instruction, only the elements that are actually written are well-defined in the result vector register. The other elements, which may be those past the limiting value vl or which correspond to zero bits in the controlling mask register, become undefined. Programs must not assume that these other elements are preserved.
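The following Python sketch models which result elements are well-defined after a masked vector operation. It is illustrative only: the function name is hypothetical, None stands for an undefined element, and a short register of 8 elements is used for brevity even though the architecture guarantees at least 64.

```python
MASK64 = (1 << 64) - 1
UNDEFINED = None  # marker for an element whose value must not be relied on

def masked_vector_add(src_a, src_b, mask, vl):
    """Add two vectors under a controlling mask and a vector length."""
    result = [UNDEFINED] * len(src_a)   # nothing is preserved from before
    for i in range(vl):                 # elements at or past vl are not written
        if mask[i]:                     # elements with a zero mask bit are not written
            result[i] = (src_a[i] + src_b[i]) & MASK64
    return result
```

With src_a = [0, 1, ..., 7], src_b all 10, an alternating mask starting with 1, and vl = 4, only elements 0 and 2 are defined in the result.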

A type conversion operation may not use the same vector register as both its operand and its result.
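The width rule and the type conversion restriction can be expressed as checks in a toy model. This Python sketch is an assumption-laden illustration: the class VectorReg and the functions require_width and convert are hypothetical, and the conversion simply treats elements as unsigned integers masked to the new width, which is not necessarily what any particular hardware conversion instruction does.

```python
class VectorReg:
    """Toy vector register that remembers the width of its contents."""

    def __init__(self, name, width=None, elements=()):
        self.name = name
        self.width = width              # 32, 64, or None if nothing is defined
        self.elements = list(elements)

def require_width(reg, width):
    """Reject use of a register in an instruction of the wrong width."""
    if reg.width != width:
        raise TypeError(f"{reg.name} holds {reg.width}-bit data, not {width}-bit")

def convert(dst, src, new_width):
    """Explicit width conversion; dst and src must be different registers."""
    if dst is src:
        raise ValueError("operand and result must be different vector registers")
    if src.width is None:
        raise TypeError(f"{src.name} holds no defined data")
    limit = (1 << new_width) - 1
    dst.width = new_width
    dst.elements = [e & limit for e in src.elements]  # assumed unsigned integer data
```

A register loaded with 32-bit data fails require_width(reg, 64) until its contents have been converted into a different register.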

The vrip instruction declares that the contents of the vector registers need no longer be maintained. It is used at the end of a vector sequence to avoid expensive context switch times.

2.2.3. Vector Length Register

The maximum number of elements that a vector register can hold is not actually specified by the architecture. It is only guaranteed to be a power of 2 and at least 64. It may vary between hardware implementations.

A vector's length is always a count of its elements, not its Bytes, Words, or Longwords. A vector of 32-bit data cannot hold any more elements than a vector of 64-bit data can.

The vector length register vl specifies the number of elements to be processed by vector register operations. Once set, it is an implicit operand to every vector register operation that follows.

Programs should use the cvl() function to compute legal and well-balanced Vector Length values.

Vector Length can be set to zero.
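A loop over an array longer than the maximum vector length is typically processed in chunks, with vl set for each chunk. The Python sketch below illustrates the idea under stated assumptions: MAX_VL is fixed at 64 (the guaranteed minimum), and pick_vl is a hypothetical stand-in that only clamps the remaining count. It does not reproduce the balancing that cvl() performs.

```python
MAX_VL = 64  # assumption: the guaranteed minimum; real hardware may be larger

def pick_vl(remaining, max_vl=MAX_VL):
    """Hypothetical stand-in for cvl(): a legal length, with no balancing."""
    return min(remaining, max_vl)

def strip_mine(n, max_vl=MAX_VL):
    """Return (start, vl) pairs covering n elements in order."""
    chunks = []
    start = 0
    while start < n:
        vl = pick_vl(n - start, max_vl)
        chunks.append((start, vl))
        start += vl
    return chunks
```

Because the length is a count of elements, the same chunking applies to 32-bit and 64-bit data. A remaining count of zero yields a vector length of zero, which is legal.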

2.2.4. Mask Registers

Each of the eight mask registers contains one bit for each vector register element position. Since a mask may therefore be wider than 64 bits, the instruction set contains instructions that manipulate mask registers directly.

Masks are set with the results of vector comparison operations. They can then be used to generate vector values with the scan() and cidx() functions. Masks are also used to control vector instructions on a per-element basis. Only the first four masks, m0:m3, can be used to control elemental vector operations. Values in m4:m7 must be moved to m0:m3 for use in vector instructions.
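The restriction on controlling masks can be illustrated with a toy model. In this Python sketch the class MaskFile and its methods are hypothetical, the less-than comparison is just one example of a vector comparison, and clearing the mask bits at or beyond vl is an assumption rather than documented behavior.

```python
class MaskFile:
    """Toy model of the eight mask registers m0..m7."""

    def __init__(self, max_vl=64):
        self.max_vl = max_vl
        self.masks = [[0] * max_vl for _ in range(8)]
        self.masks[0] = [1] * max_vl  # software convention: m0 has every bit set

    def set_from_less_than(self, m, a, b, vl):
        # Assumption: bits at or beyond vl are cleared by the comparison.
        self.masks[m] = [1 if i < vl and a[i] < b[i] else 0
                         for i in range(self.max_vl)]

    def move(self, dst, src):
        self.masks[dst] = list(self.masks[src])

    def controlling(self, m):
        """Return mask m for use as the control of an elemental operation."""
        if not 0 <= m <= 3:
            raise ValueError(f"m{m} cannot control a vector instruction")
        return self.masks[m]
```

A comparison result placed in m4 must first be moved, for example to m1, before it can be passed to controlling().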

By software convention, mask register m0 can be assumed to have every bit set.

2.2.5. Vector Carry Register

There is a single Vector Carry (vc) Register that is both an operand to and a result of the 64-bit vector add with carry and subtract with borrow instructions. Like the mask registers, it holds one bit for every vector register element position.
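The role of vc as both operand and result can be sketched in Python. This models a conventional element-wise add with carry and is an assumption about the general shape of the operation, not a description of the exact instruction semantics. The function name is hypothetical, and only the first vl elements are returned.

```python
MASK64 = (1 << 64) - 1

def vector_add_with_carry(a, b, vc, vl):
    """Element-wise 64-bit add with carry-in and carry-out bits.

    Returns (sums, new_vc) for the first vl element positions.
    """
    sums, new_vc = [], []
    for i in range(vl):
        total = a[i] + b[i] + vc[i]     # vc[i] is the carry-in bit for element i
        sums.append(total & MASK64)     # low 64 bits are the result element
        new_vc.append(total >> 64)      # bit 64 is the carry-out bit
    return sums, new_vc
```

Chaining two calls gives element-wise 128-bit addition: add the low halves with a zero carry-in, then add the high halves using the carry bits produced by the first call.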

2.2.6. Bit Matrix Multiply Register

The Cray X1 system implements bit matrix multiplication (see Section 2.4.6). There is a 64 by 64 bit matrix multiply register that must be loaded from a vector register.

2.2.7. Control Registers

Access to most of the control registers is restricted to code running in kernel mode. These privileged control registers provide the means for programming the processor's address translation units and other critical functions.

Some control registers are available to user mode code and are therefore part of the instruction set architecture:

c0: Floating-point control (rounding mode and interrupt/exception masks)
c1: Floating-point status (interrupt/exception flags)
c2: Read-only MSP configuration information
c3: Read-only user time clock
c4: Read-only additional MSP information
c28: 32 performance counter enable flags
c29: 32 2-bit performance counter event selections
c30: Performance counter access control
c31: Performance counter value

2.2.8. Program Counter

The 64-bit virtual Program Counter holds the byte address of the next instruction to fetch. This counter is not visible to the user, but its contents are referenced in the descriptions of some instructions using the notation pc.

2.2.9. Performance Counters

The processor has 32 64-bit performance counters that are accessed indirectly through control register c31. Each counter can be programmed with one of four event codes that determine when it increments.
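The enable flags in c28 and the 2-bit event selections in c29 can be viewed as packed bit fields. The Python sketch below shows one way to manipulate such fields. The bit layout is an assumption made for illustration (counter i uses bit i of c28 and bits 2i and 2i+1 of c29); this section does not specify it, and the function names are hypothetical.

```python
def set_event(c29, counter, event):
    """Return c29 with the 2-bit event code for one counter replaced."""
    if not 0 <= counter < 32:
        raise ValueError("counter must be in 0..31")
    if not 0 <= event < 4:
        raise ValueError("event code must be in 0..3")
    shift = 2 * counter                 # assumed layout: 2 bits per counter
    return (c29 & ~(0b11 << shift)) | (event << shift)

def get_event(c29, counter):
    """Return the 2-bit event code selected for one counter."""
    return (c29 >> (2 * counter)) & 0b11

def enable(c28, counter):
    """Return c28 with the enable flag for one counter set."""
    if not 0 <= counter < 32:
        raise ValueError("counter must be in 0..31")
    return c28 | (1 << counter)         # assumed layout: 1 bit per counter
```

Thirty-two 2-bit fields fill exactly 64 bits, which is consistent with c29 being a single 64-bit control register.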