System Overviews

1.1 Cray System Features

Cray XE and Cray XK supercomputers are massively parallel processing (MPP) systems. Cray has combined commodity and open source components with custom-designed hardware and software to create a system that can operate efficiently at an immense scale.

Cray systems are based on the Red Storm technology that was developed jointly by Cray Inc. and the U.S. Department of Energy Sandia National Laboratories. Cray systems are designed to run applications that require large-scale processing, high network bandwidth, and complex communications. Typical applications are those that create detailed simulations in both time and space, with complex geometries that involve many different material components. These long-running, resource-intensive applications require a system that is programmable, scalable, reliable, and manageable.

The Cray XE series consists of Cray XE5 and Cray XE6 systems. The Cray XK series consists of Cray XK6 systems. Both series use the Gemini high-speed system interconnect. Cray XK6 systems are hybrid supercomputers in which each node has both an AMD Opteron 6200 Series CPU and an NVIDIA GPGPU (General Purpose Graphics Processing Unit) that serves as a highly threaded coprocessor especially suited to datasets in the SIMD computational domain.

The major features of Cray systems are performance, scalability and resiliency:

  • Cray systems are designed to scale to more than 1 million ranks. The ability to scale to such proportions stems from the design of system components and software:

    • The basic component is the node. There are two types of nodes. Service nodes provide support functions, such as managing the user's environment, handling I/O, and booting the system. Compute nodes run user applications. Because processors are inserted into standard sockets, customers can upgrade nodes as faster processors become available. Compute nodes consist of subsets called non-uniform memory access (NUMA) nodes, whose boundaries are defined by processor dies. Each NUMA node consists of a set of execution cores and memory. Inter-NUMA node operations within the same compute node will be slower than intra-NUMA node operations—this is the "non-uniformity" of memory access within and between NUMA nodes.

      Each Cray XE6 compute node has two AMD Opteron 6100 or 6200 Series packages, and each package contains two dies, so there are four NUMA nodes per compute node. Each Cray XK6 compute node has one AMD Opteron 6200 Series package, so there are two NUMA nodes per compute node. (A brief sketch that probes the NUMA layout of a node appears after this list.)

    • Cray systems use a simple memory model. Every instance of a distributed application has its own processors and local memory. Remote memory is the memory on the other nodes that run the associated application instances. However, Cray systems also support one-sided programming models, such as Chapel and the Partitioned Global Address Space (PGAS) languages, that allow programs to treat the application's memory spaces as a distributed global memory (see the one-sided communication sketch after this list).

    • The system interconnection network links compute and service nodes. This is the data-routing resource that Cray systems use to maintain high communication rates as the number of nodes increases. Cray systems use a full 3D torus network topology. However, Cray XE5m and Cray XE6m use a two-dimensional topology.

  • Cray system resiliency features:

    • The Node Health Checker (NHC) performs tests to determine if compute nodes that are allocated to an application are healthy enough to support running subsequent applications. If not, NHC removes any nodes incapable of running an application from the resource pool.

    • Tools that help administrators recover from system or node failures, including a hot backup utility, boot node failover, single or multiple compute node reboots, and warm boots.

    • Error correction code (ECC) technology, which detects single-bit and multiple-bit data storage and transfer errors and corrects most single-bit and some multiple-bit errors.

    • Lustre file system failover. When administrators enable Lustre automatic failover, Lustre services switch to standby services if the primary node fails or Lustre services are temporarily shut down for maintenance.

    • System processor boards (called blades) have redundant voltage regulator modules (VRMs) or VRMs with redundant circuitry.

    • Multiple redundant RAID controllers that provide automatic failover capability and multiple Fibre Channel and InfiniBand connections to disk storage.

    • The ability to warm swap system blades.

    • Network link failure detection and automatic rerouting.

    • Application relaunch and reconnect.
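
The NUMA structure described above can be probed from an application. The following is a minimal sketch, not a Cray-specific interface, assuming the Linux libnuma library is installed on the node; the file name and allocation size are arbitrary.

    /* numa_probe.c: report the number of NUMA nodes and allocate memory
     * bound to NUMA node 0.  Build with something like: cc numa_probe.c -lnuma */
    #include <stdio.h>
    #include <stdlib.h>
    #include <numa.h>

    int main(void)
    {
        if (numa_available() < 0) {
            fprintf(stderr, "NUMA is not available on this node\n");
            return EXIT_FAILURE;
        }

        printf("configured NUMA nodes: %d\n", numa_num_configured_nodes());

        /* Memory allocated here is bound to NUMA node 0; cores in another
         * NUMA node of the same compute node reach it more slowly (remote
         * access), which is the non-uniformity described above. */
        size_t len = 1 << 20;
        void *buf = numa_alloc_onnode(len, 0);
        if (buf == NULL) {
            fprintf(stderr, "numa_alloc_onnode failed\n");
            return EXIT_FAILURE;
        }
        numa_free(buf, len);
        return EXIT_SUCCESS;
    }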
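
To illustrate the one-sided programming model, the following is a minimal sketch in the OpenSHMEM style; Cray SHMEM follows this model, but header names, launch commands, and available calls depend on the installed library version, so treat the details as assumptions. Each PE deposits its rank directly into a symmetric variable on the next PE, with no matching receive on the target.

    /* shmem_put.c: one-sided put of each PE's rank into its neighbor. */
    #include <stdio.h>
    #include <shmem.h>

    int main(void)
    {
        static int neighbor_rank = -1;   /* symmetric: exists on every PE */

        shmem_init();
        int me   = shmem_my_pe();
        int npes = shmem_n_pes();

        /* One-sided put: write my rank into neighbor_rank on PE (me+1)%npes. */
        shmem_int_p(&neighbor_rank, me, (me + 1) % npes);

        shmem_barrier_all();             /* synchronize and complete the puts */

        printf("PE %d of %d received %d\n", me, npes, neighbor_rank);

        shmem_finalize();
        return 0;
    }

On a Cray system such a program would typically be built with the programming environment compiler wrapper (cc) and launched through ALPS, for example with aprun -n 4 ./a.out, though the exact options depend on the site configuration.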

The major software components of Cray systems are:

  • Application development tools, comprising:

    • Cray Application Development Environment (CADE):

      • Message Passing Toolkit (MPI, SHMEM)

      • Math and science libraries (LibSci, PETSc, ACML, FFTW)

      • Data modeling and management tools (NetCDF, HDF5)

      • GNU debugger (lgdb)

      • GCC C, C++, and Fortran compilers

      • Java (for developing service node programs)

  • Application placement tools:

    • Application Level Placement Scheduler (ALPS), the application launch and scheduling utility.

    • Cluster Compatibility Mode, which allows users to run cluster-based independent software vendor (ISV) applications on Cray systems.

    • Checkpoint/restart.

  • Optional products:

    • C, C++, and Fortran 95 compilers from PGI and PathScale

    • glibc library (the compute node subset)

    • Chapel

    • Workload management systems (PBS Professional, Moab and TORQUE, Platform LSF)

    • TotalView debugger

    • DDT debugger

    • Cray Performance Measurement and Analysis Tools

    • Intel Compiler Support

    • Cray Compiling Environment (CCE)

      • Cray C compiler

      • Cray C++ compiler

      • Cray Fortran 2003 compiler

      • The Cray C compiler supports Unified Parallel C (UPC), and the Cray Fortran compiler supports coarrays and several other Fortran 2008 features. All CCE compilers support OpenMP (a short OpenMP example appears at the end of this section).

      • The CUDA toolkit

  • Cray Application Development Supplement (CADES) for standalone Linux application development platforms

  • Operating system services. The operating system, Cray Linux Environment (CLE), is tailored to the requirements of service and compute nodes. A full-featured SUSE Linux operating system runs on service nodes, and a lightweight kernel, CNL, runs on compute nodes. With the compute node root runtime environment, compute nodes have a chrooted, read-only view of the shared root file system, which provides library linking and other Linux services that are not included in the compute node kernel.

  • Parallel file system support. Cray supports the Lustre parallel file system. CLE also enables the Cray system to use file systems such as NFS by projecting them to compute nodes using the Cray Data Virtualization Service (DVS).

  • System management and administration tools:

    • System Management Workstation (SMW), the single point of control for system administration.

    • Hardware Supervisory System (HSS), which monitors the system and handles component failures. HSS is independent of computation and service hardware components and has its own network.

    • Comprehensive System Accounting (CSA), a software package that performs standard system accounting processing. CSA is open-source software that includes changes to the Linux kernel so that CSA can collect more types of system resource usage data than are available under standard Fourth Berkeley Software Distribution (BSD) process accounting.

      An additional CSA interface enables the project database to use customer-supplied user, account, and project information that resides on a separate Lightweight Directory Access Protocol (LDAP) server.
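
As a small illustration of the OpenMP support noted for the CCE compilers above, the following C sketch parallelizes a reduction across the cores of a single compute node. It is generic OpenMP rather than compiler-specific; the option that enables OpenMP differs between the compilers listed in this section, so consult the relevant compiler documentation.

    /* omp_sum.c: sum a vector using an OpenMP parallel-for reduction. */
    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        enum { N = 1000000 };
        static double x[N];
        double sum = 0.0;

        for (int i = 0; i < N; i++)
            x[i] = 1.0;

        /* Threads within one compute node share memory, so the loop can be
         * divided across the node's cores with a single directive. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++)
            sum += x[i];

        printf("max threads: %d, sum = %.0f\n", omp_get_max_threads(), sum);
        return 0;
    }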