Your Cray system may include the optional PBS Professional, Moab and TORQUE, or Platform LSF workload management system (WMS). If so, your system can be configured with a given number of interactive job processors and a given number of batch processors. A job that is submitted as a batch process can use only the processors that have been allocated to the batch subsystem. If a job requires more processors than have been allocated for batch processing, it remains in the batch queue indefinitely and never runs.
Note: At any time, the system administrator can change the designation of any node from interactive to batch or vice versa. This change does not affect jobs already running on those nodes. It applies only to jobs already in the queue and jobs submitted later.
The basic process for creating and running batch jobs is to create a job script that includes aprun commands, then use the qsub command to run the script.
A job script may consist of directives, comments, and executable statements:
#PBS -N job_name
#PBS -l resource_type=specification
#
command
command
...
PBS Professional and Moab and TORQUE provide a number of resource_type options for specifying, allocating, and scheduling compute node resources, such as
mppwidth (number of processing elements),
mppdepth (number of threads),
mppnppn (number of PEs per node), and
mppnodes (manual node placement list). See Table 1 and the pbs_resources(7B) man page for details.
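As a rough illustration of how these resource types fit together, the following sketch writes a small job script to a file. The job name my_job, the file name my_job.pbs, and the executable ./my_app are hypothetical, and the resource values are arbitrary; adjust them to what your site allows.

```shell
# Write a small job script; my_job and ./my_app are hypothetical names.
cat > my_job.pbs <<'EOF'
#!/bin/sh
#PBS -N my_job
#PBS -l mppwidth=8
#PBS -l mppnppn=4
#PBS -l mppdepth=1
#PBS -j oe

# Run from the directory the job was submitted from.
cd "$PBS_O_WORKDIR"

# Launch 8 PEs, 4 per node, 1 thread each, matching the directives above.
aprun -n 8 -N 4 -d 1 ./my_app
EOF

# On a system with the WMS module loaded, submit with:
#   qsub my_job.pbs
```

The aprun options are kept consistent with the #PBS -l requests so that the application does not ask for more resources than the job reserved.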
To submit a job to the workload management system, load the module for your WMS (pbs, moab, or xt-lsfhpc):
module load pbs
module load moab
module load xt-lsfhpc
Then use the qsub command:
qsub [-l resource_type=specification] jobscript
jobscript is the name of a job script that includes one or more aprun commands.
The qsub command scans the lines of the script file for directives. An initial line in the script that has only the characters #! or the character : is ignored, and scanning starts at the next line. A line of the form #!shell invokes that shell from within the script. Scanning continues until the first executable line. An executable line is not blank, is not a directive, and is not a comment (that is, it does not start with #). Directives that occur after the first executable line are ignored.
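The scanning rules can be approximated with a short awk filter. This is only a sketch of the behavior described here, not the WMS's own parser; the function name scan_directives and the sample script are hypothetical, and the sketch assumes the default #PBS directive prefix.

```shell
# Print the arguments of each #PBS directive that the scan would honor.
scan_directives() {
    awk '
        NR == 1 && /^(#!|:)/ { next }   # initial "#!" or ":" line is ignored
        /^[ \t]*$/           { next }   # blank lines do not end the scan
        /^#PBS[ \t]/         { sub(/^#PBS[ \t]+/, ""); print; next }
        /^[ \t]*#/           { next }   # ordinary comment
        { exit }                        # first executable line ends the scan
    ' "$1"
}

# Hypothetical sample script: the last directive follows an executable line.
sample=$(mktemp)
cat > "$sample" <<'EOF'
#!/bin/sh
#PBS -N demo
# a plain comment

#PBS -l mppwidth=4
aprun -n 4 ./a.out
#PBS -l mppdepth=2
EOF

scan_directives "$sample"
```

For the sample script, the filter prints -N demo and -l mppwidth=4; the mppdepth directive is dropped because it appears after the aprun line.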
When you submit the script, qsub displays the job ID. You can use the qstat command to check the status of your job and the qdel command to remove a job from the queue.
If a qsub option is present in both a directive and on the command line, the command line takes precedence. If an option is present in a directive and not on the command line, that option and its argument, if any, are processed as if you included them on the command line.
Table 1 lists aprun options and their counterpart qsub -l options:
Table 1. aprun Versus qsub Versus bsub (LSF) Options
|aprun Option||qsub -l Option||bsub Option||Description|
|-n|| -l mppwidth|| ||Width (number of PEs)|
|-d|| -l mppdepth||N/A||Depth (number of CPUs hosting OpenMP threads)|
|-N|| -l mppnppn||LSF currently assumes a uniform processor pool||Number of PEs per node|
|-L|| -l mppnodes||N/A||Candidate node list|
|-m|| -l mppmem|| ||Memory per PE|
For further information about qsub -l options, see the pbs_resources(7B) man page.
For examples of batch jobs that use aprun, see Running a Batch Job Script.
The qstat command displays the following information about all active batch jobs:
The job identifier (Job id) assigned by the WMS
The job name (Name)
The job owner (User)
CPU time used (Time Use)
The job state (S):
    E (job is exiting)
    H (job is held)
    Q (job is in the queue)
    R (job is running)
    S (job is suspended)
    T (job is being moved to a new location)
    W (job is waiting for its execution time)
The queue (Queue) in which the job resides
qstat
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
84.nid00003      test_ost4_7      usera            03:36:23 R workq
33.nid00003      run.pbs          userb            00:04:45 R workq
34.nid00003      run.pbs          userb            00:04:45 R workq
35.nid00003      STDIN            userc            00:03:10 R workq
If the -a option is used, queue information is displayed in an alternative format.
qstat -a
                                                             Req'd  Req'd   Elap
Job ID           Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
---------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
84.nid00003      usera    workq    test_ost4_7    --   1   1     --    -- Q    --
33.nid00003      userb    workq    run.pbs        --   1   1     --    -- Q    --
34.nid00003      userb    workq    run.pbs        --   1   1     --    -- Q    --
35.nid00003      userc    workq    STDIN          --   1   1     --    -- Q    --
For details, see the qstat(1B) man page.
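Because the default qstat listing is column-oriented, it can be filtered with standard text tools. The following sketch assumes the six-column default format shown for plain qstat and job names that contain no spaces; the helper names sample_qstat and jobs_in_state are hypothetical, and the sample text stands in for real qstat output.

```shell
# Sample text copied from the default qstat listing; on a real system,
# pipe the output of qstat itself instead.
sample_qstat() {
cat <<'EOF'
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
84.nid00003      test_ost4_7      usera            03:36:23 R workq
33.nid00003      run.pbs          userb            00:04:45 R workq
34.nid00003      run.pbs          userb            00:04:45 R workq
35.nid00003      STDIN            userc            00:03:10 R workq
EOF
}

# Print the IDs of jobs owned by user $1 that are in state $2 (default: R).
# Fields: $1 job id, $2 name, $3 user, $4 time used, $5 state, $6 queue.
jobs_in_state() {
    awk -v user="$1" -v state="${2:-R}" '
        NR > 2 && $3 == user && $5 == state { print $1 }   # skip 2 header lines
    '
}

sample_qstat | jobs_in_state userb
```

With the sample listing, this prints 33.nid00003 and 34.nid00003. The resulting job IDs could then be passed to qdel or to qstat for a closer look.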