Slurm Access to the Cori GPU nodes
The GPU nodes are accessible via Slurm on the Cori login nodes. One can allocate a job on the GPU nodes via the Slurm job allocation flag
--constraint, in the same way that one allocates jobs on Haswell or KNL nodes. However, one must load the
esslurm module before allocating a job on the GPU nodes, or else the job allocation will fail.
Each node has 8 GPUs, 40 physical cores spread across 2 sockets with 2 hyper-threads per core, and 384 GB DRAM. To access approximately 1/8 of a single node's resources (generally sufficient for single-GPU code development), one can execute
```
user@cori02> module load esslurm
user@cori02> salloc -C gpu -N 1 -t 60 -c 10 -G 1 -A <account>
salloc: Granted job allocation 12345
salloc: Waiting for resource configuration
salloc: Nodes cgpu02 are ready for job
user@cgpu02:~>
```
which will provide the user with 1 GPU, 5 physical cores (10 hyper-threads, or "CPUs" in Slurm terminology), and approximately 48 GB of DRAM. Note that Slurm allocates memory to your job proportional to the number of CPUs you request. E.g., if you request -c 40 (half of the available CPUs), you will be allocated roughly half of the memory on the node, approximately 192 GB.
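As a sanity check on these numbers, the proportionality can be verified with shell arithmetic (node totals taken from the figures above):

```shell
# Memory granted scales with the fraction of CPUs requested.
# Node totals from above: 80 CPUs (40 cores x 2 hyper-threads), 384 GB DRAM.
total_cpus=80
total_mem_gb=384

cpus=10   # the -c 10 request from the example above
echo "$(( total_mem_gb * cpus / total_cpus )) GB"   # 48 GB

cpus=40   # half of the node's CPUs
echo "$(( total_mem_gb * cpus / total_cpus )) GB"   # 192 GB
```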
The one flag in the above example that is not used elsewhere on Cori is -G (equivalent to --gpus=), which allocates a particular number of GPUs on the node.
Allocate GPUs with -G <N> or --gpus=<N> instead of --gres=gpu:<N>

In older versions of Slurm, allocating GPUs required the flag --gres=gpu:<N>. However, a recent update to Slurm enabled the GPU allocation flag --gpus=<N> (or -G <N> for short). These flags have the same basic functionality as --gres=gpu:<N>, but are easier to type and offer more flexible resource allocation. It is recommended that scripts replace --gres=gpu:<N> with --gpus=<N>.
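For example, a batch script header using the newer flag might look like the following sketch (the script contents and application name are hypothetical, not part of the official documentation):

```shell
#!/bin/bash
#SBATCH -C gpu
#SBATCH -N 1
#SBATCH -t 60
#SBATCH -c 10
#SBATCH -G 1              # preferred: -G / --gpus
##SBATCH --gres=gpu:1     # older equivalent; no longer recommended

srun ./my_gpu_app         # hypothetical application
```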
You must make sure to specify a Slurm account which is associated with the GPU QOS for your user account. To see which Slurm accounts your user account is associated with, and which QOSes are allowed for each account, use the commands:
```
module load esslurm
sacctmgr show assoc user=$USER -p
```
which will print to screen any accounts with which you can submit jobs, along with the allowed QOSes for each. If your user account is associated with several job accounts, you'll probably want to use something like sacctmgr show assoc user=$USER -p | grep gpu to search the output.
GPU nodes are 'shared' by default
Slurm's default behavior on the 'normal' compute nodes on Cori and Edison is to reserve each compute node in your job allocation exclusively for you. On the GPU nodes, however, the default behavior is the opposite: the nodes in your job allocation are shared with other users. If you need to reserve all CPU resources on a node for yourself, you can specify the --exclusive option in your Slurm script invocation.
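If exclusive access is needed, the interactive request from the earlier example can be modified as in this sketch (the explicit -G 8 to also claim all 8 GPUs is an illustrative choice, not a requirement of --exclusive):

```shell
module load esslurm
# Reserve the whole node's CPUs for this allocation, and request all 8 GPUs
salloc -C gpu -N 1 -t 60 --exclusive -G 8 -A <account>
```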
Although sharing nodes reduces the likelihood that you will need to wait for a node to become available, users of shared nodes may encounter significant performance variability due to other concurrent activity on the node, particularly if PCI traffic (CPU <-> GPU memory bandwidth and network bandwidth) comprises a significant portion of an application's performance. This is because the GPU nodes do not have enough PCI bandwidth to service all PCI connections at full speed.
Use only what you need
There are only 18 GPU nodes to satisfy the development needs of many NERSC users. If you need all CPUs and GPUs on a given number of GPU nodes for your work, you should use them. But if you only need a single GPU and a single physical core, please be mindful of others and do not reserve the entire node for yourself.
Cori GPU prioritizes interactive code development during business hours in Pacific Time (UTC-7), and allows large and/or long running jobs to run on nights and weekends. Cori GPU also prioritizes jobs submitted by NESAP application teams over non-NESAP teams.
Job constraints are as follows:
- Jobs running between 8:00 AM Pacific Time (3:00 PM UTC) and 8:00 PM Pacific Time (3:00 AM UTC) from Monday through Friday are limited to 4 hours of run time.
- Jobs running before 8:00 AM Pacific Time (3:00 PM UTC) or after 8:00 PM Pacific Time (3:00 AM UTC), or on weekends, can run until 8:00 AM Pacific Time on the next weekday.
- Members of the NESAP ERCAP project (m1759) may add an additional flag -q special to their batch and interactive jobs to be placed in a higher-priority queue.
- Interactive jobs are now limited to 2 GPUs. Jobs requiring more than 2 GPUs can be submitted as batch jobs via sbatch.
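Putting these constraints together, a batch job requesting more than 2 GPUs might be sketched as follows (script contents and application name are hypothetical; the -q special line applies only to m1759 members):

```shell
#!/bin/bash
#SBATCH -C gpu
#SBATCH -N 1
#SBATCH -t 240            # >4 hours only runs outside weekday business hours
#SBATCH -G 4              # more than 2 GPUs: batch submission required
#SBATCH -c 40
#SBATCH -A <account>
##SBATCH -q special       # optional: NESAP ERCAP (m1759) members only

srun ./my_gpu_app         # hypothetical application
```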
Slurm commands with esslurm

While the esslurm module is loaded, commands such as sbatch will not show information about, or submit jobs to, the 'normal' Cori compute nodes. To query the 'normal' compute nodes, unload the esslurm module with module unload esslurm and then enter your desired Slurm commands.