Upcoming changes to Cori GPU modulefiles
Starting in October 2020 and completing in January 2021, Cori GPU modulefiles are moving to a new location on the Cori system. The purpose of this transition is to isolate Cori GPU modulefiles from the large number of other modulefiles targeting the Cori Haswell and KNL nodes. This change will reduce confusion and enable rapid adoption of new software for Cori GPU. It requires a small modification to user workflows, as described below. The modified workflow is available now as an option and will be required starting in January 2021.
Starting in January 2021, users will be required to replace the command
module load esslurm
with
module load cgpu
in their Cori GPU workflow. The new
cgpu modulefile is available as of October 2020, and users are encouraged to opt in to this approach now. In January 2021, this modification will become mandatory.
All other Cori GPU modulefiles will work nearly identically using both the
esslurm and
cgpu modulefiles. Both workflows will continue to be supported simultaneously on Cori GPU through the end of 2020. In January 2021, the
esslurm approach will be removed, and users will be required to use
cgpu.
Historically, Cori GPU modulefiles have been mixed together with the existing modulefiles which target Cori’s Haswell and KNL nodes. This has introduced some confusion about which modules target which architecture. For example, the modulefile
openmpi defaults to
openmpi/4.0.2, which targets Cori Haswell and KNL nodes, and does not work on Cori GPU nodes; users must explicitly request the version
openmpi/4.0.3 for Cori GPU.
To address this confusion, NERSC has duplicated most existing Cori GPU modulefiles into a new location, which is made available only by first loading the new modulefile
cgpu. When the
cgpu modulefile is not loaded, these modulefiles are not visible to the user. The
cgpu modulefile is designed to replace the
esslurm modulefile in the typical Cori GPU workflow, with no other changes needed.
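Because the Cori GPU modulefiles are visible only after cgpu is loaded, a job script may want to confirm that before loading anything else. The following is a minimal sketch, not a NERSC-provided tool: module_is_loaded is a hypothetical helper, and it assumes the module system exports the standard Environment Modules variable LOADEDMODULES as a colon-separated list of entries such as name or name/version.

```shell
# Hypothetical helper: succeed if the named module appears in LOADEDMODULES.
# Assumes LOADEDMODULES is a colon-separated list like "gcc/8.3.0:cgpu".
module_is_loaded() {
    case ":${LOADEDMODULES:-}:" in
        # Match either a bare name ("cgpu") or a versioned one ("cgpu/...").
        *":$1:"* | *":$1/"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

A script could then guard its setup with something like `module_is_loaded cgpu || echo "cgpu is not loaded" >&2` before requesting GPU modulefiles.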
The change from
esslurm to
cgpu will break some workflows; consequently, NERSC will continue to support Cori GPU modulefiles in both locations through the end of 2020, although new software will be installed only in the new location accessed via the
cgpu modulefile. In January 2021, during the scheduled Cori maintenance, NERSC will remove the Cori GPU modulefiles installed in the old location, and Cori GPU users will thereafter be required to load the
cgpu modulefile in order to access modulefiles targeting the Cori GPU nodes.
To illustrate the change, one may currently execute the following commands:
module purge
module load esslurm
module load gcc cuda openmpi/4.0.3
After January 2021, this will change to:
module purge
module load cgpu
module load gcc cuda openmpi
In the above example, the user is no longer required to request the
/4.0.3 version of the
openmpi module; when the
cgpu module is loaded, the default version of each modulefile is the one that targets Cori GPU, even when other versions of the same modulefile exist that target Cori Haswell and KNL nodes.
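The substitution in the example above can be applied to existing job scripts mechanically. The sketch below is one possible approach rather than an official migration tool: migrate_to_cgpu is a hypothetical helper that rewrites "module load esslurm" to "module load cgpu" in the files it is given and keeps a .bak copy of each original. It deliberately leaves explicit version pins such as openmpi/4.0.3 untouched, so those can be reviewed by hand.

```shell
# Hypothetical helper: switch job scripts from the esslurm to the cgpu workflow.
migrate_to_cgpu() {
    # Tolerate extra spaces or tabs between the words of the old command.
    local pattern='module[[:space:]][[:space:]]*load[[:space:]][[:space:]]*esslurm'
    local f
    for f in "$@"; do
        # Only rewrite files that actually contain the old command.
        if grep -q "$pattern" "$f"; then
            # -i.bak edits in place and saves the original as "$f.bak".
            sed -i.bak "s/$pattern/module load cgpu/g" "$f"
            echo "updated: $f"
        fi
    done
}
```

For example, `migrate_to_cgpu job1.sh job2.sh` (file names are placeholders) would update both scripts and leave job1.sh.bak and job2.sh.bak alongside them.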
NERSC encourages users to start moving their workflows to this new approach, replacing
esslurm with
cgpu, as soon as possible; most modulefiles in the old location are already available in the new location, and in many cases newer versions of software are available in the new location. (For example, CUDA 11.1 and NVIDIA HPC SDK 20.9 are already available via
cgpu.)
NERSC also encourages users to provide feedback regarding this change. Specifically, if modules available via the old workflow are found not to work as expected in the new workflow, NERSC would like to know about these problems as soon as possible. Users are requested to provide feedback by submitting a ticket at the NERSC help portal.