Upcoming changes to Cori GPU modulefiles

Starting in October 2020, and completing in January 2021, Cori GPU modulefiles are moving to a new location on the Cori system. The purpose of this transition is to isolate Cori GPU modulefiles from the large number of other modulefiles targeting the Cori Haswell and KNL nodes. This change will reduce confusion and enable rapid adoption of new software for Cori GPU. It requires a small modification to user workflows, as described below. The modified workflow is already available as an option, and will become required in January 2021.

Summary

Starting in January 2021, users will be required to replace the command

module load esslurm

with

module load cgpu

in their Cori GPU workflow. The new cgpu modulefile is available as of October 2020, and users are encouraged to opt in to this approach now. In January 2021, this modification will become required rather than optional.

All other Cori GPU modulefiles will work nearly identically with either the esslurm or the cgpu modulefile. Both workflows will continue to be supported simultaneously on Cori GPU through the end of 2020. In January 2021, the esslurm approach will be removed, and users will be required to use cgpu instead.

Details

Historically, Cori GPU modulefiles have been mixed together with the existing modulefiles which target Cori’s Haswell and KNL nodes. This has introduced some confusion about which modules target which architecture. For example, the modulefile openmpi defaults to openmpi/4.0.2, which targets Cori Haswell and KNL nodes, and does not work on Cori GPU nodes; users must explicitly request the version openmpi/4.0.3 for Cori GPU.

To address this confusion, NERSC has duplicated most existing Cori GPU modulefiles into a new location, which is made available only by first loading the new modulefile cgpu. When the cgpu modulefile is not loaded, these modulefiles are not visible to the user. The cgpu modulefile is designed to replace the esslurm modulefile in the typical Cori GPU workflow, with no other changes needed.

The change of esslurm to cgpu will break some workflows; consequently, NERSC will continue to support Cori GPU modulefiles in both locations through the end of 2020, although new software will be installed only in the new location accessed via the cgpu modulefile. In January 2021, during the scheduled Cori maintenance, NERSC will remove the Cori GPU modulefiles installed in the old location, and Cori GPU users will thereafter be required to load the cgpu modulefile in order to access modulefiles targeting the Cori GPU nodes.

To illustrate the change, one may currently execute the following commands:

module purge
module load esslurm
module load gcc cuda openmpi/4.0.3

After January 2021, this will change to:

module purge
module load cgpu
module load gcc cuda openmpi

In the above example, the user is no longer required to request the /4.0.3 version of the openmpi module. When the cgpu module is loaded, the default version of each modulefile is the one that targets Cori GPU, even where other versions of the same modulefile exist which target the Cori Haswell and KNL nodes.

NERSC encourages users to start moving their workflows to this new approach, replacing esslurm with cgpu, as soon as possible; most modulefiles in the old location are already available in the new location, and in many cases newer versions of software are available in the new location. (For example, CUDA 11.1 and NVIDIA HPC SDK 20.9 are already available via cgpu.)
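As an illustration of the replacement described above, the following is a minimal sketch of how one might update an existing script in place. The function name migrate_script is hypothetical and not something NERSC provides; it assumes the script loads esslurm on a line of its own (or as the first module on a line), exactly as in the examples above, and it uses only standard sed options.

```shell
#!/bin/bash
# Hypothetical helper (not a NERSC tool): rewrite "module load esslurm"
# to "module load cgpu" in a single script file.
# Assumption: esslurm is the first module named after "module load".
# A line such as "module load gcc esslurm" is NOT handled.
migrate_script() {
    local script="$1"
    # -i.bak edits the file in place and keeps the original as <script>.bak
    # -E enables extended regular expressions
    sed -i.bak -E \
        's/^([[:space:]]*module[[:space:]]+load[[:space:]]+)esslurm([[:space:]]|$)/\1cgpu\2/' \
        "$script"
}
```

Running migrate_script on a job script leaves every other line untouched, including an explicit openmpi/4.0.3 request. The original file is kept with a .bak suffix so the change can be reviewed or reverted.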

NERSC also encourages users to provide feedback regarding this change. Specifically, if modules available via the old workflow are found not to work as expected in the new workflow, NERSC would like to know about these problems as soon as possible. Users are requested to provide feedback by submitting a ticket at the NERSC help portal.