Working Environment
We recommend that users manage their own software and libraries instead of asking administrators to install them system-wide. We make exceptions for software that requires changes to the Linux kernel, such as the NVIDIA driver. Most software and libraries can be installed without administrator privileges, including CUDA, any Python version, Python libraries, compilers, and Java. There are two good options for managing your software and libraries: Conda environments and container images.
Conda Environment
For Python packages, we recommend using Conda environments and virtual environments. Note that you can also install different CUDA versions with Conda; please check here.
First, create a new Conda environment, for example with Python 3.8:
$ conda create --name newEnv python=3.8
Then activate the Conda environment and install packages:
$ conda activate newEnv
$ conda install pip
$ conda install -c anaconda cudatoolkit=10.1
$ pip install tensorflow-gpu==2.3
After that, you only need to run the following command in your job scripts:
$ conda activate newEnv
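Batch jobs usually run in a non-interactive shell, and depending on how conda was initialized for your account, conda activate may fail there with an error saying the shell is not configured to use it. A common workaround is to source conda's own conda.sh before activating. The sketch below wraps this in a hypothetical helper, activate_env; scheduler directives are omitted because they depend on the job scheduler, and train.py is only a placeholder.

```shell
#!/usr/bin/env bash
# Sketch of the environment-setup part of a job script.
# activate_env is a hypothetical helper, not a GAIVI-provided command.

activate_env() {
  local env_name="$1"
  # Stop early if conda is not on PATH at all.
  if ! command -v conda >/dev/null 2>&1; then
    echo "conda not found on PATH" >&2
    return 1
  fi
  # Load conda's shell function so "conda activate" works in a
  # non-interactive shell, then activate the requested environment.
  # shellcheck disable=SC1091
  source "$(conda info --base)/etc/profile.d/conda.sh" || return 1
  conda activate "$env_name"
}

# Example use in a job script:
# activate_env newEnv || exit 1
# python train.py   # placeholder for your own program
```

Returning a non-zero status on failure lets the job stop immediately (activate_env newEnv || exit 1) instead of silently running with the wrong Python.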
Important: Please do not run pip install <some_package> in an environment you do not have write access to, such as the base environment. In that case pip installs the packages into your ~/.local directory instead of the environment's directory. As a result, these packages show up in all of your environments and take priority over the packages installed in each environment. Please only run pip install in an environment that you created with conda create. If you run into strange issues with your packages, check the packages installed in ~/.local.
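To make the warning above easier to act on, here is a minimal sketch of two hypothetical helpers: show_user_site lists what is installed in the per-user site-packages directory under ~/.local, and safe_pip_install refuses to run pip unless a conda environment is active and writable. It assumes conda sets CONDA_PREFIX to the active environment's directory, which it does on activation.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for spotting and avoiding ~/.local installs.

# Print the per-user site-packages directory and list its contents.
# Packages there are visible from every environment.
show_user_site() {
  local user_site
  # The exit status of "python3 -m site --user-site" only reports whether
  # the user site is enabled, so rely on the printed path instead.
  user_site="$(python3 -m site --user-site)"
  if [ -z "${user_site}" ]; then
    return 1
  fi
  echo "User site-packages: ${user_site}"
  if [ -d "${user_site}" ]; then
    ls "${user_site}"
  else
    echo "(directory does not exist; nothing installed there)"
  fi
}

# Run "pip install" only inside an active, writable conda environment,
# so pip never falls back to installing into ~/.local.
safe_pip_install() {
  if [ -z "${CONDA_PREFIX:-}" ]; then
    echo "No conda environment is active; run 'conda activate <env>' first." >&2
    return 1
  fi
  if [ ! -w "${CONDA_PREFIX}" ]; then
    echo "No write access to ${CONDA_PREFIX}; use an environment you created." >&2
    return 1
  fi
  # "python3 -m pip" uses the pip that belongs to the active environment.
  python3 -m pip install "$@"
}
```

Separately, setting PYTHONNOUSERSITE=1 (for example, exported in a job script) makes Python ignore the per-user site-packages directory entirely, which is a quick way to test whether packages in ~/.local are causing a problem.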
Container Images
We recommend using container images for anything that you cannot easily manage with Conda environments, for example operating systems (e.g. Ubuntu), compilers (e.g. gcc), and Java libraries. A container image is a lightweight, standalone, executable package of software that includes everything needed to run an application: code, runtime, system tools, system libraries, and settings. You can even use an operating system that is different from the one GAIVI runs (RHEL 7.9).
For more information about creating container images and running containerized jobs, please check this section.
NVIDIA Driver
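Administrators are responsible for keeping the NVIDIA driver updated to the latest version. NVIDIA drivers are backward compatible: a driver supports applications built with any CUDA version up to the maximum CUDA version it supports, which nvidia-smi reports as "CUDA Version" in its header. Therefore, the latest NVIDIA driver should work with any CUDA version you install, as long as that version is not newer than the driver's maximum.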
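Because each driver supports CUDA only up to some maximum version, a quick check before installing a toolkit can save a failed job. The sketch below assumes GNU sort (for -V version ordering) and reads the maximum from the "CUDA Version:" field that nvidia-smi prints in its header; version_le and driver_max_cuda are hypothetical helper names, and the requested version 10.1 simply mirrors the cudatoolkit example in the Conda section.

```shell
#!/usr/bin/env bash
# Sketch: check that a CUDA toolkit version is not newer than the
# highest CUDA version the installed NVIDIA driver supports.

# Succeeds if version $1 <= version $2 (GNU "sort -V" orders 9.2 before 10.0).
version_le() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Print the "CUDA Version:" value from the nvidia-smi header,
# or nothing if nvidia-smi is unavailable (e.g. on a node without GPUs).
driver_max_cuda() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi | grep -o 'CUDA Version: [0-9.]*' | head -n 1 | awk '{print $3}'
  fi
}

wanted="10.1"   # toolkit version you plan to install
max="$(driver_max_cuda)"
if [ -z "$max" ]; then
  echo "nvidia-smi not found; run this on a GPU node."
elif version_le "$wanted" "$max"; then
  echo "CUDA $wanted is supported by this driver (max $max)."
else
  echo "CUDA $wanted is newer than this driver supports (max $max)."
fi
```

Version-aware sorting matters here: a plain string comparison would wrongly place 10.1 before 9.2.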