$ sbatch sample_script.sh
$ cat sample_script.sh
#!/bin/bash -l
# All options below are recommended
#SBATCH -p general                          # run on partition general
#SBATCH --cpus-per-task=32                  # 32 CPUs per task
#SBATCH --mem=100GB                         # 100GB per task
#SBATCH --gpus=4                            # 4 GPUs
#SBATCH --mail-user=bulls@usf.edu           # email for notifications
#SBATCH --mail-type=BEGIN,END,FAIL,REQUEUE  # events for notifications
srun python first_script.py                 # 1st task
srun second_script.sh                       # 2nd task
srun julia something.jl                     # 3rd task
Note: -p and -w must match the nodes listed when running the sinfo command. E.g., -p Extended -w GPU17 would not work because GPU17 is not in partition Extended.
Important: Please do not place #SBATCH options after any Linux command (such as cd, srun, etc.). Slurm stops parsing for #SBATCH options after seeing the first Linux command.
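For instance, a sketch of this failure mode (the path and resource values are illustrative):

```shell
#!/bin/bash -l
#SBATCH --cpus-per-task=32   # parsed: appears before any command
cd /some/project/dir         # first Linux command; Slurm stops parsing here
#SBATCH --mem=100GB          # silently IGNORED: appears after a command
srun python first_script.py
```

Here the job would run with the default memory allocation, since the --mem directive comes after the cd command.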
$ sbatch sample_script.sh
$ cat sample_script.sh
#!/bin/bash -l
#SBATCH --cpus-per-task=32   # 32 CPUs per task
#SBATCH --mem=100GB          # 100GB per task
#SBATCH --gpus-per-task=4    # 4 GPUs per task
#SBATCH --ntasks=2           # specify 2 parallel tasks
srun --ntasks=1 --cpus-per-task=16 --exact python first_script.py &
srun --ntasks=1 --cpus-per-task=16 --exact second_script.sh &
wait
Slurm has a notion of “partitions”, which determine which GAIVI nodes your job can run on and what priority you have. Use the sinfo command to see what partitions you have access to, e.g.
$ sinfo
PARTITION AVAIL  TIMELIMIT   NODES  STATE  NODELIST
ScoreLab  up     infinite    1      idle   GPU6
general*  up     7-00:00:00  2      idle   GPU[6,8]
By default, jobs are submitted to the general partition. To submit to an alternate partition, include the “-p” option in your submission. For example, in an SBATCH script
#SBATCH -p ScoreLab
(visit the corresponding page of the Job Submission section for more detail on the partitions in this cluster)
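The partition can also be chosen on the command line at submission time instead of in the script (the script name is illustrative):

```shell
$ sbatch -p ScoreLab sample_script.sh
```

A -p option on the command line overrides the corresponding #SBATCH directive in the script.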
squeue             # list all jobs in the queue
squeue -j [jobID]  # show the status of a specific job
scancel [jobID]    # cancel a job
# Show per-node name, state, sockets, cores, threads, memory, and GRES (GPUs):
sinfo -N --format="%10N | %10t | %7X | %7Y | %7Z | %9m | %19G"
First create an interactive bash shell to a running job:
srun --pty --jobid <jobID> --interactive /bin/bash
Then run commands such as htop to check for CPU/memory consumption or nvidia-smi to check GPU consumption.
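Putting the steps above together, a typical monitoring session might look like this (the job ID 12345 is illustrative):

```shell
$ squeue -u $USER                                   # find your job's ID
$ srun --pty --jobid 12345 --interactive /bin/bash  # attach a shell to it
$ htop        # CPU and memory usage (press q to quit)
$ nvidia-smi  # GPU utilization and memory
$ exit        # leave the interactive shell
```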
Please refer to the Anaconda Cheat Sheet.
IMPORTANT: You cannot install packages into the base environment. Please create your own environment so you can install your own packages. For example,
conda create --name myEnvironment python=3.5
conda activate myEnvironment
conda install -c anaconda cudatoolkit
conda install -c anaconda tensorflow-gpu
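To check that the new environment can see a GPU, something like the following can be run inside a GPU job. Note that tf.test.is_gpu_available() is the TensorFlow 1.x call, which matches the older releases conda installs for this Python version; newer TensorFlow releases use tf.config.list_physical_devices('GPU') instead:

```shell
conda activate myEnvironment
python -c "import tensorflow as tf; print(tf.test.is_gpu_available())"
```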
You will need to build a container image and then execute your code in the container. To build an image:
$ singularity pull docker://tensorflow/tensorflow:latest-gpu
Note: This must be executed from an SSH session on the login nodes. singularity pull will not work in a job context, such as a JupyterHub session or srun.
To build an image from a Dockerfile:

$ container-builder-client Dockerfile tf.sif
To execute your code in the container:

$ srun singularity exec --nv tf.sif python some_script.py
You can also put the command above in an sbatch script.
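For example, a minimal sketch of such a script (the partition and resource values are illustrative):

```shell
#!/bin/bash -l
#SBATCH -p general    # partition
#SBATCH --gpus=1      # one GPU for the containerized program
#SBATCH --mem=16GB    # memory for the task
srun singularity exec --nv tf.sif python some_script.py
```

The --nv flag makes the host's NVIDIA drivers and devices visible inside the container.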
To run an interactive shell in a container:
srun --pty singularity shell --nv tf.sif /bin/bash
If containerized programs work on your computer, they should work on GAIVI.
Please open GAIVI's JupyterHub in your browser. Make sure you are connected to the USF VPN.
GAIVIADMIN@usf.edu