====== Quickstart ======
=====Job Submission – sequential tasks=====
$ sbatch sample_script.sh
$ cat sample_script.sh
#!/bin/bash -l
# All options below are recommended
#SBATCH -p general # run on partition general
#SBATCH --cpus-per-task=32 # 32 CPUs per task
#SBATCH --mem=100GB # 100GB of memory per node
#SBATCH --gpus=4 # 4 GPUs
#SBATCH --mail-user=bulls@usf.edu # email for notifications
#SBATCH --mail-type=BEGIN,END,FAIL,REQUEUE # events for notifications
srun python first_script.py # 1st task
srun second_script.sh # 2nd task
srun julia something.jl # 3rd task
Note: the partition (''-p'') and node (''-w'') options must be consistent with the nodes listed by the ''sinfo'' command. E.g., ''-p Extended -w GPU17'' would not work because GPU17 is not in the Extended partition.
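For example, using the node names from the sample ''sinfo'' output further below, the following pair is consistent because GPU8 belongs to the general partition:
#SBATCH -p general # partition
#SBATCH -w GPU8 # request a specific node in that partition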
__**Important!!!!**__: Please do not place #SBATCH options after any Linux command (such as ''cd'', ''srun'', etc.). Slurm stops parsing for #SBATCH options after it sees the first Linux command.
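For illustration (a made-up script), the partition request below would be ignored because it appears after the first command:
#!/bin/bash -l
#SBATCH --cpus-per-task=32 # parsed: appears before any command
cd /some/working/directory # first Linux command: Slurm stops parsing #SBATCH options here
#SBATCH -p ScoreLab # ignored! move it above the cd line
srun python first_script.py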
=====Job Submission – parallel tasks=====
$ sbatch sample_script.sh
$ cat sample_script.sh
#!/bin/bash -l
#SBATCH --cpus-per-task=32 # 32 CPUs per task
#SBATCH --mem=100GB # 100GB of memory per node
#SBATCH --gpus-per-task=4 # 4 GPUs per task
#SBATCH --ntasks=2 # specify 2 parallel tasks
srun --ntasks=1 --cpus-per-task=16 --exact python first_script.py & # 1st task, started in the background
srun --ntasks=1 --cpus-per-task=16 --exact second_script.sh & # 2nd task, started in the background
wait # wait for both background tasks to finish
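If you want to verify that the two steps actually ran in parallel, one way (a sketch using the standard ''sacct'' command; the selected format fields are just one reasonable choice) is to compare the start and end times of the job steps after the job completes:
sacct -j [jobID] --format=JobID,JobName,Start,End,State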
=====Job Submission – Priority and Partitions=====
Slurm has a notion of "partitions", which determine which GAIVI nodes your job can run on and what priority it has. Use ''sinfo'' to see which partitions you have access to, e.g.
$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
ScoreLab up infinite 1 idle GPU6
general* up 7-00:00:00 2 idle GPU[6,8]
By default, jobs are submitted to the general partition. To submit to a different partition, include the ''-p'' option in your submission. For example, in an sbatch script:
#SBATCH -p ScoreLab
(visit the corresponding page of the Job Submission section for more detail on the partitions in this cluster)
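The partition can also be chosen on the command line at submission time instead of inside the script:
$ sbatch -p ScoreLab sample_script.sh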
=====View pending and running jobs=====
squeue
squeue -j [jobID]
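To show only your own jobs, ''squeue'' can filter by user:
squeue -u [username]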
=====Cancel a job=====
scancel [jobID]
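To cancel all of your jobs at once:
scancel -u [username]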
=====Checking compute nodes for all resources=====
sinfo -N --format="%10N | %10t | %7X | %7Y | %7Z | %9m | %19G"
The columns are: node name, node state, sockets, cores per socket, threads per core, memory (MB), and generic resources such as GPUs.
=====Checking compute nodes for current consumed resources=====
First open an interactive bash shell attached to a running job (replace ''[jobID]'' with the ID of your job):
srun --pty --jobid=[jobID] --interactive /bin/bash
Then run commands such as htop to check for CPU/memory consumption or nvidia-smi to check GPU consumption.
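For example, from inside that shell you could keep an eye on GPU utilization with standard tools (a simple sketch):
watch -n 2 nvidia-smi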
=====Anaconda Manual=====
Please refer to [[https://docs.conda.io/projects/conda/en/4.6.0/_downloads/52a95608c49671267e40c689e0bc00ca/conda-cheatsheet.pdf|Anaconda Cheat Sheet]].
**__IMPORTANT!!!!__**: You cannot install packages into the base environment. Please create your own environment so you can install your own packages. For example,
conda create --name myEnvironment python=3.5 # create a new environment
conda activate myEnvironment # switch to it
conda install -c anaconda cudatoolkit # install packages into the active environment
conda install -c anaconda tensorflow-gpu
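To use the environment in a batch job, one approach (a minimal sketch, assuming ''conda'' is initialized in your login shell and reusing the ''myEnvironment'' name from above; the script name is a placeholder) is to activate it in the sbatch script before launching your program:
#!/bin/bash -l
#SBATCH -p general
#SBATCH --gpus=1
conda activate myEnvironment # activate the environment created above
srun python first_script.py # runs with the environment's Python and packages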
=====Containerized Jobs=====
====Step 1: Build a container image====
You will need to build a container image and then execute your code in it. To build an image:
* Option 1: pull from Docker Hub an image that has everything you need:
$ singularity pull docker://tensorflow/tensorflow:latest-gpu
**Note**: This must be executed from an ssh session on the login nodes. ''singularity pull'' will not work in a job context, such as a JupyterHub session or an ''srun'' shell.
* Option 2: Create a Dockerfile and build it on your own machine (see the sketch after this list for one way to get the resulting image onto GAIVI). Here is an example of a Dockerfile with [[https://towardsdatascience.com/how-to-compile-tensorflow-1-12-on-ubuntu-16-04-using-docker-6ca2d60d7567|Tensorflow]].
* Option 3: Build an image using our build service on GAIVI (note: the container builder service is currently undergoing maintenance):
$ container-builder-client Dockerfile tf.sif
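For Option 2, once the image has been built on your own machine, one way to get it onto GAIVI (a sketch; the image name, SIF file name, and destination are placeholders, and it assumes Singularity is also installed locally) is to convert the local Docker image to a SIF file and copy it over:
# on your own machine
singularity build tf.sif docker-daemon://my-tf-image:latest
# then copy the SIF file to your home directory on GAIVI, e.g. with scp
scp tf.sif [username]@[GAIVI login node]:~/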
====Step 2: Run your program on the container====
$ srun singularity exec --nv tf.sif python some_script.py
You can also put the command above in an sbatch script.
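For example, a minimal sbatch script wrapping that command (the partition and resource choices are placeholders to adapt to your job):
#!/bin/bash -l
#SBATCH -p general
#SBATCH --gpus=1
#SBATCH --mem=32GB
srun singularity exec --nv tf.sif python some_script.py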
To run an interactive shell in a container:
srun --pty singularity shell --nv tf.sif
If containerized programs work on your computer, they should work on GAIVI.
=====Jupyter Lab=====
Please open [[https://jupyterhub.gaivi.cse.usf.edu/|GAIVI's Jupyter Hub]] in your browser. Make sure you are connected to the USF VPN.
=====Contact=====
GAIVIADMIN@usf.edu