====== Job Submission ======
We use Slurm for job submission. Because Slurm is widely used on computing clusters, extensive documentation is available online. For example, if you would like to run parallel jobs on multiple nodes, see [[https://stackoverflow.com/questions/66919530/how-to-submit-run-multiple-parallel-jobs-with-slurm-sbatch?msclkid=e83fc4c3c57d11ec95d7efd4abdc6be0|Multi-parallel-jobs]] for details. This section covers the most common uses of Slurm for job submission, along with some details specific to the GAIVI cluster. Please contact the admins with any questions or for assistance.
===== Sequential Steps =====
First, you need to create a bash script like this:
$ cat sample_script.sh
#!/bin/bash -l
#SBATCH -o std_out
#SBATCH -e std_err
srun python some_file.py
srun sh some_file.sh
Then run this to submit the job:
$ sbatch sample_script.sh
The lines that start with ''#SBATCH'' are options for ''sbatch''. Here, the ''-o'' option directs the script's standard output to the ''std_out'' file, and the ''-e'' option directs the script's standard error to the ''std_err'' file. Other common ''sbatch'' options are listed below (a short example combining them follows the list):
* ''-D [path]'': change working directory to ''[path]''.
* ''-w [node_name]'': request a specific compute node.
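For reference, a minimal script header combining these options might look like the following sketch. The working directory path is a placeholder, and GPU6 is simply one of the nodes that appears in the ''sinfo'' output later on this page:
$ cat sample_script.sh
#!/bin/bash -l
#SBATCH -o std_out                  ### write standard output to std_out
#SBATCH -e std_err                  ### write standard error to std_err
#SBATCH -D /home/[NetID]/project    ### placeholder: run from this directory
#SBATCH -w GPU6                     ### request the node GPU6 specifically
srun python some_file.py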
Computing processes should be launched with ''srun'', followed by your command-line program, such as python, bash, java, or a compiled binary. The only commands that do not need to be prefixed with ''srun'' are environment-related commands, such as ''conda activate some_environment''. Each ''srun'' invocation is treated by Slurm as a "job step" and can use any subset of the resources allocated to the job. Without any flags passed to ''srun'', a step consumes the job's entire allocation while it runs.
For a complete list of ''#SBATCH'' options, please visit [[https://slurm.schedmd.com/sbatch.html|here]].
The script above will allocate the default amount of resources (1 CPU, 16GB RAM) because no resource allocation option was provided. Please read the [[#resource_allocation|Resource Allocation]] section.
===== Parallel Tasks =====
In the previous example, each of the two job steps launched one instance of its associated command. However, Slurm can also easily run multiple instances of a command in parallel. This is where "tasks" enter the picture. Many Slurm resource requests are phrased per task, and the allocation is then multiplied by the number of tasks the job requests. Each job step will then automatically run multiple instances of the provided command in parallel, one per task. For example:
$ cat sample_script.sh
#!/bin/bash -l
#SBATCH -o std_out
#SBATCH -e std_err
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=1
srun sh some_file.sh
This script will run two instances of ''some_file.sh'' in parallel. The ''--ntasks'' and ''--cpus-per-task'' options are important here: they tell Slurm to allocate one CPU for each of the two tasks (two CPUs in total).
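The contents of ''some_file.sh'' are not part of this guide; as a sketch, each parallel task can identify itself through the ''SLURM_PROCID'' environment variable that ''srun'' sets for every task:
$ cat some_file.sh
#!/bin/bash
# SLURM_PROCID is 0 for the first task and 1 for the second in this example
echo "Hello from task ${SLURM_PROCID} on node $(hostname)"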
===== Parallel Steps =====
In addition to a single step running two tasks in parallel, two steps can be run in parallel as well:
$ cat sample_script.sh
#!/bin/bash
#SBATCH -o std_out
#SBATCH -e std_err
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=1
srun --ntasks=1 --exact python some_file.py &
srun --ntasks=1 --exact sh some_file.sh &
wait
Here the two ''srun'' instances are both run in the background, which allows them to start at the same time. Like the previous example, the job overall requests two tasks. However, each job step only requests one task (''srun --ntasks=1 ...''). Since both steps' allocations can be satisfied at the same time, slurm allows them both to start running in parallel. Since this job does not request resources on more than one node, the ''--exact'' option is needed: without it, an srun step will "round up" to consuming whole nodes from whatever is allocated to the job.
Alternatively, the allocation could be requested in terms of nodes (with, implicitly, one task per node):
$ cat sample_script.sh
#!/bin/bash
#SBATCH -o std_out
#SBATCH -e std_err
#SBATCH --nodes=2
#SBATCH --cpus-per-task=1
srun --nodes=1 python some_file.py &
srun --nodes=1 sh some_file.sh &
wait
In this case ''--exact'' is not needed as each step will execute on a separate node in the cluster, so the steps are free to (and will) consume each node's whole allocation to the job.
===== Resource Allocation =====
It is important to specify the amount of CPU, GPU, and RAM required when submitting your job, as Slurm will enforce that only these resources are available to you. Additionally, Slurm will block other jobs from accessing the resources that have been allocated to your job. To specify resource allocation, add the following options to your ''sbatch'' script (a combined example follows the list):
* CPU Allocation: ''--cpus-per-task=[ncpus]'' would allocate ''[ncpus]'' processors per task.
* RAM Allocation: ''--mem=[size][units]'' would allocate the given amount of memory per node. For example, ''--mem=100GB'' would allocate 100GB.
* GPU Allocation: ''--gpus=[number]''. For example, ''--gpus=8'' would allocate 8 GPUs for the job (and thus nodes with fewer than 8 GPUs will not qualify).
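For example, a job header combining these options (the numbers are purely illustrative, not recommendations) could look like:
#!/bin/bash -l
#SBATCH -o std_out
#SBATCH -e std_err
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4     ### 4 CPUs for the single task
#SBATCH --mem=32GB            ### 32GB of RAM on the node
#SBATCH --gpus=1              ### 1 GPU for the job
srun python some_file.py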
===== Priority and Partitions =====
Slurm has a notion of "partitions", which determine which GAIVI nodes your job can run on (among other things). Each user has access to a specific set of partitions on the cluster. ''sinfo'' will show you which partitions you have access to and some basic information about them, e.g.
$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
ScoreLab up infinite 1 idle GPU6
general* up 7-00:00:00 2 idle GPU[6,8]
"TIMELIMIT" refers to the maximum amount time a job submitted to that partition can execute. "NODES", "STATE", and "NODELIST" tell you about the specific gaivi nodes usable from that partition. A partition will have multiple lines if it has multiple nodes and those nodes are in different STATEs, with each line corresponding to each unique STATE of a node in that partition.
**Priority Levels** are also determined by partitions. A job in a higher-priority partition will jump up in the queue and may kill ("preempt") a running lower-priority job so that the higher-priority job can run immediately. When a job is preempted, it is put back in the job queue to be retried later. We recommend writing your jobs to checkpoint regularly and be ready for restarts, unless you are certain your job is running with the highest priority available on its node.
We have five main partitions and a handful of group/lab specific partitions:
* general: The default partition, with a 7 day limit
* Quick: For shorter jobs
* Extended: For longer jobs
* nopreempt: For jobs which cannot afford preemption (i.e. they will break if killed and retried later)
* Contributors: For higher priority jobs, if you're a member of a lab that has contributed to GAIVI
* Lab specific partitions: For submitting a job at the highest priority specifically to nodes contributed by your lab
By default, jobs are submitted to the general partition. To submit to an alternate partition, include the "-p" option in your submission. For example, in an SBATCH script
#SBATCH -p ScoreLab
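The partition can also be chosen on the command line at submission time, for example:
$ sbatch -p ScoreLab sample_script.sh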
From highest to lowest priority:
- Lab/Group partitions (e.g. ScoreLab, CIBER)
- Contributors, nopreempt
- General, Quick, Extended
===== Recommended SBATCH Options =====
The following SBATCH options are recommended for a clean work environment and for avoiding errors:
#SBATCH -o std_out
#SBATCH -e std_err
#SBATCH -p Quick
#SBATCH --cpus-per-task=32 ### 32 CPUs per task
#SBATCH --mem=100GB ### 100GB per node
#SBATCH --gpus-per-task=8 ### 8 GPUs per task
===== View Pending/Running Jobs =====
To view pending and running jobs, run:
$ squeue
To view status of a job, run:
$ squeue -j [jobID]
In the output table, the **ST** column shows the status of the job.
The meaning of each status code is listed [[https://slurm.schedmd.com/squeue.html#lbAG|here]].
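To list only your own jobs, you can filter by user name:
$ squeue -u [NetID]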
===== Cancel a Job =====
To cancel a job submitted through ''sbatch'', do:
$ scancel [jobID]
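To cancel all of your pending and running jobs at once, you can pass your user name instead of a job ID:
$ scancel -u [NetID]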
===== Checking Resource Usage on a Node =====
First create an interactive bash shell to a running job:
$ srun --pty --jobid [jobID] /bin/bash
This will open a bash shell on the compute node. Then run commands such as ''htop'' to check CPU/memory consumption or ''nvidia-smi'' to check GPU consumption.
For example,
$ srun --pty --jobid 4384 --interactive /bin/bash
$ htop
$ nvidia-smi
$ exit
Please remember to run the ''exit'' command to log out of the compute node.
===== Anaconda Jobs =====
Anaconda manages Python packages, including machine learning libraries. Running TensorFlow jobs, for example, is simple; your sbatch script should look like this:
$ cat sample_script.sh
#!/bin/bash -l
conda activate tensorflow_environment
srun python a_python_script.py
Anaconda and its base environment are located at /apps/anaconda3. If you wish to use specific or older versions of Python packages, you must create a new Anaconda environment, which isolates your desired package versions just for you. To learn how to use Anaconda environments, please check [[https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html|this manual]].
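As a sketch, creating an environment like the ''tensorflow_environment'' used above (the environment name, Python version, and package choice are only examples) might look like:
$ conda create -n tensorflow_environment python=3.10
$ conda activate tensorflow_environment
$ pip install tensorflow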
===== Compile and Submit CUDA Jobs =====
Even though the head node does not have GPUs, it has CUDA installed. You can set your environment variables and compile your code, and after the binary file is generated, you can write your job submission script.
Note that you can call cudaSetDevice() in your source code to specify which GPU card will be used for your job. If you don't include this call, your job will run on the first GPU card (device 0) by default.
If nobody sets devices, all jobs will flood onto device 0 of each GPU node. This can leave many resources unused, since each node has at least 4 GPUs.
The following commands may be used for compiling your GPU code:
$ export PATH=$PATH:/share/apps/cuda/cuda-10.2/bin
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/share/apps/cuda/cuda-10.2/lib64/
$ nvcc -o output src.cu
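Once the binary is compiled, a minimal submission script (assuming the ''output'' binary from above and a single GPU) might look like:
$ cat cuda_job.sh
#!/bin/bash -l
#SBATCH -o std_out
#SBATCH -e std_err
#SBATCH --gpus=1      ### request one GPU for the job
srun ./output
$ sbatch cuda_job.sh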
===== Containerized Jobs =====
You may submit jobs that run in containerized environments. We have installed Singularity for users who would like to work with containers.
**STEP 1:** First, you need to create a Dockerfile describing your container image. Here is an example Dockerfile:
$ cat Dockerfile
# select an image from Docker Hub as the base
FROM python:3
# install the packages you need on the image
RUN pip install numpy
RUN pip install tensorflow-gpu
More examples of building containers can be found online. Here is one with [[https://towardsdatascience.com/how-to-compile-tensorflow-1-12-on-ubuntu-16-04-using-docker-6ca2d60d7567?gi=3b4cab03b4db|Tensorflow]]. You can also pull an image directly from Docker Hub if you believe it has everything you need:
$ singularity pull docker://tensorflow/tensorflow:latest-gpu
If you download an image from Docker Hub, you can skip the build step (step 2 below) and go to step 3 for running code on the container.
**STEP 2:** Next, you need to build a container image from the Dockerfile. We have provided a build service to simplify this process. You can access the build service from GAIVI:
$ container-builder-client Dockerfile myImage.sif --overwrite
Alternatively, you can build the image on your local computer (this requires Docker and Singularity installed locally) and then transfer it to GAIVI:
$ sudo docker build -t my_image .                                 # "my_image" is an example tag
$ singularity build myImage.sif docker-daemon://my_image:latest   # convert the Docker image to a .sif file
$ scp myImage.sif [NetID]@gaivi.cse.usf.edu:~/                    # copy the image to your GAIVI home directory
**STEP 3:** Execute the program(s) in the container
$ srun singularity exec --nv [image file] [command to run inside the container]
# For example,
$ srun singularity run --nv tensorflow_latest-gpu.sif python code.py arg1 arg2
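Putting it together, a minimal batch script for a containerized job (assuming the ''tensorflow_latest-gpu.sif'' image above and a single GPU) might look like:
$ cat container_job.sh
#!/bin/bash -l
#SBATCH -o std_out
#SBATCH -e std_err
#SBATCH --gpus=1
srun singularity run --nv tensorflow_latest-gpu.sif python code.py arg1 arg2
$ sbatch container_job.sh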
===== Jupyter Notebook Jobs =====
==== Using JupyterHub ====
//Jupyterhub// is the most convenient way to create a Jupyter Notebook job. Simply go to [[https://jupyterhub.gaivi.cse.usf.edu/|Jupyter-Hub-Gaivi]] on your web browser, login with your NetID, request resources for your notebook, and launch it.
After launching a notebook, the notebook’s log file will be created in your working directory, named "//jupyterhub_slurmspawner_{jobid}.log//".
==== Create Jupyter Notebook Jobs Manually ====
If you need to customize Slurm beyond what JupyterHub allows, you can create a notebook job manually as follows. **Warning**: this method is not fully secure: the connection between the login node and your computer/laptop is secure, but the connection between the compute node and the login node is not. Your JupyterLab session can be eavesdropped on by another GAIVI user, but not by an outside attacker.
First, find an idle compute node by running ''sinfo''. For example, suppose GPU14 is idle and you would like to run Jupyter Lab on it. Adjust the following command to your situation:
$ srun --partition Quick -w GPU14 --gpus=2 --mem=8G --cpus-per-task=8 jupyter lab --ip='0.0.0.0' --port=8888 --NotebookApp.token='some-token' --NotebookApp.password='some-password'
This will create a job on GPU14 that runs Jupyter Lab and can be reached from the login node.
Next, forward the Jupyter Lab connection to your local computer/laptop through an SSH tunnel. Open a new terminal on your local machine and run:
$ ssh -L 8888:GPU14:8888 [NetID]@gaivi.cse.usf.edu
Then open a browser on your local machine and navigate to localhost:8888. Enter the token you provided, e.g. 'some-token'.
The connection between the head node and the compute node is not encrypted (http and not https). Other users may be able to snoop this connection to obtain your token/password and then interfere with your Jupyter Lab session. The connection between the head node and your machine is secured.
==== Create a Custom Ipykernel ====
The default kernel is defined system-wide for all users and will not be modified for individual needs. Instead, please register a kernel of your own from a custom conda environment and install your packages and libraries in that environment. The commands are as follows:
$ conda create -n newEnv
$ conda activate newEnv
$ conda install -c conda-forge jupyterlab
$ python -m ipykernel install --user --name {kernelName} --display-name {kernelName}
===== Multi-Node Jobs =====
TBA