User Manual
Node Types
There are three types of nodes:
- Login/Head Nodes: These are the machines you log in to in order to submit computing jobs.
- Compute Nodes: These are the machines that perform computational tasks. Please do not SSH into compute nodes to run your code manually: if you do, your program will be killed unconditionally, you will be placed on the blacklist, and your job priority will be lowered. See the Submit Job(s) section for more information.
- Storage Nodes: These machines provide large-capacity storage. Typically, data is transferred from a storage node to a compute node during a job.
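For example, a job can stage its input data from a storage node onto the compute node's local disk before processing it. The snippet below is a minimal sketch: the paths `/storage/$USER/dataset` and `/scratch/$USER/dataset` and the program name are hypothetical placeholders, not actual locations on this cluster.

```bash
#!/bin/bash
#SBATCH --job-name=stage-data

# Hypothetical paths; replace with the actual storage and scratch locations on this cluster.
SRC=/storage/$USER/dataset
DST=/scratch/$USER/dataset

# Copy the input data from the storage node to the compute node's local disk,
# then run the computation against the local copy.
mkdir -p "$DST"
rsync -a "$SRC/" "$DST/"

./my_analysis "$DST"    # hypothetical program
```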
Partitions of Compute Nodes
- General (default): Maximum job time is 7 days.
- Quick: Maximum job time is 1 day.
- Extended: No maximum job time.
- Contributors: Includes all nodes. Higher priority than General, Quick, and Extended, but lower than any Team partition. No maximum job time.
- nopreempt: A small set of nodes set aside for all users, at the Contributors priority tier, for jobs that cannot afford preemption.
- Team partitions (named after research groups): Include only the nodes contributed by the group and run jobs at the highest priority. No maximum job time.
Run the command `sinfo` to see which nodes are available in which partitions.
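As a quick illustration, the commands below show two common ways to query partitions; the partition name `quick` is only an example and should be replaced with a name that actually appears in the `sinfo` output.

```bash
# List all partitions and the state of their nodes.
sinfo

# Show the nodes of a single partition (example name), one node per line.
sinfo -N -p quick
```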
Job Submission Policy
- Maximum number of jobs running at any moment per user: 6
- Default resources per job: 1 CPU core + 16 GB RAM. Users who need more resources must specify them in their sbatch script, using the `--cpus-per-task`, `--gpus-per-task`, and `--mem` options for CPUs, GPUs, and RAM, respectively (see the sample script after this list).
- User Groups: Any user can submit jobs to an idle node. A job submitted to a higher-priority partition can preempt a job in a lower-priority partition, so the higher-priority job starts running immediately. When a job is preempted, it is put back in the job queue. The only exception to this rule is the "nopreempt" partition: jobs running on this partition can neither preempt nor be preempted. Jobs in the same partition cannot preempt each other.
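The sbatch script below is a minimal sketch of requesting more than the default resources. The job name, partition name, resource amounts, time limit, and executable are illustrative assumptions, not cluster defaults.

```bash
#!/bin/bash
# Request 4 CPU cores, 1 GPU, and 64 GB of RAM instead of the defaults,
# with a 2-day time limit on an assumed partition named "general".
#SBATCH --job-name=example-job
#SBATCH --partition=general
#SBATCH --cpus-per-task=4
#SBATCH --gpus-per-task=1
#SBATCH --mem=64G
#SBATCH --time=2-00:00:00

# Hypothetical executable; replace with your own program.
srun ./my_program
```

Submit the script with `sbatch job.sh` and check its status with `squeue -u $USER`.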
Example 1: User1 submits a job to node GPU5 in the Extended partition. Later, User2 submits a job to node GPU5 in the Contributors partition with an `#SBATCH --exclusive` option. User2's job will preempt User1's job.
Example 2: User1 submits a job to node GPU42 in the Contributors partition. Later, User2 submits a job to node GPU42 in the ScoreLab partition with an `#SBATCH --mem=180GB` option. User2's job will preempt User1's job.
Example 3: User1 submits a job to node GPU42 in the Contributors partition. Later, User2 submits a job to node GPU42 in the Contributors partition with an `#SBATCH --exclusive` option. User2's job will wait until User1's job finishes before executing.
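If you suspect a job was preempted and put back in the queue, its state history can be checked with SLURM's accounting tools; the job ID below is a placeholder.

```bash
# Show the state history of a job (replace 12345 with your job ID);
# a preempted job typically shows a PREEMPTED or REQUEUED state before it runs again.
sacct -j 12345 --format=JobID,Partition,State,Elapsed
```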