GAIVI Documentation

Contact: GAIVIADMIN@usf.edu


Welcome to GAIVI's Documentation!

This project brings together investigators with research thrusts in several core disciplines of computer science and engineering: big data management, scientific computing, system security, hardware design, data mining, computer vision, and pattern recognition. GAIVI can accelerate existing research and enable ground-breaking new research at USF that shares the common theme of massively parallel computing.

What is on this site

  • Notice About Preemption
  • General Information
    • Request to use GAIVI
    • Login Information
    • Terms of Service
    • Priority and Contributions
    • System Information
  • User Manual
    • Quickstart
    • Job Submission
    • Data Storage
    • Working Environment
    • Jupyterhub
    • Workshop Recordings
  • Discussions
    • Common Bugs
    • Suggestions
  • Share your experience (e.g. how to configure IBM Federated Learning on GAIVI)
    • List of how-to's from all users
    • Instructions to create a new page

Notice About Preemption

GAIVI has job preemption and restarting enabled. This means that a submitted compute job can be interrupted (effectively, cancelled), returned to the queue, and restarted if the hardware it is using is needed by a higher-priority job. When a job is preempted it has 5 minutes of grace time to clean up: the job receives a SIGTERM immediately and a SIGKILL 5 minutes later, so your job should be prepared to exit gracefully within 5 minutes of receiving a SIGTERM at any time. Correspondingly, if you are trying to preempt a job, please be prepared to wait up to 5 minutes for the preemption to take effect. We recommend submitting most jobs as sbatch scripts that checkpoint regularly; a sketch of a preemption-aware script is shown below. If your job cannot work within these restrictions, there is a nopreempt partition where jobs are safe from preemption, but it includes only a small handful of nodes.
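As an illustration, an sbatch script can trap the SIGTERM and checkpoint before the SIGKILL arrives. This is only a minimal sketch: the job name, the train.py command, its --resume flag, and the checkpoint handling are placeholders, and the --gres and --time directives are standard SLURM options rather than GAIVI-specific settings (GPU type names may need to be added on this cluster).

    #!/bin/bash
    #SBATCH --job-name=preemptable-job   # placeholder job name
    #SBATCH --gres=gpu:1                 # request one GPU (standard SLURM GRES syntax)
    #SBATCH --time=24:00:00

    # On preemption, SLURM sends SIGTERM immediately and SIGKILL 5 minutes later.
    # Trap the SIGTERM, ask the workload to checkpoint, and wait for it to finish.
    cleanup() {
        echo "Received SIGTERM; checkpointing before the 5-minute grace period ends"
        kill -TERM "$WORK_PID" 2>/dev/null   # the workload should save state on SIGTERM
        wait "$WORK_PID"
        exit 0
    }
    trap cleanup TERM

    # Run the workload in the background so the trap can fire right away.
    python train.py --resume latest.ckpt &   # hypothetical checkpoint-aware training command
    WORK_PID=$!
    wait "$WORK_PID"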

Request to use GAIVI

Please create a request by filling out this form. A valid USF NetID is required; it will be your username for login. If you are a USF student, please also include your supervisor in the request; we will need their approval to proceed.

Login Information

First, please connect to the USF VPN. You can then use any SSH client you wish. Login information is as follows; an example command is shown after the list.

  • Host name: gaivi.cse.usf.edu
  • User name: your NetID
  • Password: your NetID Password
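For example, from a terminal with a standard OpenSSH client (replace netid with your own NetID):

    ssh netid@gaivi.cse.usf.edu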

Terms of Service

Welcome to the GAIVI Cluster. By accessing this system, you agree to the following terms and conditions, designed to ensure the security, stability, and fairness of the computing environment for all users.

1. Operating Hours

  • Regular operating hours are 10:00 AM to 4:00 PM, Monday through Friday, excluding University holidays and closures. Administrative and technical support is provided only during these hours.

2. Account Privileges and Security

  • No Administrator Privileges: User accounts do not have administrator privileges. The use of sudo or any other privilege escalation command is strictly prohibited.
  • Unauthorized Activity: All attempts to use sudo are automatically logged and flagged for review. Unauthorized activity may result in immediate suspension of the account for verification. This measure is critical to maintaining the security of the cluster.

3. Job Submission Policy

  • All computational tasks requiring more than 10 minutes, more than 1 CPU core, or more than 2 GB of memory must be submitted to the SLURM scheduling environment (see the example after this list).
  • Prohibited Tasks on Login Nodes: Tasks exceeding these limitations will be terminated without notice to preserve system stability for all users. Repeated violations or actions requiring administrator intervention may result in account suspension.
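For instance, instead of running a heavy preprocessing script directly on a login node, you can request an interactive shell on a compute node through SLURM. This is a sketch using standard srun options; the resource amounts are illustrative, and the partition name assumes the general partition mentioned later on this page (check sinfo for the exact partition names):

    # Request 4 CPU cores, 16 GB of memory, and 2 hours on a compute node
    srun --partition=general --cpus-per-task=4 --mem=16G --time=02:00:00 --pty bash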

4. Data Ownership and Management

  • Data uploaded or stored in the /home, /data, or /general file systems is considered to belong to the individual or entity who approved your GAIVI access. The CSEIT team reserves the right to share your file system with them as necessary.
  • No Backup Guarantee: User data stored on the /home, /data, or /general file systems is NOT backed up or replicated. Users are strongly advised to maintain their own backups; one approach is sketched after this list.
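Since nothing under /home, /data, or /general is backed up, one simple approach is to pull copies to your own machine with rsync over SSH. A sketch, run from your local computer; my_project is a hypothetical directory under your home directory on GAIVI:

    # Copy a project directory from GAIVI to a local backup folder
    # (the remote path is relative to your home directory on GAIVI)
    rsync -avz netid@gaivi.cse.usf.edu:my_project/ ~/gaivi-backup/my_project/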

5. Acceptable Use

  • Users must comply with the University’s Acceptable Use Policy. Use of the cluster for for-profit activities, illegal purposes, or any activity violating University policies is strictly prohibited. Violations may result in account suspension or other disciplinary action.

6. Data Responsibility

  • The cluster’s storage systems are not intended for personal backups. CSEIT is not responsible for any data loss. Users are encouraged to maintain external copies of critical data.

7. Account Inactivity

  • Accounts that remain inactive for more than 6 months may be deactivated. All data under inactive accounts is subject to deletion without notice.

8. System Disruptions

  • The cluster is a shared environment. Tasks or behaviors that jeopardize system stability, security, or performance for other users are strictly prohibited. Instances requiring administrative intervention may lead to account suspension.

9. Agreement to Terms

  • By entering your password to access the GAIVI Cluster, you acknowledge that you have read, understood, and agreed to abide by these Terms of Service and the System Access Agreement.

Priority and Contributions

GAIVI offers higher submission priority to teams that have contributed significantly to the development of the cluster. A significant hardware contribution should be worth at least $25k; a standard example is a whole compute node with current NVIDIA GPU(s). During the active contribution period, which lasts for the node's warranty period plus one additional year, the contributor receives several benefits: 1) high priority on jobs sent to the contributed node, 2) access to the Contributors partition, which can submit jobs to all nodes in the cluster at a higher priority level than regular users, and 3) cooling, redundant power, security, and regular maintenance for the contributed node. A year after the warranty period ends, the node is moved to the general partition for all GAIVI users to use. For more details on GAIVI's partitions, see the user manual.

System Information

Node name | Summary of role | CPU cores | Processor type and speed | Memory | Card Info | GPU Memory
GAIVI1 | the front node of the cluster | 12 cores | Dual Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz | 64 GB | - | -
GAIVI2 | the front node of the cluster | 20 cores | Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz | 128 GB | - | -
GPU1 | compute node with GPUs | 96 cores | AMD EPYC 7352 24-Core | 256 GB | 3 * NVIDIA A100 | 240 GB
GPU2 | compute node with GPUs | 96 cores | AMD EPYC 7352 24-Core | 256 GB | 2 * NVIDIA A100 | 160 GB
GPU3 | compute node with GPUs | 32 cores | Dual Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz | 128 GB | 4 * 1080 Ti | 44 GB
GPU4 | compute node with GPUs | 32 cores | Dual Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz | 128 GB | 4 * 1080 Ti | 44 GB
GPU6 | compute node with GPUs | 32 cores | Dual Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz | 128 GB | 4 * 1080 Ti | 44 GB
GPU7 | compute node with GPUs | 32 cores | Dual Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz | 128 GB | 4 * 1080 Ti | 44 GB
GPU8 | compute node with GPUs | 32 cores | Dual Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz | 128 GB | 4 * 1080 Ti | 44 GB
GPU9 | compute node with GPUs | 32 cores | Dual Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz | 128 GB | 4 * 1080 Ti | 44 GB
GPU11 | compute node with GPUs | 16 cores | Dual Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz | 128 GB | 8 * TITAN X (Maxwell) | 96 GB
GPU12 | compute node with GPUs | 16 cores | Dual Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz | 128 GB | 8 * TITAN X (Maxwell) | 96 GB
GPU13 | compute node with GPUs | 16 cores | Dual Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz | 128 GB | 8 * TITAN X (Maxwell) | 96 GB
GPU14 | compute node with GPUs | 16 cores | Dual Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz | 128 GB | 8 * TITAN X (Maxwell) | 96 GB
GPU15 | compute node with GPUs | 16 cores | Dual Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz | 384 GB | 8 * TITAN X (Maxwell) | 96 GB
GPU16 | compute node with GPUs | 16 cores | Dual Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz | 384 GB | 4 * TITAN X (Maxwell) | 48 GB
GPU17 | compute node with GPUs | 16 cores | Dual Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz | 128 GB | 4 * TITAN X (Maxwell) + 4 * TITAN V | 96 GB
GPU18 | compute node with GPUs | 16 cores | Dual Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz | 128 GB | 8 * TITAN X (Pascal) | 96 GB
GPU19 | compute node with GPUs | 16 cores | Dual Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz | 128 GB | 8 * TITAN X (Pascal) | 96 GB
GPU21 | compute node with GPUs | 16 cores | Dual Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz | 128 GB | 8 * 1080 Ti | 88 GB
GPU22 | compute node with GPUs | 20 cores | Dual Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz | 1024 GB | 8 * 1080 Ti | 88 GB
GPU41 | compute node with GPUs | 16 cores | Dual Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz | 128 GB | 4 * TITAN X (Maxwell) | 48 GB
GPU42 | compute node with GPUs | 32 cores | Dual Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz | 192 GB | 4 * Titan RTX | 96 GB
GPU43 | compute node with GPUs | 64 cores | AMD EPYC 7662 64-Core | 512 GB | 4 * A40 | 192 GB
GPU44 | compute node with GPUs | 32 cores | AMD EPYC 7532 32-Core | 512 GB | 4 * A40 | 192 GB
GPU45 | compute node with GPUs | 24 cores | AMD EPYC 7413 24-Core | 256 GB | 1 * A100 | 80 GB
GPU46 | compute node with GPUs | 96 cores | AMD EPYC 7352 24-Core | 2 TB | 3 * A100 | 240 GB
GPU47 | compute node with GPUs | 32 cores | AMD EPYC 7513 32-Core | 512 GB | 2 * A100 | 160 GB
GPU48 | compute node with GPUs | 128 cores | AMD EPYC 9554 64-Core | 768 GB | 6 * H100 | 480 GB
GPU49 | compute node with GPUs | 32 cores | Intel(R) Xeon(R) Silver 4314 CPU @ 2.40GHz | 128 GB | 2 * A100 + 2 * L40S | 256 GB
GPU50 | compute node with GPUs | 32 cores | AMD EPYC Genoa 9354 @ 3.3 GHz | 384 GB | 1 * RTX A6000 | 48 GB
GPU51 | compute node with GPUs | 32 cores | AMD EPYC Genoa 9374F @ 3.85 GHz | 384 GB | 2 * RTX 6000 Ada | 96 GB
GPU52 | compute node with GPUs | 32 cores | AMD EPYC 7543 32-Core @ 2.8GHz | 524 GB | 4 * RTX A6000 | 182 GB
GPU53 | compute node with GPUs | 48 cores | AMD EPYC Milan 7643 @ 2.3GHz | 1024 GB | 8 * L40S | 384 GB
PHI1 | compute node with Intel Phi cards | 32 cores | Dual Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz | 128 GB | 7 * Phi 5110P | 56 GB
STORAGE2 | storage node | 24 cores | Intel(R) Xeon(R) Silver 4116 CPU @ 2.10GHz | 512 GB | - | -
STORAGE3 | storage node | 16 cores | Dual Intel(R) Xeon(R) Silver 4208 CPU @ 2.10GHz | 196 GB | - | -
STORAGE4 | storage node | 32 cores | AMD EPYC 7313P 16-Core | 128 GB | - | -
STORAGE6 | storage node (NVMe) | 32 cores | AMD EPYC 9354P 32-Core | 384 GB | - | -
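The table above is a snapshot; to check the live node list and per-node resources from a login node, you can query SLURM directly. These are standard SLURM commands; the exact node-name casing and output formatting may differ on GAIVI:

    sinfo -N -o "%N %c %m %G %T"   # node name, CPUs, memory (MB), GPUs (GRES), state
    scontrol show node gpu48       # full details for a single node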