GAIVI Documentation

Contact: GAIVIADMIN@usf.edu


Welcome to GAIVI's Documentation!

This project brings together investigators with research thrusts in several core disciplines of computer science and engineering: big data management, scientific computing, system security, hardware design, data mining, computer vision, and pattern recognition. GAIVI can accelerate existing research and enable ground-breaking new research at USF that shares the common theme of massively parallel computing.

What is on this site

  • Notice About Preemption
  • General Information
    • Request to use GAIVI
    • Login Information
    • Terms of Service
    • Priority and Contributions
    • System Information
  • User Manual
    • Quickstart
    • Job Submission
    • Data Storage
    • Working Environment
    • Jupyterhub
    • Workshop Recordings
  • Discussions
    • Common Bugs
    • Suggestions
  • Share your experience (e.g. how to configure IBM Federated Learning on GAIVI)
    • List of how-to's from all users
    • Instructions to create a new page

Notice About Preemption

GAIVI has job preemption and restarting enabled. This means that a submitted compute job can be interrupted (effectively, cancelled), returned to the queue, and restarted if the hardware it is using is needed by a higher-priority job. When a job is preempted it has 5 minutes of grace time to clean up: the job receives a SIGTERM immediately and a SIGKILL 5 minutes later, so your job should be prepared to exit gracefully within 5 minutes of receiving a SIGTERM at any time. Correspondingly, if you are trying to preempt a job, please be prepared to wait up to 5 minutes for the preemption to take effect. We recommend submitting most jobs as sbatch scripts that checkpoint regularly; a sketch of a preemption-aware script is shown below. If your job cannot work within these restrictions, there is a nopreempt partition where jobs are safe from preemption, but it includes only a small handful of nodes.
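As an illustration, an sbatch script can trap the SIGTERM and checkpoint before the SIGKILL arrives. This is only a minimal sketch: the job name, the train.py command, its --resume flag, and the checkpoint handling are placeholders, and the --gres and --time directives are standard SLURM options rather than GAIVI-specific settings (GPU type names may need to be added on this cluster).

    #!/bin/bash
    #SBATCH --job-name=preemptable-job   # placeholder job name
    #SBATCH --gres=gpu:1                 # request one GPU (standard SLURM GRES syntax)
    #SBATCH --time=24:00:00

    # On preemption, SLURM sends SIGTERM immediately and SIGKILL 5 minutes later.
    # Trap the SIGTERM, ask the workload to checkpoint, and wait for it to finish.
    cleanup() {
        echo "Received SIGTERM; checkpointing before the 5-minute grace period ends"
        kill -TERM "$WORK_PID" 2>/dev/null   # the workload should save state on SIGTERM
        wait "$WORK_PID"
        exit 0
    }
    trap cleanup TERM

    # Run the workload in the background so the trap can fire right away.
    python train.py --resume latest.ckpt &   # hypothetical checkpoint-aware training command
    WORK_PID=$!
    wait "$WORK_PID"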

Request to use GAIVI

Please create a request by filling out this form. A valid USF NetID is required; it will be your username for login. If you are a USF student, please also include your supervisor in the request; we will need their approval to proceed.

Login Information

First, please connect to the USF VPN. You can then use any SSH client you wish. Login information is as follows; an example command is shown after the list.

  • Host name: gaivi.cse.usf.edu
  • User name: your NetID
  • Password: your NetID Password
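For example, from a terminal with a standard OpenSSH client (replace netid with your own NetID):

    ssh netid@gaivi.cse.usf.edu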

Terms of Service

Welcome to the GAIVI Cluster. By accessing this system, you agree to the following terms and conditions, designed to ensure the security, stability, and fairness of the computing environment for all users.

1. Operating Hours

  • Regular operating hours are 10:00 AM to 4:00 PM, Monday through Friday, excluding University holidays and closures. Administrative and technical support is provided only during these hours.

2. Account Privileges and Security

  • No Administrator Privileges: User accounts do not have administrator privileges. The use of sudo or any other privilege escalation command is strictly prohibited.
  • Unauthorized Activity: All attempts to use sudo are automatically logged and flagged for review. Unauthorized activity may result in immediate suspension of the account for verification. This measure is critical to maintaining the security of the cluster.

3. Job Submission Policy

  • All computational tasks requiring more than 10 minutes, more than 1 CPU core, or more than 2 GB of memory must be submitted to the SLURM scheduling environment (see the example after this list).
  • Prohibited Tasks on Login Nodes: Tasks exceeding these limitations will be terminated without notice to preserve system stability for all users. Repeated violations or actions requiring administrator intervention may result in account suspension.
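For instance, instead of running a heavy preprocessing script directly on a login node, you can request an interactive shell on a compute node through SLURM. This is a sketch using standard srun options; the resource amounts are illustrative, and the partition name assumes the general partition mentioned later on this page (check sinfo for the exact partition names):

    # Request 4 CPU cores, 16 GB of memory, and 2 hours on a compute node
    srun --partition=general --cpus-per-task=4 --mem=16G --time=02:00:00 --pty bash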

4. Data Ownership and Management

  • Data uploaded or stored in the /home, /data, or /general file systems is considered to belong to the individual or entity who approved your GAIVI access. The CSEIT team reserves the right to share your file system with them as necessary.
  • No Backup Guarantee: User data stored on the /home, /data, or /general file systems is NOT backed up or replicated. Users are strongly advised to maintain their own backups; one approach is sketched after this list.
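Since nothing under /home, /data, or /general is backed up, one simple approach is to pull copies to your own machine with rsync over SSH. A sketch, run from your local computer; my_project is a hypothetical directory under your home directory on GAIVI:

    # Copy a project directory from GAIVI to a local backup folder
    # (the remote path is relative to your home directory on GAIVI)
    rsync -avz netid@gaivi.cse.usf.edu:my_project/ ~/gaivi-backup/my_project/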

5. Acceptable Use

  • Users must comply with the University’s Acceptable Use Policy. Use of the cluster for for-profit activities, illegal purposes, or any activity violating University policies is strictly prohibited. Violations may result in account suspension or other disciplinary action.

6. Data Responsibility

  • The cluster’s storage systems are not intended for personal backups. CSEIT is not responsible for any data loss. Users are encouraged to maintain external copies of critical data.

7. Account Inactivity

  • Accounts that remain inactive for more than 6 months may be deactivated. All data under inactive accounts is subject to deletion without notice.

8. System Disruptions

  • The cluster is a shared environment. Tasks or behaviors that jeopardize system stability, security, or performance for other users are strictly prohibited. Instances requiring administrative intervention may lead to account suspension.

9. Agreement to Terms

  • By entering your password to access the GAIVI Cluster, you acknowledge that you have read, understood, and agreed to abide by these Terms of Service and the System Access Agreement.

Priority and Contributions

GAIVI offers higher submission priority to teams that have contributed significantly to the development of the cluster. A significant hardware contribution should be worth at least $25k; a standard example is a whole compute node with current NVIDIA GPU(s). During the active contribution period, which lasts for the node's warranty period plus one additional year, the contributor receives several benefits: 1) high priority on jobs sent to the contributed node, 2) access to the Contributors partition, which can submit jobs to all nodes in the cluster at a higher priority level than regular users, and 3) cooling, redundant power, security, and regular maintenance for the contributed node. A year after the warranty period ends, the node is moved to the general partition for all GAIVI users to use. For more details on GAIVI's partitions, see the user manual.

System Information

Node name | Summary of role | CPU cores | Processor type and speed | Memory | Card Info | GPU Memory
GAIVI1 | the front node of the cluster | 12 cores | Dual Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz | 64 GB | - | -
GAIVI2 | the front node of the cluster | 20 cores | Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz | 128 GB | - | -
GPU1 | compute node with GPUs | 96 cores | AMD EPYC 7352 24-Core | 256 GB | 3 * NVIDIA A100 | 240 GB
GPU2 | compute node with GPUs | 96 cores | AMD EPYC 7352 24-Core | 256 GB | 2 * NVIDIA A100 | 160 GB
GPU3 | compute node with GPUs | 32 cores | Dual Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz | 128 GB | 4 * 1080 Ti | 44 GB
GPU4 | compute node with GPUs | 32 cores | Dual Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz | 128 GB | 4 * 1080 Ti | 44 GB
GPU6 | compute node with GPUs | 32 cores | Dual Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz | 128 GB | 4 * 1080 Ti | 44 GB
GPU7 | compute node with GPUs | 32 cores | Dual Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz | 128 GB | 4 * 1080 Ti | 44 GB
GPU8 | compute node with GPUs | 32 cores | Dual Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz | 128 GB | 4 * 1080 Ti | 44 GB
GPU9 | compute node with GPUs | 32 cores | Dual Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz | 128 GB | 4 * 1080 Ti | 44 GB
GPU11 | compute node with GPUs | 16 cores | Dual Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz | 128 GB | 8 * TITAN X (Maxwell) | 96 GB
GPU12 | compute node with GPUs | 16 cores | Dual Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz | 128 GB | 8 * TITAN X (Maxwell) | 96 GB
GPU13 | compute node with GPUs | 16 cores | Dual Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz | 128 GB | 8 * TITAN X (Maxwell) | 96 GB
GPU14 | compute node with GPUs | 16 cores | Dual Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz | 128 GB | 8 * TITAN X (Maxwell) | 96 GB
GPU15 | compute node with GPUs | 16 cores | Dual Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz | 384 GB | 8 * TITAN X (Maxwell) | 96 GB
GPU16 | compute node with GPUs | 16 cores | Dual Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz | 384 GB | 4 * TITAN X (Maxwell) | 48 GB
GPU17 | compute node with GPUs | 16 cores | Dual Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz | 128 GB | 4 * TITAN X (Maxwell) + 4 * TITAN V | 96 GB
GPU18 | compute node with GPUs | 16 cores | Dual Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz | 128 GB | 8 * TITAN X (Pascal) | 96 GB
GPU19 | compute node with GPUs | 16 cores | Dual Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz | 128 GB | 8 * TITAN X (Pascal) | 96 GB
GPU21 | compute node with GPUs | 16 cores | Dual Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz | 128 GB | 8 * 1080 Ti | 88 GB
GPU22 | compute node with GPUs | 20 cores | Dual Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz | 1024 GB | 8 * 1080 Ti | 88 GB
GPU41 | compute node with GPUs | 16 cores | Dual Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz | 128 GB | 4 * TITAN X (Maxwell) | 48 GB
GPU42 | compute node with GPUs | 32 cores | Dual Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz | 192 GB | 4 * Titan RTX | 96 GB
GPU43 | compute node with GPUs | 64 cores | AMD EPYC 7662 64-Core | 512 GB | 4 * A40 | 192 GB
GPU44 | compute node with GPUs | 32 cores | AMD EPYC 7532 32-Core | 512 GB | 4 * A40 | 192 GB
GPU45 | compute node with GPUs | 24 cores | AMD EPYC 7413 24-Core | 256 GB | 1 * A100 | 80 GB
GPU46 | compute node with GPUs | 96 cores | AMD EPYC 7352 24-Core | 2 TB | 3 * A100 | 240 GB
GPU47 | compute node with GPUs | 32 cores | AMD EPYC 7513 32-Core | 512 GB | 2 * A100 | 160 GB
GPU48 | compute node with GPUs | 128 cores | AMD EPYC 9554 64-Core | 768 GB | 6 * H100 | 480 GB
GPU49 | compute node with GPUs | 32 cores | Intel(R) Xeon(R) Silver 4314 CPU @ 2.40GHz | 128 GB | 2 * A100 + 2 * L40S | 256 GB
GPU50 | compute node with GPUs | 32 cores | AMD EPYC Genoa 9354 @ 3.3 GHz | 384 GB | 1 * RTX A6000 | 48 GB
GPU51 | compute node with GPUs | 32 cores | AMD EPYC Genoa 9374F @ 3.85 GHz | 384 GB | 2 * RTX 6000 Ada | 96 GB
GPU52 | compute node with GPUs | 32 cores | AMD EPYC 7543 32-Core @ 2.8GHz | 524 GB | 4 * RTX A6000 | 182 GB
GPU53 | compute node with GPUs | 48 cores | AMD EPYC Milan 7643 @ 2.3GHz | 1024 GB | 8 * L40S | 384 GB
PHI1 | compute node with Intel Phi cards | 32 cores | Dual Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz | 128 GB | 7 * Phi 5110P | 56 GB
STORAGE2 | storage node | 24 cores | Intel(R) Xeon(R) Silver 4116 CPU @ 2.10GHz | 512 GB | - | -
STORAGE3 | storage node | 16 cores | Dual Intel(R) Xeon(R) Silver 4208 CPU @ 2.10GHz | 196 GB | - | -
STORAGE4 | storage node | 32 cores | AMD EPYC 7313P 16-Core | 128 GB | - | -
STORAGE6 | storage node (NVMe) | 32 cores | AMD EPYC 9354P 32-Core | 384 GB | - | -
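The table above is a snapshot; to check the live node list and per-node resources from a login node, you can query SLURM directly. These are standard SLURM commands; the exact node-name casing and output formatting may differ on GAIVI:

    sinfo -N -o "%N %c %m %G %T"   # node name, CPUs, memory (MB), GPUs (GRES), state
    scontrol show node gpu48       # full details for a single node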