Sylvester HPC QuickStart Guide
==============================

The purpose of this document is to provide members of the Sylvester Comprehensive Cancer Center community with the basic tools necessary to access the High-Performance Computing (HPC) resources at the UM Frost Institute for Data Science and Computing (IDSC). This guide is designed as a starting point for users new to Pegasus and is intended as a reference to supplement the onboarding training required for all users.

How do I get started?
---------------------

After reviewing our `Policies `__, you will need a Sylvester Project Allocation. You can submit a `Sylvester Project Allocation Request `__. To join an existing Project, submit an `IDSC Account Request `__ with the Project ID.

If you are new to Pegasus, please review our `Onboarding Training Material `__. If you are new to Linux, please review our `Linux Training Material `__.

Sylvester Utilization
---------------------

For access to the `Sylvester Utilization Dashboard `__, please submit a ticket `here `_.

Sylvester Dedicated HPC Queues
------------------------------

We can leverage the `IDSC Service Units (SU) pricing model `__ and a **Tier Subscription** approach to increase the **performance** and **utilization** of Sylvester HPC dedicated resources while maintaining high availability for users who have financially contributed to these dedicated resources.

HPC Storage
~~~~~~~~~~~

- Each user may utilize up to 250 GB (50,000 files) of GPFS home space.
- Each project may utilize up to 2 TB (2,000,000 files) of GPFS scratch space.
- Each project may utilize up to 2 TB (20,000 files) of GPFS Sylvester project space.
- Users and Projects over quota will not be able to create new files.
- Scratch and Project space is intended only for data in active use.
- There are no IDSC-managed backups of GPFS Scratch or Project space.
- Scratch space is subject to purging when necessary for continued operation.
- Scratch space is charged only for actual utilization.
- Projects may lease additional IDSC GPFS project space annually at $500 for 10 TB.
- Dedicated space is charged for total allocation and not by utilization.

Tier 1 Development (sccc-dev)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Open to all Sylvester Labs upon resource availability; requires the Lab PI's approval.

::

    - For users new to HPC, like the “IDSC Early Career Research Grant”
    - Job priority significantly less than Tier 2
    - Preemptible Jobs after 1 hour of runtime
    - Max Job Runtime: 8 hours
    - Max Cores per job, and concurrent: 16 cores per job, 32 cores per project
    - Max Memory per job: 128GB RAM per job, 512GB RAM per project (1 node)
    - Scratch Storage: Local 1TB NVMe (optimal) and Pegasus HPC Scratch (PI responsible)
    - Project Storage: Pegasus HPC HOME (250GB) and Pegasus HPC Storage (PI responsible)

Tier 2 General (sccc)
~~~~~~~~~~~~~~~~~~~~~

Open to all Sylvester Labs upon resource availability; requires the Lab PI's approval.

::

    - Job priority significantly less than Tier 3
    - Preemptible Jobs after 1 hour of runtime
    - Max Job Runtime: 48 hours
    - Max Cores per job, and concurrent: 64 cores per job (same node), no limit per project
    - Max Memory per job: 512GB RAM per job, no limit per project
    - Scratch Storage: Local 1TB NVMe (optimal) and Pegasus HPC Scratch (PI responsible)
    - Project Storage: Pegasus HPC HOME (250GB) and Pegasus HPC Storage (PI responsible)

Tier 3 Big Memory (sccc-bigmem)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Open to all Sylvester Labs upon resource availability; requires the Lab PI's approval.

::

    - Job priority significantly less than Tier 4
    - Preemptible Jobs after 1 hour of runtime
    - Max Job Runtime: 96 hours
    - Max Cores per job, and concurrent: 64 cores per job (same node), 1 job per project
    - Max Memory per job: 4TB per job and project (each of the two nodes has 4TB RAM)
    - Scratch Storage: Local 1TB NVMe (optimal) and Pegasus HPC Scratch (PI responsible)
    - Project Storage: Pegasus HPC HOME (250GB) and Pegasus HPC Storage (PI responsible)

Tier 3 GPU Queue (sccc-gpu)
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Open to all Sylvester Labs upon resource availability; requires the Lab PI's approval.

::

    - Job priority significantly less than Tier 4
    - Preemptible Jobs after 1 hour of runtime
    - Max Job Runtime: 96 hours
    - Access to Sylvester GPUs (1 Nvidia A100 per node)
    - Max Cores per job, and concurrent: 4 cores per job (same node), 1 job per project
    - Max Memory per job: 512GB per job and project
    - Scratch Storage: Local 1TB NVMe (optimal) and Pegasus HPC Scratch (PI responsible)
    - Project Storage: Pegasus HPC HOME (250GB) and Pegasus HPC Storage (PI responsible)

Tier 4 Premium (sccc-premium, sccc-bigmem-premium, sccc-gpu-premium)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Reserved for Sylvester Labs that have purchased dedicated resources.

::

    - Job priority: Other Tier 4 Jobs
    - Job limitations: At the Lab PI's discretion
    - Scratch Storage: Local 1TB NVMe (optimal) and Pegasus HPC Scratch (PI responsible)
    - Project Storage: Pegasus HPC HOME (250GB), Sylvester HPC Storage, and Pegasus HPC Storage (PI responsible)

What are specs of available HPC Nodes?
--------------------------------------

Triton (96 nodes)
~~~~~~~~~~~~~~~~~

::

    OS: CentOS 7.9, ppc64le
    CPU: 2 x IBM Power9 (40 cores/node, 3840 cores total)
    RAM: 16 x 16GiB RDIMM DDR4 2666MHz ECC (256GiB/node, 6.4GiB/core)
    GPU: 2 x Nvidia V100-SXM2 (16GB GPU RAM)
    NET: 100Gbps InfiniBand (IB), 1Gbps Ethernet
    Disk: 2 x 1.92TB Micron 5100PRO SSD (RAID1, 1080MBps/1040MBps Seq Read/Write, 186K/74K IOPS)

Pegasus Compute (350 nodes)
~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

    OS: CentOS 7.6, x86_64
    CPU: 16c/node (for a total of 4800 CPU-cores)
    RAM: 64GiB nodes (4GiB/core, for a total of 22400GiB)
    RAM: 256GiB nodes (16GiB/core, for a total of 4096GiB)
    NET: 56Gbps InfiniBand, 1Gbps Ethernet
    Disk: Stateless (ramdisk)

Pegasus Sylvester Dedicated Compute (16 nodes)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

    OS: CentOS 7.9, x86_64
    CPU: 2 x Intel Xeon Gold 6338 CPU @ 2.00GHz (64 cores/node, 1024 cores total)
    RAM: 16 x 32GiB RDIMM DDR4 3200MHz ECC (512GiB/node, 8192GiB RAM total)
    NET: 100Gbps InfiniBand, 10Gbps Ethernet
    Disk: 960GiB Samsung PM9A3 NVMe (6500MBps/1500MBps Seq Read/Write, 580K/70K IOPS)

Pegasus Sylvester Dedicated Big Memory GPU (2 nodes)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

    OS: CentOS 7.9, x86_64
    CPU: 2 x Intel Xeon Gold 6338 CPU @ 2.00GHz (64 cores/node, 128 cores total)
    RAM: 32 x 128GiB RDIMM DDR4 3200MHz ECC (4096GiB/node, 8192GiB RAM total)
    GPU: 1 x Nvidia A100 (80GB GPU RAM)
    NET: 100Gbps InfiniBand, 10Gbps Ethernet
    Disk: 960GiB Samsung PM9A3 NVMe (6500MBps/1500MBps Seq Read/Write, 580K/70K IOPS)

How do I reset my IDSC password?
--------------------------------

Use the `IDSC Password Management `__ tool. You will need to be connected to the **University's Secure Network** to access this tool and all Sylvester HPC Resources.

How do I access the Secure Network remotely?
--------------------------------------------

Via the `University of Miami's VPN `__.

How do I run Nextflow on Sylvester HPC resources?
-------------------------------------------------

`Running Nextflow (nf-core/sarek) `__
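
When sizing individual pipeline steps, it can help to check a resource request against the per-job limits listed under Sylvester Dedicated HPC Queues before submitting. The following is a minimal sketch in Python, not an IDSC tool: the ``QueueLimits`` class and ``pick_queue`` function are hypothetical names introduced here for illustration, the numbers are copied from the tier descriptions in this guide, and per-project limits, preemption, and PI approval are not modeled. Tier 4 premium queues are left out because their limits are set by each Lab's PI.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QueueLimits:
    """Per-job limits for one queue (hypothetical helper type)."""
    name: str
    max_cores: int
    max_mem_gb: int
    max_hours: int
    has_gpu: bool = False


# Values transcribed from the tier descriptions in this guide.
# They are assumptions of this sketch and may lag behind the live scheduler config.
QUEUES = [
    QueueLimits("sccc-dev", max_cores=16, max_mem_gb=128, max_hours=8),
    QueueLimits("sccc", max_cores=64, max_mem_gb=512, max_hours=48),
    QueueLimits("sccc-bigmem", max_cores=64, max_mem_gb=4096, max_hours=96),  # 4TB
    QueueLimits("sccc-gpu", max_cores=4, max_mem_gb=512, max_hours=96, has_gpu=True),
]


def pick_queue(cores: int, mem_gb: int, hours: int, gpu: bool = False) -> Optional[str]:
    """Return the first (lowest-tier) queue whose per-job limits fit the request.

    Returns None when no listed queue can hold the job.
    """
    for q in QUEUES:
        if q.has_gpu != gpu:
            continue  # GPU jobs only match the GPU queue, and vice versa
        if cores <= q.max_cores and mem_gb <= q.max_mem_gb and hours <= q.max_hours:
            return q.name
    return None


if __name__ == "__main__":
    # Example: a 32-core, 256GB, 24-hour CPU job
    print(pick_queue(32, 256, 24))
```

Queues are listed from lowest to highest tier, so the function prefers the least privileged queue that fits, which leaves the larger queues free for jobs that need them. A result of ``None`` means the request exceeds every per-job limit listed in this guide and should be resized or discussed with the Lab's PI.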