Pegasus FAQs - Frequently Asked Questions¶
Detailed information on FAQ topics is available here and on our ACS Policies webpage.
If you are new to Pegasus and HPC clusters, review this documentation on the Pegasus system, the job scheduler, and modularized software.
CCS ACS does not install your preferred software, nor does it provide support or documentation on how to program in it. ACS documentation covers using software in a Linux cluster environment.
How do I join a project?¶
Contact the project owner. Details can be found on the CCS Portal: https://portal.ccs.miami.edu/forms-access/
How do I request a new project?¶
Any PI or faculty member may request a new project via the CCS Portal: https://portal.ccs.miami.edu/accounts/new/group/
When will my project be created?¶
Projects are created once the allocations committee has reviewed and approved the request. Check your request status via the CCS Portal under My Pegasus -> My Requests.
Scratch requests over 2 TB can take up to a month for the allocations committee to review, as availability is limited.
What software is available?¶
Software Modules list on the CCS Portal: https://portal.ccs.miami.edu/resources/software/
Software Modules from the command line:
$ module avail
How do I view my currently loaded modules?¶
$ module list
May I install software?¶
Yes! Pegasus users are free to compile and install software in their respective home directories by following the software’s source code or local installation instructions. See our Software Installation guide for more information.
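As a sketch, a typical from-source installation into a home directory looks like the following. The package name and build steps are hypothetical; follow the instructions shipped with your software:

```shell
# Choose a personal install prefix inside your home directory.
PREFIX="$HOME/local"
mkdir -p "$PREFIX"

# Typical autotools-style build (hypothetical package "myapp-1.0"):
#   tar xzf myapp-1.0.tar.gz && cd myapp-1.0
#   ./configure --prefix="$PREFIX"
#   make && make install

# Make the installed programs visible in your shell:
export PATH="$PREFIX/bin:$PATH"
```

Adding the `export PATH=...` line to your `~/.bash_profile` makes the installed programs available in future sessions.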
CCS ACS does not install user software. For global installations on Pegasus, submit a Software Request via the CCS Portal (below).
How do I request global software installation on Pegasus?¶
Request new global software via the CCS Portal: https://portal.ccs.miami.edu/resources/soft/new
We install software globally only when we receive multiple requests for it.
When will my global software request be approved/installed?¶
Software requests are approved once a minimum of 20 users require the software. Requests are reviewed, and approved software is installed, quarterly.
How can I increase Java memory on Pegasus?¶
Load the java module, then change the value of _JAVA_OPTIONS.
[username@pegasus ~]$ module load java
[username@pegasus ~]$ echo $_JAVA_OPTIONS
-Xmx512m
[username@pegasus ~]$ export _JAVA_OPTIONS="-Xmx4g"
Pegasus Job Scheduling¶
May I run resource-intensive jobs on Pegasus login nodes?¶
No. Resource-intensive jobs must be submitted to LSF, the Pegasus job scheduler; jobs running on login nodes may be killed.
Is there a limit on how many jobs I can run?¶
No. Users are limited by the number of CPUs in simultaneous use, not by the number of jobs: individual users may run on up to 512 CPUs at a time, and projects on up to 1000 CPUs at a time.
Why is my job still pending?¶
Jobs wait in the PENDING state until enough resources are available to satisfy their requirements. When the cluster is under heavy user load, jobs wait longer. View a job's PENDING REASONS with:
$ bjobs -l jobID
Check that your resource requirements are accurate and feasible.
The Pegasus job scheduler operates under Fairshare scheduling. Fairshare scheduling divides the processing power of the cluster among users and queues to provide fair access to resources, so that no user or queue can monopolize the resources of the cluster and no queue will be starved.
If your job has been pending for more than 24 hours and is not requesting exclusive access or all cores on a node, you may e-mail email@example.com for assistance.
Are other users’ pending jobs slowing my job?¶
No. The number of pending jobs is irrelevant to job performance in LSF. The scheduler can handle hundreds of thousands of jobs.
How do I submit an interactive X11 job?¶
Use the bsub flags -Is (interactive job with a pseudo-terminal), -q interactive (the interactive queue), and -XF (X11 forwarding). The application name below is illustrative:
$ bsub -Is -q interactive -XF myapplication
See LSF interactive jobs for more information.
Why was my job killed?¶
Jobs are killed to protect the cluster and preserve system performance.
Common reasons include:
- running on a login node
- using more memory than reserved
- using all the memory on a compute node
- using more CPUs than reserved
- needing more time to complete than reserved
- using more /tmp space than available on compute nodes
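Most of these causes can be avoided by reserving adequate resources at submission time. A minimal sketch of an LSF job script follows; the job name, queue, and limits are assumptions, so adjust them for your project and check available queues with bqueues:

```shell
#!/bin/bash
#BSUB -J myjob                 # job name (illustrative)
#BSUB -q general               # queue name (assumption; verify with bqueues)
#BSUB -n 4                     # reserve 4 CPU cores
#BSUB -R "rusage[mem=2000]"    # reserve memory, in MB
#BSUB -W 2:00                  # wall-clock limit (hh:mm)
#BSUB -o myjob.%J.out          # standard output file

# Replace the lines below with your actual program:
result="job resources reserved"
echo "$result"
```

Submit the script with bsub < myjob.job so LSF reads the #BSUB directives.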
What about jobs in UNKWN state?¶
Re-queue your job in LSF:
$ bkill -r jobID
$ bkill -r jobID (run a second time)
$ brequeue -e jobID