Pegasus FAQs - Frequently Asked Questions¶
Detailed information for FAQ topics is available here and in IDSC ACS Policies
If you are new to Pegasus and HPC clusters, review this documentation on the Pegasus system, the job scheduler, and modularized software.
IDSC ACS does not install, provide support for, or provide documentation on how to code in your preferred software. ACS documentation contains information on using software in a Linux cluster environment.
How do I join a project?¶
Contact the project owner.
How do I request a new project?¶
Any PI or faculty member may request a new project : https://idsc.miami.edu/project_request
When will my project be created?¶
When the allocations committee has reviewed and approved it.
Scratch requests over 2TB can take a month for the allocations committee to review as availability is limited.
How can I manage my Projects and Groups?¶
Contact IDSC ACS at email@example.com
What software is available?¶
Software Modules from the command line:
$ module avail
How do I view my currently loaded modules?¶
$ module list
How do I use software modules?¶
May I install software?¶
Yes! Pegasus users are free to compile and install software in their respective home directories by following the software’s source code or local installation instructions. See our Software Installation guide for more information.
IDSC ACS does not install user software. For global installations on Pegasus, submit a Software Request to firstname.lastname@example.org
How do I request global software installation on Pegasus?¶
Submit your request to email@example.com
We only globally install software when we receive multiple requests for the software.
When will my global software request be approved/installed?¶
When a minimum of 20 users require it, software requests will be approved. Software requests are reviewed and installed quarterly.
How can I increase Java memory on Pegasus?¶
Load the java module, then change the value of _JAVA_OPTIONS.
[username@pegasus ~]$ module load java [username@pegasus ~]$ echo $_JAVA_OPTIONS -Xmx512m [username@pegasus ~]$ export _JAVA_OPTIONS="-Xmx4g"
Pegasus Job Scheduling¶
May I run resource-intensive jobs on Pegasus login nodes?¶
No. Resource-intensive jobs must be submitted to LSF.
Is there a limit on how many jobs I can run?¶
No. Users are limited by number of simultaneous CPUs used. Individual users can run on up to 512 CPUs at a time, projects on up to 1000 CPUs at a time.
Why is my job still pending?¶
Jobs wait for enough resources to satisfy requirements. When the cluster
is under heavy user load, jobs will wait longer. Use
$ bjobs -l jobID to see PENDING REASONS. Check your resource
requirements for accuracy and feasibility.
The Pegasus job scheduler operates under Fairshare scheduling. Fairshare scheduling divides the processing power of the cluster among users and queues to provide fair access to resources, so that no user or queue can monopolize the resources of the cluster and no queue will be starved.
If your job has been pending for more than 24 hours and is not requesting exclusive access or all cores on a node, you may e-mail firstname.lastname@example.org for assistance.
Are other users’ pending jobs slowing my job?¶
No. The number of pending jobs is irrelevant to job performance in LSF. The scheduler can handle hundreds of thousands of jobs.
How do I submit an interactive job?¶
-Is -q interactive flags : LSF interactive jobs
How do I submit an interactive X11 job?¶
-Is -q interactive -XF flags : LSF interactive jobs
Why was my job killed?¶
Jobs are killed to protect the cluster and preserve system performance.
Common reasons include:
- running on a login node
- using more memory than reserved
- using all the memory on a compute node
- using more CPUs than reserved
- needing more time to complete than reserved
- using more
/tmpspace than available on compute nodes
See LSF for assistance with appropriate resource reservations and Pegasus Queues for default wall times.
What about jobs in UNKWN state?¶
Re-queue your job in LSF :
$ bkill -r jobID
$ bkill -r jobID(a second time)
$ brequeue -e jobID