Bunya User Guide 2022 12 06
General HPC training is available via the RCC and QCIF Training resources.
To get a basic understanding of what you need to be aware of when using HPC for your
research, please watch the following videos:
For UQ users and QCIF users with a QRIScloud collection, please also watch:
General overview of Q RDM
Q RDM on HPC
What is changing
Hardware:
• Bunya currently has around 6000 cores, with 96 physical cores per node
o (2 x 48-core CPUs per node).
• These CPUs are 3rd generation AMD EPYC (Milan). They are not Intel CPUs as was the
case with FlashLite and Tinaroo.
• These CPU cores are based on the industry standard x86_64 architecture.
• Each standard Bunya node has 2TB of RAM.
• There are also 3 high memory nodes that each have 4TB of RAM.
Resource Scheduler:
• Bunya uses the Slurm scheduler and batch queue system which is different to the
PBS scheduler and batch queue used on FlashLite and Tinaroo. Users will not be able
to reuse their PBS scripts from Tinaroo/FlashLite but will have to change to Slurm
scripts.
• Bunya is currently CPU only for the standard user. The standard queues do not have
GPU hardware resources associated with them yet.
Software:
• Software is still available via the module system as it was on FlashLite and Tinaroo.
• Bunya will, however, have different software and versions installed than Tinaroo,
FlashLite or Awoonga did.
• Users should use module avail to check which software packages and versions are
installed.
• Users who install their own software are required to recompile their software for
Bunya.
• Locations for data: /home, /scratch/user, /scratch/project and /QRISdata remain the
same.
• /RDS has been retired (it was set up as a link to /QRISdata so for users accustomed
to /RDS it is just the name that is changing) and users are now required to use
/QRISdata.
• Users will see the same data in /home, /scratch/user, /scratch/project and /QRISdata
on Tinaroo/FlashLite and Bunya. There will be no need for users to transfer any data
from Tinaroo/FlashLite before using Bunya.
Guide
Connecting
Set 1 of the Training resources explains how to use PuTTY to connect to an HPC, with the basics
found here. To connect to Bunya please use:
Hostname: bunya3.rcc.uq.edu.au
Port: 22
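Users on Linux, macOS, or a recent Windows terminal can instead connect with ssh, for
example (replace YourUsername with your actual username):
ssh [email protected]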
File Transfer
The basic use of FileZilla for file and data transfer is shown here.
If you experience problems with disconnection, then try this: Go to Edit -> Settings and
change the number under “Timeout” from 20 seconds to 120 or more.
With MFA you need to use an interactive session in FileZilla to connect. Click on the icon
directly under “File” (top left corner), which should bring up the Site Manager window. Then
select Interactive from the Logon Type drop-down menu.
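Command line transfers with scp or sftp to the same hostname should also work; the MFA
prompt is answered interactively. For example (the paths shown are only illustrative):
scp myfile.txt [email protected]:/scratch/user/YourUsername/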
Software
The training resources have a short video on how to use software modules to load installed
software on HPC.
Bunya uses EasyBuild to build and install software and modules. Modules on Bunya are self-
contained which means users do not need to load any dependencies for the module to
work. This is similar to how modules worked on Tinaroo and FlashLite but different to
Wiener.
Using module avail will show only the main software modules installed. It will not show
all the different dependency modules that are also available. To show ALL modules, including
hidden modules, use:
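module --show_hidden avail
(The --show_hidden flag is the usual way of listing hidden modules on an Lmod based module
system; check module --help if it is not accepted.)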
EasyBuild recipes can be found for a very wide range of software. Some might need
tweaking for newer versions, but this is often relatively easy. You can also write your own.
Users can build into their own home directory while using all existing software and software
toolchains that are already available. Users need to load the EasyBuild module first:
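module load EasyBuild
(The exact module name and version available can be checked with module avail.)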
For example, if you create a folder called EasyBuild in your home directory and have a recipe
located in this directory you can build the software via this command.
eb --prefix=/home/YourUsername/EasyBuild \
   --installpath=/home/YourUsername/EasyBuild \
   --buildpath=/home/YourUsername/EasyBuild/build \
   --robot=/home/YourUsername/EasyBuild \
   ./EasyBuild-recipe-file.eb
If you add the -D option, it will do a dry run first. Please use eb -H to get the help manual.
Users who have a working EasyBuild recipe, and have tested that the software installed this
way works on Bunya, can offer the recipe to be uploaded to the cluster-wide installed
software; it will then be available via modules.
Users are reminded that no calculation, no matter how quick or small, should be run on the
login nodes. So no, the quick python or R or bash script or similar should NOT be just quickly
run from the command line as it is so much more convenient. All calculations are required
to be done on the compute nodes.
Users can use interactive jobs which will give them that command line feel and flexibility
and allow the use of graphical user interfaces.
Users have access to a debug queue for quick testing of new jobs and codes etc.
Interactive jobs
Users should use interactive jobs to do quick testing and if they need to use a graphical user
interface (GUI) to run their calculations. This could include Jupyter, Spyder, etc. salloc is
used to submit an interactive job and you should specify the required resources via the
command line:
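For example, a minimal interactive session in the general partition could look like this (the
resource values here are only placeholders to adjust to your job; replace AccountString with
your own):
salloc --nodes=1 --ntasks-per-node=1 --cpus-per-task=1 --mem=5G --job-name=Interactive --time=01:00:00 --partition=general --account=AccountString srun --pty /bin/bash -l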
Please use --partition=general unless you have been given permission to use ai or
gpu.
For an interactive session on the gpu or ai nodes you will need to add
--gres=gpu:[number] to the salloc request. For the gpu partition you will need to
specify which type of GPU you are requesting, as there are both AMD and NVIDIA GPUs. See
below for more information.
This will log you onto a node. To run a job just type as you would usually do on the
command line. As srun was already used in the above command there is no need to use
srun to run your executables, it will just mess things up.
Once you are finished, type
exit
on the command line, which will stop any processes still running and will release the
allocation for the job.
At the moment there are issues with testing MPI jobs through an interactive session.
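One workaround is to request the allocation only, without starting a shell on a compute node,
for example (the values are again only placeholders):
salloc --nodes=1 --ntasks-per-node=96 --cpus-per-task=1 --mem=5G --job-name=MPI-Test --time=01:00:00 --partition=general --account=AccountString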
This will give you a new shell and an allocation, but you are still on the login node. You can
now use srun to actually start a job on a node.
1) Load all the modules you need
2) export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi2.so
3) export SLURM_MPI_TYPE=pmi2
4) srun --ntasks=[number up to 96] --export=ALL executable < input > output
You can use any number of cores you need, up to the full 96 you requested via salloc. You
need the --export=ALL to export the environment, with the loaded modules and the pmi2
settings, to the job. This will only work for the general and debug partitions. For the GPU
partitions you might have to do some testing and provide a longer list of what needs to be
exported.
This will start the job. Once it has finished or crashed you get your prompt back, but you are
still in the salloc allocation, so you are able to submit more work under that allocation. To
exit and release the job allocation type exit.
Slurm scripts
Users should keep in mind that Bunya has 96 cores per node. 96 cores, or --cpus-per-task=96, is
therefore the maximum a multi-core job can request on a single node. Please note that not all
calculations scale well with core count, so before requesting all 96 cores do some testing first.
Users with MPI jobs should run in multiples of nodes, so in multiples of 96 cores. This means
the calculation needs to scale well to such numbers of cores. Most will not, so do some
testing first!
The Pawsey Centre has an excellent guide on how to migrate from PBS to Slurm. The
Pawsey Centre also provides a good general overview of job scheduling with Slurm and
example workflows like array jobs.
Below are examples for single core, single node but multiple cores, MPI, and array job
submission scripts. The different request flags mean the following:
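--nodes : the number of nodes requested
--ntasks-per-node : the number of tasks (for example MPI processes) started per node
--cpus-per-task : the number of cores given to each task (for multithreaded jobs)
--mem : the memory requested per node
--job-name : the name of the job as shown in the queue
--time : the maximum walltime the job may run
--partition : the partition (queue) to run in, for example general, debug, gpu or ai
--account : your AccountString
--gres : generic resources, used here to request GPUs
-o / -e : the files the standard output and standard error are written to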
See
man sbatch
and
man srun
for more options (use arrow keys to scroll up and down and q to quit)
Please note: The default partition is debug, which will give you a bare minimum of
resources. For example, the maximum walltime in the debug partition is 30 minutes. Most
users will want to run in the general partition. Important: the Slurm defaults are usually
not sufficient for most user jobs. If you want appropriate resources, you are required to
request them.
Please note: using the SBATCH options -o and -e in a script will result in the standard error
and standard output files appearing as soon as the job starts to run. This behaviour is
different to standard PBS behaviour on Tinaroo and FlashLite (unless you specified paths for
those files there too), where the standard error, .e, and standard output, .o, files only
appeared when the job had finished or crashed.
Please note: In Slurm your job will start in the directory/folder you submitted it from. This is
different to PBS behaviour on Tinaroo/FlashLite, where your job started in your home
directory. So on Bunya, using Slurm, there is no need to change into the job directory, unless
it is different from the directory you submitted from.
Please note: There is currently no equivalent to the $TMPDIR that was available on FlashLite
and Tinaroo. Until this has been set up users are required to use their /scratch/user
directory for temporary files. RCC is working to set up a large and fast space for temporary
files which will accommodate similar loads as was possible on FlashLite, if not more.
Accounting has now been switched on and will be enforced. Users cannot run jobs without
a valid AccountString. All valid AccountStrings start with “a_” and are all lower case
letters. If you do not have a valid AccountString then please contact your supervisor.
AccountStrings and access are managed by research groups and group leaders. Groups
who wish to use Bunya are required to apply to set up a group with a valid AccountString.
Only group leaders can apply to set up such a group. A PhD student or postdoc without
their own funding and group should not apply. Applications can be made by contacting
[email protected].
Simple script for AI GPUs. Nodes bun003, bun004, and bun005. AI GPUs are restricted to a
specific set of users. If you have not been given explicit permission do not use these.
Only certain AccountStrings have access to these GPUs. If you should have access but cannot
run a job, please contact your supervisor.
#!/bin/bash --login
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=10G
#SBATCH --job-name=Test
#SBATCH --time=1:00:00
#SBATCH --partition=ai
#SBATCH --account=AccountString
#SBATCH --gres=gpu:1 #you can ask for up to 3 here
#SBATCH -o slurm.output
#SBATCH -e slurm.error
module-loads-go-here
Simple script for AMD GPUs. Nodes bun001 and bun002. The AMD GPUs are restricted to
a specific set of users. If you have not been given explicit permission do not use these.
#!/bin/bash --login
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=10G
#SBATCH --job-name=Test
#SBATCH --time=1:00:00
#SBATCH --partition=gpu
#SBATCH --account=AccountString
#SBATCH --gres=gpu:mi210:1 #you can ask for up to 2 here
#SBATCH -o slurm.output
#SBATCH -e slurm.error
module-loads-go-here
Simple script for A100 GPUs. Node bun068. These are not set up yet, so please do not try
to use these.
#!/bin/bash --login
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=10G
#SBATCH --job-name=Test
#SBATCH --time=1:00:00
#SBATCH --partition=gpu
#SBATCH --account=AccountString
#SBATCH --gres=gpu:a100:1 #you can ask for up to 2 here
#SBATCH -o slurm.output
#SBATCH -e slurm.error
module-loads-go-here
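Simple script for a single core job in the general partition.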
#!/bin/bash --login
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=10G
#SBATCH --job-name=Test
#SBATCH --time=1:00:00
#SBATCH --partition=general
#SBATCH --account=AccountString
#SBATCH -o slurm.output
#SBATCH -e slurm.error
module-loads-go-here
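Simple script for a single node, multi core (multithreaded) job in the general partition. This is
only a sketch; the core and memory numbers below are examples and should be matched to
your own calculation.
#!/bin/bash --login
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=12
#SBATCH --mem=50G
#SBATCH --job-name=Threaded-Test
#SBATCH --time=1:00:00
#SBATCH --partition=general
#SBATCH --account=AccountString
#SBATCH -o slurm.output
#SBATCH -e slurm.error
module-loads-go-here
Simple script for an MPI job using two full nodes (2 x 96 cores) in the general partition.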
#!/bin/bash --login
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=96
#SBATCH --cpus-per-task=1
#SBATCH --mem=5G
#SBATCH --job-name=MPI-Test
#SBATCH --time=1:00:00
#SBATCH --partition=general
#SBATCH --account=AccountString
#SBATCH -o slurm.output
#SBATCH -e slurm.error
module-loads-go-here
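Once saved to a file, a script like the above is submitted with sbatch, for example (the file
name is only illustrative):
sbatch my_job.slurm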
Job Arrays
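Simple script for a job array in the general partition. The --array=1-5 request below starts
five copies of the job, each with its own value of the array index.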
#!/bin/bash --login
#SBATCH --job-name=testarray
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=5G
#SBATCH --time=00:01:00
#SBATCH --account=AccountString
#SBATCH --partition=general
#SBATCH --output=test_array_%A_%a.out
#SBATCH --array=1-5
module-loads-go-here
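Within the script, the array index is available as $SLURM_ARRAY_TASK_ID, and %A and %a in
the output file name expand to the job ID and the array index. A common pattern is to use the
index to pick an input file, for example (the file names are only illustrative):
srun executable input_${SLURM_ARRAY_TASK_ID}.dat > output_${SLURM_ARRAY_TASK_ID}.log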
squeue is used to check on jobs in the queue. Here are some useful format options for the
squeue command. For information on what all these fields mean, please consult the man pages.
squeue -o "%.18i %.9P %.8j %.8u %.8T %.10M %.9l %.6D %.10a %.4c %R"
squeue -o"%.7i %.9P %.8j %.8u %.2t %.10M %.6D %C"
sinfo is used to obtain information about the actual nodes. Here are some useful examples.
sinfo -o "%n %e %m %a %c %C"
sinfo -O Partition,NodeList,Nodes,Gres,CPUs
sinfo -o "%.P %.5a %.10l %.6D %.6t %N %.C %.E %.g %.G %.m"