PARAM Ganga User Manual 27 03 24
User’s Manual
Ver. 4.1
Last updated: March 27, 2024
www.cdac.in
PARAM Ganga – User’s Manual
Copyright Notice
Copyright © 2022 Centre for Development of Advanced Computing
All Rights Reserved.
Any technical documentation that is made available by C-DAC (Centre for Development of Advanced
Computing) is the copyrighted work of C-DAC and is owned by C-DAC. This technical documentation is
being delivered to you as is, and C-DAC makes no warranty as to its accuracy or use. Any use of the technical
documentation or the information contained therein is at the risk of the user. C-DAC reserves the right to make
changes without prior notice.
No part of this publication may be copied without the express written permission of C-DAC.
Trademarks
CDAC, CDAC logo, NSM logo are trademarks or registered trademarks.
Other brands and product names mentioned in this manual may be trademarks or registered trademarks of their
respective companies and are hereby acknowledged.
Intended Audience
This document is meant for PARAM Ganga users.
Typographic Conventions
Symbol Meaning
Getting help
DISCLAIMER
The information contained in this document is subject to change without notice. C-DAC shall not be liable for
errors contained herein or for incidental or consequential damages in connection with the performance or use
of this manual.
Contents
Introduction
System Architecture and Configuration
  System Hardware Specifications
  Master Nodes
  Login Nodes
  Service Nodes
  CPU Compute Nodes
  GPU Compute Nodes
  Storage
  Operating System
  PARAM Ganga Architecture Diagram
  Primary Interconnection Network
  Secondary Interconnection Network
  Software Stack
First Things First
  Getting an Account on PARAM Ganga
  First login
  Forgot Password?
  System Access
  Remote Access
  Transferring files between local machine and HPC cluster
  Tools
Running Interactive Jobs
Managing Jobs through its Lifecycle
  walltime
  List Partition
Addressing Basic Security Concerns
More about Batch Jobs (SLURM)
  Parameters used in SLURM job script
  I am familiar with PBS/ TORQUE. How do I migrate to SLURM?
Preparing Your Own Executable
Spack
  Introduction
  To Use Pre-Installed Applications from Spack
  To install new application
  Uninstalling Packages
  Using Environments
  Packaging (For Application developers)
  Sample SLURM script for OpenMP applications/programs to use Spack
  Sample SLURM script for MPI applications/programs to use Spack
Job Scheduling on PARAM Ganga
  Scheduler
  sinfo
  PARAM Ganga SLURM Partitions and QoS
  For Submitting the job
  walltime
Debugging Your Codes
  Introduction
  Basics How Tos
  Conclusions
  Points to Note
  Overall Coding Modifications Done
Machine Learning / Deep Learning Application Development
  How to Install your own Software?
Some Important Facts
  About File Size
  Little-Endian and Big-Endian issues?
Best Practices for HPC
Installed Applications/Libraries
  Standard Application Programs on PARAM Ganga
  LAMMPS Applications
  GROMACS APPLICATION
Acknowledging the National Supercomputing Mission in Publications
Getting Help – PARAM Ganga Support
  Steps to Create a New Ticket
List of Figures
Figure 1 - PARAM Ganga Architecture Diagram
Figure 2 – Software Stack
Figure 3 - A snapshot of command using MobaXterm
Figure 4 - A snapshot of "scp" tool to transfer file to and from remote computer.
Figure 5 – Enter Captcha/String
Figure 6 - Output of sinfo command
Figure 7 – Snapshot depicting the usage of “Job Array”
Figure 8 – scontrol show node displays compute node information
Figure 9 – scontrol show partition displays specific partition details
Figure 10 – scontrol show job displays specific job information
Figure 11 – sinfo Command
Figure 12 - Listing the shares of association to a cluster
Figure 13 – Snapshot of debugging process
Figure 14 – Snapshot of debugging process
Figure 15 - Output at a debugging stage
Figure 16 – Snapshot of debugging process
Figure 17 – Output depicting “Arithmetic Exception”
Figure 18 – Snapshot of debugging process
Figure 19 – Well, we dumped core!!
Figure 20 - Snapshot of debugging process
Figure 21 – Setting Breakpoint
Figure 22 – single stepping through to catch error!!
Figure 23 – Debugging continued
Figure 24 – Debugging continued
Figure 25 – Setting a watch point
Figure 26 – Debugging continued
Figure 27 – Well, back to square one!!
Figure 28 – Again Dumping Core!! Things are getting interesting or frustrating or both !!
Figure 29 – Debugging continued
Figure 30 – Debugging continued
Figure 31 – Debugging continued (Will it ever end?)
Figure 32 – We are almost there!!
Figure 33 – Debugging continued
Figure 34 – At last a clue!!!
Figure 35 - Correction applied!!
Figure 36 – Resolved!!!
Introduction
This document is the user manual for the PARAM Ganga supercomputing facility at IIT
Roorkee. It covers a wide range of topics, from a detailed description of the hardware
infrastructure to the information required to use the supercomputer, such as logging on,
submitting jobs, and retrieving results to the user’s laptop or desktop. In short, the manual
describes all that one needs to know to use PARAM Ganga effectively.
Master Nodes
PARAM Ganga is an aggregation of a large number of computers connected through
networks. The basic purpose of the master node is to manage and monitor each of the
constituent components of PARAM Ganga from a system’s perspective. This involves
operations like monitoring the health of the components, the load on the components, and the
utilization of various sub-components of the computers in PARAM Ganga.
Master Nodes: 2
Storage
● Based on Lustre parallel file system
● Total useable capacity of 2.2PB Primary storage
● Throughput 50 GB/s
Operating System
● Operating system on PARAM Ganga is Linux – CentOS 7.9
Network infrastructure
A robust network infrastructure is essential to implement the basic functionalities of a cluster.
These functionalities are:
c) Ensuring fast I/O operations like connecting to other clusters, connecting the
cluster to various users on the campus LAN, etc. (Network/portion of Network
which implements this functionality is referred to as I/O Fabric).
d) Ensuring high-bandwidth, low-latency communication amongst processors for
achieving high scalability (Network/portion of Network which implements this
functionality is referred to as Message Passing Fabric).
Software Stack
A software stack is an aggregation of software components that work in tandem to accomplish
a given task. The task can be to help a user execute their jobs or to help a system
administrator manage a system efficiently. In effect, the software stack has all the necessary
components to accomplish a given task. There may be multiple components of different
flavors for a given sub-task, and the user or administrator may mix and match these
components as they choose. Typically, a user is interested in preparing an executable,
running it with their data sets, and visualizing the output it generates.
To do so, the user needs to compile the code, link it with communication libraries, math
libraries, and numerical algorithm libraries, prepare the executable, run it with the desired
data sets, monitor the progress of the jobs, gather the results, and visualize the output.
Typically, a system administrator is interested in ensuring that all the resources are
optimally utilized. For this, they may need installation tools, tools for checking the health
of all the components, good schedulers, and tools to allocate resources to users and monitor
the usage of those resources.
The software stack provided with this system has a gamut of software components that meet
all the requirements of a user and of a system administrator. The components of the
software stack are depicted in Figure 2.
Amongst these, C-CHAKSHU has been recently developed and deployed by CDAC. We
solicit your feedback on these tools at [email protected].
First login
When a newly created user logs in to PARAM Ganga for the first time with the user ID and
the temporary, system-generated password provided by email through PARAM Ganga
support, they are prompted to create a new password of their choice, which replaces the
temporary password. This helps keep your account secure. It is recommended that you
choose a strong password that combines letters (lower case and upper case), numbers, and
a few special characters, and that you can easily remember.
Given next is a screenshot that describes the scenario for “first login”
Your password is valid for 90 days. Once the 90-day period expires, you will be prompted
to change your password when you next attempt to log in, and you must provide a new
password.
Forgot Password?
There is no need to panic. Please raise a ticket regarding this issue and the system
administrators will resolve your problem. Please refer to the section “Getting Help – PARAM
Ganga Support”, described elsewhere in this manual, and use the GUI-based, user-friendly
ticketing system. Follow the steps given below:
1. Open the PARAM Ganga support site, i.e., the ticketing tool, by following the link
https://paramganga.iitr.ac.in/support
2. Log in with your registered email ID, complete name, and contact number.
3. Raise a ticket to get the password reset.
4. The system administrator will reply with an email for verification.
5. Once you acknowledge it, the password is reset and an email is sent to inform you.
6. You can then log in with the temporary password and set a new password of
your choice.
System Access
Accessing the cluster
The cluster can be accessed through 10 general login nodes, which allow users to log in.
Remote Access
Using SSH in Windows
To access PARAM Ganga, you need to “ssh” to the login server. PuTTY is the most popular
open source “ssh” client application for Windows; you can download it from
http://www.putty.org/. Once it is installed, find the PuTTY application shortcut in your Start
Menu or on your desktop. On clicking the PuTTY icon, the PuTTY configuration dialog should appear.
Locate the “Host Name or IP Address” input field in the PuTTY configuration screen. Enter
the username along with the IP address or hostname to which you wish to connect.
ssh [username]@[hostname]
For example, to connect to the PARAM Ganga Login Node, with the username
You will be prompted for a password, and then will be connected to the server.
Password
How to change the user password?
Use the passwd command on the login node to change your password.
To store data, each user is given a directory under “/home”. While /home is common to all
users, each user gets their own directory, named after their username, under /home/, where
they can store their data.
However, the storage provided to each user is limited. The limits are defined by quotas on
these directories, and all users are allotted the same quota by default.
When users wish to transfer data from their local system (laptop/desktop) to the HPC
system, they can use various methods and tools.
Users of the Windows operating system can use methods and tools that are native to
Microsoft Windows, as well as tools that can be installed on a Windows machine.
Linux users do not require any additional tool. They can just use the “scp” command
in their terminal, as mentioned below.
Users are advised to keep a copy of their data by transferring it from PARAM Ganga to
their local system (laptop/desktop) once the project or research work is completed.
The command shown below can be used for file transfers (in all the tools):
Example:
The same command can be used to transfer data from the HPC system to your local system
(laptop/desktop).
Example:
scp -r <path to directory on HPC> <your username>@<IP of local system>:<path to the local data directory>
Note: The Local system (laptop/desktop) should be connected to the network with which it
can access the HPC system.
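As a rough sketch of both transfer directions, the helper functions below only print the scp command lines so that they can be reviewed before being run. The host name hpc-login.example.org and the function names upload_cmd and download_cmd are hypothetical placeholders, not values from this manual; user1 and Wrf-2.0 follow the examples used in this section. The download form assumes the command is typed on the local machine and pulls from the HPC system.

```shell
# Hypothetical placeholders: replace with your own username and the real login host.
HPC_USER="user1"
HPC_HOST="hpc-login.example.org"

# Print the command that copies a local directory to the HPC system.
upload_cmd() {
    printf 'scp -r %s %s@%s:%s\n' "$1" "$HPC_USER" "$HPC_HOST" "$2"
}

# Print the command that copies a directory from the HPC system to the local machine.
download_cmd() {
    printf 'scp -r %s@%s:%s %s\n' "$HPC_USER" "$HPC_HOST" "$1" "$2"
}

upload_cmd "MyData/Wrf-2.0" "/home/user1/"
download_cmd "/home/user1/Wrf-2.0" "MyData/"
```

Once a printed line looks right, it can be pasted into the terminal. Paths that contain spaces would need quoting, which this sketch does not handle.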
To reiterate,
To copy a local directory from your Linux system (say Wrf-2.0) to your home directory in
your PARAM Ganga HPC account, the procedure is:
user1@mylaptop:~$ cd ~/MyData/
2. In the parent directory, type ls (and press the Enter key) and check that Wrf-2.0 is listed.
[user1@login:~]$
Tools
MobaXterm (Windows installable application):
It is a freely available third-party tool which can be used to access the HPC system and
transfer files to the PARAM Ganga system from your local system (laptop/desktop).
Figure 4 - A snapshot of "scp" tool to transfer file to and from remote computer.
Note: Port Used for SFTP connection is 4422 and not 22. Please change it to 4422
In general, jobs can be run interactively or in batch mode. You can run an
interactive job as follows:
The following command asks for a single core for one hour with the default amount of memory.
The command prompt will appear as soon as the job starts. This is how it looks once the
interactive job starts:
srun: job xxxxx queued and waiting for resources
srun: job xxxxx has been allocated resources
Exit the bash shell to end the job. The job will also be aborted if you exceed the time or
memory limits.
Please note that PARAM Ganga is NOT meant for executing interactive jobs. However, an
interactive job can be used to quickly check that a job runs successfully before submitting a
large batch job (with large iteration counts). It can even be used for running small
jobs. Keep in mind that others will be using the same node, so it is
prudent not to inconvenience them by running large jobs.
It is a good idea to specify the CPU account name as well (if you face any problems).
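A typical Slurm request of this kind (one core, one hour, an interactive bash shell) might look like the sketch below. The helper name interactive_cmd is hypothetical, and the function only prints the srun line instead of running it; the options shown (--nodes, --ntasks, --time, --pty) are standard srun options, but the exact command recommended for PARAM Ganga may differ.

```shell
# Hypothetical helper: print an srun command line for an interactive shell.
# Usage: interactive_cmd [cores] [hh:mm:ss]
interactive_cmd() {
    cores=${1:-1}
    limit=${2:-01:00:00}
    printf 'srun --nodes=1 --ntasks=%s --time=%s --pty bash -i\n' "$cores" "$limit"
}

# One core for one hour.
interactive_cmd 1 01:00:00
```

On the login node the printed line can be run as is. An account can be named by adding the standard --account=<name> option to it.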
PARAM Ganga extensively uses modules. The purpose of a module is to provide the
production environment for a given application, outside of the application itself. A module also
specifies which version of the application is available for a given session. All applications
and libraries are made available through module files. A user has to load the appropriate
module from the available modules. Users can also add a module load command to their
~/.bashrc if they do not want to load a particular module file each time they log in.
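One way to add a module to a startup file without creating duplicate entries is sketched below. The function name add_module_to_rc is hypothetical, the module name is only an illustration taken from a script elsewhere in this manual, and the demonstration writes to a scratch file rather than the real ~/.bashrc.

```shell
# Append "module load <name>" to a shell startup file unless the exact line is already present.
add_module_to_rc() {
    rc_file=$1
    module_name=$2
    line="module load $module_name"
    # -q quiet, -x match the whole line, -F treat the pattern as a fixed string
    if ! grep -qxF -e "$line" "$rc_file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$rc_file"
    fi
}

# Demonstrate on a scratch file rather than the real ~/.bashrc.
demo_rc=$(mktemp)
add_module_to_rc "$demo_rc" "compiler/intel/2018.2.199"
add_module_to_rc "$demo_rc" "compiler/intel/2018.2.199"   # second call adds nothing
cat "$demo_rc"
```

To apply it for real, pass "$HOME/.bashrc" as the first argument. The change takes effect at the next login.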
#!/bin/sh
#SBATCH -N 3                          # specifies number of nodes
#SBATCH --ntasks-per-node=48          # specifies cores per node
#SBATCH --time=06:50:20               # specifies maximum duration of run
#SBATCH --job-name=lammps             # specifies job name
#SBATCH --error=job.%J.err_node_48    # specifies error file name
#SBATCH --output=job.%J.out_node_48   # specifies output file name
#SBATCH --partition=small             # specifies queue name
cd $SLURM_SUBMIT_DIR                  # To run job in the directory from where it is submitted
export I_MPI_FABRICS=shm:dapl         # For Intel MPI versions 2019 onwards this value must be shm:ofi
mpiexec.hydra -n $SLURM_NTASKS lammps.exe
walltime
The walltime parameter defines how long your job is allowed to run. The maximum runtime of
a job is governed by the QoS policy. If more than 3 days are required, a special request needs to be
sent to the HPC coordinator and it will be dealt with on a case-by-case basis. The command line
to specify walltime is given below.
Walltime can also be specified as part of the submit scripts described in this manual. If a job
does not complete within the walltime specified in the script, it will be terminated.
The biggest advantage of specifying an appropriate walltime is that the efficiency of scheduling
improves, resulting in improved throughput for all jobs, including yours. You are encouraged
to arrive at the appropriate walltime for your job by executing it a few times.
NOTE: You are requested to explicitly specify the walltime in your command lines and scripts.
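After a few trial runs, the measured runtime can be turned into a walltime request with some headroom. The sketch below is only an illustration: the function name walltime_with_margin and the 20 percent margin are assumptions, not recommendations from this manual. It prints the value in the hh:mm:ss or days-hh:mm:ss forms that Slurm accepts for --time.

```shell
# Convert a measured runtime in seconds into a --time value with a safety margin.
# Usage: walltime_with_margin <seconds> <margin_percent>
walltime_with_margin() {
    total=$(( $1 * (100 + $2) / 100 ))
    days=$(( total / 86400 ))
    hours=$(( (total % 86400) / 3600 ))
    minutes=$(( (total % 3600) / 60 ))
    seconds=$(( total % 60 ))
    if [ "$days" -gt 0 ]; then
        printf '%d-%02d:%02d:%02d\n' "$days" "$hours" "$minutes" "$seconds"
    else
        printf '%02d:%02d:%02d\n' "$hours" "$minutes" "$seconds"
    fi
}

# A trial run that took one hour, with 20 percent headroom.
walltime_with_margin 3600 20
```

The result can be passed on the command line, for example sbatch --time="$(walltime_with_margin 3600 20)" slurm-job.sh, or copied into the #SBATCH --time line of a script.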
List Partition
sinfo displays information about nodes and partitions (queues).
$ sinfo
$ sbatch slurm-job.sh
Submitted batch job 106
Here’s a simple job script. Note that the Slurm -J option is used to give the job a name.
#!/usr/bin/env bash
#SBATCH -p standard
#SBATCH -J simple
sleep 60
Submit the job:
$ sbatch simple.sh
Submitted batch job 149
Now we'll submit another job that's dependent on the previous job. There are many ways
to specify the dependency conditions, but the "singleton" method is the simplest. The
Slurm -d singleton argument tells Slurm not to dispatch this job until all previous jobs
with the same name have completed.
$ sbatch -d singleton simple.sh //may be used for first pre-processing on a core and then submitting
Submitted batch job 150
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
150 standard simple user1 PD 0:00 1 (Dependency)
149 standard simple user1 R 0:17 1 cn001
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
150 standard simple user1 R 0:31 1 cn001
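Besides the singleton method, a dependency can also be tied to a specific job ID with the standard --dependency=afterok:<jobid> option, which starts the second job only after the first has completed successfully. The sketch below shows one way to pick the ID out of the sbatch message. The function name job_id_from_sbatch is hypothetical, and the example works on the message text shown above rather than submitting anything.

```shell
# Extract the numeric job ID from sbatch's "Submitted batch job <id>" message.
job_id_from_sbatch() {
    awk '/^Submitted batch job/ { print $4 }'
}

# Demonstrated on the message format shown in this manual; no job is submitted here.
first_id=$(printf 'Submitted batch job 149\n' | job_id_from_sbatch)
echo "sbatch --dependency=afterok:${first_id} simple.sh"
```

On the cluster, the first line of the demonstration would become first_id=$(sbatch simple.sh | job_id_from_sbatch), and the echoed command would be run directly.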
3. Submit a job with a reservation allocated
Slurm has the ability to reserve resources for jobs being executed by select users and/or
select bank accounts. A resource reservation identifies the resources in that reservation
and a time period during which the reservation is available. The resources which can be
reserved include cores, nodes.
Use the command given below to check the reservation name allocated to your user
account.
If your user account is associated with any reservation, the above command will show
it. For example, suppose the reservation name is user_11. Use the command given
below to make use of this reservation.
-N1 specifies the number of nodes you want to use for your job, for example -N1 for one node
or -N4 for four nodes. Instead of tmp, you can use the example script below.
#! /bin/bash
#SBATCH -N 3
#SBATCH --ntasks-per-node=48
#SBATCH --error=job.%A_%a.err
#SBATCH --output=job.%A_%a.out
#SBATCH --time=01:00:00
#SBATCH --partition=small
module load compiler/intel/2018.2.199
cd /home/guest/Rajneesh/Rajneesh
export OMP_NUM_THREADS=${SLURM_ARRAY_TASK_ID}
/home/guest/Rajneesh/Rajneesh/md_omp
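The script above relies on SLURM_ARRAY_TASK_ID, which Slurm sets to a different value for each task of a job array. Another common use of this variable is to select one input per task. The sketch below assumes a hypothetical list file with one input name per line; the function name input_for_task and the file names are illustrations only.

```shell
# Print the line of a list file that corresponds to the given array task ID (1-based).
input_for_task() {
    list_file=$1
    task_id=$2
    sed -n "${task_id}p" "$list_file"
}

# Scratch list of hypothetical input names, one per line.
input_list=$(mktemp)
printf '%s\n' case_a.dat case_b.dat case_c.dat > "$input_list"

# Outside a job array SLURM_ARRAY_TASK_ID is unset, so fall back to task 1.
task_id=${SLURM_ARRAY_TASK_ID:-1}
echo "Task $task_id would process: $(input_for_task "$input_list" "$task_id")"
```

With three inputs, such a script would be submitted with the standard array option, for example sbatch --array=1-3 followed by the script name.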
List jobs
Jobs on SLURM can be monitored using the squeue command. squeue is used to view
job and job step information for jobs managed by SLURM.
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
106 standard slurm-job user1 R 0:04 1 cn001
scontrol show job - shows detailed information about a specific job or all jobs if no job id is
given.
scontrol update job - changes attributes of submitted job; like time limit, priority (root only)
$ scontrol show job 106
JobId=106 Name=slurm-job.sh
UserId=user1(1001) GroupId=user1(1001)
Priority=4294901717 Account=(null) QOS=normal
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0
RunTime=00:00:07 TimeLimit=14-00:00:0 TimeMin=N/A
SubmitTime=2021-01-26T12:55:02 EligibleTime=2021-01-26T12:55:02
StartTime=2021-01-26T12:55:02 EndTime=Unknown
PreemptTime=None SuspendTime=None SecsPreSuspend=0
Partition=small AllocNode:Sid=atom-head1:3526
ReqNodeList=(null) ExcNodeList=(null)
NodeList=cn001
BatchHost=cn001
NumNodes=1 NumCPUs=2 CPUs/Task=1 ReqS:C:T=*:*:* MinCPUsNode=1
MinMemoryNode=0 MinTmpDiskNode=0
Features=(null) Gres=(null) Reservation=(null)
Shared=0 Contiguous=0 Licenses=(null) Network=(null)
Command=/home/user1/slurm/local/slurm-job.sh
WorkDir=/home/user1/slurm/local
Kill a job. Users can kill their own jobs; root can kill any job.
$ scancel 135
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
Hold a job:
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
139 small simple user01 PD 0:00 1 (Dependency)
138 small simple user01 R 0:16 1 cn001
$ scontrol hold 139
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
139 small simple user01 PD 0:00 1 (JobHeldUser)
138 small simple user01 R 0:32 1 cn001
Release a job:
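The standard counterpart of scontrol hold is scontrol release. The sketch below walks through a hold and release cycle for job 139 using a small dry-run wrapper. The run helper and the DRY_RUN variable are hypothetical conveniences, not part of Slurm; with DRY_RUN left at its default of 1 the commands are only printed.

```shell
# Print the command when DRY_RUN=1 (the default here); run it when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

run scontrol hold 139      # keep the pending job from starting
run scontrol release 139   # make it eligible to run again
run squeue -j 139          # check its state
```

Setting DRY_RUN=0 on the login node executes the same three commands for real.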
Your account on PARAM Ganga is private to you. You are responsible for any actions
emanating from your account. You should never share your password with
anyone, including your friends and system administrators.
Please note that, by default, a new account created on PARAM Ganga is readable by everyone
on the system. The following simple commands will make your account adequately safe.
chmod 700 /home/$USER   # ensures that only you can read, write and
                        # execute files in your home directory
chmod 750 /home/$USER   # enables you and the members of your
                        # group to read and execute files in your home
                        # directory
chmod 755 /home/$USER   # enables you, your group members and
                        # everyone else to read and execute files in your
                        # directory
chmod 777 /home/$USER   # enables EVERYONE on the system to read,
                        # write and execute files in your home directory.
                        # This is a sort of ‘free for all’ situation. This
                        # should be used very judiciously
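The effect of these modes can be checked with ls -ld. The sketch below tries two of them on a scratch directory created with mktemp, so the real home directory is not touched; the variable names are illustrative.

```shell
# Work on a scratch directory so the real home directory is left untouched.
demo_dir=$(mktemp -d)

chmod 700 "$demo_dir"
# The first ten characters of the ls -ld output are the file type and permission bits.
mode_private=$(ls -ld "$demo_dir" | cut -c1-10)
echo "700 -> $mode_private"

chmod 750 "$demo_dir"
mode_group=$(ls -ld "$demo_dir" | cut -c1-10)
echo "750 -> $mode_group"
```

The same check on the real home directory is ls -ld followed by the home directory path. A string with no r, w or x in the last six positions means that only the owner has access.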
SLURM (Simple Linux Utility for Resource Management) is a workload manager that
provides a framework for job queues, allocation of compute nodes, and the start and execution
of jobs.
It is important to note:
• Compilations are done on the login node. Only the execution is scheduled via SLURM
on the compute nodes
• Upon Submission of a Job script, each job gets a unique Job Id. This can be obtained
from the ‘squeue’ command.
• The Job Id is also appended to the output and error filenames.
• All examples of modules are for illustration only; please check the cluster for the actual
module names.
#!/bin/bash
#SBATCH -N 4                   # number of nodes
#SBATCH --ntasks-per-node=1    # number of cores per node
#SBATCH --error=job.%J.err     # name of error file
#SBATCH --output=job.%J.out    # name of output file
#SBATCH --time=01:00:00        # time required to execute the program
#SBATCH --partition=small      # specifies queue name
# (standard is the default partition; if you do not specify any partition, the job will be
# submitted using the default partition). For other partitions you can specify hm or gpu.
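Before submitting, it can help to write the script to a file and let bash check its syntax. The sketch below does this with the directives listed above. The scratch file, the srun hostname payload and the variable names are assumptions for illustration, and nothing is submitted to the scheduler.

```shell
# Write a minimal job script to a scratch file (quoted 'EOF' keeps $VARIABLES unexpanded).
job_script=$(mktemp)
cat > "$job_script" <<'EOF'
#!/bin/bash
#SBATCH -N 4
#SBATCH --ntasks-per-node=1
#SBATCH --error=job.%J.err
#SBATCH --output=job.%J.out
#SBATCH --time=01:00:00
#SBATCH --partition=small
cd "$SLURM_SUBMIT_DIR"
srun hostname
EOF

# bash -n parses the script without executing it and reports syntax errors.
bash -n "$job_script" && echo "syntax OK"

# Count the #SBATCH directives as a quick sanity check.
directive_count=$(grep -c '^#SBATCH' "$job_script")
echo "$directive_count directives found"
```

bash -n only checks shell syntax. It does not validate the #SBATCH options themselves, which are checked by sbatch at submission time.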
#!/bin/bash
#SBATCH -N 4                   # Number of nodes
#SBATCH --ntasks-per-node=48   # Number of cores per node
#SBATCH --error=job.%J.err     # Name of error file
#SBATCH --output=job.%J.out    # Name of output file
#SBATCH --time=01:00:00        # Time taken to execute the program
#SBATCH --partition=small      # specifies queue name
# (standard is the default partition; if you do not specify any partition, the job will be
# submitted using the default partition). For other partitions you can specify hm and gpu.
export I_MPI_FABRICS=shm:dapl
export I_MPI_DEBUG=9           # Level of MPI verbosity
cd $SLURM_SUBMIT_DIR
# or
# cd /home/manjuv/LAMMPS_2018COMPILER/lammps-22Aug18/bench
# Command to run LAMMPS in parallel
#!/bin/sh
cd $SLURM_SUBMIT_DIR
The compilations are done on the login node, whereas the execution happens on the compute
nodes via the scheduler (SLURM).
Note: Compilation and execution must be done with the same libraries and matching versions to avoid unexpected
results.
Steps:
The user can copy the directory to his/her home directory and further try compiling and
executing these sample codes. The command for copying is as follows:
cp -r /home/apps/Docs/samples/ ~/.
Compilers
Optimization Flags
Optimization flags are meant for uniprocessor optimization: the compiler tries to
optimize the program according to the selected optimization level. The optimization flags may
also change the precision of the output produced by the executable. The optimization flags can
be explored further on the respective compiler pages. A few examples are given below.
Given next is a brief description of compilation and execution of the various types of
programs. However, for certain bigger applications, loading of additional dependency
libraries might be required.
C Program:
C + OpenMP Program:
C + MPI Program:
C + MKL Program:
CUDA Program:
Setting up of environment:
module load compiler/cuda/10.1 compiler/gcc/7.3.0
Example (1)
Compilation: nvcc -arch=sm_70 <<prog_name.cu>>
Execution: ./a.out
Note: The optimization switch -arch=sm_70 is intended for Volta V100 GPUs and is valid for CUDA 9 and
later. Similarly, older versions of CUDA have compatibility with lower versions of GCC only. Accordingly,
appropriate modules of GCC must be loaded.
Example (2)
Compilation: nvcc -arch=sm_70 /home/apps/Docs/samples/mm_blas.cu -lcublas
Execution: ./a.out
CUDA + OpenMP Program:
Setting up of environment:
module load compiler/cuda/10.1 compiler/gcc/7.3.0
Example (1)
Compilation: nvcc -arch=sm_70 -Xcompiler="-fopenmp" -lgomp /home/apps/Docs/samples/mm_blas_omp.cu -lcublas
Execution: ./a.out
Example (2)
Compilation: g++ -fopenmp /home/apps/Docs/samples/mm_blas_omp.c -I/opt/ohpc/pub/apps/cuda/cuda-10.1/include -L/opt/ohpc/pub/apps/cuda/cuda-10.1/lib64 -lcublas
Execution: ./a.out
OpenACC Program:
Setting up of environment:
module load compiler/pgi/19.10 compiler/cuda/10.1
A sample job submission script is given for each of the sample programs. Upon
completion/termination of the execution, two files (output and error) are generated.
Delete a job
scancel <<job_id>>
Spack
Introduction
Spack automates the download-build-install process for software - including dependencies -
and provides convenient management of versions and build configurations. It is designed to
support multiple versions and configurations of software on a wide variety of platforms and
environments. It is designed for large supercomputing centers, where many users and
application teams share common installations of software on clusters with exotic
architectures, using libraries that do not have a standard ABI. Spack is non-destructive:
installing a new version does not break existing installations, so many configurations can
coexist on the same system.
Getting Started
On the login node command prompt, execute the commands below:
$ module load spack - loads the Spack module and sets up the environment for Spack.
Then source the line below, including the initial dot.
$ . /home/apps/spack/share/spack/setup-env.sh
Spack compilers
Spack manages a list of available compilers on the system, detected automatically from the
user’s PATH variable. The spack compilers command is an alias for the command spack
compiler list.
spack list
The spack list command shows available packages.
The spack list command can also take a query string. Spack automatically adds wildcards to
both ends of the string, or you can add your own wildcards.
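The matching rule described above can be imitated with a few lines of Python. This is only a sketch of the behaviour, not Spack's own code: the function name match_packages and the short package list are chosen for illustration, and the standard-library fnmatch module stands in for whatever Spack uses internally.

```python
import fnmatch
from typing import List


def match_packages(query: str, names: List[str]) -> List[str]:
    """Filter package names the way a list query is described to behave."""
    # A query without wildcards is wrapped in '*' on both ends;
    # a query that already has wildcards is used as given.
    has_wildcard = any(ch in query for ch in "*?")
    pattern = query if has_wildcard else "*" + query + "*"
    return [name for name in names if fnmatch.fnmatchcase(name, pattern)]


# A small illustrative list, not the real package index.
SAMPLE = ["gromacs", "hdf5", "intel-mkl", "openblas", "zlib"]

print(match_packages("blas", SAMPLE))  # substring match
print(match_packages("z*", SAMPLE))    # user-supplied wildcard
```

With a plain query the match is a substring search; with your own wildcard the pattern is anchored to the whole name.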
install
The above command will install gromacs version 2020.5 with blas and cuda support and
without MPI support. For blas there are multiple providers such as OpenBLAS, Intel MKL,
amdblis, and essl; ^intel-mkl tells Spack to use intel-mkl for the blas routines.
Operators in Spack
Uninstalling Packages
Earlier we installed many configurations each of zlib. Now we will go through and uninstall
some of those packages that we didn’t really need.
Using Environments
Spack has an environment feature with which you can group installed software. You can
install software with different versions and dependencies in each environment and
switch the whole set of software in use at once by changing environments. You can create a Spack
environment with the spack env create command. You can create multiple environments by
specifying different environment names.
To activate a created environment, type spack env activate. Adding the -p option will display
the currently activated environment on your console. Then install the software you need into the
activated environment.
You can deactivate the environment with spack env deactivate. To switch to another
environment, type spack env activate to activate it.
Use spack env list to display the list of created Spack environments.
# Copyright 2013-2021 Lawrence Livermore National Security, LLC and other
# Spack Project Developers. See the top-level COPYRIGHT file for details.
#
# SPDX-License-Identifier: (Apache-2.0 OR MIT)
import os
import platform
import sys

import llnl.util.tty as tty
from spack import *


class IiscLinewidth(MakefilePackage):
    """
    Linewidth developed by IISC Bangalore.
    """

    homepage = ""
    # Url for homepage
    url = "file://{0}/linewidth.tar.gz".format(os.getcwd())
    # Url for source code
    manual_download = True
    # If source code is not available in public domain
    version('1',
            sha256='7215f6765e5f5eddfde5f0c67a5bbdef5960607f3e199a609ef5619278ec8a66', preferred=True)
Sample steps taken for creating the linewidth application recipe for Spack
1. Source code
The source code of Linewidth was not available through a public repo like GitHub, so
the os module had to be imported.
os.getcwd() - expects the source tar file to be present in the current working directory.
sha256 - to verify the sha256 checksum we added it to the version clause; as a
placeholder we have given the version as 1.
manual_download = True - tells Spack not to try to download the source code for the package.
name - make sure that the name of the tar file is the same as the one used inside the package recipe.
2. Variant - The user can control the behavior of the application being built through this clause. Ex-
To enable MPI support we have defined it to be true by default.
3. depends_on() - This clause defines all dependencies required to build the given
application.
Ex- In the linewidth example we have used Intel MKL and HDF5.
4. @property - With this decorator we can define some properties for the build system, like
edit, build, install.
5. property build_targets - Defines the logic for building the source for the native platform.
6. property install - Defines the install procedure to be used after building the source code.
Ex- In our example we define the prefix path
#!/bin/bash
#SBATCH --nodes=1
#SBATCH -p hm ##
#SBATCH --exclusive
#SBATCH -t 1:00:00
echo "SLURM_JOBID="$SLURM_JOBID
echo "SLURM_JOB_NODELIST"=$SLURM_JOB_NODELIST
echo "SLURM_NNODES"=$SLURM_NNODES
echo "SLURM_NTASKS"=$SLURM_NTASKS
ulimit -s unlimited
ulimit -c unlimited
export OMP_NUM_THREADS=4 ### Maximum number of threads = number of physical cores
(time <executable_path>)
Scheduler
PARAM Ganga uses Slurm-20.11.08 (open source) as the workload manager for the HPC facility.
Slurm is a batch scheduler widely used on systems in the Top500 list. PARAM Ganga consists of three
types of compute nodes: CPU only (192 GB) nodes, High memory (768 GB) nodes and
Nvidia GPU enabled (192 GB) nodes.
Note: Users have to specify #SBATCH --gres=gpu:1 or #SBATCH --gres=gpu:2 in their job script to use 1 or 2 GPU cards on
GPU nodes
sinfo
This Slurm command is used to view available partition and node information on the cluster.
mini 1 48 05-00:00:00 5 4 40
debug
The debug partition is intended for testing, not for production throughput. Users are limited to
a minimum of 128 cores and a maximum of 256 cores per job. The maximum walltime per job is
1 hour, and only 3 jobs per user (2 running and 1 in queue) are allowed in this
partition. Users may also wish to compile their codes on this partition.
small
This partition has a maximum run time of 72 hours (3 days). It is the default
partition if no partition name is specified in the job script. It is designed to run small
jobs with a minimum of 128 cores and a maximum of 512 cores. At any given instance a
maximum of 35 jobs can be in the running state, and only 3 jobs per user are allowed (4 running
and 5 in queue).
medium
The medium partition has a maximum run time of 24 hours (1 day). This partition requires a
minimum of 526 cores and allows up to 4048 cores per job.
Only 2 jobs can be in the running state in this partition. Each user can submit a maximum of 2
jobs in this partition (2 running).
large
The large partition has the maximum core limit, combining cpu and hm cores.
Users have a walltime limit of 24 hours (1 day), with a minimum of 4056 and a maximum of 10146
cores. Only one job is allowed per user.
highmemory
This partition is particularly for memory intensive jobs. Each high memory (hm) node has
768 GB RAM. A minimum of 24 cores is needed to run a job in this partition. The maximum
walltime is 24 hours (1 day), with a core limit of 3720 cores per job. Users can have 3 running jobs
and 1 in queue for this partition.
gpu
The gpu partition can have up to 10 jobs in the running state, with a limit of minimum 1 gpu and
maximum 40 gpus and a walltime of 24 hrs (1 day). Each user can submit up to 5 jobs (3
jobs running and 2 in queue).
mini
This partition has a maximum run time of 120 hours (5 days). It is designed to run small
jobs with a minimum of 1 core and a maximum of 48 cores. At any given instance a maximum of 4
jobs can be in the running state, and only 5 jobs per user are allowed (4 running and 1 in queue).
walltime
The walltime parameter defines how long your job will run. The maximum runtime allowed for a job
is set by the QoS Policy. If more than 4 days are required, a special request needs to be
sent to the HPC coordinator and it will be dealt with on a case-to-case basis. The command line
to specify walltime is given below.
and also, as part of the submit scripts described in the manual. If a job does not get completed
within the walltime specified in the script, it will get terminated.
The biggest advantage of specifying an appropriate walltime is that the efficiency of scheduling
improves, resulting in improved throughput for all jobs including yours. You are encouraged
to arrive at the appropriate walltime for your job by executing your jobs a few times.
Note: Default wall time is 2 hours, you have to specify wall time if you want the job to run
more than 2 hours.
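As a rough illustration of how a walltime request relates to the 2 hour default, the Python sketch below converts walltime strings into seconds. It assumes only the HH:MM:SS and D-HH:MM:SS forms that appear in this manual (Slurm accepts other forms as well), and the function names are hypothetical.

```python
def walltime_to_seconds(text: str) -> int:
    """Convert 'HH:MM:SS' or 'D-HH:MM:SS' into a number of seconds."""
    days = 0
    if "-" in text:
        # An optional day count comes before the dash, e.g. '05-00:00:00'.
        day_part, text = text.split("-", 1)
        days = int(day_part)
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


# Default walltime stated in this manual.
DEFAULT_WALLTIME = walltime_to_seconds("02:00:00")


def needs_explicit_walltime(requested: str) -> bool:
    """True when the job would outlive the default walltime."""
    return walltime_to_seconds(requested) > DEFAULT_WALLTIME


print(walltime_to_seconds("01:00:00"))
print(needs_explicit_walltime("05-00:00:00"))
```

A job expected to run for three hours, for example, needs its walltime stated explicitly, while a one hour job fits within the default.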
NOTE: You are requested to explicitly specify the walltime in your command lines and scripts.
Per user
• Every user will have a quota of 48G soft limit and 50G hard limit with a grace period
of 7 days in the HOME file system (/home), and 190G soft limit and 200G hard limit
with a grace period of 14 days in the SCRATCH file system
• Users are recommended to copy their execution environment and input files to the scratch
file system (/scratch/<username>) while the job is running and to copy the output data back to the
HOME area
• A file retention policy has been implemented on the Lustre storage for the "/scratch" file system.
As per the policy, any files that have not been accessed for the last 3 months will be deleted
permanently
• Three QoS (Quality of Service) levels are created according to different job sizes and wall
times. Resource limits for users are defined as per the QoS policy below
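To find files that the scratch retention policy may remove, something like the following Python sketch could be used. It is an approximation under stated assumptions: 3 months is taken as 90 days, the check uses the last access time reported by the file system, and the function name stale_files is made up for this example.

```python
import os
import time
from typing import List, Optional


def stale_files(root: str, max_idle_days: int = 90,
                now: Optional[float] = None) -> List[str]:
    """List files under root whose last access is older than max_idle_days."""
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 24 * 3600
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.stat(path).st_atime < cutoff:
                    found.append(path)
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
    return sorted(found)
```

Calling stale_files on your own directory under /scratch returns the candidates, which can then be copied back to the HOME area if they are still needed.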
Scheduling Type
PARAM Ganga has been configured with Slurm’s backfill scheduling policy. It is good for
ensuring higher system utilization; it will start lower priority jobs if doing so does not delay
the expected start time of any higher priority jobs. Since the expected start time of pending
jobs depends upon the expected completion time of running jobs, reasonably accurate time
limits are important for backfill scheduling to work well.
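The backfill condition can be pictured with a deliberately simplified Python sketch. This is not Slurm's actual algorithm: it models a single higher priority job with a known expected start time, and all names and numbers are assumptions made for the illustration.

```python
def can_backfill(free_nodes: int, job_nodes: int, job_time_limit: int,
                 now: int, reserved_start: int) -> bool:
    """Decide whether a lower priority job may start right away.

    Times are in seconds. The job may start only if it fits in the idle
    nodes and its time limit ends before the expected start time of the
    higher priority job, so that job is not delayed.
    """
    fits = job_nodes <= free_nodes
    ends_in_time = now + job_time_limit <= reserved_start
    return fits and ends_in_time


# A 1 hour job fits before a reservation that starts in 2 hours ...
print(can_backfill(free_nodes=4, job_nodes=2, job_time_limit=3600,
                   now=0, reserved_start=7200))
# ... but the same job with a 3 hour time limit does not.
print(can_backfill(free_nodes=4, job_nodes=2, job_time_limit=10800,
                   now=0, reserved_start=7200))
```

The second call shows why an over-estimated time limit hurts: the scheduler has to assume the job runs for the full limit, so the job loses its chance to be backfilled.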
Job Priority
The job's priority at any given time will be a weighted sum of all the factors that have been
enabled in the slurm.conf file. Job priority can be expressed as:
Job_priority =
(PriorityWeightAge) * (age_factor) +
(PriorityWeightFairshare) * (fair-share_factor) +
(PriorityWeightJobSize) * (job_size_factor) +
(PriorityWeightPartition) * (partition_factor) +
(PriorityWeightQOS) * (QOS_factor) +
SUM(TRES_weight_cpu * TRES_factor_cpu,
TRES_weight_<type> * TRES_factor_<type>,
...)
All of the factors in this formula are floating point numbers that range from 0.0 to 1.0. The
weights are unsigned, 32-bit integers. The job's priority is an integer that ranges between 0
and 4294967295. The larger the number, the higher the job will be positioned in the queue,
and the sooner the job will be scheduled. A job's priority, and hence its order in the queue,
can vary over time. For example, the longer a job sits in the queue, the higher its priority will
grow when the age weight is non-zero.
Age Factor: The age factor represents the length of time a job has been sitting in the queue
and eligible to run. Current value for Age factor is 10000.
Job Size Factor: The job size factor correlates to the number of nodes or CPUs the job has
requested. Current value for Job Size factor is 1000.
Partition Factor: Each node partition can be assigned an integer priority. The larger the
number, the greater the job priority will be for jobs that request to run in this partition. Current
value for partition factor is 15000.
Quality of Service (QoS) Factor: Each QoS can be assigned an integer priority. The larger the
number, the greater the job priority will be for jobs that request this QoS. Current value for
QoS factor is 100000.
Fair-share Factor: The fair-share component to a job's priority influences the order in which
a user's queued jobs are scheduled to run based on the portion of the computing resources
they have been allocated and the resources their jobs have already consumed. Current value
for fair-share factor is 100000.
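The weighted sum can be worked through with a short Python sketch. It assumes that the "current values" quoted above are the PriorityWeight settings, leaves out the TRES term, and uses made-up factor values for a single pending job.

```python
# Values quoted in this manual, assumed here to be the PriorityWeight* settings.
WEIGHTS = {
    "age": 10000,
    "fairshare": 100000,
    "job_size": 1000,
    "partition": 15000,
    "qos": 100000,
}


def job_priority(factors: dict) -> int:
    """Weighted sum of the factors, each expected to lie in [0.0, 1.0]."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        factor = factors.get(name, 0.0)
        if not 0.0 <= factor <= 1.0:
            raise ValueError(name + " factor must be between 0.0 and 1.0")
        total += weight * factor
    return int(total)


# Hypothetical factor values for one pending job.
example = {"age": 0.5, "fairshare": 0.25, "job_size": 0.5,
           "partition": 1.0, "qos": 0.0}
print(job_priority(example))
```

For these sample factors the sum is 5000 + 25000 + 500 + 15000 + 0 = 45500. Raising the age factor to 1.0 adds another 5000, which is how a job's priority grows while it waits in the queue.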
ACCOUNTING
Accounting system tracks and manages HPC resource usage. As jobs are completed or
resources are utilized, accounts are charged and resource usage is recorded. Accounting
policy is like a bank/Credit System, where each department can be allocated with some
predefined budget on a quarterly basis for CPU usage. As and when the resources are utilized,
the amount will be deducted. The allocation will be reset at end of every quarter.
sacct
This command can report resource usage for running or terminated jobs including individual
tasks, which can be useful to detect load imbalance between the tasks.
sstat
sreport
This command can be used to generate reports based upon all jobs executed in a particular
time interval.
Storage Policy
Introduction
A debugger or debugging tool is a computer program that is used to test and debug other
programs (the "target" program).
When the program "traps" or reaches a preset condition, the debugger typically shows the
location in the original code if it is a source-level debugger or symbolic debugger, commonly
now seen in integrated development environments.
Debuggers also offer more sophisticated functions such as running a program step by step
(single-stepping or program animation), stopping (breaking) at some event or specified
instruction by means of a breakpoint to examine the current state,
and tracking the values of variables.
Some debuggers have the ability to modify program state while it is running. It may also be
possible to continue execution at a different location in the program to bypass a crash or
logical error.
gcc -g <program_name.c>
e.g. gcc -g random_generator.c
gdb <executable.out>
e.g. gdb a.out
Basic gdb Commands (to be executed in gdb command line window)
Start:
Starts the program execution and stops at the first line of the main procedure. Command line
arguments may be provided if any.
Run:
Starts the program execution but does not stop. It stops only when any error or program trap
occurs. Command line arguments may be provided if any.
Help:
Prints the list of commands available. Specifying ‘help’ followed by a command (e.g. ‘help
run’) displays more information about that command.
File <filename>:
Loads a binary program that is compiled with ‘-g’ flag for debugging.
List [line_no]:
Displays the source code (nearby 10 lines) of the program in execution where the execution
stopped. If ‘line_no’ is specified, it displays the source code (10 lines) at the specified line.
Info:
Displays more information about the set of utilities and saved information by the debugger.
For example; ‘info breakpoints’ will list all the breakpoints, similarly ‘info watchpoints’ will
list all the watch points set by the user while debugging their programs.
Print <expression>:
Prints the values of variables / expression at the current running instance of the program.
Step N:
Advances the program by one (or ‘N’) source lines, or until the program stops for any reason.
Unlike ‘next’, it steps into function calls (only for functions
compiled with debugging flags).
next:
This command also steps through the source lines of the program. Unlike the ‘step’ command, if
the current source code line calls a subroutine, this command does not enter the subroutine,
but instead steps over the call, in effect treating it as a single source line.
Continue:
This command continues the stopped program till the next breakpoint has occurred or till the
end of the program. It is used to continue from a paused/debug point state.
watch <expression>:
A watchpoint stops the execution of the program when the value
of the given expression changes. Using the watch command, specific variables can be
watched for value changes. You can also view the list of watchpoints by using the ‘info
watchpoints’ command.
Backtrace:
Prints the backtrace of all stack frames of the program. It shows the call stack and other
information about the running program.
These are some of the most useful utilities for debugging your programs with
gdb. gdb is not limited to these commands and contains a rich set of features that allow
you to debug multi-threaded programs as well. Also, all the commands, including the ones
listed above, have many variants for more in-depth control. These can be
explored using the help page of gdb.
Things to note:
1) We have a few libraries included for the functions that are used in the program.
2) We have two ‘#define’ statements:
a. ‘N’ for the number of iterations the ‘rand_fract’ function spends in calculating
the random number.
b. ‘N_LEN’ for the length of the final random number string generated. Currently
it is set to ‘100’ which means that the long random number will be of length
100.
3) Then, we have a function by name ‘rand_fract’ that iterates over two loops and using
the values of iterators (‘i’ and ‘j’), it calculates a small random number. Since, ‘rand()’
function is used for the outer loop, its number of iterations cannot be clearly defined
which gives the function a random nature.
4) The next function is as simple as its name is. It just takes an unsigned integer and
returns its factorial.
PART 2:
Things to note:
e. Then a dynamic array is constructed and partially filled with integer values in
descending order from the ‘normalized_fact’ value.
f. Finally, the partial array is printed by mixing the value of the array with rand()
function values followed by a modulo 10 operation.
g. The remaining partial part of final random value is generated using a basic
rand () modulo 10 operations.
The program ended up with a core dump without giving much information but just ‘Floating
point exception’. Now let’s compile the code with debugging information and run the
program simply with gdb.
Here we compiled the code using ‘-g’ and then used the ‘run’ command we studied earlier for
running the program. You can observe that the debugger stopped at line number 13 where the
‘Floating point exception (SIGFPE)’ occurred. At this point we can even go and check the
code at line number 13. But for now, let’s check what other information we can get from the
debugger. Let’s check the values of the variables ‘i’ and ‘j’ at this point.
The values of both ‘i’ and ‘j’ appear to be ‘0’ and thus a divide by zero exception is what
caused our program to terminate. Let’s update the code such that the value of ‘i’ and ‘j’ will
never become ‘0’. This is the modified code:
Thus, we just updated the loop index variables to start from ‘1’ instead of ‘0’. Thus, using
gdb, it was very simple to identify the point where the error occurred. Let’s re-run our updated
code and check what we get.
WHAT!? This is unexpected. We just cured the error in our program and are still getting an
FPE. Let’s go through the debugger and check where the error point is right now.
The debugger output shows that the error occurred on the same line as earlier. But in this
case, the value of ‘i’ and ‘j’ are not ‘0,0’ but they are ‘1, -1’ which is causing the denominator
at line 13 to be ‘0’ and thus, causing an FPE. In addition to print commands, we have also
issued the ‘list’ command which shows the nearby 10 lines of the code where the program
stopped.
You can observe that some bugs in programs are easy to debug but some aren’t.
We will have to dig in much more to find out what is going on. Also note that we have
our inner loop iterating from 1 to N (which is 100), but still the value of ‘j’ is printed out to
be ‘-1’. How is this even possible!? Smart programmers would have identified the problem already,
but let’s stick to the basics of how to use gdb. Let us use the ‘break’ command and set a breakpoint
at line number 13 and observe what is going on.
Thus, using the command ‘break 13’ we have set the breakpoint at line number 13, which was
verified using the ‘info breakpoint’ command. Then, we reran the program with the ‘run’
command. At line 13 the program stopped and using the ‘print’ command we checked the values
of ‘i’ and ‘j’. At this point, all seems to be well. Now, let’s proceed further. For stepping one
line we can use the ‘step’ command. Let’s do that and observe the value of ‘j’.
You can observe the usage of the ‘step’ command. We are going through the program line by
line and checking the values of the variable ‘j’.
There seems to be a lot of typing of the ‘step’ command just to proceed with the
program. Since we have already set a breakpoint at line 13, we can use another command
called ‘continue’. This command continues the program till the next breakpoint or the end
of the program.
You can see that we replaced typing the ‘step’ command 3 times with typing the ‘continue’ command
just once. But this still has us typing ‘continue’ and ‘print’ multiple times. Let us use
another utility in gdb known as ‘data breakpoints’, also known as watchpoints. But before
that, let us delete the existing breakpoint using the ‘delete’ command.
Thus, using the command ‘watch j’ we have set a watchpoint over ‘j’. Now every time when
the value of ‘j’ changes, a break will occur. You can also note the old and new values of ‘j’
printed out at each break. Another point to note is that after having one ‘continue’ command,
the program had a break. Further, by just pressing the ‘Enter/Return’ button on the keyboard,
the continue command was repeated. Thus, by pressing the ‘Enter/Return’ button, the last
command is repeated. At this point, we have learned much about the debugger, but we are
still not able to close in on our error quickly. Is there any other way to proceed? Well, yes!!
We want to observe the point where the value of ‘j’ gets close to ‘N’, i.e., 100. This
means that we are only concerned about what happens after ‘j’ reaches 99. Here, we end up
using what are called conditional breakpoints. First, we will delete our watchpoint and
then make use of a conditional breakpoint.
You can observe another variant of the ‘break’ command. We have explicitly stated the file
and the line number along with a condition to stop. This is useful, when the source code is
large and having multiple files. After setting a conditional break, we stopped at the point
where the value of ‘j’ becomes ‘99’. Now, let us see what happens next. Since, this is a critical
point at which we could observe the program, it is better if we step in the program using the
‘step’ command instead of relying on any break/watch points.
This, is unexpected!! The value of ‘j’ should never be 100 or anything above it.
By observation, we have figured out that the condition is itself wrong. It should have been ‘j
< N’ instead of ‘i < N’. This is a silly mistake of the programmer that led us to this much of
an effort.
Also, the value of ‘j’ which was observed as ‘-1’ was an outcome of the ‘short’ datatype
overflow i.e., the value of ‘j’ went from 1 to 32767 (assuming short as 2 bytes) and then from
-32768 to -1.
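The wrap-around described above can be reproduced without the original program. The Python sketch below uses the standard ctypes module to mimic a 2-byte signed short (the same assumption as in the text); the helper names are invented for this illustration.

```python
import ctypes


def as_short(value: int) -> int:
    """Wrap an integer the way a 2-byte signed C short stores it."""
    return ctypes.c_short(value).value


def increments_until(target: int, start: int = 1) -> int:
    """Count how many j++ steps a short needs to go from start to target."""
    j, steps = start, 0
    while j != target:
        j = as_short(j + 1)
        steps += 1
    return steps


print(as_short(32767 + 1))    # one step past the largest short value
print(increments_until(-1))   # steps from j = 1 until j reads -1
```

The first line prints -32768, the wrapped value, and the second prints 65534, the number of increments the runaway loop needs before ‘j’ reaches -1.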
Finally, a hard programming bug was discovered. Let us correct this error and rerun the
program.
Figure 28 – Again Dumping Core!! Things are getting interesting or frustrating or both !!
This is strange!!
Sometimes the program is getting the correct output, but sometimes, we are getting a
segmentation fault. Debugging such a program may be tricky since the occurrence of the bug
is low. We will proceed with our standard debugger steps to identify the error.
We compiled the code and ran it using the debugger. But the program completed successfully.
Let us rerun it till a point where the program fails.
Here we observe a point where the program exited at the function ‘factorial’.
This is a point where the debugger didn’t give much information about what the value of the
variable ‘x’ was. It just pointed out that the program failed at the function named
‘factorial’. That’s it!
Another reason for such kind of output would be because of the recursive nature of the
function. The stack frame where the function ‘factorial’ failed could be in a long nest of
recursive calls. At such points, it would be better to inspect the program at an earlier point
and look for errors. Let us have a breakpoint before the ‘factorial’ function was called and
view the value of the parameters that are passed to the function.
Thus, we have set a breakpoint before the call of the function ‘factorial’ and ran the program.
For the value of ‘f1 = 8’ for the ‘factorial’ function the process seems to exit normally. Let us
rerun.
Unexpectedly, we have got the value of ‘f1’ as ‘-8’ and the program seems to have crashed.
Let us observe the ‘rand_fract’ function and ‘factorial’ function once again. And study the
behavior of the functions where we could get a negative number.
The ‘rand_fract’ function is returning a datatype of ‘short’ while the calculation of the return
value could be significantly large which may overflow the size of ‘short’, thus, causing a
negative answer.
The function ‘factorial’ is expecting a value of type ‘unsigned int’. Since the value passed to
the function is a negative value, having an implicit conversion from a negative number to an
unsigned number means that we are having a very large value passed to the factorial function.
Also, since the ‘factorial’ function is recursive, passing a very large number to it could cause
multiple calls to the same function and thus, overflowing the stack provided to the user.
Now let us step further into our program and see whether the behavior we discussed is
what is actually observed.
A number ‘-1’ passed to the ‘factorial’ function is being implicitly converted to a very large
number ‘4294967295’.
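The implicit conversion can be checked independently of the sample program. This small Python sketch assumes a 4-byte unsigned int, as the value 4294967295 implies, and assumes that ‘factorial’ recurses once per decrement; the helper name is hypothetical.

```python
import ctypes


def as_unsigned_int(value: int) -> int:
    """Reinterpret an integer as a 4-byte unsigned C int."""
    return ctypes.c_uint32(value).value


# Negative arguments seen in the debugging session above.
for f1 in (-1, -8):
    depth = as_unsigned_int(f1)
    print(f1, "->", depth, "nested calls before the argument reaches 0")
```

Billions of nested calls cannot fit in the stack available to a user process, which is consistent with the segmentation fault seen earlier.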
Stepping in further reveals the recursive behavior of the ‘factorial’ function, i.e., each call
makes a sub call to the same function with a value one less. So what should be done in these types of
cases? Assume you have a large code base where these functions are called from multiple locations.
Modifying the signature of any of the functions means changing the code everywhere
the function is called. This is not affordable!! In such cases, a choice has to be
made to patch the code while preserving the semantics of the program.
Let us observe a piece of code where this change can be made and then test our program for
the expected results.
By observing the code, we find out that the expected value of ‘f1’ is between ‘0 to 9’ (because
of the modulo 10 operation).
Thus, without changing the signature of any function, we have inserted a patch (the
highlighted portion) that maintains the semantics of the code as well as cures the problem that
we had. Now let us just run and check our final program.
Figure 36 – Resolved!!!
Conclusions
We started with a program that we assumed to be functional but then the program ended up
with bugs that were not straightforward. We then explored the power of the debugger and the
various ways to identify the bugs in our program. We looked upon the easy solutions, and
slowly migrated towards the type of bugs that are not easily traceable.
Finally, we identified and corrected all the bugs in our program with the help of the debugger
and arrived at a bug free code.
Points to Note
• Bugs in a program are not necessarily compilation errors.
• One type of error can be caused by multiple bugs in the same line of code.
• Sometimes, it is not possible to change the code even when the problem is identified.
The best way to cure this is to study the behavior of the code and apply patches
wherever necessary.
• Using simple utilities from the ‘GNU Debugger’ can help in getting rid of problem
causing bugs in large programs.
Most of the popular python-based machine learning/deep learning libraries are installed on the
PARAM Ganga system. While developing and testing their applications, users have the option to
choose between different environment / runtime setups such as “virtual environment-based python
libraries” or “conda runtime-based python libraries”.
For most of the major environments (virtualenv, conda) different modules are prepared. Users can
check the list of modules by using the “module avail” command. Shown below is an example
of loading the conda environment in the current bash shell and continuing with application
development.
Once logged into the PARAM Ganga HPC Cluster, check which libraries are available and loaded
in the current shell. To check the list of modules loaded in the current shell, use the command given
below:
$ module list
To check all modules available on the system, but not loaded currently, use the command
given below:
$ module avail
The conda environment has been installed with most of the popular Python packages, as
shown below:
Once the “conda-python/3.7” module is loaded, end users can use all of these libraries inside
their Python programs. Many other modules based on virtual environments are also available
on the system. Users can load those libraries using the “module load” command and use them
in their applications.
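The module workflow described above can be wrapped in a small guard for use in scripts. The following is a minimal sketch, assuming a bash shell; 'load_env' is a hypothetical helper name, and only the module name conda-python/3.7 comes from the text above.

```shell
#!/bin/bash
# load_env: hypothetical helper that loads one environment module.
# It refuses to continue when the 'module' command is not defined,
# for example on a machine without environment modules.
load_env() {
    local mod="$1"
    if ! command -v module >/dev/null 2>&1; then
        echo "module command not found" >&2
        return 1
    fi
    module load "$mod" && echo "loaded $mod"
}

# Example call; a failure is tolerated so the sketch also runs off-cluster.
load_env conda-python/3.7 || echo "could not load conda-python/3.7"
```

After a successful load, "module list" should show the module among those loaded in the current shell.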
Local installation
Step 1. Log in to the PARAM Ganga cluster using your credentials.
Step 2. Download the software that you want to install. For example, to download the HMMER
software, use the command given below:
$ wget http://eddylab.org/software/hmmer/hmmer.tar.gz
Step 3. Untar the file. (If your software is in zip format, use the unzip command.)
$ cd hmmer-3.3
Step 6. Now run the 'make' command to build the software for the installation path.
$ make
Step 7. Run a test suite that checks for errors in the software (optional)
$ make check
Step 8. Run 'make install' to install the programs and man pages in the installation location
you chose earlier.
$ make install
* These are general installation instructions; please refer to the installation instructions,
manual or readme file that comes with the software for more details.
# If you get a dependency error, resolve it or ask the system administrator to install the
missing dependency.
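The overall flow of a local installation can be sketched as a dry run. This is an illustration under stated assumptions, not the manual's exact procedure: the 'configure --prefix' step is the usual convention for configure/make packages and is assumed here because the corresponding steps are not shown above; 'run', 'install_from_source', the DRY_RUN variable and the prefix $HOME/apps/hmmer are all hypothetical names.

```shell
#!/bin/bash
# DRY_RUN=1 (default) only prints the commands; DRY_RUN=0 also executes them.
DRY_RUN="${DRY_RUN:-1}"

# run: print a command and, unless in dry-run mode, execute it.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# install_from_source: hypothetical helper for a configure/make style package.
#   $1 = tarball, $2 = directory created by unpacking, $3 = install prefix
install_from_source() {
    run tar -xzf "$1" &&
    run cd "$2" &&
    run ./configure --prefix="$3" &&
    run make &&
    run make check &&
    run make install
}

# The prefix below is an assumed location inside the user's home directory.
install_from_source hmmer.tar.gz hmmer-3.3 "$HOME/apps/hmmer"
```

Installing under a prefix inside your own home directory avoids the need for administrator rights; review the printed commands before setting DRY_RUN=0.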
$ lfs setstripe -c 4 .
After this has been done, all new files created in the current directory will be spread over 4
storage arrays, each holding 1/4th of the file. The file can be accessed as normal; no special
action needs to be taken. When striping is set this way, it is defined on a per-directory
basis, so different directories can have different stripe setups in the same file system. New
sub-directories inherit the striping of their parent at the time of creation.
We recommend that users set the stripe count so that each chunk is approximately
200-300 GB, for example:
Once a file is created with a stripe count, it cannot be changed. Users can set the stripe size
and stripe count for their own directories, and can check the current stripe size and stripe
count with the command:
The options used in the above command have the following functions.
• -c sets the stripe count; 0 means use the system default (usually 1) and -1 means
stripe over all available OSTs (Lustre Object Storage Targets).
• -s sets the stripe size; 0 means use the system default (usually 1 MB); otherwise use
k, m or g for KB, MB or GB respectively.
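The recommendation of roughly 200-300 GB per chunk can be turned into a small calculation. This is a sketch only: 'stripe_count_for' is a hypothetical helper, the 250 GB default is simply the midpoint of the recommended range, and the result is not capped at the number of OSTs actually available on the system.

```shell
#!/bin/bash
# stripe_count_for: hypothetical helper.
#   $1 = expected file size in GB
#   $2 = target chunk size per stripe in GB (default 250)
# Prints the smallest stripe count that keeps each chunk at or below the target.
stripe_count_for() {
    local size_gb="$1"
    local chunk_gb="${2:-250}"
    local count=$(( (size_gb + chunk_gb - 1) / chunk_gb ))   # ceiling division
    if [ "$count" -lt 1 ]; then
        count=1
    fi
    echo "$count"
}

# Print (not run) the command one would use for a directory holding ~1 TB files.
echo "lfs setstripe -c $(stripe_count_for 1000) ."
```

For a 1000 GB file this prints "lfs setstripe -c 4 .", that is, four chunks of about 250 GB each.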
1. Do NOT run any job that takes longer than a few minutes on the login nodes. Login nodes
are meant for compiling jobs; run the jobs themselves on the compute nodes.
2. It is recommended to go through the beginner’s guide in /home/apps/Docs/samples.
This should serve as a good starting point for new users.
3. Use the same compiler to compile different parts/modules/library-dependencies of an
application. Using different compilers (e.g., pgcc + icc) to compile different parts of
application may cause linking or execution issues.
4. Choosing appropriate compiler switches/flags/options (e.g. -O3) may increase the
performance of the application substantially (the accuracy of the output must be verified).
Please refer to the compiler documentation (online, docs inside the compiler installation
path, man pages, etc.).
5. Modules/libraries used for execution should be the same as that used for compilations.
This can be specified in the Job submission script.
6. Be aware of the amount of disk space utilized by your job(s). Do an estimate before
submitting multiple jobs.
7. Please submit jobs preferably in $SCRATCH. You can back up your results/summaries in
your $HOME
8. $SCRATCH is NOT backed up! Please download all your data to your Desktop/Laptop.
9. Before installing any software in your home, ensure that it is from a reliable and safe
source. Ransomware is on the rise!
10. Please do not use spaces while creating the directories and files.
11. Please inform PARAM Ganga support when you notice something strange - e.g.,
unexpected slowdowns, files missing/corrupted etc.
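Point 10 above can be checked mechanically before a job is submitted. The sketch below is an illustration only: 'names_with_spaces' is a hypothetical helper built on the standard 'find' utility, and the temporary directory exists purely for the demonstration.

```shell
#!/bin/bash
# names_with_spaces: hypothetical helper that lists every file or directory
# under $1 (default: current directory) whose name contains a space.
names_with_spaces() {
    find "${1:-.}" -name '* *' -print
}

# Demonstration on a throwaway directory.
demo_dir="$(mktemp -d)"
touch "$demo_dir/run_01.log" "$demo_dir/final results.txt"
names_with_spaces "$demo_dir"      # lists only ".../final results.txt"
rm -rf "$demo_dir"
```

Running the helper on a job directory and getting no output means none of the names below it contain spaces.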
Installed Applications/Libraries
The following is a list of some of the applications from various domains of science and
engineering installed on the system.
LAMMPS Applications
LAMMPS is an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator.
It is used extensively in the fields of Material Science, Physics, Chemistry and many
others. More information about LAMMPS can be found at
https://lammps.sandia.gov
1. The LAMMPS input file is in.lj, which contains the parameters below. Input file = in.lj
# 3d Lennard-Jones melt
variable x index 1
variable y index 1
variable z index 1
variable xx equal 64*$x
variable yy equal 64*$y
variable zz equal 64*$z
units lj
atom_style atomic
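The size variables in in.lj scale the problem in each dimension. As a worked example, and assuming the remainder of the file follows the standard LAMMPS Lennard-Jones benchmark (an fcc lattice, which has 4 atoms per unit cell, filling a box of xx by yy by zz cells; that part of the input is not shown in the excerpt above), the atom count can be estimated as follows.

```shell
#!/bin/bash
# Index variables from in.lj (each defaults to 1).
x=1; y=1; z=1

# Box dimensions in lattice cells, as defined by the 'variable ... equal' lines.
xx=$(( 64 * x ))
yy=$(( 64 * y ))
zz=$(( 64 * z ))

# Assumption: fcc lattice with 4 atoms per unit cell (not shown in the excerpt).
atoms_per_cell=4
atoms=$(( atoms_per_cell * xx * yy * zz ))

echo "box: ${xx} x ${yy} x ${zz} cells, atoms: ${atoms}"
```

Under these assumptions the default indices give a 64 x 64 x 64 box with 1048576 atoms; doubling x, y and z would multiply the atom count by eight.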
#!/bin/sh
#SBATCH -N 8
#SBATCH --ntasks-per-node=40
#SBATCH --time=08:50:20
#SBATCH --job-name=lammps
#SBATCH --error=job.%J.err_8_node_40
#SBATCH --output=job.%J.out_8_node_40
#SBATCH --partition=standard
cd /home/manjunath/NEW_LAMMPS/lammps-7Aug19/bench
export OMP_NUM_THREADS=1
GROMACS APPLICATION
GROMACS
GROningen MAchine for Chemical Simulations (GROMACS) is a molecular dynamics
package mainly designed for simulations of proteins, lipids, and nucleic acids. It was
originally developed in the Biophysical Chemistry department of University of Groningen,
and is now maintained by contributors in universities and research centers worldwide.
GROMACS is one of the fastest and most popular software packages available, and can run
on central processing units (CPUs) and graphics processing units (GPUs).
#!/bin/sh
#SBATCH -N 10
#SBATCH --ntasks-per-node=48
##SBATCH --time=03:05:30
#SBATCH --job-name=gromacs
#SBATCH --error=job.16.%J.err
#SBATCH --output=job.16.%J.out
#SBATCH --partition=standard
Output Snippet:
Number of logical cores detected (48) does not match the number reported by
OpenMP (1).
Consider setting the launch configuration manually!
Running on 10 nodes with total 192 cores, 480 logical cores
Cores per node: 0 - 48
Logical cores per node: 48
Hardware detected on host cn072 (the node of MPI rank 0):
CPU info:
Vendor: GenuineIntel
Brand: Intel(R) Xeon(R) Platinum 8268 CPU @ 2.90GHz
SIMD instructions most likely to fit this hardware: AVX2_256
SIMD instructions selected at GROMACS compile time: AVX2_256
Reading file /home/shweta/Gromacs/water-cut1.0_GMX50_bare/3072/topol.tpr,
VERSION 5.1.4 (single precision)
Changing nstlist from 10 to 20, rlist from 1 to 1.032
The number of OpenMP threads was set by environment variable
OMP_NUM_THREADS to 1 (and the command-line setting agreed with that)
NOTE: KMP_AFFINITY set, will turn off gmx mdrun internal affinity setting as the two
can conflict and cause performance degradation. To keep using the gmx mdrun internal
affinity setting, set the KMP_AFFINITY=disabled environment variable.
Overriding nsteps with value passed on the command line: 50000 steps, 100 ps
Will use 360 particle-particle and 120 PME only ranks
This is a guess, check the performance at the end of the log file
Using 480 MPI processes
Using 1 OpenMP thread per MPI process
Back Off! I just backed up ener.edr to ./#ener.edr.2#
starting mdrun 'Water'
50000 steps, 100.0 ps.
Part of the total run time spent waiting due to load imbalance: 3.0 %
Average PME mesh/force load: 1.252
Part of the total run time spent waiting due to PP/PME imbalance: 13.2 %
NOTE: 13.2 % performance was lost because the PME ranks had more work to do than the PP ranks.
You might want to increase the number of PME ranks or increase the cut-off and
the grid spacing.
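The rank counts in the output snippet follow directly from the job script. As a quick check (a sketch, assuming one MPI rank per Slurm task, which matches the "Using 1 OpenMP thread per MPI process" line in the log):

```shell
#!/bin/bash
# Values taken from the #SBATCH directives of the GROMACS job script.
nodes=10
tasks_per_node=48

# Values reported by mdrun in the output snippet.
pp_ranks=360     # particle-particle ranks
pme_ranks=120    # PME-only ranks

total_ranks=$(( nodes * tasks_per_node ))
echo "MPI ranks requested: $total_ranks"
echo "PP + PME ranks:      $(( pp_ranks + pme_ranks ))"
echo "PME share of ranks:  $(( 100 * pme_ranks / total_ranks )) %"
```

Both totals come to 480, and a quarter of the ranks are dedicated to PME. Since the log reports that 13.2 % of performance was lost to PP/PME imbalance, a somewhat larger PME share may be worth trying for this input.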
If you use supercomputers and services provided under the National Supercomputing
Mission, Government of India, please let us know of any published results including Student
Thesis, Conference Papers, Journal Papers and patents obtained.
Also, please submit the copies of dissertations, reports, reprints and URLs in which “National
Supercomputing Mission, Government of India” is acknowledged to:
We suggest that you follow these four easy steps to generate a ticket for the issue you are
experiencing.
Your ticket will be handled by the GANGA Support team. A ticket will be closed
only when the related issue is resolved.
You can generate a new ticket for any new issue that you experience.
3. Sign in using the username and password that you use for logging in to the cluster. Refer
to Fig: 39 for the same.
4. Select a Help Topic from the dropdown and then click on Create Ticket. Refer to Fig: 40
for the same.
5. Please fill in the details of your issue in the fields given and then click on Create Ticket.
Once the Ticket is generated, an acknowledgement e-mail will be sent to your official e-
mail address. The e-mail will also contain the Ticket number along with reference to the
ticket that you have generated.
In case of any difficulty while accessing GANGA Support you can reach us via e-mail at
[email protected]
Organization Address:
_______________________________________________________
Gender: __________
Department: _______________________________
Designation: ________________________________
(Designation: If student, provide the details below)
___________________________________________________________________________
Project Details:
Project Name:
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
__________________________________________________________________________
1. The resources provided to you on the PARAM Ganga facility should not be used for any
commercial purpose; use is restricted to academic purposes such as research projects, academic
projects, NSM projects, NSM-approved MSME projects and scientific projects.
2. Sharing your login credentials with a third person releases the PARAM Ganga
administration committee from responsibility for any data theft, and your account will also be
disabled. The third person will also be held accountable for misusing the PARAM Ganga
facility.
3. It is strongly recommended that you do not run jobs on the login node. Any such incident
reported will result in cancellation of the job, and any repeat action will result in closure of
your account.
4. You will be responsible for informing the PARAM Ganga administration about your project
completion, project cancellation and moving or copying data related to your project from
PARAM Ganga.
5. You will be solely responsible for keeping your password strong and safe.
6. If you are found engaging in or promoting activities such as hacking, reverse-engineering
or violating intellectual property rights on or using the PARAM Ganga facility, you will be
barred from holding an account on any supercomputer set up under the National Supercomputing
Mission.
7. The facility is built to keep downtime to a minimum; however, availability depends on
various factors such as hardware reliability, power outages, network outages and scheduled
maintenance, due to which the facility could be completely or partially unavailable. Users will
be notified of all scheduled/unscheduled maintenance via the website, email, broadcast
messages, newsgroups, etc.
8. This facility will not be used for any purpose connected with Chemical or Biological or
nuclear weapons or missiles capable of delivering such Weapons.
9. Acknowledging the usage of the facility is mandatory.
If you use supercomputers and services provided under the National Supercomputing Mission,
Government of India, please let us know of any published results including Student Thesis,
Conference Papers, Journal Papers and patents obtained.
10. User is the owner and hence responsible for all data copied and generated using PARAM
Ganga and PARAM Ganga administration is not responsible for the same. Users should
ensure the required backup and protection of the data.
11. PARAM Ganga administration is not responsible for compromised accounts, data theft, data
publications, data claims, etc.
Also, please submit the copies of dissertations, reports, reprints and URLs in which “National
Supercomputing Mission, Government of India” is acknowledged to:
Email: [email protected]
User’s signature
Recommended/Not Recommended
Name:
___________________________________________________________________
Designation:
___________________________________________________________________
Department:
___________________________________________________________________
Verified by:
Approving Authority:
Approved/Not Approved
Remarks:
______________________________________________________________________
Domain(s)*:
Sub-domain(s)*:
Application name(s)*:
source)